Skip to content

fix regresion from https://github.com/apache/nuttx/pull/14881#15073

Merged
xiaoxiang781216 merged 2 commits into
apache:masterfrom
hujun260:apache_9
Dec 9, 2024
Merged

fix regresion from https://github.com/apache/nuttx/pull/14881#15073
xiaoxiang781216 merged 2 commits into
apache:masterfrom
hujun260:apache_9

Conversation

@hujun260

@hujun260 hujun260 commented Dec 6, 2024

Copy link
Copy Markdown
Contributor

Summary

fix regresion from #14881
reason:
svc call may trigger hardfalt

Background
The origin of this issue is our desire to eliminate the function of storing
"regs" in g_current_regs and instead utilize (*running_task)->xcp.regs for storage.
The benefits of this approach include faster storage speed and
avoiding multiple accesses to g_current_regs during context switching,
thus ensuring that whether returning from an interrupt or an exception,
we consistently use this_task()->xcp.regs

Issue Encountered
However, when storing registers, we must ensure that (running_task)->xcp.regs is invalid
so that it can be safely overwritten.
According to the existing logic, the only scenario where (running_task)->xcp.regs
is valid is during restore_context. We must accurately identify this scenario.
Initially, we used the condition (running_task)==NULL for this purpose, but we deemed
this approach unsatisfactory as it did not align well with the actual logic.
(running_task) should not be NULL. Consequently, we adopted other arch-specific methods for judgment,
but due to special logic in some arch, the judgment was not accurate, leading to this issue.

Solution:
For armv6-m, we haven't found a more suitable solution, so we are sticking with (*running_task)==NULL.
For armv7-m/armv8-m, by removing support for primask, we can achieve accurate judgment.
PRIMASK is a design in armv6-m, that's why arm introduce BASEPRI from armv7-m. It's wrong to provide this option for armv7-m/armv8-m arch.

Impact

armv6-m
armv7-m
armv8-m

Testing

ci

@github-actions github-actions Bot added Arch: arm Issues related to ARM (32-bit) architecture Size: S The size of the change in this PR is small labels Dec 6, 2024
@nuttxpr

nuttxpr commented Dec 6, 2024

Copy link
Copy Markdown

[Experimental Bot, please feedback here]

No, this PR does not fully meet the NuttX requirements. While it provides a link to the related PR and identifies affected architectures, it lacks crucial details. Here's a breakdown:

  • Insufficient Summary: It states what is fixed (a regression), but not how. It briefly mentions a hard fault possibility, but doesn't explain the root cause of the regression or the solution implemented. The functional part of the code being changed isn't specified.
  • Incomplete Impact: While it lists architectures, it doesn't use the YES/NO format and lacks descriptions. All other impact categories are missing. Does this change require documentation updates? Are there compatibility concerns?
  • Inadequate Testing: Stating "ci" isn't sufficient. The requirements ask for specific build host details and target details. Critically, it's missing "before" and "after" testing logs, which are essential to demonstrate the fix's effectiveness.

The PR needs substantial improvement in all three sections (Summary, Impact, and Testing) before it can be considered complete.

Comment thread arch/arm/src/armv6-m/arm_doirq.c Outdated
@jasonbu

jasonbu commented Dec 6, 2024

Copy link
Copy Markdown
Contributor

also influence this issue #14514

@hujun260 hujun260 marked this pull request as draft December 6, 2024 07:18
@hujun260 hujun260 marked this pull request as ready for review December 6, 2024 07:59
@github-actions github-actions Bot added Size: XS The size of the change in this PR is very small and removed Size: S The size of the change in this PR is small labels Dec 6, 2024
@xiaoxiang781216 xiaoxiang781216 linked an issue Dec 6, 2024 that may be closed by this pull request
1 task
@raiden00pl

Copy link
Copy Markdown
Member

I confirm that this PR fixes this issue on nrf52 #14514

@raiden00pl raiden00pl linked an issue Dec 6, 2024 that may be closed by this pull request
1 task
Comment thread arch/arm/Kconfig Outdated

@raiden00pl raiden00pl left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the consequences of this change? Does it have any disadvantages compared to PRIMASK? This is not a trivial change, so it's worth discuss it more as we basically force users to use BASEPRI here.

@xiaoxiang781216

xiaoxiang781216 commented Dec 6, 2024

Copy link
Copy Markdown
Contributor

What are the consequences of this change?

syscall doesn't go through hardfault, which make the syscall process is simple and fast, also avoid confusing the developer with JTAG debugging.

Does it have any disadvantages compared to PRIMASK?

I don't see any bad side effect.

This is not a trivial change, so it's worth discuss it more as we basically force users to use BASEPRI here.

PRIMASK is a bad design in armv6-m, that's why arm introduce BASEPRI from armv7-m. It's wrong to provide this option for armv7-m/armv8-m arch.

@raiden00pl

Copy link
Copy Markdown
Member

@hujun260 what about armv6-m ? this arch doesn't support BASEPRI, so this architecture is still broken.

@raiden00pl

Copy link
Copy Markdown
Member

PRIMASK is a bad design in armv6-m, that's why arm introduce BASEPRI from armv7-m. It's wrong to provide this option for armv7-m/armv8-m arch.

I think the timeline is different. armv7-m was introduced before armv6-m. The second one is a cut version of the first one and if ARM decided to get rid of BASEPRI they must have had a reason for this (but of course they could have made a design mistake here)

I don't see any bad side effect.

I don't see any flaws in this solution either and it can simplify a lot of things. But people use NuttX in different ways. A change like this should be more widely communicated in the community, maybe someone sees drawbacks.

@hujun260

hujun260 commented Dec 6, 2024

Copy link
Copy Markdown
Contributor Author

@hujun260 what about armv6-m ? this arch doesn't support BASEPRI, so this architecture is still broken.

It has been restored to the old implementation.

@github-actions github-actions Bot added Size: S The size of the change in this PR is small and removed Size: XS The size of the change in this PR is very small labels Dec 6, 2024
@github-actions github-actions Bot added Board: arm Size: L The size of the change in this PR is large and removed Size: S The size of the change in this PR is small labels Dec 6, 2024
@hujun260 hujun260 force-pushed the apache_9 branch 2 times, most recently from 103c9ea to a36b0bd Compare December 6, 2024 13:11
Comment thread arch/arm/src/armv8-m/arm_tcbinfo.c
Comment thread arch/arm/src/armv7-m/arm_tcbinfo.c
Comment thread arch/arm/Kconfig Outdated
Comment thread arch/arm/src/armv8-m/arm_tcbinfo.c
@acassis

acassis commented Dec 6, 2024

Copy link
Copy Markdown
Contributor

@hujun260 Please include a complete explanation in the commit log message. I saw @xiaoxiang781216 try to explain a little bit about it, but it is important to explain what was modified initially that caused the issue (and why it was modified) and then how the new modification fixes the issue. It could help us understand it better and help other people in the future when looking this commit.

@hujun260

hujun260 commented Dec 8, 2024

Copy link
Copy Markdown
Contributor Author

was not accurate, leading to this issue.

Solution: For armv6-m, we haven't found a more suitable solution, so we are sticking with (*running_task)==NULL. For armv7-m/armv8-m, by remo

done

Comment thread arch/arm/include/types.h Outdated
@acassis

acassis commented Dec 8, 2024

Copy link
Copy Markdown
Contributor

@hujun260 please extend the explanation from @xiaoxiang781216 and include it in the commit log message

reason:
svc call may trigger hardfalt

Signed-off-by: hujun5 <hujun5@xiaomi.com>
@xiaoxiang781216

Copy link
Copy Markdown
Contributor

@hujun260 please extend the explanation from @xiaoxiang781216 and include it in the commit log message

aleady add to the second commit

@anchao

anchao commented Dec 9, 2024

Copy link
Copy Markdown
Contributor

@hujun260 @xiaoxiang781216 Could we remove the word "bad" from the commit message? I don't think this is a bad design, right? These are two interrupt masking mechanisms provided by the CPU. Similar to cortex-a/r, the CPU also provides cpsr and gic mechanisms, but the application scenarios are different.

reason:
svc call may trigger hardfault

Background
    The origin of this issue is our desire to eliminate the function of storing
"regs" in g_current_regs and instead utilize (*running_task)->xcp.regs for storage.
The benefits of this approach include faster storage speed and
avoiding multiple accesses to g_current_regs during context switching,
thus ensuring that whether returning from an interrupt or an exception,
we consistently use this_task()->xcp.regs

Issue Encountered
    However, when storing registers, we must ensure that (running_task)->xcp.regs is invalid
so that it can be safely overwritten.
According to the existing logic, the only scenario where (running_task)->xcp.regs
is valid is during restore_context. We must accurately identify this scenario.
Initially, we used the condition (running_task)==NULL for this purpose, but we deemed
this approach unsatisfactory as it did not align well with the actual logic.
(running_task) should not be NULL. Consequently, we adopted other arch-specific methods for judgment,
but due to special logic in some arch, the judgment was not accurate, leading to this issue.

Solution:
    For armv6-m, we haven't found a more suitable solution, so we are sticking with (*running_task)==NULL.
    For armv7-m/armv8-m, by removing support for primask, we can achieve accurate judgment.

    PRIMASK is a design in armv6-m, that's why arm introduce BASEPRI from armv7-m.
It's wrong to provide this option for armv7-m/armv8-m arch.

Signed-off-by: hujun5 <hujun5@xiaomi.com>
@hujun260

hujun260 commented Dec 9, 2024

Copy link
Copy Markdown
Contributor Author

@hujun260 @xiaoxiang781216 Could we remove the word "bad" from the commit message? I don't think this is a bad design, right? These are two interrupt masking mechanisms provided by the CPU. Similar to cortex-a/r, the CPU also provides cpsr and gic mechanisms, but the application scenarios are different.

ok

@xiaoxiang781216 xiaoxiang781216 merged commit 0e1b432 into apache:master Dec 9, 2024
@hartmannathan

hartmannathan commented Dec 9, 2024

Copy link
Copy Markdown
Contributor

EDIT: The following comment, which I made earlier, is incorrect. This PR removes support for PRIMASK, not BASEPRI. PRIMASK is not needed because BASEPRI is a superset of the functionality. This PR does not break High Priority, Zero Latency Interrupts! (Leaving the earlier text in place below...)

Sorry, I missed this PR before it was merged, but this PR breaks High Priority, Zero Latency Interrupts!

See the help text for ARMV7M_USEBASEPRI, which was removed in this PR:

config ARMV7M_USEBASEPRI
	bool "Use BASEPRI Register"
	default ARCH_HIPRI_INTERRUPT
	depends on ARCH_CORTEXM3 || ARCH_CORTEXM4 || ARCH_CORTEXM7
	---help---
		Use the BASEPRI register to enable and disable interrupts. By
		default, the PRIMASK register is used for this purpose. This
		usually results in hardfaults when supervisor calls are made.
		Though, these hardfaults are properly handled by the RTOS, the
		hardfaults can confuse some debuggers. With the BASEPRI
		register, these hardfaults, will be avoided. For more details see
		https://cwiki.apache.org/confluence/display/NUTTX/ARMv7-M+Hardfaults%2C+SVCALL%2C+and+Debuggers
		WARNING:  If CONFIG_ARCH_HIPRI_INTERRUPT is selected, then you
		MUST select CONFIG_ARMV7M_USEBASEPRI.  The Kconfig dependencies
		here will permit to select an invalid configuration because it
		cannot enforce that requirement.  If you create this invalid
		configuration, you will encounter some problems that may be
		very difficult to debug.

Also see https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html

Important note: If this feature is removed from NuttX, then NuttX is completely useless to me and my employer! We depend on the Zero Latency Interrupts for critical parts of our firmware!

@hartmannathan

Copy link
Copy Markdown
Contributor

Please see Issue #15100

@hartmannathan

Copy link
Copy Markdown
Contributor

What are the consequences of this change? Does it have any disadvantages compared to PRIMASK? This is not a trivial change, so it's worth discuss it more as we basically force users to use BASEPRI here.

@raiden00pl the consequences of removing CONFIG_ARMV7M_USEBASEPRI is that high priority zero latency interrupts cannot work anymore. Unfortunately I think several PRs will need to be reverted. It seems the issue starts with PR-14881 which attempts to optimize storing
"regs" in g_current_regs and instead utilize (*running_task)->xcp.regs for storage, but unfortunately it seems this leads to hardfault regression; unfortunately, if the Zero Latency Interrupts are broken, I cannot use NuttX anymore. It is a complete showstopper. I recommend to revert the PRs that led to this issue. I have filed a bug #15100 to track this issue.

@hujun260

Copy link
Copy Markdown
Contributor Author

Important note: If this feature is removed from NuttX, then NuttX is completely useless to me and my employer! We depend on the Zero Latency Interrupts for critical parts of our firmware!

We haven't removed this feature; we've just set it to be enabled by default.

@hartmannathan

Copy link
Copy Markdown
Contributor

Important note: If this feature is removed from NuttX, then NuttX is completely useless to me and my employer! We depend on the Zero Latency Interrupts for critical parts of our firmware!

We haven't removed this feature; we've just set it to be enabled by default.

Thank you for clarifying. I misunderstood the changes earlier.

hujun260 added a commit to hujun260/nuttx that referenced this pull request Jan 13, 2025
Signed-off-by: hujun5 <hujun5@xiaomi.com>
hujun260 added a commit to hujun260/nuttx that referenced this pull request Jan 13, 2025
Signed-off-by: hujun5 <hujun5@xiaomi.com>
@hujun260 hujun260 deleted the apache_9 branch January 24, 2025 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arch: arm Issues related to ARM (32-bit) architecture Board: arm Size: L The size of the change in this PR is large

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Crash Introduced when calling app from nsh [BUG] hardware timers broken for ARM devices

8 participants