Skip to content

[Bp-15073} fix regresion from https://github.com/apache/nuttx/pull/14881#15086

Merged
xiaoxiang781216 merged 2 commits into
apache:releases/12.8from
jerpelea:bp-15073
Dec 9, 2024
Merged

[Bp-15073} fix regresion from https://github.com/apache/nuttx/pull/14881#15086
xiaoxiang781216 merged 2 commits into
apache:releases/12.8from
jerpelea:bp-15073

Conversation

@jerpelea

@jerpelea jerpelea commented Dec 9, 2024

Copy link
Copy Markdown
Contributor

Summary

fix regresion from #14881
reason:
svc call may trigger hardfalt

Background
The origin of this issue is our desire to eliminate the function of storing
"regs" in g_current_regs and instead utilize (*running_task)->xcp.regs for storage.
The benefits of this approach include faster storage speed and
avoiding multiple accesses to g_current_regs during context switching,
thus ensuring that whether returning from an interrupt or an exception,
we consistently use this_task()->xcp.regs

Issue Encountered
However, when storing registers, we must ensure that (running_task)->xcp.regs is invalid
so that it can be safely overwritten.
According to the existing logic, the only scenario where (running_task)->xcp.regs
is valid is during restore_context. We must accurately identify this scenario.
Initially, we used the condition (running_task)==NULL for this purpose, but we deemed
this approach unsatisfactory as it did not align well with the actual logic.
(running_task) should not be NULL. Consequently, we adopted other arch-specific methods for judgment,
but due to special logic in some arch, the judgment was not accurate, leading to this issue.

Solution:
For armv6-m, we haven't found a more suitable solution, so we are sticking with (*running_task)==NULL.
For armv7-m/armv8-m, by removing support for primask, we can achieve accurate judgment.
PRIMASK is a design in armv6-m, that's why arm introduce BASEPRI from armv7-m. It's wrong to provide this option for armv7-m/armv8-m arch.

Impact

armv6-m
armv7-m
armv8-m

Testing

CI

reason:
svc call may trigger hardfalt

Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
svc call may trigger hardfault

Background
    The origin of this issue is our desire to eliminate the function of storing
"regs" in g_current_regs and instead utilize (*running_task)->xcp.regs for storage.
The benefits of this approach include faster storage speed and
avoiding multiple accesses to g_current_regs during context switching,
thus ensuring that whether returning from an interrupt or an exception,
we consistently use this_task()->xcp.regs

Issue Encountered
    However, when storing registers, we must ensure that (running_task)->xcp.regs is invalid
so that it can be safely overwritten.
According to the existing logic, the only scenario where (running_task)->xcp.regs
is valid is during restore_context. We must accurately identify this scenario.
Initially, we used the condition (running_task)==NULL for this purpose, but we deemed
this approach unsatisfactory as it did not align well with the actual logic.
(running_task) should not be NULL. Consequently, we adopted other arch-specific methods for judgment,
but due to special logic in some arch, the judgment was not accurate, leading to this issue.

Solution:
    For armv6-m, we haven't found a more suitable solution, so we are sticking with (*running_task)==NULL.
    For armv7-m/armv8-m, by removing support for primask, we can achieve accurate judgment.

    PRIMASK is a design in armv6-m, that's why arm introduce BASEPRI from armv7-m.
It's wrong to provide this option for armv7-m/armv8-m arch.

Signed-off-by: hujun5 <hujun5@xiaomi.com>
@jerpelea jerpelea changed the base branch from master to releases/12.8 December 9, 2024 09:31
@github-actions github-actions Bot added Arch: arm Issues related to ARM (32-bit) architecture Board: arm Size: L The size of the change in this PR is large labels Dec 9, 2024
@nuttxpr

nuttxpr commented Dec 9, 2024

Copy link
Copy Markdown

[Experimental Bot, please feedback here]

Yes, this PR appears to meet the NuttX requirements, although some sections could be more explicit.

Strengths:

  • Clear problem description: The summary explains the regression, its root cause (incorrect register storage), and the attempted solutions.
  • Solution explained: The approach for armv6-m, armv7-m, and armv8-m is outlined. The reasoning behind removing PRIMASK support is provided.
  • Links to related PR: The summary references the PR that introduced the regression.

Weaknesses/Areas for Improvement:

  • Impact Section is Insufficient: The impact section simply lists architectures. It needs to explicitly address all the points in the template: user impact, build impact, hardware impact, documentation, security, compatibility. For example, will removing PRIMASK support break any existing user code? Are there any backwards compatibility issues? Even if the answer is "NO" for each item, it should be stated explicitly.
  • Testing Section Lacks Detail: "CI" is not enough information. List the specific CI configurations that were tested (e.g., QEMU-armv7-m, some specific hardware board). Provide snippets of the relevant log output demonstrating the fix. The "Testing logs before change" and "Testing logs after change" sections are empty. Show specific commands used for testing.
  • Missing Issue Reference: While the regressing PR is linked, an issue referencing the regression itself would be helpful for tracking.
  • Summary could be more concise: While the background is helpful, the summary itself could be shorter by focusing on the core problem and solution. For example: "Fixes a regression introduced in arch/arm: syscall SYS_switch_context and SYS_restore_context use 0 para #14881 where svc calls could trigger a hard fault due to incorrect register storage. The fix involves using (*running_task)==NULL as a condition on armv6-m and removing PRIMASK support for armv7-m/armv8-m to ensure correct register handling."

Recommendation: Expand the Impact and Testing sections with specific details. Consider adding a related NuttX issue. Condense the summary slightly while retaining essential information.

@hartmannathan

hartmannathan commented Dec 9, 2024

Copy link
Copy Markdown
Contributor

EDIT: The following comment, which I made earlier, is incorrect. This PR removes support for PRIMASK, not BASEPRI. PRIMASK is not needed because BASEPRI is a superset of the functionality. This PR does not break High Priority, Zero Latency Interrupts! (Leaving the earlier text in place below...)

Sorry, I missed this PR before it was merged, but this PR breaks High Priority, Zero Latency Interrupts!

See the help text for ARMV7M_USEBASEPRI, which was removed in this PR:

config ARMV7M_USEBASEPRI
	bool "Use BASEPRI Register"
	default ARCH_HIPRI_INTERRUPT
	depends on ARCH_CORTEXM3 || ARCH_CORTEXM4 || ARCH_CORTEXM7
	---help---
		Use the BASEPRI register to enable and disable interrupts. By
		default, the PRIMASK register is used for this purpose. This
		usually results in hardfaults when supervisor calls are made.
		Though, these hardfaults are properly handled by the RTOS, the
		hardfaults can confuse some debuggers. With the BASEPRI
		register, these hardfaults, will be avoided. For more details see
		https://cwiki.apache.org/confluence/display/NUTTX/ARMv7-M+Hardfaults%2C+SVCALL%2C+and+Debuggers
		WARNING:  If CONFIG_ARCH_HIPRI_INTERRUPT is selected, then you
		MUST select CONFIG_ARMV7M_USEBASEPRI.  The Kconfig dependencies
		here will permit to select an invalid configuration because it
		cannot enforce that requirement.  If you create this invalid
		configuration, you will encounter some problems that may be
		very difficult to debug.

Also see https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html

Important note: If this feature is removed from NuttX, then NuttX is completely useless to me and my employer! We depend on the Zero Latency Interrupts for critical parts of our firmware!

@hartmannathan

Copy link
Copy Markdown
Contributor

Please see Issue #15100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arch: arm Issues related to ARM (32-bit) architecture Board: arm Size: L The size of the change in this PR is large

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants