Skip to content

xtensa: support coredump by register set alignment#15141

Merged
xiaoxiang781216 merged 1 commit into
apache:masterfrom
sdc-g:feature/xtensa-core_dump
Dec 13, 2024
Merged

xtensa: support coredump by register set alignment#15141
xiaoxiang781216 merged 1 commit into
apache:masterfrom
sdc-g:feature/xtensa-core_dump

Conversation

@sdc-g

@sdc-g sdc-g commented Dec 11, 2024

Copy link
Copy Markdown
Contributor

Note: Please adhere to Contributing Guidelines.

Summary

refer to https://github.com/espressif/binutils-gdb/blob/esp-gdb-12.1/gdb/arch/xtensa.h
align the register set with gdb

Impact

While coredump config is enabled, g_tcbinfo size is larger hundreds of bytes

Testing

./tools/configure.sh esp32-devkitc:nsh
enable CONFIG_COREDUMP, CONFIG_BOARD_COREDUMP_SYSLOG
default enabled CONFIG_BOARD_COREDUMP_FULL, CONFIG_BOARD_COREDUMP_COMPRESSION

check register dumped from NSH:
up_dump_register: PC: 400f1188 PS: 00060f30
up_dump_register: A0: 800e33af A1: 3ffe0bd0 A2: 00000001 A3: 3ffe0470
up_dump_register: A4: 00000000 A5: 3ffafb48 A6: 00000000 A7: 3ffb0d30
up_dump_register: A8: 00000000 A9: 3ffe0bb0 A10: ffffffff A11: 3ffe0470
up_dump_register: A12: 3f402c02 A13: 3ffe0bd0 A14: 00000000 A15: 00000068
up_dump_register: SAR: 00000000 CAUSE: 0000001d VADDR: 00000000
up_dump_register: LBEG: 4000c2e0 LEND: 4000c2f6 LCNT: 00000000

And compare with gdb:
(gdb) info register
pc 0x400f1188 0x400f1188 <coredump_main+144>
lbeg 0x4000c2e0 1073791712
lend 0x4000c2f6 1073791734
lcount 0x0 0
sar 0x0 0
ps 0x60f30 397104
threadptr
br
scompare1
acclo
acchi
m0
m1
m2
m3
expstate
f64r_lo
f64r_hi
f64s
fcr
fsr
a0 0x800e33af -2146552913
a1 0x3ffe0bd0 1073613776
a2 0x1 1
a3 0x3ffe0470 1073611888
a4 0x0 0
a5 0x3ffafb48 1073412936
a6 0x0 0
a7 0x3ffb0d30 1073417520
a8 0x0 0
a9 0x3ffe0bb0 1073613744
a10 0xffffffff -1
a11 0x3ffe0470 1073611888
a12 0x3f402c02 1061170178
a13 0x3ffe0bd0 1073613776
a14 0x0 0
a15 0x68 104

@github-actions github-actions Bot added Arch: xtensa Issues related to the Xtensa architecture Size: S The size of the change in this PR is small labels Dec 11, 2024
@nuttxpr

nuttxpr commented Dec 11, 2024

Copy link
Copy Markdown

[Experimental Bot, please feedback here]

Fill In The Commit Message: This PR contains a Commit with an Empty Commit Message. Please fill in the Commit Message with the PR Summary.

This PR does not fully meet the NuttX requirements. Here's a breakdown:

Missing/Insufficient Information:

  • Summary: While linking to the Xtensa header is helpful, the summary needs more detail. Why is aligning the register set with GDB necessary? What problem does it solve? Was there a bug in the core dump output? Was it missing registers? Were the registers incorrect? Be explicit.

  • Impact: "g_tcbinfo size is larger hundreds of bytes" is vague. How many hundreds? This increase should be quantified. Are there any negative impacts from this increased size? Does it affect memory usage significantly on any particular platforms? Consider any potential downsides.

  • Testing: The testing description is better than nothing, but insufficient.

    • "esp32-devkitc:nsh" is assumed to be a configuration. State this explicitly. "Configured for esp32-devkitc with the nsh configuration" is clearer.
    • Instead of simply stating the configurations enabled, briefly explain why these configurations are relevant to this change. For example: "Enabled CONFIG_COREDUMP to generate core dumps, CONFIG_BOARD_COREDUMP_SYSLOG to output core dumps to syslog, and the default options CONFIG_BOARD_COREDUMP_FULL and CONFIG_BOARD_COREDUMP_COMPRESSION."
    • "check register dumped from NSH" is not descriptive enough. What specific command was used to trigger the core dump? How was the core dump output obtained from the syslog?
    • Provide the complete log output, not just a snippet. The provided snippets don't show how you determined the sizes of g_tcbinfo before and after the change.
    • The testing needs to demonstrate that the change fixes a problem or adds the expected functionality. Simply showing output before and after is not enough. For instance, if this fixes incorrect register values in the core dump, the before log should demonstrate the incorrect values and the after log should demonstrate the corrected values. Highlight the relevant differences.
    • Ideally, there should be a test case added to the NuttX test suite to verify this functionality in the future.
  • Build Host(s): Missing. Specify your build host OS, CPU architecture, and compiler version.

  • Target(s): More specific information is needed. While you mention "esp32-devkitc," specify the Xtensa architecture revision.

Recommendations for Improvement:

  1. Clarify the "why": Explain the specific problem this PR solves and why aligning the registers with GDB is the correct solution.
  2. Quantify the impact: Provide the exact size increase of g_tcbinfo in bytes. Discuss any potential downsides to this increase.
  3. Improve testing description: Describe the steps taken to generate the core dump, capture the output, and compare the register values. Include the full log output before and after the change. Highlight the differences that demonstrate the fix. Add a test case if possible.
  4. Provide build host details: Include your host OS, CPU, and compiler information.
  5. Specify target architecture: Detail the Xtensa architecture revision.

By addressing these points, the PR will better meet the NuttX requirements and make it easier for reviewers to understand, evaluate, and merge your contribution.

@lupyuen

lupyuen commented Dec 11, 2024

Copy link
Copy Markdown
Member

FYI ESP32-S3 will fail to build, I'm not sure why:

@tmedicci

Copy link
Copy Markdown
Contributor

FYI ESP32-S3 will fail to build, I'm not sure why:

HI @lupyuen, the issue is related to #15143 (already merged)

@tmedicci

Copy link
Copy Markdown
Contributor

HI @sdc-g , I didn't get exactly what this PR is trying to solve: Can you please better describe the issue you've found and how this PR solves it?

Also, I didn't get how to test it. Did you use the ELF loader (because https://github.com/apache/nuttx/pull/15141/files#diff-3d73ca45b6136e2c1158bbb323370c90dc4b8209a851129ce6543516e10e5269 was changed)? And how did you dump the registers during runtime? (did you force an exception or use CONFIG_SYSTEM_COREDUMP?)

@xiaoxiang781216

xiaoxiang781216 commented Dec 12, 2024

Copy link
Copy Markdown
Contributor

@sdc-g please fix the style problem, reported by:
./tools/checkpatch.sh -g HEAD~...HEAD
and rebase to the master since the build error is fixed.

Comment thread arch/xtensa/include/elf.h Outdated
Comment thread arch/xtensa/include/elf.h
Comment thread arch/xtensa/src/common/xtensa_tcbinfo.c Outdated
Comment thread arch/xtensa/src/common/xtensa_tcbinfo.c Outdated
Comment thread arch/xtensa/src/common/xtensa_tcbinfo.c Outdated
@sdc-g

sdc-g commented Dec 12, 2024

Copy link
Copy Markdown
Contributor Author

HI @tmedicci

HI @sdc-g , I didn't get exactly what this PR is trying to solve: Can you please better describe the issue you've found and how this PR solves it?

Current xtensa core dump do not define elf_gregset_t, EM_ARCH, EF_FLAG.
So even the build can not be pass when enable CONFIG_COREDUMP

And the defination of elf_gregset_t refer to https://github.com/espressif/binutils-gdb/blob/esp-gdb-12.1/gdb/arch/xtensa.h
Also double confirmed by https://github.com/espressif/binutils-gdb/blob/esp-gdb-12.1/gdb/xtensa-tdep.c#L828

Also, I didn't get how to test it. Did you use the ELF loader (because https://github.com/apache/nuttx/pull/15141/files#diff-3d73ca45b6136e2c1158bbb323370c90dc4b8209a851129ce6543516e10e5269 was changed)? And how did you dump the registers during runtime? (did you force an exception or use CONFIG_SYSTEM_COREDUMP?)

Yes, force crash as below

@@ -313,6 +313,8 @@ static int coredump_now(int pid, FAR char *filename)
   FAR void *stream;
   FAR FILE *file;
   int logmask;
+  char *str = NULL;
+  str[0] = '1';
 
   if (filename != NULL)
     {

@sdc-g sdc-g force-pushed the feature/xtensa-core_dump branch from 49691ea to 67bfc60 Compare December 12, 2024 08:18
@xiaoxiang781216

xiaoxiang781216 commented Dec 12, 2024

Copy link
Copy Markdown
Contributor

@sdc-g please fix:

In file included from common/xtensa_tcbinfo.c:29:
Error: common/xtensa_tcbinfo.c:41:15: error: 'REG_LBEG' undeclared here (not in a function)
   41 |   TCB_REG_OFF(REG_LBEG),
      |               ^~~~~~~~
Error: common/xtensa_tcbinfo.c:42:15: error: 'REG_LEND' undeclared here (not in a function)
   42 |   TCB_REG_OFF(REG_LEND),
      |               ^~~~~~~~
Error: common/xtensa_tcbinfo.c:43:15: error: 'REG_LCOUNT' undeclared here (not in a function)
   43 |   TCB_REG_OFF(REG_LCOUNT),
      |               ^~~~~~~~~~

@sdc-g sdc-g force-pushed the feature/xtensa-core_dump branch from 67bfc60 to 24b3eb1 Compare December 13, 2024 08:58
@xiaoxiang781216 xiaoxiang781216 merged commit 9e8e7ac into apache:master Dec 13, 2024
@chirping78

Copy link
Copy Markdown
Contributor

@sdc-g could you elaborate the steps after coredump?

Do you extract the coredump in log (from "Start coredump:" to “Finish coredump”) save them to a text file,
then use tools/coredump.py to convert it to core file,
at last use xtensa-esp32-elf-gdb nuttx core to examine the core file, is this you steps?

@sdc-g

sdc-g commented Apr 2, 2025

Copy link
Copy Markdown
Contributor Author

Hi @chirping78

Do you extract the coredump in log (from "Start coredump:" to “Finish coredump”) save them to a text file,
then use tools/coredump.py to convert it to core file,
at last use xtensa-esp32-elf-gdb nuttx core to examine the core file, is this you steps?

Yes. But if CONFIG_BOARD_COREDUMP_COMPRESSION is disabled, please use below command to convert from HEX string to binary

  • xxd -p -r elf.dump elf.core

@chirping78

Copy link
Copy Markdown
Contributor

@sdc-g your this function might have already been broken on latest master version. This is what I tested:


First, test the version around 24/12/13, this is when your patch first merged, apps fa22f80, nuttx 9e8e7ac.

use esp32-devkitc:nsh to test, enable CONFIG_COREDUMP CONFIG_BOARD_COREDUMP_SYSLOG
disable CONFIG_NSH_DISABLE_MW

then use mw -1 to trigger the coredump,
save coredump to a text file,
use tools/coredump.py to convert it to core file,
use xtensa-esp32-elf-gdb nuttx core to check register, match with log as expection.


2nd, test the latest master, apps cf1d5bb, nuttx 87c217a.

Still use esp32-devkitc:nsh to test, enable CONFIG_COREDUMP CONFIG_LIBC_LZF CONFIG_BOARD_COREDUMP_SYSLOG
disable CONFIG_NSH_DISABLE_MW

Then same steps as above to trigger, save, convert and launch gdb to check,
this time the register a0 - a15 do not match with log.

Please check whether my steps are not correct, or the funtion is really broken.

@sdc-g

sdc-g commented Apr 2, 2025

Copy link
Copy Markdown
Contributor Author

Hi @chirping78

Thanks for your information.
Sorry, I am not working on the latest Nuttx branch/sha1. I will check later. Thanks for your understanding!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arch: xtensa Issues related to the Xtensa architecture Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants