arm64 gki_defconfig boot failure on android-4.19-stable after LLVM commit 5ecd36329508 #1837

@nathanchance

ARCH=arm64 gki_defconfig has been failing to boot on android-4.19-stable for some time (we had been missing it because of -no-reboot):

https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4740164167/jobs/8418405133

$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LLVM=1 LLVM_IAS=1 O=build mrproper gki_defconfig menuconfig Image.gz
...

$ boot-qemu.py -k build -t 15s
...
[    0.071379] alternatives: patching kernel code
[    0.085488] Unable to handle kernel paging request at virtual address ffffffbefe638000
[    0.086678] Mem abort info:
[    0.086784]   ESR = 0x97de8007
[    0.086894]   Exception class = DABT (current EL), IL = 32 bits
[    0.087049]   SET = 0, FnV = 0
[    0.087143]   EA = 0, S1PTW = 0
[    0.087303] Data abort info:
[    0.087383]   Access size = 8 byte(s)
[    0.087485]   SSE = 0, SRT = 30
[    0.087571]   SF = 1, AR = 0
[    0.087644]   CM = 0, WnR = 0
[    0.087922] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000533798c2
[    0.088132] [ffffffbefe638000] pgd=00000000414e2803, pud=00000000414e2803, pmd=00000000414e3803, pte=000000005eff9802
[    0.088720] Internal error: Oops: 97de8007 [#1] PREEMPT SMP
[    0.088999] Modules linked in:
[    0.089188] Process swapper/0 (pid: 1, stack limit = 0x0000000085102783)
[    0.089508] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.280-00088-g9b68392a2b58 #1
[    0.089701] Hardware name: linux,dummy-virt (DT)
[    0.089971] pstate: 80800009 (Nzcv daif -PAN +UAO)
[    0.090581] pc : __create_pgd_mapping+0x310/0x6dc
[    0.090706] lr : update_mapping_prot+0x60/0x100
[    0.090814] sp : ffffff800800bce0
[    0.090919] x29: ffffff800800bd70 x28: 0070000000000f93
[    0.091072] x27: 00000ffffbefe639 x26: 0000000040080000
[    0.091211] x25: 0060000000000f93 x24: ffffffa362ce3000
[    0.091347] x23: ffd7fffffffff77f x22: ffffffe5812b0000
[    0.091479] x21: ffffffe580080000 x20: ffffffbefe638000
[    0.091610] x19: 0000000000000002 x18: ffffffe59d913838
[    0.091738] x17: 0000000000002856 x16: 000000018f0365db
[    0.091884] x15: 0000000000002856 x14: 0000000000000289
[    0.092023] x13: ffffffe580080000 x12: 0000000040080000
[    0.092153] x11: 0060000000000f90 x10: 00000000fffff06c
[    0.092307] x9 : ffffffe5801fffff x8 : ffffffe580200000
[    0.092425] x7 : ffffffe5812affff x6 : ffffffe5812b0000
[    0.092546] x5 : 0000000000000000 x4 : ffffffbefe638000
[    0.092687] x3 : 0000000000000041 x2 : ffffffa362ce5000
[    0.092811] x1 : 0000000040080000 x0 : ffffffa362d4d000
[    0.093012] Call trace:
[    0.093152]  __create_pgd_mapping+0x310/0x6dc
[    0.093267]  update_mapping_prot+0x60/0x100
[    0.093379]  mark_linear_text_alias_ro+0x50/0x70
[    0.093476]  smp_cpus_done+0x38/0x44
[    0.093572]  smp_init+0xf8/0x110
[    0.093643]  kernel_init_freeable+0xb4/0x144
[    0.093754]  kernel_init+0x18/0x298
[    0.093841]  ret_from_fork+0x10/0x18
[    0.095232] Code: 926baaa8 b24052a9 91480108 eb07013f (f940029e)
[    0.097110] ---[ end trace 28f618bca044740b ]---
[    0.098193] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.098193]
[    0.098865] Rebooting in 5 seconds..
qemu-system-aarch64: terminating on signal 15 from pid 1139109 (timeout)
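As a cross-check of the "Mem abort info" and "Data abort info" lines, the ESR value can be decoded by hand. The sketch below assumes bash (for 64-bit arithmetic) and follows the ARMv8.0 ESR_ELx layout for data aborts from the Arm ARM. The function names `esr_field` and `decode_esr` are illustrative, not kernel or tool APIs.

```bash
#!/usr/bin/env bash
# Decode an AArch64 ESR_ELx value for a data abort into the fields the
# 4.19 kernel prints under "Mem abort info" and "Data abort info".
# Bit positions follow the ESR_ELx layout (ARMv8.0 fields only).

# esr_field <value> <lsb> <width>: extract bits [lsb+width-1 : lsb]
esr_field() {
    echo $(( ($1 >> $2) & ((1 << $3) - 1) ))
}

decode_esr() {
    local esr=$(( $1 ))
    local il=16
    if [ "$(esr_field "$esr" 25 1)" -eq 1 ]; then il=32; fi
    printf 'EC = 0x%02x, IL = %d bits\n' "$(esr_field "$esr" 26 6)" "$il"
    # SAS, SSE, SRT, SF and AR are only meaningful when ISV == 1
    printf 'ISV = %d, SAS = %d byte(s), SSE = %d, SRT = %d\n' \
        "$(esr_field "$esr" 24 1)" "$(( 1 << $(esr_field "$esr" 22 2) ))" \
        "$(esr_field "$esr" 21 1)" "$(esr_field "$esr" 16 5)"
    printf 'SF = %d, AR = %d, SET = %d, FnV = %d\n' \
        "$(esr_field "$esr" 15 1)" "$(esr_field "$esr" 14 1)" \
        "$(esr_field "$esr" 11 2)" "$(esr_field "$esr" 10 1)"
    printf 'EA = %d, CM = %d, S1PTW = %d, WnR = %d\n' \
        "$(esr_field "$esr" 9 1)" "$(esr_field "$esr" 8 1)" \
        "$(esr_field "$esr" 7 1)" "$(esr_field "$esr" 6 1)"
    # DFSC 0b0001LL is a translation fault at lookup level LL
    printf 'DFSC = 0x%02x\n' "$(esr_field "$esr" 0 6)"
}
```

Running `decode_esr 0x97de8007` gives EC 0x25 (data abort taken at the current EL), an 8-byte read (WnR = 0) into register 30, and DFSC 0x07, a level 3 translation fault. The PTE printed in the page table walk ends in `...802`, so its valid bit (bit 0) is clear, which fits that fault type.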

I bisected it down to llvm/llvm-project@5ecd363.

# bad: [603c286334b07f568d39f6706c848f576914f323] Bump the trunk major version to 17
# good: [809855b56f06dd7182685f88fbbc64111df9339a] Bump the trunk major version to 16
git bisect start 'llvmorg-17-init' 'llvmorg-16-init'
# good: [a784de783af5096e593c5e214c2c78215fe303f5] [flang] Add -ffp-contract option processing
git bisect good a784de783af5096e593c5e214c2c78215fe303f5
# bad: [3c700cf754dbeb5f1f7c1e03e1f04ed716f7d9dc] [BOLT] Use std::optional instead of None in comments (NFC)
git bisect bad 3c700cf754dbeb5f1f7c1e03e1f04ed716f7d9dc
# good: [93b553e3f2e4f53ce3dda13cd18dbc43643a535b] Revert "Host: Internalize computeHostNumPhysicalCores/computeHostNumHardwareThreads"
git bisect good 93b553e3f2e4f53ce3dda13cd18dbc43643a535b
# good: [642c6638a3d78359552f5cf71d24a80a9bd9801f] Reland "[clang][deps] During scanning don't emit warnings-as-errors that are ignored with diagnostic pragmas."
git bisect good 642c6638a3d78359552f5cf71d24a80a9bd9801f
# bad: [40ade845be698355b230abc19c7a76b200188772] Revert "Store OptTable::Info::Name as a StringRef"
git bisect bad 40ade845be698355b230abc19c7a76b200188772
# bad: [c48e0cf03a50bb8a2043ac4bb5e9a83ff135247a] [mlir] Remove TypedAttr and ElementsAttr from DenseArrayAttr
git bisect bad c48e0cf03a50bb8a2043ac4bb5e9a83ff135247a
# good: [89fab98e884f05076bbd420d95b5de3596f5452c] [DebugInfo] llvm::Optional => std::optional
git bisect good 89fab98e884f05076bbd420d95b5de3596f5452c
# good: [7224cffd62f555d82f6475738e0646c37730cc24] Intrinsics: Fix not speculating llvm.fptrunc.round
git bisect good 7224cffd62f555d82f6475738e0646c37730cc24
# good: [e1edcf7d14c126b9ebd2a77fcd9041d056cce64a] Reland "[lldb][Target] Flush the scratch TypeSystem when owning lldb_private::Module gets unloaded"
git bisect good e1edcf7d14c126b9ebd2a77fcd9041d056cce64a
# bad: [031ff673d814901a0ec27af705710abdad1b1b1a] [mlir] Fix alias printing for dialect attribute self types
git bisect bad 031ff673d814901a0ec27af705710abdad1b1b1a
# good: [d1d129356909af2f6fefd6f1b9335a39fe172e9a] [NFC] Port all runlines for SimplifyCFG pass tests to -passes syntax
git bisect good d1d129356909af2f6fefd6f1b9335a39fe172e9a
# bad: [51e33ac9c75f4ff704cc36e39f86eb5c8a306fff] [gn build] Port 5ecd36329508
git bisect bad 51e33ac9c75f4ff704cc36e39f86eb5c8a306fff
# good: [962863d988195917b7d2ccfb83a3a166e01ffc77] [flang] Catch attempts to copy pointers in allocatables in PURE
git bisect good 962863d988195917b7d2ccfb83a3a166e01ffc77
# bad: [5ecd363295089ad2db3c428ab1ee08ef1864ce3b] Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
git bisect bad 5ecd363295089ad2db3c428ab1ee08ef1864ce3b
# first bad commit: [5ecd363295089ad2db3c428ab1ee08ef1864ce3b] Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
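A bisection like this can be driven by `git bisect run`, which treats exit status 0 as good, 125 as "skip", and any other status from 1 to 127 as bad. The sketch below only covers the last step, turning a saved boot log into that status; `classify_boot_log` is a hypothetical helper, and treating "no panic or Oops in the log" as good is an assumption.

```bash
#!/usr/bin/env bash
# Turn a captured boot log into a `git bisect run` exit status:
#   0 = good, 1 = bad, 125 = skip (revision could not be tested).
classify_boot_log() {
    local log=$1
    # Missing or empty log: nothing was booted (e.g. a build failed), so skip.
    if [ ! -s "$log" ]; then
        return 125
    fi
    # Look for the failure in the log itself rather than trusting how QEMU exited.
    if grep -qE 'Kernel panic|Internal error: Oops' "$log"; then
        return 1
    fi
    return 0
}
```

A per-revision bisect script (hypothetical) would build clang at the current llvm-project revision, build the kernel with it, capture the output of `boot-qemu.py -k build -t 15s` into a file, and exit with the status of `classify_boot_log` on that file.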

I can also reproduce this on upstream linux-4.19.y with gki_defconfig copied to .config, which means it is not caused by an out-of-tree Android series or patch (namely LTO):

$ boot-qemu.py -k build -t 15s
...
[    0.067987] alternatives: patching kernel code
[    0.081905] Unable to handle kernel paging request at virtual address ffffffbefe638000
[    0.083124] Mem abort info:
[    0.083247]   ESR = 0x97de8007
[    0.083361]   Exception class = DABT (current EL), IL = 32 bits
[    0.083517]   SET = 0, FnV = 0
[    0.083606]   EA = 0, S1PTW = 0
[    0.083764] Data abort info:
[    0.083861]   Access size = 8 byte(s)
[    0.083960]   SSE = 0, SRT = 30
[    0.084043]   SF = 1, AR = 0
[    0.084137]   CM = 0, WnR = 0
[    0.084420] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000edcd8fba
[    0.084653] [ffffffbefe638000] pgd=000000004131d803, pud=000000004131d803, pmd=000000004131e803, pte=000000005eff8802
[    0.085285] Internal error: Oops: 97de8007 [#1] PREEMPT SMP
[    0.085571] Modules linked in:
[    0.085767] Process swapper/0 (pid: 1, stack limit = 0x00000000b8047f99)
[    0.086076] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.280 #1
[    0.086239] Hardware name: linux,dummy-virt (DT)
[    0.086497] pstate: 80800009 (Nzcv daif -PAN +UAO)
[    0.087074] pc : __create_pgd_mapping+0x31c/0x5fc
[    0.087215] lr : update_mapping_prot+0x5c/0xf8
[    0.087327] sp : ffffff800800bce0
[    0.087426] x29: ffffff800800bd70 x28: ffffffe2c0080000
[    0.087586] x27: ffffffbefe638000 x26: 0000000040080000
[    0.087719] x25: ffffff92f5d1e000 x24: 0070000000000f93
[    0.087862] x23: ffd7fffffffff77f x22: 0060000000000f93
[    0.087998] x21: 00000ffffbefe639 x20: 0060000000000f93
[    0.088125] x19: 0000000000000002 x18: 000000019f17612b
[    0.088248] x17: 000000000000279c x16: 0000000000000161
[    0.088370] x15: 00000000009e7000 x14: 0000000000000000
[    0.088491] x13: 0000000000000400 x12: ffffffe2c0080000
[    0.088613] x11: 0000000040080000 x10: 00000000fffff06c
[    0.088764] x9 : ffffffe2c01fffff x8 : ffffffe2c0200000
[    0.088922] x7 : 0060000000000f90 x6 : ffffffe2c0f4ffff
[    0.089052] x5 : 0000000000000000 x4 : ffffffbefe638000
[    0.089194] x3 : ffffffe2c0f50000 x2 : 000000005eff8803
[    0.089323] x1 : 0000000000000041 x0 : ffffff92f5d7d000
[    0.089523] Call trace:
[    0.089656]  __create_pgd_mapping+0x31c/0x5fc
[    0.089769]  update_mapping_prot+0x5c/0xf8
[    0.089879]  mark_linear_text_alias_ro+0x4c/0x68
[    0.090003]  smp_cpus_done+0x34/0x3c
[    0.090093]  smp_init+0xf4/0x10c
[    0.090170]  kernel_init_freeable+0xb0/0x13c
[    0.090274]  kernel_init+0x14/0x294
[    0.090360]  ret_from_fork+0x10/0x18
[    0.091736] Code: 926bab88 b2405389 91480108 eb06013f (f940037e)
[    0.093477] ---[ end trace a1cb5f5a4e0eb6fe ]---
[    0.094086] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.094086]
[    0.094750] Rebooting in 5 seconds..
qemu-system-aarch64: terminating on signal 15 from pid 1241405 (timeout)
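Between the two builds, the register holding the faulting address changes (x20 in the first oops, x27 here), so it helps to decode the faulting instruction, the word in parentheses on each `Code:` line. This sketch, assuming bash, handles only the 64-bit LDR (immediate, unsigned offset) encoding; `decode_ldr64` is a hypothetical helper, not part of any existing tool.

```bash
#!/usr/bin/env bash
# Decode the faulting instruction word from an Oops "Code:" line.
# Only the 64-bit LDR (immediate, unsigned offset) form is handled:
#   0b11111001_01 | imm12 | Rn | Rt, i.e. (insn & 0xffc00000) == 0xf9400000
decode_ldr64() {
    local hex=${1#0x}
    local insn=$(( 16#$hex ))
    if (( (insn & 0xffc00000) != 0xf9400000 )); then
        echo "not a 64-bit LDR (immediate, unsigned offset): $1" >&2
        return 1
    fi
    local rt=$(( insn & 0x1f ))                   # destination register
    local rn=$(( (insn >> 5) & 0x1f ))            # base register
    local off=$(( ((insn >> 10) & 0xfff) * 8 ))   # imm12 is scaled by 8 for 64-bit loads
    local dst="x$rt" base="x$rn"
    if [ "$rt" -eq 31 ]; then dst="xzr"; fi
    if [ "$rn" -eq 31 ]; then base="sp"; fi
    echo "ldr $dst, [$base, #$off]"
}
```

`decode_ldr64 f940029e` yields `ldr x30, [x20, #0]` and `decode_ldr64 f940037e` yields `ldr x30, [x27, #0]`. Both builds therefore fault on the same kind of 8-byte load into x30 (matching SRT = 30) from the address ffffffbefe638000; only the register allocation differs.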

There are other issues I need to triage first today; I will circle back to this later unless someone beats me to it. The last line before the crash (alternatives: patching kernel code) is certainly suspicious to me. It is also worth doing a configuration bisection between defconfig and gki_defconfig to see whether that gives us an immediate clue about what is going on here.
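One step of such a configuration bisection can be sketched as follows, assuming both inputs are complete .config files (for example, generated with defconfig and gki_defconfig in separate build directories). The helper `config_midpoint` is hypothetical: it collects the options whose lines differ and writes the good config with the first half of those options taken from the bad one. Running `make olddefconfig` on the result is still needed, since Kconfig dependencies may add or drop options.

```bash
#!/usr/bin/env bash
# config_midpoint <good.config> <bad.config>: print a config that is the good
# one with the first half of the differing options copied from the bad one.
config_midpoint() {
    # The bad config is read first (ARGV[1]), then the good config.
    awk '
        # Option name for "CONFIG_FOO=..." or "# CONFIG_FOO is not set", else "".
        function optname(line,    f) {
            if (line ~ /^CONFIG_[A-Za-z0-9_]+=/) { sub(/=.*/, "", line); return line }
            if (line ~ /^# CONFIG_[A-Za-z0-9_]+ is not set$/) { split(line, f, " "); return f[2] }
            return ""
        }
        FILENAME == ARGV[1] { n = optname($0); if (n != "") { bad[n] = $0; order[++nb] = n }; next }
        { n = optname($0); if (n != "") good[n] = $0; lines[++ng] = $0 }
        END {
            # Options set differently in the bad config (or missing from the good one)
            for (i = 1; i <= nb; i++) if (bad[order[i]] != good[order[i]]) diff[++nd] = order[i]
            half = int((nd + 1) / 2)
            for (i = 1; i <= half; i++) take[diff[i]] = 1
            # Emit the good config, swapping in the bad value for the chosen half
            for (i = 1; i <= ng; i++) {
                n = optname(lines[i])
                if (n != "" && (n in take)) { print bad[n]; done[n] = 1 } else print lines[i]
            }
            # Chosen options that the good config did not mention at all
            for (i = 1; i <= half; i++) if (!(diff[i] in done)) print bad[diff[i]]
        }
    ' "$2" "$1"
}
```

The midpoint would then be built with something like `make ARCH=arm64 ... O=build-mid olddefconfig Image.gz` and booted; if it fails, it becomes the new bad endpoint, otherwise the new good one, and the step repeats until a single option remains.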

Metadata

Labels

[ARCH] arm64: This bug impacts ARCH=arm64
[BUG] llvm: A bug that should be fixed in upstream LLVM
[FIXED][LLVM] 16: This bug was fixed in LLVM 16.0
[FIXED][LLVM] main: This bug was only present and fixed in an unreleased version of LLVM
boot failure: This issue results in a failure to boot
