ARCH=arm64 gki_defconfig has been failing to boot on android-4.19-stable for some time (we had been missing it because of QEMU's -no-reboot):
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4740164167/jobs/8418405133
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LLVM=1 LLVM_IAS=1 O=build mrproper gki_defconfig menuconfig Image.gz
...
$ boot-qemu.py -k build -t 15s
...
[ 0.071379] alternatives: patching kernel code
[ 0.085488] Unable to handle kernel paging request at virtual address ffffffbefe638000
[ 0.086678] Mem abort info:
[ 0.086784] ESR = 0x97de8007
[ 0.086894] Exception class = DABT (current EL), IL = 32 bits
[ 0.087049] SET = 0, FnV = 0
[ 0.087143] EA = 0, S1PTW = 0
[ 0.087303] Data abort info:
[ 0.087383] Access size = 8 byte(s)
[ 0.087485] SSE = 0, SRT = 30
[ 0.087571] SF = 1, AR = 0
[ 0.087644] CM = 0, WnR = 0
[ 0.087922] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000533798c2
[ 0.088132] [ffffffbefe638000] pgd=00000000414e2803, pud=00000000414e2803, pmd=00000000414e3803, pte=000000005eff9802
[ 0.088720] Internal error: Oops: 97de8007 [#1] PREEMPT SMP
[ 0.088999] Modules linked in:
[ 0.089188] Process swapper/0 (pid: 1, stack limit = 0x0000000085102783)
[ 0.089508] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.280-00088-g9b68392a2b58 #1
[ 0.089701] Hardware name: linux,dummy-virt (DT)
[ 0.089971] pstate: 80800009 (Nzcv daif -PAN +UAO)
[ 0.090581] pc : __create_pgd_mapping+0x310/0x6dc
[ 0.090706] lr : update_mapping_prot+0x60/0x100
[ 0.090814] sp : ffffff800800bce0
[ 0.090919] x29: ffffff800800bd70 x28: 0070000000000f93
[ 0.091072] x27: 00000ffffbefe639 x26: 0000000040080000
[ 0.091211] x25: 0060000000000f93 x24: ffffffa362ce3000
[ 0.091347] x23: ffd7fffffffff77f x22: ffffffe5812b0000
[ 0.091479] x21: ffffffe580080000 x20: ffffffbefe638000
[ 0.091610] x19: 0000000000000002 x18: ffffffe59d913838
[ 0.091738] x17: 0000000000002856 x16: 000000018f0365db
[ 0.091884] x15: 0000000000002856 x14: 0000000000000289
[ 0.092023] x13: ffffffe580080000 x12: 0000000040080000
[ 0.092153] x11: 0060000000000f90 x10: 00000000fffff06c
[ 0.092307] x9 : ffffffe5801fffff x8 : ffffffe580200000
[ 0.092425] x7 : ffffffe5812affff x6 : ffffffe5812b0000
[ 0.092546] x5 : 0000000000000000 x4 : ffffffbefe638000
[ 0.092687] x3 : 0000000000000041 x2 : ffffffa362ce5000
[ 0.092811] x1 : 0000000040080000 x0 : ffffffa362d4d000
[ 0.093012] Call trace:
[ 0.093152] __create_pgd_mapping+0x310/0x6dc
[ 0.093267] update_mapping_prot+0x60/0x100
[ 0.093379] mark_linear_text_alias_ro+0x50/0x70
[ 0.093476] smp_cpus_done+0x38/0x44
[ 0.093572] smp_init+0xf8/0x110
[ 0.093643] kernel_init_freeable+0xb4/0x144
[ 0.093754] kernel_init+0x18/0x298
[ 0.093841] ret_from_fork+0x10/0x18
[ 0.095232] Code: 926baaa8 b24052a9 91480108 eb07013f (f940029e)
[ 0.097110] ---[ end trace 28f618bca044740b ]---
[ 0.098193] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.098193]
[ 0.098865] Rebooting in 5 seconds..
qemu-system-aarch64: terminating on signal 15 from pid 1139109 (timeout)
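For whoever picks this up: the "Mem abort info" and "Data abort info" lines are fields of ESR_EL1. Below is a minimal Python sketch of that decoding. It assumes the data-abort ISS layout from the Arm Architecture Reference Manual, and the helper names are made up for illustration. For 0x97de8007 it reports an 8-byte read into x30 (SRT = 30, WnR = 0) that took a level 3 translation fault (DFSC = 0x07). It also checks bit 0, the valid bit, of the pte value printed on the page-table walk line. That bit is clear, which is consistent with the fault.

```python
# Minimal sketch: decode an arm64 ESR_ELx value for a data abort, mirroring
# the "Mem abort info" / "Data abort info" lines in the oops above.
# Field positions follow the data-abort ISS layout in the Arm ARM.


def bits(value: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_dabt_esr(esr: int) -> dict:
    ec = bits(esr, 31, 26)
    if ec not in (0x24, 0x25):  # 0x24 = DABT (lower EL), 0x25 = DABT (current EL)
        raise ValueError(f"EC {ec:#x} is not a data abort")
    return {
        "EC": ec,
        "IL": 32 if bits(esr, 25, 25) else 16,
        "ISV": bits(esr, 24, 24),  # the next five fields are only valid if ISV == 1
        "access_size": 1 << bits(esr, 23, 22),  # SAS, in bytes
        "SSE": bits(esr, 21, 21),
        "SRT": bits(esr, 20, 16),  # register the load was writing to
        "SF": bits(esr, 15, 15),
        "AR": bits(esr, 14, 14),
        "SET": bits(esr, 12, 11),
        "FnV": bits(esr, 10, 10),
        "EA": bits(esr, 9, 9),
        "CM": bits(esr, 8, 8),
        "S1PTW": bits(esr, 7, 7),
        "WnR": bits(esr, 6, 6),  # 0 = read
        "DFSC": bits(esr, 5, 0),  # 0x07 = translation fault, level 3
    }


def descriptor_valid(desc: int) -> bool:
    """Bit 0 is the valid bit of an arm64 translation table descriptor."""
    return bool(desc & 1)


info = decode_dabt_esr(0x97DE8007)
for name, value in info.items():
    shown = f"{value:#x}" if name in ("EC", "DFSC") else value
    print(f"{name} = {shown}")
print("pte valid:", descriptor_valid(0x5EFF9802))
```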
I bisected it down to llvm/llvm-project@5ecd363.
# bad: [603c286334b07f568d39f6706c848f576914f323] Bump the trunk major version to 17
# good: [809855b56f06dd7182685f88fbbc64111df9339a] Bump the trunk major version to 16
git bisect start 'llvmorg-17-init' 'llvmorg-16-init'
# good: [a784de783af5096e593c5e214c2c78215fe303f5] [flang] Add -ffp-contract option processing
git bisect good a784de783af5096e593c5e214c2c78215fe303f5
# bad: [3c700cf754dbeb5f1f7c1e03e1f04ed716f7d9dc] [BOLT] Use std::optional instead of None in comments (NFC)
git bisect bad 3c700cf754dbeb5f1f7c1e03e1f04ed716f7d9dc
# good: [93b553e3f2e4f53ce3dda13cd18dbc43643a535b] Revert "Host: Internalize computeHostNumPhysicalCores/computeHostNumHardwareThreads"
git bisect good 93b553e3f2e4f53ce3dda13cd18dbc43643a535b
# good: [642c6638a3d78359552f5cf71d24a80a9bd9801f] Reland "[clang][deps] During scanning don't emit warnings-as-errors that are ignored with diagnostic pragmas."
git bisect good 642c6638a3d78359552f5cf71d24a80a9bd9801f
# bad: [40ade845be698355b230abc19c7a76b200188772] Revert "Store OptTable::Info::Name as a StringRef"
git bisect bad 40ade845be698355b230abc19c7a76b200188772
# bad: [c48e0cf03a50bb8a2043ac4bb5e9a83ff135247a] [mlir] Remove TypedAttr and ElementsAttr from DenseArrayAttr
git bisect bad c48e0cf03a50bb8a2043ac4bb5e9a83ff135247a
# good: [89fab98e884f05076bbd420d95b5de3596f5452c] [DebugInfo] llvm::Optional => std::optional
git bisect good 89fab98e884f05076bbd420d95b5de3596f5452c
# good: [7224cffd62f555d82f6475738e0646c37730cc24] Intrinsics: Fix not speculating llvm.fptrunc.round
git bisect good 7224cffd62f555d82f6475738e0646c37730cc24
# good: [e1edcf7d14c126b9ebd2a77fcd9041d056cce64a] Reland "[lldb][Target] Flush the scratch TypeSystem when owning lldb_private::Module gets unloaded"
git bisect good e1edcf7d14c126b9ebd2a77fcd9041d056cce64a
# bad: [031ff673d814901a0ec27af705710abdad1b1b1a] [mlir] Fix alias printing for dialect attribute self types
git bisect bad 031ff673d814901a0ec27af705710abdad1b1b1a
# good: [d1d129356909af2f6fefd6f1b9335a39fe172e9a] [NFC] Port all runlines for SimplifyCFG pass tests to -passes syntax
git bisect good d1d129356909af2f6fefd6f1b9335a39fe172e9a
# bad: [51e33ac9c75f4ff704cc36e39f86eb5c8a306fff] [gn build] Port 5ecd36329508
git bisect bad 51e33ac9c75f4ff704cc36e39f86eb5c8a306fff
# good: [962863d988195917b7d2ccfb83a3a166e01ffc77] [flang] Catch attempts to copy pointers in allocatables in PURE
git bisect good 962863d988195917b7d2ccfb83a3a166e01ffc77
# bad: [5ecd363295089ad2db3c428ab1ee08ef1864ce3b] Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
git bisect bad 5ecd363295089ad2db3c428ab1ee08ef1864ce3b
# first bad commit: [5ecd363295089ad2db3c428ab1ee08ef1864ce3b] Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
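If this bisect needs to be re-run with git bisect run, the good/bad decision for each LLVM commit could come down to scanning the captured serial console output. Below is a minimal sketch. It assumes the log has already been captured as a string and that a failed toolchain or kernel build is passed in as None. The marker strings are copied from the oops above, and classify_boot is a hypothetical name.

```python
# Hypothetical helper for "git bisect run": turn one captured boot log into
# the exit code git expects. Marker strings are copied from the oops above.
PANIC_MARKERS = (
    "Unable to handle kernel paging request",
    "Kernel panic - not syncing",
)

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by "git bisect run"


def classify_boot(log):
    """Classify a QEMU serial log; None means the build itself failed."""
    if log is None:
        return SKIP  # untestable commit: let git pick a neighbour
    if any(marker in log for marker in PANIC_MARKERS):
        return BAD
    return GOOD
```

A wrapper that builds LLVM and the kernel, boots with boot-qemu.py, and ends with sys.exit(classify_boot(log)) would plug into git bisect run. Exit code 125 tells git to skip the commit, and 1 marks it bad. This sketch would treat a silent hang as good, so a real script should probably also require some known success marker in the log.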
I can also reproduce this on upstream linux-4.19.y with gki_defconfig copied to .config, so it is not caused by an out-of-tree Android series or patch (namely LTO):
$ boot-qemu.py -k build -t 15s
...
[ 0.067987] alternatives: patching kernel code
[ 0.081905] Unable to handle kernel paging request at virtual address ffffffbefe638000
[ 0.083124] Mem abort info:
[ 0.083247] ESR = 0x97de8007
[ 0.083361] Exception class = DABT (current EL), IL = 32 bits
[ 0.083517] SET = 0, FnV = 0
[ 0.083606] EA = 0, S1PTW = 0
[ 0.083764] Data abort info:
[ 0.083861] Access size = 8 byte(s)
[ 0.083960] SSE = 0, SRT = 30
[ 0.084043] SF = 1, AR = 0
[ 0.084137] CM = 0, WnR = 0
[ 0.084420] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000edcd8fba
[ 0.084653] [ffffffbefe638000] pgd=000000004131d803, pud=000000004131d803, pmd=000000004131e803, pte=000000005eff8802
[ 0.085285] Internal error: Oops: 97de8007 [#1] PREEMPT SMP
[ 0.085571] Modules linked in:
[ 0.085767] Process swapper/0 (pid: 1, stack limit = 0x00000000b8047f99)
[ 0.086076] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.280 #1
[ 0.086239] Hardware name: linux,dummy-virt (DT)
[ 0.086497] pstate: 80800009 (Nzcv daif -PAN +UAO)
[ 0.087074] pc : __create_pgd_mapping+0x31c/0x5fc
[ 0.087215] lr : update_mapping_prot+0x5c/0xf8
[ 0.087327] sp : ffffff800800bce0
[ 0.087426] x29: ffffff800800bd70 x28: ffffffe2c0080000
[ 0.087586] x27: ffffffbefe638000 x26: 0000000040080000
[ 0.087719] x25: ffffff92f5d1e000 x24: 0070000000000f93
[ 0.087862] x23: ffd7fffffffff77f x22: 0060000000000f93
[ 0.087998] x21: 00000ffffbefe639 x20: 0060000000000f93
[ 0.088125] x19: 0000000000000002 x18: 000000019f17612b
[ 0.088248] x17: 000000000000279c x16: 0000000000000161
[ 0.088370] x15: 00000000009e7000 x14: 0000000000000000
[ 0.088491] x13: 0000000000000400 x12: ffffffe2c0080000
[ 0.088613] x11: 0000000040080000 x10: 00000000fffff06c
[ 0.088764] x9 : ffffffe2c01fffff x8 : ffffffe2c0200000
[ 0.088922] x7 : 0060000000000f90 x6 : ffffffe2c0f4ffff
[ 0.089052] x5 : 0000000000000000 x4 : ffffffbefe638000
[ 0.089194] x3 : ffffffe2c0f50000 x2 : 000000005eff8803
[ 0.089323] x1 : 0000000000000041 x0 : ffffff92f5d7d000
[ 0.089523] Call trace:
[ 0.089656] __create_pgd_mapping+0x31c/0x5fc
[ 0.089769] update_mapping_prot+0x5c/0xf8
[ 0.089879] mark_linear_text_alias_ro+0x4c/0x68
[ 0.090003] smp_cpus_done+0x34/0x3c
[ 0.090093] smp_init+0xf4/0x10c
[ 0.090170] kernel_init_freeable+0xb0/0x13c
[ 0.090274] kernel_init+0x14/0x294
[ 0.090360] ret_from_fork+0x10/0x18
[ 0.091736] Code: 926bab88 b2405389 91480108 eb06013f (f940037e)
[ 0.093477] ---[ end trace a1cb5f5a4e0eb6fe ]---
[ 0.094086] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.094086]
[ 0.094750] Rebooting in 5 seconds..
qemu-system-aarch64: terminating on signal 15 from pid 1241405 (timeout)
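The word in parentheses on each Code: line is the faulting instruction. The sketch below decodes only the LDR (immediate, unsigned offset, 64-bit) encoding. Restricting it to that encoding is an assumption, though it happens to cover both words here. It gives ldr x30, [x20] for the android-4.19-stable build and ldr x30, [x27] for the linux-4.19.y build. In both register dumps, the base register holds ffffffbefe638000 (the faulting address), and the destination matches SRT = 30.

```python
# Minimal sketch: decode only "LDR Xt, [Xn|SP, #imm]" (64-bit, unsigned
# offset). Any other encoding returns None; this is not a general disassembler.


def decode_ldr_x_uimm(insn: int):
    if (insn >> 22) != 0x3E5:  # bits [31:22] = 0b1111100101
        return None
    rt = insn & 0x1F
    rn = (insn >> 5) & 0x1F
    offset = ((insn >> 10) & 0xFFF) << 3  # imm12 is scaled by the 8-byte size
    dest = "xzr" if rt == 31 else f"x{rt}"
    base = "sp" if rn == 31 else f"x{rn}"
    return f"ldr {dest}, [{base}]" if offset == 0 else f"ldr {dest}, [{base}, #{offset}]"


# Faulting words from the two "Code:" lines (android-4.19-stable, linux-4.19.y).
for word in (0xF940029E, 0xF940037E):
    print(f"{word:08x}: {decode_ldr_x_uimm(word)}")
```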
There are other issues I need to triage first today; I will circle back to this later unless someone beats me to it. The last line before the crash (alternatives: patching kernel code) is certainly suspicious to me. It would also be worth doing a configuration bisect between defconfig and gki_defconfig to see whether that gives us an immediate clue about what is going on here.
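A configuration bisect narrows the gap between a config that boots and one that crashes by repeatedly applying half of the differing options. The in-tree tools/testing/ktest/config-bisect.pl is intended for this job, and the sketch below only illustrates the idea. It assumes that defconfig boots with the same toolchain and that a single option is the culprit. It ignores Kconfig dependencies, so each trial config should still go through make olddefconfig before building. All function names are hypothetical.

```python
import re

# Minimal sketch of one config-bisect round. Assumes a single culprit option
# and ignores Kconfig dependencies (run "make olddefconfig" on each trial).
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; None means "is not set"."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if m := _SET.match(line):
            opts[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            opts[m.group(1)] = None
    return opts


def config_diff(good, bad):
    """Sorted symbols whose values differ; a missing symbol counts as not set."""
    return sorted(k for k in set(good) | set(bad) if good.get(k) != bad.get(k))


def bisect_step(good, bad, candidates):
    """Apply the first half of candidates from bad on top of good.

    Returns (trial, tried, remaining). If the trial boots, continue with
    remaining; if it crashes, continue with tried.
    """
    half = max(len(candidates) // 2, 1)
    tried, remaining = candidates[:half], candidates[half:]
    trial = dict(good)
    for sym in tried:
        trial[sym] = bad.get(sym)
    return trial, tried, remaining


def render_config(opts):
    """Serialize back to .config syntax, one option per line."""
    lines = [
        f"# {sym} is not set" if val is None else f"{sym}={val}"
        for sym, val in sorted(opts.items())
    ]
    return "\n".join(lines) + "\n"
```

Start from candidates = config_diff(parse_config(defconfig_text), parse_config(gki_text)). In each round, write render_config(trial) to .config, run make olddefconfig, then build and boot. Stop when one symbol is left. Because olddefconfig may flip dependent symbols, the final suspect should be double-checked by toggling it alone on top of gki_defconfig.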