Message ID | 20240902150016.63072-1-roger.pau@citrix.com |
---|---|
State | New |
Series | x86/cpu: revert opt_allow_unsafe from __ro_after_init to __read_mostly |
On 02.09.2024 17:00, Roger Pau Monne wrote:
> Making opt_allow_unsafe read only after init requires changes to the logic in
> init_amd(), otherwise the following #PF happens on CPU hotplug:
>
> ----[ Xen-4.20.0-1-d x86_64 debug=y Tainted: H ]----
> CPU: 1
> RIP: e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
> [...]
> Xen call trace:
> [<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
> [<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
> [<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
> [<ffff82d040203327>] F __high_start+0x87/0xa0
>
> Pagetable walk from ffff82d0404011ea:
> L4[0x105] = 000000006fc2e063 ffffffffffffffff
> L3[0x141] = 000000006fc2b063 ffffffffffffffff
> L2[0x002] = 000000807c7ca063 ffffffffffffffff
> L1[0x001] = 800000006f801121 ffffffffffffffff
>
> ****************************************
> Panic on CPU 1:
> FATAL PAGE FAULT
> [error_code=0003]
> Faulting linear address: ffff82d0404011ea
> ****************************************

Hmm, I specifically looked at that code, but I can see how I screwed up.

> For the time being revert opt_allow_unsafe to be __read_mostly.

There's exactly one write that an AP can hit. Is it really worth moving
back, rather than just doing

    if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
        opt_allow_unsafe = 1;
    else if ...

?

Jan
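Jan's point can be illustrated with a small user-space model. This is a sketch, not Xen code: it assumes, as Jan's remark implies, that the existing `init_amd()` stores 1 to `opt_allow_unsafe` on every CPU without erratum 121. `init_cpu_guarded()` and `unsafe_stores` are hypothetical names, and the `has_erratum_121` flag stands in for `cpu_has_amd_erratum(c, AMD_ERRATUM_121)`. The model counts stores to show that, once the BSP has set the flag to 1, an AP running the guarded branch only reads the variable and so would not fault on a read-only page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the Xen variable:
 * 1 = allow, 0 = don't allow guest creation, -1 = don't allow boot. */
static int8_t opt_allow_unsafe;

/* Hypothetical counter: each increment is a store that would fault if the
 * variable lived in memory made read-only after boot. */
static unsigned int unsafe_stores;

/* Hypothetical model of the guarded branch suggested for init_amd().
 * has_erratum_121 stands in for cpu_has_amd_erratum(c, AMD_ERRATUM_121);
 * the remaining "else if" branches are not modelled. */
static void init_cpu_guarded(bool has_erratum_121)
{
    assert(opt_allow_unsafe >= -1 && opt_allow_unsafe <= 1);

    /* Only store when the value would actually change. */
    if (opt_allow_unsafe <= 0 && !has_erratum_121) {
        opt_allow_unsafe = 1;
        unsafe_stores++;
    }
}
```

Calling `init_cpu_guarded(false)` once for the BSP and again for a hot-plugged AP leaves `unsafe_stores` at 1. Dropping the `opt_allow_unsafe <= 0` test from the model makes the second call store again, which under the assumption above corresponds to the write CPU 1 faulted on.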
On Mon, Sep 02, 2024 at 05:16:05PM +0200, Jan Beulich wrote:
> On 02.09.2024 17:00, Roger Pau Monne wrote:
> > Making opt_allow_unsafe read only after init requires changes to the logic in
> > init_amd(), otherwise the following #PF happens on CPU hotplug:
> > [...]
>
> Hmm, I specifically looked at that code, but I can see how I screwed up.
>
> > For the time being revert opt_allow_unsafe to be __read_mostly.
>
> There's exactly one write that an AP can hit. Is it really worth moving
> back, rather than just doing
>
>     if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
>         opt_allow_unsafe = 1;
>     else if ...
>
> ?

I would rather move this logic so it's only the BSP that can set
opt_allow_unsafe, and the APs check they match the configuration set
by the BSP. I think the resulting logic would be cleaner, but I
didn't want to do such a change as part of this fix.

Thanks, Roger.
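Roger's alternative might look roughly like the following. This is only a sketch of the idea, not the follow-up patch: `bsp_init()`, `ap_config_matches()` and `bsp_has_erratum_121` are hypothetical, and the erratum check is again reduced to a boolean. Only the BSP path writes `opt_allow_unsafe`; the AP path reads state recorded at boot and reports whether the AP is consistent with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both would be __ro_after_init: only bsp_init() writes them. */
static int8_t opt_allow_unsafe;    /* 1 = allow, 0 = no guests, -1 = no boot */
static bool bsp_has_erratum_121;   /* hypothetical: BSP's erratum state */

/* BSP path: derive the policy once, during boot. */
static void bsp_init(bool has_erratum_121)
{
    assert(opt_allow_unsafe >= -1 && opt_allow_unsafe <= 1);

    bsp_has_erratum_121 = has_erratum_121;
    if (!has_erratum_121)
        opt_allow_unsafe = 1;
}

/* AP path: read-only check against the configuration the BSP set. */
static bool ap_config_matches(bool has_erratum_121)
{
    /* An unaffected AP never needs a stricter policy than the BSP's. */
    if (!has_erratum_121)
        return true;

    /* An affected AP is consistent only if the BSP was affected as well,
     * so the restrictive policy was already applied at boot. */
    return bsp_has_erratum_121;
}
```

What an AP should do on a mismatch (refuse to come online, or panic) is a policy choice the thread does not settle; the sketch only shows that the AP path needs no stores.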
On 02.09.2024 17:30, Roger Pau Monné wrote:
> On Mon, Sep 02, 2024 at 05:16:05PM +0200, Jan Beulich wrote:
>> On 02.09.2024 17:00, Roger Pau Monne wrote:
>>> Making opt_allow_unsafe read only after init requires changes to the logic in
>>> init_amd(), otherwise the following #PF happens on CPU hotplug:
>>> [...]
>>
>> Hmm, I specifically looked at that code, but I can see how I screwed up.
>>
>>> For the time being revert opt_allow_unsafe to be __read_mostly.
>>
>> There's exactly one write that an AP can hit. Is it really worth moving
>> back, rather than just doing
>>
>>     if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
>>         opt_allow_unsafe = 1;
>>     else if ...
>>
>> ?
>
> I would rather move this logic so it's only the BSP that can set
> opt_allow_unsafe, and the APs check they match the configuration set
> by the BSP. I think the resulting logic would be cleaner, but I
> didn't want to do such a change as part of this fix.

Well, okay then:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

And I guess I'll put it in right away.

Jan
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 903be14af4b9..7da04230393a 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -46,7 +46,7 @@ static unsigned int __initdata opt_cpuid_mask_thermal_ecx = ~0u;
 integer_param("cpuid_mask_thermal_ecx", opt_cpuid_mask_thermal_ecx);
 
 /* 1 = allow, 0 = don't allow guest creation, -1 = don't allow boot */
-int8_t __ro_after_init opt_allow_unsafe;
+int8_t __read_mostly opt_allow_unsafe;
 boolean_param("allow_unsafe", opt_allow_unsafe);
 
 /* Signal whether the ACPI C1E quirk is required. */
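The one-line change matters because of where the variable is placed. As the crash shows, data marked `__ro_after_init` is writable while Xen boots and is mapped read-only afterwards, so a late store from a hot-plugged AP faults, whereas `__read_mostly` data stays writable. The sketch below reproduces that behaviour in user space on a POSIX system (tested assumptions: Linux-style `mmap()`/`mprotect()` and a fault delivered as SIGSEGV or SIGBUS). `alloc_init_page()`, `seal_page()` and `store_faults()` are hypothetical helpers, and sealing a page with `PROT_READ` only models Xen's read-only mapping; it is not how Xen implements it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);   /* abandon the faulting store */
}

/* Hypothetical: a page-sized area that is writable at "init time". */
static int8_t *alloc_init_page(void)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
                   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (int8_t *)p;
}

/* Hypothetical: end of init, the page becomes read-only. */
static int seal_page(int8_t *p)
{
    return mprotect(p, (size_t)sysconf(_SC_PAGESIZE), PROT_READ);
}

/* Attempt one store; return true if it faulted instead of completing. */
static bool store_faults(volatile int8_t *p, int8_t val)
{
    struct sigaction sa, old_segv, old_bus;
    bool faulted = false;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_env, 1) == 0)
        *p = val;
    else
        faulted = true;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

A store before `seal_page()` succeeds, reads still work afterwards, and a store after it faults, which mirrors the BSP-then-hotplug sequence in the commit message.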
Making opt_allow_unsafe read only after init requires changes to the logic in
init_amd(), otherwise the following #PF happens on CPU hotplug:

----[ Xen-4.20.0-1-d x86_64 debug=y Tainted: H ]----
CPU: 1
RIP: e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
[...]
Xen call trace:
[<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
[<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
[<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
[<ffff82d040203327>] F __high_start+0x87/0xa0

Pagetable walk from ffff82d0404011ea:
L4[0x105] = 000000006fc2e063 ffffffffffffffff
L3[0x141] = 000000006fc2b063 ffffffffffffffff
L2[0x002] = 000000807c7ca063 ffffffffffffffff
L1[0x001] = 800000006f801121 ffffffffffffffff

****************************************
Panic on CPU 1:
FATAL PAGE FAULT
[error_code=0003]
Faulting linear address: ffff82d0404011ea
****************************************

For the time being revert opt_allow_unsafe to be __read_mostly.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: bfcb0abb191f ('types: replace remaining uses of s8')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/cpu/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
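The panic output can be cross-checked by decoding it with the architectural x86 #PF error-code bits and the 4-level, 4 KiB page-table layout. In the sketch below, the macro and function names are hypothetical, not Xen's. It shows that `error_code=0003` is a supervisor-mode write to a present page, that the L1 entry `800000006f801121` is present with the R/W bit clear (while the L2 entry above it is writable), and that the walk indices `0x105`, `0x141`, `0x002` and `0x001` follow from the faulting address. Together these are consistent with a store hitting a page mapped read-only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 #PF error-code bits. */
#define PFEC_PRESENT (1u << 0)   /* set: protection violation on a present page */
#define PFEC_WRITE   (1u << 1)   /* set: the faulting access was a write */
#define PFEC_USER    (1u << 2)   /* set: access from user mode */

/* x86-64 page-table entry bits. */
#define PTE_PRESENT  (1ull << 0)
#define PTE_RW       (1ull << 1)
#define PTE_NX       (1ull << 63)

static bool is_supervisor_write_to_present_page(uint32_t ec)
{
    return (ec & PFEC_PRESENT) && (ec & PFEC_WRITE) && !(ec & PFEC_USER);
}

static bool pte_present_readonly(uint64_t pte)
{
    return (pte & PTE_PRESENT) && !(pte & PTE_RW);
}

/* Index into the level-N table (4 = L4 ... 1 = L1): 9 index bits per
 * level above the 12-bit page offset. */
static unsigned int pt_index(uint64_t va, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((va >> (12 + 9 * (level - 1))) & 0x1ff);
}
```

For example, `pt_index(0xffff82d0404011eaull, 4)` evaluates to `0x105`, matching the `L4[0x105]` line of the walk.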