x86/cpu: revert opt_allow_unsafe from __ro_after_init to __read_mostly

Message ID 20240902150016.63072-1-roger.pau@citrix.com
State New
Series x86/cpu: revert opt_allow_unsafe from __ro_after_init to __read_mostly

Commit Message

Roger Pau Monné Sept. 2, 2024, 3 p.m. UTC
Making opt_allow_unsafe read only after init requires changes to the logic in
init_amd(), otherwise the following #PF happens on CPU hotplug:

----[ Xen-4.20.0-1-d  x86_64  debug=y  Tainted:     H  ]----
CPU:    1
RIP:    e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
[...]
Xen call trace:
   [<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
   [<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
   [<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
   [<ffff82d040203327>] F __high_start+0x87/0xa0

Pagetable walk from ffff82d0404011ea:
 L4[0x105] = 000000006fc2e063 ffffffffffffffff
 L3[0x141] = 000000006fc2b063 ffffffffffffffff
 L2[0x002] = 000000807c7ca063 ffffffffffffffff
 L1[0x001] = 800000006f801121 ffffffffffffffff

****************************************
Panic on CPU 1:
FATAL PAGE FAULT
[error_code=0003]
Faulting linear address: ffff82d0404011ea
****************************************

For the time being revert opt_allow_unsafe to be __read_mostly.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: bfcb0abb191f ('types: replace remaining uses of s8')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/cpu/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
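
The numbers in the log above are consistent with a write to a read-only
mapping. The following standalone sketch decodes them using the architectural
x86 bit positions. The macro and function names are local to this sketch and
are not Xen's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits (names local to this sketch). */
#define PFEC_PRESENT (1u << 0)   /* fault hit a present mapping */
#define PFEC_WRITE   (1u << 1)   /* faulting access was a write */

/* Architectural x86-64 page-table entry bits (names local to this sketch). */
#define PTE_PRESENT  (1ull << 0)
#define PTE_RW       (1ull << 1)

/* True if the error code describes a write to a present mapping. */
bool pfec_is_write_protection_fault(uint32_t error_code)
{
    return (error_code & PFEC_PRESENT) && (error_code & PFEC_WRITE);
}

/* True if the entry is present but does not permit writes. */
bool pte_is_read_only(uint64_t pte)
{
    return (pte & PTE_PRESENT) && !(pte & PTE_RW);
}

/* Check the values quoted in the log from the commit message. */
void check_reported_fault(void)
{
    assert(pfec_is_write_protection_fault(0x0003));
    assert(pte_is_read_only(0x800000006f801121ull));   /* L1 entry */
    assert(!pte_is_read_only(0x000000807c7ca063ull));  /* L2 entry */
}
```

error_code=0003 has both the present and write bits set, and the L1 entry has
its writable bit clear while the L4, L3 and L2 entries have it set. That is
what a store to a variable placed in a section made read-only after init would
look like when the store comes from a hotplugged CPU running init_amd().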

Comments

Jan Beulich Sept. 2, 2024, 3:16 p.m. UTC | #1
On 02.09.2024 17:00, Roger Pau Monne wrote:
> Making opt_allow_unsafe read only after init requires changes to the logic in
> init_amd(), otherwise the following #PF happens on CPU hotplug:
> 
> [...]

Hmm, I specifically looked at that code, but I can see how I screwed up.

> For the time being revert opt_allow_unsafe to be __read_mostly.

There's exactly one write that an AP can hit. Is it really worth moving
back, rather than just doing

	if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
		opt_allow_unsafe = 1;
	else if ...

?

Jan

Roger Pau Monné Sept. 2, 2024, 3:30 p.m. UTC | #2
On Mon, Sep 02, 2024 at 05:16:05PM +0200, Jan Beulich wrote:
> On 02.09.2024 17:00, Roger Pau Monne wrote:
> > Making opt_allow_unsafe read only after init requires changes to the logic in
> > init_amd(), otherwise the following #PF happens on CPU hotplug:
> > 
> > [...]
> 
> Hmm, I specifically looked at that code, but I can see how I screwed up.
> 
> > For the time being revert opt_allow_unsafe to be __read_mostly.
> 
> There's exactly one write that an AP can hit. Is it really worth moving
> back, rather than just doing
> 
> 	if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
> 		opt_allow_unsafe = 1;
> 	else if ...
> 
> ?

I would rather move this logic so it's only the BSP that can set
opt_allow_unsafe, and the APs check they match the configuration set
by the BSP.  I think the resulting logic would be cleaner, but I
didn't want to do such a change as part of this fix.

Thanks, Roger.

Jan Beulich Sept. 2, 2024, 3:33 p.m. UTC | #3
On 02.09.2024 17:30, Roger Pau Monné wrote:
> On Mon, Sep 02, 2024 at 05:16:05PM +0200, Jan Beulich wrote:
>> On 02.09.2024 17:00, Roger Pau Monne wrote:
>>> Making opt_allow_unsafe read only after init requires changes to the logic in
>>> init_amd(), otherwise the following #PF happens on CPU hotplug:
>>>
>>> [...]
>>
>> Hmm, I specifically looked at that code, but I can see how I screwed up.
>>
>>> For the time being revert opt_allow_unsafe to be __read_mostly.
>>
>> There's exactly one write that an AP can hit. Is it really worth moving
>> back, rather than just doing
>>
>> 	if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
>> 		opt_allow_unsafe = 1;
>> 	else if ...
>>
>> ?
> 
> I would rather move this logic so it's only the BSP that can set
> opt_allow_unsafe, and the APs check they match the configuration set
> by the BSP.  I think the resulting logic would be cleaner, but I
> didn't want to do such a change as part of this fix.

Well, okay then:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

And I guess I'll put it in right away.

Jan
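
The follow-up idea from the thread (only the BSP sets opt_allow_unsafe, and
APs merely check that they match) could be modelled roughly as below. This is
a standalone sketch, not Xen code: struct unsafe_policy, bsp_decide() and
ap_consistent() are hypothetical names, the bsp_affected field is an assumption
added so that an AP has something to compare against, and the boolean argument
stands in for the cpu_has_amd_erratum(c, AMD_ERRATUM_121) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the state the BSP settles during boot. */
struct unsafe_policy {
    int8_t allow_unsafe;   /* 1 = allow, 0 = no guest creation, -1 = no boot */
    bool bsp_affected;     /* assumption: erratum status recorded by the BSP */
};

/* BSP path: the only place that writes the policy. */
void bsp_decide(struct unsafe_policy *p, bool affected)
{
    p->bsp_affected = affected;
    /* Same condition as in the review comment: an unaffected CPU is safe. */
    if (p->allow_unsafe <= 0 && !affected)
        p->allow_unsafe = 1;
}

/* AP path: takes a const pointer, so it can only read and compare. */
bool ap_consistent(const struct unsafe_policy *p, bool affected)
{
    return affected == p->bsp_affected;
}
```

With the write confined to the BSP path, the variable is never stored to after
boot, which is the property a read-only-after-init placement requires. What an
AP should do on a mismatch is left open here, as it is in the thread.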
Patch

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 903be14af4b9..7da04230393a 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -46,7 +46,7 @@  static unsigned int __initdata opt_cpuid_mask_thermal_ecx = ~0u;
 integer_param("cpuid_mask_thermal_ecx", opt_cpuid_mask_thermal_ecx);
 
 /* 1 = allow, 0 = don't allow guest creation, -1 = don't allow boot */
-int8_t __ro_after_init opt_allow_unsafe;
+int8_t __read_mostly opt_allow_unsafe;
 boolean_param("allow_unsafe", opt_allow_unsafe);
 
 /* Signal whether the ACPI C1E quirk is required. */