diff mbox series

x86/realmode: Make stack lock work in trampoline_compat()

Message ID 87h6rujdvl.ffs@tglx (mailing list archive)
State Handled Elsewhere
Headers show
Series x86/realmode: Make stack lock work in trampoline_compat() | expand

Checks

Context Check Description
conchuod/tree_selection fail Failed to apply to next/pending-fixes, riscv/for-next or riscv/master

Commit Message

Thomas Gleixner May 30, 2023, 10:46 a.m. UTC
The stack locking and stack assignment macro LOAD_REALMODE_ESP fails to
work when invoked from the 64bit trampoline entry point:

trampoline_start64
  trampoline_compat
    LOAD_REALMODE_ESP <- lock

Accessing tr_lock is only possible from 16bit mode. For the compat entry
point this needs to be pa_tr_lock so that the required relocation entry is
generated. Otherwise it locks the non-relocated address which is
aside of being wrong never cleared in secondary_startup_64() causing all
but the first CPU to get stuck on the lock.

Make the macro take an argument lock_pa which defaults to 0 and rename it
to LOCK_AND_LOAD_REALMODE_ESP to make it clear what this is about.

Fixes: f6f1ae9128d2 ("x86/smpboot: Implement a bit spinlock to protect the realmode stack")
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/realmode/rm/trampoline_64.S |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Kirill A. Shutemov May 30, 2023, 11:12 a.m. UTC | #1
On Tue, May 30, 2023 at 12:46:22PM +0200, Thomas Gleixner wrote:
> The stack locking and stack assignment macro LOAD_REALMODE_ESP fails to
> work when invoked from the 64bit trampoline entry point:
> 
> trampoline_start64
>   trampoline_compat
>     LOAD_REALMODE_ESP <- lock
> 
> Accessing tr_lock is only possible from 16bit mode. For the compat entry
> point this needs to be pa_tr_lock so that the required relocation entry is
> generated. Otherwise it locks the non-relocated address which is
> aside of being wrong never cleared in secondary_startup_64() causing all
> but the first CPU to get stuck on the lock.
> 
> Make the macro take an argument lock_pa which defaults to 0 and rename it
> to LOCK_AND_LOAD_REALMODE_ESP to make it clear what this is about.
> 
> Fixes: f6f1ae9128d2 ("x86/smpboot: Implement a bit spinlock to protect the realmode stack")
> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Yunhong Jiang June 8, 2023, 11:34 p.m. UTC | #2
On Tue, May 30, 2023 at 12:46:22PM +0200, Thomas Gleixner wrote:
> The stack locking and stack assignment macro LOAD_REALMODE_ESP fails to
> work when invoked from the 64bit trampoline entry point:
> 
> trampoline_start64
>   trampoline_compat
>     LOAD_REALMODE_ESP <- lock
One possibly dumb question and hope get some hints. The LOAD_REALMODE_ESP is
defined under .code16 directive and will be used by 32-bit mode caller also. Is
it ok because the instructions there will be same for both 16-bit and 32-bit? I
checked
https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_16.html#SEC205 and
don't find much information there.
Andrew Cooper June 8, 2023, 11:57 p.m. UTC | #3
On 09/06/2023 12:34 am, Yunhong Jiang wrote:
> On Tue, May 30, 2023 at 12:46:22PM +0200, Thomas Gleixner wrote:
>> The stack locking and stack assignment macro LOAD_REALMODE_ESP fails to
>> work when invoked from the 64bit trampoline entry point:
>>
>> trampoline_start64
>>   trampoline_compat
>>     LOAD_REALMODE_ESP <- lock
> One possibly dumb question and hope get some hints.

There's a phrase.  "The only dumb question is the one not asked".

If you have this question, there's an excellent chance that someone else
reading this thread has the same question.

>  The LOAD_REALMODE_ESP is
> defined under .code16 directive and will be used by 32-bit mode caller also. Is
> it ok because the instructions there will be same for both 16-bit and 32-bit? I
> checked
> https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_16.html#SEC205 and
> don't find much information there.

The position of the LOAD_REALMODE_ESP .macro itself doesn't matter. 
It's just some text which gets pasted elsewhere.  Imagine it just the
same as running the C preprocessor on a file before compiling it.

As you note, some expansions of the macro are in .code16, and some are
not.  This does result in different bytes being emitted.  The default
operands size flips between .code16 and .code32, so there will be some
0x66 prefixes in one mode, and not in others.

The important point is the l suffix on btsl, which forces it to be long
(32bit) irrespective of the default operand size.

So yes, it will work, but that's because gas is handling the differing
encodings automatically based on the default operand size where the
LOAD_REALMODE_ESP gets expanded.

Hope this helps,

~Andrew
Yunhong Jiang June 9, 2023, 12:22 a.m. UTC | #4
On Fri, Jun 09, 2023 at 12:57:46AM +0100, Andrew Cooper wrote:
> On 09/06/2023 12:34 am, Yunhong Jiang wrote:
> > On Tue, May 30, 2023 at 12:46:22PM +0200, Thomas Gleixner wrote:
> >> The stack locking and stack assignment macro LOAD_REALMODE_ESP fails to
> >> work when invoked from the 64bit trampoline entry point:
> >>
> >> trampoline_start64
> >>   trampoline_compat
> >>     LOAD_REALMODE_ESP <- lock
> > One possibly dumb question and hope get some hints.
> 
> There's a phrase.  "The only dumb question is the one not asked".
> 
> If you have this question, there's an excellent chance that someone else
> reading this thread has the same question.
> 
> >  The LOAD_REALMODE_ESP is
> > defined under .code16 directive and will be used by 32-bit mode caller also. Is
> > it ok because the instructions there will be same for both 16-bit and 32-bit? I
> > checked
> > https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_16.html#SEC205 and
> > don't find much information there.
> 
> The position of the LOAD_REALMODE_ESP .macro itself doesn't matter. 
> It's just some text which gets pasted elsewhere.  Imagine it just the
> same as running the C preprocessor on a file before compiling it.
> 
> As you note, some expansions of the macro are in .code16, and some are
> not.  This does result in different bytes being emitted.  The default
> operands size flips between .code16 and .code32, so there will be some
> 0x66 prefixes in one mode, and not in others.
> 
> The important point is the l suffix on btsl, which forces it to be long
> (32bit) irrespective of the default operand size.
> 
> So yes, it will work, but that's because gas is handling the differing
> encodings automatically based on the default operand size where the
> LOAD_REALMODE_ESP gets expanded.
> 
> Hope this helps,
Thank you for the explaination, it's quite clear now.
> 
> ~Andrew
David Laight June 10, 2023, 7:50 p.m. UTC | #5
From: Andrew Cooper
> Sent: 09 June 2023 00:58
> 
...
> The important point is the l suffix on btsl, which forces it to be long
> (32bit) irrespective of the default operand size.

Does that matter at all?
The 'bit' opcodes (I'm sure 'bts' is 'bit test and set') take
a bit offset from the base address.
This accesses the same bit regardless of the operand size.

The one real issue is that a byte operand will only lock the one byte.
This might be problematic if non-bit locked accesses are also used.
Although it would need to be rather obscure use.
(This may be one of them...)

The only other problem is that btsl always does a locked 32bit
access. If the base address is misaligned this is a misaligned
locked access - problematic if it crosses a cache line boundary.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Andrew Cooper June 10, 2023, 10:51 p.m. UTC | #6
On 10/06/2023 8:50 pm, David Laight wrote:
> From: Andrew Cooper
>> Sent: 09 June 2023 00:58
>>
> ...
>> The important point is the l suffix on btsl, which forces it to be long
>> (32bit) irrespective of the default operand size.
> Does that matter at all?

Yes, or I wouldn't have wasted everyone's time saying otherwise.

I recommend educating yourself with the instruction manual, rather than
presenting a set of statements which are all wrong.

~Andrew
diff mbox series

Patch

--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -37,12 +37,16 @@ 
 	.text
 	.code16
 
-.macro LOAD_REALMODE_ESP
+.macro LOCK_AND_LOAD_REALMODE_ESP lock_pa=0
 	/*
 	 * Make sure only one CPU fiddles with the realmode stack
 	 */
 .Llock_rm\@:
+	.if \lock_pa
+        lock btsl       $0, pa_tr_lock
+	.else
         lock btsl       $0, tr_lock
+	.endif
         jnc             2f
         pause
         jmp             .Llock_rm\@
@@ -63,7 +67,7 @@  SYM_CODE_START(trampoline_start)
 	mov	%ax, %es
 	mov	%ax, %ss
 
-	LOAD_REALMODE_ESP
+	LOCK_AND_LOAD_REALMODE_ESP
 
 	call	verify_cpu		# Verify the cpu supports long mode
 	testl   %eax, %eax		# Check for return code
@@ -106,7 +110,7 @@  SYM_CODE_START(sev_es_trampoline_start)
 	mov	%ax, %es
 	mov	%ax, %ss
 
-	LOAD_REALMODE_ESP
+	LOCK_AND_LOAD_REALMODE_ESP
 
 	jmp	.Lswitch_to_protected
 SYM_CODE_END(sev_es_trampoline_start)
@@ -189,7 +193,7 @@  SYM_CODE_START(pa_trampoline_compat)
 	 * In compatibility mode.  Prep ESP and DX for startup_32, then disable
 	 * paging and complete the switch to legacy 32-bit mode.
 	 */
-	LOAD_REALMODE_ESP
+	LOCK_AND_LOAD_REALMODE_ESP lock_pa=1
 	movw	$__KERNEL_DS, %dx
 
 	movl	$(CR0_STATE & ~X86_CR0_PG), %eax