diff mbox

[RFC/PATCH,2/3] x86/signal/64: Try to preserve hardware SS across 64-bit signal delivery

Message ID a2cffaa2f2806e20095aea486e358322ef8a4703.1439496828.git.luto@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Andy Lutomirski Aug. 13, 2015, 8:18 p.m. UTC
Linux used to have nearly useless SS handling for 64-bit signals.  Signal
delivery to a 64-bit process would preserve the SS selector, but the
selector wasn't saved in sigcontext.  Sigreturn would then clobber SS.
If the signal being delivered was due to a bad SS, signal delivery would
fail and the task would be killed.

As of Linux 4.1, it's fixed: signal delivery sets up a valid SS in the
hardware SS selector, saves the old SS in the sigcontext, and restores it
properly in sigreturn.

DOSEMU had a curious workaround for the old behavior: it saved the
hardware SS selector it was given during signal delivery and fudged
RIP and CS so that sigreturn would return to a trampoline that
restored the old RIP, CS, and, importantly, SS.

The upshot is that the change in sigcontext had no effect on DOSEMU
(DOSEMU doesn't care what was in sigcontext, and the fact that the
old SS is presented to the trampoline in new kernels is irrelevant
because the trampoline uses long mode), but the change in signal
delivery caused DOSEMU's workaround to restore __USER_DS instead of
the correct pre-signal SS value.

Do our best to work around it: explicitly check whether the old SS
is usable and leave it alone during signal delivery if it is.
Sigreturn is unchanged.

Reported-by: Stas Sergeev <stsp@list.ru>
Fixes: c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc_defs.h | 23 +++++++++++++++++++++++
 arch/x86/kernel/signal.c         | 29 +++++++++++++++++++++++++++--
 2 files changed, 50 insertions(+), 2 deletions(-)

Comments

Andy Lutomirski Aug. 13, 2015, 8:25 p.m. UTC | #1
On Thu, Aug 13, 2015 at 1:18 PM, Andy Lutomirski <luto@kernel.org> wrote:
> Linux used to have nearly useless SS handling for 64-bit signals.  Signal
> delivery to a 64-bit process would preserve the SS selector, but the
> selector wasn't saved in sigcontext.  Sigreturn would then clobber SS.
> If the signal being delivered was due to a bad SS, signal delivery would
> fail and the task would be killed.
>
> As of Linux 4.1, it's fixed: signal delivery sets up a valid SS in the
> hardware SS selector, saves the old SS in the sigcontext, and restores it
> properly in sigreturn.
>
> DOSEMU had a curious workaround for the old behavior: it saved the
> hardware SS selector it was given during signal delivery and fudged
> RIP and CS so that sigreturn would return to a trampoline that
> restored the old RIP, CS, and, importantly, SS.
>
> The upshot is that the change in sigcontext had no effect on DOSEMU
> (DOSEMU doesn't care what was in sigcontext, and the fact that the
> old SS is presented to the trampoline in new kernels is irrelevant
> because the trampoline uses long mode), but the change in signal
> delivery caused DOSEMU's workaround to restore __USER_DS instead of
> the correct pre-signal SS value.
>
> Do our best to work around it: explicitly check whether the old SS
> is usable and leave it alone during signal delivery if it is.
> Sigreturn is unchanged.
>
> Reported-by: Stas Sergeev <stsp@list.ru>
> Fixes: c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs")
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---

> +               asm ("lar %[old_ss], %[ar]\n\t"
> +                    "jz 1f\n\t"
> +                    "xorl %[ar], %[ar]\n\t"    /* If invalid, set ar = 0 */
> +                    "1:"
> +                    : [ar] "=r" (ar)
> +                    : [old_ss] "rm" ((u16)regs->ss));
> +

Now that I sent this...

I should learn to think very carefully before doubting Linus'
off-the-cuff intuition about the x86 instruction set.  This can use
VERW after all, as long as it's careful to set RPL==3.

For purposes of testing DOSEMU testing, the version I sent should be
fine.  For purposes of review, please pretend that I was sensible and
used VERW instead of LAR.  This also means that the VMX fixlet is
optional, but I still think it's a good idea.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski Aug. 13, 2015, 9:26 p.m. UTC | #2
On Thu, Aug 13, 2015 at 1:25 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Aug 13, 2015 at 1:18 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> Linux used to have nearly useless SS handling for 64-bit signals.  Signal
>> delivery to a 64-bit process would preserve the SS selector, but the
>> selector wasn't saved in sigcontext.  Sigreturn would then clobber SS.
>> If the signal being delivered was due to a bad SS, signal delivery would
>> fail and the task would be killed.
>>
>> As of Linux 4.1, it's fixed: signal delivery sets up a valid SS in the
>> hardware SS selector, saves the old SS in the sigcontext, and restores it
>> properly in sigreturn.
>>
>> DOSEMU had a curious workaround for the old behavior: it saved the
>> hardware SS selector it was given during signal delivery and fudged
>> RIP and CS so that sigreturn would return to a trampoline that
>> restored the old RIP, CS, and, importantly, SS.
>>
>> The upshot is that the change in sigcontext had no effect on DOSEMU
>> (DOSEMU doesn't care what was in sigcontext, and the fact that the
>> old SS is presented to the trampoline in new kernels is irrelevant
>> because the trampoline uses long mode), but the change in signal
>> delivery caused DOSEMU's workaround to restore __USER_DS instead of
>> the correct pre-signal SS value.
>>
>> Do our best to work around it: explicitly check whether the old SS
>> is usable and leave it alone during signal delivery if it is.
>> Sigreturn is unchanged.
>>
>> Reported-by: Stas Sergeev <stsp@list.ru>
>> Fixes: c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs")
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>
>> +               asm ("lar %[old_ss], %[ar]\n\t"
>> +                    "jz 1f\n\t"
>> +                    "xorl %[ar], %[ar]\n\t"    /* If invalid, set ar = 0 */
>> +                    "1:"
>> +                    : [ar] "=r" (ar)
>> +                    : [old_ss] "rm" ((u16)regs->ss));
>> +
>
> Now that I sent this...
>
> I should learn to think very carefully before doubting Linus'
> off-the-cuff intuition about the x86 instruction set.  This can use
> VERW after all, as long as it's careful to set RPL==3.

And the corollary to that is: I should also assume that Linus is out
to get me :)

VERW is no good, because it considers non-present segments to be
writable.  Test cases for the win!

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Aug. 13, 2015, 9:41 p.m. UTC | #3
On Thu, Aug 13, 2015 at 2:26 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> VERW is no good, because it considers non-present segments to be
> writable.  Test cases for the win!

Seriously? That's crazy. I don't think I've actually ever used VERW,
but the documentation certainly says that the segment has to be
writable, and I quote

  "The validation performed is the same as is performed when a segment
selector is loaded into the DS, ES, FS, or GS register, and the
indicated access (read or write) is performed"

which damn well shouldn't work for non-present segments. Odd.

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski Aug. 13, 2015, 9:49 p.m. UTC | #4
On Thu, Aug 13, 2015 at 2:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 13, 2015 at 2:26 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> VERW is no good, because it considers non-present segments to be
>> writable.  Test cases for the win!
>
> Seriously? That's crazy. I don't think I've actually ever used VERW,
> but the documentation certainly says that the segment has to be
> writable, and I quote
>
>   "The validation performed is the same as is performed when a segment
> selector is loaded into the DS, ES, FS, or GS register, and the
> indicated access (read or write) is performed"
>
> which damn well shouldn't work for non-present segments. Odd.
>

I can try to come up with a self-contained test case, but I'm
reasonably confident that I did it right and that I sprinkled the
right printks around.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski Aug. 13, 2015, 10:03 p.m. UTC | #5
On Thu, Aug 13, 2015 at 2:49 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Aug 13, 2015 at 2:41 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Thu, Aug 13, 2015 at 2:26 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>
>>> VERW is no good, because it considers non-present segments to be
>>> writable.  Test cases for the win!
>>
>> Seriously? That's crazy. I don't think I've actually ever used VERW,
>> but the documentation certainly says that the segment has to be
>> writable, and I quote
>>
>>   "The validation performed is the same as is performed when a segment
>> selector is loaded into the DS, ES, FS, or GS register, and the
>> indicated access (read or write) is performed"
>>
>> which damn well shouldn't work for non-present segments. Odd.
>>
>
> I can try to come up with a self-contained test case, but I'm
> reasonably confident that I did it right and that I sprinkled the
> right printks around.

The SDM pseudocode, the APM's description:

A segment is writable if all of the following apply:
 - the selector is not a null selector.
 - the descriptor is within the GDT or LDT limit.
 - the segment is a writable data segment.
 - the descriptor DPL is greater than or equal to both the CPL and RPL.

and the SDM's bullet points:

To set the ZF flag, the following conditions must be met:
 - The segment selector is not NULL.
 - The selector must denote a descriptor within the bounds of the
descriptor table (GDT or LDT).
 - The selector must denote the descriptor of a code or data segment
(not that of a system segment or gate).
 - For the VERR instruction, the segment must be readable.
 - For the VERW instruction, the segment must be a writable data segment.
 - If the segment is not a conforming code segment, the segment’s DPL
must be greater than...

all seem to suggest that P isn't checked.

If I quote a bit farther than you did:

The validation performed is the same as is performed when a segment
selector is loaded into the DS, ES, FS, or GS
register, and the indicated access (read or write) is performed. The
segment selector's value cannot result in a
protection exception, enabling the software to anticipate possible
segment access problems.

I think the idea is that VERW is supposed to check protection but not
presence, the idea being that a hypothetical non-paged segmented OS
would swap out a segment and mark it not-present, and the resulting
failure would be #NP, which isn't a "protection exception".

Did anyone ever write an OS that used this stuff?  The Internet
suggests that OS/2 1.0 on the 286 supported swapping, so I bet it
actually used this mechanism, and woe unto any user (ahem, ring 1-3)
app that used LAR, checked the present bit, and blew up when a segment
was paged out.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
index 278441f39856..00971705a16d 100644
--- a/arch/x86/include/asm/desc_defs.h
+++ b/arch/x86/include/asm/desc_defs.h
@@ -98,4 +98,27 @@  struct desc_ptr {
 
 #endif /* !__ASSEMBLY__ */
 
+/* Access rights as returned by LAR */
+#define AR_TYPE_RODATA		(0 * (1 << 9))
+#define AR_TYPE_RWDATA		(1 * (1 << 9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1 << 9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1 << 9))
+#define AR_TYPE_XOCODE		(4 * (1 << 9))
+#define AR_TYPE_XRCODE		(5 * (1 << 9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1 << 9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1 << 9))
+#define AR_TYPE_MASK		(7 * (1 << 9))
+
+#define AR_DPL0			(0 * (1 << 13))
+#define AR_DPL3			(3 * (1 << 13))
+#define AR_DPL_MASK		(3 * (1 << 13))
+
+#define AR_A			(1 << 8)	/* A means "accessed" */
+#define AR_S			(1 << 12)	/* S means "not system" */
+#define AR_P			(1 << 15)	/* P means "present" */
+#define AR_AVL			(1 << 20)	/* AVL does nothing */
+#define AR_L			(1 << 21)	/* L means "long mode" */
+#define AR_DB			(1 << 22)	/* D or B, depending on type */
+#define AR_G			(1 << 23)	/* G means "limit in pages" */
+
 #endif /* _ASM_X86_DESC_DEFS_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 206996c1669d..784af1e49fc1 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -460,10 +460,35 @@  static int __setup_rt_frame(int sig, struct ksignal *ksig,
 	 * SS descriptor, but we do need SS to be valid.  It's possible
 	 * that the old SS is entirely bogus -- this can happen if the
 	 * signal we're trying to deliver is #GP or #SS caused by a bad
-	 * SS value.
+	 * SS value.  We also have a compatbility issue here: DOSEMU
+	 * relies on the contents of the SS register indicating the
+	 * SS value at the time of the signal, even though that code in
+	 * DOSEMU predates sigreturn's ability to restore SS.  (DOSEMU
+	 * avoids relying on sigreturn to restore SS; instead it uses
+	 * a trampoline.)  So we do our best: if the old SS was valid,
+	 * we keep it.  Otherwise we replace it.
 	 */
 	regs->cs = __USER_CS;
-	regs->ss = __USER_DS;
+
+	if (unlikely(regs->ss != __USER_DS)) {
+		u32 ar;
+		asm ("lar %[old_ss], %[ar]\n\t"
+		     "jz 1f\n\t"
+		     "xorl %[ar], %[ar]\n\t"	/* If invalid, set ar = 0 */
+		     "1:"
+		     : [ar] "=r" (ar)
+		     : [old_ss] "rm" ((u16)regs->ss));
+
+		/*
+		 * For a valid 64-bit user context, we need DPL 3, type
+		 * read-write data or read-write exp-down data, and S and P
+		 * set.
+		 */
+		ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;
+		if (ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) &&
+		    ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN))
+			regs->ss = __USER_DS;
+	}
 
 	return 0;
 }