diff mbox series

x86/traps: 'Fix' safety of read_registers() in #DF path

Message ID 20201012134908.27497-1-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/traps: 'Fix' safety of read_registers() in #DF path | expand

Commit Message

Andrew Cooper Oct. 12, 2020, 1:49 p.m. UTC
All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
contains the legacy vm86 fields from 32bit days, which are beyond the
hardware-pushed frame.

Accessing these fields is generally illegal, as they are logically out of
bounds for anything other than an interrupt/exception hitting ring1/3 code.

Unfortunately, the #DF handler uses these fields as part of preparing the
state dump, and being IST, accesses the adjacent stack frame.

This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
layout to support shadow stacks" repositioned the #DF stack to be adjacent to
the guard page, which turns this OoB write into a fatal pagefault:

  (XEN) *** DOUBLE FAULT ***
  (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
  (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
  (XEN) CPU:    4
  (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
  (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
  (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
  (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
  (XEN)
  (XEN) Pagetable walk from ffff830236f6d008:
  (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
  (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
  (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
  (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 4:
  (XEN) FATAL PAGE FAULT
  (XEN) [error_code=0003]
  (XEN) Faulting linear address: ffff830236f6d008
  (XEN) ****************************************
  (XEN)

and rendering the main #DF analysis broken.

The proper fix is to delete cpu_user_regs.es and later, so no
interrupt/exception path can access OoB, but this needs disentangling from the
PV ABI first.

Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
---
 xen/arch/x86/cpu/common.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Jan Beulich Oct. 13, 2020, 3:51 p.m. UTC | #1
On 12.10.2020 15:49, Andrew Cooper wrote:
> All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
> contains the legacy vm86 fields from 32bit days, which are beyond the
> hardware-pushed frame.
> 
> Accessing these fields is generally illegal, as they are logically out of
> bounds for anything other than an interrupt/exception hitting ring1/3 code.
> 
> Unfortunately, the #DF handler uses these fields as part of preparing the
> state dump, and being IST, accesses the adjacent stack frame.
> 
> This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
> layout to support shadow stacks" repositioned the #DF stack to be adjacent to
> the guard page, which turns this OoB write into a fatal pagefault:
> 
>   (XEN) *** DOUBLE FAULT ***
>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>   (XEN) CPU:    4
>   (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
>   (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
>   ...
>   (XEN) Xen call trace:
>   (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
>   (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
>   (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
>   (XEN)
>   (XEN) Pagetable walk from ffff830236f6d008:
>   (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
>   (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
>   (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
>   (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
>   (XEN)
>   (XEN) ****************************************
>   (XEN) Panic on CPU 4:
>   (XEN) FATAL PAGE FAULT
>   (XEN) [error_code=0003]
>   (XEN) Faulting linear address: ffff830236f6d008
>   (XEN) ****************************************
>   (XEN)
> 
> and rendering the main #DF analysis broken.
> 
> The proper fix is to delete cpu_user_regs.es and later, so no
> interrupt/exception path can access OoB, but this needs disentangling from the
> PV ABI first.
> 
> Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Is it perhaps worth also saying explicitly that the other IST
stacks don't suffer the same problem because show_registers()
makes an local copy of the passed in struct? (Doing so also
for #DF would apparently be an alternative solution.)

Jan
Andrew Cooper Oct. 14, 2020, 6 p.m. UTC | #2
On 13/10/2020 16:51, Jan Beulich wrote:
> On 12.10.2020 15:49, Andrew Cooper wrote:
>> All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
>> contains the legacy vm86 fields from 32bit days, which are beyond the
>> hardware-pushed frame.
>>
>> Accessing these fields is generally illegal, as they are logically out of
>> bounds for anything other than an interrupt/exception hitting ring1/3 code.
>>
>> Unfortunately, the #DF handler uses these fields as part of preparing the
>> state dump, and being IST, accesses the adjacent stack frame.
>>
>> This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
>> layout to support shadow stacks" repositioned the #DF stack to be adjacent to
>> the guard page, which turns this OoB write into a fatal pagefault:
>>
>>   (XEN) *** DOUBLE FAULT ***
>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>   (XEN) CPU:    4
>>   (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
>>   (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
>>   ...
>>   (XEN) Xen call trace:
>>   (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
>>   (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
>>   (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
>>   (XEN)
>>   (XEN) Pagetable walk from ffff830236f6d008:
>>   (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
>>   (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
>>   (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
>>   (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
>>   (XEN)
>>   (XEN) ****************************************
>>   (XEN) Panic on CPU 4:
>>   (XEN) FATAL PAGE FAULT
>>   (XEN) [error_code=0003]
>>   (XEN) Faulting linear address: ffff830236f6d008
>>   (XEN) ****************************************
>>   (XEN)
>>
>> and rendering the main #DF analysis broken.
>>
>> The proper fix is to delete cpu_user_regs.es and later, so no
>> interrupt/exception path can access OoB, but this needs disentangling from the
>> PV ABI first.
>>
>> Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> Is it perhaps worth also saying explicitly that the other IST
> stacks don't suffer the same problem because show_registers()
> makes an local copy of the passed in struct? (Doing so also
> for #DF would apparently be an alternative solution.)

They're not safe.  They merely don't explode.

https://lore.kernel.org/xen-devel/1532546157-5974-1-git-send-email-andrew.cooper3@citrix.com/
was "fixed" by CET-SS turning the guard page from not present to
read-only, but the same CET-SS series swapped #DB for #DF when it comes
to the single OoB write problem case.

~Andrew
Jan Beulich Oct. 15, 2020, 7:27 a.m. UTC | #3
On 14.10.2020 20:00, Andrew Cooper wrote:
> On 13/10/2020 16:51, Jan Beulich wrote:
>> On 12.10.2020 15:49, Andrew Cooper wrote:
>>> All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
>>> contains the legacy vm86 fields from 32bit days, which are beyond the
>>> hardware-pushed frame.
>>>
>>> Accessing these fields is generally illegal, as they are logically out of
>>> bounds for anything other than an interrupt/exception hitting ring1/3 code.
>>>
>>> Unfortunately, the #DF handler uses these fields as part of preparing the
>>> state dump, and being IST, accesses the adjacent stack frame.
>>>
>>> This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
>>> layout to support shadow stacks" repositioned the #DF stack to be adjacent to
>>> the guard page, which turns this OoB write into a fatal pagefault:
>>>
>>>   (XEN) *** DOUBLE FAULT ***
>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>   (XEN) CPU:    4
>>>   (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
>>>   (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
>>>   ...
>>>   (XEN) Xen call trace:
>>>   (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
>>>   (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
>>>   (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
>>>   (XEN)
>>>   (XEN) Pagetable walk from ffff830236f6d008:
>>>   (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
>>>   (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
>>>   (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
>>>   (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
>>>   (XEN)
>>>   (XEN) ****************************************
>>>   (XEN) Panic on CPU 4:
>>>   (XEN) FATAL PAGE FAULT
>>>   (XEN) [error_code=0003]
>>>   (XEN) Faulting linear address: ffff830236f6d008
>>>   (XEN) ****************************************
>>>   (XEN)
>>>
>>> and rendering the main #DF analysis broken.
>>>
>>> The proper fix is to delete cpu_user_regs.es and later, so no
>>> interrupt/exception path can access OoB, but this needs disentangling from the
>>> PV ABI first.
>>>
>>> Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> Is it perhaps worth also saying explicitly that the other IST
>> stacks don't suffer the same problem because show_registers()
>> makes an local copy of the passed in struct? (Doing so also
>> for #DF would apparently be an alternative solution.)
> 
> They're not safe.  They merely don't explode.
> 
> https://lore.kernel.org/xen-devel/1532546157-5974-1-git-send-email-andrew.cooper3@citrix.com/
> was "fixed" by CET-SS turning the guard page from not present to
> read-only, but the same CET-SS series swapped #DB for #DF when it comes
> to the single OoB write problem case.

I see. While indeed I didn't pay attention to the OoB read aspect,
me saying "the other IST stacks don't suffer the same problem" was
still correct, wasn't it? Anyway - not a big deal.

Jan
Andrew Cooper Oct. 16, 2020, 10:58 a.m. UTC | #4
On 15/10/2020 08:27, Jan Beulich wrote:
> On 14.10.2020 20:00, Andrew Cooper wrote:
>> On 13/10/2020 16:51, Jan Beulich wrote:
>>> On 12.10.2020 15:49, Andrew Cooper wrote:
>>>> All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
>>>> contains the legacy vm86 fields from 32bit days, which are beyond the
>>>> hardware-pushed frame.
>>>>
>>>> Accessing these fields is generally illegal, as they are logically out of
>>>> bounds for anything other than an interrupt/exception hitting ring1/3 code.
>>>>
>>>> Unfortunately, the #DF handler uses these fields as part of preparing the
>>>> state dump, and being IST, accesses the adjacent stack frame.
>>>>
>>>> This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
>>>> layout to support shadow stacks" repositioned the #DF stack to be adjacent to
>>>> the guard page, which turns this OoB write into a fatal pagefault:
>>>>
>>>>   (XEN) *** DOUBLE FAULT ***
>>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>>   (XEN) CPU:    4
>>>>   (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
>>>>   (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
>>>>   ...
>>>>   (XEN) Xen call trace:
>>>>   (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
>>>>   (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
>>>>   (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
>>>>   (XEN)
>>>>   (XEN) Pagetable walk from ffff830236f6d008:
>>>>   (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
>>>>   (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
>>>>   (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
>>>>   (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
>>>>   (XEN)
>>>>   (XEN) ****************************************
>>>>   (XEN) Panic on CPU 4:
>>>>   (XEN) FATAL PAGE FAULT
>>>>   (XEN) [error_code=0003]
>>>>   (XEN) Faulting linear address: ffff830236f6d008
>>>>   (XEN) ****************************************
>>>>   (XEN)
>>>>
>>>> and rendering the main #DF analysis broken.
>>>>
>>>> The proper fix is to delete cpu_user_regs.es and later, so no
>>>> interrupt/exception path can access OoB, but this needs disentangling from the
>>>> PV ABI first.
>>>>
>>>> Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> Is it perhaps worth also saying explicitly that the other IST
>>> stacks don't suffer the same problem because show_registers()
>>> makes an local copy of the passed in struct? (Doing so also
>>> for #DF would apparently be an alternative solution.)
>> They're not safe.  They merely don't explode.
>>
>> https://lore.kernel.org/xen-devel/1532546157-5974-1-git-send-email-andrew.cooper3@citrix.com/
>> was "fixed" by CET-SS turning the guard page from not present to
>> read-only, but the same CET-SS series swapped #DB for #DF when it comes
>> to the single OoB write problem case.
> I see. While indeed I didn't pay attention to the OoB read aspect,
> me saying "the other IST stacks don't suffer the same problem" was
> still correct, wasn't it? Anyway - not a big deal.

I've tweaked the commit message to make this more clear.

--8<---
Accessing these fields is generally illegal, as they are logically out of
bounds for anything other than an interrupt/exception hitting ring1/3 code.

show_registers() unconditionally reads these fields, but the content is
discarded before use.  This is benign right now, as all parts of the
stack are
readable, including the guard pages.

However, read_registers() in the #DF handler writes to these fields as
part of
preparing the state dump, and being IST, hits the adjacent stack frame.
--8<--

~Andrew
Jan Beulich Oct. 16, 2020, 11:03 a.m. UTC | #5
On 16.10.2020 12:58, Andrew Cooper wrote:
> On 15/10/2020 08:27, Jan Beulich wrote:
>> On 14.10.2020 20:00, Andrew Cooper wrote:
>>> On 13/10/2020 16:51, Jan Beulich wrote:
>>>> On 12.10.2020 15:49, Andrew Cooper wrote:
>>>>> All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
>>>>> contains the legacy vm86 fields from 32bit days, which are beyond the
>>>>> hardware-pushed frame.
>>>>>
>>>>> Accessing these fields is generally illegal, as they are logically out of
>>>>> bounds for anything other than an interrupt/exception hitting ring1/3 code.
>>>>>
>>>>> Unfortunately, the #DF handler uses these fields as part of preparing the
>>>>> state dump, and being IST, accesses the adjacent stack frame.
>>>>>
>>>>> This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
>>>>> layout to support shadow stacks" repositioned the #DF stack to be adjacent to
>>>>> the guard page, which turns this OoB write into a fatal pagefault:
>>>>>
>>>>>   (XEN) *** DOUBLE FAULT ***
>>>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>>>   (XEN) CPU:    4
>>>>>   (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
>>>>>   (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
>>>>>   ...
>>>>>   (XEN) Xen call trace:
>>>>>   (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
>>>>>   (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
>>>>>   (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
>>>>>   (XEN)
>>>>>   (XEN) Pagetable walk from ffff830236f6d008:
>>>>>   (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
>>>>>   (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
>>>>>   (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
>>>>>   (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
>>>>>   (XEN)
>>>>>   (XEN) ****************************************
>>>>>   (XEN) Panic on CPU 4:
>>>>>   (XEN) FATAL PAGE FAULT
>>>>>   (XEN) [error_code=0003]
>>>>>   (XEN) Faulting linear address: ffff830236f6d008
>>>>>   (XEN) ****************************************
>>>>>   (XEN)
>>>>>
>>>>> and rendering the main #DF analysis broken.
>>>>>
>>>>> The proper fix is to delete cpu_user_regs.es and later, so no
>>>>> interrupt/exception path can access OoB, but this needs disentangling from the
>>>>> PV ABI first.
>>>>>
>>>>> Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Is it perhaps worth also saying explicitly that the other IST
>>>> stacks don't suffer the same problem because show_registers()
>>>> makes an local copy of the passed in struct? (Doing so also
>>>> for #DF would apparently be an alternative solution.)
>>> They're not safe.  They merely don't explode.
>>>
>>> https://lore.kernel.org/xen-devel/1532546157-5974-1-git-send-email-andrew.cooper3@citrix.com/
>>> was "fixed" by CET-SS turning the guard page from not present to
>>> read-only, but the same CET-SS series swapped #DB for #DF when it comes
>>> to the single OoB write problem case.
>> I see. While indeed I didn't pay attention to the OoB read aspect,
>> me saying "the other IST stacks don't suffer the same problem" was
>> still correct, wasn't it? Anyway - not a big deal.
> 
> I've tweaked the commit message to make this more clear.
> 
> --8<---
> Accessing these fields is generally illegal, as they are logically out of
> bounds for anything other than an interrupt/exception hitting ring1/3 code.
> 
> show_registers() unconditionally reads these fields, but the content is
> discarded before use.  This is benign right now, as all parts of the
> stack are
> readable, including the guard pages.
> 
> However, read_registers() in the #DF handler writes to these fields as
> part of
> preparing the state dump, and being IST, hits the adjacent stack frame.
> --8<--

Thanks, lgtm.

Jan
Andrew Cooper Oct. 16, 2020, 11:24 a.m. UTC | #6
On 16/10/2020 12:03, Jan Beulich wrote:
> On 16.10.2020 12:58, Andrew Cooper wrote:
>> On 15/10/2020 08:27, Jan Beulich wrote:
>>> On 14.10.2020 20:00, Andrew Cooper wrote:
>>>> On 13/10/2020 16:51, Jan Beulich wrote:
>>>>> On 12.10.2020 15:49, Andrew Cooper wrote:
>>>>>> All interrupts and exceptions pass a struct cpu_user_regs up into C.  This
>>>>>> contains the legacy vm86 fields from 32bit days, which are beyond the
>>>>>> hardware-pushed frame.
>>>>>>
>>>>>> Accessing these fields is generally illegal, as they are logically out of
>>>>>> bounds for anything other than an interrupt/exception hitting ring1/3 code.
>>>>>>
>>>>>> Unfortunately, the #DF handler uses these fields as part of preparing the
>>>>>> state dump, and being IST, accesses the adjacent stack frame.
>>>>>>
>>>>>> This has been broken forever, but c/s 6001660473 "x86/shstk: Rework the stack
>>>>>> layout to support shadow stacks" repositioned the #DF stack to be adjacent to
>>>>>> the guard page, which turns this OoB write into a fatal pagefault:
>>>>>>
>>>>>>   (XEN) *** DOUBLE FAULT ***
>>>>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>>>>   (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:  C   ]----
>>>>>>   (XEN) CPU:    4
>>>>>>   (XEN) RIP:    e008:[<ffff82d04031fd4f>] traps.c#read_registers+0x29/0xc1
>>>>>>   (XEN) RFLAGS: 0000000000050086   CONTEXT: hypervisor (d1v0)
>>>>>>   ...
>>>>>>   (XEN) Xen call trace:
>>>>>>   (XEN)    [<ffff82d04031fd4f>] R traps.c#read_registers+0x29/0xc1
>>>>>>   (XEN)    [<ffff82d0403207b3>] F do_double_fault+0x3d/0x7e
>>>>>>   (XEN)    [<ffff82d04039acd7>] F double_fault+0x107/0x110
>>>>>>   (XEN)
>>>>>>   (XEN) Pagetable walk from ffff830236f6d008:
>>>>>>   (XEN)  L4[0x106] = 80000000bfa9b063 ffffffffffffffff
>>>>>>   (XEN)  L3[0x008] = 0000000236ffd063 ffffffffffffffff
>>>>>>   (XEN)  L2[0x1b7] = 0000000236ffc063 ffffffffffffffff
>>>>>>   (XEN)  L1[0x16d] = 8000000236f6d161 ffffffffffffffff
>>>>>>   (XEN)
>>>>>>   (XEN) ****************************************
>>>>>>   (XEN) Panic on CPU 4:
>>>>>>   (XEN) FATAL PAGE FAULT
>>>>>>   (XEN) [error_code=0003]
>>>>>>   (XEN) Faulting linear address: ffff830236f6d008
>>>>>>   (XEN) ****************************************
>>>>>>   (XEN)
>>>>>>
>>>>>> and rendering the main #DF analysis broken.
>>>>>>
>>>>>> The proper fix is to delete cpu_user_regs.es and later, so no
>>>>>> interrupt/exception path can access OoB, but this needs disentangling from the
>>>>>> PV ABI first.
>>>>>>
>>>>>> Not-really-fixes: 6001660473 ("x86/shstk: Rework the stack layout to support shadow stacks")
>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> Is it perhaps worth also saying explicitly that the other IST
>>>>> stacks don't suffer the same problem because show_registers()
>>>>> makes an local copy of the passed in struct? (Doing so also
>>>>> for #DF would apparently be an alternative solution.)
>>>> They're not safe.  They merely don't explode.
>>>>
>>>> https://lore.kernel.org/xen-devel/1532546157-5974-1-git-send-email-andrew.cooper3@citrix.com/
>>>> was "fixed" by CET-SS turning the guard page from not present to
>>>> read-only, but the same CET-SS series swapped #DB for #DF when it comes
>>>> to the single OoB write problem case.
>>> I see. While indeed I didn't pay attention to the OoB read aspect,
>>> me saying "the other IST stacks don't suffer the same problem" was
>>> still correct, wasn't it? Anyway - not a big deal.
>> I've tweaked the commit message to make this more clear.
>>
>> --8<---
>> Accessing these fields is generally illegal, as they are logically out of
>> bounds for anything other than an interrupt/exception hitting ring1/3 code.
>>
>> show_registers() unconditionally reads these fields, but the content is
>> discarded before use.  This is benign right now, as all parts of the
>> stack are
>> readable, including the guard pages.
>>
>> However, read_registers() in the #DF handler writes to these fields as
>> part of
>> preparing the state dump, and being IST, hits the adjacent stack frame.
>> --8<--
> Thanks, lgtm.

On a tangent, what are your views WRT backport beyond 4.14?

Back then, it was #DB which was adjacent to the guard frame (which was
not present), but it doesn't use show_registers() by default, so I think
the problem is mostly hidden.

~Andrew
Jan Beulich Oct. 16, 2020, 11:55 a.m. UTC | #7
On 16.10.2020 13:24, Andrew Cooper wrote:
> On a tangent, what are your views WRT backport beyond 4.14?
> 
> Back then, it was #DB which was adjacent to the guard frame (which was
> not present), but it doesn't use show_registers() by default, so I think
> the problem is mostly hidden.

I wasn't fully decided yet, but as long as it applies reasonably
cleanly I think I'm leaning towards also putting it on 4.13.
4.12 closes anyway once 4.12.4 is out, and I don't think I want
to pick up not-really-urgent changes for putting there beyond
the few ones that I already have (and that I mean to put in
alongside the XSA fixes on Tuesday); I could be talked into it,
though.

Jan
Andrew Cooper Oct. 16, 2020, 12:07 p.m. UTC | #8
On 16/10/2020 12:55, Jan Beulich wrote:
> On 16.10.2020 13:24, Andrew Cooper wrote:
>> On a tangent, what are your views WRT backport beyond 4.14?
>>
>> Back then, it was #DB which was adjacent to the guard frame (which was
>> not present), but it doesn't use show_registers() by default, so I think
>> the problem is mostly hidden.
> I wasn't fully decided yet, but as long as it applies reasonably
> cleanly I think I'm leaning towards also putting it on 4.13.
> 4.12 closes anyway once 4.12.4 is out, and I don't think I want
> to pick up not-really-urgent changes for putting there beyond
> the few ones that I already have (and that I mean to put in
> alongside the XSA fixes on Tuesday); I could be talked into it,
> though.

The question I was asking was really "should I try and make an
equivalent fix for 4.13 and older".

While the base premise of the fix would be the same, the logic in
load_system_tables() is different, and the commit message is completely
wrong.

I only encountered this problem with added instrumentation in the #DB
handler, which is why I'm questioning the utility of going to this effort.

~Andrew
Jan Beulich Oct. 16, 2020, 12:14 p.m. UTC | #9
On 16.10.2020 14:07, Andrew Cooper wrote:
> On 16/10/2020 12:55, Jan Beulich wrote:
>> On 16.10.2020 13:24, Andrew Cooper wrote:
>>> On a tangent, what are your views WRT backport beyond 4.14?
>>>
>>> Back then, it was #DB which was adjacent to the guard frame (which was
>>> not present), but it doesn't use show_registers() by default, so I think
>>> the problem is mostly hidden.
>> I wasn't fully decided yet, but as long as it applies reasonably
>> cleanly I think I'm leaning towards also putting it on 4.13.
>> 4.12 closes anyway once 4.12.4 is out, and I don't think I want
>> to pick up not-really-urgent changes for putting there beyond
>> the few ones that I already have (and that I mean to put in
>> alongside the XSA fixes on Tuesday); I could be talked into it,
>> though.
> 
> The question I was asking was really "should I try and make an
> equivalent fix for 4.13 and older".

Oh, I see.

> While the base premise of the fix would be the same, the logic in
> load_system_tables() is different, and the commit message is completely
> wrong.
> 
> I only encountered this problem with added instrumentation in the #DB
> handler, which is why I'm questioning the utility of going to this effort.

Yeah, then probably not worth it.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index da74172776..a684519a20 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -770,7 +770,13 @@  void load_system_tables(void)
 	tss->ist[IST_MCE - 1] = stack_top + (1 + IST_MCE) * PAGE_SIZE;
 	tss->ist[IST_NMI - 1] = stack_top + (1 + IST_NMI) * PAGE_SIZE;
 	tss->ist[IST_DB  - 1] = stack_top + (1 + IST_DB)  * PAGE_SIZE;
-	tss->ist[IST_DF  - 1] = stack_top + (1 + IST_DF)  * PAGE_SIZE;
+	/*
+	 * Gross bodge.  The #DF handler uses the vm86 fields of cpu_user_regs
+	 * beyond the hardware frame.  Adjust the stack entrypoint so this
+	 * doesn't manifest as an OoB write which hits the guard page.
+	 */
+	tss->ist[IST_DF  - 1] = stack_top + (1 + IST_DF)  * PAGE_SIZE -
+		(sizeof(struct cpu_user_regs) - offsetof(struct cpu_user_regs, es));
 	tss->bitmap = IOBMP_INVALID_OFFSET;
 
 	/* All other stack pointers poisioned. */