x86/shadow: Drop incorrect diagnostic when shadowing TSS.RSP0
diff mbox series

Message ID 1554491395-19127-1-git-send-email-andrew.cooper3@citrix.com
State New, archived
Headers show
Series
  • x86/shadow: Drop incorrect diagnostic when shadowing TSS.RSP0
Related show

Commit Message

Andrew Cooper April 5, 2019, 7:09 p.m. UTC
During development of the XTF pagewalk tests, I reliably encountered this
message exactly once per run.  It occurs when the first action to touch
TSS.RSP0 is an interrupt/exception taken in userspace, and the processor tries
to push the IRET frame.

Subsequently, OSSTest has demonstrated that it triggers frequently for a
KPTI-enabled kernel.

  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646687f38, mfn=0x2415a1
  [ 1411.949155] systemd-logind[2683]: New session 73 of user root.
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad264671ff38, mfn=0x240a41
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646837f38, mfn=0x2415c5
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad26468a7f38, mfn=0x2414e7
  [ 1442.207473] systemd-logind[2683]: New session 74 of user root.
  [ 1471.452206] systemd-logind[2683]: New session 75 of user root.
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646d17f08, mfn=0x2417c5
  [ 1501.698971] systemd-logind[2683]: New session 76 of user root.

The actions performed by the shadow code are correct, and the guest continues
without error, but the emitted error is misleading.  Tweak the comment more
clearly identify why the condition exists, but drop the message.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Tim Deegan <tim@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

I could have sworn I posted this patch a while ago, but I can't find any
evidence of actually having done so.  Oh well - better late than never.
---
 xen/arch/x86/mm/shadow/multi.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Jan Beulich April 8, 2019, 10:14 a.m. UTC | #1
>>> On 05.04.19 at 21:09, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3305,8 +3305,9 @@ static int sh_page_fault(struct vcpu *v,
>      {
>          /*
>           * If we are in the middle of injecting an exception or interrupt then
> -         * we should not emulate: it is not the instruction at %eip that caused
> -         * the fault. Furthermore it is almost certainly the case the handler
> +         * we should not emulate: the fault is a side effect of the processor
> +         * trying to push an exception frame onto a stack which has yet to be
> +         * shadowed.  Furthermore it is almost certainly the case the handler
>           * stack is currently considered to be a page table, so we should
>           * unshadow the faulting page before exiting.
>           */

Your addition to me looks to contradict the part of the comment you
leave in place: You say "which has yet to be shadowed", while the
pre-existing text says "it is almost certainly the case the handler
stack is currently considered to be a page table", which to me means
it _is_ already shadowed (and in fact should not be).

In your addition, do you perhaps mean the page tables covering the
stack which have yet to be shadowed?

> @@ -3319,9 +3320,6 @@ static int sh_page_fault(struct vcpu *v,
>                  v->arch.paging.last_write_emul_ok = 0;
>              }
>  #endif
> -            gdprintk(XENLOG_DEBUG, "write to pagetable during event "
> -                     "injection: cr2=%#lx, mfn=%#lx\n",
> -                     va, mfn_x(gmfn));
>              sh_remove_shadows(d, gmfn, 0 /* thorough */, 1 /* must succeed */);

The "almost certainly" above makes me wonder whether it wouldn't
be better to leave the message in place, but put it under some
conditional. A perhaps too simplistic initial thought of mine was to
issue it in case sh_remove_shadows() ended up crashing the
guest. Considering sh_mfn_is_a_page_table(gmfn)'s return value
might be another option.

Jan
Andrew Cooper April 8, 2019, 11:37 a.m. UTC | #2
On 08/04/2019 11:14, Jan Beulich wrote:
>>>> On 05.04.19 at 21:09, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -3305,8 +3305,9 @@ static int sh_page_fault(struct vcpu *v,
>>      {
>>          /*
>>           * If we are in the middle of injecting an exception or interrupt then
>> -         * we should not emulate: it is not the instruction at %eip that caused
>> -         * the fault. Furthermore it is almost certainly the case the handler
>> +         * we should not emulate: the fault is a side effect of the processor
>> +         * trying to push an exception frame onto a stack which has yet to be
>> +         * shadowed.  Furthermore it is almost certainly the case the handler
>>           * stack is currently considered to be a page table, so we should
>>           * unshadow the faulting page before exiting.
>>           */
> Your addition to me looks to contradict the part of the comment you
> leave in place: You say "which has yet to be shadowed", while the
> pre-existing text says "it is almost certainly the case the handler
> stack is currently considered to be a page table", which to me means
> it _is_ already shadowed (and in fact should not be).
>
> In your addition, do you perhaps mean the page tables covering the
> stack which have yet to be shadowed?

This clause is inside an hvm_event_pending() which looks at VMCS/VMCB
pending injection.

This only becomes true via VT-x's

    __vmread(IDT_VECTORING_INFO, &idtv_info);
    if ( exit_reason != EXIT_REASON_TASK_SWITCH )
        vmx_idtv_reinject(idtv_info);

path, and the equivalent case on SVM which leaves the EVENTINJ field
valid after vmexit.  (This is assuming that we have no bugs whereby we
enter sh_page_fault() late, after some emulation has occurred.)

What this means is that the processor is trying to deliver an exception,
and the #PF intercept has been hit (which occurs before escalation to
#DF).  i.e. it is the memory reads/writes made by microcode which suffer
a fault due to the linear addresses not being present in the shadows.

Beyond that, there is a second aspect to getting here, which is when the
linear address hit something which the shadow code thinks is protected,
which AFAICT, starts off as everything which doesn't have an L1 shadow
pointing writeably at it.

In the XTF case where I encountered this first, it so happens that the
processor delivering an exception from userspace is the first thing to
ever touch the linear address at RSP0, so the stack always becomes
shadowed during IDT vectoring.

~Andrew
Jan Beulich April 8, 2019, 12:11 p.m. UTC | #3
>>> On 08.04.19 at 13:37, <andrew.cooper3@citrix.com> wrote:
> On 08/04/2019 11:14, Jan Beulich wrote:
>>>>> On 05.04.19 at 21:09, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -3305,8 +3305,9 @@ static int sh_page_fault(struct vcpu *v,
>>>      {
>>>          /*
>>>           * If we are in the middle of injecting an exception or interrupt then
>>> -         * we should not emulate: it is not the instruction at %eip that caused
>>> -         * the fault. Furthermore it is almost certainly the case the handler
>>> +         * we should not emulate: the fault is a side effect of the processor
>>> +         * trying to push an exception frame onto a stack which has yet to be
>>> +         * shadowed.  Furthermore it is almost certainly the case the handler
>>>           * stack is currently considered to be a page table, so we should
>>>           * unshadow the faulting page before exiting.
>>>           */
>> Your addition to me looks to contradict the part of the comment you
>> leave in place: You say "which has yet to be shadowed", while the
>> pre-existing text says "it is almost certainly the case the handler
>> stack is currently considered to be a page table", which to me means
>> it _is_ already shadowed (and in fact should not be).
>>
>> In your addition, do you perhaps mean the page tables covering the
>> stack which have yet to be shadowed?
> 
> This clause is inside an hvm_event_pending() which looks at VMCS/VMCB
> pending injection.
> 
> This only becomes true via VT-x's
> 
>     __vmread(IDT_VECTORING_INFO, &idtv_info);
>     if ( exit_reason != EXIT_REASON_TASK_SWITCH )
>         vmx_idtv_reinject(idtv_info);
> 
> path, and the equivalent case on SVM which leaves the EVENTINJ field
> valid after vmexit.  (This is assuming that we have no bugs whereby we
> enter sh_page_fault() late, after some emulation has occurred.)
> 
> What this means is that the processor is trying to deliver an exception,
> and the #PF intercept has been hit (which occurs before escalation to
> #DF).  i.e. it is the memory reads/writes made by microcode which suffer
> a fault due to the linear addresses not being present in the shadows.
> 
> Beyond that, there is a second aspect to getting here, which is when the
> linear address hit something which the shadow code thinks is protected,
> which AFAICT, starts off as everything which doesn't have an L1 shadow
> pointing writeably at it.
> 
> In the XTF case where I encountered this first, it so happens that the
> processor delivering an exception from userspace is the first thing to
> ever touch the linear address at RSP0, so the stack always becomes
> shadowed during IDT vectoring.

I'm (at least) mildly confused: I follow what you write (I think), but
you again say "the stack always becomes shadowed". My original
question was whether you really mean that, as stacks, if at all,
should get shadowed only unintentionally (and hence get un-shadowed
immediately when that condition is detected). That is, my (slightly
rephrased) question stands: Do you perhaps mean the page tables
mapping the stack to become shadowed, rather than the stack itself?

Jan
Andrew Cooper April 8, 2019, 12:32 p.m. UTC | #4
On 08/04/2019 13:11, Jan Beulich wrote:
>>>> On 08.04.19 at 13:37, <andrew.cooper3@citrix.com> wrote:
>> On 08/04/2019 11:14, Jan Beulich wrote:
>>>>>> On 05.04.19 at 21:09, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>>> @@ -3305,8 +3305,9 @@ static int sh_page_fault(struct vcpu *v,
>>>>      {
>>>>          /*
>>>>           * If we are in the middle of injecting an exception or interrupt then
>>>> -         * we should not emulate: it is not the instruction at %eip that caused
>>>> -         * the fault. Furthermore it is almost certainly the case the handler
>>>> +         * we should not emulate: the fault is a side effect of the processor
>>>> +         * trying to push an exception frame onto a stack which has yet to be
>>>> +         * shadowed.  Furthermore it is almost certainly the case the handler
>>>>           * stack is currently considered to be a page table, so we should
>>>>           * unshadow the faulting page before exiting.
>>>>           */
>>> Your addition to me looks to contradict the part of the comment you
>>> leave in place: You say "which has yet to be shadowed", while the
>>> pre-existing text says "it is almost certainly the case the handler
>>> stack is currently considered to be a page table", which to me means
>>> it _is_ already shadowed (and in fact should not be).
>>>
>>> In your addition, do you perhaps mean the page tables covering the
>>> stack which have yet to be shadowed?
>> This clause is inside an hvm_event_pending() which looks at VMCS/VMCB
>> pending injection.
>>
>> This only becomes true via VT-x's
>>
>>     __vmread(IDT_VECTORING_INFO, &idtv_info);
>>     if ( exit_reason != EXIT_REASON_TASK_SWITCH )
>>         vmx_idtv_reinject(idtv_info);
>>
>> path, and the equivalent case on SVM which leaves the EVENTINJ field
>> valid after vmexit.  (This is assuming that we have no bugs whereby we
>> enter sh_page_fault() late, after some emulation has occurred.)
>>
>> What this means is that the processor is trying to deliver an exception,
>> and the #PF intercept has been hit (which occurs before escalation to
>> #DF).  i.e. it is the memory reads/writes made by microcode which suffer
>> a fault due to the linear addresses not being present in the shadows.
>>
>> Beyond that, there is a second aspect to getting here, which is when the
>> linear address hit something which the shadow code thinks is protected,
>> which AFAICT, starts off as everything which doesn't have an L1 shadow
>> pointing writeably at it.
>>
>> In the XTF case where I encountered this first, it so happens that the
>> processor delivering an exception from userspace is the first thing to
>> ever touch the linear address at RSP0, so the stack always becomes
>> shadowed during IDT vectoring.
> I'm (at least) mildly confused: I follow what you write (I think), but
> you again say "the stack always becomes shadowed". My original
> question was whether you really mean that, as stacks, if at all,
> should get shadowed only unintentionally (and hence get un-shadowed
> immediately when that condition is detected). That is, my (slightly
> rephrased) question stands: Do you perhaps mean the page tables
> mapping the stack to become shadowed, rather than the stack itself?

I guess this is an issue of terminology, to which I defer to Tim to judge.

But yes - I mean is "the linear address mapping RSP0 getting entered
into the shadow pagetables".

~Andrew
Tim Deegan April 16, 2019, 7:48 p.m. UTC | #5
At 13:32 +0100 on 08 Apr (1554730320), Andrew Cooper wrote:
> On 08/04/2019 13:11, Jan Beulich wrote:
> >>>> On 08.04.19 at 13:37, <andrew.cooper3@citrix.com> wrote:
> >> On 08/04/2019 11:14, Jan Beulich wrote:
> >>>>>> On 05.04.19 at 21:09, <andrew.cooper3@citrix.com> wrote:
> >>>> --- a/xen/arch/x86/mm/shadow/multi.c
> >>>> +++ b/xen/arch/x86/mm/shadow/multi.c
> >>>> @@ -3305,8 +3305,9 @@ static int sh_page_fault(struct vcpu *v,
> >>>>      {
> >>>>          /*
> >>>>           * If we are in the middle of injecting an exception or interrupt then
> >>>> -         * we should not emulate: it is not the instruction at %eip that caused
> >>>> -         * the fault. Furthermore it is almost certainly the case the handler
> >>>> +         * we should not emulate: the fault is a side effect of the processor
> >>>> +         * trying to push an exception frame onto a stack which has yet to be
> >>>> +         * shadowed.  Furthermore it is almost certainly the case the handler
> >>>>           * stack is currently considered to be a page table, so we should
> >>>>           * unshadow the faulting page before exiting.
> >>>>           */
> > I'm (at least) mildly confused: I follow what you write (I think), but
> > you again say "the stack always becomes shadowed". My original
> > question was whether you really mean that, as stacks, if at all,
> > should get shadowed only unintentionally (and hence get un-shadowed
> > immediately when that condition is detected). That is, my (slightly
> > rephrased) question stands: Do you perhaps mean the page tables
> > mapping the stack to become shadowed, rather than the stack itself?
> 
> I guess this is an issue of terminology, to which I defer to Tim to judge.

Hi,

Sorry for the delay; I have been away.

FWIW I prefer the original comment, and I think that referring to the
stack as "not yet shadowed" is confusing when the problem might be
that the stack itself is indeed shadowed.  I'd also be happy with just
saying "the fault is a side effect of the processor trying to push an
exception frame onto the stack."

Happy to see the debug message being removed if it's being triggered
in the real world.  IIRC it has not proved to be useful since that
code was first developed.  So, with the comment adjusted,
Acked-by: Tim Deegan <tim@xen.org>

Cheers,

Tim.

Patch
diff mbox series

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 1d282c9..460ec80 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3305,8 +3305,9 @@  static int sh_page_fault(struct vcpu *v,
     {
         /*
          * If we are in the middle of injecting an exception or interrupt then
-         * we should not emulate: it is not the instruction at %eip that caused
-         * the fault. Furthermore it is almost certainly the case the handler
+         * we should not emulate: the fault is a side effect of the processor
+         * trying to push an exception frame onto a stack which has yet to be
+         * shadowed.  Furthermore it is almost certainly the case the handler
          * stack is currently considered to be a page table, so we should
          * unshadow the faulting page before exiting.
          */
@@ -3319,9 +3320,6 @@  static int sh_page_fault(struct vcpu *v,
                 v->arch.paging.last_write_emul_ok = 0;
             }
 #endif
-            gdprintk(XENLOG_DEBUG, "write to pagetable during event "
-                     "injection: cr2=%#lx, mfn=%#lx\n",
-                     va, mfn_x(gmfn));
             sh_remove_shadows(d, gmfn, 0 /* thorough */, 1 /* must succeed */);
             trace_shadow_emulate_other(TRC_SHADOW_EMULATE_UNSHADOW_EVTINJ,
                                        va, gfn);