diff mbox

x86/ioreq server: Fix DomU couldn't reboot when using p2m_ioreq_server p2m_type

Message ID 1493956334-3310-1-git-send-email-xiong.y.zhang@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhang, Xiong Y May 5, 2017, 3:52 a.m. UTC
'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
outstanding p2m_ioreq_server entries")' will call
p2m_change_entry_type_global() which set entry.recalc=1. Then
the following get_entry(p2m_ioreq_server) will return
p2m_ram_rw type.
But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
type, then reset p2m_ioreq_server entries. The fact is the assumption
isn't true, and sysnchronously reset function couldn't work. Then
ioreq.entry_count is larger than zero after an ioreq server unmaps,
finally this results DomU couldn't reboot.

This patch add a P2M_PRE_RECALC flag to p2m_query_t, then
get_entry(P2M_PRE_RECALC) will return p2m_ioreq_server type
for p2m_ioreq_server pfn, and finally change mem type through set_entry.

Fix: 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
      outstanding p2m_ioreq_server entries when an ioreq server unmaps")'

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 xen/arch/x86/mm/p2m-ept.c |  7 +++++--
 xen/arch/x86/mm/p2m-pt.c  | 19 ++++++++++++++-----
 xen/arch/x86/mm/p2m.c     |  8 +++++---
 xen/include/asm-x86/p2m.h |  1 +
 4 files changed, 25 insertions(+), 10 deletions(-)

Comments

Jan Beulich May 5, 2017, 2:40 p.m. UTC | #1
>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
> outstanding p2m_ioreq_server entries")' will call
> p2m_change_entry_type_global() which set entry.recalc=1. Then
> the following get_entry(p2m_ioreq_server) will return
> p2m_ram_rw type.
> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
> type, then reset p2m_ioreq_server entries. The fact is the assumption
> isn't true, and sysnchronously reset function couldn't work. Then
> ioreq.entry_count is larger than zero after an ioreq server unmaps,
> finally this results DomU couldn't reboot.

I've had trouble understanding this part already on v1 (btw, why is
this one not tagged v2?), and since I still can't figure it I have to ask:
Why is it that guest reboot is being impacted here? From what I recall
a non-zero count should only prevent migration.

> This patch add a P2M_PRE_RECALC flag to p2m_query_t, then
> get_entry(P2M_PRE_RECALC) will return p2m_ioreq_server type
> for p2m_ioreq_server pfn, and finally change mem type through set_entry.

This looks to be a relatively little impact change, but nevertheless
I'm wondering whether someone else (George?) may be able to
think of some more elegant solution (I have to admit that, having
suggested the one here, I can't).

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -991,8 +991,11 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>  
>      if ( is_epte_valid(ept_entry) )
>      {
> -        *t = p2m_recalc_type(recalc || ept_entry->recalc,
> -                             ept_entry->sa_p2mt, p2m, gfn);
> +        if ( !(q & P2M_PRE_RECALC) )

Here and elsewhere you want to use likely(). And in fact I wonder
whether overall it wouldn't be better to pass q (or just the boolean
resulting from q & P2M_PRE_RECALC) to p2m_recalc_type(),
avoiding these recurring if/else-s you add.

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1020,6 +1020,8 @@ void p2m_finish_type_change(struct domain *d,
>      p2m_type_t t;
>      unsigned long gfn = gfn_x(first_gfn);
>      unsigned long last_gfn = gfn + max_nr - 1;
> +    mfn_t mfn;
> +    p2m_access_t a;

Please put these variable declarations ...

> @@ -1029,10 +1031,10 @@ void p2m_finish_type_change(struct domain *d,
>      last_gfn = min(last_gfn, p2m->max_mapped_pfn);
>      while ( gfn <= last_gfn )
>      {

... here.

Jan
Zhang, Xiong Y May 6, 2017, 1:51 a.m. UTC | #2
> >>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
> > 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
> > outstanding p2m_ioreq_server entries")' will call
> > p2m_change_entry_type_global() which set entry.recalc=1. Then
> > the following get_entry(p2m_ioreq_server) will return
> > p2m_ram_rw type.
> > But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
> > outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
> > assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
> > type, then reset p2m_ioreq_server entries. The fact is the assumption
> > isn't true, and sysnchronously reset function couldn't work. Then
> > ioreq.entry_count is larger than zero after an ioreq server unmaps,
> > finally this results DomU couldn't reboot.
> 
> I've had trouble understanding this part already on v1 (btw, why is
> this one not tagged v2?), and since I still can't figure it I have to ask:
> Why is it that guest reboot is being impacted here? From what I recall
> a non-zero count should only prevent migration.
[Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
totally different, so I didn't mark this as V2, I will mark the following
as v2 with this solution.
During DomU reboot, it will first unmap ioreq server in shutdown process,
then it call map ioreq server in boot process. The following sentence in
p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
couldn't continue booting.
If ( read_atomic(&p->ioreq.entry_count))
   goto out;
> 
> > This patch add a P2M_PRE_RECALC flag to p2m_query_t, then
> > get_entry(P2M_PRE_RECALC) will return p2m_ioreq_server type
> > for p2m_ioreq_server pfn, and finally change mem type through set_entry.
> 
> This looks to be a relatively little impact change, but nevertheless
> I'm wondering whether someone else (George?) may be able to
> think of some more elegant solution (I have to admit that, having
> suggested the one here, I can't).
> 
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -991,8 +991,11 @@ static mfn_t ept_get_entry(struct p2m_domain
> *p2m,
> >
> >      if ( is_epte_valid(ept_entry) )
> >      {
> > -        *t = p2m_recalc_type(recalc || ept_entry->recalc,
> > -                             ept_entry->sa_p2mt, p2m, gfn);
> > +        if ( !(q & P2M_PRE_RECALC) )
> 
> Here and elsewhere you want to use likely(). And in fact I wonder
> whether overall it wouldn't be better to pass q (or just the boolean
> resulting from q & P2M_PRE_RECALC) to p2m_recalc_type(),
> avoiding these recurring if/else-s you add.
[Zhang, Xiong Y] Good suggestion, I will pass (q & P2M_PRE_RECALC) to
recalc bool parameter in p2m_recalc_type();
> 
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -1020,6 +1020,8 @@ void p2m_finish_type_change(struct domain *d,
> >      p2m_type_t t;
> >      unsigned long gfn = gfn_x(first_gfn);
> >      unsigned long last_gfn = gfn + max_nr - 1;
> > +    mfn_t mfn;
> > +    p2m_access_t a;
> 
> Please put these variable declarations ...
> 
> > @@ -1029,10 +1031,10 @@ void p2m_finish_type_change(struct domain
> *d,
> >      last_gfn = min(last_gfn, p2m->max_mapped_pfn);
> >      while ( gfn <= last_gfn )
> >      {
> 
> ... here.
[Zhang, Xiong Y] ok, thanks
> 
> Jan
Jan Beulich May 8, 2017, 7:04 a.m. UTC | #3
>>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
>> >>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>> > 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>> > outstanding p2m_ioreq_server entries")' will call
>> > p2m_change_entry_type_global() which set entry.recalc=1. Then
>> > the following get_entry(p2m_ioreq_server) will return
>> > p2m_ram_rw type.
>> > But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>> > outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>> > assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>> > type, then reset p2m_ioreq_server entries. The fact is the assumption
>> > isn't true, and sysnchronously reset function couldn't work. Then
>> > ioreq.entry_count is larger than zero after an ioreq server unmaps,
>> > finally this results DomU couldn't reboot.
>> 
>> I've had trouble understanding this part already on v1 (btw, why is
>> this one not tagged v2?), and since I still can't figure it I have to ask:
>> Why is it that guest reboot is being impacted here? From what I recall
>> a non-zero count should only prevent migration.
> [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
> totally different, so I didn't mark this as V2, I will mark the following
> as v2 with this solution.
> During DomU reboot, it will first unmap ioreq server in shutdown process,
> then it call map ioreq server in boot process. The following sentence in
> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
> couldn't continue booting.
> If ( read_atomic(&p->ioreq.entry_count))
>    goto out;

It is clear that it would be this statement to be the problem one,
but I continue to not see why this would affect reboot: The rebooted
guest runs in another VM with, hence, a different p2m. I cannot see
why there would be a non-zero ioreq.entry_count the first time an
ioreq server claims the p2m_ioreq_server type for this new domain.

Jan
Zhang, Xiong Y May 8, 2017, 10:52 a.m. UTC | #4
> >>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
> >> >>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
> >> > 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
> >> > outstanding p2m_ioreq_server entries")' will call
> >> > p2m_change_entry_type_global() which set entry.recalc=1. Then
> >> > the following get_entry(p2m_ioreq_server) will return
> >> > p2m_ram_rw type.
> >> > But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
> >> > outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
> >> > assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
> >> > type, then reset p2m_ioreq_server entries. The fact is the assumption
> >> > isn't true, and sysnchronously reset function couldn't work. Then
> >> > ioreq.entry_count is larger than zero after an ioreq server unmaps,
> >> > finally this results DomU couldn't reboot.
> >>
> >> I've had trouble understanding this part already on v1 (btw, why is
> >> this one not tagged v2?), and since I still can't figure it I have to ask:
> >> Why is it that guest reboot is being impacted here? From what I recall
> >> a non-zero count should only prevent migration.
> > [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
> > totally different, so I didn't mark this as V2, I will mark the following
> > as v2 with this solution.
> > During DomU reboot, it will first unmap ioreq server in shutdown process,
> > then it call map ioreq server in boot process. The following sentence in
> > p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
> > couldn't continue booting.
> > If ( read_atomic(&p->ioreq.entry_count))
> >    goto out;
> 
> It is clear that it would be this statement to be the problem one,
> but I continue to not see why this would affect reboot: The rebooted
> guest runs in another VM with, hence, a different p2m. I cannot see
> why there would be a non-zero ioreq.entry_count the first time an
> ioreq server claims the p2m_ioreq_server type for this new domain.
> 
[Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
1) unmap io_req_server with old domid
2) map io_req_server with old domid 
3)unmap io_req_server with old domid
4) map io_req_server with new domid

The 1) and 2) are triggered by our device reset handler in qemu, it will
destroy old device handler, then create device handler with the old domid
again. so we could see ioreq.entry_could > 0 with old domid, then reboot
process terminated.
> Jan
George Dunlap May 8, 2017, 11:12 a.m. UTC | #5
On 08/05/17 11:52, Zhang, Xiong Y wrote:
>>>>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
>>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>>>>> outstanding p2m_ioreq_server entries")' will call
>>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>>>>> the following get_entry(p2m_ioreq_server) will return
>>>>> p2m_ram_rw type.
>>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>>>>> type, then reset p2m_ioreq_server entries. The fact is the assumption
>>>>> isn't true, and sysnchronously reset function couldn't work. Then
>>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>>>>> finally this results DomU couldn't reboot.
>>>>
>>>> I've had trouble understanding this part already on v1 (btw, why is
>>>> this one not tagged v2?), and since I still can't figure it I have to ask:
>>>> Why is it that guest reboot is being impacted here? From what I recall
>>>> a non-zero count should only prevent migration.
>>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
>>> totally different, so I didn't mark this as V2, I will mark the following
>>> as v2 with this solution.
>>> During DomU reboot, it will first unmap ioreq server in shutdown process,
>>> then it call map ioreq server in boot process. The following sentence in
>>> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
>>> couldn't continue booting.
>>> If ( read_atomic(&p->ioreq.entry_count))
>>>    goto out;
>>
>> It is clear that it would be this statement to be the problem one,
>> but I continue to not see why this would affect reboot: The rebooted
>> guest runs in another VM with, hence, a different p2m. I cannot see
>> why there would be a non-zero ioreq.entry_count the first time an
>> ioreq server claims the p2m_ioreq_server type for this new domain.
>>
> [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
> 1) unmap io_req_server with old domid
> 2) map io_req_server with old domid 
> 3)unmap io_req_server with old domid
> 4) map io_req_server with new domid
> 
> The 1) and 2) are triggered by our device reset handler in qemu, it will
> destroy old device handler, then create device handler with the old domid
> again. so we could see ioreq.entry_could > 0 with old domid, then reboot
> process terminated.

Oh, so it prevents reboot of XenGT, but not of normal guests?

Why does a reboot cause the device to detach, re-attach, and then
re-detach again?

Also, I'm sorry for missing the bug during review, but it's a bit
annoying to find out that the core functionality of patch -- detaching
and re-attaching -- wasn't tested at all before submission.

 -George
Zhang, Xiong Y May 8, 2017, 11:59 a.m. UTC | #6
> On 08/05/17 11:52, Zhang, Xiong Y wrote:
> >>>>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
> >>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
> >>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
> >>>>> outstanding p2m_ioreq_server entries")' will call
> >>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
> >>>>> the following get_entry(p2m_ioreq_server) will return
> >>>>> p2m_ram_rw type.
> >>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
> >>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
> >>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
> >>>>> type, then reset p2m_ioreq_server entries. The fact is the assumption
> >>>>> isn't true, and sysnchronously reset function couldn't work. Then
> >>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
> >>>>> finally this results DomU couldn't reboot.
> >>>>
> >>>> I've had trouble understanding this part already on v1 (btw, why is
> >>>> this one not tagged v2?), and since I still can't figure it I have to ask:
> >>>> Why is it that guest reboot is being impacted here? From what I recall
> >>>> a non-zero count should only prevent migration.
> >>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
> >>> totally different, so I didn't mark this as V2, I will mark the following
> >>> as v2 with this solution.
> >>> During DomU reboot, it will first unmap ioreq server in shutdown process,
> >>> then it call map ioreq server in boot process. The following sentence in
> >>> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
> >>> couldn't continue booting.
> >>> If ( read_atomic(&p->ioreq.entry_count))
> >>>    goto out;
> >>
> >> It is clear that it would be this statement to be the problem one,
> >> but I continue to not see why this would affect reboot: The rebooted
> >> guest runs in another VM with, hence, a different p2m. I cannot see
> >> why there would be a non-zero ioreq.entry_count the first time an
> >> ioreq server claims the p2m_ioreq_server type for this new domain.
> >>
> > [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
> > 1) unmap io_req_server with old domid
> > 2) map io_req_server with old domid
> > 3)unmap io_req_server with old domid
> > 4) map io_req_server with new domid
> >
> > The 1) and 2) are triggered by our device reset handler in qemu, it will
> > destroy old device handler, then create device handler with the old domid
> > again. so we could see ioreq.entry_could > 0 with old domid, then reboot
> > process terminated.
> 
> Oh, so it prevents reboot of XenGT, but not of normal guests?
[Zhang, Xiong Y] Yes, only XenGT has this issue.
> 
> Why does a reboot cause the device to detach, re-attach, and then
> re-detach again?
[Zhang, Xiong Y] As our vgt_reset() function will destroy a vgt instance,
then create a vgt instance. And this vgt_reset function is also used
in normal boot process. 
> 
> Also, I'm sorry for missing the bug during review, but it's a bit
> annoying to find out that the core functionality of patch -- detaching
> and re-attaching -- wasn't tested at all before submission.
[Zhang, Xiong Y] Sorry, it is mainly our fault that we miss this.
As we are preparing Xengt community release recently and are refactoring 
our code, then this issue are exposed.

> 
>  -George
Jan Beulich May 8, 2017, 12:13 p.m. UTC | #7
>>> On 08.05.17 at 13:59, <xiong.y.zhang@intel.com> wrote:
>>  On 08/05/17 11:52, Zhang, Xiong Y wrote:
>> >>>>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
>> >>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>> >>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>> >>>>> outstanding p2m_ioreq_server entries")' will call
>> >>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>> >>>>> the following get_entry(p2m_ioreq_server) will return
>> >>>>> p2m_ram_rw type.
>> >>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>> >>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>> >>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>> >>>>> type, then reset p2m_ioreq_server entries. The fact is the assumption
>> >>>>> isn't true, and sysnchronously reset function couldn't work. Then
>> >>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>> >>>>> finally this results DomU couldn't reboot.
>> >>>>
>> >>>> I've had trouble understanding this part already on v1 (btw, why is
>> >>>> this one not tagged v2?), and since I still can't figure it I have to ask:
>> >>>> Why is it that guest reboot is being impacted here? From what I recall
>> >>>> a non-zero count should only prevent migration.
>> >>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
>> >>> totally different, so I didn't mark this as V2, I will mark the following
>> >>> as v2 with this solution.
>> >>> During DomU reboot, it will first unmap ioreq server in shutdown process,
>> >>> then it call map ioreq server in boot process. The following sentence in
>> >>> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
>> >>> couldn't continue booting.
>> >>> If ( read_atomic(&p->ioreq.entry_count))
>> >>>    goto out;
>> >>
>> >> It is clear that it would be this statement to be the problem one,
>> >> but I continue to not see why this would affect reboot: The rebooted
>> >> guest runs in another VM with, hence, a different p2m. I cannot see
>> >> why there would be a non-zero ioreq.entry_count the first time an
>> >> ioreq server claims the p2m_ioreq_server type for this new domain.
>> >>
>> > [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
>> > 1) unmap io_req_server with old domid
>> > 2) map io_req_server with old domid
>> > 3)unmap io_req_server with old domid
>> > 4) map io_req_server with new domid
>> >
>> > The 1) and 2) are triggered by our device reset handler in qemu, it will
>> > destroy old device handler, then create device handler with the old domid
>> > again. so we could see ioreq.entry_could > 0 with old domid, then reboot
>> > process terminated.
>> 
>> Oh, so it prevents reboot of XenGT, but not of normal guests?
> [Zhang, Xiong Y] Yes, only XenGT has this issue.

The patch description should say so then.

>> Why does a reboot cause the device to detach, re-attach, and then
>> re-detach again?
> [Zhang, Xiong Y] As our vgt_reset() function will destroy a vgt instance,
> then create a vgt instance. And this vgt_reset function is also used
> in normal boot process. 

Sounds like that's where the immediate problem is then. (Of course
we want the hypervisor side taken care of too.)

Jan
George Dunlap May 8, 2017, 1:26 p.m. UTC | #8
On 05/05/17 15:40, Jan Beulich wrote:
>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>> outstanding p2m_ioreq_server entries")' will call
>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>> the following get_entry(p2m_ioreq_server) will return
>> p2m_ram_rw type.
>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>> type, then reset p2m_ioreq_server entries. The fact is the assumption
>> isn't true, and sysnchronously reset function couldn't work. Then
>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>> finally this results DomU couldn't reboot.
> 
> I've had trouble understanding this part already on v1 (btw, why is
> this one not tagged v2?), and since I still can't figure it I have to ask:
> Why is it that guest reboot is being impacted here? From what I recall
> a non-zero count should only prevent migration.
> 
>> This patch add a P2M_PRE_RECALC flag to p2m_query_t, then
>> get_entry(P2M_PRE_RECALC) will return p2m_ioreq_server type
>> for p2m_ioreq_server pfn, and finally change mem type through set_entry.
> 
> This looks to be a relatively little impact change, but nevertheless
> I'm wondering whether someone else (George?) may be able to
> think of some more elegant solution (I have to admit that, having
> suggested the one here, I can't).

So the basic problem is that the implementation-specific get_entry()
will return the theoretical new type without changing it, and
finish_type_change() is in the generic code which calls these
implementation-specific p2m functions.

I think the possible solutions are:

1. Add a flag to the get_entry() call to return the real entry (i.e.,
the solution in this patch)

2. Have get_entry() update entries as it reads them

3. Implement an implementation-specific finish_type_change()

Of the three, there's something attractive about #2; but it would take a
lot of careful thought.  In theory there shouldn't be any harm in
changing the type of an entry marked "recalc", but with so many callers
it's difficult to be sure.  I'd never want to make such a change this
close to a release.

#3 will introduce a lot of unnecessary code duplication; so I think for
now #1 is probably the best idea (in particular during a code freeze).

 -George
George Dunlap May 8, 2017, 1:55 p.m. UTC | #9
On 08/05/17 14:26, George Dunlap wrote:
> On 05/05/17 15:40, Jan Beulich wrote:
>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>>> outstanding p2m_ioreq_server entries")' will call
>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>>> the following get_entry(p2m_ioreq_server) will return
>>> p2m_ram_rw type.
>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>>> type, then reset p2m_ioreq_server entries. The fact is the assumption
>>> isn't true, and sysnchronously reset function couldn't work. Then
>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>>> finally this results DomU couldn't reboot.
>>
>> I've had trouble understanding this part already on v1 (btw, why is
>> this one not tagged v2?), and since I still can't figure it I have to ask:
>> Why is it that guest reboot is being impacted here? From what I recall
>> a non-zero count should only prevent migration.
>>
>>> This patch add a P2M_PRE_RECALC flag to p2m_query_t, then
>>> get_entry(P2M_PRE_RECALC) will return p2m_ioreq_server type
>>> for p2m_ioreq_server pfn, and finally change mem type through set_entry.
>>
>> This looks to be a relatively little impact change, but nevertheless
>> I'm wondering whether someone else (George?) may be able to
>> think of some more elegant solution (I have to admit that, having
>> suggested the one here, I can't).
> 
> So the basic problem is that the implementation-specific get_entry()
> will return the theoretical new type without changing it, and
> finish_type_change() is in the generic code which calls these
> implementation-specific p2m functions.
> 
> I think the possible solutions are:
> 
> 1. Add a flag to the get_entry() call to return the real entry (i.e.,
> the solution in this patch)
> 
> 2. Have get_entry() update entries as it reads them
> 
> 3. Implement an implementation-specific finish_type_change()

Actually -- there are already implementation-specific versions for
individual gpas: p2m-pt.c:do_recalc() and p2m-ept.c:resolve_misconfig().
 They even already have the same function signature

If we made a new p2m hook, p2m->recalc(), we could simply loop around
calling that instead of doing the get_entry() / set_entry() cycle.

One advantage of doing this is that those functions automatically handle
superpages, whereas the patch you send here resets all entries to 4k.

Ideally we'd modify the functions to return the order of the entry
changed to make the loop more efficient, but that should probably wait
until after 4.9.

 -George

> 
> Of the three, there's something attractive about #2; but it would take a
> lot of careful thought.  In theory there shouldn't be any harm in
> changing the type of an entry marked "recalc", but with so many callers
> it's difficult to be sure.  I'd never want to make such a change this
> close to a release.
> 
> #3 will introduce a lot of unnecessary code duplication; so I think for
> now #1 is probably the best idea (in particular during a code freeze).
> 
>  -George
>
Jan Beulich May 8, 2017, 3:47 p.m. UTC | #10
>>> On 08.05.17 at 15:55, <george.dunlap@citrix.com> wrote:
> On 08/05/17 14:26, George Dunlap wrote:
>> On 05/05/17 15:40, Jan Beulich wrote:
>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>>>> outstanding p2m_ioreq_server entries")' will call
>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>>>> the following get_entry(p2m_ioreq_server) will return
>>>> p2m_ram_rw type.
>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>>>> type, then reset p2m_ioreq_server entries. The fact is the assumption
>>>> isn't true, and sysnchronously reset function couldn't work. Then
>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>>>> finally this results DomU couldn't reboot.
>>>
>>> I've had trouble understanding this part already on v1 (btw, why is
>>> this one not tagged v2?), and since I still can't figure it I have to ask:
>>> Why is it that guest reboot is being impacted here? From what I recall
>>> a non-zero count should only prevent migration.
>>>
>>>> This patch add a P2M_PRE_RECALC flag to p2m_query_t, then
>>>> get_entry(P2M_PRE_RECALC) will return p2m_ioreq_server type
>>>> for p2m_ioreq_server pfn, and finally change mem type through set_entry.
>>>
>>> This looks to be a relatively little impact change, but nevertheless
>>> I'm wondering whether someone else (George?) may be able to
>>> think of some more elegant solution (I have to admit that, having
>>> suggested the one here, I can't).
>> 
>> So the basic problem is that the implementation-specific get_entry()
>> will return the theoretical new type without changing it, and
>> finish_type_change() is in the generic code which calls these
>> implementation-specific p2m functions.
>> 
>> I think the possible solutions are:
>> 
>> 1. Add a flag to the get_entry() call to return the real entry (i.e.,
>> the solution in this patch)
>> 
>> 2. Have get_entry() update entries as it reads them
>> 
>> 3. Implement an implementation-specific finish_type_change()
> 
> Actually -- there are already implementation-specific versions for
> individual gpas: p2m-pt.c:do_recalc() and p2m-ept.c:resolve_misconfig().
>  They even already have the same function signature
> 
> If we made a new p2m hook, p2m->recalc(), we could simply loop around
> calling that instead of doing the get_entry() / set_entry() cycle.

Ah, that's a neat idea indeed.

Jan
Yu Zhang May 9, 2017, 5:21 a.m. UTC | #11
On 5/8/2017 7:12 PM, George Dunlap wrote:
> On 08/05/17 11:52, Zhang, Xiong Y wrote:
>>>>>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
>>>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>>>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>>>>>> outstanding p2m_ioreq_server entries")' will call
>>>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>>>>>> the following get_entry(p2m_ioreq_server) will return
>>>>>> p2m_ram_rw type.
>>>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>>>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>>>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>>>>>> type, then reset p2m_ioreq_server entries. The fact is the assumption
>>>>>> isn't true, and sysnchronously reset function couldn't work. Then
>>>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>>>>>> finally this results DomU couldn't reboot.
>>>>> I've had trouble understanding this part already on v1 (btw, why is
>>>>> this one not tagged v2?), and since I still can't figure it I have to ask:
>>>>> Why is it that guest reboot is being impacted here? From what I recall
>>>>> a non-zero count should only prevent migration.
>>>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
>>>> totally different, so I didn't mark this as V2, I will mark the following
>>>> as v2 with this solution.
>>>> During DomU reboot, it will first unmap ioreq server in shutdown process,
>>>> then it call map ioreq server in boot process. The following sentence in
>>>> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
>>>> couldn't continue booting.
>>>> If ( read_atomic(&p->ioreq.entry_count))
>>>>     goto out;
>>> It is clear that it would be this statement to be the problem one,
>>> but I continue to not see why this would affect reboot: The rebooted
>>> guest runs in another VM with, hence, a different p2m. I cannot see
>>> why there would be a non-zero ioreq.entry_count the first time an
>>> ioreq server claims the p2m_ioreq_server type for this new domain.
>>>
>> [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
>> 1) unmap io_req_server with old domid
>> 2) map io_req_server with old domid
>> 3)unmap io_req_server with old domid
>> 4) map io_req_server with new domid
>>
>> The 1) and 2) are triggered by our device reset handler in qemu, it will
>> destroy old device handler, then create device handler with the old domid
>> again. so we could see ioreq.entry_could > 0 with old domid, then reboot
>> process terminated.
> Oh, so it prevents reboot of XenGT, but not of normal guests?
>
> Why does a reboot cause the device to detach, re-attach, and then
> re-detach again?
>
> Also, I'm sorry for missing the bug during review, but it's a bit
> annoying to find out that the core functionality of patch -- detaching
> and re-attaching -- wasn't tested at all before submission.

Thanks for your reply, George.
This error was introduced in the last version of patch "x86/ioreq 
server: Asynchronously reset
outstanding p2m_ioreq_server entries.", which included 2 changes from 
its previous version:

1> Do not reset p2m_ioreq_server back to p2m_ram_rw when an ioreq server 
is mapped;

2> Use a helper function to return the p2m type -  this new helper 
routine returns p2m_ram_rw
instead of p2m_ioreq_server if there's no mapping ioreq server. In 
previous versions I just returned
p2m_ioreq_server, and XenGT tests all passed successfully. But I had not 
realized this issue for this
last version and did not have enough time to do the test for this 
version in the last moment before
code freeze...

Sorry about my negligence, we will perform all necessary tests next 
before we send out our new
patches. :-)

Yu


>   -George
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
George Dunlap May 9, 2017, 9:24 a.m. UTC | #12
On 09/05/17 06:21, Yu Zhang wrote:
> 
> 
> On 5/8/2017 7:12 PM, George Dunlap wrote:
>> On 08/05/17 11:52, Zhang, Xiong Y wrote:
>>>>>>> On 06.05.17 at 03:51, <xiong.y.zhang@intel.com> wrote:
>>>>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@intel.com> wrote:
>>>>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>>>>>>> outstanding p2m_ioreq_server entries")' will call
>>>>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then
>>>>>>> the following get_entry(p2m_ioreq_server) will return
>>>>>>> p2m_ram_rw type.
>>>>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>>>>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>>>>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
>>>>>>> type, then reset p2m_ioreq_server entries. The fact is the
>>>>>>> assumption
>>>>>>> isn't true, and sysnchronously reset function couldn't work. Then
>>>>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps,
>>>>>>> finally this results DomU couldn't reboot.
>>>>>> I've had trouble understanding this part already on v1 (btw, why is
>>>>>> this one not tagged v2?), and since I still can't figure it I have
>>>>>> to ask:
>>>>>> Why is it that guest reboot is being impacted here? From what I
>>>>>> recall
>>>>>> a non-zero count should only prevent migration.
>>>>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the
>>>>> solution is
>>>>> totally different, so I didn't mark this as V2, I will mark the
>>>>> following
>>>>> as v2 with this solution.
>>>>> During DomU reboot, it will first unmap ioreq server in shutdown
>>>>> process,
>>>>> then it call map ioreq server in boot process. The following
>>>>> sentence in
>>>>> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
>>>>> couldn't continue booting.
>>>>> If ( read_atomic(&p->ioreq.entry_count))
>>>>>     goto out;
>>>> It is clear that it would be this statement to be the problem one,
>>>> but I continue to not see why this would affect reboot: The rebooted
>>>> guest runs in another VM with, hence, a different p2m. I cannot see
>>>> why there would be a non-zero ioreq.entry_count the first time an
>>>> ioreq server claims the p2m_ioreq_server type for this new domain.
>>>>
>>> [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
>>> 1) unmap io_req_server with old domid
>>> 2) map io_req_server with old domid
>>> 3)unmap io_req_server with old domid
>>> 4) map io_req_server with new domid
>>>
>>> The 1) and 2) are triggered by our device reset handler in qemu, it will
>>> destroy old device handler, then create device handler with the old
>>> domid
>>> again. so we could see ioreq.entry_could > 0 with old domid, then reboot
>>> process terminated.
>> Oh, so it prevents reboot of XenGT, but not of normal guests?
>>
>> Why does a reboot cause the device to detach, re-attach, and then
>> re-detach again?
>>
>> Also, I'm sorry for missing the bug during review, but it's a bit
>> annoying to find out that the core functionality of patch -- detaching
>> and re-attaching -- wasn't tested at all before submission.
> 
> Thanks for your reply, George.
> This error was introduced in the last version of patch "x86/ioreq
> server: Asynchronously reset
> outstanding p2m_ioreq_server entries.", which included 2 changes from
> its previous version:
> 
> 1> Do not reset p2m_ioreq_server back to p2m_ram_rw when an ioreq server
> is mapped;
> 
> 2> Use a helper function to return the p2m type -  this new helper
> routine returns p2m_ram_rw
> instead of p2m_ioreq_server if there's no mapping ioreq server. In
> previous versions I just returned
> p2m_ioreq_server, and XenGT tests all passed successfully. But I had not
> realized this issue for this
> last version and did not have enough time to do the test for this
> version in the last moment before
> code freeze...

Indeed, we were under a lot of time pressure and you did spend a lot of
time outside of your normal office hours to try to get things fixed; and
your (plural) testing did eventually turn this up before the release.
Sorry for being a bit irritable.

 -George
diff mbox

Patch

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index f37a1f2..8f88d2b 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -991,8 +991,11 @@  static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     if ( is_epte_valid(ept_entry) )
     {
-        *t = p2m_recalc_type(recalc || ept_entry->recalc,
-                             ept_entry->sa_p2mt, p2m, gfn);
+        if ( !(q & P2M_PRE_RECALC) )
+            *t = p2m_recalc_type(recalc || ept_entry->recalc,
+                                 ept_entry->sa_p2mt, p2m, gfn);
+        else
+            *t = ept_entry->sa_p2mt;
         *a = ept_entry->access;
         if ( sve )
             *sve = ept_entry->suppress_ve;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 5079b59..50e74f5 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -840,8 +840,11 @@  pod_retry_l3:
             mfn = _mfn(l3e_get_pfn(*l3e) +
                        l2_table_offset(addr) * L1_PAGETABLE_ENTRIES +
                        l1_table_offset(addr));
-            *t = p2m_recalc_type(recalc || _needs_recalc(flags),
-                                 p2m_flags_to_type(flags), p2m, gfn);
+            if ( !(q & P2M_PRE_RECALC) )
+                *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                                p2m_flags_to_type(flags), p2m, gfn);
+            else
+                *t = p2m_flags_to_type(flags);
             unmap_domain_page(l3e);
 
             ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -879,8 +882,11 @@  pod_retry_l2:
     if ( flags & _PAGE_PSE )
     {
         mfn = _mfn(l2e_get_pfn(*l2e) + l1_table_offset(addr));
-        *t = p2m_recalc_type(recalc || _needs_recalc(flags),
-                             p2m_flags_to_type(flags), p2m, gfn);
+        if ( !(q & P2M_PRE_RECALC) )
+            *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                                p2m_flags_to_type(flags), p2m, gfn);
+        else
+            *t = p2m_flags_to_type(flags);
         unmap_domain_page(l2e);
         
         ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -916,7 +922,10 @@  pod_retry_l1:
         return INVALID_MFN;
     }
     mfn = _mfn(l1e_get_pfn(*l1e));
-    *t = p2m_recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
+    if ( !(q & P2M_PRE_RECALC) )
+        *t = p2m_recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
+    else
+        *t = l1t;
     unmap_domain_page(l1e);
 
     ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1d57e5c..1c3e22f 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1020,6 +1020,8 @@  void p2m_finish_type_change(struct domain *d,
     p2m_type_t t;
     unsigned long gfn = gfn_x(first_gfn);
     unsigned long last_gfn = gfn + max_nr - 1;
+    mfn_t mfn;
+    p2m_access_t a;
 
     ASSERT(ot != nt);
     ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
@@ -1029,10 +1031,10 @@  void p2m_finish_type_change(struct domain *d,
     last_gfn = min(last_gfn, p2m->max_mapped_pfn);
     while ( gfn <= last_gfn )
     {
-        get_gfn_query_unlocked(d, gfn, &t);
-
+        mfn = p2m->get_entry(p2m, gfn, &t, &a, P2M_PRE_RECALC, NULL, NULL);
         if ( t == ot )
-            p2m_change_type_one(d, gfn, t, nt);
+            p2m_set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, nt,
+                          p2m->default_access);
 
         gfn++;
     }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 7574a9b..9645260 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -78,6 +78,7 @@  typedef enum {
 typedef unsigned int p2m_query_t;
 #define P2M_ALLOC    (1u<<0)   /* Populate PoD and paged-out entries */
 #define P2M_UNSHARE  (1u<<1)   /* Break CoW sharing */
+#define P2M_PRE_RECALC (1u<<2)   /* Get p2m type before recalc */
 
 /* We use bitmaps and maks to handle groups of types */
 #define p2m_to_mask(_t) (1UL << (_t))