
[1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

Message ID 945CA011AD5F084CBEA3E851C0AB28894B871005@SHSMSX101.ccr.corp.intel.com (mailing list archive)
State New, archived

Commit Message

Quan Xu March 30, 2016, 2:28 a.m. UTC
On March 29, 2016 3:21pm, <JBeulich@suse.com> wrote:
> >>> On 28.03.16 at 05:33, <quan.xu@intel.com> wrote:
> > On March 18, 2016 1:15am, <JBeulich@suse.com> wrote:
> >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> >> > --- a/xen/common/grant_table.c
> >> > +++ b/xen/common/grant_table.c
> >> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
> >> >              {
> >> >                  nr_gets++;
> >> >                  (void)get_page(pg, rd);
> >> > -                if ( !(op->flags & GNTMAP_readonly) )
> >> > -                    get_page_type(pg, PGT_writable_page);
> >> > +                if ( !(op->flags & GNTMAP_readonly) &&
> >> > +                     !get_page_type(pg, PGT_writable_page) )
> >> > +                        goto could_not_pin;
> >>
> >> This needs explanation, as it doesn't look related to what your
> >> actual goal is: If an error was possible here, I think this would be
> >> a security issue. However, as also kind of documented by the
> >> explicitly ignored return value from get_page(), it is my understanding that
> >> here we only obtain an _extra_ reference.
> >>
> >
> > For this point, I inferred from:
> > map_vcpu_info()
> > {
> > ...
> >     if ( !get_page_type(page, PGT_writable_page) )
> >     {
> >         put_page(page);
> >         return -EINVAL;
> >     }
> > ...
> > }
> > From this, I think the return values of get_page_type() are:
> >      0 -- error,
> >      1 -- success.
> >
> > So if get_page_type() fails, we should goto could_not_pin.
> 
> Did you read my reply at all? The explanation I'm expecting here is why error
> checking is all of a sudden needed _at all_.
> 

Sorry for my stupid reply.
In this version, written before the open discussion, I tried to propagate the iommu_{,un}map_page() error up this call tree:
           iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
So at this point, I tried to handle that iommu_{,un}map_page() error.

> > btw, there is another issue in the call path:
> >     iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> >
> >
> > I tried to return iommu_{,un}map_page() error code in
> > __get_page_type(), is it right?
> 
> If the operation got fully rolled back - yes. Whether fully rolling back is feasible
> there though is - see the respective discussion - an open question.
> 

Does the open question refer to the following?

"""
As said, we first need
to settle on an abstract model. Do we want IOMMU mapping
failures to be fatal to the domain (perhaps with the exception
of the hardware one)? I think we do, and for the hardware domain
we'd do things on a best effort basis (always erring on the side
of unmapping). Which would probably mean crashing the domain
could be centralized in iommu_{,un}map_page(). How much roll
back would then still be needed in callers of these functions
for the hardware domain's sake would need to be seen.
"""

I hope it is yes. I have read all of your emails again and again, and found that I did not get the point until this Monday.
I am summarizing it and will send the summary out in a new thread.


> >> > --- a/xen/drivers/passthrough/x86/iommu.c
> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
> >> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
> >> >      this_cpu(iommu_dont_flush_iotlb) = 0;
> >> >
> >> >      if ( !rc )
> >> > -        iommu_iotlb_flush_all(d);
> >> > +    {
> >> > +        rc = iommu_iotlb_flush_all(d);
> >> > +        if ( rc )
> >> > +            iommu_teardown(d);
> >> > +    }
> >> >      else if ( rc != -ERESTART )
> >> >          iommu_teardown(d);
> >>
> >> Why can't you just use the existing call to iommu_teardown(), by
> >> simply deleting the "else"?
> >>
> >
> > Just to check: could I modify it as below?
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain *d)
> >
> >      if ( !rc )
> >          iommu_iotlb_flush_all(d);
> > -    else if ( rc != -ERESTART )
> > +
> > +    if ( rc != -ERESTART )
> >          iommu_teardown(d);
> 
> Clearly not - not only are you losing the return value of
> iommu_iotlb_flush_all() now, you would then also call
> iommu_teardown() in the "success" case. My comment was related to code
> structure, yet you seem to have taken it literally.
> 

Then, what about this one (the diff shown under "Patch" at the end of this page):
IMO, my original modification is correct, although redundant with its two iommu_teardown() calls.
If this is still the correct one, could you help me send out the correct one?

Quan

Comments

Quan Xu March 30, 2016, 2:35 a.m. UTC | #1
On March 30, 2016 10:28am, Xu, Quan <quan.xu@intel.com> wrote:
> If this is still the correct one, could you help me send out the correct one?

> 

Sorry, a typo:
If this is still _not_ the correct one, could you help me send out the correct one?

Quan

Jan Beulich March 30, 2016, 8:05 a.m. UTC | #2
>>> On 30.03.16 at 04:28, <quan.xu@intel.com> wrote:
> On March 29, 2016 3:21pm, <JBeulich@suse.com> wrote:
>> >>> On 28.03.16 at 05:33, <quan.xu@intel.com> wrote:
>> > On March 18, 2016 1:15am, <JBeulich@suse.com> wrote:
>> >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
>> >> > --- a/xen/common/grant_table.c
>> >> > +++ b/xen/common/grant_table.c
>> >> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>> >> >              {
>> >> >                  nr_gets++;
>> >> >                  (void)get_page(pg, rd);
>> >> > -                if ( !(op->flags & GNTMAP_readonly) )
>> >> > -                    get_page_type(pg, PGT_writable_page);
>> >> > +                if ( !(op->flags & GNTMAP_readonly) &&
>> >> > +                     !get_page_type(pg, PGT_writable_page) )
>> >> > +                        goto could_not_pin;
>> >>
>> >> This needs explanation, as it doesn't look related to what your
>> >> actual goal is: If an error was possible here, I think this would be
>> >> a security issue. However, as also kind of documented by the
>> >> explicitly ignored return value from get_page(), it is my understanding that
>> >> here we only obtain an _extra_ reference.
>> >>
>> >
>> > For this point, I inferred from:
>> > map_vcpu_info()
>> > {
>> > ...
>> >     if ( !get_page_type(page, PGT_writable_page) )
>> >     {
>> >         put_page(page);
>> >         return -EINVAL;
>> >     }
>> > ...
>> > }
>> > From this, I think the return values of get_page_type() are:
>> >      0 -- error,
>> >      1 -- success.
>> >
>> > So if get_page_type() fails, we should goto could_not_pin.
>> 
>> Did you read my reply at all? The explanation I'm expecting here is why error
>> checking is all of a sudden needed _at all_.
>> 
> 
> Sorry for my stupid reply.
> In this version, written before the open discussion, I tried to propagate the
> iommu_{,un}map_page() error up this call tree:
>            iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> So at this point, I tried to handle that iommu_{,un}map_page() error.

I still don't get it: We're talking about a get_page_type() invocation
that previously was known to never fail (or at least so we hope,
based on the existing code). What I'm expecting as an explanation
is why this "cannot fail" state is not true any longer. And while
sorting this out, please pay particular attention to the limited set of
cases where __get_page_type() calls iommu_{,un}map_page() in
the first place.
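
The "extra reference" argument can be illustrated with a minimal, self-contained sketch. This is a toy model and not Xen's __get_page_type(): struct toy_page, toy_iommu_map() and toy_get_page_type() are hypothetical names, and the model assumes that the IOMMU mapping is only touched when a page first becomes writable. Under that assumption, taking a further reference on an already-writable page never reaches the call that can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT Xen's struct page_info or __get_page_type(). */
struct toy_page {
    unsigned int writable_refs; /* type references held as "writable" */
    bool iommu_mapped;          /* whether the pretend IOMMU maps the page */
};

/* Hypothetical stand-in for an IOMMU map call; fails when asked to. */
static int toy_iommu_map(struct toy_page *pg, bool inject_failure)
{
    if ( inject_failure )
        return -1;
    pg->iommu_mapped = true;
    return 0;
}

/*
 * Take a writable type reference. Returns true on success and false on
 * failure, mirroring the 1/0 convention discussed in the thread.
 */
static bool toy_get_page_type(struct toy_page *pg, bool inject_failure)
{
    assert(pg != NULL);

    if ( pg->writable_refs == 0 )
    {
        /* First reference: the type changes, so the mapping is updated. */
        if ( toy_iommu_map(pg, inject_failure) )
            return false; /* nothing was taken, so nothing to roll back */
    }

    /* Extra reference: only the counter changes, no IOMMU call is made. */
    pg->writable_refs++;
    return true;
}
```

In this model only the first toy_get_page_type() call on a page can fail; a caller that already holds a writable reference can take another one even while failures are being injected.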

>> > btw, there is another issue in the call path:
>> >     iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
>> >
>> >
>> > I tried to return iommu_{,un}map_page() error code in
>> > __get_page_type(), is it right?
>> 
>> If the operation got fully rolled back - yes. Whether fully rolling back is feasible
>> there though is - see the respective discussion - an open question.
>> 
> 
> Does the open question refer to the following?

Partly.

> """
> As said, we first need
> to settle on an abstract model. Do we want IOMMU mapping
> failures to be fatal to the domain (perhaps with the exception
> of the hardware one)? I think we do, and for the hardware domain
> we'd do things on a best effort basis (always erring on the side
> of unmapping). Which would probably mean crashing the domain
> could be centralized in iommu_{,un}map_page(). How much roll
> back would then still be needed in callers of these functions
> for the hardware domain's sake would need to be seen.
> """
> 
> I hope it is yes.

It is not clear to me what part of the above this is meant to refer to.
Perhaps this is meant to answer the question in the 2nd sentence,
but I think this really ought to take a little more than "yes".
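
For reference, one possible reading of the quoted model can be sketched as follows. This is only an illustration under stated assumptions, not Xen code: struct toy_domain, toy_hw_map() and toy_iommu_map_page() are hypothetical names, and the policy (a mapping failure crashes any domain except the hardware domain, with the decision made in one central wrapper) is taken from the quoted paragraph rather than from any settled design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical type, not Xen's struct domain. */
struct toy_domain {
    bool is_hardware_domain;
    bool crashed;
};

/* Hypothetical low-level map operation; hw_error simulates a failure. */
static int toy_hw_map(int hw_error)
{
    return hw_error; /* 0 on success, negative on failure */
}

/*
 * Central wrapper: every caller goes through here, so the "fatal to the
 * domain" policy lives in exactly one place.
 */
static int toy_iommu_map_page(struct toy_domain *d, int hw_error)
{
    int rc;

    assert(d != NULL);
    rc = toy_hw_map(hw_error);

    /* Assumed policy: a failure crashes any domain but the hardware one. */
    if ( rc && !d->is_hardware_domain )
        d->crashed = true;

    /* The error is still returned so callers can do best-effort cleanup. */
    return rc;
}
```

With the crash centralized like this, callers would only need rollback logic for the hardware domain's sake, which is the part the quoted text leaves open.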

>> >> > --- a/xen/drivers/passthrough/x86/iommu.c
>> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> >> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
>> >> >      this_cpu(iommu_dont_flush_iotlb) = 0;
>> >> >
>> >> >      if ( !rc )
>> >> > -        iommu_iotlb_flush_all(d);
>> >> > +    {
>> >> > +        rc = iommu_iotlb_flush_all(d);
>> >> > +        if ( rc )
>> >> > +            iommu_teardown(d);
>> >> > +    }
>> >> >      else if ( rc != -ERESTART )
>> >> >          iommu_teardown(d);
>> >>
>> >> Why can't you just use the existing call to iommu_teardown(), by
>> >> simply deleting the "else"?
>> >>
>> >
>> > Just to check: could I modify it as below?
>> > --- a/xen/drivers/passthrough/x86/iommu.c
>> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> > @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain *d)
>> >
>> >      if ( !rc )
>> >          iommu_iotlb_flush_all(d);
>> > -    else if ( rc != -ERESTART )
>> > +
>> > +    if ( rc != -ERESTART )
>> >          iommu_teardown(d);
>> 
>> Clearly not - not only are you losing the return value of
>> iommu_iotlb_flush_all() now, you would then also call
>> iommu_teardown() in the "success" case. My comment was related to code
>> structure, yet you seem to have taken it literally.
>> 
> 
> Then, what about this one:
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -104,8 +104,9 @@ int arch_iommu_populate_page_table(struct domain *d)
>      this_cpu(iommu_dont_flush_iotlb) = 0;
> 
>      if ( !rc )
> -        iommu_iotlb_flush_all(d);
> -    else if ( rc != -ERESTART )
> +        rc = iommu_iotlb_flush_all(d);
> +
> +    if ( rc && rc != -ERESTART )
>          iommu_teardown(d);
> 
> 
> IMO, my original modification is correct, although redundant with its two
> iommu_teardown() calls.
> If this is still the correct one, could you help me send out the correct 
> one?

The above looks right to me.

Jan

Patch

--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -104,8 +104,9 @@  int arch_iommu_populate_page_table(struct domain *d)
     this_cpu(iommu_dont_flush_iotlb) = 0;

     if ( !rc )
-        iommu_iotlb_flush_all(d);
-    else if ( rc != -ERESTART )
+        rc = iommu_iotlb_flush_all(d);
+
+    if ( rc && rc != -ERESTART )
         iommu_teardown(d);
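
A quick way to sanity-check the teardown condition is to tabulate when it fires for each value of rc. The sketch below is standalone C, not Xen code: the helper names and TOY_ERESTART are hypothetical, and the expected behaviour (teardown for every error except -ERESTART, never on success) is taken from the review comments in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ERESTART 85 /* placeholder value; only its distinctness matters */

/* Any error except "restart me" triggers teardown. */
static bool teardown_if_error(int rc)
{
    return rc && rc != -TOY_ERESTART;
}

/* Dropping the "else" without testing rc also fires on success. */
static bool teardown_if_not_restart(int rc)
{
    return rc != -TOY_ERESTART;
}

/* "!rc && ..." can only be true for rc == 0, so it fires on success alone. */
static bool teardown_if_zero(int rc)
{
    return !rc && rc != -TOY_ERESTART;
}

/* Combine the populate and flush results: flush only runs after success. */
static int final_rc(int populate_rc, int flush_rc)
{
    int rc = populate_rc;

    if ( !rc )
        rc = flush_rc;
    return rc;
}

/* Truth table for the three variants; aborts if an expectation is wrong. */
static void self_check(void)
{
    assert(!teardown_if_error(0));
    assert(teardown_if_error(-5));
    assert(!teardown_if_error(-TOY_ERESTART));

    assert(teardown_if_not_restart(0));  /* unwanted: fires on success */
    assert(teardown_if_zero(0));         /* unwanted: fires on success */
    assert(!teardown_if_zero(-5));       /* unwanted: misses real errors */

    assert(final_rc(0, -5) == -5);       /* a flush failure is reported */
    assert(final_rc(-7, 0) == -7);       /* an earlier error is preserved */
}
```

Only the first variant matches the intended behaviour in every row: it skips teardown on success and on -ERESTART, and it also covers a flush failure once the flush result is stored in rc.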