
[v4,02/10] IOMMU: handle IOMMU mapping and unmapping failures

Message ID 1462524880-67205-3-git-send-email-quan.xu@intel.com (mailing list archive)
State New, archived

Commit Message

Quan Xu May 6, 2016, 8:54 a.m. UTC
Treat IOMMU mapping and unmapping failures as fatal to the domain
(with the exception of the hardware domain).

If IOMMU mapping or unmapping fails, crash the domain (unless it is
the hardware domain) and propagate the error up the call trees.

Signed-off-by: Quan Xu <quan.xu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

CC: Jan Beulich <jbeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>
---
 xen/drivers/passthrough/iommu.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Comments

Jan Beulich May 9, 2016, 4:13 p.m. UTC | #1
>>> On 06.05.16 at 10:54, <quan.xu@intel.com> wrote:
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -240,21 +240,47 @@ int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>                     unsigned int flags)
>  {
>      const struct domain_iommu *hd = dom_iommu(d);
> +    int rc;
>  
>      if ( !iommu_enabled || !hd->platform_ops )
>          return 0;
>  
> -    return hd->platform_ops->map_page(d, gfn, mfn, flags);
> +    rc = hd->platform_ops->map_page(d, gfn, mfn, flags);
> +
> +    if ( unlikely(rc) )
> +    {
> +        printk(XENLOG_ERR
> +               "iommu_map_page: IOMMU mapping gfn %#lx mfn %#lx failed for dom%d.",
> +               gfn, mfn, d->domain_id);
> +
> +        if ( !is_hardware_domain(d) )
> +            domain_crash(d);
> +    }

This still may spam the console in at least the case of Dom0. For
DomU I'd really expect you to state in the commit message why no
spamming can occur (of course assuming it really can't, which I'm
not convinced of).

Jan
Quan Xu May 10, 2016, 3:41 a.m. UTC | #2
On May 10, 2016 12:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 06.05.16 at 10:54, <quan.xu@intel.com> wrote:
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -240,21 +240,47 @@ int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
> >                     unsigned int flags)
> >  {
> >      const struct domain_iommu *hd = dom_iommu(d);
> > +    int rc;
> >
> >      if ( !iommu_enabled || !hd->platform_ops )
> >          return 0;
> >
> > -    return hd->platform_ops->map_page(d, gfn, mfn, flags);
> > +    rc = hd->platform_ops->map_page(d, gfn, mfn, flags);
> > +
> > +    if ( unlikely(rc) )
> > +    {
> > +        printk(XENLOG_ERR
> > +               "iommu_map_page: IOMMU mapping gfn %#lx mfn %#lx failed for dom%d.",
> > +               gfn, mfn, d->domain_id);
> > +
> > +        if ( !is_hardware_domain(d) )
> > +            domain_crash(d);
> > +    }
> 
> This still may spam the console in at least the case of Dom0.

I am afraid we may need a minor trade-off. What about:

       dprintk(XENLOG_ERR, "...");

to print out in debug mode.
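To make the trade-off concrete, here is a generic debug-only logging macro. It prints in debug builds and expands to nothing once `NDEBUG` is defined, so a release build would log nothing at all. This is a sketch under assumptions: `DEBUG_LOG`, `debug_messages` and `report_map_failure()` are hypothetical names, and this is not Xen's actual `dprintk()` definition.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter of messages actually emitted, so the effect can be checked. */
static int debug_messages;

#ifndef NDEBUG
/* Debug build: print with a file:line prefix and count the message. */
#define DEBUG_LOG(fmt, ...)                                    \
    do {                                                       \
        debug_messages++;                                      \
        fprintf(stderr, "%s:%d: " fmt, __FILE__, __LINE__,     \
                __VA_ARGS__);                                  \
    } while ( 0 )
#else
/* Release build: the call vanishes and its arguments are not evaluated. */
#define DEBUG_LOG(fmt, ...) ((void)0)
#endif

/* Hypothetical caller standing in for the failure path of iommu_map_page(). */
static void report_map_failure(unsigned long gfn, int domid)
{
    DEBUG_LOG("IOMMU mapping gfn %#lx failed for dom%d\n", gfn, domid);
}
```

In a release build both calls to `report_map_failure()` leave no trace on the console. That silence is the drawback raised in the reply below.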

>  For DomU I'd
> really expect you to state in the commit message why no spamming can occur
> (of course assuming it really can't, which I'm not convinced of).
>

In this v4, I think we will still spam the console in extreme cases :(:(..

For mapping:
+                ret = iommu_map_page();
+                if ( unlikely(ret) )
+                {
+                    while ( i-- )
+                        iommu_unmap_page();
+                }

We stop mapping on any error and unmap the previous mappings. The extreme case is an error while unmapping those previous mappings.
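The extreme case can be modelled in a small, self-contained sketch. `map_one()`, `unmap_one()`, the failure-injection globals and the message counter are hypothetical stand-ins for `iommu_map_page()`/`iommu_unmap_page()` with unconditional logging. Suppose mapping fails at the fourth page and every rollback unmap also fails. Then a single request produces four error lines: one for the failed map and three for the rollback.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical failure injection: map fails from this gfn on; unmap fails when set. */
static unsigned long fail_map_at = ~0UL;
static int unmap_always_fails;
static int error_messages;   /* counts every logged failure */

static int map_one(unsigned long gfn)
{
    if ( gfn >= fail_map_at )
    {
        error_messages++;
        fprintf(stderr, "mapping gfn %#lx failed\n", gfn);
        return -1;
    }
    return 0;
}

static int unmap_one(unsigned long gfn)
{
    if ( unmap_always_fails )
    {
        error_messages++;
        fprintf(stderr, "unmapping gfn %#lx failed\n", gfn);
        return -1;
    }
    return 0;
}

/* Map n consecutive gfns starting at base; on the first failure, roll back. */
static int map_range(unsigned long base, unsigned int n)
{
    unsigned int i;
    int ret = 0;

    for ( i = 0; i < n; i++ )
    {
        ret = map_one(base + i);
        if ( ret )
        {
            /* Undo the i mappings that did succeed, newest first. */
            while ( i-- )
                unmap_one(base + i);
            break;
        }
    }
    return ret;
}
```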

Again -- I think dprintk is a better solution. Any suggestion?

Quan
Jan Beulich May 10, 2016, 6:53 a.m. UTC | #3
>>> On 10.05.16 at 05:41, <quan.xu@intel.com> wrote:
> On May 10, 2016 12:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 06.05.16 at 10:54, <quan.xu@intel.com> wrote:
>> > --- a/xen/drivers/passthrough/iommu.c
>> > +++ b/xen/drivers/passthrough/iommu.c
>> > @@ -240,21 +240,47 @@ int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>> >                     unsigned int flags)
>> >  {
>> >      const struct domain_iommu *hd = dom_iommu(d);
>> > +    int rc;
>> >
>> >      if ( !iommu_enabled || !hd->platform_ops )
>> >          return 0;
>> >
>> > -    return hd->platform_ops->map_page(d, gfn, mfn, flags);
>> > +    rc = hd->platform_ops->map_page(d, gfn, mfn, flags);
>> > +
>> > +    if ( unlikely(rc) )
>> > +    {
>> > +        printk(XENLOG_ERR
>> > +               "iommu_map_page: IOMMU mapping gfn %#lx mfn %#lx failed for dom%d.",
>> > +               gfn, mfn, d->domain_id);
>> > +
>> > +        if ( !is_hardware_domain(d) )
>> > +            domain_crash(d);
>> > +    }
>> 
>> This still may spam the console in at least the case of Dom0.
> 
> I am afraid we may need a minor trade-off. What about:
> 
>        dprintk(XENLOG_ERR, "...");
> 
> to print out in debug mode.

And be silent in non-debug mode? That's not what we want.

>>  For DomU I'd
>> really expect you to state in the commit message why no spamming can occur
>> (of course assuming it really can't, which I'm not convinced of).
>>
> 
> In this v4, I think we will still spam the console in extreme cases :(:(..
> 
> For mapping:
> +                ret = iommu_map_page();
> +                if ( unlikely(ret) )
> +                {
> +                    while ( i-- )
> +                        iommu_unmap_page();
> +                }
> 
> We stop mapping on any error and unmap the previous mappings. The
> extreme case is an error while unmapping those previous mappings.
> 
> Again -- I think dprintk is a better solution. Any suggestion?

For DomU the solution seems quite obvious: Only log a message if
the domain is not already marked crashed. For Dom0 you'll need to
get a little more creative (but by leveraging the fact that there's
only one in the system, this can't be too difficult a problem to solve:
e.g. "manually" rate limit these messages - see printk_ratelimit() et
al).
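One common way to implement this kind of "manual" rate limit is a token bucket. The bucket allows a short burst of messages and then roughly one message per interval. This is a sketch under assumptions. The name `ratelimit_ok()`, the explicit `now_ms` argument (assumed monotonic) and the 5000 ms / burst-of-10 values are illustrative choices, not the definition of `printk_ratelimit()`. A real in-hypervisor version would also need locking and a clock source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parameters: up to BURST messages at once, then one per INTERVAL_MS. */
#define INTERVAL_MS 5000UL
#define BURST       10UL

struct ratelimit {
    unsigned long toks;      /* accumulated credit, in milliseconds */
    unsigned long last_ms;   /* time of the previous call */
};

/* Start with a full bucket so the first BURST messages get through. */
#define RATELIMIT_INIT { BURST * INTERVAL_MS, 0 }

/*
 * Return true if a message may be printed at time now_ms.
 * Credit grows with elapsed time, capped at BURST * INTERVAL_MS;
 * each permitted message costs INTERVAL_MS of credit.
 */
static bool ratelimit_ok(struct ratelimit *rl, unsigned long now_ms)
{
    rl->toks += now_ms - rl->last_ms;
    rl->last_ms = now_ms;
    if ( rl->toks > BURST * INTERVAL_MS )
        rl->toks = BURST * INTERVAL_MS;

    if ( rl->toks >= INTERVAL_MS )
    {
        rl->toks -= INTERVAL_MS;
        return true;
    }
    return false;   /* caller drops this message */
}
```

Because only one hardware domain exists, a single global bucket like this is enough to bound how much Dom0 failures can write to the console.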

Jan
Quan Xu May 10, 2016, 7:53 a.m. UTC | #4
On May 10, 2016 2:54 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 10.05.16 at 05:41, <quan.xu@intel.com> wrote:
> > On May 10, 2016 12:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 06.05.16 at 10:54, <quan.xu@intel.com> wrote:
> >> > --- a/xen/drivers/passthrough/iommu.c
> >> > +++ b/xen/drivers/passthrough/iommu.c
> >> > @@ -240,21 +240,47 @@ int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
> >> >                     unsigned int flags)
> >> >  {
> >> >      const struct domain_iommu *hd = dom_iommu(d);
> >> > +    int rc;
> >> >
> >> >      if ( !iommu_enabled || !hd->platform_ops )
> >> >          return 0;
> >> >
> >> > -    return hd->platform_ops->map_page(d, gfn, mfn, flags);
> >> > +    rc = hd->platform_ops->map_page(d, gfn, mfn, flags);
> >> > +
> >> > +    if ( unlikely(rc) )
> >> > +    {
> >> > +        printk(XENLOG_ERR
> >> > +               "iommu_map_page: IOMMU mapping gfn %#lx mfn %#lx failed for dom%d.",
> >> > +               gfn, mfn, d->domain_id);
> >> > +
> >> > +        if ( !is_hardware_domain(d) )
> >> > +            domain_crash(d);
> >> > +    }
> >>
> >> This still may spam the console in at least the case of Dom0.
> >
> > I am afraid we may need a minor trade-off. What about:
> >
> >        dprintk(XENLOG_ERR, "...");
> >
> > to print out in debug mode.
> 
> And be silent in non-debug mode? That's not what we want.
> 

Without your suggestion below, this was really my best solution.

> >>  For DomU I'd
> >> really expect you to state in the commit message why no spamming can
> >> occur (of course assuming it really can't, which I'm not convinced of).
> >>
> >
> > In this v4, I think we will still spam the console in extreme cases :(:(..
> >
> > For mapping:
> > +                ret = iommu_map_page();
> > +                if ( unlikely(ret) )
> > +                {
> > +                    while ( i-- )
> > +                        iommu_unmap_page();
> > +                }
> >
> > We stop mapping on any error and unmap the previous mappings.
> > The extreme case is an error while unmapping those previous mappings.
> >
> > Again -- I think dprintk is a better solution. Any suggestion?
> 
> For DomU the solution seems quite obvious: Only log a message if the domain
> is not already marked crashed. For Dom0 you'll need to get a little more
> creative (but by leveraging the fact that there's only one in the system, this
> can't be too difficult a problem to solve:
> e.g. "manually" rate limit these messages - see printk_ratelimit() et al).

Amazing!!
As its comment says, printk_ratelimit() is lifted from Linux. Going by the Linux version, and IIUC, I will fix this issue as below (a variant):

...
+    rc = hd->platform_ops->unmap_page(d, gfn);
+
+    if ( unlikely(rc) )
+    {
+        if ( printk_ratelimit() )
+            printk(XENLOG_ERR
+                   "iommu_unmap_page: IOMMU unmapping gfn %#lx failed for dom%d.",
+                   gfn, d->domain_id);
+
+        if ( !is_hardware_domain(d) )
+            domain_crash(d);
+    }
+
+    return rc;
...

Thanks!!



Quan
Jan Beulich May 10, 2016, 8:02 a.m. UTC | #5
>>> On 10.05.16 at 09:53, <quan.xu@intel.com> wrote:
> +    rc = hd->platform_ops->unmap_page(d, gfn);
> +
> +    if ( unlikely(rc) )
> +    {
> +        if ( printk_ratelimit() )
> +            printk(XENLOG_ERR
> +                   "iommu_unmap_page: IOMMU unmapping gfn %#lx failed for dom%d.",
> +                   gfn, d->domain_id);
> +
> +        if ( !is_hardware_domain(d) )
> +            domain_crash(d);
> +    }
> +
> +    return rc;

But please - as said - also avoid logging any message for already
dying domains.
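For a DomU, the condition asked for here can be sketched as follows. Log only while the domain is not yet marked crashed, then crash it. A whole batch of failed unmaps then produces a single message per guest. This is a minimal sketch. `struct fake_domain`, its `crashed` flag, `crash_domain()` and `report_unmap_failure()` are hypothetical stand-ins, not Xen's `struct domain` or `domain_crash()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical minimal stand-in for a domain. */
struct fake_domain {
    int id;
    bool hardware;   /* true for the hardware domain (Dom0) */
    bool crashed;    /* set once the domain has been crashed */
    int messages;    /* error lines logged for this domain */
};

/* Hypothetical stand-in for crashing the domain. */
static void crash_domain(struct fake_domain *d)
{
    d->crashed = true;
}

/*
 * Handle one failed unmap: stay silent for a domain that is already
 * crashed, then crash any domain other than the hardware domain.
 */
static int report_unmap_failure(struct fake_domain *d, unsigned long gfn, int rc)
{
    if ( !d->crashed )
    {
        d->messages++;
        fprintf(stderr, "d%d: IOMMU unmapping gfn %#lx failed: %d\n",
                d->id, gfn, rc);
    }

    if ( !d->hardware )
        crash_domain(d);

    return rc;
}
```

Dom0 is never crashed on this path, so the flag never suppresses its messages. That is why the hardware domain still needs a rate limit such as printk_ratelimit().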

Jan
Quan Xu May 10, 2016, 8:20 a.m. UTC | #6
On May 10, 2016 4:03 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 10.05.16 at 09:53, <quan.xu@intel.com> wrote:
> > +    rc = hd->platform_ops->unmap_page(d, gfn);
> > +
> > +    if ( unlikely(rc) )
> > +    {
> > +        if ( printk_ratelimit() )
> > +            printk(XENLOG_ERR
> > +                   "iommu_unmap_page: IOMMU unmapping gfn %#lx failed for dom%d.",
> > +                   gfn, d->domain_id);
> > +
> > +        if ( !is_hardware_domain(d) )
> > +            domain_crash(d);
> > +    }
> > +
> > +    return rc;
> 
> But please - as said - also avoid logging any message for already dying
> domains.
> 


Leaving Kevin's opinion aside for now, I hope I have got your point as below:
...
+    rc = hd->platform_ops->unmap_page(d, gfn);
+
+    if ( unlikely(rc) )
+    {
+        if ( is_hardware_domain(d) )
+            if ( printk_ratelimit() )
+                printk(XENLOG_ERR
+                       "iommu_unmap_page: IOMMU unmapping gfn %#lx failed for dom%d.",
+                       gfn, d->domain_id);
+        else
+            domain_crash(d);
+    }
+
+    return rc;
...

Thanks for your patience.
Quan
Jan Beulich May 10, 2016, 8:26 a.m. UTC | #7
>>> On 10.05.16 at 10:20, <quan.xu@intel.com> wrote:
> On May 10, 2016 4:03 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> But please - as said - also avoid logging any message for already dying
>> domains.
>> 
> 
> 
> Leaving Kevin's opinion aside for now, I hope I have got your point as below:
> ...
> +    rc = hd->platform_ops->unmap_page(d, gfn);
> +
> +    if ( unlikely(rc) )
> +    {
> +        if ( is_hardware_domain(d) )
> +            if ( printk_ratelimit() )
> +                printk(XENLOG_ERR
> +                       "iommu_unmap_page: IOMMU unmapping gfn %#lx failed for dom%d.",
> +                       gfn, d->domain_id);
> +        else
> +            domain_crash(d);
> +    }
> +
> +    return rc;
> ...

I don't see how this would address my previous comment (not to
speak of the "else" now being associated with the wrong "if").
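The parenthetical refers to C's dangling-else rule: an `else` binds to the nearest unmatched `if`, regardless of indentation. The minimal sketch below compares the unbraced form with a braced one. `struct actions`, `unbraced()` and `braced()` are hypothetical names that only record which action would run. With the unbraced form, a DomU is never crashed, while a rate-limited Dom0 would be.

```c
#include <assert.h>
#include <stdbool.h>

/* Records which actions ran; hypothetical, for illustration only. */
struct actions {
    bool logged;
    bool crashed;
};

/* Unbraced, as in the proposal: the else pairs with the inner if. */
static struct actions unbraced(bool hardware, bool ratelimit_ok)
{
    struct actions a = { false, false };

    if ( hardware )
        if ( ratelimit_ok )
            a.logged = true;
        else                  /* binds to "if ( ratelimit_ok )" */
            a.crashed = true;

    return a;
}

/* Braced: the else now pairs with "if ( hardware )" as intended. */
static struct actions braced(bool hardware, bool ratelimit_ok)
{
    struct actions a = { false, false };

    if ( hardware )
    {
        if ( ratelimit_ok )
            a.logged = true;
    }
    else
        a.crashed = true;

    return a;
}
```

Compilers typically warn about the unbraced form (for example GCC's "suggest explicit braces" warning), but it is still valid C with the surprising meaning.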

Jan
Quan Xu May 12, 2016, 2:28 p.m. UTC | #8
On May 10, 2016 2:54 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 10.05.16 at 05:41, <quan.xu@intel.com> wrote:
> > On May 10, 2016 12:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >> >>> On 06.05.16 at 10:54, <quan.xu@intel.com> wrote:
> For DomU the solution seems quite obvious: Only log a message if the domain
> is not already marked crashed.

Jan, I am still confused about this sentence and your other sentence ("_as said_ also avoid logging any message for already dying domains").

>  For Dom0 you'll need to get a little more
> creative (but by leveraging the fact that there's only one in the system, this
> can't be too difficult a problem to solve:
> e.g. "manually" rate limit these messages - see printk_ratelimit() et al).
> 

Reading this thread again and again, sorry, I am still inclined to:

+    rc = hd->platform_ops->unmap_page(d, gfn);
+
+    if ( unlikely(rc) )
+    {
+        if ( printk_ratelimit() )
+            printk(XENLOG_ERR
+                   "dom%d: IOMMU unmapping gfn %#lx failed %d.",
+                   d->domain_id, gfn, rc);
+
+        if ( !is_hardware_domain(d) )
+            domain_crash(d);
+    }
+
+    return rc;


Waiting for Kevin's opinion.


Quan
Jan Beulich May 12, 2016, 3:06 p.m. UTC | #9
>>> On 12.05.16 at 16:28, <quan.xu@intel.com> wrote:
> On May 10, 2016 2:54 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 10.05.16 at 05:41, <quan.xu@intel.com> wrote:
>> > On May 10, 2016 12:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> >> >>> On 06.05.16 at 10:54, <quan.xu@intel.com> wrote:
>> For DomU the solution seems quite obvious: Only log a message if the domain
>> is not already marked crashed.
> 
> Jan, I am still confused about this sentence and your other sentence
> ("_as said_ also avoid logging any message for already dying domains").

The two say the same, so I don't see what you're confused about.
Please be more precise.

>>  For Dom0 you'll need to get a little more
>> creative (but by leveraging the fact that there's only one in the system, this
>> can't be too difficult a problem to solve:
>> e.g. "manually" rate limit these messages - see printk_ratelimit() et al).
>> 
> 
> Reading this thread again and again, sorry, I am still inclined to:
> 
> +    rc = hd->platform_ops->unmap_page(d, gfn);
> +
> +    if ( unlikely(rc) )
> +    {
> +        if ( printk_ratelimit() )
> +            printk(XENLOG_ERR
> +                   "dom%d: IOMMU unmapping gfn %#lx failed %d.",
> +                   d->domain_id, gfn, rc);
> +
> +        if ( !is_hardware_domain(d) )
> +            domain_crash(d);
> +    }
> +
> +    return rc;

This is certainly better than unconditional logging, but will still
produce more than one message per crashed guest (or for
Dom0) on a batch of unmaps.

Jan

Patch

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index c59b2ab..09560c0 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -240,21 +240,47 @@  int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
                    unsigned int flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
+    int rc;
 
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    return hd->platform_ops->map_page(d, gfn, mfn, flags);
+    rc = hd->platform_ops->map_page(d, gfn, mfn, flags);
+
+    if ( unlikely(rc) )
+    {
+        printk(XENLOG_ERR
+               "iommu_map_page: IOMMU mapping gfn %#lx mfn %#lx failed for dom%d.",
+               gfn, mfn, d->domain_id);
+
+        if ( !is_hardware_domain(d) )
+            domain_crash(d);
+    }
+
+    return rc;
 }
 
 int iommu_unmap_page(struct domain *d, unsigned long gfn)
 {
     const struct domain_iommu *hd = dom_iommu(d);
+    int rc;
 
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    return hd->platform_ops->unmap_page(d, gfn);
+    rc = hd->platform_ops->unmap_page(d, gfn);
+
+    if ( unlikely(rc) )
+    {
+        printk(XENLOG_ERR
+               "iommu_unmap_page: IOMMU unmapping gfn %#lx failed for dom%d.",
+               gfn, d->domain_id);
+
+        if ( !is_hardware_domain(d) )
+            domain_crash(d);
+    }
+
+    return rc;
 }
 
 static void iommu_free_pagetables(unsigned long unused)