
[for-4.15,RESEND,v4,1/2] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying

Message ID 20210224094356.7606-2-julien@xen.org (mailing list archive)
State Superseded
Series xen/iommu: Collection of bug fixes for IOMMU teardown

Commit Message

Julien Grall Feb. 24, 2021, 9:43 a.m. UTC
From: Julien Grall <jgrall@amazon.com>

The new x86 IOMMU page-table allocator will release the pages when
relinquishing the domain resources. However, this is not sufficient
when the domain is dying because nothing prevents page tables from
being allocated.

As the domain is dying, it is not necessary to continue to modify the
IOMMU page-tables as they are going to be destroyed soon.

At the moment, page-table allocations will only happen in iommu_map().
So after this change there will be no more page-table allocations.

In order to observe d->is_dying correctly, we need to rely on per-arch
locking, so the check to ignore IOMMU mapping requests is added in each
driver's map_page() callback.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

As discussed in v3, this only covers 4.15. We can discuss post-4.15
how to catch page-table allocations if another caller (e.g.
iommu_unmap(), if we ever decide to support superpages) starts to use
the page-table allocator.

Changes in v4:
    - Move the patch to the top of the queue
    - Reword the commit message

Changes in v3:
    - Patch added. This is a replacement of "xen/iommu: iommu_map: Don't
    crash the domain if it is dying"
---
 xen/drivers/passthrough/amd/iommu_map.c | 12 ++++++++++++
 xen/drivers/passthrough/vtd/iommu.c     | 12 ++++++++++++
 xen/drivers/passthrough/x86/iommu.c     |  6 ++++++
 3 files changed, 30 insertions(+)

Comments

Jan Beulich Feb. 24, 2021, 2:07 p.m. UTC | #1
On 24.02.2021 10:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The new x86 IOMMU page-table allocator will release the pages when
> relinquishing the domain resources. However, this is not sufficient
> when the domain is dying because nothing prevents page tables from
> being allocated.
> 
> As the domain is dying, it is not necessary to continue to modify the
> IOMMU page-tables as they are going to be destroyed soon.
> 
> At the moment, page-table allocations will only happen in iommu_map().
> So after this change there will be no more page-table allocations.

While I'm still not happy about this asymmetry, I'm willing to accept
it in the interest of getting the underlying issue addressed. May I
ask though that you add something like "... because we don't use
superpage mappings yet when not sharing page tables"?

But there are two more minor things:

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -267,6 +267,12 @@ int iommu_free_pgtables(struct domain *d)
>      struct page_info *pg;
>      unsigned int done = 0;
>  
> +    if ( !is_iommu_enabled(d) )
> +        return 0;

Why is this addition needed? Hitting a not yet initialized spin lock
is - afaict - no worse than a not yet initialized list, so it would
seem to me that this can't be the reason. No other reason looks to
be called out by the description.

> +    /* After this barrier, no more IOMMU mapping can happen */
> +    spin_barrier(&hd->arch.mapping_lock);

On the v3 discussion I thought you did agree to change the wording
of the comment to something like "no new IOMMU mappings can be
inserted"?

Jan
Julien Grall Feb. 25, 2021, 11:56 a.m. UTC | #2
Hi Jan,

On 24/02/2021 14:07, Jan Beulich wrote:
> On 24.02.2021 10:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new x86 IOMMU page-table allocator will release the pages when
>> relinquishing the domain resources. However, this is not sufficient
>> when the domain is dying because nothing prevents page tables from
>> being allocated.
>>
>> As the domain is dying, it is not necessary to continue to modify the
>> IOMMU page-tables as they are going to be destroyed soon.
>>
>> At the moment, page-table allocations will only happen in iommu_map().
>> So after this change there will be no more page-table allocations.
> 
> While I'm still not happy about this asymmetry, I'm willing to accept
> it in the interest of getting the underlying issue addressed. May I
> ask though that you add something like "... because we don't use
> superpage mappings yet when not sharing page tables"?

Sure.

> But there are two more minor things:
> 
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -267,6 +267,12 @@ int iommu_free_pgtables(struct domain *d)
>>       struct page_info *pg;
>>       unsigned int done = 0;
>>   
>> +    if ( !is_iommu_enabled(d) )
>> +        return 0;
> 
> Why is this addition needed? Hitting a not yet initialized spin lock
> is - afaict - no worse than a not yet initialized list, so it would
> seem to me that this can't be the reason. No other reason looks to
> be called out by the description.

struct domain_iommu will be initially zeroed as it is part of struct domain.

For the list, we are so far fine because page_list_remove_head() 
tolerates NULL. If we were using the normal list operations (e.g. 
list_del), then this code would have segfaulted.
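
To illustrate the difference, here is a simplified, self-contained sketch
with stand-in types (these are not Xen's actual page_list_head/list_head
definitions, just a model of the behaviour described above):

    #include <stddef.h>
    #include <stdio.h>

    struct page_info;

    /* Stand-in for the non-BIGMEM page list head: a pair of plain pointers. */
    struct page_list_head {
        struct page_info *next, *tail;
    };

    /* A zeroed head has next == NULL, so this simply reports "empty". */
    static struct page_info *page_list_remove_head(struct page_list_head *head)
    {
        struct page_info *pg = head->next;

        if ( !pg )
            return NULL;
        /* ... unlink pg from the list ... */
        return pg;
    }

    /* Stand-in for a classic doubly-linked list head. */
    struct list_head {
        struct list_head *next, *prev;
    };

    /* A zeroed head has next == NULL, so the unlink dereferences NULL. */
    struct list_head *list_remove_head(struct list_head *head)
    {
        struct list_head *e = head->next;

        e->next->prev = head;   /* faults if the list was never initialised */
        head->next = e->next;
        return e;
    }

    int main(void)
    {
        struct page_list_head zeroed = { NULL, NULL };

        /* Safe: prints a null pointer instead of crashing. */
        printf("%p\n", (void *)page_list_remove_head(&zeroed));

        /* list_remove_head() on a zeroed struct list_head would crash. */
        return 0;
    }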

Now about the spinlock: at least with lock debugging enabled, a non-zero
initial value is expected. We are lucky here because this path is not
called with IRQs disabled. If it were, Xen would crash, because it would
consider the lock to have been used in a non-IRQ-safe environment.

So in the spinlock case, we are really playing with fire. Hence the
check here.
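
As a very rough model of the kind of consistency check lock debugging
performs (illustrative only; this is not Xen's actual spinlock
implementation, and the names below are made up for the example):

    #include <assert.h>
    #include <stdbool.h>

    #define IRQ_STATE_UNSEEN (-1)

    struct demo_lock {
        /* -1 = never taken, 0 = taken with IRQs enabled, 1 = with IRQs disabled */
        int irq_state;
    };

    /* What a proper initialiser (think SPIN_LOCK_UNLOCKED) would set up. */
    #define DEMO_LOCK_UNLOCKED { .irq_state = IRQ_STATE_UNSEEN }

    static void demo_check_lock(struct demo_lock *lock, bool irqs_disabled)
    {
        if ( lock->irq_state == IRQ_STATE_UNSEEN )
            lock->irq_state = irqs_disabled;   /* first use records the context */
        else
            /* Mixing IRQs-on and IRQs-off users is flagged as a potential deadlock. */
            assert(lock->irq_state == (int)irqs_disabled);
    }

    int main(void)
    {
        struct demo_lock initialised = DEMO_LOCK_UNLOCKED;
        struct demo_lock zeroed = { 0 };   /* like a field of a zeroed struct domain */

        demo_check_lock(&initialised, true);   /* fine: first use decides */
        demo_check_lock(&zeroed, false);       /* happens to match the zero state */
        /* demo_check_lock(&zeroed, true); would trip the assertion: the zeroed
           lock claims it was already used with IRQs enabled. */
        return 0;
    }

So a zeroed lock only works by luck as long as the path runs with IRQs
enabled, which matches the situation described above.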

>> +    /* After this barrier, no more IOMMU mapping can happen */
>> +    spin_barrier(&hd->arch.mapping_lock);
> 
> On the v3 discussion I thought you did agree to change the wording
> of the comment to something like "no new IOMMU mappings can be
> inserted"?

Sorry I missed this comment. I will update it in the next version.

Cheers,
Jan Beulich Feb. 25, 2021, 1:18 p.m. UTC | #3
On 25.02.2021 12:56, Julien Grall wrote:
> On 24/02/2021 14:07, Jan Beulich wrote:
>> On 24.02.2021 10:43, Julien Grall wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -267,6 +267,12 @@ int iommu_free_pgtables(struct domain *d)
>>>       struct page_info *pg;
>>>       unsigned int done = 0;
>>>   
>>> +    if ( !is_iommu_enabled(d) )
>>> +        return 0;
>>
>> Why is this addition needed? Hitting a not yet initialized spin lock
>> is - afaict - no worse than a not yet initialized list, so it would
>> seem to me that this can't be the reason. No other reason looks to
>> be called out by the description.
> 
> struct domain_iommu will be initially zeroed as it is part of struct domain.
> 
> For the list, we are so far fine because page_list_remove_head() 
> tolerates NULL. If we were using the normal list operations (e.g. 
> list_del), then this code would have segfaulted.

And so we do, in the CONFIG_BIGMEM case. May I suggest then to split
this out as a prereq patch, or add wording to the description
mentioning this additional effect?

Jan
Julien Grall Feb. 26, 2021, 10:47 a.m. UTC | #4
Hi Jan,

On 25/02/2021 13:18, Jan Beulich wrote:
> On 25.02.2021 12:56, Julien Grall wrote:
>> On 24/02/2021 14:07, Jan Beulich wrote:
>>> On 24.02.2021 10:43, Julien Grall wrote:
>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>> @@ -267,6 +267,12 @@ int iommu_free_pgtables(struct domain *d)
>>>>        struct page_info *pg;
>>>>        unsigned int done = 0;
>>>>    
>>>> +    if ( !is_iommu_enabled(d) )
>>>> +        return 0;
>>>
>>> Why is this addition needed? Hitting a not yet initialized spin lock
>>> is - afaict - no worse than a not yet initialized list, so it would
>>> seem to me that this can't be the reason. No other reason looks to
>>> be called out by the description.
>>
>> struct domain_iommu will be initially zeroed as it is part of struct domain.
>>
>> For the list, we are so far fine because page_list_remove_head()
>> tolerates NULL. If we were using the normal list operations (e.g.
>> list_del), then this code would have segfaulted.
> 
> And so we do, in the CONFIG_BIGMEM case. May I suggest then to split
> this out as a prereq patch, or add wording to the description
> mentioning this additional effect?

You are correct, I can crash the hypervisor when enabling
CONFIG_BIGMEM=y and not using the IOMMU. I will move this chunk into a
separate patch.

Cheers,

Patch

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index d3a8b1aec766..560af54b765b 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -285,6 +285,18 @@  int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
+    /*
+     * IOMMU mapping request can be safely ignored when the domain is dying.
+     *
+     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * before any page tables are freed (see iommu_free_pgtables()).
+     */
+    if ( d->is_dying )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return 0;
+    }
+
     rc = amd_iommu_alloc_root(d);
     if ( rc )
     {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d136fe36883b..b549a71530d5 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1762,6 +1762,18 @@  static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
+    /*
+     * IOMMU mapping request can be safely ignored when the domain is dying.
+     *
+     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * before any page tables are freed (see iommu_free_pgtables())
+     */
+    if ( d->is_dying )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return 0;
+    }
+
     pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 1);
     if ( !pg_maddr )
     {
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cea1032b3d02..c6b03624fe28 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -267,6 +267,12 @@  int iommu_free_pgtables(struct domain *d)
     struct page_info *pg;
     unsigned int done = 0;
 
+    if ( !is_iommu_enabled(d) )
+        return 0;
+
+    /* After this barrier, no more IOMMU mapping can happen */
+    spin_barrier(&hd->arch.mapping_lock);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);