diff mbox series

[for-4.15,v3,2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying

Message ID 20210217142458.3769-3-julien@xen.org (mailing list archive)
State New
Headers show
Series xen/iommu: Collection of bug fixes for IOMMU teadorwn | expand

Commit Message

Julien Grall Feb. 17, 2021, 2:24 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

The new x86 IOMMU page-tables allocator will release the pages when
relinquishing the domain resources. However, this is not sufficient
when the domain is dying because nothing prevents page-table to be
allocated.

Currently page-table allocations can only happen from iommu_map(). As
the domain is dying, there is no good reason to continue to modify the
IOMMU page-tables.

In order to observe d->is_dying correctly, we need to rely on per-arch
locking, so the check to ignore IOMMU mapping is added on the per-driver
map_page() callback.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Changes in v3:
    - Patch added. This is a replacement of "xen/iommu: iommu_map: Don't
    crash the domain if it is dying"
---
 xen/drivers/passthrough/amd/iommu_map.c | 13 +++++++++++++
 xen/drivers/passthrough/vtd/iommu.c     | 13 +++++++++++++
 xen/drivers/passthrough/x86/iommu.c     |  3 +++
 3 files changed, 29 insertions(+)

Comments

Jan Beulich Feb. 17, 2021, 3:01 p.m. UTC | #1
On 17.02.2021 15:24, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The new x86 IOMMU page-tables allocator will release the pages when
> relinquishing the domain resources. However, this is not sufficient
> when the domain is dying because nothing prevents page-table to be
> allocated.
> 
> Currently page-table allocations can only happen from iommu_map(). As
> the domain is dying, there is no good reason to continue to modify the
> IOMMU page-tables.

While I agree this to be the case right now, I'm not sure it is a
good idea to build on it (in that you leave the unmap paths
untouched). Imo there's a fair chance this would be overlooked at
the point super page mappings get introduced (which has been long
overdue), and I thought prior discussion had lead to a possible
approach without risking use-after-free due to squashed unmap
requests.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
>      /*
>       * Pages will be moved to the free list below. So we want to
>       * clear the root page-table to avoid any potential use after-free.
> +     *
> +     * After this call, no more IOMMU mapping can happen.
> +     *
>       */
>      hd->platform_ops->clear_root_pgtable(d);

I.e. you utilize the call in place of spin_barrier(). Maybe worth
saying in the comment?

Also, nit: Stray blank comment line.

Jan
Julien Grall Feb. 17, 2021, 4:07 p.m. UTC | #2
Hi Jan,

On 17/02/2021 15:01, Jan Beulich wrote:
> On 17.02.2021 15:24, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new x86 IOMMU page-tables allocator will release the pages when
>> relinquishing the domain resources. However, this is not sufficient
>> when the domain is dying because nothing prevents page-table to be
>> allocated.
>>
>> Currently page-table allocations can only happen from iommu_map(). As
>> the domain is dying, there is no good reason to continue to modify the
>> IOMMU page-tables.
> 
> While I agree this to be the case right now, I'm not sure it is a
> good idea to build on it (in that you leave the unmap paths
> untouched).

I don't build on that assumption. See next patch.

> Imo there's a fair chance this would be overlooked at
> the point super page mappings get introduced (which has been long
> overdue), and I thought prior discussion had lead to a possible
> approach without risking use-after-free due to squashed unmap
> requests.

I know you suggested to zap the root page-tables... However, I don't 
think this is 4.15 material and you agree with this (you were the one 
pointed out that out).

>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
>>       /*
>>        * Pages will be moved to the free list below. So we want to
>>        * clear the root page-table to avoid any potential use after-free.
>> +     *
>> +     * After this call, no more IOMMU mapping can happen.
>> +     *
>>        */
>>       hd->platform_ops->clear_root_pgtable(d);
> 
> I.e. you utilize the call in place of spin_barrier(). Maybe worth
> saying in the comment?

Sure.

> 
> Also, nit: Stray blank comment line.

Cheers,
Jan Beulich Feb. 18, 2021, 1:05 p.m. UTC | #3
On 17.02.2021 17:07, Julien Grall wrote:
> On 17/02/2021 15:01, Jan Beulich wrote:
>> On 17.02.2021 15:24, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The new x86 IOMMU page-tables allocator will release the pages when
>>> relinquishing the domain resources. However, this is not sufficient
>>> when the domain is dying because nothing prevents page-table to be
>>> allocated.
>>>
>>> Currently page-table allocations can only happen from iommu_map(). As
>>> the domain is dying, there is no good reason to continue to modify the
>>> IOMMU page-tables.
>>
>> While I agree this to be the case right now, I'm not sure it is a
>> good idea to build on it (in that you leave the unmap paths
>> untouched).
> 
> I don't build on that assumption. See next patch.

Yet as said there that patch makes unmapping perhaps more fragile,
by introducing a latent error source into the path.

>> Imo there's a fair chance this would be overlooked at
>> the point super page mappings get introduced (which has been long
>> overdue), and I thought prior discussion had lead to a possible
>> approach without risking use-after-free due to squashed unmap
>> requests.
> 
> I know you suggested to zap the root page-tables... However, I don't 
> think this is 4.15 material and you agree with this (you were the one 
> pointed out that out).

Paul - do you have any thoughts here? Outright zapping isn't
going to work, we'd need to switch to quarantine page tables at
the very least to prevent the issue with babbling devices. But
that still seems better to me than the introduction of a latent
issue on the unmap paths.

>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
>>>       /*
>>>        * Pages will be moved to the free list below. So we want to
>>>        * clear the root page-table to avoid any potential use after-free.
>>> +     *
>>> +     * After this call, no more IOMMU mapping can happen.
>>> +     *
>>>        */
>>>       hd->platform_ops->clear_root_pgtable(d);
>>
>> I.e. you utilize the call in place of spin_barrier(). Maybe worth
>> saying in the comment?
> 
> Sure.

Btw - "no more IOMMU mapping" is also possibly ambiguous here:
One might take it to mean both maps and unmaps. How about "no
new IOMMU mappings can be inserted", as long as the unmap paths
don't follow suit?

Jan
Julien Grall Feb. 18, 2021, 1:25 p.m. UTC | #4
Hi,

On 18/02/2021 13:05, Jan Beulich wrote:
> On 17.02.2021 17:07, Julien Grall wrote:
>> On 17/02/2021 15:01, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>> relinquishing the domain resources. However, this is not sufficient
>>>> when the domain is dying because nothing prevents page-table to be
>>>> allocated.
>>>>
>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>> the domain is dying, there is no good reason to continue to modify the
>>>> IOMMU page-tables.
>>>
>>> While I agree this to be the case right now, I'm not sure it is a
>>> good idea to build on it (in that you leave the unmap paths
>>> untouched).
>>
>> I don't build on that assumption. See next patch.
> 
> Yet as said there that patch makes unmapping perhaps more fragile,
> by introducing a latent error source into the path.

I still don't see what latent error my patch will introduce. Allocation 
of page-tables are not guarantee to succeed.

So are you implying that a code may rely on the page allocation to succeed?

> 
>>> Imo there's a fair chance this would be overlooked at
>>> the point super page mappings get introduced (which has been long
>>> overdue), and I thought prior discussion had lead to a possible
>>> approach without risking use-after-free due to squashed unmap
>>> requests.
>>
>> I know you suggested to zap the root page-tables... However, I don't
>> think this is 4.15 material and you agree with this (you were the one
>> pointed out that out).
> 
> Paul - do you have any thoughts here? Outright zapping isn't
> going to work, we'd need to switch to quarantine page tables at
> the very least to prevent the issue with babbling devices. But
> that still seems better to me than the introduction of a latent
> issue on the unmap paths.

I am afraid I am not going to be able to come up with such patch for 
4.15. If you want to go that route for 4.15, then feel free to suggest a 
patch.

[...]

> Btw - "no more IOMMU mapping" is also possibly ambiguous here:
> One might take it to mean both maps and unmaps. How about "no
> new IOMMU mappings can be inserted", as long as the unmap paths
> don't follow suit?

Sure.

Cheers,
Paul Durrant Feb. 18, 2021, 2 p.m. UTC | #5
On 18/02/2021 13:05, Jan Beulich wrote:
> On 17.02.2021 17:07, Julien Grall wrote:
>> On 17/02/2021 15:01, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>> relinquishing the domain resources. However, this is not sufficient
>>>> when the domain is dying because nothing prevents page-table to be
>>>> allocated.
>>>>
>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>> the domain is dying, there is no good reason to continue to modify the
>>>> IOMMU page-tables.
>>>
>>> While I agree this to be the case right now, I'm not sure it is a
>>> good idea to build on it (in that you leave the unmap paths
>>> untouched).
>>
>> I don't build on that assumption. See next patch.
> 
> Yet as said there that patch makes unmapping perhaps more fragile,
> by introducing a latent error source into the path.
> 
>>> Imo there's a fair chance this would be overlooked at
>>> the point super page mappings get introduced (which has been long
>>> overdue), and I thought prior discussion had lead to a possible
>>> approach without risking use-after-free due to squashed unmap
>>> requests.
>>
>> I know you suggested to zap the root page-tables... However, I don't
>> think this is 4.15 material and you agree with this (you were the one
>> pointed out that out).
> 
> Paul - do you have any thoughts here? Outright zapping isn't
> going to work, we'd need to switch to quarantine page tables at
> the very least to prevent the issue with babbling devices. But
> that still seems better to me than the introduction of a latent
> issue on the unmap paths.
> 

I've not really been following this. AFAICT clear_root_pgtable() does 
not actually mess with any context entries so the device view of the 
tables won't change, will it?

   Paul
Jan Beulich Feb. 19, 2021, 8:49 a.m. UTC | #6
On 18.02.2021 14:25, Julien Grall wrote:
> Hi,
> 
> On 18/02/2021 13:05, Jan Beulich wrote:
>> On 17.02.2021 17:07, Julien Grall wrote:
>>> On 17/02/2021 15:01, Jan Beulich wrote:
>>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>>> relinquishing the domain resources. However, this is not sufficient
>>>>> when the domain is dying because nothing prevents page-table to be
>>>>> allocated.
>>>>>
>>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>>> the domain is dying, there is no good reason to continue to modify the
>>>>> IOMMU page-tables.
>>>>
>>>> While I agree this to be the case right now, I'm not sure it is a
>>>> good idea to build on it (in that you leave the unmap paths
>>>> untouched).
>>>
>>> I don't build on that assumption. See next patch.
>>
>> Yet as said there that patch makes unmapping perhaps more fragile,
>> by introducing a latent error source into the path.
> 
> I still don't see what latent error my patch will introduce. Allocation 
> of page-tables are not guarantee to succeed.
> 
> So are you implying that a code may rely on the page allocation to succeed?

I'm implying that failure to split a superpage may have unknown
consequences. Since we make no use of superpages when not
sharing page tables, I call this a latent issue which may go
unnoticed for quite some time once no longer latent.

Jan
Jan Beulich Feb. 19, 2021, 8:56 a.m. UTC | #7
On 18.02.2021 15:00, Paul Durrant wrote:
> On 18/02/2021 13:05, Jan Beulich wrote:
>> On 17.02.2021 17:07, Julien Grall wrote:
>>> On 17/02/2021 15:01, Jan Beulich wrote:
>>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>>> relinquishing the domain resources. However, this is not sufficient
>>>>> when the domain is dying because nothing prevents page-table to be
>>>>> allocated.
>>>>>
>>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>>> the domain is dying, there is no good reason to continue to modify the
>>>>> IOMMU page-tables.
>>>>
>>>> While I agree this to be the case right now, I'm not sure it is a
>>>> good idea to build on it (in that you leave the unmap paths
>>>> untouched).
>>>
>>> I don't build on that assumption. See next patch.
>>
>> Yet as said there that patch makes unmapping perhaps more fragile,
>> by introducing a latent error source into the path.
>>
>>>> Imo there's a fair chance this would be overlooked at
>>>> the point super page mappings get introduced (which has been long
>>>> overdue), and I thought prior discussion had lead to a possible
>>>> approach without risking use-after-free due to squashed unmap
>>>> requests.
>>>
>>> I know you suggested to zap the root page-tables... However, I don't
>>> think this is 4.15 material and you agree with this (you were the one
>>> pointed out that out).
>>
>> Paul - do you have any thoughts here? Outright zapping isn't
>> going to work, we'd need to switch to quarantine page tables at
>> the very least to prevent the issue with babbling devices. But
>> that still seems better to me than the introduction of a latent
>> issue on the unmap paths.
>>
> 
> I've not really been following this. AFAICT clear_root_pgtable() does 
> not actually mess with any context entries so the device view of the 
> tables won't change, will it?

Correct. Elsewhere I said that "zapping" here has two meanings,
one is what Julien does and the other is what I mean and what I
understand you also refer to - to actually replace the mappings
referenced by a context entry as soon as we no longer guarantee
to update them correctly.

My concern with not touching the unmap paths is that if there
was any "best effort" unmap left anywhere in the tree, we may
still end up with a use-after-free when late unmaps are now
made possibly fail by failing late allocation attempts.

Jan
Julien Grall Feb. 19, 2021, 9:24 a.m. UTC | #8
Hi Jan,

On 19/02/2021 08:49, Jan Beulich wrote:
> On 18.02.2021 14:25, Julien Grall wrote:
>> Hi,
>>
>> On 18/02/2021 13:05, Jan Beulich wrote:
>>> On 17.02.2021 17:07, Julien Grall wrote:
>>>> On 17/02/2021 15:01, Jan Beulich wrote:
>>>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>
>>>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>>>> relinquishing the domain resources. However, this is not sufficient
>>>>>> when the domain is dying because nothing prevents page-table to be
>>>>>> allocated.
>>>>>>
>>>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>>>> the domain is dying, there is no good reason to continue to modify the
>>>>>> IOMMU page-tables.
>>>>>
>>>>> While I agree this to be the case right now, I'm not sure it is a
>>>>> good idea to build on it (in that you leave the unmap paths
>>>>> untouched).
>>>>
>>>> I don't build on that assumption. See next patch.
>>>
>>> Yet as said there that patch makes unmapping perhaps more fragile,
>>> by introducing a latent error source into the path.
>>
>> I still don't see what latent error my patch will introduce. Allocation
>> of page-tables are not guarantee to succeed.
>>
>> So are you implying that a code may rely on the page allocation to succeed?
> 
> I'm implying that failure to split a superpage may have unknown
> consequences.

As failure to flush the TLBs (see below).

> Since we make no use of superpages when not
> sharing page tables, I call this a latent issue which may go
> unnoticed for quite some time once no longer latent.

And it would take a lot longer to unnotice an OOM as this should happen 
less often ;).

But, I think this is wrong to blame the lower layer for a (latent) bug 
in the upper layer...

Even with your solution to zap page-tables, there is still a risk of 
failure because the TLB flush may fail (the operation can return an error).

So regardless the solution, we still need to have callers that function 
correctly.

Anyway, what matters right now is fixing a host crash when running PV 
guest with passthrough.  Neither patch #3 nor the iommu_unmap() part is 
strictly necessary for 4.15 (as you say this is latent).

So I would suggest to defer this discussion post-4.15.

Cheers,
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index d3a8b1aec766..ed78a083ba12 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -285,6 +285,19 @@  int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
+    /*
+     * IOMMU mapping request can be safely ignored when the domain is dying.
+     *
+     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * before any page tables are freed (see iommu_free_pgtables() and
+     * iommu_clear_root_pgtable()).
+     */
+    if ( d->is_dying )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return 0;
+    }
+
     rc = amd_iommu_alloc_root(d);
     if ( rc )
     {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index e1871f6c2bc1..239a63f74f64 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1771,6 +1771,19 @@  static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
+    /*
+     * IOMMU mapping request can be safely ignored when the domain is dying.
+     *
+     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * before any page tables are freed (see iommu_free_pgtables() and
+     * iommu_clear_root_pgtable()).
+     */
+    if ( d->is_dying )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return 0;
+    }
+
     pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 1);
     if ( !pg_maddr )
     {
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index f54fc8093f18..faa0078db595 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -273,6 +273,9 @@  int iommu_free_pgtables(struct domain *d)
     /*
      * Pages will be moved to the free list below. So we want to
      * clear the root page-table to avoid any potential use after-free.
+     *
+     * After this call, no more IOMMU mapping can happen.
+     *
      */
     hd->platform_ops->clear_root_pgtable(d);