diff mbox series

[for-4.15,2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held

Message ID 20201222154338.9459-3-julien@xen.org (mailing list archive)
State Superseded
Headers show
Series xen/iommu: Collection of bug fixes for IOMMU teardown | expand

Commit Message

Julien Grall Dec. 22, 2020, 3:43 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

The pgtables.lock is protecting access to the page list pgtables.list.
However, iommu_free_pgtables() does not hold it. I guess it was assumed
that page-tables cannot be allocated while the domain is dying.

Unfortunately, there is no guarantee that iommu_map() will not be
called while a domain is dying (it looks to be possible via
XEN_DOMCTL_memory_mapping). So it would be possible to concurrently
allocate memory and free the page-tables.

Therefore, we need to hold the lock when freeing the page tables.

There are more issues around the IOMMU page-allocator. They will be
handled in follow-up patches.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/drivers/passthrough/x86/iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Jan Beulich Dec. 23, 2020, 1:48 p.m. UTC | #1
On 22.12.2020 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The pgtables.lock is protecting access to the page list pgtables.list.
> However, iommu_free_pgtables() will not held it. I guess it was assumed
> that page-tables cannot be allocated while the domain is dying.
> 
> Unfortunately, there is no guarantee that iommu_map() will not be
> called while a domain is dying (it looks like to be possible from
> XEN_DOMCTL_memory_mapping).

I'd rather disallow any new allocations for a dying domain, not
the least because ...

> So it would be possible to be concurrently
> allocate memory and free the page-tables.
> 
> Therefore, we need to held the lock when freeing the page tables.

... we should try to avoid holding locks across allocation /
freeing functions wherever possible.

As to where to place a respective check - I wonder if we wouldn't
be better off disallowing a majority of domctl-s (and perhaps
other operations) on dying domains. Thoughts?

Jan
Julien Grall Dec. 23, 2020, 2:01 p.m. UTC | #2
Hi Jan,

On 23/12/2020 13:48, Jan Beulich wrote:
> On 22.12.2020 16:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The pgtables.lock is protecting access to the page list pgtables.list.
>> However, iommu_free_pgtables() will not held it. I guess it was assumed
>> that page-tables cannot be allocated while the domain is dying.
>>
>> Unfortunately, there is no guarantee that iommu_map() will not be
>> called while a domain is dying (it looks like to be possible from
>> XEN_DOMCTL_memory_mapping).
> 
> I'd rather disallow any new allocations for a dying domain, not
> the least because ...

Patch #4 will disallow such allocation. However...

> 
>> So it would be possible to be concurrently
>> allocate memory and free the page-tables.
>>
>> Therefore, we need to held the lock when freeing the page tables.
> 
> ... we should try to avoid holding locks across allocation /
> freeing functions wherever possible.
>
> As to where to place a respective check - I wonder if we wouldn't
> be better off disallowing a majority of domctl-s (and perhaps
> other operations) on dying domains. Thoughts?

... this is still pretty racy because you need to guarantee that 
d->is_dying is seen by the other processors to prevent allocation.

As to whether we can forbid most of the domctl-s, I would agree this is 
a good move. But this doesn't remove the underlying problem here.

We are hoping that a top-level function will protect us against the 
race. Given that the IOMMU code is quite deep in the call stack, this is 
pretty hard to guarantee against future changes.

So I still think we need a way to mitigate the issue.

Cheers,
Jan Beulich Dec. 23, 2020, 2:16 p.m. UTC | #3
On 23.12.2020 15:01, Julien Grall wrote:
> Hi Jan,
> 
> On 23/12/2020 13:48, Jan Beulich wrote:
>> On 22.12.2020 16:43, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The pgtables.lock is protecting access to the page list pgtables.list.
>>> However, iommu_free_pgtables() will not held it. I guess it was assumed
>>> that page-tables cannot be allocated while the domain is dying.
>>>
>>> Unfortunately, there is no guarantee that iommu_map() will not be
>>> called while a domain is dying (it looks like to be possible from
>>> XEN_DOMCTL_memory_mapping).
>>
>> I'd rather disallow any new allocations for a dying domain, not
>> the least because ...
> 
> Patch #4 will disallow such allocation. However...
> 
>>
>>> So it would be possible to be concurrently
>>> allocate memory and free the page-tables.
>>>
>>> Therefore, we need to held the lock when freeing the page tables.
>>
>> ... we should try to avoid holding locks across allocation /
>> freeing functions wherever possible.
>>
>> As to where to place a respective check - I wonder if we wouldn't
>> be better off disallowing a majority of domctl-s (and perhaps
>> other operations) on dying domains. Thoughts?
> 
> ... this is still pretty racy because you need to guarantee that 
> d->is_dying is seen by the other processors to prevent allocation.

The function freeing the page tables will need a spin_barrier()
or the like, similar to evtchn_destroy(). Aiui this will eliminate
all potential for races.
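
A rough sketch of what such a spin_barrier()-based variant could look
like, modeled on evtchn_destroy(). The barrier placement, and the
assumption that the allocation path checks d->is_dying while holding
pgtables.lock, are illustrative only and not code from this series:

int iommu_free_pgtables(struct domain *d)
{
    struct domain_iommu *hd = dom_iommu(d);
    struct page_info *pg;
    unsigned int done = 0;

    ASSERT(d->is_dying);

    /*
     * Wait for any CPU still holding (or spinning on) pgtables.lock to
     * drain.  Assuming the allocation path re-checks d->is_dying with the
     * lock held, no new page can be added to the list after this point,
     * so the list can be torn down without holding the lock.
     */
    spin_barrier(&hd->arch.pgtables.lock);

    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
    {
        free_domheap_page(pg);

        /* Preemption check, as in the existing function. */
        if ( !(++done & 0xff) && general_preempt_check() )
            return -ERESTART;
    }

    return 0;
}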

Jan
Julien Grall Jan. 14, 2021, 7:19 p.m. UTC | #4
On 23/12/2020 14:01, Julien Grall wrote:
> Hi Jan,
> 
> On 23/12/2020 13:48, Jan Beulich wrote:
>> On 22.12.2020 16:43, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The pgtables.lock is protecting access to the page list pgtables.list.
>>> However, iommu_free_pgtables() will not held it. I guess it was assumed
>>> that page-tables cannot be allocated while the domain is dying.
>>>
>>> Unfortunately, there is no guarantee that iommu_map() will not be
>>> called while a domain is dying (it looks like to be possible from
>>> XEN_DOMCTL_memory_mapping).
>>
>> I'd rather disallow any new allocations for a dying domain, not
>> the least because ...
> 
> Patch #4 will disallow such allocation. However...

It turns out that I can't disallow it because there is at least one call 
to iommu_map() while an HVM domain is being destroyed:

     (XEN)    [<ffff82d04026e399>] R iommu_map+0xf2/0x171
     (XEN)    [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63
     (XEN)    [<ffff82d040302a42>] F arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730
     (XEN)    [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128
     (XEN)    [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0
     (XEN)    [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe
     (XEN)    [<ffff82d0402ba0a2>] F arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d
     (XEN)    [<ffff82d0402ba0f9>] F arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f
     (XEN)    [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b
     (XEN)    [<ffff82d0402afc15>] F hvm_domain_relinquish_resources+0x3e/0x92
     (XEN)    [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372
     (XEN)    [<ffff82d040205dd4>] F domain_kill+0xc7/0x150
     (XEN)    [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09
     (XEN)    [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f
     (XEN)    [<ffff82d040391432>] F lstar_enter+0x112/0x120

This one resulted in a domain crash rather than a clean destruction.

I would still like to disallow page-table allocation, so I am thinking of 
ignoring any request in iommu_map() if the domain is dying.
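
A minimal sketch of what ignoring such requests might look like (the
function signature and the placement of the check are assumptions, and
the mapping loop itself is elided; this is not the actual patch):

int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
              unsigned int page_order, unsigned int flags,
              unsigned int *flush_flags)
{
    if ( !is_iommu_enabled(d) )
        return 0;

    /*
     * The domain is being torn down and its page tables are about to be
     * freed; drop the request rather than allocating new ones.
     */
    if ( unlikely(d->is_dying) )
        return 0;

    /* ... existing per-page mapping loop ... */
}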

Cheers,
Jan Beulich Jan. 15, 2021, 11:06 a.m. UTC | #5
On 14.01.2021 20:19, Julien Grall wrote:
> 
> 
> On 23/12/2020 14:01, Julien Grall wrote:
>> Hi Jan,
>>
>> On 23/12/2020 13:48, Jan Beulich wrote:
>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The pgtables.lock is protecting access to the page list pgtables.list.
>>>> However, iommu_free_pgtables() will not held it. I guess it was assumed
>>>> that page-tables cannot be allocated while the domain is dying.
>>>>
>>>> Unfortunately, there is no guarantee that iommu_map() will not be
>>>> called while a domain is dying (it looks like to be possible from
>>>> XEN_DOMCTL_memory_mapping).
>>>
>>> I'd rather disallow any new allocations for a dying domain, not
>>> the least because ...
>>
>> Patch #4 will disallow such allocation. However...
> 
> It turns out that I can't disallow it because there is at least one call 
> of iommu_map() while an HVM domain is destroyed:
> 
>      (XEN)    [<ffff82d04026e399>] R iommu_map+0xf2/0x171
>      (XEN)    [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63
>      (XEN)    [<ffff82d040302a42>] F 
> arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730
>      (XEN)    [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128
>      (XEN)    [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0
>      (XEN)    [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe
>      (XEN)    [<ffff82d0402ba0a2>] F 
> arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d
>      (XEN)    [<ffff82d0402ba0f9>] F 
> arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f
>      (XEN)    [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b
>      (XEN)    [<ffff82d0402afc15>] F 
> hvm_domain_relinquish_resources+0x3e/0x92
>      (XEN)    [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372
>      (XEN)    [<ffff82d040205dd4>] F domain_kill+0xc7/0x150
>      (XEN)    [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09
>      (XEN)    [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f
>      (XEN)    [<ffff82d040391432>] F lstar_enter+0x112/0x120
> 
> This one resulted to a domain crash rather than a clean destruction.

A domain crash despite the domain already getting cleaned up is
something we may at least want to consider doing better: There
already is a !d->is_shutting_down conditional printk() there.
What would people think about avoiding the domain_crash() call
in similar ways? (It could even be considered to make
domain_crash() itself behave like this, but such a step may make
it necessary to first audit all use sites.)
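
For illustration, the call-site variant would amount to something like
this (hypothetical call site, not the actual EPT error path):

    /* Don't pile a crash on top of a domain that is already going away. */
    if ( !d->is_shutting_down )
        domain_crash(d);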

> I would still like to disallow page-table allocation, so I am think to 
> ignore any request in iommu_map() if the domain is dying.

Ignoring requests there seems fragile to me. Paul - what are your
thoughts about bailing early from hvm_add_ioreq_gfn() when the
domain is dying?

Jan
Paul Durrant Jan. 15, 2021, 3:18 p.m. UTC | #6
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 15 January 2021 11:07
> To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org>
> Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org; Andrew
> Cooper <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock
> held
> 
> On 14.01.2021 20:19, Julien Grall wrote:
> >
> >
> > On 23/12/2020 14:01, Julien Grall wrote:
> >> Hi Jan,
> >>
> >> On 23/12/2020 13:48, Jan Beulich wrote:
> >>> On 22.12.2020 16:43, Julien Grall wrote:
> >>>> From: Julien Grall <jgrall@amazon.com>
> >>>>
> >>>> The pgtables.lock is protecting access to the page list pgtables.list.
> >>>> However, iommu_free_pgtables() will not held it. I guess it was assumed
> >>>> that page-tables cannot be allocated while the domain is dying.
> >>>>
> >>>> Unfortunately, there is no guarantee that iommu_map() will not be
> >>>> called while a domain is dying (it looks like to be possible from
> >>>> XEN_DOMCTL_memory_mapping).
> >>>
> >>> I'd rather disallow any new allocations for a dying domain, not
> >>> the least because ...
> >>
> >> Patch #4 will disallow such allocation. However...
> >
> > It turns out that I can't disallow it because there is at least one call
> > of iommu_map() while an HVM domain is destroyed:
> >
> >      (XEN)    [<ffff82d04026e399>] R iommu_map+0xf2/0x171
> >      (XEN)    [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63
> >      (XEN)    [<ffff82d040302a42>] F
> > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730
> >      (XEN)    [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128
> >      (XEN)    [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0
> >      (XEN)    [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe
> >      (XEN)    [<ffff82d0402ba0a2>] F
> > arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d
> >      (XEN)    [<ffff82d0402ba0f9>] F
> > arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f
> >      (XEN)    [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b
> >      (XEN)    [<ffff82d0402afc15>] F
> > hvm_domain_relinquish_resources+0x3e/0x92
> >      (XEN)    [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372
> >      (XEN)    [<ffff82d040205dd4>] F domain_kill+0xc7/0x150
> >      (XEN)    [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09
> >      (XEN)    [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f
> >      (XEN)    [<ffff82d040391432>] F lstar_enter+0x112/0x120
> >
> > This one resulted to a domain crash rather than a clean destruction.
> 
> A domain crash despite the domain already getting cleaned up is
> something we may at least want to consider doing better: There
> already is a !d->is_shutting_down conditional printk() there.
> What would people think about avoiding the domain_crash() call
> in similar ways? (It could even be considered to make
> domain_crash() itself behave like this, but such a step may make
> it necessary to first audit all use sites.)
> 
> > I would still like to disallow page-table allocation, so I am think to
> > ignore any request in iommu_map() if the domain is dying.
> 
> Ignoring requests there seems fragile to me. Paul - what are your
> thoughts about bailing early from hvm_add_ioreq_gfn() when the
> domain is dying?
> 

I think that's ok. Really, the only reason for putting the pages back in the P2M is to allow migration to work, so if the domain is dying then we don't really care, do we?
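
A rough sketch of such a bail-out (the function body is elided, and the
use of s->target for the guest domain is an assumption about the ioreq
server layout rather than a quote of the actual code):

static void hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
{
    struct domain *d = s->target;

    /*
     * The page only goes back into the P2M to keep migration working;
     * a dying domain will never be migrated, so there is nothing to do.
     */
    if ( d->is_dying )
        return;

    /* ... existing re-insertion of the ioreq page into the P2M ... */
}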

  Paul

> Jan
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cea1032b3d02..779dbb5b98ba 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -267,13 +267,18 @@  int iommu_free_pgtables(struct domain *d)
     struct page_info *pg;
     unsigned int done = 0;
 
+    spin_lock(&hd->arch.pgtables.lock);
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
 
         if ( !(++done & 0xff) && general_preempt_check() )
+        {
+            spin_unlock(&hd->arch.pgtables.lock);
             return -ERESTART;
+        }
     }
+    spin_unlock(&hd->arch.pgtables.lock);
 
     return 0;
 }