diff mbox series

[for-4.15,3/4,RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables

Message ID 20201222154338.9459-4-julien@xen.org (mailing list archive)
State Superseded
Headers show
Series xen/iommu: Collection of bug fixes for IOMMU teadorwn | expand

Commit Message

Julien Grall Dec. 22, 2020, 3:43 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

The new per-domain IOMMU page-table allocator will now free the
page-tables when domain's resources are relinquished. However, the root
page-table (i.e. hd->arch.pg_maddr) will not be cleared.

Xen may access the IOMMU page-tables afterwards at least in the case of
PV domain:

(XEN) Xen call trace:
(XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
(XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
(XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
(XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
(XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
(XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
(XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
(XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
(XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
(XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
(XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
(XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
(XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
(XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
(XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
(XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
(XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120

This will result to a use after-free and possibly an host crash or
memory corruption.

Freeing the page-tables further down in domain_relinquish_resources()
would not work because pages may not be released until later if another
domain hold a reference on them.

Once all the PCI devices have been de-assigned, it is actually pointless
to access modify the IOMMU page-tables. So we can simply clear the root
page-table address.

Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>

---

This is an RFC because it would break AMD IOMMU driver. One option would
be to move the call to the teardown callback earlier on. Any opinions?
---
 xen/drivers/passthrough/x86/iommu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Jan Beulich Dec. 23, 2020, 2:12 p.m. UTC | #1
On 22.12.2020 16:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The new per-domain IOMMU page-table allocator will now free the
> page-tables when domain's resources are relinquished. However, the root
> page-table (i.e. hd->arch.pg_maddr) will not be cleared.
> 
> Xen may access the IOMMU page-tables afterwards at least in the case of
> PV domain:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
> (XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
> (XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
> (XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
> (XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
> (XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
> (XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
> (XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
> (XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
> (XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
> (XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
> (XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
> (XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
> (XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
> (XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
> (XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
> (XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120
> 
> This will result to a use after-free and possibly an host crash or
> memory corruption.
> 
> Freeing the page-tables further down in domain_relinquish_resources()
> would not work because pages may not be released until later if another
> domain hold a reference on them.
> 
> Once all the PCI devices have been de-assigned, it is actually pointless
> to access modify the IOMMU page-tables. So we can simply clear the root
> page-table address.
> 
> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
> 
> This is an RFC because it would break AMD IOMMU driver. One option would
> be to move the call to the teardown callback earlier on. Any opinions?

We already have

static void amd_iommu_domain_destroy(struct domain *d)
{
    dom_iommu(d)->arch.amd.root_table = NULL;
}

and this function is AMD's teardown handler. Hence I suppose
doing the same for VT-d would be quite reasonable. And really
VT-d's iommu_domain_teardown() also already has

    hd->arch.vtd.pgd_maddr = 0;

I guess what's missing is prevention of the root table
getting re-setup. This, I guess, would be helped by the
previously suggested preventing of any further modifications
to the p2m and alike for dying domains. Note how e.g. the
handling of XEN_DOMCTL_assign_device already includes a
"dying" check.

Jan
Julien Grall Dec. 23, 2020, 2:56 p.m. UTC | #2
Hi Jan,

On 23/12/2020 14:12, Jan Beulich wrote:
> On 22.12.2020 16:43, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new per-domain IOMMU page-table allocator will now free the
>> page-tables when domain's resources are relinquished. However, the root
>> page-table (i.e. hd->arch.pg_maddr) will not be cleared.
>>
>> Xen may access the IOMMU page-tables afterwards at least in the case of
>> PV domain:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
>> (XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
>> (XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
>> (XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
>> (XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
>> (XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
>> (XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
>> (XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
>> (XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
>> (XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
>> (XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
>> (XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
>> (XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
>> (XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
>> (XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
>> (XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
>> (XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
>> (XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
>> (XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
>> (XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120
>>
>> This will result to a use after-free and possibly an host crash or
>> memory corruption.
>>
>> Freeing the page-tables further down in domain_relinquish_resources()
>> would not work because pages may not be released until later if another
>> domain hold a reference on them.
>>
>> Once all the PCI devices have been de-assigned, it is actually pointless
>> to access modify the IOMMU page-tables. So we can simply clear the root
>> page-table address.
>>
>> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> ---
>>
>> This is an RFC because it would break AMD IOMMU driver. One option would
>> be to move the call to the teardown callback earlier on. Any opinions?
> 
> We already have
> 
> static void amd_iommu_domain_destroy(struct domain *d)
> {
>      dom_iommu(d)->arch.amd.root_table = NULL;
> }
> 
> and this function is AMD's teardown handler. Hence I suppose
> doing the same for VT-d would be quite reasonable. And really
> VT-d's iommu_domain_teardown() also already has
> 
>      hd->arch.vtd.pgd_maddr = 0;

Let me have a look if that works.

> 
> I guess what's missing is prevention of the root table
> getting re-setup. 

This is taken care in the follow-up patch by forbidding page-table 
allocation. I can mention it in the commit message.


> This, I guess, would be helped by the
> previously suggested preventing of any further modifications
> to the p2m and alike for dying domains. Note how e.g. the
> handling of XEN_DOMCTL_assign_device already includes a
> "dying" check.
> 
> Jan
>
Jan Beulich Dec. 23, 2020, 3 p.m. UTC | #3
On 23.12.2020 15:56, Julien Grall wrote:
> On 23/12/2020 14:12, Jan Beulich wrote:
>> On 22.12.2020 16:43, Julien Grall wrote:
>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>> be to move the call to the teardown callback earlier on. Any opinions?
>>
>> We already have
>>
>> static void amd_iommu_domain_destroy(struct domain *d)
>> {
>>      dom_iommu(d)->arch.amd.root_table = NULL;
>> }
>>
>> and this function is AMD's teardown handler. Hence I suppose
>> doing the same for VT-d would be quite reasonable. And really
>> VT-d's iommu_domain_teardown() also already has
>>
>>      hd->arch.vtd.pgd_maddr = 0;
> 
> Let me have a look if that works.
> 
>>
>> I guess what's missing is prevention of the root table
>> getting re-setup. 
> 
> This is taken care in the follow-up patch by forbidding page-table 
> allocation. I can mention it in the commit message.

My expectation is that with that subsequent change the change here
(or any variant of it) would become unnecessary.

Jan
Julien Grall Dec. 23, 2020, 3:16 p.m. UTC | #4
Hi Jan,

On 23/12/2020 15:00, Jan Beulich wrote:
> On 23.12.2020 15:56, Julien Grall wrote:
>> On 23/12/2020 14:12, Jan Beulich wrote:
>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>>
>>> We already have
>>>
>>> static void amd_iommu_domain_destroy(struct domain *d)
>>> {
>>>       dom_iommu(d)->arch.amd.root_table = NULL;
>>> }
>>>
>>> and this function is AMD's teardown handler. Hence I suppose
>>> doing the same for VT-d would be quite reasonable. And really
>>> VT-d's iommu_domain_teardown() also already has
>>>
>>>       hd->arch.vtd.pgd_maddr = 0;
>>
>> Let me have a look if that works.
>>
>>>
>>> I guess what's missing is prevention of the root table
>>> getting re-setup.
>>
>> This is taken care in the follow-up patch by forbidding page-table
>> allocation. I can mention it in the commit message.
> 
> My expectation is that with that subsequent change the change here
> (or any variant of it) would become unnecessary.

I am not be sure. iommu_unmap() would still get called from put_page(). 
Are you suggesting to gate the code if d->is_dying as well?

Even if this patch is deemed to be unecessary to fix the issue.
This issue was quite hard to chase/reproduce.

I think it would still be good to harden the code by zeroing 
hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the 
pointer was still "valid".

Cheers,
Jan Beulich Dec. 23, 2020, 4:11 p.m. UTC | #5
On 23.12.2020 16:16, Julien Grall wrote:
> On 23/12/2020 15:00, Jan Beulich wrote:
>> On 23.12.2020 15:56, Julien Grall wrote:
>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>>>
>>>> We already have
>>>>
>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>> {
>>>>       dom_iommu(d)->arch.amd.root_table = NULL;
>>>> }
>>>>
>>>> and this function is AMD's teardown handler. Hence I suppose
>>>> doing the same for VT-d would be quite reasonable. And really
>>>> VT-d's iommu_domain_teardown() also already has
>>>>
>>>>       hd->arch.vtd.pgd_maddr = 0;
>>>
>>> Let me have a look if that works.
>>>
>>>>
>>>> I guess what's missing is prevention of the root table
>>>> getting re-setup.
>>>
>>> This is taken care in the follow-up patch by forbidding page-table
>>> allocation. I can mention it in the commit message.
>>
>> My expectation is that with that subsequent change the change here
>> (or any variant of it) would become unnecessary.
> 
> I am not be sure. iommu_unmap() would still get called from put_page(). 
> Are you suggesting to gate the code if d->is_dying as well?

Unmap shouldn't be allocating any memory right now, as in
non-shared-page-table mode we don't install any superpages
(unless I misremember).

> Even if this patch is deemed to be unecessary to fix the issue.
> This issue was quite hard to chase/reproduce.
> 
> I think it would still be good to harden the code by zeroing 
> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the 
> pointer was still "valid".

But my point was that this zeroing already happens. What I
suspect is that it gets re-populated after it was zeroed,
because of page table manipulation that shouldn't be
occurring anymore for a dying domain.

Jan
Julien Grall Dec. 23, 2020, 4:16 p.m. UTC | #6
Hi,

On 23/12/2020 16:11, Jan Beulich wrote:
> On 23.12.2020 16:16, Julien Grall wrote:
>> On 23/12/2020 15:00, Jan Beulich wrote:
>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>>>>
>>>>> We already have
>>>>>
>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>> {
>>>>>        dom_iommu(d)->arch.amd.root_table = NULL;
>>>>> }
>>>>>
>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>
>>>>>        hd->arch.vtd.pgd_maddr = 0;
>>>>
>>>> Let me have a look if that works.
>>>>
>>>>>
>>>>> I guess what's missing is prevention of the root table
>>>>> getting re-setup.
>>>>
>>>> This is taken care in the follow-up patch by forbidding page-table
>>>> allocation. I can mention it in the commit message.
>>>
>>> My expectation is that with that subsequent change the change here
>>> (or any variant of it) would become unnecessary.
>>
>> I am not be sure. iommu_unmap() would still get called from put_page().
>> Are you suggesting to gate the code if d->is_dying as well?
> 
> Unmap shouldn't be allocating any memory right now, as in
> non-shared-page-table mode we don't install any superpages
> (unless I misremember).

It doesn't allocate memory, but it will try to access the IOMMU 
page-tables (see more below).

> 
>> Even if this patch is deemed to be unecessary to fix the issue.
>> This issue was quite hard to chase/reproduce.
>>
>> I think it would still be good to harden the code by zeroing
>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>> pointer was still "valid".
> 
> But my point was that this zeroing already happens. 
> What I
> suspect is that it gets re-populated after it was zeroed,
> because of page table manipulation that shouldn't be
> occurring anymore for a dying domain.

AFAICT, the zeroing is happening in ->teardown() helper.

It is only called when the domain is fully destroyed (see call in 
arch_domain_destroy()). This will happen much after relinquishing the code.

Could you clarify why you think it is already zeroed and by who?

Cheers,
Jan Beulich Dec. 23, 2020, 4:24 p.m. UTC | #7
On 23.12.2020 17:16, Julien Grall wrote:
> On 23/12/2020 16:11, Jan Beulich wrote:
>> On 23.12.2020 16:16, Julien Grall wrote:
>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?

Please note this (in your original submission). I simply ...

>>>>>> We already have
>>>>>>
>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>> {
>>>>>>        dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>> }
>>>>>>
>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>
>>>>>>        hd->arch.vtd.pgd_maddr = 0;
>>>>>
>>>>> Let me have a look if that works.
>>>>>
>>>>>>
>>>>>> I guess what's missing is prevention of the root table
>>>>>> getting re-setup.
>>>>>
>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>> allocation. I can mention it in the commit message.
>>>>
>>>> My expectation is that with that subsequent change the change here
>>>> (or any variant of it) would become unnecessary.
>>>
>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>> Are you suggesting to gate the code if d->is_dying as well?
>>
>> Unmap shouldn't be allocating any memory right now, as in
>> non-shared-page-table mode we don't install any superpages
>> (unless I misremember).
> 
> It doesn't allocate memory, but it will try to access the IOMMU 
> page-tables (see more below).
> 
>>
>>> Even if this patch is deemed to be unecessary to fix the issue.
>>> This issue was quite hard to chase/reproduce.
>>>
>>> I think it would still be good to harden the code by zeroing
>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>> pointer was still "valid".
>>
>> But my point was that this zeroing already happens. 
>> What I
>> suspect is that it gets re-populated after it was zeroed,
>> because of page table manipulation that shouldn't be
>> occurring anymore for a dying domain.
> 
> AFAICT, the zeroing is happening in ->teardown() helper.
> 
> It is only called when the domain is fully destroyed (see call in 
> arch_domain_destroy()). This will happen much after relinquishing the code.
> 
> Could you clarify why you think it is already zeroed and by who?

... trusted you on what you stated there. But perhaps I somehow
misunderstood that sentence to mean you want to put your addition
into the teardown functions, when apparently you meant to invoke
them earlier in the process. Without clearly identifying why this
would be a safe thing to do, I couldn't imagine that's what you
suggest as alternative. This is because the interdependencies of
the IOMMU code are pretty hard to follow at times, and hence any
such re-ordering has a fair risk of breaking something elsewhere.

Jan
Julien Grall Dec. 23, 2020, 4:29 p.m. UTC | #8
On 23/12/2020 16:24, Jan Beulich wrote:
> On 23.12.2020 17:16, Julien Grall wrote:
>> On 23/12/2020 16:11, Jan Beulich wrote:
>>> On 23.12.2020 16:16, Julien Grall wrote:
>>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
> 
> Please note this (in your original submission). I simply ...
> 
>>>>>>> We already have
>>>>>>>
>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>>> {
>>>>>>>         dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>>> }
>>>>>>>
>>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>>
>>>>>>>         hd->arch.vtd.pgd_maddr = 0;
>>>>>>
>>>>>> Let me have a look if that works.
>>>>>>
>>>>>>>
>>>>>>> I guess what's missing is prevention of the root table
>>>>>>> getting re-setup.
>>>>>>
>>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>>> allocation. I can mention it in the commit message.
>>>>>
>>>>> My expectation is that with that subsequent change the change here
>>>>> (or any variant of it) would become unnecessary.
>>>>
>>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>>> Are you suggesting to gate the code if d->is_dying as well?
>>>
>>> Unmap shouldn't be allocating any memory right now, as in
>>> non-shared-page-table mode we don't install any superpages
>>> (unless I misremember).
>>
>> It doesn't allocate memory, but it will try to access the IOMMU
>> page-tables (see more below).
>>
>>>
>>>> Even if this patch is deemed to be unecessary to fix the issue.
>>>> This issue was quite hard to chase/reproduce.
>>>>
>>>> I think it would still be good to harden the code by zeroing
>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>>> pointer was still "valid".
>>>
>>> But my point was that this zeroing already happens.
>>> What I
>>> suspect is that it gets re-populated after it was zeroed,
>>> because of page table manipulation that shouldn't be
>>> occurring anymore for a dying domain.
>>
>> AFAICT, the zeroing is happening in ->teardown() helper.
>>
>> It is only called when the domain is fully destroyed (see call in
>> arch_domain_destroy()). This will happen much after relinquishing the code.
>>
>> Could you clarify why you think it is already zeroed and by who?
> 
> ... trusted you on what you stated there. But perhaps I somehow
> misunderstood that sentence to mean you want to put your addition
> into the teardown functions, when apparently you meant to invoke
> them earlier in the process. Without clearly identifying why this
> would be a safe thing to do, I couldn't imagine that's what you
> suggest as alternative. 

This was a wording issue. I meant moving ->teardown() before (or calling 
from) iommu_free_pgtables().

Shall I introduce a new callback then?

> This is because the interdependencies of
> the IOMMU code are pretty hard to follow at times, and hence any
> such re-ordering has a fair risk of breaking something elsewhere.

Right, this is another reason to try to get most of my fix 
self-contained rather relying on top-layer call to protect against a 
domain dying.

Cheers,
Jan Beulich Dec. 23, 2020, 4:46 p.m. UTC | #9
On 23.12.2020 17:29, Julien Grall wrote:
> On 23/12/2020 16:24, Jan Beulich wrote:
>> On 23.12.2020 17:16, Julien Grall wrote:
>>> On 23/12/2020 16:11, Jan Beulich wrote:
>>>> On 23.12.2020 16:16, Julien Grall wrote:
>>>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>
>> Please note this (in your original submission). I simply ...
>>
>>>>>>>> We already have
>>>>>>>>
>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>>>> {
>>>>>>>>         dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>>>> }
>>>>>>>>
>>>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>>>
>>>>>>>>         hd->arch.vtd.pgd_maddr = 0;
>>>>>>>
>>>>>>> Let me have a look if that works.
>>>>>>>
>>>>>>>>
>>>>>>>> I guess what's missing is prevention of the root table
>>>>>>>> getting re-setup.
>>>>>>>
>>>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>>>> allocation. I can mention it in the commit message.
>>>>>>
>>>>>> My expectation is that with that subsequent change the change here
>>>>>> (or any variant of it) would become unnecessary.
>>>>>
>>>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>>>> Are you suggesting to gate the code if d->is_dying as well?
>>>>
>>>> Unmap shouldn't be allocating any memory right now, as in
>>>> non-shared-page-table mode we don't install any superpages
>>>> (unless I misremember).
>>>
>>> It doesn't allocate memory, but it will try to access the IOMMU
>>> page-tables (see more below).
>>>
>>>>
>>>>> Even if this patch is deemed to be unecessary to fix the issue.
>>>>> This issue was quite hard to chase/reproduce.
>>>>>
>>>>> I think it would still be good to harden the code by zeroing
>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>>>> pointer was still "valid".
>>>>
>>>> But my point was that this zeroing already happens.
>>>> What I
>>>> suspect is that it gets re-populated after it was zeroed,
>>>> because of page table manipulation that shouldn't be
>>>> occurring anymore for a dying domain.
>>>
>>> AFAICT, the zeroing is happening in ->teardown() helper.
>>>
>>> It is only called when the domain is fully destroyed (see call in
>>> arch_domain_destroy()). This will happen much after relinquishing the code.
>>>
>>> Could you clarify why you think it is already zeroed and by who?
>>
>> ... trusted you on what you stated there. But perhaps I somehow
>> misunderstood that sentence to mean you want to put your addition
>> into the teardown functions, when apparently you meant to invoke
>> them earlier in the process. Without clearly identifying why this
>> would be a safe thing to do, I couldn't imagine that's what you
>> suggest as alternative. 
> 
> This was a wording issue. I meant moving ->teardown() before (or calling 
> from) iommu_free_pgtables().
> 
> Shall I introduce a new callback then?

Earlier zeroing won't help unless you prevent re-population, or
unless you make the code capable of telling "still zero" from
"already zero". But I have to admit I'd like to also have Paul's
opinion on the matter.

Jan

>> This is because the interdependencies of
>> the IOMMU code are pretty hard to follow at times, and hence any
>> such re-ordering has a fair risk of breaking something elsewhere.
> 
> Right, this is another reason to try to get most of my fix 
> self-contained rather relying on top-layer call to protect against a 
> domain dying.
> 
> Cheers,
>
Julien Grall Dec. 23, 2020, 4:54 p.m. UTC | #10
On 23/12/2020 16:46, Jan Beulich wrote:
> On 23.12.2020 17:29, Julien Grall wrote:
>> On 23/12/2020 16:24, Jan Beulich wrote:
>>> On 23.12.2020 17:16, Julien Grall wrote:
>>>> On 23/12/2020 16:11, Jan Beulich wrote:
>>>>> On 23.12.2020 16:16, Julien Grall wrote:
>>>>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>>
>>> Please note this (in your original submission). I simply ...
>>>
>>>>>>>>> We already have
>>>>>>>>>
>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>>>>> {
>>>>>>>>>          dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>>>>
>>>>>>>>>          hd->arch.vtd.pgd_maddr = 0;
>>>>>>>>
>>>>>>>> Let me have a look if that works.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I guess what's missing is prevention of the root table
>>>>>>>>> getting re-setup.
>>>>>>>>
>>>>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>>>>> allocation. I can mention it in the commit message.
>>>>>>>
>>>>>>> My expectation is that with that subsequent change the change here
>>>>>>> (or any variant of it) would become unnecessary.
>>>>>>
>>>>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>>>>> Are you suggesting to gate the code if d->is_dying as well?
>>>>>
>>>>> Unmap shouldn't be allocating any memory right now, as in
>>>>> non-shared-page-table mode we don't install any superpages
>>>>> (unless I misremember).
>>>>
>>>> It doesn't allocate memory, but it will try to access the IOMMU
>>>> page-tables (see more below).
>>>>
>>>>>
>>>>>> Even if this patch is deemed to be unecessary to fix the issue.
>>>>>> This issue was quite hard to chase/reproduce.
>>>>>>
>>>>>> I think it would still be good to harden the code by zeroing
>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>>>>> pointer was still "valid".
>>>>>
>>>>> But my point was that this zeroing already happens.
>>>>> What I
>>>>> suspect is that it gets re-populated after it was zeroed,
>>>>> because of page table manipulation that shouldn't be
>>>>> occurring anymore for a dying domain.
>>>>
>>>> AFAICT, the zeroing is happening in ->teardown() helper.
>>>>
>>>> It is only called when the domain is fully destroyed (see call in
>>>> arch_domain_destroy()). This will happen much after relinquishing the code.
>>>>
>>>> Could you clarify why you think it is already zeroed and by who?
>>>
>>> ... trusted you on what you stated there. But perhaps I somehow
>>> misunderstood that sentence to mean you want to put your addition
>>> into the teardown functions, when apparently you meant to invoke
>>> them earlier in the process. Without clearly identifying why this
>>> would be a safe thing to do, I couldn't imagine that's what you
>>> suggest as alternative.
>>
>> This was a wording issue. I meant moving ->teardown() before (or calling
>> from) iommu_free_pgtables().
>>
>> Shall I introduce a new callback then?
> 
> Earlier zeroing won't help unless you prevent re-population, or
> unless you make the code capable of telling "still zero" from
> "already zero". But I have to admit I'd like to also have Paul's
> opinion on the matter.

Patch #4 is meant to prevent that with the d->is_dying check in the 
IOMMU page-table allocation.

Do you think this is not enough?

Cheers,
Jan Beulich Dec. 23, 2020, 5:02 p.m. UTC | #11
On 23.12.2020 17:54, Julien Grall wrote:
> 
> 
> On 23/12/2020 16:46, Jan Beulich wrote:
>> On 23.12.2020 17:29, Julien Grall wrote:
>>> On 23/12/2020 16:24, Jan Beulich wrote:
>>>> On 23.12.2020 17:16, Julien Grall wrote:
>>>>> On 23/12/2020 16:11, Jan Beulich wrote:
>>>>>> On 23.12.2020 16:16, Julien Grall wrote:
>>>>>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>>>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>>>
>>>> Please note this (in your original submission). I simply ...
>>>>
>>>>>>>>>> We already have
>>>>>>>>>>
>>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>>>>>> {
>>>>>>>>>>          dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>>>>>
>>>>>>>>>>          hd->arch.vtd.pgd_maddr = 0;
>>>>>>>>>
>>>>>>>>> Let me have a look if that works.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I guess what's missing is prevention of the root table
>>>>>>>>>> getting re-setup.
>>>>>>>>>
>>>>>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>>>>>> allocation. I can mention it in the commit message.
>>>>>>>>
>>>>>>>> My expectation is that with that subsequent change the change here
>>>>>>>> (or any variant of it) would become unnecessary.
>>>>>>>
>>>>>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>>>>>> Are you suggesting to gate the code if d->is_dying as well?
>>>>>>
>>>>>> Unmap shouldn't be allocating any memory right now, as in
>>>>>> non-shared-page-table mode we don't install any superpages
>>>>>> (unless I misremember).
>>>>>
>>>>> It doesn't allocate memory, but it will try to access the IOMMU
>>>>> page-tables (see more below).
>>>>>
>>>>>>
>>>>>>> Even if this patch is deemed to be unecessary to fix the issue.
>>>>>>> This issue was quite hard to chase/reproduce.
>>>>>>>
>>>>>>> I think it would still be good to harden the code by zeroing
>>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>>>>>> pointer was still "valid".
>>>>>>
>>>>>> But my point was that this zeroing already happens.
>>>>>> What I
>>>>>> suspect is that it gets re-populated after it was zeroed,
>>>>>> because of page table manipulation that shouldn't be
>>>>>> occurring anymore for a dying domain.
>>>>>
>>>>> AFAICT, the zeroing is happening in ->teardown() helper.
>>>>>
>>>>> It is only called when the domain is fully destroyed (see call in
>>>>> arch_domain_destroy()). This will happen much after relinquishing the code.
>>>>>
>>>>> Could you clarify why you think it is already zeroed and by who?
>>>>
>>>> ... trusted you on what you stated there. But perhaps I somehow
>>>> misunderstood that sentence to mean you want to put your addition
>>>> into the teardown functions, when apparently you meant to invoke
>>>> them earlier in the process. Without clearly identifying why this
>>>> would be a safe thing to do, I couldn't imagine that's what you
>>>> suggest as alternative.
>>>
>>> This was a wording issue. I meant moving ->teardown() before (or calling
>>> from) iommu_free_pgtables().
>>>
>>> Shall I introduce a new callback then?
>>
>> Earlier zeroing won't help unless you prevent re-population, or
>> unless you make the code capable of telling "still zero" from
>> "already zero". But I have to admit I'd like to also have Paul's
>> opinion on the matter.
> 
> Patch #4 is meant to prevent that with the d->is_dying check in the 
> IOMMU page-table allocation.
> 
> Do you think this is not enough?

It probably is; I think that other patch would want to come first
then, or both be folded. Nevertheless I'm not fully convinced
putting the check there is the best course of action.

Jan
Julien Grall Dec. 23, 2020, 5:26 p.m. UTC | #12
Hi Jan,

On 23/12/2020 17:02, Jan Beulich wrote:
> On 23.12.2020 17:54, Julien Grall wrote:
>>
>>
>> On 23/12/2020 16:46, Jan Beulich wrote:
>>> On 23.12.2020 17:29, Julien Grall wrote:
>>>> On 23/12/2020 16:24, Jan Beulich wrote:
>>>>> On 23.12.2020 17:16, Julien Grall wrote:
>>>>>> On 23/12/2020 16:11, Jan Beulich wrote:
>>>>>>> On 23.12.2020 16:16, Julien Grall wrote:
>>>>>>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>>>>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
>>>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>>>>
>>>>> Please note this (in your original submission). I simply ...
>>>>>
>>>>>>>>>>> We already have
>>>>>>>>>>>
>>>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>>>>>>> {
>>>>>>>>>>>           dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>>>>>>
>>>>>>>>>>>           hd->arch.vtd.pgd_maddr = 0;
>>>>>>>>>>
>>>>>>>>>> Let me have a look if that works.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I guess what's missing is prevention of the root table
>>>>>>>>>>> getting re-setup.
>>>>>>>>>>
>>>>>>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>>>>>>> allocation. I can mention it in the commit message.
>>>>>>>>>
>>>>>>>>> My expectation is that with that subsequent change the change here
>>>>>>>>> (or any variant of it) would become unnecessary.
>>>>>>>>
>>>>>>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>>>>>>> Are you suggesting to gate the code if d->is_dying as well?
>>>>>>>
>>>>>>> Unmap shouldn't be allocating any memory right now, as in
>>>>>>> non-shared-page-table mode we don't install any superpages
>>>>>>> (unless I misremember).
>>>>>>
>>>>>> It doesn't allocate memory, but it will try to access the IOMMU
>>>>>> page-tables (see more below).
>>>>>>
>>>>>>>
>>>>>>>> Even if this patch is deemed to be unecessary to fix the issue.
>>>>>>>> This issue was quite hard to chase/reproduce.
>>>>>>>>
>>>>>>>> I think it would still be good to harden the code by zeroing
>>>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>>>>>>> pointer was still "valid".
>>>>>>>
>>>>>>> But my point was that this zeroing already happens.
>>>>>>> What I
>>>>>>> suspect is that it gets re-populated after it was zeroed,
>>>>>>> because of page table manipulation that shouldn't be
>>>>>>> occurring anymore for a dying domain.
>>>>>>
>>>>>> AFAICT, the zeroing is happening in ->teardown() helper.
>>>>>>
>>>>>> It is only called when the domain is fully destroyed (see call in
>>>>>> arch_domain_destroy()). This will happen much after relinquishing the code.
>>>>>>
>>>>>> Could you clarify why you think it is already zeroed and by who?
>>>>>
>>>>> ... trusted you on what you stated there. But perhaps I somehow
>>>>> misunderstood that sentence to mean you want to put your addition
>>>>> into the teardown functions, when apparently you meant to invoke
>>>>> them earlier in the process. Without clearly identifying why this
>>>>> would be a safe thing to do, I couldn't imagine that's what you
>>>>> suggest as alternative.
>>>>
>>>> This was a wording issue. I meant moving ->teardown() before (or calling
>>>> from) iommu_free_pgtables().
>>>>
>>>> Shall I introduce a new callback then?
>>>
>>> Earlier zeroing won't help unless you prevent re-population, or
>>> unless you make the code capable of telling "still zero" from
>>> "already zero". But I have to admit I'd like to also have Paul's
>>> opinion on the matter.
>>
>> Patch #4 is meant to prevent that with the d->is_dying check in the
>> IOMMU page-table allocation.
>>
>> Do you think this is not enough?
> 
> It probably is; I think that other patch would want to come first
> then, or both be folded. 

I would like to keep them separated. But I am happy to re-order them.

> Nevertheless I'm not fully convinced
> putting the check there is the best course of action.

As you pointed out in a previous e-mail, the IOMMU code is pretty hard 
to follow at times. The check in the allocator is quite simple, so I 
think it would be best to keep it.

It doesn't mean this should be the only change, but it will avoid a 
whole lot of potential issues if we missed any path that may touch the 
IOMMU page-tables while the domain is dying.

The other checks can just be shortcut to prevent extra work that will 
result to a failure.

I will wait for Paul's input before reworking the series.

Cheers,
Paul Durrant Jan. 4, 2021, 9:53 a.m. UTC | #13
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 23 December 2020 16:46
> To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org>
> Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the
> page-tables
> 
> On 23.12.2020 17:29, Julien Grall wrote:
> > On 23/12/2020 16:24, Jan Beulich wrote:
> >> On 23.12.2020 17:16, Julien Grall wrote:
> >>> On 23/12/2020 16:11, Jan Beulich wrote:
> >>>> On 23.12.2020 16:16, Julien Grall wrote:
> >>>>> On 23/12/2020 15:00, Jan Beulich wrote:
> >>>>>> On 23.12.2020 15:56, Julien Grall wrote:
> >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
> >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
> >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would
> >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
> >>
> >> Please note this (in your original submission). I simply ...
> >>
> >>>>>>>> We already have
> >>>>>>>>
> >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
> >>>>>>>> {
> >>>>>>>>         dom_iommu(d)->arch.amd.root_table = NULL;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> and this function is AMD's teardown handler. Hence I suppose
> >>>>>>>> doing the same for VT-d would be quite reasonable. And really
> >>>>>>>> VT-d's iommu_domain_teardown() also already has
> >>>>>>>>
> >>>>>>>>         hd->arch.vtd.pgd_maddr = 0;
> >>>>>>>
> >>>>>>> Let me have a look if that works.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> I guess what's missing is prevention of the root table
> >>>>>>>> getting re-setup.
> >>>>>>>
> >>>>>>> This is taken care in the follow-up patch by forbidding page-table
> >>>>>>> allocation. I can mention it in the commit message.
> >>>>>>
> >>>>>> My expectation is that with that subsequent change the change here
> >>>>>> (or any variant of it) would become unnecessary.
> >>>>>
> >>>>> I am not be sure. iommu_unmap() would still get called from put_page().
> >>>>> Are you suggesting to gate the code if d->is_dying as well?
> >>>>
> >>>> Unmap shouldn't be allocating any memory right now, as in
> >>>> non-shared-page-table mode we don't install any superpages
> >>>> (unless I misremember).
> >>>
> >>> It doesn't allocate memory, but it will try to access the IOMMU
> >>> page-tables (see more below).
> >>>
> >>>>
> >>>>> Even if this patch is deemed to be unecessary to fix the issue.
> >>>>> This issue was quite hard to chase/reproduce.
> >>>>>
> >>>>> I think it would still be good to harden the code by zeroing
> >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
> >>>>> pointer was still "valid".
> >>>>
> >>>> But my point was that this zeroing already happens.
> >>>> What I
> >>>> suspect is that it gets re-populated after it was zeroed,
> >>>> because of page table manipulation that shouldn't be
> >>>> occurring anymore for a dying domain.
> >>>
> >>> AFAICT, the zeroing is happening in ->teardown() helper.
> >>>
> >>> It is only called when the domain is fully destroyed (see call in
> >>> arch_domain_destroy()). This will happen much after relinquishing the code.
> >>>
> >>> Could you clarify why you think it is already zeroed and by who?
> >>
> >> ... trusted you on what you stated there. But perhaps I somehow
> >> misunderstood that sentence to mean you want to put your addition
> >> into the teardown functions, when apparently you meant to invoke
> >> them earlier in the process. Without clearly identifying why this
> >> would be a safe thing to do, I couldn't imagine that's what you
> >> suggest as alternative.
> >
> > This was a wording issue. I meant moving ->teardown() before (or calling
> > from) iommu_free_pgtables().
> >
> > Shall I introduce a new callback then?
> 
> Earlier zeroing won't help unless you prevent re-population, or
> unless you make the code capable of telling "still zero" from
> "already zero". But I have to admit I'd like to also have Paul's
> opinion on the matter.
> 

I have a pending series to dynamically unshare IOMMU mappings (to allow logdirty to be enabled on domains with currently-shared EPT) which gets rid of the lazy allocation of the root table and uses pgd_maddr == NULL as a test of whether EPT is shared or not. Therefore it would seem best, to fix this immediate problem, to also stop lazy allocation of pgd_maddr, and re-introduce a per-implementation free_pgtables() callback which zeroes pgd_maddr and then calls the common function.

  Paul
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 779dbb5b98ba..99a23177b3d2 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -267,6 +267,16 @@  int iommu_free_pgtables(struct domain *d)
     struct page_info *pg;
     unsigned int done = 0;
 
+    /*
+     * Pages will be moved to the free list in a bit. So we want to
+     * clear the root page-table to avoid any potential use after-free.
+     *
+     * XXX: This only code works for Intel vT-D.
+     */
+    spin_lock(&hd->arch.mapping_lock);
+    hd->arch.vtd.pgd_maddr = 0;
+    spin_unlock(&hd->arch.mapping_lock);
+
     spin_lock(&hd->arch.pgtables.lock);
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {