Message ID | 20201222154338.9459-4-julien@xen.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xen/iommu: Collection of bug fixes for IOMMU teadorwn | expand |
On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The new per-domain IOMMU page-table allocator will now free the > page-tables when domain's resources are relinquished. However, the root > page-table (i.e. hd->arch.pg_maddr) will not be cleared. > > Xen may access the IOMMU page-tables afterwards at least in the case of > PV domain: > > (XEN) Xen call trace: > (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 > (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 > (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 > (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 > (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 > (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 > (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b > (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf > (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf > (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d > (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 > (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 > (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a > (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 > (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 > (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f > (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 > > This will result to a use after-free and possibly an host crash or > memory corruption. > > Freeing the page-tables further down in domain_relinquish_resources() > would not work because pages may not be released until later if another > domain hold a reference on them. > > Once all the PCI devices have been de-assigned, it is actually pointless > to access modify the IOMMU page-tables. So we can simply clear the root > page-table address. > > Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") > Signed-off-by: Julien Grall <jgrall@amazon.com> > > --- > > This is an RFC because it would break AMD IOMMU driver. One option would > be to move the call to the teardown callback earlier on. Any opinions? We already have static void amd_iommu_domain_destroy(struct domain *d) { dom_iommu(d)->arch.amd.root_table = NULL; } and this function is AMD's teardown handler. Hence I suppose doing the same for VT-d would be quite reasonable. And really VT-d's iommu_domain_teardown() also already has hd->arch.vtd.pgd_maddr = 0; I guess what's missing is prevention of the root table getting re-setup. This, I guess, would be helped by the previously suggested preventing of any further modifications to the p2m and alike for dying domains. Note how e.g. the handling of XEN_DOMCTL_assign_device already includes a "dying" check. Jan
Hi Jan, On 23/12/2020 14:12, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The new per-domain IOMMU page-table allocator will now free the >> page-tables when domain's resources are relinquished. However, the root >> page-table (i.e. hd->arch.pg_maddr) will not be cleared. >> >> Xen may access the IOMMU page-tables afterwards at least in the case of >> PV domain: >> >> (XEN) Xen call trace: >> (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 >> (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 >> (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 >> (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 >> (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 >> (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 >> (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b >> (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf >> (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf >> (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d >> (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 >> (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 >> (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a >> (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 >> (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 >> (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f >> (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 >> >> This will result to a use after-free and possibly an host crash or >> memory corruption. >> >> Freeing the page-tables further down in domain_relinquish_resources() >> would not work because pages may not be released until later if another >> domain hold a reference on them. >> >> Once all the PCI devices have been de-assigned, it is actually pointless >> to access modify the IOMMU page-tables. So we can simply clear the root >> page-table address. >> >> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") >> Signed-off-by: Julien Grall <jgrall@amazon.com> >> >> --- >> >> This is an RFC because it would break AMD IOMMU driver. One option would >> be to move the call to the teardown callback earlier on. Any opinions? > > We already have > > static void amd_iommu_domain_destroy(struct domain *d) > { > dom_iommu(d)->arch.amd.root_table = NULL; > } > > and this function is AMD's teardown handler. Hence I suppose > doing the same for VT-d would be quite reasonable. And really > VT-d's iommu_domain_teardown() also already has > > hd->arch.vtd.pgd_maddr = 0; Let me have a look if that works. > > I guess what's missing is prevention of the root table > getting re-setup. This is taken care in the follow-up patch by forbidding page-table allocation. I can mention it in the commit message. > This, I guess, would be helped by the > previously suggested preventing of any further modifications > to the p2m and alike for dying domains. Note how e.g. the > handling of XEN_DOMCTL_assign_device already includes a > "dying" check. > > Jan >
On 23.12.2020 15:56, Julien Grall wrote: > On 23/12/2020 14:12, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> This is an RFC because it would break AMD IOMMU driver. One option would >>> be to move the call to the teardown callback earlier on. Any opinions? >> >> We already have >> >> static void amd_iommu_domain_destroy(struct domain *d) >> { >> dom_iommu(d)->arch.amd.root_table = NULL; >> } >> >> and this function is AMD's teardown handler. Hence I suppose >> doing the same for VT-d would be quite reasonable. And really >> VT-d's iommu_domain_teardown() also already has >> >> hd->arch.vtd.pgd_maddr = 0; > > Let me have a look if that works. > >> >> I guess what's missing is prevention of the root table >> getting re-setup. > > This is taken care in the follow-up patch by forbidding page-table > allocation. I can mention it in the commit message. My expectation is that with that subsequent change the change here (or any variant of it) would become unnecessary. Jan
Hi Jan, On 23/12/2020 15:00, Jan Beulich wrote: > On 23.12.2020 15:56, Julien Grall wrote: >> On 23/12/2020 14:12, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>> be to move the call to the teardown callback earlier on. Any opinions? >>> >>> We already have >>> >>> static void amd_iommu_domain_destroy(struct domain *d) >>> { >>> dom_iommu(d)->arch.amd.root_table = NULL; >>> } >>> >>> and this function is AMD's teardown handler. Hence I suppose >>> doing the same for VT-d would be quite reasonable. And really >>> VT-d's iommu_domain_teardown() also already has >>> >>> hd->arch.vtd.pgd_maddr = 0; >> >> Let me have a look if that works. >> >>> >>> I guess what's missing is prevention of the root table >>> getting re-setup. >> >> This is taken care in the follow-up patch by forbidding page-table >> allocation. I can mention it in the commit message. > > My expectation is that with that subsequent change the change here > (or any variant of it) would become unnecessary. I am not be sure. iommu_unmap() would still get called from put_page(). Are you suggesting to gate the code if d->is_dying as well? Even if this patch is deemed to be unecessary to fix the issue. This issue was quite hard to chase/reproduce. I think it would still be good to harden the code by zeroing hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the pointer was still "valid". Cheers,
On 23.12.2020 16:16, Julien Grall wrote: > On 23/12/2020 15:00, Jan Beulich wrote: >> On 23.12.2020 15:56, Julien Grall wrote: >>> On 23/12/2020 14:12, Jan Beulich wrote: >>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>> >>>> We already have >>>> >>>> static void amd_iommu_domain_destroy(struct domain *d) >>>> { >>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>> } >>>> >>>> and this function is AMD's teardown handler. Hence I suppose >>>> doing the same for VT-d would be quite reasonable. And really >>>> VT-d's iommu_domain_teardown() also already has >>>> >>>> hd->arch.vtd.pgd_maddr = 0; >>> >>> Let me have a look if that works. >>> >>>> >>>> I guess what's missing is prevention of the root table >>>> getting re-setup. >>> >>> This is taken care in the follow-up patch by forbidding page-table >>> allocation. I can mention it in the commit message. >> >> My expectation is that with that subsequent change the change here >> (or any variant of it) would become unnecessary. > > I am not be sure. iommu_unmap() would still get called from put_page(). > Are you suggesting to gate the code if d->is_dying as well? Unmap shouldn't be allocating any memory right now, as in non-shared-page-table mode we don't install any superpages (unless I misremember). > Even if this patch is deemed to be unecessary to fix the issue. > This issue was quite hard to chase/reproduce. > > I think it would still be good to harden the code by zeroing > hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the > pointer was still "valid". But my point was that this zeroing already happens. What I suspect is that it gets re-populated after it was zeroed, because of page table manipulation that shouldn't be occurring anymore for a dying domain. Jan
Hi, On 23/12/2020 16:11, Jan Beulich wrote: > On 23.12.2020 16:16, Julien Grall wrote: >> On 23/12/2020 15:00, Jan Beulich wrote: >>> On 23.12.2020 15:56, Julien Grall wrote: >>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>>> >>>>> We already have >>>>> >>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>> { >>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>> } >>>>> >>>>> and this function is AMD's teardown handler. Hence I suppose >>>>> doing the same for VT-d would be quite reasonable. And really >>>>> VT-d's iommu_domain_teardown() also already has >>>>> >>>>> hd->arch.vtd.pgd_maddr = 0; >>>> >>>> Let me have a look if that works. >>>> >>>>> >>>>> I guess what's missing is prevention of the root table >>>>> getting re-setup. >>>> >>>> This is taken care in the follow-up patch by forbidding page-table >>>> allocation. I can mention it in the commit message. >>> >>> My expectation is that with that subsequent change the change here >>> (or any variant of it) would become unnecessary. >> >> I am not be sure. iommu_unmap() would still get called from put_page(). >> Are you suggesting to gate the code if d->is_dying as well? > > Unmap shouldn't be allocating any memory right now, as in > non-shared-page-table mode we don't install any superpages > (unless I misremember). It doesn't allocate memory, but it will try to access the IOMMU page-tables (see more below). > >> Even if this patch is deemed to be unecessary to fix the issue. >> This issue was quite hard to chase/reproduce. >> >> I think it would still be good to harden the code by zeroing >> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >> pointer was still "valid". > > But my point was that this zeroing already happens. > What I > suspect is that it gets re-populated after it was zeroed, > because of page table manipulation that shouldn't be > occurring anymore for a dying domain. AFAICT, the zeroing is happening in ->teardown() helper. It is only called when the domain is fully destroyed (see call in arch_domain_destroy()). This will happen much after relinquishing the code. Could you clarify why you think it is already zeroed and by who? Cheers,
On 23.12.2020 17:16, Julien Grall wrote: > On 23/12/2020 16:11, Jan Beulich wrote: >> On 23.12.2020 16:16, Julien Grall wrote: >>> On 23/12/2020 15:00, Jan Beulich wrote: >>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>> be to move the call to the teardown callback earlier on. Any opinions? Please note this (in your original submission). I simply ... >>>>>> We already have >>>>>> >>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>> { >>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>> } >>>>>> >>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>> VT-d's iommu_domain_teardown() also already has >>>>>> >>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>> >>>>> Let me have a look if that works. >>>>> >>>>>> >>>>>> I guess what's missing is prevention of the root table >>>>>> getting re-setup. >>>>> >>>>> This is taken care in the follow-up patch by forbidding page-table >>>>> allocation. I can mention it in the commit message. >>>> >>>> My expectation is that with that subsequent change the change here >>>> (or any variant of it) would become unnecessary. >>> >>> I am not be sure. iommu_unmap() would still get called from put_page(). >>> Are you suggesting to gate the code if d->is_dying as well? >> >> Unmap shouldn't be allocating any memory right now, as in >> non-shared-page-table mode we don't install any superpages >> (unless I misremember). > > It doesn't allocate memory, but it will try to access the IOMMU > page-tables (see more below). > >> >>> Even if this patch is deemed to be unecessary to fix the issue. >>> This issue was quite hard to chase/reproduce. >>> >>> I think it would still be good to harden the code by zeroing >>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>> pointer was still "valid". >> >> But my point was that this zeroing already happens. >> What I >> suspect is that it gets re-populated after it was zeroed, >> because of page table manipulation that shouldn't be >> occurring anymore for a dying domain. > > AFAICT, the zeroing is happening in ->teardown() helper. > > It is only called when the domain is fully destroyed (see call in > arch_domain_destroy()). This will happen much after relinquishing the code. > > Could you clarify why you think it is already zeroed and by who? ... trusted you on what you stated there. But perhaps I somehow misunderstood that sentence to mean you want to put your addition into the teardown functions, when apparently you meant to invoke them earlier in the process. Without clearly identifying why this would be a safe thing to do, I couldn't imagine that's what you suggest as alternative. This is because the interdependencies of the IOMMU code are pretty hard to follow at times, and hence any such re-ordering has a fair risk of breaking something elsewhere. Jan
On 23/12/2020 16:24, Jan Beulich wrote: > On 23.12.2020 17:16, Julien Grall wrote: >> On 23/12/2020 16:11, Jan Beulich wrote: >>> On 23.12.2020 16:16, Julien Grall wrote: >>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? > > Please note this (in your original submission). I simply ... > >>>>>>> We already have >>>>>>> >>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>> { >>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>> } >>>>>>> >>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>> >>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>> >>>>>> Let me have a look if that works. >>>>>> >>>>>>> >>>>>>> I guess what's missing is prevention of the root table >>>>>>> getting re-setup. >>>>>> >>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>> allocation. I can mention it in the commit message. >>>>> >>>>> My expectation is that with that subsequent change the change here >>>>> (or any variant of it) would become unnecessary. >>>> >>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>> Are you suggesting to gate the code if d->is_dying as well? >>> >>> Unmap shouldn't be allocating any memory right now, as in >>> non-shared-page-table mode we don't install any superpages >>> (unless I misremember). >> >> It doesn't allocate memory, but it will try to access the IOMMU >> page-tables (see more below). >> >>> >>>> Even if this patch is deemed to be unecessary to fix the issue. >>>> This issue was quite hard to chase/reproduce. >>>> >>>> I think it would still be good to harden the code by zeroing >>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>> pointer was still "valid". >>> >>> But my point was that this zeroing already happens. >>> What I >>> suspect is that it gets re-populated after it was zeroed, >>> because of page table manipulation that shouldn't be >>> occurring anymore for a dying domain. >> >> AFAICT, the zeroing is happening in ->teardown() helper. >> >> It is only called when the domain is fully destroyed (see call in >> arch_domain_destroy()). This will happen much after relinquishing the code. >> >> Could you clarify why you think it is already zeroed and by who? > > ... trusted you on what you stated there. But perhaps I somehow > misunderstood that sentence to mean you want to put your addition > into the teardown functions, when apparently you meant to invoke > them earlier in the process. Without clearly identifying why this > would be a safe thing to do, I couldn't imagine that's what you > suggest as alternative. This was a wording issue. I meant moving ->teardown() before (or calling from) iommu_free_pgtables(). Shall I introduce a new callback then? > This is because the interdependencies of > the IOMMU code are pretty hard to follow at times, and hence any > such re-ordering has a fair risk of breaking something elsewhere. Right, this is another reason to try to get most of my fix self-contained rather relying on top-layer call to protect against a domain dying. Cheers,
On 23.12.2020 17:29, Julien Grall wrote: > On 23/12/2020 16:24, Jan Beulich wrote: >> On 23.12.2020 17:16, Julien Grall wrote: >>> On 23/12/2020 16:11, Jan Beulich wrote: >>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >> >> Please note this (in your original submission). I simply ... >> >>>>>>>> We already have >>>>>>>> >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>> { >>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>> } >>>>>>>> >>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>> >>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>> >>>>>>> Let me have a look if that works. >>>>>>> >>>>>>>> >>>>>>>> I guess what's missing is prevention of the root table >>>>>>>> getting re-setup. >>>>>>> >>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>> allocation. I can mention it in the commit message. >>>>>> >>>>>> My expectation is that with that subsequent change the change here >>>>>> (or any variant of it) would become unnecessary. >>>>> >>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>> Are you suggesting to gate the code if d->is_dying as well? >>>> >>>> Unmap shouldn't be allocating any memory right now, as in >>>> non-shared-page-table mode we don't install any superpages >>>> (unless I misremember). >>> >>> It doesn't allocate memory, but it will try to access the IOMMU >>> page-tables (see more below). >>> >>>> >>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>> This issue was quite hard to chase/reproduce. >>>>> >>>>> I think it would still be good to harden the code by zeroing >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>> pointer was still "valid". >>>> >>>> But my point was that this zeroing already happens. >>>> What I >>>> suspect is that it gets re-populated after it was zeroed, >>>> because of page table manipulation that shouldn't be >>>> occurring anymore for a dying domain. >>> >>> AFAICT, the zeroing is happening in ->teardown() helper. >>> >>> It is only called when the domain is fully destroyed (see call in >>> arch_domain_destroy()). This will happen much after relinquishing the code. >>> >>> Could you clarify why you think it is already zeroed and by who? >> >> ... trusted you on what you stated there. But perhaps I somehow >> misunderstood that sentence to mean you want to put your addition >> into the teardown functions, when apparently you meant to invoke >> them earlier in the process. Without clearly identifying why this >> would be a safe thing to do, I couldn't imagine that's what you >> suggest as alternative. > > This was a wording issue. I meant moving ->teardown() before (or calling > from) iommu_free_pgtables(). > > Shall I introduce a new callback then? Earlier zeroing won't help unless you prevent re-population, or unless you make the code capable of telling "still zero" from "already zero". But I have to admit I'd like to also have Paul's opinion on the matter. Jan >> This is because the interdependencies of >> the IOMMU code are pretty hard to follow at times, and hence any >> such re-ordering has a fair risk of breaking something elsewhere. > > Right, this is another reason to try to get most of my fix > self-contained rather relying on top-layer call to protect against a > domain dying. > > Cheers, >
On 23/12/2020 16:46, Jan Beulich wrote: > On 23.12.2020 17:29, Julien Grall wrote: >> On 23/12/2020 16:24, Jan Beulich wrote: >>> On 23.12.2020 17:16, Julien Grall wrote: >>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>> >>> Please note this (in your original submission). I simply ... >>> >>>>>>>>> We already have >>>>>>>>> >>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>> { >>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>> } >>>>>>>>> >>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>> >>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>> >>>>>>>> Let me have a look if that works. >>>>>>>> >>>>>>>>> >>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>> getting re-setup. >>>>>>>> >>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>> allocation. I can mention it in the commit message. >>>>>>> >>>>>>> My expectation is that with that subsequent change the change here >>>>>>> (or any variant of it) would become unnecessary. >>>>>> >>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>> >>>>> Unmap shouldn't be allocating any memory right now, as in >>>>> non-shared-page-table mode we don't install any superpages >>>>> (unless I misremember). >>>> >>>> It doesn't allocate memory, but it will try to access the IOMMU >>>> page-tables (see more below). >>>> >>>>> >>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>> This issue was quite hard to chase/reproduce. >>>>>> >>>>>> I think it would still be good to harden the code by zeroing >>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>> pointer was still "valid". >>>>> >>>>> But my point was that this zeroing already happens. >>>>> What I >>>>> suspect is that it gets re-populated after it was zeroed, >>>>> because of page table manipulation that shouldn't be >>>>> occurring anymore for a dying domain. >>>> >>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>> >>>> It is only called when the domain is fully destroyed (see call in >>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>> >>>> Could you clarify why you think it is already zeroed and by who? >>> >>> ... trusted you on what you stated there. But perhaps I somehow >>> misunderstood that sentence to mean you want to put your addition >>> into the teardown functions, when apparently you meant to invoke >>> them earlier in the process. Without clearly identifying why this >>> would be a safe thing to do, I couldn't imagine that's what you >>> suggest as alternative. >> >> This was a wording issue. I meant moving ->teardown() before (or calling >> from) iommu_free_pgtables(). >> >> Shall I introduce a new callback then? > > Earlier zeroing won't help unless you prevent re-population, or > unless you make the code capable of telling "still zero" from > "already zero". But I have to admit I'd like to also have Paul's > opinion on the matter. Patch #4 is meant to prevent that with the d->is_dying check in the IOMMU page-table allocation. Do you think this is not enough? Cheers,
On 23.12.2020 17:54, Julien Grall wrote: > > > On 23/12/2020 16:46, Jan Beulich wrote: >> On 23.12.2020 17:29, Julien Grall wrote: >>> On 23/12/2020 16:24, Jan Beulich wrote: >>>> On 23.12.2020 17:16, Julien Grall wrote: >>>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>> >>>> Please note this (in your original submission). I simply ... >>>> >>>>>>>>>> We already have >>>>>>>>>> >>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>>> { >>>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>>> >>>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>>> >>>>>>>>> Let me have a look if that works. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>>> getting re-setup. >>>>>>>>> >>>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>>> allocation. I can mention it in the commit message. >>>>>>>> >>>>>>>> My expectation is that with that subsequent change the change here >>>>>>>> (or any variant of it) would become unnecessary. >>>>>>> >>>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>>> >>>>>> Unmap shouldn't be allocating any memory right now, as in >>>>>> non-shared-page-table mode we don't install any superpages >>>>>> (unless I misremember). >>>>> >>>>> It doesn't allocate memory, but it will try to access the IOMMU >>>>> page-tables (see more below). >>>>> >>>>>> >>>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>>> This issue was quite hard to chase/reproduce. >>>>>>> >>>>>>> I think it would still be good to harden the code by zeroing >>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>>> pointer was still "valid". >>>>>> >>>>>> But my point was that this zeroing already happens. >>>>>> What I >>>>>> suspect is that it gets re-populated after it was zeroed, >>>>>> because of page table manipulation that shouldn't be >>>>>> occurring anymore for a dying domain. >>>>> >>>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>>> >>>>> It is only called when the domain is fully destroyed (see call in >>>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>>> >>>>> Could you clarify why you think it is already zeroed and by who? >>>> >>>> ... trusted you on what you stated there. But perhaps I somehow >>>> misunderstood that sentence to mean you want to put your addition >>>> into the teardown functions, when apparently you meant to invoke >>>> them earlier in the process. Without clearly identifying why this >>>> would be a safe thing to do, I couldn't imagine that's what you >>>> suggest as alternative. >>> >>> This was a wording issue. I meant moving ->teardown() before (or calling >>> from) iommu_free_pgtables(). >>> >>> Shall I introduce a new callback then? >> >> Earlier zeroing won't help unless you prevent re-population, or >> unless you make the code capable of telling "still zero" from >> "already zero". But I have to admit I'd like to also have Paul's >> opinion on the matter. > > Patch #4 is meant to prevent that with the d->is_dying check in the > IOMMU page-table allocation. > > Do you think this is not enough? It probably is; I think that other patch would want to come first then, or both be folded. Nevertheless I'm not fully convinced putting the check there is the best course of action. Jan
Hi Jan, On 23/12/2020 17:02, Jan Beulich wrote: > On 23.12.2020 17:54, Julien Grall wrote: >> >> >> On 23/12/2020 16:46, Jan Beulich wrote: >>> On 23.12.2020 17:29, Julien Grall wrote: >>>> On 23/12/2020 16:24, Jan Beulich wrote: >>>>> On 23.12.2020 17:16, Julien Grall wrote: >>>>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>>> >>>>> Please note this (in your original submission). I simply ... >>>>> >>>>>>>>>>> We already have >>>>>>>>>>> >>>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>>>> { >>>>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>>>> >>>>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>>>> >>>>>>>>>> Let me have a look if that works. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>>>> getting re-setup. >>>>>>>>>> >>>>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>>>> allocation. I can mention it in the commit message. >>>>>>>>> >>>>>>>>> My expectation is that with that subsequent change the change here >>>>>>>>> (or any variant of it) would become unnecessary. >>>>>>>> >>>>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>>>> >>>>>>> Unmap shouldn't be allocating any memory right now, as in >>>>>>> non-shared-page-table mode we don't install any superpages >>>>>>> (unless I misremember). >>>>>> >>>>>> It doesn't allocate memory, but it will try to access the IOMMU >>>>>> page-tables (see more below). >>>>>> >>>>>>> >>>>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>>>> This issue was quite hard to chase/reproduce. >>>>>>>> >>>>>>>> I think it would still be good to harden the code by zeroing >>>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>>>> pointer was still "valid". >>>>>>> >>>>>>> But my point was that this zeroing already happens. >>>>>>> What I >>>>>>> suspect is that it gets re-populated after it was zeroed, >>>>>>> because of page table manipulation that shouldn't be >>>>>>> occurring anymore for a dying domain. >>>>>> >>>>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>>>> >>>>>> It is only called when the domain is fully destroyed (see call in >>>>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>>>> >>>>>> Could you clarify why you think it is already zeroed and by who? >>>>> >>>>> ... trusted you on what you stated there. But perhaps I somehow >>>>> misunderstood that sentence to mean you want to put your addition >>>>> into the teardown functions, when apparently you meant to invoke >>>>> them earlier in the process. Without clearly identifying why this >>>>> would be a safe thing to do, I couldn't imagine that's what you >>>>> suggest as alternative. >>>> >>>> This was a wording issue. I meant moving ->teardown() before (or calling >>>> from) iommu_free_pgtables(). >>>> >>>> Shall I introduce a new callback then? >>> >>> Earlier zeroing won't help unless you prevent re-population, or >>> unless you make the code capable of telling "still zero" from >>> "already zero". But I have to admit I'd like to also have Paul's >>> opinion on the matter. >> >> Patch #4 is meant to prevent that with the d->is_dying check in the >> IOMMU page-table allocation. >> >> Do you think this is not enough? > > It probably is; I think that other patch would want to come first > then, or both be folded. I would like to keep them separated. But I am happy to re-order them. > Nevertheless I'm not fully convinced > putting the check there is the best course of action. As you pointed out in a previous e-mail, the IOMMU code is pretty hard to follow at times. The check in the allocator is quite simple, so I think it would be best to keep it. It doesn't mean this should be the only change, but it will avoid a whole lot of potential issues if we missed any path that may touch the IOMMU page-tables while the domain is dying. The other checks can just be shortcut to prevent extra work that will result to a failure. I will wait for Paul's input before reworking the series. Cheers,
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 23 December 2020 16:46 > To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org> > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the > page-tables > > On 23.12.2020 17:29, Julien Grall wrote: > > On 23/12/2020 16:24, Jan Beulich wrote: > >> On 23.12.2020 17:16, Julien Grall wrote: > >>> On 23/12/2020 16:11, Jan Beulich wrote: > >>>> On 23.12.2020 16:16, Julien Grall wrote: > >>>>> On 23/12/2020 15:00, Jan Beulich wrote: > >>>>>> On 23.12.2020 15:56, Julien Grall wrote: > >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: > >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: > >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would > >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? > >> > >> Please note this (in your original submission). I simply ... > >> > >>>>>>>> We already have > >>>>>>>> > >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) > >>>>>>>> { > >>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; > >>>>>>>> } > >>>>>>>> > >>>>>>>> and this function is AMD's teardown handler. Hence I suppose > >>>>>>>> doing the same for VT-d would be quite reasonable. And really > >>>>>>>> VT-d's iommu_domain_teardown() also already has > >>>>>>>> > >>>>>>>> hd->arch.vtd.pgd_maddr = 0; > >>>>>>> > >>>>>>> Let me have a look if that works. > >>>>>>> > >>>>>>>> > >>>>>>>> I guess what's missing is prevention of the root table > >>>>>>>> getting re-setup. > >>>>>>> > >>>>>>> This is taken care in the follow-up patch by forbidding page-table > >>>>>>> allocation. I can mention it in the commit message. > >>>>>> > >>>>>> My expectation is that with that subsequent change the change here > >>>>>> (or any variant of it) would become unnecessary. > >>>>> > >>>>> I am not be sure. iommu_unmap() would still get called from put_page(). > >>>>> Are you suggesting to gate the code if d->is_dying as well? > >>>> > >>>> Unmap shouldn't be allocating any memory right now, as in > >>>> non-shared-page-table mode we don't install any superpages > >>>> (unless I misremember). > >>> > >>> It doesn't allocate memory, but it will try to access the IOMMU > >>> page-tables (see more below). > >>> > >>>> > >>>>> Even if this patch is deemed to be unecessary to fix the issue. > >>>>> This issue was quite hard to chase/reproduce. > >>>>> > >>>>> I think it would still be good to harden the code by zeroing > >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the > >>>>> pointer was still "valid". > >>>> > >>>> But my point was that this zeroing already happens. > >>>> What I > >>>> suspect is that it gets re-populated after it was zeroed, > >>>> because of page table manipulation that shouldn't be > >>>> occurring anymore for a dying domain. > >>> > >>> AFAICT, the zeroing is happening in ->teardown() helper. > >>> > >>> It is only called when the domain is fully destroyed (see call in > >>> arch_domain_destroy()). This will happen much after relinquishing the code. > >>> > >>> Could you clarify why you think it is already zeroed and by who? > >> > >> ... trusted you on what you stated there. But perhaps I somehow > >> misunderstood that sentence to mean you want to put your addition > >> into the teardown functions, when apparently you meant to invoke > >> them earlier in the process. Without clearly identifying why this > >> would be a safe thing to do, I couldn't imagine that's what you > >> suggest as alternative. > > > > This was a wording issue. I meant moving ->teardown() before (or calling > > from) iommu_free_pgtables(). > > > > Shall I introduce a new callback then? > > Earlier zeroing won't help unless you prevent re-population, or > unless you make the code capable of telling "still zero" from > "already zero". But I have to admit I'd like to also have Paul's > opinion on the matter. > I have a pending series to dynamically unshare IOMMU mappings (to allow logdirty to be enabled on domains with currently-shared EPT) which gets rid of the lazy allocation of the root table and uses pgd_maddr == NULL as a test of whether EPT is shared or not. Therefore it would seem best, to fix this immediate problem, to also stop lazy allocation of pgd_maddr, and re-introduce a per-implementation free_pgtables() callback which zeroes pgd_maddr and then calls the common function. Paul
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c index 779dbb5b98ba..99a23177b3d2 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -267,6 +267,16 @@ int iommu_free_pgtables(struct domain *d) struct page_info *pg; unsigned int done = 0; + /* + * Pages will be moved to the free list in a bit. So we want to + * clear the root page-table to avoid any potential use after-free. + * + * XXX: This only code works for Intel vT-D. + */ + spin_lock(&hd->arch.mapping_lock); + hd->arch.vtd.pgd_maddr = 0; + spin_unlock(&hd->arch.mapping_lock); + spin_lock(&hd->arch.pgtables.lock); while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) ) {