Message ID | 20210209152816.15792-4-julien@xen.org (mailing list archive) |
---|---|
State | Superseded |
Series | xen/iommu: Collection of bug fixes for IOMMU teardown |
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
>
> From: Julien Grall <jgrall@amazon.com>
>
> It is a bit pointless to crash a domain that is already dying. This will
> become more an annoyance with a follow-up change where page-table
> allocation will be forbidden when the domain is dying.
>
> Security wise, there is no change as the devices would still have access
> to the IOMMU page-tables even if the domain has crashed until Xen
> start to relinquish the resources.
>
> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
> d->is_dying is correctly observed (a follow-up patch will held it in the

s/held/hold

> relinquish path).
>
> For Arm, there is still a small race possible. But there is so far no
> failure specific to a domain dying.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
>
> ---
>
> This was spotted when trying to destroy IOREQ servers while the domain
> is dying. The code will try to add the entry back in the P2M and
> therefore update the P2M (see arch_ioreq_server_disable() ->
> hvm_add_ioreq_gfn()).
>
> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however

s/mappin/mapping

> I didn't try a patch yet because checking d->is_dying can be racy (I
> can't find a proper lock).
>
> Changes in v2:
>  - Patch added
> ---
>  xen/drivers/passthrough/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 879d238bcd31..75419f20f76d 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>                             flush_flags) )
>              continue;
>
> -        if ( !is_hardware_domain(d) )
> +        if ( !is_hardware_domain(d) && !d->is_dying )
>              domain_crash(d);

Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?

  Paul

>
>         break;
> --
> 2.17.1
On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>
> > -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: 09 February 2021 15:28
> > To: xen-devel@lists.xenproject.org
> > Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> > <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> > Subject: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
> >
> > From: Julien Grall <jgrall@amazon.com>
> >
> > It is a bit pointless to crash a domain that is already dying. This will
> > become more an annoyance with a follow-up change where page-table
> > allocation will be forbidden when the domain is dying.
> >
> > Security wise, there is no change as the devices would still have access
> > to the IOMMU page-tables even if the domain has crashed until Xen
> > start to relinquish the resources.
> >
> > For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
> > d->is_dying is correctly observed (a follow-up patch will held it in the
>
> s/held/hold
>
> > relinquish path).
> >
> > For Arm, there is still a small race possible. But there is so far no
> > failure specific to a domain dying.
> >
> > Signed-off-by: Julien Grall <jgrall@amazon.com>
> >
> > ---
> >
> > This was spotted when trying to destroy IOREQ servers while the domain
> > is dying. The code will try to add the entry back in the P2M and
> > therefore update the P2M (see arch_ioreq_server_disable() ->
> > hvm_add_ioreq_gfn()).
> >
> > It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>
> s/mappin/mapping
>
> > I didn't try a patch yet because checking d->is_dying can be racy (I
> > can't find a proper lock).
> >
> > Changes in v2:
> >  - Patch added
> > ---
> >  xen/drivers/passthrough/iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> > index 879d238bcd31..75419f20f76d 100644
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
> >                             flush_flags) )
> >              continue;
> >
> > -        if ( !is_hardware_domain(d) )
> > +        if ( !is_hardware_domain(d) && !d->is_dying )
> >              domain_crash(d);
>
> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?

Jan also suggested moving the check in domain_crash(). However, I felt
it is potentially a too risky change for 4.15 as there are quite a few
callers.

Cheers,
On 09.02.2021 22:14, Julien Grall wrote:
> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 09 February 2021 15:28
>>>
>>> It is a bit pointless to crash a domain that is already dying. This will
>>> become more an annoyance with a follow-up change where page-table
>>> allocation will be forbidden when the domain is dying.
>>>
>>> Security wise, there is no change as the devices would still have access
>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>> start to relinquish the resources.
>>>
>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>> relinquish path).

Am I to understand this to mean that at this point of the series
things aren't really correct yet in this regard? If so, wouldn't
it be better to re-order?

>>> For Arm, there is still a small race possible. But there is so far no
>>> failure specific to a domain dying.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> ---
>>>
>>> This was spotted when trying to destroy IOREQ servers while the domain
>>> is dying. The code will try to add the entry back in the P2M and
>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>> hvm_add_ioreq_gfn()).
>>>
>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>> can't find a proper lock).

I understand the concern. I find it odd though that we permit
iommu_map() to do anything at all when the domain is already
dying. So irrespective of the remark below, how about bailing
from iommu_map() earlier when the domain is dying?

>>> --- a/xen/drivers/passthrough/iommu.c
>>> +++ b/xen/drivers/passthrough/iommu.c
>>> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>>>                             flush_flags) )
>>>              continue;
>>>
>>> -        if ( !is_hardware_domain(d) )
>>> +        if ( !is_hardware_domain(d) && !d->is_dying )
>>>              domain_crash(d);
>>
>> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?
>
> Jan also suggested moving the check in domain_crash(). However, I felt
> it is potentially a too risky change for 4.15 as there are quite a few
> callers.

This is a fair point. However, in such a case I'd prefer symmetry
at least throughout this one source file (there are three more
places), unless there are strong reasons against doing so.

Jan
Hi Jan,

On 10/02/2021 14:14, Jan Beulich wrote:
> On 09.02.2021 22:14, Julien Grall wrote:
>> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 09 February 2021 15:28
>>>>
>>>> It is a bit pointless to crash a domain that is already dying. This will
>>>> become more an annoyance with a follow-up change where page-table
>>>> allocation will be forbidden when the domain is dying.
>>>>
>>>> Security wise, there is no change as the devices would still have access
>>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>>> start to relinquish the resources.
>>>>
>>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>>> relinquish path).
>
> Am I to understand this to mean that at this point of the series
> things aren't really correct yet in this regard? If so, wouldn't
> it be better to re-order?

You asked this specific order... So are you saying you want me to use
the original ordering?

>>>> For Arm, there is still a small race possible. But there is so far no
>>>> failure specific to a domain dying.
>>>>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>
>>>> ---
>>>>
>>>> This was spotted when trying to destroy IOREQ servers while the domain
>>>> is dying. The code will try to add the entry back in the P2M and
>>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>>> hvm_add_ioreq_gfn()).
>>>>
>>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>>> can't find a proper lock).
>
> I understand the concern. I find it odd though that we permit
> iommu_map() to do anything at all when the domain is already
> dying. So irrespective of the remark below, how about bailing
> from iommu_map() earlier when the domain is dying?

I felt this was potentially too racy to use it. But it should be fine if
we keep the !d->is_dying below.

>>>> --- a/xen/drivers/passthrough/iommu.c
>>>> +++ b/xen/drivers/passthrough/iommu.c
>>>> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>>>>                             flush_flags) )
>>>>              continue;
>>>>
>>>> -        if ( !is_hardware_domain(d) )
>>>> +        if ( !is_hardware_domain(d) && !d->is_dying )
>>>>              domain_crash(d);
>>>
>>> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?
>>
>> Jan also suggested moving the check in domain_crash(). However, I felt
>> it is potentially a too risky change for 4.15 as there are quite a few
>> callers.
>
> This is a fair point. However, in such a case I'd prefer symmetry
> at least throughout this one source file (there are three more
> places), unless there are strong reasons against doing so.

I can have a look and see if the decision is easy to make.

Cheers,
On 10.02.2021 15:58, Julien Grall wrote:
> Hi Jan,
>
> On 10/02/2021 14:14, Jan Beulich wrote:
>> On 09.02.2021 22:14, Julien Grall wrote:
>>> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>>> From: Julien Grall <julien@xen.org>
>>>>> Sent: 09 February 2021 15:28
>>>>>
>>>>> It is a bit pointless to crash a domain that is already dying. This will
>>>>> become more an annoyance with a follow-up change where page-table
>>>>> allocation will be forbidden when the domain is dying.
>>>>>
>>>>> Security wise, there is no change as the devices would still have access
>>>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>>>> start to relinquish the resources.
>>>>>
>>>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>>>> relinquish path).
>>
>> Am I to understand this to mean that at this point of the series
>> things aren't really correct yet in this regard? If so, wouldn't
>> it be better to re-order?
>
> You asked this specific order... So are you saying you want me to use
> the original ordering?

Well, it's been a while and I don't recall the specific reason for
the request. But then at least the spin_barrier() you mean to rely
on could / should be moved here?

>>>>> For Arm, there is still a small race possible. But there is so far no
>>>>> failure specific to a domain dying.
>>>>>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> ---
>>>>>
>>>>> This was spotted when trying to destroy IOREQ servers while the domain
>>>>> is dying. The code will try to add the entry back in the P2M and
>>>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>>>> hvm_add_ioreq_gfn()).
>>>>>
>>>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>>>> can't find a proper lock).
>>
>> I understand the concern. I find it odd though that we permit
>> iommu_map() to do anything at all when the domain is already
>> dying. So irrespective of the remark below, how about bailing
>> from iommu_map() earlier when the domain is dying?
>
> I felt this was potentially too racy to use it. But it should be fine if
> we keep the !d->is_dying below.

Why? As per later comments I didn't necessarily mean iommu_map()
literally - as indicated, the per-vendor functions ought to be
suitable to place the check, right after having taken the lock.

Jan
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 879d238bcd31..75419f20f76d 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                            flush_flags) )
             continue;
 
-        if ( !is_hardware_domain(d) )
+        if ( !is_hardware_domain(d) && !d->is_dying )
             domain_crash(d);
 
         break;