
[1/8] IOMMU/x86: drop locking from quarantine_init() hooks

Message ID 3430b8fa-6700-b5ec-a160-2e7609dd38a3@suse.com (mailing list archive)
State New, archived
Series: IOMMU: assorted follow-on to XSA-400

Commit Message

Jan Beulich April 11, 2022, 9:35 a.m. UTC
Prior extension of these functions to enable per-device quarantine page
tables already didn't add more locking there, but merely left in place
what had been there before. But really locking is unnecessary here:
We're running with pcidevs_lock held (i.e. multiple invocations of the
same function [or their teardown equivalents] are impossible, and hence
there are no "local" races), while all consuming of the data being
populated here can't race anyway due to happening sequentially
afterwards. See also the comment in struct arch_pci_dev.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
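
For context, a condensed sketch of what the AMD hook looks like with this
patch applied (the full diff is at the bottom of this page). Error handling,
the exact prototype, and the derivation of level are simplified and may
differ from the real tree; the point is that only per-device state reachable
from pdev->arch is written, with the caller already holding pcidevs_lock:

    /* Condensed sketch only -- not the verbatim Xen code. */
    int cf_check amd_iommu_quarantine_init(struct pci_dev *pdev /* , ... */)
    {
        struct domain_iommu *hd = dom_iommu(dom_io);   /* quarantine domain */
        unsigned int level = hd->arch.amd.paging_mode; /* assumed source of level */
        union amd_iommu_pte *root;
        struct page_info *pgs[IOMMU_MAX_PT_LEVELS] = {};
        int rc;

        /*
         * pcidevs_lock is held by the caller, so no second invocation for
         * the same device (nor its teardown) can run concurrently -- hence
         * no mapping_lock is taken around the page-table construction.
         */
        root = __map_domain_page(pdev->arch.amd.root_table);
        rc = fill_qpt(root, level - 1, pgs);
        unmap_domain_page(root);

        /* Per-device data, only consumed sequentially after this returns. */
        pdev->arch.leaf_mfn = page_to_mfn(pgs[0]);

        page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list);

        return rc;
    }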

Comments

Andrew Cooper April 11, 2022, 10:01 a.m. UTC | #1
On 11/04/2022 10:35, Jan Beulich wrote:
> Prior extension of these functions to enable per-device quarantine page
> tables already didn't add more locking there, but merely left in place
> what had been there before. But really locking is unnecessary here:
> We're running with pcidevs_lock held (i.e. multiple invocations of the
> same function [or their teardown equivalents] are impossible, and hence
> there are no "local" races), while all consuming of the data being
> populated here can't race anyway due to happening sequentially
> afterwards. See also the comment in struct arch_pci_dev.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

It is only legitimate to drop these calls if you delete the mapping_lock
entirely.  Otherwise you're breaking the semantics of mapping_lock.

Your argument of "well this is already guarded by the pci lock" means
that these are uncontended lock/unlock operations, and therefore not
interesting to drop in the first place.

This patch is specifically setting us up for an XSA in the future when
the behaviour of the PCI lock changes, the fix for which will be to
revert this patch.

~Andrew
Jan Beulich April 11, 2022, 10:18 a.m. UTC | #2
On 11.04.2022 12:01, Andrew Cooper wrote:
> On 11/04/2022 10:35, Jan Beulich wrote:
>> Prior extension of these functions to enable per-device quarantine page
>> tables already didn't add more locking there, but merely left in place
>> what had been there before. But really locking is unnecessary here:
>> We're running with pcidevs_lock held (i.e. multiple invocations of the
>> same function [or their teardown equivalents] are impossible, and hence
>> there are no "local" races), while all consuming of the data being
>> populated here can't race anyway due to happening sequentially
>> afterwards. See also the comment in struct arch_pci_dev.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> It is only legitimate to drop these calls if you delete the mapping_lock
> entirely.  Otherwise you're breaking the semantics of mapping_lock.

Not for all domains other than DomIO. And DomIO is, well, special. As
are the times at which quarantine_init() is actually being invoked.

> Your argument of "well this is already guarded by the pci lock" means
> that these are uncontended lock/unlock operations, and therefore not
> interesting to drop in the first place.
> 
> This patch is specifically setting us up for an XSA in the future when
> the behaviour of the PCI lock changes, the fix for which will be to
> revert this patch.

That wouldn't suffice then. As said in the description, and as discussed
during the development of the XSA-400 series, further locking would need
adding then.

Jan
Roger Pau Monné April 12, 2022, 11:05 a.m. UTC | #3
On Mon, Apr 11, 2022 at 11:35:54AM +0200, Jan Beulich wrote:
> Prior extension of these functions to enable per-device quarantine page
> tables already didn't add more locking there, but merely left in place
> what had been there before. But really locking is unnecessary here:
> We're running with pcidevs_lock held (i.e. multiple invocations of the
> same function [or their teardown equivalents] are impossible, and hence
> there are no "local" races), while all consuming of the data being
> populated here can't race anyway due to happening sequentially
> afterwards. See also the comment in struct arch_pci_dev.

I would explicitly say that none of the code in the locked region
touches any data in the domain_iommu struct, so taking the
mapping_lock is unneeded.

Long term we might wish to implement a per-device lock that could be
used here, instead of relying on the pcidevs lock.

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
Jan Beulich April 12, 2022, 12:17 p.m. UTC | #4
On 12.04.2022 13:05, Roger Pau Monné wrote:
> On Mon, Apr 11, 2022 at 11:35:54AM +0200, Jan Beulich wrote:
>> Prior extension of these functions to enable per-device quarantine page
>> tables already didn't add more locking there, but merely left in place
>> what had been there before. But really locking is unnecessary here:
>> We're running with pcidevs_lock held (i.e. multiple invocations of the
>> same function [or their teardown equivalents] are impossible, and hence
>> there are no "local" races), while all consuming of the data being
>> populated here can't race anyway due to happening sequentially
>> afterwards. See also the comment in struct arch_pci_dev.
> 
> I would explicitly say that none of the code in the locked region
> touches any data in the domain_iommu struct, so taking the
> mapping_lock is unneeded.

But that would limit what the mapping_lock protects more than it actually
does: The entire page tables hanging off of the root table are also
protected by that lock. It's just that for a pdev, after having
installed identity mappings, the root doesn't hang off of hd. But in
principle - i.e. if the per-device mappings weren't static once created -
the lock would be the one to hold whenever any of these page tables was
modified.

> Long term we might wish to implement a per-device lock that could be
> used here, instead of relying on the pcidevs lock.

Well, I would want to avoid this unless a need arises to not hold the
pcidevs lock here. Or, of course, if a need arose to dynamically alter
these page tables.

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan
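
To illustrate the distinction being discussed here: hd->arch.mapping_lock
guards the page tables reachable from the domain-level root table, whereas
the quarantine tables built by these hooks hang off the pdev instead. A much
simplified sketch (the real definitions live in the arch headers and contain
many more fields; names ending in _sketch are made up for illustration):

    /* Domain-wide IOMMU state (cf. struct domain_iommu). */
    struct domain_iommu_arch_sketch {
        spinlock_t mapping_lock;           /* guards tables reachable from ... */
        struct {
            struct page_info *root_table;  /* ... this domain-level root */
        } amd;
    };

    /* Per-device state (cf. struct arch_pci_dev). */
    struct arch_pci_dev_sketch {
        struct {
            struct page_info *root_table;  /* per-device quarantine root: not
                                            * reachable from hd->arch.amd, so
                                            * it is serialized by pcidevs_lock
                                            * rather than by mapping_lock */
        } amd;
        mfn_t leaf_mfn;
        struct page_list_head pgtables_list;
    };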
Roger Pau Monné April 12, 2022, 12:54 p.m. UTC | #5
On Tue, Apr 12, 2022 at 02:17:16PM +0200, Jan Beulich wrote:
> On 12.04.2022 13:05, Roger Pau Monné wrote:
> > On Mon, Apr 11, 2022 at 11:35:54AM +0200, Jan Beulich wrote:
> >> Prior extension of these functions to enable per-device quarantine page
> >> tables already didn't add more locking there, but merely left in place
> >> what had been there before. But really locking is unnecessary here:
> >> We're running with pcidevs_lock held (i.e. multiple invocations of the
> >> same function [or their teardown equivalents] are impossible, and hence
> >> there are no "local" races), while all consuming of the data being
> >> populated here can't race anyway due to happening sequentially
> >> afterwards. See also the comment in struct arch_pci_dev.
> > 
> > I would explicitly say that none of the code in the locked region
> > touches any data in the domain_iommu struct, so taking the
> > mapping_lock is unneeded.
> 
> But that would limit what the mapping_lock protects more than it actually
> does: The entire page tables hanging off of the root table are also
> protected by that lock.

Right, but at the point where fill_qpt() gets called
hd->arch.amd.root_table == NULL, and hence it seems completely
pointless to wrap this in a mapping_lock locked region.

> It's just that for a pdev, after having
> installed identity mappings, the root doesn't hang off of hd. But in
> principle - i.e. if the per-device mappings weren't static once created -
> the lock would be the one to hold whenever any of these page tables was
> modified.

The lock would need to be held if page tables are modified while
being set in hd->arch.amd.root_table, or at least that's my
understanding.

This is a special case anyway, as the page tables are not per-domain
but per-device, but it seems clear to me that if the page tables are
not set in hd->arch.amd.root_table the lock in hd->arch.mapping_lock
is not supposed to be protecting them.

> > Long term we might wish to implement a per-device lock that could be
> > used here, instead of relying on the pcidevs lock.
> 
> Well, I would want to avoid this unless a need arises to not hold the
> pcidevs lock here. Or, of course, if a need arose to dynamically alter
> these page tables.

I think it's likely we will need such a lock for other purposes if we
ever manage to convert the pcidevs lock into an rwlock, so my comment
was not so much that it's required for the use case here, but rather a
side effect if we ever manage to change the pcidevs lock.

Thanks, Roger.
Jan Beulich April 12, 2022, 1:12 p.m. UTC | #6
On 12.04.2022 14:54, Roger Pau Monné wrote:
> On Tue, Apr 12, 2022 at 02:17:16PM +0200, Jan Beulich wrote:
>> On 12.04.2022 13:05, Roger Pau Monné wrote:
>>> On Mon, Apr 11, 2022 at 11:35:54AM +0200, Jan Beulich wrote:
>>>> Prior extension of these functions to enable per-device quarantine page
>>>> tables already didn't add more locking there, but merely left in place
>>>> what had been there before. But really locking is unnecessary here:
>>>> We're running with pcidevs_lock held (i.e. multiple invocations of the
>>>> same function [or their teardown equivalents] are impossible, and hence
>>>> there are no "local" races), while all consuming of the data being
>>>> populated here can't race anyway due to happening sequentially
>>>> afterwards. See also the comment in struct arch_pci_dev.
>>>
>>> I would explicitly say that none of the code in the locked region
>>> touches any data in the domain_iommu struct, so taking the
>>> mapping_lock is unneeded.
>>
>> But that would limit what the mapping_lock protects more than it actually
>> does: The entire page tables hanging off of the root table are also
>> protected by that lock.
> 
> Right, but at the point where fill_qpt() gets called
> hd->arch.amd.root_table == NULL, and hence it seems completely
> pointless to wrap this in a mapping_lock locked region.
> 
>> It's just that for a pdev, after having
>> installed identity mappings, the root doesn't hang off of hd. But in
>> principle - i.e. if the per-device mappings weren't static once created -
>> the lock would be the one to hold whenever any of these page tables was
>> modified.
> 
> The lock would need to be held if page tables are modified while
> being set in hd->arch.amd.root_table, or at least that's my
> understanding.
> 
> This is a special case anyway, as the page tables are not per-domain
> but per-device, but it seems clear to me that if the page tables are
> not set in hd->arch.amd.root_table the lock in hd->arch.mapping_lock
> is not supposed to be protecting them.

There are multiple models possible, one being that for per-device
page tables DomIO's lock protects all of them. Hence my hesitance to
say something along these lines in the description.

>>> Long term we might wish to implement a per-device lock that could be
>>> used here, instead of relying on the pcidevs lock.
>>
>> Well, I would want to avoid this unless a need arises to not hold the
>> pcidevs lock here. Or, of course, if a need arose to dynamically alter
>> these page tables.
> 
> I think it's likely we will need such a lock for other purposes if we
> ever manage to convert the pcidevs lock into an rwlock, so my comment
> was not so much that it's required for the use case here, but rather a
> side effect if we ever manage to change the pcidevs lock.

Such a need would further depend on whether the code paths leading here
would hold the lock in read or write mode. But yes, it is reasonable to
expect that it would only be read mode.

Jan
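
If the pcidevs lock ever did become an rwlock, the distinction drawn here
might look roughly as below. This is purely hypothetical: neither the rwlock
conversion nor these function names exist in the tree, and the sketch only
illustrates why read-mode holders would stop providing the serialization the
quarantine hooks currently rely on.

    /* Hypothetical sketch, not current Xen code. */
    static DEFINE_RWLOCK(pcidevs_rwlock);

    static void assign_device_sketch(struct pci_dev *pdev)
    {
        /* Assignment paths would plausibly only need read mode ... */
        read_lock(&pcidevs_rwlock);
        /*
         * ... but then two assignments could run concurrently, and the
         * quarantine_init() hooks would need their own serialization
         * (e.g. a per-device lock) instead of relying on the PCI lock.
         */
        read_unlock(&pcidevs_rwlock);
    }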
Jan Beulich April 12, 2022, 1:14 p.m. UTC | #7
On 11.04.2022 12:01, Andrew Cooper wrote:
> On 11/04/2022 10:35, Jan Beulich wrote:
>> Prior extension of these functions to enable per-device quarantine page
>> tables already didn't add more locking there, but merely left in place
>> what had been there before. But really locking is unnecessary here:
>> We're running with pcidevs_lock held (i.e. multiple invocations of the
>> same function [or their teardown equivalents] are impossible, and hence
>> there are no "local" races), while all consuming of the data being
>> populated here can't race anyway due to happening sequentially
>> afterwards. See also the comment in struct arch_pci_dev.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> It is only legitimate to drop these calls if you delete the mapping_lock
> entirely.  Otherwise you're breaking the semantics of mapping_lock.
> 
> Your argument of "well this is already guarded by the pci lock" means
> that these are uncontended lock/unlock operations, and therefore not
> interesting to drop in the first place.
> 
> This patch is specifically setting us up for an XSA in the future when
> the behaviour of the PCI lock changes, the fix for which will be to
> revert this patch.

Further to my earlier reply, may I remind you that changes to the PCI
lock won't go unnoticed here, as there are respective ASSERT()s in
place.

Jan
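
The assertions referred to here are presumably of the form below, at the top
of the quarantine hooks; a minimal sketch of the tripwire pattern (the real
pcidevs_locked() helper lives in xen/drivers/passthrough/pci.c and its exact
signature may differ):

    bool pcidevs_locked(void);  /* true iff the global PCI devices lock is held */

    int cf_check amd_iommu_quarantine_init(struct pci_dev *pdev /* , ... */)
    {
        /*
         * If the PCI lock's behaviour changed such that this hook no longer
         * ran under it, debug builds would trip here rather than silently
         * losing the serialization the patch relies on.
         */
        ASSERT(pcidevs_locked());

        /* ... build the per-device quarantine page tables ... */
        return 0;
    }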
Tian, Kevin April 20, 2022, 6:22 a.m. UTC | #8
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, April 11, 2022 5:36 PM
> 
> Prior extension of these functions to enable per-device quarantine page
> tables already didn't add more locking there, but merely left in place
> what had been there before. But really locking is unnecessary here:
> We're running with pcidevs_lock held (i.e. multiple invocations of the
> same function [or their teardown equivalents] are impossible, and hence
> there are no "local" races), while all consuming of the data being
> populated here can't race anyway due to happening sequentially
> afterwards. See also the comment in struct arch_pci_dev.

Better add some explanation (as you explained in other replies) of
why the above rationale around pcidevs_lock leads to dropping a
different lock (mapping_lock).

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Patch

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -699,15 +699,11 @@  int cf_check amd_iommu_quarantine_init(s
         union amd_iommu_pte *root;
         struct page_info *pgs[IOMMU_MAX_PT_LEVELS] = {};
 
-        spin_lock(&hd->arch.mapping_lock);
-
         root = __map_domain_page(pdev->arch.amd.root_table);
         rc = fill_qpt(root, level - 1, pgs);
         unmap_domain_page(root);
 
         pdev->arch.leaf_mfn = page_to_mfn(pgs[0]);
-
-        spin_unlock(&hd->arch.mapping_lock);
     }
 
     page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list);
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -3054,15 +3054,11 @@  static int cf_check intel_iommu_quaranti
         struct dma_pte *root;
         struct page_info *pgs[6] = {};
 
-        spin_lock(&hd->arch.mapping_lock);
-
         root = map_vtd_domain_page(pdev->arch.vtd.pgd_maddr);
         rc = fill_qpt(root, level - 1, pgs);
         unmap_vtd_domain_page(root);
 
         pdev->arch.leaf_mfn = page_to_mfn(pgs[0]);
-
-        spin_unlock(&hd->arch.mapping_lock);
     }
 
     page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list);