diff mbox series

[v2] VT-d: don't pass bridge devices to domain_context_mapping_one()

Message ID 620f37b6-43f2-030e-b259-84a4e9ceb7fc@suse.com (mailing list archive)
State New, archived
Headers show
Series [v2] VT-d: don't pass bridge devices to domain_context_mapping_one() | expand

Commit Message

Jan Beulich Jan. 20, 2020, 3:42 p.m. UTC
When passed a non-NULL pdev, the function does an owner check when it
finds an already existing context mapping. Bridges, however, don't get
passed through to guests, and hence their owner is always going to be
Dom0, leading to the assigment of all but one of the function of multi-
function PCI devices behind bridges to fail.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add comments.
---
Note: This was reported as an apparent regression from XSA-302 / -306.
      So far I haven't been able to figure out how the code would have
      worked before, i.e. to me it looks like a pre-existing problem.
      This leaves the risk of the change here papering over another
      issue.

Comments

Roger Pau Monne Jan. 20, 2020, 4:07 p.m. UTC | #1
On Mon, Jan 20, 2020 at 04:42:22PM +0100, Jan Beulich wrote:
> When passed a non-NULL pdev, the function does an owner check when it
> finds an already existing context mapping. Bridges, however, don't get
> passed through to guests, and hence their owner is always going to be
> Dom0, leading to the assigment of all but one of the function of multi-
> function PCI devices behind bridges to fail.
> 
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Add comments.
> ---
> Note: This was reported as an apparent regression from XSA-302 / -306.
>       So far I haven't been able to figure out how the code would have
>       worked before, i.e. to me it looks like a pre-existing problem.
>       This leaves the risk of the change here papering over another
>       issue.
> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1493,18 +1493,28 @@ static int domain_context_mapping(struct
>          if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
>              break;
>  
> +        /*
> +         * Mapping a bridge should, if anything, pass the struct pci_dev of
> +         * that bridge. Since bridges don't normally get assigned to guests,
> +         * their owner would be the wrong one. Pass NULL instead.
> +         */
>          ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn,
> -                                         pci_get_pdev(seg, bus, devfn));
> +                                         NULL);
>  
>          /*
>           * Devices behind PCIe-to-PCI/PCIx bridge may generate different
>           * requester-id. It may originate from devfn=0 on the secondary bus
>           * behind the bridge. Map that id as well if we didn't already.
> +         *
> +         * Somewhat similar as for bridges, we don't want to pass a struct
> +         * pci_dev here - there may not even exist one for this (secbus,0,0)
> +         * tuple. If there is one, without properly working device groups it
> +         * may again not have the correct owner.
>           */
>          if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
>               (secbus != pdev->bus || pdev->devfn != 0) )
>              ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0,
> -                                             pci_get_pdev(seg, secbus, 0));
> +                                             NULL);

Isn't it dangerous to map this device to the guest, and that multiple
guests might end up with the same device mapped?

Thanks, Roger.
Jan Beulich Jan. 20, 2020, 4:15 p.m. UTC | #2
On 20.01.2020 17:07, Roger Pau Monné  wrote:
> On Mon, Jan 20, 2020 at 04:42:22PM +0100, Jan Beulich wrote:
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1493,18 +1493,28 @@ static int domain_context_mapping(struct
>>          if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
>>              break;
>>  
>> +        /*
>> +         * Mapping a bridge should, if anything, pass the struct pci_dev of
>> +         * that bridge. Since bridges don't normally get assigned to guests,
>> +         * their owner would be the wrong one. Pass NULL instead.
>> +         */
>>          ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn,
>> -                                         pci_get_pdev(seg, bus, devfn));
>> +                                         NULL);
>>  
>>          /*
>>           * Devices behind PCIe-to-PCI/PCIx bridge may generate different
>>           * requester-id. It may originate from devfn=0 on the secondary bus
>>           * behind the bridge. Map that id as well if we didn't already.
>> +         *
>> +         * Somewhat similar as for bridges, we don't want to pass a struct
>> +         * pci_dev here - there may not even exist one for this (secbus,0,0)
>> +         * tuple. If there is one, without properly working device groups it
>> +         * may again not have the correct owner.
>>           */
>>          if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
>>               (secbus != pdev->bus || pdev->devfn != 0) )
>>              ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0,
>> -                                             pci_get_pdev(seg, secbus, 0));
>> +                                             NULL);
> 
> Isn't it dangerous to map this device to the guest, and that multiple
> guests might end up with the same device mapped?

They won't (afaict) - see the checking done by domain_context_mapping_one()
when it finds an already present context entry. The first one to make such
a mapping will win.

Jan
Roger Pau Monne Jan. 20, 2020, 4:37 p.m. UTC | #3
On Mon, Jan 20, 2020 at 05:15:22PM +0100, Jan Beulich wrote:
> On 20.01.2020 17:07, Roger Pau Monné  wrote:
> > On Mon, Jan 20, 2020 at 04:42:22PM +0100, Jan Beulich wrote:
> >> --- a/xen/drivers/passthrough/vtd/iommu.c
> >> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> @@ -1493,18 +1493,28 @@ static int domain_context_mapping(struct
> >>          if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
> >>              break;
> >>  
> >> +        /*
> >> +         * Mapping a bridge should, if anything, pass the struct pci_dev of
> >> +         * that bridge. Since bridges don't normally get assigned to guests,
> >> +         * their owner would be the wrong one. Pass NULL instead.
> >> +         */
> >>          ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn,
> >> -                                         pci_get_pdev(seg, bus, devfn));
> >> +                                         NULL);
> >>  
> >>          /*
> >>           * Devices behind PCIe-to-PCI/PCIx bridge may generate different
> >>           * requester-id. It may originate from devfn=0 on the secondary bus
> >>           * behind the bridge. Map that id as well if we didn't already.
> >> +         *
> >> +         * Somewhat similar as for bridges, we don't want to pass a struct
> >> +         * pci_dev here - there may not even exist one for this (secbus,0,0)
> >> +         * tuple. If there is one, without properly working device groups it
> >> +         * may again not have the correct owner.
> >>           */
> >>          if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
> >>               (secbus != pdev->bus || pdev->devfn != 0) )
> >>              ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0,
> >> -                                             pci_get_pdev(seg, secbus, 0));
> >> +                                             NULL);
> > 
> > Isn't it dangerous to map this device to the guest, and that multiple
> > guests might end up with the same device mapped?
> 
> They won't (afaict) - see the checking done by domain_context_mapping_one()
> when it finds an already present context entry. The first one to make such
> a mapping will win.

Right, thanks, I find all this code quite confusing. If the iommu
context is assigned to a domain, won't it make sense to keep the
device in sync and also assign it to that domain?

So that the owner in the iommu context matches the owner on the
pci_dev struct.

Roger.
Jan Beulich Jan. 20, 2020, 4:41 p.m. UTC | #4
On 20.01.2020 17:37, Roger Pau Monné wrote:
> On Mon, Jan 20, 2020 at 05:15:22PM +0100, Jan Beulich wrote:
>> On 20.01.2020 17:07, Roger Pau Monné  wrote:
>>> On Mon, Jan 20, 2020 at 04:42:22PM +0100, Jan Beulich wrote:
>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>> @@ -1493,18 +1493,28 @@ static int domain_context_mapping(struct
>>>>          if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
>>>>              break;
>>>>  
>>>> +        /*
>>>> +         * Mapping a bridge should, if anything, pass the struct pci_dev of
>>>> +         * that bridge. Since bridges don't normally get assigned to guests,
>>>> +         * their owner would be the wrong one. Pass NULL instead.
>>>> +         */
>>>>          ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn,
>>>> -                                         pci_get_pdev(seg, bus, devfn));
>>>> +                                         NULL);
>>>>  
>>>>          /*
>>>>           * Devices behind PCIe-to-PCI/PCIx bridge may generate different
>>>>           * requester-id. It may originate from devfn=0 on the secondary bus
>>>>           * behind the bridge. Map that id as well if we didn't already.
>>>> +         *
>>>> +         * Somewhat similar as for bridges, we don't want to pass a struct
>>>> +         * pci_dev here - there may not even exist one for this (secbus,0,0)
>>>> +         * tuple. If there is one, without properly working device groups it
>>>> +         * may again not have the correct owner.
>>>>           */
>>>>          if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
>>>>               (secbus != pdev->bus || pdev->devfn != 0) )
>>>>              ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0,
>>>> -                                             pci_get_pdev(seg, secbus, 0));
>>>> +                                             NULL);
>>>
>>> Isn't it dangerous to map this device to the guest, and that multiple
>>> guests might end up with the same device mapped?
>>
>> They won't (afaict) - see the checking done by domain_context_mapping_one()
>> when it finds an already present context entry. The first one to make such
>> a mapping will win.
> 
> Right, thanks, I find all this code quite confusing. If the iommu
> context is assigned to a domain, won't it make sense to keep the
> device in sync and also assign it to that domain?
> 
> So that the owner in the iommu context matches the owner on the
> pci_dev struct.

For bridges - no, I don't think so. For these "fake" (possibly phantom,
possibly real) devices at (secbus,0,0) I don't know for sure, but - as
the comment I'm adding says - I think this should be taken care of when
we gain properly working device groups (at which point if this "fake"
device is actually a real one, it should be put into the same group).

Jan
Roger Pau Monne Jan. 20, 2020, 5:06 p.m. UTC | #5
On Mon, Jan 20, 2020 at 05:41:17PM +0100, Jan Beulich wrote:
> On 20.01.2020 17:37, Roger Pau Monné wrote:
> > On Mon, Jan 20, 2020 at 05:15:22PM +0100, Jan Beulich wrote:
> >> On 20.01.2020 17:07, Roger Pau Monné  wrote:
> >>> On Mon, Jan 20, 2020 at 04:42:22PM +0100, Jan Beulich wrote:
> >>>> --- a/xen/drivers/passthrough/vtd/iommu.c
> >>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >>>> @@ -1493,18 +1493,28 @@ static int domain_context_mapping(struct
> >>>>          if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
> >>>>              break;
> >>>>  
> >>>> +        /*
> >>>> +         * Mapping a bridge should, if anything, pass the struct pci_dev of
> >>>> +         * that bridge. Since bridges don't normally get assigned to guests,
> >>>> +         * their owner would be the wrong one. Pass NULL instead.
> >>>> +         */
> >>>>          ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn,
> >>>> -                                         pci_get_pdev(seg, bus, devfn));
> >>>> +                                         NULL);
> >>>>  
> >>>>          /*
> >>>>           * Devices behind PCIe-to-PCI/PCIx bridge may generate different
> >>>>           * requester-id. It may originate from devfn=0 on the secondary bus
> >>>>           * behind the bridge. Map that id as well if we didn't already.
> >>>> +         *
> >>>> +         * Somewhat similar as for bridges, we don't want to pass a struct
> >>>> +         * pci_dev here - there may not even exist one for this (secbus,0,0)
> >>>> +         * tuple. If there is one, without properly working device groups it
> >>>> +         * may again not have the correct owner.
> >>>>           */
> >>>>          if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
> >>>>               (secbus != pdev->bus || pdev->devfn != 0) )
> >>>>              ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0,
> >>>> -                                             pci_get_pdev(seg, secbus, 0));
> >>>> +                                             NULL);
> >>>
> >>> Isn't it dangerous to map this device to the guest, and that multiple
> >>> guests might end up with the same device mapped?
> >>
> >> They won't (afaict) - see the checking done by domain_context_mapping_one()
> >> when it finds an already present context entry. The first one to make such
> >> a mapping will win.
> > 
> > Right, thanks, I find all this code quite confusing. If the iommu
> > context is assigned to a domain, won't it make sense to keep the
> > device in sync and also assign it to that domain?
> > 
> > So that the owner in the iommu context matches the owner on the
> > pci_dev struct.
> 
> For bridges - no, I don't think so. For these "fake" (possibly phantom,
> possibly real) devices at (secbus,0,0) I don't know for sure, but - as
> the comment I'm adding says - I think this should be taken care of when
> we gain properly working device groups (at which point if this "fake"
> device is actually a real one, it should be put into the same group).

Yes, that's true. Also assigning the pci_dev to the guest could allow
the guest to actually interact with it, which is not what we actually
want.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
Tian, Kevin Jan. 21, 2020, 6:28 a.m. UTC | #6
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, January 20, 2020 11:42 PM
> 
> When passed a non-NULL pdev, the function does an owner check when it
> finds an already existing context mapping. Bridges, however, don't get
> passed through to guests, and hence their owner is always going to be
> Dom0, leading to the assigment of all but one of the function of multi-
> function PCI devices behind bridges to fail.
> 
> Reported-by: Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>


Reviewed-by: Kevin Tian <kevin.tian@intel.com>
diff mbox series

Patch

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1493,18 +1493,28 @@  static int domain_context_mapping(struct
         if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
             break;
 
+        /*
+         * Mapping a bridge should, if anything, pass the struct pci_dev of
+         * that bridge. Since bridges don't normally get assigned to guests,
+         * their owner would be the wrong one. Pass NULL instead.
+         */
         ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn,
-                                         pci_get_pdev(seg, bus, devfn));
+                                         NULL);
 
         /*
          * Devices behind PCIe-to-PCI/PCIx bridge may generate different
          * requester-id. It may originate from devfn=0 on the secondary bus
          * behind the bridge. Map that id as well if we didn't already.
+         *
+         * Somewhat similar as for bridges, we don't want to pass a struct
+         * pci_dev here - there may not even exist one for this (secbus,0,0)
+         * tuple. If there is one, without properly working device groups it
+         * may again not have the correct owner.
          */
         if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
              (secbus != pdev->bus || pdev->devfn != 0) )
             ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0,
-                                             pci_get_pdev(seg, secbus, 0));
+                                             NULL);
 
         break;