Message ID | 1597333243-29483-2-git-send-email-mjrosato@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v3] PCI: Introduce flag for detached virtual functions | expand |
On 8/13/20 11:40 AM, Matthew Rosato wrote: > s390x has the notion of providing VFs to the kernel in a manner > where the associated PF is inaccessible other than via firmware. > These are not treated as typical VFs and access to them is emulated > by underlying firmware which can still access the PF. After > the referened commit however these detached VFs were no longer able > to work with vfio-pci as the firmware does not provide emulation of > the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize > these detached VFs so that vfio-pci can allow memory access to > them again. > > Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Polite ping - If unhappy with the approach moving in this direction, I have also played around with Alex's prior suggestion of a dev_flags bit that denotes a device that doesn't implement PCI_COMMAND_MEMORY. Please advise. > --- > arch/s390/pci/pci_bus.c | 13 +++++++++++++ > drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- > include/linux/pci.h | 4 ++++ > 3 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c > index 642a993..1b33076 100644 > --- a/arch/s390/pci/pci_bus.c > +++ b/arch/s390/pci/pci_bus.c > @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, > } > #endif > > +void pcibios_bus_add_device(struct pci_dev *pdev) > +{ > + struct zpci_dev *zdev = to_zpci(pdev); > + > + /* > + * If we have a VF on a non-multifunction bus, it must be a VF that is > + * detached from its parent PF. We rely on firmware emulation to > + * provide underlying PF details. > + */ > + if (zdev->vfn && !zdev->zbus->multifunction) > + pdev->detached_vf = 1; > +} > + > static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) > { > struct pci_bus *bus; > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c > index d98843f..98f93d1 100644 > --- a/drivers/vfio/pci/vfio_pci_config.c > +++ b/drivers/vfio/pci/vfio_pci_config.c > @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) > * PF SR-IOV capability, there's therefore no need to trigger > * faults based on the virtual value. > */ > - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); > + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); > } > > /* > @@ -420,7 +420,7 @@ static void vfio_bar_restore(struct vfio_pci_device *vdev) > u16 cmd; > int i; > > - if (pdev->is_virtfn) > + if (dev_is_vf(&pdev->dev)) > return; > > pci_info(pdev, "%s: reset recovery - restoring BARs\n", __func__); > @@ -521,7 +521,7 @@ static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos, > count = vfio_default_config_read(vdev, pos, count, perm, offset, val); > > /* Mask in virtual memory enable for SR-IOV devices */ > - if (offset == PCI_COMMAND && vdev->pdev->is_virtfn) { > + if ((offset == PCI_COMMAND) && (dev_is_vf(&vdev->pdev->dev))) { > u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]); > u32 tmp_val = le32_to_cpu(*val); > > @@ -1713,7 +1713,7 @@ int vfio_config_init(struct vfio_pci_device *vdev) > vdev->rbar[5] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_5]); > vdev->rbar[6] = le32_to_cpu(*(__le32 *)&vconfig[PCI_ROM_ADDRESS]); > > - if (pdev->is_virtfn) { > + if (dev_is_vf(&pdev->dev)) { > *(__le16 *)&vconfig[PCI_VENDOR_ID] = cpu_to_le16(pdev->vendor); > *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device); > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 8355306..7c062de 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -445,6 +445,7 @@ struct pci_dev { > unsigned int is_probed:1; /* Device probing in progress */ > unsigned int link_active_reporting:1;/* Device capable of reporting link active */ > unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ > + unsigned int detached_vf:1; /* VF without local PF access */ > pci_dev_flags_t dev_flags; > atomic_t enable_cnt; /* pci_enable_device has been called */ > > @@ -1057,6 +1058,8 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev, > void pci_sort_breadthfirst(void); > #define dev_is_pci(d) ((d)->bus == &pci_bus_type) > #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false)) > +#define dev_is_vf(d) ((dev_is_pci(d) ? (to_pci_dev(d)->is_virtfn || \ > + to_pci_dev(d)->detached_vf) : false)) > > /* Generic PCI functions exported to card drivers */ > > @@ -1764,6 +1767,7 @@ static inline struct pci_dev *pci_get_domain_bus_and_slot(int domain, > > #define dev_is_pci(d) (false) > #define dev_is_pf(d) (false) > +#define dev_is_vf(d) (false) > static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) > { return false; } > static inline int pci_irqd_intx_xlate(struct irq_domain *d, >
On Mon, 24 Aug 2020 10:21:24 -0400 Matthew Rosato <mjrosato@linux.ibm.com> wrote: > On 8/13/20 11:40 AM, Matthew Rosato wrote: > > s390x has the notion of providing VFs to the kernel in a manner > > where the associated PF is inaccessible other than via firmware. > > These are not treated as typical VFs and access to them is emulated > > by underlying firmware which can still access the PF. After > > the referened commit however these detached VFs were no longer able > > to work with vfio-pci as the firmware does not provide emulation of > > the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize > > these detached VFs so that vfio-pci can allow memory access to > > them again. > > > Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > > Polite ping - If unhappy with the approach moving in this direction, I > have also played around with Alex's prior suggestion of a dev_flags bit > that denotes a device that doesn't implement PCI_COMMAND_MEMORY. Please > advise. I'm not unhappy with it, but there are quite a number of users of is_virtfn and I wonder to what extent we can replace all of them. For instance if the longer term plan would be to consider is_virtfn private then I think there are places in vfio-pci where we'd need to test (pci_physfn(pdev) != pdev) in order to make sure we're working on the topology we expect (see VF token handling). If we want to consider these detached VFs as actual VFs (minus the PF) everywhere in the code, rather than a PF that doesn't implement random features as determined by the bare metal hypervisor, then this might be the way to go. The former implies that we'd migrate away from is_virtfn to this new interface, potentially changing the code path these devices would take as that adoption proceeds. Have you taken a look at other is_virtfn use cases to see if any would be strictly undesirable for this class of devices? Otherwise I think Bjorn needs to weigh in since the PCI-core change is a central aspect to this proposal. Thanks, Alex > > --- > > arch/s390/pci/pci_bus.c | 13 +++++++++++++ > > drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- > > include/linux/pci.h | 4 ++++ > > 3 files changed, 21 insertions(+), 4 deletions(-) > > > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c > > index 642a993..1b33076 100644 > > --- a/arch/s390/pci/pci_bus.c > > +++ b/arch/s390/pci/pci_bus.c > > @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, > > } > > #endif > > > > +void pcibios_bus_add_device(struct pci_dev *pdev) > > +{ > > + struct zpci_dev *zdev = to_zpci(pdev); > > + > > + /* > > + * If we have a VF on a non-multifunction bus, it must be a VF that is > > + * detached from its parent PF. We rely on firmware emulation to > > + * provide underlying PF details. > > + */ > > + if (zdev->vfn && !zdev->zbus->multifunction) > > + pdev->detached_vf = 1; > > +} > > + > > static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) > > { > > struct pci_bus *bus; > > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c > > index d98843f..98f93d1 100644 > > --- a/drivers/vfio/pci/vfio_pci_config.c > > +++ b/drivers/vfio/pci/vfio_pci_config.c > > @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) > > * PF SR-IOV capability, there's therefore no need to trigger > > * faults based on the virtual value. > > */ > > - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); > > + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); > > } > > > > /* > > @@ -420,7 +420,7 @@ static void vfio_bar_restore(struct vfio_pci_device *vdev) > > u16 cmd; > > int i; > > > > - if (pdev->is_virtfn) > > + if (dev_is_vf(&pdev->dev)) > > return; > > > > pci_info(pdev, "%s: reset recovery - restoring BARs\n", __func__); > > @@ -521,7 +521,7 @@ static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos, > > count = vfio_default_config_read(vdev, pos, count, perm, offset, val); > > > > /* Mask in virtual memory enable for SR-IOV devices */ > > - if (offset == PCI_COMMAND && vdev->pdev->is_virtfn) { > > + if ((offset == PCI_COMMAND) && (dev_is_vf(&vdev->pdev->dev))) { > > u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]); > > u32 tmp_val = le32_to_cpu(*val); > > > > @@ -1713,7 +1713,7 @@ int vfio_config_init(struct vfio_pci_device *vdev) > > vdev->rbar[5] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_5]); > > vdev->rbar[6] = le32_to_cpu(*(__le32 *)&vconfig[PCI_ROM_ADDRESS]); > > > > - if (pdev->is_virtfn) { > > + if (dev_is_vf(&pdev->dev)) { > > *(__le16 *)&vconfig[PCI_VENDOR_ID] = cpu_to_le16(pdev->vendor); > > *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device); > > > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index 8355306..7c062de 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -445,6 +445,7 @@ struct pci_dev { > > unsigned int is_probed:1; /* Device probing in progress */ > > unsigned int link_active_reporting:1;/* Device capable of reporting link active */ > > unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ > > + unsigned int detached_vf:1; /* VF without local PF access */ > > pci_dev_flags_t dev_flags; > > atomic_t enable_cnt; /* pci_enable_device has been called */ > > > > @@ -1057,6 +1058,8 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev, > > void pci_sort_breadthfirst(void); > > #define dev_is_pci(d) ((d)->bus == &pci_bus_type) > > #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false)) > > +#define dev_is_vf(d) ((dev_is_pci(d) ? (to_pci_dev(d)->is_virtfn || \ > > + to_pci_dev(d)->detached_vf) : false)) > > > > /* Generic PCI functions exported to card drivers */ > > > > @@ -1764,6 +1767,7 @@ static inline struct pci_dev *pci_get_domain_bus_and_slot(int domain, > > > > #define dev_is_pci(d) (false) > > #define dev_is_pf(d) (false) > > +#define dev_is_vf(d) (false) > > static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) > > { return false; } > > static inline int pci_irqd_intx_xlate(struct irq_domain *d, > > >
Re the subject line, this patch does a lot more than just "introduce a flag"; AFAICT it actually enables important VFIO functionality, e.g., something like: vfio/pci: Enable MMIO access for s390 detached VFs On Thu, Aug 13, 2020 at 11:40:43AM -0400, Matthew Rosato wrote: > s390x has the notion of providing VFs to the kernel in a manner > where the associated PF is inaccessible other than via firmware. > These are not treated as typical VFs and access to them is emulated > by underlying firmware which can still access the PF. After > the referened commit however these detached VFs were no longer able > to work with vfio-pci as the firmware does not provide emulation of > the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize > these detached VFs so that vfio-pci can allow memory access to > them again. Out of curiosity, in what sense is the PF inaccessible? Is it *impossible* for Linux to access the PF, or is it just not enumerated by clp_list_pci() so Linux doesn't know about it? VFs do not implement PCI_COMMAND, so I guess "firmware does not provide emulation of PCI_COMMAND_MEMORY" means something like "we can't access the PF so we can't enable/disable PCI_COMMAND_MEMORY"? s/referened/referenced/ > Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > arch/s390/pci/pci_bus.c | 13 +++++++++++++ > drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- > include/linux/pci.h | 4 ++++ > 3 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c > index 642a993..1b33076 100644 > --- a/arch/s390/pci/pci_bus.c > +++ b/arch/s390/pci/pci_bus.c > @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, > } > #endif > > +void pcibios_bus_add_device(struct pci_dev *pdev) > +{ > + struct zpci_dev *zdev = to_zpci(pdev); > + > + /* > + * If we have a VF on a non-multifunction bus, it must be a VF that is > + * detached from its parent PF. We rely on firmware emulation to > + * provide underlying PF details. What exactly does "multifunction bus" mean? I'm familiar with multi-function *devices*, but not multi-function buses. > + */ > + if (zdev->vfn && !zdev->zbus->multifunction) > + pdev->detached_vf = 1; > +} > + > static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) > { > struct pci_bus *bus; > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c > index d98843f..98f93d1 100644 > --- a/drivers/vfio/pci/vfio_pci_config.c > +++ b/drivers/vfio/pci/vfio_pci_config.c > @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) > * PF SR-IOV capability, there's therefore no need to trigger > * faults based on the virtual value. > */ > - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); > + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); I'm not super keen on the idea of having two subtly different ways of identifying VFs. I think that will be confusing. This seems to be the critical line, so whatever we do here, it will be out of the ordinary and probably deserves a little comment. If Linux doesn't see the PF, does pci_physfn(VF) return NULL, i.e., is VF->physfn NULL? > } > > /* > @@ -420,7 +420,7 @@ static void vfio_bar_restore(struct vfio_pci_device *vdev) > u16 cmd; > int i; > > - if (pdev->is_virtfn) > + if (dev_is_vf(&pdev->dev)) > return; > > pci_info(pdev, "%s: reset recovery - restoring BARs\n", __func__); > @@ -521,7 +521,7 @@ static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos, > count = vfio_default_config_read(vdev, pos, count, perm, offset, val); > > /* Mask in virtual memory enable for SR-IOV devices */ > - if (offset == PCI_COMMAND && vdev->pdev->is_virtfn) { > + if ((offset == PCI_COMMAND) && (dev_is_vf(&vdev->pdev->dev))) { > u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]); > u32 tmp_val = le32_to_cpu(*val); > > @@ -1713,7 +1713,7 @@ int vfio_config_init(struct vfio_pci_device *vdev) > vdev->rbar[5] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_5]); > vdev->rbar[6] = le32_to_cpu(*(__le32 *)&vconfig[PCI_ROM_ADDRESS]); > > - if (pdev->is_virtfn) { > + if (dev_is_vf(&pdev->dev)) { > *(__le16 *)&vconfig[PCI_VENDOR_ID] = cpu_to_le16(pdev->vendor); > *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device); > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 8355306..7c062de 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -445,6 +445,7 @@ struct pci_dev { > unsigned int is_probed:1; /* Device probing in progress */ > unsigned int link_active_reporting:1;/* Device capable of reporting link active */ > unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ > + unsigned int detached_vf:1; /* VF without local PF access */ > pci_dev_flags_t dev_flags; > atomic_t enable_cnt; /* pci_enable_device has been called */ > > @@ -1057,6 +1058,8 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev, > void pci_sort_breadthfirst(void); > #define dev_is_pci(d) ((d)->bus == &pci_bus_type) > #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false)) > +#define dev_is_vf(d) ((dev_is_pci(d) ? (to_pci_dev(d)->is_virtfn || \ > + to_pci_dev(d)->detached_vf) : false)) > > /* Generic PCI functions exported to card drivers */ > > @@ -1764,6 +1767,7 @@ static inline struct pci_dev *pci_get_domain_bus_and_slot(int domain, > > #define dev_is_pci(d) (false) > #define dev_is_pf(d) (false) > +#define dev_is_vf(d) (false) > static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) > { return false; } > static inline int pci_irqd_intx_xlate(struct irq_domain *d, > -- > 1.8.3.1 >
On Thu, 27 Aug 2020 13:31:38 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > Re the subject line, this patch does a lot more than just "introduce a > flag"; AFAICT it actually enables important VFIO functionality, e.g., > something like: > > vfio/pci: Enable MMIO access for s390 detached VFs > > On Thu, Aug 13, 2020 at 11:40:43AM -0400, Matthew Rosato wrote: > > s390x has the notion of providing VFs to the kernel in a manner > > where the associated PF is inaccessible other than via firmware. > > These are not treated as typical VFs and access to them is emulated > > by underlying firmware which can still access the PF. After > > the referened commit however these detached VFs were no longer able > > to work with vfio-pci as the firmware does not provide emulation of > > the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize > > these detached VFs so that vfio-pci can allow memory access to > > them again. > > Out of curiosity, in what sense is the PF inaccessible? Is it > *impossible* for Linux to access the PF, or is it just not enumerated > by clp_list_pci() so Linux doesn't know about it? > > VFs do not implement PCI_COMMAND, so I guess "firmware does not > provide emulation of PCI_COMMAND_MEMORY" means something like "we > can't access the PF so we can't enable/disable PCI_COMMAND_MEMORY"? > > s/referened/referenced/ > > > Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > > --- > > arch/s390/pci/pci_bus.c | 13 +++++++++++++ > > drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- > > include/linux/pci.h | 4 ++++ > > 3 files changed, 21 insertions(+), 4 deletions(-) > > > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c > > index 642a993..1b33076 100644 > > --- a/arch/s390/pci/pci_bus.c > > +++ b/arch/s390/pci/pci_bus.c > > @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, > > } > > #endif > > > > +void pcibios_bus_add_device(struct pci_dev *pdev) > > +{ > > + struct zpci_dev *zdev = to_zpci(pdev); > > + > > + /* > > + * If we have a VF on a non-multifunction bus, it must be a VF that is > > + * detached from its parent PF. We rely on firmware emulation to > > + * provide underlying PF details. > > What exactly does "multifunction bus" mean? I'm familiar with > multi-function *devices*, but not multi-function buses. > > > + */ > > + if (zdev->vfn && !zdev->zbus->multifunction) > > + pdev->detached_vf = 1; > > +} > > + > > static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) > > { > > struct pci_bus *bus; > > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c > > index d98843f..98f93d1 100644 > > --- a/drivers/vfio/pci/vfio_pci_config.c > > +++ b/drivers/vfio/pci/vfio_pci_config.c > > @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) > > * PF SR-IOV capability, there's therefore no need to trigger > > * faults based on the virtual value. > > */ > > - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); > > + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); > > I'm not super keen on the idea of having two subtly different ways of > identifying VFs. I think that will be confusing. This seems to be > the critical line, so whatever we do here, it will be out of the > ordinary and probably deserves a little comment. > > If Linux doesn't see the PF, does pci_physfn(VF) return NULL, i.e., is > VF->physfn NULL? FWIW, pci_physfn() never returns NULL, it returns the provided pdev if is_virtfn is not set. This proposal wouldn't change that return value. AIUI pci_physfn(), the caller needs to test that the returned device is different from the provided device if there's really code that wants to traverse to the PF. My interpretation of what's happening here is that we're a guest running on a bare metal hypervisor (I assume z/VM) and we're assigned a VF that appears on this non-multifunction bus, but the hypervisor doesn't provide emulation of all of the non-implemented config space features of a VF, the memory enable bit being relevant for this fix. We're therefore trying to detect this VF nature of the device, which gets a bit messy since a VF implies a PF on bare metal. The PF would be owned by the hypervisor and not accessible to us. An alternative idea we tossed around, that might still be a possibility, is using dev_flags to describe the specific missing feature, for example something about the command register memory bit being hardwired to zero but always enabled (assuming the PF SR-IOV MSE bit is not cleared). Thanks, Alex
On 8/27/20 2:31 PM, Bjorn Helgaas wrote: > Re the subject line, this patch does a lot more than just "introduce a > flag"; AFAICT it actually enables important VFIO functionality, e.g., > something like: > > vfio/pci: Enable MMIO access for s390 detached VFs > Fair -- maybe s/Enable/Restore/ as this functionality had been working until the mem bit enforcement was added to vfio via abafbc551fdd. > On Thu, Aug 13, 2020 at 11:40:43AM -0400, Matthew Rosato wrote: >> s390x has the notion of providing VFs to the kernel in a manner >> where the associated PF is inaccessible other than via firmware. >> These are not treated as typical VFs and access to them is emulated >> by underlying firmware which can still access the PF. After >> the referened commit however these detached VFs were no longer able >> to work with vfio-pci as the firmware does not provide emulation of >> the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize >> these detached VFs so that vfio-pci can allow memory access to >> them again. > > Out of curiosity, in what sense is the PF inaccessible? Is it > *impossible* for Linux to access the PF, or is it just not enumerated > by clp_list_pci() so Linux doesn't know about it? So, it is not enumerated via clp_list_pci because it is not assigned to the host partition -- So while it is physically available to the machine and the firmware can see it, the machine is divided up into multiple logical partitions -- The partition where we are running the host kernel in this scenario has no access to the PF. Basically, think bare metal hypervisor and only the VF was passed through to the kernel guest. > VFs do not implement PCI_COMMAND, so I guess "firmware does not > provide emulation of PCI_COMMAND_MEMORY" means something like "we > can't access the PF so we can't enable/disable PCI_COMMAND_MEMORY"? > So, the root issue is that vfio-pci added enforcement of the bit via abafbc551fdd -- Then subsequently waived the requirement for VFs via bugfix ebfa440ce38b since as you say it can't be on for VFs per the PCIe spec -- Problem is, vfio can't tell these devices are VFs because is_virtfn=0 for these devices. Fundamentally, I am trying to find a proper way to tell vfio it's OK to waive the requirement for these devices too. > s/referened/referenced/ > >> Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") >> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> >> --- >> arch/s390/pci/pci_bus.c | 13 +++++++++++++ >> drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- >> include/linux/pci.h | 4 ++++ >> 3 files changed, 21 insertions(+), 4 deletions(-) >> >> diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c >> index 642a993..1b33076 100644 >> --- a/arch/s390/pci/pci_bus.c >> +++ b/arch/s390/pci/pci_bus.c >> @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, >> } >> #endif >> >> +void pcibios_bus_add_device(struct pci_dev *pdev) >> +{ >> + struct zpci_dev *zdev = to_zpci(pdev); >> + >> + /* >> + * If we have a VF on a non-multifunction bus, it must be a VF that is >> + * detached from its parent PF. We rely on firmware emulation to >> + * provide underlying PF details. > > What exactly does "multifunction bus" mean? I'm familiar with > multi-function *devices*, but not multi-function buses. > So, this flag is effectively stating whether proper SR-IOV support is available for devices on the bus. This is an interesting point though, I don't think this will catch all instances of this scenario (vs a more-direct check for a device that has a vfn but no linked PF). I need to look at this again, thanks. >> + */ >> + if (zdev->vfn && !zdev->zbus->multifunction) >> + pdev->detached_vf = 1; >> +} >> + >> static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) >> { >> struct pci_bus *bus; >> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c >> index d98843f..98f93d1 100644 >> --- a/drivers/vfio/pci/vfio_pci_config.c >> +++ b/drivers/vfio/pci/vfio_pci_config.c >> @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) >> * PF SR-IOV capability, there's therefore no need to trigger >> * faults based on the virtual value. >> */ >> - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); >> + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); > > I'm not super keen on the idea of having two subtly different ways of > identifying VFs. I think that will be confusing. This seems to be > the critical line, so whatever we do here, it will be out of the > ordinary and probably deserves a little comment. > Another notion that Alex floated was the idea of tagging these devices that don't implement the PCI_COMMAND_MEMORY bit via a dev_flags bit rather than changing the way we identify VFs. It's grown on me and I've tried it out as an alternative. Does this sort of approach sound better? > If Linux doesn't see the PF, does pci_physfn(VF) return NULL, i.e., is > VF->physfn NULL? > pci_physfn(VF) == VF because is_virtfn=0 for these devices.
On Thu, Aug 27, 2020 at 01:17:48PM -0600, Alex Williamson wrote: > On Thu, 27 Aug 2020 13:31:38 -0500 > Bjorn Helgaas <helgaas@kernel.org> wrote: > > > Re the subject line, this patch does a lot more than just "introduce a > > flag"; AFAICT it actually enables important VFIO functionality, e.g., > > something like: > > > > vfio/pci: Enable MMIO access for s390 detached VFs > > > > On Thu, Aug 13, 2020 at 11:40:43AM -0400, Matthew Rosato wrote: > > > s390x has the notion of providing VFs to the kernel in a manner > > > where the associated PF is inaccessible other than via firmware. > > > These are not treated as typical VFs and access to them is emulated > > > by underlying firmware which can still access the PF. After > > > the referened commit however these detached VFs were no longer able > > > to work with vfio-pci as the firmware does not provide emulation of > > > the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize > > > these detached VFs so that vfio-pci can allow memory access to > > > them again. > > > > Out of curiosity, in what sense is the PF inaccessible? Is it > > *impossible* for Linux to access the PF, or is it just not enumerated > > by clp_list_pci() so Linux doesn't know about it? > > > > VFs do not implement PCI_COMMAND, so I guess "firmware does not > > provide emulation of PCI_COMMAND_MEMORY" means something like "we > > can't access the PF so we can't enable/disable PCI_COMMAND_MEMORY"? > > > > s/referened/referenced/ > > > > > Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") > > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > > > --- > > > arch/s390/pci/pci_bus.c | 13 +++++++++++++ > > > drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- > > > include/linux/pci.h | 4 ++++ > > > 3 files changed, 21 insertions(+), 4 deletions(-) > > > > > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c > > > index 642a993..1b33076 100644 > > > --- a/arch/s390/pci/pci_bus.c > > > +++ b/arch/s390/pci/pci_bus.c > > > @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, > > > } > > > #endif > > > > > > +void pcibios_bus_add_device(struct pci_dev *pdev) > > > +{ > > > + struct zpci_dev *zdev = to_zpci(pdev); > > > + > > > + /* > > > + * If we have a VF on a non-multifunction bus, it must be a VF that is > > > + * detached from its parent PF. We rely on firmware emulation to > > > + * provide underlying PF details. > > > > What exactly does "multifunction bus" mean? I'm familiar with > > multi-function *devices*, but not multi-function buses. > > > > > + */ > > > + if (zdev->vfn && !zdev->zbus->multifunction) > > > + pdev->detached_vf = 1; > > > +} > > > + > > > static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) > > > { > > > struct pci_bus *bus; > > > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c > > > index d98843f..98f93d1 100644 > > > --- a/drivers/vfio/pci/vfio_pci_config.c > > > +++ b/drivers/vfio/pci/vfio_pci_config.c > > > @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) > > > * PF SR-IOV capability, there's therefore no need to trigger > > > * faults based on the virtual value. > > > */ > > > - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); > > > + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); > > > > I'm not super keen on the idea of having two subtly different ways of > > identifying VFs. I think that will be confusing. This seems to be > > the critical line, so whatever we do here, it will be out of the > > ordinary and probably deserves a little comment. > > > > If Linux doesn't see the PF, does pci_physfn(VF) return NULL, i.e., is > > VF->physfn NULL? > > FWIW, pci_physfn() never returns NULL, it returns the provided pdev if > is_virtfn is not set. This proposal wouldn't change that return value. > AIUI pci_physfn(), the caller needs to test that the returned device is > different from the provided device if there's really code that wants to > traverse to the PF. Oh, so this VF has is_virtfn==0. That seems weird. There are lots of other ways that a VF is different: Vendor/Device IDs are 0xffff, BARs are zeroes, etc. It sounds like you're sweeping those under the rug by avoiding the normal enumeration path (e.g., you don't have to size the BARs), but if it actually is a VF, it seems like there might be fewer surprises if we treat it as one. Why don't you just set is_virtfn=1 since it *is* a VF, and then deal with the special cases where you want to touch the PF? Bjorn
On 8/27/20 10:33 PM, Bjorn Helgaas wrote: > On Thu, Aug 27, 2020 at 01:17:48PM -0600, Alex Williamson wrote: >> On Thu, 27 Aug 2020 13:31:38 -0500 >> Bjorn Helgaas <helgaas@kernel.org> wrote: >> >>> Re the subject line, this patch does a lot more than just "introduce a >>> flag"; AFAICT it actually enables important VFIO functionality, e.g., >>> something like: >>> >>> vfio/pci: Enable MMIO access for s390 detached VFs >>> >>> On Thu, Aug 13, 2020 at 11:40:43AM -0400, Matthew Rosato wrote: >>>> s390x has the notion of providing VFs to the kernel in a manner >>>> where the associated PF is inaccessible other than via firmware. >>>> These are not treated as typical VFs and access to them is emulated >>>> by underlying firmware which can still access the PF. After >>>> the referened commit however these detached VFs were no longer able >>>> to work with vfio-pci as the firmware does not provide emulation of >>>> the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize >>>> these detached VFs so that vfio-pci can allow memory access to >>>> them again. >>> >>> Out of curiosity, in what sense is the PF inaccessible? Is it >>> *impossible* for Linux to access the PF, or is it just not enumerated >>> by clp_list_pci() so Linux doesn't know about it? If it is possible to access the PF that would be a very severe bug in the machine level hypervisor partition isolation. Note also that POWER has a very similar setup. Also even if we have access to the PF, we do get some hypervisor involvement (pdev->no_vf_scan). Remind you all OSs on IBM Z are _always_ running under a machine level hypervisor in logical partitions (with partitioned memory, no paging). >>> >>> VFs do not implement PCI_COMMAND, so I guess "firmware does not >>> provide emulation of PCI_COMMAND_MEMORY" means something like "we >>> can't access the PF so we can't enable/disable PCI_COMMAND_MEMORY"? >>> >>> s/referened/referenced/ >>> >>>> Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") >>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> >>>> --- >>>> arch/s390/pci/pci_bus.c | 13 +++++++++++++ >>>> drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- >>>> include/linux/pci.h | 4 ++++ >>>> 3 files changed, 21 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c >>>> index 642a993..1b33076 100644 >>>> --- a/arch/s390/pci/pci_bus.c >>>> +++ b/arch/s390/pci/pci_bus.c >>>> @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, >>>> } >>>> #endif >>>> >>>> +void pcibios_bus_add_device(struct pci_dev *pdev) >>>> +{ >>>> + struct zpci_dev *zdev = to_zpci(pdev); >>>> + >>>> + /* >>>> + * If we have a VF on a non-multifunction bus, it must be a VF that is >>>> + * detached from its parent PF. We rely on firmware emulation to >>>> + * provide underlying PF details. >>> >>> What exactly does "multifunction bus" mean? I'm familiar with >>> multi-function *devices*, but not multi-function buses. Yes this is a bit of an IBM Z quirk, up until v5.8-rc1 IBM Z Linux only knew isolated PCI functions that would get a PCI ID of the form <uid>:00:00.0 where the domain is a value (called UID) that can be determined by the machine administrator. Now for some multi-function devices one really needs to have some of the physical PCI information known to the device driver/in the PCI ID. Still we need to stay compatible to the old scheme and also somehow deal with the fact that the domain value (UID) is set per function. So now for each physical multi-function device we create a zbus that gets assigned all functions belonging to that physical device and we use the UID of the function with devfn == 0 as the domain. Resulting in PCI IDs of the form: <uid>:00:<device>.<function> Now zbus->multifunction basically says if there is more than one function on that zbus which is equivalent to saying that the zbus represents a multi-function device. >>> >>>> + */ >>>> + if (zdev->vfn && !zdev->zbus->multifunction) >>>> + pdev->detached_vf = 1; >>>> +} Note that as of v5.9-rc2 setting pdev->detached_vf would move into zpci_bus_setup_virtfn() and it will be obvious that whenever zdev->vfn != 0 (i.e. it really is a VF according to the platform) we either link the VF with the parent PF or set pdev->detached_vf. It's just that this version was sent before that code landed upstream. >>>> + >>>> static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) >>>> { >>>> struct pci_bus *bus; >>>> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c >>>> index d98843f..98f93d1 100644 >>>> --- a/drivers/vfio/pci/vfio_pci_config.c >>>> +++ b/drivers/vfio/pci/vfio_pci_config.c >>>> @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) >>>> * PF SR-IOV capability, there's therefore no need to trigger >>>> * faults based on the virtual value. >>>> */ >>>> - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); >>>> + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); >>> >>> I'm not super keen on the idea of having two subtly different ways of >>> identifying VFs. I think that will be confusing. This seems to be >>> the critical line, so whatever we do here, it will be out of the >>> ordinary and probably deserves a little comment. >>> >>> If Linux doesn't see the PF, does pci_physfn(VF) return NULL, i.e., is >>> VF->physfn NULL? No and yes, as Matthew already set pci_physfn(vf) never return NULL because it returns the pdev itself if is_virtfn is 0. That said we can easily make Linux have pdev->is_virtfn = 1, pdev->physfn = NULL and in fact it was the first thing I suggested because I feel like it is indeed the most logical way to encode "detached VF" and AFAIU there is already some code (ex: in powerpc, eeh_debugfs_break_device()) that assumes this to be the case. However there is also code that assumes that pdev->is_virtfn implies pdev->physfn != NULL including in vfio so this requires checking all pdev->is_virtfn/pci_physfn() uses and of course a clear upstream decision. >> >> FWIW, pci_physfn() never returns NULL, it returns the provided pdev if >> is_virtfn is not set. This proposal wouldn't change that return value. >> AIUI pci_physfn(), the caller needs to test that the returned device is >> different from the provided device if there's really code that wants to >> traverse to the PF. > > Oh, so this VF has is_virtfn==0. That seems weird. There are lots of > other ways that a VF is different: Vendor/Device IDs are 0xffff, BARs > are zeroes, etc. > > It sounds like you're sweeping those under the rug by avoiding the > normal enumeration path (e.g., you don't have to size the BARs), but > if it actually is a VF, it seems like there might be fewer surprises > if we treat it as one. > > Why don't you just set is_virtfn=1 since it *is* a VF, and then deal > with the special cases where you want to touch the PF? > > Bjorn > As we are always running under at least a machine level hypervisor we're somewhat in the same situation as e.g. a KVM guest in that the VFs we see have some emulation that makes them act more like normal PCI functions. It just so happens that the machine level hypervisor does not emulate the PCI_COMMAND_MEMORY, it does emulate BARs and Vendor/Device IDs though. So is_virtfn is 0 for some VF for the same reason it is 0 on KVM/ESXi/HyperV/Jailhouse… guests on other architectures. Note that the BAR and Vendor/Device ID emulation exists also for the VFs created through /sys/…/sriov_numvfs that do have pdev->is_virtfn set to 1 and yes that means some of the emulation is not strictly necessary for us (e.g. Vendor/Device ID) but keeps things the same as on other architectures. Think of it, if any of the other hypervisors also don't implement PCI_COMMAND_MEMORY second level guest PCI pass-through would be broken for the same reason.
> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On Behalf > Of Niklas Schnelle > Sent: Friday, August 28, 2020 5:10 PM > To: Bjorn Helgaas <helgaas@kernel.org>; Alex Williamson > <alex.williamson@redhat.com> > [...] > >> > >> FWIW, pci_physfn() never returns NULL, it returns the provided pdev if > >> is_virtfn is not set. This proposal wouldn't change that return value. > >> AIUI pci_physfn(), the caller needs to test that the returned device is > >> different from the provided device if there's really code that wants to > >> traverse to the PF. > > > > Oh, so this VF has is_virtfn==0. That seems weird. There are lots of > > other ways that a VF is different: Vendor/Device IDs are 0xffff, BARs > > are zeroes, etc. > > > > It sounds like you're sweeping those under the rug by avoiding the > > normal enumeration path (e.g., you don't have to size the BARs), but > > if it actually is a VF, it seems like there might be fewer surprises > > if we treat it as one. > > > > Why don't you just set is_virtfn=1 since it *is* a VF, and then deal > > with the special cases where you want to touch the PF? > > > > Bjorn > > > > As we are always running under at least a machine level hypervisor > we're somewhat in the same situation as e.g. a KVM guest in > that the VFs we see have some emulation that makes them act more like > normal PCI functions. It just so happens that the machine level hypervisor > does not emulate the PCI_COMMAND_MEMORY, it does emulate BARs and > Vendor/Device IDs > though. > So is_virtfn is 0 for some VF for the same reason it is 0 on > KVM/ESXi/HyperV/Jailhouse… > guests on other architectures. I wonder whether it's a good idea to also find a way to set is_virtfn for normal KVM guest which get a vf assigned. There are other cases where faithful emulation of certain PCI capabilities is difficult, e.g. when enabling guest SVA related features (PASID/ATS/PRS). Per PCIe spec, some or all fields of those capabilities are shared between PF and VF. Among them: 1) Some could be emulated properly and indirectly reflected in hardware, e.g. Intel VT-d allows additional control per VF about whether to accept page request, execute/privileged permission, etc. thus allowing VF-specific control even when device-side setting is shared; 2) Some could be purely emulated in software and it's harmless to leave the hardware following PF setting, e.g. ATS enable, STU(?), outstanding page request allocation, etc.; 3) However, I didn’t see a clean way of emulating page_request_ctrl.reset and page_request_status.stopped. Those two have clear definition about outstanding page requests. They are shared thus we cannot issue physical action just due to request on one VF, while pure software emulation cannot guarantee the desired expectation. Of course this issue also exists even on bare metal - pci_enable/disable/reset_pri just do nothing for vf. But there is chance to mitigate (e.g. timeout), but not possible in guest if the guest doesn't know it's actually a VF. Setting is_virtfn=1 allows guest to be cooperative like running together with PF driver. But there is an ordering issue. The guest knows whether a device is VF only when the VF driver is loaded (based on PCI_ID), but related capabilities might be already enabled when attaching the device to IOMMU (at least for intel_iommu). But suppose it's not a hard fix. Last, detached vf is not a PCISIG definition. So the host still needs to do proper emulation (even not faithful) of those capabilities for guests who don't recognize detached vf. Thoughts? Thanks Kevin
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c index 642a993..1b33076 100644 --- a/arch/s390/pci/pci_bus.c +++ b/arch/s390/pci/pci_bus.c @@ -184,6 +184,19 @@ static inline int zpci_bus_setup_virtfn(struct zpci_bus *zbus, } #endif +void pcibios_bus_add_device(struct pci_dev *pdev) +{ + struct zpci_dev *zdev = to_zpci(pdev); + + /* + * If we have a VF on a non-multifunction bus, it must be a VF that is + * detached from its parent PF. We rely on firmware emulation to + * provide underlying PF details. + */ + if (zdev->vfn && !zdev->zbus->multifunction) + pdev->detached_vf = 1; +} + static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev) { struct pci_bus *bus; diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index d98843f..98f93d1 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -406,7 +406,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev) * PF SR-IOV capability, there's therefore no need to trigger * faults based on the virtual value. */ - return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY); + return dev_is_vf(&pdev->dev) || (cmd & PCI_COMMAND_MEMORY); } /* @@ -420,7 +420,7 @@ static void vfio_bar_restore(struct vfio_pci_device *vdev) u16 cmd; int i; - if (pdev->is_virtfn) + if (dev_is_vf(&pdev->dev)) return; pci_info(pdev, "%s: reset recovery - restoring BARs\n", __func__); @@ -521,7 +521,7 @@ static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos, count = vfio_default_config_read(vdev, pos, count, perm, offset, val); /* Mask in virtual memory enable for SR-IOV devices */ - if (offset == PCI_COMMAND && vdev->pdev->is_virtfn) { + if ((offset == PCI_COMMAND) && (dev_is_vf(&vdev->pdev->dev))) { u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]); u32 tmp_val = le32_to_cpu(*val); @@ -1713,7 +1713,7 @@ int vfio_config_init(struct vfio_pci_device *vdev) vdev->rbar[5] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_5]); vdev->rbar[6] = le32_to_cpu(*(__le32 *)&vconfig[PCI_ROM_ADDRESS]); - if (pdev->is_virtfn) { + if (dev_is_vf(&pdev->dev)) { *(__le16 *)&vconfig[PCI_VENDOR_ID] = cpu_to_le16(pdev->vendor); *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device); diff --git a/include/linux/pci.h b/include/linux/pci.h index 8355306..7c062de 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -445,6 +445,7 @@ struct pci_dev { unsigned int is_probed:1; /* Device probing in progress */ unsigned int link_active_reporting:1;/* Device capable of reporting link active */ unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ + unsigned int detached_vf:1; /* VF without local PF access */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ @@ -1057,6 +1058,8 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev, void pci_sort_breadthfirst(void); #define dev_is_pci(d) ((d)->bus == &pci_bus_type) #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false)) +#define dev_is_vf(d) ((dev_is_pci(d) ? (to_pci_dev(d)->is_virtfn || \ + to_pci_dev(d)->detached_vf) : false)) /* Generic PCI functions exported to card drivers */ @@ -1764,6 +1767,7 @@ static inline struct pci_dev *pci_get_domain_bus_and_slot(int domain, #define dev_is_pci(d) (false) #define dev_is_pf(d) (false) +#define dev_is_vf(d) (false) static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) { return false; } static inline int pci_irqd_intx_xlate(struct irq_domain *d,
s390x has the notion of providing VFs to the kernel in a manner where the associated PF is inaccessible other than via firmware. These are not treated as typical VFs and access to them is emulated by underlying firmware which can still access the PF. After the referened commit however these detached VFs were no longer able to work with vfio-pci as the firmware does not provide emulation of the PCI_COMMAND_MEMORY bit. In this case, let's explicitly recognize these detached VFs so that vfio-pci can allow memory access to them again. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> --- arch/s390/pci/pci_bus.c | 13 +++++++++++++ drivers/vfio/pci/vfio_pci_config.c | 8 ++++---- include/linux/pci.h | 4 ++++ 3 files changed, 21 insertions(+), 4 deletions(-)