Message ID | 20210818151606.202815-10-yishaih@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce vfio_pci_core subsystem | expand |
On Wed, Aug 18, 2021 at 06:16:03PM +0300, Yishai Hadas wrote: > From: Max Gurtovoy <mgurtovoy@nvidia.com> > > Allow device drivers to include match entries in the modules.alias file > produced by kbuild that are not used for normal driver autoprobing and > module autoloading. Drivers using these match entries can be connected > to the PCI device manually, by userspace, using the existing > driver_override sysfs. > > To achieve it, we add the 'override_only' bitmap to struct pci_device_id > and a helper macro named 'PCI_DEVICE_DRIVER_OVERRIDE' to enable setting > specific bits on it. > > The first bit (i.e. 'PCI_ID_F_VFIO_DRIVER_OVERRIDE') indicates that the > match entry is for the VFIO subsystem, it can be set by another helper > macro named 'PCI_DRIVER_OVERRIDE_DEVICE_VFIO'. > > These match entries are prefixed with "vfio_" in the modules.alias. > > For example the resulting modules.alias may have: > > alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core > alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci > alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci > > In this example mlx5_core and mlx5_vfio_pci match to the same PCI > device. The kernel will autoload and autobind to mlx5_core but the kernel > and udev mechanisms will ignore mlx5_vfio_pci. > > When userspace wants to change a device to the VFIO subsystem userspace > can implement a generic algorithm: > > 1) Identify the sysfs path to the device: > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 > > 2) Get the modalias string from the kernel: > $ cat /sys/bus/pci/devices/0000:01:00.0/modalias > pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 > > 3) Prefix it with vfio_: > vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 > > 4) Search modules.alias for the above string and select the entry that > has the fewest *'s: > alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci > > 5) modprobe the matched module name: > $ modprobe mlx5_vfio_pci > > 6) cat the matched module name to driver_override: > echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override > > 7) unbind device from original module > echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind > > 8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci) > echo 0000:01:00.0 > /sys/bus/pci/drivers_probe > > The algorithm is independent of bus type. In future the other buses's with s/buses's/buses/ > VFIO device drivers, like platform and ACPI, can use this algorithm as > well. > > This patch is the infrastructure to provide the information in the > modules.alias to userspace. Convert the only VFIO pci_driver which results > in one new line in the modules.alias: > > alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci > > Later series introduce additional HW specific VFIO PCI drivers, such as > mlx5_vfio_pci. > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Yishai Hadas <yishaih@nvidia.com> > --- > Documentation/PCI/pci.rst | 1 + > drivers/pci/pci-driver.c | 27 ++++++++++++++++++++------- > drivers/vfio/pci/vfio_pci.c | 9 ++++++++- > include/linux/mod_devicetable.h | 6 ++++++ > include/linux/pci.h | 28 ++++++++++++++++++++++++++++ > scripts/mod/devicetable-offsets.c | 1 + > scripts/mod/file2alias.c | 8 ++++++-- > 7 files changed, 70 insertions(+), 10 deletions(-) > > diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst > index fa651e25d98c..87c6f4a6ca32 100644 > --- a/Documentation/PCI/pci.rst > +++ b/Documentation/PCI/pci.rst > @@ -103,6 +103,7 @@ need pass only as many optional fields as necessary: > - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF) > - class and classmask fields default to 0 > - driver_data defaults to 0UL. > + - override_only field defaults to 0. > > Note that driver_data must match the value used by any of the pci_device_id > entries defined in the driver. This makes the driver_data field mandatory > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 3a72352aa5cf..8a6bd3364127 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -136,7 +136,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv, > struct pci_dev *dev) > { > struct pci_dynid *dynid; > - const struct pci_device_id *found_id = NULL; > + const struct pci_device_id *found_id = NULL, *ids; > > /* When driver_override is set, only bind to the matching driver */ > if (dev->driver_override && strcmp(dev->driver_override, drv->name)) > @@ -152,14 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv, > } > spin_unlock(&drv->dynids.lock); > > - if (!found_id) > - found_id = pci_match_id(drv->id_table, dev); > + if (found_id) > + return found_id; > > - /* driver_override will always match, send a dummy id */ > - if (!found_id && dev->driver_override) > - found_id = &pci_device_id_any; > + for (ids = drv->id_table; (found_id = pci_match_id(ids, dev)); > + ids = found_id + 1) { > + /* > + * The match table is split based on driver_override. Check the > + * override_only as well so that any matching entry is > + * returned. > + */ > + if (!found_id->override_only || dev->driver_override) > + return found_id; The negation makes this short, but IMO, makes this harder to read. I'd rather test for the special case directly instead of testing for the *absence* of the special case, e.g., if (found_id->override_only) { if (dev->driver_override) return found_id; } else return found_id; > + } > > - return found_id; > + /* > + * if no static match, driver_override will always match, send a dummy > + * id. I think the original comment was better. This comment implies that we only checked for static matches above, but we actually checked for *both* dynamic IDs and static IDs. > + */ > + if (dev->driver_override) > + return &pci_device_id_any; > + return NULL; > } > > /** > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index 07edddf7e6ca..c52620ac5e70 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -180,9 +180,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn) > return vfio_pci_core_sriov_configure(pdev, nr_virtfn); > } > > +static const struct pci_device_id vfio_pci_table[] = { > + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */ > + {} > +}; > + > +MODULE_DEVICE_TABLE(pci, vfio_pci_table); > + > static struct pci_driver vfio_pci_driver = { > .name = "vfio-pci", > - .id_table = NULL, /* only dynamic ids */ > + .id_table = vfio_pci_table, > .probe = vfio_pci_probe, > .remove = vfio_pci_remove, > .sriov_configure = vfio_pci_sriov_configure, > diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h > index 8e291cfdaf06..39c229a7ab8c 100644 > --- a/include/linux/mod_devicetable.h > +++ b/include/linux/mod_devicetable.h > @@ -16,6 +16,10 @@ typedef unsigned long kernel_ulong_t; > > #define PCI_ANY_ID (~0) > > +enum { > + PCI_ID_F_VFIO_DRIVER_OVERRIDE = 1 << 0, > +}; > + > /** > * struct pci_device_id - PCI device ID structure > * @vendor: Vendor ID to match (or PCI_ANY_ID) > @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t; > * Best practice is to use driver_data as an index > * into a static list of equivalent device types, > * instead of using it as a pointer. > + * @override_only: Bitmap for override_only PCI drivers. "Match only when dev->driver_override is this driver"? As far as PCI core is concerned there's no need for this to be a bitmap. I think this would make more sense if split into two patches. The first would add override_only and change pci_match_device(). Then there's no confusion about whether this is specific to VFIO. The second can add PCI_ID_F_VFIO_DRIVER_OVERRIDE and make the file2alias.c changes. Most of the commit log applies to this part. > */ > struct pci_device_id { > __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/ > __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */ > __u32 class, class_mask; /* (class,subclass,prog-if) triplet */ > kernel_ulong_t driver_data; /* Data private to the driver */ > + __u32 override_only; > }; > > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 540b377ca8f6..57f9aa60f3b4 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -901,6 +901,34 @@ struct pci_driver { > .vendor = (vend), .device = (dev), \ > .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID > > +/** > + * PCI_DEVICE_DRIVER_OVERRIDE - macro used to describe a PCI device with > + * override_only flags. > + * @vend: the 16 bit PCI Vendor ID > + * @dev: the 16 bit PCI Device ID > + * @driver_override: PCI Device override_only bitmap > + * > + * This macro is used to create a struct pci_device_id that matches a > + * specific device. The subvendor and subdevice fields will be set to > + * PCI_ANY_ID. > + */ > +#define PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, driver_override) \ > + .vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \ > + .subdevice = PCI_ANY_ID, .override_only = (driver_override) > + > +/** > + * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO > + * "driver_override" PCI device. > + * @vend: the 16 bit PCI Vendor ID > + * @dev: the 16 bit PCI Device ID > + * > + * This macro is used to create a struct pci_device_id that matches a > + * specific device. The subvendor and subdevice fields will be set to > + * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE. > + */ > +#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \ > + PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE) > + > /** > * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem > * @vend: the 16 bit PCI Vendor ID > diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c > index 9bb6c7edccc4..cc3625617a0e 100644 > --- a/scripts/mod/devicetable-offsets.c > +++ b/scripts/mod/devicetable-offsets.c > @@ -42,6 +42,7 @@ int main(void) > DEVID_FIELD(pci_device_id, subdevice); > DEVID_FIELD(pci_device_id, class); > DEVID_FIELD(pci_device_id, class_mask); > + DEVID_FIELD(pci_device_id, override_only); > > DEVID(ccw_device_id); > DEVID_FIELD(ccw_device_id, match_flags); > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c > index 7c97fa8e36bc..c3edbf73157e 100644 > --- a/scripts/mod/file2alias.c > +++ b/scripts/mod/file2alias.c > @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename, > return 1; > } > > -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */ > +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */ > static int do_pci_entry(const char *filename, > void *symval, char *alias) > { > @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename, > DEF_FIELD(symval, pci_device_id, subdevice); > DEF_FIELD(symval, pci_device_id, class); > DEF_FIELD(symval, pci_device_id, class_mask); > + DEF_FIELD(symval, pci_device_id, override_only); > > - strcpy(alias, "pci:"); > + if (override_only & PCI_ID_F_VFIO_DRIVER_OVERRIDE) > + strcpy(alias, "vfio_pci:"); > + else > + strcpy(alias, "pci:"); > ADD(alias, "v", vendor != PCI_ANY_ID, vendor); > ADD(alias, "d", device != PCI_ANY_ID, device); > ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor); > -- > 2.18.1 >
On 8/19/2021 6:15 PM, Bjorn Helgaas wrote: > On Wed, Aug 18, 2021 at 06:16:03PM +0300, Yishai Hadas wrote: >> From: Max Gurtovoy <mgurtovoy@nvidia.com> >> >> Allow device drivers to include match entries in the modules.alias file >> produced by kbuild that are not used for normal driver autoprobing and >> module autoloading. Drivers using these match entries can be connected >> to the PCI device manually, by userspace, using the existing >> driver_override sysfs. >> >> To achieve it, we add the 'override_only' bitmap to struct pci_device_id >> and a helper macro named 'PCI_DEVICE_DRIVER_OVERRIDE' to enable setting >> specific bits on it. >> >> The first bit (i.e. 'PCI_ID_F_VFIO_DRIVER_OVERRIDE') indicates that the >> match entry is for the VFIO subsystem, it can be set by another helper >> macro named 'PCI_DRIVER_OVERRIDE_DEVICE_VFIO'. >> >> These match entries are prefixed with "vfio_" in the modules.alias. >> >> For example the resulting modules.alias may have: >> >> alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core >> alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci >> alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci >> >> In this example mlx5_core and mlx5_vfio_pci match to the same PCI >> device. The kernel will autoload and autobind to mlx5_core but the kernel >> and udev mechanisms will ignore mlx5_vfio_pci. >> >> When userspace wants to change a device to the VFIO subsystem userspace >> can implement a generic algorithm: >> >> 1) Identify the sysfs path to the device: >> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 >> >> 2) Get the modalias string from the kernel: >> $ cat /sys/bus/pci/devices/0000:01:00.0/modalias >> pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 >> >> 3) Prefix it with vfio_: >> vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 >> >> 4) Search modules.alias for the above string and select the entry that >> has the fewest *'s: >> alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci >> >> 5) modprobe the matched module name: >> $ modprobe mlx5_vfio_pci >> >> 6) cat the matched module name to driver_override: >> echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override >> >> 7) unbind device from original module >> echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind >> >> 8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci) >> echo 0000:01:00.0 > /sys/bus/pci/drivers_probe >> >> The algorithm is independent of bus type. In future the other buses's with > s/buses's/buses/ OK >> VFIO device drivers, like platform and ACPI, can use this algorithm as >> well. >> >> This patch is the infrastructure to provide the information in the >> modules.alias to userspace. Convert the only VFIO pci_driver which results >> in one new line in the modules.alias: >> >> alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci >> >> Later series introduce additional HW specific VFIO PCI drivers, such as >> mlx5_vfio_pci. >> >> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> >> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> >> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> >> --- >> Documentation/PCI/pci.rst | 1 + >> drivers/pci/pci-driver.c | 27 ++++++++++++++++++++------- >> drivers/vfio/pci/vfio_pci.c | 9 ++++++++- >> include/linux/mod_devicetable.h | 6 ++++++ >> include/linux/pci.h | 28 ++++++++++++++++++++++++++++ >> scripts/mod/devicetable-offsets.c | 1 + >> scripts/mod/file2alias.c | 8 ++++++-- >> 7 files changed, 70 insertions(+), 10 deletions(-) >> >> diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst >> index fa651e25d98c..87c6f4a6ca32 100644 >> --- a/Documentation/PCI/pci.rst >> +++ b/Documentation/PCI/pci.rst >> @@ -103,6 +103,7 @@ need pass only as many optional fields as necessary: >> - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF) >> - class and classmask fields default to 0 >> - driver_data defaults to 0UL. >> + - override_only field defaults to 0. >> >> Note that driver_data must match the value used by any of the pci_device_id >> entries defined in the driver. This makes the driver_data field mandatory >> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >> index 3a72352aa5cf..8a6bd3364127 100644 >> --- a/drivers/pci/pci-driver.c >> +++ b/drivers/pci/pci-driver.c >> @@ -136,7 +136,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv, >> struct pci_dev *dev) >> { >> struct pci_dynid *dynid; >> - const struct pci_device_id *found_id = NULL; >> + const struct pci_device_id *found_id = NULL, *ids; >> >> /* When driver_override is set, only bind to the matching driver */ >> if (dev->driver_override && strcmp(dev->driver_override, drv->name)) >> @@ -152,14 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv, >> } >> spin_unlock(&drv->dynids.lock); >> >> - if (!found_id) >> - found_id = pci_match_id(drv->id_table, dev); >> + if (found_id) >> + return found_id; >> >> - /* driver_override will always match, send a dummy id */ >> - if (!found_id && dev->driver_override) >> - found_id = &pci_device_id_any; >> + for (ids = drv->id_table; (found_id = pci_match_id(ids, dev)); >> + ids = found_id + 1) { >> + /* >> + * The match table is split based on driver_override. Check the >> + * override_only as well so that any matching entry is >> + * returned. >> + */ >> + if (!found_id->override_only || dev->driver_override) >> + return found_id; > The negation makes this short, but IMO, makes this harder to read. > I'd rather test for the special case directly instead of testing for > the *absence* of the special case, e.g., > > if (found_id->override_only) { > if (dev->driver_override) > return found_id; > } else > return found_id; This can be fine as well. >> + } >> >> - return found_id; >> + /* >> + * if no static match, driver_override will always match, send a dummy >> + * id. > I think the original comment was better. This comment implies that we > only checked for static matches above, but we actually checked for > *both* dynamic IDs and static IDs. OK >> + */ >> + if (dev->driver_override) >> + return &pci_device_id_any; >> + return NULL; >> } >> >> /** >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c >> index 07edddf7e6ca..c52620ac5e70 100644 >> --- a/drivers/vfio/pci/vfio_pci.c >> +++ b/drivers/vfio/pci/vfio_pci.c >> @@ -180,9 +180,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn) >> return vfio_pci_core_sriov_configure(pdev, nr_virtfn); >> } >> >> +static const struct pci_device_id vfio_pci_table[] = { >> + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */ >> + {} >> +}; >> + >> +MODULE_DEVICE_TABLE(pci, vfio_pci_table); >> + >> static struct pci_driver vfio_pci_driver = { >> .name = "vfio-pci", >> - .id_table = NULL, /* only dynamic ids */ >> + .id_table = vfio_pci_table, >> .probe = vfio_pci_probe, >> .remove = vfio_pci_remove, >> .sriov_configure = vfio_pci_sriov_configure, >> diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h >> index 8e291cfdaf06..39c229a7ab8c 100644 >> --- a/include/linux/mod_devicetable.h >> +++ b/include/linux/mod_devicetable.h >> @@ -16,6 +16,10 @@ typedef unsigned long kernel_ulong_t; >> >> #define PCI_ANY_ID (~0) >> >> +enum { >> + PCI_ID_F_VFIO_DRIVER_OVERRIDE = 1 << 0, >> +}; >> + >> /** >> * struct pci_device_id - PCI device ID structure >> * @vendor: Vendor ID to match (or PCI_ANY_ID) >> @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t; >> * Best practice is to use driver_data as an index >> * into a static list of equivalent device types, >> * instead of using it as a pointer. >> + * @override_only: Bitmap for override_only PCI drivers. > "Match only when dev->driver_override is this driver"? Just to be aligned here, This field will stay __u32 and may hold at the most 1 bit value set to represent the actual subsystem/driver. This is required to later on set the correct prefix in the modules.alias file, and you just suggested to change the comment as of above, right ? > As far as PCI core is concerned there's no need for this to be a > bitmap. > > I think this would make more sense if split into two patches. The > first would add override_only and change pci_match_device(). Then > there's no confusion about whether this is specific to VFIO. Splitting may end-up the first patch with a dead-code on below, as found_id->override_only will be always 0. If you still believe that this is better we can do it. if (found_id->override_only) { if (dev->driver_override) return found_id; } else return found_id; > The second can add PCI_ID_F_VFIO_DRIVER_OVERRIDE and make the > file2alias.c changes. Most of the commit log applies to this part. > >> */ >> struct pci_device_id { >> __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/ >> __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */ >> __u32 class, class_mask; /* (class,subclass,prog-if) triplet */ >> kernel_ulong_t driver_data; /* Data private to the driver */ >> + __u32 override_only; >> }; >> >> >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 540b377ca8f6..57f9aa60f3b4 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -901,6 +901,34 @@ struct pci_driver { >> .vendor = (vend), .device = (dev), \ >> .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID >> >> +/** >> + * PCI_DEVICE_DRIVER_OVERRIDE - macro used to describe a PCI device with >> + * override_only flags. >> + * @vend: the 16 bit PCI Vendor ID >> + * @dev: the 16 bit PCI Device ID >> + * @driver_override: PCI Device override_only bitmap >> + * >> + * This macro is used to create a struct pci_device_id that matches a >> + * specific device. The subvendor and subdevice fields will be set to >> + * PCI_ANY_ID. >> + */ >> +#define PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, driver_override) \ >> + .vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \ >> + .subdevice = PCI_ANY_ID, .override_only = (driver_override) >> + >> +/** >> + * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO >> + * "driver_override" PCI device. >> + * @vend: the 16 bit PCI Vendor ID >> + * @dev: the 16 bit PCI Device ID >> + * >> + * This macro is used to create a struct pci_device_id that matches a >> + * specific device. The subvendor and subdevice fields will be set to >> + * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE. >> + */ >> +#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \ >> + PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE) >> + >> /** >> * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem >> * @vend: the 16 bit PCI Vendor ID >> diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c >> index 9bb6c7edccc4..cc3625617a0e 100644 >> --- a/scripts/mod/devicetable-offsets.c >> +++ b/scripts/mod/devicetable-offsets.c >> @@ -42,6 +42,7 @@ int main(void) >> DEVID_FIELD(pci_device_id, subdevice); >> DEVID_FIELD(pci_device_id, class); >> DEVID_FIELD(pci_device_id, class_mask); >> + DEVID_FIELD(pci_device_id, override_only); >> >> DEVID(ccw_device_id); >> DEVID_FIELD(ccw_device_id, match_flags); >> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c >> index 7c97fa8e36bc..c3edbf73157e 100644 >> --- a/scripts/mod/file2alias.c >> +++ b/scripts/mod/file2alias.c >> @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename, >> return 1; >> } >> >> -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */ >> +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */ >> static int do_pci_entry(const char *filename, >> void *symval, char *alias) >> { >> @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename, >> DEF_FIELD(symval, pci_device_id, subdevice); >> DEF_FIELD(symval, pci_device_id, class); >> DEF_FIELD(symval, pci_device_id, class_mask); >> + DEF_FIELD(symval, pci_device_id, override_only); >> >> - strcpy(alias, "pci:"); >> + if (override_only & PCI_ID_F_VFIO_DRIVER_OVERRIDE) >> + strcpy(alias, "vfio_pci:"); >> + else >> + strcpy(alias, "pci:"); >> ADD(alias, "v", vendor != PCI_ANY_ID, vendor); >> ADD(alias, "d", device != PCI_ANY_ID, device); >> ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor); >> -- >> 2.18.1 >>
On Thu, Aug 19, 2021 at 07:16:20PM +0300, Yishai Hadas wrote: > On 8/19/2021 6:15 PM, Bjorn Helgaas wrote: > > On Wed, Aug 18, 2021 at 06:16:03PM +0300, Yishai Hadas wrote: > > > From: Max Gurtovoy <mgurtovoy@nvidia.com> > > > /** > > > * struct pci_device_id - PCI device ID structure > > > * @vendor: Vendor ID to match (or PCI_ANY_ID) > > > @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t; > > > * Best practice is to use driver_data as an index > > > * into a static list of equivalent device types, > > > * instead of using it as a pointer. > > > + * @override_only: Bitmap for override_only PCI drivers. > > "Match only when dev->driver_override is this driver"? > > Just to be aligned here, > > This field will stay __u32 and may hold at the most 1 bit value set to > represent the actual subsystem/driver. The PCI core does not require "at most 1 bit is set." Actually, I don't think even the file2alias code requires that. If you set two bits, you can generate two aliases. > This is required to later on set the correct prefix in the modules.alias > file, and you just suggested to change the comment as of above, right ? Yes, __u32 is fine and I'm only suggesting a comment change here. > > As far as PCI core is concerned there's no need for this to be a > > bitmap. > > > > I think this would make more sense if split into two patches. The > > first would add override_only and change pci_match_device(). Then > > there's no confusion about whether this is specific to VFIO. > > Splitting may end-up the first patch with a dead-code on below, as > found_id->override_only will be always 0. > > If you still believe that this is better we can do it. I think it's fine to add the functionality in one patch and use it in the next if it makes the commit clearer. I wouldn't want to add functionality that's not used at all in the series, but it's OK when they're both posted together. > if (found_id->override_only) { > if (dev->driver_override) > return found_id; > } else > return found_id; > > > The second can add PCI_ID_F_VFIO_DRIVER_OVERRIDE and make the > > file2alias.c changes. Most of the commit log applies to this part.
On 8/19/2021 7:39 PM, Bjorn Helgaas wrote: > On Thu, Aug 19, 2021 at 07:16:20PM +0300, Yishai Hadas wrote: >> On 8/19/2021 6:15 PM, Bjorn Helgaas wrote: >>> On Wed, Aug 18, 2021 at 06:16:03PM +0300, Yishai Hadas wrote: >>>> From: Max Gurtovoy <mgurtovoy@nvidia.com> >>>> /** >>>> * struct pci_device_id - PCI device ID structure >>>> * @vendor: Vendor ID to match (or PCI_ANY_ID) >>>> @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t; >>>> * Best practice is to use driver_data as an index >>>> * into a static list of equivalent device types, >>>> * instead of using it as a pointer. >>>> + * @override_only: Bitmap for override_only PCI drivers. >>> "Match only when dev->driver_override is this driver"? >> Just to be aligned here, >> >> This field will stay __u32 and may hold at the most 1 bit value set to >> represent the actual subsystem/driver. > The PCI core does not require "at most 1 bit is set." > > Actually, I don't think even the file2alias code requires that. If > you set two bits, you can generate two aliases. > >> This is required to later on set the correct prefix in the modules.alias >> file, and you just suggested to change the comment as of above, right ? > Yes, __u32 is fine and I'm only suggesting a comment change here. great. > >>> As far as PCI core is concerned there's no need for this to be a >>> bitmap. >>> >>> I think this would make more sense if split into two patches. The >>> first would add override_only and change pci_match_device(). Then >>> there's no confusion about whether this is specific to VFIO. >> Splitting may end-up the first patch with a dead-code on below, as >> found_id->override_only will be always 0. >> >> If you still believe that this is better we can do it. > I think it's fine to add the functionality in one patch and use it in > the next if it makes the commit clearer. I wouldn't want to add > functionality that's not used at all in the series, but it's OK when > they're both posted together. Ok. We can do the separation if all agree that the first commit is have a dead section. Alex, we would like to get few more reviewed-by signatures and we'll send the V3 series in a couple of days to make it to 5.15 merge window as we planned. Are you ok with the series after we got the green light for this patch ? do you think we need another pair of eyes to review the other patches ? -Max. > >> if (found_id->override_only) { >> if (dev->driver_override) >> return found_id; >> } else >> return found_id; >> >>> The second can add PCI_ID_F_VFIO_DRIVER_OVERRIDE and make the >>> file2alias.c changes. Most of the commit log applies to this part.
On Thu, 19 Aug 2021 22:57:30 +0300 Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > On 8/19/2021 7:39 PM, Bjorn Helgaas wrote: > > On Thu, Aug 19, 2021 at 07:16:20PM +0300, Yishai Hadas wrote: > >> On 8/19/2021 6:15 PM, Bjorn Helgaas wrote: > >>> On Wed, Aug 18, 2021 at 06:16:03PM +0300, Yishai Hadas wrote: > >>>> From: Max Gurtovoy <mgurtovoy@nvidia.com> > >>>> /** > >>>> * struct pci_device_id - PCI device ID structure > >>>> * @vendor: Vendor ID to match (or PCI_ANY_ID) > >>>> @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t; > >>>> * Best practice is to use driver_data as an index > >>>> * into a static list of equivalent device types, > >>>> * instead of using it as a pointer. > >>>> + * @override_only: Bitmap for override_only PCI drivers. > >>> "Match only when dev->driver_override is this driver"? > >> Just to be aligned here, > >> > >> This field will stay __u32 and may hold at the most 1 bit value set to > >> represent the actual subsystem/driver. > > The PCI core does not require "at most 1 bit is set." > > > > Actually, I don't think even the file2alias code requires that. If > > you set two bits, you can generate two aliases. > > > >> This is required to later on set the correct prefix in the modules.alias > >> file, and you just suggested to change the comment as of above, right ? > > Yes, __u32 is fine and I'm only suggesting a comment change here. > > great. > > > > > >>> As far as PCI core is concerned there's no need for this to be a > >>> bitmap. > >>> > >>> I think this would make more sense if split into two patches. The > >>> first would add override_only and change pci_match_device(). Then > >>> there's no confusion about whether this is specific to VFIO. > >> Splitting may end-up the first patch with a dead-code on below, as > >> found_id->override_only will be always 0. > >> > >> If you still believe that this is better we can do it. > > I think it's fine to add the functionality in one patch and use it in > > the next if it makes the commit clearer. I wouldn't want to add > > functionality that's not used at all in the series, but it's OK when > > they're both posted together. > > Ok. We can do the separation if all agree that the first commit is have > a dead section. > > Alex, > > we would like to get few more reviewed-by signatures and we'll send the > V3 series in a couple of days to make it to 5.15 merge window as we planned. > > Are you ok with the series after we got the green light for this patch ? > > do you think we need another pair of eyes to review the other patches ? More eyes is always better, but I'm not finding much to complain about in this series. This patch was probably the most pivotal for agreement, the rest is largely mechanical at this point. In addition to Bjorn for the PCI parts of this, I'd also like to see an ack from Yamada-san or Michal for scripts/mod/, who are already cc'd I also notice include/linux/vfio_pci_core.h doesn't get added to MAINTAINERS in the series. Please fix in patch 12/ It seems plausible that this could be ready for v5.15. Thanks, Alex
diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst index fa651e25d98c..87c6f4a6ca32 100644 --- a/Documentation/PCI/pci.rst +++ b/Documentation/PCI/pci.rst @@ -103,6 +103,7 @@ need pass only as many optional fields as necessary: - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF) - class and classmask fields default to 0 - driver_data defaults to 0UL. + - override_only field defaults to 0. Note that driver_data must match the value used by any of the pci_device_id entries defined in the driver. This makes the driver_data field mandatory diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 3a72352aa5cf..8a6bd3364127 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -136,7 +136,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv, struct pci_dev *dev) { struct pci_dynid *dynid; - const struct pci_device_id *found_id = NULL; + const struct pci_device_id *found_id = NULL, *ids; /* When driver_override is set, only bind to the matching driver */ if (dev->driver_override && strcmp(dev->driver_override, drv->name)) @@ -152,14 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv, } spin_unlock(&drv->dynids.lock); - if (!found_id) - found_id = pci_match_id(drv->id_table, dev); + if (found_id) + return found_id; - /* driver_override will always match, send a dummy id */ - if (!found_id && dev->driver_override) - found_id = &pci_device_id_any; + for (ids = drv->id_table; (found_id = pci_match_id(ids, dev)); + ids = found_id + 1) { + /* + * The match table is split based on driver_override. Check the + * override_only as well so that any matching entry is + * returned. + */ + if (!found_id->override_only || dev->driver_override) + return found_id; + } - return found_id; + /* + * if no static match, driver_override will always match, send a dummy + * id. + */ + if (dev->driver_override) + return &pci_device_id_any; + return NULL; } /** diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 07edddf7e6ca..c52620ac5e70 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -180,9 +180,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn) return vfio_pci_core_sriov_configure(pdev, nr_virtfn); } +static const struct pci_device_id vfio_pci_table[] = { + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */ + {} +}; + +MODULE_DEVICE_TABLE(pci, vfio_pci_table); + static struct pci_driver vfio_pci_driver = { .name = "vfio-pci", - .id_table = NULL, /* only dynamic ids */ + .id_table = vfio_pci_table, .probe = vfio_pci_probe, .remove = vfio_pci_remove, .sriov_configure = vfio_pci_sriov_configure, diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h index 8e291cfdaf06..39c229a7ab8c 100644 --- a/include/linux/mod_devicetable.h +++ b/include/linux/mod_devicetable.h @@ -16,6 +16,10 @@ typedef unsigned long kernel_ulong_t; #define PCI_ANY_ID (~0) +enum { + PCI_ID_F_VFIO_DRIVER_OVERRIDE = 1 << 0, +}; + /** * struct pci_device_id - PCI device ID structure * @vendor: Vendor ID to match (or PCI_ANY_ID) @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t; * Best practice is to use driver_data as an index * into a static list of equivalent device types, * instead of using it as a pointer. + * @override_only: Bitmap for override_only PCI drivers. */ struct pci_device_id { __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/ __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */ __u32 class, class_mask; /* (class,subclass,prog-if) triplet */ kernel_ulong_t driver_data; /* Data private to the driver */ + __u32 override_only; }; diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..57f9aa60f3b4 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -901,6 +901,34 @@ struct pci_driver { .vendor = (vend), .device = (dev), \ .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID +/** + * PCI_DEVICE_DRIVER_OVERRIDE - macro used to describe a PCI device with + * override_only flags. + * @vend: the 16 bit PCI Vendor ID + * @dev: the 16 bit PCI Device ID + * @driver_override: PCI Device override_only bitmap + * + * This macro is used to create a struct pci_device_id that matches a + * specific device. The subvendor and subdevice fields will be set to + * PCI_ANY_ID. + */ +#define PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, driver_override) \ + .vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \ + .subdevice = PCI_ANY_ID, .override_only = (driver_override) + +/** + * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO + * "driver_override" PCI device. + * @vend: the 16 bit PCI Vendor ID + * @dev: the 16 bit PCI Device ID + * + * This macro is used to create a struct pci_device_id that matches a + * specific device. The subvendor and subdevice fields will be set to + * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE. + */ +#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \ + PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE) + /** * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem * @vend: the 16 bit PCI Vendor ID diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c index 9bb6c7edccc4..cc3625617a0e 100644 --- a/scripts/mod/devicetable-offsets.c +++ b/scripts/mod/devicetable-offsets.c @@ -42,6 +42,7 @@ int main(void) DEVID_FIELD(pci_device_id, subdevice); DEVID_FIELD(pci_device_id, class); DEVID_FIELD(pci_device_id, class_mask); + DEVID_FIELD(pci_device_id, override_only); DEVID(ccw_device_id); DEVID_FIELD(ccw_device_id, match_flags); diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c index 7c97fa8e36bc..c3edbf73157e 100644 --- a/scripts/mod/file2alias.c +++ b/scripts/mod/file2alias.c @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename, return 1; } -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */ +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */ static int do_pci_entry(const char *filename, void *symval, char *alias) { @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename, DEF_FIELD(symval, pci_device_id, subdevice); DEF_FIELD(symval, pci_device_id, class); DEF_FIELD(symval, pci_device_id, class_mask); + DEF_FIELD(symval, pci_device_id, override_only); - strcpy(alias, "pci:"); + if (override_only & PCI_ID_F_VFIO_DRIVER_OVERRIDE) + strcpy(alias, "vfio_pci:"); + else + strcpy(alias, "pci:"); ADD(alias, "v", vendor != PCI_ANY_ID, vendor); ADD(alias, "d", device != PCI_ANY_ID, device); ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor);