diff mbox series

[09/12] PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id

Message ID 20210721161609.68223-10-yishaih@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Introduce vfio_pci_core subsystem | expand

Commit Message

Yishai Hadas July 21, 2021, 4:16 p.m. UTC
From: Max Gurtovoy <mgurtovoy@nvidia.com>

The new flag field is be used to allow PCI drivers to signal the core code
during driver matching and when generating the modules.alias information.

The first use will be to define a VFIO flag that indicates the PCI driver
is a VFIO driver.

VFIO drivers have a few special properties compared to normal PCI drivers:
 - They do not automatically bind. VFIO drivers are used to swap out the
   normal driver for a device and convert the PCI device to the VFIO
   subsystem.

   The admin must make this choice and following the current uAPI this is
   usually done by using the driver_override sysfs.

 - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
   them with 'vfio_pci:' instead of the normal 'pci:'.

   This allows the userspace machinery that switches devices to VFIO to
   know what kernel drivers support what devices and allows it to trigger
   the proper device_override.

As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
keeps todays behavior the same. VFIO remains on the side, is never
autoloaded and can only be activated by direct admin action.

This patch is the infrastructure to provide the information in the
modules.alias to userspace and enable the only PCI VFIO driver. Later
series introduce additional HW specific VFIO PCI drivers.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 Documentation/PCI/pci.rst         |  1 +
 drivers/pci/pci-driver.c          | 25 +++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci.c       |  9 ++++++++-
 include/linux/mod_devicetable.h   |  7 +++++++
 include/linux/pci.h               | 27 +++++++++++++++++++++++++++
 scripts/mod/devicetable-offsets.c |  1 +
 scripts/mod/file2alias.c          |  8 ++++++--
 7 files changed, 71 insertions(+), 7 deletions(-)

Comments

Alex Williamson July 27, 2021, 4:34 p.m. UTC | #1
On Wed, 21 Jul 2021 19:16:06 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> 
> The new flag field is be used to allow PCI drivers to signal the core code
> during driver matching and when generating the modules.alias information.
> 
> The first use will be to define a VFIO flag that indicates the PCI driver
> is a VFIO driver.
> 
> VFIO drivers have a few special properties compared to normal PCI drivers:
>  - They do not automatically bind. VFIO drivers are used to swap out the
>    normal driver for a device and convert the PCI device to the VFIO
>    subsystem.
> 
>    The admin must make this choice and following the current uAPI this is
>    usually done by using the driver_override sysfs.
> 
>  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
>    them with 'vfio_pci:' instead of the normal 'pci:'.
> 
>    This allows the userspace machinery that switches devices to VFIO to
>    know what kernel drivers support what devices and allows it to trigger
>    the proper device_override.
> 
> As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
> keeps todays behavior the same. VFIO remains on the side, is never
> autoloaded and can only be activated by direct admin action.
> 
> This patch is the infrastructure to provide the information in the
> modules.alias to userspace and enable the only PCI VFIO driver. Later
> series introduce additional HW specific VFIO PCI drivers.

I don't really understand why we're combining the above "special
properties" into a single flag.  For instance, why wouldn't we create a
flag that just indicates a match entry is only for driver override?  Or
if we're only using this for full wildcard matches, we could detect
that even without a flag.

Then, how does the "vfio_pci:" alias extend to other drivers?  Is this
expected to be the only driver that would use an alias ever or would
other drivers use new bits of the flag?  Seems some documentation is
necessary; the comment on PCI_DRIVER_OVERRIDE_DEVICE_VFIO doesn't
really help, "This macro is used to create a struct pci_device_id that
matches a specific device", then we proceed to use it with PCI_ANY_ID.

vfio-pci has always tried (as much as possible) to be "just another
PCI" driver to avoid all the nasty issues that used to exist with
legacy KVM device assignment, so I cringe at seeing these vfio specific
hooks in PCI-core.  Thanks,

Alex
Jason Gunthorpe July 27, 2021, 5:14 p.m. UTC | #2
On Tue, Jul 27, 2021 at 10:34:18AM -0600, Alex Williamson wrote:
> On Wed, 21 Jul 2021 19:16:06 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > 
> > The new flag field is be used to allow PCI drivers to signal the core code
> > during driver matching and when generating the modules.alias information.
> > 
> > The first use will be to define a VFIO flag that indicates the PCI driver
> > is a VFIO driver.
> > 
> > VFIO drivers have a few special properties compared to normal PCI drivers:
> >  - They do not automatically bind. VFIO drivers are used to swap out the
> >    normal driver for a device and convert the PCI device to the VFIO
> >    subsystem.
> > 
> >    The admin must make this choice and following the current uAPI this is
> >    usually done by using the driver_override sysfs.
> > 
> >  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
> >    them with 'vfio_pci:' instead of the normal 'pci:'.
> > 
> >    This allows the userspace machinery that switches devices to VFIO to
> >    know what kernel drivers support what devices and allows it to trigger
> >    the proper device_override.
> > 
> > As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
> > keeps todays behavior the same. VFIO remains on the side, is never
> > autoloaded and can only be activated by direct admin action.
> > 
> > This patch is the infrastructure to provide the information in the
> > modules.alias to userspace and enable the only PCI VFIO driver. Later
> > series introduce additional HW specific VFIO PCI drivers.
> 
> I don't really understand why we're combining the above "special
> properties" into a single flag. 

Currently I can't think of any reason to have two flags. We always
need both behaviors together. It is trivial for someone to change down
the road, so I prefer to keep the flag bit usage to a minimum.

> For instance, why wouldn't we create a flag that just indicates a
> match entry is only for driver override?

We still need to signal the generation of vfio_pci: string in the
modules.alias.

> Or if we're only using this for full wildcard matches, we could
> detect that even without a flag.

The mlx/hns/etc drivers will not use wildcard matches. This series is
the prep and the only driver we have right at this point is the
wildcard vfio_pci generic driver.

> Then, how does the "vfio_pci:" alias extend to other drivers?  

After the HW drivers are merged we have a list of things in the
modules.alias file. Eg we might have something like:

alias vfio_pci:v000015B3d00001011sv*sd*bc*sc*i* mlx5_vfio_pci
alias vfio_pci:v0000abc1d0000abcdsv*sd*bc*sc*i* hns_vfio_pci
alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

This flag, and the vfio_pci string, is only for the VFIO subsystem. If
someday another subsystem wants to use driver_override then it will
provide its own subsystem name here instead.

This is solving the problem you had at the start - that userspace must
be able to self identify the drivers.  Starting with a PCI BDF
userspace can match the modules.alias for vfio_pci: prefixes and
determine which string to put into the driver_override sysfs. This is
instead of having userspace hardwire vfio_pci.

> Is this expected to be the only driver that would use an alias ever
> or would other drivers use new bits of the flag?

Not sure what you mean by "only driver"? As above every driver
implementing VFIO on top of PCI will use this flag. If another
subsystem wants to use driver_override it will define its own flag,
and it's userspace will look for othersubsytem_pci: tags in
modules.alias when it wants to change a PCI device over.

> Seems some documentation is necessary; the comment on
> PCI_DRIVER_OVERRIDE_DEVICE_VFIO doesn't really help, "This macro is
> used to create a struct pci_device_id that matches a specific
> device", then we proceed to use it with PCI_ANY_ID.

Fair enough, this is ment in the broader context, the generic vfio_pci
is just special.

> vfio-pci has always tried (as much as possible) to be "just another
> PCI" driver to avoid all the nasty issues that used to exist with
> legacy KVM device assignment, so I cringe at seeing these vfio specific
> hooks in PCI-core.  Thanks,

It is has always had very special behavior - a PCI driver without a
match table is is not "just another PCI" driver.

While this is not entirely elegant, considering where we have ended up
and the historical ABI that has to be preserved, it is the best idea
so far anyone has presented.

Jason
Alex Williamson July 27, 2021, 11:02 p.m. UTC | #3
On Tue, 27 Jul 2021 14:14:58 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jul 27, 2021 at 10:34:18AM -0600, Alex Williamson wrote:
> > On Wed, 21 Jul 2021 19:16:06 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >   
> > > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > 
> > > The new flag field is be used to allow PCI drivers to signal the core code
> > > during driver matching and when generating the modules.alias information.
> > > 
> > > The first use will be to define a VFIO flag that indicates the PCI driver
> > > is a VFIO driver.
> > > 
> > > VFIO drivers have a few special properties compared to normal PCI drivers:
> > >  - They do not automatically bind. VFIO drivers are used to swap out the
> > >    normal driver for a device and convert the PCI device to the VFIO
> > >    subsystem.
> > > 
> > >    The admin must make this choice and following the current uAPI this is
> > >    usually done by using the driver_override sysfs.
> > > 
> > >  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
> > >    them with 'vfio_pci:' instead of the normal 'pci:'.
> > > 
> > >    This allows the userspace machinery that switches devices to VFIO to
> > >    know what kernel drivers support what devices and allows it to trigger
> > >    the proper device_override.
> > > 
> > > As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
> > > keeps todays behavior the same. VFIO remains on the side, is never
> > > autoloaded and can only be activated by direct admin action.
> > > 
> > > This patch is the infrastructure to provide the information in the
> > > modules.alias to userspace and enable the only PCI VFIO driver. Later
> > > series introduce additional HW specific VFIO PCI drivers.  
> > 
> > I don't really understand why we're combining the above "special
> > properties" into a single flag.   
> 
> Currently I can't think of any reason to have two flags. We always
> need both behaviors together. It is trivial for someone to change down
> the road, so I prefer to keep the flag bit usage to a minimum.
> 
> > For instance, why wouldn't we create a flag that just indicates a
> > match entry is only for driver override?  
> 
> We still need to signal the generation of vfio_pci: string in the
> modules.alias.
> 
> > Or if we're only using this for full wildcard matches, we could
> > detect that even without a flag.  
> 
> The mlx/hns/etc drivers will not use wildcard matches. This series is
> the prep and the only driver we have right at this point is the
> wildcard vfio_pci generic driver.
> 
> > Then, how does the "vfio_pci:" alias extend to other drivers?    
> 
> After the HW drivers are merged we have a list of things in the
> modules.alias file. Eg we might have something like:
> 
> alias vfio_pci:v000015B3d00001011sv*sd*bc*sc*i* mlx5_vfio_pci
> alias vfio_pci:v0000abc1d0000abcdsv*sd*bc*sc*i* hns_vfio_pci
> alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
> 
> This flag, and the vfio_pci string, is only for the VFIO subsystem. If
> someday another subsystem wants to use driver_override then it will
> provide its own subsystem name here instead.
> 
> This is solving the problem you had at the start - that userspace must
> be able to self identify the drivers.  Starting with a PCI BDF
> userspace can match the modules.alias for vfio_pci: prefixes and
> determine which string to put into the driver_override sysfs. This is
> instead of having userspace hardwire vfio_pci.
> 
> > Is this expected to be the only driver that would use an alias ever
> > or would other drivers use new bits of the flag?  
> 
> Not sure what you mean by "only driver"? As above every driver
> implementing VFIO on top of PCI will use this flag. If another
> subsystem wants to use driver_override it will define its own flag,
> and it's userspace will look for othersubsytem_pci: tags in
> modules.alias when it wants to change a PCI device over.
> 
> > Seems some documentation is necessary; the comment on
> > PCI_DRIVER_OVERRIDE_DEVICE_VFIO doesn't really help, "This macro is
> > used to create a struct pci_device_id that matches a specific
> > device", then we proceed to use it with PCI_ANY_ID.  
> 
> Fair enough, this is ment in the broader context, the generic vfio_pci
> is just special.
> 
> > vfio-pci has always tried (as much as possible) to be "just another
> > PCI" driver to avoid all the nasty issues that used to exist with
> > legacy KVM device assignment, so I cringe at seeing these vfio specific
> > hooks in PCI-core.  Thanks,  
> 
> It is has always had very special behavior - a PCI driver without a
> match table is is not "just another PCI" driver.
> 
> While this is not entirely elegant, considering where we have ended up
> and the historical ABI that has to be preserved, it is the best idea
> so far anyone has presented.

In general I think my confusion is lack of documentation and examples.
There's good information here and in the cover letter, but reviewing
the patch itself I'm not sure if vfio_pci: is meant to indicate the
vfio_pci driver or the vfio_pci device api or as I've finally decided,
just prepending "vfio_" to the modalias for a device to indicate the
class of stuff, ie. no automatic binding but discoverable by userspace
as a "vfio" driver suitable for this device.

I think we need libvirt folks onboard and maybe a clearer idea what
userspace helpers might be available.  For example would driverctl have
an option to choose a vfio class driver for a device?

I can also imagine that if the flag only covered the
matching/driver_override aspect and pci_device_id further included an
optional modalias prefix, we could do this without littering pci-core
with vfio eccentricities.  I'll be interest to see Bjorn's thoughts on
this.  Thanks,

Alex
Jason Gunthorpe July 27, 2021, 11:42 p.m. UTC | #4
On Tue, Jul 27, 2021 at 05:02:02PM -0600, Alex Williamson wrote:

> In general I think my confusion is lack of documentation and examples.
> There's good information here and in the cover letter, but reviewing
> the patch itself I'm not sure if vfio_pci: is meant to indicate the
> vfio_pci driver or the vfio_pci device api or as I've finally decided,
> just prepending "vfio_" to the modalias for a device to indicate the
> class of stuff, ie. no automatic binding but discoverable by userspace
> as a "vfio" driver suitable for this device.

Yes, the "vfio_" prefix is ment to be a generic prefix that any bus
type could use to signify the modalias entry is for the vfio flavour
of driver_override devices.

The userspace algorihtm is pretty simple.

1) Identify the sysfs path to the device:
  /sys/bus/pci/devices/0000:01:00.0/modalias

2) Get the modalias string from the kernel:
 $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
pci:v000015B3d00001017sv000015B3sd00000001bc02sc00i00

3) Prefix it with vfio_:
vfio_pci:v000015B3d00001017sv000015B3sd00000001bc02sc00i00

4) Search modules.alias for the above string, select the entry that
   has the fewest *'s. See Max's sample script.

5) modprobe the matched module name

6) cat the matched modules.alias module name to
   /sys/bus/pci/devices/0000\:01\:00.0/driver_override

Further patches can make this work universally for all the current and
future vfio bus types, eg platform, fsl, etc.

Something like driverctl or libvirt can implement this algorithm and
remove all the hardwired behavior of load vfio_fsl for this or
vfio_pci for that.

I'll add the above sequence to the commit message of this patch, since
I think it makes it really clear.

> I think we need libvirt folks onboard and maybe a clearer idea what
> userspace helpers might be available.  For example would driverctl have
> an option to choose a vfio class driver for a device?

Max wrote a demo script that shows how this can work, it is linked in
the cover letter.

At the end of the day there are only two ideas that survived scrutiny:

1) This patch which makes everything dynamic and driven by
   modules.alias,

2) We continue to hardwire the driver and module names into
   libvirt/etc and just add mlx, hns, etc.

> I can also imagine that if the flag only covered the
> matching/driver_override aspect and pci_device_id further included an
> optional modalias prefix, we could do this without littering pci-core
> with vfio eccentricities.  I'll be interest to see Bjorn's thoughts on
> this.  Thanks,

This is more elegant, but we didn't do this because the pci match
struct is widely used in the kernel and bloating it further doesn't
seem to make a lot of sense at this point. Due to the macros it would
be easy to change to this scheme if was appropriate.

Jason
Bjorn Helgaas Aug. 4, 2021, 8:34 p.m. UTC | #5
On Wed, Jul 21, 2021 at 07:16:06PM +0300, Yishai Hadas wrote:
> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> 
> The new flag field is be used to allow PCI drivers to signal the core code
> during driver matching and when generating the modules.alias information.

This needs to read as a complete idea even without the subject line.
The subject is the *title*; it's not the first sentence of the essay.

It's OK to repeat the subject line in the commit log, but I don't
think that would solve the problem here because "signal core code" and
"when generating ..." doesn't get to the point of the patch.

What's the objective here?

> The first use will be to define a VFIO flag that indicates the PCI driver
> is a VFIO driver.

Is there such a thing as a "VFIO driver" today?  Maybe this patch is
introducing that concept?  If so, maybe lead off by motivating and
defining what it is, then follow up with the details that go into
implementing it.

> VFIO drivers have a few special properties compared to normal PCI drivers:
>  - They do not automatically bind. VFIO drivers are used to swap out the
>    normal driver for a device and convert the PCI device to the VFIO
>    subsystem.

The comment below says "... any matching PCI_ID_F_DRIVER_OVERRIDE
[sic] entry is returned," which sounds like the opposite of "do not
automatically bind."  Might be exposing my VFIO ignorance here.

>    The admin must make this choice and following the current uAPI this is
>    usually done by using the driver_override sysfs.

I'm not sure "converting PCI device to the VFIO subsystem" is the
right way to phrase this, but whatever it is, make this idea specific,
e.g., by "echo pci-stub > /sys/.../driver_override" or whatever.

>  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
>    them with 'vfio_pci:' instead of the normal 'pci:'.
> 
>    This allows the userspace machinery that switches devices to VFIO to
>    know what kernel drivers support what devices and allows it to trigger
>    the proper device_override.

What does "switch device to VFIO" mean?  I could be reading this too
literally (in my defense, I'm not a VFIO expert), but AFAICT this is
not something you do to the *device*.  I guess maybe this is something
like "prevent the normal driver from claiming the device so we can use
VFIO instead"?  Does "using VFIO" mean getting vfio-pci to claim the
device?

> As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
> keeps todays behavior the same. VFIO remains on the side, is never
> autoloaded and can only be activated by direct admin action.

s/todays/today's/

> This patch is the infrastructure to provide the information in the
> modules.alias to userspace and enable the only PCI VFIO driver. Later
> series introduce additional HW specific VFIO PCI drivers.

s/the only/only the/ ?  (Not sure what you intend, but "the only"
doesn't seem right)

Sorry, I know I'm totally missing the point here.

> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  Documentation/PCI/pci.rst         |  1 +
>  drivers/pci/pci-driver.c          | 25 +++++++++++++++++++++----
>  drivers/vfio/pci/vfio_pci.c       |  9 ++++++++-
>  include/linux/mod_devicetable.h   |  7 +++++++
>  include/linux/pci.h               | 27 +++++++++++++++++++++++++++
>  scripts/mod/devicetable-offsets.c |  1 +
>  scripts/mod/file2alias.c          |  8 ++++++--
>  7 files changed, 71 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
> index fa651e25d98c..24e70a386887 100644
> --- a/Documentation/PCI/pci.rst
> +++ b/Documentation/PCI/pci.rst
> @@ -103,6 +103,7 @@ need pass only as many optional fields as necessary:
>    - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
>    - class and classmask fields default to 0
>    - driver_data defaults to 0UL.
> +  - flags field defaults to 0.
>  
>  Note that driver_data must match the value used by any of the pci_device_id
>  entries defined in the driver. This makes the driver_data field mandatory
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3a72352aa5cf..1ed8a4ab96f1 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -136,7 +136,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>  						    struct pci_dev *dev)
>  {
>  	struct pci_dynid *dynid;
> -	const struct pci_device_id *found_id = NULL;
> +	const struct pci_device_id *found_id = NULL, *ids;
>  
>  	/* When driver_override is set, only bind to the matching driver */
>  	if (dev->driver_override && strcmp(dev->driver_override, drv->name))
> @@ -152,10 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>  	}
>  	spin_unlock(&drv->dynids.lock);
>  
> -	if (!found_id)
> -		found_id = pci_match_id(drv->id_table, dev);
> +	if (found_id)
> +		return found_id;
> +
> +	ids = drv->id_table;
> +	while ((found_id = pci_match_id(ids, dev))) {
> +		/*
> +		 * The match table is split based on driver_override. Check the
> +		 * flags as well so that any matching PCI_ID_F_DRIVER_OVERRIDE

s/PCI_ID_F_DRIVER_OVERRIDE/PCI_ID_F_VFIO_DRIVER_OVERRIDE/ ?

> +		 * entry is returned.
> +		 */
> +		if ((found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) &&
> +		    !dev->driver_override)
> +			ids = found_id + 1;
> +		else
> +			break;

Isn't this break the same as "return found_id"?

> +	}
>  
> -	/* driver_override will always match, send a dummy id */
> +	/*
> +	 * if no static match, driver_override will always match, send a dummy

AFAICT this patch did not change dynamic matching, so I don't know why
you changed this comment.  Previously driver_override matched if there
was no dynamic or static match.  Now it's the same except that we skip
static matches with PCI_ID_F_VFIO_DRIVER_OVERRIDE.

> +	 * id.
> +	 */
>  	if (!found_id && dev->driver_override)
>  		found_id = &pci_device_id_any;
>  
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 0272b95d9c5f..7a43edbe8618 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -181,9 +181,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
>  	return vfio_pci_core_sriov_configure(pdev, nr_virtfn);
>  }
>  
> +static const struct pci_device_id vfio_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */
> +	{ 0, }
> +};
> +
> +MODULE_DEVICE_TABLE(pci, vfio_pci_table);
> +
>  static struct pci_driver vfio_pci_driver = {
>  	.name			= "vfio-pci",
> -	.id_table		= NULL, /* only dynamic ids */
> +	.id_table		= vfio_pci_table,
>  	.probe			= vfio_pci_probe,
>  	.remove			= vfio_pci_remove,
>  	.sriov_configure	= vfio_pci_sriov_configure,
> diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
> index 8e291cfdaf06..cd256d9c60d2 100644
> --- a/include/linux/mod_devicetable.h
> +++ b/include/linux/mod_devicetable.h
> @@ -16,6 +16,11 @@ typedef unsigned long kernel_ulong_t;
>  
>  #define PCI_ANY_ID (~0)
>  
> +

Spurious blank line.

> +enum pci_id_flags {
> +	PCI_ID_F_VFIO_DRIVER_OVERRIDE	= 1 << 0,
> +};

Why an enum?  Is the enum and the name following some similar style
elsewhere?

> +
>  /**
>   * struct pci_device_id - PCI device ID structure
>   * @vendor:		Vendor ID to match (or PCI_ANY_ID)
> @@ -34,12 +39,14 @@ typedef unsigned long kernel_ulong_t;
>   *			Best practice is to use driver_data as an index
>   *			into a static list of equivalent device types,
>   *			instead of using it as a pointer.
> + * @flags:		PCI flags of the driver. Bitmap of pci_id_flags enum.
>   */
>  struct pci_device_id {
>  	__u32 vendor, device;		/* Vendor and device ID or PCI_ANY_ID*/
>  	__u32 subvendor, subdevice;	/* Subsystem ID's or PCI_ANY_ID */
>  	__u32 class, class_mask;	/* (class,subclass,prog-if) triplet */
>  	kernel_ulong_t driver_data;	/* Data private to the driver */
> +	__u32 flags;
>  };
>  
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 540b377ca8f6..fd84609ff06b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -901,6 +901,33 @@ struct pci_driver {
>  	.vendor = (vend), .device = (dev), \
>  	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID
>  
> +/**
> + * PCI_DEVICE_FLAGS - macro used to describe a PCI device with specific flags.
> + * @vend: the 16 bit PCI Vendor ID
> + * @dev: the 16 bit PCI Device ID
> + * @fl: PCI Device flags as a bitmap of pci_id_flags enum
> + *
> + * This macro is used to create a struct pci_device_id that matches a
> + * specific device. The subvendor and subdevice fields will be set to
> + * PCI_ANY_ID.
> + */
> +#define PCI_DEVICE_FLAGS(vend, dev, fl) \
> +	.vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \
> +	.subdevice = PCI_ANY_ID, .flags = (fl)
> +
> +/**
> + * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO
> + *                                   "driver_override" PCI device.
> + * @vend: the 16 bit PCI Vendor ID
> + * @dev: the 16 bit PCI Device ID
> + *
> + * This macro is used to create a struct pci_device_id that matches a
> + * specific device. The subvendor and subdevice fields will be set to
> + * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE.
> + */
> +#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \
> +	PCI_DEVICE_FLAGS(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> +
>  /**
>   * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem
>   * @vend: the 16 bit PCI Vendor ID
> diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
> index 9bb6c7edccc4..b927c36b8333 100644
> --- a/scripts/mod/devicetable-offsets.c
> +++ b/scripts/mod/devicetable-offsets.c
> @@ -42,6 +42,7 @@ int main(void)
>  	DEVID_FIELD(pci_device_id, subdevice);
>  	DEVID_FIELD(pci_device_id, class);
>  	DEVID_FIELD(pci_device_id, class_mask);
> +	DEVID_FIELD(pci_device_id, flags);
>  
>  	DEVID(ccw_device_id);
>  	DEVID_FIELD(ccw_device_id, match_flags);
> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> index 7c97fa8e36bc..f53b38e8f696 100644
> --- a/scripts/mod/file2alias.c
> +++ b/scripts/mod/file2alias.c
> @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
>  	return 1;
>  }
>  
> -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
> +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
>  static int do_pci_entry(const char *filename,
>  			void *symval, char *alias)
>  {
> @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
>  	DEF_FIELD(symval, pci_device_id, subdevice);
>  	DEF_FIELD(symval, pci_device_id, class);
>  	DEF_FIELD(symval, pci_device_id, class_mask);
> +	DEF_FIELD(symval, pci_device_id, flags);

I'm a little bit wary of adding a new field to this kernel/user
interface just for this single bit.  Maybe it's justified but feels
like it's worth being careful.

> -	strcpy(alias, "pci:");
> +	if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> +		strcpy(alias, "vfio_pci:");
> +	else
> +		strcpy(alias, "pci:");
>  	ADD(alias, "v", vendor != PCI_ANY_ID, vendor);
>  	ADD(alias, "d", device != PCI_ANY_ID, device);
>  	ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor);
> -- 
> 2.18.1
>
Max Gurtovoy Aug. 5, 2021, 4:47 p.m. UTC | #6
On 8/4/2021 11:34 PM, Bjorn Helgaas wrote:
> On Wed, Jul 21, 2021 at 07:16:06PM +0300, Yishai Hadas wrote:
>> From: Max Gurtovoy <mgurtovoy@nvidia.com>
>>
>> The new flag field is be used to allow PCI drivers to signal the core code
>> during driver matching and when generating the modules.alias information.
> This needs to read as a complete idea even without the subject line.
> The subject is the *title*; it's not the first sentence of the essay.
>
> It's OK to repeat the subject line in the commit log, but I don't
> think that would solve the problem here because "signal core code" and
> "when generating ..." doesn't get to the point of the patch.
>
> What's the objective here?

We're creating a framework for adding vendor/protocol specific vfio_pci 
drivers. Today we have vfio_pci that can match to all pci devices and 
implement generic pci functionality.

For adding features like Live migration and other goodies we'll use the 
new vendor drivers since it's not generic.

In this patch, we're providing the ability for userspace to identify 
these drivers and match to devices.

We also want to prevent auto loading via some udev facilities (by adding 
new aliases for vfio pci devices) and we don't want vfio pci vendor 
drivers to race with original pci driver (e.g mlx5_core).

Thus, we enforce that mlx5_vfio_pci will match to a device (id_table 
will be added to vendor drivers) only if some admin use driver_override.

>
>> The first use will be to define a VFIO flag that indicates the PCI driver
>> is a VFIO driver.
> Is there such a thing as a "VFIO driver" today?  Maybe this patch is
> introducing that concept?  If so, maybe lead off by motivating and
> defining what it is, then follow up with the details that go into
> implementing it.
>
>> VFIO drivers have a few special properties compared to normal PCI drivers:
>>   - They do not automatically bind. VFIO drivers are used to swap out the
>>     normal driver for a device and convert the PCI device to the VFIO
>>     subsystem.
> The comment below says "... any matching PCI_ID_F_DRIVER_OVERRIDE
> [sic] entry is returned," which sounds like the opposite of "do not
> automatically bind."  Might be exposing my VFIO ignorance here.
>
>>     The admin must make this choice and following the current uAPI this is
>>     usually done by using the driver_override sysfs.
> I'm not sure "converting PCI device to the VFIO subsystem" is the
> right way to phrase this, but whatever it is, make this idea specific,
> e.g., by "echo pci-stub > /sys/.../driver_override" or whatever.
>
>>   - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
>>     them with 'vfio_pci:' instead of the normal 'pci:'.
>>
>>     This allows the userspace machinery that switches devices to VFIO to
>>     know what kernel drivers support what devices and allows it to trigger
>>     the proper device_override.
> What does "switch device to VFIO" mean?  I could be reading this too
> literally (in my defense, I'm not a VFIO expert), but AFAICT this is
> not something you do to the *device*.  I guess maybe this is something
> like "prevent the normal driver from claiming the device so we can use
> VFIO instead"?  Does "using VFIO" mean getting vfio-pci to claim the
> device?

hope the above explanation made this more clear.

We'll have vendor_vfio_pci drivers in the next patchsets and not only 
vfio_pci.ko.

mlx5 and hns will be the first 2 drivers to implement vendor specific 
functionality in vfio/pci subsystem.

We want to use these drivers to drive our devices and not vfio_pci.ko 
that don't have the logic for migrating mlx5/hns devices.


We'll improve the commit message for the next version and add the 
algorithm Jason proposed in his previous answer.

>
>> As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
>> keeps todays behavior the same. VFIO remains on the side, is never
>> autoloaded and can only be activated by direct admin action.
> s/todays/today's/
>
>> This patch is the infrastructure to provide the information in the
>> modules.alias to userspace and enable the only PCI VFIO driver. Later
>> series introduce additional HW specific VFIO PCI drivers.
> s/the only/only the/ ?  (Not sure what you intend, but "the only"
> doesn't seem right)
>
> Sorry, I know I'm totally missing the point here.
>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   Documentation/PCI/pci.rst         |  1 +
>>   drivers/pci/pci-driver.c          | 25 +++++++++++++++++++++----
>>   drivers/vfio/pci/vfio_pci.c       |  9 ++++++++-
>>   include/linux/mod_devicetable.h   |  7 +++++++
>>   include/linux/pci.h               | 27 +++++++++++++++++++++++++++
>>   scripts/mod/devicetable-offsets.c |  1 +
>>   scripts/mod/file2alias.c          |  8 ++++++--
>>   7 files changed, 71 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
>> index fa651e25d98c..24e70a386887 100644
>> --- a/Documentation/PCI/pci.rst
>> +++ b/Documentation/PCI/pci.rst
>> @@ -103,6 +103,7 @@ need pass only as many optional fields as necessary:
>>     - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
>>     - class and classmask fields default to 0
>>     - driver_data defaults to 0UL.
>> +  - flags field defaults to 0.
>>   
>>   Note that driver_data must match the value used by any of the pci_device_id
>>   entries defined in the driver. This makes the driver_data field mandatory
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 3a72352aa5cf..1ed8a4ab96f1 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -136,7 +136,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>>   						    struct pci_dev *dev)
>>   {
>>   	struct pci_dynid *dynid;
>> -	const struct pci_device_id *found_id = NULL;
>> +	const struct pci_device_id *found_id = NULL, *ids;
>>   
>>   	/* When driver_override is set, only bind to the matching driver */
>>   	if (dev->driver_override && strcmp(dev->driver_override, drv->name))
>> @@ -152,10 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>>   	}
>>   	spin_unlock(&drv->dynids.lock);
>>   
>> -	if (!found_id)
>> -		found_id = pci_match_id(drv->id_table, dev);
>> +	if (found_id)
>> +		return found_id;
>> +
>> +	ids = drv->id_table;
>> +	while ((found_id = pci_match_id(ids, dev))) {
>> +		/*
>> +		 * The match table is split based on driver_override. Check the
>> +		 * flags as well so that any matching PCI_ID_F_DRIVER_OVERRIDE
> s/PCI_ID_F_DRIVER_OVERRIDE/PCI_ID_F_VFIO_DRIVER_OVERRIDE/ ?

sorry, leftover from last version.

>
>> +		 * entry is returned.
>> +		 */
>> +		if ((found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) &&
>> +		    !dev->driver_override)
>> +			ids = found_id + 1;
>> +		else
>> +			break;
> Isn't this break the same as "return found_id"?

the same.

Will update in next version.


>
>> +	}
>>   
>> -	/* driver_override will always match, send a dummy id */
>> +	/*
>> +	 * if no static match, driver_override will always match, send a dummy
> AFAICT this patch did not change dynamic matching, so I don't know why
> you changed this comment.  Previously driver_override matched if there
> was no dynamic or static match.  Now it's the same except that we skip
> static matches with PCI_ID_F_VFIO_DRIVER_OVERRIDE.

we'll keep the old comment.


>> +	 * id.
>> +	 */
>>   	if (!found_id && dev->driver_override)
>>   		found_id = &pci_device_id_any;
>>   
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 0272b95d9c5f..7a43edbe8618 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -181,9 +181,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
>>   	return vfio_pci_core_sriov_configure(pdev, nr_virtfn);
>>   }
>>   
>> +static const struct pci_device_id vfio_pci_table[] = {
>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */
>> +	{ 0, }
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, vfio_pci_table);
>> +
>>   static struct pci_driver vfio_pci_driver = {
>>   	.name			= "vfio-pci",
>> -	.id_table		= NULL, /* only dynamic ids */
>> +	.id_table		= vfio_pci_table,
>>   	.probe			= vfio_pci_probe,
>>   	.remove			= vfio_pci_remove,
>>   	.sriov_configure	= vfio_pci_sriov_configure,
>> diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
>> index 8e291cfdaf06..cd256d9c60d2 100644
>> --- a/include/linux/mod_devicetable.h
>> +++ b/include/linux/mod_devicetable.h
>> @@ -16,6 +16,11 @@ typedef unsigned long kernel_ulong_t;
>>   
>>   #define PCI_ANY_ID (~0)
>>   
>> +
> Spurious blank line.

good catch, thanks.


>> +enum pci_id_flags {
>> +	PCI_ID_F_VFIO_DRIVER_OVERRIDE	= 1 << 0,
>> +};
> Why an enum?  Is the enum and the name following some similar style
> elsewhere?

We might want to add more flags in the future. I'll remove the enum name 
but let's keep the enum for future extensions.


>
>> +
>>   /**
>>    * struct pci_device_id - PCI device ID structure
>>    * @vendor:		Vendor ID to match (or PCI_ANY_ID)
>> @@ -34,12 +39,14 @@ typedef unsigned long kernel_ulong_t;
>>    *			Best practice is to use driver_data as an index
>>    *			into a static list of equivalent device types,
>>    *			instead of using it as a pointer.
>> + * @flags:		PCI flags of the driver. Bitmap of pci_id_flags enum.
>>    */
>>   struct pci_device_id {
>>   	__u32 vendor, device;		/* Vendor and device ID or PCI_ANY_ID*/
>>   	__u32 subvendor, subdevice;	/* Subsystem ID's or PCI_ANY_ID */
>>   	__u32 class, class_mask;	/* (class,subclass,prog-if) triplet */
>>   	kernel_ulong_t driver_data;	/* Data private to the driver */
>> +	__u32 flags;
>>   };
>>   
>>   
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 540b377ca8f6..fd84609ff06b 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -901,6 +901,33 @@ struct pci_driver {
>>   	.vendor = (vend), .device = (dev), \
>>   	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID
>>   
>> +/**
>> + * PCI_DEVICE_FLAGS - macro used to describe a PCI device with specific flags.
>> + * @vend: the 16 bit PCI Vendor ID
>> + * @dev: the 16 bit PCI Device ID
>> + * @fl: PCI Device flags as a bitmap of pci_id_flags enum
>> + *
>> + * This macro is used to create a struct pci_device_id that matches a
>> + * specific device. The subvendor and subdevice fields will be set to
>> + * PCI_ANY_ID.
>> + */
>> +#define PCI_DEVICE_FLAGS(vend, dev, fl) \
>> +	.vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \
>> +	.subdevice = PCI_ANY_ID, .flags = (fl)
>> +
>> +/**
>> + * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO
>> + *                                   "driver_override" PCI device.
>> + * @vend: the 16 bit PCI Vendor ID
>> + * @dev: the 16 bit PCI Device ID
>> + *
>> + * This macro is used to create a struct pci_device_id that matches a
>> + * specific device. The subvendor and subdevice fields will be set to
>> + * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE.
>> + */
>> +#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \
>> +	PCI_DEVICE_FLAGS(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE)
>> +
>>   /**
>>    * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem
>>    * @vend: the 16 bit PCI Vendor ID
>> diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
>> index 9bb6c7edccc4..b927c36b8333 100644
>> --- a/scripts/mod/devicetable-offsets.c
>> +++ b/scripts/mod/devicetable-offsets.c
>> @@ -42,6 +42,7 @@ int main(void)
>>   	DEVID_FIELD(pci_device_id, subdevice);
>>   	DEVID_FIELD(pci_device_id, class);
>>   	DEVID_FIELD(pci_device_id, class_mask);
>> +	DEVID_FIELD(pci_device_id, flags);
>>   
>>   	DEVID(ccw_device_id);
>>   	DEVID_FIELD(ccw_device_id, match_flags);
>> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
>> index 7c97fa8e36bc..f53b38e8f696 100644
>> --- a/scripts/mod/file2alias.c
>> +++ b/scripts/mod/file2alias.c
>> @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
>>   	return 1;
>>   }
>>   
>> -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
>> +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
>>   static int do_pci_entry(const char *filename,
>>   			void *symval, char *alias)
>>   {
>> @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
>>   	DEF_FIELD(symval, pci_device_id, subdevice);
>>   	DEF_FIELD(symval, pci_device_id, class);
>>   	DEF_FIELD(symval, pci_device_id, class_mask);
>> +	DEF_FIELD(symval, pci_device_id, flags);
> I'm a little bit wary of adding a new field to this kernel/user
> interface just for this single bit.  Maybe it's justified but feels
> like it's worth being careful.

Old applications are not aware of these flags.

what worries you ?

>
>> -	strcpy(alias, "pci:");
>> +	if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
>> +		strcpy(alias, "vfio_pci:");
>> +	else
>> +		strcpy(alias, "pci:");
>>   	ADD(alias, "v", vendor != PCI_ANY_ID, vendor);
>>   	ADD(alias, "d", device != PCI_ANY_ID, device);
>>   	ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor);
>> -- 
>> 2.18.1
>>
Jason Gunthorpe Aug. 6, 2021, 12:23 a.m. UTC | #7
On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote:

> > The first use will be to define a VFIO flag that indicates the PCI driver
> > is a VFIO driver.
>
> Is there such a thing as a "VFIO driver" today?  

Yes.

VFIO has long existed as a driver subsystem that binds drivers to
devices in various bus types. In the case of PCI the admin moves a PCI
device from normal operation to VFIO operation via something like:

echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

Other bus types (platform, acpi, etc) have a similar command to move
them to VFIO.

> > VFIO drivers have a few special properties compared to normal PCI drivers:
> >  - They do not automatically bind. VFIO drivers are used to swap out the
> >    normal driver for a device and convert the PCI device to the VFIO
> >    subsystem.
> 
> The comment below says "... any matching PCI_ID_F_DRIVER_OVERRIDE
> [sic] entry is returned," which sounds like the opposite of "do not
> automatically bind."  Might be exposing my VFIO ignorance here.

The comment is in error
 
> >    The admin must make this choice and following the current uAPI this is
> >    usually done by using the driver_override sysfs.
> 
> I'm not sure "converting PCI device to the VFIO subsystem" is the
> right way to phrase this, but whatever it is, make this idea specific,
> e.g., by "echo pci-stub > /sys/.../driver_override" or whatever.

The next version will include the sequence we worked out with Alex in
the other branch of this thread. See below

> >  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
> >    them with 'vfio_pci:' instead of the normal 'pci:'.
> > 
> >    This allows the userspace machinery that switches devices to VFIO to
> >    know what kernel drivers support what devices and allows it to trigger
> >    the proper device_override.
> 
> What does "switch device to VFIO" mean?  I could be reading this too
> literally (in my defense, I'm not a VFIO expert), but AFAICT this is
> not something you do to the *device*.  

It means change the struct device_driver bound to the struct device -
which is an operation that the admin does on the device object.

> I guess maybe this is something like "prevent the normal driver from
> claiming the device so we can use VFIO instead"?

no..

> Does "using VFIO" mean getting vfio-pci to claim the device?

If by claim you mean bind a pci_driver to the pci_dev, then yes.

> > As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
> > keeps todays behavior the same. VFIO remains on the side, is never
> > autoloaded and can only be activated by direct admin action.
> 
> s/todays/today's/
> 
> > This patch is the infrastructure to provide the information in the
> > modules.alias to userspace and enable the only PCI VFIO driver. Later
> > series introduce additional HW specific VFIO PCI drivers.
> 
> s/the only/only the/ ?  (Not sure what you intend, but "the only"
> doesn't seem right)

"the only" is correct, at this point in the sequence there is only one
pci_driver that uses this, vfio_pci.ko

> Sorry, I know I'm totally missing the point here.

Lets try again..

PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id

Allow device drivers to include match entries in the modules.alias file
produced by kbuild that are not used for normal driver autoprobing and
module autoloading. Drivers using these match entries can be connected to
the PCI device manually, by userspace, using the existing driver_override
sysfs.

Add the flag PCI_ID_F_VFIO_DRIVER_OVERRIDE to indicate that the match
entry is for the VFIO subsystem. These match entries are prefixed with
"vfio_" in the modules.alias.

For example the resulting modules.alias may have:

  alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
  alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
  alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

In this example mlx5_core and mlx5_vfio_pci match to the same PCI
device. The kernel will autoload and autobind to mlx5_core but the kernel
and udev mechanisms will ignore mlx5_vfio_pci.

When userspace wants to change a device to the VFIO subsystem userspace
can implement a generic algorithm:

   1) Identify the sysfs path to the device:
    /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0

   2) Get the modalias string from the kernel:
    $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
    pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

   3) Prefix it with vfio_:
    vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

   4) Search modules.alias for the above string and select the entry that
      has the fewest *'s:
    alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci

   5) modprobe the matched module name:
    $ modprobe mlx5_vfio_pci

   6) cat the matched module name to driver_override:
    echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

The algorithm is independent of bus type. In future the other buses's with
VFIO device drivers, like platform and ACPI, can use this algorithm as
well.

This patch is the infrastructure to provide the information in the
modules.alias to userspace. Convert the only VFIO pci_driver which
results in one new line in the modules.alias:

  alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

Later series introduce additional HW specific VFIO PCI drivers, such as
mlx5_vfio_pci.

> > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> > index 7c97fa8e36bc..f53b38e8f696 100644
> > +++ b/scripts/mod/file2alias.c
> > @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
> >  	return 1;
> >  }
> >  
> > -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
> > +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
> >  static int do_pci_entry(const char *filename,
> >  			void *symval, char *alias)
> >  {
> > @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
> >  	DEF_FIELD(symval, pci_device_id, subdevice);
> >  	DEF_FIELD(symval, pci_device_id, class);
> >  	DEF_FIELD(symval, pci_device_id, class_mask);
> > +	DEF_FIELD(symval, pci_device_id, flags);
> 
> I'm a little bit wary of adding a new field to this kernel/user
> interface just for this single bit.  Maybe it's justified but feels
> like it's worth being careful.

A couple of us looked at this in one of the RFC threads..

As far as we could tell this is not a kernel/user interface. It is an
interface within kbuild between gcc and file2alias and is not used or
really exported beyond the kernel build sequence.

Debian code search didn't find anything, for instance.

modules.alias, as output by file2alias during kbuild, is the canonical
"kernel/user" interface here. Everything that needs this data should
be using that.

Thanks,
Jason
Max Gurtovoy Aug. 11, 2021, 12:22 p.m. UTC | #8
Hi Bjorn,

On 8/6/2021 3:23 AM, Jason Gunthorpe wrote:
> On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote:
>
>>> The first use will be to define a VFIO flag that indicates the PCI driver
>>> is a VFIO driver.
>> Is there such a thing as a "VFIO driver" today?
> Yes.
>
> VFIO has long existed as a driver subsystem that binds drivers to
> devices in various bus types. In the case of PCI the admin moves a PCI
> device from normal operation to VFIO operation via something like:
>
> echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
>
> Other bus types (platform, acpi, etc) have a similar command to move
> them to VFIO.
>
>>> VFIO drivers have a few special properties compared to normal PCI drivers:
>>>   - They do not automatically bind. VFIO drivers are used to swap out the
>>>     normal driver for a device and convert the PCI device to the VFIO
>>>     subsystem.
>> The comment below says "... any matching PCI_ID_F_DRIVER_OVERRIDE
>> [sic] entry is returned," which sounds like the opposite of "do not
>> automatically bind."  Might be exposing my VFIO ignorance here.
> The comment is in error
>   
>>>     The admin must make this choice and following the current uAPI this is
>>>     usually done by using the driver_override sysfs.
>> I'm not sure "converting PCI device to the VFIO subsystem" is the
>> right way to phrase this, but whatever it is, make this idea specific,
>> e.g., by "echo pci-stub > /sys/.../driver_override" or whatever.
> The next version will include the sequence we worked out with Alex in
> the other branch of this thread. See below
>
>>>   - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
>>>     them with 'vfio_pci:' instead of the normal 'pci:'.
>>>
>>>     This allows the userspace machinery that switches devices to VFIO to
>>>     know what kernel drivers support what devices and allows it to trigger
>>>     the proper device_override.
>> What does "switch device to VFIO" mean?  I could be reading this too
>> literally (in my defense, I'm not a VFIO expert), but AFAICT this is
>> not something you do to the *device*.
> It means change the struct device_driver bound to the struct device -
> which is an operation that the admin does on the device object.
>
>> I guess maybe this is something like "prevent the normal driver from
>> claiming the device so we can use VFIO instead"?
> no..
>
>> Does "using VFIO" mean getting vfio-pci to claim the device?
> If by claim you mean bind a pci_driver to the pci_dev, then yes.
>
>>> As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
>>> keeps todays behavior the same. VFIO remains on the side, is never
>>> autoloaded and can only be activated by direct admin action.
>> s/todays/today's/
>>
>>> This patch is the infrastructure to provide the information in the
>>> modules.alias to userspace and enable the only PCI VFIO driver. Later
>>> series introduce additional HW specific VFIO PCI drivers.
>> s/the only/only the/ ?  (Not sure what you intend, but "the only"
>> doesn't seem right)
> "the only" is correct, at this point in the sequence there is only one
> pci_driver that uses this, vfio_pci.ko
>
>> Sorry, I know I'm totally missing the point here.
> Lets try again..
>
> PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id
>
> Allow device drivers to include match entries in the modules.alias file
> produced by kbuild that are not used for normal driver autoprobing and
> module autoloading. Drivers using these match entries can be connected to
> the PCI device manually, by userspace, using the existing driver_override
> sysfs.
>
> Add the flag PCI_ID_F_VFIO_DRIVER_OVERRIDE to indicate that the match
> entry is for the VFIO subsystem. These match entries are prefixed with
> "vfio_" in the modules.alias.
>
> For example the resulting modules.alias may have:
>
>    alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
>    alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
>    alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
>
> In this example mlx5_core and mlx5_vfio_pci match to the same PCI
> device. The kernel will autoload and autobind to mlx5_core but the kernel
> and udev mechanisms will ignore mlx5_vfio_pci.
>
> When userspace wants to change a device to the VFIO subsystem userspace
> can implement a generic algorithm:
>
>     1) Identify the sysfs path to the device:
>      /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
>
>     2) Get the modalias string from the kernel:
>      $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
>      pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
>
>     3) Prefix it with vfio_:
>      vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
>
>     4) Search modules.alias for the above string and select the entry that
>        has the fewest *'s:
>      alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
>
>     5) modprobe the matched module name:
>      $ modprobe mlx5_vfio_pci
>
>     6) cat the matched module name to driver_override:
>      echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
>
> The algorithm is independent of bus type. In future the other buses's with
> VFIO device drivers, like platform and ACPI, can use this algorithm as
> well.
>
> This patch is the infrastructure to provide the information in the
> modules.alias to userspace. Convert the only VFIO pci_driver which
> results in one new line in the modules.alias:
>
>    alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
>
> Later series introduce additional HW specific VFIO PCI drivers, such as
> mlx5_vfio_pci.

are we good with this commit message ?

And with the code logic ?

We would like to send V2 with the proposed fixes and the above commit 
message and get your ack on this.

Our goal is to merge this series and the first preparation series 
"Provide core infrastructure for managing open/release" sent by Jason to 
kernel 5.15.

The first series is in the final review phase but this series is mostly 
depend on this patch. For the other patches we have some kind of agreement.

hopefully we can collect more "reviewed-by" signatures before sending V2.


>>> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
>>> index 7c97fa8e36bc..f53b38e8f696 100644
>>> +++ b/scripts/mod/file2alias.c
>>> @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
>>>   	return 1;
>>>   }
>>>   
>>> -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
>>> +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
>>>   static int do_pci_entry(const char *filename,
>>>   			void *symval, char *alias)
>>>   {
>>> @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
>>>   	DEF_FIELD(symval, pci_device_id, subdevice);
>>>   	DEF_FIELD(symval, pci_device_id, class);
>>>   	DEF_FIELD(symval, pci_device_id, class_mask);
>>> +	DEF_FIELD(symval, pci_device_id, flags);
>> I'm a little bit wary of adding a new field to this kernel/user
>> interface just for this single bit.  Maybe it's justified but feels
>> like it's worth being careful.
> A couple of us looked at this in one of the RFC threads..
>
> As far as we could tell this is not a kernel/user interface. It is an
> interface within kbuild between gcc and file2alias and is not used or
> really exported beyond the kernel build sequence.
>
> Debian code search didn't find anything, for instance.
>
> modules.alias, as output by file2alias during kbuild, is the canonical
> "kernel/user" interface here. Everything that needs this data should
> be using that.
>
> Thanks,
> Jason
Bjorn Helgaas Aug. 11, 2021, 7:07 p.m. UTC | #9
On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote:
> 
> > > The first use will be to define a VFIO flag that indicates the PCI driver
> > > is a VFIO driver.
> >
> > Is there such a thing as a "VFIO driver" today?  
> 
> Yes.
> 
> VFIO has long existed as a driver subsystem that binds drivers to
> devices in various bus types. In the case of PCI the admin moves a PCI
> device from normal operation to VFIO operation via something like:

What specifically makes a driver a "VFIO driver"?  Maybe that it
supports the VFIO ioctls in include/uapi/linux/vfio.h?  That by itself
doesn't require special treatment by the kernel, so I think there's
more here.

> echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
> 
> Other bus types (platform, acpi, etc) have a similar command to move
> them to VFIO.

Do the other bus types have a flag analogous to
PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
other bus types, it'd be nice if the approach were similar.

> > > This patch is the infrastructure to provide the information in the
> > > modules.alias to userspace and enable the only PCI VFIO driver. Later
> > > series introduce additional HW specific VFIO PCI drivers.
> > 
> > s/the only/only the/ ?  (Not sure what you intend, but "the only"
> > doesn't seem right)
> 
> "the only" is correct, at this point in the sequence there is only one
> pci_driver that uses this, vfio_pci.ko

Can we just name the specific driver instead of obliquely referring to
"the only such driver", e.g., something like "... add a modules.alias
entry for vfio_pci.ko, currently the only PCI VFIO driver"?

> > Sorry, I know I'm totally missing the point here.
> 
> Lets try again..
> 
> PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id
> 
> Allow device drivers to include match entries in the modules.alias file
> produced by kbuild that are not used for normal driver autoprobing and
> module autoloading. Drivers using these match entries can be connected to
> the PCI device manually, by userspace, using the existing driver_override
> sysfs.

IIUC, the end result of this is basically a tweak to the existing
sysfs driver_override functionality.

And I *think* (correct me if I'm wrong), this actually has nothing in
particular to do with VFIO.  It's just that you want to expose some
device IDs that are only used for binding when driver_override is set.

> Add the flag PCI_ID_F_VFIO_DRIVER_OVERRIDE to indicate that the match
> entry is for the VFIO subsystem. These match entries are prefixed with
> "vfio_" in the modules.alias.
> 
> For example the resulting modules.alias may have:
> 
>   alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
>   alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
>   alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
> 
> In this example mlx5_core and mlx5_vfio_pci match to the same PCI
> device. The kernel will autoload and autobind to mlx5_core but the kernel
> and udev mechanisms will ignore mlx5_vfio_pci.
> 
> When userspace wants to change a device to the VFIO subsystem userspace
> can implement a generic algorithm:
> 
>    1) Identify the sysfs path to the device:
>     /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
> 
>    2) Get the modalias string from the kernel:
>     $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
>     pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

So far, I think this is all the existing behavior, unaffected by this
patch.

>    3) Prefix it with vfio_:
>     vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
> 
>    4) Search modules.alias for the above string and select the entry that
>       has the fewest *'s:
>     alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci

And this patch basically adds this modules.alias entry.

Previously vfio_pci contained no vendor/device IDs, and the only way
to bind it to a device was to either:

  - Modprobe the driver and write dynamic device IDs to the driver's
    /sys/.../new_id.  This should directly bind the driver to all
    devices that match the new IDs (see new_id_store()).

or

  - Write "vfio_pci" to the device's /sys/.../driver_override.
    AFAICS, this won't bind anything (see driver_override_store()),
    but if we call the driver's .probe() method via modprobe or
    rescan, the driver_override will match any device regardless of
    ID.

IIUC, after this patch, you can add vendor/device IDs to a struct
pci_driver with this new flag.  These IDs are advertised via
modules.alias.

For driver binding, IDs with the new flag are eligible to match only
when driver_override is set to the matching driver.

Setting a device's driver_override has *always* caused the matching
driver to bind.  The only difference after this patch is that now we
give the driver an ID from its .id_table instead of pci_device_id_any.

>    5) modprobe the matched module name:
>     $ modprobe mlx5_vfio_pci

I assume somewhere in here you need to unbind mlx5_core before binding
mlx5_vfio_pci?

>    6) cat the matched module name to driver_override:
>     echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

Don't you need something here to trigger the driver attach, i.e.,
should step 5 and step 6 be swapped?  What if the driver is already
loaded?  Can you modprobe again to make it bind to a second device?

> The algorithm is independent of bus type. In future the other buses's with
> VFIO device drivers, like platform and ACPI, can use this algorithm as
> well.

s/buses's/buses/

I see drivers/vfio/platform/vfio_platform.c; is that what you mean?  I
don't see any VFIO things with ACPI in their name, so maybe I'm
looking the wrong place.  If this is purely *plans* for the future,
maybe say something like "planned VFIO drivers ..."

> This patch is the infrastructure to provide the information in the
> modules.alias to userspace. Convert the only VFIO pci_driver which
> results in one new line in the modules.alias:

"Convert vfio_pci, currently the only VFIO PCI driver, which ..." ?

>   alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
> 
> Later series introduce additional HW specific VFIO PCI drivers, such as
> mlx5_vfio_pci.
> 
> > > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> > > index 7c97fa8e36bc..f53b38e8f696 100644
> > > +++ b/scripts/mod/file2alias.c
> > > @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
> > >  	return 1;
> > >  }
> > >  
> > > -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
> > > +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
> > >  static int do_pci_entry(const char *filename,
> > >  			void *symval, char *alias)
> > >  {
> > > @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
> > >  	DEF_FIELD(symval, pci_device_id, subdevice);
> > >  	DEF_FIELD(symval, pci_device_id, class);
> > >  	DEF_FIELD(symval, pci_device_id, class_mask);
> > > +	DEF_FIELD(symval, pci_device_id, flags);
> > 
> > I'm a little bit wary of adding a new field to this kernel/user
> > interface just for this single bit.  Maybe it's justified but feels
> > like it's worth being careful.
> 
> A couple of us looked at this in one of the RFC threads..
> 
> As far as we could tell this is not a kernel/user interface. It is an
> interface within kbuild between gcc and file2alias and is not used or
> really exported beyond the kernel build sequence.
> 
> Debian code search didn't find anything, for instance.
> 
> modules.alias, as output by file2alias during kbuild, is the canonical
> "kernel/user" interface here. Everything that needs this data should
> be using that.

Ah, thanks.  I was thinking this added something to /sys/.../modalias,
but sounds like that's not the case.
Jason Gunthorpe Aug. 12, 2021, 1:27 p.m. UTC | #10
On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote:
> > 
> > > > The first use will be to define a VFIO flag that indicates the PCI driver
> > > > is a VFIO driver.
> > >
> > > Is there such a thing as a "VFIO driver" today?  
> > 
> > Yes.
> > 
> > VFIO has long existed as a driver subsystem that binds drivers to
> > devices in various bus types. In the case of PCI the admin moves a PCI
> > device from normal operation to VFIO operation via something like:
> 
> What specifically makes a driver a "VFIO driver"?  

It is a device driver whose probe function instantiates a "struct
vfio_device" which binds it to the VFIO subsystem and triggers
creation of the char devs, ioctls, etc.

No different from every other subsystem, really. Eg a netdev driver
creates a struct ndev_device, a TPM driver creates struct tpm_chip,
etc.

> supports the VFIO ioctls in include/uapi/linux/vfio.h?  That by itself
> doesn't require special treatment by the kernel, so I think there's
> more here.

The unique thing about VFIO, compared to all other subsystems, is that
VFIO is a second choice for driver binding. A device will have a
natural kernel driver, eg mlx5 naturally creates netdevs, and it has a
VFIO driver option. Userspace selects if it wants the device to
operate in normal mode or VFIO mode.

The kernel should never move a device to VFIO mode automatically -
which is the special behavior compared to any other normal pci_driver.

> > echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
> > 
> > Other bus types (platform, acpi, etc) have a similar command to move
> > them to VFIO.
> 
> Do the other bus types have a flag analogous to
> PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> other bus types, it'd be nice if the approach were similar.

They could, this series doesn't attempt it. I expect the approach to
be similar as driver_override was copied from PCI to other
busses. When this is completed I hope to take a look at it.

> > PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id
> > 
> > Allow device drivers to include match entries in the modules.alias file
> > produced by kbuild that are not used for normal driver autoprobing and
> > module autoloading. Drivers using these match entries can be connected to
> > the PCI device manually, by userspace, using the existing driver_override
> > sysfs.
> 
> IIUC, the end result of this is basically a tweak to the existing
> sysfs driver_override functionality.

Yes..

> And I *think* (correct me if I'm wrong), this actually has nothing in
> particular to do with VFIO.  It's just that you want to expose some
> device IDs that are only used for binding when driver_override is set.

The general concept has nothing to do with VFIO but adding the "vfio_"
prefix to the modalias is obviously VFIO specific.

The entire point is to convay to userspace the information that the
modules.alias line is just for vfio.

We could imagine in future some other use for this, in which case the
future user would use their own prefix, not vfio.
 
> > When userspace wants to change a device to the VFIO subsystem userspace
> > can implement a generic algorithm:
> > 
> >    1) Identify the sysfs path to the device:
> >     /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > 
> >    2) Get the modalias string from the kernel:
> >     $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
> >     pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
> 
> So far, I think this is all the existing behavior, unaffected by this
> patch.

Yes.
 
> >    3) Prefix it with vfio_:
> >     vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
> > 
> >    4) Search modules.alias for the above string and select the entry that
> >       has the fewest *'s:
> >     alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
> 
> And this patch basically adds this modules.alias entry.

Yes.
 
> Previously vfio_pci contained no vendor/device IDs, and the only way
> to bind it to a device was to either:
> 
>   - Modprobe the driver and write dynamic device IDs to the driver's
>     /sys/.../new_id.  This should directly bind the driver to all
>     devices that match the new IDs (see new_id_store()).
> 
> or
> 
>   - Write "vfio_pci" to the device's /sys/.../driver_override.
>     AFAICS, this won't bind anything (see driver_override_store()),
>     but if we call the driver's .probe() method via modprobe or
>     rescan, the driver_override will match any device regardless of
>     ID.

Yes

> IIUC, after this patch, you can add vendor/device IDs to a struct
> pci_driver with this new flag.  These IDs are advertised via
> modules.alias.

Yes
 
> For driver binding, IDs with the new flag are eligible to match only
> when driver_override is set to the matching driver.

Yes
 
> Setting a device's driver_override has *always* caused the matching
> driver to bind.  The only difference after this patch is that now we
> give the driver an ID from its .id_table instead of pci_device_id_any.

Almost - before a .id_table entried might be returned as well. The
difference here is that there are "hidden" entries in the id_table
that is only used by driver_overrride and we can return that hidden
entry.

> >    5) modprobe the matched module name:
> >     $ modprobe mlx5_vfio_pci
> 
> I assume somewhere in here you need to unbind mlx5_core before binding
> mlx5_vfio_pci?

Er, yes, I skipped some steps here where unbind/bind has to be done
 
> >    6) cat the matched module name to driver_override:
> >     echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
> 
> Don't you need something here to trigger the driver attach, i.e.,
> should step 5 and step 6 be swapped?  What if the driver is already
> loaded? 

The full sequence is more like:

     echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
     echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
     echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

> Can you modprobe again to make it bind to a second device?

modprobe is a single-shot, it just loads the module and doesn't
trigger any driver binding. modprobing a second time is a NOP.

> I see drivers/vfio/platform/vfio_platform.c; is that what you mean?

Yes, look around vfio_platform_acpi_probe()

> I don't see any VFIO things with ACPI in their name, so maybe I'm
> looking the wrong place.  If this is purely *plans* for the future,
> maybe say something like "planned VFIO drivers ..."

Sure

Jason
Bjorn Helgaas Aug. 12, 2021, 3:42 p.m. UTC | #11
On Wed, Jul 21, 2021 at 07:16:06PM +0300, Yishai Hadas wrote:
> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> 
> The new flag field is be used to allow PCI drivers to signal the core code
> during driver matching and when generating the modules.alias information.
> ...

> @@ -152,10 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>  	}
>  	spin_unlock(&drv->dynids.lock);
>  
> -	if (!found_id)
> -		found_id = pci_match_id(drv->id_table, dev);
> +	if (found_id)
> +		return found_id;
> +
> +	ids = drv->id_table;
> +	while ((found_id = pci_match_id(ids, dev))) {
> +		/*
> +		 * The match table is split based on driver_override. Check the
> +		 * flags as well so that any matching PCI_ID_F_DRIVER_OVERRIDE
> +		 * entry is returned.
> +		 */
> +		if ((found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) &&
> +		    !dev->driver_override)
> +			ids = found_id + 1;
> +		else
> +			break;
> +	}
>  
> -	/* driver_override will always match, send a dummy id */
> +	/*
> +	 * if no static match, driver_override will always match, send a dummy
> +	 * id.
> +	 */
>  	if (!found_id && dev->driver_override)
>  		found_id = &pci_device_id_any;

Possibly more readable:

  while ((found_id = pci_match_id(ids, dev))) {

    /*
     * PCI_ID_F_VFIO_DRIVER_OVERRIDE entries only match when
     * driver_override matches this driver.
     */
    if (found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) {
      if (dev->driver_override)
	return found_id;
      else
	ids = found_id + 1;
    } else {
      return found_id;
    }
  }

  /* Driver_override will always match; send a dummy ID */
  if (dev->driver_override)
    return &pci_device_id_any;

  return NULL;
Bjorn Helgaas Aug. 12, 2021, 3:57 p.m. UTC | #12
On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:

> > Do the other bus types have a flag analogous to
> > PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> > other bus types, it'd be nice if the approach were similar.
> 
> They could, this series doesn't attempt it. I expect the approach to
> be similar as driver_override was copied from PCI to other
> busses. When this is completed I hope to take a look at it.

I think this would make more sense as two patches:

  - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
    since nothing in PCI depends on the VFIO-ness of drivers that use
    the flag.  The only point here is that driver id_table entries
    with this flag only match when driver_override matches the driver.

  - Update file2alias.c to export the flags and the "vfio_pci:" alias.
    This seems to be the only place where VFIO comes into play, and
    putting it in a separate patch will make it much smaller and it
    will be clear how it could be extended for other buses.

> > I assume somewhere in here you need to unbind mlx5_core before binding
> > mlx5_vfio_pci?
> 
> Er, yes, I skipped some steps here where unbind/bind has to be done
>  
> > >    6) cat the matched module name to driver_override:
> > >     echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
> > 
> > Don't you need something here to trigger the driver attach, i.e.,
> > should step 5 and step 6 be swapped?  What if the driver is already
> > loaded? 
> 
> The full sequence is more like:
> 
>      echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
>      echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
>      echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

Thanks a lot for this!  I didn't know about drivers_probe (see
drivers_probe_store()), and it doesn't seem to be documented anywhere
except sysfs-bus-usb, where it's only incidental to USB.

Bjorn
Jason Gunthorpe Aug. 12, 2021, 7:51 p.m. UTC | #13
On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> 
> > > Do the other bus types have a flag analogous to
> > > PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> > > other bus types, it'd be nice if the approach were similar.
> > 
> > They could, this series doesn't attempt it. I expect the approach to
> > be similar as driver_override was copied from PCI to other
> > busses. When this is completed I hope to take a look at it.
> 
> I think this would make more sense as two patches:
> 
>   - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
>     since nothing in PCI depends on the VFIO-ness of drivers that use
>     the flag.  The only point here is that driver id_table entries
>     with this flag only match when driver_override matches the driver.

This would require using two flags, one to indicate the above to the
PCI code and another to indicate the vfio_pci string to
file2alias. This doesn't seem justified at this point, IMHO.

>   - Update file2alias.c to export the flags and the "vfio_pci:" alias.
>     This seems to be the only place where VFIO comes into play, and
>     putting it in a separate patch will make it much smaller and it
>     will be clear how it could be extended for other buses.

Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
to the string "vfio_pci", that is just really confusing.

Other busses need to copy pretty much the entire patch, there isn't
really any sharing here. I don't see splitting as good here..

What this logically wants is the match entry to have a

  const char *file2alias_prefix

Which would be set to "vfio_", but I'm not keen to bloat the match
entry further to do that..

> > The full sequence is more like:
> > 
> >      echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
> >      echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
> >      echo 0000:01:00.0 > /sys/bus/pci/drivers_probe
> 
> Thanks a lot for this!  I didn't know about drivers_probe (see
> drivers_probe_store()), and it doesn't seem to be documented anywhere
> except sysfs-bus-usb, where it's only incidental to USB.

Okay, lets make the changes in the commit message, it does help

Jason
Bjorn Helgaas Aug. 12, 2021, 8:26 p.m. UTC | #14
On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
> > On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> > 
> > > > Do the other bus types have a flag analogous to
> > > > PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> > > > other bus types, it'd be nice if the approach were similar.
> > > 
> > > They could, this series doesn't attempt it. I expect the approach to
> > > be similar as driver_override was copied from PCI to other
> > > busses. When this is completed I hope to take a look at it.
> > 
> > I think this would make more sense as two patches:
> > 
> >   - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
> >     since nothing in PCI depends on the VFIO-ness of drivers that use
> >     the flag.  The only point here is that driver id_table entries
> >     with this flag only match when driver_override matches the driver.
> 
> This would require using two flags, one to indicate the above to the
> PCI code and another to indicate the vfio_pci string to
> file2alias. This doesn't seem justified at this point, IMHO.

I don't think it requires two flags.  do_pci_entry() has:

  if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
    strcpy(alias, "vfio_pci:");

I'm just proposing a rename:

s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/

> >   - Update file2alias.c to export the flags and the "vfio_pci:" alias.
> >     This seems to be the only place where VFIO comes into play, and
> >     putting it in a separate patch will make it much smaller and it
> >     will be clear how it could be extended for other buses.
> 
> Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
> to the string "vfio_pci", that is just really confusing.

Hahaha, I see, that's fair :)  It confused me for a long time why you
wanted "VFIO" in the flag name because from the kernel's point of
view, the flag is not related to any VFIO-ness.  It's only related to
a special variety of driver_override, and VFIO happens to be one user
of it.

I think a separate patch that maps the flag to "vfio_pci" would be
less confusing because without the distractions of the PCI core
changes, it will be obvious that "vfio_" is a file2alias thing that's
there for userspace convenience, not for kernel reasons.

Do you envision any other prefixes in the future?  I hope we don't
have to clutter pci_match_device() with checking multiple flags.
Maybe the problem is that the modules.alias entry includes "vfio_" --
maybe we need a more generic prefix with just the idea of an
"alternate" driver.

Bjorn
Max Gurtovoy Aug. 12, 2021, 11:21 p.m. UTC | #15
On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
> On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
>> On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
>>> On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
>>>> On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
>>>>> On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
>>>>> Do the other bus types have a flag analogous to
>>>>> PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
>>>>> other bus types, it'd be nice if the approach were similar.
>>>> They could, this series doesn't attempt it. I expect the approach to
>>>> be similar as driver_override was copied from PCI to other
>>>> busses. When this is completed I hope to take a look at it.
>>> I think this would make more sense as two patches:
>>>
>>>    - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
>>>      since nothing in PCI depends on the VFIO-ness of drivers that use
>>>      the flag.  The only point here is that driver id_table entries
>>>      with this flag only match when driver_override matches the driver.
>> This would require using two flags, one to indicate the above to the
>> PCI code and another to indicate the vfio_pci string to
>> file2alias. This doesn't seem justified at this point, IMHO.
> I don't think it requires two flags.  do_pci_entry() has:
>
>    if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
>      strcpy(alias, "vfio_pci:");
>
> I'm just proposing a rename:
>
> s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
>
>>>    - Update file2alias.c to export the flags and the "vfio_pci:" alias.
>>>      This seems to be the only place where VFIO comes into play, and
>>>      putting it in a separate patch will make it much smaller and it
>>>      will be clear how it could be extended for other buses.
>> Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
>> to the string "vfio_pci", that is just really confusing.
> Hahaha, I see, that's fair :)  It confused me for a long time why you
> wanted "VFIO" in the flag name because from the kernel's point of
> view, the flag is not related to any VFIO-ness.  It's only related to
> a special variety of driver_override, and VFIO happens to be one user
> of it.

In my original patch I used

#define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE

and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.

So we can maybe do that and leave the option to future update of the 
define without changing the core code.

In the future we can have something like:

#define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE | 
PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)

The file2alias.c still have to use the exact 
PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to add "vfio_" prefix.

Is that better ?

>
> I think a separate patch that maps the flag to "vfio_pci" would be
> less confusing because without the distractions of the PCI core
> changes, it will be obvious that "vfio_" is a file2alias thing that's
> there for userspace convenience, not for kernel reasons.
>
> Do you envision any other prefixes in the future?  I hope we don't
> have to clutter pci_match_device() with checking multiple flags.
> Maybe the problem is that the modules.alias entry includes "vfio_" --
> maybe we need a more generic prefix with just the idea of an
> "alternate" driver.
>
> Bjorn
Bjorn Helgaas Aug. 13, 2021, 5:44 p.m. UTC | #16
On Fri, Aug 13, 2021 at 02:21:41AM +0300, Max Gurtovoy wrote:
> 
> On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
> > On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> > > > > > Do the other bus types have a flag analogous to
> > > > > > PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> > > > > > other bus types, it'd be nice if the approach were similar.
> > > > > They could, this series doesn't attempt it. I expect the approach to
> > > > > be similar as driver_override was copied from PCI to other
> > > > > busses. When this is completed I hope to take a look at it.
> > > > I think this would make more sense as two patches:
> > > > 
> > > >    - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
> > > >      since nothing in PCI depends on the VFIO-ness of drivers that use
> > > >      the flag.  The only point here is that driver id_table entries
> > > >      with this flag only match when driver_override matches the driver.
> > > This would require using two flags, one to indicate the above to the
> > > PCI code and another to indicate the vfio_pci string to
> > > file2alias. This doesn't seem justified at this point, IMHO.
> > I don't think it requires two flags.  do_pci_entry() has:
> > 
> >    if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> >      strcpy(alias, "vfio_pci:");
> > 
> > I'm just proposing a rename:
> > 
> > s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
> > 
> > > >    - Update file2alias.c to export the flags and the "vfio_pci:" alias.
> > > >      This seems to be the only place where VFIO comes into play, and
> > > >      putting it in a separate patch will make it much smaller and it
> > > >      will be clear how it could be extended for other buses.
> > > Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
> > > to the string "vfio_pci", that is just really confusing.
> > Hahaha, I see, that's fair :)  It confused me for a long time why you
> > wanted "VFIO" in the flag name because from the kernel's point of
> > view, the flag is not related to any VFIO-ness.  It's only related to
> > a special variety of driver_override, and VFIO happens to be one user
> > of it.
> 
> In my original patch I used
> 
> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
> 
> and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.
> 
> So we can maybe do that and leave the option to future update of the define
> without changing the core code.
> 
> In the future we can have something like:
> 
> #define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
> PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)
> 
> The file2alias.c still have to use the exact PCI_ID_F_VFIO_DRIVER_OVERRIDE
> flag to add "vfio_" prefix.
> 
> Is that better ?

I don't think it's worth having two separate #defines.  If we need
more in the future, we can add them when we need them.

What if we renamed "flags" to be specifically for this override case,
e.g., "override_only"?  Then the flag could be
PCI_ID_F_VFIO_DRIVER_OVERRIDE, which would trigger a "vfio_" prefix in
file2alias.c, but pci_match_device() could just check for it being
non-zero, without caring whether the reason is VFIO or something else,
e.g.,

  pci_match_device(...)
  {
    ...
    if (found_id->override_only) {
      if (dev->driver_override)
        return found_id;
      ...

Bjorn
Max Gurtovoy Aug. 14, 2021, 11:27 p.m. UTC | #17
On 8/13/2021 8:44 PM, Bjorn Helgaas wrote:
> On Fri, Aug 13, 2021 at 02:21:41AM +0300, Max Gurtovoy wrote:
>> On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
>>> On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
>>>> On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
>>>>> On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
>>>>>> On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
>>>>>>> On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
>>>>>>> Do the other bus types have a flag analogous to
>>>>>>> PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
>>>>>>> other bus types, it'd be nice if the approach were similar.
>>>>>> They could, this series doesn't attempt it. I expect the approach to
>>>>>> be similar as driver_override was copied from PCI to other
>>>>>> busses. When this is completed I hope to take a look at it.
>>>>> I think this would make more sense as two patches:
>>>>>
>>>>>     - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
>>>>>       since nothing in PCI depends on the VFIO-ness of drivers that use
>>>>>       the flag.  The only point here is that driver id_table entries
>>>>>       with this flag only match when driver_override matches the driver.
>>>> This would require using two flags, one to indicate the above to the
>>>> PCI code and another to indicate the vfio_pci string to
>>>> file2alias. This doesn't seem justified at this point, IMHO.
>>> I don't think it requires two flags.  do_pci_entry() has:
>>>
>>>     if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
>>>       strcpy(alias, "vfio_pci:");
>>>
>>> I'm just proposing a rename:
>>>
>>> s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
>>>
>>>>>     - Update file2alias.c to export the flags and the "vfio_pci:" alias.
>>>>>       This seems to be the only place where VFIO comes into play, and
>>>>>       putting it in a separate patch will make it much smaller and it
>>>>>       will be clear how it could be extended for other buses.
>>>> Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
>>>> to the string "vfio_pci", that is just really confusing.
>>> Hahaha, I see, that's fair :)  It confused me for a long time why you
>>> wanted "VFIO" in the flag name because from the kernel's point of
>>> view, the flag is not related to any VFIO-ness.  It's only related to
>>> a special variety of driver_override, and VFIO happens to be one user
>>> of it.
>> In my original patch I used
>>
>> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
>>
>> and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.
>>
>> So we can maybe do that and leave the option to future update of the define
>> without changing the core code.
>>
>> In the future we can have something like:
>>
>> #define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
>> PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)
>>
>> The file2alias.c still have to use the exact PCI_ID_F_VFIO_DRIVER_OVERRIDE
>> flag to add "vfio_" prefix.
>>
>> Is that better ?
> I don't think it's worth having two separate #defines.  If we need
> more in the future, we can add them when we need them.

I meant 1 #define and 1 enum:

enum {
     PCI_ID_F_VFIO_DRIVER_OVERRIDE    = 1 << 0,
};

#define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE

>
> What if we renamed "flags" to be specifically for this override case,
> e.g., "override_only"?  Then the flag could be
> PCI_ID_F_VFIO_DRIVER_OVERRIDE, which would trigger a "vfio_" prefix in
> file2alias.c, but pci_match_device() could just check for it being
> non-zero, without caring whether the reason is VFIO or something else,
> e.g.,
>
>    pci_match_device(...)
>    {
>      ...
>      if (found_id->override_only) {
>        if (dev->driver_override)
>          return found_id;
>        ...

Jason suggested something like this:


static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
                             struct pci_dev *dev)
{
     struct pci_dynid *dynid;
     const struct pci_device_id *found_id = NULL, *ids;

     /* When driver_override is set, only bind to the matching driver */
     if (dev->driver_override && strcmp(dev->driver_override, drv->name))
         return NULL;

     /* Look at the dynamic ids first, before the static ones */
     spin_lock(&drv->dynids.lock);
     list_for_each_entry(dynid, &drv->dynids.list, node) {
         if (pci_match_one_device(&dynid->id, dev)) {
             found_id = &dynid->id;
             break;
         }
     }
     spin_unlock(&drv->dynids.lock);

     if (found_id)
         return found_id;

     for (ids = drv->id_table; (found_id = pci_match_id(ids, dev));
          ids = found_id + 1) {
         /*
          * The match table is split based on driver_override. Check the
          * flags as well so that any matching
          * PCI_ID_F_VFIO_DRIVER_OVERRIDE entry is returned.
          */
         if (!(found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) ||
             dev->driver_override)
             return found_id;
     }

     /*
      * if no static match, driver_override will always match, send a dummy
      * id.
      */
     if (dev->driver_override)
         return &pci_device_id_any;
     return NULL;
}


It looks good to me as well.

I prefer the "flags" naming since its more generic and easy to extend.

can we continue with the above suggestion for V2 ?

It's really a matter of taste..

> Bjorn
Bjorn Helgaas Aug. 16, 2021, 5:21 p.m. UTC | #18
On Sun, Aug 15, 2021 at 02:27:13AM +0300, Max Gurtovoy wrote:
> On 8/13/2021 8:44 PM, Bjorn Helgaas wrote:
> > On Fri, Aug 13, 2021 at 02:21:41AM +0300, Max Gurtovoy wrote:
> > > On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
> > > > On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
> > > > > > > On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> > > > > > > > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> > > > > > > > Do the other bus types have a flag analogous to
> > > > > > > > PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> > > > > > > > other bus types, it'd be nice if the approach were similar.
> > > > > > > They could, this series doesn't attempt it. I expect the approach to
> > > > > > > be similar as driver_override was copied from PCI to other
> > > > > > > busses. When this is completed I hope to take a look at it.
> > > > > > I think this would make more sense as two patches:
> > > > > > 
> > > > > >     - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
> > > > > >       since nothing in PCI depends on the VFIO-ness of drivers that use
> > > > > >       the flag.  The only point here is that driver id_table entries
> > > > > >       with this flag only match when driver_override matches the driver.
> > > > > This would require using two flags, one to indicate the above to the
> > > > > PCI code and another to indicate the vfio_pci string to
> > > > > file2alias. This doesn't seem justified at this point, IMHO.
> > > > I don't think it requires two flags.  do_pci_entry() has:
> > > > 
> > > >     if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> > > >       strcpy(alias, "vfio_pci:");
> > > > 
> > > > I'm just proposing a rename:
> > > > 
> > > > s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
> > > > 
> > > > > >     - Update file2alias.c to export the flags and the "vfio_pci:" alias.
> > > > > >       This seems to be the only place where VFIO comes into play, and
> > > > > >       putting it in a separate patch will make it much smaller and it
> > > > > >       will be clear how it could be extended for other buses.
> > > > > Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
> > > > > to the string "vfio_pci", that is just really confusing.
> > > > Hahaha, I see, that's fair :)  It confused me for a long time why you
> > > > wanted "VFIO" in the flag name because from the kernel's point of
> > > > view, the flag is not related to any VFIO-ness.  It's only related to
> > > > a special variety of driver_override, and VFIO happens to be one user
> > > > of it.
> > > In my original patch I used
> > > 
> > > #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
> > > 
> > > and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.
> > > 
> > > So we can maybe do that and leave the option to future update of the define
> > > without changing the core code.
> > > 
> > > In the future we can have something like:
> > > 
> > > #define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
> > > PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)
> > > 
> > > The file2alias.c still have to use the exact PCI_ID_F_VFIO_DRIVER_OVERRIDE
> > > flag to add "vfio_" prefix.
> > > 
> > > Is that better ?
> > I don't think it's worth having two separate #defines.  If we need
> > more in the future, we can add them when we need them.
> 
> I meant 1 #define and 1 enum:
> 
> enum {
>     PCI_ID_F_VFIO_DRIVER_OVERRIDE    = 1 << 0,
> };
> 
> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE

Basically the same thing.  Doesn't seem worthwhile to me to have both.
When reading the code, it's not at all obvious why you would define a
new name for PCI_ID_F_VFIO_DRIVER_OVERRIDE.

> > What if we renamed "flags" to be specifically for this override case,
> > e.g., "override_only"?  Then the flag could be
> > PCI_ID_F_VFIO_DRIVER_OVERRIDE, which would trigger a "vfio_" prefix in
> > file2alias.c, but pci_match_device() could just check for it being
> > non-zero, without caring whether the reason is VFIO or something else,
> > e.g.,
> > 
> >    pci_match_device(...)
> >    {
> >      ...
> >      if (found_id->override_only) {
> >        if (dev->driver_override)
> >          return found_id;
> >        ...
> 
> Jason suggested something like this:
> 
> static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>                             struct pci_dev *dev)
> {
>     struct pci_dynid *dynid;
>     const struct pci_device_id *found_id = NULL, *ids;
> 
>     /* When driver_override is set, only bind to the matching driver */
>     if (dev->driver_override && strcmp(dev->driver_override, drv->name))
>         return NULL;
> 
>     /* Look at the dynamic ids first, before the static ones */
>     spin_lock(&drv->dynids.lock);
>     list_for_each_entry(dynid, &drv->dynids.list, node) {
>         if (pci_match_one_device(&dynid->id, dev)) {
>             found_id = &dynid->id;
>             break;
>         }
>     }
>     spin_unlock(&drv->dynids.lock);
> 
>     if (found_id)
>         return found_id;
> 
>     for (ids = drv->id_table; (found_id = pci_match_id(ids, dev));
>          ids = found_id + 1) {
>         /*
>          * The match table is split based on driver_override. Check the
>          * flags as well so that any matching
>          * PCI_ID_F_VFIO_DRIVER_OVERRIDE entry is returned.
>          */
>         if (!(found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) ||
>             dev->driver_override)
>             return found_id;
>     }
> 
>     /*
>      * if no static match, driver_override will always match, send a dummy
>      * id.
>      */
>     if (dev->driver_override)
>         return &pci_device_id_any;
>     return NULL;
> }
> 
> 
> It looks good to me as well.

I missed your point.  Isn't the above basically the 09/12 patch [1] we're
talking about?

Yes, I see the code structure is slightly different, but the question
we're talking about here is the name of the "flags" field and the enum
or #define for the VFIO bit.

> I prefer the "flags" naming since its more generic and easy to extend.

We don't need to worry about "flags" being generic or extensible until
we need to extend it.  It's easy to fiddle with it at that point.

> can we continue with the above suggestion for V2 ?

I don't see what really changed with the above suggestion.

The point I'm trying to make is that using PCI_ID_F_VFIO_DRIVER_OVERRIDE 
in pci_match_device() suggests that the code there has some connection
or dependency on VFIO, but it does not.

[1] https://lore.kernel.org/r/20210721161609.68223-10-yishaih@nvidia.com
Max Gurtovoy Aug. 17, 2021, 1:01 p.m. UTC | #19
On 8/16/2021 8:21 PM, Bjorn Helgaas wrote:
> On Sun, Aug 15, 2021 at 02:27:13AM +0300, Max Gurtovoy wrote:
>> On 8/13/2021 8:44 PM, Bjorn Helgaas wrote:
>>> On Fri, Aug 13, 2021 at 02:21:41AM +0300, Max Gurtovoy wrote:
>>>> On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
>>>>> On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
>>>>>> On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
>>>>>>> On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
>>>>>>>> On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
>>>>>>>>> On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
>>>>>>>>> Do the other bus types have a flag analogous to
>>>>>>>>> PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
>>>>>>>>> other bus types, it'd be nice if the approach were similar.
>>>>>>>> They could, this series doesn't attempt it. I expect the approach to
>>>>>>>> be similar as driver_override was copied from PCI to other
>>>>>>>> busses. When this is completed I hope to take a look at it.
>>>>>>> I think this would make more sense as two patches:
>>>>>>>
>>>>>>>      - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
>>>>>>>        since nothing in PCI depends on the VFIO-ness of drivers that use
>>>>>>>        the flag.  The only point here is that driver id_table entries
>>>>>>>        with this flag only match when driver_override matches the driver.
>>>>>> This would require using two flags, one to indicate the above to the
>>>>>> PCI code and another to indicate the vfio_pci string to
>>>>>> file2alias. This doesn't seem justified at this point, IMHO.
>>>>> I don't think it requires two flags.  do_pci_entry() has:
>>>>>
>>>>>      if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
>>>>>        strcpy(alias, "vfio_pci:");
>>>>>
>>>>> I'm just proposing a rename:
>>>>>
>>>>> s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
>>>>>
>>>>>>>      - Update file2alias.c to export the flags and the "vfio_pci:" alias.
>>>>>>>        This seems to be the only place where VFIO comes into play, and
>>>>>>>        putting it in a separate patch will make it much smaller and it
>>>>>>>        will be clear how it could be extended for other buses.
>>>>>> Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
>>>>>> to the string "vfio_pci", that is just really confusing.
>>>>> Hahaha, I see, that's fair :)  It confused me for a long time why you
>>>>> wanted "VFIO" in the flag name because from the kernel's point of
>>>>> view, the flag is not related to any VFIO-ness.  It's only related to
>>>>> a special variety of driver_override, and VFIO happens to be one user
>>>>> of it.
>>>> In my original patch I used
>>>>
>>>> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
>>>>
>>>> and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.
>>>>
>>>> So we can maybe do that and leave the option to future update of the define
>>>> without changing the core code.
>>>>
>>>> In the future we can have something like:
>>>>
>>>> #define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
>>>> PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)
>>>>
>>>> The file2alias.c still have to use the exact PCI_ID_F_VFIO_DRIVER_OVERRIDE
>>>> flag to add "vfio_" prefix.
>>>>
>>>> Is that better ?
>>> I don't think it's worth having two separate #defines.  If we need
>>> more in the future, we can add them when we need them.
>> I meant 1 #define and 1 enum:
>>
>> enum {
>>      PCI_ID_F_VFIO_DRIVER_OVERRIDE    = 1 << 0,
>> };
>>
>> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
> Basically the same thing.  Doesn't seem worthwhile to me to have both.
> When reading the code, it's not at all obvious why you would define a
> new name for PCI_ID_F_VFIO_DRIVER_OVERRIDE.

because we need the "vfio_" prefix in the alias.

And the match can use PCI_ID_DRIVER_OVERRIDE that in the future cab be 
(#define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE | 
PCI_ID_F_SOME_OTHER_ALIAS_DRIVER_OVERRIDE)

>>> What if we renamed "flags" to be specifically for this override case,
>>> e.g., "override_only"?  Then the flag could be
>>> PCI_ID_F_VFIO_DRIVER_OVERRIDE, which would trigger a "vfio_" prefix in
>>> file2alias.c, but pci_match_device() could just check for it being
>>> non-zero, without caring whether the reason is VFIO or something else,
>>> e.g.,
>>>
>>>     pci_match_device(...)
>>>     {
>>>       ...
>>>       if (found_id->override_only) {
>>>         if (dev->driver_override)
>>>           return found_id;
>>>         ...
>> Jason suggested something like this:
>>
>> static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>>                              struct pci_dev *dev)
>> {
>>      struct pci_dynid *dynid;
>>      const struct pci_device_id *found_id = NULL, *ids;
>>
>>      /* When driver_override is set, only bind to the matching driver */
>>      if (dev->driver_override && strcmp(dev->driver_override, drv->name))
>>          return NULL;
>>
>>      /* Look at the dynamic ids first, before the static ones */
>>      spin_lock(&drv->dynids.lock);
>>      list_for_each_entry(dynid, &drv->dynids.list, node) {
>>          if (pci_match_one_device(&dynid->id, dev)) {
>>              found_id = &dynid->id;
>>              break;
>>          }
>>      }
>>      spin_unlock(&drv->dynids.lock);
>>
>>      if (found_id)
>>          return found_id;
>>
>>      for (ids = drv->id_table; (found_id = pci_match_id(ids, dev));
>>           ids = found_id + 1) {
>>          /*
>>           * The match table is split based on driver_override. Check the
>>           * flags as well so that any matching
>>           * PCI_ID_F_VFIO_DRIVER_OVERRIDE entry is returned.
>>           */
>>          if (!(found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) ||
>>              dev->driver_override)
>>              return found_id;
>>      }
>>
>>      /*
>>       * if no static match, driver_override will always match, send a dummy
>>       * id.
>>       */
>>      if (dev->driver_override)
>>          return &pci_device_id_any;
>>      return NULL;
>> }
>>
>>
>> It looks good to me as well.
> I missed your point.  Isn't the above basically the 09/12 patch [1] we're
> talking about?
>
> Yes, I see the code structure is slightly different, but the question
> we're talking about here is the name of the "flags" field and the enum
> or #define for the VFIO bit.

I guess the renaming of "__u32 flags" to "__u32 driver_override" is ok 
from my perspective.

The enum for vfio should stay.

The prefix we want in the alias is "vfio_" and not "driver_override_".

This will allow a clean uAPI. "driver_override_" prefix will be too 
generic for userspace tools like libvirt that would like to find a 
*VFIO* driver not something else.

Thus we need alias to be "vfio_".

In the future if some other driver will use this flag, it will create an 
alias also. In your suggestion, the alias will be the same and the 
userspace tool won't be able to distinguish between the two.

But in the original solution, for non vfio driver override drivers, one 
can use new enum PCI_ID_F_SOME_OTHER_ALIAS_DRIVER_OVERRIDE and add its 
own alias prefix for recognition "my_prefix_".

>
>> I prefer the "flags" naming since its more generic and easy to extend.
> We don't need to worry about "flags" being generic or extensible until
> we need to extend it.  It's easy to fiddle with it at that point.
>
>> can we continue with the above suggestion for V2 ?
> I don't see what really changed with the above suggestion.
>
> The point I'm trying to make is that using PCI_ID_F_VFIO_DRIVER_OVERRIDE
> in pci_match_device() suggests that the code there has some connection
> or dependency on VFIO, but it does not.

This is why I suggested a "#define PCI_ID_DRIVER_OVERRIDE 
PCI_ID_F_VFIO_DRIVER_OVERRIDE"


>
> [1] https://lore.kernel.org/r/20210721161609.68223-10-yishaih@nvidia.com
Bjorn Helgaas Aug. 17, 2021, 2:13 p.m. UTC | #20
On Tue, Aug 17, 2021 at 04:01:49PM +0300, Max Gurtovoy wrote:
> On 8/16/2021 8:21 PM, Bjorn Helgaas wrote:
> > On Sun, Aug 15, 2021 at 02:27:13AM +0300, Max Gurtovoy wrote:
> > > On 8/13/2021 8:44 PM, Bjorn Helgaas wrote:
> > > > On Fri, Aug 13, 2021 at 02:21:41AM +0300, Max Gurtovoy wrote:
> > > > > On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
> > > > > > On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
> > > > > > > On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
> > > > > > > > On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
> > > > > > > > > On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
> > > > > > > > > > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
> > > > > > > > > > Do the other bus types have a flag analogous to
> > > > > > > > > > PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
> > > > > > > > > > other bus types, it'd be nice if the approach were similar.
> > > > > > > > > They could, this series doesn't attempt it. I expect the approach to
> > > > > > > > > be similar as driver_override was copied from PCI to other
> > > > > > > > > busses. When this is completed I hope to take a look at it.
> > > > > > > > I think this would make more sense as two patches:
> > > > > > > > 
> > > > > > > >      - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
> > > > > > > >        since nothing in PCI depends on the VFIO-ness of drivers that use
> > > > > > > >        the flag.  The only point here is that driver id_table entries
> > > > > > > >        with this flag only match when driver_override matches the driver.
> > > > > > > This would require using two flags, one to indicate the above to the
> > > > > > > PCI code and another to indicate the vfio_pci string to
> > > > > > > file2alias. This doesn't seem justified at this point, IMHO.
> > > > > > I don't think it requires two flags.  do_pci_entry() has:
> > > > > > 
> > > > > >      if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> > > > > >        strcpy(alias, "vfio_pci:");
> > > > > > 
> > > > > > I'm just proposing a rename:
> > > > > > 
> > > > > > s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
> > > > > > 
> > > > > > > >      - Update file2alias.c to export the flags and the "vfio_pci:" alias.
> > > > > > > >        This seems to be the only place where VFIO comes into play, and
> > > > > > > >        putting it in a separate patch will make it much smaller and it
> > > > > > > >        will be clear how it could be extended for other buses.
> > > > > > > Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
> > > > > > > to the string "vfio_pci", that is just really confusing.
> > > > > > Hahaha, I see, that's fair :)  It confused me for a long time why you
> > > > > > wanted "VFIO" in the flag name because from the kernel's point of
> > > > > > view, the flag is not related to any VFIO-ness.  It's only related to
> > > > > > a special variety of driver_override, and VFIO happens to be one user
> > > > > > of it.
> > > > > In my original patch I used
> > > > > 
> > > > > #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
> > > > > 
> > > > > and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.
> > > > > 
> > > > > So we can maybe do that and leave the option to future update of the define
> > > > > without changing the core code.
> > > > > 
> > > > > In the future we can have something like:
> > > > > 
> > > > > #define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
> > > > > PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)
> > > > > 
> > > > > The file2alias.c still have to use the exact PCI_ID_F_VFIO_DRIVER_OVERRIDE
> > > > > flag to add "vfio_" prefix.
> > > > > 
> > > > > Is that better ?
> > > > I don't think it's worth having two separate #defines.  If we need
> > > > more in the future, we can add them when we need them.
> > > I meant 1 #define and 1 enum:
> > > 
> > > enum {
> > >      PCI_ID_F_VFIO_DRIVER_OVERRIDE    = 1 << 0,
> > > };
> > > 
> > > #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
> > Basically the same thing.  Doesn't seem worthwhile to me to have both.
> > When reading the code, it's not at all obvious why you would define a
> > new name for PCI_ID_F_VFIO_DRIVER_OVERRIDE.
> 
> because we need the "vfio_" prefix in the alias.
> 
> And the match can use PCI_ID_DRIVER_OVERRIDE that in the future cab be
> (#define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
> PCI_ID_F_SOME_OTHER_ALIAS_DRIVER_OVERRIDE)

Read this again:
https://lore.kernel.org/r/20210813174459.GA2594783@bjorn-Precision-5520

That gives you a "vfio_" prefix without the unnecessary VFIO
connection in pci_match_device.
Max Gurtovoy Aug. 17, 2021, 2:44 p.m. UTC | #21
On 8/17/2021 5:13 PM, Bjorn Helgaas wrote:
> On Tue, Aug 17, 2021 at 04:01:49PM +0300, Max Gurtovoy wrote:
>> On 8/16/2021 8:21 PM, Bjorn Helgaas wrote:
>>> On Sun, Aug 15, 2021 at 02:27:13AM +0300, Max Gurtovoy wrote:
>>>> On 8/13/2021 8:44 PM, Bjorn Helgaas wrote:
>>>>> On Fri, Aug 13, 2021 at 02:21:41AM +0300, Max Gurtovoy wrote:
>>>>>> On 8/12/2021 11:26 PM, Bjorn Helgaas wrote:
>>>>>>> On Thu, Aug 12, 2021 at 04:51:26PM -0300, Jason Gunthorpe wrote:
>>>>>>>> On Thu, Aug 12, 2021 at 10:57:07AM -0500, Bjorn Helgaas wrote:
>>>>>>>>> On Thu, Aug 12, 2021 at 10:27:28AM -0300, Jason Gunthorpe wrote:
>>>>>>>>>> On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote:
>>>>>>>>>>> On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote:
>>>>>>>>>>> Do the other bus types have a flag analogous to
>>>>>>>>>>> PCI_ID_F_VFIO_DRIVER_OVERRIDE?  If we're doing something similar to
>>>>>>>>>>> other bus types, it'd be nice if the approach were similar.
>>>>>>>>>> They could, this series doesn't attempt it. I expect the approach to
>>>>>>>>>> be similar as driver_override was copied from PCI to other
>>>>>>>>>> busses. When this is completed I hope to take a look at it.
>>>>>>>>> I think this would make more sense as two patches:
>>>>>>>>>
>>>>>>>>>       - Add a "PCI_ID_DRIVER_OVERRIDE" flag.  This is not VFIO-specific,
>>>>>>>>>         since nothing in PCI depends on the VFIO-ness of drivers that use
>>>>>>>>>         the flag.  The only point here is that driver id_table entries
>>>>>>>>>         with this flag only match when driver_override matches the driver.
>>>>>>>> This would require using two flags, one to indicate the above to the
>>>>>>>> PCI code and another to indicate the vfio_pci string to
>>>>>>>> file2alias. This doesn't seem justified at this point, IMHO.
>>>>>>> I don't think it requires two flags.  do_pci_entry() has:
>>>>>>>
>>>>>>>       if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
>>>>>>>         strcpy(alias, "vfio_pci:");
>>>>>>>
>>>>>>> I'm just proposing a rename:
>>>>>>>
>>>>>>> s/PCI_ID_F_VFIO_DRIVER_OVERRIDE/PCI_ID_DRIVER_OVERRIDE/
>>>>>>>
>>>>>>>>>       - Update file2alias.c to export the flags and the "vfio_pci:" alias.
>>>>>>>>>         This seems to be the only place where VFIO comes into play, and
>>>>>>>>>         putting it in a separate patch will make it much smaller and it
>>>>>>>>>         will be clear how it could be extended for other buses.
>>>>>>>> Well, I don't want to see a flag called PCI_ID_DRIVER_OVERRIDE mapped
>>>>>>>> to the string "vfio_pci", that is just really confusing.
>>>>>>> Hahaha, I see, that's fair :)  It confused me for a long time why you
>>>>>>> wanted "VFIO" in the flag name because from the kernel's point of
>>>>>>> view, the flag is not related to any VFIO-ness.  It's only related to
>>>>>>> a special variety of driver_override, and VFIO happens to be one user
>>>>>>> of it.
>>>>>> In my original patch I used
>>>>>>
>>>>>> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
>>>>>>
>>>>>> and in the pci core code I used PCI_ID_DRIVER_OVERRIDE in the "if" clause.
>>>>>>
>>>>>> So we can maybe do that and leave the option to future update of the define
>>>>>> without changing the core code.
>>>>>>
>>>>>> In the future we can have something like:
>>>>>>
>>>>>> #define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
>>>>>> PCI_ID_F_MY_BUS_DRIVER_OVERRIDE)
>>>>>>
>>>>>> The file2alias.c still have to use the exact PCI_ID_F_VFIO_DRIVER_OVERRIDE
>>>>>> flag to add "vfio_" prefix.
>>>>>>
>>>>>> Is that better ?
>>>>> I don't think it's worth having two separate #defines.  If we need
>>>>> more in the future, we can add them when we need them.
>>>> I meant 1 #define and 1 enum:
>>>>
>>>> enum {
>>>>       PCI_ID_F_VFIO_DRIVER_OVERRIDE    = 1 << 0,
>>>> };
>>>>
>>>> #define PCI_ID_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
>>> Basically the same thing.  Doesn't seem worthwhile to me to have both.
>>> When reading the code, it's not at all obvious why you would define a
>>> new name for PCI_ID_F_VFIO_DRIVER_OVERRIDE.
>> because we need the "vfio_" prefix in the alias.
>>
>> And the match can use PCI_ID_DRIVER_OVERRIDE that in the future cab be
>> (#define PCI_ID_DRIVER_OVERRIDE (PCI_ID_F_VFIO_DRIVER_OVERRIDE |
>> PCI_ID_F_SOME_OTHER_ALIAS_DRIVER_OVERRIDE)
> Read this again:
> https://lore.kernel.org/r/20210813174459.GA2594783@bjorn-Precision-5520
>
> That gives you a "vfio_" prefix without the unnecessary VFIO
> connection in pci_match_device.

I see.

So I guess the following code should be fine:


static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
                             struct pci_dev *dev)
{
     struct pci_dynid *dynid;
     const struct pci_device_id *found_id = NULL, *ids;

     /* When driver_override is set, only bind to the matching driver */
     if (dev->driver_override && strcmp(dev->driver_override, drv->name))
         return NULL;

     /* Look at the dynamic ids first, before the static ones */
     spin_lock(&drv->dynids.lock);
     list_for_each_entry(dynid, &drv->dynids.list, node) {
         if (pci_match_one_device(&dynid->id, dev)) {
             found_id = &dynid->id;
             break;
         }
     }
     spin_unlock(&drv->dynids.lock);

     if (found_id)
         return found_id;

     for (ids = drv->id_table; (found_id = pci_match_id(ids, dev));
          ids = found_id + 1) {
         /*
          * The match table is split based on driver_override.
          */
         if (!found_id->override_only || dev->driver_override)
             return found_id;
     }

     /*
      * if no static match, driver_override will always match, send a dummy
      * id.
      */
     if (dev->driver_override)
         return &pci_device_id_any;
     return NULL;
}
diff mbox series

Patch

diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
index fa651e25d98c..24e70a386887 100644
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@@ -103,6 +103,7 @@  need pass only as many optional fields as necessary:
   - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
   - class and classmask fields default to 0
   - driver_data defaults to 0UL.
+  - flags field defaults to 0.
 
 Note that driver_data must match the value used by any of the pci_device_id
 entries defined in the driver. This makes the driver_data field mandatory
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3a72352aa5cf..1ed8a4ab96f1 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -136,7 +136,7 @@  static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
 						    struct pci_dev *dev)
 {
 	struct pci_dynid *dynid;
-	const struct pci_device_id *found_id = NULL;
+	const struct pci_device_id *found_id = NULL, *ids;
 
 	/* When driver_override is set, only bind to the matching driver */
 	if (dev->driver_override && strcmp(dev->driver_override, drv->name))
@@ -152,10 +152,27 @@  static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
 	}
 	spin_unlock(&drv->dynids.lock);
 
-	if (!found_id)
-		found_id = pci_match_id(drv->id_table, dev);
+	if (found_id)
+		return found_id;
+
+	ids = drv->id_table;
+	while ((found_id = pci_match_id(ids, dev))) {
+		/*
+		 * The match table is split based on driver_override. Check the
+		 * flags as well so that any matching PCI_ID_F_DRIVER_OVERRIDE
+		 * entry is returned.
+		 */
+		if ((found_id->flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE) &&
+		    !dev->driver_override)
+			ids = found_id + 1;
+		else
+			break;
+	}
 
-	/* driver_override will always match, send a dummy id */
+	/*
+	 * if no static match, driver_override will always match, send a dummy
+	 * id.
+	 */
 	if (!found_id && dev->driver_override)
 		found_id = &pci_device_id_any;
 
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0272b95d9c5f..7a43edbe8618 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -181,9 +181,16 @@  static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
 	return vfio_pci_core_sriov_configure(pdev, nr_virtfn);
 }
 
+static const struct pci_device_id vfio_pci_table[] = {
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, vfio_pci_table);
+
 static struct pci_driver vfio_pci_driver = {
 	.name			= "vfio-pci",
-	.id_table		= NULL, /* only dynamic ids */
+	.id_table		= vfio_pci_table,
 	.probe			= vfio_pci_probe,
 	.remove			= vfio_pci_remove,
 	.sriov_configure	= vfio_pci_sriov_configure,
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 8e291cfdaf06..cd256d9c60d2 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -16,6 +16,11 @@  typedef unsigned long kernel_ulong_t;
 
 #define PCI_ANY_ID (~0)
 
+
+enum pci_id_flags {
+	PCI_ID_F_VFIO_DRIVER_OVERRIDE	= 1 << 0,
+};
+
 /**
  * struct pci_device_id - PCI device ID structure
  * @vendor:		Vendor ID to match (or PCI_ANY_ID)
@@ -34,12 +39,14 @@  typedef unsigned long kernel_ulong_t;
  *			Best practice is to use driver_data as an index
  *			into a static list of equivalent device types,
  *			instead of using it as a pointer.
+ * @flags:		PCI flags of the driver. Bitmap of pci_id_flags enum.
  */
 struct pci_device_id {
 	__u32 vendor, device;		/* Vendor and device ID or PCI_ANY_ID*/
 	__u32 subvendor, subdevice;	/* Subsystem ID's or PCI_ANY_ID */
 	__u32 class, class_mask;	/* (class,subclass,prog-if) triplet */
 	kernel_ulong_t driver_data;	/* Data private to the driver */
+	__u32 flags;
 };
 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..fd84609ff06b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -901,6 +901,33 @@  struct pci_driver {
 	.vendor = (vend), .device = (dev), \
 	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID
 
+/**
+ * PCI_DEVICE_FLAGS - macro used to describe a PCI device with specific flags.
+ * @vend: the 16 bit PCI Vendor ID
+ * @dev: the 16 bit PCI Device ID
+ * @fl: PCI Device flags as a bitmap of pci_id_flags enum
+ *
+ * This macro is used to create a struct pci_device_id that matches a
+ * specific device. The subvendor and subdevice fields will be set to
+ * PCI_ANY_ID.
+ */
+#define PCI_DEVICE_FLAGS(vend, dev, fl) \
+	.vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \
+	.subdevice = PCI_ANY_ID, .flags = (fl)
+
+/**
+ * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO
+ *                                   "driver_override" PCI device.
+ * @vend: the 16 bit PCI Vendor ID
+ * @dev: the 16 bit PCI Device ID
+ *
+ * This macro is used to create a struct pci_device_id that matches a
+ * specific device. The subvendor and subdevice fields will be set to
+ * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE.
+ */
+#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \
+	PCI_DEVICE_FLAGS(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE)
+
 /**
  * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem
  * @vend: the 16 bit PCI Vendor ID
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index 9bb6c7edccc4..b927c36b8333 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -42,6 +42,7 @@  int main(void)
 	DEVID_FIELD(pci_device_id, subdevice);
 	DEVID_FIELD(pci_device_id, class);
 	DEVID_FIELD(pci_device_id, class_mask);
+	DEVID_FIELD(pci_device_id, flags);
 
 	DEVID(ccw_device_id);
 	DEVID_FIELD(ccw_device_id, match_flags);
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 7c97fa8e36bc..f53b38e8f696 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -426,7 +426,7 @@  static int do_ieee1394_entry(const char *filename,
 	return 1;
 }
 
-/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
+/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
 static int do_pci_entry(const char *filename,
 			void *symval, char *alias)
 {
@@ -440,8 +440,12 @@  static int do_pci_entry(const char *filename,
 	DEF_FIELD(symval, pci_device_id, subdevice);
 	DEF_FIELD(symval, pci_device_id, class);
 	DEF_FIELD(symval, pci_device_id, class_mask);
+	DEF_FIELD(symval, pci_device_id, flags);
 
-	strcpy(alias, "pci:");
+	if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
+		strcpy(alias, "vfio_pci:");
+	else
+		strcpy(alias, "pci:");
 	ADD(alias, "v", vendor != PCI_ANY_ID, vendor);
 	ADD(alias, "d", device != PCI_ANY_ID, device);
 	ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor);