diff mbox

uio/uio_pci_generic: Add SR-IOV support

Message ID 1506517162.30379.2.camel@infradead.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

David Woodhouse Sept. 27, 2017, 12:59 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

Allow userspace to configure SR-IOV VFs through sysfs.

Currently, we need an in-kernel driver to permit this. But sometimes
*all* we want to do is enable the VFs so that we can assign them to
guests; we don't actually need to deal with the PF in any other way
from the host kernel. So let's make it possible to use UIO for that.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
It's not entirely clear to me why we require the driver to "enable"
SR-IOV like this anyway — were there some which needed to do something
special and device-specific instead of just falling through to
pci_{en,dis}able_sriov(), such that we need to effectively whitelist
this in the driver rather than blacklisting the "problematic" ones via
PCI quirks?

 drivers/uio/uio_pci_generic.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Bjorn Helgaas Sept. 27, 2017, 10 p.m. UTC | #1
[+cc Don, Alex D, Alex W, Bryant, Bodong, Michael, kvm list]

On Wed, Sep 27, 2017 at 01:59:22PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Allow userspace to configure SR-IOV VFs through sysfs.
> 
> Currently, we need an in-kernel driver to permit this. But sometimes
> *all* we want to do is enable the VFs so that we can assign them to
> guests; we don't actually need to deal with the PF in any other way
> from the host kernel. So let's make it possible to use UIO for that.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> It's not entirely clear to me why we require the driver to "enable"
> SR-IOV like this anyway — were there some which needed to do something
> special and device-specific instead of just falling through to
> pci_{en,dis}able_sriov(), such that we need to effectively whitelist
> this in the driver rather than blacklisting the "problematic" ones via
> PCI quirks?

IIUC, this question is basically "why doesn't the PCI core enable IOV
automatically when it sees an IOV-capable device?"

I think one reason is that an admin might want to control the number
of VFs we enable (e.g., via 1789382a72a5 ("PCI: SRIOV control and
status via sysfs" [1]).  But I guess you already know about that,
since this patch uses that sysfs path, so maybe I don't understand
your question.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1789382a72a5

>  drivers/uio/uio_pci_generic.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
> index a56fdf9..bd196f0 100644
> --- a/drivers/uio/uio_pci_generic.c
> +++ b/drivers/uio/uio_pci_generic.c
> @@ -108,15 +108,27 @@ static void remove(struct pci_dev *pdev)
>  	struct uio_pci_generic_dev *gdev = pci_get_drvdata(pdev);
>  
>  	uio_unregister_device(&gdev->info);
> +	pci_disable_sriov(pdev);
>  	pci_disable_device(pdev);
>  	kfree(gdev);
>  }
>  
> +static int sriov_configure(struct pci_dev *pdev, int num_vfs)
> +{
> +	if (!num_vfs) {
> +		pci_disable_sriov(pdev);
> +		return 0;
> +	}
> +
> +	return pci_enable_sriov(pdev, num_vfs);
> +}
> +
>  static struct pci_driver uio_pci_driver = {
>  	.name = "uio_pci_generic",
>  	.id_table = NULL, /* only dynamic id's */
>  	.probe = probe,
>  	.remove = remove,
> +	.sriov_configure = sriov_configure,
>  };
>  
>  module_pci_driver(uio_pci_driver);
> -- 
> 2.7.4
> 
> -- 
> dwmw2
David Woodhouse Sept. 27, 2017, 10:20 p.m. UTC | #2
On Wed, 2017-09-27 at 17:00 -0500, Bjorn Helgaas wrote:
> 
> 
> IIUC, this question is basically "why doesn't the PCI core enable IOV
> automatically when it sees an IOV-capable device?"
> 
> I think one reason is that an admin might want to control the number
> of VFs we enable (e.g., via 1789382a72a5 ("PCI: SRIOV control and
> status via sysfs" [1]).  But I guess you already know about that,
> since this patch uses that sysfs path, so maybe I don't understand
> your question.


I mean, why doesn't the PCI core *allow* SR-IOV to be enabled via
sysfs, unless the driver does this?
Alexander H Duyck Sept. 27, 2017, 11:06 p.m. UTC | #3
On Wed, Sep 27, 2017 at 3:20 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Wed, 2017-09-27 at 17:00 -0500, Bjorn Helgaas wrote:
>>
>>
>> IIUC, this question is basically "why doesn't the PCI core enable IOV
>> automatically when it sees an IOV-capable device?"
>>
>> I think one reason is that an admin might want to control the number
>> of VFs we enable (e.g., via 1789382a72a5 ("PCI: SRIOV control and
>> status via sysfs" [1]).  But I guess you already know about that,
>> since this patch uses that sysfs path, so maybe I don't understand
>> your question.
>
>
> I mean, why doesn't the PCI core *allow* SR-IOV to be enabled via
> sysfs, unless the driver does this?

The general idea is that the driver usually has to free up resources
so they can be reassigned to the VF devices.

For example in the case of the Intel NICs enabling SR-IOV reassigns
the queues to the VFs, and the PF has to be aware that this change is
happening so that it doesn't try to make use of queues that then
belong to the VFs.

- Alex
Don Dutile Sept. 28, 2017, 12:12 p.m. UTC | #4
On 09/27/2017 06:00 PM, Bjorn Helgaas wrote:
> [+cc Don, Alex D, Alex W, Bryant, Bodong, Michael, kvm list]
>
> On Wed, Sep 27, 2017 at 01:59:22PM +0100, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Allow userspace to configure SR-IOV VFs through sysfs.
>>
>> Currently, we need an in-kernel driver to permit this. But sometimes
>> *all* we want to do is enable the VFs so that we can assign them to
>> guests; we don't actually need to deal with the PF in any other way
>> from the host kernel. So let's make it possible to use UIO for that.
>>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> ---
>> It's not entirely clear to me why we require the driver to "enable"
>> SR-IOV like this anyway — were there some which needed to do something
>> special and device-specific instead of just falling through to
>> pci_{en,dis}able_sriov(), such that we need to effectively whitelist
>> this in the driver rather than blacklisting the "problematic" ones via
>> PCI quirks?
>
> IIUC, this question is basically "why doesn't the PCI core enable IOV
> automatically when it sees an IOV-capable device?"
>
> I think one reason is that an admin might want to control the number
> of VFs we enable (e.g., via 1789382a72a5 ("PCI: SRIOV control and
> status via sysfs" [1]).  But I guess you already know about that,
> since this patch uses that sysfs path, so maybe I don't understand
> your question.
>
The major reason is that most bios don't scan extended pci config for
SRIOV info, and thus, dont provide enough resources to configure VFs.
If an SRIOV-capable device is one of the first devices scanned in a PCI tree
(off a PCI Root Port), then it could consume all the bus resources,
and fail configuring the rest of the PCI devices, often leaving the system
unbootable.
Another reason why it's not an all-or-nothing selection when a PF's VFs are
enabled as well -- there may be enough resources available for some VFs to
be enabled, but not all.  The (needs-to-be-ECN'd) req that VFs of a PF
have to be aligned to the VF BAR[n]*num_VFs_enabled is the real difficulty,
since large, aligned free mmio space is not commonly 'free' after a PCI bus(tree)
scan and resource allocation. Free bus num space (gaps), and MSI resources
add further complications.

Will try to make some time later today to dig into the patches further
and add more concrete suggestions/feedback afterward.

> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1789382a72a5
>
>>  drivers/uio/uio_pci_generic.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
>> index a56fdf9..bd196f0 100644
>> --- a/drivers/uio/uio_pci_generic.c
>> +++ b/drivers/uio/uio_pci_generic.c
>> @@ -108,15 +108,27 @@ static void remove(struct pci_dev *pdev)
>>  	struct uio_pci_generic_dev *gdev = pci_get_drvdata(pdev);
>>
>>  	uio_unregister_device(&gdev->info);
>> +	pci_disable_sriov(pdev);
>>  	pci_disable_device(pdev);
>>  	kfree(gdev);
>>  }
>>
>> +static int sriov_configure(struct pci_dev *pdev, int num_vfs)
>> +{
>> +	if (!num_vfs) {
>> +		pci_disable_sriov(pdev);
>> +		return 0;
>> +	}
>> +
>> +	return pci_enable_sriov(pdev, num_vfs);
>> +}
>> +
>>  static struct pci_driver uio_pci_driver = {
>>  	.name = "uio_pci_generic",
>>  	.id_table = NULL, /* only dynamic id's */
>>  	.probe = probe,
>>  	.remove = remove,
>> +	.sriov_configure = sriov_configure,
>>  };
>>
>>  module_pci_driver(uio_pci_driver);
>> --
>> 2.7.4
>>
>> --
>> dwmw2
>
>
Don Dutile Sept. 28, 2017, 12:22 p.m. UTC | #5
On 09/27/2017 07:06 PM, Alexander Duyck wrote:
> On Wed, Sep 27, 2017 at 3:20 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>> On Wed, 2017-09-27 at 17:00 -0500, Bjorn Helgaas wrote:
>>>
>>>
>>> IIUC, this question is basically "why doesn't the PCI core enable IOV
>>> automatically when it sees an IOV-capable device?"
>>>
>>> I think one reason is that an admin might want to control the number
>>> of VFs we enable (e.g., via 1789382a72a5 ("PCI: SRIOV control and
>>> status via sysfs" [1]).  But I guess you already know about that,
>>> since this patch uses that sysfs path, so maybe I don't understand
>>> your question.
>>
>>
>> I mean, why doesn't the PCI core *allow* SR-IOV to be enabled via
>> sysfs, unless the driver does this?
>
> The general idea is that the driver usually has to free up resources
> so they can be reassigned to the VF devices.
>
> For example in the case of the Intel NICs enabling SR-IOV reassigns
> the queues to the VFs, and the PF has to be aware that this change is
> happening so that it doesn't try to make use of queues that then
> belong to the VFs.
>
> - Alex
>
After reading Alex's response, I now understand Dave's question
better and why the patch won't work in general.

In every SRIOV capable device I've run into to date, the PF has
to know the VFs are being assigned due to some resource mgmt, if not
internal (e.g., switch) configuration reasons.
The reasons are often subtle.

 From the context of the patches (uio), why aren't VFs enabled via
user-level sysfs interface? That was provided for user-mgmt apps
to have a PCIe-generic/common, device-independent method of VF enablement
(read: libvirt for device-assignment of VFs to VMs).
This also enables one to manage it at a finer, not-all-or-nothing level,
which is good due to bios strengths/weaknesses that various systems have
for SRIOV/VF-enablement.
David Woodhouse Sept. 28, 2017, 1:46 p.m. UTC | #6
On Thu, 2017-09-28 at 08:22 -0400, Don Dutile wrote:
> 
> After reading Alex's response, I now understand Dave's question
> better and why the patch won't work in general.

UIO doesn't work "in general". It requires a very *specific* userspace
driver for the hardware in question, which needs to know what it's
doing.

> In every SRIOV capable device I've run into to date, the PF has
> to know the VFs are being assigned due to some resource mgmt, if not
> internal (e.g., switch) configuration reasons.
> The reasons are often subtle.

Sure. If there's actually a userspace driver (which in my case there
isn't), and if that's needed for the hardware ins question (which in my
case it isn't), then it'd need to do that before enabling the VFs via
the user-level sysfs interface of which you speak.

>  From the context of the patches (uio), why aren't VFs enabled via
> user-level sysfs interface? That was provided for user-mgmt apps
> to have a PCIe-generic/common, device-independent method of VF
> enablement 

That is *precisely* what we're doing. But the user-space sysfs
interface doesn't actually *exist* unless a driver is bound to the
device in question, with a .sriov-configure method. Which is where I
came in... :)
Don Dutile Sept. 28, 2017, 3:05 p.m. UTC | #7
On 09/28/2017 09:46 AM, David Woodhouse wrote:
> On Thu, 2017-09-28 at 08:22 -0400, Don Dutile wrote:
>>
>> After reading Alex's response, I now understand Dave's question
>> better and why the patch won't work in general.
>
> UIO doesn't work "in general". It requires a very *specific* userspace
> driver for the hardware in question, which needs to know what it's
> doing.
>
>> In every SRIOV capable device I've run into to date, the PF has
>> to know the VFs are being assigned due to some resource mgmt, if not
>> internal (e.g., switch) configuration reasons.
>> The reasons are often subtle.
>
> Sure. If there's actually a userspace driver (which in my case there
> isn't), and if that's needed for the hardware ins question (which in my
> case it isn't), then it'd need to do that before enabling the VFs via
> the user-level sysfs interface of which you speak.
>
>>  From the context of the patches (uio), why aren't VFs enabled via
>> user-level sysfs interface? That was provided for user-mgmt apps
>> to have a PCIe-generic/common, device-independent method of VF
>> enablement
>
> That is *precisely* what we're doing. But the user-space sysfs
> interface doesn't actually *exist* unless a driver is bound to the
> device in question, with a .sriov-configure method. Which is where I
> came in... :)
>
ah, nickel summary: no in-kernel driver w/.sriov-configure method.
if so, now I'm up to speed with you....
hmmmm....
so, that would imply we need an in-kernel, pcie-common, .sriov-configure method
that's invoked if a driver isn't bound to a device? ... yes?
David Woodhouse Sept. 28, 2017, 3:52 p.m. UTC | #8
On Thu, 2017-09-28 at 11:05 -0400, Don Dutile wrote:
> ah, nickel summary: no in-kernel driver w/.sriov-configure method.
> if so, now I'm up to speed with you....
> hmmmm....
> so, that would imply we need an in-kernel, pcie-common, .sriov-
> configure method
> that's invoked if a driver isn't bound to a device? ... yes?

Well that was kind of the point in my question below the ---

Is that something we want to be generic? Would we want to have quirks
for the devices where we might *not* want it? 

Anything that *has* a driver for the PF, should have .sriov_configure
already. Anything that doesn't have a driver can (now) use UIO to
enable SR-IOV. So we don't *have* to make it unconditionally
available...
Don Dutile Sept. 28, 2017, 4:56 p.m. UTC | #9
On 09/28/2017 11:52 AM, David Woodhouse wrote:
> On Thu, 2017-09-28 at 11:05 -0400, Don Dutile wrote:
>> ah, nickel summary: no in-kernel driver w/.sriov-configure method.
>> if so, now I'm up to speed with you....
>> hmmmm....
>> so, that would imply we need an in-kernel, pcie-common, .sriov-
>> configure method
>> that's invoked if a driver isn't bound to a device? ... yes?
>
> Well that was kind of the point in my question below the ---
>
> Is that something we want to be generic? Would we want to have quirks
> for the devices where we might *not* want it?
>
> Anything that *has* a driver for the PF, should have .sriov_configure
> already. Anything that doesn't have a driver can (now) use UIO to
> enable SR-IOV. So we don't *have* to make it unconditionally
> available...
>
Well, my point is more like: why put it in uio?
why not make it available via pcie, setup while/if no driver attached?
i.e., other non-uio users can use the mechanism.... like libvirt? ...
if a PF driver isn't required.
David Woodhouse Oct. 2, 2017, 12:35 p.m. UTC | #10
On Thu, 2017-09-28 at 12:56 -0400, Don Dutile wrote:
> Well, my point is more like: why put it in uio?
> why not make it available via pcie, setup while/if no driver attached?
> i.e., other non-uio users can use the mechanism.... like libvirt? ...
> if a PF driver isn't required.

This would allow you to enable SR-IOV on a PF before its driver is
loaded, right? Even when that driver *is* going to need to perform
resource management for those VFs?

Would existing drivers cope with SR-IOV being enabled, and VFs being
assigned to guests, before they're loaded? If so then sure, let's do it
generically. But I'm not sure that's the case.

As it is, the UIO driver is all about "userspace knows best". So if
there's resource management to be done, then userspace needs to do that
before enabling SR-IOV. And that's consistent with the current driver-
based enabling model for SR-IOV.
Don Dutile Oct. 2, 2017, 6:52 p.m. UTC | #11
On 10/02/2017 08:35 AM, David Woodhouse wrote:
> On Thu, 2017-09-28 at 12:56 -0400, Don Dutile wrote:
>> Well, my point is more like: why put it in uio?
>> why not make it available via pcie, setup while/if no driver attached?
>> i.e., other non-uio users can use the mechanism.... like libvirt? ...
>> if a PF driver isn't required.
>
> This would allow you to enable SR-IOV on a PF before its driver is
> loaded, right? Even when that driver *is* going to need to perform
> resource management for those VFs?
>
> Would existing drivers cope with SR-IOV being enabled, and VFs being
> assigned to guests, before they're loaded? If so then sure, let's do it
> generically. But I'm not sure that's the case.
>
No better than a uio driver/mgmt api that may have to configure a PF
before a VF is enabled.
Q: So what is better: provide a common hook in sysfs for all to use,
    potentially causing a kernel fault (during vf probe/config), or
    a user-level program/mgmt-app that does the equivalent?

> As it is, the UIO driver is all about "userspace knows best". So if
> there's resource management to be done, then userspace needs to do that
> before enabling SR-IOV. And that's consistent with the current driver-
> based enabling model for SR-IOV.
>
I'm not seeing the UIO driver consistency here.  IMO, it's a similar
problem, moved to a different place. i.e., slightly different attack vector,
but same endpoint.
David Woodhouse Oct. 2, 2017, 7:10 p.m. UTC | #12
On Mon, 2017-10-02 at 14:52 -0400, Don Dutile wrote:
> On 10/02/2017 08:35 AM, David Woodhouse wrote:
> > This would allow you to enable SR-IOV on a PF before its driver is
> > loaded, right? Even when that driver *is* going to need to perform
> > resource management for those VFs?
> > 
> > Would existing drivers cope with SR-IOV being enabled, and VFs being
> > assigned to guests, before they're loaded? If so then sure, let's do it
> > generically. But I'm not sure that's the case.
> > 
> No better than a uio driver/mgmt api that may have to configure a PF
> before a VF is enabled.

Conceptually, the current model is that you don't have SR-IOV until you
have a driver loaded for the physical function which can do any
necessary resource management.

That's *why* the generic "sriov_numvfs" interface in sysfs isn't
present until such a driver is loaded.

In the UIO case, *userspace* is responsible for the PF. So it's not an
"attack vector"; we let userspace do what it likes with the PF and that
includes enabling SR-IOV too.

Do we actually *disable* SR-IOV when a (UIO or in-kernel) driver for
the PF is unloaded? If not, that's the only "attack vector" I see — to
load a driver which permits SR-IOV to be enabled, and do so, and then
unload it and load a different driver which doesn't cope.

And each driver in that scenario can be either an in-kernel driver or
UIO+userspace; it doesn't matter either way. The patch I sent is just
following the *existing* model.

But sure, my question was intended to ask whether we want to *stick*
with that model. Given the answers I got, my own conclusion was that we
probably do...
Don Dutile Oct. 2, 2017, 10:02 p.m. UTC | #13
On 10/02/2017 03:10 PM, David Woodhouse wrote:
> On Mon, 2017-10-02 at 14:52 -0400, Don Dutile wrote:
>> On 10/02/2017 08:35 AM, David Woodhouse wrote:
>>> This would allow you to enable SR-IOV on a PF before its driver is
>>> loaded, right? Even when that driver *is* going to need to perform
>>> resource management for those VFs?
>>>
>>> Would existing drivers cope with SR-IOV being enabled, and VFs being
>>> assigned to guests, before they're loaded? If so then sure, let's do it
>>> generically. But I'm not sure that's the case.
>>>
>> No better than a uio driver/mgmt api that may have to configure a PF
>> before a VF is enabled.
>
> Conceptually, the current model is that you don't have SR-IOV until you
> have a driver loaded for the physical function which can do any
> necessary resource management.
>
> That's *why* the generic "sriov_numvfs" interface in sysfs isn't
> present until such a driver is loaded.
>
> In the UIO case, *userspace* is responsible for the PF. So it's not an
> "attack vector"; we let userspace do what it likes with the PF and that
> includes enabling SR-IOV too.
>
> Do we actually *disable* SR-IOV when a (UIO or in-kernel) driver for
> the PF is unloaded? If not, that's the only "attack vector" I see — to
> load a driver which permits SR-IOV to be enabled, and do so, and then
> unload it and load a different driver which doesn't cope.
>
> And each driver in that scenario can be either an in-kernel driver or
> UIO+userspace; it doesn't matter either way. The patch I sent is just
> following the *existing* model.
>
> But sure, my question was intended to ask whether we want to *stick*
> with that model. Given the answers I got, my own conclusion was that we
> probably do...
>
ok. got the whole picture now.
+1 to your reply.
diff mbox

Patch

diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
index a56fdf9..bd196f0 100644
--- a/drivers/uio/uio_pci_generic.c
+++ b/drivers/uio/uio_pci_generic.c
@@ -108,15 +108,27 @@  static void remove(struct pci_dev *pdev)
 	struct uio_pci_generic_dev *gdev = pci_get_drvdata(pdev);
 
 	uio_unregister_device(&gdev->info);
+	pci_disable_sriov(pdev);
 	pci_disable_device(pdev);
 	kfree(gdev);
 }
 
+static int sriov_configure(struct pci_dev *pdev, int num_vfs)
+{
+	if (!num_vfs) {
+		pci_disable_sriov(pdev);
+		return 0;
+	}
+
+	return pci_enable_sriov(pdev, num_vfs);
+}
+
 static struct pci_driver uio_pci_driver = {
 	.name = "uio_pci_generic",
 	.id_table = NULL, /* only dynamic id's */
 	.probe = probe,
 	.remove = remove,
+	.sriov_configure = sriov_configure,
 };
 
 module_pci_driver(uio_pci_driver);