diff mbox

msi-x: let drivers retry when not enough vectors

Message ID 20090507082841.GA31751@redhat.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Michael S. Tsirkin May 7, 2009, 8:28 a.m. UTC
pci_enable_msix currently returns -EINVAL if you ask
for more vectors than supported by the device, which would
typically cause fallback to regular interrupts.

It's better to return the table size, making the driver retry
MSI-X with less vectors.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Hi Jesse,
This came up when I was adding MSI-X support to virtio pci driver,
which does not know the exact table size upfront.
Could you consider this patch for 2.6.31 please?


 drivers/pci/msi.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

Comments

Sheng Yang May 7, 2009, 8:51 a.m. UTC | #1
On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote:
> pci_enable_msix currently returns -EINVAL if you ask
> for more vectors than supported by the device, which would
> typically cause fallback to regular interrupts.
>
> It's better to return the table size, making the driver retry
> MSI-X with less vectors.

Hi Michael

I think driver should read from capability list to know how many vector 
supported by this device before enable MSI-X for device, as 
pci_msix_table_size() did...
Michael S. Tsirkin May 7, 2009, 9:05 a.m. UTC | #2
On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote:
> On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote:
> > pci_enable_msix currently returns -EINVAL if you ask
> > for more vectors than supported by the device, which would
> > typically cause fallback to regular interrupts.
> >
> > It's better to return the table size, making the driver retry
> > MSI-X with less vectors.
> 
> Hi Michael
> 
> I think driver should read from capability list to know how many vector 
> supported by this device before enable MSI-X for device, as 
> pci_msix_table_size() did...

Drivers can do this, but it's more code. Since pci_enable_msix
calls pci_msix_table_size already, let it do the work. Right?


> -- 
> regards
> Yang, Sheng
> 
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >
> > Hi Jesse,
> > This came up when I was adding MSI-X support to virtio pci driver,
> > which does not know the exact table size upfront.
> > Could you consider this patch for 2.6.31 please?
> >
> >
> >  drivers/pci/msi.c |    6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 6f2e629..f5bd1c9 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -687,8 +687,8 @@ int pci_msix_table_size(struct pci_dev *dev)
> >   * indicates the successful configuration of MSI-X capability structure
> >   * with new allocated MSI-X irqs. A return of < 0 indicates a failure.
> >   * Or a return of > 0 indicates that driver request is exceeding the
> > number - * of irqs available. Driver should use the returned value to
> > re-send - * its request.
> > + * of irqs or MSI-X vectors available. Driver should use the returned
> > value to + * re-send its request.
> >   **/
> >  int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int
> > nvec) {
> > @@ -704,7 +704,7 @@ int pci_enable_msix(struct pci_dev* dev, struct
> > msix_entry *entries, int nvec)
> >
> >  	nr_entries = pci_msix_table_size(dev);
> >  	if (nvec > nr_entries)
> > -		return -EINVAL;
> > +		return nr_entries;
> >
> >  	/* Check for any invalid entries */
> >  	for (i = 0; i < nvec; i++) {
>
Sheng Yang May 7, 2009, 9:10 a.m. UTC | #3
On Thursday 07 May 2009 17:05:06 Michael S. Tsirkin wrote:
> On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote:
> > On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote:
> > > pci_enable_msix currently returns -EINVAL if you ask
> > > for more vectors than supported by the device, which would
> > > typically cause fallback to regular interrupts.
> > >
> > > It's better to return the table size, making the driver retry
> > > MSI-X with less vectors.
> >
> > Hi Michael
> >
> > I think driver should read from capability list to know how many vector
> > supported by this device before enable MSI-X for device, as
> > pci_msix_table_size() did...
>
> Drivers can do this, but it's more code. Since pci_enable_msix
> calls pci_msix_table_size already, let it do the work. Right?

If you don't know the vectors number before you enable MSI-X, how can you 
setup vectors? 

I don't think it's proper way to go.
Matthew Wilcox May 7, 2009, 9:27 a.m. UTC | #4
On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote:
> On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote:
> > pci_enable_msix currently returns -EINVAL if you ask
> > for more vectors than supported by the device, which would
> > typically cause fallback to regular interrupts.
> >
> > It's better to return the table size, making the driver retry
> > MSI-X with less vectors.
> 
> Hi Michael
> 
> I think driver should read from capability list to know how many vector 
> supported by this device before enable MSI-X for device, as 
> pci_msix_table_size() did...

I think Michael's patch makes sense.  It reduces the amount of work the
driver has to do without requiring any additional work in the core.  I
don't see the disadvantage to it.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Michael S. Tsirkin May 7, 2009, 9:31 a.m. UTC | #5
On Thu, May 07, 2009 at 05:10:46PM +0800, Sheng Yang wrote:
> > > I think driver should read from capability list to know how many vector
> > > supported by this device before enable MSI-X for device, as
> > > pci_msix_table_size() did...
> >
> > Drivers can do this, but it's more code. Since pci_enable_msix
> > calls pci_msix_table_size already, let it do the work. Right?
> 
> If you don't know the vectors number before you enable MSI-X, how can you 
> setup vectors? 

I don't know how many irqs I will be able to get anyway.
vectors that can't be assigned are useless ...
Sheng Yang May 7, 2009, 9:40 a.m. UTC | #6
On Thursday 07 May 2009 17:27:31 Matthew Wilcox wrote:
> On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote:
> > On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote:
> > > pci_enable_msix currently returns -EINVAL if you ask
> > > for more vectors than supported by the device, which would
> > > typically cause fallback to regular interrupts.
> > >
> > > It's better to return the table size, making the driver retry
> > > MSI-X with less vectors.
> >
> > Hi Michael
> >
> > I think driver should read from capability list to know how many vector
> > supported by this device before enable MSI-X for device, as
> > pci_msix_table_size() did...
>
> I think Michael's patch makes sense.  It reduces the amount of work the
> driver has to do without requiring any additional work in the core.  I
> don't see the disadvantage to it.
>
> Reviewed-by: Matthew Wilcox <willy@linux.intel.com>

It's indeed weird. Why the semantic of pci_enable_msix can be changed to 
"enable msix, or tell me how many vector do you have"? You can simply call 
pci_msix_table_size() to get what you want, also without any more work, no? I 
can't understand...
Matthew Wilcox May 7, 2009, 9:53 a.m. UTC | #7
On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote:
> It's indeed weird. Why the semantic of pci_enable_msix can be changed to 
> "enable msix, or tell me how many vector do you have"? You can simply call 
> pci_msix_table_size() to get what you want, also without any more work, no? I 
> can't understand...

Here's a good example.  Let's suppose you have a driver which supports
two different models of cards, one has 16 MSI-X interrupts, the other
has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
card is model A, you get 16 interrupts.  If your card is model B, it says
"you can have 10".

This is less work in the driver (since it must implement falling back to
a smaller number of interrupts *anyway*) than interrogating the card to
find out how many interrupts there are, then requesting the right number,
and still having the fallback path which is going to be less tested.
Sheng Yang May 7, 2009, 10:19 a.m. UTC | #8
On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote:
> > It's indeed weird. Why the semantic of pci_enable_msix can be changed to
> > "enable msix, or tell me how many vector do you have"? You can simply
> > call pci_msix_table_size() to get what you want, also without any more
> > work, no? I can't understand...
>
> Here's a good example.  Let's suppose you have a driver which supports
> two different models of cards, one has 16 MSI-X interrupts, the other
> has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> card is model A, you get 16 interrupts.  If your card is model B, it says
> "you can have 10".
>
> This is less work in the driver (since it must implement falling back to
> a smaller number of interrupts *anyway*) than interrogating the card to
> find out how many interrupts there are, then requesting the right number,
> and still having the fallback path which is going to be less tested.

Yeah, partly understand now.

But the confusing of return value is not that pleasure compared to this 
benefit. And even you have to fall back if return > 0 anyway, but in the past, 
you just need fall back once at most; but now you may fall back twice. This 
make thing more complex - you need either two ifs or a simple loop. And just 
one "if" can deal with it before. All that required is one call for 
pci_msix_table_size(), and I believe most driver would like to know how much 
vector it have before it fill the vectors, so mostly no extra cost. But for 
this ambiguous return meaning, you have to add more code for fall back - yes, 
the driver may can assert that the positive return value always would be irq 
numbers if it call pci_msix_table_size() before, but is it safe in logic?
Michael Ellerman May 7, 2009, 10:23 a.m. UTC | #9
On Thu, 2009-05-07 at 03:53 -0600, Matthew Wilcox wrote:
> On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote:
> > It's indeed weird. Why the semantic of pci_enable_msix can be changed to 
> > "enable msix, or tell me how many vector do you have"? You can simply call 
> > pci_msix_table_size() to get what you want, also without any more work, no? I 
> > can't understand...
> 
> Here's a good example.  Let's suppose you have a driver which supports
> two different models of cards, one has 16 MSI-X interrupts, the other
> has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> card is model A, you get 16 interrupts.  If your card is model B, it says
> "you can have 10".
> 
> This is less work in the driver (since it must implement falling back to
> a smaller number of interrupts *anyway*) than interrogating the card to
> find out how many interrupts there are, then requesting the right number,
> and still having the fallback path which is going to be less tested.

Not to mention that there's no guarantee that you'll get as many
interrupts as the device supports, so you should really be coding to
cope with that anyway. Like the example in MSI-HOWTO.txt:

197 static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
198 {
199         while (nvec >= FOO_DRIVER_MINIMUM_NVEC) {
200                 rc = pci_enable_msix(adapter->pdev,
201                                      adapter->msix_entries, nvec);
202                 if (rc > 0)
203                         nvec = rc;
204                 else
205                         return rc;
206         }
207 
208         return -ENOSPC;
209 }

So I agree, this patch is an improvement.

cheers
Sheng Yang May 7, 2009, 10:28 a.m. UTC | #10
On Thursday 07 May 2009 18:23:50 Michael Ellerman wrote:
> On Thu, 2009-05-07 at 03:53 -0600, Matthew Wilcox wrote:
> > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote:
> > > It's indeed weird. Why the semantic of pci_enable_msix can be changed
> > > to "enable msix, or tell me how many vector do you have"? You can
> > > simply call pci_msix_table_size() to get what you want, also without
> > > any more work, no? I can't understand...
> >
> > Here's a good example.  Let's suppose you have a driver which supports
> > two different models of cards, one has 16 MSI-X interrupts, the other
> > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > card is model A, you get 16 interrupts.  If your card is model B, it says
> > "you can have 10".
> >
> > This is less work in the driver (since it must implement falling back to
> > a smaller number of interrupts *anyway*) than interrogating the card to
> > find out how many interrupts there are, then requesting the right number,
> > and still having the fallback path which is going to be less tested.
>
> Not to mention that there's no guarantee that you'll get as many
> interrupts as the device supports, so you should really be coding to
> cope with that anyway. Like the example in MSI-HOWTO.txt:
>
> 197 static int foo_driver_enable_msix(struct foo_adapter *adapter, int
> nvec) 198 {
> 199         while (nvec >= FOO_DRIVER_MINIMUM_NVEC) {
> 200                 rc = pci_enable_msix(adapter->pdev,
> 201                                      adapter->msix_entries, nvec);
> 202                 if (rc > 0)
> 203                         nvec = rc;
> 204                 else
> 205                         return rc;
> 206         }
> 207
> 208         return -ENOSPC;
> 209 }
>
> So I agree, this patch is an improvement.
>
Oh yeah.

Forgot irq counts can also be changed from time to time.

OK, there should be a loop, so that's fine. :)
Michael S. Tsirkin May 7, 2009, 10:32 a.m. UTC | #11
On Thu, May 07, 2009 at 06:19:53PM +0800, Sheng Yang wrote:
> On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote:
> > > It's indeed weird. Why the semantic of pci_enable_msix can be changed to
> > > "enable msix, or tell me how many vector do you have"? You can simply
> > > call pci_msix_table_size() to get what you want, also without any more
> > > work, no? I can't understand...
> >
> > Here's a good example.  Let's suppose you have a driver which supports
> > two different models of cards, one has 16 MSI-X interrupts, the other
> > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > card is model A, you get 16 interrupts.  If your card is model B, it says
> > "you can have 10".
> >
> > This is less work in the driver (since it must implement falling back to
> > a smaller number of interrupts *anyway*) than interrogating the card to
> > find out how many interrupts there are, then requesting the right number,
> > and still having the fallback path which is going to be less tested.
> 
> Yeah, partly understand now.
> 
> But the confusing of return value is not that pleasure compared to this 
> benefit. And even you have to fall back if return > 0 anyway, but in the past, 
> you just need fall back once at most; but now you may fall back twice.

I don't think that's right - you might not be able to get the
number of interrupts that pci_enable_msix reported.

> This 
> make thing more complex - you need either two ifs or a simple loop. And just 
> one "if" can deal with it before. All that required is one call for 
> pci_msix_table_size(), and I believe most driver would like to know how much 
> vector it have before it fill the vectors, so mostly no extra cost. But for 
> this ambiguous return meaning, you have to add more code for fall back - yes, 
> the driver may can assert that the positive return value always would be irq 
> numbers if it call pci_msix_table_size() before, but is it safe in logic?

If you know how many vectors the card has, then the only failure mode
is when you are out of irqs. No change there.
Avi Kivity May 7, 2009, 10:44 a.m. UTC | #12
Michael Ellerman wrote:
> Not to mention that there's no guarantee that you'll get as many
> interrupts as the device supports, so you should really be coding to
> cope with that anyway. Like the example in MSI-HOWTO.txt:
>
> 197 static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
> 198 {
> 199         while (nvec >= FOO_DRIVER_MINIMUM_NVEC) {
> 200                 rc = pci_enable_msix(adapter->pdev,
> 201                                      adapter->msix_entries, nvec);
> 202                 if (rc > 0)
> 203                         nvec = rc;
> 204                 else
> 205                         return rc;
> 206         }
> 207 
> 208         return -ENOSPC;
> 209 }
>
> So I agree, this patch is an improvement.
>   

I imagine this loop is present in many drivers.  So why not add a helper

static int pci_enable_msix_min(struct foo_adapter *adapter, int min_nvec)
{
	int nvec = 2048;

        while (nvec >= min_nvec) {
                rc = pci_enable_msix(adapter->pdev,
                                     adapter->msix_entries, nvec);
                if (rc == 0)
                        return nvec;
                else if (rc > 0)
                        nvec = rc;
                else
                        return rc;
        }
        return -ENOSPC;
}
Matthew Wilcox May 7, 2009, 11:57 a.m. UTC | #13
On Thu, May 07, 2009 at 01:44:38PM +0300, Avi Kivity wrote:
> I imagine this loop is present in many drivers.  So why not add a helper

Let's look!

arch/x86/kernel/amd_iommu_init.c : Needs an exact number of vectors.
drivers/block/cciss.c : If it doesn't get all 4 vectors, falls back to pin mode (instead of MSI mode -- bug!)
drivers/dma/ioat_dma.c : Falls back down to 1 vector if it can't get nvec
drivers/infiniband/hw/mthca/mthca_main.c : Reverts to MSI if it can't get 3.
drivers/net/benet/be_main.c : Falls back to MSI if it can't get 2.
drivers/net/bnx2.c : Falls back to MSI if it can't get 9.
drivers/net/bnx2x_main.c : Falls back to MSI if it can't get N.
drivers/net/e1000e/netdev.c: Falls back to MSI if it can't get N.
drivers/net/enic/enic_main.c: Falls back to MSI if it can't get N.
drivers/net/forcedeth.c: Falls back to MSI if it can't get N.
drivers/net/igb/igb_main.c: Falls back to MSI if it can't get N.
drivers/net/igbvf/netdev.c: Falls back to MSI if it can't get 3.
drivers/net/myri10ge/myri10ge.c: Falls back to Pin if it can't get N.
drivers/net/netxen/netxen_nic_main.c: Falls back to MSI if it can't get N.
drivers/net/qlge/qlge_main.c: Falls back to MSI if it can't get N.
drivers/net/s2io.c: Falls back to MSI if it can't get N.
drivers/net/vxge/vxge-main.c: Falls back once to a second number.
drivers/pci/pcie/portdrv_core.c: Falls back to MSI if it can't get all of them.
drivers/scsi/lpfc/lpfc_init.c: Falls back to MSI if it can't get N.
drivers/scsi/mpt2sas/mpt2sas_base.c: Only allocates 1.

drivers/net/cxgb3/cxgb3_main.c: Actually falls back ... in a bit of a weird way.  This one could definitely do with the proposed loop.
drivers/net/ixgbe/ixgbe_main.c: Could also be improved with this loop.
drivers/net/mlx4/main.c: Nasty goto-based loop.
drivers/net/niu.c: Ditto
drivers/net/sfc/efx.c: Only falls back once.  Would benefit from loop.
drivers/scsi/qla2xxx/qla_isr.c: Only falls back once.  Would benefit from loop.
drivers/staging/sxg/sxg.c: /*Should try with less vector returned.*/

so ... 7 drivers would benefit from this loop.  20 seem like they wouldn't.

What a lot of drivers seem to do is fall back either to a single MSI or
just pin-based interrupts when they can't get as many interrupts as they
would like.  They don't try to get a single MSI-X interrupt.  I feel an
update to the MSI-HOWTO coming on.
Rusty Russell May 7, 2009, 11:55 p.m. UTC | #14
On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote:
> On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> > Here's a good example.  Let's suppose you have a driver which supports
> > two different models of cards, one has 16 MSI-X interrupts, the other
> > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > card is model A, you get 16 interrupts.  If your card is model B, it says
> > "you can have 10".

Sheng is absolutely right, that's a horrid API.

If it actually enabled that number and returned it, it might make sense (cf. 
write() returning less bytes than you give it).  But overloading the return 
value to save an explicit call is just ugly; it's not worth saving a few lines 
of code at cost of making all the drivers subtle and tricksy.

Fail with -ENOSPC or something.

Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Wilcox May 8, 2009, 12:22 a.m. UTC | #15
On Fri, May 08, 2009 at 09:25:00AM +0930, Rusty Russell wrote:
> Sheng is absolutely right, that's a horrid API.

But the API already exists, and is in use by 27 drivers.  If this were a
change to the API, I'd be against it, but it is the existing API.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Ellerman May 8, 2009, 12:28 a.m. UTC | #16
On Fri, 2009-05-08 at 09:25 +0930, Rusty Russell wrote:
> On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote:
> > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> > > Here's a good example.  Let's suppose you have a driver which supports
> > > two different models of cards, one has 16 MSI-X interrupts, the other
> > > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > > card is model A, you get 16 interrupts.  If your card is model B, it says
> > > "you can have 10".
> 
> Sheng is absolutely right, that's a horrid API.

It's not horrid, though it is tricky - but only for drivers that care,
you can still do:

if (pci_enable_msix(..))
	bail;

> If it actually enabled that number and returned it, it might make sense (cf. 
> write() returning less bytes than you give it).  

It could do that, but I think that would be worse. The driver, on
finding out it can only get a certain number of MSIs might need to make
a decision about how it allocates those - eg. in a network driver,
sharing them between TX/RX/management.

And in practice most of the drivers just bail if they can't get what
they asked for, so enabling less than they wanted would just mean they
have to go and disable them.

> But overloading the return 
> value to save an explicit call is just ugly; it's not worth saving a few lines 
> of code at cost of making all the drivers subtle and tricksy.

Looking at just this patch, I would agree, but unfortunately it's not
that simple. The first limitation on the number of MSIs the driver can
have is the number the device supports, that's what this patch does. But
there are others, and they come out of the arch code, or even the
firmware. So to implement pci_how_many_msis(), we'd need a parallel API
all the way down to the arch code, or a flag to all the existing
routines saying "don't really allocate, just find out". That would be
horrid IMHO.

cheers
Michael S. Tsirkin May 12, 2009, 9:28 p.m. UTC | #17
On Fri, May 08, 2009 at 09:25:00AM +0930, Rusty Russell wrote:
> On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote:
> > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> > > Here's a good example.  Let's suppose you have a driver which supports
> > > two different models of cards, one has 16 MSI-X interrupts, the other
> > > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > > card is model A, you get 16 interrupts.  If your card is model B, it says
> > > "you can have 10".
> 
> Sheng is absolutely right, that's a horrid API.
> 
> If it actually enabled that number and returned it, it might make sense (cf. 
> write() returning less bytes than you give it).  But overloading the return 
> value to save an explicit call is just ugly; it's not worth saving a few lines 
> of code at cost of making all the drivers subtle and tricksy.
> 
> Fail with -ENOSPC or something.
> 
> Rusty.

I do agree that returning a positive value from pci_enable_msix
it an ugly API (but note that this is the API that linux currently has).

Here's a wrapper that I ended up with in my driver:

static int enable_msix(struct pci_dev *dev, struct msix_entry *entries,
		       int *options, int noptions)
{
        int i;
        for (i = 0; i < noptions; ++i)
                if (!pci_enable_msix(dev, entries, options[i]))
                        return options[i];
        return -EBUSY;
}

This gets an array of options for # of vectors and tries them one after
the other until an option that the system can support is found.
On success, we get the # of vectors actually enabled, and
driver can then use them as it sees fit.

Is there interest in moving something like this to pci.h?
Jesse Barnes June 11, 2009, 6:28 p.m. UTC | #18
On Thu, 7 May 2009 11:28:41 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> pci_enable_msix currently returns -EINVAL if you ask
> for more vectors than supported by the device, which would
> typically cause fallback to regular interrupts.
> 
> It's better to return the table size, making the driver retry
> MSI-X with less vectors.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Hi Jesse,
> This came up when I was adding MSI-X support to virtio pci driver,
> which does not know the exact table size upfront.
> Could you consider this patch for 2.6.31 please?

Applied this one to my linux-next branch; hopefully Rusty won't mind
too much. :)
diff mbox

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 6f2e629..f5bd1c9 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -687,8 +687,8 @@  int pci_msix_table_size(struct pci_dev *dev)
  * indicates the successful configuration of MSI-X capability structure
  * with new allocated MSI-X irqs. A return of < 0 indicates a failure.
  * Or a return of > 0 indicates that driver request is exceeding the number
- * of irqs available. Driver should use the returned value to re-send
- * its request.
+ * of irqs or MSI-X vectors available. Driver should use the returned value to
+ * re-send its request.
  **/
 int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
 {
@@ -704,7 +704,7 @@  int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
 
 	nr_entries = pci_msix_table_size(dev);
 	if (nvec > nr_entries)
-		return -EINVAL;
+		return nr_entries;
 
 	/* Check for any invalid entries */
 	for (i = 0; i < nvec; i++) {