
[V2,1/3] PCI/MSI: preference to returning -ENOSPC from pci_alloc_irq_vectors_affinity

Message ID 20181229032650.27256-2-ming.lei@redhat.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Series nvme pci: two fixes on nvme_setup_irqs

Commit Message

Ming Lei Dec. 29, 2018, 3:26 a.m. UTC
The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC
if leass than @min_vecs interrupt vectors are available for @dev.

However, this way may be changed by falling back to
__pci_enable_msi_range(), for example, if the device isn't capable of
MSI, __pci_enable_msi_range() will return -EINVAL, and finally it is
returned to users of pci_alloc_irq_vectors_affinity() even though
there are quite MSIX vectors available. This way violates the interface.

Users of pci_alloc_irq_vectors_affinity() may try to reduce irq
vectors and allocate vectors again in case that -ENOSPC is returned, such
as NVMe, so we need to respect the current interface and give preference to
-ENOSPC.

Cc: Jens Axboe <axboe@fb.com>,
Cc: Keith Busch <keith.busch@intel.com>,
Cc: linux-pci@vger.kernel.org,
Cc: Bjorn Helgaas <bhelgaas@google.com>,
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/pci/msi.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
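The return-value preference the patch introduces can be modeled in a few lines. This is a hypothetical standalone sketch, not kernel code: `pick_alloc_errno()` stands in for the final error selection in pci_alloc_irq_vectors_affinity() once both the MSI-X and MSI attempts have failed, with `msix_rc`/`msi_rc` playing the roles of the results from __pci_enable_msix_range() and __pci_enable_msi_range().

```c
#include <errno.h>

/*
 * Hypothetical sketch (not kernel code): models the error preference
 * this patch introduces.  A positive vector count from either path
 * would have been returned directly; here both attempts failed.
 */
static int pick_alloc_errno(int msix_rc, int msi_rc)
{
	if (msix_rc == -ENOSPC)
		return -ENOSPC;	/* too few MSI-X vectors: honor the documented API */
	return msi_rc;		/* otherwise report the MSI failure (e.g. -EINVAL) */
}
```

With this selection, a device that supports MSI-X but not MSI reports -ENOSPC when MSI-X cannot satisfy @min_vecs, instead of leaking the -EINVAL from the doomed MSI attempt.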

Comments

Bjorn Helgaas Dec. 31, 2018, 10 p.m. UTC | #1
On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote:
> The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC
> if leass than @min_vecs interrupt vectors are available for @dev.

s/leass/fewer/

> However, this way may be changed by falling back to
> __pci_enable_msi_range(), for example, if the device isn't capable of
> MSI, __pci_enable_msi_range() will return -EINVAL, and finally it is
> returned to users of pci_alloc_irq_vectors_affinity() even though
> there are quite MSIX vectors available. This way violates the interface.

I *think* the above means:

  If a device supports MSI-X but not MSI and a caller requests
  @min_vecs that can't be satisfied by MSI-X, we previously returned
  -EINVAL (from the failed attempt to enable MSI), not -ENOSPC.

and I agree that this doesn't match the documented API.

> Users of pci_alloc_irq_vectors_affinity() may try to reduce irq
> vectors and allocate vectors again in case that -ENOSPC is returned, such
> as NVMe, so we need to respect the current interface and give preference to
> -ENOSPC.

I thought the whole point of the (min_vecs, max_vecs) tuple was to
avoid this sort of "reduce and try again" iteration in the callers.

> Cc: Jens Axboe <axboe@fb.com>,
> Cc: Keith Busch <keith.busch@intel.com>,
> Cc: linux-pci@vger.kernel.org,
> Cc: Bjorn Helgaas <bhelgaas@google.com>,
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/pci/msi.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 7a1c8a09efa5..91b4f03fee91 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  				   const struct irq_affinity *affd)
>  {
>  	static const struct irq_affinity msi_default_affd;
> -	int vecs = -ENOSPC;
> +	int msix_vecs = -ENOSPC;
> +	int msi_vecs = -ENOSPC;
>  
>  	if (flags & PCI_IRQ_AFFINITY) {
>  		if (!affd)
> @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  	}
>  
>  	if (flags & PCI_IRQ_MSIX) {
> -		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
> -				affd);
> -		if (vecs > 0)
> -			return vecs;
> +		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
> +						    max_vecs, affd);
> +		if (msix_vecs > 0)
> +			return msix_vecs;
>  	}
>  
>  	if (flags & PCI_IRQ_MSI) {
> -		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
> -		if (vecs > 0)
> -			return vecs;
> +		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
> +						  affd);
> +		if (msi_vecs > 0)
> +			return msi_vecs;
>  	}
>  
>  	/* use legacy irq if allowed */
> @@ -1199,7 +1201,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  		}
>  	}
>  
> -	return vecs;
> +	return msix_vecs == -ENOSPC ? msix_vecs : msi_vecs;

If you know you want to return -ENOSPC, just return that, not a
variable that happens to contain it, i.e.,

  if (msix_vecs == -ENOSPC)
    return -ENOSPC;
  return msi_vecs;

>  }
>  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
>  
> -- 
> 2.9.5
> 
Keith Busch Dec. 31, 2018, 10:41 p.m. UTC | #2
On Mon, Dec 31, 2018 at 04:00:59PM -0600, Bjorn Helgaas wrote:
> On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote:
> > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq
> > vectors and allocate vectors again in case that -ENOSPC is returned, such
> > as NVMe, so we need to respect the current interface and give preference to
> > -ENOSPC.
> 
> I thought the whole point of the (min_vecs, max_vecs) tuple was to
> avoid this sort of "reduce and try again" iteration in the callers.

The min/max vecs doesn't work correctly when using the irq_affinity
nr_sets because rebalancing the set counts is driver specific. To get
around that, drivers using nr_sets have to set min and max to the same
value and handle the "reduce and try again".
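The "reduce and try again" pattern Keith describes can be sketched as follows. This is an illustrative stand-alone example, not driver code: `alloc_fn` stands in for a call to pci_alloc_irq_vectors_affinity() with min_vecs == max_vecs == nvec, and `fake_alloc` is a made-up backend for demonstration.

```c
#include <errno.h>

/*
 * Hypothetical sketch of the driver-side retry loop: with
 * affd->nr_sets in use, min must equal max, so the driver shrinks
 * the request itself whenever the allocation reports -ENOSPC.
 */
static int alloc_with_retry(int (*alloc_fn)(int nvec), int want, int min)
{
	int nvec, rc = -ENOSPC;

	for (nvec = want; nvec >= min; nvec--) {
		rc = alloc_fn(nvec);	/* a real driver would also rebalance its sets here */
		if (rc != -ENOSPC)
			return rc;	/* success (> 0) or a hard error */
	}
	return rc;
}

/* Made-up backend: pretend only 4 vectors are available. */
static int fake_alloc(int nvec)
{
	return nvec <= 4 ? nvec : -ENOSPC;
}
```

This loop only terminates correctly if the allocator reliably returns -ENOSPC for "not enough vectors", which is exactly why the patch's error preference matters to NVMe.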
Ming Lei Jan. 1, 2019, 5:24 a.m. UTC | #3
On Mon, Dec 31, 2018 at 04:00:59PM -0600, Bjorn Helgaas wrote:
> On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote:
> > The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC
> > if leass than @min_vecs interrupt vectors are available for @dev.
> 
> s/leass/fewer/
> 
> > However, this way may be changed by falling back to
> > __pci_enable_msi_range(), for example, if the device isn't capable of
> > MSI, __pci_enable_msi_range() will return -EINVAL, and finally it is
> > returned to users of pci_alloc_irq_vectors_affinity() even though
> > there are quite MSIX vectors available. This way violates the interface.
> 
> I *think* the above means:
> 
>   If a device supports MSI-X but not MSI and a caller requests
>   @min_vecs that can't be satisfied by MSI-X, we previously returned
>   -EINVAL (from the failed attempt to enable MSI), not -ENOSPC.
> 
> and I agree that this doesn't match the documented API.

OK, will use the above comment log.

> 
> > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq
> > vectors and allocate vectors again in case that -ENOSPC is returned, such
> > as NVMe, so we need to respect the current interface and give preference to
> > -ENOSPC.
> 
> I thought the whole point of the (min_vecs, max_vecs) tuple was to
> avoid this sort of "reduce and try again" iteration in the callers.

As Keith replied, in case of NVMe, we have to keep min_vecs same with
max_vecs.

> 
> > Cc: Jens Axboe <axboe@fb.com>,
> > Cc: Keith Busch <keith.busch@intel.com>,
> > Cc: linux-pci@vger.kernel.org,
> > Cc: Bjorn Helgaas <bhelgaas@google.com>,
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/pci/msi.c | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 7a1c8a09efa5..91b4f03fee91 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> >  				   const struct irq_affinity *affd)
> >  {
> >  	static const struct irq_affinity msi_default_affd;
> > -	int vecs = -ENOSPC;
> > +	int msix_vecs = -ENOSPC;
> > +	int msi_vecs = -ENOSPC;
> >  
> >  	if (flags & PCI_IRQ_AFFINITY) {
> >  		if (!affd)
> > @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> >  	}
> >  
> >  	if (flags & PCI_IRQ_MSIX) {
> > -		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
> > -				affd);
> > -		if (vecs > 0)
> > -			return vecs;
> > +		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
> > +						    max_vecs, affd);
> > +		if (msix_vecs > 0)
> > +			return msix_vecs;
> >  	}
> >  
> >  	if (flags & PCI_IRQ_MSI) {
> > -		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
> > -		if (vecs > 0)
> > -			return vecs;
> > +		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
> > +						  affd);
> > +		if (msi_vecs > 0)
> > +			return msi_vecs;
> >  	}
> >  
> >  	/* use legacy irq if allowed */
> > @@ -1199,7 +1201,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> >  		}
> >  	}
> >  
> > -	return vecs;
> > +	return msix_vecs == -ENOSPC ? msix_vecs : msi_vecs;
> 
> If you know you want to return -ENOSPC, just return that, not a
> variable that happens to contain it, i.e.,
> 
>   if (msix_vecs == -ENOSPC)
>     return -ENOSPC;
>   return msi_vecs;

OK.

Thanks,
Ming
Bjorn Helgaas Jan. 2, 2019, 9:02 p.m. UTC | #4
On Tue, Jan 01, 2019 at 01:24:59PM +0800, Ming Lei wrote:
> On Mon, Dec 31, 2018 at 04:00:59PM -0600, Bjorn Helgaas wrote:
> > On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote:
...
> > > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq
> > > vectors and allocate vectors again in case that -ENOSPC is returned, such
> > > as NVMe, so we need to respect the current interface and give preference to
> > > -ENOSPC.
> > 
> > I thought the whole point of the (min_vecs, max_vecs) tuple was to
> > avoid this sort of "reduce and try again" iteration in the callers.
> 
> As Keith replied, in case of NVMe, we have to keep min_vecs same with
> max_vecs.

Keith said:
> The min/max vecs doesn't work correctly when using the irq_affinity
> nr_sets because rebalancing the set counts is driver specific. To
> get around that, drivers using nr_sets have to set min and max to
> the same value and handle the "reduce and try again".

Sorry I saw that, but didn't follow it at first.  After a little
archaeology, I see that 6da4b3ab9a6e ("genirq/affinity: Add support
for allocating interrupt sets") added nr_sets and some validation
tests (if affd.nr_sets, min_vecs == max_vecs) for using it in the API.

That's sort of a wart on the API, but I don't know if we should live
with it or try to clean it up somehow.

At the very least, this seems like something that could be documented
somewhere in Documentation/PCI/MSI-HOWTO.txt, which mentions
PCI_IRQ_AFFINITY, but doesn't cover struct irq_affinity or
pci_alloc_irq_vectors_affinity() at all, let alone this wrinkle about
affd.nr_sets/min_vecs/max_vecs.

Obviously that would not be part of *this* patch.

Bjorn
Keith Busch Jan. 2, 2019, 10:46 p.m. UTC | #5
On Wed, Jan 02, 2019 at 03:02:02PM -0600, Bjorn Helgaas wrote:
> Keith said:
> > The min/max vecs doesn't work correctly when using the irq_affinity
> > nr_sets because rebalancing the set counts is driver specific. To
> > get around that, drivers using nr_sets have to set min and max to
> > the same value and handle the "reduce and try again".
> 
> Sorry I saw that, but didn't follow it at first.  After a little
> archaeology, I see that 6da4b3ab9a6e ("genirq/affinity: Add support
> for allocating interrupt sets") added nr_sets and some validation
> tests (if affd.nr_sets, min_vecs == max_vecs) for using it in the API.
> 
> That's sort of a wart on the API, but I don't know if we should live
> with it or try to clean it up somehow.

Yeah, that interface is a bit awkward. I was thinking it would be nice to
thread a driver callback through to PCI so the driver can redistribute the
sets as needed, and let the PCI core handle the retries as before. I am
testing with the following, and it seems to work, but I'm getting some
unexpected warnings from blk-mq when I have nvme use it. Still
investigating that, but just throwing this out for early feedback.

---
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..e33abb167c19 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1035,13 +1035,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msi_enabled))
 		return -EINVAL;
 
@@ -1093,13 +1086,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * supported vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msix_enabled))
 		return -EINVAL;
 
@@ -1110,6 +1096,9 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 				return -ENOSPC;
 		}
 
+		if (nvec != maxvec && affd && affd->recalc_sets)
+			affd->recalc_sets(affd, nvec);
+
 		rc = __pci_enable_msix(dev, entries, nvec, affd);
 		if (rc == 0)
 			return nvec;
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c672f34235e7..326c9bd05f62 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -249,12 +249,16 @@ struct irq_affinity_notify {
  *			the MSI(-X) vector space
  * @nr_sets:		Length of passed in *sets array
  * @sets:		Number of affinitized sets
+ * @recalc_sets:	Recalculate sets when requested allocation failed
+ * @priv:		Driver private data
  */
 struct irq_affinity {
 	int	pre_vectors;
 	int	post_vectors;
 	int	nr_sets;
 	int	*sets;
+	void	(*recalc_sets)(struct irq_affinity *, unsigned int);
+	void	*priv;
 };
 
 /**
--
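A driver's `recalc_sets` callback under Keith's proposal might look like the sketch below. This is purely illustrative: the struct mirrors the fields the patch above adds, but the names (`irq_affinity_sketch`, `demo_recalc_sets`) and the even-split policy are assumptions, not anything from the patch or from NVMe.

```c
/*
 * Hypothetical recalc_sets callback in the spirit of the proposal:
 * redistribute a shrunken vector count across the queue sets.
 * Struct and policy are illustrative only.
 */
struct irq_affinity_sketch {
	int nr_sets;
	int *sets;
};

static void demo_recalc_sets(struct irq_affinity_sketch *affd,
			     unsigned int nvecs)
{
	/* Simple policy: split evenly, giving the remainder to set 0. */
	unsigned int per_set = nvecs / affd->nr_sets;
	int i;

	for (i = 0; i < affd->nr_sets; i++)
		affd->sets[i] = per_set;
	affd->sets[0] += nvecs % affd->nr_sets;
}
```

In the patched __pci_enable_msix_range() above, such a callback would run each time `nvec` shrinks below `maxvec`, letting the PCI core keep the retry loop while the driver owns the set-rebalancing policy.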

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..91b4f03fee91 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@  int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   const struct irq_affinity *affd)
 {
 	static const struct irq_affinity msi_default_affd;
-	int vecs = -ENOSPC;
+	int msix_vecs = -ENOSPC;
+	int msi_vecs = -ENOSPC;
 
 	if (flags & PCI_IRQ_AFFINITY) {
 		if (!affd)
@@ -1179,16 +1180,17 @@  int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSIX) {
-		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-				affd);
-		if (vecs > 0)
-			return vecs;
+		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+						    max_vecs, affd);
+		if (msix_vecs > 0)
+			return msix_vecs;
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-		if (vecs > 0)
-			return vecs;
+		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+						  affd);
+		if (msi_vecs > 0)
+			return msi_vecs;
 	}
 
 	/* use legacy irq if allowed */
@@ -1199,7 +1201,7 @@  int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 		}
 	}
 
-	return vecs;
+	return msix_vecs == -ENOSPC ? msix_vecs : msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);