[v3,4/8] blk-mp: introduce blk_mq_hctx_map_queues

Message ID 20241112-refactor-blk-affinity-helpers-v3-4-573bfca0cbd8@kernel.org (mailing list archive)
State New, archived
Series blk: refactor queue affinity helpers

Commit Message

Daniel Wagner Nov. 12, 2024, 1:26 p.m. UTC
blk_mq_pci_map_queues and blk_mq_virtio_map_queues create a CPU to
hardware queue mapping based on affinity information. These two functions
share common code and only differ in how the affinity information is
retrieved. Also, those functions are located in the block subsystem
where they don't really fit. They are virtio and pci subsystem
specific.

Thus introduce a generic mapping function which uses the
irq_get_affinity callback from bus_type.

Original idea from Ming Lei <ming.lei@redhat.com>

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 block/blk-mq-cpumap.c  | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  2 ++
 2 files changed, 39 insertions(+)

Comments

Greg KH Nov. 12, 2024, 1:58 p.m. UTC | #1
On Tue, Nov 12, 2024 at 02:26:19PM +0100, Daniel Wagner wrote:
> blk_mq_pci_map_queues and blk_mq_virtio_map_queues create a CPU to
> hardware queue mapping based on affinity information. These two functions
> share common code and only differ in how the affinity information is
> retrieved. Also, those functions are located in the block subsystem
> where they don't really fit. They are virtio and pci subsystem
> specific.
> 
> Thus introduce a generic mapping function which uses the
> irq_get_affinity callback from bus_type.
> 
> Original idea from Ming Lei <ming.lei@redhat.com>
> 
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>  block/blk-mq-cpumap.c  | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/blk-mq.h |  2 ++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9638b25fd52124f0173e968ebdca5f1fe0b42ad9..db22a7d523a2762b76398fdd768f55efd1d6d669 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -11,6 +11,7 @@
>  #include <linux/smp.h>
>  #include <linux/cpu.h>
>  #include <linux/group_cpus.h>
> +#include <linux/device/bus.h>
>  
>  #include "blk.h"
>  #include "blk-mq.h"
> @@ -54,3 +55,39 @@ int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
>  
>  	return NUMA_NO_NODE;
>  }
> +
> +/**
> + * blk_mq_hctx_map_queues - Create CPU to hardware queue mapping
> + * @qmap:	CPU to hardware queue map.
> + * @dev:	The device to map queues.
> + * @offset:	Queue offset to use for the device.
> + *
> + * Create a CPU to hardware queue mapping in @qmap. The struct bus_type
> + * irq_get_affinity callback will be used to retrieve the affinity.
> + */
> +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
> +			    struct device *dev, unsigned int offset)
> +
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	if (!dev->bus->irq_get_affinity)
> +		goto fallback;

I think this is better than hard-coding it, but are you sure that the
bus will always be bound to the device here so that you have a valid
bus-> pointer?

thanks,

greg k-h
Daniel Wagner Nov. 12, 2024, 3:33 p.m. UTC | #2
On Tue, Nov 12, 2024 at 02:58:43PM +0100, Greg Kroah-Hartman wrote:
> > +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
> > +			    struct device *dev, unsigned int offset)
> > +
> > +{
> > +	const struct cpumask *mask;
> > +	unsigned int queue, cpu;
> > +
> > +	if (!dev->bus->irq_get_affinity)
> > +		goto fallback;
> 
> I think this is better than hard-coding it, but are you sure that the
> bus will always be bound to the device here so that you have a valid
> bus-> pointer?

No, I just assumed the bus pointer is always valid. If it is possible to
have a device without a bus, then I'd better extend the condition to

	if (!dev->bus || !dev->bus->irq_get_affinity)
		goto fallback;
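
The extended condition relies on short-circuit evaluation: dev->bus is only dereferenced once it is known to be non-NULL. A minimal userspace sketch of that guard follows. The *_model types and dummy_affinity are hypothetical stand-ins for struct device, struct bus_type and a bus callback; this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all *_model names are hypothetical. */
struct cpumask_model { unsigned long bits; };
struct device_model;

struct bus_model {
	const struct cpumask_model *(*irq_get_affinity)(struct device_model *dev,
							unsigned int irq_vec);
};

struct device_model {
	const struct bus_model *bus;	/* NULL models a device without a bus */
};

/* Placeholder callback so a bus can advertise affinity support. */
const struct cpumask_model *dummy_affinity(struct device_model *dev,
					   unsigned int irq_vec)
{
	(void)dev;
	(void)irq_vec;
	return NULL;
}

/*
 * Negation of the extended condition: true only when both the bus pointer
 * and its callback are set. && short-circuits, so dev->bus is never
 * dereferenced when it is NULL.
 */
bool has_bus_affinity(const struct device_model *dev)
{
	assert(dev != NULL);
	return dev->bus != NULL && dev->bus->irq_get_affinity != NULL;
}
```

With this guard, a device without a bus and a bus without the callback both take the fallback path instead of faulting on a NULL pointer.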
Greg KH Nov. 12, 2024, 3:42 p.m. UTC | #3
On Tue, Nov 12, 2024 at 04:33:09PM +0100, Daniel Wagner wrote:
> On Tue, Nov 12, 2024 at 02:58:43PM +0100, Greg Kroah-Hartman wrote:
> > > +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
> > > +			    struct device *dev, unsigned int offset)
> > > +
> > > +{
> > > +	const struct cpumask *mask;
> > > +	unsigned int queue, cpu;
> > > +
> > > +	if (!dev->bus->irq_get_affinity)
> > > +		goto fallback;
> > 
> > I think this is better than hard-coding it, but are you sure that the
> > bus will always be bound to the device here so that you have a valid
> > bus-> pointer?
> 
> No, I just assumed the bus pointer is always valid. If it is possible to
> have a device without a bus, then I'd better extend the condition to
> 
> 	if (!dev->bus || !dev->bus->irq_get_affinity)
> 		goto fallback;

I don't know if it's possible as I don't know what codepaths are calling
this, it was hard to unwind.  But you should check "just to be safe" :)

thanks,

greg k-h
Daniel Wagner Nov. 12, 2024, 4:15 p.m. UTC | #4
On Tue, Nov 12, 2024 at 04:42:40PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Nov 12, 2024 at 04:33:09PM +0100, Daniel Wagner wrote:
> > On Tue, Nov 12, 2024 at 02:58:43PM +0100, Greg Kroah-Hartman wrote:
> > > > +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
> > > > +			    struct device *dev, unsigned int offset)
> > > > +
> > > > +{
> > > > +	const struct cpumask *mask;
> > > > +	unsigned int queue, cpu;
> > > > +
> > > > +	if (!dev->bus->irq_get_affinity)
> > > > +		goto fallback;
> > > 
> > > I think this is better than hard-coding it, but are you sure that the
> > > bus will always be bound to the device here so that you have a valid
> > > bus-> pointer?
> > 
> > No, I just assumed the bus pointer is always valid. If it is possible to
> > have a device without a bus, then I'd better extend the condition to
> > 
> > 	if (!dev->bus || !dev->bus->irq_get_affinity)
> > 		goto fallback;
> 
> I don't know if it's possible as I don't know what codepaths are calling
> this, it was hard to unwind.  But you should check "just to be safe" :)

The main path to map_queues is via the probe functions. There are some
more paths, like when updating a tagset after the number of queues
changes, but that is all after the probe function.

nvme_probe
  nvme_alloc_admin_tag_set
    blk_mq_alloc_tag_set
      blk_mq_update_queue_map
        set->ops->map_queues
          blk_mq_hctx_map_queues
  nvme_alloc_io_tag_set
    blk_mq_alloc_tag_set
      blk_mq_update_queue_map
        set->ops->map_queues
          blk_mq_hctx_map_queues

virtscsi_probe, hisi_sas_v3_probe, ...
  scsi_add_host
    scsi_add_host_with_dma
      scsi_mq_setup_tags
        blk_mq_alloc_tag_set
          blk_mq_update_queue_map
            set->ops->map_queues
              blk_mq_hctx_map_queues

virtblk_probe
  blk_mq_alloc_tag_set
    blk_mq_update_queue_map
      set->ops->map_queues
        blk_mq_hctx_map_queues

Does this help?
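
The call chains above all end in the same dispatch step: tag set allocation updates the queue map, which invokes the driver's map_queues callback, which in turn calls the mapping helper. A rough userspace sketch of that dispatch follows. Every *_model name is hypothetical, the round-robin helper is only a placeholder for the real affinity-based mapping, and the default branch is a simplification made for this sketch rather than what the block layer does.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

struct qmap_model {
	unsigned int mq_map[NR_CPUS_MODEL];
	unsigned int nr_queues;
};

struct tag_set_model;

struct mq_ops_model {
	void (*map_queues)(struct tag_set_model *set);	/* optional callback */
};

struct tag_set_model {
	const struct mq_ops_model *ops;
	struct qmap_model map;
};

int helper_calls_model;	/* counts how often the helper stand-in ran */

/* Stand-in for the mapping helper: spreads CPUs round-robin over queues. */
void hctx_map_queues_model(struct qmap_model *qmap)
{
	unsigned int cpu;

	assert(qmap != NULL && qmap->nr_queues > 0);
	helper_calls_model++;
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		qmap->mq_map[cpu] = cpu % qmap->nr_queues;
}

/* Driver-side callback, the "set->ops->map_queues" step in the chains. */
void driver_map_queues_model(struct tag_set_model *set)
{
	hctx_map_queues_model(&set->map);
}

/* Models the update step: dispatch to the driver callback if one is set. */
void update_queue_map_model(struct tag_set_model *set)
{
	unsigned int cpu;

	if (set->ops && set->ops->map_queues) {
		set->ops->map_queues(set);
		return;
	}
	/* Placeholder default for this sketch: every CPU uses queue 0. */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		set->map.mq_map[cpu] = 0;
}

/* Models tag set allocation as called from a probe function. */
int alloc_tag_set_model(struct tag_set_model *set)
{
	update_queue_map_model(set);
	return 0;
}
```

In this model the helper runs only when the driver supplies a map_queues callback, and it always runs from inside the probe-time allocation.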
Greg KH Nov. 12, 2024, 4:53 p.m. UTC | #5
On Tue, Nov 12, 2024 at 05:15:31PM +0100, Daniel Wagner wrote:
> On Tue, Nov 12, 2024 at 04:42:40PM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Nov 12, 2024 at 04:33:09PM +0100, Daniel Wagner wrote:
> > > On Tue, Nov 12, 2024 at 02:58:43PM +0100, Greg Kroah-Hartman wrote:
> > > > > +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
> > > > > +			    struct device *dev, unsigned int offset)
> > > > > +
> > > > > +{
> > > > > +	const struct cpumask *mask;
> > > > > +	unsigned int queue, cpu;
> > > > > +
> > > > > +	if (!dev->bus->irq_get_affinity)
> > > > > +		goto fallback;
> > > > 
> > > > I think this is better than hard-coding it, but are you sure that the
> > > > bus will always be bound to the device here so that you have a valid
> > > > bus-> pointer?
> > > 
> > > No, I just assumed the bus pointer is always valid. If it is possible to
> > > have a device without a bus, then I'd better extend the condition to
> > > 
> > > 	if (!dev->bus || !dev->bus->irq_get_affinity)
> > > 		goto fallback;
> > 
> > I don't know if it's possible as I don't know what codepaths are calling
> > this, it was hard to unwind.  But you should check "just to be safe" :)
> 
> The main path to map_queues is via the probe functions. There are some
> more paths, like when updating a tagset after the number of queues
> changes, but that is all after the probe function.
> 
> nvme_probe
>   nvme_alloc_admin_tag_set
>     blk_mq_alloc_tag_set
>       blk_mq_update_queue_map
>         set->ops->map_queues
>           blk_mq_hctx_map_queues
>   nvme_alloc_io_tag_set
>     blk_mq_alloc_tag_set
>       blk_mq_update_queue_map
>         set->ops->map_queues
>           blk_mq_hctx_map_queues
> 
> virtscsi_probe, hisi_sas_v3_probe, ...
>   scsi_add_host
>     scsi_add_host_with_dma
>       scsi_mq_setup_tags
>         blk_mq_alloc_tag_set
>           blk_mq_update_queue_map
>             set->ops->map_queues
>               blk_mq_hctx_map_queues
> 
> virtblk_probe
>   blk_mq_alloc_tag_set
>     blk_mq_update_queue_map
>       set->ops->map_queues
>         blk_mq_hctx_map_queues
> 
> Does this help?

Ok, that seems fine.  Worst case, you crash and it's obvious that it
needs to be checked in the future :)

thanks,

greg k-h
Christoph Hellwig Nov. 12, 2024, 4:56 p.m. UTC | #6
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Daniel Wagner Nov. 12, 2024, 6:25 p.m. UTC | #7
The subject prefix obviously has a typo; it should start with 'blk-mq:'.
Hannes Reinecke Nov. 13, 2024, 9:48 a.m. UTC | #8
On 11/12/24 14:26, Daniel Wagner wrote:
> blk_mq_pci_map_queues and blk_mq_virtio_map_queues create a CPU to
> hardware queue mapping based on affinity information. These two functions
> share common code and only differ in how the affinity information is
> retrieved. Also, those functions are located in the block subsystem
> where they don't really fit. They are virtio and pci subsystem
> specific.
> 
> Thus introduce a generic mapping function which uses the
> irq_get_affinity callback from bus_type.
> 
> Original idea from Ming Lei <ming.lei@redhat.com>
> 
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>   block/blk-mq-cpumap.c  | 37 +++++++++++++++++++++++++++++++++++++
>   include/linux/blk-mq.h |  2 ++
>   2 files changed, 39 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Patch

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9638b25fd52124f0173e968ebdca5f1fe0b42ad9..db22a7d523a2762b76398fdd768f55efd1d6d669 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -11,6 +11,7 @@ 
 #include <linux/smp.h>
 #include <linux/cpu.h>
 #include <linux/group_cpus.h>
+#include <linux/device/bus.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -54,3 +55,39 @@  int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
 
 	return NUMA_NO_NODE;
 }
+
+/**
+ * blk_mq_hctx_map_queues - Create CPU to hardware queue mapping
+ * @qmap:	CPU to hardware queue map.
+ * @dev:	The device to map queues.
+ * @offset:	Queue offset to use for the device.
+ *
+ * Create a CPU to hardware queue mapping in @qmap. The struct bus_type
+ * irq_get_affinity callback will be used to retrieve the affinity.
+ */
+void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
+			    struct device *dev, unsigned int offset)
+
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	if (!dev->bus->irq_get_affinity)
+		goto fallback;
+
+	for (queue = 0; queue < qmap->nr_queues; queue++) {
+		mask = dev->bus->irq_get_affinity(dev, queue + offset);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			qmap->mq_map[cpu] = qmap->queue_offset + queue;
+	}
+
+	return;
+
+fallback:
+	WARN_ON_ONCE(qmap->nr_queues > 1);
+	blk_mq_clear_mq_map(qmap);
+}
+EXPORT_SYMBOL_GPL(blk_mq_hctx_map_queues);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2035fad3131fb60781957095ce8a3a941dd104be..1a85fdcb443c154390cd29f2b1f2a807bf10bfe3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -923,6 +923,8 @@  void blk_mq_unfreeze_queue_non_owner(struct request_queue *q);
 void blk_freeze_queue_start_non_owner(struct request_queue *q);
 
 void blk_mq_map_queues(struct blk_mq_queue_map *qmap);
+void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
+			    struct device *dev, unsigned int offset);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
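
To make the mapping loop and its fallback easier to follow outside the kernel, here is a small userspace model of the logic in blk_mq_hctx_map_queues. It is a sketch under stated assumptions: the *_model types, the fixed 8-CPU bitmask, and example_affinity are hypothetical, the bus callback is replaced by a plain function pointer, and the WARN_ON_ONCE from the patch is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 8

struct cpumask_model { unsigned long bits; };	/* bit n set => CPU n */

struct qmap_model {
	unsigned int mq_map[NR_CPUS_MODEL];
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/* Hypothetical replacement for the bus irq_get_affinity callback. */
typedef const struct cpumask_model *(*affinity_fn)(unsigned int vec);

/* Example lookup: vector 1 serves CPUs 0-3, vector 2 serves CPUs 4-7. */
const struct cpumask_model *example_affinity(unsigned int vec)
{
	static const struct cpumask_model masks[] = { { 0x0fUL }, { 0xf0UL } };

	if (vec < 1 || vec > 2)
		return NULL;
	return &masks[vec - 1];
}

/* Fallback: point every CPU at the first queue of this map. */
void clear_map_model(struct qmap_model *qmap)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		qmap->mq_map[cpu] = qmap->queue_offset;
}

/* Same control flow as the patch: map per-queue masks or fall back. */
void map_queues_model(struct qmap_model *qmap, affinity_fn get_affinity,
		      unsigned int offset)
{
	unsigned int queue, cpu;

	assert(qmap != NULL);
	if (!get_affinity)
		goto fallback;

	for (queue = 0; queue < qmap->nr_queues; queue++) {
		const struct cpumask_model *mask = get_affinity(queue + offset);

		if (!mask)
			goto fallback;

		for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
			if (mask->bits & (1UL << cpu))
				qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}
	return;

fallback:
	clear_map_model(qmap);
}
```

With two queues and an offset of 1, the model assigns CPUs 0-3 to queue 0 and CPUs 4-7 to queue 1. A missing lookup function, or a NULL mask for any queue, resets every CPU to queue_offset, including CPUs that earlier loop iterations had already mapped.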