mbox series

[V3,for,5.11,00/12] blk-mq/scsi: tracking device queue depth via sbitmap

Message ID 20200923013339.1621784-1-ming.lei@redhat.com (mailing list archive)
Headers show
Series blk-mq/scsi: tracking device queue depth via sbitmap | expand

Message

Ming Lei Sept. 23, 2020, 1:33 a.m. UTC
Hi,

scsi uses one global atomic variable to track queue depth for each
LUN/request queue. This way can't scale well when there is lots of CPU
cores and the disk is very fast. Broadcom guys has complained that their
high end HBA can't reach top performance because .device_busy is
operated in IO path.

Replace the atomic variable sdev->device_busy with sbitmap for
tracking scsi device queue depth.

Test on scsi_debug shows this way improve IOPS > 20%. Meantime
the IOPS difference is just ~1% compared with bypassing .device_busy
on scsi_debug via patches[1]

The 1st 6 patches moves percpu allocation hint into sbitmap, since
the improvement by doing percpu allocation hint on sbitmap is observable.
Meantime export helpers for SCSI.

Patch 7 and 8 prepares for the conversion by returning budget token
from .get_budget callback, meantime passes the budget token to driver
via 'struct blk_mq_queue_data' in .queue_rq().

The last four patches changes SCSI for switching to track device queue
depth via sbitmap.

The patchset have been tested by Broadcom, and obvious performance boost
can be observed.

Given it is based on both for-5.10/block and 5.10/scsi-queue, the target
is for v5.11. And it is posted out just for getting full/enough review.

Please comment and review!

V3:
	- rebase on both for-5.10/block and 5.10/scsi-queue.

V2:
	- fix one build failure


Ming Lei (12):
  sbitmap: remove sbitmap_clear_bit_unlock
  sbitmap: maintain allocation round_robin in sbitmap
  sbitmap: add helpers for updating allocation hint
  sbitmap: move allocation hint into sbitmap
  sbitmap: export sbitmap_weight
  sbitmap: add helper of sbitmap_calculate_shift
  blk-mq: add callbacks for storing & retrieving budget token
  blk-mq: return budget token from .get_budget callback
  scsi: put hot fields of scsi_host_template into one cacheline
  scsi: add scsi_device_busy() to read sdev->device_busy
  scsi: make sure sdev->queue_depth is <= shost->can_queue
  scsi: replace sdev->device_busy with sbitmap

 block/blk-mq-sched.c                 |  17 ++-
 block/blk-mq.c                       |  38 +++--
 block/blk-mq.h                       |  25 +++-
 block/kyber-iosched.c                |   3 +-
 drivers/message/fusion/mptsas.c      |   2 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |   2 +-
 drivers/scsi/scsi.c                  |   4 +
 drivers/scsi/scsi_lib.c              |  69 ++++++---
 drivers/scsi/scsi_priv.h             |   1 +
 drivers/scsi/scsi_scan.c             |  22 ++-
 drivers/scsi/scsi_sysfs.c            |   4 +-
 drivers/scsi/sg.c                    |   2 +-
 include/linux/blk-mq.h               |  13 +-
 include/linux/sbitmap.h              |  84 +++++++----
 include/scsi/scsi_cmnd.h             |   2 +
 include/scsi/scsi_device.h           |   8 +-
 include/scsi/scsi_host.h             |  72 ++++-----
 lib/sbitmap.c                        | 213 +++++++++++++++------------
 18 files changed, 376 insertions(+), 205 deletions(-)

Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>

Comments

Sumanesh Samanta Sept. 23, 2020, 9:31 p.m. UTC | #1
Hi Ming,

I have tested this patch extensively in our labs.

This patch gives excellent results when a single device can provide
very high IOPs, and only a few of those devices are available on the
system.
Thus, if a RAID 0 volume is created out of many high end NVMe devices,
then that RAID0 volume can potentially reach a max IOPs that is a
summation of the maxs IOPS for all the underlying drives. Without this
patch, the current kernel code cannot get there.

For example, for a simple RAID0 volume with 32 NVMe drives, I got
almost 100% performance boost with this patch.
The NVMe stack does not have this limitation, and this patch goes a
long way in closing that gap.

I have also tested it in many other configurations, and  did not see
any adverse side effects.

Please feel free to add:
Tested-by: Sumanesh Samanta

Thanks,
Sumanesh




On Tue, Sep 22, 2020 at 7:33 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Hi,
>
> scsi uses one global atomic variable to track queue depth for each
> LUN/request queue. This way can't scale well when there is lots of CPU
> cores and the disk is very fast. Broadcom guys has complained that their
> high end HBA can't reach top performance because .device_busy is
> operated in IO path.
>
> Replace the atomic variable sdev->device_busy with sbitmap for
> tracking scsi device queue depth.
>
> Test on scsi_debug shows this way improve IOPS > 20%. Meantime
> the IOPS difference is just ~1% compared with bypassing .device_busy
> on scsi_debug via patches[1]
>
> The 1st 6 patches moves percpu allocation hint into sbitmap, since
> the improvement by doing percpu allocation hint on sbitmap is observable.
> Meantime export helpers for SCSI.
>
> Patch 7 and 8 prepares for the conversion by returning budget token
> from .get_budget callback, meantime passes the budget token to driver
> via 'struct blk_mq_queue_data' in .queue_rq().
>
> The last four patches changes SCSI for switching to track device queue
> depth via sbitmap.
>
> The patchset have been tested by Broadcom, and obvious performance boost
> can be observed.
>
> Given it is based on both for-5.10/block and 5.10/scsi-queue, the target
> is for v5.11. And it is posted out just for getting full/enough review.
>
> Please comment and review!
>
> V3:
>         - rebase on both for-5.10/block and 5.10/scsi-queue.
>
> V2:
>         - fix one build failure
>
>
> Ming Lei (12):
>   sbitmap: remove sbitmap_clear_bit_unlock
>   sbitmap: maintain allocation round_robin in sbitmap
>   sbitmap: add helpers for updating allocation hint
>   sbitmap: move allocation hint into sbitmap
>   sbitmap: export sbitmap_weight
>   sbitmap: add helper of sbitmap_calculate_shift
>   blk-mq: add callbacks for storing & retrieving budget token
>   blk-mq: return budget token from .get_budget callback
>   scsi: put hot fields of scsi_host_template into one cacheline
>   scsi: add scsi_device_busy() to read sdev->device_busy
>   scsi: make sure sdev->queue_depth is <= shost->can_queue
>   scsi: replace sdev->device_busy with sbitmap
>
>  block/blk-mq-sched.c                 |  17 ++-
>  block/blk-mq.c                       |  38 +++--
>  block/blk-mq.h                       |  25 +++-
>  block/kyber-iosched.c                |   3 +-
>  drivers/message/fusion/mptsas.c      |   2 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |   2 +-
>  drivers/scsi/scsi.c                  |   4 +
>  drivers/scsi/scsi_lib.c              |  69 ++++++---
>  drivers/scsi/scsi_priv.h             |   1 +
>  drivers/scsi/scsi_scan.c             |  22 ++-
>  drivers/scsi/scsi_sysfs.c            |   4 +-
>  drivers/scsi/sg.c                    |   2 +-
>  include/linux/blk-mq.h               |  13 +-
>  include/linux/sbitmap.h              |  84 +++++++----
>  include/scsi/scsi_cmnd.h             |   2 +
>  include/scsi/scsi_device.h           |   8 +-
>  include/scsi/scsi_host.h             |  72 ++++-----
>  lib/sbitmap.c                        | 213 +++++++++++++++------------
>  18 files changed, 376 insertions(+), 205 deletions(-)
>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
> Cc: Ewan D. Milne <emilne@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
>
> --
> 2.25.2
>