mbox series

[v9,0/5] Add MMC software queue support

Message ID cover.1581478568.git.baolin.wang7@gmail.com (mailing list archive)
Headers show
Series Add MMC software queue support | expand

Message

Baolin Wang Feb. 12, 2020, 4:12 a.m. UTC
Hi All,

Now the MMC read/write stack will always wait for previous request is
completed by mmc_blk_rw_wait(), before sending a new request to hardware,
or queue a work to complete request, that will bring context switching
overhead, especially for high I/O per second rates, to affect the IO
performance.

Thus this patch set will introduce the MMC software command queue support
based on command queue engine's interfaces, and set the queue depth as 64
to allow more requests can be be prepared, merged and inserted into IO
scheduler, but we only allow 2 requests in flight, that is enough to let
the irq handler always trigger the next request without a context switch,
as well as avoiding a long latency.

Moreover we can expand the MMC software queue interface to support
MMC packed request or packed command instead of adding new interfaces,
according to previosus discussion.

Below are some comparison data with fio tool. The fio command I used
is like below with changing the '--rw' parameter and enabling the direct
IO flag to measure the actual hardware transfer speed in 4K block size.

./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read

My eMMC card working at HS400 Enhanced strobe mode:
[    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB 
[    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
[    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
[    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)

1. Without MMC software queue
I tested 5 times for each case and output a average speed.

1) Sequential read:
Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s
Average speed: 59.66MiB/s

2) Random read:
Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s
Average speed: 27.04MiB/s

3) Sequential write:
Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s
Average speed: 69.68MiB/s

4) Random write:
Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
Average speed: 35.96MiB/s

2. With MMC software queue
I tested 5 times for each case and output a average speed.

1) Sequential read:
Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
Average speed: 60.68MiB/s

2) Random read:
Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
Average speed: 31.36MiB/s

3) Sequential write:
Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s
Average speed: 71.66MiB/s

4) Random write:
Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
Average speed: 68.76MiB/s

Form above data, we can see the MMC software queue can help to improve some
performance obviously for random read and write, though no obvious improvement
for sequential read and write.

Any comments are welcome. Thanks a lot.

Changes from v8:
 - Add more description in the commit message.
 - Optimize the failure log when calling cqe_enable(). 

Changes from v7:
 - Add reviewed tag from Arnd.
 - Use the 'hsq' acronym for varibles and functions in the core layer.
 - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE
 can work normally.
 - Add a new patch to enable the host software queue for the SD card.
 - Use the default MMC queue depth for host software queue.

Changes from v6:
 - Change the patch order and set host->always_defer_done = true for the
 Spreadtrum host driver.

Changes from v5:
 - Modify the condition of defering to complete request suggested by Adrian.

Changes from v4:
 - Add a seperate patch to introduce a variable to defer to complete
 data requests for some host drivers, when using host software queue.

Changes from v3:
 - Use host software queue instead of sqhci.
 - Fix random config building issue.
 - Change queue depth to 32, but still only allow 2 requests in flight.
 - Update the testing data.

Changes from v2:
 - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot',
 instead adding 'struct sqhci_host', which is only used by software queue.

Changes from v1:
 - Add request_done ops for sdhci_ops.
 - Replace virtual command queue with software queue for functions and
 variables.
 - Rename the software queue file and add sqhci.h header file.

Baolin Wang (5):
  mmc: Add MMC host software queue support
  mmc: core: Enable the MMC host software queue for the SD card
  mmc: host: sdhci: Add request_done ops for struct sdhci_ops
  mmc: host: sdhci: Add a variable to defer to complete requests if
    needed
  mmc: host: sdhci-sprd: Add software queue support

 drivers/mmc/core/block.c      |   61 ++++++++
 drivers/mmc/core/mmc.c        |   18 ++-
 drivers/mmc/core/queue.c      |   22 ++-
 drivers/mmc/core/sd.c         |   10 ++
 drivers/mmc/host/Kconfig      |    8 +
 drivers/mmc/host/Makefile     |    1 +
 drivers/mmc/host/cqhci.c      |    8 +-
 drivers/mmc/host/mmc_hsq.c    |  343 +++++++++++++++++++++++++++++++++++++++++
 drivers/mmc/host/mmc_hsq.h    |   30 ++++
 drivers/mmc/host/sdhci-sprd.c |   28 ++++
 drivers/mmc/host/sdhci.c      |   14 +-
 drivers/mmc/host/sdhci.h      |    3 +
 include/linux/mmc/host.h      |    3 +
 13 files changed, 534 insertions(+), 15 deletions(-)
 create mode 100644 drivers/mmc/host/mmc_hsq.c
 create mode 100644 drivers/mmc/host/mmc_hsq.h

Comments

Ulf Hansson Feb. 18, 2020, 11:38 p.m. UTC | #1
On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> wrote:
>
> Hi All,
>
> Now the MMC read/write stack will always wait for previous request is
> completed by mmc_blk_rw_wait(), before sending a new request to hardware,
> or queue a work to complete request, that will bring context switching
> overhead, especially for high I/O per second rates, to affect the IO
> performance.
>
> Thus this patch set will introduce the MMC software command queue support
> based on command queue engine's interfaces, and set the queue depth as 64
> to allow more requests can be be prepared, merged and inserted into IO
> scheduler, but we only allow 2 requests in flight, that is enough to let
> the irq handler always trigger the next request without a context switch,
> as well as avoiding a long latency.
>
> Moreover we can expand the MMC software queue interface to support
> MMC packed request or packed command instead of adding new interfaces,
> according to previosus discussion.
>
> Below are some comparison data with fio tool. The fio command I used
> is like below with changing the '--rw' parameter and enabling the direct
> IO flag to measure the actual hardware transfer speed in 4K block size.
>
> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read
>
> My eMMC card working at HS400 Enhanced strobe mode:
> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
>
> 1. Without MMC software queue
> I tested 5 times for each case and output a average speed.
>
> 1) Sequential read:
> Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s
> Average speed: 59.66MiB/s
>
> 2) Random read:
> Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s
> Average speed: 27.04MiB/s
>
> 3) Sequential write:
> Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s
> Average speed: 69.68MiB/s
>
> 4) Random write:
> Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
> Average speed: 35.96MiB/s
>
> 2. With MMC software queue
> I tested 5 times for each case and output a average speed.
>
> 1) Sequential read:
> Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
> Average speed: 60.68MiB/s
>
> 2) Random read:
> Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
> Average speed: 31.36MiB/s
>
> 3) Sequential write:
> Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s
> Average speed: 71.66MiB/s
>
> 4) Random write:
> Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
> Average speed: 68.76MiB/s
>
> Form above data, we can see the MMC software queue can help to improve some
> performance obviously for random read and write, though no obvious improvement
> for sequential read and write.
>
> Any comments are welcome. Thanks a lot.
>
> Changes from v8:
>  - Add more description in the commit message.
>  - Optimize the failure log when calling cqe_enable().
>
> Changes from v7:
>  - Add reviewed tag from Arnd.
>  - Use the 'hsq' acronym for varibles and functions in the core layer.
>  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE
>  can work normally.
>  - Add a new patch to enable the host software queue for the SD card.
>  - Use the default MMC queue depth for host software queue.
>
> Changes from v6:
>  - Change the patch order and set host->always_defer_done = true for the
>  Spreadtrum host driver.
>
> Changes from v5:
>  - Modify the condition of defering to complete request suggested by Adrian.
>
> Changes from v4:
>  - Add a seperate patch to introduce a variable to defer to complete
>  data requests for some host drivers, when using host software queue.
>
> Changes from v3:
>  - Use host software queue instead of sqhci.
>  - Fix random config building issue.
>  - Change queue depth to 32, but still only allow 2 requests in flight.
>  - Update the testing data.
>
> Changes from v2:
>  - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot',
>  instead adding 'struct sqhci_host', which is only used by software queue.
>
> Changes from v1:
>  - Add request_done ops for sdhci_ops.
>  - Replace virtual command queue with software queue for functions and
>  variables.
>  - Rename the software queue file and add sqhci.h header file.
>
> Baolin Wang (5):
>   mmc: Add MMC host software queue support
>   mmc: core: Enable the MMC host software queue for the SD card
>   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
>   mmc: host: sdhci: Add a variable to defer to complete requests if
>     needed
>   mmc: host: sdhci-sprd: Add software queue support
>
>  drivers/mmc/core/block.c      |   61 ++++++++
>  drivers/mmc/core/mmc.c        |   18 ++-
>  drivers/mmc/core/queue.c      |   22 ++-
>  drivers/mmc/core/sd.c         |   10 ++
>  drivers/mmc/host/Kconfig      |    8 +
>  drivers/mmc/host/Makefile     |    1 +
>  drivers/mmc/host/cqhci.c      |    8 +-
>  drivers/mmc/host/mmc_hsq.c    |  343 +++++++++++++++++++++++++++++++++++++++++
>  drivers/mmc/host/mmc_hsq.h    |   30 ++++
>  drivers/mmc/host/sdhci-sprd.c |   28 ++++
>  drivers/mmc/host/sdhci.c      |   14 +-
>  drivers/mmc/host/sdhci.h      |    3 +
>  include/linux/mmc/host.h      |    3 +
>  13 files changed, 534 insertions(+), 15 deletions(-)
>  create mode 100644 drivers/mmc/host/mmc_hsq.c
>  create mode 100644 drivers/mmc/host/mmc_hsq.h
>
> --
> 1.7.9.5
>

Applied for next, thanks! Also, thanks for your patience while moving
forward during the reviews!

Note, I did some amending of patch1 to resolve some checkpatch
warnings. SPDX licence and Kconfig help texts, please have a look and
tell if there are something that doesn't look good.

Kind regards
Uffe
Baolin Wang Feb. 19, 2020, 1:35 a.m. UTC | #2
On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> wrote:
> >
> > Hi All,
> >
> > Now the MMC read/write stack will always wait for previous request is
> > completed by mmc_blk_rw_wait(), before sending a new request to hardware,
> > or queue a work to complete request, that will bring context switching
> > overhead, especially for high I/O per second rates, to affect the IO
> > performance.
> >
> > Thus this patch set will introduce the MMC software command queue support
> > based on command queue engine's interfaces, and set the queue depth as 64
> > to allow more requests can be be prepared, merged and inserted into IO
> > scheduler, but we only allow 2 requests in flight, that is enough to let
> > the irq handler always trigger the next request without a context switch,
> > as well as avoiding a long latency.
> >
> > Moreover we can expand the MMC software queue interface to support
> > MMC packed request or packed command instead of adding new interfaces,
> > according to previosus discussion.
> >
> > Below are some comparison data with fio tool. The fio command I used
> > is like below with changing the '--rw' parameter and enabling the direct
> > IO flag to measure the actual hardware transfer speed in 4K block size.
> >
> > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read
> >
> > My eMMC card working at HS400 Enhanced strobe mode:
> > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
> >
> > 1. Without MMC software queue
> > I tested 5 times for each case and output a average speed.
> >
> > 1) Sequential read:
> > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s
> > Average speed: 59.66MiB/s
> >
> > 2) Random read:
> > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s
> > Average speed: 27.04MiB/s
> >
> > 3) Sequential write:
> > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s
> > Average speed: 69.68MiB/s
> >
> > 4) Random write:
> > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
> > Average speed: 35.96MiB/s
> >
> > 2. With MMC software queue
> > I tested 5 times for each case and output a average speed.
> >
> > 1) Sequential read:
> > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
> > Average speed: 60.68MiB/s
> >
> > 2) Random read:
> > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
> > Average speed: 31.36MiB/s
> >
> > 3) Sequential write:
> > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s
> > Average speed: 71.66MiB/s
> >
> > 4) Random write:
> > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
> > Average speed: 68.76MiB/s
> >
> > Form above data, we can see the MMC software queue can help to improve some
> > performance obviously for random read and write, though no obvious improvement
> > for sequential read and write.
> >
> > Any comments are welcome. Thanks a lot.
> >
> > Changes from v8:
> >  - Add more description in the commit message.
> >  - Optimize the failure log when calling cqe_enable().
> >
> > Changes from v7:
> >  - Add reviewed tag from Arnd.
> >  - Use the 'hsq' acronym for varibles and functions in the core layer.
> >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE
> >  can work normally.
> >  - Add a new patch to enable the host software queue for the SD card.
> >  - Use the default MMC queue depth for host software queue.
> >
> > Changes from v6:
> >  - Change the patch order and set host->always_defer_done = true for the
> >  Spreadtrum host driver.
> >
> > Changes from v5:
> >  - Modify the condition of defering to complete request suggested by Adrian.
> >
> > Changes from v4:
> >  - Add a seperate patch to introduce a variable to defer to complete
> >  data requests for some host drivers, when using host software queue.
> >
> > Changes from v3:
> >  - Use host software queue instead of sqhci.
> >  - Fix random config building issue.
> >  - Change queue depth to 32, but still only allow 2 requests in flight.
> >  - Update the testing data.
> >
> > Changes from v2:
> >  - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot',
> >  instead adding 'struct sqhci_host', which is only used by software queue.
> >
> > Changes from v1:
> >  - Add request_done ops for sdhci_ops.
> >  - Replace virtual command queue with software queue for functions and
> >  variables.
> >  - Rename the software queue file and add sqhci.h header file.
> >
> > Baolin Wang (5):
> >   mmc: Add MMC host software queue support
> >   mmc: core: Enable the MMC host software queue for the SD card
> >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> >   mmc: host: sdhci: Add a variable to defer to complete requests if
> >     needed
> >   mmc: host: sdhci-sprd: Add software queue support
> >
> >  drivers/mmc/core/block.c      |   61 ++++++++
> >  drivers/mmc/core/mmc.c        |   18 ++-
> >  drivers/mmc/core/queue.c      |   22 ++-
> >  drivers/mmc/core/sd.c         |   10 ++
> >  drivers/mmc/host/Kconfig      |    8 +
> >  drivers/mmc/host/Makefile     |    1 +
> >  drivers/mmc/host/cqhci.c      |    8 +-
> >  drivers/mmc/host/mmc_hsq.c    |  343 +++++++++++++++++++++++++++++++++++++++++
> >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> >  drivers/mmc/host/sdhci.c      |   14 +-
> >  drivers/mmc/host/sdhci.h      |    3 +
> >  include/linux/mmc/host.h      |    3 +
> >  13 files changed, 534 insertions(+), 15 deletions(-)
> >  create mode 100644 drivers/mmc/host/mmc_hsq.c
> >  create mode 100644 drivers/mmc/host/mmc_hsq.h
> >
> > --
> > 1.7.9.5
> >
>
> Applied for next, thanks! Also, thanks for your patience while moving
> forward during the reviews!

I am very appreciated for you and Arnd's good sugestion when
introducing the hsq.

>
> Note, I did some amending of patch1 to resolve some checkpatch
> warnings. SPDX licence and Kconfig help texts, please have a look and
> tell if there are something that doesn't look good.

Thanks for your help and looks good to me.
Bough Chen June 8, 2020, 6:35 a.m. UTC | #3
> -----Original Message-----
> From: linux-mmc-owner@vger.kernel.org
> [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang
> Sent: 2020年2月19日 9:35
> To: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das
> <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; Chunyan
> Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus
> Walleij <linus.walleij@linaro.org>; Baolin Wang <baolin.wang@linaro.org>;
> linux-mmc@vger.kernel.org; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> 
> On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org>
> wrote:
> >
> > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com>
> wrote:
> > >
> > > Hi All,
> > >
> > > Now the MMC read/write stack will always wait for previous request
> > > is completed by mmc_blk_rw_wait(), before sending a new request to
> > > hardware, or queue a work to complete request, that will bring
> > > context switching overhead, especially for high I/O per second
> > > rates, to affect the IO performance.
> > >
> > > Thus this patch set will introduce the MMC software command queue
> > > support based on command queue engine's interfaces, and set the
> > > queue depth as 64 to allow more requests can be be prepared, merged
> > > and inserted into IO scheduler, but we only allow 2 requests in
> > > flight, that is enough to let the irq handler always trigger the
> > > next request without a context switch, as well as avoiding a long latency.
> > >
> > > Moreover we can expand the MMC software queue interface to support
> > > MMC packed request or packed command instead of adding new
> > > interfaces, according to previosus discussion.
> > >
> > > Below are some comparison data with fio tool. The fio command I used
> > > is like below with changing the '--rw' parameter and enabling the
> > > direct IO flag to measure the actual hardware transfer speed in 4K block
> size.
> > >
> > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read
> > > --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read
> > >
> > > My eMMC card working at HS400 Enhanced strobe mode:
> > > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address
> 0001
> > > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> > > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> > > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB,
> chardev (248:0)
> > >
> > > 1. Without MMC software queue
> > > I tested 5 times for each case and output a average speed.
> > >
> > > 1) Sequential read:
> > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s Average
> > > speed: 59.66MiB/s
> > >
> > > 2) Random read:
> > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s Average
> > > speed: 27.04MiB/s
> > >
> > > 3) Sequential write:
> > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s Average
> > > speed: 69.68MiB/s
> > >
> > > 4) Random write:
> > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s Average
> > > speed: 35.96MiB/s
> > >
> > > 2. With MMC software queue
> > > I tested 5 times for each case and output a average speed.
> > >
> > > 1) Sequential read:
> > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s Average
> > > speed: 60.68MiB/s
> > >
> > > 2) Random read:
> > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s Average
> > > speed: 31.36MiB/s
> > >
> > > 3) Sequential write:
> > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average
> > > speed: 71.66MiB/s
> > >
> > > 4) Random write:
> > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s Average
> > > speed: 68.76MiB/s
> > >
> > > Form above data, we can see the MMC software queue can help to
> > > improve some performance obviously for random read and write, though
> > > no obvious improvement for sequential read and write.
> > >
> > > Any comments are welcome. Thanks a lot.
> > >

Hi Baolin,

I refer to your code, and add the software queue support on i.MX based on the Linux next-20200602, but unfortunately, I see an obvious performance drop when change to use software queue.
I test on our imx850-evk board, with eMMC soldered.
From the result listing below, only random write has a little performance improve, for others, seems performance drop a lot.
I noticed that, this software queue need no-removable card, any other limitation? For host?
From the code logic, software queue complete the request in irq handler, seems no other change, I do not figure out why this will trigger a performance drop on my platform. Any comment would be appreciate! 
 
Without software queue, normal read/write method:
Sequential read: 56MB/s
Random read: 23.5MB/s
Sequential write: 43.7MB/s
Random write: 19MB/s

With mmc software queue:
Sequential read: 33.5MB/s
Random read: 18.7 MB/s
Sequential write: 37.7MB/s
Random write: 19.8MB/s


Here, I also list my change code to support software queue 

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index eb85237bf2d6..996b8cc5c381 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX
        depends on MMC_SDHCI_PLTFM
        select MMC_SDHCI_IO_ACCESSORS
        select MMC_CQHCI
+       select MMC_HSQ
        help
          This selects the Freescale eSDHC/uSDHC controller support
          found on i.MX25, i.MX35 i.MX5x and i.MX6x.
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index 1d7f84b23a22..6f163695b08d 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -29,6 +29,7 @@
 #include "sdhci-pltfm.h"
 #include "sdhci-esdhc.h"
 #include "cqhci.h"
+#include "mmc_hsq.h"

 #define ESDHC_SYS_CTRL_DTOCV_MASK      0x0f
 #define        ESDHC_CTRL_D3CD                 0x08
@@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host *host, u32 intmask)
        return 0;
 }

+static void esdhc_request_done(struct sdhci_host *host, struct mmc_request *mrq)
+{
+       /* Validate if the request was from software queue firstly. */
+       if (mmc_hsq_finalize_request(host->mmc, mrq))
+               return;
+
+       mmc_request_done(host->mmc, mrq);
+}
+
 static struct sdhci_ops sdhci_esdhc_ops = {
        .read_l = esdhc_readl_le,
        .read_w = esdhc_readw_le,
@@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = {
        .set_uhs_signaling = esdhc_set_uhs_signaling,
        .reset = esdhc_reset,
        .irq = esdhc_cqhci_irq,
+       .request_done = esdhc_request_done,
 };

 static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = {
@@ -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host)
                        writel(tmp, host->ioaddr + ESDHC_VEND_SPEC2);

                        host->quirks &= ~SDHCI_QUIRK_NO_BUSY_IRQ;
+
+                       /*
+                        * On i.MX8MM, we are running Dual Linux OS, with 1st Linux using SD Card
+                        * as rootfs storage, 2nd Linux using eMMC as rootfs storage. We let the
+                        * the 1st linux configure power/clock for the 2nd Linux.
+                        *
+                        * When the 2nd Linux is booting into rootfs stage, we let the 1st Linux
+                        * to destroy the 2nd linux, then restart the 2nd linux, we met SDHCI dump.
+                        * After we clear the pending interrupt and halt CQCTL, issue gone.
+                        */
+                       tmp = cqhci_readl(cq_host, CQHCI_IS);
+                       cqhci_writel(cq_host, tmp, CQHCI_IS);
+                       cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
                }

                if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) {
@@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host)
                 * After we clear the pending interrupt and halt CQCTL, issue gone.
                 */
                if (cq_host) {
-                       tmp = cqhci_readl(cq_host, CQHCI_IS);
-                       cqhci_writel(cq_host, tmp, CQHCI_IS);
-                       cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
                }
        }
 }
@@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
        struct sdhci_pltfm_host *pltfm_host;
        struct sdhci_host *host;
        struct cqhci_host *cq_host;
+       struct mmc_hsq *hsq;
        int err;
        struct pltfm_imx_data *imx_data;

@@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
                err = cqhci_init(cq_host, host->mmc, false);
                if (err)
                        goto disable_ahb_clk;
+       } else if (esdhc_is_usdhc(imx_data)) {
+               hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), GFP_KERNEL);
+               if (!hsq) {
+                       err = -ENOMEM;
+                       goto disable_ahb_clk;
+               }
+
+               err = mmc_hsq_init(hsq, host->mmc);
+               if (err)
+                       goto disable_ahb_clk;
        }

        if (of_id)
@@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
        if (err)
                goto disable_ahb_clk;

+       if (!mmc_card_is_removable(host->mmc))
+               host->mmc_host_ops.request_atomic = sdhci_request_atomic;
+       else
+               host->always_defer_done = true;
+
        sdhci_esdhc_imx_hwinit(host);

        err = sdhci_add_host(host);
@@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device *dev)
                ret = cqhci_suspend(host->mmc);
                if (ret)
                        return ret;
+       } else if (esdhc_is_usdhc(imx_data)) {
+               mmc_hsq_suspend(host->mmc);
        }

        if ((imx_data->socdata->flags & ESDHC_FLAG_STATE_LOST_IN_LPMODE) &&
@@ -1764,6 +1803,8 @@ static int sdhci_esdhc_suspend(struct device *dev)
 static int sdhci_esdhc_resume(struct device *dev)
 {
        struct sdhci_host *host = dev_get_drvdata(dev);
+       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+       struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host);
        int ret;

        ret = pinctrl_pm_select_default_state(dev);
@@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device *dev)
        if (ret)
                return ret;

-       if (host->mmc->caps2 & MMC_CAP2_CQE)
+       if (host->mmc->caps2 & MMC_CAP2_CQE) {
                ret = cqhci_resume(host->mmc);
+       } else if (esdhc_is_usdhc(imx_data)) {
+               mmc_hsq_resume(host->mmc);
+       }

        if (!ret)
                ret = mmc_gpio_set_cd_wake(host->mmc, false);
@@ -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device *dev)
                ret = cqhci_suspend(host->mmc);
                if (ret)
                        return ret;
+       } else if (esdhc_is_usdhc(imx_data)) {
+               mmc_hsq_suspend(host->mmc);
        }

        ret = sdhci_runtime_suspend_host(host);
@@ -1851,8 +1897,11 @@ static int sdhci_esdhc_runtime_resume(struct device *dev)
        if (err)
                goto disable_ipg_clk;

-       if (host->mmc->caps2 & MMC_CAP2_CQE)
+       if (host->mmc->caps2 & MMC_CAP2_CQE) {
                err = cqhci_resume(host->mmc);
+       } else if (esdhc_is_usdhc(imx_data)) {
+               mmc_hsq_resume(host->mmc);
+       }

        return err;



> > > Changes from v8:
> > >  - Add more description in the commit message.
> > >  - Optimize the failure log when calling cqe_enable().
> > >
> > > Changes from v7:
> > >  - Add reviewed tag from Arnd.
> > >  - Use the 'hsq' acronym for varibles and functions in the core layer.
> > >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE
> > > can work normally.
> > >  - Add a new patch to enable the host software queue for the SD card.
> > >  - Use the default MMC queue depth for host software queue.
> > >
> > > Changes from v6:
> > >  - Change the patch order and set host->always_defer_done = true for
> > > the  Spreadtrum host driver.
> > >
> > > Changes from v5:
> > >  - Modify the condition of defering to complete request suggested by
> Adrian.
> > >
> > > Changes from v4:
> > >  - Add a seperate patch to introduce a variable to defer to complete
> > > data requests for some host drivers, when using host software queue.
> > >
> > > Changes from v3:
> > >  - Use host software queue instead of sqhci.
> > >  - Fix random config building issue.
> > >  - Change queue depth to 32, but still only allow 2 requests in flight.
> > >  - Update the testing data.
> > >
> > > Changes from v2:
> > >  - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot',
> > > instead adding 'struct sqhci_host', which is only used by software queue.
> > >
> > > Changes from v1:
> > >  - Add request_done ops for sdhci_ops.
> > >  - Replace virtual command queue with software queue for functions
> > > and  variables.
> > >  - Rename the software queue file and add sqhci.h header file.
> > >
> > > Baolin Wang (5):
> > >   mmc: Add MMC host software queue support
> > >   mmc: core: Enable the MMC host software queue for the SD card
> > >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> > >   mmc: host: sdhci: Add a variable to defer to complete requests if
> > >     needed
> > >   mmc: host: sdhci-sprd: Add software queue support
> > >
> > >  drivers/mmc/core/block.c      |   61 ++++++++
> > >  drivers/mmc/core/mmc.c        |   18 ++-
> > >  drivers/mmc/core/queue.c      |   22 ++-
> > >  drivers/mmc/core/sd.c         |   10 ++
> > >  drivers/mmc/host/Kconfig      |    8 +
> > >  drivers/mmc/host/Makefile     |    1 +
> > >  drivers/mmc/host/cqhci.c      |    8 +-
> > >  drivers/mmc/host/mmc_hsq.c    |  343
> +++++++++++++++++++++++++++++++++++++++++
> > >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> > >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> > >  drivers/mmc/host/sdhci.c      |   14 +-
> > >  drivers/mmc/host/sdhci.h      |    3 +
> > >  include/linux/mmc/host.h      |    3 +
> > >  13 files changed, 534 insertions(+), 15 deletions(-)  create mode
> > > 100644 drivers/mmc/host/mmc_hsq.c  create mode 100644
> > > drivers/mmc/host/mmc_hsq.h
> > >
> > > --
> > > 1.7.9.5
> > >
> >
> > Applied for next, thanks! Also, thanks for your patience while moving
> > forward during the reviews!
> 
> I am very appreciated for you and Arnd's good sugestion when introducing the
> hsq.
> 
> >
> > Note, I did some amending of patch1 to resolve some checkpatch
> > warnings. SPDX licence and Kconfig help texts, please have a look and
> > tell if there are something that doesn't look good.
> 
> Thanks for your help and looks good to me.
Baolin Wang June 8, 2020, 11:53 a.m. UTC | #4
Hi Haibo.

On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote:
>
> > -----Original Message-----
> > From: linux-mmc-owner@vger.kernel.org
> > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang
> > Sent: 2020年2月19日 9:35
> > To: Ulf Hansson <ulf.hansson@linaro.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das
> > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; Chunyan
> > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus
> > Walleij <linus.walleij@linaro.org>; Baolin Wang <baolin.wang@linaro.org>;
> > linux-mmc@vger.kernel.org; Linux Kernel Mailing List
> > <linux-kernel@vger.kernel.org>
> > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> >
> > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org>
> > wrote:
> > >
> > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com>
> > wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Now the MMC read/write stack will always wait for previous request
> > > > is completed by mmc_blk_rw_wait(), before sending a new request to
> > > > hardware, or queue a work to complete request, that will bring
> > > > context switching overhead, especially for high I/O per second
> > > > rates, to affect the IO performance.
> > > >
> > > > Thus this patch set will introduce the MMC software command queue
> > > > support based on command queue engine's interfaces, and set the
> > > > queue depth as 64 to allow more requests can be be prepared, merged
> > > > and inserted into IO scheduler, but we only allow 2 requests in
> > > > flight, that is enough to let the irq handler always trigger the
> > > > next request without a context switch, as well as avoiding a long latency.
> > > >
> > > > Moreover we can expand the MMC software queue interface to support
> > > > MMC packed request or packed command instead of adding new
> > > > interfaces, according to previosus discussion.
> > > >
> > > > Below are some comparison data with fio tool. The fio command I used
> > > > is like below with changing the '--rw' parameter and enabling the
> > > > direct IO flag to measure the actual hardware transfer speed in 4K block
> > size.
> > > >
> > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read
> > > > --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read
> > > >
> > > > My eMMC card working at HS400 Enhanced strobe mode:
> > > > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address
> > 0001
> > > > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > > > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> > > > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> > > > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB,
> > chardev (248:0)
> > > >
> > > > 1. Without MMC software queue
> > > > I tested 5 times for each case and output a average speed.
> > > >
> > > > 1) Sequential read:
> > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s Average
> > > > speed: 59.66MiB/s
> > > >
> > > > 2) Random read:
> > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s Average
> > > > speed: 27.04MiB/s
> > > >
> > > > 3) Sequential write:
> > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s Average
> > > > speed: 69.68MiB/s
> > > >
> > > > 4) Random write:
> > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s Average
> > > > speed: 35.96MiB/s
> > > >
> > > > 2. With MMC software queue
> > > > I tested 5 times for each case and output a average speed.
> > > >
> > > > 1) Sequential read:
> > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s Average
> > > > speed: 60.68MiB/s
> > > >
> > > > 2) Random read:
> > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s Average
> > > > speed: 31.36MiB/s
> > > >
> > > > 3) Sequential write:
> > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average
> > > > speed: 71.66MiB/s
> > > >
> > > > 4) Random write:
> > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s Average
> > > > speed: 68.76MiB/s
> > > >
> > > > Form above data, we can see the MMC software queue can help to
> > > > improve some performance obviously for random read and write, though
> > > > no obvious improvement for sequential read and write.
> > > >
> > > > Any comments are welcome. Thanks a lot.
> > > >
>
> Hi Baolin,
>
> I refer to your code, and add the software queue support on i.MX based on the Linux next-20200602, but unfortunately, I see an obvious performance drop when change to use software queue.
> I test on our imx850-evk board, with eMMC soldered.
> From the result listing below, only random write has a little performance improve, for others, seems performance drop a lot.
> I noticed that, this software queue need no-removable card, any other limitation? For host?
> From the code logic, software queue complete the request in irq handler, seems no other change, I do not figure out why this will trigger a performance drop on my platform. Any comment would be appreciate!

Have you tested with below patches, which introduce an atomic_request
host ops to submit next request in the irq hard handler context to
reduce latency?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed6330330276cba39058e8dbdb28ab8d013d926e
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a374a72baa817388a04825118b7c1ba13064c8c6
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=98a2642f91a47dcd1215d037c14e0e5de33a247d
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e872f1e22ea5f678ae42812949477387fda6725b
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=845c939ee22943786a6eb1d13d03c77b19fcc2c8
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6db96e5810e0a6a345b7d78549de7676ae5b2662
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=48ef8a2a1e5e6a2a189009e880e341ddb452971d
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61ab64e2f54f4c607428667f83f411cb659843a3

> Without software queue, normal read/write method:
> Sequential read: 56MB/s
> Random read: 23.5MB/s
> Sequential write: 43.7MB/s
> Random write: 19MB/s
>
> With mmc software queue:
> Sequential read: 33.5MB/s
> Random read: 18.7 MB/s
> Sequential write: 37.7MB/s
> Random write: 19.8MB/s
>
>
> Here, I also list my change code to support software queue
>
> diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> index eb85237bf2d6..996b8cc5c381 100644
> --- a/drivers/mmc/host/Kconfig
> +++ b/drivers/mmc/host/Kconfig
> @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX
>         depends on MMC_SDHCI_PLTFM
>         select MMC_SDHCI_IO_ACCESSORS
>         select MMC_CQHCI
> +       select MMC_HSQ
>         help
>           This selects the Freescale eSDHC/uSDHC controller support
>           found on i.MX25, i.MX35 i.MX5x and i.MX6x.
> diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
> index 1d7f84b23a22..6f163695b08d 100644
> --- a/drivers/mmc/host/sdhci-esdhc-imx.c
> +++ b/drivers/mmc/host/sdhci-esdhc-imx.c
> @@ -29,6 +29,7 @@
>  #include "sdhci-pltfm.h"
>  #include "sdhci-esdhc.h"
>  #include "cqhci.h"
> +#include "mmc_hsq.h"
>
>  #define ESDHC_SYS_CTRL_DTOCV_MASK      0x0f
>  #define        ESDHC_CTRL_D3CD                 0x08
> @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host *host, u32 intmask)
>         return 0;
>  }
>
> +static void esdhc_request_done(struct sdhci_host *host, struct mmc_request *mrq)
> +{
> +       /* Validate if the request was from software queue firstly. */
> +       if (mmc_hsq_finalize_request(host->mmc, mrq))
> +               return;
> +
> +       mmc_request_done(host->mmc, mrq);
> +}
> +
>  static struct sdhci_ops sdhci_esdhc_ops = {
>         .read_l = esdhc_readl_le,
>         .read_w = esdhc_readw_le,
> @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = {
>         .set_uhs_signaling = esdhc_set_uhs_signaling,
>         .reset = esdhc_reset,
>         .irq = esdhc_cqhci_irq,
> +       .request_done = esdhc_request_done,
>  };
>
>  static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = {
> @@ -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host)
>                         writel(tmp, host->ioaddr + ESDHC_VEND_SPEC2);
>
>                         host->quirks &= ~SDHCI_QUIRK_NO_BUSY_IRQ;
> +
> +                       /*
> +                        * On i.MX8MM, we are running Dual Linux OS, with 1st Linux using SD Card
> +                        * as rootfs storage, 2nd Linux using eMMC as rootfs storage. We let the
> +                        * the 1st linux configure power/clock for the 2nd Linux.
> +                        *
> +                        * When the 2nd Linux is booting into rootfs stage, we let the 1st Linux
> +                        * to destroy the 2nd linux, then restart the 2nd linux, we met SDHCI dump.
> +                        * After we clear the pending interrupt and halt CQCTL, issue gone.
> +                        */
> +                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> +                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> +                       cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
>                 }
>
>                 if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) {
> @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host)
>                  * After we clear the pending interrupt and halt CQCTL, issue gone.
>                  */
>                 if (cq_host) {
> -                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> -                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> -                       cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
>                 }
>         }
>  }
> @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
>         struct sdhci_pltfm_host *pltfm_host;
>         struct sdhci_host *host;
>         struct cqhci_host *cq_host;
> +       struct mmc_hsq *hsq;
>         int err;
>         struct pltfm_imx_data *imx_data;
>
> @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
>                 err = cqhci_init(cq_host, host->mmc, false);
>                 if (err)
>                         goto disable_ahb_clk;
> +       } else if (esdhc_is_usdhc(imx_data)) {
> +               hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), GFP_KERNEL);
> +               if (!hsq) {
> +                       err = -ENOMEM;
> +                       goto disable_ahb_clk;
> +               }
> +
> +               err = mmc_hsq_init(hsq, host->mmc);
> +               if (err)
> +                       goto disable_ahb_clk;
>         }
>
>         if (of_id)
> @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
>         if (err)
>                 goto disable_ahb_clk;
>
> +       if (!mmc_card_is_removable(host->mmc))
> +               host->mmc_host_ops.request_atomic = sdhci_request_atomic;
> +       else
> +               host->always_defer_done = true;
> +
>         sdhci_esdhc_imx_hwinit(host);
>
>         err = sdhci_add_host(host);
> @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device *dev)
>                 ret = cqhci_suspend(host->mmc);
>                 if (ret)
>                         return ret;
> +       } else if (esdhc_is_usdhc(imx_data)) {
> +               mmc_hsq_suspend(host->mmc);
>         }
>
>         if ((imx_data->socdata->flags & ESDHC_FLAG_STATE_LOST_IN_LPMODE) &&
> @@ -1764,6 +1803,8 @@ static int sdhci_esdhc_suspend(struct device *dev)
>  static int sdhci_esdhc_resume(struct device *dev)
>  {
>         struct sdhci_host *host = dev_get_drvdata(dev);
> +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> +       struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host);
>         int ret;
>
>         ret = pinctrl_pm_select_default_state(dev);
> @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device *dev)
>         if (ret)
>                 return ret;
>
> -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
>                 ret = cqhci_resume(host->mmc);
> +       } else if (esdhc_is_usdhc(imx_data)) {
> +               mmc_hsq_resume(host->mmc);
> +       }
>
>         if (!ret)
>                 ret = mmc_gpio_set_cd_wake(host->mmc, false);
> @@ -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device *dev)
>                 ret = cqhci_suspend(host->mmc);
>                 if (ret)
>                         return ret;
> +       } else if (esdhc_is_usdhc(imx_data)) {
> +               mmc_hsq_suspend(host->mmc);
>         }
>
>         ret = sdhci_runtime_suspend_host(host);
> @@ -1851,8 +1897,11 @@ static int sdhci_esdhc_runtime_resume(struct device *dev)
>         if (err)
>                 goto disable_ipg_clk;
>
> -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
>                 err = cqhci_resume(host->mmc);
> +       } else if (esdhc_is_usdhc(imx_data)) {
> +               mmc_hsq_resume(host->mmc);
> +       }
>
>         return err;
>
>
>
> > > > Changes from v8:
> > > >  - Add more description in the commit message.
> > > >  - Optimize the failure log when calling cqe_enable().
> > > >
> > > > Changes from v7:
> > > >  - Add reviewed tag from Arnd.
> > > >  - Use the 'hsq' acronym for varibles and functions in the core layer.
> > > >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE
> > > > can work normally.
> > > >  - Add a new patch to enable the host software queue for the SD card.
> > > >  - Use the default MMC queue depth for host software queue.
> > > >
> > > > Changes from v6:
> > > >  - Change the patch order and set host->always_defer_done = true for
> > > > the  Spreadtrum host driver.
> > > >
> > > > Changes from v5:
> > > >  - Modify the condition of defering to complete request suggested by
> > Adrian.
> > > >
> > > > Changes from v4:
> > > >  - Add a seperate patch to introduce a variable to defer to complete
> > > > data requests for some host drivers, when using host software queue.
> > > >
> > > > Changes from v3:
> > > >  - Use host software queue instead of sqhci.
> > > >  - Fix random config building issue.
> > > >  - Change queue depth to 32, but still only allow 2 requests in flight.
> > > >  - Update the testing data.
> > > >
> > > > Changes from v2:
> > > >  - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot',
> > > > instead adding 'struct sqhci_host', which is only used by software queue.
> > > >
> > > > Changes from v1:
> > > >  - Add request_done ops for sdhci_ops.
> > > >  - Replace virtual command queue with software queue for functions
> > > > and  variables.
> > > >  - Rename the software queue file and add sqhci.h header file.
> > > >
> > > > Baolin Wang (5):
> > > >   mmc: Add MMC host software queue support
> > > >   mmc: core: Enable the MMC host software queue for the SD card
> > > >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> > > >   mmc: host: sdhci: Add a variable to defer to complete requests if
> > > >     needed
> > > >   mmc: host: sdhci-sprd: Add software queue support
> > > >
> > > >  drivers/mmc/core/block.c      |   61 ++++++++
> > > >  drivers/mmc/core/mmc.c        |   18 ++-
> > > >  drivers/mmc/core/queue.c      |   22 ++-
> > > >  drivers/mmc/core/sd.c         |   10 ++
> > > >  drivers/mmc/host/Kconfig      |    8 +
> > > >  drivers/mmc/host/Makefile     |    1 +
> > > >  drivers/mmc/host/cqhci.c      |    8 +-
> > > >  drivers/mmc/host/mmc_hsq.c    |  343
> > +++++++++++++++++++++++++++++++++++++++++
> > > >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> > > >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> > > >  drivers/mmc/host/sdhci.c      |   14 +-
> > > >  drivers/mmc/host/sdhci.h      |    3 +
> > > >  include/linux/mmc/host.h      |    3 +
> > > >  13 files changed, 534 insertions(+), 15 deletions(-)  create mode
> > > > 100644 drivers/mmc/host/mmc_hsq.c  create mode 100644
> > > > drivers/mmc/host/mmc_hsq.h
> > > >
> > > > --
> > > > 1.7.9.5
> > > >
> > >
> > > Applied for next, thanks! Also, thanks for your patience while moving
> > > forward during the reviews!
> >
> > I am very appreciated for you and Arnd's good sugestion when introducing the
> > hsq.
> >
> > >
> > > Note, I did some amending of patch1 to resolve some checkpatch
> > > warnings. SPDX licence and Kconfig help texts, please have a look and
> > > tell if there are something that doesn't look good.
> >
> > Thanks for your help and looks good to me.
Bough Chen June 10, 2020, 2:26 a.m. UTC | #5
> -----Original Message-----
> From: Baolin Wang [mailto:baolin.wang7@gmail.com]
> Sent: 2020年6月8日 19:54
> To: BOUGH CHEN <haibo.chen@nxp.com>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter
> <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson
> Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd
> Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin
> Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com>
> Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> 
> Hi Haibo.
> 
> On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote:
> >
> > > -----Original Message-----
> > > From: linux-mmc-owner@vger.kernel.org
> > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang
> > > Sent: 2020年2月19日 9:35
> > > To: Ulf Hansson <ulf.hansson@linaro.org>
> > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das
> > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>;
> Chunyan
> > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus
> > > Walleij <linus.walleij@linaro.org>; Baolin Wang
> > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> > > Mailing List <linux-kernel@vger.kernel.org>
> > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> > >
> > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org>
> > > wrote:
> > > >
> > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com>
> > > wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Now the MMC read/write stack will always wait for previous
> > > > > request is completed by mmc_blk_rw_wait(), before sending a new
> > > > > request to hardware, or queue a work to complete request, that
> > > > > will bring context switching overhead, especially for high I/O
> > > > > per second rates, to affect the IO performance.
> > > > >
> > > > > Thus this patch set will introduce the MMC software command
> > > > > queue support based on command queue engine's interfaces, and
> > > > > set the queue depth as 64 to allow more requests can be be
> > > > > prepared, merged and inserted into IO scheduler, but we only
> > > > > allow 2 requests in flight, that is enough to let the irq
> > > > > handler always trigger the next request without a context switch, as
> well as avoiding a long latency.
> > > > >
> > > > > Moreover we can expand the MMC software queue interface to
> > > > > support MMC packed request or packed command instead of adding
> > > > > new interfaces, according to previosus discussion.
> > > > >
> > > > > Below are some comparison data with fio tool. The fio command I
> > > > > used is like below with changing the '--rw' parameter and
> > > > > enabling the direct IO flag to measure the actual hardware
> > > > > transfer speed in 4K block
> > > size.
> > > > >
> > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20
> > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20
> > > > > --name=test_read
> > > > >
> > > > > My eMMC card working at HS400 Enhanced strobe mode:
> > > > > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at
> address
> > > 0001
> > > > > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > > > > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00
> MiB
> > > > > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00
> MiB
> > > > > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00
> MiB,
> > > chardev (248:0)
> > > > >
> > > > > 1. Without MMC software queue
> > > > > I tested 5 times for each case and output a average speed.
> > > > >
> > > > > 1) Sequential read:
> > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s
> > > > > Average
> > > > > speed: 59.66MiB/s
> > > > >
> > > > > 2) Random read:
> > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s
> > > > > Average
> > > > > speed: 27.04MiB/s
> > > > >
> > > > > 3) Sequential write:
> > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s
> > > > > Average
> > > > > speed: 69.68MiB/s
> > > > >
> > > > > 4) Random write:
> > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
> > > > > Average
> > > > > speed: 35.96MiB/s
> > > > >
> > > > > 2. With MMC software queue
> > > > > I tested 5 times for each case and output a average speed.
> > > > >
> > > > > 1) Sequential read:
> > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
> > > > > Average
> > > > > speed: 60.68MiB/s
> > > > >
> > > > > 2) Random read:
> > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
> > > > > Average
> > > > > speed: 31.36MiB/s
> > > > >
> > > > > 3) Sequential write:
> > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average
> > > > > speed: 71.66MiB/s
> > > > >
> > > > > 4) Random write:
> > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
> > > > > Average
> > > > > speed: 68.76MiB/s
> > > > >
> > > > > Form above data, we can see the MMC software queue can help to
> > > > > improve some performance obviously for random read and write,
> > > > > though no obvious improvement for sequential read and write.
> > > > >
> > > > > Any comments are welcome. Thanks a lot.
> > > > >
> >
> > Hi Baolin,
> >
> > I refer to your code, and add the software queue support on i.MX based on
> the Linux next-20200602, but unfortunately, I see an obvious performance drop
> when change to use software queue.
> > I test on our imx850-evk board, with eMMC soldered.
> > From the result listing below, only random write has a little performance
> improve, for others, seems performance drop a lot.
> > I noticed that, this software queue need no-removable card, any other
> limitation? For host?
> > From the code logic, software queue complete the request in irq handler,
> seems no other change, I do not figure out why this will trigger a performance
> drop on my platform. Any comment would be appreciate!
> 
> Have you tested with below patches, which introduce an atomic_request host
> ops to submit next request in the irq hard handler context to reduce latency?

Hi Baolin,

The Linux code base I use is Linux-next-20200602, which already contain all the patch you listed, including the atomic_request.
I do not test the atomic_request alone. I just enable software queue, and this software queue use the atomic request.
So, do you mean, I need to first disable software queue, and just enable the atomic_request, to confirm whether the atomic_request has some performance impact on my platform?

Best Regards
Haibo Chen

> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3Ded6330330276cba39058e8dbdb28ab8d013d926e&amp;dat
> a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2
> c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928
> 90127&amp;sdata=JJVYXczOXYUdpdNkcJhGMaO6a6EeZkQ72hF090BMAHY%3
> D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3Da374a72baa817388a04825118b7c1ba13064c8c6&amp;dat
> a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2
> c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928
> 90127&amp;sdata=RaL685%2FtijHmnmr%2B9wOyr6bN53gLLXpEBgS9wwBqIi8
> %3D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3D98a2642f91a47dcd1215d037c14e0e5de33a247d&amp;dat
> a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2
> c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928
> 90127&amp;sdata=%2B3TBD1lyYOA7D6gvhw0mikvoALMMlyaf9xPGra%2FCnU
> I%3D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3De872f1e22ea5f678ae42812949477387fda6725b&amp;data
> =02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2c
> 447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C63727214099289
> 0127&amp;sdata=7kByREznCyEUzibneJB1JMlXdaW%2FZs0FN1MWSzDAPAg%
> 3D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3D845c939ee22943786a6eb1d13d03c77b19fcc2c8&amp;data
> =02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2c
> 447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C63727214099289
> 0127&amp;sdata=zxV8REvhSnBTFeyi9HHjFEPcYYrPJGJpNj%2FcOORY0%2BE%3
> D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3D6db96e5810e0a6a345b7d78549de7676ae5b2662&amp;da
> ta=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2
> c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928
> 90127&amp;sdata=5J26bs7%2FmJbhcU2%2FJRAdvY6BLd2QIJLafjq37sVKfYA%3
> D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3D48ef8a2a1e5e6a2a189009e880e341ddb452971d&amp;dat
> a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2
> c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928
> 90127&amp;sdata=yM6znnt3BDaLR0G1%2FtWyx1OogCKP2cxErZvvZxDDzbQ%
> 3D&amp;reserved=0
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern
> el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco
> mmit%2F%3Fid%3D61ab64e2f54f4c607428667f83f411cb659843a3&amp;data
> =02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2c
> 447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C63727214099289
> 0127&amp;sdata=GZfSBBYZ9eOJfxLPEKCfVUJTcvc6oci6AmeSUCWD8Dc%3D&
> amp;reserved=0
> 
> > Without software queue, normal read/write method:
> > Sequential read: 56MB/s
> > Random read: 23.5MB/s
> > Sequential write: 43.7MB/s
> > Random write: 19MB/s
> >
> > With mmc software queue:
> > Sequential read: 33.5MB/s
> > Random read: 18.7 MB/s
> > Sequential write: 37.7MB/s
> > Random write: 19.8MB/s
> >
> >
> > Here, I also list my change code to support software queue
> >
> > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index
> > eb85237bf2d6..996b8cc5c381 100644
> > --- a/drivers/mmc/host/Kconfig
> > +++ b/drivers/mmc/host/Kconfig
> > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX
> >         depends on MMC_SDHCI_PLTFM
> >         select MMC_SDHCI_IO_ACCESSORS
> >         select MMC_CQHCI
> > +       select MMC_HSQ
> >         help
> >           This selects the Freescale eSDHC/uSDHC controller support
> >           found on i.MX25, i.MX35 i.MX5x and i.MX6x.
> > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c
> > b/drivers/mmc/host/sdhci-esdhc-imx.c
> > index 1d7f84b23a22..6f163695b08d 100644
> > --- a/drivers/mmc/host/sdhci-esdhc-imx.c
> > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c
> > @@ -29,6 +29,7 @@
> >  #include "sdhci-pltfm.h"
> >  #include "sdhci-esdhc.h"
> >  #include "cqhci.h"
> > +#include "mmc_hsq.h"
> >
> >  #define ESDHC_SYS_CTRL_DTOCV_MASK      0x0f
> >  #define        ESDHC_CTRL_D3CD                 0x08
> > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host
> *host, u32 intmask)
> >         return 0;
> >  }
> >
> > +static void esdhc_request_done(struct sdhci_host *host, struct
> > +mmc_request *mrq) {
> > +       /* Validate if the request was from software queue firstly. */
> > +       if (mmc_hsq_finalize_request(host->mmc, mrq))
> > +               return;
> > +
> > +       mmc_request_done(host->mmc, mrq); }
> > +
> >  static struct sdhci_ops sdhci_esdhc_ops = {
> >         .read_l = esdhc_readl_le,
> >         .read_w = esdhc_readw_le,
> > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = {
> >         .set_uhs_signaling = esdhc_set_uhs_signaling,
> >         .reset = esdhc_reset,
> >         .irq = esdhc_cqhci_irq,
> > +       .request_done = esdhc_request_done,
> >  };
> >
> >  static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@
> > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host
> *host)
> >                         writel(tmp, host->ioaddr +
> ESDHC_VEND_SPEC2);
> >
> >                         host->quirks &=
> ~SDHCI_QUIRK_NO_BUSY_IRQ;
> > +
> > +                       /*
> > +                        * On i.MX8MM, we are running Dual Linux OS,
> with 1st Linux using SD Card
> > +                        * as rootfs storage, 2nd Linux using eMMC as
> rootfs storage. We let the
> > +                        * the 1st linux configure power/clock for the
> 2nd Linux.
> > +                        *
> > +                        * When the 2nd Linux is booting into rootfs
> stage, we let the 1st Linux
> > +                        * to destroy the 2nd linux, then restart the
> 2nd linux, we met SDHCI dump.
> > +                        * After we clear the pending interrupt and halt
> CQCTL, issue gone.
> > +                        */
> > +                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > +                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > +                       cqhci_writel(cq_host, CQHCI_HALT,
> CQHCI_CTL);
> >                 }
> >
> >                 if (imx_data->socdata->flags &
> ESDHC_FLAG_STD_TUNING)
> > { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct
> sdhci_host *host)
> >                  * After we clear the pending interrupt and halt CQCTL,
> issue gone.
> >                  */
> >                 if (cq_host) {
> > -                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > -                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > -                       cqhci_writel(cq_host, CQHCI_HALT,
> CQHCI_CTL);
> >                 }
> >         }
> >  }
> > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct
> platform_device *pdev)
> >         struct sdhci_pltfm_host *pltfm_host;
> >         struct sdhci_host *host;
> >         struct cqhci_host *cq_host;
> > +       struct mmc_hsq *hsq;
> >         int err;
> >         struct pltfm_imx_data *imx_data;
> >
> > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct
> platform_device *pdev)
> >                 err = cqhci_init(cq_host, host->mmc, false);
> >                 if (err)
> >                         goto disable_ahb_clk;
> > +       } else if (esdhc_is_usdhc(imx_data)) {
> > +               hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq),
> GFP_KERNEL);
> > +               if (!hsq) {
> > +                       err = -ENOMEM;
> > +                       goto disable_ahb_clk;
> > +               }
> > +
> > +               err = mmc_hsq_init(hsq, host->mmc);
> > +               if (err)
> > +                       goto disable_ahb_clk;
> >         }
> >
> >         if (of_id)
> > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct
> platform_device *pdev)
> >         if (err)
> >                 goto disable_ahb_clk;
> >
> > +       if (!mmc_card_is_removable(host->mmc))
> > +               host->mmc_host_ops.request_atomic =
> sdhci_request_atomic;
> > +       else
> > +               host->always_defer_done = true;
> > +
> >         sdhci_esdhc_imx_hwinit(host);
> >
> >         err = sdhci_add_host(host);
> > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device
> *dev)
> >                 ret = cqhci_suspend(host->mmc);
> >                 if (ret)
> >                         return ret;
> > +       } else if (esdhc_is_usdhc(imx_data)) {
> > +               mmc_hsq_suspend(host->mmc);
> >         }
> >
> >         if ((imx_data->socdata->flags &
> > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@
> static int
> > sdhci_esdhc_suspend(struct device *dev)  static int
> > sdhci_esdhc_resume(struct device *dev)  {
> >         struct sdhci_host *host = dev_get_drvdata(dev);
> > +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > +       struct pltfm_imx_data *imx_data =
> > + sdhci_pltfm_priv(pltfm_host);
> >         int ret;
> >
> >         ret = pinctrl_pm_select_default_state(dev);
> > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device
> *dev)
> >         if (ret)
> >                 return ret;
> >
> > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> >                 ret = cqhci_resume(host->mmc);
> > +       } else if (esdhc_is_usdhc(imx_data)) {
> > +               mmc_hsq_resume(host->mmc);
> > +       }
> >
> >         if (!ret)
> >                 ret = mmc_gpio_set_cd_wake(host->mmc, false); @@
> > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device
> *dev)
> >                 ret = cqhci_suspend(host->mmc);
> >                 if (ret)
> >                         return ret;
> > +       } else if (esdhc_is_usdhc(imx_data)) {
> > +               mmc_hsq_suspend(host->mmc);
> >         }
> >
> >         ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11
> @@
> > static int sdhci_esdhc_runtime_resume(struct device *dev)
> >         if (err)
> >                 goto disable_ipg_clk;
> >
> > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> >                 err = cqhci_resume(host->mmc);
> > +       } else if (esdhc_is_usdhc(imx_data)) {
> > +               mmc_hsq_resume(host->mmc);
> > +       }
> >
> >         return err;
> >
> >
> >
> > > > > Changes from v8:
> > > > >  - Add more description in the commit message.
> > > > >  - Optimize the failure log when calling cqe_enable().
> > > > >
> > > > > Changes from v7:
> > > > >  - Add reviewed tag from Arnd.
> > > > >  - Use the 'hsq' acronym for varibles and functions in the core layer.
> > > > >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the
> > > > > CQE can work normally.
> > > > >  - Add a new patch to enable the host software queue for the SD card.
> > > > >  - Use the default MMC queue depth for host software queue.
> > > > >
> > > > > Changes from v6:
> > > > >  - Change the patch order and set host->always_defer_done = true
> > > > > for the  Spreadtrum host driver.
> > > > >
> > > > > Changes from v5:
> > > > >  - Modify the condition of defering to complete request
> > > > > suggested by
> > > Adrian.
> > > > >
> > > > > Changes from v4:
> > > > >  - Add a seperate patch to introduce a variable to defer to
> > > > > complete data requests for some host drivers, when using host
> software queue.
> > > > >
> > > > > Changes from v3:
> > > > >  - Use host software queue instead of sqhci.
> > > > >  - Fix random config building issue.
> > > > >  - Change queue depth to 32, but still only allow 2 requests in flight.
> > > > >  - Update the testing data.
> > > > >
> > > > > Changes from v2:
> > > > >  - Remove reference to 'struct cqhci_host' and 'struct
> > > > > cqhci_slot', instead adding 'struct sqhci_host', which is only used by
> software queue.
> > > > >
> > > > > Changes from v1:
> > > > >  - Add request_done ops for sdhci_ops.
> > > > >  - Replace virtual command queue with software queue for
> > > > > functions and  variables.
> > > > >  - Rename the software queue file and add sqhci.h header file.
> > > > >
> > > > > Baolin Wang (5):
> > > > >   mmc: Add MMC host software queue support
> > > > >   mmc: core: Enable the MMC host software queue for the SD card
> > > > >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> > > > >   mmc: host: sdhci: Add a variable to defer to complete requests if
> > > > >     needed
> > > > >   mmc: host: sdhci-sprd: Add software queue support
> > > > >
> > > > >  drivers/mmc/core/block.c      |   61 ++++++++
> > > > >  drivers/mmc/core/mmc.c        |   18 ++-
> > > > >  drivers/mmc/core/queue.c      |   22 ++-
> > > > >  drivers/mmc/core/sd.c         |   10 ++
> > > > >  drivers/mmc/host/Kconfig      |    8 +
> > > > >  drivers/mmc/host/Makefile     |    1 +
> > > > >  drivers/mmc/host/cqhci.c      |    8 +-
> > > > >  drivers/mmc/host/mmc_hsq.c    |  343
> > > +++++++++++++++++++++++++++++++++++++++++
> > > > >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> > > > >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> > > > >  drivers/mmc/host/sdhci.c      |   14 +-
> > > > >  drivers/mmc/host/sdhci.h      |    3 +
> > > > >  include/linux/mmc/host.h      |    3 +
> > > > >  13 files changed, 534 insertions(+), 15 deletions(-)  create
> > > > > mode
> > > > > 100644 drivers/mmc/host/mmc_hsq.c  create mode 100644
> > > > > drivers/mmc/host/mmc_hsq.h
> > > > >
> > > > > --
> > > > > 1.7.9.5
> > > > >
> > > >
> > > > Applied for next, thanks! Also, thanks for your patience while
> > > > moving forward during the reviews!
> > >
> > > I am very appreciated for you and Arnd's good sugestion when
> > > introducing the hsq.
> > >
> > > >
> > > > Note, I did some amending of patch1 to resolve some checkpatch
> > > > warnings. SPDX licence and Kconfig help texts, please have a look
> > > > and tell if there are something that doesn't look good.
> > >
> > > Thanks for your help and looks good to me.
> 
> 
> 
> --
> Baolin Wang
Baolin Wang June 14, 2020, 3:05 p.m. UTC | #6
On Wed, Jun 10, 2020 at 10:26 AM BOUGH CHEN <haibo.chen@nxp.com> wrote:
>
> > -----Original Message-----
> > From: Baolin Wang [mailto:baolin.wang7@gmail.com]
> > Sent: 2020年6月8日 19:54
> > To: BOUGH CHEN <haibo.chen@nxp.com>
> > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter
> > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson
> > Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd
> > Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin
> > Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com>
> > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> >
> > Hi Haibo.
> >
> > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: linux-mmc-owner@vger.kernel.org
> > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang
> > > > Sent: 2020年2月19日 9:35
> > > > To: Ulf Hansson <ulf.hansson@linaro.org>
> > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das
> > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>;
> > Chunyan
> > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus
> > > > Walleij <linus.walleij@linaro.org>; Baolin Wang
> > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> > > > Mailing List <linux-kernel@vger.kernel.org>
> > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> > > >
> > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org>
> > > > wrote:
> > > > >
> > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > Now the MMC read/write stack will always wait for previous
> > > > > > request is completed by mmc_blk_rw_wait(), before sending a new
> > > > > > request to hardware, or queue a work to complete request, that
> > > > > > will bring context switching overhead, especially for high I/O
> > > > > > per second rates, to affect the IO performance.
> > > > > >
> > > > > > Thus this patch set will introduce the MMC software command
> > > > > > queue support based on command queue engine's interfaces, and
> > > > > > set the queue depth as 64 to allow more requests can be be
> > > > > > prepared, merged and inserted into IO scheduler, but we only
> > > > > > allow 2 requests in flight, that is enough to let the irq
> > > > > > handler always trigger the next request without a context switch, as
> > well as avoiding a long latency.
> > > > > >
> > > > > > Moreover we can expand the MMC software queue interface to
> > > > > > support MMC packed request or packed command instead of adding
> > > > > > new interfaces, according to previosus discussion.
> > > > > >
> > > > > > Below are some comparison data with fio tool. The fio command I
> > > > > > used is like below with changing the '--rw' parameter and
> > > > > > enabling the direct IO flag to measure the actual hardware
> > > > > > transfer speed in 4K block
> > > > size.
> > > > > >
> > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20
> > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20
> > > > > > --name=test_read
> > > > > >
> > > > > > My eMMC card working at HS400 Enhanced strobe mode:
> > > > > > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at
> > address
> > > > 0001
> > > > > > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > > > > > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00
> > MiB
> > > > > > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00
> > MiB
> > > > > > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00
> > MiB,
> > > > chardev (248:0)
> > > > > >
> > > > > > 1. Without MMC software queue
> > > > > > I tested 5 times for each case and output a average speed.
> > > > > >
> > > > > > 1) Sequential read:
> > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s
> > > > > > Average
> > > > > > speed: 59.66MiB/s
> > > > > >
> > > > > > 2) Random read:
> > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s
> > > > > > Average
> > > > > > speed: 27.04MiB/s
> > > > > >
> > > > > > 3) Sequential write:
> > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s
> > > > > > Average
> > > > > > speed: 69.68MiB/s
> > > > > >
> > > > > > 4) Random write:
> > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
> > > > > > Average
> > > > > > speed: 35.96MiB/s
> > > > > >
> > > > > > 2. With MMC software queue
> > > > > > I tested 5 times for each case and output a average speed.
> > > > > >
> > > > > > 1) Sequential read:
> > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
> > > > > > Average
> > > > > > speed: 60.68MiB/s
> > > > > >
> > > > > > 2) Random read:
> > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
> > > > > > Average
> > > > > > speed: 31.36MiB/s
> > > > > >
> > > > > > 3) Sequential write:
> > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average
> > > > > > speed: 71.66MiB/s
> > > > > >
> > > > > > 4) Random write:
> > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
> > > > > > Average
> > > > > > speed: 68.76MiB/s
> > > > > >
> > > > > > Form above data, we can see the MMC software queue can help to
> > > > > > improve some performance obviously for random read and write,
> > > > > > though no obvious improvement for sequential read and write.
> > > > > >
> > > > > > Any comments are welcome. Thanks a lot.
> > > > > >
> > >
> > > Hi Baolin,
> > >
> > > I refer to your code, and add the software queue support on i.MX based on
> > the Linux next-20200602, but unfortunately, I see an obvious performance drop
> > when change to use software queue.
> > > I test on our imx850-evk board, with eMMC soldered.
> > > From the result listing below, only random write has a little performance
> > improve, for others, seems performance drop a lot.
> > > I noticed that, this software queue need no-removable card, any other
> > limitation? For host?
> > > From the code logic, software queue complete the request in irq handler,
> > seems no other change, I do not figure out why this will trigger a performance
> > drop on my platform. Any comment would be appreciate!
> >
> > Have you tested with below patches, which introduce an atomic_request host
> > ops to submit next request in the irq hard handler context to reduce latency?
>
> Hi Baolin,
>
> The Linux code base I use is Linux-next-20200602, which already contain all the patch you listed, including the atomic_request.
> I do not test the atomic_request alone. I just enable software queue, and this software queue use the atomic request.
> So, do you mean, I need to first disable software queue, and just enable the atomic_request, to confirm whether the atomic_request has some performance impact on my platform?

NO, the HSQ will support  atomic_request ops, and I think you should
implement the atomic_request ops in your driver if possible, like
below patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61ab64e2f54f4c607428667f83f411cb659843a3

> > > Without software queue, normal read/write method:
> > > Sequential read: 56MB/s
> > > Random read: 23.5MB/s
> > > Sequential write: 43.7MB/s
> > > Random write: 19MB/s
> > >
> > > With mmc software queue:
> > > Sequential read: 33.5MB/s
> > > Random read: 18.7 MB/s
> > > Sequential write: 37.7MB/s
> > > Random write: 19.8MB/s
> > >
> > >
> > > Here, I also list my change code to support software queue
> > >
> > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index
> > > eb85237bf2d6..996b8cc5c381 100644
> > > --- a/drivers/mmc/host/Kconfig
> > > +++ b/drivers/mmc/host/Kconfig
> > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX
> > >         depends on MMC_SDHCI_PLTFM
> > >         select MMC_SDHCI_IO_ACCESSORS
> > >         select MMC_CQHCI
> > > +       select MMC_HSQ
> > >         help
> > >           This selects the Freescale eSDHC/uSDHC controller support
> > >           found on i.MX25, i.MX35 i.MX5x and i.MX6x.
> > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c
> > > b/drivers/mmc/host/sdhci-esdhc-imx.c
> > > index 1d7f84b23a22..6f163695b08d 100644
> > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c
> > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c
> > > @@ -29,6 +29,7 @@
> > >  #include "sdhci-pltfm.h"
> > >  #include "sdhci-esdhc.h"
> > >  #include "cqhci.h"
> > > +#include "mmc_hsq.h"
> > >
> > >  #define ESDHC_SYS_CTRL_DTOCV_MASK      0x0f
> > >  #define        ESDHC_CTRL_D3CD                 0x08
> > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host
> > *host, u32 intmask)
> > >         return 0;
> > >  }
> > >
> > > +static void esdhc_request_done(struct sdhci_host *host, struct
> > > +mmc_request *mrq) {
> > > +       /* Validate if the request was from software queue firstly. */
> > > +       if (mmc_hsq_finalize_request(host->mmc, mrq))
> > > +               return;
> > > +
> > > +       mmc_request_done(host->mmc, mrq); }
> > > +
> > >  static struct sdhci_ops sdhci_esdhc_ops = {
> > >         .read_l = esdhc_readl_le,
> > >         .read_w = esdhc_readw_le,
> > > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = {
> > >         .set_uhs_signaling = esdhc_set_uhs_signaling,
> > >         .reset = esdhc_reset,
> > >         .irq = esdhc_cqhci_irq,
> > > +       .request_done = esdhc_request_done,
> > >  };
> > >
> > >  static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@
> > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host
> > *host)
> > >                         writel(tmp, host->ioaddr +
> > ESDHC_VEND_SPEC2);
> > >
> > >                         host->quirks &=
> > ~SDHCI_QUIRK_NO_BUSY_IRQ;
> > > +
> > > +                       /*
> > > +                        * On i.MX8MM, we are running Dual Linux OS,
> > with 1st Linux using SD Card
> > > +                        * as rootfs storage, 2nd Linux using eMMC as
> > rootfs storage. We let the
> > > +                        * the 1st linux configure power/clock for the
> > 2nd Linux.
> > > +                        *
> > > +                        * When the 2nd Linux is booting into rootfs
> > stage, we let the 1st Linux
> > > +                        * to destroy the 2nd linux, then restart the
> > 2nd linux, we met SDHCI dump.
> > > +                        * After we clear the pending interrupt and halt
> > CQCTL, issue gone.
> > > +                        */
> > > +                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > > +                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > > +                       cqhci_writel(cq_host, CQHCI_HALT,
> > CQHCI_CTL);
> > >                 }
> > >
> > >                 if (imx_data->socdata->flags &
> > ESDHC_FLAG_STD_TUNING)
> > > { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct
> > sdhci_host *host)
> > >                  * After we clear the pending interrupt and halt CQCTL,
> > issue gone.
> > >                  */
> > >                 if (cq_host) {
> > > -                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > > -                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > > -                       cqhci_writel(cq_host, CQHCI_HALT,
> > CQHCI_CTL);
> > >                 }
> > >         }
> > >  }
> > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct
> > platform_device *pdev)
> > >         struct sdhci_pltfm_host *pltfm_host;
> > >         struct sdhci_host *host;
> > >         struct cqhci_host *cq_host;
> > > +       struct mmc_hsq *hsq;
> > >         int err;
> > >         struct pltfm_imx_data *imx_data;
> > >
> > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct
> > platform_device *pdev)
> > >                 err = cqhci_init(cq_host, host->mmc, false);
> > >                 if (err)
> > >                         goto disable_ahb_clk;
> > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > +               hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq),
> > GFP_KERNEL);
> > > +               if (!hsq) {
> > > +                       err = -ENOMEM;
> > > +                       goto disable_ahb_clk;
> > > +               }
> > > +
> > > +               err = mmc_hsq_init(hsq, host->mmc);
> > > +               if (err)
> > > +                       goto disable_ahb_clk;
> > >         }
> > >
> > >         if (of_id)
> > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct
> > platform_device *pdev)
> > >         if (err)
> > >                 goto disable_ahb_clk;
> > >
> > > +       if (!mmc_card_is_removable(host->mmc))
> > > +               host->mmc_host_ops.request_atomic =
> > sdhci_request_atomic;
> > > +       else
> > > +               host->always_defer_done = true;
> > > +
> > >         sdhci_esdhc_imx_hwinit(host);
> > >
> > >         err = sdhci_add_host(host);
> > > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device
> > *dev)
> > >                 ret = cqhci_suspend(host->mmc);
> > >                 if (ret)
> > >                         return ret;
> > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > +               mmc_hsq_suspend(host->mmc);
> > >         }
> > >
> > >         if ((imx_data->socdata->flags &
> > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@
> > static int
> > > sdhci_esdhc_suspend(struct device *dev)  static int
> > > sdhci_esdhc_resume(struct device *dev)  {
> > >         struct sdhci_host *host = dev_get_drvdata(dev);
> > > +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > > +       struct pltfm_imx_data *imx_data =
> > > + sdhci_pltfm_priv(pltfm_host);
> > >         int ret;
> > >
> > >         ret = pinctrl_pm_select_default_state(dev);
> > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device
> > *dev)
> > >         if (ret)
> > >                 return ret;
> > >
> > > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> > >                 ret = cqhci_resume(host->mmc);
> > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > +               mmc_hsq_resume(host->mmc);
> > > +       }
> > >
> > >         if (!ret)
> > >                 ret = mmc_gpio_set_cd_wake(host->mmc, false); @@
> > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device
> > *dev)
> > >                 ret = cqhci_suspend(host->mmc);
> > >                 if (ret)
> > >                         return ret;
> > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > +               mmc_hsq_suspend(host->mmc);
> > >         }
> > >
> > >         ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11
> > @@
> > > static int sdhci_esdhc_runtime_resume(struct device *dev)
> > >         if (err)
> > >                 goto disable_ipg_clk;
> > >
> > > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> > >                 err = cqhci_resume(host->mmc);
> > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > +               mmc_hsq_resume(host->mmc);
> > > +       }
> > >
> > >         return err;
> > >
> > >
> > >
> > > > > > Changes from v8:
> > > > > >  - Add more description in the commit message.
> > > > > >  - Optimize the failure log when calling cqe_enable().
> > > > > >
> > > > > > Changes from v7:
> > > > > >  - Add reviewed tag from Arnd.
> > > > > >  - Use the 'hsq' acronym for varibles and functions in the core layer.
> > > > > >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the
> > > > > > CQE can work normally.
> > > > > >  - Add a new patch to enable the host software queue for the SD card.
> > > > > >  - Use the default MMC queue depth for host software queue.
> > > > > >
> > > > > > Changes from v6:
> > > > > >  - Change the patch order and set host->always_defer_done = true
> > > > > > for the  Spreadtrum host driver.
> > > > > >
> > > > > > Changes from v5:
> > > > > >  - Modify the condition of defering to complete request
> > > > > > suggested by
> > > > Adrian.
> > > > > >
> > > > > > Changes from v4:
> > > > > >  - Add a seperate patch to introduce a variable to defer to
> > > > > > complete data requests for some host drivers, when using host
> > software queue.
> > > > > >
> > > > > > Changes from v3:
> > > > > >  - Use host software queue instead of sqhci.
> > > > > >  - Fix random config building issue.
> > > > > >  - Change queue depth to 32, but still only allow 2 requests in flight.
> > > > > >  - Update the testing data.
> > > > > >
> > > > > > Changes from v2:
> > > > > >  - Remove reference to 'struct cqhci_host' and 'struct
> > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is only used by
> > software queue.
> > > > > >
> > > > > > Changes from v1:
> > > > > >  - Add request_done ops for sdhci_ops.
> > > > > >  - Replace virtual command queue with software queue for
> > > > > > functions and  variables.
> > > > > >  - Rename the software queue file and add sqhci.h header file.
> > > > > >
> > > > > > Baolin Wang (5):
> > > > > >   mmc: Add MMC host software queue support
> > > > > >   mmc: core: Enable the MMC host software queue for the SD card
> > > > > >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> > > > > >   mmc: host: sdhci: Add a variable to defer to complete requests if
> > > > > >     needed
> > > > > >   mmc: host: sdhci-sprd: Add software queue support
> > > > > >
> > > > > >  drivers/mmc/core/block.c      |   61 ++++++++
> > > > > >  drivers/mmc/core/mmc.c        |   18 ++-
> > > > > >  drivers/mmc/core/queue.c      |   22 ++-
> > > > > >  drivers/mmc/core/sd.c         |   10 ++
> > > > > >  drivers/mmc/host/Kconfig      |    8 +
> > > > > >  drivers/mmc/host/Makefile     |    1 +
> > > > > >  drivers/mmc/host/cqhci.c      |    8 +-
> > > > > >  drivers/mmc/host/mmc_hsq.c    |  343
> > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> > > > > >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> > > > > >  drivers/mmc/host/sdhci.c      |   14 +-
> > > > > >  drivers/mmc/host/sdhci.h      |    3 +
> > > > > >  include/linux/mmc/host.h      |    3 +
> > > > > >  13 files changed, 534 insertions(+), 15 deletions(-)  create
> > > > > > mode
> > > > > > 100644 drivers/mmc/host/mmc_hsq.c  create mode 100644
> > > > > > drivers/mmc/host/mmc_hsq.h
> > > > > >
> > > > > > --
> > > > > > 1.7.9.5
> > > > > >
> > > > >
> > > > > Applied for next, thanks! Also, thanks for your patience while
> > > > > moving forward during the reviews!
> > > >
> > > > I am very appreciated for you and Arnd's good sugestion when
> > > > introducing the hsq.
> > > >
> > > > >
> > > > > Note, I did some amending of patch1 to resolve some checkpatch
> > > > > warnings. SPDX licence and Kconfig help texts, please have a look
> > > > > and tell if there are something that doesn't look good.
> > > >
> > > > Thanks for your help and looks good to me.
> >
> >
> >
> > --
> > Baolin Wang
Baolin Wang June 14, 2020, 11:25 p.m. UTC | #7
On Sun, Jun 14, 2020 at 11:05 PM Baolin Wang <baolin.wang7@gmail.com> wrote:
>
> On Wed, Jun 10, 2020 at 10:26 AM BOUGH CHEN <haibo.chen@nxp.com> wrote:
> >
> > > -----Original Message-----
> > > From: Baolin Wang [mailto:baolin.wang7@gmail.com]
> > > Sent: 2020年6月8日 19:54
> > > To: BOUGH CHEN <haibo.chen@nxp.com>
> > > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter
> > > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson
> > > Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd
> > > Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin
> > > Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> > > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com>
> > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> > >
> > > Hi Haibo.
> > >
> > > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: linux-mmc-owner@vger.kernel.org
> > > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang
> > > > > Sent: 2020年2月19日 9:35
> > > > > To: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das
> > > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>;
> > > Chunyan
> > > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus
> > > > > Walleij <linus.walleij@linaro.org>; Baolin Wang
> > > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> > > > > Mailing List <linux-kernel@vger.kernel.org>
> > > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> > > > >
> > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org>
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Now the MMC read/write stack will always wait for previous
> > > > > > > request is completed by mmc_blk_rw_wait(), before sending a new
> > > > > > > request to hardware, or queue a work to complete request, that
> > > > > > > will bring context switching overhead, especially for high I/O
> > > > > > > per second rates, to affect the IO performance.
> > > > > > >
> > > > > > > Thus this patch set will introduce the MMC software command
> > > > > > > queue support based on command queue engine's interfaces, and
> > > > > > > set the queue depth as 64 to allow more requests can be be
> > > > > > > prepared, merged and inserted into IO scheduler, but we only
> > > > > > > allow 2 requests in flight, that is enough to let the irq
> > > > > > > handler always trigger the next request without a context switch, as
> > > well as avoiding a long latency.
> > > > > > >
> > > > > > > Moreover we can expand the MMC software queue interface to
> > > > > > > support MMC packed request or packed command instead of adding
> > > > > > > new interfaces, according to previosus discussion.
> > > > > > >
> > > > > > > Below are some comparison data with fio tool. The fio command I
> > > > > > > used is like below with changing the '--rw' parameter and
> > > > > > > enabling the direct IO flag to measure the actual hardware
> > > > > > > transfer speed in 4K block
> > > > > size.
> > > > > > >
> > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20
> > > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20
> > > > > > > --name=test_read
> > > > > > >
> > > > > > > My eMMC card working at HS400 Enhanced strobe mode:
> > > > > > > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at
> > > address
> > > > > 0001
> > > > > > > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > > > > > > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00
> > > MiB
> > > > > > > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00
> > > MiB
> > > > > > > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00
> > > MiB,
> > > > > chardev (248:0)
> > > > > > >
> > > > > > > 1. Without MMC software queue
> > > > > > > I tested 5 times for each case and output a average speed.
> > > > > > >
> > > > > > > 1) Sequential read:
> > > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s
> > > > > > > Average
> > > > > > > speed: 59.66MiB/s
> > > > > > >
> > > > > > > 2) Random read:
> > > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s
> > > > > > > Average
> > > > > > > speed: 27.04MiB/s
> > > > > > >
> > > > > > > 3) Sequential write:
> > > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s
> > > > > > > Average
> > > > > > > speed: 69.68MiB/s
> > > > > > >
> > > > > > > 4) Random write:
> > > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
> > > > > > > Average
> > > > > > > speed: 35.96MiB/s
> > > > > > >
> > > > > > > 2. With MMC software queue
> > > > > > > I tested 5 times for each case and output a average speed.
> > > > > > >
> > > > > > > 1) Sequential read:
> > > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
> > > > > > > Average
> > > > > > > speed: 60.68MiB/s
> > > > > > >
> > > > > > > 2) Random read:
> > > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
> > > > > > > Average
> > > > > > > speed: 31.36MiB/s
> > > > > > >
> > > > > > > 3) Sequential write:
> > > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average
> > > > > > > speed: 71.66MiB/s
> > > > > > >
> > > > > > > 4) Random write:
> > > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
> > > > > > > Average
> > > > > > > speed: 68.76MiB/s
> > > > > > >
> > > > > > > Form above data, we can see the MMC software queue can help to
> > > > > > > improve some performance obviously for random read and write,
> > > > > > > though no obvious improvement for sequential read and write.
> > > > > > >
> > > > > > > Any comments are welcome. Thanks a lot.
> > > > > > >
> > > >
> > > > Hi Baolin,
> > > >
> > > > I refer to your code, and add the software queue support on i.MX based on
> > > the Linux next-20200602, but unfortunately, I see an obvious performance drop
> > > when change to use software queue.
> > > > I test on our imx850-evk board, with eMMC soldered.
> > > > From the result listing below, only random write has a little performance
> > > improve, for others, seems performance drop a lot.
> > > > I noticed that, this software queue need no-removable card, any other
> > > limitation? For host?
> > > > From the code logic, software queue complete the request in irq handler,
> > > seems no other change, I do not figure out why this will trigger a performance
> > > drop on my platform. Any comment would be appreciate!
> > >
> > > Have you tested with below patches, which introduce an atomic_request host
> > > ops to submit next request in the irq hard handler context to reduce latency?
> >
> > Hi Baolin,
> >
> > The Linux code base I use is Linux-next-20200602, which already contain all the patch you listed, including the atomic_request.
> > I do not test the atomic_request alone. I just enable software queue, and this software queue use the atomic request.
> > So, do you mean, I need to first disable software queue, and just enable the atomic_request, to confirm whether the atomic_request has some performance impact on my platform?
>
> NO, the HSQ will support  atomic_request ops, and I think you should
> implement the atomic_request ops in your driver if possible, like
> below patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61ab64e2f54f4c607428667f83f411cb659843a3

Ah, sorry, I saw you've implemented the atomic_request ops. Moreover
I've tested the mainline, and I still got some performance
improvement.

What's the IO scheduler you selected?

>
> > > > Without software queue, normal read/write method:
> > > > Sequential read: 56MB/s
> > > > Random read: 23.5MB/s
> > > > Sequential write: 43.7MB/s
> > > > Random write: 19MB/s
> > > >
> > > > With mmc software queue:
> > > > Sequential read: 33.5MB/s
> > > > Random read: 18.7 MB/s
> > > > Sequential write: 37.7MB/s
> > > > Random write: 19.8MB/s
> > > >
> > > >
> > > > Here, I also list my change code to support software queue
> > > >
> > > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index
> > > > eb85237bf2d6..996b8cc5c381 100644
> > > > --- a/drivers/mmc/host/Kconfig
> > > > +++ b/drivers/mmc/host/Kconfig
> > > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX
> > > >         depends on MMC_SDHCI_PLTFM
> > > >         select MMC_SDHCI_IO_ACCESSORS
> > > >         select MMC_CQHCI
> > > > +       select MMC_HSQ
> > > >         help
> > > >           This selects the Freescale eSDHC/uSDHC controller support
> > > >           found on i.MX25, i.MX35 i.MX5x and i.MX6x.
> > > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > b/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > index 1d7f84b23a22..6f163695b08d 100644
> > > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > @@ -29,6 +29,7 @@
> > > >  #include "sdhci-pltfm.h"
> > > >  #include "sdhci-esdhc.h"
> > > >  #include "cqhci.h"
> > > > +#include "mmc_hsq.h"
> > > >
> > > >  #define ESDHC_SYS_CTRL_DTOCV_MASK      0x0f
> > > >  #define        ESDHC_CTRL_D3CD                 0x08
> > > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host
> > > *host, u32 intmask)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static void esdhc_request_done(struct sdhci_host *host, struct
> > > > +mmc_request *mrq) {
> > > > +       /* Validate if the request was from software queue firstly. */
> > > > +       if (mmc_hsq_finalize_request(host->mmc, mrq))
> > > > +               return;
> > > > +
> > > > +       mmc_request_done(host->mmc, mrq); }
> > > > +
> > > >  static struct sdhci_ops sdhci_esdhc_ops = {
> > > >         .read_l = esdhc_readl_le,
> > > >         .read_w = esdhc_readw_le,
> > > > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = {
> > > >         .set_uhs_signaling = esdhc_set_uhs_signaling,
> > > >         .reset = esdhc_reset,
> > > >         .irq = esdhc_cqhci_irq,
> > > > +       .request_done = esdhc_request_done,
> > > >  };
> > > >
> > > >  static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@
> > > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host
> > > *host)
> > > >                         writel(tmp, host->ioaddr +
> > > ESDHC_VEND_SPEC2);
> > > >
> > > >                         host->quirks &=
> > > ~SDHCI_QUIRK_NO_BUSY_IRQ;
> > > > +
> > > > +                       /*
> > > > +                        * On i.MX8MM, we are running Dual Linux OS,
> > > with 1st Linux using SD Card
> > > > +                        * as rootfs storage, 2nd Linux using eMMC as
> > > rootfs storage. We let the
> > > > +                        * the 1st linux configure power/clock for the
> > > 2nd Linux.
> > > > +                        *
> > > > +                        * When the 2nd Linux is booting into rootfs
> > > stage, we let the 1st Linux
> > > > +                        * to destroy the 2nd linux, then restart the
> > > 2nd linux, we met SDHCI dump.
> > > > +                        * After we clear the pending interrupt and halt
> > > CQCTL, issue gone.
> > > > +                        */
> > > > +                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > > > +                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > > > +                       cqhci_writel(cq_host, CQHCI_HALT,
> > > CQHCI_CTL);
> > > >                 }
> > > >
> > > >                 if (imx_data->socdata->flags &
> > > ESDHC_FLAG_STD_TUNING)
> > > > { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct
> > > sdhci_host *host)
> > > >                  * After we clear the pending interrupt and halt CQCTL,
> > > issue gone.
> > > >                  */
> > > >                 if (cq_host) {
> > > > -                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > > > -                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > > > -                       cqhci_writel(cq_host, CQHCI_HALT,
> > > CQHCI_CTL);
> > > >                 }
> > > >         }
> > > >  }
> > > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct
> > > platform_device *pdev)
> > > >         struct sdhci_pltfm_host *pltfm_host;
> > > >         struct sdhci_host *host;
> > > >         struct cqhci_host *cq_host;
> > > > +       struct mmc_hsq *hsq;
> > > >         int err;
> > > >         struct pltfm_imx_data *imx_data;
> > > >
> > > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct
> > > platform_device *pdev)
> > > >                 err = cqhci_init(cq_host, host->mmc, false);
> > > >                 if (err)
> > > >                         goto disable_ahb_clk;
> > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > +               hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq),
> > > GFP_KERNEL);
> > > > +               if (!hsq) {
> > > > +                       err = -ENOMEM;
> > > > +                       goto disable_ahb_clk;
> > > > +               }
> > > > +
> > > > +               err = mmc_hsq_init(hsq, host->mmc);
> > > > +               if (err)
> > > > +                       goto disable_ahb_clk;
> > > >         }
> > > >
> > > >         if (of_id)
> > > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct
> > > platform_device *pdev)
> > > >         if (err)
> > > >                 goto disable_ahb_clk;
> > > >
> > > > +       if (!mmc_card_is_removable(host->mmc))
> > > > +               host->mmc_host_ops.request_atomic =
> > > sdhci_request_atomic;
> > > > +       else
> > > > +               host->always_defer_done = true;
> > > > +
> > > >         sdhci_esdhc_imx_hwinit(host);
> > > >
> > > >         err = sdhci_add_host(host);
> > > > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device
> > > *dev)
> > > >                 ret = cqhci_suspend(host->mmc);
> > > >                 if (ret)
> > > >                         return ret;
> > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > +               mmc_hsq_suspend(host->mmc);
> > > >         }
> > > >
> > > >         if ((imx_data->socdata->flags &
> > > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@
> > > static int
> > > > sdhci_esdhc_suspend(struct device *dev)  static int
> > > > sdhci_esdhc_resume(struct device *dev)  {
> > > >         struct sdhci_host *host = dev_get_drvdata(dev);
> > > > +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > > > +       struct pltfm_imx_data *imx_data =
> > > > + sdhci_pltfm_priv(pltfm_host);
> > > >         int ret;
> > > >
> > > >         ret = pinctrl_pm_select_default_state(dev);
> > > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device
> > > *dev)
> > > >         if (ret)
> > > >                 return ret;
> > > >
> > > > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > > > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> > > >                 ret = cqhci_resume(host->mmc);
> > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > +               mmc_hsq_resume(host->mmc);
> > > > +       }
> > > >
> > > >         if (!ret)
> > > >                 ret = mmc_gpio_set_cd_wake(host->mmc, false); @@
> > > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device
> > > *dev)
> > > >                 ret = cqhci_suspend(host->mmc);
> > > >                 if (ret)
> > > >                         return ret;
> > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > +               mmc_hsq_suspend(host->mmc);
> > > >         }
> > > >
> > > >         ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11
> > > @@
> > > > static int sdhci_esdhc_runtime_resume(struct device *dev)
> > > >         if (err)
> > > >                 goto disable_ipg_clk;
> > > >
> > > > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > > > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> > > >                 err = cqhci_resume(host->mmc);
> > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > +               mmc_hsq_resume(host->mmc);
> > > > +       }
> > > >
> > > >         return err;
> > > >
> > > >
> > > >
> > > > > > > Changes from v8:
> > > > > > >  - Add more description in the commit message.
> > > > > > >  - Optimize the failure log when calling cqe_enable().
> > > > > > >
> > > > > > > Changes from v7:
> > > > > > >  - Add reviewed tag from Arnd.
> > > > > > >  - Use the 'hsq' acronym for varibles and functions in the core layer.
> > > > > > >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the
> > > > > > > CQE can work normally.
> > > > > > >  - Add a new patch to enable the host software queue for the SD card.
> > > > > > >  - Use the default MMC queue depth for host software queue.
> > > > > > >
> > > > > > > Changes from v6:
> > > > > > >  - Change the patch order and set host->always_defer_done = true
> > > > > > > for the  Spreadtrum host driver.
> > > > > > >
> > > > > > > Changes from v5:
> > > > > > >  - Modify the condition of defering to complete request
> > > > > > > suggested by
> > > > > Adrian.
> > > > > > >
> > > > > > > Changes from v4:
> > > > > > >  - Add a seperate patch to introduce a variable to defer to
> > > > > > > complete data requests for some host drivers, when using host
> > > software queue.
> > > > > > >
> > > > > > > Changes from v3:
> > > > > > >  - Use host software queue instead of sqhci.
> > > > > > >  - Fix random config building issue.
> > > > > > >  - Change queue depth to 32, but still only allow 2 requests in flight.
> > > > > > >  - Update the testing data.
> > > > > > >
> > > > > > > Changes from v2:
> > > > > > >  - Remove reference to 'struct cqhci_host' and 'struct
> > > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is only used by
> > > software queue.
> > > > > > >
> > > > > > > Changes from v1:
> > > > > > >  - Add request_done ops for sdhci_ops.
> > > > > > >  - Replace virtual command queue with software queue for
> > > > > > > functions and  variables.
> > > > > > >  - Rename the software queue file and add sqhci.h header file.
> > > > > > >
> > > > > > > Baolin Wang (5):
> > > > > > >   mmc: Add MMC host software queue support
> > > > > > >   mmc: core: Enable the MMC host software queue for the SD card
> > > > > > >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> > > > > > >   mmc: host: sdhci: Add a variable to defer to complete requests if
> > > > > > >     needed
> > > > > > >   mmc: host: sdhci-sprd: Add software queue support
> > > > > > >
> > > > > > >  drivers/mmc/core/block.c      |   61 ++++++++
> > > > > > >  drivers/mmc/core/mmc.c        |   18 ++-
> > > > > > >  drivers/mmc/core/queue.c      |   22 ++-
> > > > > > >  drivers/mmc/core/sd.c         |   10 ++
> > > > > > >  drivers/mmc/host/Kconfig      |    8 +
> > > > > > >  drivers/mmc/host/Makefile     |    1 +
> > > > > > >  drivers/mmc/host/cqhci.c      |    8 +-
> > > > > > >  drivers/mmc/host/mmc_hsq.c    |  343
> > > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > > >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> > > > > > >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> > > > > > >  drivers/mmc/host/sdhci.c      |   14 +-
> > > > > > >  drivers/mmc/host/sdhci.h      |    3 +
> > > > > > >  include/linux/mmc/host.h      |    3 +
> > > > > > >  13 files changed, 534 insertions(+), 15 deletions(-)  create
> > > > > > > mode
> > > > > > > 100644 drivers/mmc/host/mmc_hsq.c  create mode 100644
> > > > > > > drivers/mmc/host/mmc_hsq.h
> > > > > > >
> > > > > > > --
> > > > > > > 1.7.9.5
> > > > > > >
> > > > > >
> > > > > > Applied for next, thanks! Also, thanks for your patience while
> > > > > > moving forward during the reviews!
> > > > >
> > > > > I am very appreciated for you and Arnd's good sugestion when
> > > > > introducing the hsq.
> > > > >
> > > > > >
> > > > > > Note, I did some amending of patch1 to resolve some checkpatch
> > > > > > warnings. SPDX licence and Kconfig help texts, please have a look
> > > > > > and tell if there are something that doesn't look good.
> > > > >
> > > > > Thanks for your help and looks good to me.
> > >
> > >
> > >
> > > --
> > > Baolin Wang
>
>
>
> --
> Baolin Wang
Bough Chen June 15, 2020, 2:31 a.m. UTC | #8
> -----Original Message-----
> From: Baolin Wang [mailto:baolin.wang7@gmail.com]
> Sent: 2020年6月15日 7:26
> To: BOUGH CHEN <haibo.chen@nxp.com>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter
> <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson
> Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd
> Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin
> Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com>
> Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> 
> On Sun, Jun 14, 2020 at 11:05 PM Baolin Wang <baolin.wang7@gmail.com>
> wrote:
> >
> > On Wed, Jun 10, 2020 at 10:26 AM BOUGH CHEN <haibo.chen@nxp.com>
> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Baolin Wang [mailto:baolin.wang7@gmail.com]
> > > > Sent: 2020年6月8日 19:54
> > > > To: BOUGH CHEN <haibo.chen@nxp.com>
> > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter
> > > > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>;
> > > > Orson Zhai <orsonzhai@gmail.com>; Chunyan Zhang
> > > > <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus
> > > > Walleij <linus.walleij@linaro.org>; Baolin Wang
> > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel
> > > > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx
> > > > <linux-imx@nxp.com>
> > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> > > >
> > > > Hi Haibo.
> > > >
> > > > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com>
> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: linux-mmc-owner@vger.kernel.org
> > > > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin
> > > > > > Wang
> > > > > > Sent: 2020年2月19日 9:35
> > > > > > To: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das
> > > > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>;
> > > > Chunyan
> > > > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>;
> > > > > > Linus Walleij <linus.walleij@linaro.org>; Baolin Wang
> > > > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux
> > > > > > Kernel Mailing List <linux-kernel@vger.kernel.org>
> > > > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support
> > > > > >
> > > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson
> > > > > > <ulf.hansson@linaro.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang
> > > > > > > <baolin.wang7@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > Now the MMC read/write stack will always wait for previous
> > > > > > > > request is completed by mmc_blk_rw_wait(), before sending
> > > > > > > > a new request to hardware, or queue a work to complete
> > > > > > > > request, that will bring context switching overhead,
> > > > > > > > especially for high I/O per second rates, to affect the IO
> performance.
> > > > > > > >
> > > > > > > > Thus this patch set will introduce the MMC software
> > > > > > > > command queue support based on command queue engine's
> > > > > > > > interfaces, and set the queue depth as 64 to allow more
> > > > > > > > requests can be be prepared, merged and inserted into IO
> > > > > > > > scheduler, but we only allow 2 requests in flight, that is
> > > > > > > > enough to let the irq handler always trigger the next
> > > > > > > > request without a context switch, as
> > > > well as avoiding a long latency.
> > > > > > > >
> > > > > > > > Moreover we can expand the MMC software queue interface to
> > > > > > > > support MMC packed request or packed command instead of
> > > > > > > > adding new interfaces, according to previosus discussion.
> > > > > > > >
> > > > > > > > Below are some comparison data with fio tool. The fio
> > > > > > > > command I used is like below with changing the '--rw'
> > > > > > > > parameter and enabling the direct IO flag to measure the
> > > > > > > > actual hardware transfer speed in 4K block
> > > > > > size.
> > > > > > > >
> > > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20
> > > > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20
> > > > > > > > --name=test_read
> > > > > > > >
> > > > > > > > My eMMC card working at HS400 Enhanced strobe mode:
> > > > > > > > [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at
> > > > address
> > > > > > 0001
> > > > > > > > [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > > > > > > > [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1
> 4.00
> > > > MiB
> > > > > > > > [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2
> 4.00
> > > > MiB
> > > > > > > > [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3
> 4.00
> > > > MiB,
> > > > > > chardev (248:0)
> > > > > > > >
> > > > > > > > 1. Without MMC software queue I tested 5 times for each
> > > > > > > > case and output a average speed.
> > > > > > > >
> > > > > > > > 1) Sequential read:
> > > > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s,
> > > > > > > > 60.8MiB/s Average
> > > > > > > > speed: 59.66MiB/s
> > > > > > > >
> > > > > > > > 2) Random read:
> > > > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s,
> > > > > > > > 27.2MiB/s Average
> > > > > > > > speed: 27.04MiB/s
> > > > > > > >
> > > > > > > > 3) Sequential write:
> > > > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s,
> > > > > > > > 67.5MiB/s Average
> > > > > > > > speed: 69.68MiB/s
> > > > > > > >
> > > > > > > > 4) Random write:
> > > > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s
> > > > > > > > Average
> > > > > > > > speed: 35.96MiB/s
> > > > > > > >
> > > > > > > > 2. With MMC software queue I tested 5 times for each case
> > > > > > > > and output a average speed.
> > > > > > > >
> > > > > > > > 1) Sequential read:
> > > > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s,
> > > > > > > > 59.9MiB/s Average
> > > > > > > > speed: 60.68MiB/s
> > > > > > > >
> > > > > > > > 2) Random read:
> > > > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s,
> > > > > > > > 31.3MiB/s Average
> > > > > > > > speed: 31.36MiB/s
> > > > > > > >
> > > > > > > > 3) Sequential write:
> > > > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s
> > > > > > > > Average
> > > > > > > > speed: 71.66MiB/s
> > > > > > > >
> > > > > > > > 4) Random write:
> > > > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s,
> > > > > > > > 68.8MiB/s Average
> > > > > > > > speed: 68.76MiB/s
> > > > > > > >
> > > > > > > > Form above data, we can see the MMC software queue can
> > > > > > > > help to improve some performance obviously for random read
> > > > > > > > and write, though no obvious improvement for sequential read and
> write.
> > > > > > > >
> > > > > > > > Any comments are welcome. Thanks a lot.
> > > > > > > >
> > > > >
> > > > > Hi Baolin,
> > > > >
> > > > > I refer to your code, and add the software queue support on i.MX
> > > > > based on
> > > > the Linux next-20200602, but unfortunately, I see an obvious
> > > > performance drop when change to use software queue.
> > > > > I test on our imx850-evk board, with eMMC soldered.
> > > > > From the result listing below, only random write has a little
> > > > > performance
> > > > improve, for others, seems performance drop a lot.
> > > > > I noticed that, this software queue need no-removable card, any
> > > > > other
> > > > limitation? For host?
> > > > > From the code logic, software queue complete the request in irq
> > > > > handler,
> > > > seems no other change, I do not figure out why this will trigger a
> > > > performance drop on my platform. Any comment would be appreciate!
> > > >
> > > > Have you tested with below patches, which introduce an
> > > > atomic_request host ops to submit next request in the irq hard handler
> context to reduce latency?
> > >
> > > Hi Baolin,
> > >
> > > The Linux code base I use is Linux-next-20200602, which already contain all
> the patch you listed, including the atomic_request.
> > > I do not test the atomic_request alone. I just enable software queue, and
> this software queue use the atomic request.
> > > So, do you mean, I need to first disable software queue, and just enable the
> atomic_request, to confirm whether the atomic_request has some
> performance impact on my platform?
> >
> > NO, the HSQ will support  atomic_request ops, and I think you should
> > implement the atomic_request ops in your driver if possible, like
> > below patch:
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.
> >
> kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2
> >
> Fcommit%2F%3Fid%3D61ab64e2f54f4c607428667f83f411cb659843a3&amp;d
> ata=02
> > %7C01%7Chaibo.chen%40nxp.com%7C500e3b7d66954c14d2f208d810ba488
> 3%7C686e
> >
> a1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637277739547832756&amp;s
> data=0j
> > RbAi8gvllPKc0IPjdQfK7nTGQ9piefGu80lWQIG5Y%3D&amp;reserved=0
> 
> Ah, sorry, I saw you've implemented the atomic_request ops. Moreover I've
> tested the mainline, and I still got some performance improvement.
> 
> What's the IO scheduler you selected?
> 

Hi Baolin,

I do not change the setting of IO scheduler, so should be the default IO scheduler.
One thing I find when I go through your patch set, is that you said the HSQ need host side to has the flag MMC_CAP_WAIT_WHILE_BUSY.
But for i.MX usdhc host, it do not support this flag, not sure whether it caused the performance difference.


Best Regards
Haibo Chen
> >
> > > > > Without software queue, normal read/write method:
> > > > > Sequential read: 56MB/s
> > > > > Random read: 23.5MB/s
> > > > > Sequential write: 43.7MB/s
> > > > > Random write: 19MB/s
> > > > >
> > > > > With mmc software queue:
> > > > > Sequential read: 33.5MB/s
> > > > > Random read: 18.7 MB/s
> > > > > Sequential write: 37.7MB/s
> > > > > Random write: 19.8MB/s
> > > > >
> > > > >
> > > > > Here, I also list my change code to support software queue
> > > > >
> > > > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> > > > > index
> > > > > eb85237bf2d6..996b8cc5c381 100644
> > > > > --- a/drivers/mmc/host/Kconfig
> > > > > +++ b/drivers/mmc/host/Kconfig
> > > > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX
> > > > >         depends on MMC_SDHCI_PLTFM
> > > > >         select MMC_SDHCI_IO_ACCESSORS
> > > > >         select MMC_CQHCI
> > > > > +       select MMC_HSQ
> > > > >         help
> > > > >           This selects the Freescale eSDHC/uSDHC controller
> support
> > > > >           found on i.MX25, i.MX35 i.MX5x and i.MX6x.
> > > > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > > b/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > > index 1d7f84b23a22..6f163695b08d 100644
> > > > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c
> > > > > @@ -29,6 +29,7 @@
> > > > >  #include "sdhci-pltfm.h"
> > > > >  #include "sdhci-esdhc.h"
> > > > >  #include "cqhci.h"
> > > > > +#include "mmc_hsq.h"
> > > > >
> > > > >  #define ESDHC_SYS_CTRL_DTOCV_MASK      0x0f
> > > > >  #define        ESDHC_CTRL_D3CD                 0x08
> > > > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct
> > > > > sdhci_host
> > > > *host, u32 intmask)
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > +static void esdhc_request_done(struct sdhci_host *host, struct
> > > > > +mmc_request *mrq) {
> > > > > +       /* Validate if the request was from software queue firstly. */
> > > > > +       if (mmc_hsq_finalize_request(host->mmc, mrq))
> > > > > +               return;
> > > > > +
> > > > > +       mmc_request_done(host->mmc, mrq); }
> > > > > +
> > > > >  static struct sdhci_ops sdhci_esdhc_ops = {
> > > > >         .read_l = esdhc_readl_le,
> > > > >         .read_w = esdhc_readw_le, @@ -1237,6 +1247,7 @@ static
> > > > > struct sdhci_ops sdhci_esdhc_ops = {
> > > > >         .set_uhs_signaling = esdhc_set_uhs_signaling,
> > > > >         .reset = esdhc_reset,
> > > > >         .irq = esdhc_cqhci_irq,
> > > > > +       .request_done = esdhc_request_done,
> > > > >  };
> > > > >
> > > > >  static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = {
> > > > > @@
> > > > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct
> > > > > sdhci_host
> > > > *host)
> > > > >                         writel(tmp, host->ioaddr +
> > > > ESDHC_VEND_SPEC2);
> > > > >
> > > > >                         host->quirks &=
> > > > ~SDHCI_QUIRK_NO_BUSY_IRQ;
> > > > > +
> > > > > +                       /*
> > > > > +                        * On i.MX8MM, we are running Dual
> Linux
> > > > > + OS,
> > > > with 1st Linux using SD Card
> > > > > +                        * as rootfs storage, 2nd Linux using
> > > > > + eMMC as
> > > > rootfs storage. We let the
> > > > > +                        * the 1st linux configure power/clock
> > > > > + for the
> > > > 2nd Linux.
> > > > > +                        *
> > > > > +                        * When the 2nd Linux is booting into
> > > > > + rootfs
> > > > stage, we let the 1st Linux
> > > > > +                        * to destroy the 2nd linux, then
> > > > > + restart the
> > > > 2nd linux, we met SDHCI dump.
> > > > > +                        * After we clear the pending interrupt
> > > > > + and halt
> > > > CQCTL, issue gone.
> > > > > +                        */
> > > > > +                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > > > > +                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > > > > +                       cqhci_writel(cq_host, CQHCI_HALT,
> > > > CQHCI_CTL);
> > > > >                 }
> > > > >
> > > > >                 if (imx_data->socdata->flags &
> > > > ESDHC_FLAG_STD_TUNING)
> > > > > { @@ -1351,9 +1375,6 @@ static void
> > > > > sdhci_esdhc_imx_hwinit(struct
> > > > sdhci_host *host)
> > > > >                  * After we clear the pending interrupt and halt
> > > > > CQCTL,
> > > > issue gone.
> > > > >                  */
> > > > >                 if (cq_host) {
> > > > > -                       tmp = cqhci_readl(cq_host, CQHCI_IS);
> > > > > -                       cqhci_writel(cq_host, tmp, CQHCI_IS);
> > > > > -                       cqhci_writel(cq_host, CQHCI_HALT,
> > > > CQHCI_CTL);
> > > > >                 }
> > > > >         }
> > > > >  }
> > > > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct
> > > > platform_device *pdev)
> > > > >         struct sdhci_pltfm_host *pltfm_host;
> > > > >         struct sdhci_host *host;
> > > > >         struct cqhci_host *cq_host;
> > > > > +       struct mmc_hsq *hsq;
> > > > >         int err;
> > > > >         struct pltfm_imx_data *imx_data;
> > > > >
> > > > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct
> > > > platform_device *pdev)
> > > > >                 err = cqhci_init(cq_host, host->mmc, false);
> > > > >                 if (err)
> > > > >                         goto disable_ahb_clk;
> > > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > > +               hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq),
> > > > GFP_KERNEL);
> > > > > +               if (!hsq) {
> > > > > +                       err = -ENOMEM;
> > > > > +                       goto disable_ahb_clk;
> > > > > +               }
> > > > > +
> > > > > +               err = mmc_hsq_init(hsq, host->mmc);
> > > > > +               if (err)
> > > > > +                       goto disable_ahb_clk;
> > > > >         }
> > > > >
> > > > >         if (of_id)
> > > > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct
> > > > platform_device *pdev)
> > > > >         if (err)
> > > > >                 goto disable_ahb_clk;
> > > > >
> > > > > +       if (!mmc_card_is_removable(host->mmc))
> > > > > +               host->mmc_host_ops.request_atomic =
> > > > sdhci_request_atomic;
> > > > > +       else
> > > > > +               host->always_defer_done = true;
> > > > > +
> > > > >         sdhci_esdhc_imx_hwinit(host);
> > > > >
> > > > >         err = sdhci_add_host(host); @@ -1737,6 +1774,8 @@ static
> > > > > int sdhci_esdhc_suspend(struct device
> > > > *dev)
> > > > >                 ret = cqhci_suspend(host->mmc);
> > > > >                 if (ret)
> > > > >                         return ret;
> > > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > > +               mmc_hsq_suspend(host->mmc);
> > > > >         }
> > > > >
> > > > >         if ((imx_data->socdata->flags &
> > > > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@
> > > > static int
> > > > > sdhci_esdhc_suspend(struct device *dev)  static int
> > > > > sdhci_esdhc_resume(struct device *dev)  {
> > > > >         struct sdhci_host *host = dev_get_drvdata(dev);
> > > > > +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > > > > +       struct pltfm_imx_data *imx_data =
> > > > > + sdhci_pltfm_priv(pltfm_host);
> > > > >         int ret;
> > > > >
> > > > >         ret = pinctrl_pm_select_default_state(dev);
> > > > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct
> > > > > device
> > > > *dev)
> > > > >         if (ret)
> > > > >                 return ret;
> > > > >
> > > > > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > > > > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> > > > >                 ret = cqhci_resume(host->mmc);
> > > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > > +               mmc_hsq_resume(host->mmc);
> > > > > +       }
> > > > >
> > > > >         if (!ret)
> > > > >                 ret = mmc_gpio_set_cd_wake(host->mmc, false);
> @@
> > > > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct
> > > > > device
> > > > *dev)
> > > > >                 ret = cqhci_suspend(host->mmc);
> > > > >                 if (ret)
> > > > >                         return ret;
> > > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > > +               mmc_hsq_suspend(host->mmc);
> > > > >         }
> > > > >
> > > > >         ret = sdhci_runtime_suspend_host(host); @@ -1851,8
> > > > > +1897,11
> > > > @@
> > > > > static int sdhci_esdhc_runtime_resume(struct device *dev)
> > > > >         if (err)
> > > > >                 goto disable_ipg_clk;
> > > > >
> > > > > -       if (host->mmc->caps2 & MMC_CAP2_CQE)
> > > > > +       if (host->mmc->caps2 & MMC_CAP2_CQE) {
> > > > >                 err = cqhci_resume(host->mmc);
> > > > > +       } else if (esdhc_is_usdhc(imx_data)) {
> > > > > +               mmc_hsq_resume(host->mmc);
> > > > > +       }
> > > > >
> > > > >         return err;
> > > > >
> > > > >
> > > > >
> > > > > > > > Changes from v8:
> > > > > > > >  - Add more description in the commit message.
> > > > > > > >  - Optimize the failure log when calling cqe_enable().
> > > > > > > >
> > > > > > > > Changes from v7:
> > > > > > > >  - Add reviewed tag from Arnd.
> > > > > > > >  - Use the 'hsq' acronym for varibles and functions in the core
> layer.
> > > > > > > >  - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make
> > > > > > > > sure the CQE can work normally.
> > > > > > > >  - Add a new patch to enable the host software queue for the SD
> card.
> > > > > > > >  - Use the default MMC queue depth for host software queue.
> > > > > > > >
> > > > > > > > Changes from v6:
> > > > > > > >  - Change the patch order and set host->always_defer_done
> > > > > > > > = true for the  Spreadtrum host driver.
> > > > > > > >
> > > > > > > > Changes from v5:
> > > > > > > >  - Modify the condition of defering to complete request
> > > > > > > > suggested by
> > > > > > Adrian.
> > > > > > > >
> > > > > > > > Changes from v4:
> > > > > > > >  - Add a seperate patch to introduce a variable to defer
> > > > > > > > to complete data requests for some host drivers, when
> > > > > > > > using host
> > > > software queue.
> > > > > > > >
> > > > > > > > Changes from v3:
> > > > > > > >  - Use host software queue instead of sqhci.
> > > > > > > >  - Fix random config building issue.
> > > > > > > >  - Change queue depth to 32, but still only allow 2 requests in
> flight.
> > > > > > > >  - Update the testing data.
> > > > > > > >
> > > > > > > > Changes from v2:
> > > > > > > >  - Remove reference to 'struct cqhci_host' and 'struct
> > > > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is
> > > > > > > > only used by
> > > > software queue.
> > > > > > > >
> > > > > > > > Changes from v1:
> > > > > > > >  - Add request_done ops for sdhci_ops.
> > > > > > > >  - Replace virtual command queue with software queue for
> > > > > > > > functions and  variables.
> > > > > > > >  - Rename the software queue file and add sqhci.h header file.
> > > > > > > >
> > > > > > > > Baolin Wang (5):
> > > > > > > >   mmc: Add MMC host software queue support
> > > > > > > >   mmc: core: Enable the MMC host software queue for the SD
> card
> > > > > > > >   mmc: host: sdhci: Add request_done ops for struct sdhci_ops
> > > > > > > >   mmc: host: sdhci: Add a variable to defer to complete requests
> if
> > > > > > > >     needed
> > > > > > > >   mmc: host: sdhci-sprd: Add software queue support
> > > > > > > >
> > > > > > > >  drivers/mmc/core/block.c      |   61 ++++++++
> > > > > > > >  drivers/mmc/core/mmc.c        |   18 ++-
> > > > > > > >  drivers/mmc/core/queue.c      |   22 ++-
> > > > > > > >  drivers/mmc/core/sd.c         |   10 ++
> > > > > > > >  drivers/mmc/host/Kconfig      |    8 +
> > > > > > > >  drivers/mmc/host/Makefile     |    1 +
> > > > > > > >  drivers/mmc/host/cqhci.c      |    8 +-
> > > > > > > >  drivers/mmc/host/mmc_hsq.c    |  343
> > > > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > > > >  drivers/mmc/host/mmc_hsq.h    |   30 ++++
> > > > > > > >  drivers/mmc/host/sdhci-sprd.c |   28 ++++
> > > > > > > >  drivers/mmc/host/sdhci.c      |   14 +-
> > > > > > > >  drivers/mmc/host/sdhci.h      |    3 +
> > > > > > > >  include/linux/mmc/host.h      |    3 +
> > > > > > > >  13 files changed, 534 insertions(+), 15 deletions(-)
> > > > > > > > create mode
> > > > > > > > 100644 drivers/mmc/host/mmc_hsq.c  create mode 100644
> > > > > > > > drivers/mmc/host/mmc_hsq.h
> > > > > > > >
> > > > > > > > --
> > > > > > > > 1.7.9.5
> > > > > > > >
> > > > > > >
> > > > > > > Applied for next, thanks! Also, thanks for your patience
> > > > > > > while moving forward during the reviews!
> > > > > >
> > > > > > I am very appreciated for you and Arnd's good sugestion when
> > > > > > introducing the hsq.
> > > > > >
> > > > > > >
> > > > > > > Note, I did some amending of patch1 to resolve some
> > > > > > > checkpatch warnings. SPDX licence and Kconfig help texts,
> > > > > > > please have a look and tell if there are something that doesn't look
> good.
> > > > > >
> > > > > > Thanks for your help and looks good to me.
> > > >
> > > >
> > > >
> > > > --
> > > > Baolin Wang
> >
> >
> >
> > --
> > Baolin Wang
> 
> 
> 
> --
> Baolin Wang