Message ID | cover.1581478568.git.baolin.wang7@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Add MMC software queue support | expand |
On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> wrote: > > Hi All, > > Now the MMC read/write stack will always wait for previous request is > completed by mmc_blk_rw_wait(), before sending a new request to hardware, > or queue a work to complete request, that will bring context switching > overhead, especially for high I/O per second rates, to affect the IO > performance. > > Thus this patch set will introduce the MMC software command queue support > based on command queue engine's interfaces, and set the queue depth as 64 > to allow more requests can be be prepared, merged and inserted into IO > scheduler, but we only allow 2 requests in flight, that is enough to let > the irq handler always trigger the next request without a context switch, > as well as avoiding a long latency. > > Moreover we can expand the MMC software queue interface to support > MMC packed request or packed command instead of adding new interfaces, > according to previosus discussion. > > Below are some comparison data with fio tool. The fio command I used > is like below with changing the '--rw' parameter and enabling the direct > IO flag to measure the actual hardware transfer speed in 4K block size. > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read > > My eMMC card working at HS400 Enhanced strobe mode: > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001 > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0) > > 1. Without MMC software queue > I tested 5 times for each case and output a average speed. > > 1) Sequential read: > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s > Average speed: 59.66MiB/s > > 2) Random read: > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s > Average speed: 27.04MiB/s > > 3) Sequential write: > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s > Average speed: 69.68MiB/s > > 4) Random write: > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s > Average speed: 35.96MiB/s > > 2. With MMC software queue > I tested 5 times for each case and output a average speed. > > 1) Sequential read: > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s > Average speed: 60.68MiB/s > > 2) Random read: > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s > Average speed: 31.36MiB/s > > 3) Sequential write: > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s > Average speed: 71.66MiB/s > > 4) Random write: > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s > Average speed: 68.76MiB/s > > Form above data, we can see the MMC software queue can help to improve some > performance obviously for random read and write, though no obvious improvement > for sequential read and write. > > Any comments are welcome. Thanks a lot. > > Changes from v8: > - Add more description in the commit message. > - Optimize the failure log when calling cqe_enable(). > > Changes from v7: > - Add reviewed tag from Arnd. > - Use the 'hsq' acronym for varibles and functions in the core layer. > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE > can work normally. > - Add a new patch to enable the host software queue for the SD card. > - Use the default MMC queue depth for host software queue. > > Changes from v6: > - Change the patch order and set host->always_defer_done = true for the > Spreadtrum host driver. > > Changes from v5: > - Modify the condition of defering to complete request suggested by Adrian. > > Changes from v4: > - Add a seperate patch to introduce a variable to defer to complete > data requests for some host drivers, when using host software queue. > > Changes from v3: > - Use host software queue instead of sqhci. > - Fix random config building issue. > - Change queue depth to 32, but still only allow 2 requests in flight. > - Update the testing data. > > Changes from v2: > - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot', > instead adding 'struct sqhci_host', which is only used by software queue. > > Changes from v1: > - Add request_done ops for sdhci_ops. > - Replace virtual command queue with software queue for functions and > variables. > - Rename the software queue file and add sqhci.h header file. > > Baolin Wang (5): > mmc: Add MMC host software queue support > mmc: core: Enable the MMC host software queue for the SD card > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > mmc: host: sdhci: Add a variable to defer to complete requests if > needed > mmc: host: sdhci-sprd: Add software queue support > > drivers/mmc/core/block.c | 61 ++++++++ > drivers/mmc/core/mmc.c | 18 ++- > drivers/mmc/core/queue.c | 22 ++- > drivers/mmc/core/sd.c | 10 ++ > drivers/mmc/host/Kconfig | 8 + > drivers/mmc/host/Makefile | 1 + > drivers/mmc/host/cqhci.c | 8 +- > drivers/mmc/host/mmc_hsq.c | 343 +++++++++++++++++++++++++++++++++++++++++ > drivers/mmc/host/mmc_hsq.h | 30 ++++ > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > drivers/mmc/host/sdhci.c | 14 +- > drivers/mmc/host/sdhci.h | 3 + > include/linux/mmc/host.h | 3 + > 13 files changed, 534 insertions(+), 15 deletions(-) > create mode 100644 drivers/mmc/host/mmc_hsq.c > create mode 100644 drivers/mmc/host/mmc_hsq.h > > -- > 1.7.9.5 > Applied for next, thanks! Also, thanks for your patience while moving forward during the reviews! Note, I did some amending of patch1 to resolve some checkpatch warnings. SPDX licence and Kconfig help texts, please have a look and tell if there are something that doesn't look good. Kind regards Uffe
On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> wrote: > > > > Hi All, > > > > Now the MMC read/write stack will always wait for previous request is > > completed by mmc_blk_rw_wait(), before sending a new request to hardware, > > or queue a work to complete request, that will bring context switching > > overhead, especially for high I/O per second rates, to affect the IO > > performance. > > > > Thus this patch set will introduce the MMC software command queue support > > based on command queue engine's interfaces, and set the queue depth as 64 > > to allow more requests can be be prepared, merged and inserted into IO > > scheduler, but we only allow 2 requests in flight, that is enough to let > > the irq handler always trigger the next request without a context switch, > > as well as avoiding a long latency. > > > > Moreover we can expand the MMC software queue interface to support > > MMC packed request or packed command instead of adding new interfaces, > > according to previosus discussion. > > > > Below are some comparison data with fio tool. The fio command I used > > is like below with changing the '--rw' parameter and enabling the direct > > IO flag to measure the actual hardware transfer speed in 4K block size. > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read > > > > My eMMC card working at HS400 Enhanced strobe mode: > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001 > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0) > > > > 1. Without MMC software queue > > I tested 5 times for each case and output a average speed. > > > > 1) Sequential read: > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s > > Average speed: 59.66MiB/s > > > > 2) Random read: > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s > > Average speed: 27.04MiB/s > > > > 3) Sequential write: > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s > > Average speed: 69.68MiB/s > > > > 4) Random write: > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s > > Average speed: 35.96MiB/s > > > > 2. With MMC software queue > > I tested 5 times for each case and output a average speed. > > > > 1) Sequential read: > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s > > Average speed: 60.68MiB/s > > > > 2) Random read: > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s > > Average speed: 31.36MiB/s > > > > 3) Sequential write: > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s > > Average speed: 71.66MiB/s > > > > 4) Random write: > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s > > Average speed: 68.76MiB/s > > > > Form above data, we can see the MMC software queue can help to improve some > > performance obviously for random read and write, though no obvious improvement > > for sequential read and write. > > > > Any comments are welcome. Thanks a lot. > > > > Changes from v8: > > - Add more description in the commit message. > > - Optimize the failure log when calling cqe_enable(). > > > > Changes from v7: > > - Add reviewed tag from Arnd. > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE > > can work normally. > > - Add a new patch to enable the host software queue for the SD card. > > - Use the default MMC queue depth for host software queue. > > > > Changes from v6: > > - Change the patch order and set host->always_defer_done = true for the > > Spreadtrum host driver. > > > > Changes from v5: > > - Modify the condition of defering to complete request suggested by Adrian. > > > > Changes from v4: > > - Add a seperate patch to introduce a variable to defer to complete > > data requests for some host drivers, when using host software queue. > > > > Changes from v3: > > - Use host software queue instead of sqhci. > > - Fix random config building issue. > > - Change queue depth to 32, but still only allow 2 requests in flight. > > - Update the testing data. > > > > Changes from v2: > > - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot', > > instead adding 'struct sqhci_host', which is only used by software queue. > > > > Changes from v1: > > - Add request_done ops for sdhci_ops. > > - Replace virtual command queue with software queue for functions and > > variables. > > - Rename the software queue file and add sqhci.h header file. > > > > Baolin Wang (5): > > mmc: Add MMC host software queue support > > mmc: core: Enable the MMC host software queue for the SD card > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > mmc: host: sdhci: Add a variable to defer to complete requests if > > needed > > mmc: host: sdhci-sprd: Add software queue support > > > > drivers/mmc/core/block.c | 61 ++++++++ > > drivers/mmc/core/mmc.c | 18 ++- > > drivers/mmc/core/queue.c | 22 ++- > > drivers/mmc/core/sd.c | 10 ++ > > drivers/mmc/host/Kconfig | 8 + > > drivers/mmc/host/Makefile | 1 + > > drivers/mmc/host/cqhci.c | 8 +- > > drivers/mmc/host/mmc_hsq.c | 343 +++++++++++++++++++++++++++++++++++++++++ > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > drivers/mmc/host/sdhci.c | 14 +- > > drivers/mmc/host/sdhci.h | 3 + > > include/linux/mmc/host.h | 3 + > > 13 files changed, 534 insertions(+), 15 deletions(-) > > create mode 100644 drivers/mmc/host/mmc_hsq.c > > create mode 100644 drivers/mmc/host/mmc_hsq.h > > > > -- > > 1.7.9.5 > > > > Applied for next, thanks! Also, thanks for your patience while moving > forward during the reviews! I am very appreciated for you and Arnd's good sugestion when introducing the hsq. > > Note, I did some amending of patch1 to resolve some checkpatch > warnings. SPDX licence and Kconfig help texts, please have a look and > tell if there are something that doesn't look good. Thanks for your help and looks good to me.
> -----Original Message----- > From: linux-mmc-owner@vger.kernel.org > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang > Sent: 2020年2月19日 9:35 > To: Ulf Hansson <ulf.hansson@linaro.org> > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; Chunyan > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus > Walleij <linus.walleij@linaro.org>; Baolin Wang <baolin.wang@linaro.org>; > linux-mmc@vger.kernel.org; Linux Kernel Mailing List > <linux-kernel@vger.kernel.org> > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> > wrote: > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> > wrote: > > > > > > Hi All, > > > > > > Now the MMC read/write stack will always wait for previous request > > > is completed by mmc_blk_rw_wait(), before sending a new request to > > > hardware, or queue a work to complete request, that will bring > > > context switching overhead, especially for high I/O per second > > > rates, to affect the IO performance. > > > > > > Thus this patch set will introduce the MMC software command queue > > > support based on command queue engine's interfaces, and set the > > > queue depth as 64 to allow more requests can be be prepared, merged > > > and inserted into IO scheduler, but we only allow 2 requests in > > > flight, that is enough to let the irq handler always trigger the > > > next request without a context switch, as well as avoiding a long latency. > > > > > > Moreover we can expand the MMC software queue interface to support > > > MMC packed request or packed command instead of adding new > > > interfaces, according to previosus discussion. > > > > > > Below are some comparison data with fio tool. The fio command I used > > > is like below with changing the '--rw' parameter and enabling the > > > direct IO flag to measure the actual hardware transfer speed in 4K block > size. > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read > > > --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address > 0001 > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, > chardev (248:0) > > > > > > 1. Without MMC software queue > > > I tested 5 times for each case and output a average speed. > > > > > > 1) Sequential read: > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s Average > > > speed: 59.66MiB/s > > > > > > 2) Random read: > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s Average > > > speed: 27.04MiB/s > > > > > > 3) Sequential write: > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s Average > > > speed: 69.68MiB/s > > > > > > 4) Random write: > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s Average > > > speed: 35.96MiB/s > > > > > > 2. With MMC software queue > > > I tested 5 times for each case and output a average speed. > > > > > > 1) Sequential read: > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s Average > > > speed: 60.68MiB/s > > > > > > 2) Random read: > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s Average > > > speed: 31.36MiB/s > > > > > > 3) Sequential write: > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average > > > speed: 71.66MiB/s > > > > > > 4) Random write: > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s Average > > > speed: 68.76MiB/s > > > > > > Form above data, we can see the MMC software queue can help to > > > improve some performance obviously for random read and write, though > > > no obvious improvement for sequential read and write. > > > > > > Any comments are welcome. Thanks a lot. > > > Hi Baolin, I refer to your code, and add the software queue support on i.MX based on the Linux next-20200602, but unfortunately, I see an obvious performance drop when change to use software queue. I test on our imx850-evk board, with eMMC soldered. From the result listing below, only random write has a little performance improve, for others, seems performance drop a lot. I noticed that, this software queue need no-removable card, any other limitation? For host? From the code logic, software queue complete the request in irq handler, seems no other change, I do not figure out why this will trigger a performance drop on my platform. Any comment would be appreciate! Without software queue, normal read/write method: Sequential read: 56MB/s Random read: 23.5MB/s Sequential write: 43.7MB/s Random write: 19MB/s With mmc software queue: Sequential read: 33.5MB/s Random read: 18.7 MB/s Sequential write: 37.7MB/s Random write: 19.8MB/s Here, I also list my change code to support software queue diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index eb85237bf2d6..996b8cc5c381 100644 --- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX depends on MMC_SDHCI_PLTFM select MMC_SDHCI_IO_ACCESSORS select MMC_CQHCI + select MMC_HSQ help This selects the Freescale eSDHC/uSDHC controller support found on i.MX25, i.MX35 i.MX5x and i.MX6x. diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c index 1d7f84b23a22..6f163695b08d 100644 --- a/drivers/mmc/host/sdhci-esdhc-imx.c +++ b/drivers/mmc/host/sdhci-esdhc-imx.c @@ -29,6 +29,7 @@ #include "sdhci-pltfm.h" #include "sdhci-esdhc.h" #include "cqhci.h" +#include "mmc_hsq.h" #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f #define ESDHC_CTRL_D3CD 0x08 @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host *host, u32 intmask) return 0; } +static void esdhc_request_done(struct sdhci_host *host, struct mmc_request *mrq) +{ + /* Validate if the request was from software queue firstly. */ + if (mmc_hsq_finalize_request(host->mmc, mrq)) + return; + + mmc_request_done(host->mmc, mrq); +} + static struct sdhci_ops sdhci_esdhc_ops = { .read_l = esdhc_readl_le, .read_w = esdhc_readw_le, @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = { .set_uhs_signaling = esdhc_set_uhs_signaling, .reset = esdhc_reset, .irq = esdhc_cqhci_irq, + .request_done = esdhc_request_done, }; static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@ -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host) writel(tmp, host->ioaddr + ESDHC_VEND_SPEC2); host->quirks &= ~SDHCI_QUIRK_NO_BUSY_IRQ; + + /* + * On i.MX8MM, we are running Dual Linux OS, with 1st Linux using SD Card + * as rootfs storage, 2nd Linux using eMMC as rootfs storage. We let the + * the 1st linux configure power/clock for the 2nd Linux. + * + * When the 2nd Linux is booting into rootfs stage, we let the 1st Linux + * to destroy the 2nd linux, then restart the 2nd linux, we met SDHCI dump. + * After we clear the pending interrupt and halt CQCTL, issue gone. + */ + tmp = cqhci_readl(cq_host, CQHCI_IS); + cqhci_writel(cq_host, tmp, CQHCI_IS); + cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL); } if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host) * After we clear the pending interrupt and halt CQCTL, issue gone. */ if (cq_host) { - tmp = cqhci_readl(cq_host, CQHCI_IS); - cqhci_writel(cq_host, tmp, CQHCI_IS); - cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL); } } } @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) struct sdhci_pltfm_host *pltfm_host; struct sdhci_host *host; struct cqhci_host *cq_host; + struct mmc_hsq *hsq; int err; struct pltfm_imx_data *imx_data; @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) err = cqhci_init(cq_host, host->mmc, false); if (err) goto disable_ahb_clk; + } else if (esdhc_is_usdhc(imx_data)) { + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), GFP_KERNEL); + if (!hsq) { + err = -ENOMEM; + goto disable_ahb_clk; + } + + err = mmc_hsq_init(hsq, host->mmc); + if (err) + goto disable_ahb_clk; } if (of_id) @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) if (err) goto disable_ahb_clk; + if (!mmc_card_is_removable(host->mmc)) + host->mmc_host_ops.request_atomic = sdhci_request_atomic; + else + host->always_defer_done = true; + sdhci_esdhc_imx_hwinit(host); err = sdhci_add_host(host); @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device *dev) ret = cqhci_suspend(host->mmc); if (ret) return ret; + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_suspend(host->mmc); } if ((imx_data->socdata->flags & ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@ static int sdhci_esdhc_suspend(struct device *dev) static int sdhci_esdhc_resume(struct device *dev) { struct sdhci_host *host = dev_get_drvdata(dev); + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); + struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host); int ret; ret = pinctrl_pm_select_default_state(dev); @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device *dev) if (ret) return ret; - if (host->mmc->caps2 & MMC_CAP2_CQE) + if (host->mmc->caps2 & MMC_CAP2_CQE) { ret = cqhci_resume(host->mmc); + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_resume(host->mmc); + } if (!ret) ret = mmc_gpio_set_cd_wake(host->mmc, false); @@ -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device *dev) ret = cqhci_suspend(host->mmc); if (ret) return ret; + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_suspend(host->mmc); } ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11 @@ static int sdhci_esdhc_runtime_resume(struct device *dev) if (err) goto disable_ipg_clk; - if (host->mmc->caps2 & MMC_CAP2_CQE) + if (host->mmc->caps2 & MMC_CAP2_CQE) { err = cqhci_resume(host->mmc); + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_resume(host->mmc); + } return err; > > > Changes from v8: > > > - Add more description in the commit message. > > > - Optimize the failure log when calling cqe_enable(). > > > > > > Changes from v7: > > > - Add reviewed tag from Arnd. > > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE > > > can work normally. > > > - Add a new patch to enable the host software queue for the SD card. > > > - Use the default MMC queue depth for host software queue. > > > > > > Changes from v6: > > > - Change the patch order and set host->always_defer_done = true for > > > the Spreadtrum host driver. > > > > > > Changes from v5: > > > - Modify the condition of defering to complete request suggested by > Adrian. > > > > > > Changes from v4: > > > - Add a seperate patch to introduce a variable to defer to complete > > > data requests for some host drivers, when using host software queue. > > > > > > Changes from v3: > > > - Use host software queue instead of sqhci. > > > - Fix random config building issue. > > > - Change queue depth to 32, but still only allow 2 requests in flight. > > > - Update the testing data. > > > > > > Changes from v2: > > > - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot', > > > instead adding 'struct sqhci_host', which is only used by software queue. > > > > > > Changes from v1: > > > - Add request_done ops for sdhci_ops. > > > - Replace virtual command queue with software queue for functions > > > and variables. > > > - Rename the software queue file and add sqhci.h header file. > > > > > > Baolin Wang (5): > > > mmc: Add MMC host software queue support > > > mmc: core: Enable the MMC host software queue for the SD card > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > mmc: host: sdhci: Add a variable to defer to complete requests if > > > needed > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > drivers/mmc/core/mmc.c | 18 ++- > > > drivers/mmc/core/queue.c | 22 ++- > > > drivers/mmc/core/sd.c | 10 ++ > > > drivers/mmc/host/Kconfig | 8 + > > > drivers/mmc/host/Makefile | 1 + > > > drivers/mmc/host/cqhci.c | 8 +- > > > drivers/mmc/host/mmc_hsq.c | 343 > +++++++++++++++++++++++++++++++++++++++++ > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > drivers/mmc/host/sdhci.c | 14 +- > > > drivers/mmc/host/sdhci.h | 3 + > > > include/linux/mmc/host.h | 3 + > > > 13 files changed, 534 insertions(+), 15 deletions(-) create mode > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > drivers/mmc/host/mmc_hsq.h > > > > > > -- > > > 1.7.9.5 > > > > > > > Applied for next, thanks! Also, thanks for your patience while moving > > forward during the reviews! > > I am very appreciated for you and Arnd's good sugestion when introducing the > hsq. > > > > > Note, I did some amending of patch1 to resolve some checkpatch > > warnings. SPDX licence and Kconfig help texts, please have a look and > > tell if there are something that doesn't look good. > > Thanks for your help and looks good to me.
Hi Haibo. On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote: > > > -----Original Message----- > > From: linux-mmc-owner@vger.kernel.org > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang > > Sent: 2020年2月19日 9:35 > > To: Ulf Hansson <ulf.hansson@linaro.org> > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; Chunyan > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus > > Walleij <linus.walleij@linaro.org>; Baolin Wang <baolin.wang@linaro.org>; > > linux-mmc@vger.kernel.org; Linux Kernel Mailing List > > <linux-kernel@vger.kernel.org> > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> > > wrote: > > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> > > wrote: > > > > > > > > Hi All, > > > > > > > > Now the MMC read/write stack will always wait for previous request > > > > is completed by mmc_blk_rw_wait(), before sending a new request to > > > > hardware, or queue a work to complete request, that will bring > > > > context switching overhead, especially for high I/O per second > > > > rates, to affect the IO performance. > > > > > > > > Thus this patch set will introduce the MMC software command queue > > > > support based on command queue engine's interfaces, and set the > > > > queue depth as 64 to allow more requests can be be prepared, merged > > > > and inserted into IO scheduler, but we only allow 2 requests in > > > > flight, that is enough to let the irq handler always trigger the > > > > next request without a context switch, as well as avoiding a long latency. > > > > > > > > Moreover we can expand the MMC software queue interface to support > > > > MMC packed request or packed command instead of adding new > > > > interfaces, according to previosus discussion. > > > > > > > > Below are some comparison data with fio tool. The fio command I used > > > > is like below with changing the '--rw' parameter and enabling the > > > > direct IO flag to measure the actual hardware transfer speed in 4K block > > size. > > > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read > > > > --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read > > > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address > > 0001 > > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB > > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB > > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, > > chardev (248:0) > > > > > > > > 1. Without MMC software queue > > > > I tested 5 times for each case and output a average speed. > > > > > > > > 1) Sequential read: > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s Average > > > > speed: 59.66MiB/s > > > > > > > > 2) Random read: > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s Average > > > > speed: 27.04MiB/s > > > > > > > > 3) Sequential write: > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s Average > > > > speed: 69.68MiB/s > > > > > > > > 4) Random write: > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s Average > > > > speed: 35.96MiB/s > > > > > > > > 2. With MMC software queue > > > > I tested 5 times for each case and output a average speed. > > > > > > > > 1) Sequential read: > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s Average > > > > speed: 60.68MiB/s > > > > > > > > 2) Random read: > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s Average > > > > speed: 31.36MiB/s > > > > > > > > 3) Sequential write: > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average > > > > speed: 71.66MiB/s > > > > > > > > 4) Random write: > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s Average > > > > speed: 68.76MiB/s > > > > > > > > Form above data, we can see the MMC software queue can help to > > > > improve some performance obviously for random read and write, though > > > > no obvious improvement for sequential read and write. > > > > > > > > Any comments are welcome. Thanks a lot. > > > > > > Hi Baolin, > > I refer to your code, and add the software queue support on i.MX based on the Linux next-20200602, but unfortunately, I see an obvious performance drop when change to use software queue. > I test on our imx850-evk board, with eMMC soldered. > From the result listing below, only random write has a little performance improve, for others, seems performance drop a lot. > I noticed that, this software queue need no-removable card, any other limitation? For host? > From the code logic, software queue complete the request in irq handler, seems no other change, I do not figure out why this will trigger a performance drop on my platform. Any comment would be appreciate! Have you tested with below patches, which introduce an atomic_request host ops to submit next request in the irq hard handler context to reduce latency? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed6330330276cba39058e8dbdb28ab8d013d926e https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a374a72baa817388a04825118b7c1ba13064c8c6 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=98a2642f91a47dcd1215d037c14e0e5de33a247d https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e872f1e22ea5f678ae42812949477387fda6725b https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=845c939ee22943786a6eb1d13d03c77b19fcc2c8 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6db96e5810e0a6a345b7d78549de7676ae5b2662 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=48ef8a2a1e5e6a2a189009e880e341ddb452971d https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61ab64e2f54f4c607428667f83f411cb659843a3 > Without software queue, normal read/write method: > Sequential read: 56MB/s > Random read: 23.5MB/s > Sequential write: 43.7MB/s > Random write: 19MB/s > > With mmc software queue: > Sequential read: 33.5MB/s > Random read: 18.7 MB/s > Sequential write: 37.7MB/s > Random write: 19.8MB/s > > > Here, I also list my change code to support software queue > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig > index eb85237bf2d6..996b8cc5c381 100644 > --- a/drivers/mmc/host/Kconfig > +++ b/drivers/mmc/host/Kconfig > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX > depends on MMC_SDHCI_PLTFM > select MMC_SDHCI_IO_ACCESSORS > select MMC_CQHCI > + select MMC_HSQ > help > This selects the Freescale eSDHC/uSDHC controller support > found on i.MX25, i.MX35 i.MX5x and i.MX6x. > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c > index 1d7f84b23a22..6f163695b08d 100644 > --- a/drivers/mmc/host/sdhci-esdhc-imx.c > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c > @@ -29,6 +29,7 @@ > #include "sdhci-pltfm.h" > #include "sdhci-esdhc.h" > #include "cqhci.h" > +#include "mmc_hsq.h" > > #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f > #define ESDHC_CTRL_D3CD 0x08 > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host *host, u32 intmask) > return 0; > } > > +static void esdhc_request_done(struct sdhci_host *host, struct mmc_request *mrq) > +{ > + /* Validate if the request was from software queue firstly. */ > + if (mmc_hsq_finalize_request(host->mmc, mrq)) > + return; > + > + mmc_request_done(host->mmc, mrq); > +} > + > static struct sdhci_ops sdhci_esdhc_ops = { > .read_l = esdhc_readl_le, > .read_w = esdhc_readw_le, > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = { > .set_uhs_signaling = esdhc_set_uhs_signaling, > .reset = esdhc_reset, > .irq = esdhc_cqhci_irq, > + .request_done = esdhc_request_done, > }; > > static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { > @@ -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host) > writel(tmp, host->ioaddr + ESDHC_VEND_SPEC2); > > host->quirks &= ~SDHCI_QUIRK_NO_BUSY_IRQ; > + > + /* > + * On i.MX8MM, we are running Dual Linux OS, with 1st Linux using SD Card > + * as rootfs storage, 2nd Linux using eMMC as rootfs storage. We let the > + * the 1st linux configure power/clock for the 2nd Linux. > + * > + * When the 2nd Linux is booting into rootfs stage, we let the 1st Linux > + * to destroy the 2nd linux, then restart the 2nd linux, we met SDHCI dump. > + * After we clear the pending interrupt and halt CQCTL, issue gone. > + */ > + tmp = cqhci_readl(cq_host, CQHCI_IS); > + cqhci_writel(cq_host, tmp, CQHCI_IS); > + cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL); > } > > if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) { > @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host) > * After we clear the pending interrupt and halt CQCTL, issue gone. > */ > if (cq_host) { > - tmp = cqhci_readl(cq_host, CQHCI_IS); > - cqhci_writel(cq_host, tmp, CQHCI_IS); > - cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL); > } > } > } > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) > struct sdhci_pltfm_host *pltfm_host; > struct sdhci_host *host; > struct cqhci_host *cq_host; > + struct mmc_hsq *hsq; > int err; > struct pltfm_imx_data *imx_data; > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) > err = cqhci_init(cq_host, host->mmc, false); > if (err) > goto disable_ahb_clk; > + } else if (esdhc_is_usdhc(imx_data)) { > + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), GFP_KERNEL); > + if (!hsq) { > + err = -ENOMEM; > + goto disable_ahb_clk; > + } > + > + err = mmc_hsq_init(hsq, host->mmc); > + if (err) > + goto disable_ahb_clk; > } > > if (of_id) > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) > if (err) > goto disable_ahb_clk; > > + if (!mmc_card_is_removable(host->mmc)) > + host->mmc_host_ops.request_atomic = sdhci_request_atomic; > + else > + host->always_defer_done = true; > + > sdhci_esdhc_imx_hwinit(host); > > err = sdhci_add_host(host); > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device *dev) > ret = cqhci_suspend(host->mmc); > if (ret) > return ret; > + } else if (esdhc_is_usdhc(imx_data)) { > + mmc_hsq_suspend(host->mmc); > } > > if ((imx_data->socdata->flags & ESDHC_FLAG_STATE_LOST_IN_LPMODE) && > @@ -1764,6 +1803,8 @@ static int sdhci_esdhc_suspend(struct device *dev) > static int sdhci_esdhc_resume(struct device *dev) > { > struct sdhci_host *host = dev_get_drvdata(dev); > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > + struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host); > int ret; > > ret = pinctrl_pm_select_default_state(dev); > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device *dev) > if (ret) > return ret; > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > ret = cqhci_resume(host->mmc); > + } else if (esdhc_is_usdhc(imx_data)) { > + mmc_hsq_resume(host->mmc); > + } > > if (!ret) > ret = mmc_gpio_set_cd_wake(host->mmc, false); > @@ -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device *dev) > ret = cqhci_suspend(host->mmc); > if (ret) > return ret; > + } else if (esdhc_is_usdhc(imx_data)) { > + mmc_hsq_suspend(host->mmc); > } > > ret = sdhci_runtime_suspend_host(host); > @@ -1851,8 +1897,11 @@ static int sdhci_esdhc_runtime_resume(struct device *dev) > if (err) > goto disable_ipg_clk; > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > err = cqhci_resume(host->mmc); > + } else if (esdhc_is_usdhc(imx_data)) { > + mmc_hsq_resume(host->mmc); > + } > > return err; > > > > > > > Changes from v8: > > > > - Add more description in the commit message. > > > > - Optimize the failure log when calling cqe_enable(). > > > > > > > > Changes from v7: > > > > - Add reviewed tag from Arnd. > > > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE > > > > can work normally. > > > > - Add a new patch to enable the host software queue for the SD card. > > > > - Use the default MMC queue depth for host software queue. > > > > > > > > Changes from v6: > > > > - Change the patch order and set host->always_defer_done = true for > > > > the Spreadtrum host driver. > > > > > > > > Changes from v5: > > > > - Modify the condition of defering to complete request suggested by > > Adrian. > > > > > > > > Changes from v4: > > > > - Add a seperate patch to introduce a variable to defer to complete > > > > data requests for some host drivers, when using host software queue. > > > > > > > > Changes from v3: > > > > - Use host software queue instead of sqhci. > > > > - Fix random config building issue. > > > > - Change queue depth to 32, but still only allow 2 requests in flight. > > > > - Update the testing data. > > > > > > > > Changes from v2: > > > > - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot', > > > > instead adding 'struct sqhci_host', which is only used by software queue. > > > > > > > > Changes from v1: > > > > - Add request_done ops for sdhci_ops. > > > > - Replace virtual command queue with software queue for functions > > > > and variables. > > > > - Rename the software queue file and add sqhci.h header file. > > > > > > > > Baolin Wang (5): > > > > mmc: Add MMC host software queue support > > > > mmc: core: Enable the MMC host software queue for the SD card > > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > > mmc: host: sdhci: Add a variable to defer to complete requests if > > > > needed > > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > > drivers/mmc/core/mmc.c | 18 ++- > > > > drivers/mmc/core/queue.c | 22 ++- > > > > drivers/mmc/core/sd.c | 10 ++ > > > > drivers/mmc/host/Kconfig | 8 + > > > > drivers/mmc/host/Makefile | 1 + > > > > drivers/mmc/host/cqhci.c | 8 +- > > > > drivers/mmc/host/mmc_hsq.c | 343 > > +++++++++++++++++++++++++++++++++++++++++ > > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > > drivers/mmc/host/sdhci.c | 14 +- > > > > drivers/mmc/host/sdhci.h | 3 + > > > > include/linux/mmc/host.h | 3 + > > > > 13 files changed, 534 insertions(+), 15 deletions(-) create mode > > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > > drivers/mmc/host/mmc_hsq.h > > > > > > > > -- > > > > 1.7.9.5 > > > > > > > > > > Applied for next, thanks! Also, thanks for your patience while moving > > > forward during the reviews! > > > > I am very appreciated for you and Arnd's good sugestion when introducing the > > hsq. > > > > > > > > Note, I did some amending of patch1 to resolve some checkpatch > > > warnings. SPDX licence and Kconfig help texts, please have a look and > > > tell if there are something that doesn't look good. > > > > Thanks for your help and looks good to me.
> -----Original Message----- > From: Baolin Wang [mailto:baolin.wang7@gmail.com] > Sent: 2020年6月8日 19:54 > To: BOUGH CHEN <haibo.chen@nxp.com> > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson > Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd > Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin > Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com> > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > Hi Haibo. > > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote: > > > > > -----Original Message----- > > > From: linux-mmc-owner@vger.kernel.org > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang > > > Sent: 2020年2月19日 9:35 > > > To: Ulf Hansson <ulf.hansson@linaro.org> > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; > Chunyan > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus > > > Walleij <linus.walleij@linaro.org>; Baolin Wang > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > > > Mailing List <linux-kernel@vger.kernel.org> > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> > > > wrote: > > > > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> > > > wrote: > > > > > > > > > > Hi All, > > > > > > > > > > Now the MMC read/write stack will always wait for previous > > > > > request is completed by mmc_blk_rw_wait(), before sending a new > > > > > request to hardware, or queue a work to complete request, that > > > > > will bring context switching overhead, especially for high I/O > > > > > per second rates, to affect the IO performance. > > > > > > > > > > Thus this patch set will introduce the MMC software command > > > > > queue support based on command queue engine's interfaces, and > > > > > set the queue depth as 64 to allow more requests can be be > > > > > prepared, merged and inserted into IO scheduler, but we only > > > > > allow 2 requests in flight, that is enough to let the irq > > > > > handler always trigger the next request without a context switch, as > well as avoiding a long latency. > > > > > > > > > > Moreover we can expand the MMC software queue interface to > > > > > support MMC packed request or packed command instead of adding > > > > > new interfaces, according to previosus discussion. > > > > > > > > > > Below are some comparison data with fio tool. The fio command I > > > > > used is like below with changing the '--rw' parameter and > > > > > enabling the direct IO flag to measure the actual hardware > > > > > transfer speed in 4K block > > > size. > > > > > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 > > > > > --name=test_read > > > > > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at > address > > > 0001 > > > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 > MiB > > > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 > MiB > > > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 > MiB, > > > chardev (248:0) > > > > > > > > > > 1. Without MMC software queue > > > > > I tested 5 times for each case and output a average speed. > > > > > > > > > > 1) Sequential read: > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s > > > > > Average > > > > > speed: 59.66MiB/s > > > > > > > > > > 2) Random read: > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s > > > > > Average > > > > > speed: 27.04MiB/s > > > > > > > > > > 3) Sequential write: > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s > > > > > Average > > > > > speed: 69.68MiB/s > > > > > > > > > > 4) Random write: > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s > > > > > Average > > > > > speed: 35.96MiB/s > > > > > > > > > > 2. With MMC software queue > > > > > I tested 5 times for each case and output a average speed. > > > > > > > > > > 1) Sequential read: > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s > > > > > Average > > > > > speed: 60.68MiB/s > > > > > > > > > > 2) Random read: > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s > > > > > Average > > > > > speed: 31.36MiB/s > > > > > > > > > > 3) Sequential write: > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average > > > > > speed: 71.66MiB/s > > > > > > > > > > 4) Random write: > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s > > > > > Average > > > > > speed: 68.76MiB/s > > > > > > > > > > Form above data, we can see the MMC software queue can help to > > > > > improve some performance obviously for random read and write, > > > > > though no obvious improvement for sequential read and write. > > > > > > > > > > Any comments are welcome. Thanks a lot. > > > > > > > > > Hi Baolin, > > > > I refer to your code, and add the software queue support on i.MX based on > the Linux next-20200602, but unfortunately, I see an obvious performance drop > when change to use software queue. > > I test on our imx850-evk board, with eMMC soldered. > > From the result listing below, only random write has a little performance > improve, for others, seems performance drop a lot. > > I noticed that, this software queue need no-removable card, any other > limitation? For host? > > From the code logic, software queue complete the request in irq handler, > seems no other change, I do not figure out why this will trigger a performance > drop on my platform. Any comment would be appreciate! > > Have you tested with below patches, which introduce an atomic_request host > ops to submit next request in the irq hard handler context to reduce latency? Hi Baolin, The Linux code base I use is Linux-next-20200602, which already contain all the patch you listed, including the atomic_request. I do not test the atomic_request alone. I just enable software queue, and this software queue use the atomic request. So, do you mean, I need to first disable software queue, and just enable the atomic_request, to confirm whether the atomic_request has some performance impact on my platform? Best Regards Haibo Chen > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3Ded6330330276cba39058e8dbdb28ab8d013d926e&dat > a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2 > c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928 > 90127&sdata=JJVYXczOXYUdpdNkcJhGMaO6a6EeZkQ72hF090BMAHY%3 > D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3Da374a72baa817388a04825118b7c1ba13064c8c6&dat > a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2 > c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928 > 90127&sdata=RaL685%2FtijHmnmr%2B9wOyr6bN53gLLXpEBgS9wwBqIi8 > %3D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3D98a2642f91a47dcd1215d037c14e0e5de33a247d&dat > a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2 > c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928 > 90127&sdata=%2B3TBD1lyYOA7D6gvhw0mikvoALMMlyaf9xPGra%2FCnU > I%3D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3De872f1e22ea5f678ae42812949477387fda6725b&data > =02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2c > 447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C63727214099289 > 0127&sdata=7kByREznCyEUzibneJB1JMlXdaW%2FZs0FN1MWSzDAPAg% > 3D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3D845c939ee22943786a6eb1d13d03c77b19fcc2c8&data > =02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2c > 447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C63727214099289 > 0127&sdata=zxV8REvhSnBTFeyi9HHjFEPcYYrPJGJpNj%2FcOORY0%2BE%3 > D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3D6db96e5810e0a6a345b7d78549de7676ae5b2662&da > ta=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2 > c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928 > 90127&sdata=5J26bs7%2FmJbhcU2%2FJRAdvY6BLd2QIJLafjq37sVKfYA%3 > D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3D48ef8a2a1e5e6a2a189009e880e341ddb452971d&dat > a=02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2 > c447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C6372721409928 > 90127&sdata=yM6znnt3BDaLR0G1%2FtWyx1OogCKP2cxErZvvZxDDzbQ% > 3D&reserved=0 > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kern > el.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fco > mmit%2F%3Fid%3D61ab64e2f54f4c607428667f83f411cb659843a3&data > =02%7C01%7Chaibo.chen%40nxp.com%7C8618eba906ad4fdb4e1708d80ba2c > 447%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C63727214099289 > 0127&sdata=GZfSBBYZ9eOJfxLPEKCfVUJTcvc6oci6AmeSUCWD8Dc%3D& > amp;reserved=0 > > > Without software queue, normal read/write method: > > Sequential read: 56MB/s > > Random read: 23.5MB/s > > Sequential write: 43.7MB/s > > Random write: 19MB/s > > > > With mmc software queue: > > Sequential read: 33.5MB/s > > Random read: 18.7 MB/s > > Sequential write: 37.7MB/s > > Random write: 19.8MB/s > > > > > > Here, I also list my change code to support software queue > > > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index > > eb85237bf2d6..996b8cc5c381 100644 > > --- a/drivers/mmc/host/Kconfig > > +++ b/drivers/mmc/host/Kconfig > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX > > depends on MMC_SDHCI_PLTFM > > select MMC_SDHCI_IO_ACCESSORS > > select MMC_CQHCI > > + select MMC_HSQ > > help > > This selects the Freescale eSDHC/uSDHC controller support > > found on i.MX25, i.MX35 i.MX5x and i.MX6x. > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c > > b/drivers/mmc/host/sdhci-esdhc-imx.c > > index 1d7f84b23a22..6f163695b08d 100644 > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c > > @@ -29,6 +29,7 @@ > > #include "sdhci-pltfm.h" > > #include "sdhci-esdhc.h" > > #include "cqhci.h" > > +#include "mmc_hsq.h" > > > > #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f > > #define ESDHC_CTRL_D3CD 0x08 > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host > *host, u32 intmask) > > return 0; > > } > > > > +static void esdhc_request_done(struct sdhci_host *host, struct > > +mmc_request *mrq) { > > + /* Validate if the request was from software queue firstly. */ > > + if (mmc_hsq_finalize_request(host->mmc, mrq)) > > + return; > > + > > + mmc_request_done(host->mmc, mrq); } > > + > > static struct sdhci_ops sdhci_esdhc_ops = { > > .read_l = esdhc_readl_le, > > .read_w = esdhc_readw_le, > > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = { > > .set_uhs_signaling = esdhc_set_uhs_signaling, > > .reset = esdhc_reset, > > .irq = esdhc_cqhci_irq, > > + .request_done = esdhc_request_done, > > }; > > > > static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@ > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host > *host) > > writel(tmp, host->ioaddr + > ESDHC_VEND_SPEC2); > > > > host->quirks &= > ~SDHCI_QUIRK_NO_BUSY_IRQ; > > + > > + /* > > + * On i.MX8MM, we are running Dual Linux OS, > with 1st Linux using SD Card > > + * as rootfs storage, 2nd Linux using eMMC as > rootfs storage. We let the > > + * the 1st linux configure power/clock for the > 2nd Linux. > > + * > > + * When the 2nd Linux is booting into rootfs > stage, we let the 1st Linux > > + * to destroy the 2nd linux, then restart the > 2nd linux, we met SDHCI dump. > > + * After we clear the pending interrupt and halt > CQCTL, issue gone. > > + */ > > + tmp = cqhci_readl(cq_host, CQHCI_IS); > > + cqhci_writel(cq_host, tmp, CQHCI_IS); > > + cqhci_writel(cq_host, CQHCI_HALT, > CQHCI_CTL); > > } > > > > if (imx_data->socdata->flags & > ESDHC_FLAG_STD_TUNING) > > { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct > sdhci_host *host) > > * After we clear the pending interrupt and halt CQCTL, > issue gone. > > */ > > if (cq_host) { > > - tmp = cqhci_readl(cq_host, CQHCI_IS); > > - cqhci_writel(cq_host, tmp, CQHCI_IS); > > - cqhci_writel(cq_host, CQHCI_HALT, > CQHCI_CTL); > > } > > } > > } > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct > platform_device *pdev) > > struct sdhci_pltfm_host *pltfm_host; > > struct sdhci_host *host; > > struct cqhci_host *cq_host; > > + struct mmc_hsq *hsq; > > int err; > > struct pltfm_imx_data *imx_data; > > > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct > platform_device *pdev) > > err = cqhci_init(cq_host, host->mmc, false); > > if (err) > > goto disable_ahb_clk; > > + } else if (esdhc_is_usdhc(imx_data)) { > > + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), > GFP_KERNEL); > > + if (!hsq) { > > + err = -ENOMEM; > > + goto disable_ahb_clk; > > + } > > + > > + err = mmc_hsq_init(hsq, host->mmc); > > + if (err) > > + goto disable_ahb_clk; > > } > > > > if (of_id) > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct > platform_device *pdev) > > if (err) > > goto disable_ahb_clk; > > > > + if (!mmc_card_is_removable(host->mmc)) > > + host->mmc_host_ops.request_atomic = > sdhci_request_atomic; > > + else > > + host->always_defer_done = true; > > + > > sdhci_esdhc_imx_hwinit(host); > > > > err = sdhci_add_host(host); > > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device > *dev) > > ret = cqhci_suspend(host->mmc); > > if (ret) > > return ret; > > + } else if (esdhc_is_usdhc(imx_data)) { > > + mmc_hsq_suspend(host->mmc); > > } > > > > if ((imx_data->socdata->flags & > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@ > static int > > sdhci_esdhc_suspend(struct device *dev) static int > > sdhci_esdhc_resume(struct device *dev) { > > struct sdhci_host *host = dev_get_drvdata(dev); > > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > > + struct pltfm_imx_data *imx_data = > > + sdhci_pltfm_priv(pltfm_host); > > int ret; > > > > ret = pinctrl_pm_select_default_state(dev); > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device > *dev) > > if (ret) > > return ret; > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > ret = cqhci_resume(host->mmc); > > + } else if (esdhc_is_usdhc(imx_data)) { > > + mmc_hsq_resume(host->mmc); > > + } > > > > if (!ret) > > ret = mmc_gpio_set_cd_wake(host->mmc, false); @@ > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device > *dev) > > ret = cqhci_suspend(host->mmc); > > if (ret) > > return ret; > > + } else if (esdhc_is_usdhc(imx_data)) { > > + mmc_hsq_suspend(host->mmc); > > } > > > > ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11 > @@ > > static int sdhci_esdhc_runtime_resume(struct device *dev) > > if (err) > > goto disable_ipg_clk; > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > err = cqhci_resume(host->mmc); > > + } else if (esdhc_is_usdhc(imx_data)) { > > + mmc_hsq_resume(host->mmc); > > + } > > > > return err; > > > > > > > > > > > Changes from v8: > > > > > - Add more description in the commit message. > > > > > - Optimize the failure log when calling cqe_enable(). > > > > > > > > > > Changes from v7: > > > > > - Add reviewed tag from Arnd. > > > > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the > > > > > CQE can work normally. > > > > > - Add a new patch to enable the host software queue for the SD card. > > > > > - Use the default MMC queue depth for host software queue. > > > > > > > > > > Changes from v6: > > > > > - Change the patch order and set host->always_defer_done = true > > > > > for the Spreadtrum host driver. > > > > > > > > > > Changes from v5: > > > > > - Modify the condition of defering to complete request > > > > > suggested by > > > Adrian. > > > > > > > > > > Changes from v4: > > > > > - Add a seperate patch to introduce a variable to defer to > > > > > complete data requests for some host drivers, when using host > software queue. > > > > > > > > > > Changes from v3: > > > > > - Use host software queue instead of sqhci. > > > > > - Fix random config building issue. > > > > > - Change queue depth to 32, but still only allow 2 requests in flight. > > > > > - Update the testing data. > > > > > > > > > > Changes from v2: > > > > > - Remove reference to 'struct cqhci_host' and 'struct > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is only used by > software queue. > > > > > > > > > > Changes from v1: > > > > > - Add request_done ops for sdhci_ops. > > > > > - Replace virtual command queue with software queue for > > > > > functions and variables. > > > > > - Rename the software queue file and add sqhci.h header file. > > > > > > > > > > Baolin Wang (5): > > > > > mmc: Add MMC host software queue support > > > > > mmc: core: Enable the MMC host software queue for the SD card > > > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > > > mmc: host: sdhci: Add a variable to defer to complete requests if > > > > > needed > > > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > > > drivers/mmc/core/mmc.c | 18 ++- > > > > > drivers/mmc/core/queue.c | 22 ++- > > > > > drivers/mmc/core/sd.c | 10 ++ > > > > > drivers/mmc/host/Kconfig | 8 + > > > > > drivers/mmc/host/Makefile | 1 + > > > > > drivers/mmc/host/cqhci.c | 8 +- > > > > > drivers/mmc/host/mmc_hsq.c | 343 > > > +++++++++++++++++++++++++++++++++++++++++ > > > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > > > drivers/mmc/host/sdhci.c | 14 +- > > > > > drivers/mmc/host/sdhci.h | 3 + > > > > > include/linux/mmc/host.h | 3 + > > > > > 13 files changed, 534 insertions(+), 15 deletions(-) create > > > > > mode > > > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > > > drivers/mmc/host/mmc_hsq.h > > > > > > > > > > -- > > > > > 1.7.9.5 > > > > > > > > > > > > > Applied for next, thanks! Also, thanks for your patience while > > > > moving forward during the reviews! > > > > > > I am very appreciated for you and Arnd's good sugestion when > > > introducing the hsq. > > > > > > > > > > > Note, I did some amending of patch1 to resolve some checkpatch > > > > warnings. SPDX licence and Kconfig help texts, please have a look > > > > and tell if there are something that doesn't look good. > > > > > > Thanks for your help and looks good to me. > > > > -- > Baolin Wang
On Wed, Jun 10, 2020 at 10:26 AM BOUGH CHEN <haibo.chen@nxp.com> wrote: > > > -----Original Message----- > > From: Baolin Wang [mailto:baolin.wang7@gmail.com] > > Sent: 2020年6月8日 19:54 > > To: BOUGH CHEN <haibo.chen@nxp.com> > > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter > > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson > > Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd > > Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin > > Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com> > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > Hi Haibo. > > > > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote: > > > > > > > -----Original Message----- > > > > From: linux-mmc-owner@vger.kernel.org > > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang > > > > Sent: 2020年2月19日 9:35 > > > > To: Ulf Hansson <ulf.hansson@linaro.org> > > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das > > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; > > Chunyan > > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus > > > > Walleij <linus.walleij@linaro.org>; Baolin Wang > > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > > > > Mailing List <linux-kernel@vger.kernel.org> > > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> > > > > wrote: > > > > > > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> > > > > wrote: > > > > > > > > > > > > Hi All, > > > > > > > > > > > > Now the MMC read/write stack will always wait for previous > > > > > > request is completed by mmc_blk_rw_wait(), before sending a new > > > > > > request to hardware, or queue a work to complete request, that > > > > > > will bring context switching overhead, especially for high I/O > > > > > > per second rates, to affect the IO performance. > > > > > > > > > > > > Thus this patch set will introduce the MMC software command > > > > > > queue support based on command queue engine's interfaces, and > > > > > > set the queue depth as 64 to allow more requests can be be > > > > > > prepared, merged and inserted into IO scheduler, but we only > > > > > > allow 2 requests in flight, that is enough to let the irq > > > > > > handler always trigger the next request without a context switch, as > > well as avoiding a long latency. > > > > > > > > > > > > Moreover we can expand the MMC software queue interface to > > > > > > support MMC packed request or packed command instead of adding > > > > > > new interfaces, according to previosus discussion. > > > > > > > > > > > > Below are some comparison data with fio tool. The fio command I > > > > > > used is like below with changing the '--rw' parameter and > > > > > > enabling the direct IO flag to measure the actual hardware > > > > > > transfer speed in 4K block > > > > size. > > > > > > > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 > > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 > > > > > > --name=test_read > > > > > > > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at > > address > > > > 0001 > > > > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 > > MiB > > > > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 > > MiB > > > > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 > > MiB, > > > > chardev (248:0) > > > > > > > > > > > > 1. Without MMC software queue > > > > > > I tested 5 times for each case and output a average speed. > > > > > > > > > > > > 1) Sequential read: > > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s > > > > > > Average > > > > > > speed: 59.66MiB/s > > > > > > > > > > > > 2) Random read: > > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s > > > > > > Average > > > > > > speed: 27.04MiB/s > > > > > > > > > > > > 3) Sequential write: > > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s > > > > > > Average > > > > > > speed: 69.68MiB/s > > > > > > > > > > > > 4) Random write: > > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s > > > > > > Average > > > > > > speed: 35.96MiB/s > > > > > > > > > > > > 2. With MMC software queue > > > > > > I tested 5 times for each case and output a average speed. > > > > > > > > > > > > 1) Sequential read: > > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s > > > > > > Average > > > > > > speed: 60.68MiB/s > > > > > > > > > > > > 2) Random read: > > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s > > > > > > Average > > > > > > speed: 31.36MiB/s > > > > > > > > > > > > 3) Sequential write: > > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average > > > > > > speed: 71.66MiB/s > > > > > > > > > > > > 4) Random write: > > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s > > > > > > Average > > > > > > speed: 68.76MiB/s > > > > > > > > > > > > Form above data, we can see the MMC software queue can help to > > > > > > improve some performance obviously for random read and write, > > > > > > though no obvious improvement for sequential read and write. > > > > > > > > > > > > Any comments are welcome. Thanks a lot. > > > > > > > > > > > > Hi Baolin, > > > > > > I refer to your code, and add the software queue support on i.MX based on > > the Linux next-20200602, but unfortunately, I see an obvious performance drop > > when change to use software queue. > > > I test on our imx850-evk board, with eMMC soldered. > > > From the result listing below, only random write has a little performance > > improve, for others, seems performance drop a lot. > > > I noticed that, this software queue need no-removable card, any other > > limitation? For host? > > > From the code logic, software queue complete the request in irq handler, > > seems no other change, I do not figure out why this will trigger a performance > > drop on my platform. Any comment would be appreciate! > > > > Have you tested with below patches, which introduce an atomic_request host > > ops to submit next request in the irq hard handler context to reduce latency? > > Hi Baolin, > > The Linux code base I use is Linux-next-20200602, which already contain all the patch you listed, including the atomic_request. > I do not test the atomic_request alone. I just enable software queue, and this software queue use the atomic request. > So, do you mean, I need to first disable software queue, and just enable the atomic_request, to confirm whether the atomic_request has some performance impact on my platform? NO, the HSQ will support atomic_request ops, and I think you should implement the atomic_request ops in your driver if possible, like below patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61ab64e2f54f4c607428667f83f411cb659843a3 > > > Without software queue, normal read/write method: > > > Sequential read: 56MB/s > > > Random read: 23.5MB/s > > > Sequential write: 43.7MB/s > > > Random write: 19MB/s > > > > > > With mmc software queue: > > > Sequential read: 33.5MB/s > > > Random read: 18.7 MB/s > > > Sequential write: 37.7MB/s > > > Random write: 19.8MB/s > > > > > > > > > Here, I also list my change code to support software queue > > > > > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index > > > eb85237bf2d6..996b8cc5c381 100644 > > > --- a/drivers/mmc/host/Kconfig > > > +++ b/drivers/mmc/host/Kconfig > > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX > > > depends on MMC_SDHCI_PLTFM > > > select MMC_SDHCI_IO_ACCESSORS > > > select MMC_CQHCI > > > + select MMC_HSQ > > > help > > > This selects the Freescale eSDHC/uSDHC controller support > > > found on i.MX25, i.MX35 i.MX5x and i.MX6x. > > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c > > > b/drivers/mmc/host/sdhci-esdhc-imx.c > > > index 1d7f84b23a22..6f163695b08d 100644 > > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c > > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c > > > @@ -29,6 +29,7 @@ > > > #include "sdhci-pltfm.h" > > > #include "sdhci-esdhc.h" > > > #include "cqhci.h" > > > +#include "mmc_hsq.h" > > > > > > #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f > > > #define ESDHC_CTRL_D3CD 0x08 > > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host > > *host, u32 intmask) > > > return 0; > > > } > > > > > > +static void esdhc_request_done(struct sdhci_host *host, struct > > > +mmc_request *mrq) { > > > + /* Validate if the request was from software queue firstly. */ > > > + if (mmc_hsq_finalize_request(host->mmc, mrq)) > > > + return; > > > + > > > + mmc_request_done(host->mmc, mrq); } > > > + > > > static struct sdhci_ops sdhci_esdhc_ops = { > > > .read_l = esdhc_readl_le, > > > .read_w = esdhc_readw_le, > > > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = { > > > .set_uhs_signaling = esdhc_set_uhs_signaling, > > > .reset = esdhc_reset, > > > .irq = esdhc_cqhci_irq, > > > + .request_done = esdhc_request_done, > > > }; > > > > > > static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@ > > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host > > *host) > > > writel(tmp, host->ioaddr + > > ESDHC_VEND_SPEC2); > > > > > > host->quirks &= > > ~SDHCI_QUIRK_NO_BUSY_IRQ; > > > + > > > + /* > > > + * On i.MX8MM, we are running Dual Linux OS, > > with 1st Linux using SD Card > > > + * as rootfs storage, 2nd Linux using eMMC as > > rootfs storage. We let the > > > + * the 1st linux configure power/clock for the > > 2nd Linux. > > > + * > > > + * When the 2nd Linux is booting into rootfs > > stage, we let the 1st Linux > > > + * to destroy the 2nd linux, then restart the > > 2nd linux, we met SDHCI dump. > > > + * After we clear the pending interrupt and halt > > CQCTL, issue gone. > > > + */ > > > + tmp = cqhci_readl(cq_host, CQHCI_IS); > > > + cqhci_writel(cq_host, tmp, CQHCI_IS); > > > + cqhci_writel(cq_host, CQHCI_HALT, > > CQHCI_CTL); > > > } > > > > > > if (imx_data->socdata->flags & > > ESDHC_FLAG_STD_TUNING) > > > { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct > > sdhci_host *host) > > > * After we clear the pending interrupt and halt CQCTL, > > issue gone. > > > */ > > > if (cq_host) { > > > - tmp = cqhci_readl(cq_host, CQHCI_IS); > > > - cqhci_writel(cq_host, tmp, CQHCI_IS); > > > - cqhci_writel(cq_host, CQHCI_HALT, > > CQHCI_CTL); > > > } > > > } > > > } > > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct > > platform_device *pdev) > > > struct sdhci_pltfm_host *pltfm_host; > > > struct sdhci_host *host; > > > struct cqhci_host *cq_host; > > > + struct mmc_hsq *hsq; > > > int err; > > > struct pltfm_imx_data *imx_data; > > > > > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct > > platform_device *pdev) > > > err = cqhci_init(cq_host, host->mmc, false); > > > if (err) > > > goto disable_ahb_clk; > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), > > GFP_KERNEL); > > > + if (!hsq) { > > > + err = -ENOMEM; > > > + goto disable_ahb_clk; > > > + } > > > + > > > + err = mmc_hsq_init(hsq, host->mmc); > > > + if (err) > > > + goto disable_ahb_clk; > > > } > > > > > > if (of_id) > > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct > > platform_device *pdev) > > > if (err) > > > goto disable_ahb_clk; > > > > > > + if (!mmc_card_is_removable(host->mmc)) > > > + host->mmc_host_ops.request_atomic = > > sdhci_request_atomic; > > > + else > > > + host->always_defer_done = true; > > > + > > > sdhci_esdhc_imx_hwinit(host); > > > > > > err = sdhci_add_host(host); > > > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device > > *dev) > > > ret = cqhci_suspend(host->mmc); > > > if (ret) > > > return ret; > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > + mmc_hsq_suspend(host->mmc); > > > } > > > > > > if ((imx_data->socdata->flags & > > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@ > > static int > > > sdhci_esdhc_suspend(struct device *dev) static int > > > sdhci_esdhc_resume(struct device *dev) { > > > struct sdhci_host *host = dev_get_drvdata(dev); > > > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > > > + struct pltfm_imx_data *imx_data = > > > + sdhci_pltfm_priv(pltfm_host); > > > int ret; > > > > > > ret = pinctrl_pm_select_default_state(dev); > > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device > > *dev) > > > if (ret) > > > return ret; > > > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > > ret = cqhci_resume(host->mmc); > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > + mmc_hsq_resume(host->mmc); > > > + } > > > > > > if (!ret) > > > ret = mmc_gpio_set_cd_wake(host->mmc, false); @@ > > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device > > *dev) > > > ret = cqhci_suspend(host->mmc); > > > if (ret) > > > return ret; > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > + mmc_hsq_suspend(host->mmc); > > > } > > > > > > ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11 > > @@ > > > static int sdhci_esdhc_runtime_resume(struct device *dev) > > > if (err) > > > goto disable_ipg_clk; > > > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > > err = cqhci_resume(host->mmc); > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > + mmc_hsq_resume(host->mmc); > > > + } > > > > > > return err; > > > > > > > > > > > > > > > Changes from v8: > > > > > > - Add more description in the commit message. > > > > > > - Optimize the failure log when calling cqe_enable(). > > > > > > > > > > > > Changes from v7: > > > > > > - Add reviewed tag from Arnd. > > > > > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > > > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the > > > > > > CQE can work normally. > > > > > > - Add a new patch to enable the host software queue for the SD card. > > > > > > - Use the default MMC queue depth for host software queue. > > > > > > > > > > > > Changes from v6: > > > > > > - Change the patch order and set host->always_defer_done = true > > > > > > for the Spreadtrum host driver. > > > > > > > > > > > > Changes from v5: > > > > > > - Modify the condition of defering to complete request > > > > > > suggested by > > > > Adrian. > > > > > > > > > > > > Changes from v4: > > > > > > - Add a seperate patch to introduce a variable to defer to > > > > > > complete data requests for some host drivers, when using host > > software queue. > > > > > > > > > > > > Changes from v3: > > > > > > - Use host software queue instead of sqhci. > > > > > > - Fix random config building issue. > > > > > > - Change queue depth to 32, but still only allow 2 requests in flight. > > > > > > - Update the testing data. > > > > > > > > > > > > Changes from v2: > > > > > > - Remove reference to 'struct cqhci_host' and 'struct > > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is only used by > > software queue. > > > > > > > > > > > > Changes from v1: > > > > > > - Add request_done ops for sdhci_ops. > > > > > > - Replace virtual command queue with software queue for > > > > > > functions and variables. > > > > > > - Rename the software queue file and add sqhci.h header file. > > > > > > > > > > > > Baolin Wang (5): > > > > > > mmc: Add MMC host software queue support > > > > > > mmc: core: Enable the MMC host software queue for the SD card > > > > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > > > > mmc: host: sdhci: Add a variable to defer to complete requests if > > > > > > needed > > > > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > > > > drivers/mmc/core/mmc.c | 18 ++- > > > > > > drivers/mmc/core/queue.c | 22 ++- > > > > > > drivers/mmc/core/sd.c | 10 ++ > > > > > > drivers/mmc/host/Kconfig | 8 + > > > > > > drivers/mmc/host/Makefile | 1 + > > > > > > drivers/mmc/host/cqhci.c | 8 +- > > > > > > drivers/mmc/host/mmc_hsq.c | 343 > > > > +++++++++++++++++++++++++++++++++++++++++ > > > > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > > > > drivers/mmc/host/sdhci.c | 14 +- > > > > > > drivers/mmc/host/sdhci.h | 3 + > > > > > > include/linux/mmc/host.h | 3 + > > > > > > 13 files changed, 534 insertions(+), 15 deletions(-) create > > > > > > mode > > > > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > > > > drivers/mmc/host/mmc_hsq.h > > > > > > > > > > > > -- > > > > > > 1.7.9.5 > > > > > > > > > > > > > > > > Applied for next, thanks! Also, thanks for your patience while > > > > > moving forward during the reviews! > > > > > > > > I am very appreciated for you and Arnd's good sugestion when > > > > introducing the hsq. > > > > > > > > > > > > > > Note, I did some amending of patch1 to resolve some checkpatch > > > > > warnings. SPDX licence and Kconfig help texts, please have a look > > > > > and tell if there are something that doesn't look good. > > > > > > > > Thanks for your help and looks good to me. > > > > > > > > -- > > Baolin Wang
On Sun, Jun 14, 2020 at 11:05 PM Baolin Wang <baolin.wang7@gmail.com> wrote: > > On Wed, Jun 10, 2020 at 10:26 AM BOUGH CHEN <haibo.chen@nxp.com> wrote: > > > > > -----Original Message----- > > > From: Baolin Wang [mailto:baolin.wang7@gmail.com] > > > Sent: 2020年6月8日 19:54 > > > To: BOUGH CHEN <haibo.chen@nxp.com> > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter > > > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson > > > Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd > > > Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin > > > Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > > > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com> > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > > > Hi Haibo. > > > > > > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> wrote: > > > > > > > > > -----Original Message----- > > > > > From: linux-mmc-owner@vger.kernel.org > > > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin Wang > > > > > Sent: 2020年2月19日 9:35 > > > > > To: Ulf Hansson <ulf.hansson@linaro.org> > > > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das > > > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; > > > Chunyan > > > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus > > > > > Walleij <linus.walleij@linaro.org>; Baolin Wang > > > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > > > > > Mailing List <linux-kernel@vger.kernel.org> > > > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > > > > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@linaro.org> > > > > > wrote: > > > > > > > > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > Now the MMC read/write stack will always wait for previous > > > > > > > request is completed by mmc_blk_rw_wait(), before sending a new > > > > > > > request to hardware, or queue a work to complete request, that > > > > > > > will bring context switching overhead, especially for high I/O > > > > > > > per second rates, to affect the IO performance. > > > > > > > > > > > > > > Thus this patch set will introduce the MMC software command > > > > > > > queue support based on command queue engine's interfaces, and > > > > > > > set the queue depth as 64 to allow more requests can be be > > > > > > > prepared, merged and inserted into IO scheduler, but we only > > > > > > > allow 2 requests in flight, that is enough to let the irq > > > > > > > handler always trigger the next request without a context switch, as > > > well as avoiding a long latency. > > > > > > > > > > > > > > Moreover we can expand the MMC software queue interface to > > > > > > > support MMC packed request or packed command instead of adding > > > > > > > new interfaces, according to previosus discussion. > > > > > > > > > > > > > > Below are some comparison data with fio tool. The fio command I > > > > > > > used is like below with changing the '--rw' parameter and > > > > > > > enabling the direct IO flag to measure the actual hardware > > > > > > > transfer speed in 4K block > > > > > size. > > > > > > > > > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 > > > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 > > > > > > > --name=test_read > > > > > > > > > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > > > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at > > > address > > > > > 0001 > > > > > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > > > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 > > > MiB > > > > > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 > > > MiB > > > > > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 > > > MiB, > > > > > chardev (248:0) > > > > > > > > > > > > > > 1. Without MMC software queue > > > > > > > I tested 5 times for each case and output a average speed. > > > > > > > > > > > > > > 1) Sequential read: > > > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s > > > > > > > Average > > > > > > > speed: 59.66MiB/s > > > > > > > > > > > > > > 2) Random read: > > > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s > > > > > > > Average > > > > > > > speed: 27.04MiB/s > > > > > > > > > > > > > > 3) Sequential write: > > > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s > > > > > > > Average > > > > > > > speed: 69.68MiB/s > > > > > > > > > > > > > > 4) Random write: > > > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s > > > > > > > Average > > > > > > > speed: 35.96MiB/s > > > > > > > > > > > > > > 2. With MMC software queue > > > > > > > I tested 5 times for each case and output a average speed. > > > > > > > > > > > > > > 1) Sequential read: > > > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s > > > > > > > Average > > > > > > > speed: 60.68MiB/s > > > > > > > > > > > > > > 2) Random read: > > > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s > > > > > > > Average > > > > > > > speed: 31.36MiB/s > > > > > > > > > > > > > > 3) Sequential write: > > > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average > > > > > > > speed: 71.66MiB/s > > > > > > > > > > > > > > 4) Random write: > > > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s > > > > > > > Average > > > > > > > speed: 68.76MiB/s > > > > > > > > > > > > > > Form above data, we can see the MMC software queue can help to > > > > > > > improve some performance obviously for random read and write, > > > > > > > though no obvious improvement for sequential read and write. > > > > > > > > > > > > > > Any comments are welcome. Thanks a lot. > > > > > > > > > > > > > > > Hi Baolin, > > > > > > > > I refer to your code, and add the software queue support on i.MX based on > > > the Linux next-20200602, but unfortunately, I see an obvious performance drop > > > when change to use software queue. > > > > I test on our imx850-evk board, with eMMC soldered. > > > > From the result listing below, only random write has a little performance > > > improve, for others, seems performance drop a lot. > > > > I noticed that, this software queue need no-removable card, any other > > > limitation? For host? > > > > From the code logic, software queue complete the request in irq handler, > > > seems no other change, I do not figure out why this will trigger a performance > > > drop on my platform. Any comment would be appreciate! > > > > > > Have you tested with below patches, which introduce an atomic_request host > > > ops to submit next request in the irq hard handler context to reduce latency? > > > > Hi Baolin, > > > > The Linux code base I use is Linux-next-20200602, which already contain all the patch you listed, including the atomic_request. > > I do not test the atomic_request alone. I just enable software queue, and this software queue use the atomic request. > > So, do you mean, I need to first disable software queue, and just enable the atomic_request, to confirm whether the atomic_request has some performance impact on my platform? > > NO, the HSQ will support atomic_request ops, and I think you should > implement the atomic_request ops in your driver if possible, like > below patch: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61ab64e2f54f4c607428667f83f411cb659843a3 Ah, sorry, I saw you've implemented the atomic_request ops. Moreover I've tested the mainline, and I still got some performance improvement. What's the IO scheduler you selected? > > > > > Without software queue, normal read/write method: > > > > Sequential read: 56MB/s > > > > Random read: 23.5MB/s > > > > Sequential write: 43.7MB/s > > > > Random write: 19MB/s > > > > > > > > With mmc software queue: > > > > Sequential read: 33.5MB/s > > > > Random read: 18.7 MB/s > > > > Sequential write: 37.7MB/s > > > > Random write: 19.8MB/s > > > > > > > > > > > > Here, I also list my change code to support software queue > > > > > > > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index > > > > eb85237bf2d6..996b8cc5c381 100644 > > > > --- a/drivers/mmc/host/Kconfig > > > > +++ b/drivers/mmc/host/Kconfig > > > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX > > > > depends on MMC_SDHCI_PLTFM > > > > select MMC_SDHCI_IO_ACCESSORS > > > > select MMC_CQHCI > > > > + select MMC_HSQ > > > > help > > > > This selects the Freescale eSDHC/uSDHC controller support > > > > found on i.MX25, i.MX35 i.MX5x and i.MX6x. > > > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c > > > > b/drivers/mmc/host/sdhci-esdhc-imx.c > > > > index 1d7f84b23a22..6f163695b08d 100644 > > > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c > > > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c > > > > @@ -29,6 +29,7 @@ > > > > #include "sdhci-pltfm.h" > > > > #include "sdhci-esdhc.h" > > > > #include "cqhci.h" > > > > +#include "mmc_hsq.h" > > > > > > > > #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f > > > > #define ESDHC_CTRL_D3CD 0x08 > > > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host > > > *host, u32 intmask) > > > > return 0; > > > > } > > > > > > > > +static void esdhc_request_done(struct sdhci_host *host, struct > > > > +mmc_request *mrq) { > > > > + /* Validate if the request was from software queue firstly. */ > > > > + if (mmc_hsq_finalize_request(host->mmc, mrq)) > > > > + return; > > > > + > > > > + mmc_request_done(host->mmc, mrq); } > > > > + > > > > static struct sdhci_ops sdhci_esdhc_ops = { > > > > .read_l = esdhc_readl_le, > > > > .read_w = esdhc_readw_le, > > > > @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = { > > > > .set_uhs_signaling = esdhc_set_uhs_signaling, > > > > .reset = esdhc_reset, > > > > .irq = esdhc_cqhci_irq, > > > > + .request_done = esdhc_request_done, > > > > }; > > > > > > > > static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@ > > > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host > > > *host) > > > > writel(tmp, host->ioaddr + > > > ESDHC_VEND_SPEC2); > > > > > > > > host->quirks &= > > > ~SDHCI_QUIRK_NO_BUSY_IRQ; > > > > + > > > > + /* > > > > + * On i.MX8MM, we are running Dual Linux OS, > > > with 1st Linux using SD Card > > > > + * as rootfs storage, 2nd Linux using eMMC as > > > rootfs storage. We let the > > > > + * the 1st linux configure power/clock for the > > > 2nd Linux. > > > > + * > > > > + * When the 2nd Linux is booting into rootfs > > > stage, we let the 1st Linux > > > > + * to destroy the 2nd linux, then restart the > > > 2nd linux, we met SDHCI dump. > > > > + * After we clear the pending interrupt and halt > > > CQCTL, issue gone. > > > > + */ > > > > + tmp = cqhci_readl(cq_host, CQHCI_IS); > > > > + cqhci_writel(cq_host, tmp, CQHCI_IS); > > > > + cqhci_writel(cq_host, CQHCI_HALT, > > > CQHCI_CTL); > > > > } > > > > > > > > if (imx_data->socdata->flags & > > > ESDHC_FLAG_STD_TUNING) > > > > { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct > > > sdhci_host *host) > > > > * After we clear the pending interrupt and halt CQCTL, > > > issue gone. > > > > */ > > > > if (cq_host) { > > > > - tmp = cqhci_readl(cq_host, CQHCI_IS); > > > > - cqhci_writel(cq_host, tmp, CQHCI_IS); > > > > - cqhci_writel(cq_host, CQHCI_HALT, > > > CQHCI_CTL); > > > > } > > > > } > > > > } > > > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct > > > platform_device *pdev) > > > > struct sdhci_pltfm_host *pltfm_host; > > > > struct sdhci_host *host; > > > > struct cqhci_host *cq_host; > > > > + struct mmc_hsq *hsq; > > > > int err; > > > > struct pltfm_imx_data *imx_data; > > > > > > > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct > > > platform_device *pdev) > > > > err = cqhci_init(cq_host, host->mmc, false); > > > > if (err) > > > > goto disable_ahb_clk; > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), > > > GFP_KERNEL); > > > > + if (!hsq) { > > > > + err = -ENOMEM; > > > > + goto disable_ahb_clk; > > > > + } > > > > + > > > > + err = mmc_hsq_init(hsq, host->mmc); > > > > + if (err) > > > > + goto disable_ahb_clk; > > > > } > > > > > > > > if (of_id) > > > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct > > > platform_device *pdev) > > > > if (err) > > > > goto disable_ahb_clk; > > > > > > > > + if (!mmc_card_is_removable(host->mmc)) > > > > + host->mmc_host_ops.request_atomic = > > > sdhci_request_atomic; > > > > + else > > > > + host->always_defer_done = true; > > > > + > > > > sdhci_esdhc_imx_hwinit(host); > > > > > > > > err = sdhci_add_host(host); > > > > @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device > > > *dev) > > > > ret = cqhci_suspend(host->mmc); > > > > if (ret) > > > > return ret; > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > + mmc_hsq_suspend(host->mmc); > > > > } > > > > > > > > if ((imx_data->socdata->flags & > > > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@ > > > static int > > > > sdhci_esdhc_suspend(struct device *dev) static int > > > > sdhci_esdhc_resume(struct device *dev) { > > > > struct sdhci_host *host = dev_get_drvdata(dev); > > > > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > > > > + struct pltfm_imx_data *imx_data = > > > > + sdhci_pltfm_priv(pltfm_host); > > > > int ret; > > > > > > > > ret = pinctrl_pm_select_default_state(dev); > > > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device > > > *dev) > > > > if (ret) > > > > return ret; > > > > > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > > > ret = cqhci_resume(host->mmc); > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > + mmc_hsq_resume(host->mmc); > > > > + } > > > > > > > > if (!ret) > > > > ret = mmc_gpio_set_cd_wake(host->mmc, false); @@ > > > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device > > > *dev) > > > > ret = cqhci_suspend(host->mmc); > > > > if (ret) > > > > return ret; > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > + mmc_hsq_suspend(host->mmc); > > > > } > > > > > > > > ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11 > > > @@ > > > > static int sdhci_esdhc_runtime_resume(struct device *dev) > > > > if (err) > > > > goto disable_ipg_clk; > > > > > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > > > err = cqhci_resume(host->mmc); > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > + mmc_hsq_resume(host->mmc); > > > > + } > > > > > > > > return err; > > > > > > > > > > > > > > > > > > > Changes from v8: > > > > > > > - Add more description in the commit message. > > > > > > > - Optimize the failure log when calling cqe_enable(). > > > > > > > > > > > > > > Changes from v7: > > > > > > > - Add reviewed tag from Arnd. > > > > > > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > > > > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the > > > > > > > CQE can work normally. > > > > > > > - Add a new patch to enable the host software queue for the SD card. > > > > > > > - Use the default MMC queue depth for host software queue. > > > > > > > > > > > > > > Changes from v6: > > > > > > > - Change the patch order and set host->always_defer_done = true > > > > > > > for the Spreadtrum host driver. > > > > > > > > > > > > > > Changes from v5: > > > > > > > - Modify the condition of defering to complete request > > > > > > > suggested by > > > > > Adrian. > > > > > > > > > > > > > > Changes from v4: > > > > > > > - Add a seperate patch to introduce a variable to defer to > > > > > > > complete data requests for some host drivers, when using host > > > software queue. > > > > > > > > > > > > > > Changes from v3: > > > > > > > - Use host software queue instead of sqhci. > > > > > > > - Fix random config building issue. > > > > > > > - Change queue depth to 32, but still only allow 2 requests in flight. > > > > > > > - Update the testing data. > > > > > > > > > > > > > > Changes from v2: > > > > > > > - Remove reference to 'struct cqhci_host' and 'struct > > > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is only used by > > > software queue. > > > > > > > > > > > > > > Changes from v1: > > > > > > > - Add request_done ops for sdhci_ops. > > > > > > > - Replace virtual command queue with software queue for > > > > > > > functions and variables. > > > > > > > - Rename the software queue file and add sqhci.h header file. > > > > > > > > > > > > > > Baolin Wang (5): > > > > > > > mmc: Add MMC host software queue support > > > > > > > mmc: core: Enable the MMC host software queue for the SD card > > > > > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > > > > > mmc: host: sdhci: Add a variable to defer to complete requests if > > > > > > > needed > > > > > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > > > > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > > > > > drivers/mmc/core/mmc.c | 18 ++- > > > > > > > drivers/mmc/core/queue.c | 22 ++- > > > > > > > drivers/mmc/core/sd.c | 10 ++ > > > > > > > drivers/mmc/host/Kconfig | 8 + > > > > > > > drivers/mmc/host/Makefile | 1 + > > > > > > > drivers/mmc/host/cqhci.c | 8 +- > > > > > > > drivers/mmc/host/mmc_hsq.c | 343 > > > > > +++++++++++++++++++++++++++++++++++++++++ > > > > > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > > > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > > > > > drivers/mmc/host/sdhci.c | 14 +- > > > > > > > drivers/mmc/host/sdhci.h | 3 + > > > > > > > include/linux/mmc/host.h | 3 + > > > > > > > 13 files changed, 534 insertions(+), 15 deletions(-) create > > > > > > > mode > > > > > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > > > > > drivers/mmc/host/mmc_hsq.h > > > > > > > > > > > > > > -- > > > > > > > 1.7.9.5 > > > > > > > > > > > > > > > > > > > Applied for next, thanks! Also, thanks for your patience while > > > > > > moving forward during the reviews! > > > > > > > > > > I am very appreciated for you and Arnd's good sugestion when > > > > > introducing the hsq. > > > > > > > > > > > > > > > > > Note, I did some amending of patch1 to resolve some checkpatch > > > > > > warnings. SPDX licence and Kconfig help texts, please have a look > > > > > > and tell if there are something that doesn't look good. > > > > > > > > > > Thanks for your help and looks good to me. > > > > > > > > > > > > -- > > > Baolin Wang > > > > -- > Baolin Wang
> -----Original Message----- > From: Baolin Wang [mailto:baolin.wang7@gmail.com] > Sent: 2020年6月15日 7:26 > To: BOUGH CHEN <haibo.chen@nxp.com> > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; Orson > Zhai <orsonzhai@gmail.com>; Chunyan Zhang <zhang.lyra@gmail.com>; Arnd > Bergmann <arnd@arndb.de>; Linus Walleij <linus.walleij@linaro.org>; Baolin > Wang <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx <linux-imx@nxp.com> > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > On Sun, Jun 14, 2020 at 11:05 PM Baolin Wang <baolin.wang7@gmail.com> > wrote: > > > > On Wed, Jun 10, 2020 at 10:26 AM BOUGH CHEN <haibo.chen@nxp.com> > wrote: > > > > > > > -----Original Message----- > > > > From: Baolin Wang [mailto:baolin.wang7@gmail.com] > > > > Sent: 2020年6月8日 19:54 > > > > To: BOUGH CHEN <haibo.chen@nxp.com> > > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>; Adrian Hunter > > > > <adrian.hunter@intel.com>; Asutosh Das <asutoshd@codeaurora.org>; > > > > Orson Zhai <orsonzhai@gmail.com>; Chunyan Zhang > > > > <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; Linus > > > > Walleij <linus.walleij@linaro.org>; Baolin Wang > > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux Kernel > > > > Mailing List <linux-kernel@vger.kernel.org>; dl-linux-imx > > > > <linux-imx@nxp.com> > > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > > > > > Hi Haibo. > > > > > > > > On Mon, Jun 8, 2020 at 2:35 PM BOUGH CHEN <haibo.chen@nxp.com> > wrote: > > > > > > > > > > > -----Original Message----- > > > > > > From: linux-mmc-owner@vger.kernel.org > > > > > > [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Baolin > > > > > > Wang > > > > > > Sent: 2020年2月19日 9:35 > > > > > > To: Ulf Hansson <ulf.hansson@linaro.org> > > > > > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Asutosh Das > > > > > > <asutoshd@codeaurora.org>; Orson Zhai <orsonzhai@gmail.com>; > > > > Chunyan > > > > > > Zhang <zhang.lyra@gmail.com>; Arnd Bergmann <arnd@arndb.de>; > > > > > > Linus Walleij <linus.walleij@linaro.org>; Baolin Wang > > > > > > <baolin.wang@linaro.org>; linux-mmc@vger.kernel.org; Linux > > > > > > Kernel Mailing List <linux-kernel@vger.kernel.org> > > > > > > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > > > > > > > > > > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson > > > > > > <ulf.hansson@linaro.org> > > > > > > wrote: > > > > > > > > > > > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang > > > > > > > <baolin.wang7@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > Now the MMC read/write stack will always wait for previous > > > > > > > > request is completed by mmc_blk_rw_wait(), before sending > > > > > > > > a new request to hardware, or queue a work to complete > > > > > > > > request, that will bring context switching overhead, > > > > > > > > especially for high I/O per second rates, to affect the IO > performance. > > > > > > > > > > > > > > > > Thus this patch set will introduce the MMC software > > > > > > > > command queue support based on command queue engine's > > > > > > > > interfaces, and set the queue depth as 64 to allow more > > > > > > > > requests can be be prepared, merged and inserted into IO > > > > > > > > scheduler, but we only allow 2 requests in flight, that is > > > > > > > > enough to let the irq handler always trigger the next > > > > > > > > request without a context switch, as > > > > well as avoiding a long latency. > > > > > > > > > > > > > > > > Moreover we can expand the MMC software queue interface to > > > > > > > > support MMC packed request or packed command instead of > > > > > > > > adding new interfaces, according to previosus discussion. > > > > > > > > > > > > > > > > Below are some comparison data with fio tool. The fio > > > > > > > > command I used is like below with changing the '--rw' > > > > > > > > parameter and enabling the direct IO flag to measure the > > > > > > > > actual hardware transfer speed in 4K block > > > > > > size. > > > > > > > > > > > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 > > > > > > > > --rw=read --bs=4K --size=1G --group_reporting --numjobs=20 > > > > > > > > --name=test_read > > > > > > > > > > > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > > > > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at > > > > address > > > > > > 0001 > > > > > > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > > > > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 > 4.00 > > > > MiB > > > > > > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 > 4.00 > > > > MiB > > > > > > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 > 4.00 > > > > MiB, > > > > > > chardev (248:0) > > > > > > > > > > > > > > > > 1. Without MMC software queue I tested 5 times for each > > > > > > > > case and output a average speed. > > > > > > > > > > > > > > > > 1) Sequential read: > > > > > > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, > > > > > > > > 60.8MiB/s Average > > > > > > > > speed: 59.66MiB/s > > > > > > > > > > > > > > > > 2) Random read: > > > > > > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, > > > > > > > > 27.2MiB/s Average > > > > > > > > speed: 27.04MiB/s > > > > > > > > > > > > > > > > 3) Sequential write: > > > > > > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, > > > > > > > > 67.5MiB/s Average > > > > > > > > speed: 69.68MiB/s > > > > > > > > > > > > > > > > 4) Random write: > > > > > > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s > > > > > > > > Average > > > > > > > > speed: 35.96MiB/s > > > > > > > > > > > > > > > > 2. With MMC software queue I tested 5 times for each case > > > > > > > > and output a average speed. > > > > > > > > > > > > > > > > 1) Sequential read: > > > > > > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, > > > > > > > > 59.9MiB/s Average > > > > > > > > speed: 60.68MiB/s > > > > > > > > > > > > > > > > 2) Random read: > > > > > > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, > > > > > > > > 31.3MiB/s Average > > > > > > > > speed: 31.36MiB/s > > > > > > > > > > > > > > > > 3) Sequential write: > > > > > > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s > > > > > > > > Average > > > > > > > > speed: 71.66MiB/s > > > > > > > > > > > > > > > > 4) Random write: > > > > > > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, > > > > > > > > 68.8MiB/s Average > > > > > > > > speed: 68.76MiB/s > > > > > > > > > > > > > > > > Form above data, we can see the MMC software queue can > > > > > > > > help to improve some performance obviously for random read > > > > > > > > and write, though no obvious improvement for sequential read and > write. > > > > > > > > > > > > > > > > Any comments are welcome. Thanks a lot. > > > > > > > > > > > > > > > > > > Hi Baolin, > > > > > > > > > > I refer to your code, and add the software queue support on i.MX > > > > > based on > > > > the Linux next-20200602, but unfortunately, I see an obvious > > > > performance drop when change to use software queue. > > > > > I test on our imx850-evk board, with eMMC soldered. > > > > > From the result listing below, only random write has a little > > > > > performance > > > > improve, for others, seems performance drop a lot. > > > > > I noticed that, this software queue need no-removable card, any > > > > > other > > > > limitation? For host? > > > > > From the code logic, software queue complete the request in irq > > > > > handler, > > > > seems no other change, I do not figure out why this will trigger a > > > > performance drop on my platform. Any comment would be appreciate! > > > > > > > > Have you tested with below patches, which introduce an > > > > atomic_request host ops to submit next request in the irq hard handler > context to reduce latency? > > > > > > Hi Baolin, > > > > > > The Linux code base I use is Linux-next-20200602, which already contain all > the patch you listed, including the atomic_request. > > > I do not test the atomic_request alone. I just enable software queue, and > this software queue use the atomic request. > > > So, do you mean, I need to first disable software queue, and just enable the > atomic_request, to confirm whether the atomic_request has some > performance impact on my platform? > > > > NO, the HSQ will support atomic_request ops, and I think you should > > implement the atomic_request ops in your driver if possible, like > > below patch: > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit. > > > kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2 > > > Fcommit%2F%3Fid%3D61ab64e2f54f4c607428667f83f411cb659843a3&d > ata=02 > > %7C01%7Chaibo.chen%40nxp.com%7C500e3b7d66954c14d2f208d810ba488 > 3%7C686e > > > a1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637277739547832756&s > data=0j > > RbAi8gvllPKc0IPjdQfK7nTGQ9piefGu80lWQIG5Y%3D&reserved=0 > > Ah, sorry, I saw you've implemented the atomic_request ops. Moreover I've > tested the mainline, and I still got some performance improvement. > > What's the IO scheduler you selected? > Hi Baolin, I do not change the setting of IO scheduler, so should be the default IO scheduler. One thing I find when I go through your patch set, is that you said the HSQ need host side to has the flag MMC_CAP_WAIT_WHILE_BUSY. But for i.MX usdhc host, it do not support this flag, not sure whether it caused the performance difference. Best Regards Haibo Chen > > > > > > > Without software queue, normal read/write method: > > > > > Sequential read: 56MB/s > > > > > Random read: 23.5MB/s > > > > > Sequential write: 43.7MB/s > > > > > Random write: 19MB/s > > > > > > > > > > With mmc software queue: > > > > > Sequential read: 33.5MB/s > > > > > Random read: 18.7 MB/s > > > > > Sequential write: 37.7MB/s > > > > > Random write: 19.8MB/s > > > > > > > > > > > > > > > Here, I also list my change code to support software queue > > > > > > > > > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig > > > > > index > > > > > eb85237bf2d6..996b8cc5c381 100644 > > > > > --- a/drivers/mmc/host/Kconfig > > > > > +++ b/drivers/mmc/host/Kconfig > > > > > @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX > > > > > depends on MMC_SDHCI_PLTFM > > > > > select MMC_SDHCI_IO_ACCESSORS > > > > > select MMC_CQHCI > > > > > + select MMC_HSQ > > > > > help > > > > > This selects the Freescale eSDHC/uSDHC controller > support > > > > > found on i.MX25, i.MX35 i.MX5x and i.MX6x. > > > > > diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c > > > > > b/drivers/mmc/host/sdhci-esdhc-imx.c > > > > > index 1d7f84b23a22..6f163695b08d 100644 > > > > > --- a/drivers/mmc/host/sdhci-esdhc-imx.c > > > > > +++ b/drivers/mmc/host/sdhci-esdhc-imx.c > > > > > @@ -29,6 +29,7 @@ > > > > > #include "sdhci-pltfm.h" > > > > > #include "sdhci-esdhc.h" > > > > > #include "cqhci.h" > > > > > +#include "mmc_hsq.h" > > > > > > > > > > #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f > > > > > #define ESDHC_CTRL_D3CD 0x08 > > > > > @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct > > > > > sdhci_host > > > > *host, u32 intmask) > > > > > return 0; > > > > > } > > > > > > > > > > +static void esdhc_request_done(struct sdhci_host *host, struct > > > > > +mmc_request *mrq) { > > > > > + /* Validate if the request was from software queue firstly. */ > > > > > + if (mmc_hsq_finalize_request(host->mmc, mrq)) > > > > > + return; > > > > > + > > > > > + mmc_request_done(host->mmc, mrq); } > > > > > + > > > > > static struct sdhci_ops sdhci_esdhc_ops = { > > > > > .read_l = esdhc_readl_le, > > > > > .read_w = esdhc_readw_le, @@ -1237,6 +1247,7 @@ static > > > > > struct sdhci_ops sdhci_esdhc_ops = { > > > > > .set_uhs_signaling = esdhc_set_uhs_signaling, > > > > > .reset = esdhc_reset, > > > > > .irq = esdhc_cqhci_irq, > > > > > + .request_done = esdhc_request_done, > > > > > }; > > > > > > > > > > static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { > > > > > @@ > > > > > -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct > > > > > sdhci_host > > > > *host) > > > > > writel(tmp, host->ioaddr + > > > > ESDHC_VEND_SPEC2); > > > > > > > > > > host->quirks &= > > > > ~SDHCI_QUIRK_NO_BUSY_IRQ; > > > > > + > > > > > + /* > > > > > + * On i.MX8MM, we are running Dual > Linux > > > > > + OS, > > > > with 1st Linux using SD Card > > > > > + * as rootfs storage, 2nd Linux using > > > > > + eMMC as > > > > rootfs storage. We let the > > > > > + * the 1st linux configure power/clock > > > > > + for the > > > > 2nd Linux. > > > > > + * > > > > > + * When the 2nd Linux is booting into > > > > > + rootfs > > > > stage, we let the 1st Linux > > > > > + * to destroy the 2nd linux, then > > > > > + restart the > > > > 2nd linux, we met SDHCI dump. > > > > > + * After we clear the pending interrupt > > > > > + and halt > > > > CQCTL, issue gone. > > > > > + */ > > > > > + tmp = cqhci_readl(cq_host, CQHCI_IS); > > > > > + cqhci_writel(cq_host, tmp, CQHCI_IS); > > > > > + cqhci_writel(cq_host, CQHCI_HALT, > > > > CQHCI_CTL); > > > > > } > > > > > > > > > > if (imx_data->socdata->flags & > > > > ESDHC_FLAG_STD_TUNING) > > > > > { @@ -1351,9 +1375,6 @@ static void > > > > > sdhci_esdhc_imx_hwinit(struct > > > > sdhci_host *host) > > > > > * After we clear the pending interrupt and halt > > > > > CQCTL, > > > > issue gone. > > > > > */ > > > > > if (cq_host) { > > > > > - tmp = cqhci_readl(cq_host, CQHCI_IS); > > > > > - cqhci_writel(cq_host, tmp, CQHCI_IS); > > > > > - cqhci_writel(cq_host, CQHCI_HALT, > > > > CQHCI_CTL); > > > > > } > > > > > } > > > > > } > > > > > @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct > > > > platform_device *pdev) > > > > > struct sdhci_pltfm_host *pltfm_host; > > > > > struct sdhci_host *host; > > > > > struct cqhci_host *cq_host; > > > > > + struct mmc_hsq *hsq; > > > > > int err; > > > > > struct pltfm_imx_data *imx_data; > > > > > > > > > > @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct > > > > platform_device *pdev) > > > > > err = cqhci_init(cq_host, host->mmc, false); > > > > > if (err) > > > > > goto disable_ahb_clk; > > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > > + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), > > > > GFP_KERNEL); > > > > > + if (!hsq) { > > > > > + err = -ENOMEM; > > > > > + goto disable_ahb_clk; > > > > > + } > > > > > + > > > > > + err = mmc_hsq_init(hsq, host->mmc); > > > > > + if (err) > > > > > + goto disable_ahb_clk; > > > > > } > > > > > > > > > > if (of_id) > > > > > @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct > > > > platform_device *pdev) > > > > > if (err) > > > > > goto disable_ahb_clk; > > > > > > > > > > + if (!mmc_card_is_removable(host->mmc)) > > > > > + host->mmc_host_ops.request_atomic = > > > > sdhci_request_atomic; > > > > > + else > > > > > + host->always_defer_done = true; > > > > > + > > > > > sdhci_esdhc_imx_hwinit(host); > > > > > > > > > > err = sdhci_add_host(host); @@ -1737,6 +1774,8 @@ static > > > > > int sdhci_esdhc_suspend(struct device > > > > *dev) > > > > > ret = cqhci_suspend(host->mmc); > > > > > if (ret) > > > > > return ret; > > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > > + mmc_hsq_suspend(host->mmc); > > > > > } > > > > > > > > > > if ((imx_data->socdata->flags & > > > > > ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@ > > > > static int > > > > > sdhci_esdhc_suspend(struct device *dev) static int > > > > > sdhci_esdhc_resume(struct device *dev) { > > > > > struct sdhci_host *host = dev_get_drvdata(dev); > > > > > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > > > > > + struct pltfm_imx_data *imx_data = > > > > > + sdhci_pltfm_priv(pltfm_host); > > > > > int ret; > > > > > > > > > > ret = pinctrl_pm_select_default_state(dev); > > > > > @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct > > > > > device > > > > *dev) > > > > > if (ret) > > > > > return ret; > > > > > > > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > > > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > > > > ret = cqhci_resume(host->mmc); > > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > > + mmc_hsq_resume(host->mmc); > > > > > + } > > > > > > > > > > if (!ret) > > > > > ret = mmc_gpio_set_cd_wake(host->mmc, false); > @@ > > > > > -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct > > > > > device > > > > *dev) > > > > > ret = cqhci_suspend(host->mmc); > > > > > if (ret) > > > > > return ret; > > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > > + mmc_hsq_suspend(host->mmc); > > > > > } > > > > > > > > > > ret = sdhci_runtime_suspend_host(host); @@ -1851,8 > > > > > +1897,11 > > > > @@ > > > > > static int sdhci_esdhc_runtime_resume(struct device *dev) > > > > > if (err) > > > > > goto disable_ipg_clk; > > > > > > > > > > - if (host->mmc->caps2 & MMC_CAP2_CQE) > > > > > + if (host->mmc->caps2 & MMC_CAP2_CQE) { > > > > > err = cqhci_resume(host->mmc); > > > > > + } else if (esdhc_is_usdhc(imx_data)) { > > > > > + mmc_hsq_resume(host->mmc); > > > > > + } > > > > > > > > > > return err; > > > > > > > > > > > > > > > > > > > > > > > Changes from v8: > > > > > > > > - Add more description in the commit message. > > > > > > > > - Optimize the failure log when calling cqe_enable(). > > > > > > > > > > > > > > > > Changes from v7: > > > > > > > > - Add reviewed tag from Arnd. > > > > > > > > - Use the 'hsq' acronym for varibles and functions in the core > layer. > > > > > > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make > > > > > > > > sure the CQE can work normally. > > > > > > > > - Add a new patch to enable the host software queue for the SD > card. > > > > > > > > - Use the default MMC queue depth for host software queue. > > > > > > > > > > > > > > > > Changes from v6: > > > > > > > > - Change the patch order and set host->always_defer_done > > > > > > > > = true for the Spreadtrum host driver. > > > > > > > > > > > > > > > > Changes from v5: > > > > > > > > - Modify the condition of defering to complete request > > > > > > > > suggested by > > > > > > Adrian. > > > > > > > > > > > > > > > > Changes from v4: > > > > > > > > - Add a seperate patch to introduce a variable to defer > > > > > > > > to complete data requests for some host drivers, when > > > > > > > > using host > > > > software queue. > > > > > > > > > > > > > > > > Changes from v3: > > > > > > > > - Use host software queue instead of sqhci. > > > > > > > > - Fix random config building issue. > > > > > > > > - Change queue depth to 32, but still only allow 2 requests in > flight. > > > > > > > > - Update the testing data. > > > > > > > > > > > > > > > > Changes from v2: > > > > > > > > - Remove reference to 'struct cqhci_host' and 'struct > > > > > > > > cqhci_slot', instead adding 'struct sqhci_host', which is > > > > > > > > only used by > > > > software queue. > > > > > > > > > > > > > > > > Changes from v1: > > > > > > > > - Add request_done ops for sdhci_ops. > > > > > > > > - Replace virtual command queue with software queue for > > > > > > > > functions and variables. > > > > > > > > - Rename the software queue file and add sqhci.h header file. > > > > > > > > > > > > > > > > Baolin Wang (5): > > > > > > > > mmc: Add MMC host software queue support > > > > > > > > mmc: core: Enable the MMC host software queue for the SD > card > > > > > > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > > > > > > mmc: host: sdhci: Add a variable to defer to complete requests > if > > > > > > > > needed > > > > > > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > > > > > > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > > > > > > drivers/mmc/core/mmc.c | 18 ++- > > > > > > > > drivers/mmc/core/queue.c | 22 ++- > > > > > > > > drivers/mmc/core/sd.c | 10 ++ > > > > > > > > drivers/mmc/host/Kconfig | 8 + > > > > > > > > drivers/mmc/host/Makefile | 1 + > > > > > > > > drivers/mmc/host/cqhci.c | 8 +- > > > > > > > > drivers/mmc/host/mmc_hsq.c | 343 > > > > > > +++++++++++++++++++++++++++++++++++++++++ > > > > > > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > > > > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > > > > > > drivers/mmc/host/sdhci.c | 14 +- > > > > > > > > drivers/mmc/host/sdhci.h | 3 + > > > > > > > > include/linux/mmc/host.h | 3 + > > > > > > > > 13 files changed, 534 insertions(+), 15 deletions(-) > > > > > > > > create mode > > > > > > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > > > > > > drivers/mmc/host/mmc_hsq.h > > > > > > > > > > > > > > > > -- > > > > > > > > 1.7.9.5 > > > > > > > > > > > > > > > > > > > > > > Applied for next, thanks! Also, thanks for your patience > > > > > > > while moving forward during the reviews! > > > > > > > > > > > > I am very appreciated for you and Arnd's good sugestion when > > > > > > introducing the hsq. > > > > > > > > > > > > > > > > > > > > Note, I did some amending of patch1 to resolve some > > > > > > > checkpatch warnings. SPDX licence and Kconfig help texts, > > > > > > > please have a look and tell if there are something that doesn't look > good. > > > > > > > > > > > > Thanks for your help and looks good to me. > > > > > > > > > > > > > > > > -- > > > > Baolin Wang > > > > > > > > -- > > Baolin Wang > > > > -- > Baolin Wang