[0/3] mmc: renesas_sdhi_internal_dmac: improve performance by using IOMMU

Message ID 1556255930-18188-1-git-send-email-yoshihiro.shimoda.uh@renesas.com (mailing list archive)

Message

Yoshihiro Shimoda April 26, 2019, 5:18 a.m. UTC
This patch set is based on renesas-drivers.git /
renesas-drivers-2019-04-23-v5.1-rc6 tag.

Since the SDHI host internal DMAC of R-Car Gen3 cannot handle two or more
segments, the performance (especially eMMC HS400 reading) is not good.
However, if the IOMMU is enabled for the DMAC, the IOMMU maps multiple
scatter-gather buffers as one contiguous iova, so the DMAC can handle that
iova as a single segment and the performance can improve. In fact,
when I measured the performance using bonnie++, the "Sequential Input - block"
rate improved on all platforms (r8a7795, r8a77965 and r8a77990).
Please refer to the end of this email for the performance results.
(I believe that if the performance is improved, the CPU load also increases.)
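
The mapping described above can be illustrated with a small model. This is
only a sketch in Python, not driver code: the 4 KiB page size, the function
name map_sg_to_iova() and all addresses are assumptions made up for the
example. It shows how physically scattered, page-aligned buffers end up
behind one contiguous iova range, so that a DMAC limited to a single segment
needs only one (address, length) pair.

```python
PAGE_SIZE = 4096  # assumed IOMMU page granule for this illustration


def map_sg_to_iova(segments, iova_base):
    """Map page-aligned (phys_addr, length) segments to one contiguous iova range.

    Returns (page_table, dma_addr, dma_len), where page_table maps every iova
    page to the physical page behind it. Hypothetical helper, not a kernel API.
    """
    page_table = {}
    iova = iova_base
    for phys, length in segments:
        if phys % PAGE_SIZE or length % PAGE_SIZE:
            raise ValueError("segment is not page aligned; cannot be merged")
        for offset in range(0, length, PAGE_SIZE):
            page_table[iova] = phys + offset
            iova += PAGE_SIZE
    return page_table, iova_base, iova - iova_base


# Three physically scattered buffers (made-up addresses).
sg_list = [(0x48000000, 8192), (0x51234000, 4096), (0x40001000, 4096)]
table, dma_addr, dma_len = map_sg_to_iova(sg_list, iova_base=0xFFF00000)

# The device sees a single segment covering all three buffers.
print(hex(dma_addr), dma_len, len(table))
```

The device-visible view is one address and one length, while the page table
keeps track of where each page really lives.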

However, in the case of an SDIO card (especially some WiFi cards/drivers),
the scatter-gather buffers may not be mapped as one contiguous iova because
each scatter-gather buffer is only about 1500 bytes, so the DMAC cannot
handle them. So, this patch set adds an init_card() op to detect the card
type, and the driver then changes max_segs if the DMAC is behind an
IOMMU and an SD card/MMC is detected.
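
A rough model of this decision, again as a Python sketch rather than the
actual driver change: the names mergeable() and choose_max_segs(), the 4 KiB
page size and the MULTI_SEGS value are all hypothetical. The first function
shows why buffers of about 1500 bytes cannot form one contiguous iova when
mapping is done at page granularity; the second mirrors the rule that
multiple segments are only allowed for SD/MMC cards behind an IOMMU.

```python
PAGE_SIZE = 4096   # assumed IOMMU page granule
SINGLE_SEG = 1     # the DMAC itself handles only one segment
MULTI_SEGS = 512   # hypothetical placeholder, not the value used by the patch


def mergeable(segments):
    """True if (offset_in_page, length) buffers can share one gap-free iova range.

    Every buffer except the first must start on a page boundary, and every
    buffer except the last must end on one.
    """
    for i, (offset, length) in enumerate(segments):
        first = i == 0
        last = i == len(segments) - 1
        if not first and offset % PAGE_SIZE:
            return False
        if not last and (offset + length) % PAGE_SIZE:
            return False
    return True


def choose_max_segs(behind_iommu, card_type):
    """Allow multiple segments only for SD/MMC cards behind an IOMMU."""
    if behind_iommu and card_type in ("sd", "mmc"):
        return MULTI_SEGS
    return SINGLE_SEG


block_request = [(0, 4096), (0, 8192), (0, 512)]  # page-sized block I/O
wifi_frames = [(0, 1500), (0, 1500)]              # SDIO network-sized buffers

print(mergeable(block_request), mergeable(wifi_frames))
print(choose_max_segs(True, "mmc"), choose_max_segs(True, "sdio"))
```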

---
kernel v5.1-rc6 + local patches + eMMC ext4 format
Buildroot 2019.02.1
Bonnie++ 1.03e : bonnie++ -d ./ -s 8192 -r 4096 -b -u root

environment,Size,Sequential Output - per char (K/sec),<- (CPU %),Sequential Output - block (K/sec),<- (CPU %),Sequential Output - rewrite (K/sec),<- (CPU %),Sequential Input - per char (K/sec),<- (CPU %),Sequential Input - block (K/sec),<- (CPU %),Random seeks,<- (CPU %),files,Sequential Create,<- (CPU %),Sequential Read,<- (CPU %),Sequential Delete,<- (CPU %),Random Create,<- (CPU %),Random Read,<- (CPU %),Random Delete,<- (CPU %)
H3_No_IPMMU,8G,80621,97,117133,28,58619,15,70679,94,118682,16,4068.2,12,16,673,3,+++++,+++,597,3,642,3,+++++,+++,588,3
H3_IPMMU,8G,68183,97,130482,31,80730,20,74719,98,195727,25,4326.4,12,16,859,4,+++++,+++,809,4,796,4,+++++,+++,781,4
M3-N_No_IPMMU,8G,59031,96,121806,32,59500,17,54025,95,118384,17,3245.2,14,16,688,4,+++++,+++,641,3,679,4,+++++,+++,641,4
M3-N_IPMMU,8G,57414,93,136734,35,79095,22,56235,98,196351,27,3438.8,15,16,846,5,+++++,+++,809,4,830,5,+++++,+++,815,5
E3_No_IPMMU,8G,32136,96,99390,42,40603,27,28733,94,76958,26,2638.6,32,16,485,15,+++++,+++,485,11,490,15,+++++,+++,491,11
E3_IPMMU,8G,31712,95,119053,48,61360,36,30075,97,138801,44,2714.4,35,16,552,17,+++++,+++,588,13,573,17,+++++,+++,590,13
---
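
To make the improvement easier to read, the relative gain of the
"Sequential Input - block" column can be computed from the rows above. The
numbers in this Python sketch are copied from the table; the helper name
gain_percent() is just for illustration.

```python
# "Sequential Input - block (K/sec)" without and with IPMMU, from the table.
seq_input_block = {
    "H3": (118682, 195727),
    "M3-N": (118384, 196351),
    "E3": (76958, 138801),
}


def gain_percent(before, after):
    """Relative throughput change in percent."""
    return (after - before) / before * 100.0


gains = {soc: gain_percent(b, a) for soc, (b, a) in seq_input_block.items()}

for soc, gain in gains.items():
    print(f"{soc}: {gain:+.1f}%")
```

This gives a gain of roughly 65% on H3 and M3-N and roughly 80% on E3.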

Yoshihiro Shimoda (3):
  mmc: tmio: add init_card ops
  mmc: tmio: No memory size limitation if runs on IOMMU
  mmc: renesas_sdhi_internal_dmac: use multiple segments if possible

 drivers/mmc/host/renesas_sdhi_core.c          |  8 ++++++++
 drivers/mmc/host/renesas_sdhi_internal_dmac.c | 12 ++++++++++++
 drivers/mmc/host/tmio_mmc.h                   |  2 ++
 drivers/mmc/host/tmio_mmc_core.c              | 14 ++++++++++++--
 4 files changed, 34 insertions(+), 2 deletions(-)

Comments

Wolfram Sang April 26, 2019, 9:46 a.m. UTC | #1
Hi Shimoda-san,

thanks for working on this!

> Please refer to the end of this email about the performance.

Yes, nice improvements, great!

> (I believe that if the performance is improved, the CPU load also increases.)

I do wonder about this a bit, though. IPMMU and DMA shouldn't be that
expensive for the CPU, should they? Am I overlooking something?

Kind regards,

   Wolfram
Yoshihiro Shimoda May 7, 2019, 8:58 a.m. UTC | #2
Hi Wolfram-san,

> From: Wolfram Sang, Sent: Friday, April 26, 2019 6:46 PM
> 
> Hi Shimoda-san,
> 
> thanks for working on this!
> 
> > Please refer to the end of this email about the performance.
> 
> Yes, nice improvements, great!

Thanks!

> > (I believe that if the performance is improved, the CPU load also increases.)
> 
> I do wonder about this a bit, though. IPMMU and DMA shouldn't be that
> expensive for the CPU, should they? Am I overlooking something?

I'm guessing that the userland app (bonnie++ in this case) consumes more CPU time for some reason.
Tomorrow I'll test whether my guess is correct by using a USB 3.0 host as follows:
 - case 1: usb 3.0 host + usb SSD as SuperSpeed (IOMMU is disabled).
 - case 2: usb 3.0 host + usb SSD via a usb2.0 hub as high-speed (IOMMU is disabled).

Best regards,
Yoshihiro Shimoda

> Kind regards,
> 
>    Wolfram
Yoshihiro Shimoda May 8, 2019, 4:28 a.m. UTC | #3
Hi Wolfram-san again,

> From: Yoshihiro Shimoda, Sent: Tuesday, May 7, 2019 5:59 PM
> 
> Hi Wolfram-san,
> 
> > From: Wolfram Sang, Sent: Friday, April 26, 2019 6:46 PM
> >
> > Hi Shimoda-san,
> >
> > thanks for working on this!
> >
> > > Please refer to the end of this email about the performance.
> >
> > Yes, nice improvements, great!
> 
> Thanks!
> 
> > > (I believe that if the performance is improved, the CPU load also increases.)
> >
> > I do wonder about this a bit, though. IPMMU and DMA shouldn't be that
> > expensive for the CPU, should they? Am I overlooking something?
> 
> I'm guessing that the userland app (bonnie++ in this case) consumes more CPU time for some reason.
> Tomorrow I'll test whether my guess is correct by using a USB 3.0 host as follows:
>  - case 1: usb 3.0 host + usb SSD as SuperSpeed (IOMMU is disabled).
>  - case 2: usb 3.0 host + usb SSD via a usb2.0 hub as high-speed (IOMMU is disabled).

I have measured these cases, plus the same cases with the IOMMU enabled. It seems my guess is correct.

  - case 1: usb 3.0 host + usb SSD as SuperSpeed (IOMMU is disabled).
  - case 2: usb 3.0 host + usb SSD via a usb2.0 hub as high-speed (IOMMU is disabled).
  - case 3: usb 3.0 host + usb SSD as SuperSpeed (IOMMU is enabled).
  - case 4: usb 3.0 host + usb SSD via a usb2.0 hub as high-speed (IOMMU is enabled).

---
kernel v5.1-rc7 + local patches + USB SSD ext4 format
Buildroot 2019.02.1
Bonnie++ 1.03e : bonnie++ -d ./ -s 8192 -r 4096 -b -u root

environment,Size,Sequential Output - per char (K/sec),<- (CPU %),Sequential Output - block (K/sec),<- (CPU %),Sequential Output - rewrite (K/sec),<- (CPU %),Sequential Input - per char (K/sec),<- (CPU %),Sequential Input - block (K/sec),<- (CPU %),Random seeks,<- (CPU %),files,Sequential Create,<- (CPU %),Sequential Read,<- (CPU %),Sequential Delete,<- (CPU %),Random Create,<- (CPU %),Random Read,<- (CPU %),Random Delete,<- (CPU %)
H3_SuperSpeed_No_IOMMU,8G,82598,99,242161,58,102489,25,73719,98,254089,32,2133.3,6,16,382,2,+++++,+++,385,1,380,1,+++++,+++,387,2
H3_HighSpeed_No_IOMMU,8G,41971,53,39983,9,16459,4,37900,51,37833,5,1585.7,5,16,304,1,+++++,+++,295,1,302,1,+++++,+++,294,1
H3_SuperSpeed_IOMMU,8G,66139,99,276686,65,132297,33,69732,99,293396,37,2099.2,7,16,389,2,+++++,+++,391,1,382,1,+++++,+++,391,2
H3_HighSpeed_IOMMU,8G,43191,50,40446,9,17432,4,38541,51,38481,5,1619.4,4,16,302,1,+++++,+++,296,1,303,1,+++++,+++,294,1
---
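
One way to check the guess against these rows is to divide the
"Sequential Input - block" throughput by its CPU % in each case. The
following Python sketch uses the numbers from the table above; the variable
names are only for illustration. If the CPU load mainly follows throughput,
regardless of the IOMMU, the ratio should be about the same in all four cases.

```python
# (Sequential Input - block in K/sec, CPU %) copied from the table.
usb_cases = {
    "H3_SuperSpeed_No_IOMMU": (254089, 32),
    "H3_HighSpeed_No_IOMMU": (37833, 5),
    "H3_SuperSpeed_IOMMU": (293396, 37),
    "H3_HighSpeed_IOMMU": (38481, 5),
}

# Throughput delivered per percent of CPU load.
per_cpu = {name: kps / cpu for name, (kps, cpu) in usb_cases.items()}

for name, ratio in per_cpu.items():
    print(f"{name}: {ratio:.0f} K/sec per CPU %")

spread = max(per_cpu.values()) / min(per_cpu.values())
print(f"max/min ratio: {spread:.2f}")
```

All four cases land between about 7500 and 8000 K/sec per CPU %, with or
without the IOMMU, which is consistent with the CPU load following the
transfer rate.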

Best regards,
Yoshihiro Shimoda