Message ID | 1557721744-30545-3-git-send-email-yoshihiro.shimoda.uh@renesas.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Geert Uytterhoeven |
Headers | show |
Series | mmc: renesas_sdhi: improve performance by using IOMMU | expand |
Hi Shimoda-san, thank you for this update! > +static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card) > +{ > + struct tmio_mmc_host *host = mmc_priv(mmc); > + > + if (host->pdev->dev.iommu_group && I wonder if I am too cautious, but maybe we should have another condition here to be checked first, namely "host->mmc->max_segs < 512"? > + (mmc_card_mmc(card) || mmc_card_sd(card))) > + host->mmc->max_segs = 512; > + else > + host->mmc->max_segs = host->pdata->max_segs; max_segs can be 0, so we should probably have: + host->mmc->max_segs = host->pdata->max_segs ?: 32; That also means, for the sys-dmac and Gen2, we then use 512 for the IOMMU case and 32 (default TMIO value) for the non IOMMU case. My understanding is that SYS DMAC can handle 512 in both cases. Maybe it makes sense then to make an incremental patch setting the max_segs value explicitly to 512 in the sys-dmac driver for Gen2? Kind regards, Wolfram
Hi Wolfram-san, > From: Wolfram Sang, Sent: Monday, May 13, 2019 6:01 PM > > Hi Shimoda-san, > > thank you for this update! > > > +static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card) > > +{ > > + struct tmio_mmc_host *host = mmc_priv(mmc); > > + > > + if (host->pdev->dev.iommu_group && > > I wonder if I am too cautious, but maybe we should have another > condition here to be checked first, namely "host->mmc->max_segs < 512"? I got it. I'll fix it on v3 patch. > > + (mmc_card_mmc(card) || mmc_card_sd(card))) > > + host->mmc->max_segs = 512; > > + else > > + host->mmc->max_segs = host->pdata->max_segs; > > max_segs can be 0, so we should probably have: > > + host->mmc->max_segs = host->pdata->max_segs ?: 32; Thank you for the point! I'll fix it on v3 patch. > That also means, for the sys-dmac and Gen2, we then use 512 for the > IOMMU case and 32 (default TMIO value) for the non IOMMU case. My > understanding is that SYS DMAC can handle 512 in both cases. Maybe it > makes sense then to make an incremental patch setting the max_segs value > explicitly to 512 in the sys-dmac driver for Gen2? I also think SYS DMAC can handle 512 segments. However, I'm not sure it can improve the performance or not though. Anyway, an incremental patch makes sense if needed, I think. Best regards, Yoshihiro Shimoda > Kind regards, > > Wolfram
Hi Wolfram-san, > From: Yoshihiro Shimoda, Sent: Monday, May 13, 2019 6:46 PM <snip> > > That also means, for the sys-dmac and Gen2, we then use 512 for the > > IOMMU case and 32 (default TMIO value) for the non IOMMU case. My > > understanding is that SYS DMAC can handle 512 in both cases. Maybe it > > makes sense then to make an incremental patch setting the max_segs value > > explicitly to 512 in the sys-dmac driver for Gen2? > > I also think SYS DMAC can handle 512 segments. However, I'm not sure > it can improve the performance or not though. Anyway, an incremental patch > makes sense if needed, I think. I measured the performance on R-Car H2 Lager. It seems 512 segments improve the sequential input to 5%. May I make an incremental patch on the patch series? What do you think? Best regards, Yoshihiro Shimoda -- kernel v5.1-rc7 + local patches + sdr104,,,,,,,,,,,,,,,,,,,,,,,,,, Buildroot 2019.02.1,,,,,,,,,,,,,,,,,,,,,,,,,, Bonnie++ 1.03e : bonnie\+\+ -d ./ -s 2048 -r 1024 -b -u root,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,, environment,Size,Sequential Output - per char (K/sec),<- (CPU %),Sequential Output - block (K/sec),<- (CPU %),Sequential Output - rewrite (K/sec),<- (CPU %),Sequential Input - per char (K/sec),<- (CPU %),Sequential Input ? block (K/sec),<- (CPU %),Random seeks,<- (CPU %),files,Sequential Create,<- (CPU %),Sequential Read,<- (CPU %),Sequential Delete,<- (CPU %),Random Create,<- (CPU %),Random Read,<- (CPU %),Random Delete,<- (CPU %) max_segs_32_sys_dmac,2G,19651,45,18122,7,11612,5,31417,56,34344,6,20.9,0,16,183,1,+++++,+++,195,1,198,2,+++++,+++,195,2 max_segs_512_sys_dmac,2G,18728,43,18273,8,12405,5,33524,61,34158,6,21.0,0,16,184,1,+++++,+++,189,1,198,2,+++++,+++,194,2
Hi Shimoda-san, > > > That also means, for the sys-dmac and Gen2, we then use 512 for the > > > IOMMU case and 32 (default TMIO value) for the non IOMMU case. My > > > understanding is that SYS DMAC can handle 512 in both cases. Maybe it > > > makes sense then to make an incremental patch setting the max_segs value > > > explicitly to 512 in the sys-dmac driver for Gen2? > > > > I also think SYS DMAC can handle 512 segments. However, I'm not sure > > it can improve the performance or not though. Anyway, an incremental patch > > makes sense if needed, I think. > > I measured the performance on R-Car H2 Lager. It seems 512 segments improve > the sequential input to 5%. May I make an incremental patch on the patch series? > What do you think? Cool! I didn't expect that much of a performance improvement. My main concern was understandable code because there was no "real reason" that we once use 32 and once 512. But if it causes even a performance improvement, even better :) Yes, an incremental patch is a good idea. Thanks for the work, Wolfram
Hi Wolfram-san again, > From: Yoshihiro Shimoda, Sent: Monday, May 13, 2019 6:46 PM > > Hi Wolfram-san, > > > From: Wolfram Sang, Sent: Monday, May 13, 2019 6:01 PM > > > > Hi Shimoda-san, > > > > thank you for this update! > > > > > +static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card) > > > +{ > > > + struct tmio_mmc_host *host = mmc_priv(mmc); > > > + > > > + if (host->pdev->dev.iommu_group && > > > > I wonder if I am too cautious, but maybe we should have another > > condition here to be checked first, namely "host->mmc->max_segs < 512"? > > I got it. I'll fix it on v3 patch. I'm afraid but I misunderstood this condition is "host->pdata->max_segs", not "host->mmc->max_segs", to avoid small max_segs value than pdata->max_segs? (No one has such max_segs value at the moment though.) If we use "host->mmc->max_segs", the max_segments value will be toggled by connecting/disconnecting a card like below: (a card is connected) # cat /sys/block/mmcblk0/queue/max_segments 512 (a card is disconnected and connected again) # cat /sys/block/mmcblk0/queue/max_segments 1 (a card is disconnected and connected again) # cat /sys/block/mmcblk0/queue/max_segments 512 ... Best regards, Yoshihiro Shimoda > > > + (mmc_card_mmc(card) || mmc_card_sd(card))) > > > + host->mmc->max_segs = 512; > > > + else > > > + host->mmc->max_segs = host->pdata->max_segs; > > > > max_segs can be 0, so we should probably have: > > > > + host->mmc->max_segs = host->pdata->max_segs ?: 32; > > Thank you for the point! I'll fix it on v3 patch. > > > That also means, for the sys-dmac and Gen2, we then use 512 for the > > IOMMU case and 32 (default TMIO value) for the non IOMMU case. My > > understanding is that SYS DMAC can handle 512 in both cases. Maybe it > > makes sense then to make an incremental patch setting the max_segs value > > explicitly to 512 in the sys-dmac driver for Gen2? > > I also think SYS DMAC can handle 512 segments. However, I'm not sure > it can improve the performance or not though. Anyway, an incremental patch > makes sense if needed, I think. > > Best regards, > Yoshihiro Shimoda > > > Kind regards, > > > > Wolfram
Hi Shimoda-san, > > > > + if (host->pdev->dev.iommu_group && > > > > > > I wonder if I am too cautious, but maybe we should have another > > > condition here to be checked first, namely "host->mmc->max_segs < 512"? > > > > I got it. I'll fix it on v3 patch. > > I'm afraid but I misunderstood this condition is > "host->pdata->max_segs", not "host->mmc->max_segs", to avoid small > max_segs value than pdata->max_segs? You are right. I was in deed thinking mmc->max_segs because it will be set at probe time, so it would work with values > 512. But I missed the case you described, I am sorry. But using pdata->max_segs should work. > (No one has such max_segs value at the moment though.) Yes. I want to be future-proof here. Just to avoid that the value might be "magically" decreased if the value might be bigger than 512. It would be hard to find then because it is kinda deep in the driver. Two more remarks: * We should probably use a define for the magic value 512. * Maybe you could add a comment to the init_card function describing why we can increase max_segs in that case. Basically, a short summary of your patch description. Does this make sense to you? Kind regards, Wolfram
Hi Wolfram-san, > From: Wolfram Sang, Sent: Wednesday, May 22, 2019 6:57 AM > > Hi Shimoda-san, > > > > > > + if (host->pdev->dev.iommu_group && > > > > > > > > I wonder if I am too cautious, but maybe we should have another > > > > condition here to be checked first, namely "host->mmc->max_segs < 512"? > > > > > > I got it. I'll fix it on v3 patch. > > > > I'm afraid but I misunderstood this condition is > > "host->pdata->max_segs", not "host->mmc->max_segs", to avoid small > > max_segs value than pdata->max_segs? > > You are right. I was in deed thinking mmc->max_segs because it will be > set at probe time, so it would work with values > 512. But I missed the > case you described, I am sorry. But using pdata->max_segs should work. Thank you for the reply. I got it! > > (No one has such max_segs value at the moment though.) > > Yes. I want to be future-proof here. Just to avoid that the value might > be "magically" decreased if the value might be bigger than 512. It would > be hard to find then because it is kinda deep in the driver. I got it. > Two more remarks: > > * We should probably use a define for the magic value 512. I think so. I also would like to use a define for a magic value 32. > * Maybe you could add a comment to the init_card function describing why > we can increase max_segs in that case. Basically, a short summary of > your patch description. It's a good idea! I'll add such a short summary. > Does this make sense to you? Yes. Thank you for your comments. I'll make v3 patch later. Best regards, Yoshihiro Shimoda > Kind regards, > > Wolfram
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c index 5e9e36e..29a7d66 100644 --- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -203,6 +203,17 @@ static void renesas_sdhi_clk_disable(struct tmio_mmc_host *host) clk_disable_unprepare(priv->clk_cd); } +static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card) +{ + struct tmio_mmc_host *host = mmc_priv(mmc); + + if (host->pdev->dev.iommu_group && + (mmc_card_mmc(card) || mmc_card_sd(card))) + host->mmc->max_segs = 512; + else + host->mmc->max_segs = host->pdata->max_segs; +} + static int renesas_sdhi_card_busy(struct mmc_host *mmc) { struct tmio_mmc_host *host = mmc_priv(mmc); @@ -726,6 +737,8 @@ int renesas_sdhi_probe(struct platform_device *pdev, /* SDR speeds are only available on Gen2+ */ if (mmc_data->flags & TMIO_MMC_MIN_RCAR2) { + host->ops.init_card = renesas_sdhi_init_card; + /* card_busy caused issues on r8a73a4 (pre-Gen2) CD-less SDHI */ host->ops.card_busy = renesas_sdhi_card_busy; host->ops.start_signal_voltage_switch =
In IOMMU environment, since it's possible to merge scatter gather buffers of memory requests to one iova, this patch changes the max_segs value when init_card of mmc_host timing to improve the transfer performance on renesas_sdhi_internal_dmac. Notes that an sdio card may be possible to use scatter gather buffers with non page aligned size, so that this driver will not use multiple segments to avoid any trouble. Also, on renesas_sdhi_sys_dmac, the max_segs value will change from 32 to 512, but the sys_dmac can handle 512 segments, so that this init_card ops is added on "TMIO_MMC_MIN_RCAR2" environment. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> --- drivers/mmc/host/renesas_sdhi_core.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)