
[1/1] mmc: Set optimal I/O size in mmc_setup_queue()

Message ID 20230818022817.3341-1-Sharp.Xia@mediatek.com (mailing list archive)
State New, archived
Series [1/1] mmc: Set optimal I/O size in mmc_setup_queue()

Commit Message

Sharp Xia (夏宇彬) Aug. 18, 2023, 2:28 a.m. UTC
From: Sharp Xia <Sharp.Xia@mediatek.com>

MMC does not set an optimal I/O size, so the block layer falls back to
the default VM_READAHEAD_PAGES readahead, resulting in slower
sequential reads.
Use the max_req_size reported by the host driver to set the optimal
I/O size and improve performance.

Signed-off-by: Sharp Xia <Sharp.Xia@mediatek.com>
---
 drivers/mmc/core/queue.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ulf Hansson Aug. 24, 2023, 10:55 a.m. UTC | #1
On Fri, 18 Aug 2023 at 04:45, <Sharp.Xia@mediatek.com> wrote:
>
> From: Sharp Xia <Sharp.Xia@mediatek.com>
>
> MMC does not set readahead and uses the default VM_READAHEAD_PAGES
> resulting in slower reading speed.
> Use the max_req_size reported by host driver to set the optimal
> I/O size to improve performance.

This seems reasonable to me. However, it would be nice if you could
share some performance numbers too - comparing before and after
$subject patch.

Kind regards
Uffe

>
> Signed-off-by: Sharp Xia <Sharp.Xia@mediatek.com>
> ---
>  drivers/mmc/core/queue.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index b396e3900717..fc83c4917360 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -359,6 +359,7 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
>                 blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
>         blk_queue_max_hw_sectors(mq->queue,
>                 min(host->max_blk_count, host->max_req_size / 512));
> +       blk_queue_io_opt(mq->queue, host->max_req_size);
>         if (host->can_dma_map_merge)
>                 WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
>                                                         mmc_dev(host)),
> --
> 2.18.0
>
Sharp Xia (夏宇彬) Aug. 25, 2023, 7:10 a.m. UTC | #2
On Thu, 2023-08-24 at 12:55 +0200, Ulf Hansson wrote:
>  On Fri, 18 Aug 2023 at 04:45, <Sharp.Xia@mediatek.com> wrote:
> >
> > From: Sharp Xia <Sharp.Xia@mediatek.com>
> >
> > MMC does not set readahead and uses the default VM_READAHEAD_PAGES
> > resulting in slower reading speed.
> > Use the max_req_size reported by host driver to set the optimal
> > I/O size to improve performance.
> 
> This seems reasonable to me. However, it would be nice if you could
> share some performance numbers too - comparing before and after
> $subject patch.
> 
> Kind regards
> Uffe
> 
> >
> > Signed-off-by: Sharp Xia <Sharp.Xia@mediatek.com>
> > ---
> >  drivers/mmc/core/queue.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index b396e3900717..fc83c4917360 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -359,6 +359,7 @@ static void mmc_setup_queue(struct mmc_queue
> *mq, struct mmc_card *card)
> >                 blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
> >         blk_queue_max_hw_sectors(mq->queue,
> >                 min(host->max_blk_count, host->max_req_size /
> 512));
> > +       blk_queue_io_opt(mq->queue, host->max_req_size);
> >         if (host->can_dma_map_merge)
> >                 WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
> >                                                         mmc_dev(hos
> t)),
> > --
> > 2.18.0
> >

I tested this patch on an internal platform (kernel-5.15).

Before:
console:/ # echo 3 > /proc/sys/vm/drop_caches
console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
4485393+1 records in
4485393+1 records out
2296521564 bytes (2.1 G) copied, 37.124446 s, 59 M/s
console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
128

After:
console:/ # echo 3 > /proc/sys/vm/drop_caches
console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
4485393+1 records in
4485393+1 records out
2296521564 bytes (2.1 G) copied, 28.956049 s, 76 M/s
console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
1024
Shawn Lin Aug. 25, 2023, 8:11 a.m. UTC | #5
Hi Sharp,

On 2023/8/25 15:10, Sharp Xia (夏宇彬) wrote:
> On Thu, 2023-08-24 at 12:55 +0200, Ulf Hansson wrote:
>>   On Fri, 18 Aug 2023 at 04:45, <Sharp.Xia@mediatek.com> wrote:
>>>
>>> From: Sharp Xia <Sharp.Xia@mediatek.com>
>>>
>>> MMC does not set readahead and uses the default VM_READAHEAD_PAGES
>>> resulting in slower reading speed.
>>> Use the max_req_size reported by host driver to set the optimal
>>> I/O size to improve performance.
>>
>> This seems reasonable to me. However, it would be nice if you could
>> share some performance numbers too - comparing before and after
>> $subject patch.
>>
>> Kind regards
>> Uffe
>>
>>>
>>> Signed-off-by: Sharp Xia <Sharp.Xia@mediatek.com>
>>> ---
>>>   drivers/mmc/core/queue.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>>> index b396e3900717..fc83c4917360 100644
>>> --- a/drivers/mmc/core/queue.c
>>> +++ b/drivers/mmc/core/queue.c
>>> @@ -359,6 +359,7 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
>>>                  blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
>>>          blk_queue_max_hw_sectors(mq->queue,
>>>                  min(host->max_blk_count, host->max_req_size / 512));
>>> +       blk_queue_io_opt(mq->queue, host->max_req_size);
>>>          if (host->can_dma_map_merge)
>>>                  WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
>>>                                                          mmc_dev(host)),
>>> --
>>> 2.18.0
>>>
> 
> I test this patch on internal platform(kernel-5.15).

I applied this patch and my test shows a stable 11% performance drop.

Before:
echo 3 > proc/sys/vm/drop_caches && dd if=/data/1GB.img of=/dev/null 

2048000+0 records in
2048000+0 records out
1048576000 bytes (0.9 G) copied, 3.912249 s, 256 M/s

After:
echo 3 > proc/sys/vm/drop_caches && dd if=/data/1GB.img of=/dev/null
2048000+0 records in
2048000+0 records out
1048576000 bytes (0.9 G) copied, 4.436271 s, 225 M/s

> 
> Before:
> console:/ # echo 3 > /proc/sys/vm/drop_caches
> console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
> 4485393+1 records in
> 4485393+1 records out
> 2296521564 bytes (2.1 G) copied, 37.124446 s, 59 M/s
> console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
> 128
> 
> After:
> console:/ # echo 3 > /proc/sys/vm/drop_caches
> console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
> 4485393+1 records in
> 4485393+1 records out
> 2296521564 bytes (2.1 G) copied, 28.956049 s, 76 M/s
> console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
> 1024
>
Sharp Xia (夏宇彬) Aug. 25, 2023, 8:39 a.m. UTC | #6
On Fri, 2023-08-25 at 16:11 +0800, Shawn Lin wrote:
>  	 
>  Hi Sharp,
> 
> On 2023/8/25 15:10, Sharp Xia (夏宇彬) wrote:
> > On Thu, 2023-08-24 at 12:55 +0200, Ulf Hansson wrote:
> >>   
> >>   On Fri, 18 Aug 2023 at 04:45, <Sharp.Xia@mediatek.com> wrote:
> >>>
> >>> From: Sharp Xia <Sharp.Xia@mediatek.com>
> >>>
> >>> MMC does not set readahead and uses the default
> VM_READAHEAD_PAGES
> >>> resulting in slower reading speed.
> >>> Use the max_req_size reported by host driver to set the optimal
> >>> I/O size to improve performance.
> >>
> >> This seems reasonable to me. However, it would be nice if you
> could
> >> share some performance numbers too - comparing before and after
> >> $subject patch.
> >>
> >> Kind regards
> >> Uffe
> >>
> >>>
> >>> Signed-off-by: Sharp Xia <Sharp.Xia@mediatek.com>
> >>> ---
> >>>   drivers/mmc/core/queue.c | 1 +
> >>>   1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> >>> index b396e3900717..fc83c4917360 100644
> >>> --- a/drivers/mmc/core/queue.c
> >>> +++ b/drivers/mmc/core/queue.c
> >>> @@ -359,6 +359,7 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
> >>>                  blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
> >>>          blk_queue_max_hw_sectors(mq->queue,
> >>>                  min(host->max_blk_count, host->max_req_size / 512));
> >>> +       blk_queue_io_opt(mq->queue, host->max_req_size);
> >>>          if (host->can_dma_map_merge)
> >>>                  WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
> >>>                                                          mmc_dev(host)),
> >>> --
> >>> 2.18.0
> >>>
> > 
> > I test this patch on internal platform(kernel-5.15).
> 
> I patched this one and the test shows me a stable 11% performance
> drop.
> 
> Before:
> echo 3 > proc/sys/vm/drop_caches && dd if=/data/1GB.img of=/dev/null 
> 
> 2048000+0 records in
> 2048000+0 records out
> 1048576000 bytes (0.9 G) copied, 3.912249 s, 256 M/s
> 
> After:
> echo 3 > proc/sys/vm/drop_caches && dd if=/data/1GB.img of=/dev/null
> 2048000+0 records in
> 2048000+0 records out
> 1048576000 bytes (0.9 G) copied, 4.436271 s, 225 M/s
> 
> > 
> > Before:
> > console:/ # echo 3 > /proc/sys/vm/drop_caches
> > console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
> > 4485393+1 records in
> > 4485393+1 records out
> > 2296521564 bytes (2.1 G) copied, 37.124446 s, 59 M/s
> > console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
> > 128
> > 
> > After:
> > console:/ # echo 3 > /proc/sys/vm/drop_caches
> > console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
> > 4485393+1 records in
> > 4485393+1 records out
> > 2296521564 bytes (2.1 G) copied, 28.956049 s, 76 M/s
> > console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
> > 1024
> > 
Hi Shawn,

What is your readahead value before and after applying this patch?
Shawn Lin Aug. 25, 2023, 9:17 a.m. UTC | #7
On 2023/8/25 16:39, Sharp.Xia@mediatek.com wrote:
> On Fri, 2023-08-25 at 16:11 +0800, Shawn Lin wrote:
>>   	
>>   Hi Sharp,

...

>>> 1024
>>>
> Hi Shawn,
> 
> What is your readahead value before and after applying this patch?
> 

The original readahead is 128, and after applying the patch it is 1024.


cat /d/mmc0/ios
clock:          200000000 Hz
actual clock:   200000000 Hz
vdd:            18 (3.0 ~ 3.1 V)
bus mode:       2 (push-pull)
chip select:    0 (don't care)
power mode:     2 (on)
bus width:      3 (8 bits)
timing spec:    10 (mmc HS400 enhanced strobe)
signal voltage: 1 (1.80 V)
driver type:    0 (driver type B)

The driver I used is sdhci-of-dwcmshc.c with a KLMBG2JETDB041 eMMC chip.
Wenchao Chen Aug. 25, 2023, 12:23 p.m. UTC | #8
On Fri, Aug 25, 2023 at 7:43 PM <Sharp.Xia@mediatek.com> wrote:
>
> On Fri, 2023-08-25 at 16:11 +0800, Shawn Lin wrote:
> >
> >  Hi Sharp,
> >
> > On 2023/8/25 15:10, Sharp Xia (夏宇彬) wrote:
> > > On Thu, 2023-08-24 at 12:55 +0200, Ulf Hansson wrote:
> > >>
> > >>   On Fri, 18 Aug 2023 at 04:45, <Sharp.Xia@mediatek.com> wrote:
> > >>>
> > >>> From: Sharp Xia <Sharp.Xia@mediatek.com>
> > >>>
> > >>> MMC does not set readahead and uses the default
> > VM_READAHEAD_PAGES
> > >>> resulting in slower reading speed.
> > >>> Use the max_req_size reported by host driver to set the optimal
> > >>> I/O size to improve performance.
> > >>
> > >> This seems reasonable to me. However, it would be nice if you
> > could
> > >> share some performance numbers too - comparing before and after
> > >> $subject patch.
> > >>
> > >> Kind regards
> > >> Uffe
> > >>
> > >>>
> > >>> Signed-off-by: Sharp Xia <Sharp.Xia@mediatek.com>
> > >>> ---
> > >>>   drivers/mmc/core/queue.c | 1 +
> > >>>   1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > >>> index b396e3900717..fc83c4917360 100644
> > >>> --- a/drivers/mmc/core/queue.c
> > >>> +++ b/drivers/mmc/core/queue.c
> > >>> @@ -359,6 +359,7 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
> > >>>                  blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
> > >>>          blk_queue_max_hw_sectors(mq->queue,
> > >>>                  min(host->max_blk_count, host->max_req_size / 512));
> > >>> +       blk_queue_io_opt(mq->queue, host->max_req_size);
> > >>>          if (host->can_dma_map_merge)
> > >>>                  WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
> > >>>                                                          mmc_dev(host)),
> > >>> --
> > >>> 2.18.0
> > >>>
> > >
> > > I test this patch on internal platform(kernel-5.15).
> >
> > I patched this one and the test shows me a stable 11% performance
> > drop.
> >
> > Before:
> > echo 3 > proc/sys/vm/drop_caches && dd if=/data/1GB.img of=/dev/null
> >
> > 2048000+0 records in
> > 2048000+0 records out
> > 1048576000 bytes (0.9 G) copied, 3.912249 s, 256 M/s
> >
> > After:
> > echo 3 > proc/sys/vm/drop_caches && dd if=/data/1GB.img of=/dev/null
> > 2048000+0 records in
> > 2048000+0 records out
> > 1048576000 bytes (0.9 G) copied, 4.436271 s, 225 M/s
> >
> > >
> > > Before:
> > > console:/ # echo 3 > /proc/sys/vm/drop_caches
> > > console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
> > > 4485393+1 records in
> > > 4485393+1 records out
> > > 2296521564 bytes (2.1 G) copied, 37.124446 s, 59 M/s
> > > console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
> > > 128
> > >
> > > After:
> > > console:/ # echo 3 > /proc/sys/vm/drop_caches
> > > console:/ # dd if=/mnt/media_rw/8031-130D/super.img of=/dev/null
> > > 4485393+1 records in
> > > 4485393+1 records out
> > > 2296521564 bytes (2.1 G) copied, 28.956049 s, 76 M/s
> > > console:/ # cat /sys/block/mmcblk0/queue/read_ahead_kb
> > > 1024
> > >
> Hi Shawn,
>
> What is your readahead value before and after applying this patch?
>

Hi Sharp
Could we use "echo 1024 > /sys/block/mmcblk0/queue/read_ahead_kb" instead of
"blk_queue_io_opt(mq->queue, host->max_req_size);"?
Sharp Xia (夏宇彬) Aug. 26, 2023, 4:26 p.m. UTC | #9
On Fri, 2023-08-25 at 17:17 +0800, Shawn Lin wrote:
>  	 
> 
>  On 2023/8/25 16:39, Sharp.Xia@mediatek.com wrote:
> > On Fri, 2023-08-25 at 16:11 +0800, Shawn Lin wrote:
> >>   
> >>   Hi Sharp,
> 
> ...
> 
> >>> 1024
> >>>
> > Hi Shawn,
> > 
> > What is your readahead value before and after applying this patch?
> > 
> 
> The original readahead is 128, and after applying the patch is 1024
> 
> 
> cat /d/mmc0/ios
> clock:          200000000 Hz
> actual clock:   200000000 Hz
> vdd:            18 (3.0 ~ 3.1 V)
> bus mode:       2 (push-pull)
> chip select:    0 (don't care)
> power mode:     2 (on)
> bus width:      3 (8 bits)
> timing spec:    10 (mmc HS400 enhanced strobe)
> signal voltage: 1 (1.80 V)
> driver type:    0 (driver type B)
> 
> The driver I used is sdhci-of-dwcmshc.c with a KLMBG2JETDB041 eMMC
> chip.

I tested with RK3568 and the sdhci-of-dwcmshc.c driver; the performance improved by 2~3%.
 
Before:
root@OpenWrt:/mnt/mmcblk0p3# time dd if=test.img of=/dev/null
2097152+0 records in
2097152+0 records out
real    0m 6.01s
user    0m 0.84s
sys     0m 2.89s
root@OpenWrt:/mnt/mmcblk0p3# cat /sys/block/mmcblk0/queue/read_ahead_kb
128
 
After:
root@OpenWrt:/mnt/mmcblk0p3# echo 3 > /proc/sys/vm/drop_caches
root@OpenWrt:/mnt/mmcblk0p3# time dd if=test.img of=/dev/null
2097152+0 records in
2097152+0 records out
real    0m 5.86s
user    0m 1.04s
sys     0m 3.18s
root@OpenWrt:/mnt/mmcblk0p3# cat /sys/block/mmcblk0/queue/read_ahead_kb
1024
 
root@OpenWrt:/sys/kernel/debug/mmc0# cat ios
clock:          200000000 Hz
actual clock:   200000000 Hz
vdd:            18 (3.0 ~ 3.1 V)
bus mode:       2 (push-pull)
chip select:    0 (don't care)
power mode:     2 (on)
bus width:      3 (8 bits)
timing spec:    9 (mmc HS200)
signal voltage: 1 (1.80 V)
driver type:    0 (driver type B)
Sharp Xia (夏宇彬) Aug. 26, 2023, 4:54 p.m. UTC | #10
On Fri, 2023-08-25 at 20:23 +0800, Wenchao Chen wrote:
> 
> 
> Hi Sharp
> Use "echo 1024 > sys/block/mmcblk0/queue/read_ahead_kb" instead of
> "blk_queue_io_opt(mq->queue, host->max_req_size);"?

Hi Wenchao,

User space does not know the max_req_size of each MMC host.
And when an SD card is hot-inserted, it is complicated for
user space to modify this value.
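For completeness, the userspace route could in principle be automated for hot-inserted cards with a udev rule; this is only an illustrative sketch, and the rule filename, device match, and the 1024 value are all assumptions (and, as noted above, userspace still cannot see the host's max_req_size):

```
# Hypothetical /etc/udev/rules.d/60-mmc-readahead.rules:
# apply a fixed readahead to mmc block devices as they appear,
# including hot-inserted SD cards. The 1024 KiB value is a guess
# that a BSP would have to tune per host and card.
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="mmcblk[0-9]", \
    ATTR{queue/read_ahead_kb}="1024"
```

This sidesteps the kernel-side io_opt question but trades it for a static, per-platform value.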
Shawn Lin Aug. 28, 2023, 2:27 a.m. UTC | #11
Hi Sharp

On 2023/8/27 0:26, Sharp.Xia@mediatek.com wrote:
> On Fri, 2023-08-25 at 17:17 +0800, Shawn Lin wrote:
>>   	
>>

After more testing, most of my platforms running in HS400/HS200 mode
show nearly no difference with the readahead ranging from 128 to 1024.
Yet one board still shows a performance drop. I highly suspect it is
eMMC-chip dependent. I would recommend leaving it to the BSP guys to
decide which readahead value is best for their usage.

> 
> I tested with RK3568 and sdhci-of-dwcmshc.c driver, the performance improved by 2~3%.
>   
> Before:
> root@OpenWrt:/mnt/mmcblk0p3# time dd if=test.img of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> real    0m 6.01s
> user    0m 0.84s
> sys     0m 2.89s
> root@OpenWrt:/mnt/mmcblk0p3# cat /sys/block/mmcblk0/queue/read_ahead_kb
> 128
>   
> After:
> root@OpenWrt:/mnt/mmcblk0p3# echo 3 > /proc/sys/vm/drop_caches
> root@OpenWrt:/mnt/mmcblk0p3# time dd if=test.img of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> real    0m 5.86s
> user    0m 1.04s
> sys     0m 3.18s
> root@OpenWrt:/mnt/mmcblk0p3# cat /sys/block/mmcblk0/queue/read_ahead_kb
> 1024
>   
> root@OpenWrt:/sys/kernel/debug/mmc0# cat ios
> clock:          200000000 Hz
> actual clock:   200000000 Hz
> vdd:            18 (3.0 ~ 3.1 V)
> bus mode:       2 (push-pull)
> chip select:    0 (don't care)
> power mode:     2 (on)
> bus width:      3 (8 bits)
> timing spec:    9 (mmc HS200)
> signal voltage: 1 (1.80 V)
> driver type:    0 (driver type B)
>
Ulf Hansson Aug. 28, 2023, 9:04 a.m. UTC | #12
On Mon, 28 Aug 2023 at 04:28, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>
> Hi Sharp
>
> On 2023/8/27 0:26, Sharp.Xia@mediatek.com wrote:
> > On Fri, 2023-08-25 at 17:17 +0800, Shawn Lin wrote:
> >>
> >>
>
> After more testing, most of my platforms which runs at HS400/HS200 mode
> shows nearly no differences with the readahead ranging from 128 to 1024.
> Yet just a board shows a performance drop now. Highly suspect it's eMMC
> chip depends. I would recommand leave it to the BSP guys to decide which
> readahead value is best for their usage.

That's a very good point. The SD/eMMC card certainly behaves
differently, depending on the request-size.

Another thing we could consider would be to combine the information
about the request-size from the mmc host with some relevant
information from the registers in the card (not sure exactly
what, though).

>
> >
> > I tested with RK3568 and sdhci-of-dwcmshc.c driver, the performance improved by 2~3%.
> >
> > Before:
> > root@OpenWrt:/mnt/mmcblk0p3# time dd if=test.img of=/dev/null
> > 2097152+0 records in
> > 2097152+0 records out
> > real    0m 6.01s
> > user    0m 0.84s
> > sys     0m 2.89s
> > root@OpenWrt:/mnt/mmcblk0p3# cat /sys/block/mmcblk0/queue/read_ahead_kb
> > 128
> >
> > After:
> > root@OpenWrt:/mnt/mmcblk0p3# echo 3 > /proc/sys/vm/drop_caches
> > root@OpenWrt:/mnt/mmcblk0p3# time dd if=test.img of=/dev/null
> > 2097152+0 records in
> > 2097152+0 records out
> > real    0m 5.86s
> > user    0m 1.04s
> > sys     0m 3.18s
> > root@OpenWrt:/mnt/mmcblk0p3# cat /sys/block/mmcblk0/queue/read_ahead_kb
> > 1024
> >
> > root@OpenWrt:/sys/kernel/debug/mmc0# cat ios
> > clock:          200000000 Hz
> > actual clock:   200000000 Hz
> > vdd:            18 (3.0 ~ 3.1 V)
> > bus mode:       2 (push-pull)
> > chip select:    0 (don't care)
> > power mode:     2 (on)
> > bus width:      3 (8 bits)
> > timing spec:    9 (mmc HS200)
> > signal voltage: 1 (1.80 V)
> > driver type:    0 (driver type B)
> >

Thanks for testing and sharing the data, both of you!

Kind regards
Uffe

Patch

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index b396e3900717..fc83c4917360 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -359,6 +359,7 @@  static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 		blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
 	blk_queue_max_hw_sectors(mq->queue,
 		min(host->max_blk_count, host->max_req_size / 512));
+	blk_queue_io_opt(mq->queue, host->max_req_size);
 	if (host->can_dma_map_merge)
 		WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
 							mmc_dev(host)),