Message ID | 20250402070934.2387587-2-weilongping@oppo.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2] dm-bufio: improve the performance of __dm_bufio_prefetch | expand |
On Wed, 2 Apr 2025, LongPing Wei wrote: > At this time, all bios for hash blocks should eventually > be merged into a single large bio. > > Signed-off-by: LongPing Wei <weilongping@oppo.com> > --- > drivers/md/dm-verity-target.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > index 3c427f18a04b..813d5cfc7ffa 100644 > --- a/drivers/md/dm-verity-target.c > +++ b/drivers/md/dm-verity-target.c > @@ -1683,6 +1683,10 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv) > > verity_verify_sig_opts_cleanup(&verify_args); > > + dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, > + v->hash_blocks - v->hash_start, > + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); > + > dm_audit_log_ctr(DM_MSG_PREFIX, ti, 1); > > return 0; > -- > 2.34.1 Hi I would move it into the "resume" callback, so that if the user reconfigures the device stack between "ctr" and "resume", it won't read the data too early. Don't use IOPRIO_CLASS_RT, this is not real-time requirement, IOPRIO_CLASS_RT would slow down concurrent I/O. Another problem with this approach is that when the verity device is big and system memory is small, it just causes I/O churn - new bufio blocks will be displacing old blocks - and it will degrade performance, not improve it. Please, describe some scenario, where this prefetch actually helps. What is the size of the metadata that you are prefetching? What is the total memory size? Is there any benchmark that shows the advantage of this patch? Mikulas
On 2025/4/2 17:56, Mikulas Patocka wrote: > > > On Wed, 2 Apr 2025, LongPing Wei wrote: > >> At this time, all bios for hash blocks should eventually >> be merged into a single large bio. >> >> Signed-off-by: LongPing Wei <weilongping@oppo.com> >> --- >> drivers/md/dm-verity-target.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c >> index 3c427f18a04b..813d5cfc7ffa 100644 >> --- a/drivers/md/dm-verity-target.c >> +++ b/drivers/md/dm-verity-target.c >> @@ -1683,6 +1683,10 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv) >> >> verity_verify_sig_opts_cleanup(&verify_args); >> >> + dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, >> + v->hash_blocks - v->hash_start, >> + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); >> + >> dm_audit_log_ctr(DM_MSG_PREFIX, ti, 1); >> >> return 0; >> -- >> 2.34.1 > > Hi > > I would move it into the "resume" callback, so that if the user > reconfigures the device stack between "ctr" and "resume", it won't read > the data too early. > > Don't use IOPRIO_CLASS_RT, this is not real-time requirement, > IOPRIO_CLASS_RT would slow down concurrent I/O. > If the prefetch io is submitted with non-rt at first, the later dm io need the same hash block will wait the non-rt bio. > Another problem with this approach is that when the verity device is big > and system memory is small, it just causes I/O churn - new bufio blocks > will be displacing old blocks - and it will degrade performance, not > improve it. > Do we need a solution to check if the memory is enough to the prefetch? For Android devices, the verity device should be created on the boot procedure. > Please, describe some scenario, where this prefetch actually helps. What > is the size of the metadata that you are prefetching? What is the total > memory size? Is there any benchmark that shows the advantage of this > patch? The size of hash blocks for the ROM of our low-end devices is about 71MiB. I want to enhance probability of cache hit when try_verify_in_tasklet is enabled. How about only doing the prefetch when try_verify_in_tasklet is enabled? For example: if (v->use_bh_wq) dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, v->hash_blocks - v->hash_start, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); > > Mikulas > LongPing
On Wed, 2 Apr 2025, LongPing Wei wrote: > On 2025/4/2 17:56, Mikulas Patocka wrote: > > > > > > On Wed, 2 Apr 2025, LongPing Wei wrote: > > > > > At this time, all bios for hash blocks should eventually > > > be merged into a single large bio. > > > > > > Signed-off-by: LongPing Wei <weilongping@oppo.com> > > > --- > > > drivers/md/dm-verity-target.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > > > index 3c427f18a04b..813d5cfc7ffa 100644 > > > --- a/drivers/md/dm-verity-target.c > > > +++ b/drivers/md/dm-verity-target.c > > > @@ -1683,6 +1683,10 @@ static int verity_ctr(struct dm_target *ti, > > > unsigned int argc, char **argv) > > > verity_verify_sig_opts_cleanup(&verify_args); > > > + dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, > > > + v->hash_blocks - v->hash_start, > > > + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); > > > + > > > dm_audit_log_ctr(DM_MSG_PREFIX, ti, 1); > > > return 0; > > > -- > > > 2.34.1 > > > > Hi > > > > I would move it into the "resume" callback, so that if the user > > reconfigures the device stack between "ctr" and "resume", it won't read > > the data too early. > > > > Don't use IOPRIO_CLASS_RT, this is not real-time requirement, > > IOPRIO_CLASS_RT would slow down concurrent I/O. > > > If the prefetch io is submitted with non-rt at first, the later dm io > need the same hash block will wait the non-rt bio. Submitting large I/O with IOPRIO_CLASS_RT will block every other task that does some I/O, so I can't do that. It needs to be changed to IOPRIO_CLASS_NONE or IOPRIO_CLASS_IDLE. > > Another problem with this approach is that when the verity device is big > > and system memory is small, it just causes I/O churn - new bufio blocks > > will be displacing old blocks - and it will degrade performance, not > > improve it. > > > Do we need a solution to check if the memory is enough to the prefetch? Yes. > For Android devices, the verity device should be created on the boot > procedure. > > Please, describe some scenario, where this prefetch actually helps. What > > is the size of the metadata that you are prefetching? What is the total > > memory size? Is there any benchmark that shows the advantage of this > > patch? > > The size of hash blocks for the ROM of our low-end devices is about > 71MiB. I want to enhance probability of cache hit when > try_verify_in_tasklet is enabled. How about only doing the prefetch when > try_verify_in_tasklet is enabled? > For example: > if (v->use_bh_wq) > dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, > v->hash_blocks - v->hash_start, > IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); I wouldn't overload the "use_bh_wq" option for that. Perhaps we could add a new option (that would be off by default), so that the patch won't cause problems to existing users. How much does this patch improve Android boot time? So that we can decide whether the improvement is worth the complexity. Mikulas
On 2025/4/3 2:44, Mikulas Patocka wrote: > > > On Wed, 2 Apr 2025, LongPing Wei wrote: > >> On 2025/4/2 17:56, Mikulas Patocka wrote: >>> >>> >>> On Wed, 2 Apr 2025, LongPing Wei wrote: >>> >>>> At this time, all bios for hash blocks should eventually >>>> be merged into a single large bio. >>>> >>>> Signed-off-by: LongPing Wei <weilongping@oppo.com> >>>> --- >>>> drivers/md/dm-verity-target.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c >>>> index 3c427f18a04b..813d5cfc7ffa 100644 >>>> --- a/drivers/md/dm-verity-target.c >>>> +++ b/drivers/md/dm-verity-target.c >>>> @@ -1683,6 +1683,10 @@ static int verity_ctr(struct dm_target *ti, >>>> unsigned int argc, char **argv) >>>> verity_verify_sig_opts_cleanup(&verify_args); >>>> + dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, >>>> + v->hash_blocks - v->hash_start, >>>> + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); >>>> + >>>> dm_audit_log_ctr(DM_MSG_PREFIX, ti, 1); >>>> return 0; >>>> -- >>>> 2.34.1 >>> >>> Hi >>> >>> I would move it into the "resume" callback, so that if the user >>> reconfigures the device stack between "ctr" and "resume", it won't read >>> the data too early. >>> >>> Don't use IOPRIO_CLASS_RT, this is not real-time requirement, >>> IOPRIO_CLASS_RT would slow down concurrent I/O. >>> >> If the prefetch io is submitted with non-rt at first, the later dm io >> need the same hash block will wait the non-rt bio. > > Submitting large I/O with IOPRIO_CLASS_RT will block every other task that > does some I/O, so I can't do that. It needs to be changed to > IOPRIO_CLASS_NONE or IOPRIO_CLASS_IDLE. > >>> Another problem with this approach is that when the verity device is big >>> and system memory is small, it just causes I/O churn - new bufio blocks >>> will be displacing old blocks - and it will degrade performance, not >>> improve it. >>> >> Do we need a solution to check if the memory is enough to the prefetch? > > Yes. > >> For Android devices, the verity device should be created on the boot >> procedure. > >>> Please, describe some scenario, where this prefetch actually helps. What >>> is the size of the metadata that you are prefetching? What is the total >>> memory size? Is there any benchmark that shows the advantage of this >>> patch? >> >> The size of hash blocks for the ROM of our low-end devices is about >> 71MiB. I want to enhance probability of cache hit when >> try_verify_in_tasklet is enabled. How about only doing the prefetch when >> try_verify_in_tasklet is enabled? >> For example: >> if (v->use_bh_wq) >> dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, >> v->hash_blocks - v->hash_start, >> IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); > > I wouldn't overload the "use_bh_wq" option for that. Perhaps we could add > a new option (that would be off by default), so that the patch won't cause > problems to existing users. > > How much does this patch improve Android boot time? So that we can decide > whether the improvement is worth the complexity. > > Mikulas > Hi, Mikulas Can we add a parameter that is used only once in the process of ctr or resume? For normal boot, the improvement is not significant. But for upgrading boot, it could be valuable to do this as dm-verity works on dm-user and snapshotd in userspace has a large number of writes in the background. We have met an issue of system running slowly in this situation. If a parameter that is used only once in the process of ctr or resume is acceptable, I want to use it only in upgrading boot mode and the shrinker of dm-bufio should be skipped until the system is upgraded. The goal is to reduce the number of read requests to snapuserd. LongPing
On Wed, Apr 02, 2025 at 03:09:36PM +0800, LongPing Wei wrote: > > + dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, > + v->hash_blocks - v->hash_start, > + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); Err, no. Random code should not prefetch with a RT priority.
On Wed, Apr 02, 2025 at 08:44:39PM +0200, Mikulas Patocka wrote: > Submitting large I/O with IOPRIO_CLASS_RT will block every other task that > does some I/O, so I can't do that. It needs to be changed to > IOPRIO_CLASS_NONE or IOPRIO_CLASS_IDLE. Also random kernel code should not simply upgrade the I/O priority. Your magic snowflake is not going to be as magic for everyone else.
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index 3c427f18a04b..813d5cfc7ffa 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -1683,6 +1683,10 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv) verity_verify_sig_opts_cleanup(&verify_args); + dm_bufio_prefetch_with_ioprio(v->bufio, v->hash_start, + v->hash_blocks - v->hash_start, + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); + dm_audit_log_ctr(DM_MSG_PREFIX, ti, 1); return 0;
At this time, all bios for hash blocks should eventually be merged into a single large bio. Signed-off-by: LongPing Wei <weilongping@oppo.com> --- drivers/md/dm-verity-target.c | 4 ++++ 1 file changed, 4 insertions(+)