Message ID | 20221003153420.285896-2-vschneid@redhat.com (mailing list archive)
---|---
State | New, archived
Series | bitmap,cpumask: Add for_each_cpu_andnot()
On Mon, Oct 03, 2022 at 04:34:16PM +0100, Valentin Schneider wrote:
> A recent commit made cpumask_next*() trigger a warning when passed
> n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU
> numbers back into cpumask_next*().
>
> The warning occurs nearly every boot on QEMU:

[...]

> Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range")

No! It fixes a blk-mq bug, which was revealed after 78e5a3399421.

> Suggested-by: Yury Norov <yury.norov@gmail.com>

OK, maybe I suggested something like this. But after looking further into
the blk_mq_hctx_next_cpu() code, I have a feeling that this should be
fixed at a deeper level.

Can you check - did this warning trigger because of hctx->next_cpu, or
because cpumask_next_and() was called twice after jumping to the
select_cpu label?

> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  block/blk-mq.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c96c8c4f751b..30ae51eda95e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>
>  	if (--hctx->next_cpu_batch <= 0) {
>  select_cpu:

Because we have a backward-looking goto, I have a strong feeling that the
code should be reorganized.

> -		next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
> -					    cpu_online_mask);
> +		if (next_cpu == nr_cpu_ids - 1)
> +			next_cpu = nr_cpu_ids;
> +		else
> +			next_cpu = cpumask_next_and(next_cpu,
> +						    hctx->cpumask,
> +						    cpu_online_mask);
> +
>  		if (next_cpu >= nr_cpu_ids)
>  			next_cpu = blk_mq_first_mapped_cpu(hctx);

This simply means 'let's start from the beginning', and should be
replaced with cpumask_next_and_wrap().

>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;

Maybe something like this would work?

	if (--hctx->next_cpu_batch > 0 && cpu_online(next_cpu)) {
		hctx->next_cpu = next_cpu;
		return next_cpu;
	}

	next_cpu = cpumask_next_and_wrap(next_cpu, hctx->cpumask,
					 cpu_online_mask);
	if (next_cpu < nr_cpu_ids) {
		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
		hctx->next_cpu = next_cpu;
		return next_cpu;
	}

	/*
	 * Make sure to re-select CPU next time once after CPUs
	 * in hctx->cpumask become online again.
	 */
	hctx->next_cpu = next_cpu;
	hctx->next_cpu_batch = 1;
	return WORK_CPU_UNBOUND;

I didn't test it and likely screwed up some corner case. I'm just
trying to say that picking the next cpu should be a simpler thing.

Thanks,
Yury
On 03/10/22 10:54, Yury Norov wrote:
> On Mon, Oct 03, 2022 at 04:34:16PM +0100, Valentin Schneider wrote:
>> A recent commit made cpumask_next*() trigger a warning when passed
>> n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU
>> numbers back into cpumask_next*().
>>
>> The warning occurs nearly every boot on QEMU:
>
> [...]
>
>> Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range")
>
> No! It fixes a blk-mq bug, which was revealed after 78e5a3399421.
>
>> Suggested-by: Yury Norov <yury.norov@gmail.com>
>
> OK, maybe I suggested something like this. But after looking further into
> the blk_mq_hctx_next_cpu() code, I have a feeling that this should be
> fixed at a deeper level.
>
> Can you check - did this warning trigger because of hctx->next_cpu, or
> because cpumask_next_and() was called twice after jumping to the
> select_cpu label?
>

It seems to always happen when hctx->next_cpu == nr_cpu_ids - 1 at the
start of the function - no jumping involved.

>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>> ---
>>  block/blk-mq.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index c96c8c4f751b..30ae51eda95e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>>
>>  	if (--hctx->next_cpu_batch <= 0) {
>>  select_cpu:
>
> Because we have a backward-looking goto, I have a strong feeling that the
> code should be reorganized.
>
>> -		next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
>> -					    cpu_online_mask);
>> +		if (next_cpu == nr_cpu_ids - 1)
>> +			next_cpu = nr_cpu_ids;
>> +		else
>> +			next_cpu = cpumask_next_and(next_cpu,
>> +						    hctx->cpumask,
>> +						    cpu_online_mask);
>> +
>>  		if (next_cpu >= nr_cpu_ids)
>>  			next_cpu = blk_mq_first_mapped_cpu(hctx);
>
> This simply means 'let's start from the beginning', and should be
> replaced with cpumask_next_and_wrap().

I hadn't looked in depth there, but that's a strange behaviour. If we
get to the end of the cpumask, blk_mq_first_mapped_cpu() does:

	int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);

	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(hctx->cpumask);
	return cpu;

That if branch means the returned CPU is offline, which then triggers:

	if (!cpu_online(next_cpu)) {
		if (!tried) {
			tried = true;
			goto select_cpu;
		}

but going back to select_cpu doesn't make much sense, we've already
checked that hctx->cpumask and cpu_online_mask were disjoint.

>
>>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>
> Maybe something like this would work?
>
> 	if (--hctx->next_cpu_batch > 0 && cpu_online(next_cpu)) {
> 		hctx->next_cpu = next_cpu;
> 		return next_cpu;
> 	}
>
> 	next_cpu = cpumask_next_and_wrap(next_cpu, hctx->cpumask,
> 					 cpu_online_mask);
> 	if (next_cpu < nr_cpu_ids) {
> 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
> 		hctx->next_cpu = next_cpu;
> 		return next_cpu;
> 	}
>
> 	/*
> 	 * Make sure to re-select CPU next time once after CPUs
> 	 * in hctx->cpumask become online again.
> 	 */
> 	hctx->next_cpu = next_cpu;
> 	hctx->next_cpu_batch = 1;
> 	return WORK_CPU_UNBOUND;
>
> I didn't test it and likely screwed up some corner case. I'm just
> trying to say that picking the next cpu should be a simpler thing.
>

Agreed, your suggestion looks sane, let me play with that a bit.

> Thanks,
> Yury
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c96c8c4f751b..30ae51eda95e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 
 	if (--hctx->next_cpu_batch <= 0) {
 select_cpu:
-		next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
-					    cpu_online_mask);
+		if (next_cpu == nr_cpu_ids - 1)
+			next_cpu = nr_cpu_ids;
+		else
+			next_cpu = cpumask_next_and(next_cpu,
+						    hctx->cpumask,
+						    cpu_online_mask);
+
 		if (next_cpu >= nr_cpu_ids)
 			next_cpu = blk_mq_first_mapped_cpu(hctx);
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
A recent commit made cpumask_next*() trigger a warning when passed
n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU
numbers back into cpumask_next*().

The warning occurs nearly every boot on QEMU:

[    5.398453] WARNING: CPU: 3 PID: 162 at include/linux/cpumask.h:110 __blk_mq_delay_run_hw_queue+0x16b/0x180
[    5.399317] Modules linked in:
[    5.399646] CPU: 3 PID: 162 Comm: ssh-keygen Tainted: G                 N 6.0.0-rc4-00004-g93003cb24006 #55
[    5.400135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[    5.405430] Call Trace:
[    5.406152]  <TASK>
[    5.406452]  blk_mq_sched_insert_requests+0x67/0x150
[    5.406759]  blk_mq_flush_plug_list+0xd0/0x280
[    5.406987]  ? bit_wait+0x60/0x60
[    5.407317]  __blk_flush_plug+0xdb/0x120
[    5.407561]  ? bit_wait+0x60/0x60
[    5.407765]  io_schedule_prepare+0x38/0x40
[    5.407974]  io_schedule+0x6/0x40
[    5.408283]  bit_wait_io+0x8/0x60
[    5.408485]  __wait_on_bit+0x72/0x90
[    5.408668]  out_of_line_wait_on_bit+0x8c/0xb0
[    5.408879]  ? swake_up_one+0x30/0x30
[    5.409158]  ext4_read_bh+0x7a/0x80
[    5.409381]  ext4_get_branch+0xc0/0x130
[    5.409583]  ext4_ind_map_blocks+0x1ac/0xb30
[    5.409844]  ? __es_remove_extent+0x61/0x6d0
[    5.410128]  ? blk_account_io_merge_bio+0x67/0xd0
[    5.410416]  ? percpu_counter_add_batch+0x59/0xb0
[    5.410720]  ? percpu_counter_add_batch+0x59/0xb0
[    5.410949]  ? _raw_read_unlock+0x13/0x30
[    5.411323]  ext4_map_blocks+0xc2/0x590
[    5.411609]  ? xa_load+0x7c/0xa0
[    5.411859]  ext4_mpage_readpages+0x4a2/0xaa0
[    5.412192]  read_pages+0x69/0x2b0
[    5.412477]  ? folio_add_lru+0x4f/0x70
[    5.412696]  page_cache_ra_unbounded+0x11d/0x170
[    5.412960]  filemap_get_pages+0x109/0x5d0
[    5.413340]  ? __vma_adjust+0x348/0x930
[    5.413576]  filemap_read+0xb7/0x380
[    5.413805]  ? vma_merge+0x2e9/0x330
[    5.414067]  ? vma_set_page_prot+0x43/0x80
[    5.414309]  ? __inode_security_revalidate+0x5e/0x80
[    5.414581]  ? selinux_file_permission+0x107/0x130
[    5.414889]  vfs_read+0x205/0x2e0
[    5.415162]  ksys_read+0x54/0xd0
[    5.415348]  do_syscall_64+0x3a/0x90
[    5.415584]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range")
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 block/blk-mq.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)