
[v6,0/5] Fix potential kernel panic when increase hardware queue

Message ID: cover.1588856361.git.zhangweiping@didiglobal.com

Message

Weiping Zhang May 7, 2020, 1:03 p.m. UTC
Hi,

This series mainly fixes a kernel panic when increasing the number of
hardware queues, and also fixes some other miscellaneous issues.

Memleak 1:

__blk_mq_alloc_rq_maps
	__blk_mq_alloc_rq_map

on failure
	blk_mq_free_rq_map

Actually, __blk_mq_alloc_rq_map allocates both the map and the requests,
so the error path also needs to free the requests.
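To make the leak concrete, here is a minimal userspace model of that error path. It is a sketch, not kernel code: struct model_tags, alloc_map_and_requests(), free_map_only(), free_map_and_requests(), alloc_all(), live_allocs and fail_at are hypothetical stand-ins for __blk_mq_alloc_rq_map, blk_mq_free_rq_map, blk_mq_free_map_and_requests and __blk_mq_alloc_rq_maps, and a counter of live allocations replaces a real leak detector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the Memleak 1 error path.
 * None of these names exist in the kernel.
 */
struct model_tags {
	void *map;      /* stands in for the rq_map (tag map) */
	void *requests; /* stands in for the preallocated requests */
};

static int live_allocs;  /* number of outstanding model allocations */
static int fail_at = -1; /* hw queue index whose allocation fails */

static void *model_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Like __blk_mq_alloc_rq_map(): allocates BOTH map and requests. */
static bool alloc_map_and_requests(struct model_tags *t, int idx)
{
	if (idx == fail_at)
		return false;
	t->map = model_alloc(16);
	t->requests = model_alloc(64);
	if (!t->map || !t->requests) {
		/* undo a partial allocation before reporting failure */
		model_free(t->requests);
		model_free(t->map);
		t->map = t->requests = NULL;
		return false;
	}
	return true;
}

/* Old unwind: frees only the map, like blk_mq_free_rq_map(). */
static void free_map_only(struct model_tags *t)
{
	model_free(t->map);
	t->map = NULL;
}

/* Fixed unwind: frees map and requests, like blk_mq_free_map_and_requests(). */
static void free_map_and_requests(struct model_tags *t)
{
	model_free(t->requests);
	t->requests = NULL;
	free_map_only(t);
}

/* Like __blk_mq_alloc_rq_maps(); returns 0 on success, -1 on failure. */
static int alloc_all(struct model_tags *tags, int nr_hw_queues, bool fixed)
{
	int i;

	assert(nr_hw_queues >= 1);
	for (i = 0; i < nr_hw_queues; i++)
		if (!alloc_map_and_requests(&tags[i], i))
			goto out_unwind;
	return 0;

out_unwind:
	/* release everything allocated for queues 0 .. i-1 */
	while (--i >= 0) {
		if (fixed)
			free_map_and_requests(&tags[i]);
		else
			free_map_only(&tags[i]);
	}
	return -1;
}
```

With fail_at = 2 and four queues, the map-only unwind leaves the request buffers of queues 0 and 1 allocated, while the map-and-requests unwind leaves nothing behind.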


Patch1: fix Memleak 1.
Patch2: fix the prev_nr_hw_queues issue; it must be saved before it is changed.
Patch3: from Ming, fix a potential kernel panic when increasing hardware queues.
Patch4~5: rename two functions, because they allocate both the map and the
requests, and so that the names pair with blk_mq_free_map_and_request(s).
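The Patch2 ordering issue can be illustrated with a minimal userspace model. It assumes that when reallocation fails, the update falls back to the saved previous count; struct count_model and model_update_nr_hw_queues() are hypothetical names, and the save_first flag exists only to compare capturing the old value before versus after the count is overwritten.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue count held in the tag set. */
struct count_model {
	int nr_hw_queues;
};

/*
 * Switch to nr queues; if reallocation fails (realloc_ok == false),
 * fall back to the saved previous count. save_first selects whether
 * prev is captured before (fixed) or after (buggy) the update.
 */
static void model_update_nr_hw_queues(struct count_model *set, int nr,
				      bool realloc_ok, bool save_first)
{
	int prev = 0;

	assert(nr >= 1);
	if (save_first)
		prev = set->nr_hw_queues;	/* fixed: the old value */
	set->nr_hw_queues = nr;
	if (!save_first)
		prev = set->nr_hw_queues;	/* buggy: already the new value */

	if (!realloc_ok)
		set->nr_hw_queues = prev;	/* fallback path */
}
```

Going from 1 to 8 queues with a failed reallocation, the fixed ordering falls back to 1, while the buggy ordering "falls back" to 8 and keeps the count that just failed.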

Changes since V5:
 * fix the 80-character line limit in patch-4

Changes since V4:
 * use a different approach, from Ming, to fix the kernel panic when
   increasing hardware queues.
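Going only by the title of Ming's patch ("block: alloc map and request for new hardware queue"), the idea can be modeled as follows: when the queue count grows, every new hardware queue index needs its own map and requests before I/O can use it. This is a hypothetical userspace sketch, not the patch itself: struct tags_model, model_grow() and MODEL_MAX_HW_QUEUES are invented names, and a single malloc() stands in for the map plus requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_HW_QUEUES 8

/* Hypothetical model: one tags pointer per hardware queue index. */
struct tags_model {
	int nr_hw_queues;
	void *tags[MODEL_MAX_HW_QUEUES]; /* NULL: no map/requests yet */
};

/*
 * Grow the queue count to nr. With alloc_new == true, allocate the
 * (modeled) map and requests for every index that lacks them.
 * Returns how many queues in [0, nr) are still without tags; in this
 * model, I/O routed to such a queue would find no request to use.
 */
static int model_grow(struct tags_model *set, int nr, bool alloc_new)
{
	int missing = 0;

	assert(nr >= 1 && nr <= MODEL_MAX_HW_QUEUES);
	set->nr_hw_queues = nr;
	for (int i = 0; i < nr; i++) {
		if (!set->tags[i] && alloc_new)
			set->tags[i] = malloc(64); /* map + requests */
		if (!set->tags[i])
			missing++;
	}
	return missing;
}
```

Starting from one queue that already has tags, growing to four without allocating leaves three queues without tags; growing with allocation leaves none, assuming malloc() succeeds.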

Changes since V3:
 * reorder the patchset: fix the issues first, then rename.
 * rename the function to blk_mq_alloc_map_and_request

Changes since V2:
 * rename some functions and fix a memleak when freeing map and requests
 * do not free the newly allocated map and requests; they will be released
   when the tag set goes away

Changes since V1:
 * add a fix for a potential kernel panic when increasing hardware queues

Ming Lei (1):
  block: alloc map and request for new hardware queue

Weiping Zhang (4):
  block: free both rq_map and request
  block: save previous hardware queue count before udpate
  block: rename __blk_mq_alloc_rq_map
  block: rename blk_mq_alloc_rq_maps

 block/blk-mq.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

Comments

Christoph Hellwig May 7, 2020, 1:43 p.m. UTC | #1
The whole series looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Jens Axboe May 7, 2020, 6:31 p.m. UTC | #2
On 5/7/20 7:03 AM, Weiping Zhang wrote:
> Hi,
> 
> This series mainly fixes a kernel panic when increasing the number of
> hardware queues, and also fixes some other miscellaneous issues.
> 
> Memleak 1:
> 
> __blk_mq_alloc_rq_maps
> 	__blk_mq_alloc_rq_map
> 
> on failure
> 	blk_mq_free_rq_map
> 
> Actually, __blk_mq_alloc_rq_map allocates both the map and the requests,
> so the error path also needs to free the requests.

Applied for 5.8, thanks.
Bart Van Assche May 12, 2020, 1:30 a.m. UTC | #3
On 2020-05-07 06:03, Weiping Zhang wrote:
> This series mainly fixes a kernel panic when increasing the number of
> hardware queues, and also fixes some other miscellaneous issues.

Does this patch series survive blktests? I'm asking this because
blktests triggers the crash shown below for Jens' block-for-next branch.
I think this report is the result of a recent change.

run blktests block/030

null_blk: module loaded
Increasing nr_hw_queues to 8 fails, fallback to 1
==================================================================
BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830
Read of size 8 at addr 0000000000000128 by task nproc/8541

CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Call Trace:
 dump_stack+0xa5/0xe6
 __kasan_report.cold+0x65/0xbb
 kasan_report+0x45/0x60
 check_memory_region+0x15e/0x1c0
 __kasan_check_read+0x15/0x20
 blk_mq_map_swqueue+0x2f2/0x830
 __blk_mq_update_nr_hw_queues+0x3df/0x690
 blk_mq_update_nr_hw_queues+0x32/0x50
 nullb_device_submit_queues_store+0xde/0x160 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x14b/0x2d0
 ksys_write+0xdd/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x310
 entry_SYSCALL_64_after_hwframe+0x49/0xb3

Thanks,

Bart.
Weiping Zhang May 12, 2020, 12:09 p.m. UTC | #4
On Tue, May 12, 2020 at 9:31 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2020-05-07 06:03, Weiping Zhang wrote:
> > This series mainly fixes a kernel panic when increasing the number of
> > hardware queues, and also fixes some other miscellaneous issues.
>
> Does this patch series survive blktests? I'm asking this because
> blktests triggers the crash shown below for Jens' block-for-next branch.
> I think this report is the result of a recent change.
>
> run blktests block/030
>
> [...]
>
> Thanks,
>

Hi Bart,

I didn't run block/030, since I don't pull blktests very often.

It's a different problem: the CPU-to-hctx mapping is not reset on the
fallback path, so CPUs >= 1 still point to an hctx other than hctx 0.

It should be fixed by:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bc34d6b572b6..d82cefb0474f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
                goto reregister;

        set->nr_hw_queues = nr_hw_queues;
-       blk_mq_update_queue_map(set);
 fallback:
+       blk_mq_update_queue_map(set);
        list_for_each_entry(q, &set->tag_list, tag_set_list) {
                blk_mq_realloc_hw_ctxs(set, q);
                if (q->nr_hw_queues != set->nr_hw_queues) {
> Bart.
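The ordering problem described above can be reproduced in a small userspace model. Everything in it is hypothetical: struct map_model, model_update_queue_map() (a plain round-robin CPU-to-hctx mapping standing in for blk_mq_update_queue_map()), model_update() and model_map_valid() are invented for illustration, and the model assumes that the fallback path does not rebuild the map itself before jumping back to the fallback label. Moving the map update below the label, as in the diff, makes the fallback pass recompute the map for the restored count.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for the tag set and its CPU -> hctx map. */
struct map_model {
	int nr_hw_queues;
	int mq_map[MODEL_NR_CPUS];	/* hctx index used by each CPU */
};

/* Simplified round-robin mapping; assumes nr_hw_queues >= 1. */
static void model_update_queue_map(struct map_model *set)
{
	assert(set->nr_hw_queues >= 1);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		set->mq_map[cpu] = cpu % set->nr_hw_queues;
}

/*
 * Models the update/fallback ordering. If realloc fails, restore
 * prev and jump back to "fallback" (the second pass is assumed to
 * succeed). map_after_fallback == false is the old ordering: the
 * map is rebuilt only before the label, so it keeps the failed,
 * larger count.
 */
static void model_update(struct map_model *set, int nr, bool realloc_ok,
			 bool map_after_fallback)
{
	int prev = set->nr_hw_queues;
	bool fell_back = false;

	set->nr_hw_queues = nr;
	if (!map_after_fallback)
		model_update_queue_map(set);
fallback:
	if (map_after_fallback)
		model_update_queue_map(set);
	if (!realloc_ok && !fell_back) {
		set->nr_hw_queues = prev;
		fell_back = true;
		goto fallback;
	}
}

/* True if every CPU maps to an hctx index below nr_hw_queues. */
static bool model_map_valid(const struct map_model *set)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (set->mq_map[cpu] >= set->nr_hw_queues)
			return false;
	return true;
}
```

With 8 CPUs and a failed increase from 1 to 8 queues, the old ordering leaves CPUs 1 to 7 pointing at hctx 1 to 7 while only one queue remains; the new ordering maps every CPU to hctx 0.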
Weiping Zhang May 12, 2020, 12:20 p.m. UTC | #5
On Tue, May 12, 2020 at 8:09 PM Weiping Zhang <zwp10758@gmail.com> wrote:
>
> On Tue, May 12, 2020 at 9:31 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > [...]
>
> Hi Bart,
>
> I didn't run block/030, since I don't pull blktests very often.
>
> It's a different problem: the CPU-to-hctx mapping is not reset on the
> fallback path, so CPUs >= 1 still point to an hctx other than hctx 0.
>
> It should be fixed by:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index bc34d6b572b6..d82cefb0474f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>                 goto reregister;
>
>         set->nr_hw_queues = nr_hw_queues;
> -       blk_mq_update_queue_map(set);
>  fallback:
> +       blk_mq_update_queue_map(set);
>         list_for_each_entry(q, &set->tag_list, tag_set_list) {
>                 blk_mq_realloc_hw_ctxs(set, q);
>                 if (q->nr_hw_queues != set->nr_hw_queues) {

Should block/030 also be improved?

        # Since older null_blk versions do not allow "submit_queues" to be
        # modified, check first whether that configs attribute is writeable.
        # Each iteration of the loop below triggers $(nproc) + 1
        # null_init_hctx() calls. Since <interval>=$(nproc), all possible
        # blk_mq_realloc_hw_ctxs() error paths will be triggered. Whether or
        # not this test succeeds depends on whether or not _check_dmesg()
        # detects a kernel warning.
        if { echo "$(<"$sq")" >$sq; } 2>/dev/null; then
                for ((i = 0; i < 100; i++)); do
                        echo 1 > $sq
                        nproc > $sq  # this line outputs lots of "nproc: write error: Cannot allocate memory"
                done
        else
                SKIP_REASON="Skipping test because $sq cannot be modified"
        fi


The test result shows this test case as [failed], but it actually passes:
with the above patch applied, no warning is detected in the kernel log.

block/030 (trigger the blk_mq_realloc_hw_ctxs() error path)  [failed]
    runtime  1.999s  ...  2.115s
    --- tests/block/030.out 2020-05-12 10:42:26.345782849 +0800
    +++ /data1/zwp/src/blktests/results/nodev/block/030.out.bad 2020-05-12 20:14:59.878915218 +0800
    @@ -1 +1,51 @@
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    ...
    (Run 'diff -u tests/block/030.out /data1/zwp/src/blktests/results/nodev/block/030.out.bad' to see the entire diff)

Thanks
Weiping
Bart Van Assche May 12, 2020, 11:08 p.m. UTC | #6
On 2020-05-12 05:20, Weiping Zhang wrote:
> On Tue, May 12, 2020 at 8:09 PM Weiping Zhang <zwp10758@gmail.com> wrote:
>> I didn't run block/030, since I don't pull blktests very often.

That's unfortunate ...

>> It's a different problem: the CPU-to-hctx mapping is not reset on the
>> fallback path, so CPUs >= 1 still point to an hctx other than hctx 0.
>>
>> It should be fixed by:
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index bc34d6b572b6..d82cefb0474f 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>                 goto reregister;
>>
>>         set->nr_hw_queues = nr_hw_queues;
>> -       blk_mq_update_queue_map(set);
>>  fallback:
>> +       blk_mq_update_queue_map(set);
>>         list_for_each_entry(q, &set->tag_list, tag_set_list) {
>>                 blk_mq_realloc_hw_ctxs(set, q);
>>                 if (q->nr_hw_queues != set->nr_hw_queues) {

If this is posted as a patch, feel free to add:

Tested-by: Bart van Assche <bvanassche@acm.org>

> Should block/030 also be improved?
> 
> [...]
> 
> The test result shows this test case as [failed], but it actually passes:
> with the above patch applied, no warning is detected in the kernel log.
> 
> block/030 (trigger the blk_mq_realloc_hw_ctxs() error path)  [failed]
> [...]

That's weird. I have not yet encountered this. Test block/030 passes on
my setup.

Thanks,

Bart.
Weiping Zhang May 13, 2020, 12:43 a.m. UTC | #7
On Wed, May 13, 2020 at 7:08 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2020-05-12 05:20, Weiping Zhang wrote:
> > On Tue, May 12, 2020 at 8:09 PM Weiping Zhang <zwp10758@gmail.com> wrote:
> >> I didn't run block/030, since I don't pull blktests very often.
>
> That's unfortunate ...
>
> >> It's a different problem: the CPU-to-hctx mapping is not reset on the
> >> fallback path, so CPUs >= 1 still point to an hctx other than hctx 0.
> >>
> >> It should be fixed by:
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index bc34d6b572b6..d82cefb0474f 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> >>                 goto reregister;
> >>
> >>         set->nr_hw_queues = nr_hw_queues;
> >> -       blk_mq_update_queue_map(set);
> >>  fallback:
> >> +       blk_mq_update_queue_map(set);
> >>         list_for_each_entry(q, &set->tag_list, tag_set_list) {
> >>                 blk_mq_realloc_hw_ctxs(set, q);
> >>                 if (q->nr_hw_queues != set->nr_hw_queues) {
>
> If this is posted as a patch, feel free to add:
>
> Tested-by: Bart van Assche <bvanassche@acm.org>
>
Will post it later, thank you.
