
[-next,1/8] md/raid10: prevent soft lockup while flush writes

Message ID 20230420112946.2869956-2-yukuai1@huaweicloud.com (mailing list archive)
State New, archived
Delegated to: Song Liu
Series md/raid1-10: limit the number of plugged bio

Commit Message

Yu Kuai April 20, 2023, 11:29 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Currently, there is no limit on the number of plugged bios for raid1/raid10.
While flushing writes, raid1 calls cond_resched() but raid10 does not, so a
large backlog of writes can cause a soft lockup.

The following soft lockup can be triggered easily with a writeback test for
raid10 on ramdisks:

watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
Call Trace:
 <TASK>
 call_rcu+0x16/0x20
 put_object+0x41/0x80
 __delete_object+0x50/0x90
 delete_object_full+0x2b/0x40
 kmemleak_free+0x46/0xa0
 slab_free_freelist_hook.constprop.0+0xed/0x1a0
 kmem_cache_free+0xfd/0x300
 mempool_free_slab+0x1f/0x30
 mempool_free+0x3a/0x100
 bio_free+0x59/0x80
 bio_put+0xcf/0x2c0
 free_r10bio+0xbf/0xf0
 raid_end_bio_io+0x78/0xb0
 one_write_done+0x8a/0xa0
 raid10_end_write_request+0x1b4/0x430
 bio_endio+0x175/0x320
 brd_submit_bio+0x3b9/0x9b7 [brd]
 __submit_bio+0x69/0xe0
 submit_bio_noacct_nocheck+0x1e6/0x5a0
 submit_bio_noacct+0x38c/0x7e0
 flush_pending_writes+0xf0/0x240
 raid10d+0xac/0x1ed0

This patch fixes the problem by adding cond_resched() to raid10, like
raid1 does.

Note that the unlimited number of plugged bios still needs to be optimized,
because when writeback flushes lots of dirty pages, this takes a lot of
memory and I/O latency is quite bad.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid10.c | 2 ++
 1 file changed, 2 insertions(+)
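
For context, here is a minimal, illustrative sketch (not verbatim kernel
source) of the pending-write submission pattern the hunks in this patch
touch; the helper name flush_pending_list() is made up for illustration.
The bio list being walked can be arbitrarily long, and for bio-based
devices such as brd the submission can complete the I/O inline (as the
call trace above shows), so without a cond_resched() in the loop the raid
daemon thread can monopolize a CPU long enough to trip the soft-lockup
watchdog.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/sched.h>

/* Illustrative only: walk a singly-linked list of queued write bios. */
static void flush_pending_list(struct bio *bio)
{
	while (bio) {
		struct bio *next = bio->bi_next;

		bio->bi_next = NULL;
		submit_bio_noacct(bio);	/* may complete inline, e.g. on brd */
		bio = next;
		cond_resched();		/* yield between submissions on long lists */
	}
}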

Comments

Song Liu April 24, 2023, 11:55 p.m. UTC | #1
On Thu, Apr 20, 2023 at 4:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Currently, there is no limit for raid1/raid10 plugged bio. While flushing
> writes, raid1 has cond_resched() while raid10 doesn't, and too many
> writes can cause soft lockup.
>
> Follow up soft lockup can be triggered easily with writeback test for
> raid10 with ramdisks:
>
> watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
> Call Trace:
>  <TASK>
>  call_rcu+0x16/0x20
>  put_object+0x41/0x80
>  __delete_object+0x50/0x90
>  delete_object_full+0x2b/0x40
>  kmemleak_free+0x46/0xa0
>  slab_free_freelist_hook.constprop.0+0xed/0x1a0
>  kmem_cache_free+0xfd/0x300
>  mempool_free_slab+0x1f/0x30
>  mempool_free+0x3a/0x100
>  bio_free+0x59/0x80
>  bio_put+0xcf/0x2c0
>  free_r10bio+0xbf/0xf0
>  raid_end_bio_io+0x78/0xb0
>  one_write_done+0x8a/0xa0
>  raid10_end_write_request+0x1b4/0x430
>  bio_endio+0x175/0x320
>  brd_submit_bio+0x3b9/0x9b7 [brd]
>  __submit_bio+0x69/0xe0
>  submit_bio_noacct_nocheck+0x1e6/0x5a0
>  submit_bio_noacct+0x38c/0x7e0
>  flush_pending_writes+0xf0/0x240
>  raid10d+0xac/0x1ed0
>
> This patch fix the problem by adding cond_resched() to raid10 like what
> raid1 did.

nit: per submitting-patches.rst:

Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.

>
> Note that unlimited plugged bio still need to be optimized because in
> the case of writeback lots of dirty pages, this will take lots of memory
> and io latecy is quite bad.

typo: latency.

>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid10.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6590aa49598c..a116b7c9d9f3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -921,6 +921,7 @@ static void flush_pending_writes(struct r10conf *conf)
>                         else
>                                 submit_bio_noacct(bio);
>                         bio = next;
> +                       cond_resched();
>                 }
>                 blk_finish_plug(&plug);
>         } else
> @@ -1140,6 +1141,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>                 else
>                         submit_bio_noacct(bio);
>                 bio = next;
> +               cond_resched();
>         }
>         kfree(plug);
>  }
> --
> 2.39.2
>
Song Liu April 25, 2023, 12:23 a.m. UTC | #2
On Thu, Apr 20, 2023 at 4:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Currently, there is no limit for raid1/raid10 plugged bio. While flushing
> writes, raid1 has cond_resched() while raid10 doesn't, and too many
> writes can cause soft lockup.
>
> Follow up soft lockup can be triggered easily with writeback test for
> raid10 with ramdisks:
>
> watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
> Call Trace:
>  <TASK>
>  call_rcu+0x16/0x20
>  put_object+0x41/0x80
>  __delete_object+0x50/0x90
>  delete_object_full+0x2b/0x40
>  kmemleak_free+0x46/0xa0
>  slab_free_freelist_hook.constprop.0+0xed/0x1a0
>  kmem_cache_free+0xfd/0x300
>  mempool_free_slab+0x1f/0x30
>  mempool_free+0x3a/0x100
>  bio_free+0x59/0x80
>  bio_put+0xcf/0x2c0
>  free_r10bio+0xbf/0xf0
>  raid_end_bio_io+0x78/0xb0
>  one_write_done+0x8a/0xa0
>  raid10_end_write_request+0x1b4/0x430
>  bio_endio+0x175/0x320
>  brd_submit_bio+0x3b9/0x9b7 [brd]
>  __submit_bio+0x69/0xe0
>  submit_bio_noacct_nocheck+0x1e6/0x5a0
>  submit_bio_noacct+0x38c/0x7e0
>  flush_pending_writes+0xf0/0x240
>  raid10d+0xac/0x1ed0

Is it possible to trigger this with a mdadm test?

Thanks,
Song

>
> This patch fix the problem by adding cond_resched() to raid10 like what
> raid1 did.
>
> Note that unlimited plugged bio still need to be optimized because in
> the case of writeback lots of dirty pages, this will take lots of memory
> and io latecy is quite bad.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid10.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6590aa49598c..a116b7c9d9f3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -921,6 +921,7 @@ static void flush_pending_writes(struct r10conf *conf)
>                         else
>                                 submit_bio_noacct(bio);
>                         bio = next;
> +                       cond_resched();
>                 }
>                 blk_finish_plug(&plug);
>         } else
> @@ -1140,6 +1141,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>                 else
>                         submit_bio_noacct(bio);
>                 bio = next;
> +               cond_resched();
>         }
>         kfree(plug);
>  }
> --
> 2.39.2
>
Yu Kuai April 25, 2023, 6:16 a.m. UTC | #3
Hi,

On 2023/04/25 8:23, Song Liu wrote:
> On Thu, Apr 20, 2023 at 4:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, there is no limit for raid1/raid10 plugged bio. While flushing
>> writes, raid1 has cond_resched() while raid10 doesn't, and too many
>> writes can cause soft lockup.
>>
>> Follow up soft lockup can be triggered easily with writeback test for
>> raid10 with ramdisks:
>>
>> watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
>> Call Trace:
>>   <TASK>
>>   call_rcu+0x16/0x20
>>   put_object+0x41/0x80
>>   __delete_object+0x50/0x90
>>   delete_object_full+0x2b/0x40
>>   kmemleak_free+0x46/0xa0
>>   slab_free_freelist_hook.constprop.0+0xed/0x1a0
>>   kmem_cache_free+0xfd/0x300
>>   mempool_free_slab+0x1f/0x30
>>   mempool_free+0x3a/0x100
>>   bio_free+0x59/0x80
>>   bio_put+0xcf/0x2c0
>>   free_r10bio+0xbf/0xf0
>>   raid_end_bio_io+0x78/0xb0
>>   one_write_done+0x8a/0xa0
>>   raid10_end_write_request+0x1b4/0x430
>>   bio_endio+0x175/0x320
>>   brd_submit_bio+0x3b9/0x9b7 [brd]
>>   __submit_bio+0x69/0xe0
>>   submit_bio_noacct_nocheck+0x1e6/0x5a0
>>   submit_bio_noacct+0x38c/0x7e0
>>   flush_pending_writes+0xf0/0x240
>>   raid10d+0xac/0x1ed0
> 
> Is it possible to trigger this with a mdadm test?
> 

The test I mentioned in patch 8 can trigger this problem reliably, so
I think adding a new test can achieve this.

Thanks,
Kuai
> Thanks,
> Song
> 
>>
>> This patch fix the problem by adding cond_resched() to raid10 like what
>> raid1 did.
>>
>> Note that unlimited plugged bio still need to be optimized because in
>> the case of writeback lots of dirty pages, this will take lots of memory
>> and io latecy is quite bad.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/raid10.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 6590aa49598c..a116b7c9d9f3 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -921,6 +921,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>                          else
>>                                  submit_bio_noacct(bio);
>>                          bio = next;
>> +                       cond_resched();
>>                  }
>>                  blk_finish_plug(&plug);
>>          } else
>> @@ -1140,6 +1141,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>                  else
>>                          submit_bio_noacct(bio);
>>                  bio = next;
>> +               cond_resched();
>>          }
>>          kfree(plug);
>>   }
>> --
>> 2.39.2
>>
> .
>
Song Liu April 25, 2023, 6:39 a.m. UTC | #4
On Mon, Apr 24, 2023 at 11:16 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/04/25 8:23, Song Liu wrote:
> > On Thu, Apr 20, 2023 at 4:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Currently, there is no limit for raid1/raid10 plugged bio. While flushing
> >> writes, raid1 has cond_resched() while raid10 doesn't, and too many
> >> writes can cause soft lockup.
> >>
> >> Follow up soft lockup can be triggered easily with writeback test for
> >> raid10 with ramdisks:
> >>
> >> watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
> >> Call Trace:
> >>   <TASK>
> >>   call_rcu+0x16/0x20
> >>   put_object+0x41/0x80
> >>   __delete_object+0x50/0x90
> >>   delete_object_full+0x2b/0x40
> >>   kmemleak_free+0x46/0xa0
> >>   slab_free_freelist_hook.constprop.0+0xed/0x1a0
> >>   kmem_cache_free+0xfd/0x300
> >>   mempool_free_slab+0x1f/0x30
> >>   mempool_free+0x3a/0x100
> >>   bio_free+0x59/0x80
> >>   bio_put+0xcf/0x2c0
> >>   free_r10bio+0xbf/0xf0
> >>   raid_end_bio_io+0x78/0xb0
> >>   one_write_done+0x8a/0xa0
> >>   raid10_end_write_request+0x1b4/0x430
> >>   bio_endio+0x175/0x320
> >>   brd_submit_bio+0x3b9/0x9b7 [brd]
> >>   __submit_bio+0x69/0xe0
> >>   submit_bio_noacct_nocheck+0x1e6/0x5a0
> >>   submit_bio_noacct+0x38c/0x7e0
> >>   flush_pending_writes+0xf0/0x240
> >>   raid10d+0xac/0x1ed0
> >
> > Is it possible to trigger this with a mdadm test?
> >
>
> The test I mentioned in patch 8 can trigger this problem reliablity, so
> I this add a new test can achieve this.

To be clear, by "mdadm test" I mean the tests included in mdadm:

https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/tests

Could you please try to add a test? If it works, we should add it to
mdadm.

Thanks,
Song
Yu Kuai April 25, 2023, 6:47 a.m. UTC | #5
Hi,

On 2023/04/25 14:39, Song Liu wrote:
> On Mon, Apr 24, 2023 at 11:16 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/04/25 8:23, Song Liu wrote:
>>> On Thu, Apr 20, 2023 at 4:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> Currently, there is no limit for raid1/raid10 plugged bio. While flushing
>>>> writes, raid1 has cond_resched() while raid10 doesn't, and too many
>>>> writes can cause soft lockup.
>>>>
>>>> Follow up soft lockup can be triggered easily with writeback test for
>>>> raid10 with ramdisks:
>>>>
>>>> watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
>>>> Call Trace:
>>>>    <TASK>
>>>>    call_rcu+0x16/0x20
>>>>    put_object+0x41/0x80
>>>>    __delete_object+0x50/0x90
>>>>    delete_object_full+0x2b/0x40
>>>>    kmemleak_free+0x46/0xa0
>>>>    slab_free_freelist_hook.constprop.0+0xed/0x1a0
>>>>    kmem_cache_free+0xfd/0x300
>>>>    mempool_free_slab+0x1f/0x30
>>>>    mempool_free+0x3a/0x100
>>>>    bio_free+0x59/0x80
>>>>    bio_put+0xcf/0x2c0
>>>>    free_r10bio+0xbf/0xf0
>>>>    raid_end_bio_io+0x78/0xb0
>>>>    one_write_done+0x8a/0xa0
>>>>    raid10_end_write_request+0x1b4/0x430
>>>>    bio_endio+0x175/0x320
>>>>    brd_submit_bio+0x3b9/0x9b7 [brd]
>>>>    __submit_bio+0x69/0xe0
>>>>    submit_bio_noacct_nocheck+0x1e6/0x5a0
>>>>    submit_bio_noacct+0x38c/0x7e0
>>>>    flush_pending_writes+0xf0/0x240
>>>>    raid10d+0xac/0x1ed0
>>>
>>> Is it possible to trigger this with a mdadm test?
>>>
>>
>> The test I mentioned in patch 8 can trigger this problem reliablity, so
>> I this add a new test can achieve this.
> 
> To be clear, by "mdadm test" I mean the tests included in mdadm:
> 
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/tests
> 
> Could you please try to add a test? If it works, we should add it to
> mdadm.

Yes, of course. However, I'm not familiar with how the mdadm tests work
yet, so it might take some time. By the way, it would be good if I could
add the test to blktests, if possible.

Thanks,
Kuai
> 
> Thanks,
> Song
> .
>

Patch

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6590aa49598c..a116b7c9d9f3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -921,6 +921,7 @@  static void flush_pending_writes(struct r10conf *conf)
 			else
 				submit_bio_noacct(bio);
 			bio = next;
+			cond_resched();
 		}
 		blk_finish_plug(&plug);
 	} else
@@ -1140,6 +1141,7 @@  static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 		else
 			submit_bio_noacct(bio);
 		bio = next;
+		cond_resched();
 	}
 	kfree(plug);
 }