Message ID | 20240711202316.10775-1-mat.jonczyk@o2.pl |
---|---|
State | Accepted |
Series | md/raid1: set max_sectors during early return from choose_slow_rdev() |
Context | Check | Description |
---|---|---|
mdraidci/vmtest-md-6_11-PR | success | PR summary |
mdraidci/vmtest-md-6_11-VM_Test-0 | success | Logs for build-kernel |
On Thu, 11 Jul 2024 22:23:16 +0200
Mateusz Jończyk <mat.jonczyk@o2.pl> wrote:

> Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
> when that drive has a write-mostly flag set. During such an attempt,
> the following assertion in bio_split() is hit:
>

Nice catch and good patch :) Kwai?

-Paul

> [snip]
Hi,

On 2024/07/12 4:23, Mateusz Jończyk wrote:
> Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
> when that drive has a write-mostly flag set. During such an attempt,
> the following assertion in bio_split() is hit:
> [snip]

Thanks for the patch!

Reviewed-by: Yu Kuai <yukuai3@huawei.com>

BTW, do you have plans to add a new test to mdadm tests? I'll
pick it up if you don't, just let me know.

Thanks,
Kuai
On Fri, Jul 12, 2024 at 9:17 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
[...]
> Thanks for the patch!
>
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>

Applied to md-6.11. Thanks!

Song
On 12.07.2024 at 03:16, Yu Kuai wrote:
> Hi,
>
> On 2024/07/12 4:23, Mateusz Jończyk wrote:
>> Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
>> when that drive has a write-mostly flag set. During such an attempt,
>> the following assertion in bio_split() is hit:
>>
>> [snip]
>
> Thanks for the patch!
>
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
>
> BTW, do you have plans to add a new test to mdadm tests? I'll
> pick it up if you don't, just let me know.
>
> Thanks,
> Kuai

Yes, I'm working on it.

Greetings,
Mateusz
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71ca66dd..82f70a4ce6ed 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -680,6 +680,7 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
 		len = r1_bio->sectors;
 		read_len = raid1_check_read_range(rdev, this_sector, &len);
 		if (read_len == r1_bio->sectors) {
+			*max_sectors = read_len;
 			update_read_sectors(conf, disk, this_sector, read_len);
 			return disk;
 		}
Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
when that drive has a write-mostly flag set. During such an attempt,
the following assertion in bio_split() is hit:

	BUG_ON(sectors <= 0);

Call Trace:
 ? bio_split+0x96/0xb0
 ? exc_invalid_op+0x53/0x70
 ? bio_split+0x96/0xb0
 ? asm_exc_invalid_op+0x1b/0x20
 ? bio_split+0x96/0xb0
 ? raid1_read_request+0x890/0xd20
 ? __call_rcu_common.constprop.0+0x97/0x260
 raid1_make_request+0x81/0xce0
 ? __get_random_u32_below+0x17/0x70
 ? new_slab+0x2b3/0x580
 md_handle_request+0x77/0x210
 md_submit_bio+0x62/0xa0
 __submit_bio+0x17b/0x230
 submit_bio_noacct_nocheck+0x18e/0x3c0
 submit_bio_noacct+0x244/0x670

After investigation, it turned out that choose_slow_rdev() does not set
the value of max_sectors on its early-return path (taken when a
write-mostly device can serve the whole request), and because of this,
raid1_read_request() calls bio_split() with sectors == 0.

Fix it by filling in this variable.

This bug was introduced in
commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
but apparently hidden until
commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
shortly thereafter.

Cc: stable@vger.kernel.org # 6.9.x+
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Paul Luse <paul.e.luse@linux.intel.com>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/

--

Tested on both Linux 6.10 and 6.9.8.

Inside a VM, the mdadm testsuite for RAID1 on 6.10 did not find any problems:
	./test --dev=loop --no-error --raidtype=raid1
(on 6.9.8 there was one failure, caused by external bitmap support not
being compiled in).

Notes:
- I was reliably getting deadlocks when adding / removing devices
  on such an array while the array was loaded with fsstress running 20
  concurrent processes. When the array was idle or loaded with fsstress
  running 8 processes, no such deadlocks happened in my tests.
  This also occurred on unpatched Linux 6.8.0, but not on
  6.1.97-rc1, so this is likely an independent regression (to be
  investigated).
- I was also getting deadlocks when adding / removing the bitmap on the
  array in similar conditions; this also happened on Linux 6.1.97-rc1,
  though. With fsstress running 8 concurrent processes, it happened only
  once during many tests.
- In my testing, there was once a problem with hot-adding an
  internal bitmap to the array:
	mdadm: Cannot add bitmap while array is resyncing or reshaping etc.
	mdadm: failed to set internal bitmap.
  even though no such reshaping was happening according to /proc/mdstat.
  This seems unrelated, though.

---
 drivers/md/raid1.c | 1 +
 1 file changed, 1 insertion(+)

base-commit: 256abd8e550ce977b728be79a74e1729438b4948
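The causal chain in the commit message (an unset max_sectors leads to bio_split() being called with sectors == 0) can be sketched as a small decision function. This is a hypothetical model: `model_plan_read()` and `enum model_plan` are invented names, and the rule that the read path splits the bio only when max_sectors is smaller than the bio's sector count is our reading of raid1_read_request(), stated here as an assumption rather than taken from the patch. Instead of crashing, the model reports the case in which the real `BUG_ON(sectors <= 0)` in bio_split() would fire.

```c
#include <assert.h>

/*
 * Hypothetical outcomes of the read-path split decision. The real code
 * would call bio_split(), whose BUG_ON(sectors <= 0) the third value models.
 */
enum model_plan {
	MODEL_READ_WHOLE,   /* max_sectors covers the bio: no split */
	MODEL_READ_SPLIT,   /* split at max_sectors, resubmit the remainder */
	MODEL_WOULD_BUG_ON, /* split requested with sectors <= 0 */
};

/*
 * bio_sectors: size of the incoming read, in sectors.
 * max_sectors: value reported by the rdev-selection step (assumed to be
 * whatever the selection function left in its out-parameter).
 */
static enum model_plan model_plan_read(int bio_sectors, int max_sectors)
{
	assert(bio_sectors > 0);

	if (max_sectors >= bio_sectors)
		return MODEL_READ_WHOLE;

	/* Assumed split point: bio_split(bio, max_sectors, ...). */
	if (max_sectors <= 0)
		return MODEL_WOULD_BUG_ON;
	return MODEL_READ_SPLIT;
}
```

For example, `model_plan_read(8, 0)` reports `MODEL_WOULD_BUG_ON`, which corresponds to the crash in the call trace, while `model_plan_read(8, 8)`, the value that the fixed early return supplies for a full-length read in this model, needs no split at all.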