Message ID | 20231211081714.1923567-1-linan666@huaweicloud.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Song Liu |
Headers | show |
Series | md: Don't clear MD_CLOSING when the raid is about to stop | expand |
On Mon, 11 Dec 2023 16:17:14 +0800 linan666@huaweicloud.com wrote: > From: Li Nan <linan122@huawei.com> > > The raid should not be opened anymore when it is about to be stopped. > However, other processes can open it again if the flag MD_CLOSING is > cleared before exiting. From now on, this flag will not be cleared when > the raid will be stopped. > > Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called > md_set_readonly or do_md_stop") Signed-off-by: Li Nan <linan122@huawei.com> Hello Li Nan, I was there when I needed to fix this: https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43 For sure, you have to consider applying same solution for array_store "clear". Minor nit below. Thanks, Mariusz > --- > drivers/md/md.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 4e9fe5cbeedc..ebdfc9068a60 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev) > mddev->persistent = 0; > mddev->level = LEVEL_NONE; > mddev->clevel[0] = 0; > - mddev->flags = 0; I recommend (safety recommendation): mddev->flags = MD_CLOSING; Unless you can prove that other flags cannot race. > mddev->sb_flags = 0; > mddev->ro = MD_RDWR; > mddev->metadata_type[0] = 0;
Hi, 在 2023/12/11 17:56, Mariusz Tkaczyk 写道: > On Mon, 11 Dec 2023 16:17:14 +0800 > linan666@huaweicloud.com wrote: > >> From: Li Nan <linan122@huawei.com> >> >> The raid should not be opened anymore when it is about to be stopped. >> However, other processes can open it again if the flag MD_CLOSING is >> cleared before exiting. From now on, this flag will not be cleared when >> the raid will be stopped. >> >> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called >> md_set_readonly or do_md_stop") Signed-off-by: Li Nan <linan122@huawei.com> > > Hello Li Nan, > I was there when I needed to fix this: > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43 > > For sure, you have to consider applying same solution for array_store "clear". > Minor nit below. > > Thanks, > Mariusz > >> --- >> drivers/md/md.c | 8 +++----- >> 1 file changed, 3 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 4e9fe5cbeedc..ebdfc9068a60 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev) >> mddev->persistent = 0; >> mddev->level = LEVEL_NONE; >> mddev->clevel[0] = 0; >> - mddev->flags = 0; > > I recommend (safety recommendation): > mddev->flags = MD_CLOSING; Taking a look I think both MD_CLOSING and MD_DELETED should not be cleared, however, there is no guarantee that MD_CLOSING will be set before md_clean, because mdadm can be removed without running. Hence I think just set MD_CLOSING is werid. I think the proper way is to keep MD_CLOSING and MD_DELETED if they are set. However, there is no such api to clear other bits at once. Since we're not expecting anyone else to write flags, following maybe acceptable: mddev->flags &= BIT_ULL_MASK(MD_CLOSING) | BIT_ULL_MASK(MD_DELETED); Or after making sure other flags cannot race, this patch is ok. Thanks, Kuai > > Unless you can prove that other flags cannot race. > >> mddev->sb_flags = 0; >> mddev->ro = MD_RDWR; >> mddev->metadata_type[0] = 0; > > . >
On Tue, 12 Dec 2023 11:21:28 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote: > Hi, > > 在 2023/12/11 17:56, Mariusz Tkaczyk 写道: > > On Mon, 11 Dec 2023 16:17:14 +0800 > > linan666@huaweicloud.com wrote: > > > >> From: Li Nan <linan122@huawei.com> > >> > >> The raid should not be opened anymore when it is about to be stopped. > >> However, other processes can open it again if the flag MD_CLOSING is > >> cleared before exiting. From now on, this flag will not be cleared when > >> the raid will be stopped. > >> > >> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called > >> md_set_readonly or do_md_stop") Signed-off-by: Li Nan > >> <linan122@huawei.com> > > > > Hello Li Nan, > > I was there when I needed to fix this: > > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43 > > > > For sure, you have to consider applying same solution for array_store > > "clear". Minor nit below. > > > > Thanks, > > Mariusz > > > >> --- > >> drivers/md/md.c | 8 +++----- > >> 1 file changed, 3 insertions(+), 5 deletions(-) > >> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c > >> index 4e9fe5cbeedc..ebdfc9068a60 100644 > >> --- a/drivers/md/md.c > >> +++ b/drivers/md/md.c > >> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev) > >> mddev->persistent = 0; > >> mddev->level = LEVEL_NONE; > >> mddev->clevel[0] = 0; > >> - mddev->flags = 0; > > > > I recommend (safety recommendation): > > mddev->flags = MD_CLOSING; > > Taking a look I think both MD_CLOSING and MD_DELETED should not be > cleared, however, there is no guarantee that MD_CLOSING will be set > before md_clean, because mdadm can be removed without running. Hence I > think just set MD_CLOSING is werid. > > I think the proper way is to keep MD_CLOSING and MD_DELETED if they are > set. However, there is no such api to clear other bits at once. Since > we're not expecting anyone else to write flags, following maybe > acceptable: > > mddev->flags &= BIT_ULL_MASK(MD_CLOSING) | BIT_ULL_MASK(MD_DELETED); Yes, MD_CLOSING is a bit number to not a bit value I can assign directly. Thanks for clarifying! Mariusz > > Or after making sure other flags cannot race, this patch is ok. > > Thanks, > Kuai > > > > > Unless you can prove that other flags cannot race. > > > >> mddev->sb_flags = 0; > >> mddev->ro = MD_RDWR; > >> mddev->metadata_type[0] = 0; > > > > . > > >
Hello, kernel test robot noticed "mdadm-selftests.06name.fail" on: commit: 4d060913335fad6f1d1f0816859c20aae823851c ("[PATCH] md: Don't clear MD_CLOSING when the raid is about to stop") url: https://github.com/intel-lab-lkp/linux/commits/linan666-huaweicloud-com/md-Don-t-clear-MD_CLOSING-when-the-raid-is-about-to-stop/20231211-162010 base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next patch link: https://lore.kernel.org/all/20231211081714.1923567-1-linan666@huaweicloud.com/ patch subject: [PATCH] md: Don't clear MD_CLOSING when the raid is about to stop in testcase: mdadm-selftests version: mdadm-selftests-x86_64-5f41845-1_20220826 with following parameters: disk: 1HDD test_prefix: 06name compiler: gcc-12 test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202312261217.c5597bdf-oliver.sang@intel.com 2023-12-24 10:19:28 mkdir -p /var/tmp 2023-12-24 10:19:28 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sdb1 2023-12-24 10:19:59 mount -t ext3 /dev/sdb1 /var/tmp sed -e 's/{DEFAULT_METADATA}/1.2/g' \ -e 's,{MAP_PATH},/run/mdadm/map,g' mdadm.8.in > mdadm.8 /usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8 /usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8 /usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4 /usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5 /usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules /usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules /usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules /usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules /usr/bin/install -D -m 755 mdadm /sbin/mdadm /usr/bin/install -D -m 755 mdmon /sbin/mdmon Testing on linux-6.7.0-rc3-00014-g4d060913335f kernel /lkp/benchmarks/mdadm-selftests/tests/06name... FAILED - see /var/tmp/06name.log and /var/tmp/fail06name.log for details The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20231226/202312261217.c5597bdf-oliver.sang@intel.com
在 2023/12/26 14:24, kernel test robot 写道: > > > Hello, > > kernel test robot noticed "mdadm-selftests.06name.fail" on: > > commit: 4d060913335fad6f1d1f0816859c20aae823851c ("[PATCH] md: Don't clear MD_CLOSING when the raid is about to stop") > url: https://github.com/intel-lab-lkp/linux/commits/linan666-huaweicloud-com/md-Don-t-clear-MD_CLOSING-when-the-raid-is-about-to-stop/20231211-162010 > base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next > patch link: https://lore.kernel.org/all/20231211081714.1923567-1-linan666@huaweicloud.com/ > patch subject: [PATCH] md: Don't clear MD_CLOSING when the raid is about to stop > > in testcase: mdadm-selftests > version: mdadm-selftests-x86_64-5f41845-1_20220826 > with following parameters: > > disk: 1HDD > test_prefix: 06name > > > > compiler: gcc-12 > test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <oliver.sang@intel.com> > | Closes: https://lore.kernel.org/oe-lkp/202312261217.c5597bdf-oliver.sang@intel.com > > 2023-12-24 10:19:28 mkdir -p /var/tmp > 2023-12-24 10:19:28 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sdb1 > 2023-12-24 10:19:59 mount -t ext3 /dev/sdb1 /var/tmp > sed -e 's/{DEFAULT_METADATA}/1.2/g' \ > -e 's,{MAP_PATH},/run/mdadm/map,g' mdadm.8.in > mdadm.8 > /usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8 > /usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8 > /usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4 > /usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5 > /usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules > /usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules > /usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules > /usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules > /usr/bin/install -D -m 755 mdadm /sbin/mdadm > /usr/bin/install -D -m 755 mdmon /sbin/mdmon > Testing on linux-6.7.0-rc3-00014-g4d060913335f kernel > /lkp/benchmarks/mdadm-selftests/tests/06name... FAILED - see /var/tmp/06name.log and /var/tmp/fail06name.log for details > > If MD_CLOSING is not cleared when stopping, it will still exist when assembling and mddev can't be opened anymore. We need to clear it when starting. I will fix it in next version. > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20231226/202312261217.c5597bdf-oliver.sang@intel.com > > >
在 2023/12/12 11:21, Yu Kuai 写道: > Hi, > > 在 2023/12/11 17:56, Mariusz Tkaczyk 写道: >> On Mon, 11 Dec 2023 16:17:14 +0800 >> linan666@huaweicloud.com wrote: >> >>> From: Li Nan <linan122@huawei.com> >>> >>> The raid should not be opened anymore when it is about to be stopped. >>> However, other processes can open it again if the flag MD_CLOSING is >>> cleared before exiting. From now on, this flag will not be cleared when >>> the raid will be stopped. >>> >>> Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called >>> md_set_readonly or do_md_stop") Signed-off-by: Li Nan >>> <linan122@huawei.com> >> >> Hello Li Nan, >> I was there when I needed to fix this: >> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c8870379a21fbd9ad14ca36204ccfbe9d25def43 >> >> >> For sure, you have to consider applying same solution for array_store >> "clear". >> Minor nit below. >> >> Thanks, >> Mariusz >> >>> --- >>> drivers/md/md.c | 8 +++----- >>> 1 file changed, 3 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>> index 4e9fe5cbeedc..ebdfc9068a60 100644 >>> --- a/drivers/md/md.c >>> +++ b/drivers/md/md.c >>> @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev) >>> mddev->persistent = 0; >>> mddev->level = LEVEL_NONE; >>> mddev->clevel[0] = 0; >>> - mddev->flags = 0; >> >> I recommend (safety recommendation): >> mddev->flags = MD_CLOSING; > > Taking a look I think both MD_CLOSING and MD_DELETED should not be > cleared, however, there is no guarantee that MD_CLOSING will be set > before md_clean, because mdadm can be removed without running. Hence I > think just set MD_CLOSING is werid. > > I think the proper way is to keep MD_CLOSING and MD_DELETED if they are > set. However, there is no such api to clear other bits at once. Since > we're not expecting anyone else to write flags, following maybe > acceptable: > > mddev->flags &= BIT_ULL_MASK(MD_CLOSING) | BIT_ULL_MASK(MD_DELETED); > MD_DELETED is only set after mddev->active is put to 0. We need to open mddev and get it before stropping raid, so the active must not be 0 and MD_DELETED will not be set in md_clean. > Or after making sure other flags cannot race, this patch is ok. > > Thanks, > Kuai > >> >> Unless you can prove that other flags cannot race. >> >>> mddev->sb_flags = 0; >>> mddev->ro = MD_RDWR; >>> mddev->metadata_type[0] = 0; >> >> . >> > > > .
diff --git a/drivers/md/md.c b/drivers/md/md.c index 4e9fe5cbeedc..ebdfc9068a60 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6238,7 +6238,6 @@ static void md_clean(struct mddev *mddev) mddev->persistent = 0; mddev->level = LEVEL_NONE; mddev->clevel[0] = 0; - mddev->flags = 0; mddev->sb_flags = 0; mddev->ro = MD_RDWR; mddev->metadata_type[0] = 0; @@ -7614,7 +7613,6 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode, int err = 0; void __user *argp = (void __user *)arg; struct mddev *mddev = NULL; - bool did_set_md_closing = false; if (!md_ioctl_valid(cmd)) return -ENOTTY; @@ -7698,7 +7696,6 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode, err = -EBUSY; goto out; } - did_set_md_closing = true; mutex_unlock(&mddev->open_mutex); sync_blockdev(bdev); } @@ -7742,10 +7739,13 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode, case STOP_ARRAY: err = do_md_stop(mddev, 0, bdev); + if (err) + clear_bit(MD_CLOSING, &mddev->flags); goto unlock; case STOP_ARRAY_RO: err = md_set_readonly(mddev, bdev); + clear_bit(MD_CLOSING, &mddev->flags); goto unlock; case HOT_REMOVE_DISK: @@ -7840,8 +7840,6 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode, mddev_unlock(mddev); out: - if(did_set_md_closing) - clear_bit(MD_CLOSING, &mddev->flags); return err; } #ifdef CONFIG_COMPAT