md: do not delete safemode_timer in mddev_suspend

Message ID 20240508092053.1447930-1-linan666@huaweicloud.com (mailing list archive)
State Accepted, archived
Series md: do not delete safemode_timer in mddev_suspend

Checks

Context Check Description
mdraidci/vmtest-md-6-10-PR success PR summary
mdraidci/vmtest-md-6-10-VM_Test-0 success Logs for build-kernel

Commit Message

Li Nan May 8, 2024, 9:20 a.m. UTC
From: Li Nan <linan122@huawei.com>

The deletion of safemode_timer in mddev_suspend() is now redundant and
potentially harmful. If the timer is about to fire but gets deleted,
'in_sync' will remain 0 until the next write, causing the array to stay
in the 'active' state instead of transitioning to 'clean'.

Commit 0d9f4f135eb6 ("MD: Add del_timer_sync to mddev_suspend (fix
nasty panic)") introduced this deletion for dm, because if the timer
fired after dm was destroyed, the resources the timer depends on might
already have been freed.

However, commit 0dd84b319352 ("md: call __md_stop_writes in md_stop")
added __md_stop_writes() to md_stop(), which is called before the
resources are freed. The timer is deleted in __md_stop_writes(), so the
original issue is resolved. Therefore, the deletion of safemode_timer
in mddev_suspend() can now be removed safely.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Yu Kuai May 9, 2024, 1:34 a.m. UTC | #1
On 2024/05/08 17:20, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 
> The deletion of safemode_timer in mddev_suspend() is redundant and
> potentially harmful now. If timer is about to be woken up but gets
> deleted, 'in_sync' will remain 0 until the next write, causing array
> to stay in the 'active' state instead of transitioning to 'clean'.
> 
> Commit 0d9f4f135eb6 ("MD: Add del_timer_sync to mddev_suspend (fix
> nasty panic))" introduced this deletion for dm, because if timer fired
> after dm is destroyed, the resource which the timer depends on might
> have been freed.
> 
> However, commit 0dd84b319352 ("md: call __md_stop_writes in md_stop")
> added __md_stop_writes() to md_stop(), which is called before freeing
> resource. Timer is deleted in __md_stop_writes(), and the origin issue
> is resolved. Therefore, delete safemode_timer can be removed safely now.
> 
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/md.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index aff9118ff697..09c55d9a2c54 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -479,7 +479,6 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
>   	 */
>   	WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
>   
> -	del_timer_sync(&mddev->safemode_timer);

I didn't understand why the timer was deleted here before, but it's
right based on the git log: commit 0d9f4f135eb6 added this to fix a
panic for dm-raid, and it's not necessary now.

LGTM, feel free to add:

Reviewed-by: Yu Kuai <yukuai3@huawei.com>

However, since this behaviour has existed since 2012, does anybody
really care that the array status is 'active' instead of 'clean' when
there is no IO after suspend?

Thanks,
Kuai

>   	/* restrict memory reclaim I/O during raid array is suspend */
>   	mddev->noio_flag = memalloc_noio_save();
>   
>

Mariusz Tkaczyk May 15, 2024, 1:12 p.m. UTC | #2
On Thu, 9 May 2024 09:34:15 +0800
Yu Kuai <yukuai1@huaweicloud.com> wrote:

> However, since this behaviour is introduced since 2012, does anybody
> really care about array status is 'active' instead of 'clean' while
> there is no IO after suspend?

It may cause a rebuild after reboot (bad, but we can live with it) or a
platform hang (this is bad). If I remember correctly, mdadm waits for the
transition to clean on shutdown.
Probably nobody has tried that, as we all know that Linux doesn't like
suspending, and it is rare to reboot the platform just after a suspend.
Probably any write will fix it.

But this is all based on my knowledge, not tested or proven; however, I
believe it sheds some light on where to look for problems if you want.

Mariusz

Song Liu June 10, 2024, 5:35 p.m. UTC | #3
On Wed, May 8, 2024 at 2:31 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> The deletion of safemode_timer in mddev_suspend() is redundant and
> potentially harmful now. If timer is about to be woken up but gets
> deleted, 'in_sync' will remain 0 until the next write, causing array
> to stay in the 'active' state instead of transitioning to 'clean'.
>
> Commit 0d9f4f135eb6 ("MD: Add del_timer_sync to mddev_suspend (fix
> nasty panic))" introduced this deletion for dm, because if timer fired
> after dm is destroyed, the resource which the timer depends on might
> have been freed.
>
> However, commit 0dd84b319352 ("md: call __md_stop_writes in md_stop")
> added __md_stop_writes() to md_stop(), which is called before freeing
> resource. Timer is deleted in __md_stop_writes(), and the origin issue
> is resolved. Therefore, delete safemode_timer can be removed safely now.
>
> Signed-off-by: Li Nan <linan122@huawei.com>

Applied to md-6.11. Thanks!

Song

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index aff9118ff697..09c55d9a2c54 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -479,7 +479,6 @@  int mddev_suspend(struct mddev *mddev, bool interruptible)
 	 */
 	WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
 
-	del_timer_sync(&mddev->safemode_timer);
 	/* restrict memory reclaim I/O during raid array is suspend */
 	mddev->noio_flag = memalloc_noio_save();