[-next,v2,0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()

Message ID: 20230628010756.70649-1-yukuai1@huaweicloud.com

Message

Yu Kuai June 28, 2023, 1:07 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Changes in v2:
 - remove a now unused local variable in patch 2;

Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
work") introduced a new problem:

// caller holds reconfig_mutex
r5l_exit_log
 flush_work(&log->disable_writeback_work)
			r5c_disable_writeback_async
			 wait_event
			  /*
			   * conf->log is not NULL, and mddev_trylock()
			   * will fail, wait_event() can never pass.
			   */
 conf->log = NULL

Patch 1 reverts that commit, and patch 2 fixes the original problem in a
different way.
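
To make the ordering issue concrete, here is a minimal userspace sketch of
the trace above. It is only a model, not kernel code: struct model_conf,
model_trylock() and the two exit_log_*() helpers are hypothetical names, and
the wait condition (conf->log is NULL, or the trylock succeeds) is assumed
from the comment in the trace. It also assumes that the reverted ordering
clears conf->log before flushing the work. A return value of false stands
for flush_work() blocking forever.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; not the kernel's struct r5conf. */
struct model_conf {
	void *log;            /* models conf->log */
	bool reconfig_locked; /* models reconfig_mutex being held */
};

/* Models mddev_trylock(): succeeds only if the mutex is not held. */
static bool model_trylock(struct model_conf *conf)
{
	if (conf->reconfig_locked)
		return false;
	conf->reconfig_locked = true;
	return true;
}

/*
 * Models the condition the worker waits on (assumed from the trace):
 * it may proceed once conf->log is NULL or the trylock succeeds.
 */
static bool worker_can_proceed(struct model_conf *conf)
{
	return conf->log == NULL || model_trylock(conf);
}

/*
 * Ordering from b13015af94cf: flush the work first, clear conf->log
 * afterwards. Returns false when the flush could never complete.
 */
static bool exit_log_flush_then_clear(struct model_conf *conf)
{
	bool flushed = worker_can_proceed(conf);

	if (flushed)
		conf->log = NULL;
	return flushed;
}

/*
 * Assumed reverted ordering: clear conf->log first, then flush.
 * The worker sees a NULL log and does not need the mutex.
 */
static bool exit_log_clear_then_flush(struct model_conf *conf)
{
	conf->log = NULL;
	return worker_can_proceed(conf);
}
```

With reconfig_locked set to true, as it is for the caller of r5l_exit_log(),
exit_log_flush_then_clear() returns false, while exit_log_clear_then_flush()
returns true.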

Note that this problem was found only by code review; I think it is
probably the reason that some mdadm tests are broken.

Yu Kuai (2):
  md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
    finishing work"
  md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()

 drivers/md/raid5-cache.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

Comments

Yu Kuai July 7, 2023, 1:24 a.m. UTC | #1
Hi,

On 2023/06/28 9:07, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v2:
>   - remove a now unused local variable in patch 2;
> 
> Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
> work") introduced a new problem:
> 
> // caller holds reconfig_mutex
> r5l_exit_log
>   flush_work(&log->disable_writeback_work)
> 			r5c_disable_writeback_async
> 			 wait_event
> 			  /*
> 			   * conf->log is not NULL, and mddev_trylock()
> 			   * will fail, wait_event() can never pass.
> 			   */
>   conf->log = NULL
> 
> Patch 1 reverts that commit, and patch 2 fixes the original problem in a
> different way.
> 
> Note that this problem was found only by code review; I think it is
> probably the reason that some mdadm tests are broken.

Any suggestions?

By the way, while taking another look at this problem, I think reads and
writes of 'conf->log' should probably use READ_ONCE() and WRITE_ONCE().
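
Roughly what I have in mind, as a userspace sketch only: the macros below
are simplified stand-ins for the kernel ones (they assume the GCC/Clang
__typeof__ extension), and struct demo_log, struct demo_conf,
demo_clear_log() and demo_log_id() are hypothetical names, not part of this
series. The point is that the reader loads the pointer once into a local
variable and then only tests and dereferences that local copy.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel macros; the real ones do more. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-ins for the log and conf structures. */
struct demo_log {
	int id;
};

struct demo_conf {
	struct demo_log *log;
};

/* Writer side: publish the NULL pointer with a single store. */
static void demo_clear_log(struct demo_conf *conf)
{
	WRITE_ONCE(conf->log, NULL);
}

/*
 * Reader side: load conf->log exactly once, then test and use the
 * local copy so the check and the dereference see the same value.
 */
static int demo_log_id(struct demo_conf *conf)
{
	struct demo_log *log = READ_ONCE(conf->log);

	if (!log)
		return -1;
	return log->id;
}
```

This only marks the accesses; it does not by itself keep the log alive
while a reader is still using it.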

Thanks,
Kuai
> 
> Yu Kuai (2):
>    md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
>      finishing work"
>    md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
> 
>   drivers/md/raid5-cache.c | 25 ++++++++++---------------
>   1 file changed, 10 insertions(+), 15 deletions(-)
>