[v3,1/3] md/raid1: freeze array more strictly when reshape

Message ID	20230719070954.3084379-2-xueshi.hu@smartx.com (mailing list archive)
State	Superseded, archived
Headers	show Return-Path: <linux-raid-owner@vger.kernel.org> From: Xueshi Hu <xueshi.hu@smartx.com> To: song@kernel.org Cc: linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, Xueshi Hu <xueshi.hu@smartx.com> Subject: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Date: Wed, 19 Jul 2023 15:09:52 +0800 Message-Id: <20230719070954.3084379-2-xueshi.hu@smartx.com> In-Reply-To: <20230719070954.3084379-1-xueshi.hu@smartx.com> References: <20230719070954.3084379-1-xueshi.hu@smartx.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	don't change mempool if in-flight r1bio exists \| expand [v3,0/3] don't change mempool if in-flight r1bio exists [v3,1/3] md/raid1: freeze array more strictly when reshape [v3,2/3] md/raid1: don't allow_barrier() before r1bio got freed [v3,3/3] md/raid1: check array size before reshape

Message ID

20230719070954.3084379-2-xueshi.hu@smartx.com (mailing list archive)

State

Superseded, archived

Headers

From: Xueshi Hu <xueshi.hu@smartx.com>
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, yukuai1@huaweicloud.com,
        Xueshi Hu <xueshi.hu@smartx.com>
Subject: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
Date: Wed, 19 Jul 2023 15:09:52 +0800
Message-Id: <20230719070954.3084379-2-xueshi.hu@smartx.com>
In-Reply-To: <20230719070954.3084379-1-xueshi.hu@smartx.com>
References: <20230719070954.3084379-1-xueshi.hu@smartx.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

don't change mempool if in-flight r1bio exists | expand

Commit Message

Xueshi Hu July 19, 2023, 7:09 a.m. UTC

When an IO error happens, reschedule_retry() will increase
r1conf::nr_queued, which makes freeze_array() unblocked. However, before
all r1bio in the memory pool are released, the memory pool should not be
modified. Introduce freeze_array_totally() to solve the problem. Compared
to freeze_array(), it's more strict because any in-flight io needs to
complete including queued io.

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

Comments

Yu Kuai July 20, 2023, 1:36 a.m. UTC | #1

Hi,

在 2023/07/19 15:09, Xueshi Hu 写道:
> When an IO error happens, reschedule_retry() will increase
> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
> all r1bio in the memory pool are released, the memory pool should not be
> modified. Introduce freeze_array_totally() to solve the problem. Compared
> to freeze_array(), it's more strict because any in-flight io needs to
> complete including queued io.
> 
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>   1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index dd25832eb045..5605c9680818 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
>   	/* Stop sync I/O and normal I/O and wait for everything to
>   	 * go quiet.
>   	 * This is called in two situations:
> -	 * 1) management command handlers (reshape, remove disk, quiesce).
> +	 * 1) management command handlers (remove disk, quiesce).
>   	 * 2) one normal I/O request failed.
>   
>   	 * After array_frozen is set to 1, new sync IO will be blocked at
> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>   	wake_up(&conf->wait_barrier);
>   }
>   
> +/* conf->resync_lock should be held */
> +static int get_pending(struct r1conf *conf)
> +{
> +	int idx, ret;
> +
> +	ret = atomic_read(&conf->nr_sync_pending);
> +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +		ret += atomic_read(&conf->nr_pending[idx]);
> +
> +	return ret;
> +}
> +
> +static void freeze_array_totally(struct r1conf *conf)
> +{
> +	/*
> +	 * freeze_array_totally() is almost the same with freeze_array() except
> +	 * it requires there's no queued io. Raid1's reshape will destroy the
> +	 * old mempool and change r1conf::raid_disks, which are necessary when
> +	 * freeing the queued io.
> +	 */
> +	spin_lock_irq(&conf->resync_lock);
> +	conf->array_frozen = 1;
> +	raid1_log(conf->mddev, "freeze totally");
> +	wait_event_lock_irq_cmd(
> +			conf->wait_barrier,
> +			get_pending(conf) == 0,
> +			conf->resync_lock,
> +			md_wakeup_thread(conf->mddev->thread));
> +	spin_unlock_irq(&conf->resync_lock);
> +}
> +
>   static void alloc_behind_master_bio(struct r1bio *r1_bio,
>   					   struct bio *bio)
>   {
> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>   		return -ENOMEM;
>   	}
>   
> -	freeze_array(conf, 0);
> +	freeze_array_totally(conf);

I think this is wrong, raid1_reshape() can't be called with
'reconfig_mutex' grabbed, and this will deadlock because failed io need
this lock to be handled by daemon thread.(see details in [1]).

Be aware that never hold 'reconfig_mutex' to wait for io.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>   
>   	/* ok, everything is stopped */
>   	oldpool = conf->r1bio_pool;
>

Yu Kuai July 20, 2023, 1:37 a.m. UTC | #2

Hi,

在 2023/07/20 9:36, Yu Kuai 写道:
> Hi,
> 
> 在 2023/07/19 15:09, Xueshi Hu 写道:
>> When an IO error happens, reschedule_retry() will increase
>> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
>> all r1bio in the memory pool are released, the memory pool should not be
>> modified. Introduce freeze_array_totally() to solve the problem. Compared
>> to freeze_array(), it's more strict because any in-flight io needs to
>> complete including queued io.
>>
>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>> ---
>>   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>>   1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index dd25832eb045..5605c9680818 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, 
>> int extra)
>>       /* Stop sync I/O and normal I/O and wait for everything to
>>        * go quiet.
>>        * This is called in two situations:
>> -     * 1) management command handlers (reshape, remove disk, quiesce).
>> +     * 1) management command handlers (remove disk, quiesce).
>>        * 2) one normal I/O request failed.
>>        * After array_frozen is set to 1, new sync IO will be blocked at
>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>>       wake_up(&conf->wait_barrier);
>>   }
>> +/* conf->resync_lock should be held */
>> +static int get_pending(struct r1conf *conf)
>> +{
>> +    int idx, ret;
>> +
>> +    ret = atomic_read(&conf->nr_sync_pending);
>> +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
>> +        ret += atomic_read(&conf->nr_pending[idx]);
>> +
>> +    return ret;
>> +}
>> +
>> +static void freeze_array_totally(struct r1conf *conf)
>> +{
>> +    /*
>> +     * freeze_array_totally() is almost the same with freeze_array() 
>> except
>> +     * it requires there's no queued io. Raid1's reshape will destroy 
>> the
>> +     * old mempool and change r1conf::raid_disks, which are necessary 
>> when
>> +     * freeing the queued io.
>> +     */
>> +    spin_lock_irq(&conf->resync_lock);
>> +    conf->array_frozen = 1;
>> +    raid1_log(conf->mddev, "freeze totally");
>> +    wait_event_lock_irq_cmd(
>> +            conf->wait_barrier,
>> +            get_pending(conf) == 0,
>> +            conf->resync_lock,
>> +            md_wakeup_thread(conf->mddev->thread));
>> +    spin_unlock_irq(&conf->resync_lock);
>> +}
>> +
>>   static void alloc_behind_master_bio(struct r1bio *r1_bio,
>>                          struct bio *bio)
>>   {
>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>>           return -ENOMEM;
>>       }
>> -    freeze_array(conf, 0);
>> +    freeze_array_totally(conf);
> 
> I think this is wrong, raid1_reshape() can't be called with
Sorry about thi typo, I mean raid1_reshape() can be called with ...

Thanks,
Kuai
> 'reconfig_mutex' grabbed, and this will deadlock because failed io need
> this lock to be handled by daemon thread.(see details in [1]).
> 
> Be aware that never hold 'reconfig_mutex' to wait for io.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463 
> 
>>       /* ok, everything is stopped */
>>       oldpool = conf->r1bio_pool;
>>
> 
> .
>

Xueshi Hu July 31, 2023, 2:02 p.m. UTC | #3

On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2023/07/20 9:36, Yu Kuai 写道:
> > Hi,
> > 
> > 在 2023/07/19 15:09, Xueshi Hu 写道:
> > > When an IO error happens, reschedule_retry() will increase
> > > r1conf::nr_queued, which makes freeze_array() unblocked. However, before
> > > all r1bio in the memory pool are released, the memory pool should not be
> > > modified. Introduce freeze_array_totally() to solve the problem. Compared
> > > to freeze_array(), it's more strict because any in-flight io needs to
> > > complete including queued io.
> > > 
> > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > ---
> > >   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
> > >   1 file changed, 33 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index dd25832eb045..5605c9680818 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
> > > int extra)
> > >       /* Stop sync I/O and normal I/O and wait for everything to
> > >        * go quiet.
> > >        * This is called in two situations:
> > > -     * 1) management command handlers (reshape, remove disk, quiesce).
> > > +     * 1) management command handlers (remove disk, quiesce).
> > >        * 2) one normal I/O request failed.
> > >        * After array_frozen is set to 1, new sync IO will be blocked at
> > > @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
> > >       wake_up(&conf->wait_barrier);
> > >   }
> > > +/* conf->resync_lock should be held */
> > > +static int get_pending(struct r1conf *conf)
> > > +{
> > > +    int idx, ret;
> > > +
> > > +    ret = atomic_read(&conf->nr_sync_pending);
> > > +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> > > +        ret += atomic_read(&conf->nr_pending[idx]);
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +static void freeze_array_totally(struct r1conf *conf)
> > > +{
> > > +    /*
> > > +     * freeze_array_totally() is almost the same with
> > > freeze_array() except
> > > +     * it requires there's no queued io. Raid1's reshape will
> > > destroy the
> > > +     * old mempool and change r1conf::raid_disks, which are
> > > necessary when
> > > +     * freeing the queued io.
> > > +     */
> > > +    spin_lock_irq(&conf->resync_lock);
> > > +    conf->array_frozen = 1;
> > > +    raid1_log(conf->mddev, "freeze totally");
> > > +    wait_event_lock_irq_cmd(
> > > +            conf->wait_barrier,
> > > +            get_pending(conf) == 0,
> > > +            conf->resync_lock,
> > > +            md_wakeup_thread(conf->mddev->thread));
> > > +    spin_unlock_irq(&conf->resync_lock);
> > > +}
> > > +
> > >   static void alloc_behind_master_bio(struct r1bio *r1_bio,
> > >                          struct bio *bio)
> > >   {
> > > @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
> > >           return -ENOMEM;
> > >       }
> > > -    freeze_array(conf, 0);
> > > +    freeze_array_totally(conf);
> > 
> > I think this is wrong, raid1_reshape() can't be called with
> Sorry about thi typo, I mean raid1_reshape() can be called with ...
You're right, this is indeed a deadlock.

I am wondering whether this approach is viable

	if (unlikely(atomic_read(conf->nr_queued))) {
		kfree(newpoolinfo);
		mempool_exit(&newpool);
		unfreeze_array(conf);

		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return -EBUSY;
	}

Thanks,
Hu

> 
> Thanks,
> Kuai
> > 'reconfig_mutex' grabbed, and this will deadlock because failed io need
> > this lock to be handled by daemon thread.(see details in [1]).
> > 
> > Be aware that never hold 'reconfig_mutex' to wait for io.
> > 
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
> > 
> > >       /* ok, everything is stopped */
> > >       oldpool = conf->r1bio_pool;
> > > 
> > 
> > .
> > 
>

Yu Kuai Aug. 1, 2023, 1:24 a.m. UTC | #4

Hi,

在 2023/07/31 22:02, Xueshi Hu 写道:
> On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/20 9:36, Yu Kuai 写道:
>>> Hi,
>>>
>>> 在 2023/07/19 15:09, Xueshi Hu 写道:
>>>> When an IO error happens, reschedule_retry() will increase
>>>> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
>>>> all r1bio in the memory pool are released, the memory pool should not be
>>>> modified. Introduce freeze_array_totally() to solve the problem. Compared
>>>> to freeze_array(), it's more strict because any in-flight io needs to
>>>> complete including queued io.
>>>>
>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>>> ---
>>>>    drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 33 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index dd25832eb045..5605c9680818 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
>>>> int extra)
>>>>        /* Stop sync I/O and normal I/O and wait for everything to
>>>>         * go quiet.
>>>>         * This is called in two situations:
>>>> -     * 1) management command handlers (reshape, remove disk, quiesce).
>>>> +     * 1) management command handlers (remove disk, quiesce).
>>>>         * 2) one normal I/O request failed.
>>>>         * After array_frozen is set to 1, new sync IO will be blocked at
>>>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>>>>        wake_up(&conf->wait_barrier);
>>>>    }
>>>> +/* conf->resync_lock should be held */
>>>> +static int get_pending(struct r1conf *conf)
>>>> +{
>>>> +    int idx, ret;
>>>> +
>>>> +    ret = atomic_read(&conf->nr_sync_pending);
>>>> +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
>>>> +        ret += atomic_read(&conf->nr_pending[idx]);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void freeze_array_totally(struct r1conf *conf)
>>>> +{
>>>> +    /*
>>>> +     * freeze_array_totally() is almost the same with
>>>> freeze_array() except
>>>> +     * it requires there's no queued io. Raid1's reshape will
>>>> destroy the
>>>> +     * old mempool and change r1conf::raid_disks, which are
>>>> necessary when
>>>> +     * freeing the queued io.
>>>> +     */
>>>> +    spin_lock_irq(&conf->resync_lock);
>>>> +    conf->array_frozen = 1;
>>>> +    raid1_log(conf->mddev, "freeze totally");
>>>> +    wait_event_lock_irq_cmd(
>>>> +            conf->wait_barrier,
>>>> +            get_pending(conf) == 0,
>>>> +            conf->resync_lock,
>>>> +            md_wakeup_thread(conf->mddev->thread));
>>>> +    spin_unlock_irq(&conf->resync_lock);
>>>> +}
>>>> +
>>>>    static void alloc_behind_master_bio(struct r1bio *r1_bio,
>>>>                           struct bio *bio)
>>>>    {
>>>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>>>>            return -ENOMEM;
>>>>        }
>>>> -    freeze_array(conf, 0);
>>>> +    freeze_array_totally(conf);
>>>
>>> I think this is wrong, raid1_reshape() can't be called with
>> Sorry about thi typo, I mean raid1_reshape() can be called with ...
> You're right, this is indeed a deadlock.
> 
> I am wondering whether this approach is viable
> 
> 	if (unlikely(atomic_read(conf->nr_queued))) {
> 		kfree(newpoolinfo);
> 		mempool_exit(&newpool);
> 		unfreeze_array(conf);
> 
> 		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> 		md_wakeup_thread(mddev->thread);
> 		return -EBUSY;
> 	}

This is not okay, 'nr_queued' can be incresed at any time when normal io
failed, read it once doesn't mean anything, and you need to
freeze_array() before reading it:

freeze_array
// guarantee new io won't be dispatched
if (atomic_read(conf->nr_queued))
  ...
  unfreeze_array
  return -EBUSY;

Fortunately, I'm working on another patchset to synchronize io with
array configuration, which means all the callers of raid1_reshape() will
suspend the array, and no normal io will be in progress, hence this
problem won't exist as well.

Thanks,
Kuai

> 
> Thanks,
> Hu
> 
>>
>> Thanks,
>> Kuai
>>> 'reconfig_mutex' grabbed, and this will deadlock because failed io need
>>> this lock to be handled by daemon thread.(see details in [1]).
>>>
>>> Be aware that never hold 'reconfig_mutex' to wait for io.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>>>
>>>>        /* ok, everything is stopped */
>>>>        oldpool = conf->r1bio_pool;
>>>>
>>>
>>> .
>>>
>>
> .
>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index dd25832eb045..5605c9680818 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1072,7 +1072,7 @@  static void freeze_array(struct r1conf *conf, int extra)
 	/* Stop sync I/O and normal I/O and wait for everything to
 	 * go quiet.
 	 * This is called in two situations:
-	 * 1) management command handlers (reshape, remove disk, quiesce).
+	 * 1) management command handlers (remove disk, quiesce).
 	 * 2) one normal I/O request failed.
 
 	 * After array_frozen is set to 1, new sync IO will be blocked at
@@ -1111,6 +1111,37 @@  static void unfreeze_array(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
+/* conf->resync_lock should be held */
+static int get_pending(struct r1conf *conf)
+{
+	int idx, ret;
+
+	ret = atomic_read(&conf->nr_sync_pending);
+	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+		ret += atomic_read(&conf->nr_pending[idx]);
+
+	return ret;
+}
+
+static void freeze_array_totally(struct r1conf *conf)
+{
+	/*
+	 * freeze_array_totally() is almost the same with freeze_array() except
+	 * it requires there's no queued io. Raid1's reshape will destroy the
+	 * old mempool and change r1conf::raid_disks, which are necessary when
+	 * freeing the queued io.
+	 */
+	spin_lock_irq(&conf->resync_lock);
+	conf->array_frozen = 1;
+	raid1_log(conf->mddev, "freeze totally");
+	wait_event_lock_irq_cmd(
+			conf->wait_barrier,
+			get_pending(conf) == 0,
+			conf->resync_lock,
+			md_wakeup_thread(conf->mddev->thread));
+	spin_unlock_irq(&conf->resync_lock);
+}
+
 static void alloc_behind_master_bio(struct r1bio *r1_bio,
 					   struct bio *bio)
 {
@@ -3296,7 +3327,7 @@  static int raid1_reshape(struct mddev *mddev)
 		return -ENOMEM;
 	}
 
-	freeze_array(conf, 0);
+	freeze_array_totally(conf);
 
 	/* ok, everything is stopped */
 	oldpool = conf->r1bio_pool;

[v3,1/3] md/raid1: freeze array more strictly when reshape

Commit Message

Comments

Patch