rbd: don't move requests to the running list on errors

Message ID 20240117183542.1431147-1-idryomov@gmail.com (mailing list archive)
State: New, archived

Commit Message

Ilya Dryomov Jan. 17, 2024, 6:35 p.m. UTC
The running list is supposed to contain requests that are pinning the
exclusive lock, i.e. those that must be flushed before exclusive lock
is released.  When wake_lock_waiters() is called to handle an error,
requests on the acquiring list are failed with that error and no
flushing takes place.  Briefly moving them to the running list is not
only pointless but also harmful: if exclusive lock gets acquired
before all of their state machines are scheduled and go through
rbd_lock_del_request(), we trigger

    rbd_assert(list_empty(&rbd_dev->running_list));

in rbd_try_acquire_lock().

Cc: stable@vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
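
To make the window concrete, here is a small userspace model of the ordering described above. It is a sketch only: every name in it is a stand-in (the real driver keeps requests on list_head lists under lock_lists_lock), and it exists just to show why parking failed requests on the running list lets a later lock acquisition see a non-empty list.

/*
 * Standalone userspace model of the ordering problem.  Every name here is
 * a stand-in -- the real driver tracks requests on list_head lists under
 * lock_lists_lock.  Build with: cc -o rbd_lock_model rbd_lock_model.c
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_img_request {
	int result;
	bool on_running_list;	/* stands in for lock_item membership */
};

static int running_requests;	/* stands in for rbd_dev->running_list */

/* Pre-fix behaviour: failed requests were still parked on the running list. */
static void wake_waiters_old(struct fake_img_request *reqs, int n, int result)
{
	for (int i = 0; i < n; i++) {
		reqs[i].result = result;
		reqs[i].on_running_list = true;		/* list_splice_tail_init() */
		running_requests++;
	}
}

/* Post-fix behaviour: on error the request never touches the running list. */
static void wake_waiters_new(struct fake_img_request *reqs, int n, int result)
{
	for (int i = 0; i < n; i++) {
		reqs[i].result = result;
		if (!result) {
			reqs[i].on_running_list = true;	/* list_move_tail() */
			running_requests++;
		} else {
			/* list_del_init(): dropped from the lock lists here */
		}
	}
}

/* Stand-in for the assertion that fires in rbd_try_acquire_lock(). */
static void try_acquire_lock(void)
{
	assert(running_requests == 0);
	printf("lock acquired, running list empty\n");
}

int main(void)
{
	struct fake_img_request reqs[2] = { { 0 }, { 0 } };

	/* Error path with the fix: both requests fail while "acquiring"... */
	wake_waiters_new(reqs, 2, -EIO);
	/* ...so an immediate re-acquire sees an empty running list. */
	try_acquire_lock();

	/* Pre-fix error path: the same failure parks both requests on the
	 * running list, so re-acquiring before their state machines have
	 * run and removed themselves (the rbd_lock_del_request() step)
	 * trips the assert -- the abort below mirrors the reported crash. */
	wake_waiters_old(reqs, 2, -EIO);
	try_acquire_lock();
	return 0;
}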

Comments

Ilya Dryomov Jan. 18, 2024, 10:24 a.m. UTC | #1
On Thu, Jan 18, 2024 at 4:13 AM Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
>
>
>
> > On Thursday, Jan. 18, 2024 at 2:35 AM, Ilya Dryomov wrote:
> > The running list is supposed to contain requests that are pinning the
> > exclusive lock, i.e. those that must be flushed before exclusive lock
> > is released.  When wake_lock_waiters() is called to handle an error,
> > requests on the acquiring list are failed with that error and no
> > flushing takes place.  Briefly moving them to the running list is not
> > only pointless but also harmful: if exclusive lock gets acquired
> > before all of their state machines are scheduled and go through
> > rbd_lock_del_request(), we trigger
> >
> >      rbd_assert(list_empty(&rbd_dev->running_list));
> >
> > in rbd_try_acquire_lock().
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
> > Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> > ---
> >   drivers/block/rbd.c | 22 ++++++++++++++--------
> >   1 file changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> > index 63897d0d6629..12b5d53ec856 100644
> > --- a/drivers/block/rbd.c
> > +++ b/drivers/block/rbd.c
> > @@ -3452,14 +3452,15 @@ static bool rbd_lock_add_request(struct rbd_img_request *img_req)
> >   static void rbd_lock_del_request(struct rbd_img_request *img_req)
> >   {
> >       struct rbd_device *rbd_dev = img_req->rbd_dev;
> > -     bool need_wakeup;
> > +     bool need_wakeup = false;
> >
> >       lockdep_assert_held(&rbd_dev->lock_rwsem);
> >       spin_lock(&rbd_dev->lock_lists_lock);
> > -     rbd_assert(!list_empty(&img_req->lock_item));
> > -     list_del_init(&img_req->lock_item);
> > -     need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
> > -                    list_empty(&rbd_dev->running_list));
> > +     if (!list_empty(&img_req->lock_item)) {
> > +             list_del_init(&img_req->lock_item);
> > +             need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
> > +                            list_empty(&rbd_dev->running_list));
> > +     }
> >       spin_unlock(&rbd_dev->lock_lists_lock);
> >       if (need_wakeup)
> >               complete(&rbd_dev->releasing_wait);
> > @@ -3842,14 +3843,19 @@ static void wake_lock_waiters(struct rbd_device *rbd_dev, int result)
> >               return;
> >       }
> >
> > -     list_for_each_entry(img_req, &rbd_dev->acquiring_list, lock_item) {
> > +     while (!list_empty(&rbd_dev->acquiring_list)) {
> > +             img_req = list_first_entry(&rbd_dev->acquiring_list,
> > +                                        struct rbd_img_request, lock_item);
> >               mutex_lock(&img_req->state_mutex);
> >               rbd_assert(img_req->state == RBD_IMG_EXCLUSIVE_LOCK);
> > +             if (!result)
> > +                     list_move_tail(&img_req->lock_item,
> > +                                    &rbd_dev->running_list);
> > +             else
> > +                     list_del_init(&img_req->lock_item);
> >               rbd_img_schedule(img_req, result);
> >               mutex_unlock(&img_req->state_mutex);
> >       }
> > -
> > -     list_splice_tail_init(&rbd_dev->acquiring_list, &rbd_dev->running_list);
>
> Hi Ilya,
>         If we don't move these requests to ->running_list, then need_wakeup
> is always false for these requests. So who will finally complete
> &rbd_dev->releasing_wait?

Hi Dongsheng,

These requests are woken up explicitly in rbd_img_schedule().  Because
img_req->work_result would be set to an error, the state machine would
finish immediately on:

    case RBD_IMG_EXCLUSIVE_LOCK:
            if (*result)
                    return true;

rbd_dev->releasing_wait doesn't need to be completed in this case
because these requests are terminated while still on the acquiring
list.  Waiting for their state machines to get scheduled just to hit
that "if (*result)" check and bail isn't necessary.

Thanks,

                Ilya
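
For reference, here is a self-contained sketch of the early-exit path described in this reply. The type and function names below are hypothetical, not the driver's; only the RBD_IMG_EXCLUSIVE_LOCK check mirrors the snippet quoted above, and the work_result detail is taken from the reply rather than from the driver source.

/*
 * Simplified sketch of the early exit described above.  Type and function
 * names are hypothetical; only the RBD_IMG_EXCLUSIVE_LOCK check mirrors
 * the snippet quoted in the reply.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum sketch_img_state { RBD_IMG_EXCLUSIVE_LOCK, SKETCH_IMG_NEXT_STEPS };

struct sketch_img_request {
	enum sketch_img_state state;
	int work_result;	/* mirrors img_req->work_result from the reply */
};

/* One step of the (simplified) image request state machine. */
static bool sketch_img_advance(struct sketch_img_request *img_req, int *result)
{
	switch (img_req->state) {
	case RBD_IMG_EXCLUSIVE_LOCK:
		if (*result)
			return true;	/* done: the request fails right here */
		img_req->state = SKETCH_IMG_NEXT_STEPS;
		return false;		/* would go on to the object requests */
	default:
		return true;
	}
}

int main(void)
{
	/* A request failed by wake_lock_waiters() while still acquiring:
	 * it is woken with the error stored in work_result. */
	struct sketch_img_request req = {
		.state = RBD_IMG_EXCLUSIVE_LOCK,
		.work_result = -ESHUTDOWN,
	};
	int result = req.work_result;

	/* The first state machine step sees the error and finishes the
	 * request -- it never needed to be on the running list. */
	printf("finished on first step: %s\n",
	       sketch_img_advance(&req, &result) ? "yes" : "no");
	return 0;
}
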
Dongsheng Yang Jan. 18, 2024, 10:46 a.m. UTC | #2
On Thursday, Jan. 18, 2024 at 6:24 PM, Ilya Dryomov wrote:
> On Thu, Jan 18, 2024 at 4:13 AM Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>>
>>
>>
>> On Thursday, Jan. 18, 2024 at 2:35 AM, Ilya Dryomov wrote:
>>> [...]
>>
>> Hi Ilya,
>>          If we don't move these requests to ->running_list, then need_wakeup
>> is always false for these requests. So who will finally complete
>> &rbd_dev->releasing_wait?
> 
> Hi Dongsheng,
> 
> These requests are woken up explicitly in rbd_img_schedule().  Because
> img_req->work_result would be set to an error, the state machine would
> finish immediately on:
> 
>      case RBD_IMG_EXCLUSIVE_LOCK:
>              if (*result)
>                      return true;
> 
> rbd_dev->releasing_wait doesn't need to be completed in this case
> because these requests are terminated while still on the acquiring
> list.  Waiting for their state machines to get scheduled just to hit
> that "if (*result)" check and bail isn't necessary.

Hi Ilya,
	Thanks for your explanation; this looks good to me now.

Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>

Thanks
> 
> Thanks,
> 
>                  Ilya
>

Patch

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 63897d0d6629..12b5d53ec856 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3452,14 +3452,15 @@  static bool rbd_lock_add_request(struct rbd_img_request *img_req)
 static void rbd_lock_del_request(struct rbd_img_request *img_req)
 {
 	struct rbd_device *rbd_dev = img_req->rbd_dev;
-	bool need_wakeup;
+	bool need_wakeup = false;
 
 	lockdep_assert_held(&rbd_dev->lock_rwsem);
 	spin_lock(&rbd_dev->lock_lists_lock);
-	rbd_assert(!list_empty(&img_req->lock_item));
-	list_del_init(&img_req->lock_item);
-	need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
-		       list_empty(&rbd_dev->running_list));
+	if (!list_empty(&img_req->lock_item)) {
+		list_del_init(&img_req->lock_item);
+		need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
+			       list_empty(&rbd_dev->running_list));
+	}
 	spin_unlock(&rbd_dev->lock_lists_lock);
 	if (need_wakeup)
 		complete(&rbd_dev->releasing_wait);
@@ -3842,14 +3843,19 @@  static void wake_lock_waiters(struct rbd_device *rbd_dev, int result)
 		return;
 	}
 
-	list_for_each_entry(img_req, &rbd_dev->acquiring_list, lock_item) {
+	while (!list_empty(&rbd_dev->acquiring_list)) {
+		img_req = list_first_entry(&rbd_dev->acquiring_list,
+					   struct rbd_img_request, lock_item);
 		mutex_lock(&img_req->state_mutex);
 		rbd_assert(img_req->state == RBD_IMG_EXCLUSIVE_LOCK);
+		if (!result)
+			list_move_tail(&img_req->lock_item,
+				       &rbd_dev->running_list);
+		else
+			list_del_init(&img_req->lock_item);
 		rbd_img_schedule(img_req, result);
 		mutex_unlock(&img_req->state_mutex);
 	}
-
-	list_splice_tail_init(&rbd_dev->acquiring_list, &rbd_dev->running_list);
 }
 
 static bool locker_equal(const struct ceph_locker *lhs,