ceph: stop forwarding the request when exceeding 256 times

Message ID	20220329080608.14667-1-xiubli@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <ceph-devel-owner@kernel.org> From: xiubli@redhat.com To: jlayton@kernel.org Cc: idryomov@gmail.com, vshankar@redhat.com, gfarnum@redhat.com, ceph-devel@vger.kernel.org, Xiubo Li <xiubli@redhat.com> Subject: [PATCH] ceph: stop forwarding the request when exceeding 256 times Date: Tue, 29 Mar 2022 16:06:08 +0800 Message-Id: <20220329080608.14667-1-xiubli@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ceph: stop forwarding the request when exceeding 256 times \| expand ceph: stop forwarding the request when exceeding 256 times

Message ID

20220329080608.14667-1-xiubli@redhat.com (mailing list archive)

State

New, archived

Headers

From: xiubli@redhat.com
To: jlayton@kernel.org
Cc: idryomov@gmail.com, vshankar@redhat.com, gfarnum@redhat.com,
        ceph-devel@vger.kernel.org, Xiubo Li <xiubli@redhat.com>
Subject: [PATCH] ceph: stop forwarding the request when exceeding 256 times
Date: Tue, 29 Mar 2022 16:06:08 +0800
Message-Id: <20220329080608.14667-1-xiubli@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

ceph: stop forwarding the request when exceeding 256 times | expand

Commit Message

Xiubo Li March 29, 2022, 8:06 a.m. UTC

From: Xiubo Li <xiubli@redhat.com>

The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.

In this case it's ususally a bug in MDS and continue bouncing the
request makes no sense.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

Comments

Luis Henriques March 29, 2022, 9:53 a.m. UTC | #1

xiubli@redhat.com writes:

> From: Xiubo Li <xiubli@redhat.com>
>
> The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
> while in 'ceph_mds_request_head' the type is '__u8'. So in case
> the request bounces between MDSes exceeding 256 times, the client
> will get stuck.
>
> In this case it's ususally a bug in MDS and continue bouncing the
> request makes no sense.

Ouch.  Nice catch.  This patch looks OK to me, just 2 minor comments
bellow.

> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a89ee866ebbb..0bb6e7bc499c 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>  	int err = -EINVAL;
>  	void *p = msg->front.iov_base;
>  	void *end = p + msg->front.iov_len;
> +	bool aborted = false;
>  
>  	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
>  	next_mds = ceph_decode_32(&p);
> @@ -3309,8 +3310,28 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>  		dout("forward tid %llu aborted, unregistering\n", tid);
>  		__unregister_request(mdsc, req);
>  	} else if (fwd_seq <= req->r_num_fwd) {
> -		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
> -		     tid, next_mds, req->r_num_fwd, fwd_seq);
> +		/*
> +		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
> +		 * is 'int32_t', while in 'ceph_mds_request_head' the
> +		 * type is '__u8'. So in case the request bounces between
> +		 * MDSes exceeding 256 times, the client will get stuck.
> +		 *
> +		 * In this case it's ususally a bug in MDS and continue
> +		 * bouncing the request makes no sense.
> +		 */
> +		if (req->r_num_fwd == 256) {
> +			mutex_lock(&req->r_fill_mutex);
> +			req->r_err = -EIO;

Not sure -EIO is the most appropriate.  Maybe -E2BIG... although not quite
it either.

> +			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
> +			mutex_unlock(&req->r_fill_mutex);
> +			aborted = true;
> +			dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
> +			     tid, next_mds, req->r_num_fwd, fwd_seq);
> +			goto out;

This 'goto' statement can be dropped, but one before (when the
lookup_get_request() fails) needs to be adjusted, otherwise
ceph_mdsc_put_request() may be called with a NULL pointer.

Cheers,

Xiubo Li March 29, 2022, 11:12 a.m. UTC | #2

On 3/29/22 5:53 PM, Luís Henriques wrote:
> xiubli@redhat.com writes:
>
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
>> while in 'ceph_mds_request_head' the type is '__u8'. So in case
>> the request bounces between MDSes exceeding 256 times, the client
>> will get stuck.
>>
>> In this case it's ususally a bug in MDS and continue bouncing the
>> request makes no sense.
> Ouch.  Nice catch.  This patch looks OK to me, just 2 minor comments
> bellow.
>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
>>   1 file changed, 28 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index a89ee866ebbb..0bb6e7bc499c 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>>   	int err = -EINVAL;
>>   	void *p = msg->front.iov_base;
>>   	void *end = p + msg->front.iov_len;
>> +	bool aborted = false;
>>   
>>   	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
>>   	next_mds = ceph_decode_32(&p);
>> @@ -3309,8 +3310,28 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>>   		dout("forward tid %llu aborted, unregistering\n", tid);
>>   		__unregister_request(mdsc, req);
>>   	} else if (fwd_seq <= req->r_num_fwd) {
>> -		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
>> -		     tid, next_mds, req->r_num_fwd, fwd_seq);
>> +		/*
>> +		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
>> +		 * is 'int32_t', while in 'ceph_mds_request_head' the
>> +		 * type is '__u8'. So in case the request bounces between
>> +		 * MDSes exceeding 256 times, the client will get stuck.
>> +		 *
>> +		 * In this case it's ususally a bug in MDS and continue
>> +		 * bouncing the request makes no sense.
>> +		 */
>> +		if (req->r_num_fwd == 256) {
>> +			mutex_lock(&req->r_fill_mutex);
>> +			req->r_err = -EIO;
> Not sure -EIO is the most appropriate.  Maybe -E2BIG... although not quite
> it either.
>
Yeah, I also not very sure here.

Jeff ?


>> +			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
>> +			mutex_unlock(&req->r_fill_mutex);
>> +			aborted = true;
>> +			dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
>> +			     tid, next_mds, req->r_num_fwd, fwd_seq);
>> +			goto out;
> This 'goto' statement can be dropped, but one before (when the
> lookup_get_request() fails) needs to be adjusted, otherwise
> ceph_mdsc_put_request() may be called with a NULL pointer.

Yeah, will fix it.

Thanks.

-- Xiubo


> Cheers,

Jeff Layton March 29, 2022, 11:28 a.m. UTC | #3

On Tue, 2022-03-29 at 19:12 +0800, Xiubo Li wrote:
> On 3/29/22 5:53 PM, Luís Henriques wrote:
> > xiubli@redhat.com writes:
> > 
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
> > > while in 'ceph_mds_request_head' the type is '__u8'. So in case
> > > the request bounces between MDSes exceeding 256 times, the client
> > > will get stuck.
> > > 
> > > In this case it's ususally a bug in MDS and continue bouncing the
> > > request makes no sense.
> > Ouch.  Nice catch.  This patch looks OK to me, just 2 minor comments
> > bellow.
> > 
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
> > >   1 file changed, 28 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index a89ee866ebbb..0bb6e7bc499c 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
> > >   	int err = -EINVAL;
> > >   	void *p = msg->front.iov_base;
> > >   	void *end = p + msg->front.iov_len;
> > > +	bool aborted = false;
> > >   
> > >   	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
> > >   	next_mds = ceph_decode_32(&p);
> > > @@ -3309,8 +3310,28 @@ static void handle_forward(struct ceph_mds_client *mdsc,
> > >   		dout("forward tid %llu aborted, unregistering\n", tid);
> > >   		__unregister_request(mdsc, req);
> > >   	} else if (fwd_seq <= req->r_num_fwd) {
> > > -		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
> > > -		     tid, next_mds, req->r_num_fwd, fwd_seq);
> > > +		/*
> > > +		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
> > > +		 * is 'int32_t', while in 'ceph_mds_request_head' the
> > > +		 * type is '__u8'. So in case the request bounces between
> > > +		 * MDSes exceeding 256 times, the client will get stuck.
> > > +		 *
> > > +		 * In this case it's ususally a bug in MDS and continue
> > > +		 * bouncing the request makes no sense.
> > > +		 */
> > > +		if (req->r_num_fwd == 256) {
> > > +			mutex_lock(&req->r_fill_mutex);
> > > +			req->r_err = -EIO;
> > Not sure -EIO is the most appropriate.  Maybe -E2BIG... although not quite
> > it either.
> > 
> Yeah, I also not very sure here.
> 
> Jeff ?
> 

Matching errors like this really comes down to a judgement call.  E2BIG
usually means that some buffer was sized too small, so you'll have users
trying to figure out what they passed in wrong if you return that here.

-EIO is the usual "default" when you don't know what else to use.
There's also -EREMOTEIO which may be closer here since this is
indicative of MDS problems. Given that, it may also be a good idea to
log a pr_warn or pr_notice message at the same time explaining what
happened.


> 
> > > +			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
> > > +			mutex_unlock(&req->r_fill_mutex);
> > > +			aborted = true;
> > > +			dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
> > > +			     tid, next_mds, req->r_num_fwd, fwd_seq);
> > > +			goto out;
> > This 'goto' statement can be dropped, but one before (when the
> > lookup_get_request() fails) needs to be adjusted, otherwise
> > ceph_mdsc_put_request() may be called with a NULL pointer.
> 
> Yeah, will fix it.
> 
> Thanks.
> 
> -- Xiubo
> 
> 
> > Cheers,
>

Xiubo Li March 29, 2022, 12:24 p.m. UTC | #4

On 3/29/22 7:28 PM, Jeff Layton wrote:
> On Tue, 2022-03-29 at 19:12 +0800, Xiubo Li wrote:
>> On 3/29/22 5:53 PM, Luís Henriques wrote:
>>> xiubli@redhat.com writes:
>>>
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
>>>> while in 'ceph_mds_request_head' the type is '__u8'. So in case
>>>> the request bounces between MDSes exceeding 256 times, the client
>>>> will get stuck.
>>>>
>>>> In this case it's ususally a bug in MDS and continue bouncing the
>>>> request makes no sense.
>>> Ouch.  Nice catch.  This patch looks OK to me, just 2 minor comments
>>> bellow.
>>>
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
>>>>    1 file changed, 28 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index a89ee866ebbb..0bb6e7bc499c 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>>>>    	int err = -EINVAL;
>>>>    	void *p = msg->front.iov_base;
>>>>    	void *end = p + msg->front.iov_len;
>>>> +	bool aborted = false;
>>>>    
>>>>    	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
>>>>    	next_mds = ceph_decode_32(&p);
>>>> @@ -3309,8 +3310,28 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>>>>    		dout("forward tid %llu aborted, unregistering\n", tid);
>>>>    		__unregister_request(mdsc, req);
>>>>    	} else if (fwd_seq <= req->r_num_fwd) {
>>>> -		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
>>>> -		     tid, next_mds, req->r_num_fwd, fwd_seq);
>>>> +		/*
>>>> +		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
>>>> +		 * is 'int32_t', while in 'ceph_mds_request_head' the
>>>> +		 * type is '__u8'. So in case the request bounces between
>>>> +		 * MDSes exceeding 256 times, the client will get stuck.
>>>> +		 *
>>>> +		 * In this case it's ususally a bug in MDS and continue
>>>> +		 * bouncing the request makes no sense.
>>>> +		 */
>>>> +		if (req->r_num_fwd == 256) {
>>>> +			mutex_lock(&req->r_fill_mutex);
>>>> +			req->r_err = -EIO;
>>> Not sure -EIO is the most appropriate.  Maybe -E2BIG... although not quite
>>> it either.
>>>
>> Yeah, I also not very sure here.
>>
>> Jeff ?
>>
> Matching errors like this really comes down to a judgement call.  E2BIG
> usually means that some buffer was sized too small, so you'll have users
> trying to figure out what they passed in wrong if you return that here.
>
> -EIO is the usual "default" when you don't know what else to use.
> There's also -EREMOTEIO which may be closer here since this is
> indicative of MDS problems. Given that, it may also be a good idea to
> log a pr_warn or pr_notice message at the same time explaining what
> happened.

As discussed in IRC, have switched to EMULTIHOP instead.

Thanks Jeff.

-- Xiubo

>
>>>> +			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
>>>> +			mutex_unlock(&req->r_fill_mutex);
>>>> +			aborted = true;
>>>> +			dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
>>>> +			     tid, next_mds, req->r_num_fwd, fwd_seq);
>>>> +			goto out;
>>> This 'goto' statement can be dropped, but one before (when the
>>> lookup_get_request() fails) needs to be adjusted, otherwise
>>> ceph_mdsc_put_request() may be called with a NULL pointer.
>> Yeah, will fix it.
>>
>> Thanks.
>>
>> -- Xiubo
>>
>>
>>> Cheers,

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a89ee866ebbb..0bb6e7bc499c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3293,6 +3293,7 @@  static void handle_forward(struct ceph_mds_client *mdsc,
 	int err = -EINVAL;
 	void *p = msg->front.iov_base;
 	void *end = p + msg->front.iov_len;
+	bool aborted = false;
 
 	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
 	next_mds = ceph_decode_32(&p);
@@ -3309,8 +3310,28 @@  static void handle_forward(struct ceph_mds_client *mdsc,
 		dout("forward tid %llu aborted, unregistering\n", tid);
 		__unregister_request(mdsc, req);
 	} else if (fwd_seq <= req->r_num_fwd) {
-		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
-		     tid, next_mds, req->r_num_fwd, fwd_seq);
+		/*
+		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
+		 * is 'int32_t', while in 'ceph_mds_request_head' the
+		 * type is '__u8'. So in case the request bounces between
+		 * MDSes exceeding 256 times, the client will get stuck.
+		 *
+		 * In this case it's ususally a bug in MDS and continue
+		 * bouncing the request makes no sense.
+		 */
+		if (req->r_num_fwd == 256) {
+			mutex_lock(&req->r_fill_mutex);
+			req->r_err = -EIO;
+			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
+			mutex_unlock(&req->r_fill_mutex);
+			aborted = true;
+			dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
+			     tid, next_mds, req->r_num_fwd, fwd_seq);
+			goto out;
+		} else {
+			dout("forward tid %llu to mds%d - old seq %d <= %d\n",
+			     tid, next_mds, req->r_num_fwd, fwd_seq);
+		}
 	} else {
 		/* resend. forward race not possible; mds would drop */
 		dout("forward tid %llu to mds%d (we resend)\n", tid, next_mds);
@@ -3322,9 +3343,13 @@  static void handle_forward(struct ceph_mds_client *mdsc,
 		put_request_session(req);
 		__do_request(mdsc, req);
 	}
-	ceph_mdsc_put_request(req);
 out:
 	mutex_unlock(&mdsc->mutex);
+
+	/* kick calling process */
+	if (aborted)
+		complete_request(mdsc, req);
+	ceph_mdsc_put_request(req);
 	return;
 
 bad:

ceph: stop forwarding the request when exceeding 256 times

Commit Message

Comments

Patch