ceph: stop retrying the request when exceeding 256 times

Message ID

20220330064444.330384-1-xiubli@redhat.com (mailing list archive)

State

New, archived

Headers

Return-Path: <ceph-devel-owner@kernel.org>
From: xiubli@redhat.com
To: jlayton@kernel.org
Cc: idryomov@gmail.com, vshankar@redhat.com,
        ceph-devel@vger.kernel.org, Xiubo Li <xiubli@redhat.com>
Subject: [PATCH] ceph: stop retrying the request when exceeding 256 times
Date: Wed, 30 Mar 2022 14:44:44 +0800
Message-Id: <20220330064444.330384-1-xiubli@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

ceph: stop retrying the request when exceeding 256 times

Commit Message

Xiubo Li March 30, 2022, 6:44 a.m. UTC

From: Xiubo Li <xiubli@redhat.com>

The type of 'r_attempts' in the kernel's 'ceph_mds_request' is 'int',
while the type of 'num_retry' in 'ceph_mds_request_head' is '__u8'.
So if a request is retried more than 256 times, the MDS will
receive an incorrect retry seq.

In that case it is usually a bug in the MDS, and continuing to retry
the request makes no sense. For now, limit retries to 256. In the
future this could be fixed in the ceph code, so derive the limit from
the size of 'num_retry' rather than hardcoding it here.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)
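
The type mismatch described in the commit message can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct wire_head` and `encode_retry()` are hypothetical names, and `uint8_t` stands in for the kernel's `__u8`. It shows how an `int` counter loses its upper bits when stored into a one-byte field.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the on-wire request header.
 * Only the width of the field matters for this illustration.
 */
struct wire_head {
	uint8_t num_retry;	/* same width as the '__u8' num_retry field */
};

/*
 * Store an int attempt counter into the 8-bit field without any
 * range check, and return what the receiver would see.
 */
static uint8_t encode_retry(int attempts)
{
	struct wire_head head;

	head.num_retry = (uint8_t)attempts;	/* keeps only the low 8 bits */
	return head.num_retry;
}
```

With this sketch, a counter of 255 is transmitted unchanged, while 256 arrives as 0 and 257 as 1, which is the kind of incorrect retry seq the patch guards against.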

Comments

Jeff Layton March 30, 2022, 10:36 a.m. UTC | #1

On Wed, 2022-03-30 at 14:44 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> The type of 'r_attempts' in the kernel's 'ceph_mds_request' is 'int',
> while the type of 'num_retry' in 'ceph_mds_request_head' is '__u8'.
> So if a request is retried more than 256 times, the MDS will
> receive an incorrect retry seq.
> 
> In that case it is usually a bug in the MDS, and continuing to retry
> the request makes no sense. For now, limit retries to 256. In the
> future this could be fixed in the ceph code, so derive the limit from
> the size of 'num_retry' rather than hardcoding it here.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/mds_client.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index e11d31401f12..f476c65fb985 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2679,7 +2679,28 @@ static int __prepare_send_request(struct ceph_mds_session *session,
>  	struct ceph_mds_client *mdsc = session->s_mdsc;
>  	struct ceph_mds_request_head_old *rhead;
>  	struct ceph_msg *msg;
> -	int flags = 0;
> +	int flags = 0, max_retry;
> +
> +	/*
> +	 * The type of 'r_attempts' in kernel 'ceph_mds_request'
> +	 * is 'int', while in 'ceph_mds_request_head' the type of
> +	 * 'num_retry' is '__u8'. So if the request is retried
> +	 * more than 256 times, the MDS will receive an incorrect
> +	 * retry seq.
> +	 *
> +	 * In this case it's usually a bug in the MDS, and continuing
> +	 * to retry the request makes no sense.
> +	 *
> +	 * In the future this could be fixed in the ceph code, so
> +	 * avoid hardcoding the limit here.
> +	 */
> +	max_retry = sizeof_field(struct ceph_mds_request_head, num_retry);
> +	max_retry = 1 << (max_retry * BITS_PER_BYTE);
> +	if (req->r_attempts >= max_retry) {
> +		pr_warn_ratelimited("%s request tid %llu seq overflow\n",
> +				    __func__, req->r_tid);
> +		return -EMULTIHOP;
> +	}
>  
>  	req->r_attempts++;
>  	if (req->r_inode) {
> @@ -2691,7 +2712,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
>  		else
>  			req->r_sent_on_mseq = -1;
>  	}
> -	dout("prepare_send_request %p tid %lld %s (attempt %d)\n", req,
> +	dout("%s %p tid %lld %s (attempt %d)\n", __func__, req,
>  	     req->r_tid, ceph_mds_op_name(req->r_op), req->r_attempts);
>  
>  	if (test_bit(CEPH_MDS_R_GOT_UNSAFE, &req->r_req_flags)) {

Reviewed-by: Jeff Layton <jlayton@kernel.org>

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e11d31401f12..f476c65fb985 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2679,7 +2679,28 @@  static int __prepare_send_request(struct ceph_mds_session *session,
 	struct ceph_mds_client *mdsc = session->s_mdsc;
 	struct ceph_mds_request_head_old *rhead;
 	struct ceph_msg *msg;
-	int flags = 0;
+	int flags = 0, max_retry;
+
+	/*
+	 * The type of 'r_attempts' in kernel 'ceph_mds_request'
+	 * is 'int', while in 'ceph_mds_request_head' the type of
+	 * 'num_retry' is '__u8'. So if the request is retried
+	 * more than 256 times, the MDS will receive an incorrect
+	 * retry seq.
+	 *
+	 * In this case it's usually a bug in the MDS, and continuing
+	 * to retry the request makes no sense.
+	 *
+	 * In the future this could be fixed in the ceph code, so
+	 * avoid hardcoding the limit here.
+	 */
+	max_retry = sizeof_field(struct ceph_mds_request_head, num_retry);
+	max_retry = 1 << (max_retry * BITS_PER_BYTE);
+	if (req->r_attempts >= max_retry) {
+		pr_warn_ratelimited("%s request tid %llu seq overflow\n",
+				    __func__, req->r_tid);
+		return -EMULTIHOP;
+	}
 
 	req->r_attempts++;
 	if (req->r_inode) {
@@ -2691,7 +2712,7 @@  static int __prepare_send_request(struct ceph_mds_session *session,
 		else
 			req->r_sent_on_mseq = -1;
 	}
-	dout("prepare_send_request %p tid %lld %s (attempt %d)\n", req,
+	dout("%s %p tid %lld %s (attempt %d)\n", __func__, req,
 	     req->r_tid, ceph_mds_op_name(req->r_op), req->r_attempts);
 
 	if (test_bit(CEPH_MDS_R_GOT_UNSAFE, &req->r_req_flags)) {
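
The limit computation and the guard in the hunk above can be tried in userspace. This is a sketch under stated assumptions, not the kernel implementation: `struct demo_request_head`, `demo_sizeof_field()`, `demo_max_retry()` and `demo_prepare_send()` are hypothetical names, `CHAR_BIT` stands in for the kernel's `BITS_PER_BYTE`, and the macro imitates what the kernel's `sizeof_field()` is used for here.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for 'ceph_mds_request_head'; only the field width matters. */
struct demo_request_head {
	uint8_t num_retry;
};

/* Userspace imitation of sizeof_field(): size of a struct member in bytes. */
#define demo_sizeof_field(type, member) (sizeof(((type *)0)->member))

/* Number of distinct values the field can hold: 1 << 8 for a one-byte field. */
static int demo_max_retry(void)
{
	int bytes = (int)demo_sizeof_field(struct demo_request_head, num_retry);

	return 1 << (bytes * CHAR_BIT);
}

/*
 * Same order as the patch: refuse once the limit is reached,
 * otherwise count the attempt and let the request go out.
 */
static int demo_prepare_send(int *attempts)
{
	if (*attempts >= demo_max_retry())
		return -EMULTIHOP;

	(*attempts)++;
	return 0;
}
```

Because the limit is derived from the field size, widening `num_retry` in this sketch would raise the limit without touching the guard, which appears to be the motivation for not hardcoding 256.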
