9p/trans_fd: avoid sending req to a cancelled conn

Message ID: AA2DB53B-DFC7-4B88-9515-E4C9AFA6435D@gmail.com (mailing list archive)
State: New, archived
Delegated to: Eric Van Hensbergen
Series: 9p/trans_fd: avoid sending req to a cancelled conn

Commit Message

Sishuai Gong Aug. 8, 2023, 4:44 p.m. UTC
When a connection is cancelled by p9_conn_cancel(), all requests on it
should be cancelled, i.e. req->status should be marked REQ_STATUS_ERROR.
However, because of a race over m->err between p9_conn_cancel() and
p9_fd_request(), p9_fd_request() might see the old value of m->err,
conclude that the connection is NOT cancelled, and then add new requests
to this already-cancelled connection.

Fix this issue by lock-protecting the check on m->err.

Signed-off-by: Sishuai Gong <sishuai.system@gmail.com>
---
 net/9p/trans_fd.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--
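
To make the race concrete, here is a minimal standalone model of the two
paths involved (an editor's sketch in userspace C with pthreads; conn_t,
conn_cancel() and the two fd_request variants are simplified stand-ins for
struct p9_conn, p9_conn_cancel() and p9_fd_request(), not the kernel code):

#include <pthread.h>
#include <stdio.h>

/* Simplified stand-in for struct p9_conn. */
typedef struct {
	int err;                  /* 0 = live, < 0 = cancelled */
	pthread_mutex_t req_lock; /* models m->req_lock */
	int queued;               /* models m->unsent_req_list */
} conn_t;

/* Models p9_conn_cancel(): marks the connection dead under the lock
 * (the real code also splices out and errors all queued requests). */
static void conn_cancel(conn_t *m, int err)
{
	pthread_mutex_lock(&m->req_lock);
	m->err = err;
	pthread_mutex_unlock(&m->req_lock);
}

/* The buggy shape of p9_fd_request(): the unlocked check can read a
 * stale m->err == 0, after which the request is queued on a connection
 * that has already been drained, so nobody will ever complete it. */
static int fd_request_buggy(conn_t *m)
{
	if (m->err < 0)           /* RACE: read outside req_lock */
		return m->err;
	pthread_mutex_lock(&m->req_lock);
	m->queued++;
	pthread_mutex_unlock(&m->req_lock);
	return 0;
}

/* The fixed shape: the check and the enqueue form one critical
 * section, so a request is queued only if no cancel happened yet. */
static int fd_request_fixed(conn_t *m)
{
	pthread_mutex_lock(&m->req_lock);
	if (m->err < 0) {
		pthread_mutex_unlock(&m->req_lock);
		return m->err;
	}
	m->queued++;
	pthread_mutex_unlock(&m->req_lock);
	return 0;
}

int main(void)
{
	conn_t m = { 0, PTHREAD_MUTEX_INITIALIZER, 0 };

	conn_cancel(&m, -5 /* -EIO */);
	printf("buggy: %d, fixed: %d\n",
	       fd_request_buggy(&m), fd_request_fixed(&m));
	return 0;
}

In the buggy variant a cancel can slip in between the unlocked check and
the lock acquisition; in the fixed variant that window no longer exists.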

Comments

Dominique Martinet Oct. 23, 2023, 12:54 p.m. UTC | #1
Sishuai Gong wrote on Tue, Aug 08, 2023 at 12:44:31PM -0400:
> When a connection is cancelled by p9_conn_cancel(), all requests on it
> should be cancelled, i.e. req->status should be marked REQ_STATUS_ERROR.
> However, because of a race over m->err between p9_conn_cancel() and
> p9_fd_request(), p9_fd_request() might see the old value of m->err,
> conclude that the connection is NOT cancelled, and then add new requests
> to this already-cancelled connection.
> 
> Fix this issue by lock-protecting the check on m->err.

Sorry for the (long) delay in reviewing this, and thanks for the patch!

I've got a bit of mixed feelings on this, as there are other obviously
unlocked m->err accesses in the work functions and we don't really want
to take the req lock there every time... These should be OK in the normal
p9_conn_destroy() path, which disarms poll and cancels work before calling
p9_conn_cancel(), but the other calls to p9_conn_cancel() are similarly
racy, and for example a cancel initiated by the write end could process a
new request in p9_read_work after the reqs have been drained out...

On the other hand, this is better than the current situation, so I
think we should take it for now until someone has time to do better;
it's just a leak in the unlikely case this race is hit, and we might as
well close part of the problem.

I don't see how that'd break anything anyway, so I'll push this to my
next branch tomorrow after testing the rest of my patches.
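
The safe ordering referred to above looks roughly like this (an
abbreviated editor's sketch of p9_conn_destroy() in net/9p/trans_fd.c;
exact names and surrounding code may differ):

static void p9_conn_destroy(struct p9_conn *m)
{
	p9_mux_poll_stop(m);            /* disarm poll: no new wakeups */
	cancel_work_sync(&m->rq);       /* wait out p9_read_work... */
	cancel_work_sync(&m->wq);       /* ...and p9_write_work */

	p9_conn_cancel(m, -ECONNRESET); /* only now set m->err */
	/* ... */
}

Once the work items are quiesced, nothing reads m->err without the lock
anymore, which is why the unlocked accesses in the work functions are
tolerable on this path but not when p9_conn_cancel() is called from the
I/O paths themselves.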

> 
> Signed-off-by: Sishuai Gong <sishuai.system@gmail.com>
> ---
>  net/9p/trans_fd.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 00b684616e8d..e43a850f5190 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -671,10 +671,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> 
>  	p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
>  		 m, current, &req->tc, req->tc.id);
> -	if (m->err < 0)
> -		return m->err;
> 
>  	spin_lock(&m->req_lock);
> +
> +	if (m->err < 0) {
> +		spin_unlock(&m->req_lock);
> +		return m->err;
> +	}
> +
>  	WRITE_ONCE(req->status, REQ_STATUS_UNSENT);
>  	list_add_tail(&req->req_list, &m->unsent_req_list);
>  	spin_unlock(&m->req_lock);
> --
> 
>
Christian Schoenebeck Oct. 23, 2023, 1:29 p.m. UTC | #2
On Monday, October 23, 2023 2:54:53 PM CEST asmadeus@codewreck.org wrote:
> Sishuai Gong wrote on Tue, Aug 08, 2023 at 12:44:31PM -0400:
> > When a connection is cancelled by p9_conn_cancel(), all requests on it
> > should be cancelled, i.e. req->status should be marked REQ_STATUS_ERROR.
> > However, because of a race over m->err between p9_conn_cancel() and
> > p9_fd_request(), p9_fd_request() might see the old value of m->err,
> > conclude that the connection is NOT cancelled, and then add new requests
> > to this already-cancelled connection.
> > 
> > Fix this issue by lock-protecting the check on m->err.
> 
> Sorry for the (long) delay in reviewing this, and thanks for the patch!
> 
> I've got a bit of mixed feelings on this, as there are other obviously
> unlocked m->err accesses in the work functions and we don't really want
> to take the req lock there every time... These should be OK in the normal
> p9_conn_destroy() path, which disarms poll and cancels work before calling
> p9_conn_cancel(), but the other calls to p9_conn_cancel() are similarly
> racy, and for example a cancel initiated by the write end could process a
> new request in p9_read_work after the reqs have been drained out...
> 
> On the other hand, this is better than the current situation, so I
> think we should take it for now until someone has time to do better;
> it's just a leak in the unlikely case this race is hit, and we might as
> well close part of the problem.
> 
> I don't see how that'd break anything anyway, so I'll push this to my
> next branch tomorrow after testing the rest of my patches.

Same here: I don't see this making things worse, so it's at least a small improvement:

Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>

> > 
> > Signed-off-by: Sishuai Gong <sishuai.system@gmail.com>
> > ---
> >  net/9p/trans_fd.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> > index 00b684616e8d..e43a850f5190 100644
> > --- a/net/9p/trans_fd.c
> > +++ b/net/9p/trans_fd.c
> > @@ -671,10 +671,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> > 
> >  	p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
> >  		 m, current, &req->tc, req->tc.id);
> > -	if (m->err < 0)
> > -		return m->err;
> > 
> >  	spin_lock(&m->req_lock);
> > +
> > +	if (m->err < 0) {
> > +		spin_unlock(&m->req_lock);
> > +		return m->err;
> > +	}
> > +
> >  	WRITE_ONCE(req->status, REQ_STATUS_UNSENT);
> >  	list_add_tail(&req->req_list, &m->unsent_req_list);
> >  	spin_unlock(&m->req_lock);
> > --
> > 
> >

Patch

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 00b684616e8d..e43a850f5190 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -671,10 +671,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)

 	p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
 		 m, current, &req->tc, req->tc.id);
-	if (m->err < 0)
-		return m->err;

 	spin_lock(&m->req_lock);
+
+	if (m->err < 0) {
+		spin_unlock(&m->req_lock);
+		return m->err;
+	}
+
 	WRITE_ONCE(req->status, REQ_STATUS_UNSENT);
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
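
For context, the cancel side that this pairs with looks roughly as follows
(an abbreviated editor's sketch of p9_conn_cancel() around this patch; the
draining details are paraphrased):

static void p9_conn_cancel(struct p9_conn *m, int err)
{
	LIST_HEAD(cancel_list);

	spin_lock(&m->req_lock);
	if (m->err) {                   /* already cancelled once */
		spin_unlock(&m->req_lock);
		return;
	}
	m->err = err;
	/* Drain everything queued so far; each drained req is then
	 * marked REQ_STATUS_ERROR and completed outside the lock. */
	list_splice_init(&m->unsent_req_list, &cancel_list);
	list_splice_init(&m->req_list, &cancel_list);
	spin_unlock(&m->req_lock);

	/* ... error out the requests on cancel_list ... */
}

With the fix, p9_fd_request() tests m->err inside the same m->req_lock
critical section, so a new request is either queued before the splice
above (and errored out with the rest) or observes m->err < 0 and is
refused; the window in which a request could be queued after the drain
is gone.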