
[RFC] xprtrdma: Fix XDRBUF_SPARSE_PAGES support

Message ID 160711969521.12486.14622797339746381076.stgit@manet.1015granger.net (mailing list archive)
State New, archived
Series [RFC] xprtrdma: Fix XDRBUF_SPARSE_PAGES support

Commit Message

Chuck Lever Dec. 4, 2020, 10:11 p.m. UTC
Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
only when it has determined that a Reply chunk is necessary. There
are plenty of cases where no Reply chunk is needed, but the
XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
rpcrdma_inline_fixup() when it tries to copy parts of the received
Reply into a missing page.

To avoid crashing, handle sparse page allocation up front.

Until XATTR support was added, this issue did not appear often
because the only SPARSE_PAGES consumer always expected a reply large
enough to require a Reply chunk.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   41 +++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

Here's a stop-gap that can be back-ported to earlier kernels if needed.
Untested. Comments welcome.
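
A simplified model of the failure mode may help (this is not the actual
xprtrdma code; copy_reply_pages() below is only a stand-in for the
page-copy step of rpcrdma_inline_fixup(), and the page array plays the
role of rq_rcv_buf.pages). When XDRBUF_SPARSE_PAGES defers page
allocation and no Reply chunk path runs, NULL slots remain and the copy
faults:

#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Stand-in for the page-copy step of rpcrdma_inline_fixup(): copy
 * "len" received bytes into the buffer's page array. "pages" plays
 * the role of xdr_buf::pages and may contain NULL slots when
 * XDRBUF_SPARSE_PAGES deferred allocation.
 */
static void copy_reply_pages(void **pages, const char *src, size_t len)
{
	while (len) {
		size_t n = len < MODEL_PAGE_SIZE ? len : MODEL_PAGE_SIZE;

		/* A sparse slot that was never filled is NULL here,
		 * and this copy faults -- the crash this patch avoids
		 * by allocating receive pages up front.
		 */
		memcpy(*pages, src, n);
		src += n;
		len -= n;
		pages++;
	}
}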

Comments

Olga Kornievskaia Dec. 8, 2020, 5:03 p.m. UTC | #1
Hi Chuck,

Is the only user of SPARSE_PAGES v3 ACLs? Are you fixing this so
that v3 ACLs would work over rdma?

On Fri, Dec 4, 2020 at 5:13 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
> only when it has determined that a Reply chunk is necessary. There
> are plenty of cases where no Reply chunk is needed, but the
> XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
> rpcrdma_inline_fixup() when it tries to copy parts of the received
> Reply into a missing page.
>
> To avoid crashing, handle sparse page allocation up front.
>
> Until XATTR support was added, this issue did not appear often
> because the only SPARSE_PAGES consumer always expected a reply large
> enough to require a Reply chunk.
>
> Reported-by: Olga Kornievskaia <kolga@netapp.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c |   41 +++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
>
> Here's a stop-gap that can be back-ported to earlier kernels if needed.
> Untested. Comments welcome.
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 0f5120c7668f..09b7649fa112 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -179,6 +179,32 @@ rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
>                 r_xprt->rx_ep->re_max_inline_recv;
>  }
>
> +/* ACL likes to be lazy in allocating pages. For TCP, these
> + * pages can be allocated during receive processing. Not true
> + * for RDMA, which must always provision receive buffers
> + * up front.
> + */
> +static int
> +rpcrdma_alloc_sparse_pages(struct rpc_rqst *rqst)
> +{
> +       struct xdr_buf *xb = &rqst->rq_rcv_buf;
> +       struct page **ppages;
> +       int len;
> +
> +       len = xb->page_len;
> +       ppages = xb->pages + xdr_buf_pagecount(xb);
> +       while (len > 0) {
> +               if (!*ppages)
> +                       *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
> +               if (!*ppages)
> +                       return -ENOBUFS;
> +               ppages++;
> +               len -= PAGE_SIZE;
> +       }
> +
> +       return 0;
> +}
> +
>  /* Split @vec on page boundaries into SGEs. FMR registers pages, not
>   * a byte range. Other modes coalesce these SGEs into a single MR
>   * when they can.
> @@ -233,15 +259,6 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>         ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
>         page_base = offset_in_page(xdrbuf->page_base);
>         while (len) {
> -               /* ACL likes to be lazy in allocating pages - ACLs
> -                * are small by default but can get huge.
> -                */
> -               if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) {
> -                       if (!*ppages)
> -                               *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
> -                       if (!*ppages)
> -                               return -ENOBUFS;
> -               }
>                 seg->mr_page = *ppages;
>                 seg->mr_offset = (char *)page_base;
>                 seg->mr_len = min_t(u32, PAGE_SIZE - page_base, len);
> @@ -867,6 +884,12 @@ rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
>         __be32 *p;
>         int ret;
>
> +       if (unlikely(rqst->rq_rcv_buf.flags & XDRBUF_SPARSE_PAGES)) {
> +               ret = rpcrdma_alloc_sparse_pages(rqst);
> +               if (ret)
> +                       return ret;
> +       }
> +
>         rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
>         xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf),
>                         rqst);
>
>
Chuck Lever Dec. 8, 2020, 5:16 p.m. UTC | #2
> On Dec 8, 2020, at 12:03 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> Hi Chuck,
> 
> Is the only user of SPARSE_PAGES v3 ACLs?

There was one other, the gssproxy upcall, which I've proposed a fix
for, for v5.11. After that fix goes in and Frank's replacements for
GETXATTR and LISTXATTRS are merged, NFSv3 GETACL will be the only
user of XDRBUF_SPARSE_PAGES.


> Are you fixing this so that v3 ACLs would work over rdma?

As far as I know, NFSv3 GETACL works fine over RDMA because it always
expects a large Reply, and thus rpcrdma_marshal_req prepares a Reply
chunk. That triggers the code that allocates the receive pages
properly.
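
Put differently, here is a paraphrased sketch of the pre-fix behaviour
(not the real marshaling code; the boolean parameters stand in for the
inline-size test and for XDRBUF_SPARSE_PAGES being set on rq_rcv_buf):

#include <stdbool.h>

/* Were all receive pages present before this patch? Sparse pages were
 * allocated only on the Reply chunk path (inside rpcrdma_convert_iovs()),
 * so a sparse request whose Reply fit inline kept its NULL page slots.
 */
static bool pre_fix_rcv_pages_present(bool reply_fits_inline,
				      bool sparse_pages)
{
	if (!sparse_pages)
		return true;	/* caller provided every page itself */
	if (reply_fits_inline)
		return false;	/* no Reply chunk: holes never filled */
	/* Reply chunk path: missing pages were allocated during
	 * rpcrdma_convert_iovs(), which is why NFSv3 GETACL -- always
	 * a large Reply -- worked over RDMA before this patch.
	 */
	return true;
}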

But you might be asking me why I'm sending this patch. Discussion?
Belt and suspenders? Fodder for backport, just in case?

It's always possible that someone might add another XDRBUF_SPARSE_PAGES
user before we are able to update NFSv3 GETACL. The below patch should
help make that situation less problematic, if it should occur.

So, after Frank's patches are merged, this fix is not addressing an
existing active crasher.


> On Fri, Dec 4, 2020 at 5:13 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
>> only when it has determined that a Reply chunk is necessary. There
>> are plenty of cases where no Reply chunk is needed, but the
>> XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
>> rpcrdma_inline_fixup() when it tries to copy parts of the received
>> Reply into a missing page.
>> 
>> To avoid crashing, handle sparse page allocation up front.
>> 
>> Until XATTR support was added, this issue did not appear often
>> because the only SPARSE_PAGES consumer always expected a reply large
>> enough to require a Reply chunk.
>> 
>> Reported-by: Olga Kornievskaia <kolga@netapp.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> net/sunrpc/xprtrdma/rpc_rdma.c |   41 +++++++++++++++++++++++++++++++---------
>> 1 file changed, 32 insertions(+), 9 deletions(-)
>> 
>> Here's a stop-gap that can be back-ported to earlier kernels if needed.
>> Untested. Comments welcome.
>> 
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>> index 0f5120c7668f..09b7649fa112 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -179,6 +179,32 @@ rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
>>                r_xprt->rx_ep->re_max_inline_recv;
>> }
>> 
>> +/* ACL likes to be lazy in allocating pages. For TCP, these
>> + * pages can be allocated during receive processing. Not true
>> + * for RDMA, which must always provision receive buffers
>> + * up front.
>> + */
>> +static int
>> +rpcrdma_alloc_sparse_pages(struct rpc_rqst *rqst)
>> +{
>> +       struct xdr_buf *xb = &rqst->rq_rcv_buf;
>> +       struct page **ppages;
>> +       int len;
>> +
>> +       len = xb->page_len;
>> +       ppages = xb->pages + xdr_buf_pagecount(xb);
>> +       while (len > 0) {
>> +               if (!*ppages)
>> +                       *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
>> +               if (!*ppages)
>> +                       return -ENOBUFS;
>> +               ppages++;
>> +               len -= PAGE_SIZE;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> /* Split @vec on page boundaries into SGEs. FMR registers pages, not
>>  * a byte range. Other modes coalesce these SGEs into a single MR
>>  * when they can.
>> @@ -233,15 +259,6 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>>        ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
>>        page_base = offset_in_page(xdrbuf->page_base);
>>        while (len) {
>> -               /* ACL likes to be lazy in allocating pages - ACLs
>> -                * are small by default but can get huge.
>> -                */
>> -               if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) {
>> -                       if (!*ppages)
>> -                               *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
>> -                       if (!*ppages)
>> -                               return -ENOBUFS;
>> -               }
>>                seg->mr_page = *ppages;
>>                seg->mr_offset = (char *)page_base;
>>                seg->mr_len = min_t(u32, PAGE_SIZE - page_base, len);
>> @@ -867,6 +884,12 @@ rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
>>        __be32 *p;
>>        int ret;
>> 
>> +       if (unlikely(rqst->rq_rcv_buf.flags & XDRBUF_SPARSE_PAGES)) {
>> +               ret = rpcrdma_alloc_sparse_pages(rqst);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>>        rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
>>        xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf),
>>                        rqst);
>> 
>> 

--
Chuck Lever

Patch

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 0f5120c7668f..09b7649fa112 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -179,6 +179,32 @@  rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
 		r_xprt->rx_ep->re_max_inline_recv;
 }
 
+/* ACL likes to be lazy in allocating pages. For TCP, these
+ * pages can be allocated during receive processing. Not true
+ * for RDMA, which must always provision receive buffers
+ * up front.
+ */
+static int
+rpcrdma_alloc_sparse_pages(struct rpc_rqst *rqst)
+{
+	struct xdr_buf *xb = &rqst->rq_rcv_buf;
+	struct page **ppages;
+	int len;
+
+	len = xb->page_len;
+	ppages = xb->pages + xdr_buf_pagecount(xb);
+	while (len > 0) {
+		if (!*ppages)
+			*ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
+		if (!*ppages)
+			return -ENOBUFS;
+		ppages++;
+		len -= PAGE_SIZE;
+	}
+
+	return 0;
+}
+
 /* Split @vec on page boundaries into SGEs. FMR registers pages, not
  * a byte range. Other modes coalesce these SGEs into a single MR
  * when they can.
@@ -233,15 +259,6 @@  rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
 	ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
 	page_base = offset_in_page(xdrbuf->page_base);
 	while (len) {
-		/* ACL likes to be lazy in allocating pages - ACLs
-		 * are small by default but can get huge.
-		 */
-		if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) {
-			if (!*ppages)
-				*ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
-			if (!*ppages)
-				return -ENOBUFS;
-		}
 		seg->mr_page = *ppages;
 		seg->mr_offset = (char *)page_base;
 		seg->mr_len = min_t(u32, PAGE_SIZE - page_base, len);
@@ -867,6 +884,12 @@  rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
 	__be32 *p;
 	int ret;
 
+	if (unlikely(rqst->rq_rcv_buf.flags & XDRBUF_SPARSE_PAGES)) {
+		ret = rpcrdma_alloc_sparse_pages(rqst);
+		if (ret)
+			return ret;
+	}
+
 	rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
 	xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf),
 			rqst);