
[v5] xprtrdma: Fix XDRBUF_SPARSE_PAGES support

Message ID 160746979784.1926.1490061321200284214.stgit@manet.1015granger.net (mailing list archive)
State New, archived
Series: [v5] xprtrdma: Fix XDRBUF_SPARSE_PAGES support

Commit Message

Chuck Lever Dec. 8, 2020, 11:29 p.m. UTC
Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
only when it has determined that a Reply chunk is necessary. There
are plenty of cases where no Reply chunk is needed, but the
XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
rpcrdma_inline_fixup() when it tries to copy parts of the received
Reply into a missing page.

To avoid crashing, handle sparse page allocation up front.

Until XATTR support was added, this issue did not appear often
because the only SPARSE_PAGES consumer always expected a reply
large enough to require a Reply chunk.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

Hi-

v4 had a bug, which I've fixed. This version has been tested.

In kernels before 5.10-rc5, there are still problems with the way
LISTXATTRS and GETXATTR deal with the tail / XDR pad for the page
content that this patch does not address. So backporting this fix
alone is not enough to get those working again -- more surgery would
be required.

Since none of the other SPARSE_PAGES users have a problem, let's
leave this one on the cutting room floor. It's here in the mail
archive if anyone needs it.
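
For anyone who wants to see the failure mode in miniature, here is a
hedged userspace sketch. It is not kernel code: model_xdr_buf,
model_inline_fixup(), and model_alloc_sparse_pages() are made-up
stand-ins for xdr_buf, rpcrdma_inline_fixup(), and the new
rpcrdma_alloc_sparse_pages(), and MODEL_PAGE_SIZE stands in for
PAGE_SIZE. The point is only to show why copying a reply into a
sparse page array crashes when the pages were never allocated, and
why allocating them up front avoids it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* stand-in for PAGE_SIZE */

struct model_xdr_buf {
	char *pages[4];		/* sparse: entries start out NULL */
	size_t page_len;	/* bytes of reply expected in pages */
};

/* Stand-in for rpcrdma_inline_fixup(): copy received reply bytes
 * into the buffer's pages. It dereferences each page pointer
 * unconditionally, which is what oopses when a page is missing.
 */
static void model_inline_fixup(struct model_xdr_buf *buf,
			       const char *reply, size_t len)
{
	size_t i, n;

	for (i = 0; len > 0; i++, reply += n, len -= n) {
		n = len < MODEL_PAGE_SIZE ? len : MODEL_PAGE_SIZE;
		memcpy(buf->pages[i], reply, n);
	}
}

/* Stand-in for rpcrdma_alloc_sparse_pages(): fill in any missing
 * pages before anything tries to copy into them.
 */
static int model_alloc_sparse_pages(struct model_xdr_buf *buf)
{
	long len;
	size_t i;

	for (i = 0, len = buf->page_len; len > 0;
	     i++, len -= MODEL_PAGE_SIZE) {
		if (!buf->pages[i])
			buf->pages[i] = malloc(MODEL_PAGE_SIZE);
		if (!buf->pages[i])
			return -1;
	}
	return 0;
}

int main(void)
{
	struct model_xdr_buf buf = { .page_len = 5000 };
	static char reply[5000];

	/* Comment out the next two lines to reproduce the crash. */
	if (model_alloc_sparse_pages(&buf))
		return 1;

	model_inline_fixup(&buf, reply, buf.page_len);
	puts("reply copied without oopsing");
	return 0;
}

The kernel version allocates with GFP_NOWAIT | __GFP_NOWARN and fails
with -ENOBUFS, presumably because the marshaling path would rather
fail fast and retry later than sleep waiting for memory.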


Changes since v4:
- xdr_buf_pagecount() was simply the wrong thing to use.

Changes since v3:
- I swear I am not drunk. I forgot to commit the change before mailing it.

Changes since v2:
- Actually fix the xdr_buf_pagecount() problem

Changes since RFC:
- Ensure xdr_buf_pagecount() is defined in rpc_rdma.c
- noinline the sparse page allocator -- it's an uncommon path
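
A note on the v4->v5 change above: rather than reusing a generic
page-count helper, the final loop derives its own slot range directly
from page_base and page_len. I won't reproduce xdr_buf_pagecount()
here; the arithmetic the loop actually relies on is small enough to
show standalone. In this sketch PAGE_SHIFT is assumed to be 12 (4KB
pages) and show() is my own name:

#include <stdio.h>

#define PAGE_SHIFT 12			/* illustrative: 4KB pages */
#define PAGE_SIZE  (1L << PAGE_SHIFT)

/* Mirror of the loop bounds in rpcrdma_alloc_sparse_pages(): start
 * at the slot that page_base points into, then consume one slot per
 * PAGE_SIZE bytes of page_len.
 */
static void show(unsigned int page_base, unsigned int page_len)
{
	unsigned int first = page_base >> PAGE_SHIFT;
	unsigned int slots = 0;
	long len;

	for (len = page_len; len > 0; len -= PAGE_SIZE)
		slots++;
	printf("page_base=%u page_len=%u -> pages[%u..%u]\n",
	       page_base, page_len, first, first + slots - 1);
}

int main(void)
{
	show(0, 5000);		/* pages[0..1] */
	show(8192, 4096);	/* pages[2..2] */
	return 0;
}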

Comments

Olga Kornievskaia Dec. 9, 2020, 4:47 p.m. UTC | #1
On Tue, Dec 8, 2020 at 6:31 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> v4 had a bug, which I've fixed. This version has been tested.

This version, on top of the same commit (rc4), passes generic/013
without oopsing for me too.

Chuck Lever Dec. 9, 2020, 4:58 p.m. UTC | #2
> On Dec 9, 2020, at 11:47 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> This version, on top of the same commit (rc4), passes generic/013
> without oopsing for me too.

Excellent, thanks for confirming!


--
Chuck Lever

Patch

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 0f5120c7668f..c48536f2121f 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -179,6 +179,31 @@  rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
 		r_xprt->rx_ep->re_max_inline_recv;
 }
 
+/* ACL likes to be lazy in allocating pages. For TCP, these
+ * pages can be allocated during receive processing. Not true
+ * for RDMA, which must always provision receive buffers
+ * up front.
+ */
+static noinline int
+rpcrdma_alloc_sparse_pages(struct xdr_buf *buf)
+{
+	struct page **ppages;
+	int len;
+
+	len = buf->page_len;
+	ppages = buf->pages + (buf->page_base >> PAGE_SHIFT);
+	while (len > 0) {
+		if (!*ppages)
+			*ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
+		if (!*ppages)
+			return -ENOBUFS;
+		ppages++;
+		len -= PAGE_SIZE;
+	}
+
+	return 0;
+}
+
 /* Split @vec on page boundaries into SGEs. FMR registers pages, not
  * a byte range. Other modes coalesce these SGEs into a single MR
  * when they can.
@@ -233,15 +258,6 @@  rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
 	ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
 	page_base = offset_in_page(xdrbuf->page_base);
 	while (len) {
-		/* ACL likes to be lazy in allocating pages - ACLs
-		 * are small by default but can get huge.
-		 */
-		if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) {
-			if (!*ppages)
-				*ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
-			if (!*ppages)
-				return -ENOBUFS;
-		}
 		seg->mr_page = *ppages;
 		seg->mr_offset = (char *)page_base;
 		seg->mr_len = min_t(u32, PAGE_SIZE - page_base, len);
@@ -867,6 +883,12 @@  rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
 	__be32 *p;
 	int ret;
 
+	if (unlikely(rqst->rq_rcv_buf.flags & XDRBUF_SPARSE_PAGES)) {
+		ret = rpcrdma_alloc_sparse_pages(&rqst->rq_rcv_buf);
+		if (ret)
+			return ret;
+	}
+
 	rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
 	xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf),
 			rqst);
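
To round out the picture, a hedged sketch of what a SPARSE_PAGES
consumer's encode side looks like. This is kernel-shaped pseudocode,
not compilable on its own: every example_* identifier is an
assumption, and only rpc_prepare_reply_pages(), rq_rcv_buf, and
XDRBUF_SPARSE_PAGES are real kernel names. The in-tree consumers
(the ACL and XATTR procedures mentioned above) follow this shape.

/* Illustrative consumer pattern only -- not a quote of any in-tree
 * encoder. The encoder reserves reply page slots that may still be
 * NULL and sets XDRBUF_SPARSE_PAGES; TCP fills the pages during
 * receive processing, while RDMA must provision them before the
 * request is sent.
 */
static void example_xdr_enc_getfoo(struct rpc_rqst *req,
				   struct xdr_stream *xdr,
				   const struct example_getfoo_args *args)
{
	encode_getfoo_call(xdr, args);		/* hypothetical encoder */
	rpc_prepare_reply_pages(req, args->pages, 0, args->len,
				example_reply_hdrsize);
	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
}

With this patch applied, the RDMA side fills those NULL slots in
rpcrdma_marshal_req() before the Send is posted, instead of relying
on a Reply chunk to force the allocation later.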