diff mbox series

SUNRPC: Fix another issue with MIC buffer space

Message ID 20191115133907.15900.51854.stgit@manet.1015granger.net (mailing list archive)
State New, archived
Headers show
Series SUNRPC: Fix another issue with MIC buffer space | expand

Commit Message

Chuck Lever Nov. 15, 2019, 1:39 p.m. UTC
xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
This can happen when xdr_buf_read_mic() is given an xdr_buf with
a small page array (like, only a few bytes).

Instead, just cap the number of bytes that xdr_shrink_pagelen()
will move.

Fixes: 5f1bc39979d ("SUNRPC: Fix buffer handling of GSS MIC ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xdr.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Comments

Benjamin Coddington Nov. 15, 2019, 2:35 p.m. UTC | #1
On 15 Nov 2019, at 8:39, Chuck Lever wrote:

> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
> This can happen when xdr_buf_read_mic() is given an xdr_buf with
> a small page array (like, only a few bytes).

Hi Chuck,

Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this 
can
happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf has 
bug?

I'd prefer to keep the BUG_ON.  How can I reproduce it?

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 14ba9e72a204..71d754fc780e 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct 
xdr_netobj *mic, unsigned int o
         if (offset < boundary && (offset + mic->len) > boundary)
                 xdr_shift_buf(buf, boundary - offset);

+       trace_printk("boundary %d, offset %d, page_len %d\n", boundary, 
offset, buf->page_len);
+
         /* Is the mic partially in the pages? */
         boundary += buf->page_len;
         if (offset < boundary && (offset + mic->len) > boundary)

^^ that should be enough for me to try to figure out what's doing wrong.

Ben
Chuck Lever Nov. 15, 2019, 2:41 p.m. UTC | #2
> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
> 
>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>> a small page array (like, only a few bytes).
> 
> Hi Chuck,
> 
> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can
> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf has bug?

rpc_prepare_reply_pages() sets buf->page_len to the args->count of the
NFS READ request. For really small READs, this can be 2, or 12, or
anything smaller than the MIC length.


> I'd prefer to keep the BUG_ON.

Linus would prefer not to. :-)


> How can I reproduce it?

I've been using the git regression suite with NFSv4.1 and krb5i.
I run it with 12 threads.


> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index 14ba9e72a204..71d754fc780e 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o
>        if (offset < boundary && (offset + mic->len) > boundary)
>                xdr_shift_buf(buf, boundary - offset);
> 
> +       trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len);
> +
>        /* Is the mic partially in the pages? */
>        boundary += buf->page_len;
>        if (offset < boundary && (offset + mic->len) > boundary)
> 
> ^^ that should be enough for me to try to figure out what's doing wrong.
> 
> Ben
> 

--
Chuck Lever
Chuck Lever Nov. 15, 2019, 2:44 p.m. UTC | #3
> On Nov 15, 2019, at 9:41 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
>> 
>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>> 
>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>> a small page array (like, only a few bytes).
>> 
>> Hi Chuck,
>> 
>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can
>> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf has bug?
> 
> rpc_prepare_reply_pages() sets buf->page_len to the args->count of the
> NFS READ request. For really small READs, this can be 2, or 12, or
> anything smaller than the MIC length.
> 
> 
>> I'd prefer to keep the BUG_ON.
> 
> Linus would prefer not to. :-)
> 
> 
>> How can I reproduce it?
> 
> I've been using the git regression suite with NFSv4.1 and krb5i.
> I run it with 12 threads.

And I enable disconnect injection. Yer basic torture test.


>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>> index 14ba9e72a204..71d754fc780e 100644
>> --- a/net/sunrpc/xdr.c
>> +++ b/net/sunrpc/xdr.c
>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o
>>       if (offset < boundary && (offset + mic->len) > boundary)
>>               xdr_shift_buf(buf, boundary - offset);
>> 
>> +       trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len);
>> +
>>       /* Is the mic partially in the pages? */
>>       boundary += buf->page_len;
>>       if (offset < boundary && (offset + mic->len) > boundary)
>> 
>> ^^ that should be enough for me to try to figure out what's doing wrong.
>> 
>> Ben
>> 
> 
> --
> Chuck Lever

--
Chuck Lever
Benjamin Coddington Nov. 15, 2019, 2:51 p.m. UTC | #4
On 15 Nov 2019, at 9:44, Chuck Lever wrote:

>> On Nov 15, 2019, at 9:41 AM, Chuck Lever <chuck.lever@oracle.com> 
>> wrote:
>>
>>
>>
>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington 
>>> <bcodding@redhat.com> wrote:
>>>
>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>>>
>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>>> a small page array (like, only a few bytes).
>>>
>>> Hi Chuck,
>>>
>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how 
>>> this can
>>> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf 
>>> has bug?
>>
>> rpc_prepare_reply_pages() sets buf->page_len to the args->count of 
>> the
>> NFS READ request. For really small READs, this can be 2, or 12, or
>> anything smaller than the MIC length.
>>
>>
>>> I'd prefer to keep the BUG_ON.
>>
>> Linus would prefer not to. :-)

Ahh..

>>
>>
>>> How can I reproduce it?
>>
>> I've been using the git regression suite with NFSv4.1 and krb5i.
>> I run it with 12 threads.
>
> And I enable disconnect injection. Yer basic torture test.

Thanks.. I'll try.

Ben
Chuck Lever Nov. 15, 2019, 2:54 p.m. UTC | #5
> On Nov 15, 2019, at 9:51 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 15 Nov 2019, at 9:44, Chuck Lever wrote:
> 
>>> On Nov 15, 2019, at 9:41 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>> 
>>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>>>> 
>>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>>>> a small page array (like, only a few bytes).
>>>> 
>>>> Hi Chuck,
>>>> 
>>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can
>>>> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf has bug?
>>> 
>>> rpc_prepare_reply_pages() sets buf->page_len to the args->count of the
>>> NFS READ request. For really small READs, this can be 2, or 12, or
>>> anything smaller than the MIC length.
>>> 
>>> 
>>>> I'd prefer to keep the BUG_ON.
>>> 
>>> Linus would prefer not to. :-)
> 
> Ahh..
> 
>>> 
>>> 
>>>> How can I reproduce it?
>>> 
>>> I've been using the git regression suite with NFSv4.1 and krb5i.
>>> I run it with 12 threads.
>> 
>> And I enable disconnect injection. Yer basic torture test.
> 
> Thanks.. I'll try.

I'm not sure why the issue doesn't show up without disconnect injection.
Could be that a few of the READs fail, and thus return no data?


--
Chuck Lever
Chuck Lever Nov. 15, 2019, 3:51 p.m. UTC | #6
> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
> 
>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>> a small page array (like, only a few bytes).
> 
> Hi Chuck,
> 
> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can
> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf has bug?
> 
> I'd prefer to keep the BUG_ON.  How can I reproduce it?
> 
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index 14ba9e72a204..71d754fc780e 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o
>        if (offset < boundary && (offset + mic->len) > boundary)
>                xdr_shift_buf(buf, boundary - offset);
> 
> +       trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len);
> +

Btw, I did some troubleshooting with a printk in here a couple days ago:

xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2


>        /* Is the mic partially in the pages? */
>        boundary += buf->page_len;
>        if (offset < boundary && (offset + mic->len) > boundary)
> 
> ^^ that should be enough for me to try to figure out what's doing wrong.
> 
> Ben
> 

--
Chuck Lever
Benjamin Coddington Nov. 15, 2019, 5:31 p.m. UTC | #7
On 15 Nov 2019, at 10:51, Chuck Lever wrote:

>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington 
>> <bcodding@redhat.com> wrote:
>>
>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>>
>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>> a small page array (like, only a few bytes).
>>
>> Hi Chuck,
>>
>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how 
>> this can
>> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf 
>> has bug?
>>
>> I'd prefer to keep the BUG_ON.  How can I reproduce it?
>>
>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>> index 14ba9e72a204..71d754fc780e 100644
>> --- a/net/sunrpc/xdr.c
>> +++ b/net/sunrpc/xdr.c
>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, 
>> struct xdr_netobj *mic, unsigned int o
>>        if (offset < boundary && (offset + mic->len) > boundary)
>>                xdr_shift_buf(buf, boundary - offset);
>>
>> +       trace_printk("boundary %d, offset %d, page_len %d\n", 
>> boundary, offset, buf->page_len);
>> +
>
> Btw, I did some troubleshooting with a printk in here a couple days 
> ago:
>
> xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2

Ok, I see..  Your fix makes sense to me now, not much that 
xdr_buf_read_mic
can do about it, and we get rid of another BUG_ON site.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

BTW that git regression test with disconnect injection is .. mean.  I
haven't hit the BUG_ON yet, but lots of:

[  171.770148] BUG: unable to handle page fault for address: 
ffff8880af767986
[  171.771752] #PF: supervisor read access in kernel mode
[  171.772552] #PF: error_code(0x0000) - not-present page
[  171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0

Ben
Chuck Lever Nov. 15, 2019, 5:34 p.m. UTC | #8
> On Nov 15, 2019, at 12:31 PM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 15 Nov 2019, at 10:51, Chuck Lever wrote:
> 
>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
>>> 
>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>>> 
>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>>> a small page array (like, only a few bytes).
>>> 
>>> Hi Chuck,
>>> 
>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can
>>> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf has bug?
>>> 
>>> I'd prefer to keep the BUG_ON.  How can I reproduce it?
>>> 
>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>> index 14ba9e72a204..71d754fc780e 100644
>>> --- a/net/sunrpc/xdr.c
>>> +++ b/net/sunrpc/xdr.c
>>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o
>>>       if (offset < boundary && (offset + mic->len) > boundary)
>>>               xdr_shift_buf(buf, boundary - offset);
>>> 
>>> +       trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len);
>>> +
>> 
>> Btw, I did some troubleshooting with a printk in here a couple days ago:
>> 
>> xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2
> 
> Ok, I see..  Your fix makes sense to me now, not much that xdr_buf_read_mic
> can do about it, and we get rid of another BUG_ON site.
> 
> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

Thanks!


> BTW that git regression test with disconnect injection is .. mean.  I
> haven't hit the BUG_ON yet, but lots of:
> 
> [  171.770148] BUG: unable to handle page fault for address: ffff8880af767986
> [  171.771752] #PF: supervisor read access in kernel mode
> [  171.772552] #PF: error_code(0x0000) - not-present page
> [  171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0

!!! haven't seen that one. (I'm on NFS/RDMA. Should I try TCP, or
just let you chase this one down?)


--
Chuck Lever
Benjamin Coddington Nov. 15, 2019, 6:04 p.m. UTC | #9
On 15 Nov 2019, at 12:34, Chuck Lever wrote:

>> On Nov 15, 2019, at 12:31 PM, Benjamin Coddington 
>> <bcodding@redhat.com> wrote:
>>
>> On 15 Nov 2019, at 10:51, Chuck Lever wrote:
>>
>>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington 
>>>> <bcodding@redhat.com> wrote:
>>>>
>>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>>>>
>>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>>>> a small page array (like, only a few bytes).
>>>>
>>>> Hi Chuck,
>>>>
>>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how 
>>>> this can
>>>> happen.. unless perhaps xdr->page_len is 0?  Or maybe xdr_shift_buf 
>>>> has bug?
>>>>
>>>> I'd prefer to keep the BUG_ON.  How can I reproduce it?
>>>>
>>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>>> index 14ba9e72a204..71d754fc780e 100644
>>>> --- a/net/sunrpc/xdr.c
>>>> +++ b/net/sunrpc/xdr.c
>>>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, 
>>>> struct xdr_netobj *mic, unsigned int o
>>>>       if (offset < boundary && (offset + mic->len) > boundary)
>>>>               xdr_shift_buf(buf, boundary - offset);
>>>>
>>>> +       trace_printk("boundary %d, offset %d, page_len %d\n", 
>>>> boundary, offset, buf->page_len);
>>>> +
>>>
>>> Btw, I did some troubleshooting with a printk in here a couple days 
>>> ago:
>>>
>>> xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2
>>
>> Ok, I see..  Your fix makes sense to me now, not much that 
>> xdr_buf_read_mic
>> can do about it, and we get rid of another BUG_ON site.
>>
>> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
>
> Thanks!
>
>
>> BTW that git regression test with disconnect injection is .. mean.  I
>> haven't hit the BUG_ON yet, but lots of:
>>
>> [  171.770148] BUG: unable to handle page fault for address: 
>> ffff8880af767986
>> [  171.771752] #PF: supervisor read access in kernel mode
>> [  171.772552] #PF: error_code(0x0000) - not-present page
>> [  171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0
>
> !!! haven't seen that one. (I'm on NFS/RDMA. Should I try TCP, or
> just let you chase this one down?)

It is TCP.  Looks like I just grabbed Trond's master branch, and it is 
at a
bit of a random place in between lost of merges.. somewhere after 
5.4-rc6.
Let me retry with something tagged.

Ben
diff mbox series

Patch

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 14ba9e72a204..f3104be8ff5d 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -436,13 +436,12 @@  __be32 *xdr_encode_opaque(__be32 *p, const void *ptr, unsigned int nbytes)
 }
 
 /**
- * xdr_shrink_pagelen
+ * xdr_shrink_pagelen - shrinks buf->pages by up to @len bytes
  * @buf: xdr_buf
  * @len: bytes to remove from buf->pages
  *
- * Shrinks XDR buffer's page array buf->pages by
- * 'len' bytes. The extra data is not lost, but is instead
- * moved into the tail.
+ * The extra data is not lost, but is instead moved into buf->tail.
+ * Returns the actual number of bytes moved.
  */
 static unsigned int
 xdr_shrink_pagelen(struct xdr_buf *buf, size_t len)
@@ -455,8 +454,8 @@  __be32 *xdr_encode_opaque(__be32 *p, const void *ptr, unsigned int nbytes)
 
 	result = 0;
 	tail = buf->tail;
-	BUG_ON (len > pglen);
-
+	if (len > buf->page_len)
+		len = buf-> page_len;
 	tailbuf_len = buf->buflen - buf->head->iov_len - buf->page_len;
 
 	/* Shift the tail first */