Message ID | 20191115133907.15900.51854.stgit@manet.1015granger.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | SUNRPC: Fix another issue with MIC buffer space | expand |
On 15 Nov 2019, at 8:39, Chuck Lever wrote: > xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. > This can happen when xdr_buf_read_mic() is given an xdr_buf with > a small page array (like, only a few bytes). Hi Chuck, Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug? I'd prefer to keep the BUG_ON. How can I reproduce it? diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 14ba9e72a204..71d754fc780e 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o if (offset < boundary && (offset + mic->len) > boundary) xdr_shift_buf(buf, boundary - offset); + trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len); + /* Is the mic partially in the pages? */ boundary += buf->page_len; if (offset < boundary && (offset + mic->len) > boundary) ^^ that should be enough for me to try to figure out what's doing wrong. Ben
> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 15 Nov 2019, at 8:39, Chuck Lever wrote: > >> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >> This can happen when xdr_buf_read_mic() is given an xdr_buf with >> a small page array (like, only a few bytes). > > Hi Chuck, > > Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can > happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug? rpc_prepare_reply_pages() sets buf->page_len to the args->count of the NFS READ request. For really small READs, this can be 2, or 12, or anything smaller than the MIC length. > I'd prefer to keep the BUG_ON. Linus would prefer not to. :-) > How can I reproduce it? I've been using the git regression suite with NFSv4.1 and krb5i. I run it with 12 threads. > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c > index 14ba9e72a204..71d754fc780e 100644 > --- a/net/sunrpc/xdr.c > +++ b/net/sunrpc/xdr.c > @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o > if (offset < boundary && (offset + mic->len) > boundary) > xdr_shift_buf(buf, boundary - offset); > > + trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len); > + > /* Is the mic partially in the pages? */ > boundary += buf->page_len; > if (offset < boundary && (offset + mic->len) > boundary) > > ^^ that should be enough for me to try to figure out what's doing wrong. > > Ben > -- Chuck Lever
> On Nov 15, 2019, at 9:41 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > > > >> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote: >> >> On 15 Nov 2019, at 8:39, Chuck Lever wrote: >> >>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >>> This can happen when xdr_buf_read_mic() is given an xdr_buf with >>> a small page array (like, only a few bytes). >> >> Hi Chuck, >> >> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can >> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug? > > rpc_prepare_reply_pages() sets buf->page_len to the args->count of the > NFS READ request. For really small READs, this can be 2, or 12, or > anything smaller than the MIC length. > > >> I'd prefer to keep the BUG_ON. > > Linus would prefer not to. :-) > > >> How can I reproduce it? > > I've been using the git regression suite with NFSv4.1 and krb5i. > I run it with 12 threads. And I enable disconnect injection. Yer basic torture test. >> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c >> index 14ba9e72a204..71d754fc780e 100644 >> --- a/net/sunrpc/xdr.c >> +++ b/net/sunrpc/xdr.c >> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o >> if (offset < boundary && (offset + mic->len) > boundary) >> xdr_shift_buf(buf, boundary - offset); >> >> + trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len); >> + >> /* Is the mic partially in the pages? */ >> boundary += buf->page_len; >> if (offset < boundary && (offset + mic->len) > boundary) >> >> ^^ that should be enough for me to try to figure out what's doing wrong. >> >> Ben >> > > -- > Chuck Lever -- Chuck Lever
On 15 Nov 2019, at 9:44, Chuck Lever wrote: >> On Nov 15, 2019, at 9:41 AM, Chuck Lever <chuck.lever@oracle.com> >> wrote: >> >> >> >>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington >>> <bcodding@redhat.com> wrote: >>> >>> On 15 Nov 2019, at 8:39, Chuck Lever wrote: >>> >>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with >>>> a small page array (like, only a few bytes). >>> >>> Hi Chuck, >>> >>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how >>> this can >>> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf >>> has bug? >> >> rpc_prepare_reply_pages() sets buf->page_len to the args->count of >> the >> NFS READ request. For really small READs, this can be 2, or 12, or >> anything smaller than the MIC length. >> >> >>> I'd prefer to keep the BUG_ON. >> >> Linus would prefer not to. :-) Ahh.. >> >> >>> How can I reproduce it? >> >> I've been using the git regression suite with NFSv4.1 and krb5i. >> I run it with 12 threads. > > And I enable disconnect injection. Yer basic torture test. Thanks.. I'll try. Ben
> On Nov 15, 2019, at 9:51 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 15 Nov 2019, at 9:44, Chuck Lever wrote: > >>> On Nov 15, 2019, at 9:41 AM, Chuck Lever <chuck.lever@oracle.com> wrote: >>> >>> >>> >>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote: >>>> >>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote: >>>> >>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with >>>>> a small page array (like, only a few bytes). >>>> >>>> Hi Chuck, >>>> >>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can >>>> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug? >>> >>> rpc_prepare_reply_pages() sets buf->page_len to the args->count of the >>> NFS READ request. For really small READs, this can be 2, or 12, or >>> anything smaller than the MIC length. >>> >>> >>>> I'd prefer to keep the BUG_ON. >>> >>> Linus would prefer not to. :-) > > Ahh.. > >>> >>> >>>> How can I reproduce it? >>> >>> I've been using the git regression suite with NFSv4.1 and krb5i. >>> I run it with 12 threads. >> >> And I enable disconnect injection. Yer basic torture test. > > Thanks.. I'll try. I'm not sure why the issue doesn't show up without disconnect injection. Could be that a few of the READs fail, and thus return no data? -- Chuck Lever
> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 15 Nov 2019, at 8:39, Chuck Lever wrote: > >> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >> This can happen when xdr_buf_read_mic() is given an xdr_buf with >> a small page array (like, only a few bytes). > > Hi Chuck, > > Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can > happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug? > > I'd prefer to keep the BUG_ON. How can I reproduce it? > > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c > index 14ba9e72a204..71d754fc780e 100644 > --- a/net/sunrpc/xdr.c > +++ b/net/sunrpc/xdr.c > @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o > if (offset < boundary && (offset + mic->len) > boundary) > xdr_shift_buf(buf, boundary - offset); > > + trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len); > + Btw, I did some troubleshooting with a printk in here a couple days ago: xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2 > /* Is the mic partially in the pages? */ > boundary += buf->page_len; > if (offset < boundary && (offset + mic->len) > boundary) > > ^^ that should be enough for me to try to figure out what's doing wrong. > > Ben > -- Chuck Lever
On 15 Nov 2019, at 10:51, Chuck Lever wrote: >> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington >> <bcodding@redhat.com> wrote: >> >> On 15 Nov 2019, at 8:39, Chuck Lever wrote: >> >>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >>> This can happen when xdr_buf_read_mic() is given an xdr_buf with >>> a small page array (like, only a few bytes). >> >> Hi Chuck, >> >> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how >> this can >> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf >> has bug? >> >> I'd prefer to keep the BUG_ON. How can I reproduce it? >> >> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c >> index 14ba9e72a204..71d754fc780e 100644 >> --- a/net/sunrpc/xdr.c >> +++ b/net/sunrpc/xdr.c >> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, >> struct xdr_netobj *mic, unsigned int o >> if (offset < boundary && (offset + mic->len) > boundary) >> xdr_shift_buf(buf, boundary - offset); >> >> + trace_printk("boundary %d, offset %d, page_len %d\n", >> boundary, offset, buf->page_len); >> + > > Btw, I did some troubleshooting with a printk in here a couple days > ago: > > xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2 Ok, I see.. Your fix makes sense to me now, not much that xdr_buf_read_mic can do about it, and we get rid of another BUG_ON site. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> BTW that git regression test with disconnect injection is .. mean. I haven't hit the BUG_ON yet, but lots of: [ 171.770148] BUG: unable to handle page fault for address: ffff8880af767986 [ 171.771752] #PF: supervisor read access in kernel mode [ 171.772552] #PF: error_code(0x0000) - not-present page [ 171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0 Ben
> On Nov 15, 2019, at 12:31 PM, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 15 Nov 2019, at 10:51, Chuck Lever wrote: > >>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote: >>> >>> On 15 Nov 2019, at 8:39, Chuck Lever wrote: >>> >>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with >>>> a small page array (like, only a few bytes). >>> >>> Hi Chuck, >>> >>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can >>> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug? >>> >>> I'd prefer to keep the BUG_ON. How can I reproduce it? >>> >>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c >>> index 14ba9e72a204..71d754fc780e 100644 >>> --- a/net/sunrpc/xdr.c >>> +++ b/net/sunrpc/xdr.c >>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o >>> if (offset < boundary && (offset + mic->len) > boundary) >>> xdr_shift_buf(buf, boundary - offset); >>> >>> + trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len); >>> + >> >> Btw, I did some troubleshooting with a printk in here a couple days ago: >> >> xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2 > > Ok, I see.. Your fix makes sense to me now, not much that xdr_buf_read_mic > can do about it, and we get rid of another BUG_ON site. > > Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Thanks! > BTW that git regression test with disconnect injection is .. mean. I > haven't hit the BUG_ON yet, but lots of: > > [ 171.770148] BUG: unable to handle page fault for address: ffff8880af767986 > [ 171.771752] #PF: supervisor read access in kernel mode > [ 171.772552] #PF: error_code(0x0000) - not-present page > [ 171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0 !!! haven't seen that one. (I'm on NFS/RDMA. Should I try TCP, or just let you chase this one down?) -- Chuck Lever
On 15 Nov 2019, at 12:34, Chuck Lever wrote: >> On Nov 15, 2019, at 12:31 PM, Benjamin Coddington >> <bcodding@redhat.com> wrote: >> >> On 15 Nov 2019, at 10:51, Chuck Lever wrote: >> >>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington >>>> <bcodding@redhat.com> wrote: >>>> >>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote: >>>> >>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. >>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with >>>>> a small page array (like, only a few bytes). >>>> >>>> Hi Chuck, >>>> >>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how >>>> this can >>>> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf >>>> has bug? >>>> >>>> I'd prefer to keep the BUG_ON. How can I reproduce it? >>>> >>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c >>>> index 14ba9e72a204..71d754fc780e 100644 >>>> --- a/net/sunrpc/xdr.c >>>> +++ b/net/sunrpc/xdr.c >>>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, >>>> struct xdr_netobj *mic, unsigned int o >>>> if (offset < boundary && (offset + mic->len) > boundary) >>>> xdr_shift_buf(buf, boundary - offset); >>>> >>>> + trace_printk("boundary %d, offset %d, page_len %d\n", >>>> boundary, offset, buf->page_len); >>>> + >>> >>> Btw, I did some troubleshooting with a printk in here a couple days >>> ago: >>> >>> xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2 >> >> Ok, I see.. Your fix makes sense to me now, not much that >> xdr_buf_read_mic >> can do about it, and we get rid of another BUG_ON site. >> >> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> > > Thanks! > > >> BTW that git regression test with disconnect injection is .. mean. I >> haven't hit the BUG_ON yet, but lots of: >> >> [ 171.770148] BUG: unable to handle page fault for address: >> ffff8880af767986 >> [ 171.771752] #PF: supervisor read access in kernel mode >> [ 171.772552] #PF: error_code(0x0000) - not-present page >> [ 171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0 > > !!! haven't seen that one. (I'm on NFS/RDMA. Should I try TCP, or > just let you chase this one down?) It is TCP. Looks like I just grabbed Trond's master branch, and it is at a bit of a random place in between lost of merges.. somewhere after 5.4-rc6. Let me retry with something tagged. Ben
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 14ba9e72a204..f3104be8ff5d 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -436,13 +436,12 @@ __be32 *xdr_encode_opaque(__be32 *p, const void *ptr, unsigned int nbytes) } /** - * xdr_shrink_pagelen + * xdr_shrink_pagelen - shrinks buf->pages by up to @len bytes * @buf: xdr_buf * @len: bytes to remove from buf->pages * - * Shrinks XDR buffer's page array buf->pages by - * 'len' bytes. The extra data is not lost, but is instead - * moved into the tail. + * The extra data is not lost, but is instead moved into buf->tail. + * Returns the actual number of bytes moved. */ static unsigned int xdr_shrink_pagelen(struct xdr_buf *buf, size_t len) @@ -455,8 +454,8 @@ __be32 *xdr_encode_opaque(__be32 *p, const void *ptr, unsigned int nbytes) result = 0; tail = buf->tail; - BUG_ON (len > pglen); - + if (len > buf->page_len) + len = buf-> page_len; tailbuf_len = buf->buflen - buf->head->iov_len - buf->page_len; /* Shift the tail first */
xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len. This can happen when xdr_buf_read_mic() is given an xdr_buf with a small page array (like, only a few bytes). Instead, just cap the number of bytes that xdr_shrink_pagelen() will move. Fixes: 5f1bc39979d ("SUNRPC: Fix buffer handling of GSS MIC ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- net/sunrpc/xdr.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)