diff mbox

[6/6] NFSv4: allow getacl rpc to allocate pages on demand

Message ID 1487470070-32358-7-git-send-email-bfields@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bruce Fields Feb. 19, 2017, 2:07 a.m. UTC
From: Weston Andros Adamson <dros@netapp.com>

Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
allocate whatever pages we need on demand.  This is what the NFSv3 ACL
code does.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfs/nfs4proc.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

Comments

Chuck Lever Feb. 19, 2017, 7:29 p.m. UTC | #1
> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> 
> From: Weston Andros Adamson <dros@netapp.com>
> 
> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> code does.

The patch description does not explain why this change is
being done.

The matching hack in xprtrdma is in rpcrdma_convert_iovs().
Note that those are GFP_ATOMIC allocations, whereas here
they are GFP_KERNEL, and are thus more reliable.

IMO this is a step in the wrong direction. We should not be
adding more upper layer dependencies on memory allocation
in the transport layer.

I strongly prefer that rather the NFSACL code works the way
this code currently does, and that the hacks be removed from
the transport implementations.


> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
> fs/nfs/nfs4proc.c | 23 +++++++----------------
> 1 file changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 3e3dbba4aa74..7842c73fddfc 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
> 	struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
> 	struct nfs_getaclargs args = {
> 		.fh = NFS_FH(inode),
> +		/* The xdr layer may allocate pages here. */

Sure, it is called xdr_partial_copy_from_skb, but that function
lives in socklib.c and is invoked only from xprtsock.c. Also, a
similar hack has to be done in xprtrdma.

So IMO this is a transport layer hack, and not part of the
(generic) XDR layer.


> 		.acl_pages = pages,
> 	};
> 	struct nfs_getaclres res = {
> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
> 		.rpc_argp = &args,
> 		.rpc_resp = &res,
> 	};
> -	unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
> -	int ret = -ENOMEM, i;
> -
> -	if (npages > ARRAY_SIZE(pages))
> -		return -ERANGE;
> -
> -	for (i = 0; i < npages; i++) {
> -		pages[i] = alloc_page(GFP_KERNEL);
> -		if (!pages[i])
> -			goto out_free;
> -	}
> +	int ret, i;
> 
> 	/* for decoding across pages */
> 	res.acl_scratch = alloc_page(GFP_KERNEL);
> 	if (!res.acl_scratch)
> -		goto out_free;
> +		return -ENOMEM;
> 
> -	args.acl_len = npages * PAGE_SIZE;
> +	args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
> 
> -	dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
> -		__func__, buf, buflen, npages, args.acl_len);
> +	dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
> +		__func__, buf, buflen, args.acl_len);
> 	ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
> 			     &msg, &args.seq_args, &res.seq_res, 0);
> 	if (ret == 0)
> 		ret = res.acl_len;
> -out_free:
> +
> 	for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
> 		__free_page(pages[i]);
> 	__free_page(res.acl_scratch);
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Feb. 20, 2017, 4:09 p.m. UTC | #2
On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
> 
> > On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> > 
> > From: Weston Andros Adamson <dros@netapp.com>
> > 
> > Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> > allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> > code does.
> 
> The patch description does not explain why this change is
> being done.

The only justification I see is avoiding allocating pages unnecessarily.
Without this patch, for each getacl, we allocate 17 pages (if I'm
calculating correctly) and probably rarely use most of them.

In the v3 case I think it's 7 pages instead of 17.

Do we have reason to believe that's actually a big deal?

--b.

> The matching hack in xprtrdma is in rpcrdma_convert_iovs().
> Note that those are GFP_ATOMIC allocations, whereas here
> they are GFP_KERNEL, and are thus more reliable.
> 
> IMO this is a step in the wrong direction. We should not be
> adding more upper layer dependencies on memory allocation
> in the transport layer.
> 
> I strongly prefer that rather the NFSACL code works the way
> this code currently does, and that the hacks be removed from
> the transport implementations.
> 
> 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> > fs/nfs/nfs4proc.c | 23 +++++++----------------
> > 1 file changed, 7 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 3e3dbba4aa74..7842c73fddfc 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
> > 	struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
> > 	struct nfs_getaclargs args = {
> > 		.fh = NFS_FH(inode),
> > +		/* The xdr layer may allocate pages here. */
> 
> Sure, it is called xdr_partial_copy_from_skb, but that function
> lives in socklib.c and is invoked only from xprtsock.c. Also, a
> similar hack has to be done in xprtrdma.
> 
> So IMO this is a transport layer hack, and not part of the
> (generic) XDR layer.
> 
> 
> > 		.acl_pages = pages,
> > 	};
> > 	struct nfs_getaclres res = {
> > @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
> > 		.rpc_argp = &args,
> > 		.rpc_resp = &res,
> > 	};
> > -	unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
> > -	int ret = -ENOMEM, i;
> > -
> > -	if (npages > ARRAY_SIZE(pages))
> > -		return -ERANGE;
> > -
> > -	for (i = 0; i < npages; i++) {
> > -		pages[i] = alloc_page(GFP_KERNEL);
> > -		if (!pages[i])
> > -			goto out_free;
> > -	}
> > +	int ret, i;
> > 
> > 	/* for decoding across pages */
> > 	res.acl_scratch = alloc_page(GFP_KERNEL);
> > 	if (!res.acl_scratch)
> > -		goto out_free;
> > +		return -ENOMEM;
> > 
> > -	args.acl_len = npages * PAGE_SIZE;
> > +	args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
> > 
> > -	dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
> > -		__func__, buf, buflen, npages, args.acl_len);
> > +	dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
> > +		__func__, buf, buflen, args.acl_len);
> > 	ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
> > 			     &msg, &args.seq_args, &res.seq_res, 0);
> > 	if (ret == 0)
> > 		ret = res.acl_len;
> > -out_free:
> > +
> > 	for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
> > 		__free_page(pages[i]);
> > 	__free_page(res.acl_scratch);
> > -- 
> > 2.9.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever Feb. 20, 2017, 4:42 p.m. UTC | #3
> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> 
> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>> 
>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>>> 
>>> From: Weston Andros Adamson <dros@netapp.com>
>>> 
>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>>> code does.
>> 
>> The patch description does not explain why this change is
>> being done.
> 
> The only justification I see is avoiding allocating pages unnecessarily.

That makes sense. Is there a real world workload that has seen
a negative effect?


> Without this patch, for each getacl, we allocate 17 pages (if I'm
> calculating correctly) and probably rarely use most of them.
> 
> In the v3 case I think it's 7 pages instead of 17.

I would have guessed 9. Out of curiosity, is there a reason
documented for these size limits?


> Do we have reason to believe that's actually a big deal?

The xprtrdma hack already has to allocate the full set of
pages for NFSACL GETACL.

If NFSv4 GETATTR(fs_acl4) already works this way and there
are no real problems, I can't see any issue with NFSACL GETACL
using the same mechanism to retrieve smaller objects.

The only risk to overallocating is that it could drive some
page reclaims. The upper layer should be in a better position
to prevent deadlock in this case than the transport layer is.
However if NFSv4 doesn't see a problem here, then there isn't
likely to be an issue for NFSACL GETACL, IMO.


> --b.
> 
>> The matching hack in xprtrdma is in rpcrdma_convert_iovs().
>> Note that those are GFP_ATOMIC allocations, whereas here
>> they are GFP_KERNEL, and are thus more reliable.
>> 
>> IMO this is a step in the wrong direction. We should not be
>> adding more upper layer dependencies on memory allocation
>> in the transport layer.
>> 
>> I strongly prefer that rather the NFSACL code works the way
>> this code currently does, and that the hacks be removed from
>> the transport implementations.
>> 
>> 
>>> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>>> ---
>>> fs/nfs/nfs4proc.c | 23 +++++++----------------
>>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>> 
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 3e3dbba4aa74..7842c73fddfc 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>>> 	struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
>>> 	struct nfs_getaclargs args = {
>>> 		.fh = NFS_FH(inode),
>>> +		/* The xdr layer may allocate pages here. */
>> 
>> Sure, it is called xdr_partial_copy_from_skb, but that function
>> lives in socklib.c and is invoked only from xprtsock.c. Also, a
>> similar hack has to be done in xprtrdma.
>> 
>> So IMO this is a transport layer hack, and not part of the
>> (generic) XDR layer.
>> 
>> 
>>> 		.acl_pages = pages,
>>> 	};
>>> 	struct nfs_getaclres res = {
>>> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>>> 		.rpc_argp = &args,
>>> 		.rpc_resp = &res,
>>> 	};
>>> -	unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
>>> -	int ret = -ENOMEM, i;
>>> -
>>> -	if (npages > ARRAY_SIZE(pages))
>>> -		return -ERANGE;
>>> -
>>> -	for (i = 0; i < npages; i++) {
>>> -		pages[i] = alloc_page(GFP_KERNEL);
>>> -		if (!pages[i])
>>> -			goto out_free;
>>> -	}
>>> +	int ret, i;
>>> 
>>> 	/* for decoding across pages */
>>> 	res.acl_scratch = alloc_page(GFP_KERNEL);
>>> 	if (!res.acl_scratch)
>>> -		goto out_free;
>>> +		return -ENOMEM;
>>> 
>>> -	args.acl_len = npages * PAGE_SIZE;
>>> +	args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
>>> 
>>> -	dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
>>> -		__func__, buf, buflen, npages, args.acl_len);
>>> +	dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
>>> +		__func__, buf, buflen, args.acl_len);
>>> 	ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
>>> 			     &msg, &args.seq_args, &res.seq_res, 0);
>>> 	if (ret == 0)
>>> 		ret = res.acl_len;
>>> -out_free:
>>> +
>>> 	for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
>>> 		__free_page(pages[i]);
>>> 	__free_page(res.acl_scratch);
>>> -- 
>>> 2.9.3
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Feb. 20, 2017, 5:15 p.m. UTC | #4
On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
> 
> > On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> > 
> > On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
> >> 
> >>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >>> 
> >>> From: Weston Andros Adamson <dros@netapp.com>
> >>> 
> >>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> >>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> >>> code does.
> >> 
> >> The patch description does not explain why this change is
> >> being done.
> > 
> > The only justification I see is avoiding allocating pages unnecessarily.
> 
> That makes sense. Is there a real world workload that has seen
> a negative effect?
> 
> 
> > Without this patch, for each getacl, we allocate 17 pages (if I'm
> > calculating correctly) and probably rarely use most of them.
> > 
> > In the v3 case I think it's 7 pages instead of 17.
> 
> I would have guessed 9. Out of curiosity, is there a reason
> documented for these size limits?


In the v4 case: 

	#define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)

And I believe XATTR_SIZE_MAX is a global maximum on the size of any
extend attribute value.

In the v3 case:

	/* Maximum number of ACL entries over NFS */
	#define NFS_ACL_MAX_ENTRIES     1024

	#define NFSACL_MAXPAGES         ((2*(8+12*NFS_ACL_MAX_ENTRIES) + PAGE_SIZE-1) \
	                                 >> PAGE_SHIFT)

No idea where that 1024 comes from.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Gruenbacher Feb. 20, 2017, 9:31 p.m. UTC | #5
On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
>>
>> > On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >
>> > On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>> >>
>> >>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >>>
>> >>> From: Weston Andros Adamson <dros@netapp.com>
>> >>>
>> >>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
>> >>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>> >>> code does.
>> >>
>> >> The patch description does not explain why this change is
>> >> being done.
>> >
>> > The only justification I see is avoiding allocating pages unnecessarily.
>>
>> That makes sense. Is there a real world workload that has seen
>> a negative effect?
>>
>>
>> > Without this patch, for each getacl, we allocate 17 pages (if I'm
>> > calculating correctly) and probably rarely use most of them.
>> >
>> > In the v3 case I think it's 7 pages instead of 17.
>>
>> I would have guessed 9. Out of curiosity, is there a reason
>> documented for these size limits?
>
>
> In the v4 case:
>
>         #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>
> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
> extend attribute value.

XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
ACLs are passed through unchanged in "system.nfs4_acl".

> In the v3 case:
>
>         /* Maximum number of ACL entries over NFS */
>         #define NFS_ACL_MAX_ENTRIES     1024
>
>         #define NFSACL_MAXPAGES         ((2*(8+12*NFS_ACL_MAX_ENTRIES) + PAGE_SIZE-1) \
>                                          >> PAGE_SHIFT)
>
> No idea where that 1024 comes from.

The 1024-entry limit is arbitrary.

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Gruenbacher Feb. 20, 2017, 10:38 p.m. UTC | #6
On Sun, Feb 19, 2017 at 3:07 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> From: Weston Andros Adamson <dros@netapp.com>
>
> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> code does.
>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Andreas Gruenbacher <agreuenba@rdhat.com>

> ---
>  fs/nfs/nfs4proc.c | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 3e3dbba4aa74..7842c73fddfc 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>         struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
>         struct nfs_getaclargs args = {
>                 .fh = NFS_FH(inode),
> +               /* The xdr layer may allocate pages here. */
>                 .acl_pages = pages,
>         };
>         struct nfs_getaclres res = {
> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>                 .rpc_argp = &args,
>                 .rpc_resp = &res,
>         };
> -       unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
> -       int ret = -ENOMEM, i;
> -
> -       if (npages > ARRAY_SIZE(pages))
> -               return -ERANGE;
> -
> -       for (i = 0; i < npages; i++) {
> -               pages[i] = alloc_page(GFP_KERNEL);
> -               if (!pages[i])
> -                       goto out_free;
> -       }
> +       int ret, i;
>
>         /* for decoding across pages */
>         res.acl_scratch = alloc_page(GFP_KERNEL);
>         if (!res.acl_scratch)
> -               goto out_free;
> +               return -ENOMEM;
>
> -       args.acl_len = npages * PAGE_SIZE;
> +       args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;

Define acl_len directly in the args initializer instead of here.

> -       dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
> -               __func__, buf, buflen, npages, args.acl_len);
> +       dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
> +               __func__, buf, buflen, args.acl_len);
>         ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
>                              &msg, &args.seq_args, &res.seq_res, 0);
>         if (ret == 0)
>                 ret = res.acl_len;
> -out_free:
> +
>         for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
>                 __free_page(pages[i]);
>         __free_page(res.acl_scratch);
> --
> 2.9.3
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Feb. 21, 2017, 6:35 p.m. UTC | #7
On Mon, Feb 20, 2017 at 11:38:22PM +0100, Andreas Gruenbacher wrote:
> On Sun, Feb 19, 2017 at 3:07 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> > @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
> >                 .rpc_argp = &args,
> >                 .rpc_resp = &res,
> >         };
> > -       unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
> > -       int ret = -ENOMEM, i;
> > -
> > -       if (npages > ARRAY_SIZE(pages))
> > -               return -ERANGE;
> > -
> > -       for (i = 0; i < npages; i++) {
> > -               pages[i] = alloc_page(GFP_KERNEL);
> > -               if (!pages[i])
> > -                       goto out_free;
> > -       }
> > +       int ret, i;
> >
> >         /* for decoding across pages */
> >         res.acl_scratch = alloc_page(GFP_KERNEL);
> >         if (!res.acl_scratch)
> > -               goto out_free;
> > +               return -ENOMEM;
> >
> > -       args.acl_len = npages * PAGE_SIZE;
> > +       args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
> 
> Define acl_len directly in the args initializer instead of here.

Got it, thanks.  I'll send another revision of the series.

--b.

> 
> > -       dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
> > -               __func__, buf, buflen, npages, args.acl_len);
> > +       dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
> > +               __func__, buf, buflen, args.acl_len);
> >         ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
> >                              &msg, &args.seq_args, &res.seq_res, 0);
> >         if (ret == 0)
> >                 ret = res.acl_len;
> > -out_free:
> > +
> >         for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
> >                 __free_page(pages[i]);
> >         __free_page(res.acl_scratch);
> > --
> > 2.9.3
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever Feb. 21, 2017, 6:46 p.m. UTC | #8
Hi Andreas-


> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> 
> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
>>> 
>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
>>>> 
>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>>>>> 
>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>>>>>> 
>>>>>> From: Weston Andros Adamson <dros@netapp.com>
>>>>>> 
>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>>>>>> code does.
>>>>> 
>>>>> The patch description does not explain why this change is
>>>>> being done.
>>>> 
>>>> The only justification I see is avoiding allocating pages unnecessarily.
>>> 
>>> That makes sense. Is there a real world workload that has seen
>>> a negative effect?
>>> 
>>> 
>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
>>>> calculating correctly) and probably rarely use most of them.
>>>> 
>>>> In the v3 case I think it's 7 pages instead of 17.
>>> 
>>> I would have guessed 9. Out of curiosity, is there a reason
>>> documented for these size limits?
>> 
>> 
>> In the v4 case:
>> 
>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>> 
>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
>> extend attribute value.
> 
> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
> ACLs are passed through unchanged in "system.nfs4_acl".

"Extended attribute" means this is a Linux-specific limit?

Is there anything that prevents a non-Linux system from constructing
or returning an ACL that is larger than that?

What happens on a Linux client when a server returns an ACL that does
not fit in this allotment?


>> In the v3 case:
>> 
>>        /* Maximum number of ACL entries over NFS */
>>        #define NFS_ACL_MAX_ENTRIES     1024
>> 
>>        #define NFSACL_MAXPAGES         ((2*(8+12*NFS_ACL_MAX_ENTRIES) + PAGE_SIZE-1) \
>>>> PAGE_SHIFT)
>> 
>> No idea where that 1024 comes from.
> 
> The 1024-entry limit is arbitrary.
> 
> Andreas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Weston Andros Adamson Feb. 21, 2017, 7:45 p.m. UTC | #9
> On Feb 21, 2017, at 1:35 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> 
> On Mon, Feb 20, 2017 at 11:38:22PM +0100, Andreas Gruenbacher wrote:
>> On Sun, Feb 19, 2017 at 3:07 AM, J. Bruce Fields <bfields@redhat.com> wrote:
>>> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>>>                .rpc_argp = &args,
>>>                .rpc_resp = &res,
>>>        };
>>> -       unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
>>> -       int ret = -ENOMEM, i;
>>> -
>>> -       if (npages > ARRAY_SIZE(pages))
>>> -               return -ERANGE;
>>> -
>>> -       for (i = 0; i < npages; i++) {
>>> -               pages[i] = alloc_page(GFP_KERNEL);
>>> -               if (!pages[i])
>>> -                       goto out_free;
>>> -       }
>>> +       int ret, i;
>>> 
>>>        /* for decoding across pages */
>>>        res.acl_scratch = alloc_page(GFP_KERNEL);
>>>        if (!res.acl_scratch)
>>> -               goto out_free;
>>> +               return -ENOMEM;
>>> 
>>> -       args.acl_len = npages * PAGE_SIZE;
>>> +       args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
>> 
>> Define acl_len directly in the args initializer instead of here.
> 
> Got it, thanks.  I'll send another revision of the series.

Thanks Bruce,

You can add:

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>

on the next revision.

-dros

> 
> --b.
> 
>> 
>>> -       dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
>>> -               __func__, buf, buflen, npages, args.acl_len);
>>> +       dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
>>> +               __func__, buf, buflen, args.acl_len);
>>>        ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
>>>                             &msg, &args.seq_args, &res.seq_res, 0);
>>>        if (ret == 0)
>>>                ret = res.acl_len;
>>> -out_free:
>>> +
>>>        for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
>>>                __free_page(pages[i]);
>>>        __free_page(res.acl_scratch);
>>> --
>>> 2.9.3
>>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Gruenbacher Feb. 21, 2017, 9:21 p.m. UTC | #10
On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> Hi Andreas-
>
>
>> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
>>
>> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
>>>>
>>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
>>>>>
>>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>>>>>>
>>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>>>>>>>
>>>>>>> From: Weston Andros Adamson <dros@netapp.com>
>>>>>>>
>>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
>>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>>>>>>> code does.
>>>>>>
>>>>>> The patch description does not explain why this change is
>>>>>> being done.
>>>>>
>>>>> The only justification I see is avoiding allocating pages unnecessarily.
>>>>
>>>> That makes sense. Is there a real world workload that has seen
>>>> a negative effect?
>>>>
>>>>
>>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
>>>>> calculating correctly) and probably rarely use most of them.
>>>>>
>>>>> In the v3 case I think it's 7 pages instead of 17.
>>>>
>>>> I would have guessed 9. Out of curiosity, is there a reason
>>>> documented for these size limits?
>>>
>>>
>>> In the v4 case:
>>>
>>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>>>
>>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
>>> extend attribute value.
>>
>> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
>> ACLs are passed through unchanged in "system.nfs4_acl".
>
> "Extended attribute" means this is a Linux-specific limit?

Yes.

> Is there anything that prevents a non-Linux system from constructing
> or returning an ACL that is larger than that?

No.

> What happens on a Linux client when a server returns an ACL that does
> not fit in this allotment?

I would hope an error, but I haven't tested it.

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Feb. 21, 2017, 9:37 p.m. UTC | #11
On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote:
> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > Hi Andreas-
> >
> >
> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> >>
> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
> >>>>
> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> >>>>>
> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
> >>>>>>
> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >>>>>>>
> >>>>>>> From: Weston Andros Adamson <dros@netapp.com>
> >>>>>>>
> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> >>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> >>>>>>> code does.
> >>>>>>
> >>>>>> The patch description does not explain why this change is
> >>>>>> being done.
> >>>>>
> >>>>> The only justification I see is avoiding allocating pages unnecessarily.
> >>>>
> >>>> That makes sense. Is there a real world workload that has seen
> >>>> a negative effect?
> >>>>
> >>>>
> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
> >>>>> calculating correctly) and probably rarely use most of them.
> >>>>>
> >>>>> In the v3 case I think it's 7 pages instead of 17.
> >>>>
> >>>> I would have guessed 9. Out of curiosity, is there a reason
> >>>> documented for these size limits?
> >>>
> >>>
> >>> In the v4 case:
> >>>
> >>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
> >>>
> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
> >>> extend attribute value.
> >>
> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
> >> ACLs are passed through unchanged in "system.nfs4_acl".
> >
> > "Extended attribute" means this is a Linux-specific limit?
> 
> Yes.
> 
> > Is there anything that prevents a non-Linux system from constructing
> > or returning an ACL that is larger than that?
> 
> No.

In the >=v4.1 case there are session limits, but they'll typically be
less.  In the 4.0 case I think there's no explicit limit at all.  In
practice I bet other systems are similar to Linux in that the assume
peers won't send rpc replies or requests larger than about the
maximum-sized read or write.  But again that'll usually be a higher
limit than our ACL limit.

> > What happens on a Linux client when a server returns an ACL that does
> > not fit in this allotment?
> 
> I would hope an error, but I haven't tested it.

I haven't tested either, but it looks to me like the rpc layer receives
a truncated request, the xdr decoding recognizes that it's truncated,
and the result is an -ERANGE.

Looking now I think that my "NFSv4: simplify getacl decoding" changes
that to an -EIO.  More importantly, it makes that an EIO even when the
calling application was only asking for the length, not the actual ACL
data.  I'll fix that.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Gruenbacher Feb. 21, 2017, 9:45 p.m. UTC | #12
On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote:
>> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> > Hi Andreas-
>> >
>> >
>> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
>> >>
>> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
>> >>>>
>> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >>>>>
>> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>> >>>>>>
>> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >>>>>>>
>> >>>>>>> From: Weston Andros Adamson <dros@netapp.com>
>> >>>>>>>
>> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
>> >>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>> >>>>>>> code does.
>> >>>>>>
>> >>>>>> The patch description does not explain why this change is
>> >>>>>> being done.
>> >>>>>
>> >>>>> The only justification I see is avoiding allocating pages unnecessarily.
>> >>>>
>> >>>> That makes sense. Is there a real world workload that has seen
>> >>>> a negative effect?
>> >>>>
>> >>>>
>> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
>> >>>>> calculating correctly) and probably rarely use most of them.
>> >>>>>
>> >>>>> In the v3 case I think it's 7 pages instead of 17.
>> >>>>
>> >>>> I would have guessed 9. Out of curiosity, is there a reason
>> >>>> documented for these size limits?
>> >>>
>> >>>
>> >>> In the v4 case:
>> >>>
>> >>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>> >>>
>> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
>> >>> extend attribute value.
>> >>
>> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
>> >> ACLs are passed through unchanged in "system.nfs4_acl".
>> >
>> > "Extended attribute" means this is a Linux-specific limit?
>>
>> Yes.
>>
>> > Is there anything that prevents a non-Linux system from constructing
>> > or returning an ACL that is larger than that?
>>
>> No.
>
> In the >=v4.1 case there are session limits, but they'll typically be
> less.  In the 4.0 case I think there's no explicit limit at all.  In
> practice I bet other systems are similar to Linux in that the assume
> peers won't send rpc replies or requests larger than about the
> maximum-sized read or write.  But again that'll usually be a higher
> limit than our ACL limit.
>
>> > What happens on a Linux client when a server returns an ACL that does
>> > not fit in this allotment?
>>
>> I would hope an error, but I haven't tested it.
>
> I haven't tested either, but it looks to me like the rpc layer receives
> a truncated request, the xdr decoding recognizes that it's truncated,
> and the result is an -ERANGE.
>
> Looking now I think that my "NFSv4: simplify getacl decoding" changes
> that to an -EIO.  More importantly, it makes that an EIO even when the
> calling application was only asking for the length, not the actual ACL
> data.  I'll fix that.

Just be careful not to return a length from getxattr(path, name, NULL,
0) that will cause getxattr(path, name, buffer, size) to fail with
ERANGE, please. Otherwise, user space might get very confused.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Feb. 22, 2017, 1:53 a.m. UTC | #13
On Tue, Feb 21, 2017 at 10:45:35PM +0100, Andreas Gruenbacher wrote:
> On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote:
> >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> > Hi Andreas-
> >> >
> >> >
> >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> >> >>
> >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
> >> >>>>
> >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> >>>>>
> >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
> >> >>>>>>
> >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> >>>>>>>
> >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com>
> >> >>>>>>>
> >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> >> >>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> >> >>>>>>> code does.
> >> >>>>>>
> >> >>>>>> The patch description does not explain why this change is
> >> >>>>>> being done.
> >> >>>>>
> >> >>>>> The only justification I see is avoiding allocating pages unnecessarily.
> >> >>>>
> >> >>>> That makes sense. Is there a real world workload that has seen
> >> >>>> a negative effect?
> >> >>>>
> >> >>>>
> >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
> >> >>>>> calculating correctly) and probably rarely use most of them.
> >> >>>>>
> >> >>>>> In the v3 case I think it's 7 pages instead of 17.
> >> >>>>
> >> >>>> I would have guessed 9. Out of curiosity, is there a reason
> >> >>>> documented for these size limits?
> >> >>>
> >> >>>
> >> >>> In the v4 case:
> >> >>>
> >> >>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
> >> >>>
> >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
> >> >>> extend attribute value.
> >> >>
> >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
> >> >> ACLs are passed through unchanged in "system.nfs4_acl".
> >> >
> >> > "Extended attribute" means this is a Linux-specific limit?
> >>
> >> Yes.
> >>
> >> > Is there anything that prevents a non-Linux system from constructing
> >> > or returning an ACL that is larger than that?
> >>
> >> No.
> >
> > In the >=v4.1 case there are session limits, but they'll typically be
> > less.  In the 4.0 case I think there's no explicit limit at all.  In
> > practice I bet other systems are similar to Linux in that the assume
> > peers won't send rpc replies or requests larger than about the
> > maximum-sized read or write.  But again that'll usually be a higher
> > limit than our ACL limit.
> >
> >> > What happens on a Linux client when a server returns an ACL that does
> >> > not fit in this allotment?
> >>
> >> I would hope an error, but I haven't tested it.
> >
> > I haven't tested either, but it looks to me like the rpc layer receives
> > a truncated request, the xdr decoding recognizes that it's truncated,
> > and the result is an -ERANGE.
> >
> > Looking now I think that my "NFSv4: simplify getacl decoding" changes
> > that to an -EIO.  More importantly, it makes that an EIO even when the
> > calling application was only asking for the length, not the actual ACL
> > data.  I'll fix that.
> 
> Just be careful not to return a length from getxattr(path, name, NULL,
> 0) that will cause getxattr(path, name, buffer, size) to fail with
> ERANGE, please. Otherwise, user space might get very confused.

Ugh, OK.  So there could be userspace code that does something like

	while (getxattr(path, name, buf, size) == -ERANGE) {
		/* oops, must have raced with a size change */
		size = getxattr(path, name, NULL, 0);
		buf = realloc(buf, size);
	}

and you'd consider that a kernel bug not a userspace bug?

I suspect that can happen both before and after my changes.

So what do we want for that case?  Just -EIO?

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Gruenbacher Feb. 23, 2017, 10:28 a.m. UTC | #14
On Wed, Feb 22, 2017 at 2:53 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> On Tue, Feb 21, 2017 at 10:45:35PM +0100, Andreas Gruenbacher wrote:
>> On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote:
>> >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> >> > Hi Andreas-
>> >> >
>> >> >
>> >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
>> >> >>
>> >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
>> >> >>>>
>> >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >> >>>>>
>> >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>> >> >>>>>>
>> >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
>> >> >>>>>>>
>> >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com>
>> >> >>>>>>>
>> >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
>> >> >>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>> >> >>>>>>> code does.
>> >> >>>>>>
>> >> >>>>>> The patch description does not explain why this change is
>> >> >>>>>> being done.
>> >> >>>>>
>> >> >>>>> The only justification I see is avoiding allocating pages unnecessarily.
>> >> >>>>
>> >> >>>> That makes sense. Is there a real world workload that has seen
>> >> >>>> a negative effect?
>> >> >>>>
>> >> >>>>
>> >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
>> >> >>>>> calculating correctly) and probably rarely use most of them.
>> >> >>>>>
>> >> >>>>> In the v3 case I think it's 7 pages instead of 17.
>> >> >>>>
>> >> >>>> I would have guessed 9. Out of curiosity, is there a reason
>> >> >>>> documented for these size limits?
>> >> >>>
>> >> >>>
>> >> >>> In the v4 case:
>> >> >>>
>> >> >>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>> >> >>>
>> >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
>> >> >>> extend attribute value.
>> >> >>
>> >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
>> >> >> ACLs are passed through unchanged in "system.nfs4_acl".
>> >> >
>> >> > "Extended attribute" means this is a Linux-specific limit?
>> >>
>> >> Yes.
>> >>
>> >> > Is there anything that prevents a non-Linux system from constructing
>> >> > or returning an ACL that is larger than that?
>> >>
>> >> No.
>> >
>> > In the >=v4.1 case there are session limits, but they'll typically be
>> > less.  In the 4.0 case I think there's no explicit limit at all.  In
>> > practice I bet other systems are similar to Linux in that the assume
>> > peers won't send rpc replies or requests larger than about the
>> > maximum-sized read or write.  But again that'll usually be a higher
>> > limit than our ACL limit.
>> >
>> >> > What happens on a Linux client when a server returns an ACL that does
>> >> > not fit in this allotment?
>> >>
>> >> I would hope an error, but I haven't tested it.
>> >
>> > I haven't tested either, but it looks to me like the rpc layer receives
>> > a truncated request, the xdr decoding recognizes that it's truncated,
>> > and the result is an -ERANGE.
>> >
>> > Looking now I think that my "NFSv4: simplify getacl decoding" changes
>> > that to an -EIO.  More importantly, it makes that an EIO even when the
>> > calling application was only asking for the length, not the actual ACL
>> > data.  I'll fix that.
>>
>> Just be careful not to return a length from getxattr(path, name, NULL,
>> 0) that will cause getxattr(path, name, buffer, size) to fail with
>> ERANGE, please. Otherwise, user space might get very confused.
>
> Ugh, OK.  So there could be userspace code that does something like
>
>         while (getxattr(path, name, buf, size) == -ERANGE) {
>                 /* oops, must have raced with a size change */
>                 size = getxattr(path, name, NULL, 0);
>                 buf = realloc(buf, size);
>         }
>
> and you'd consider that a kernel bug not a userspace bug?

It would at least provoke errors if the above loop (with an additional
check for size == -1) didn't terminate, so I'd like to avoid that. I
see now that there is botched code in fs/xattr.c that tries to prevent
that, so I'll try to fix that so that file systems won't have to
bother.

> I suspect that can happen both before and after my changes.
>
> So what do we want for that case?  Just -EIO?

getxattr and listxattr are trying to cast that kind of error to
-E2BIG, which seems okay.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Feb. 23, 2017, 8:20 p.m. UTC | #15
On Thu, Feb 23, 2017 at 11:28:46AM +0100, Andreas Gruenbacher wrote:
> On Wed, Feb 22, 2017 at 2:53 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> > On Tue, Feb 21, 2017 at 10:45:35PM +0100, Andreas Gruenbacher wrote:
> >> On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote:
> >> >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> >> > Hi Andreas-
> >> >> >
> >> >> >
> >> >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> >> >> >>
> >> >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote:
> >> >> >>>>
> >> >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> >> >>>>>
> >> >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
> >> >> >>>>>>
> >> >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com>
> >> >> >>>>>>>
> >> >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to
> >> >> >>>>>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
> >> >> >>>>>>> code does.
> >> >> >>>>>>
> >> >> >>>>>> The patch description does not explain why this change is
> >> >> >>>>>> being done.
> >> >> >>>>>
> >> >> >>>>> The only justification I see is avoiding allocating pages unnecessarily.
> >> >> >>>>
> >> >> >>>> That makes sense. Is there a real world workload that has seen
> >> >> >>>> a negative effect?
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm
> >> >> >>>>> calculating correctly) and probably rarely use most of them.
> >> >> >>>>>
> >> >> >>>>> In the v3 case I think it's 7 pages instead of 17.
> >> >> >>>>
> >> >> >>>> I would have guessed 9. Out of curiosity, is there a reason
> >> >> >>>> documented for these size limits?
> >> >> >>>
> >> >> >>>
> >> >> >>> In the v4 case:
> >> >> >>>
> >> >> >>>        #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
> >> >> >>>
> >> >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any
> >> >> >>> extend attribute value.
> >> >> >>
> >> >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4
> >> >> >> ACLs are passed through unchanged in "system.nfs4_acl".
> >> >> >
> >> >> > "Extended attribute" means this is a Linux-specific limit?
> >> >>
> >> >> Yes.
> >> >>
> >> >> > Is there anything that prevents a non-Linux system from constructing
> >> >> > or returning an ACL that is larger than that?
> >> >>
> >> >> No.
> >> >
> >> > In the >=v4.1 case there are session limits, but they'll typically be
> >> > less.  In the 4.0 case I think there's no explicit limit at all.  In
> >> > practice I bet other systems are similar to Linux in that the assume
> >> > peers won't send rpc replies or requests larger than about the
> >> > maximum-sized read or write.  But again that'll usually be a higher
> >> > limit than our ACL limit.
> >> >
> >> >> > What happens on a Linux client when a server returns an ACL that does
> >> >> > not fit in this allotment?
> >> >>
> >> >> I would hope an error, but I haven't tested it.
> >> >
> >> > I haven't tested either, but it looks to me like the rpc layer receives
> >> > a truncated request, the xdr decoding recognizes that it's truncated,
> >> > and the result is an -ERANGE.
> >> >
> >> > Looking now I think that my "NFSv4: simplify getacl decoding" changes
> >> > that to an -EIO.  More importantly, it makes that an EIO even when the
> >> > calling application was only asking for the length, not the actual ACL
> >> > data.  I'll fix that.
> >>
> >> Just be careful not to return a length from getxattr(path, name, NULL,
> >> 0) that will cause getxattr(path, name, buffer, size) to fail with
> >> ERANGE, please. Otherwise, user space might get very confused.
> >
> > Ugh, OK.  So there could be userspace code that does something like
> >
> >         while (getxattr(path, name, buf, size) == -ERANGE) {
> >                 /* oops, must have raced with a size change */
> >                 size = getxattr(path, name, NULL, 0);
> >                 buf = realloc(buf, size);
> >         }
> >
> > and you'd consider that a kernel bug not a userspace bug?
> 
> It would at least provoke errors if the above loop (with an additional
> check for size == -1) didn't terminate, so I'd like to avoid that. I
> see now that there is botched code in fs/xattr.c that tries to prevent
> that, so I'll try to fix that so that file systems won't have to
> bother.

Having seen your patch on fs-devel....  OK, so after that point, we can
choose in NFS to either to return -E2BIG ourselves or to return success
with the large length and let fs/xattr convert to -E2BIG if necessary.
Thanks, that makes sense.

> > I suspect that can happen both before and after my changes.
> >
> > So what do we want for that case?  Just -EIO?
> 
> getxattr and listxattr are trying to cast that kind of error to
> -E2BIG, which seems okay.

Got it, thanks.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 3e3dbba4aa74..7842c73fddfc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5068,6 +5068,7 @@  static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
 	struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
 	struct nfs_getaclargs args = {
 		.fh = NFS_FH(inode),
+		/* The xdr layer may allocate pages here. */
 		.acl_pages = pages,
 	};
 	struct nfs_getaclres res = {
@@ -5079,32 +5080,22 @@  static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
 		.rpc_argp = &args,
 		.rpc_resp = &res,
 	};
-	unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
-	int ret = -ENOMEM, i;
-
-	if (npages > ARRAY_SIZE(pages))
-		return -ERANGE;
-
-	for (i = 0; i < npages; i++) {
-		pages[i] = alloc_page(GFP_KERNEL);
-		if (!pages[i])
-			goto out_free;
-	}
+	int ret, i;
 
 	/* for decoding across pages */
 	res.acl_scratch = alloc_page(GFP_KERNEL);
 	if (!res.acl_scratch)
-		goto out_free;
+		return -ENOMEM;
 
-	args.acl_len = npages * PAGE_SIZE;
+	args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
 
-	dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
-		__func__, buf, buflen, npages, args.acl_len);
+	dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
+		__func__, buf, buflen, args.acl_len);
 	ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
 			     &msg, &args.seq_args, &res.seq_res, 0);
 	if (ret == 0)
 		ret = res.acl_len;
-out_free:
+
 	for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
 		__free_page(pages[i]);
 	__free_page(res.acl_scratch);