Message ID | 1487470070-32358-7-git-send-email-bfields@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: > > From: Weston Andros Adamson <dros@netapp.com> > > Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > allocate whatever pages we need on demand. This is what the NFSv3 ACL > code does. The patch description does not explain why this change is being done. The matching hack in xprtrdma is in rpcrdma_convert_iovs(). Note that those are GFP_ATOMIC allocations, whereas here they are GFP_KERNEL, and are thus more reliable. IMO this is a step in the wrong direction. We should not be adding more upper layer dependencies on memory allocation in the transport layer. I strongly prefer that rather the NFSACL code works the way this code currently does, and that the hacks be removed from the transport implementations. > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > --- > fs/nfs/nfs4proc.c | 23 +++++++---------------- > 1 file changed, 7 insertions(+), 16 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 3e3dbba4aa74..7842c73fddfc 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, }; > struct nfs_getaclargs args = { > .fh = NFS_FH(inode), > + /* The xdr layer may allocate pages here. */ Sure, it is called xdr_partial_copy_from_skb, but that function lives in socklib.c and is invoked only from xprtsock.c. Also, a similar hack has to be done in xprtrdma. So IMO this is a transport layer hack, and not part of the (generic) XDR layer. > .acl_pages = pages, > }; > struct nfs_getaclres res = { > @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > .rpc_argp = &args, > .rpc_resp = &res, > }; > - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; > - int ret = -ENOMEM, i; > - > - if (npages > ARRAY_SIZE(pages)) > - return -ERANGE; > - > - for (i = 0; i < npages; i++) { > - pages[i] = alloc_page(GFP_KERNEL); > - if (!pages[i]) > - goto out_free; > - } > + int ret, i; > > /* for decoding across pages */ > res.acl_scratch = alloc_page(GFP_KERNEL); > if (!res.acl_scratch) > - goto out_free; > + return -ENOMEM; > > - args.acl_len = npages * PAGE_SIZE; > + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; > > - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", > - __func__, buf, buflen, npages, args.acl_len); > + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", > + __func__, buf, buflen, args.acl_len); > ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), > &msg, &args.seq_args, &res.seq_res, 0); > if (ret == 0) > ret = res.acl_len; > -out_free: > + > for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) > __free_page(pages[i]); > __free_page(res.acl_scratch); > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: > > > On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: > > > > From: Weston Andros Adamson <dros@netapp.com> > > > > Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > > allocate whatever pages we need on demand. This is what the NFSv3 ACL > > code does. > > The patch description does not explain why this change is > being done. The only justification I see is avoiding allocating pages unnecessarily. Without this patch, for each getacl, we allocate 17 pages (if I'm calculating correctly) and probably rarely use most of them. In the v3 case I think it's 7 pages instead of 17. Do we have reason to believe that's actually a big deal? --b. > The matching hack in xprtrdma is in rpcrdma_convert_iovs(). > Note that those are GFP_ATOMIC allocations, whereas here > they are GFP_KERNEL, and are thus more reliable. > > IMO this is a step in the wrong direction. We should not be > adding more upper layer dependencies on memory allocation > in the transport layer. > > I strongly prefer that rather the NFSACL code works the way > this code currently does, and that the hacks be removed from > the transport implementations. > > > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > --- > > fs/nfs/nfs4proc.c | 23 +++++++---------------- > > 1 file changed, 7 insertions(+), 16 deletions(-) > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > index 3e3dbba4aa74..7842c73fddfc 100644 > > --- a/fs/nfs/nfs4proc.c > > +++ b/fs/nfs/nfs4proc.c > > @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > > struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, }; > > struct nfs_getaclargs args = { > > .fh = NFS_FH(inode), > > + /* The xdr layer may allocate pages here. */ > > Sure, it is called xdr_partial_copy_from_skb, but that function > lives in socklib.c and is invoked only from xprtsock.c. Also, a > similar hack has to be done in xprtrdma. > > So IMO this is a transport layer hack, and not part of the > (generic) XDR layer. > > > > .acl_pages = pages, > > }; > > struct nfs_getaclres res = { > > @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > > .rpc_argp = &args, > > .rpc_resp = &res, > > }; > > - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; > > - int ret = -ENOMEM, i; > > - > > - if (npages > ARRAY_SIZE(pages)) > > - return -ERANGE; > > - > > - for (i = 0; i < npages; i++) { > > - pages[i] = alloc_page(GFP_KERNEL); > > - if (!pages[i]) > > - goto out_free; > > - } > > + int ret, i; > > > > /* for decoding across pages */ > > res.acl_scratch = alloc_page(GFP_KERNEL); > > if (!res.acl_scratch) > > - goto out_free; > > + return -ENOMEM; > > > > - args.acl_len = npages * PAGE_SIZE; > > + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; > > > > - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", > > - __func__, buf, buflen, npages, args.acl_len); > > + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", > > + __func__, buf, buflen, args.acl_len); > > ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), > > &msg, &args.seq_args, &res.seq_res, 0); > > if (ret == 0) > > ret = res.acl_len; > > -out_free: > > + > > for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) > > __free_page(pages[i]); > > __free_page(res.acl_scratch); > > -- > > 2.9.3 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: > > On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: >> >>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: >>> >>> From: Weston Andros Adamson <dros@netapp.com> >>> >>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to >>> allocate whatever pages we need on demand. This is what the NFSv3 ACL >>> code does. >> >> The patch description does not explain why this change is >> being done. > > The only justification I see is avoiding allocating pages unnecessarily. That makes sense. Is there a real world workload that has seen a negative effect? > Without this patch, for each getacl, we allocate 17 pages (if I'm > calculating correctly) and probably rarely use most of them. > > In the v3 case I think it's 7 pages instead of 17. I would have guessed 9. Out of curiosity, is there a reason documented for these size limits? > Do we have reason to believe that's actually a big deal? The xprtrdma hack already has to allocate the full set of pages for NFSACL GETACL. If NFSv4 GETATTR(fs_acl4) already works this way and there are no real problems, I can't see any issue with NFSACL GETACL using the same mechanism to retrieve smaller objects. The only risk to overallocating is that it could drive some page reclaims. The upper layer should be in a better position to prevent deadlock in this case than the transport layer is. However if NFSv4 doesn't see a problem here, then there isn't likely to be an issue for NFSACL GETACL, IMO. > --b. > >> The matching hack in xprtrdma is in rpcrdma_convert_iovs(). >> Note that those are GFP_ATOMIC allocations, whereas here >> they are GFP_KERNEL, and are thus more reliable. >> >> IMO this is a step in the wrong direction. We should not be >> adding more upper layer dependencies on memory allocation >> in the transport layer. >> >> I strongly prefer that rather the NFSACL code works the way >> this code currently does, and that the hacks be removed from >> the transport implementations. >> >> >>> Signed-off-by: J. Bruce Fields <bfields@redhat.com> >>> --- >>> fs/nfs/nfs4proc.c | 23 +++++++---------------- >>> 1 file changed, 7 insertions(+), 16 deletions(-) >>> >>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >>> index 3e3dbba4aa74..7842c73fddfc 100644 >>> --- a/fs/nfs/nfs4proc.c >>> +++ b/fs/nfs/nfs4proc.c >>> @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) >>> struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, }; >>> struct nfs_getaclargs args = { >>> .fh = NFS_FH(inode), >>> + /* The xdr layer may allocate pages here. */ >> >> Sure, it is called xdr_partial_copy_from_skb, but that function >> lives in socklib.c and is invoked only from xprtsock.c. Also, a >> similar hack has to be done in xprtrdma. >> >> So IMO this is a transport layer hack, and not part of the >> (generic) XDR layer. >> >> >>> .acl_pages = pages, >>> }; >>> struct nfs_getaclres res = { >>> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) >>> .rpc_argp = &args, >>> .rpc_resp = &res, >>> }; >>> - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; >>> - int ret = -ENOMEM, i; >>> - >>> - if (npages > ARRAY_SIZE(pages)) >>> - return -ERANGE; >>> - >>> - for (i = 0; i < npages; i++) { >>> - pages[i] = alloc_page(GFP_KERNEL); >>> - if (!pages[i]) >>> - goto out_free; >>> - } >>> + int ret, i; >>> >>> /* for decoding across pages */ >>> res.acl_scratch = alloc_page(GFP_KERNEL); >>> if (!res.acl_scratch) >>> - goto out_free; >>> + return -ENOMEM; >>> >>> - args.acl_len = npages * PAGE_SIZE; >>> + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; >>> >>> - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", >>> - __func__, buf, buflen, npages, args.acl_len); >>> + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", >>> + __func__, buf, buflen, args.acl_len); >>> ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), >>> &msg, &args.seq_args, &res.seq_res, 0); >>> if (ret == 0) >>> ret = res.acl_len; >>> -out_free: >>> + >>> for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) >>> __free_page(pages[i]); >>> __free_page(res.acl_scratch); >>> -- >>> 2.9.3 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> Chuck Lever >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: > > > On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: > > > > On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: > >> > >>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >>> > >>> From: Weston Andros Adamson <dros@netapp.com> > >>> > >>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > >>> allocate whatever pages we need on demand. This is what the NFSv3 ACL > >>> code does. > >> > >> The patch description does not explain why this change is > >> being done. > > > > The only justification I see is avoiding allocating pages unnecessarily. > > That makes sense. Is there a real world workload that has seen > a negative effect? > > > > Without this patch, for each getacl, we allocate 17 pages (if I'm > > calculating correctly) and probably rarely use most of them. > > > > In the v3 case I think it's 7 pages instead of 17. > > I would have guessed 9. Out of curiosity, is there a reason > documented for these size limits? In the v4 case: #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) And I believe XATTR_SIZE_MAX is a global maximum on the size of any extend attribute value. In the v3 case: /* Maximum number of ACL entries over NFS */ #define NFS_ACL_MAX_ENTRIES 1024 #define NFSACL_MAXPAGES ((2*(8+12*NFS_ACL_MAX_ENTRIES) + PAGE_SIZE-1) \ >> PAGE_SHIFT) No idea where that 1024 comes from. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: > On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: >> >> > On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: >> > >> > On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: >> >> >> >>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> >>> >> >>> From: Weston Andros Adamson <dros@netapp.com> >> >>> >> >>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to >> >>> allocate whatever pages we need on demand. This is what the NFSv3 ACL >> >>> code does. >> >> >> >> The patch description does not explain why this change is >> >> being done. >> > >> > The only justification I see is avoiding allocating pages unnecessarily. >> >> That makes sense. Is there a real world workload that has seen >> a negative effect? >> >> >> > Without this patch, for each getacl, we allocate 17 pages (if I'm >> > calculating correctly) and probably rarely use most of them. >> > >> > In the v3 case I think it's 7 pages instead of 17. >> >> I would have guessed 9. Out of curiosity, is there a reason >> documented for these size limits? > > > In the v4 case: > > #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) > > And I believe XATTR_SIZE_MAX is a global maximum on the size of any > extend attribute value. XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 ACLs are passed through unchanged in "system.nfs4_acl". > In the v3 case: > > /* Maximum number of ACL entries over NFS */ > #define NFS_ACL_MAX_ENTRIES 1024 > > #define NFSACL_MAXPAGES ((2*(8+12*NFS_ACL_MAX_ENTRIES) + PAGE_SIZE-1) \ > >> PAGE_SHIFT) > > No idea where that 1024 comes from. The 1024-entry limit is arbitrary. Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Feb 19, 2017 at 3:07 AM, J. Bruce Fields <bfields@redhat.com> wrote: > From: Weston Andros Adamson <dros@netapp.com> > > Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > allocate whatever pages we need on demand. This is what the NFSv3 ACL > code does. > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Andreas Gruenbacher <agreuenba@rdhat.com> > --- > fs/nfs/nfs4proc.c | 23 +++++++---------------- > 1 file changed, 7 insertions(+), 16 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 3e3dbba4aa74..7842c73fddfc 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, }; > struct nfs_getaclargs args = { > .fh = NFS_FH(inode), > + /* The xdr layer may allocate pages here. */ > .acl_pages = pages, > }; > struct nfs_getaclres res = { > @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > .rpc_argp = &args, > .rpc_resp = &res, > }; > - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; > - int ret = -ENOMEM, i; > - > - if (npages > ARRAY_SIZE(pages)) > - return -ERANGE; > - > - for (i = 0; i < npages; i++) { > - pages[i] = alloc_page(GFP_KERNEL); > - if (!pages[i]) > - goto out_free; > - } > + int ret, i; > > /* for decoding across pages */ > res.acl_scratch = alloc_page(GFP_KERNEL); > if (!res.acl_scratch) > - goto out_free; > + return -ENOMEM; > > - args.acl_len = npages * PAGE_SIZE; > + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; Define acl_len directly in the args initializer instead of here. > - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", > - __func__, buf, buflen, npages, args.acl_len); > + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", > + __func__, buf, buflen, args.acl_len); > ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), > &msg, &args.seq_args, &res.seq_res, 0); > if (ret == 0) > ret = res.acl_len; > -out_free: > + > for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) > __free_page(pages[i]); > __free_page(res.acl_scratch); > -- > 2.9.3 > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 20, 2017 at 11:38:22PM +0100, Andreas Gruenbacher wrote: > On Sun, Feb 19, 2017 at 3:07 AM, J. Bruce Fields <bfields@redhat.com> wrote: > > @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) > > .rpc_argp = &args, > > .rpc_resp = &res, > > }; > > - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; > > - int ret = -ENOMEM, i; > > - > > - if (npages > ARRAY_SIZE(pages)) > > - return -ERANGE; > > - > > - for (i = 0; i < npages; i++) { > > - pages[i] = alloc_page(GFP_KERNEL); > > - if (!pages[i]) > > - goto out_free; > > - } > > + int ret, i; > > > > /* for decoding across pages */ > > res.acl_scratch = alloc_page(GFP_KERNEL); > > if (!res.acl_scratch) > > - goto out_free; > > + return -ENOMEM; > > > > - args.acl_len = npages * PAGE_SIZE; > > + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; > > Define acl_len directly in the args initializer instead of here. Got it, thanks. I'll send another revision of the series. --b. > > > - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", > > - __func__, buf, buflen, npages, args.acl_len); > > + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", > > + __func__, buf, buflen, args.acl_len); > > ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), > > &msg, &args.seq_args, &res.seq_res, 0); > > if (ret == 0) > > ret = res.acl_len; > > -out_free: > > + > > for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) > > __free_page(pages[i]); > > __free_page(res.acl_scratch); > > -- > > 2.9.3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Andreas- > On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: > > On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: >>> >>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: >>>> >>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: >>>>> >>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: >>>>>> >>>>>> From: Weston Andros Adamson <dros@netapp.com> >>>>>> >>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to >>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL >>>>>> code does. >>>>> >>>>> The patch description does not explain why this change is >>>>> being done. >>>> >>>> The only justification I see is avoiding allocating pages unnecessarily. >>> >>> That makes sense. Is there a real world workload that has seen >>> a negative effect? >>> >>> >>>> Without this patch, for each getacl, we allocate 17 pages (if I'm >>>> calculating correctly) and probably rarely use most of them. >>>> >>>> In the v3 case I think it's 7 pages instead of 17. >>> >>> I would have guessed 9. Out of curiosity, is there a reason >>> documented for these size limits? >> >> >> In the v4 case: >> >> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) >> >> And I believe XATTR_SIZE_MAX is a global maximum on the size of any >> extend attribute value. > > XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 > ACLs are passed through unchanged in "system.nfs4_acl". "Extended attribute" means this is a Linux-specific limit? Is there anything that prevents a non-Linux system from constructing or returning an ACL that is larger than that? What happens on a Linux client when a server returns an ACL that does not fit in this allotment? >> In the v3 case: >> >> /* Maximum number of ACL entries over NFS */ >> #define NFS_ACL_MAX_ENTRIES 1024 >> >> #define NFSACL_MAXPAGES ((2*(8+12*NFS_ACL_MAX_ENTRIES) + PAGE_SIZE-1) \ >>>> PAGE_SHIFT) >> >> No idea where that 1024 comes from. > > The 1024-entry limit is arbitrary. > > Andreas > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Feb 21, 2017, at 1:35 PM, J. Bruce Fields <bfields@redhat.com> wrote: > > On Mon, Feb 20, 2017 at 11:38:22PM +0100, Andreas Gruenbacher wrote: >> On Sun, Feb 19, 2017 at 3:07 AM, J. Bruce Fields <bfields@redhat.com> wrote: >>> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) >>> .rpc_argp = &args, >>> .rpc_resp = &res, >>> }; >>> - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; >>> - int ret = -ENOMEM, i; >>> - >>> - if (npages > ARRAY_SIZE(pages)) >>> - return -ERANGE; >>> - >>> - for (i = 0; i < npages; i++) { >>> - pages[i] = alloc_page(GFP_KERNEL); >>> - if (!pages[i]) >>> - goto out_free; >>> - } >>> + int ret, i; >>> >>> /* for decoding across pages */ >>> res.acl_scratch = alloc_page(GFP_KERNEL); >>> if (!res.acl_scratch) >>> - goto out_free; >>> + return -ENOMEM; >>> >>> - args.acl_len = npages * PAGE_SIZE; >>> + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; >> >> Define acl_len directly in the args initializer instead of here. > > Got it, thanks. I'll send another revision of the series. Thanks Bruce, You can add: Signed-off-by: Weston Andros Adamson <dros@primarydata.com> on the next revision. -dros > > --b. > >> >>> - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", >>> - __func__, buf, buflen, npages, args.acl_len); >>> + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", >>> + __func__, buf, buflen, args.acl_len); >>> ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), >>> &msg, &args.seq_args, &res.seq_res, 0); >>> if (ret == 0) >>> ret = res.acl_len; >>> -out_free: >>> + >>> for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) >>> __free_page(pages[i]); >>> __free_page(res.acl_scratch); >>> -- >>> 2.9.3 >>> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > Hi Andreas- > > >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: >>>> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: >>>>> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: >>>>>> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: >>>>>>> >>>>>>> From: Weston Andros Adamson <dros@netapp.com> >>>>>>> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to >>>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL >>>>>>> code does. >>>>>> >>>>>> The patch description does not explain why this change is >>>>>> being done. >>>>> >>>>> The only justification I see is avoiding allocating pages unnecessarily. >>>> >>>> That makes sense. Is there a real world workload that has seen >>>> a negative effect? >>>> >>>> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm >>>>> calculating correctly) and probably rarely use most of them. >>>>> >>>>> In the v3 case I think it's 7 pages instead of 17. >>>> >>>> I would have guessed 9. Out of curiosity, is there a reason >>>> documented for these size limits? >>> >>> >>> In the v4 case: >>> >>> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) >>> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any >>> extend attribute value. >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 >> ACLs are passed through unchanged in "system.nfs4_acl". > > "Extended attribute" means this is a Linux-specific limit? Yes. > Is there anything that prevents a non-Linux system from constructing > or returning an ACL that is larger than that? No. > What happens on a Linux client when a server returns an ACL that does > not fit in this allotment? I would hope an error, but I haven't tested it. Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote: > On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > > Hi Andreas- > > > > > >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: > >> > >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: > >>>> > >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: > >>>>> > >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: > >>>>>> > >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >>>>>>> > >>>>>>> From: Weston Andros Adamson <dros@netapp.com> > >>>>>>> > >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > >>>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL > >>>>>>> code does. > >>>>>> > >>>>>> The patch description does not explain why this change is > >>>>>> being done. > >>>>> > >>>>> The only justification I see is avoiding allocating pages unnecessarily. > >>>> > >>>> That makes sense. Is there a real world workload that has seen > >>>> a negative effect? > >>>> > >>>> > >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm > >>>>> calculating correctly) and probably rarely use most of them. > >>>>> > >>>>> In the v3 case I think it's 7 pages instead of 17. > >>>> > >>>> I would have guessed 9. Out of curiosity, is there a reason > >>>> documented for these size limits? > >>> > >>> > >>> In the v4 case: > >>> > >>> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) > >>> > >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any > >>> extend attribute value. > >> > >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 > >> ACLs are passed through unchanged in "system.nfs4_acl". > > > > "Extended attribute" means this is a Linux-specific limit? > > Yes. > > > Is there anything that prevents a non-Linux system from constructing > > or returning an ACL that is larger than that? > > No. In the >=v4.1 case there are session limits, but they'll typically be less. In the 4.0 case I think there's no explicit limit at all. In practice I bet other systems are similar to Linux in that the assume peers won't send rpc replies or requests larger than about the maximum-sized read or write. But again that'll usually be a higher limit than our ACL limit. > > What happens on a Linux client when a server returns an ACL that does > > not fit in this allotment? > > I would hope an error, but I haven't tested it. I haven't tested either, but it looks to me like the rpc layer receives a truncated request, the xdr decoding recognizes that it's truncated, and the result is an -ERANGE. Looking now I think that my "NFSv4: simplify getacl decoding" changes that to an -EIO. More importantly, it makes that an EIO even when the calling application was only asking for the length, not the actual ACL data. I'll fix that. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote: > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote: >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote: >> > Hi Andreas- >> > >> > >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: >> >> >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: >> >>>> >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: >> >>>>> >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: >> >>>>>> >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> >>>>>>> >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com> >> >>>>>>> >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to >> >>>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL >> >>>>>>> code does. >> >>>>>> >> >>>>>> The patch description does not explain why this change is >> >>>>>> being done. >> >>>>> >> >>>>> The only justification I see is avoiding allocating pages unnecessarily. >> >>>> >> >>>> That makes sense. Is there a real world workload that has seen >> >>>> a negative effect? >> >>>> >> >>>> >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm >> >>>>> calculating correctly) and probably rarely use most of them. >> >>>>> >> >>>>> In the v3 case I think it's 7 pages instead of 17. >> >>>> >> >>>> I would have guessed 9. Out of curiosity, is there a reason >> >>>> documented for these size limits? >> >>> >> >>> >> >>> In the v4 case: >> >>> >> >>> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) >> >>> >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any >> >>> extend attribute value. >> >> >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 >> >> ACLs are passed through unchanged in "system.nfs4_acl". >> > >> > "Extended attribute" means this is a Linux-specific limit? >> >> Yes. >> >> > Is there anything that prevents a non-Linux system from constructing >> > or returning an ACL that is larger than that? >> >> No. > > In the >=v4.1 case there are session limits, but they'll typically be > less. In the 4.0 case I think there's no explicit limit at all. In > practice I bet other systems are similar to Linux in that the assume > peers won't send rpc replies or requests larger than about the > maximum-sized read or write. But again that'll usually be a higher > limit than our ACL limit. > >> > What happens on a Linux client when a server returns an ACL that does >> > not fit in this allotment? >> >> I would hope an error, but I haven't tested it. > > I haven't tested either, but it looks to me like the rpc layer receives > a truncated request, the xdr decoding recognizes that it's truncated, > and the result is an -ERANGE. > > Looking now I think that my "NFSv4: simplify getacl decoding" changes > that to an -EIO. More importantly, it makes that an EIO even when the > calling application was only asking for the length, not the actual ACL > data. I'll fix that. Just be careful not to return a length from getxattr(path, name, NULL, 0) that will cause getxattr(path, name, buffer, size) to fail with ERANGE, please. Otherwise, user space might get very confused. Thanks, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 21, 2017 at 10:45:35PM +0100, Andreas Gruenbacher wrote: > On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote: > > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote: > >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > >> > Hi Andreas- > >> > > >> > > >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: > >> >> > >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: > >> >>>> > >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: > >> >>>>> > >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: > >> >>>>>> > >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >> >>>>>>> > >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com> > >> >>>>>>> > >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > >> >>>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL > >> >>>>>>> code does. > >> >>>>>> > >> >>>>>> The patch description does not explain why this change is > >> >>>>>> being done. > >> >>>>> > >> >>>>> The only justification I see is avoiding allocating pages unnecessarily. > >> >>>> > >> >>>> That makes sense. Is there a real world workload that has seen > >> >>>> a negative effect? > >> >>>> > >> >>>> > >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm > >> >>>>> calculating correctly) and probably rarely use most of them. > >> >>>>> > >> >>>>> In the v3 case I think it's 7 pages instead of 17. > >> >>>> > >> >>>> I would have guessed 9. Out of curiosity, is there a reason > >> >>>> documented for these size limits? > >> >>> > >> >>> > >> >>> In the v4 case: > >> >>> > >> >>> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) > >> >>> > >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any > >> >>> extend attribute value. > >> >> > >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 > >> >> ACLs are passed through unchanged in "system.nfs4_acl". > >> > > >> > "Extended attribute" means this is a Linux-specific limit? > >> > >> Yes. > >> > >> > Is there anything that prevents a non-Linux system from constructing > >> > or returning an ACL that is larger than that? > >> > >> No. > > > > In the >=v4.1 case there are session limits, but they'll typically be > > less. In the 4.0 case I think there's no explicit limit at all. In > > practice I bet other systems are similar to Linux in that the assume > > peers won't send rpc replies or requests larger than about the > > maximum-sized read or write. But again that'll usually be a higher > > limit than our ACL limit. > > > >> > What happens on a Linux client when a server returns an ACL that does > >> > not fit in this allotment? > >> > >> I would hope an error, but I haven't tested it. > > > > I haven't tested either, but it looks to me like the rpc layer receives > > a truncated request, the xdr decoding recognizes that it's truncated, > > and the result is an -ERANGE. > > > > Looking now I think that my "NFSv4: simplify getacl decoding" changes > > that to an -EIO. More importantly, it makes that an EIO even when the > > calling application was only asking for the length, not the actual ACL > > data. I'll fix that. > > Just be careful not to return a length from getxattr(path, name, NULL, > 0) that will cause getxattr(path, name, buffer, size) to fail with > ERANGE, please. Otherwise, user space might get very confused. Ugh, OK. So there could be userspace code that does something like while (getxattr(path, name, buf, size) == -ERANGE) { /* oops, must have raced with a size change */ size = getxattr(path, name, NULL, 0); buf = realloc(buf, size); } and you'd consider that a kernel bug not a userspace bug? I suspect that can happen both before and after my changes. So what do we want for that case? Just -EIO? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 22, 2017 at 2:53 AM, J. Bruce Fields <bfields@redhat.com> wrote: > On Tue, Feb 21, 2017 at 10:45:35PM +0100, Andreas Gruenbacher wrote: >> On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote: >> >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote: >> >> > Hi Andreas- >> >> > >> >> > >> >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: >> >> >> >> >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: >> >> >>>> >> >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: >> >> >>>>> >> >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: >> >> >>>>>> >> >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: >> >> >>>>>>> >> >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com> >> >> >>>>>>> >> >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to >> >> >>>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL >> >> >>>>>>> code does. >> >> >>>>>> >> >> >>>>>> The patch description does not explain why this change is >> >> >>>>>> being done. >> >> >>>>> >> >> >>>>> The only justification I see is avoiding allocating pages unnecessarily. >> >> >>>> >> >> >>>> That makes sense. Is there a real world workload that has seen >> >> >>>> a negative effect? >> >> >>>> >> >> >>>> >> >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm >> >> >>>>> calculating correctly) and probably rarely use most of them. >> >> >>>>> >> >> >>>>> In the v3 case I think it's 7 pages instead of 17. >> >> >>>> >> >> >>>> I would have guessed 9. Out of curiosity, is there a reason >> >> >>>> documented for these size limits? >> >> >>> >> >> >>> >> >> >>> In the v4 case: >> >> >>> >> >> >>> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) >> >> >>> >> >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any >> >> >>> extend attribute value. >> >> >> >> >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 >> >> >> ACLs are passed through unchanged in "system.nfs4_acl". >> >> > >> >> > "Extended attribute" means this is a Linux-specific limit? >> >> >> >> Yes. >> >> >> >> > Is there anything that prevents a non-Linux system from constructing >> >> > or returning an ACL that is larger than that? >> >> >> >> No. >> > >> > In the >=v4.1 case there are session limits, but they'll typically be >> > less. In the 4.0 case I think there's no explicit limit at all. In >> > practice I bet other systems are similar to Linux in that the assume >> > peers won't send rpc replies or requests larger than about the >> > maximum-sized read or write. But again that'll usually be a higher >> > limit than our ACL limit. >> > >> >> > What happens on a Linux client when a server returns an ACL that does >> >> > not fit in this allotment? >> >> >> >> I would hope an error, but I haven't tested it. >> > >> > I haven't tested either, but it looks to me like the rpc layer receives >> > a truncated request, the xdr decoding recognizes that it's truncated, >> > and the result is an -ERANGE. >> > >> > Looking now I think that my "NFSv4: simplify getacl decoding" changes >> > that to an -EIO. More importantly, it makes that an EIO even when the >> > calling application was only asking for the length, not the actual ACL >> > data. I'll fix that. >> >> Just be careful not to return a length from getxattr(path, name, NULL, >> 0) that will cause getxattr(path, name, buffer, size) to fail with >> ERANGE, please. Otherwise, user space might get very confused. > > Ugh, OK. So there could be userspace code that does something like > > while (getxattr(path, name, buf, size) == -ERANGE) { > /* oops, must have raced with a size change */ > size = getxattr(path, name, NULL, 0); > buf = realloc(buf, size); > } > > and you'd consider that a kernel bug not a userspace bug? It would at least provoke errors if the above loop (with an additional check for size == -1) didn't terminate, so I'd like to avoid that. I see now that there is botched code in fs/xattr.c that tries to prevent that, so I'll try to fix that so that file systems won't have to bother. > I suspect that can happen both before and after my changes. > > So what do we want for that case? Just -EIO? getxattr and listxattr are trying to cast that kind of error to -E2BIG, which seems okay. Thanks, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Feb 23, 2017 at 11:28:46AM +0100, Andreas Gruenbacher wrote: > On Wed, Feb 22, 2017 at 2:53 AM, J. Bruce Fields <bfields@redhat.com> wrote: > > On Tue, Feb 21, 2017 at 10:45:35PM +0100, Andreas Gruenbacher wrote: > >> On Tue, Feb 21, 2017 at 10:37 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >> > On Tue, Feb 21, 2017 at 10:21:05PM +0100, Andreas Gruenbacher wrote: > >> >> On Tue, Feb 21, 2017 at 7:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > >> >> > Hi Andreas- > >> >> > > >> >> > > >> >> >> On Feb 20, 2017, at 4:31 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote: > >> >> >> > >> >> >> On Mon, Feb 20, 2017 at 6:15 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >> >> >>> On Mon, Feb 20, 2017 at 11:42:31AM -0500, Chuck Lever wrote: > >> >> >>>> > >> >> >>>>> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@redhat.com> wrote: > >> >> >>>>> > >> >> >>>>> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote: > >> >> >>>>>> > >> >> >>>>>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@redhat.com> wrote: > >> >> >>>>>>> > >> >> >>>>>>> From: Weston Andros Adamson <dros@netapp.com> > >> >> >>>>>>> > >> >> >>>>>>> Instead of preallocating pags, allow xdr_partial_copy_from_skb() to > >> >> >>>>>>> allocate whatever pages we need on demand. This is what the NFSv3 ACL > >> >> >>>>>>> code does. > >> >> >>>>>> > >> >> >>>>>> The patch description does not explain why this change is > >> >> >>>>>> being done. > >> >> >>>>> > >> >> >>>>> The only justification I see is avoiding allocating pages unnecessarily. > >> >> >>>> > >> >> >>>> That makes sense. Is there a real world workload that has seen > >> >> >>>> a negative effect? > >> >> >>>> > >> >> >>>> > >> >> >>>>> Without this patch, for each getacl, we allocate 17 pages (if I'm > >> >> >>>>> calculating correctly) and probably rarely use most of them. > >> >> >>>>> > >> >> >>>>> In the v3 case I think it's 7 pages instead of 17. > >> >> >>>> > >> >> >>>> I would have guessed 9. Out of curiosity, is there a reason > >> >> >>>> documented for these size limits? > >> >> >>> > >> >> >>> > >> >> >>> In the v4 case: > >> >> >>> > >> >> >>> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE) > >> >> >>> > >> >> >>> And I believe XATTR_SIZE_MAX is a global maximum on the size of any > >> >> >>> extend attribute value. > >> >> >> > >> >> >> XATTR_SIZE_MAX is the maximum size of an extended attribute. NFSv4 > >> >> >> ACLs are passed through unchanged in "system.nfs4_acl". > >> >> > > >> >> > "Extended attribute" means this is a Linux-specific limit? > >> >> > >> >> Yes. > >> >> > >> >> > Is there anything that prevents a non-Linux system from constructing > >> >> > or returning an ACL that is larger than that? > >> >> > >> >> No. > >> > > >> > In the >=v4.1 case there are session limits, but they'll typically be > >> > less. In the 4.0 case I think there's no explicit limit at all. In > >> > practice I bet other systems are similar to Linux in that the assume > >> > peers won't send rpc replies or requests larger than about the > >> > maximum-sized read or write. But again that'll usually be a higher > >> > limit than our ACL limit. > >> > > >> >> > What happens on a Linux client when a server returns an ACL that does > >> >> > not fit in this allotment? > >> >> > >> >> I would hope an error, but I haven't tested it. > >> > > >> > I haven't tested either, but it looks to me like the rpc layer receives > >> > a truncated request, the xdr decoding recognizes that it's truncated, > >> > and the result is an -ERANGE. > >> > > >> > Looking now I think that my "NFSv4: simplify getacl decoding" changes > >> > that to an -EIO. More importantly, it makes that an EIO even when the > >> > calling application was only asking for the length, not the actual ACL > >> > data. I'll fix that. > >> > >> Just be careful not to return a length from getxattr(path, name, NULL, > >> 0) that will cause getxattr(path, name, buffer, size) to fail with > >> ERANGE, please. Otherwise, user space might get very confused. > > > > Ugh, OK. So there could be userspace code that does something like > > > > while (getxattr(path, name, buf, size) == -ERANGE) { > > /* oops, must have raced with a size change */ > > size = getxattr(path, name, NULL, 0); > > buf = realloc(buf, size); > > } > > > > and you'd consider that a kernel bug not a userspace bug? > > It would at least provoke errors if the above loop (with an additional > check for size == -1) didn't terminate, so I'd like to avoid that. I > see now that there is botched code in fs/xattr.c that tries to prevent > that, so I'll try to fix that so that file systems won't have to > bother. Having seen your patch on fs-devel.... OK, so after that point, we can choose in NFS to either to return -E2BIG ourselves or to return success with the large length and let fs/xattr convert to -E2BIG if necessary. Thanks, that makes sense. > > I suspect that can happen both before and after my changes. > > > > So what do we want for that case? Just -EIO? > > getxattr and listxattr are trying to cast that kind of error to > -E2BIG, which seems okay. Got it, thanks. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 3e3dbba4aa74..7842c73fddfc 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, }; struct nfs_getaclargs args = { .fh = NFS_FH(inode), + /* The xdr layer may allocate pages here. */ .acl_pages = pages, }; struct nfs_getaclres res = { @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen) .rpc_argp = &args, .rpc_resp = &res, }; - unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1; - int ret = -ENOMEM, i; - - if (npages > ARRAY_SIZE(pages)) - return -ERANGE; - - for (i = 0; i < npages; i++) { - pages[i] = alloc_page(GFP_KERNEL); - if (!pages[i]) - goto out_free; - } + int ret, i; /* for decoding across pages */ res.acl_scratch = alloc_page(GFP_KERNEL); if (!res.acl_scratch) - goto out_free; + return -ENOMEM; - args.acl_len = npages * PAGE_SIZE; + args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT; - dprintk("%s buf %p buflen %zu npages %d args.acl_len %zu\n", - __func__, buf, buflen, npages, args.acl_len); + dprintk("%s buf %p buflen %zu args.acl_len %zu\n", + __func__, buf, buflen, args.acl_len); ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode), &msg, &args.seq_args, &res.seq_res, 0); if (ret == 0) ret = res.acl_len; -out_free: + for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++) __free_page(pages[i]); __free_page(res.acl_scratch);