mbox series

[RFC,00/35] user xattr support (RFC8276)

Message ID cover.1568309119.git.fllinden@amazon.com (mailing list archive)
Headers show
Series user xattr support (RFC8276) | expand

Message

Frank van der Linden Sept. 12, 2019, 5:25 p.m. UTC
This RFC patch series implements both client and server side support
for RFC8276 user extended attributes.

The server side should be complete (except for modifications made
after comments, of course). The client side lacks caching. I am
working on that, but am not happy with my current implementation.
So I'm sending this out for input. Having a reviewed base to work
from will make adding client-side caching support easier to add,
in any case.

Thanks for any input!

Some notable issues:

* RFC8276 explicitly only concerns itself with the user xattr namespace.
  Linux is the only OS that encodes the namespace in the attribute name,
  adding "user." for the user namespace. Others, like FreeBSD and MacOS,
  pass the namespace as a system call argument. Since the "user." prefix
  is specific to Linux, it is stripped off on the client side, and added
  back on the server side.

* Extended attributes tend to be small in size, but they may be somewhat
  large. The Linux-imposed limit is 64k. The sunrpc XDR interfaces only
  allow sizes > PAGE_SIZE to be encoded using pages. That works out
  great for reads/writes, but since there are no page-based xattr
  FS interfaces, it means some extra explicit copying.

* A quirk of the Linux xattr implementation is that you can create
  more extended attributes per file than listxattr can handle in
  its maximum size, meaning that you get E2BIG back. There is no
  1-1 translation for that error to the LISTXATTR op. I chose to
  return NFS4ERR_TOOSMALL (which is a valid error for the operation),
  and have the client code check for it and return E2BIG. Not
  great, but it seemed to be the best option.

* The LISTXATTR op uses cookies to support multiple calls to retrieve
  the complete list, like READDDIR. This means that there's the old
  "how to deal with non-0 cookies" issue.

  In the vast majority of cases, LISTXATTR should not need multiple
  calls. However, you never know what a client will do. READDIR
  uses the cookie as a seek offset in to the directory. It's not
  quite as simple with LISTXATTR. First, there is no seperate
  FS interface to list only user xattrs. Which means that, based
  on permissions, the buffer returned by vfs_listxattr might turn
  out differently (e.g. if you have no permission to read some
  system. attributes). That means that using just a hard offset
  in to the buffer (with verifications) can't work, as its a
  context-specific value, and the client couldn't cache it if
  it wanted to.

  My code returns the attribute count as a cookie. The server
  always asks vfs_listxattr for a XATTR_LIST_MAX buffer (see
  below), and then starts XDR encoding at the Nth user xattr
  attribute, where N is the cookie value. There are bounds
  checks of course.

  This isn't great, but probably the best you can do.

* There is no FS interface to only read user extended attributes,
  and the server code needs to strip off the "user." prefix.
  Additionally, the RFC specifies that the max length field for the
  LISTXATTR op refers to the total XDR-encoded length. Given all
  this, it is not possible to know what the length is it should pass
  to vfs_listxattr. So it just passes an XATTR_LIST_MAX sized buffer
  to vfs_listxattr, and then encodes as many user xattrs as it can
  from the returned buffer.

* Given that this is an extension to v4.2, it is only supported
  if 4.2 is used (even though it has no dependencies on 4.2
  specific features).

* There is a new mount option, already known for other filesystems,
  "user_xattr", which must be passed explicitly as it stands
  right now, together with vers=4.2

* There is a new export option to turn off extended attributes for
  a filesystem.

* Modifications outside of the NFS(D) code were minor. They were
  all in the xattr code, to export versions of vfs_setxattr and
  vfs_removexattr that the NFS code should use (inode rwsem taken
  because of atomic change info), and to have them break delegations,
  as specified by RFC8276.

* As I mentioned, this has no client caching. I have one implementation,
  but I'm not that happy with it.

  The main issue with client caching is that, for virtually all expected
  cases, the number of user extended attributes per file will be small,
  and their data small. But, theoretically, you can, within current Linux
  limitations, create some 9000 user extended attributes per inode, each
  64k in size.

  My current implementation uses an inode LRU chain (much like the access
  cache), and an rhashtable per inode (I picked rhashtables because
  they automatically grow). This seems like overkill, though.

  If there are any suggestions on better implementations, I'll be happy
  to take them. The client caching I mention here is not in this series,
  as I need to clean it up a bit (and am not sure if I really want to use
  it). But I can share it if asked. It will be in the next iteration,
  in whatever shape or form it might take.

Frank van der Linden (35):
  nfsd: make sure the nfsd4_ops array has the right size
  nfs/nfsd: basic NFS4 extended attribute definitions
  NFSv4.2: query the server for extended attribute support
  nfs: parse the {no}user_xattr option
  NFSv4.2: define a function to compute the maximum XDR size for
    listxattr
  NFSv4.2: define and set initial limits for extended attributes
  NFSv4.2: define argument and response structures for xattr operations
  NFSv4.2: define the encode/decode sizes for the XATTR operations
  NFSv4.2: define and use extended attribute overhead sizes
  NFSv4.2: add client side XDR handling for extended attributes
  nfs: define nfs_access_get_cached function
  NFSv4.2: query the extended attribute access bits
  nfs: modify update_changeattr to deal with regular files
  nfs: define and use the NFS_INO_INVALID_XATTR flag
  nfs: make the buf_to_pages_noslab function available to the nfs code
  NFSv4.2: add the extended attribute proc functions.
  NFSv4.2: hook in the user extended attribute handlers
  NFSv4.2: add client side xattr caching functions
  NFSv4.2: call the xattr cache functions
  nfs: add the NFS_V4_XATTR config option
  xattr: modify vfs_{set,remove}xattr for NFS server use
  nfsd: split off the write decode code in to a seperate function
  nfsd: add defines for NFSv4.2 extended attribute support
  nfsd: define xattr functions to call in to their vfs counterparts
  nfsd: take xattr access bits in to account when checking
  nfsd: add structure definitions for xattr requests / responses
  nfsd: implement the xattr procedure functions.
  nfsd: define xattr reply size functions
  nfsd: add xattr XDR decode functions
  nfsd: add xattr XDR encode functions
  nfsd: add xattr operations to ops array
  xattr: add a function to check if a namespace is supported
  nfsd: add fattr support for user extended attributes
  nfsd: add export flag to disable user extended attributes
  nfsd: add NFSD_V4_XATTR config option

 fs/nfs/Kconfig                   |   9 +
 fs/nfs/Makefile                  |   1 +
 fs/nfs/client.c                  |  17 +-
 fs/nfs/dir.c                     |  24 +-
 fs/nfs/inode.c                   |   7 +-
 fs/nfs/internal.h                |  24 ++
 fs/nfs/nfs42.h                   |  29 ++
 fs/nfs/nfs42proc.c               | 242 ++++++++++++++
 fs/nfs/nfs42xattr.c              |  72 ++++
 fs/nfs/nfs42xdr.c                | 446 +++++++++++++++++++++++++
 fs/nfs/nfs4_fs.h                 |   5 +
 fs/nfs/nfs4client.c              |  31 ++
 fs/nfs/nfs4proc.c                | 246 ++++++++++++--
 fs/nfs/nfs4super.c               |   1 +
 fs/nfs/nfs4xdr.c                 |  35 ++
 fs/nfs/nfstrace.h                |   3 +-
 fs/nfs/super.c                   |  11 +
 fs/nfsd/Kconfig                  |  10 +
 fs/nfsd/export.c                 |   1 +
 fs/nfsd/nfs4proc.c               | 145 +++++++-
 fs/nfsd/nfs4xdr.c                | 548 +++++++++++++++++++++++++++++--
 fs/nfsd/nfsd.h                   |  13 +-
 fs/nfsd/vfs.c                    | 153 +++++++++
 fs/nfsd/vfs.h                    |  10 +
 fs/nfsd/xdr4.h                   |  31 ++
 fs/xattr.c                       |  90 ++++-
 include/linux/nfs4.h             |  27 +-
 include/linux/nfs_fs.h           |   6 +
 include/linux/nfs_fs_sb.h        |   7 +
 include/linux/nfs_xdr.h          |  62 +++-
 include/linux/xattr.h            |   4 +
 include/uapi/linux/nfs4.h        |   3 +
 include/uapi/linux/nfs_fs.h      |   1 +
 include/uapi/linux/nfsd/export.h |   3 +-
 34 files changed, 2233 insertions(+), 84 deletions(-)
 create mode 100644 fs/nfs/nfs42xattr.c

Comments

Chuck Lever Oct. 24, 2019, 8:16 p.m. UTC | #1
Hi Frank-

> On Sep 12, 2019, at 1:25 PM, Frank van der Linden <fllinden@amazon.com> wrote:
> 
> This RFC patch series implements both client and server side support
> for RFC8276 user extended attributes.
> 
> The server side should be complete (except for modifications made
> after comments, of course). The client side lacks caching. I am
> working on that, but am not happy with my current implementation.
> So I'm sending this out for input. Having a reviewed base to work
> from will make adding client-side caching support easier to add,
> in any case.
> 
> Thanks for any input!

This prototype looks very good. I have some general comments as I'm
getting to know your code. I assume your end goal is to get this
work merged into the upstream Linux kernel.

First some comments about patch submission and review:

- IMO you can post future updates just to linux-nfs. Note that the
kernel NFS client and server are maintained separately, so when it
comes time to submit final patches, you will send the client work
to Trond and Anna, and the server work to Bruce (and maybe me).

- We like patches that are as small as possible but no smaller.
Some of these might be too small. For example, you don't need to add
the XDR encoders, decoders, and reply size macros in separate patches.
That might help get the total patch count down without reducing
review-ability. I can help offline with other examples.

- Please run scripts/checkpatch.pl on each patch before you post
again. This will help identify coding convention issues that should
be addressed before merge. sparse is also a good idea too.
clang-format is also nice but is entirely optional.

- I was not able to get 34/35 to apply. The series might be missing
a patch that adds nfsd_getxattr and friends.

- Do you have man page updates for the new mount and export options?


Now some comments about code design:

- I'm not clear why new CONFIG options are necessary. These days we
try to avoid adding new CONFIG options if possible. I can't think of
a reason someone would need to compile user xattr support out if
NFSv4.2 is enabled.

- Can you explain why an NFS server administrator might want to
disable user xattr support on a share?

- Probably you are correct that the design choices you made regarding
multi-message LISTXATTR are the best that can be done. Hopefully that
is not a frequent operation, but for something like "tar" it might be.
Do you have any feeling for how to assess performance?

- Regarding client caching... the RFC is notably vague about what
is needed there. You might be able to get away with no caching, just
as a start. Do you (and others) think that this series would be
acceptable and mergeable without any client caching support?

- Do you have access to an RDMA-capable platform? RPC/RDMA needs to
be able to predict how large each reply will be, in order to reserve
appropriate memory resources to land the whole RPC reply on the client.
I'm wondering if you've found any particular areas where that might be
a challenge.


Testing:

- Does fstests already have user xattr functional tests? If not, how
do you envision testing this new code?

- How should we test the logic that deals with delegation recall?

- Do you have plans to submit patches to pyNFS?


> Some notable issues:
> 
> * RFC8276 explicitly only concerns itself with the user xattr namespace.
>  Linux is the only OS that encodes the namespace in the attribute name,
>  adding "user." for the user namespace. Others, like FreeBSD and MacOS,
>  pass the namespace as a system call argument. Since the "user." prefix
>  is specific to Linux, it is stripped off on the client side, and added
>  back on the server side.
> 
> * Extended attributes tend to be small in size, but they may be somewhat
>  large. The Linux-imposed limit is 64k. The sunrpc XDR interfaces only
>  allow sizes > PAGE_SIZE to be encoded using pages. That works out
>  great for reads/writes, but since there are no page-based xattr
>  FS interfaces, it means some extra explicit copying.
> 
> * A quirk of the Linux xattr implementation is that you can create
>  more extended attributes per file than listxattr can handle in
>  its maximum size, meaning that you get E2BIG back. There is no
>  1-1 translation for that error to the LISTXATTR op. I chose to
>  return NFS4ERR_TOOSMALL (which is a valid error for the operation),
>  and have the client code check for it and return E2BIG. Not
>  great, but it seemed to be the best option.
> 
> * The LISTXATTR op uses cookies to support multiple calls to retrieve
>  the complete list, like READDDIR. This means that there's the old
>  "how to deal with non-0 cookies" issue.
> 
>  In the vast majority of cases, LISTXATTR should not need multiple
>  calls. However, you never know what a client will do. READDIR
>  uses the cookie as a seek offset in to the directory. It's not
>  quite as simple with LISTXATTR. First, there is no seperate
>  FS interface to list only user xattrs. Which means that, based
>  on permissions, the buffer returned by vfs_listxattr might turn
>  out differently (e.g. if you have no permission to read some
>  system. attributes). That means that using just a hard offset
>  in to the buffer (with verifications) can't work, as its a
>  context-specific value, and the client couldn't cache it if
>  it wanted to.
> 
>  My code returns the attribute count as a cookie. The server
>  always asks vfs_listxattr for a XATTR_LIST_MAX buffer (see
>  below), and then starts XDR encoding at the Nth user xattr
>  attribute, where N is the cookie value. There are bounds
>  checks of course.
> 
>  This isn't great, but probably the best you can do.
> 
> * There is no FS interface to only read user extended attributes,
>  and the server code needs to strip off the "user." prefix.
>  Additionally, the RFC specifies that the max length field for the
>  LISTXATTR op refers to the total XDR-encoded length. Given all
>  this, it is not possible to know what the length is it should pass
>  to vfs_listxattr. So it just passes an XATTR_LIST_MAX sized buffer
>  to vfs_listxattr, and then encodes as many user xattrs as it can
>  from the returned buffer.
> 
> * Given that this is an extension to v4.2, it is only supported
>  if 4.2 is used (even though it has no dependencies on 4.2
>  specific features).
> 
> * There is a new mount option, already known for other filesystems,
>  "user_xattr", which must be passed explicitly as it stands
>  right now, together with vers=4.2
> 
> * There is a new export option to turn off extended attributes for
>  a filesystem.
> 
> * Modifications outside of the NFS(D) code were minor. They were
>  all in the xattr code, to export versions of vfs_setxattr and
>  vfs_removexattr that the NFS code should use (inode rwsem taken
>  because of atomic change info), and to have them break delegations,
>  as specified by RFC8276.
> 
> * As I mentioned, this has no client caching. I have one implementation,
>  but I'm not that happy with it.
> 
>  The main issue with client caching is that, for virtually all expected
>  cases, the number of user extended attributes per file will be small,
>  and their data small. But, theoretically, you can, within current Linux
>  limitations, create some 9000 user extended attributes per inode, each
>  64k in size.
> 
>  My current implementation uses an inode LRU chain (much like the access
>  cache), and an rhashtable per inode (I picked rhashtables because
>  they automatically grow). This seems like overkill, though.
> 
>  If there are any suggestions on better implementations, I'll be happy
>  to take them. The client caching I mention here is not in this series,
>  as I need to clean it up a bit (and am not sure if I really want to use
>  it). But I can share it if asked. It will be in the next iteration,
>  in whatever shape or form it might take.
> 
> Frank van der Linden (35):
>  nfsd: make sure the nfsd4_ops array has the right size
>  nfs/nfsd: basic NFS4 extended attribute definitions
>  NFSv4.2: query the server for extended attribute support
>  nfs: parse the {no}user_xattr option
>  NFSv4.2: define a function to compute the maximum XDR size for
>    listxattr
>  NFSv4.2: define and set initial limits for extended attributes
>  NFSv4.2: define argument and response structures for xattr operations
>  NFSv4.2: define the encode/decode sizes for the XATTR operations
>  NFSv4.2: define and use extended attribute overhead sizes
>  NFSv4.2: add client side XDR handling for extended attributes
>  nfs: define nfs_access_get_cached function
>  NFSv4.2: query the extended attribute access bits
>  nfs: modify update_changeattr to deal with regular files
>  nfs: define and use the NFS_INO_INVALID_XATTR flag
>  nfs: make the buf_to_pages_noslab function available to the nfs code
>  NFSv4.2: add the extended attribute proc functions.
>  NFSv4.2: hook in the user extended attribute handlers
>  NFSv4.2: add client side xattr caching functions
>  NFSv4.2: call the xattr cache functions
>  nfs: add the NFS_V4_XATTR config option
>  xattr: modify vfs_{set,remove}xattr for NFS server use
>  nfsd: split off the write decode code in to a seperate function
>  nfsd: add defines for NFSv4.2 extended attribute support
>  nfsd: define xattr functions to call in to their vfs counterparts
>  nfsd: take xattr access bits in to account when checking
>  nfsd: add structure definitions for xattr requests / responses
>  nfsd: implement the xattr procedure functions.
>  nfsd: define xattr reply size functions
>  nfsd: add xattr XDR decode functions
>  nfsd: add xattr XDR encode functions
>  nfsd: add xattr operations to ops array
>  xattr: add a function to check if a namespace is supported
>  nfsd: add fattr support for user extended attributes
>  nfsd: add export flag to disable user extended attributes
>  nfsd: add NFSD_V4_XATTR config option
> 
> fs/nfs/Kconfig                   |   9 +
> fs/nfs/Makefile                  |   1 +
> fs/nfs/client.c                  |  17 +-
> fs/nfs/dir.c                     |  24 +-
> fs/nfs/inode.c                   |   7 +-
> fs/nfs/internal.h                |  24 ++
> fs/nfs/nfs42.h                   |  29 ++
> fs/nfs/nfs42proc.c               | 242 ++++++++++++++
> fs/nfs/nfs42xattr.c              |  72 ++++
> fs/nfs/nfs42xdr.c                | 446 +++++++++++++++++++++++++
> fs/nfs/nfs4_fs.h                 |   5 +
> fs/nfs/nfs4client.c              |  31 ++
> fs/nfs/nfs4proc.c                | 246 ++++++++++++--
> fs/nfs/nfs4super.c               |   1 +
> fs/nfs/nfs4xdr.c                 |  35 ++
> fs/nfs/nfstrace.h                |   3 +-
> fs/nfs/super.c                   |  11 +
> fs/nfsd/Kconfig                  |  10 +
> fs/nfsd/export.c                 |   1 +
> fs/nfsd/nfs4proc.c               | 145 +++++++-
> fs/nfsd/nfs4xdr.c                | 548 +++++++++++++++++++++++++++++--
> fs/nfsd/nfsd.h                   |  13 +-
> fs/nfsd/vfs.c                    | 153 +++++++++
> fs/nfsd/vfs.h                    |  10 +
> fs/nfsd/xdr4.h                   |  31 ++
> fs/xattr.c                       |  90 ++++-
> include/linux/nfs4.h             |  27 +-
> include/linux/nfs_fs.h           |   6 +
> include/linux/nfs_fs_sb.h        |   7 +
> include/linux/nfs_xdr.h          |  62 +++-
> include/linux/xattr.h            |   4 +
> include/uapi/linux/nfs4.h        |   3 +
> include/uapi/linux/nfs_fs.h      |   1 +
> include/uapi/linux/nfsd/export.h |   3 +-
> 34 files changed, 2233 insertions(+), 84 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
> 
> -- 
> 2.17.2
> 

--
Chuck Lever
chucklever@gmail.com
Frank van der Linden Oct. 24, 2019, 11:15 p.m. UTC | #2
Hi Chuck,

Thanks for your comments.

On Thu, Oct 24, 2019 at 04:16:33PM -0400, Chuck Lever wrote:
> - IMO you can post future updates just to linux-nfs. Note that the
> kernel NFS client and server are maintained separately, so when it
> comes time to submit final patches, you will send the client work
> to Trond and Anna, and the server work to Bruce (and maybe me).

Sure, I'll do that.

> 
> - We like patches that are as small as possible but no smaller.
> Some of these might be too small. For example, you don't need to add
> the XDR encoders, decoders, and reply size macros in separate patches.

True, I might have gone overboard there :-) If you can send further
suggestions offline, that'd be great!

> - Please run scripts/checkpatch.pl on each patch before you post
> again. This will help identify coding convention issues that should
> be addressed before merge. sparse is also a good idea too.
> clang-format is also nice but is entirely optional.

No problem. I think there shouldn't be many issues, but I'm sure
I mixed up some of the coding styles I've had to adhere to over
the decades..

> 
> - I was not able to get 34/35 to apply. The series might be missing
> a patch that adds nfsd_getxattr and friends.

Hm, odd. I'll check on that - I might have messed up there.

> 
> - Do you have man page updates for the new mount and export options?

I don't, but I can easily write them. They go in nfs-utils, right?

> - I'm not clear why new CONFIG options are necessary. These days we
> try to avoid adding new CONFIG options if possible. I can't think of
> a reason someone would need to compile user xattr support out if
> NFSv4.2 is enabled.
> 
> - Can you explain why an NFS server administrator might want to
> disable user xattr support on a share?

I think both of these are cases of being careful. E.g. don't enable
something by default and allow it to be disabled at runtime in
case something goes terribly wrong.

I didn't have any other reasons, really. I'm happy do to away with
the CONFIG options if that's the consensus, as well as the
nouser_xattr export option.

> - Probably you are correct that the design choices you made regarding
> multi-message LISTXATTR are the best that can be done. Hopefully that
> is not a frequent operation, but for something like "tar" it might be.
> Do you have any feeling for how to assess performance?

So far, my performance testing has been based on synthetic workloads,
which I'm also using to test some boundary conditions. E.g. create
as many xattrs as the Linux limit allows, list them all, now do
this for many files, etc. I'll definitely add testing with code
that uses xattrs. tar is on the list, but I'm happy to test anything
that exercises the code.

I don't think a multi-message LISTXATTR will happen a lot in practice,
if at all.

> 
> - Regarding client caching... the RFC is notably vague about what
> is needed there. You might be able to get away with no caching, just
> as a start. Do you (and others) think that this series would be
> acceptable and mergeable without any client caching support?

The performance is, obviously, not great without client side caching.
But then again, that's on my synthetic workloads. In cases like GNU
tar, it won't matter a whole lot because of the way that code is
structured.

I would prefer to have client side caching enabled from the start.
I have an implementation that works, but, like I mentioned, I
have some misgivings about it. But I should just include it when
I re-post - I might simply be worrying too much.

> 
> - Do you have access to an RDMA-capable platform? RPC/RDMA needs to
> be able to predict how large each reply will be, in order to reserve
> appropriate memory resources to land the whole RPC reply on the client.
> I'm wondering if you've found any particular areas where that might be
> a challenge.

Hm. I might be able to set something up. If not, I'd be relying
on someone you might know to test it for me :-)

I am not too familiar with the RDMA RPC code. From what I posted, is 
there any specific part of how the RPC layer is used that would
be of concern with RDMA?

I don't do anything other parts of the code don't do. The only special
case is using on-demand page allocation on receive, which the ACL code
also does (XDRBUF_SPARSE_PAGES - used for LISTXATTR and GETXATTR).

> 
> 
> Testing:
> 
> - Does fstests already have user xattr functional tests? If not, how
> do you envision testing this new code?

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/ has some xattr
tests. I've, so far, been using my own set of tests that I'm happy
to contribute to any testsuite.

> 
> - How should we test the logic that deals with delegation recall?

I believe pyNFS has some logic do test this. What I have been doing
is manual testing, either using 2 clients, or, simpler, setting
xattrs on a file on the server itself, and verifying that client
delegations were recalled.

> 
> - Do you have plans to submit patches to pyNFS?

It wasn't in my plans, but I certainly could. One issue I've noticed,
with pyNFS and some other tests, is that they go no further than 4.1.
They'll need some more work to do 4.2 - although that shouldn't be
a lot of work, as most (or was it all?) features in 4.2 are optional.

I've not had much time to work on this in the past few weeks, but
next week is looking much better. Here's my plan:

* address any issues flagged by checkpatch
* merge some patches, with your input
* clean up my nfs-ganesha patches and test some more against that
* test against Rick's FreeBSD prototype
* repost the series, split in to client and server

In general, what do people do with code changes that affect both
client and server (e.g. generic defines)?

Thanks,

- Frank
Chuck Lever Oct. 25, 2019, 7:55 p.m. UTC | #3
Hi Frank-


> On Oct 24, 2019, at 7:15 PM, Frank van der Linden <fllinden@amazon.com> wrote:
> 
> Hi Chuck,
> 
> Thanks for your comments.
> 
> On Thu, Oct 24, 2019 at 04:16:33PM -0400, Chuck Lever wrote:
>> - IMO you can post future updates just to linux-nfs. Note that the
>> kernel NFS client and server are maintained separately, so when it
>> comes time to submit final patches, you will send the client work
>> to Trond and Anna, and the server work to Bruce (and maybe me).
> 
> Sure, I'll do that.
> 
>> 
>> - We like patches that are as small as possible but no smaller.
>> Some of these might be too small. For example, you don't need to add
>> the XDR encoders, decoders, and reply size macros in separate patches.
> 
> True, I might have gone overboard there :-) If you can send further
> suggestions offline, that'd be great!
> 
>> - Please run scripts/checkpatch.pl on each patch before you post
>> again. This will help identify coding convention issues that should
>> be addressed before merge. sparse is also a good idea too.
>> clang-format is also nice but is entirely optional.
> 
> No problem. I think there shouldn't be many issues, but I'm sure
> I mixed up some of the coding styles I've had to adhere to over
> the decades..
> 
>> 
>> - I was not able to get 34/35 to apply. The series might be missing
>> a patch that adds nfsd_getxattr and friends.
> 
> Hm, odd. I'll check on that - I might have messed up there.
> 
>> 
>> - Do you have man page updates for the new mount and export options?
> 
> I don't, but I can easily write them. They go in nfs-utils, right?

Yes. utils/mount/nfs.man for the mount option.


>> - I'm not clear why new CONFIG options are necessary. These days we
>> try to avoid adding new CONFIG options if possible. I can't think of
>> a reason someone would need to compile user xattr support out if
>> NFSv4.2 is enabled.
>> 
>> - Can you explain why an NFS server administrator might want to
>> disable user xattr support on a share?
> 
> I think both of these are cases of being careful. E.g. don't enable
> something by default and allow it to be disabled at runtime in
> case something goes terribly wrong.
> 
> I didn't have any other reasons, really. I'm happy do to away with
> the CONFIG options if that's the consensus, as well as the
> nouser_xattr export option.

I have similar patches adding support for access to a couple of
security xattrs. I initially wrapped the new code with CONFIG
but after some discussion it was decided there was really no
need to be so cautious.

The user_xattr export option is a separate matter, but again,
if we don't know of a use case for it, I would leave it out for
the moment.


>> - Probably you are correct that the design choices you made regarding
>> multi-message LISTXATTR are the best that can be done. Hopefully that
>> is not a frequent operation, but for something like "tar" it might be.
>> Do you have any feeling for how to assess performance?
> 
> So far, my performance testing has been based on synthetic workloads,
> which I'm also using to test some boundary conditions. E.g. create
> as many xattrs as the Linux limit allows, list them all, now do
> this for many files, etc. I'll definitely add testing with code
> that uses xattrs. tar is on the list, but I'm happy to test anything
> that exercises the code.
> 
> I don't think a multi-message LISTXATTR will happen a lot in practice,
> if at all.
> 
>> 
>> - Regarding client caching... the RFC is notably vague about what
>> is needed there. You might be able to get away with no caching, just
>> as a start. Do you (and others) think that this series would be
>> acceptable and mergeable without any client caching support?
> 
> The performance is, obviously, not great without client side caching.
> But then again, that's on my synthetic workloads. In cases like GNU
> tar, it won't matter a whole lot because of the way that code is
> structured.
> 
> I would prefer to have client side caching enabled from the start.
> I have an implementation that works, but, like I mentioned, I
> have some misgivings about it. But I should just include it when
> I re-post - I might simply be worrying too much.

After the patches are cleaner (checkpatch and squashing) I think
you will get more direct review of the caching heuristics.

I'll send some suggestions via private e-mail.


>> - Do you have access to an RDMA-capable platform? RPC/RDMA needs to
>> be able to predict how large each reply will be, in order to reserve
>> appropriate memory resources to land the whole RPC reply on the client.
>> I'm wondering if you've found any particular areas where that might be
>> a challenge.
> 
> Hm. I might be able to set something up. If not, I'd be relying
> on someone you might know to test it for me :-)
> 
> I am not too familiar with the RDMA RPC code. From what I posted, is 
> there any specific part of how the RPC layer is used that would
> be of concern with RDMA?
> 
> I don't do anything other parts of the code don't do. The only special
> case is using on-demand page allocation on receive, which the ACL code
> also does (XDRBUF_SPARSE_PAGES - used for LISTXATTR and GETXATTR).

That's exactly what's of concern. RDMA has similar logic as TCP
here to allocate pages on demand for this case. The problem
arises when the server needs to return a bigger reply than will
fit in this buffer. Rare.


>> Testing:
>> 
>> - Does fstests already have user xattr functional tests? If not, how
>> do you envision testing this new code?
> 
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/ has some xattr
> tests. I've, so far, been using my own set of tests that I'm happy
> to contribute to any testsuite.

fstests would be the one.


>> - How should we test the logic that deals with delegation recall?
> 
> I believe pyNFS has some logic do test this. What I have been doing
> is manual testing, either using 2 clients, or, simpler, setting
> xattrs on a file on the server itself, and verifying that client
> delegations were recalled.
> 
>> 
>> - Do you have plans to submit patches to pyNFS?
> 
> It wasn't in my plans, but I certainly could. One issue I've noticed,
> with pyNFS and some other tests, is that they go no further than 4.1.
> They'll need some more work to do 4.2 - although that shouldn't be
> a lot of work, as most (or was it all?) features in 4.2 are optional.

OK, if v4.2 is not supported in the test suite, then there is
a pre-requisite discussion to be had.


> I've not had much time to work on this in the past few weeks, but
> next week is looking much better. Here's my plan:
> 
> * address any issues flagged by checkpatch
> * merge some patches, with your input
> * clean up my nfs-ganesha patches and test some more against that
> * test against Rick's FreeBSD prototype
> * repost the series, split in to client and server
> 
> In general, what do people do with code changes that affect both
> client and server (e.g. generic defines)?

For generic defines, include the same patches in both the client
and server series. When git merges the two separate branches, it
should recognize that the incoming files are identical and do
nothing.


--
Chuck Lever
chucklever@gmail.com
J. Bruce Fields Nov. 4, 2019, 3:01 a.m. UTC | #4
On Fri, Oct 25, 2019 at 03:55:09PM -0400, Chuck Lever wrote:
> > On Oct 24, 2019, at 7:15 PM, Frank van der Linden <fllinden@amazon.com> wrote:
> > I think both of these are cases of being careful. E.g. don't enable
> > something by default and allow it to be disabled at runtime in
> > case something goes terribly wrong.
> > 
> > I didn't have any other reasons, really. I'm happy do to away with
> > the CONFIG options if that's the consensus, as well as the
> > nouser_xattr export option.
> 
> I have similar patches adding support for access to a couple of
> security xattrs. I initially wrapped the new code with CONFIG
> but after some discussion it was decided there was really no
> need to be so cautious.
> 
> The user_xattr export option is a separate matter, but again,
> if we don't know of a use case for it, I would leave it out for
> the moment.

Agreed.

Do ext4, xfs, etc. have an option to turn off xattrs?  If so, maybe it
would be good enough to turn off xattrs on the exported filesystem
rathre than on the export.

If not, maybe that's a sign that hasn't been a need.

--b.
Chuck Lever Nov. 4, 2019, 3:36 p.m. UTC | #5
> On Nov 3, 2019, at 10:01 PM, bfields@fieldses.org wrote:
> 
> On Fri, Oct 25, 2019 at 03:55:09PM -0400, Chuck Lever wrote:
>>> On Oct 24, 2019, at 7:15 PM, Frank van der Linden <fllinden@amazon.com> wrote:
>>> I think both of these are cases of being careful. E.g. don't enable
>>> something by default and allow it to be disabled at runtime in
>>> case something goes terribly wrong.
>>> 
>>> I didn't have any other reasons, really. I'm happy do to away with
>>> the CONFIG options if that's the consensus, as well as the
>>> nouser_xattr export option.
>> 
>> I have similar patches adding support for access to a couple of
>> security xattrs. I initially wrapped the new code with CONFIG
>> but after some discussion it was decided there was really no
>> need to be so cautious.
>> 
>> The user_xattr export option is a separate matter, but again,
>> if we don't know of a use case for it, I would leave it out for
>> the moment.
> 
> Agreed.
> 
> Do ext4, xfs, etc. have an option to turn off xattrs?  If so, maybe it
> would be good enough to turn off xattrs on the exported filesystem
> rather than on the export.

Following the server's local file systems' mount options seems like a
good way to go. In particular, is there a need to expose user xattrs
on the server host, but prevent NFS clients' access to them? I can't
think of one.


> If not, maybe that's a sign that hasn't been a need.
> 
> --b.

--
Chuck Lever
chucklever@gmail.com
Frank van der Linden Nov. 4, 2019, 4:21 p.m. UTC | #6
On Mon, Nov 04, 2019 at 10:36:03AM -0500, Chuck Lever wrote:
> 
> Following the server's local file systems' mount options seems like a
> good way to go. In particular, is there a need to expose user xattrs
> on the server host, but prevent NFS clients' access to them? I can't
> think of one.

Ok, that sounds fine to me - I'll remove the user_xattr export flag,
and we had already agreed to do away with the CONFIGs.

That leaves one last item with regard to enabling support: the client side
mount option. I assume the [no]user_xattr option should work the same as
with other filesystems. What about the default setting?

Also, currently, my code does not fail the mount operation if user xattrs
are asked for, but the server does not support them. It just doesn't
set NFS_CAP_XATTR for the server, and the xattr handler entry points
eturn -EOPNOTSUPP, as they check for NFS_CAP_XATTR. What's the preferred
behavior there?

- Frank
J. Bruce Fields Nov. 4, 2019, 10:58 p.m. UTC | #7
On Mon, Nov 04, 2019 at 04:21:47PM +0000, Frank van der Linden wrote:
> On Mon, Nov 04, 2019 at 10:36:03AM -0500, Chuck Lever wrote:
> > 
> > Following the server's local file systems' mount options seems like a
> > good way to go. In particular, is there a need to expose user xattrs
> > on the server host, but prevent NFS clients' access to them? I can't
> > think of one.
> 
> Ok, that sounds fine to me - I'll remove the user_xattr export flag,
> and we had already agreed to do away with the CONFIGs.
> 
> That leaves one last item with regard to enabling support: the client side
> mount option. I assume the [no]user_xattr option should work the same as
> with other filesystems. What about the default setting?

Just checking code for other filesystems quickly; if I understand right:

	- ext4 has user_xattr and nouser_xattr options, but you get a
	  deprecation warning if you try to use the latter;
	- xfs doesn't support either option;
	- cifs supports both, with xattr support the default.

Not necessarily my call, but just for simplicity's sake, I'd probably
leave out the option and see if anybody complains.

> Also, currently, my code does not fail the mount operation if user xattrs
> are asked for, but the server does not support them. It just doesn't
> set NFS_CAP_XATTR for the server, and the xattr handler entry points
> eturn -EOPNOTSUPP, as they check for NFS_CAP_XATTR. What's the preferred
> behavior there?

getxattr(2) under ERRORS says:

	ENOTSUP
	      Extended attributes are not supported by the filesystem,
	      or  are disabled.

so I'm guessing just erroring out is clearest.

I also see there's an EOPNOTSUPP return in the nouser_xattr case in
ext4_xattr_user_get.  (errno(3) says ENOTSUP and EOPNOTSUPP are the same
value on Linux.)

--b.
Frank van der Linden Nov. 5, 2019, 12:06 a.m. UTC | #8
On Mon, Nov 04, 2019 at 05:58:46PM -0500, Bruce Fields wrote:
> Just checking code for other filesystems quickly; if I understand right:
> 
> 	- ext4 has user_xattr and nouser_xattr options, but you get a
> 	  deprecation warning if you try to use the latter;
> 	- xfs doesn't support either option;
> 	- cifs supports both, with xattr support the default.
> 
> Not necessarily my call, but just for simplicity's sake, I'd probably
> leave out the option and see if anybody complains.

Sounds good, I'll go with that.
> 
> > Also, currently, my code does not fail the mount operation if user xattrs
> > are asked for, but the server does not support them. It just doesn't
> > set NFS_CAP_XATTR for the server, and the xattr handler entry points
> > eturn -EOPNOTSUPP, as they check for NFS_CAP_XATTR. What's the preferred
> > behavior there?
> 
> getxattr(2) under ERRORS says:
> 
> 	ENOTSUP
> 	      Extended attributes are not supported by the filesystem,
> 	      or  are disabled.
> 
> so I'm guessing just erroring out is clearest.
> 
> I also see there's an EOPNOTSUPP return in the nouser_xattr case in
> ext4_xattr_user_get.  (errno(3) says ENOTSUP and EOPNOTSUPP are the same
> value on Linux.)

Sure. So, to recap, here's what I'll do:

* Remove the CONFIG options - enable the code by default.
* Server side:
	* Remove the export file option to not export extended attributes
* Client side:
	* Always probe user xattr support, and use it when available
	  (returning EOPNOTSUPP, as expected, for all xattr syscalls if
	   it is not available).

Thanks for the feedback so far - much appreciated. I'll submit the separate
client and server patch sets this week.

- Frank
Chuck Lever Nov. 5, 2019, 3:44 p.m. UTC | #9
> On Nov 4, 2019, at 5:58 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Nov 04, 2019 at 04:21:47PM +0000, Frank van der Linden wrote:
>> On Mon, Nov 04, 2019 at 10:36:03AM -0500, Chuck Lever wrote:
>>> 
>>> Following the server's local file systems' mount options seems like a
>>> good way to go. In particular, is there a need to expose user xattrs
>>> on the server host, but prevent NFS clients' access to them? I can't
>>> think of one.
>> 
>> Ok, that sounds fine to me - I'll remove the user_xattr export flag,
>> and we had already agreed to do away with the CONFIGs.
>> 
>> That leaves one last item with regard to enabling support: the client side
>> mount option. I assume the [no]user_xattr option should work the same as
>> with other filesystems. What about the default setting?
> 
> Just checking code for other filesystems quickly; if I understand right:
> 
> 	- ext4 has user_xattr and nouser_xattr options, but you get a
> 	  deprecation warning if you try to use the latter;
> 	- xfs doesn't support either option;
> 	- cifs supports both, with xattr support the default.
> 
> Not necessarily my call, but just for simplicity's sake, I'd probably
> leave out the option and see if anybody complains.

Agree, I would leave it out to begin with. Anyone on linux-fsdevel,
feel free to chime in here about why some other file systems have
this mount option. History lessons welcome.


>> Also, currently, my code does not fail the mount operation if user xattrs
>> are asked for, but the server does not support them. It just doesn't
>> set NFS_CAP_XATTR for the server, and the xattr handler entry points
>> eturn -EOPNOTSUPP, as they check for NFS_CAP_XATTR. What's the preferred
>> behavior there?
> 
> getxattr(2) under ERRORS says:
> 
> 	ENOTSUP
> 	      Extended attributes are not supported by the filesystem,
> 	      or  are disabled.
> 
> so I'm guessing just erroring out is clearest.

IMO on the client, we want getxattr failure behavior to be consistent among:

- A version of NFS that does not support xattrs at all (say, v3)

- A version of NFS that can support them but doesn't (say, NFSv4.2 before these patches)

- A version of NFS that can support them, but the server doesn't

- A version of NFS that can support them, a server that can support them, but it's filesystem doesn't


> I also see there's an EOPNOTSUPP return in the nouser_xattr case in
> ext4_xattr_user_get.  (errno(3) says ENOTSUP and EOPNOTSUPP are the same
> value on Linux.)
> 
> --b.

--
Chuck Lever
chucklever@gmail.com