mbox series

[GIT,PULL] Please pull NFS client updates for 4.21

Message ID 02d3ecd37f9390e7b8a7be8ec0e1cafb7fdbed26.camel@netapp.com (mailing list archive)
State New, archived
Headers show
Series [GIT,PULL] Please pull NFS client updates for 4.21 | expand

Pull-request

git://git.linux-nfs.org/projects/anna/linux-nfs.git tags/nfs-for-4.21-1

Message

Schumaker, Anna Jan. 2, 2019, 10:42 p.m. UTC
Hi Linus,

The following changes since commit abc13275771fac77e2d7b129c289522dacb644b6:

  SUNRPC: Remove xprt_connect_status() (2018-12-18 11:04:10 -0500)

are available in the Git repository at:

  git://git.linux-nfs.org/projects/anna/linux-nfs.git tags/nfs-for-4.21-1

for you to fetch changes up to 260f71eff493a844531629854c0935fa8de4fa2c:

  sunrpc: convert to DEFINE_SHOW_ATTRIBUTE (2019-01-02 12:05:49 -0500)

----------------------------------------------------------------
Note that there is a conflict with the rdma tree in this pull request, since
we delete a file that has been changed in the rdma tree.  Hopefully that's
easy enough to resolve!

We also were unable to track down a maintainer for Neil Brown's changes to
the generic cred code that are prerequisites to his RPC cred cleanup patches.
We've been asking around for several months without any response, so
hopefully it's okay to include those patches in this pull request.

Stable bugfixes:
- xprtrdma: Yet another double DMA-unmap # v4.20

Features:
- Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
- Per-xprt rdma receive workqueues
- Drop support for FMR memory registration
- Make port= mount option optional for RDMA mounts

Other bugfixes and cleanups:
- Remove unused nfs4_xdev_fs_type declaration
- Fix comments for behavior that has changed
- Remove generic RPC credentials by switching to 'struct cred'
- Fix crossing mountpoints with different auth flavors
- Various xprtrdma fixes from testing and auditing the close code
- Fixes for disconnect issues when using xprtrdma with krb5
- Clean up and improve xprtrdma trace points
- Fix NFS v4.2 async copy reboot recovery

Thanks,
Anna

----------------------------------------------------------------

Ben Dooks (1):
      SUNRPC: allow /proc entries without CONFIG_SUNRPC_DEBUG

Chris Perl (1):
      NFS: nfs_compare_mount_options always compare auth flavors.

Chuck Lever (30):
      xprtrdma: Yet another double DMA-unmap
      xprtrdma: Ensure MRs are DMA-unmapped when posting LOCAL_INV fails
      xprtrdma: Refactor Receive accounting
      xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue
      xprtrdma: No qp_event disconnect
      xprtrdma: Don't wake pending tasks until disconnect is done
      xprtrdma: Fix ri_max_segs and the result of ro_maxpages
      xprtrdma: Reduce max_frwr_depth
      xprtrdma: Remove support for FMR memory registration
      xprtrdma: Remove rpcrdma_memreg_ops
      xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)
      NFS: Make "port=" mount option optional for RDMA mounts
      xprtrdma: Recognize XDRBUF_SPARSE_PAGES
      xprtrdma: Remove request_module from backchannel
      xprtrdma: Expose transport header errors
      xprtrdma: Simplify locking that protects the rl_allreqs list
      xprtrdma: Cull dprintk() call sites
      xprtrdma: Remove unused fields from rpcrdma_ia
      xprtrdma: Clean up of xprtrdma chunk trace points
      xprtrdma: Relocate the xprtrdma_mr_map trace points
      xprtrdma: Add trace points for calls to transport switch methods
      xprtrdma: Trace mapping, alloc, and dereg failures
      NFS: Fix NFSv4 symbolic trace point output
      SUNRPC: Simplify defining common RPC trace events
      SUNRPC: Fix some kernel doc complaints
      xprtrdma: Update comments in frwr_op_send
      xprtrdma: Replace outdated comment for rpcrdma_ep_post
      xprtrdma: Add documenting comment for rpcrdma_buffer_destroy
      xprtrdma: Don't leak freed MRs
      xprtrdma: Prevent leak of rpcrdma_rep objects

J. Bruce Fields (2):
      sunrpc: handle ENOMEM in rpcb_getport_async
      sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS

NeilBrown (24):
      cred: add cred_fscmp() for comparing creds.
      cred: add get_cred_rcu()
      cred: export get_task_cred().
      cred: allow get_cred() and put_cred() to be given NULL.
      SUNRPC: add 'struct cred *' to auth_cred and rpc_cred
      SUNRPC: remove groupinfo from struct auth_cred.
      SUNRPC: remove uid and gid from struct auth_cred
      SUNRPC: remove machine_cred field from struct auth_cred
      NFSv4: add cl_root_cred for use when machine cred is not available.
      NFSv4: don't require lock for get_renew_cred or get_machine_cred
      SUNRPC: discard RPC_DO_ROOTOVERRIDE()
      NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred().
      SUNRPC: introduce RPC_TASK_NULLCREDS to request auth_none
      SUNRPC: add side channel to use non-generic cred for rpc call.
      NFS: move credential expiry tracking out of SUNRPC into NFS.
      SUNRPC: remove RPCAUTH_AUTH_NO_CRKEY_TIMEOUT
      NFS: change access cache to use 'struct cred'.
      NFS: struct nfs_open_dir_context: convert rpc_cred pointer to cred.
      NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.
      SUNRPC: remove generic cred code.
      SUNRPC: remove crbind rpc_cred operation
      SUNRPC: simplify auth_unix.
      SUNRPC discard cr_uid from struct rpc_cred.
      NFS: remove unnecessary test for IS_ERR(cred)

Olga Kornievskaia (2):
      NFSv4: cleanup remove unused nfs4_xdev_fs_type
      NFSv4.2 fix async copy reboot recovery

Pavel Tikhomirov (1):
      nfs: fix comment to nfs_generic_pg_test which does the opposite

Santosh kumar pradhan (1):
      sunrpc: Add xprt after nfs4_test_session_trunk()

Yangtao Li (1):
      sunrpc: convert to DEFINE_SHOW_ATTRIBUTE

 fs/lockd/clntproc.c                        |   6 +-
 fs/nfs/blocklayout/blocklayout.c           |   2 +-
 fs/nfs/client.c                            |   9 +-
 fs/nfs/delegation.c                        |  28 +-
 fs/nfs/delegation.h                        |  10 +-
 fs/nfs/dir.c                               |  59 ++--
 fs/nfs/flexfilelayout/flexfilelayout.c     |  64 ++--
 fs/nfs/flexfilelayout/flexfilelayout.h     |   8 +-
 fs/nfs/flexfilelayout/flexfilelayoutdev.c  |  16 +-
 fs/nfs/inode.c                             |  13 +-
 fs/nfs/internal.h                          |  15 +-
 fs/nfs/nfs3proc.c                          |   4 +-
 fs/nfs/nfs4_fs.h                           |  68 ++---
 fs/nfs/nfs4client.c                        |   4 +-
 fs/nfs/nfs4proc.c                          | 158 +++++-----
 fs/nfs/nfs4renewd.c                        |   9 +-
 fs/nfs/nfs4session.c                       |   5 +-
 fs/nfs/nfs4state.c                         | 131 ++++-----
 fs/nfs/nfs4trace.h                         | 456 ++++++++++++++++++++---------
 fs/nfs/pagelist.c                          |   4 +-
 fs/nfs/pnfs.c                              |  14 +-
 fs/nfs/pnfs.h                              |  10 +-
 fs/nfs/pnfs_dev.c                          |   4 +-
 fs/nfs/pnfs_nfs.c                          |   2 +-
 fs/nfs/proc.c                              |   2 +-
 fs/nfs/super.c                             |  13 +-
 fs/nfs/unlink.c                            |  20 +-
 fs/nfs/write.c                             |  24 +-
 fs/nfsd/nfs4callback.c                     |  31 +-
 fs/nfsd/state.h                            |   2 +-
 include/linux/cred.h                       |  26 +-
 include/linux/nfs_fs.h                     |  13 +-
 include/linux/nfs_fs_sb.h                  |   2 +-
 include/linux/nfs_xdr.h                    |  16 +-
 include/linux/sunrpc/auth.h                |  51 +---
 include/linux/sunrpc/clnt.h                |   5 +-
 include/linux/sunrpc/sched.h               |   6 +-
 include/trace/events/rpcrdma.h             | 218 ++++++++++++--
 include/trace/events/sunrpc.h              | 172 +++++------
 kernel/cred.c                              |  58 +++-
 net/sunrpc/Makefile                        |   2 +-
 net/sunrpc/auth.c                          | 116 ++++----
 net/sunrpc/auth_generic.c                  | 293 ------------------
 net/sunrpc/auth_gss/auth_gss.c             |  47 +--
 net/sunrpc/auth_gss/gss_mech_switch.c      |   2 +-
 net/sunrpc/auth_null.c                     |   4 -
 net/sunrpc/auth_unix.c                     | 110 +++----
 net/sunrpc/backchannel_rqst.c              |   2 +-
 net/sunrpc/clnt.c                          |  29 +-
 net/sunrpc/rpc_pipe.c                      |  19 +-
 net/sunrpc/rpcb_clnt.c                     |  12 +-
 net/sunrpc/sched.c                         |   5 +-
 net/sunrpc/xprtmultipath.c                 |   4 +-
 net/sunrpc/xprtrdma/Makefile               |   3 +-
 net/sunrpc/xprtrdma/backchannel.c          |  39 +--
 net/sunrpc/xprtrdma/fmr_ops.c              | 337 ---------------------
 net/sunrpc/xprtrdma/frwr_ops.c             | 209 ++++++++-----
 net/sunrpc/xprtrdma/rpc_rdma.c             |  78 +++--
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   8 +-
 net/sunrpc/xprtrdma/transport.c            |  91 +++---
 net/sunrpc/xprtrdma/verbs.c                | 255 +++++++---------
 net/sunrpc/xprtrdma/xprt_rdma.h            |  80 ++---
 net/sunrpc/xprtsock.c                      |  10 +-
 63 files changed, 1518 insertions(+), 1995 deletions(-)
 delete mode 100644 net/sunrpc/auth_generic.c
 delete mode 100644 net/sunrpc/xprtrdma/fmr_ops.c

Comments

Linus Torvalds Jan. 3, 2019, 12:48 a.m. UTC | #1
On Wed, Jan 2, 2019 at 2:42 PM Schumaker, Anna
<Anna.Schumaker@netapp.com> wrote:
>
> We also were unable to track down a maintainer for Neil Brown's changes to
> the generic cred code that are prerequisites to his RPC cred cleanup patches.
> We've been asking around for several months without any response, so
> hopefully it's okay to include those patches in this pull request.

Looks ok to me, although I wonder what the semantics of cred_fscmp()
are across namespaces?

IOW, it seems potentially a bit suspicious to do cred_fscmp() if the
two creds have different namnespaces? Hmm?

Is there some reason that can't happen, or some reason it doesn't matter?

                  Linus
pr-tracker-bot@kernel.org Jan. 3, 2019, 2:40 a.m. UTC | #2
The pull request you sent on Wed, 2 Jan 2019 22:42:32 +0000:

> git://git.linux-nfs.org/projects/anna/linux-nfs.git tags/nfs-for-4.21-1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e6b92572808467f35fd159d47c45b650de29e722

Thank you!
NeilBrown Jan. 3, 2019, 4:53 a.m. UTC | #3
On Wed, Jan 02 2019, Linus Torvalds wrote:

> On Wed, Jan 2, 2019 at 2:42 PM Schumaker, Anna
> <Anna.Schumaker@netapp.com> wrote:
>>
>> We also were unable to track down a maintainer for Neil Brown's changes to
>> the generic cred code that are prerequisites to his RPC cred cleanup patches.
>> We've been asking around for several months without any response, so
>> hopefully it's okay to include those patches in this pull request.
>
> Looks ok to me, although I wonder what the semantics of cred_fscmp()
> are across namespaces?
>
> IOW, it seems potentially a bit suspicious to do cred_fscmp() if the
> two creds have different namnespaces? Hmm?
>
> Is there some reason that can't happen, or some reason it doesn't matter?
>
>                   Linus

Interesting question.
For the current use in NFS, it is consistent with existing practice to
ignore the name space.
NFS file accesses (when using the normal uid-based access checks) always
use the manifest uid of the process - the one returned by getuid() (or
more accurately, getfsuid()).
Maybe this is wrong?  Maybe we should always use from_kuid() or whatever
to get the uid/gid to send over the wire?

Anna/Trond: do you have thoughts on this?  If a process in a user
namespace accesses a file over NFS, should the UID presented to the
server be the one in that name-space, or the one you get by mapping to
the global name-space?
Or should we map to the namespace that was active when the filesystem
was mounted?

I don't think cred_fscmp() should do any of this mapping, but maybe it
should treat creds from different namespaces as different - as a
precaution.

Thanks,
NeilBrown
Trond Myklebust Jan. 3, 2019, 6:09 a.m. UTC | #4
On Thu, 2019-01-03 at 15:53 +1100, NeilBrown wrote:
> On Wed, Jan 02 2019, Linus Torvalds wrote:
> 
> > On Wed, Jan 2, 2019 at 2:42 PM Schumaker, Anna
> > <Anna.Schumaker@netapp.com> wrote:
> > > We also were unable to track down a maintainer for Neil Brown's
> > > changes to
> > > the generic cred code that are prerequisites to his RPC cred
> > > cleanup patches.
> > > We've been asking around for several months without any response,
> > > so
> > > hopefully it's okay to include those patches in this pull
> > > request.
> > 
> > Looks ok to me, although I wonder what the semantics of
> > cred_fscmp()
> > are across namespaces?
> > 
> > IOW, it seems potentially a bit suspicious to do cred_fscmp() if
> > the
> > two creds have different namnespaces? Hmm?
> > 
> > Is there some reason that can't happen, or some reason it doesn't
> > matter?
> > 
> >                   Linus
> 
> Interesting question.
> For the current use in NFS, it is consistent with existing practice
> to
> ignore the name space.
> NFS file accesses (when using the normal uid-based access checks)
> always
> use the manifest uid of the process - the one returned by getuid()
> (or
> more accurately, getfsuid()).
> Maybe this is wrong?  Maybe we should always use from_kuid() or
> whatever
> to get the uid/gid to send over the wire?
> 
> Anna/Trond: do you have thoughts on this?  If a process in a user
> namespace accesses a file over NFS, should the UID presented to the
> server be the one in that name-space, or the one you get by mapping
> to
> the global name-space?
> Or should we map to the namespace that was active when the filesystem
> was mounted?
> 
> I don't think cred_fscmp() should do any of this mapping, but maybe
> it
> should treat creds from different namespaces as different - as a
> precaution.
> 
> Thanks,
> NeilBrown

The values being compared are in cred_fscmp() are all of type kuid_t or
kgid_t so that means they have already been mapped from the user
namespace into the kernel uid/gid space.
When we put those kuid/kgid values onto the wire, we currently always
use the init namespace rather than the user namespace of the mount
process.

When using strong authentication (i.e. krb5) then none of this matters,
since the server performs its own mapping of the presented RPCSEC_GSS
session into a credential. That mapping is independent of the user
namespace on the client, it just depends on which krb5 principal the
process used to identify itself.

The problem case is limited to when we're using the weak AUTH_UNIX
authentication, since the server is then implicitly trusting the client
to protect against identity spoofing. This is particularly true if the
NFS server is being accessed through NAT, in which case it has very
limited possibilities for discriminating between containers on the same
client using the export table because they will all originate from the
same source IP address. I think that for these cases, using the init
namespace is the right thing to do for the same reason we use it with
local filesystems: if we try to use a different namespace then
unprivileged userspace processes might be able to manipulate the
mapping to spoof the identities of privileged users or groups, or
otherwise gain access to files to which they normally should not have
access.

Does that argument make sense?
Eric W. Biederman Jan. 3, 2019, 8:47 p.m. UTC | #5
Trond Myklebust <trondmy@hammerspace.com> writes:

> On Thu, 2019-01-03 at 15:53 +1100, NeilBrown wrote:
>> On Wed, Jan 02 2019, Linus Torvalds wrote:
>> 
>> > On Wed, Jan 2, 2019 at 2:42 PM Schumaker, Anna
>> > <Anna.Schumaker@netapp.com> wrote:
>> > > We also were unable to track down a maintainer for Neil Brown's
>> > > changes to
>> > > the generic cred code that are prerequisites to his RPC cred
>> > > cleanup patches.
>> > > We've been asking around for several months without any response,
>> > > so
>> > > hopefully it's okay to include those patches in this pull
>> > > request.
>> > 
>> > Looks ok to me, although I wonder what the semantics of
>> > cred_fscmp()
>> > are across namespaces?
>> > 
>> > IOW, it seems potentially a bit suspicious to do cred_fscmp() if
>> > the
>> > two creds have different namnespaces? Hmm?
>> > 
>> > Is there some reason that can't happen, or some reason it doesn't
>> > matter?

I think the function might be better named cred_uidgid_cmp to make
it clear that the comparison only cares about the uid and gid values
in the cred.  As  Trond points out all of the values are of kuid_t
and kgid_t so are namespace independent at that point.


>> Interesting question.
>> For the current use in NFS, it is consistent with existing practice
>> to
>> ignore the name space.
>> NFS file accesses (when using the normal uid-based access checks)
>> always
>> use the manifest uid of the process - the one returned by getuid()
>> (or
>> more accurately, getfsuid()).
>> Maybe this is wrong?  Maybe we should always use from_kuid() or
>> whatever
>> to get the uid/gid to send over the wire?
>> 
>> Anna/Trond: do you have thoughts on this?  If a process in a user
>> namespace accesses a file over NFS, should the UID presented to the
>> server be the one in that name-space, or the one you get by mapping
>> to
>> the global name-space?
>> Or should we map to the namespace that was active when the filesystem
>> was mounted?
>> 
>> I don't think cred_fscmp() should do any of this mapping, but maybe
>> it
>> should treat creds from different namespaces as different - as a
>> precaution.
>> 
>> Thanks,
>> NeilBrown
>
> The values being compared are in cred_fscmp() are all of type kuid_t or
> kgid_t so that means they have already been mapped from the user
> namespace into the kernel uid/gid space.
> When we put those kuid/kgid values onto the wire, we currently always
> use the init namespace rather than the user namespace of the mount
> process.

It happens that mounts of nfs are still limited to mounter in the
initial user namespace.  While it seems plausible that we could allow
unprivileged users to mount nfs filesystems the auditing of the code
base has not been done to confirm that is safe.  The nfs code base is a
large attack surface so this something that must be done with quite a
bit of caution.

> When using strong authentication (i.e. krb5) then none of this matters,
> since the server performs its own mapping of the presented RPCSEC_GSS
> session into a credential. That mapping is independent of the user
> namespace on the client, it just depends on which krb5 principal the
> process used to identify itself.
>
> The problem case is limited to when we're using the weak AUTH_UNIX
> authentication, since the server is then implicitly trusting the client
> to protect against identity spoofing. This is particularly true if the
> NFS server is being accessed through NAT, in which case it has very
> limited possibilities for discriminating between containers on the same
> client using the export table because they will all originate from the
> same source IP address. I think that for these cases, using the init
> namespace is the right thing to do for the same reason we use it with
> local filesystems: if we try to use a different namespace then
> unprivileged userspace processes might be able to manipulate the
> mapping to spoof the identities of privileged users or groups, or
> otherwise gain access to files to which they normally should not have
> access.
>
> Does that argument make sense?

I think the NAT case is something to be concerned about.  However I
believe the same attack is possible with an ordinary UDP or TCP socket
bound for the same server.  So I don't know if we can protect the
server.

In general I believe we should put things on the wire that match
sb->s_user_ns.  Matching the mounter of the filesystem.  Because that
means behavior on the wire is independent of if you are running
in a container or not.  That seems to be the least surprising thing to
do, and would match what a userspace nfs client would do.  But again
that is not an immediate concern.

Eric
Linus Torvalds Jan. 3, 2019, 8:53 p.m. UTC | #6
On Wed, Jan 2, 2019 at 10:10 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> The values being compared are in cred_fscmp() are all of type kuid_t or
> kgid_t so that means they have already been mapped from the user
> namespace into the kernel uid/gid space.

Ok, "everything happens in init_ns" is I guess a very valid argument.

It might be good to perhaps show that in the name (and maybe even a
comment), but as long as it's only really used for nfs I guess it
isn't a big deal.

Thanks,

             Linus