diff mbox series

[v6] vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing

Message ID 20230802-master-v6-1-45d48299168b@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v6] vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing | expand

Commit Message

Jeff Layton Aug. 2, 2023, 4:49 p.m. UTC
From: David Howells <dhowells@redhat.com>

When NFS superblocks are created by automounting, their LSM parameters
aren't set in the fs_context struct prior to sget_fc() being called,
leading to failure to match existing superblocks.

Fix this by adding a new LSM hook to load fc->security for submount
creation when alloc_fs_context() is creating the fs_context for it.

However, this uncovers a further bug: nfs_get_root() initialises the
superblock security manually by calling security_sb_set_mnt_opts() or
security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
security_sb_set_mnt_opts(), which can lead to SELinux, at least,
complaining.

Fix that by adding a flag to the fs_context that suppresses the
security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
when it sets the LSM context on the new superblock.

The first bug leads to messages like the following appearing in dmesg:

	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
---
This patch was originally sent by David several months ago, but it
never got merged. I'm resending to resurrect the discussion. Can we
get this fixed?

ver #6)
 - Rebase onto v6.5.0-rc4

ver #5)
 - Removed unused variable.
 - Only allocate smack_mnt_opts if we're dealing with a submount.

ver #4)
 - When doing a FOR_SUBMOUNT mount, don't set the root label in SELinux or
   Smack.

ver #3)
 - Made LSM parameter extraction dependent on fc->purpose ==
   FS_CONTEXT_FOR_SUBMOUNT.  Shouldn't happen on FOR_RECONFIGURE.

ver #2)
 - Added Smack support
 - Made LSM parameter extraction dependent on reference != NULL.
---
 fs/fs_context.c               |  4 ++++
 fs/nfs/getroot.c              |  1 +
 fs/super.c                    | 10 ++++----
 include/linux/fs_context.h    |  1 +
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/security.h      |  6 +++++
 security/security.c           | 14 +++++++++++
 security/selinux/hooks.c      | 25 ++++++++++++++++++++
 security/smack/smack_lsm.c    | 54 +++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 112 insertions(+), 4 deletions(-)


---
base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
change-id: 20230802-master-3082090e8d69

Best regards,

Comments

Paul Moore Aug. 2, 2023, 6:16 p.m. UTC | #1
On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> 
> When NFS superblocks are created by automounting, their LSM parameters
> aren't set in the fs_context struct prior to sget_fc() being called,
> leading to failure to match existing superblocks.
> 
> Fix this by adding a new LSM hook to load fc->security for submount
> creation when alloc_fs_context() is creating the fs_context for it.
> 
> However, this uncovers a further bug: nfs_get_root() initialises the
> superblock security manually by calling security_sb_set_mnt_opts() or
> security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> complaining.
> 
> Fix that by adding a flag to the fs_context that suppresses the
> security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> when it sets the LSM context on the new superblock.
>
> The first bug leads to messages like the following appearing in dmesg:
> 
> 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> Tested-by: Jeff Layton <jlayton@kernel.org>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> ---
> This patch was originally sent by David several months ago, but it
> never got merged. I'm resending to resurrect the discussion. Can we
> get this fixed?

Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
back in v3.  Looking at it a bit closer now I have one nitpicky
request and one larger concern (see below).

> diff --git a/fs/super.c b/fs/super.c
> index e781226e2880..13adf43e2e5d 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
>  	smp_wmb();
>  	sb->s_flags |= SB_BORN;
>  
> -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> -	if (unlikely(error)) {
> -		fc_drop_locked(fc);
> -		return error;
> +	if (!(fc->lsm_set)) {
> +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> +		if (unlikely(error)) {
> +			fc_drop_locked(fc);
> +			return error;
> +		}
>  	}

I generally dislike core kernel code which makes LSM calls conditional
on some kernel state maintained outside the LSM.  Sometimes it has to
be done as there is no other good options, but I would like us to try
and avoid it if possible.  The commit description mentioned that this
was put here to avoid a SELinux complaint, can you provide an example
of the complain?  Does it complain about a double/invalid mount, e.g.
"SELinux: mount invalid.  Same superblock, different security ..."?

I'd like to understand why the sb_set_mnt_opts() call fails when it
comes after the fs_context_init() call.  I'm particulary curious to
know if the failure is due to conflicting SELinux state in the
fs_context, or if it is simply an issue of sb_set_mnt_opts() not
properly handling existing values.  Perhaps I'm being overly naive,
but I'm hopeful that we can address both of these within the SELinux
code itself.

In a worst case situation, we could always implement a flag *inside*
the SELinux code, similar to what has been done with 'lsm_set' here.

> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index d06e350fedee..29cce0fadbeb 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2745,6 +2745,30 @@ static int selinux_umount(struct vfsmount *mnt, int flags)
>  				   FILESYSTEM__UNMOUNT, NULL);
>  }
>  
> +static int selinux_fs_context_init(struct fs_context *fc,
> +				   struct dentry *reference)
> +{
> +	const struct superblock_security_struct *sbsec;
> +	struct selinux_mnt_opts *opts;
> +
> +	if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT) {
> +		opts = kzalloc(sizeof(*opts), GFP_KERNEL);
> +		if (!opts)
> +			return -ENOMEM;
> +
> +		sbsec = selinux_superblock(reference->d_sb);
> +		if (sbsec->flags & FSCONTEXT_MNT)
> +			opts->fscontext_sid	= sbsec->sid;
> +		if (sbsec->flags & CONTEXT_MNT)
> +			opts->context_sid	= sbsec->mntpoint_sid;
> +		if (sbsec->flags & DEFCONTEXT_MNT)
> +			opts->defcontext_sid	= sbsec->def_sid;

I acknowledge this is very nitpicky, but we're starting to make a
greater effort towards using consistent style within the SELinux
code.  With that in mind, please remove the alignment whitespace in
the assignments above.  Thank you.

> +		fc->security = opts;
> +	}
> +
> +	return 0;
> +}
> +
>  static int selinux_fs_context_dup(struct fs_context *fc,
>  				  struct fs_context *src_fc)
>  {

--
paul-moore.com
Jeff Layton Aug. 2, 2023, 7:34 p.m. UTC | #2
On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > When NFS superblocks are created by automounting, their LSM parameters
> > aren't set in the fs_context struct prior to sget_fc() being called,
> > leading to failure to match existing superblocks.
> > 
> > Fix this by adding a new LSM hook to load fc->security for submount
> > creation when alloc_fs_context() is creating the fs_context for it.
> > 
> > However, this uncovers a further bug: nfs_get_root() initialises the
> > superblock security manually by calling security_sb_set_mnt_opts() or
> > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > complaining.
> > 
> > Fix that by adding a flag to the fs_context that suppresses the
> > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > when it sets the LSM context on the new superblock.
> > 
> > The first bug leads to messages like the following appearing in dmesg:
> > 
> > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > 
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > Tested-by: Jeff Layton <jlayton@kernel.org>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > ---
> > This patch was originally sent by David several months ago, but it
> > never got merged. I'm resending to resurrect the discussion. Can we
> > get this fixed?
> 
> Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> back in v3.  Looking at it a bit closer now I have one nitpicky
> request and one larger concern (see below).
> 
> > diff --git a/fs/super.c b/fs/super.c
> > index e781226e2880..13adf43e2e5d 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> >  	smp_wmb();
> >  	sb->s_flags |= SB_BORN;
> >  
> > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > -	if (unlikely(error)) {
> > -		fc_drop_locked(fc);
> > -		return error;
> > +	if (!(fc->lsm_set)) {
> > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > +		if (unlikely(error)) {
> > +			fc_drop_locked(fc);
> > +			return error;
> > +		}
> >  	}
> 
> I generally dislike core kernel code which makes LSM calls conditional
> on some kernel state maintained outside the LSM.  Sometimes it has to
> be done as there is no other good options, but I would like us to try
> and avoid it if possible.  The commit description mentioned that this
> was put here to avoid a SELinux complaint, can you provide an example
> of the complain?  Does it complain about a double/invalid mount, e.g.
> "SELinux: mount invalid.  Same superblock, different security ..."?
> 

The problem I had was not so much SELinux warnings, but rather that in a
situation where I would expect to share superblocks between two
filesystems, it didn't.

Basically if you do something like this:

# mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
# mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0

...when "foo" and "bar" are directories on the same filesystem on the
server, you should get two vfsmounts that share a superblock. That's
what you get if selinux is disabled, but not when it's enabled (even
when it's in permissive mode).

The problems that David hit with the automounter have a similar root
cause though, I believe.

> I'd like to understand why the sb_set_mnt_opts() call fails when it
> comes after the fs_context_init() call.  I'm particulary curious to
> know if the failure is due to conflicting SELinux state in the
> fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> properly handling existing values.  Perhaps I'm being overly naive,
> but I'm hopeful that we can address both of these within the SELinux
> code itself.
> 

The problem I hit was that nfs_compare_super is called with a fs_context
that has a NULL ->security pointer. That caused it to call
selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
it returns 1 and decides not to share sb's.

Filling out fc->security with this new operation seems to fix that, but
if you see a better way to do this, then I'm certainly open to the idea.

> In a worst case situation, we could always implement a flag *inside*
> the SELinux code, similar to what has been done with 'lsm_set' here.
> 

I'm fine with a different solution, if you see a better one. You'll have
to handhold me through this one though. LSM stuff is not really my
forte'. Let me know what you'd like to see here.


> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index d06e350fedee..29cce0fadbeb 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2745,6 +2745,30 @@ static int selinux_umount(struct vfsmount *mnt, int flags)
> >  				   FILESYSTEM__UNMOUNT, NULL);
> >  }
> >  
> > +static int selinux_fs_context_init(struct fs_context *fc,
> > +				   struct dentry *reference)
> > +{
> > +	const struct superblock_security_struct *sbsec;
> > +	struct selinux_mnt_opts *opts;
> > +
> > +	if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT) {
> > +		opts = kzalloc(sizeof(*opts), GFP_KERNEL);
> > +		if (!opts)
> > +			return -ENOMEM;
> > +
> > +		sbsec = selinux_superblock(reference->d_sb);
> > +		if (sbsec->flags & FSCONTEXT_MNT)
> > +			opts->fscontext_sid	= sbsec->sid;
> > +		if (sbsec->flags & CONTEXT_MNT)
> > +			opts->context_sid	= sbsec->mntpoint_sid;
> > +		if (sbsec->flags & DEFCONTEXT_MNT)
> > +			opts->defcontext_sid	= sbsec->def_sid;
> 
> I acknowledge this is very nitpicky, but we're starting to make a
> greater effort towards using consistent style within the SELinux
> code.  With that in mind, please remove the alignment whitespace in
> the assignments above.  Thank you.
> 

Will do. Thanks for having a look!

> > +		fc->security = opts;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int selinux_fs_context_dup(struct fs_context *fc,
> >  				  struct fs_context *src_fc)
> >  {
> 
> --
> paul-moore.com
Paul Moore Aug. 3, 2023, 2:46 a.m. UTC | #3
On Wed, Aug 2, 2023 at 3:34 PM Jeff Layton <jlayton@kernel.org> wrote:
> On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:

...

> > I generally dislike core kernel code which makes LSM calls conditional
> > on some kernel state maintained outside the LSM.  Sometimes it has to
> > be done as there is no other good options, but I would like us to try
> > and avoid it if possible.  The commit description mentioned that this
> > was put here to avoid a SELinux complaint, can you provide an example
> > of the complain?  Does it complain about a double/invalid mount, e.g.
> > "SELinux: mount invalid.  Same superblock, different security ..."?
>
> The problem I had was not so much SELinux warnings, but rather that in a
> situation where I would expect to share superblocks between two
> filesystems, it didn't.
>
> Basically if you do something like this:
>
> # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
>
> ...when "foo" and "bar" are directories on the same filesystem on the
> server, you should get two vfsmounts that share a superblock. That's
> what you get if selinux is disabled, but not when it's enabled (even
> when it's in permissive mode).

Thanks, that helps.  I'm guessing the difference in behavior is due to
the old->has_sec_mnt_opts check in nfs_compare_super().

> > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > comes after the fs_context_init() call.  I'm particulary curious to
> > know if the failure is due to conflicting SELinux state in the
> > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > properly handling existing values.  Perhaps I'm being overly naive,
> > but I'm hopeful that we can address both of these within the SELinux
> > code itself.
>
> The problem I hit was that nfs_compare_super is called with a fs_context
> that has a NULL ->security pointer. That caused it to call
> selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> it returns 1 and decides not to share sb's.
>
> Filling out fc->security with this new operation seems to fix that, but
> if you see a better way to do this, then I'm certainly open to the idea.

Just as you mention that you are not a LSM expert, I am not a VFS
expert, so I think we'll have to help each other a bit ;)

I think I'm beginning to understand alloc_fs_context() a bit more,
including the fs_context_for_XXX() wrappers.  One thing I have
realized is that I believe we need to update the
selinux_fs_context_init() and smack_fs_context_init() functions to
properly handle a NULL @reference dentry; I think returning without
error in both cases is the correct answer.  In the non-NULL @reference
case, I believe your patch is correct, we do want to inherit the
options from @reference.  My only concern now is the
fs_context::lsm_set flag.

You didn't mention exactly why the security_sb_set_mnt_opts() was
failing, and requires the fs_context::lsm_set check, but my guess is
that something is tripping over the fact that the superblock is
already properly setup.  I'm working under the assumption that this
problem - attempting to reconfigure a properly configured superblock -
should only be happening in the submount/non-NULL-reference case.  If
it is happening elsewhere I think I'm going to need some help
understanding that ...

However, assuming I'm mostly correct in the above paragraph, would it
be possible to take a reference to the @reference dentry's superblock
in security_fs_context_init(), that we could later compare to the
superblock passed into security_sb_set_mnt_opts()?  If we know that
the fs_context was initialized with the same superblock we are now
being asked to set mount options on, we should be able to return from
the LSM hook without doing anything.

Right?

Or am I missing something really silly? :)
Christian Brauner Aug. 3, 2023, 1:27 p.m. UTC | #4
On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > When NFS superblocks are created by automounting, their LSM parameters
> > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > leading to failure to match existing superblocks.
> > > 
> > > Fix this by adding a new LSM hook to load fc->security for submount
> > > creation when alloc_fs_context() is creating the fs_context for it.
> > > 
> > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > complaining.
> > > 
> > > Fix that by adding a flag to the fs_context that suppresses the
> > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > when it sets the LSM context on the new superblock.
> > > 
> > > The first bug leads to messages like the following appearing in dmesg:
> > > 
> > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > 
> > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > ---
> > > This patch was originally sent by David several months ago, but it
> > > never got merged. I'm resending to resurrect the discussion. Can we
> > > get this fixed?
> > 
> > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > back in v3.  Looking at it a bit closer now I have one nitpicky
> > request and one larger concern (see below).
> > 
> > > diff --git a/fs/super.c b/fs/super.c
> > > index e781226e2880..13adf43e2e5d 100644
> > > --- a/fs/super.c
> > > +++ b/fs/super.c
> > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > >  	smp_wmb();
> > >  	sb->s_flags |= SB_BORN;
> > >  
> > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > -	if (unlikely(error)) {
> > > -		fc_drop_locked(fc);
> > > -		return error;
> > > +	if (!(fc->lsm_set)) {
> > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > +		if (unlikely(error)) {
> > > +			fc_drop_locked(fc);
> > > +			return error;
> > > +		}
> > >  	}
> > 
> > I generally dislike core kernel code which makes LSM calls conditional
> > on some kernel state maintained outside the LSM.  Sometimes it has to
> > be done as there is no other good options, but I would like us to try
> > and avoid it if possible.  The commit description mentioned that this
> > was put here to avoid a SELinux complaint, can you provide an example
> > of the complain?  Does it complain about a double/invalid mount, e.g.
> > "SELinux: mount invalid.  Same superblock, different security ..."?
> > 
> 
> The problem I had was not so much SELinux warnings, but rather that in a
> situation where I would expect to share superblocks between two
> filesystems, it didn't.
> 
> Basically if you do something like this:
> 
> # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> 
> ...when "foo" and "bar" are directories on the same filesystem on the
> server, you should get two vfsmounts that share a superblock. That's
> what you get if selinux is disabled, but not when it's enabled (even
> when it's in permissive mode).
> 
> The problems that David hit with the automounter have a similar root
> cause though, I believe.
> 
> > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > comes after the fs_context_init() call.  I'm particulary curious to
> > know if the failure is due to conflicting SELinux state in the
> > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > properly handling existing values.  Perhaps I'm being overly naive,
> > but I'm hopeful that we can address both of these within the SELinux
> > code itself.
> > 
> 
> The problem I hit was that nfs_compare_super is called with a fs_context
> that has a NULL ->security pointer. That caused it to call
> selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> it returns 1 and decides not to share sb's.

I tried to follow this because I'm really still quite puzzled by this
whole thing. Two consecutive mounts that should share the superblock
don't share the superblock. But behavior differs between nfs3 and nfs4
due to how automounting works.

Afaict, the callchain you're looking at in this scenario is:

(1) nfs3

(1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
      vfs_get_tree(fc_foo)
      -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
            -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
               -> nfs_get_tree_common(fc_foo)
                  -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)

(1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
      vfs_get_tree(fc_bar)
      -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
            -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
               -> nfs_get_tree_common(fc_bar)
                  -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
                     -> nfs_compare_super(sb_foo, fc_bar)
                        -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)

And fc_bar->security is non-NULL and compatible with sb_foo's current
security settings. Fine.

(2) nfs4

But for nfs4 we're looking at a vastly more complicated callchain at
least looking at this from a local nfs:

(2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
      vfs_get_tree(fc_foo)
      -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
         -> if (!ctx->internal) branch is taken
            -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
               -> do_nfs4_mount(fc_foo)
                  -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
                    -> security_fs_context_dup(fc_dup_foo, fc_foo)
                       {
                                fc_dup_foo->security = kmemdup(fc_foo->security)
                       }
                       new_fs_context->internal = true
                  -> foo_mnt = fc_mount(fc_dup_foo)
                    -> vfs_get_tree(fc_dup_foo)
                       -> if (!ctx->internal) branch is _not_ taken
                          -> nfs_get_tree_common(fc_dup_foo)
                                 sb_foo = sget_fc(fc, nfs_compare_super, ...)
                  -> mount_subtree()
                     -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
                        -> nfs_d_automount("export")
                           -> fc_sub_foo = fs_context_for_submount()
                              {
                                      fc_sub_bar->security = NULL
                              {
                           -> nfs4_submount(fc_sub_foo)
                              -> nfs4_do_submount(fc_sub_foo)
                                 -> vfs_get_tree(fc_sub_foo)
                                    -> nfs_get_tree_common(fc_sub_foo)
                                       -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
                        -> nfs_d_automount("foo")
                           -> fc_sub_foo = fs_context_for_submount()
                              {
                                      fc_sub_bar->security = NULL
                              {
                           -> nfs4_submount(fc_sub_foo)
                              -> nfs4_do_submount(fc_sub_foo)
                                 -> vfs_get_tree(fc_sub_foo)
                                    -> nfs_get_tree_common(fc_sub_foo)
             |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
             |
As far as I can see you're already allocating 3 separate superblocks of
which two are discarded and only one survives. Afaict, the one that
survives is _| the last one. Under the assumption that I'm correct,
where does the third superblock get it's selinux context from given that
fc->security isn't even set during submount?

And where is the context=%s output generated for mountinfo?

Is this a correct callchain?

> 
> Filling out fc->security with this new operation seems to fix that, but
> if you see a better way to do this, then I'm certainly open to the idea.
> 
> > In a worst case situation, we could always implement a flag *inside*
> > the SELinux code, similar to what has been done with 'lsm_set' here.
> > 
> 
> I'm fine with a different solution, if you see a better one. You'll have

Independent of the modification in fs_context_for_submount() you might want to
think about something like:

static const struct fs_context_operations nfs4_fs_context_ops = {
      .free           = nfs4_free,
      .parse_param    = nfs4_parse_param,
      .get_tree       = nfs4_get_tree,
};

static const struct fs_context_operations nfs4_fs_submount_ops = {
      .free           = nfs4_free_submount,
      .parse_param    = nfs4_parse_param_submount,
      .get_tree       = nfs4_get_tree_submount,
};

static int nfs4_init_fs_context_submount(struct fs_context *fc)
{
        return 0;
}

static int nfs4_fs_context_get_tree(struct fs_context *fc)
{
        if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
                fc->ops = &nfs4_fs_submount_ops;
        else
                fc->ops = &nfs4_fs_context_ops;
        .
        .
        .
}

which will make the callchain probably a lot to follow instead of wafting
through the same nested functions over and over. But just a thought.
Jeff Layton Aug. 3, 2023, 4:09 p.m. UTC | #5
On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > leading to failure to match existing superblocks.
> > > > 
> > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > 
> > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > complaining.
> > > > 
> > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > when it sets the LSM context on the new superblock.
> > > > 
> > > > The first bug leads to messages like the following appearing in dmesg:
> > > > 
> > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > 
> > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > ---
> > > > This patch was originally sent by David several months ago, but it
> > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > get this fixed?
> > > 
> > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > request and one larger concern (see below).
> > > 
> > > > diff --git a/fs/super.c b/fs/super.c
> > > > index e781226e2880..13adf43e2e5d 100644
> > > > --- a/fs/super.c
> > > > +++ b/fs/super.c
> > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > >  	smp_wmb();
> > > >  	sb->s_flags |= SB_BORN;
> > > >  
> > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > -	if (unlikely(error)) {
> > > > -		fc_drop_locked(fc);
> > > > -		return error;
> > > > +	if (!(fc->lsm_set)) {
> > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > +		if (unlikely(error)) {
> > > > +			fc_drop_locked(fc);
> > > > +			return error;
> > > > +		}
> > > >  	}
> > > 
> > > I generally dislike core kernel code which makes LSM calls conditional
> > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > be done as there is no other good options, but I would like us to try
> > > and avoid it if possible.  The commit description mentioned that this
> > > was put here to avoid a SELinux complaint, can you provide an example
> > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > 
> > 
> > The problem I had was not so much SELinux warnings, but rather that in a
> > situation where I would expect to share superblocks between two
> > filesystems, it didn't.
> > 
> > Basically if you do something like this:
> > 
> > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > 
> > ...when "foo" and "bar" are directories on the same filesystem on the
> > server, you should get two vfsmounts that share a superblock. That's
> > what you get if selinux is disabled, but not when it's enabled (even
> > when it's in permissive mode).
> > 
> > The problems that David hit with the automounter have a similar root
> > cause though, I believe.
> > 
> > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > comes after the fs_context_init() call.  I'm particulary curious to
> > > know if the failure is due to conflicting SELinux state in the
> > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > properly handling existing values.  Perhaps I'm being overly naive,
> > > but I'm hopeful that we can address both of these within the SELinux
> > > code itself.
> > > 
> > 
> > The problem I hit was that nfs_compare_super is called with a fs_context
> > that has a NULL ->security pointer. That caused it to call
> > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > it returns 1 and decides not to share sb's.
> 
> I tried to follow this because I'm really still quite puzzled by this
> whole thing. Two consecutive mounts that should share the superblock
> don't share the superblock. But behavior differs between nfs3 and nfs4
> due to how automounting works.
> 
> Afaict, the callchain you're looking at in this scenario is:
> 
> (1) nfs3
> 
> (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
>       vfs_get_tree(fc_foo)
>       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
>             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
>                -> nfs_get_tree_common(fc_foo)
>                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> 
> (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
>       vfs_get_tree(fc_bar)
>       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
>             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
>                -> nfs_get_tree_common(fc_bar)
>                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
>                      -> nfs_compare_super(sb_foo, fc_bar)
>                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> 
> And fc_bar->security is non-NULL and compatible with sb_foo's current
> security settings. Fine.
> 
> (2) nfs4
> 
> But for nfs4 we're looking at a vastly more complicated callchain at
> least looking at this from a local nfs:
> 
> (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
>       vfs_get_tree(fc_foo)
>       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
>          -> if (!ctx->internal) branch is taken
>             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
>                -> do_nfs4_mount(fc_foo)
>                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
>                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
>                        {
>                                 fc_dup_foo->security = kmemdup(fc_foo->security)
>                        }
>                        new_fs_context->internal = true
>                   -> foo_mnt = fc_mount(fc_dup_foo)
>                     -> vfs_get_tree(fc_dup_foo)
>                        -> if (!ctx->internal) branch is _not_ taken
>                           -> nfs_get_tree_common(fc_dup_foo)
>                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
>                   -> mount_subtree()
>                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
>                         -> nfs_d_automount("export")
>                            -> fc_sub_foo = fs_context_for_submount()
>                               {
>                                       fc_sub_bar->security = NULL


Should the above be:

					fc_sub_foo->security = NULL;

?

If so, then with this patch, the above would no longer be NULL. We'd
inherit the security context info from the reference dentry passed to
fs_context_for_submount().

>                               {
>                            -> nfs4_submount(fc_sub_foo)
>                               -> nfs4_do_submount(fc_sub_foo)
>                                  -> vfs_get_tree(fc_sub_foo)
>                                     -> nfs_get_tree_common(fc_sub_foo)
>                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
>                         -> nfs_d_automount("foo")
>                            -> fc_sub_foo = fs_context_for_submount()
>                               {
>                                       fc_sub_bar->security = NULL

Ditto here -- that should be fc_sub_foo , correct?
>                               {
>                            -> nfs4_submount(fc_sub_foo)
>                               -> nfs4_do_submount(fc_sub_foo)
>                                  -> vfs_get_tree(fc_sub_foo)
>                                     -> nfs_get_tree_common(fc_sub_foo)
>              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
>              |
> As far as I can see you're already allocating 3 separate superblocks of
> which two are discarded and only one survives. Afaict, the one that
> survives is _| the last one. Under the assumption that I'm correct,
> where does the third superblock get it's selinux context from given that
> fc->security isn't even set during submount?
> 

That's the problem this patch is intended to fix. It allows child mounts
to properly inherit security options from a parent dentry.

> And where is the context=%s output generated for mountinfo?
> 

security_sb_show_options / selinux_sb_show_options

> Is this a correct callchain?
> 

I think it looks about right, but I didn't verify the details to the
degree you have.

> > 
> > Filling out fc->security with this new operation seems to fix that, but
> > if you see a better way to do this, then I'm certainly open to the idea.
> > 
> > > In a worst case situation, we could always implement a flag *inside*
> > > the SELinux code, similar to what has been done with 'lsm_set' here.
> > > 
> > 
> > I'm fine with a different solution, if you see a better one. You'll have
> 
> Independent of the modification in fs_context_for_submount() you might want to
> think about something like:
> 
> static const struct fs_context_operations nfs4_fs_context_ops = {
>       .free           = nfs4_free,
>       .parse_param    = nfs4_parse_param,
>       .get_tree       = nfs4_get_tree,
> };
> 
> static const struct fs_context_operations nfs4_fs_submount_ops = {
>       .free           = nfs4_free_submount,
>       .parse_param    = nfs4_parse_param_submount,
>       .get_tree       = nfs4_get_tree_submount,
> };
> 
> static int nfs4_init_fs_context_submount(struct fs_context *fc)
> {
>         return 0;
> }
> 
> static int nfs4_fs_context_get_tree(struct fs_context *fc)
> {
>         if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
>                 fc->ops = &nfs4_fs_submount_ops;
>         else
>                 fc->ops = &nfs4_fs_context_ops;
>         .
>         .
>         .
> }
> 
> which will make the callchain probably a lot to follow instead of wafting
> through the same nested functions over and over. But just a thought.

Sounds reasonable. I'd rather do that sort of cleanup afterward though,
to make this patch easier to eventually backport.
Jeff Layton Aug. 3, 2023, 4:27 p.m. UTC | #6
On Wed, 2023-08-02 at 22:46 -0400, Paul Moore wrote:
> On Wed, Aug 2, 2023 at 3:34 PM Jeff Layton <jlayton@kernel.org> wrote:
> > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> 
> ...
> 
> > > I generally dislike core kernel code which makes LSM calls conditional
> > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > be done as there is no other good options, but I would like us to try
> > > and avoid it if possible.  The commit description mentioned that this
> > > was put here to avoid a SELinux complaint, can you provide an example
> > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > 
> > The problem I had was not so much SELinux warnings, but rather that in a
> > situation where I would expect to share superblocks between two
> > filesystems, it didn't.
> > 
> > Basically if you do something like this:
> > 
> > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > 
> > ...when "foo" and "bar" are directories on the same filesystem on the
> > server, you should get two vfsmounts that share a superblock. That's
> > what you get if selinux is disabled, but not when it's enabled (even
> > when it's in permissive mode).
> 
> Thanks, that helps.  I'm guessing the difference in behavior is due to
> the old->has_sec_mnt_opts check in nfs_compare_super().
> 

Yep. That gets set, but fc->security is still NULL.

> > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > comes after the fs_context_init() call.  I'm particulary curious to
> > > know if the failure is due to conflicting SELinux state in the
> > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > properly handling existing values.  Perhaps I'm being overly naive,
> > > but I'm hopeful that we can address both of these within the SELinux
> > > code itself.
> > 
> > The problem I hit was that nfs_compare_super is called with a fs_context
> > that has a NULL ->security pointer. That caused it to call
> > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > it returns 1 and decides not to share sb's.
> > 
> > Filling out fc->security with this new operation seems to fix that, but
> > if you see a better way to do this, then I'm certainly open to the idea.
> 
> Just as you mention that you are not a LSM expert, I am not a VFS
> expert, so I think we'll have to help each other a bit ;)
> 
> I think I'm beginning to understand alloc_fs_context() a bit more,
> including the fs_context_for_XXX() wrappers.  One thing I have
> realized is that I believe we need to update the
> selinux_fs_context_init() and smack_fs_context_init() functions to
> properly handle a NULL @reference dentry; I think returning without
> error in both cases is the correct answer.  In the non-NULL @reference
> case, I believe your patch is correct, we do want to inherit the
> options from @reference.
> 


ACK. That seems reasonable. I'll work that in.


>   My only concern now is the
> fs_context::lsm_set flag.
> 

Yeah, that bit is ugly. David studied this problem a lot more than I
have, but basically, we only want to set the context info once, and
we're not always going to have a nice string to parse to set up the
options. This obviously works, but I'm fine with a more elegant method
if you can spot one.


> You didn't mention exactly why the security_sb_set_mnt_opts() was
> failing, and requires the fs_context::lsm_set check, but my guess is
> that something is tripping over the fact that the superblock is
> already properly setup.  I'm working under the assumption that this
> problem - attempting to reconfigure a properly configured superblock -
> should only be happening in the submount/non-NULL-reference case.  If
> it is happening elsewhere I think I'm going to need some help
> understanding that ...
> 

Correct. When you pass in the mount options, fc->security seems to be
properly set. NFS mounting is complex though, so the final superblock
you care about may end up being a descendant of the one that was
originally configured.

This patch is intended to ensure we carry over security info in these
cases. We already try to inherit other parameters from parent mounts, so
this is just another set that we need to make sure we inherit.

> However, assuming I'm mostly correct in the above paragraph, would it
> be possible to take a reference to the @reference dentry's superblock
> in security_fs_context_init(), that we could later compare to the
> superblock passed into security_sb_set_mnt_opts()?  If we know that
> the fs_context was initialized with the same superblock we are now
> being asked to set mount options on, we should be able to return from
> the LSM hook without doing anything.
> 

I'm not sure that I follow your logic here:

You want to take a sb reference and carry that in the fs_context? What
will you do with it in security_sb_set_mnt_opts?

FWIW, It's generally easier to deal with inode or dentry references than
refs to the superblock too, so if we want to carry a reference to an
object around, we'd probably rather handle one of those.
Christian Brauner Aug. 3, 2023, 5:36 p.m. UTC | #7
On Thu, Aug 03, 2023 at 12:09:33PM -0400, Jeff Layton wrote:
> On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> > On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > > > 
> > > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > > leading to failure to match existing superblocks.
> > > > > 
> > > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > > 
> > > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > > complaining.
> > > > > 
> > > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > > when it sets the LSM context on the new superblock.
> > > > > 
> > > > > The first bug leads to messages like the following appearing in dmesg:
> > > > > 
> > > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > > 
> > > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > > ---
> > > > > This patch was originally sent by David several months ago, but it
> > > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > > get this fixed?
> > > > 
> > > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > > request and one larger concern (see below).
> > > > 
> > > > > diff --git a/fs/super.c b/fs/super.c
> > > > > index e781226e2880..13adf43e2e5d 100644
> > > > > --- a/fs/super.c
> > > > > +++ b/fs/super.c
> > > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > > >  	smp_wmb();
> > > > >  	sb->s_flags |= SB_BORN;
> > > > >  
> > > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > -	if (unlikely(error)) {
> > > > > -		fc_drop_locked(fc);
> > > > > -		return error;
> > > > > +	if (!(fc->lsm_set)) {
> > > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > +		if (unlikely(error)) {
> > > > > +			fc_drop_locked(fc);
> > > > > +			return error;
> > > > > +		}
> > > > >  	}
> > > > 
> > > > I generally dislike core kernel code which makes LSM calls conditional
> > > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > > be done as there is no other good options, but I would like us to try
> > > > and avoid it if possible.  The commit description mentioned that this
> > > > was put here to avoid a SELinux complaint, can you provide an example
> > > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > > 
> > > 
> > > The problem I had was not so much SELinux warnings, but rather that in a
> > > situation where I would expect to share superblocks between two
> > > filesystems, it didn't.
> > > 
> > > Basically if you do something like this:
> > > 
> > > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > > 
> > > ...when "foo" and "bar" are directories on the same filesystem on the
> > > server, you should get two vfsmounts that share a superblock. That's
> > > what you get if selinux is disabled, but not when it's enabled (even
> > > when it's in permissive mode).
> > > 
> > > The problems that David hit with the automounter have a similar root
> > > cause though, I believe.
> > > 
> > > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > > comes after the fs_context_init() call.  I'm particulary curious to
> > > > know if the failure is due to conflicting SELinux state in the
> > > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > > properly handling existing values.  Perhaps I'm being overly naive,
> > > > but I'm hopeful that we can address both of these within the SELinux
> > > > code itself.
> > > > 
> > > 
> > > The problem I hit was that nfs_compare_super is called with a fs_context
> > > that has a NULL ->security pointer. That caused it to call
> > > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > > it returns 1 and decides not to share sb's.
> > 
> > I tried to follow this because I'm really still quite puzzled by this
> > whole thing. Two consecutive mounts that should share the superblock
> > don't share the superblock. But behavior differs between nfs3 and nfs4
> > due to how automounting works.
> > 
> > Afaict, the callchain you're looking at in this scenario is:
> > 
> > (1) nfs3
> > 
> > (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
> >       vfs_get_tree(fc_foo)
> >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
> >                -> nfs_get_tree_common(fc_foo)
> >                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> > 
> > (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
> >       vfs_get_tree(fc_bar)
> >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
> >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
> >                -> nfs_get_tree_common(fc_bar)
> >                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
> >                      -> nfs_compare_super(sb_foo, fc_bar)
> >                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> > 
> > And fc_bar->security is non-NULL and compatible with sb_foo's current
> > security settings. Fine.
> > 
> > (2) nfs4
> > 
> > But for nfs4 we're looking at a vastly more complicated callchain at
> > least looking at this from a local nfs:
> > 
> > (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> >       vfs_get_tree(fc_foo)
> >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> >          -> if (!ctx->internal) branch is taken
> >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
> >                -> do_nfs4_mount(fc_foo)
> >                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
> >                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
> >                        {
> >                                 fc_dup_foo->security = kmemdup(fc_foo->security)
> >                        }
> >                        new_fs_context->internal = true
> >                   -> foo_mnt = fc_mount(fc_dup_foo)
> >                     -> vfs_get_tree(fc_dup_foo)
> >                        -> if (!ctx->internal) branch is _not_ taken
> >                           -> nfs_get_tree_common(fc_dup_foo)
> >                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
> >                   -> mount_subtree()
> >                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
> >                         -> nfs_d_automount("export")
> >                            -> fc_sub_foo = fs_context_for_submount()
> >                               {
> >                                       fc_sub_bar->security = NULL
> 
> 
> Should the above be:
> 
> 					fc_sub_foo->security = NULL;

Yes, typo for whatever reason.

> 
> ?
> 
> If so, then with this patch, the above would no longer be NULL. We'd
> inherit the security context info from the reference dentry passed to
> fs_context_for_submount().
> 
> >                               {
> >                            -> nfs4_submount(fc_sub_foo)
> >                               -> nfs4_do_submount(fc_sub_foo)
> >                                  -> vfs_get_tree(fc_sub_foo)
> >                                     -> nfs_get_tree_common(fc_sub_foo)
> >                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> >                         -> nfs_d_automount("foo")
> >                            -> fc_sub_foo = fs_context_for_submount()
> >                               {
> >                                       fc_sub_bar->security = NULL
> 
> Ditto here -- that should be fc_sub_foo , correct?

Yes, same. Was just a typo.

> >                               {
> >                            -> nfs4_submount(fc_sub_foo)
> >                               -> nfs4_do_submount(fc_sub_foo)
> >                                  -> vfs_get_tree(fc_sub_foo)
> >                                     -> nfs_get_tree_common(fc_sub_foo)
> >              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> >              |
> > As far as I can see you're already allocating 3 separate superblocks of
> > which two are discarded and only one survives. Afaict, the one that
> > survives is _| the last one. Under the assumption that I'm correct,
> > where does the third superblock get it's selinux context from given that
> > fc->security isn't even set during submount?
> > 
> 
> That's the problem this patch is intended to fix. It allows child mounts
> to properly inherit security options from a parent dentry.

Yeah, I'm aware. Your patch will ensure that the last superblock is
found again. But you're always going to allocate addititional
superblocks afaict. That's at least what I can gather from the logic.
Say you have:

/export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
/export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)

you allocate 8 superblocks (it's always path components +1) of which you
immediately discard 7 after you finished. That's easily reproducible
with selinux completely disabled. I'm just astonished.

> 
> > And where is the context=%s output generated for mountinfo?
> > 
> 
> security_sb_show_options / selinux_sb_show_options
> 
> > Is this a correct callchain?
> > 
> 
> I think it looks about right, but I didn't verify the details to the
> degree you have.
> 
> > > 
> > > Filling out fc->security with this new operation seems to fix that, but
> > > if you see a better way to do this, then I'm certainly open to the idea.
> > > 
> > > > In a worst case situation, we could always implement a flag *inside*
> > > > the SELinux code, similar to what has been done with 'lsm_set' here.
> > > > 
> > > 
> > > I'm fine with a different solution, if you see a better one. You'll have
> > 
> > Independent of the modification in fs_context_for_submount() you might want to
> > think about something like:
> > 
> > static const struct fs_context_operations nfs4_fs_context_ops = {
> >       .free           = nfs4_free,
> >       .parse_param    = nfs4_parse_param,
> >       .get_tree       = nfs4_get_tree,
> > };
> > 
> > static const struct fs_context_operations nfs4_fs_submount_ops = {
> >       .free           = nfs4_free_submount,
> >       .parse_param    = nfs4_parse_param_submount,
> >       .get_tree       = nfs4_get_tree_submount,
> > };
> > 
> > static int nfs4_init_fs_context_submount(struct fs_context *fc)
> > {
> >         return 0;
> > }
> > 
> > static int nfs4_fs_context_get_tree(struct fs_context *fc)
> > {
> >         if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
> >                 fc->ops = &nfs4_fs_submount_ops;
> >         else
> >                 fc->ops = &nfs4_fs_context_ops;
> >         .
> >         .
> >         .
> > }
> > 
> > which will make the callchain probably a lot to follow instead of wafting
> > through the same nested functions over and over. But just a thought.
> 
> Sounds reasonable. I'd rather do that sort of cleanup afterward though,
> to make this patch easier to eventually backport.

Yeah, sure.
Jeff Layton Aug. 3, 2023, 6:58 p.m. UTC | #8
On Thu, 2023-08-03 at 19:36 +0200, Christian Brauner wrote:
> On Thu, Aug 03, 2023 at 12:09:33PM -0400, Jeff Layton wrote:
> > On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> > > On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > > > leading to failure to match existing superblocks.
> > > > > > 
> > > > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > > > 
> > > > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > > > complaining.
> > > > > > 
> > > > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > > > when it sets the LSM context on the new superblock.
> > > > > > 
> > > > > > The first bug leads to messages like the following appearing in dmesg:
> > > > > > 
> > > > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > > > 
> > > > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > > > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > > > ---
> > > > > > This patch was originally sent by David several months ago, but it
> > > > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > > > get this fixed?
> > > > > 
> > > > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > > > request and one larger concern (see below).
> > > > > 
> > > > > > diff --git a/fs/super.c b/fs/super.c
> > > > > > index e781226e2880..13adf43e2e5d 100644
> > > > > > --- a/fs/super.c
> > > > > > +++ b/fs/super.c
> > > > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > > > >  	smp_wmb();
> > > > > >  	sb->s_flags |= SB_BORN;
> > > > > >  
> > > > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > -	if (unlikely(error)) {
> > > > > > -		fc_drop_locked(fc);
> > > > > > -		return error;
> > > > > > +	if (!(fc->lsm_set)) {
> > > > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > +		if (unlikely(error)) {
> > > > > > +			fc_drop_locked(fc);
> > > > > > +			return error;
> > > > > > +		}
> > > > > >  	}
> > > > > 
> > > > > I generally dislike core kernel code which makes LSM calls conditional
> > > > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > > > be done as there is no other good options, but I would like us to try
> > > > > and avoid it if possible.  The commit description mentioned that this
> > > > > was put here to avoid a SELinux complaint, can you provide an example
> > > > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > > > 
> > > > 
> > > > The problem I had was not so much SELinux warnings, but rather that in a
> > > > situation where I would expect to share superblocks between two
> > > > filesystems, it didn't.
> > > > 
> > > > Basically if you do something like this:
> > > > 
> > > > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > > > 
> > > > ...when "foo" and "bar" are directories on the same filesystem on the
> > > > server, you should get two vfsmounts that share a superblock. That's
> > > > what you get if selinux is disabled, but not when it's enabled (even
> > > > when it's in permissive mode).
> > > > 
> > > > The problems that David hit with the automounter have a similar root
> > > > cause though, I believe.
> > > > 
> > > > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > > > comes after the fs_context_init() call.  I'm particulary curious to
> > > > > know if the failure is due to conflicting SELinux state in the
> > > > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > > > properly handling existing values.  Perhaps I'm being overly naive,
> > > > > but I'm hopeful that we can address both of these within the SELinux
> > > > > code itself.
> > > > > 
> > > > 
> > > > The problem I hit was that nfs_compare_super is called with a fs_context
> > > > that has a NULL ->security pointer. That caused it to call
> > > > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > > > it returns 1 and decides not to share sb's.
> > > 
> > > I tried to follow this because I'm really still quite puzzled by this
> > > whole thing. Two consecutive mounts that should share the superblock
> > > don't share the superblock. But behavior differs between nfs3 and nfs4
> > > due to how automounting works.
> > > 
> > > Afaict, the callchain you're looking at in this scenario is:
> > > 
> > > (1) nfs3
> > > 
> > > (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
> > >       vfs_get_tree(fc_foo)
> > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
> > >                -> nfs_get_tree_common(fc_foo)
> > >                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> > > 
> > > (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
> > >       vfs_get_tree(fc_bar)
> > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
> > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
> > >                -> nfs_get_tree_common(fc_bar)
> > >                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
> > >                      -> nfs_compare_super(sb_foo, fc_bar)
> > >                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> > > 
> > > And fc_bar->security is non-NULL and compatible with sb_foo's current
> > > security settings. Fine.
> > > 
> > > (2) nfs4
> > > 
> > > But for nfs4 we're looking at a vastly more complicated callchain at
> > > least looking at this from a local nfs:
> > > 
> > > (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > >       vfs_get_tree(fc_foo)
> > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > >          -> if (!ctx->internal) branch is taken
> > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
> > >                -> do_nfs4_mount(fc_foo)
> > >                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
> > >                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
> > >                        {
> > >                                 fc_dup_foo->security = kmemdup(fc_foo->security)
> > >                        }
> > >                        new_fs_context->internal = true
> > >                   -> foo_mnt = fc_mount(fc_dup_foo)
> > >                     -> vfs_get_tree(fc_dup_foo)
> > >                        -> if (!ctx->internal) branch is _not_ taken
> > >                           -> nfs_get_tree_common(fc_dup_foo)
> > >                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
> > >                   -> mount_subtree()
> > >                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
> > >                         -> nfs_d_automount("export")
> > >                            -> fc_sub_foo = fs_context_for_submount()
> > >                               {
> > >                                       fc_sub_bar->security = NULL
> > 
> > 
> > Should the above be:
> > 
> > 					fc_sub_foo->security = NULL;
> 
> Yes, typo for whatever reason.
> 
> > 
> > ?
> > 
> > If so, then with this patch, the above would no longer be NULL. We'd
> > inherit the security context info from the reference dentry passed to
> > fs_context_for_submount().
> > 
> > >                               {
> > >                            -> nfs4_submount(fc_sub_foo)
> > >                               -> nfs4_do_submount(fc_sub_foo)
> > >                                  -> vfs_get_tree(fc_sub_foo)
> > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > >                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > >                         -> nfs_d_automount("foo")
> > >                            -> fc_sub_foo = fs_context_for_submount()
> > >                               {
> > >                                       fc_sub_bar->security = NULL
> > 
> > Ditto here -- that should be fc_sub_foo , correct?
> 
> Yes, same. Was just a typo.
> 
> > >                               {
> > >                            -> nfs4_submount(fc_sub_foo)
> > >                               -> nfs4_do_submount(fc_sub_foo)
> > >                                  -> vfs_get_tree(fc_sub_foo)
> > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > >              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > >              |
> > > As far as I can see you're already allocating 3 separate superblocks of
> > > which two are discarded and only one survives. Afaict, the one that
> > > survives is _| the last one. Under the assumption that I'm correct,
> > > where does the third superblock get it's selinux context from given that
> > > fc->security isn't even set during submount?
> > > 
> > 
> > That's the problem this patch is intended to fix. It allows child mounts
> > to properly inherit security options from a parent dentry.
> 
> Yeah, I'm aware. Your patch will ensure that the last superblock is
> found again. But you're always going to allocate addititional
> superblocks afaict. That's at least what I can gather from the logic.
> Say you have:
> 
> /export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
> /export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)
> 
> you allocate 8 superblocks (it's always path components +1) of which you
> immediately discard 7 after you finished. That's easily reproducible
> with selinux completely disabled. I'm just astonished.
> 

Actually, your callchain might not be correct.

I think that you should only end up calling back into nfs_d_automount
and creating a new sb when we cross a mount boundary. So if each of
those intermediate directories represents a different fs, then you'll
get a bunch of superblocks that will end up discarded, but I don't
believe we create a new mount just for intermediate directories that we
can walk.

Basically the nfsv4 mount process is to create a (hidden) superblock for
the root of the tree on the server, and then use the normal pathwalk
scheme to walk down to the right dentry for the root. Once we get there
we can prune everything above that point and we end up with a single sb.



> > 
> > > And where is the context=%s output generated for mountinfo?
> > > 
> > 
> > security_sb_show_options / selinux_sb_show_options
> > 
> > > Is this a correct callchain?
> > > 
> > 
> > I think it looks about right, but I didn't verify the details to the
> > degree you have.
> > 
> > > > 
> > > > Filling out fc->security with this new operation seems to fix that, but
> > > > if you see a better way to do this, then I'm certainly open to the idea.
> > > > 
> > > > > In a worst case situation, we could always implement a flag *inside*
> > > > > the SELinux code, similar to what has been done with 'lsm_set' here.
> > > > > 
> > > > 
> > > > I'm fine with a different solution, if you see a better one. You'll have
> > > 
> > > Independent of the modification in fs_context_for_submount() you might want to
> > > think about something like:
> > > 
> > > static const struct fs_context_operations nfs4_fs_context_ops = {
> > >       .free           = nfs4_free,
> > >       .parse_param    = nfs4_parse_param,
> > >       .get_tree       = nfs4_get_tree,
> > > };
> > > 
> > > static const struct fs_context_operations nfs4_fs_submount_ops = {
> > >       .free           = nfs4_free_submount,
> > >       .parse_param    = nfs4_parse_param_submount,
> > >       .get_tree       = nfs4_get_tree_submount,
> > > };
> > > 
> > > static int nfs4_init_fs_context_submount(struct fs_context *fc)
> > > {
> > >         return 0;
> > > }
> > > 
> > > static int nfs4_fs_context_get_tree(struct fs_context *fc)
> > > {
> > >         if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
> > >                 fc->ops = &nfs4_fs_submount_ops;
> > >         else
> > >                 fc->ops = &nfs4_fs_context_ops;
> > >         .
> > >         .
> > >         .
> > > }
> > > 
> > > which will make the callchain probably a lot to follow instead of wafting
> > > through the same nested functions over and over. But just a thought.
> > 
> > Sounds reasonable. I'd rather do that sort of cleanup afterward though,
> > to make this patch easier to eventually backport.
> 
> Yeah, sure.
Paul Moore Aug. 4, 2023, 2:48 a.m. UTC | #9
On Thu, Aug 3, 2023 at 12:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> On Wed, 2023-08-02 at 22:46 -0400, Paul Moore wrote:
> > On Wed, Aug 2, 2023 at 3:34 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:

...

> > My only concern now is the fs_context::lsm_set flag.
>
> Yeah, that bit is ugly. David studied this problem a lot more than I
> have, but basically, we only want to set the context info once, and
> we're not always going to have a nice string to parse to set up the
> options. This obviously works, but I'm fine with a more elegant method
> if you can spot one.

Like I said before, sometimes making a LSM hook conditional on some
flag is the only practical solution, but I always worry that there is
a chance that a future patch might end up toggling that flag by
accident and we lose an important call into the LSM.  Even if all we
end up doing is moving the flag down into the LSMs I would be happier;
there is still a risk, but at least if something breaks it is our (the
LSM folks) own damn fault ;)

> > You didn't mention exactly why the security_sb_set_mnt_opts() was
> > failing, and requires the fs_context::lsm_set check, but my guess is
> > that something is tripping over the fact that the superblock is
> > already properly setup.  I'm working under the assumption that this
> > problem - attempting to reconfigure a properly configured superblock -
> > should only be happening in the submount/non-NULL-reference case.  If
> > it is happening elsewhere I think I'm going to need some help
> > understanding that ...
>
> Correct. When you pass in the mount options, fc->security seems to be
> properly set. NFS mounting is complex though, so the final superblock
> you care about may end up being a descendant of the one that was
> originally configured.

Ooof, okay, there goes that idea.

At this point I guess it comes back to that question of why is calling
into security_sb_set_mnt_opts() a second (or third, etc.) time failing
for you?  Is there some conflict with the superblock
config/labeling/etc.?  Is there a permissions problem?  Better
understanding why that is failing might help us come up with a better
solution.
Christian Brauner Aug. 4, 2023, 8:25 a.m. UTC | #10
On Thu, Aug 03, 2023 at 02:58:46PM -0400, Jeff Layton wrote:
> On Thu, 2023-08-03 at 19:36 +0200, Christian Brauner wrote:
> > On Thu, Aug 03, 2023 at 12:09:33PM -0400, Jeff Layton wrote:
> > > On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> > > > On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > > > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > 
> > > > > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > > > > leading to failure to match existing superblocks.
> > > > > > > 
> > > > > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > > > > 
> > > > > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > > > > complaining.
> > > > > > > 
> > > > > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > > > > when it sets the LSM context on the new superblock.
> > > > > > > 
> > > > > > > The first bug leads to messages like the following appearing in dmesg:
> > > > > > > 
> > > > > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > > > > 
> > > > > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > > > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > > > > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > > > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > > > > ---
> > > > > > > This patch was originally sent by David several months ago, but it
> > > > > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > > > > get this fixed?
> > > > > > 
> > > > > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > > > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > > > > request and one larger concern (see below).
> > > > > > 
> > > > > > > diff --git a/fs/super.c b/fs/super.c
> > > > > > > index e781226e2880..13adf43e2e5d 100644
> > > > > > > --- a/fs/super.c
> > > > > > > +++ b/fs/super.c
> > > > > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > > > > >  	smp_wmb();
> > > > > > >  	sb->s_flags |= SB_BORN;
> > > > > > >  
> > > > > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > > -	if (unlikely(error)) {
> > > > > > > -		fc_drop_locked(fc);
> > > > > > > -		return error;
> > > > > > > +	if (!(fc->lsm_set)) {
> > > > > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > > +		if (unlikely(error)) {
> > > > > > > +			fc_drop_locked(fc);
> > > > > > > +			return error;
> > > > > > > +		}
> > > > > > >  	}
> > > > > > 
> > > > > > I generally dislike core kernel code which makes LSM calls conditional
> > > > > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > > > > be done as there is no other good options, but I would like us to try
> > > > > > and avoid it if possible.  The commit description mentioned that this
> > > > > > was put here to avoid a SELinux complaint, can you provide an example
> > > > > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > > > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > > > > 
> > > > > 
> > > > > The problem I had was not so much SELinux warnings, but rather that in a
> > > > > situation where I would expect to share superblocks between two
> > > > > filesystems, it didn't.
> > > > > 
> > > > > Basically if you do something like this:
> > > > > 
> > > > > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > > > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > > > > 
> > > > > ...when "foo" and "bar" are directories on the same filesystem on the
> > > > > server, you should get two vfsmounts that share a superblock. That's
> > > > > what you get if selinux is disabled, but not when it's enabled (even
> > > > > when it's in permissive mode).
> > > > > 
> > > > > The problems that David hit with the automounter have a similar root
> > > > > cause though, I believe.
> > > > > 
> > > > > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > > > > comes after the fs_context_init() call.  I'm particulary curious to
> > > > > > know if the failure is due to conflicting SELinux state in the
> > > > > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > > > > properly handling existing values.  Perhaps I'm being overly naive,
> > > > > > but I'm hopeful that we can address both of these within the SELinux
> > > > > > code itself.
> > > > > > 
> > > > > 
> > > > > The problem I hit was that nfs_compare_super is called with a fs_context
> > > > > that has a NULL ->security pointer. That caused it to call
> > > > > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > > > > it returns 1 and decides not to share sb's.
> > > > 
> > > > I tried to follow this because I'm really still quite puzzled by this
> > > > whole thing. Two consecutive mounts that should share the superblock
> > > > don't share the superblock. But behavior differs between nfs3 and nfs4
> > > > due to how automounting works.
> > > > 
> > > > Afaict, the callchain you're looking at in this scenario is:
> > > > 
> > > > (1) nfs3
> > > > 
> > > > (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
> > > >       vfs_get_tree(fc_foo)
> > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
> > > >                -> nfs_get_tree_common(fc_foo)
> > > >                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> > > > 
> > > > (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
> > > >       vfs_get_tree(fc_bar)
> > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
> > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
> > > >                -> nfs_get_tree_common(fc_bar)
> > > >                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
> > > >                      -> nfs_compare_super(sb_foo, fc_bar)
> > > >                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> > > > 
> > > > And fc_bar->security is non-NULL and compatible with sb_foo's current
> > > > security settings. Fine.
> > > > 
> > > > (2) nfs4
> > > > 
> > > > But for nfs4 we're looking at a vastly more complicated callchain at
> > > > least looking at this from a local nfs:
> > > > 
> > > > (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > >       vfs_get_tree(fc_foo)
> > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > > >          -> if (!ctx->internal) branch is taken
> > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
> > > >                -> do_nfs4_mount(fc_foo)
> > > >                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
> > > >                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
> > > >                        {
> > > >                                 fc_dup_foo->security = kmemdup(fc_foo->security)
> > > >                        }
> > > >                        new_fs_context->internal = true
> > > >                   -> foo_mnt = fc_mount(fc_dup_foo)
> > > >                     -> vfs_get_tree(fc_dup_foo)
> > > >                        -> if (!ctx->internal) branch is _not_ taken
> > > >                           -> nfs_get_tree_common(fc_dup_foo)
> > > >                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
> > > >                   -> mount_subtree()
> > > >                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
> > > >                         -> nfs_d_automount("export")
> > > >                            -> fc_sub_foo = fs_context_for_submount()
> > > >                               {
> > > >                                       fc_sub_bar->security = NULL
> > > 
> > > 
> > > Should the above be:
> > > 
> > > 					fc_sub_foo->security = NULL;
> > 
> > Yes, typo for whatever reason.
> > 
> > > 
> > > ?
> > > 
> > > If so, then with this patch, the above would no longer be NULL. We'd
> > > inherit the security context info from the reference dentry passed to
> > > fs_context_for_submount().
> > > 
> > > >                               {
> > > >                            -> nfs4_submount(fc_sub_foo)
> > > >                               -> nfs4_do_submount(fc_sub_foo)
> > > >                                  -> vfs_get_tree(fc_sub_foo)
> > > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > > >                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > > >                         -> nfs_d_automount("foo")
> > > >                            -> fc_sub_foo = fs_context_for_submount()
> > > >                               {
> > > >                                       fc_sub_bar->security = NULL
> > > 
> > > Ditto here -- that should be fc_sub_foo , correct?
> > 
> > Yes, same. Was just a typo.
> > 
> > > >                               {
> > > >                            -> nfs4_submount(fc_sub_foo)
> > > >                               -> nfs4_do_submount(fc_sub_foo)
> > > >                                  -> vfs_get_tree(fc_sub_foo)
> > > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > > >              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > > >              |
> > > > As far as I can see you're already allocating 3 separate superblocks of
> > > > which two are discarded and only one survives. Afaict, the one that
> > > > survives is _| the last one. Under the assumption that I'm correct,
> > > > where does the third superblock get it's selinux context from given that
> > > > fc->security isn't even set during submount?
> > > > 
> > > 
> > > That's the problem this patch is intended to fix. It allows child mounts
> > > to properly inherit security options from a parent dentry.
> > 
> > Yeah, I'm aware. Your patch will ensure that the last superblock is
> > found again. But you're always going to allocate addititional
> > superblocks afaict. That's at least what I can gather from the logic.
> > Say you have:
> > 
> > /export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
> > /export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)
> > 
> > you allocate 8 superblocks (it's always path components +1) of which you
> > immediately discard 7 after you finished. That's easily reproducible
> > with selinux completely disabled. I'm just astonished.
> > 
> 
> Actually, your callchain might not be correct.
> 
> I think that you should only end up calling back into nfs_d_automount
> and creating a new sb when we cross a mount boundary. So if each of
> those intermediate directories represents a different fs, then you'll
> get a bunch of superblocks that will end up discarded, but I don't
> believe we create a new mount just for intermediate directories that we
> can walk.
> 
> Basically the nfsv4 mount process is to create a (hidden) superblock for
> the root of the tree on the server, and then use the normal pathwalk
> scheme to walk down to the right dentry for the root. Once we get there
> we can prune everything above that point and we end up with a single sb.

Now, I may just be doing something really wrong but that's not what's
happening according to my tracing. See below. 

cat /etc/exports
/export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
/export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)

systemctl start nfs-server
exportfs -avr
mount 127.0.0.1:/export/a/b/c/d/e/foo /mnt/a

sget_fc(0xffff8c2e862ba800)
-> alloc_super(0xffff8c2e862ba800)
   -> fc_mount(0xffff8c2e862ba800)
      -> sget_fc(0xffff8c2e8c15f800)
         -> alloc_super(0xffff8c2e8c15f800)                   (1)

   -> mount_subtree(0xffff8c2e94694000)
      -> vfs_path_lookup(/export/a/b/c/d/e/foo)
      
         -> nfs_d_automount(export)
            -> nfs4_submount(0xffff8c2e8c15f800)
               -> nfs_do_submount(0xffff8c2e8c15f800)
                  -> sget_fc(0xffff8c2e8a5e7000)
                     -> alloc_super(0xffff8c2e8a5e7000)       (2)
      
         -> nfs_d_automount(a)
            -> nfs4_submount(0xffff8c2e8a5e7000)
               -> nfs_do_submount(0xffff8c2e8a5e7000)
                  -> sget_fc(0xffff8c2e8c15f000)
                     -> alloc_super(0xffff8c2e8c15f000)       (3)
      
         -> nfs_d_automount(b)
            -> nfs4_submount(0xffff8c2e8c15f000)
               -> nfs_do_submount(0xffff8c2e8c15f000)
                  -> sget_fc(0xffff8c2e94697000)
                  -> alloc_super(0xffff8c2e94697000)          (4)
      
         -> nfs_d_automount(c)
            -> nfs4_submount(0xffff8c2e94697000)
               -> nfs_do_submount(0xffff8c2e94697000)
                  -> sget_fc(0xffff8c2e8c159800)
                     -> alloc_super(0xffff8c2e8c159800)       (5)
      
         -> nfs_d_automount(d)
            -> nfs4_submount(0xffff8c2e8c159800)
               -> nfs_do_submount(0xffff8c2e8c159800)
                  -> sget_fc(0xffff8c2e94693000)
                     -> alloc_super(0xffff8c2e94693000)       (6)
      
         -> nfs_d_automount(e)
            -> nfs4_submount(0xffff8c2e94693000)
               -> nfs_do_submount(0xffff8c2e94693000)
                  -> sget_fc(0xffff8c2e94694000)
                     -> alloc_super(0xffff8c2e94694000)       (7)
      
         -> nfs_d_automount(foo)
            -> nfs4_submount(0xffff8c2e94694000)
               -> nfs_do_submount(0xffff8c2e94694000)         (8)
Jeff Layton Aug. 4, 2023, 12:58 p.m. UTC | #11
On Thu, 2023-08-03 at 22:48 -0400, Paul Moore wrote:
> On Thu, Aug 3, 2023 at 12:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > On Wed, 2023-08-02 at 22:46 -0400, Paul Moore wrote:
> > > On Wed, Aug 2, 2023 at 3:34 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> 
> ...
> 
> > > My only concern now is the fs_context::lsm_set flag.
> > 
> > Yeah, that bit is ugly. David studied this problem a lot more than I
> > have, but basically, we only want to set the context info once, and
> > we're not always going to have a nice string to parse to set up the
> > options. This obviously works, but I'm fine with a more elegant method
> > if you can spot one.
> 
> Like I said before, sometimes making a LSM hook conditional on some
> flag is the only practical solution, but I always worry that there is
> a chance that a future patch might end up toggling that flag by
> accident and we lose an important call into the LSM.  Even if all we
> end up doing is moving the flag down into the LSMs I would be happier;
> there is still a risk, but at least if something breaks it is our (the
> LSM folks) own damn fault ;)
> 
> > > You didn't mention exactly why the security_sb_set_mnt_opts() was
> > > failing, and requires the fs_context::lsm_set check, but my guess is
> > > that something is tripping over the fact that the superblock is
> > > already properly setup.  I'm working under the assumption that this
> > > problem - attempting to reconfigure a properly configured superblock -
> > > should only be happening in the submount/non-NULL-reference case.  If
> > > it is happening elsewhere I think I'm going to need some help
> > > understanding that ...
> > 
> > Correct. When you pass in the mount options, fc->security seems to be
> > properly set. NFS mounting is complex though, so the final superblock
> > you care about may end up being a descendant of the one that was
> > originally configured.
> 
> Ooof, okay, there goes that idea.
> 
> At this point I guess it comes back to that question of why is calling
> into security_sb_set_mnt_opts() a second (or third, etc.) time failing
> for you?  Is there some conflict with the superblock
> config/labeling/etc.?  Is there a permissions problem?  Better
> understanding why that is failing might help us come up with a better
> solution.
> 

I removed the lsm_set parameter from this patch, and my testcase still
works fine. I can post a v7 if we want to go forward with that. I'm
guessing the complaint he saw was the "out_double_mount" pr_warn.

It looks like as long as the context options match, there shouldn't be
an issue, and I don't see how you'd get mismatched ones if you're
inheriting them.

David, do you remember what prompted you to add the lsm_set parameter?
Jeff Layton Aug. 4, 2023, 1:25 p.m. UTC | #12
On Fri, 2023-08-04 at 10:25 +0200, Christian Brauner wrote:
> On Thu, Aug 03, 2023 at 02:58:46PM -0400, Jeff Layton wrote:
> > On Thu, 2023-08-03 at 19:36 +0200, Christian Brauner wrote:
> > > On Thu, Aug 03, 2023 at 12:09:33PM -0400, Jeff Layton wrote:
> > > > On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> > > > > On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > > > > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > > 
> > > > > > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > > > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > > > > > leading to failure to match existing superblocks.
> > > > > > > > 
> > > > > > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > > > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > > > > > 
> > > > > > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > > > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > > > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > > > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > > > > > complaining.
> > > > > > > > 
> > > > > > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > > > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > > > > > when it sets the LSM context on the new superblock.
> > > > > > > > 
> > > > > > > > The first bug leads to messages like the following appearing in dmesg:
> > > > > > > > 
> > > > > > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > > > > > 
> > > > > > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > > > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > > > > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > > > > > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > > > > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > > > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > > > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > > > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > > > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > > > > > ---
> > > > > > > > This patch was originally sent by David several months ago, but it
> > > > > > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > > > > > get this fixed?
> > > > > > > 
> > > > > > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > > > > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > > > > > request and one larger concern (see below).
> > > > > > > 
> > > > > > > > diff --git a/fs/super.c b/fs/super.c
> > > > > > > > index e781226e2880..13adf43e2e5d 100644
> > > > > > > > --- a/fs/super.c
> > > > > > > > +++ b/fs/super.c
> > > > > > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > > > > > >  	smp_wmb();
> > > > > > > >  	sb->s_flags |= SB_BORN;
> > > > > > > >  
> > > > > > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > > > -	if (unlikely(error)) {
> > > > > > > > -		fc_drop_locked(fc);
> > > > > > > > -		return error;
> > > > > > > > +	if (!(fc->lsm_set)) {
> > > > > > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > > > +		if (unlikely(error)) {
> > > > > > > > +			fc_drop_locked(fc);
> > > > > > > > +			return error;
> > > > > > > > +		}
> > > > > > > >  	}
> > > > > > > 
> > > > > > > I generally dislike core kernel code which makes LSM calls conditional
> > > > > > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > > > > > be done as there is no other good options, but I would like us to try
> > > > > > > and avoid it if possible.  The commit description mentioned that this
> > > > > > > was put here to avoid a SELinux complaint, can you provide an example
> > > > > > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > > > > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > > > > > 
> > > > > > 
> > > > > > The problem I had was not so much SELinux warnings, but rather that in a
> > > > > > situation where I would expect to share superblocks between two
> > > > > > filesystems, it didn't.
> > > > > > 
> > > > > > Basically if you do something like this:
> > > > > > 
> > > > > > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > > > > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > > > > > 
> > > > > > ...when "foo" and "bar" are directories on the same filesystem on the
> > > > > > server, you should get two vfsmounts that share a superblock. That's
> > > > > > what you get if selinux is disabled, but not when it's enabled (even
> > > > > > when it's in permissive mode).
> > > > > > 
> > > > > > The problems that David hit with the automounter have a similar root
> > > > > > cause though, I believe.
> > > > > > 
> > > > > > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > > > > > comes after the fs_context_init() call.  I'm particulary curious to
> > > > > > > know if the failure is due to conflicting SELinux state in the
> > > > > > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > > > > > properly handling existing values.  Perhaps I'm being overly naive,
> > > > > > > but I'm hopeful that we can address both of these within the SELinux
> > > > > > > code itself.
> > > > > > > 
> > > > > > 
> > > > > > The problem I hit was that nfs_compare_super is called with a fs_context
> > > > > > that has a NULL ->security pointer. That caused it to call
> > > > > > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > > > > > it returns 1 and decides not to share sb's.
> > > > > 
> > > > > I tried to follow this because I'm really still quite puzzled by this
> > > > > whole thing. Two consecutive mounts that should share the superblock
> > > > > don't share the superblock. But behavior differs between nfs3 and nfs4
> > > > > due to how automounting works.
> > > > > 
> > > > > Afaict, the callchain you're looking at in this scenario is:
> > > > > 
> > > > > (1) nfs3
> > > > > 
> > > > > (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
> > > > >       vfs_get_tree(fc_foo)
> > > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
> > > > >                -> nfs_get_tree_common(fc_foo)
> > > > >                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> > > > > 
> > > > > (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
> > > > >       vfs_get_tree(fc_bar)
> > > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
> > > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
> > > > >                -> nfs_get_tree_common(fc_bar)
> > > > >                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
> > > > >                      -> nfs_compare_super(sb_foo, fc_bar)
> > > > >                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> > > > > 
> > > > > And fc_bar->security is non-NULL and compatible with sb_foo's current
> > > > > security settings. Fine.
> > > > > 
> > > > > (2) nfs4
> > > > > 
> > > > > But for nfs4 we're looking at a vastly more complicated callchain at
> > > > > least looking at this from a local nfs:
> > > > > 
> > > > > (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > > >       vfs_get_tree(fc_foo)
> > > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > > > >          -> if (!ctx->internal) branch is taken
> > > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
> > > > >                -> do_nfs4_mount(fc_foo)
> > > > >                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
> > > > >                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
> > > > >                        {
> > > > >                                 fc_dup_foo->security = kmemdup(fc_foo->security)
> > > > >                        }
> > > > >                        new_fs_context->internal = true
> > > > >                   -> foo_mnt = fc_mount(fc_dup_foo)
> > > > >                     -> vfs_get_tree(fc_dup_foo)
> > > > >                        -> if (!ctx->internal) branch is _not_ taken
> > > > >                           -> nfs_get_tree_common(fc_dup_foo)
> > > > >                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
> > > > >                   -> mount_subtree()
> > > > >                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
> > > > >                         -> nfs_d_automount("export")
> > > > >                            -> fc_sub_foo = fs_context_for_submount()
> > > > >                               {
> > > > >                                       fc_sub_bar->security = NULL
> > > > 
> > > > 
> > > > Should the above be:
> > > > 
> > > > 					fc_sub_foo->security = NULL;
> > > 
> > > Yes, typo for whatever reason.
> > > 
> > > > 
> > > > ?
> > > > 
> > > > If so, then with this patch, the above would no longer be NULL. We'd
> > > > inherit the security context info from the reference dentry passed to
> > > > fs_context_for_submount().
> > > > 
> > > > >                               {
> > > > >                            -> nfs4_submount(fc_sub_foo)
> > > > >                               -> nfs4_do_submount(fc_sub_foo)
> > > > >                                  -> vfs_get_tree(fc_sub_foo)
> > > > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > > > >                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > > > >                         -> nfs_d_automount("foo")
> > > > >                            -> fc_sub_foo = fs_context_for_submount()
> > > > >                               {
> > > > >                                       fc_sub_bar->security = NULL
> > > > 
> > > > Ditto here -- that should be fc_sub_foo , correct?
> > > 
> > > Yes, same. Was just a typo.
> > > 
> > > > >                               {
> > > > >                            -> nfs4_submount(fc_sub_foo)
> > > > >                               -> nfs4_do_submount(fc_sub_foo)
> > > > >                                  -> vfs_get_tree(fc_sub_foo)
> > > > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > > > >              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > > > >              |
> > > > > As far as I can see you're already allocating 3 separate superblocks of
> > > > > which two are discarded and only one survives. Afaict, the one that
> > > > > survives is _| the last one. Under the assumption that I'm correct,
> > > > > where does the third superblock get it's selinux context from given that
> > > > > fc->security isn't even set during submount?
> > > > > 
> > > > 
> > > > That's the problem this patch is intended to fix. It allows child mounts
> > > > to properly inherit security options from a parent dentry.
> > > 
> > > Yeah, I'm aware. Your patch will ensure that the last superblock is
> > > found again. But you're always going to allocate addititional
> > > superblocks afaict. That's at least what I can gather from the logic.
> > > Say you have:
> > > 
> > > /export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
> > > /export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)
> > > 
> > > you allocate 8 superblocks (it's always path components +1) of which you
> > > immediately discard 7 after you finished. That's easily reproducible
> > > with selinux completely disabled. I'm just astonished.
> > > 
> > 
> > Actually, your callchain might not be correct.
> > 
> > I think that you should only end up calling back into nfs_d_automount
> > and creating a new sb when we cross a mount boundary. So if each of
> > those intermediate directories represents a different fs, then you'll
> > get a bunch of superblocks that will end up discarded, but I don't
> > believe we create a new mount just for intermediate directories that we
> > can walk.
> > 
> > Basically the nfsv4 mount process is to create a (hidden) superblock for
> > the root of the tree on the server, and then use the normal pathwalk
> > scheme to walk down to the right dentry for the root. Once we get there
> > we can prune everything above that point and we end up with a single sb.
> 
> Now, I may just be doing something really wrong but that's not what's
> happening according to my tracing. See below. 
> 
> cat /etc/exports
> /export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
> /export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)
> 
> systemctl start nfs-server
> exportfs -avr
> mount 127.0.0.1:/export/a/b/c/d/e/foo /mnt/a
> 

Ahh ok. In your case, you're only exporting those single directories,
which means that the server has to create pseudoroot entries for all of
the intermediate directories in that path.

So, while "foo" corresponds to the directory on the server, the
intermediate /export/a/b/c/d/e hierarchy are constructed, and the client
is (probably) treating each of those as if they were new mountpoints.

If, instead, your exports file just looked like:

/export		*(rw,insecure,no_subtree_check,no_root_squash)

...and you do that same mount command, you'd find that it would set up a
sb for /export and then just walk down the path normally until it got to
the terminal dentry and then mount that onto the tree.

It may be possible to make your testcase work more efficiently, but
mounting is a fairly rare activity so we don't tend to spend time
optimizing it. If you see a way to do it more efficiently though, then
I'd be willing to take a look.


> sget_fc(0xffff8c2e862ba800)
> -> alloc_super(0xffff8c2e862ba800)
>    -> fc_mount(0xffff8c2e862ba800)
>       -> sget_fc(0xffff8c2e8c15f800)
>          -> alloc_super(0xffff8c2e8c15f800)                   (1)
> 
>    -> mount_subtree(0xffff8c2e94694000)
>       -> vfs_path_lookup(/export/a/b/c/d/e/foo)
>       
>          -> nfs_d_automount(export)
>             -> nfs4_submount(0xffff8c2e8c15f800)
>                -> nfs_do_submount(0xffff8c2e8c15f800)
>                   -> sget_fc(0xffff8c2e8a5e7000)
>                      -> alloc_super(0xffff8c2e8a5e7000)       (2)
>       
>          -> nfs_d_automount(a)
>             -> nfs4_submount(0xffff8c2e8a5e7000)
>                -> nfs_do_submount(0xffff8c2e8a5e7000)
>                   -> sget_fc(0xffff8c2e8c15f000)
>                      -> alloc_super(0xffff8c2e8c15f000)       (3)
>       
>          -> nfs_d_automount(b)
>             -> nfs4_submount(0xffff8c2e8c15f000)
>                -> nfs_do_submount(0xffff8c2e8c15f000)
>                   -> sget_fc(0xffff8c2e94697000)
>                   -> alloc_super(0xffff8c2e94697000)          (4)
>       
>          -> nfs_d_automount(c)
>             -> nfs4_submount(0xffff8c2e94697000)
>                -> nfs_do_submount(0xffff8c2e94697000)
>                   -> sget_fc(0xffff8c2e8c159800)
>                      -> alloc_super(0xffff8c2e8c159800)       (5)
>       
>          -> nfs_d_automount(d)
>             -> nfs4_submount(0xffff8c2e8c159800)
>                -> nfs_do_submount(0xffff8c2e8c159800)
>                   -> sget_fc(0xffff8c2e94693000)
>                      -> alloc_super(0xffff8c2e94693000)       (6)
>       
>          -> nfs_d_automount(e)
>             -> nfs4_submount(0xffff8c2e94693000)
>                -> nfs_do_submount(0xffff8c2e94693000)
>                   -> sget_fc(0xffff8c2e94694000)
>                      -> alloc_super(0xffff8c2e94694000)       (7)
>       
>          -> nfs_d_automount(foo)
>             -> nfs4_submount(0xffff8c2e94694000)
>                -> nfs_do_submount(0xffff8c2e94694000)         (8)
Christian Brauner Aug. 4, 2023, 1:38 p.m. UTC | #13
On Fri, Aug 04, 2023 at 09:25:40AM -0400, Jeff Layton wrote:
> On Fri, 2023-08-04 at 10:25 +0200, Christian Brauner wrote:
> > On Thu, Aug 03, 2023 at 02:58:46PM -0400, Jeff Layton wrote:
> > > On Thu, 2023-08-03 at 19:36 +0200, Christian Brauner wrote:
> > > > On Thu, Aug 03, 2023 at 12:09:33PM -0400, Jeff Layton wrote:
> > > > > On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> > > > > > On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > > > > > > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > > > > > > On Aug  2, 2023 Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > > > 
> > > > > > > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > > > > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > > > > > > leading to failure to match existing superblocks.
> > > > > > > > > 
> > > > > > > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > > > > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > > > > > > 
> > > > > > > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > > > > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > > > > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > > > > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > > > > > > complaining.
> > > > > > > > > 
> > > > > > > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > > > > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > > > > > > when it sets the LSM context on the new superblock.
> > > > > > > > > 
> > > > > > > > > The first bug leads to messages like the following appearing in dmesg:
> > > > > > > > > 
> > > > > > > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > > > > > > 
> > > > > > > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > > > > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > > > > > > Tested-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> > > > > > > > > Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > > > > > > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > > > > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > > > > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > > > > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > > > > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > > > > > > ---
> > > > > > > > > This patch was originally sent by David several months ago, but it
> > > > > > > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > > > > > > get this fixed?
> > > > > > > > 
> > > > > > > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > > > > > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > > > > > > request and one larger concern (see below).
> > > > > > > > 
> > > > > > > > > diff --git a/fs/super.c b/fs/super.c
> > > > > > > > > index e781226e2880..13adf43e2e5d 100644
> > > > > > > > > --- a/fs/super.c
> > > > > > > > > +++ b/fs/super.c
> > > > > > > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > > > > > > >  	smp_wmb();
> > > > > > > > >  	sb->s_flags |= SB_BORN;
> > > > > > > > >  
> > > > > > > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > > > > -	if (unlikely(error)) {
> > > > > > > > > -		fc_drop_locked(fc);
> > > > > > > > > -		return error;
> > > > > > > > > +	if (!(fc->lsm_set)) {
> > > > > > > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > > > > > > +		if (unlikely(error)) {
> > > > > > > > > +			fc_drop_locked(fc);
> > > > > > > > > +			return error;
> > > > > > > > > +		}
> > > > > > > > >  	}
> > > > > > > > 
> > > > > > > > I generally dislike core kernel code which makes LSM calls conditional
> > > > > > > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > > > > > > be done as there is no other good options, but I would like us to try
> > > > > > > > and avoid it if possible.  The commit description mentioned that this
> > > > > > > > was put here to avoid a SELinux complaint, can you provide an example
> > > > > > > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > > > > > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > > > > > > 
> > > > > > > 
> > > > > > > The problem I had was not so much SELinux warnings, but rather that in a
> > > > > > > situation where I would expect to share superblocks between two
> > > > > > > filesystems, it didn't.
> > > > > > > 
> > > > > > > Basically if you do something like this:
> > > > > > > 
> > > > > > > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > > > > > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > > > > > > 
> > > > > > > ...when "foo" and "bar" are directories on the same filesystem on the
> > > > > > > server, you should get two vfsmounts that share a superblock. That's
> > > > > > > what you get if selinux is disabled, but not when it's enabled (even
> > > > > > > when it's in permissive mode).
> > > > > > > 
> > > > > > > The problems that David hit with the automounter have a similar root
> > > > > > > cause though, I believe.
> > > > > > > 
> > > > > > > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > > > > > > comes after the fs_context_init() call.  I'm particulary curious to
> > > > > > > > know if the failure is due to conflicting SELinux state in the
> > > > > > > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > > > > > > properly handling existing values.  Perhaps I'm being overly naive,
> > > > > > > > but I'm hopeful that we can address both of these within the SELinux
> > > > > > > > code itself.
> > > > > > > > 
> > > > > > > 
> > > > > > > The problem I hit was that nfs_compare_super is called with a fs_context
> > > > > > > that has a NULL ->security pointer. That caused it to call
> > > > > > > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > > > > > > it returns 1 and decides not to share sb's.
> > > > > > 
> > > > > > I tried to follow this because I'm really still quite puzzled by this
> > > > > > whole thing. Two consecutive mounts that should share the superblock
> > > > > > don't share the superblock. But behavior differs between nfs3 and nfs4
> > > > > > due to how automounting works.
> > > > > > 
> > > > > > Afaict, the callchain you're looking at in this scenario is:
> > > > > > 
> > > > > > (1) nfs3
> > > > > > 
> > > > > > (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
> > > > > >       vfs_get_tree(fc_foo)
> > > > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > > > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
> > > > > >                -> nfs_get_tree_common(fc_foo)
> > > > > >                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> > > > > > 
> > > > > > (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
> > > > > >       vfs_get_tree(fc_bar)
> > > > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
> > > > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
> > > > > >                -> nfs_get_tree_common(fc_bar)
> > > > > >                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
> > > > > >                      -> nfs_compare_super(sb_foo, fc_bar)
> > > > > >                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> > > > > > 
> > > > > > And fc_bar->security is non-NULL and compatible with sb_foo's current
> > > > > > security settings. Fine.
> > > > > > 
> > > > > > (2) nfs4
> > > > > > 
> > > > > > But for nfs4 we're looking at a vastly more complicated callchain at
> > > > > > least looking at this from a local nfs:
> > > > > > 
> > > > > > (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > > > > >       vfs_get_tree(fc_foo)
> > > > > >       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
> > > > > >          -> if (!ctx->internal) branch is taken
> > > > > >             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
> > > > > >                -> do_nfs4_mount(fc_foo)
> > > > > >                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
> > > > > >                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
> > > > > >                        {
> > > > > >                                 fc_dup_foo->security = kmemdup(fc_foo->security)
> > > > > >                        }
> > > > > >                        new_fs_context->internal = true
> > > > > >                   -> foo_mnt = fc_mount(fc_dup_foo)
> > > > > >                     -> vfs_get_tree(fc_dup_foo)
> > > > > >                        -> if (!ctx->internal) branch is _not_ taken
> > > > > >                           -> nfs_get_tree_common(fc_dup_foo)
> > > > > >                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
> > > > > >                   -> mount_subtree()
> > > > > >                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
> > > > > >                         -> nfs_d_automount("export")
> > > > > >                            -> fc_sub_foo = fs_context_for_submount()
> > > > > >                               {
> > > > > >                                       fc_sub_bar->security = NULL
> > > > > 
> > > > > 
> > > > > Should the above be:
> > > > > 
> > > > > 					fc_sub_foo->security = NULL;
> > > > 
> > > > Yes, typo for whatever reason.
> > > > 
> > > > > 
> > > > > ?
> > > > > 
> > > > > If so, then with this patch, the above would no longer be NULL. We'd
> > > > > inherit the security context info from the reference dentry passed to
> > > > > fs_context_for_submount().
> > > > > 
> > > > > >                               {
> > > > > >                            -> nfs4_submount(fc_sub_foo)
> > > > > >                               -> nfs4_do_submount(fc_sub_foo)
> > > > > >                                  -> vfs_get_tree(fc_sub_foo)
> > > > > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > > > > >                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > > > > >                         -> nfs_d_automount("foo")
> > > > > >                            -> fc_sub_foo = fs_context_for_submount()
> > > > > >                               {
> > > > > >                                       fc_sub_bar->security = NULL
> > > > > 
> > > > > Ditto here -- that should be fc_sub_foo , correct?
> > > > 
> > > > Yes, same. Was just a typo.
> > > > 
> > > > > >                               {
> > > > > >                            -> nfs4_submount(fc_sub_foo)
> > > > > >                               -> nfs4_do_submount(fc_sub_foo)
> > > > > >                                  -> vfs_get_tree(fc_sub_foo)
> > > > > >                                     -> nfs_get_tree_common(fc_sub_foo)
> > > > > >              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
> > > > > >              |
> > > > > > As far as I can see you're already allocating 3 separate superblocks of
> > > > > > which two are discarded and only one survives. Afaict, the one that
> > > > > > survives is _| the last one. Under the assumption that I'm correct,
> > > > > > where does the third superblock get it's selinux context from given that
> > > > > > fc->security isn't even set during submount?
> > > > > > 
> > > > > 
> > > > > That's the problem this patch is intended to fix. It allows child mounts
> > > > > to properly inherit security options from a parent dentry.
> > > > 
> > > > Yeah, I'm aware. Your patch will ensure that the last superblock is
> > > > found again. But you're always going to allocate addititional
> > > > superblocks afaict. That's at least what I can gather from the logic.
> > > > Say you have:
> > > > 
> > > > /export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
> > > > /export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)
> > > > 
> > > > you allocate 8 superblocks (it's always path components +1) of which you
> > > > immediately discard 7 after you finished. That's easily reproducible
> > > > with selinux completely disabled. I'm just astonished.
> > > > 
> > > 
> > > Actually, your callchain might not be correct.
> > > 
> > > I think that you should only end up calling back into nfs_d_automount
> > > and creating a new sb when we cross a mount boundary. So if each of
> > > those intermediate directories represents a different fs, then you'll
> > > get a bunch of superblocks that will end up discarded, but I don't
> > > believe we create a new mount just for intermediate directories that we
> > > can walk.
> > > 
> > > Basically the nfsv4 mount process is to create a (hidden) superblock for
> > > the root of the tree on the server, and then use the normal pathwalk
> > > scheme to walk down to the right dentry for the root. Once we get there
> > > we can prune everything above that point and we end up with a single sb.
> > 
> > Now, I may just be doing something really wrong but that's not what's
> > happening according to my tracing. See below. 
> > 
> > cat /etc/exports
> > /export/a/b/c/d/e/foo   *(rw,insecure,no_subtree_check,no_root_squash)
> > /export/a/b/c/d/e/bar   *(rw,insecure,no_subtree_check,no_root_squash)
> > 
> > systemctl start nfs-server
> > exportfs -avr
> > mount 127.0.0.1:/export/a/b/c/d/e/foo /mnt/a
> > 
> 
> Ahh ok. In your case, you're only exporting those single directories,
> which means that the server has to create pseudoroot entries for all of
> the intermediate directories in that path.
> 
> So, while "foo" corresponds to the directory on the server, the
> intermediate /export/a/b/c/d/e hierarchy are constructed, and the client
> is (probably) treating each of those as if they were new mountpoints.

That's kinda what I figured but I didn't really understand why.

It's probably not too bad for nfs but it's a pricy endeavour for the
whole system. Yes, it's rare but it means, if you have say 250 path
components you're keeping 250 NFS superblocks in fs_type->instances
until you finished mount_subtree(). Which means as pathwalk progresses
costs for each new superblock increase dramatically and since sb_lock is
global you're also stalling other superblocks creation on the system.

Anyway, that's for another time.
David Howells Aug. 4, 2023, 3:16 p.m. UTC | #14
Paul Moore <paul@paul-moore.com> wrote:

> 
> I generally dislike core kernel code which makes LSM calls conditional
> on some kernel state maintained outside the LSM.  Sometimes it has to
> be done as there is no other good options, but I would like us to try
> and avoid it if possible.  The commit description mentioned that this
> was put here to avoid a SELinux complaint, can you provide an example
> of the complain?  Does it complain about a double/invalid mount, e.g.
> "SELinux: mount invalid.  Same superblock, different security ..."?
> 
> I'd like to understand why the sb_set_mnt_opts() call fails when it
> comes after the fs_context_init() call.  I'm particulary curious to
> know if the failure is due to conflicting SELinux state in the
> fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> properly handling existing values.  Perhaps I'm being overly naive,
> but I'm hopeful that we can address both of these within the SELinux
> code itself.
> 
> In a worst case situation, we could always implement a flag *inside*
> the SELinux code, similar to what has been done with 'lsm_set' here.

IIRC, the issue is when you make a mount with an explicit context= setting and
make another mount from some way down the export tree that doesn't have an
explicit setting, e.g.:

	mount carina:/ /mnt -o context=system_u:object_r:root_t:s0
	mount carina:/nfs/scratch /mnt2

and then cause an automount to walk from one to the other:

	stat /mnt/nfs/scratch/foo

For reference, my server has:

	/nfs/scratch 192.168.6.0/255.255.255.0,90.155.74.16/255.255.255.248
	/nfs         192.168.6.0/255.255.255.0,90.155.74.16/255.255.255.248
	/            192.168.6.0/255.255.255.0,90.155.74.16/255.255.255.248

and if I look in /proc/fs/nfsfs/volumes, I can see the individual superblocks:

	NV SERVER   PORT DEV          FSID                              FSC
	v4 c0a80601  801 0:51         0:0                               no 
	v4 c0a80601  801 0:56         3:0                               no 
	v4 c0a80601  801 0:52         1:0                               no 
	v4 c0a80601  801 0:55         3:0                               no 

As you can see, there are two referring to the same 'volume'.

Without the "fc->lsm_set=true" bit, you get an error something like:

	SELinux: mount invalid.  Same superblock, different security settings for (dev 0:56, type nfs4)

One important question is how should sharing of a mount with unspecified
context be handled when we try to unify it with a mount that has an explicit
context?

David
David Howells Aug. 4, 2023, 3:22 p.m. UTC | #15
David Howells <dhowells@redhat.com> wrote:

> IIRC, the issue is when you make a mount with an explicit context= setting and
> make another mount from some way down the export tree that doesn't have an
> explicit setting, e.g.:
> 
> 	mount carina:/ /mnt -o context=system_u:object_r:root_t:s0
> 	mount carina:/nfs/scratch /mnt2
> 
> and then cause an automount to walk from one to the other:
> 
> 	stat /mnt/nfs/scratch/foo

Actually, the order there isn't quite right.  The problem is with this order:

	# mount carina:/ /mnt -o context=system_u:object_r:root_t:s0
	# stat /mnt/nfs/scratch/bus
	  File: /mnt/nfs/scratch/bus
	  Size: 124160          Blocks: 248        IO Block: 1048576 regular file
	Device: 0,55    Inode: 131         Links: 1
	...
	# mount carina:/nfs/scratch /mnt2
	mount.nfs: /mnt2 is busy or already mounted or sharecache fail

with the error:

	SELinux: mount invalid.  Same superblock, different security settings for (dev 0:52, type nfs4)

David
Jeff Layton Aug. 4, 2023, 4 p.m. UTC | #16
On Fri, 2023-08-04 at 16:22 +0100, David Howells wrote:
> David Howells <dhowells@redhat.com> wrote:
> 
> > IIRC, the issue is when you make a mount with an explicit context= setting and
> > make another mount from some way down the export tree that doesn't have an
> > explicit setting, e.g.:
> > 
> > 	mount carina:/ /mnt -o context=system_u:object_r:root_t:s0
> > 	mount carina:/nfs/scratch /mnt2
> > 
> > and then cause an automount to walk from one to the other:
> > 
> > 	stat /mnt/nfs/scratch/foo
> 
> Actually, the order there isn't quite right.  The problem is with this order:
> 
> 	# mount carina:/ /mnt -o context=system_u:object_r:root_t:s0
> 	# stat /mnt/nfs/scratch/bus
> 	  File: /mnt/nfs/scratch/bus
> 	  Size: 124160          Blocks: 248        IO Block: 1048576 regular file
> 	Device: 0,55    Inode: 131         Links: 1
> 	...
> 	# mount carina:/nfs/scratch /mnt2
> 	mount.nfs: /mnt2 is busy or already mounted or sharecache fail
> 
> with the error:
> 
> 	SELinux: mount invalid.  Same superblock, different security settings for (dev 0:52, type nfs4)
> 

That seems like the correct behavior to me. You tried to mount the same
mount with a different sec context. If you want that, then you need to
use -o nosharecache.

I'll send a v7 in a bit.
diff mbox series

Patch

diff --git a/fs/fs_context.c b/fs/fs_context.c
index 851214d1d013..a523aea956c4 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -282,6 +282,10 @@  static struct fs_context *alloc_fs_context(struct file_system_type *fs_type,
 		break;
 	}
 
+	ret = security_fs_context_init(fc, reference);
+	if (ret < 0)
+		goto err_fc;
+
 	/* TODO: Make all filesystems support this unconditionally */
 	init_fs_context = fc->fs_type->init_fs_context;
 	if (!init_fs_context)
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index 11ff2b2e060f..651bffb0067e 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -144,6 +144,7 @@  int nfs_get_root(struct super_block *s, struct fs_context *fc)
 	}
 	if (error)
 		goto error_splat_root;
+	fc->lsm_set = true;
 	if (server->caps & NFS_CAP_SECURITY_LABEL &&
 		!(kflags_out & SECURITY_LSM_NATIVE_LABELS))
 		server->caps &= ~NFS_CAP_SECURITY_LABEL;
diff --git a/fs/super.c b/fs/super.c
index e781226e2880..13adf43e2e5d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1541,10 +1541,12 @@  int vfs_get_tree(struct fs_context *fc)
 	smp_wmb();
 	sb->s_flags |= SB_BORN;
 
-	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
-	if (unlikely(error)) {
-		fc_drop_locked(fc);
-		return error;
+	if (!(fc->lsm_set)) {
+		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
+		if (unlikely(error)) {
+			fc_drop_locked(fc);
+			return error;
+		}
 	}
 
 	/*
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index ff6341e09925..26a9fcdb10cc 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -109,6 +109,7 @@  struct fs_context {
 	bool			need_free:1;	/* Need to call ops->free() */
 	bool			global:1;	/* Goes into &init_user_ns */
 	bool			oldapi:1;	/* Coming from mount(2) */
+	bool			lsm_set:1;	/* security_sb_set/clone_mnt_opts() already done */
 };
 
 struct fs_context_operations {
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 7308a1a7599b..7ce3550154b1 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -54,6 +54,7 @@  LSM_HOOK(int, 0, bprm_creds_from_file, struct linux_binprm *bprm, struct file *f
 LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm)
+LSM_HOOK(int, 0, fs_context_init, struct fs_context *fc, struct dentry *reference)
 LSM_HOOK(int, 0, fs_context_dup, struct fs_context *fc,
 	 struct fs_context *src_sc)
 LSM_HOOK(int, -ENOPARAM, fs_context_parse_param, struct fs_context *fc,
diff --git a/include/linux/security.h b/include/linux/security.h
index 32828502f09e..61fda06fac9d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -293,6 +293,7 @@  int security_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file);
 int security_bprm_check(struct linux_binprm *bprm);
 void security_bprm_committing_creds(struct linux_binprm *bprm);
 void security_bprm_committed_creds(struct linux_binprm *bprm);
+int security_fs_context_init(struct fs_context *fc, struct dentry *reference);
 int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc);
 int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param);
 int security_sb_alloc(struct super_block *sb);
@@ -629,6 +630,11 @@  static inline void security_bprm_committed_creds(struct linux_binprm *bprm)
 {
 }
 
+static inline int security_fs_context_init(struct fs_context *fc,
+					   struct dentry *reference)
+{
+	return 0;
+}
 static inline int security_fs_context_dup(struct fs_context *fc,
 					  struct fs_context *src_fc)
 {
diff --git a/security/security.c b/security/security.c
index b720424ca37d..8a6dc6f7cda0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1138,6 +1138,20 @@  void security_bprm_committed_creds(struct linux_binprm *bprm)
 	call_void_hook(bprm_committed_creds, bprm);
 }
 
+/**
+ * security_fs_context_init() - Initialise fc->security
+ * @fc: new filesystem context
+ * @dentry: dentry reference for submount/remount
+ *
+ * Fill out the ->security field for a new fs_context.
+ *
+ * Return: Returns 0 on success or negative error code on failure.
+ */
+int security_fs_context_init(struct fs_context *fc, struct dentry *reference)
+{
+	return call_int_hook(fs_context_init, 0, fc, reference);
+}
+
 /**
  * security_fs_context_dup() - Duplicate a fs_context LSM blob
  * @fc: destination filesystem context
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d06e350fedee..29cce0fadbeb 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2745,6 +2745,30 @@  static int selinux_umount(struct vfsmount *mnt, int flags)
 				   FILESYSTEM__UNMOUNT, NULL);
 }
 
+static int selinux_fs_context_init(struct fs_context *fc,
+				   struct dentry *reference)
+{
+	const struct superblock_security_struct *sbsec;
+	struct selinux_mnt_opts *opts;
+
+	if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT) {
+		opts = kzalloc(sizeof(*opts), GFP_KERNEL);
+		if (!opts)
+			return -ENOMEM;
+
+		sbsec = selinux_superblock(reference->d_sb);
+		if (sbsec->flags & FSCONTEXT_MNT)
+			opts->fscontext_sid	= sbsec->sid;
+		if (sbsec->flags & CONTEXT_MNT)
+			opts->context_sid	= sbsec->mntpoint_sid;
+		if (sbsec->flags & DEFCONTEXT_MNT)
+			opts->defcontext_sid	= sbsec->def_sid;
+		fc->security = opts;
+	}
+
+	return 0;
+}
+
 static int selinux_fs_context_dup(struct fs_context *fc,
 				  struct fs_context *src_fc)
 {
@@ -7182,6 +7206,7 @@  static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	/*
 	 * PUT "CLONING" (ACCESSING + ALLOCATING) HOOKS HERE
 	 */
+	LSM_HOOK_INIT(fs_context_init, selinux_fs_context_init),
 	LSM_HOOK_INIT(fs_context_dup, selinux_fs_context_dup),
 	LSM_HOOK_INIT(fs_context_parse_param, selinux_fs_context_parse_param),
 	LSM_HOOK_INIT(sb_eat_lsm_opts, selinux_sb_eat_lsm_opts),
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 6e270cf3fd30..938c8259c5e7 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -614,6 +614,59 @@  static int smack_add_opt(int token, const char *s, void **mnt_opts)
 	return -EINVAL;
 }
 
+/**
+ * smack_fs_context_init - Initialise security data for a filesystem context
+ * @fc: The filesystem context.
+ * @reference: Reference dentry (automount/reconfigure) or NULL
+ *
+ * Returns 0 on success or -ENOMEM on error.
+ */
+static int smack_fs_context_init(struct fs_context *fc,
+				 struct dentry *reference)
+{
+	struct superblock_smack *sbsp;
+	struct smack_mnt_opts *ctx;
+	struct inode_smack *isp;
+
+	if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT) {
+		ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+		if (!ctx)
+			return -ENOMEM;
+		fc->security = ctx;
+
+		sbsp = smack_superblock(reference->d_sb);
+		isp = smack_inode(reference->d_sb->s_root->d_inode);
+
+		if (sbsp->smk_default) {
+			ctx->fsdefault = kstrdup(sbsp->smk_default->smk_known, GFP_KERNEL);
+			if (!ctx->fsdefault)
+				return -ENOMEM;
+		}
+
+		if (sbsp->smk_floor) {
+			ctx->fsfloor = kstrdup(sbsp->smk_floor->smk_known, GFP_KERNEL);
+			if (!ctx->fsfloor)
+				return -ENOMEM;
+		}
+
+		if (sbsp->smk_hat) {
+			ctx->fshat = kstrdup(sbsp->smk_hat->smk_known, GFP_KERNEL);
+			if (!ctx->fshat)
+				return -ENOMEM;
+		}
+
+		if (isp->smk_flags & SMK_INODE_TRANSMUTE) {
+			if (sbsp->smk_root) {
+				ctx->fstransmute = kstrdup(sbsp->smk_root->smk_known, GFP_KERNEL);
+				if (!ctx->fstransmute)
+					return -ENOMEM;
+			}
+		}
+	}
+
+	return 0;
+}
+
 /**
  * smack_fs_context_dup - Duplicate the security data on fs_context duplication
  * @fc: The new filesystem context.
@@ -4876,6 +4929,7 @@  static struct security_hook_list smack_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, smack_ptrace_traceme),
 	LSM_HOOK_INIT(syslog, smack_syslog),
 
+	LSM_HOOK_INIT(fs_context_init, smack_fs_context_init),
 	LSM_HOOK_INIT(fs_context_dup, smack_fs_context_dup),
 	LSM_HOOK_INIT(fs_context_parse_param, smack_fs_context_parse_param),