diff mbox series

[v9,1/8] landlock: Add IOCTL access right

Message ID 20240209170612.1638517-2-gnoack@google.com (mailing list archive)
State Handled Elsewhere
Headers show
Series Landlock: IOCTL support | expand

Commit Message

Günther Noack Feb. 9, 2024, 5:06 p.m. UTC
Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
and increments the Landlock ABI version to 5.

Like the truncate right, these rights are associated with a file
descriptor at the time of open(2), and get respected even when the
file descriptor is used outside of the thread which it was originally
opened in.

A newly enabled Landlock policy therefore does not apply to file
descriptors which are already open.

If the LANDLOCK_ACCESS_FS_IOCTL right is handled, only a small number
of safe IOCTL commands will be permitted on newly opened files.  The
permitted IOCTLs can be configured through the ruleset in limited ways
now.  (See documentation for details.)

Specifically, when LANDLOCK_ACCESS_FS_IOCTL is handled, granting this
right on a file or directory will *not* permit to do all IOCTL
commands, but only influence the IOCTL commands which are not already
handled through other access rights.  The intent is to keep the groups
of IOCTL commands more fine-grained.

Noteworthy scenarios which require special attention:

TTY devices are often passed into a process from the parent process,
and so a newly enabled Landlock policy does not retroactively apply to
them automatically.  In the past, TTY devices have often supported
IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
letting callers control the TTY input buffer (and simulate
keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
modern kernels though.

Some legitimate file system features, like setting up fscrypt, are
exposed as IOCTL commands on regular files and directories -- users of
Landlock are advised to double check that the sandboxed process does
not need to invoke these IOCTLs.

Known limitations:

The LANDLOCK_ACCESS_FS_IOCTL access right is a coarse-grained control
over IOCTL commands.  Future work will enable a more fine-grained
access control for IOCTLs.

In the meantime, Landlock users may use path-based restrictions in
combination with their knowledge about the file system layout to
control what IOCTLs can be done.  Mounting file systems with the nodev
option can help to distinguish regular files and devices, and give
guarantees about the affected files, which Landlock alone can not give
yet.

Signed-off-by: Günther Noack <gnoack@google.com>
---
 include/uapi/linux/landlock.h                |  55 ++++-
 security/landlock/fs.c                       | 227 ++++++++++++++++++-
 security/landlock/fs.h                       |   3 +
 security/landlock/limits.h                   |  11 +-
 security/landlock/ruleset.h                  |   2 +-
 security/landlock/syscalls.c                 |  19 +-
 tools/testing/selftests/landlock/base_test.c |   2 +-
 tools/testing/selftests/landlock/fs_test.c   |   5 +-
 8 files changed, 302 insertions(+), 22 deletions(-)

Comments

Günther Noack Feb. 10, 2024, 11:06 a.m. UTC | #1
Hello Arnd!

On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
> and increments the Landlock ABI version to 5.
>
> [...]
>
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 73997e63734f..84efea3f7c0f 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> +/**
> + * get_required_ioctl_access(): Determine required IOCTL access rights.
> + *
> + * @cmd: The IOCTL command that is supposed to be run.
> + *
> + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> + * should be considered for inclusion here.
> + *
> + * Returns: The access rights that must be granted on an opened file in order to
> + * use the given @cmd.
> + */
> +static __attribute_const__ access_mask_t
> +get_required_ioctl_access(const unsigned int cmd)
> +{
> +	switch (cmd) {

  [...]

> +	case FIONREAD:

Hello Arnd!  Christian Brauner suggested at FOSDEM that you would be a
good person to reach out to regarding this question -- we would
appreciate if you could have a short look! :)

Context: This patch set lets processes restrict the use of IOCTLs with
the Landlock LSM.  To make the use of this feature more practical, we
are relaxing the rules for some common and harmless IOCTL commands,
which are directly implemented in fs/ioctl.c.

The IOCTL command in question is FIONREAD: fs/ioctl.c implements
FIONREAD directly for S_ISREG files, but it does call the FIONREAD
implementation in the VFS layer for other file types.

The question we are asking ourselves is:

* Can we let processes safely use FIONREAD for all files which get
  opened for the purpose of reading, or do we run the risk of
  accidentally exposing surprising IOCTL implementations which have
  completely different purposes?

  Is it safe to assume that the VFS layer FIONREAD implementations are
  actually implementing FIONREAD semantics?

* I know there have been accidental collisions of IOCTL command
  numbers in the past -- Hypothetically, if this were to happen in one
  of the VFS implementations of FIONREAD, would that be considered a
  bug that would need to get fixed in that implementation?


> +	case FIOQSIZE:
> +	case FIGETBSZ:
> +		/*
> +		 * FIONREAD returns the number of bytes available for reading.
> +		 * FIONREAD returns the number of immediately readable bytes for
> +		 * a file.

(Oops, repetitive doc, probably mis-merged.  Fixing it in next version.)


Thanks!
—Günther
Günther Noack Feb. 10, 2024, 11:18 a.m. UTC | #2
On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 73997e63734f..84efea3f7c0f 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
>  {
>  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
>  	access_mask_t open_access_request, full_access_request, allowed_access;
> -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> +					      LANDLOCK_ACCESS_FS_IOCTL |
> +					      IOCTL_GROUPS;
>  	const struct landlock_ruleset *const dom = get_current_fs_domain();
>  
>  	if (!dom)
> @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
>  		}
>  	}
>  
> +	/*
> +	 * Named pipes should be treated just like anonymous pipes.
> +	 * Therefore, we permit all IOCTLs on them.
> +	 */
> +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> +	}
> +

Hello Mickaël, this "if" is a change I'd like to draw your attention
to -- this special case was necessary so that all IOCTLs are permitted
on named pipes. (There is also a test for it in another commit.)

Open questions here are:

 - I'm a bit on the edge whether it's worth it to have these special
   cases here.  After all, users can very easily just permit all
   IOCTLs through the ruleset if needed, and it might simplify the
   mental model that we have to explain in the documentation

 - I've put the special case into the file open hook, under the
   assumption that it would simplify the Landlock audit support to
   have the correct rights on the struct file.  The implementation
   could alternatively also be done in the ioctl hook. Let me know
   which one makes more sense to you.

BTW, named UNIX domain sockets can apparently not be opened with open() and
therefore they don't hit the LSM file_open hook.  (It is done with the BSD
socket API instead.)

Thanks!
—Günther
Arnd Bergmann Feb. 10, 2024, 11:49 a.m. UTC | #3
On Sat, Feb 10, 2024, at 12:06, Günther Noack wrote:
> On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
>
> The IOCTL command in question is FIONREAD: fs/ioctl.c implements
> FIONREAD directly for S_ISREG files, but it does call the FIONREAD
> implementation in the VFS layer for other file types.
>
> The question we are asking ourselves is:
>
> * Can we let processes safely use FIONREAD for all files which get
>   opened for the purpose of reading, or do we run the risk of
>   accidentally exposing surprising IOCTL implementations which have
>   completely different purposes?
>
>   Is it safe to assume that the VFS layer FIONREAD implementations are
>   actually implementing FIONREAD semantics?
>
> * I know there have been accidental collisions of IOCTL command
>   numbers in the past -- Hypothetically, if this were to happen in one
>   of the VFS implementations of FIONREAD, would that be considered a
>   bug that would need to get fixed in that implementation?

Clearly it's impossible to be sure no driver has a conflict
on this particular ioctl, but the risk for one intentionally
overriding it should be fairly low.

There are a couple of possible issues I can think of:

- the numeric value of FIONREAD is different depending
  on the architecture, with at least four different numbers
  aliasing to it. This is probably harmless but makes it
  harder to look for accidental conflicts.

- Aside from FIONREAD, it is sometimes called SIOCINQ
  (for sockets) or TIOCINQ (for tty). These still go
  through the same VFS entry point and as far as I can
  tell always have the same semantics (writing 4 bytes
  of data with the count of the remaining bytes in the
  fd).

- There are probably a couple of drivers that do something
  in their ioctl handler without actually looking at
  the command number.

If you want to be really sure you get this right, you
could add a new callback to struct file_operations
that handles this for all drivers, something like

static int ioctl_fionread(struct file *filp, int __user *arg)
{
     int n;

     if (S_ISREG(inode->i_mode))
         return put_user(i_size_read(inode) - filp->f_pos, arg);

     if (!file->f_op->fionread)
         return -ENOIOCTLCMD;

     n = file->f_op->fionread(filp);

     if (n < 0)
         return n;

     return put_user(n, arg);
}

With this, you can go through any driver implementing
FIONREAD/SIOCINQ/TIOCINQ and move the code from .ioctl
into .fionread. This probably results in cleaner code
overall, especially in drivers that have no other ioctl
commands besides this one.

Since sockets and ttys tend to have both SIOCINQ/TIOCINQ
and SIOCOUTQ/TIOCOUTQ (unlike regular files), it's
probably best to do both at the same time, or maybe
have a single callback pointer with an in/out flag.

       Arnd
Christian Brauner Feb. 12, 2024, 11:09 a.m. UTC | #4
On Sat, Feb 10, 2024 at 12:49:23PM +0100, Arnd Bergmann wrote:
> On Sat, Feb 10, 2024, at 12:06, Günther Noack wrote:
> > On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> >
> > The IOCTL command in question is FIONREAD: fs/ioctl.c implements
> > FIONREAD directly for S_ISREG files, but it does call the FIONREAD
> > implementation in the VFS layer for other file types.
> >
> > The question we are asking ourselves is:
> >
> > * Can we let processes safely use FIONREAD for all files which get
> >   opened for the purpose of reading, or do we run the risk of
> >   accidentally exposing surprising IOCTL implementations which have
> >   completely different purposes?
> >
> >   Is it safe to assume that the VFS layer FIONREAD implementations are
> >   actually implementing FIONREAD semantics?

Yes, otherwise this should considered a bug.

> >
> > * I know there have been accidental collisions of IOCTL command
> >   numbers in the past -- Hypothetically, if this were to happen in one
> >   of the VFS implementations of FIONREAD, would that be considered a
> >   bug that would need to get fixed in that implementation?
> 
> Clearly it's impossible to be sure no driver has a conflict
> on this particular ioctl, but the risk for one intentionally
> overriding it should be fairly low.
> 
> There are a couple of possible issues I can think of:
> 
> - the numeric value of FIONREAD is different depending
>   on the architecture, with at least four different numbers
>   aliasing to it. This is probably harmless but makes it
>   harder to look for accidental conflicts.
> 
> - Aside from FIONREAD, it is sometimes called SIOCINQ
>   (for sockets) or TIOCINQ (for tty). These still go
>   through the same VFS entry point and as far as I can
>   tell always have the same semantics (writing 4 bytes
>   of data with the count of the remaining bytes in the
>   fd).
> 
> - There are probably a couple of drivers that do something
>   in their ioctl handler without actually looking at
>   the command number.
> 
> If you want to be really sure you get this right, you
> could add a new callback to struct file_operations
> that handles this for all drivers, something like
> 
> static int ioctl_fionread(struct file *filp, int __user *arg)
> {
>      int n;
> 
>      if (S_ISREG(inode->i_mode))
>          return put_user(i_size_read(inode) - filp->f_pos, arg);
> 
>      if (!file->f_op->fionread)
>          return -ENOIOCTLCMD;
> 
>      n = file->f_op->fionread(filp);
> 
>      if (n < 0)
>          return n;
> 
>      return put_user(n, arg);
> }
> 
> With this, you can go through any driver implementing
> FIONREAD/SIOCINQ/TIOCINQ and move the code from .ioctl
> into .fionread. This probably results in cleaner code
> overall, especially in drivers that have no other ioctl
> commands besides this one.
> 
> Since sockets and ttys tend to have both SIOCINQ/TIOCINQ
> and SIOCOUTQ/TIOCOUTQ (unlike regular files), it's

I'm not excited about adding a bunch of methods to struct
file_operations for this stuff.
Günther Noack Feb. 12, 2024, 10:10 p.m. UTC | #5
Thank you, Arnd and Christian, for the detailed insights!
This is very helpful!

On Mon, Feb 12, 2024 at 12:09:47PM +0100, Christian Brauner wrote:
> On Sat, Feb 10, 2024 at 12:49:23PM +0100, Arnd Bergmann wrote:
> > On Sat, Feb 10, 2024, at 12:06, Günther Noack wrote:
> > > On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> > >
> > > The IOCTL command in question is FIONREAD: fs/ioctl.c implements
> > > FIONREAD directly for S_ISREG files, but it does call the FIONREAD
> > > implementation in the VFS layer for other file types.
> > >
> > > The question we are asking ourselves is:
> > >
> > > * Can we let processes safely use FIONREAD for all files which get
> > >   opened for the purpose of reading, or do we run the risk of
> > >   accidentally exposing surprising IOCTL implementations which have
> > >   completely different purposes?
> > >
> > >   Is it safe to assume that the VFS layer FIONREAD implementations are
> > >   actually implementing FIONREAD semantics?
> 
> Yes, otherwise this should considered a bug.

Excellent, thanks :)


> > > * I know there have been accidental collisions of IOCTL command
> > >   numbers in the past -- Hypothetically, if this were to happen in one
> > >   of the VFS implementations of FIONREAD, would that be considered a
> > >   bug that would need to get fixed in that implementation?
> > 
> > Clearly it's impossible to be sure no driver has a conflict
> > on this particular ioctl, but the risk for one intentionally
> > overriding it should be fairly low.
> > 
> > There are a couple of possible issues I can think of:
> > 
> > - the numeric value of FIONREAD is different depending
> >   on the architecture, with at least four different numbers
> >   aliasing to it. This is probably harmless but makes it
> >   harder to look for accidental conflicts.
> > 
> > - Aside from FIONREAD, it is sometimes called SIOCINQ
> >   (for sockets) or TIOCINQ (for tty). These still go
> >   through the same VFS entry point and as far as I can
> >   tell always have the same semantics (writing 4 bytes
> >   of data with the count of the remaining bytes in the
> >   fd).

Thanks, good pointer!

Grepping for these three names, I found:

* ~10 FIONREAD implementations in various drivers
* ~10 TIOCINQ implementations (all in net/, mostly in net/*/af_*.c files)
* ~20 SIOCINQ implementations (all in net/, related to network protocols?)

The implementations seem mostly related to networking, which gives me
hope that special casing and denying FIONREAD in the common case might
not make a big difference after all.

(The ioctl filtering in this patch set only applies to files opened
through open(2), but not to network sockets and other files acquired
through socket(2), pipe(2), socketpair(2), fanotiy_init(2) and the
like. -- It just gets messy if such files are opened through
/proc/*/fd/*)


> > - There are probably a couple of drivers that do something
> >   in their ioctl handler without actually looking at
> >   the command number.

Thanks you for the pointers!

You are right, it is surprisingly common that ioctl handlers do work
without first looking at the command number.  I spot checked a few
ioctl handler implementations and it is easy to dig up examples.

If this is done, the pattern is often this:

   preparation_work();
   
   switch (cmd) {
   case A:   /* impl */
   case B:   /* impl */
   default:  /* set error */
   }
   
   cleanup_work();

Common types of preparation work are the acquisition and release of
locks, allocation and release of commonly used buffers, copying memory
to and from userspace, and flushing buffers.

One of the larger examples which I found was video_ioctl2() from
drivers/media/v412-core/v412-ioctl.c, which is used from multiple
video drivers.

It also seems to me that ioctl handlers doing work independent of
command number might be a bigger problem than the hypothetical command
number collision I originally asked about.  -- If we allow FIONREAD to be called on files too liberally, we are not only exposing 


> > If you want to be really sure you get this right, you
> > could add a new callback to struct file_operations
> > that handles this for all drivers, something like
> > 
> > static int ioctl_fionread(struct file *filp, int __user *arg)
> > {
> >      int n;
> > 
> >      if (S_ISREG(inode->i_mode))
> >          return put_user(i_size_read(inode) - filp->f_pos, arg);
> > 
> >      if (!file->f_op->fionread)
> >          return -ENOIOCTLCMD;
> > 
> >      n = file->f_op->fionread(filp);
> > 
> >      if (n < 0)
> >          return n;
> > 
> >      return put_user(n, arg);
> > }
> > 
> > With this, you can go through any driver implementing
> > FIONREAD/SIOCINQ/TIOCINQ and move the code from .ioctl
> > into .fionread. This probably results in cleaner code
> > overall, especially in drivers that have no other ioctl
> > commands besides this one.
> > 
> > Since sockets and ttys tend to have both SIOCINQ/TIOCINQ
> > and SIOCOUTQ/TIOCOUTQ (unlike regular files), it's
> > probably best to do both at the same time, or maybe
> > have a single callback pointer with an in/out flag.
> 
> I'm not excited about adding a bunch of methods to struct
> file_operations for this stuff.

Agreed.  As far as I understand, if we added a special .fionread
handler to struct file_operations, the pros and cons would be:

 + For files that don't implement FIONREAD, calling ioctl(fd,
   FIONREAD, ...) could not accidentally execute ioctl handler
   preparation and cleanup work.

 - It would use a bit more space in struct file_operations.

 - It might not be obvious to driver implementers that they'd need to
   hook into .fionread.  There is a slight risk that some ioctl
   implementers would still accidentally implement FIONREAD the "old"
   way, and then it would not get called.

 - It would set a weird precedent to have a special handler for a
   single IOCTL command (why is this one command special?); this can't
   be done for all IOCTL commands like that.

If I weigh this against each other, I am not convinced that it
wouldn't cause more complications later on.  But it was a good idea
that I had not considered and it was worth looking into, I think.


In summary, I need to digest this a bit longer ;-)

I think these two were the key insights for me from this discussion:

 * There are fewer implementations of FIONREAD than I was afraid we
   would find.  This gives me some hope that we can maybe special case
   it after all in Landlock, and solve the 99% of realistic scenarios
   with it (and the remaining 1% can still ask for the "all IOCTLs"
   right, if needed).

 * Some IOCTL handlers are messier than I had expected, and it seems
   less realistic that we can convince ourselves that these are safe.
   I particularly did not realize before that ioctl handlers that
   don't even implement FIONREAD might actually still do work when
   called with FIONREAD... who would have thought! o_O

I'll try to explore the direction of special casing FIONREAD for now,
and think about it some more.  Thank you for the insightful input,
I'll loop you in when I have something more concrete.

—Günther
Mickaël Salaün Feb. 16, 2024, 2:11 p.m. UTC | #6
On Sat, Feb 10, 2024 at 12:18:06PM +0100, Günther Noack wrote:
> On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > index 73997e63734f..84efea3f7c0f 100644
> > --- a/security/landlock/fs.c
> > +++ b/security/landlock/fs.c
> > @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
> >  {
> >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> >  	access_mask_t open_access_request, full_access_request, allowed_access;
> > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> > +					      LANDLOCK_ACCESS_FS_IOCTL |
> > +					      IOCTL_GROUPS;
> >  	const struct landlock_ruleset *const dom = get_current_fs_domain();
> >  
> >  	if (!dom)
> > @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
> >  		}
> >  	}
> >  
> > +	/*
> > +	 * Named pipes should be treated just like anonymous pipes.
> > +	 * Therefore, we permit all IOCTLs on them.
> > +	 */
> > +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> > +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> > +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> > +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;

Why not LANDLOCK_ACCESS_FS_IOCTL | IOCTL_GROUPS instead?

> > +	}
> > +
> 
> Hello Mickaël, this "if" is a change I'd like to draw your attention
> to -- this special case was necessary so that all IOCTLs are permitted
> on named pipes. (There is also a test for it in another commit.)
> 
> Open questions here are:
> 
>  - I'm a bit on the edge whether it's worth it to have these special
>    cases here.  After all, users can very easily just permit all
>    IOCTLs through the ruleset if needed, and it might simplify the
>    mental model that we have to explain in the documentation

It might simplify the kernel implementation a bit but it would make the
Landlock security policies more complex, and could encourage people to
allow all IOCTLs on a directory because this directory might contain
(dynamically created) named pipes.

I suggest to extend this check with S_ISFIFO(mode) || S_ISSOCK(mode).
A comment should explain that LANDLOCK_ACCESS_FS_* rights are not meant
to restrict IPCs.

> 
>  - I've put the special case into the file open hook, under the
>    assumption that it would simplify the Landlock audit support to
>    have the correct rights on the struct file.  The implementation
>    could alternatively also be done in the ioctl hook. Let me know
>    which one makes more sense to you.

I like your approach, thanks!  Also, in theory this approach should be
better for performance reasons, even if it should not be visible in
practice. Anyway, keeping a consistent set of access rights is
definitely useful for observability.

I'm wondering if we should do the same mode check for
LANDLOCK_ACCESS_FS_TRUNCATE too... It would not be visible to user space
anyway because the LSM hooks are called after the file mode checks for
truncate(2) and ftruncate(2). But because we need this kind of check for
IOCTL, it might be a good idea to make it common to all optional_access
values, at least to document what is really handled. Adding dedicated
truncate and ftruncate tests (before this commit) would guarantee that
the returned error codes are unchanged.

Moving this check before the is_access_to_paths_allowed() call would
enable to avoid looking for always-allowed access rights by removing
them from the full_access_request. This could help improve performance
when opening named pipe because no optional_access would be requested.

A new helper similar to get_required_file_open_access() could help.

> 
> BTW, named UNIX domain sockets can apparently not be opened with open() and
> therefore they don't hit the LSM file_open hook.  (It is done with the BSD
> socket API instead.)

What about /proc/*/fd/* ? We can test with open_proc_fd() to make sure
our assumptions are correct.
Mickaël Salaün Feb. 16, 2024, 3:51 p.m. UTC | #7
On Fri, Feb 16, 2024 at 03:11:18PM +0100, Mickaël Salaün wrote:
> On Sat, Feb 10, 2024 at 12:18:06PM +0100, Günther Noack wrote:
> > On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> > > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > > index 73997e63734f..84efea3f7c0f 100644
> > > --- a/security/landlock/fs.c
> > > +++ b/security/landlock/fs.c
> > > @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
> > >  {
> > >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> > >  	access_mask_t open_access_request, full_access_request, allowed_access;
> > > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > > +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> > > +					      LANDLOCK_ACCESS_FS_IOCTL |
> > > +					      IOCTL_GROUPS;
> > >  	const struct landlock_ruleset *const dom = get_current_fs_domain();
> > >  
> > >  	if (!dom)
> > > @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
> > >  		}
> > >  	}
> > >  
> > > +	/*
> > > +	 * Named pipes should be treated just like anonymous pipes.
> > > +	 * Therefore, we permit all IOCTLs on them.
> > > +	 */
> > > +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> > > +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> > > +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> > > +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> 
> Why not LANDLOCK_ACCESS_FS_IOCTL | IOCTL_GROUPS instead?
> 
> > > +	}
> > > +
> > 
> > Hello Mickaël, this "if" is a change I'd like to draw your attention
> > to -- this special case was necessary so that all IOCTLs are permitted
> > on named pipes. (There is also a test for it in another commit.)
> > 
> > Open questions here are:
> > 
> >  - I'm a bit on the edge whether it's worth it to have these special
> >    cases here.  After all, users can very easily just permit all
> >    IOCTLs through the ruleset if needed, and it might simplify the
> >    mental model that we have to explain in the documentation
> 
> It might simplify the kernel implementation a bit but it would make the
> Landlock security policies more complex, and could encourage people to
> allow all IOCTLs on a directory because this directory might contain
> (dynamically created) named pipes.
> 
> I suggest to extend this check with S_ISFIFO(mode) || S_ISSOCK(mode).
> A comment should explain that LANDLOCK_ACCESS_FS_* rights are not meant
> to restrict IPCs.
> 
> > 
> >  - I've put the special case into the file open hook, under the
> >    assumption that it would simplify the Landlock audit support to
> >    have the correct rights on the struct file.  The implementation
> >    could alternatively also be done in the ioctl hook. Let me know
> >    which one makes more sense to you.
> 
> I like your approach, thanks!  Also, in theory this approach should be
> better for performance reasons, even if it should not be visible in
> practice. Anyway, keeping a consistent set of access rights is
> definitely useful for observability.
> 
> I'm wondering if we should do the same mode check for
> LANDLOCK_ACCESS_FS_TRUNCATE too... It would not be visible to user space
> anyway because the LSM hooks are called after the file mode checks for
> truncate(2) and ftruncate(2). But because we need this kind of check for
> IOCTL, it might be a good idea to make it common to all optional_access
> values, at least to document what is really handled. Adding dedicated
> truncate and ftruncate tests (before this commit) would guarantee that
> the returned error codes are unchanged.
> 
> Moving this check before the is_access_to_paths_allowed() call would
> enable to avoid looking for always-allowed access rights by removing
> them from the full_access_request. This could help improve performance
> when opening named pipe because no optional_access would be requested.
> 
> A new helper similar to get_required_file_open_access() could help.
> 
> > 
> > BTW, named UNIX domain sockets can apparently not be opened with open() and
> > therefore they don't hit the LSM file_open hook.  (It is done with the BSD
> > socket API instead.)
> 
> What about /proc/*/fd/* ? We can test with open_proc_fd() to make sure
> our assumptions are correct.

Actually, these fifo and socket checks (and related optimizations)
should already be handled with is_nouser_or_private() called by
is_access_to_paths_allowed(). Some new dedicated tests should help
though.
Mickaël Salaün Feb. 16, 2024, 5:19 p.m. UTC | #8
On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
> and increments the Landlock ABI version to 5.
> 
> Like the truncate right, these rights are associated with a file
> descriptor at the time of open(2), and get respected even when the
> file descriptor is used outside of the thread which it was originally
> opened in.
> 
> A newly enabled Landlock policy therefore does not apply to file
> descriptors which are already open.
> 
> If the LANDLOCK_ACCESS_FS_IOCTL right is handled, only a small number
> of safe IOCTL commands will be permitted on newly opened files.  The
> permitted IOCTLs can be configured through the ruleset in limited ways
> now.  (See documentation for details.)
> 
> Specifically, when LANDLOCK_ACCESS_FS_IOCTL is handled, granting this
> right on a file or directory will *not* permit to do all IOCTL
> commands, but only influence the IOCTL commands which are not already
> handled through other access rights.  The intent is to keep the groups
> of IOCTL commands more fine-grained.
> 
> Noteworthy scenarios which require special attention:
> 
> TTY devices are often passed into a process from the parent process,
> and so a newly enabled Landlock policy does not retroactively apply to
> them automatically.  In the past, TTY devices have often supported
> IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> letting callers control the TTY input buffer (and simulate
> keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> modern kernels though.
> 
> Some legitimate file system features, like setting up fscrypt, are
> exposed as IOCTL commands on regular files and directories -- users of
> Landlock are advised to double check that the sandboxed process does
> not need to invoke these IOCTLs.
> 
> Known limitations:
> 
> The LANDLOCK_ACCESS_FS_IOCTL access right is a coarse-grained control
> over IOCTL commands.  Future work will enable a more fine-grained
> access control for IOCTLs.
> 
> In the meantime, Landlock users may use path-based restrictions in
> combination with their knowledge about the file system layout to
> control what IOCTLs can be done.  Mounting file systems with the nodev
> option can help to distinguish regular files and devices, and give
> guarantees about the affected files, which Landlock alone can not give
> yet.
> 
> Signed-off-by: Günther Noack <gnoack@google.com>
> ---
>  include/uapi/linux/landlock.h                |  55 ++++-
>  security/landlock/fs.c                       | 227 ++++++++++++++++++-
>  security/landlock/fs.h                       |   3 +
>  security/landlock/limits.h                   |  11 +-
>  security/landlock/ruleset.h                  |   2 +-
>  security/landlock/syscalls.c                 |  19 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   5 +-
>  8 files changed, 302 insertions(+), 22 deletions(-)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 25c8d7677539..16d7d72804f8 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -128,7 +128,7 @@ struct landlock_net_port_attr {
>   * files and directories.  Files or directories opened before the sandboxing
>   * are not subject to these restrictions.
>   *
> - * A file can only receive these access rights:
> + * The following access rights apply only to files:
>   *
>   * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
>   * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
> @@ -138,12 +138,13 @@ struct landlock_net_port_attr {
>   * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
>   * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
>   *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
> - *   ``O_TRUNC``. Whether an opened file can be truncated with
> - *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
> - *   same way as read and write permissions are checked during
> - *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
> - *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
> - *   third version of the Landlock ABI.
> + *   ``O_TRUNC``.  This access right is available since the third version of the
> + *   Landlock ABI.
> + *
> + * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> + * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> + * read and write permissions are checked during :manpage:`open(2)` using
> + * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
>   *
>   * A directory can receive access rights related to files or directories.  The
>   * following access right is applied to the directory itself, and the
> @@ -198,13 +199,50 @@ struct landlock_net_port_attr {
>   *   If multiple requirements are not met, the ``EACCES`` error code takes
>   *   precedence over ``EXDEV``.
>   *
> + * The following access right applies both to files and directories:
> + *
> + * - %LANDLOCK_ACCESS_FS_IOCTL: Invoke :manpage:`ioctl(2)` commands on an opened
> + *   file or directory.
> + *
> + *   This access right applies to all :manpage:`ioctl(2)` commands, except of
> + *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO`` and ``FIOASYNC``.  These commands
> + *   continue to be invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL
> + *   access right.
> + *
> + *   When certain other access rights are handled in the ruleset, in addition to
> + *   %LANDLOCK_ACCESS_FS_IOCTL, granting these access rights will unlock access
> + *   to additional groups of IOCTL commands, on the affected files:
> + *
> + *   * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE unlock
> + *     access to ``FIOQSIZE``, ``FIONREAD``, ``FIGETBSZ``, ``FS_IOC_FIEMAP``,
> + *     ``FIBMAP``, ``FIDEDUPERANGE``, ``FICLONE``, ``FICLONERANGE``,
> + *     ``FS_IOC_RESVSP``, ``FS_IOC_RESVSP64``, ``FS_IOC_UNRESVSP``,
> + *     ``FS_IOC_UNRESVSP64``, ``FS_IOC_ZERO_RANGE``.
> + *
> + *   * %LANDLOCK_ACCESS_FS_READ_DIR unlocks access to ``FIOQSIZE``,
> + *     ``FIONREAD``, ``FIGETBSZ``.
> + *
> + *   When these access rights are handled in the ruleset, the availability of
> + *   the affected IOCTL commands is not governed by %LANDLOCK_ACCESS_FS_IOCTL
> + *   any more, but by the respective access right.
> + *
> + *   All other IOCTL commands are not handled specially, and are governed by
> + *   %LANDLOCK_ACCESS_FS_IOCTL.  This includes %FS_IOC_GETFLAGS and
> + *   %FS_IOC_SETFLAGS for manipulating inode flags (:manpage:`ioctl_iflags(2)`),
> + *   %FS_IOC_FSFETXATTR and %FS_IOC_FSSETXATTR for manipulating extended
> + *   attributes, as well as %FIFREEZE and %FITHAW for freezing and thawing file
> + *   systems.
> + *
> + *   This access right is available since the fifth version of the Landlock
> + *   ABI.
> + *
>   * .. warning::
>   *
>   *   It is currently not possible to restrict some file-related actions
>   *   accessible through these syscall families: :manpage:`chdir(2)`,
>   *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
>   *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
> - *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
> + *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
>   *   Future Landlock evolutions will enable to restrict them.
>   */
>  /* clang-format off */
> @@ -223,6 +261,7 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
>  #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
>  #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
> +#define LANDLOCK_ACCESS_FS_IOCTL			(1ULL << 15)
>  /* clang-format on */
>  
>  /**
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 73997e63734f..84efea3f7c0f 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -7,6 +7,7 @@
>   * Copyright © 2021-2022 Microsoft Corporation
>   */
>  
> +#include <asm/ioctls.h>
>  #include <kunit/test.h>
>  #include <linux/atomic.h>
>  #include <linux/bitops.h>
> @@ -14,6 +15,7 @@
>  #include <linux/compiler_types.h>
>  #include <linux/dcache.h>
>  #include <linux/err.h>
> +#include <linux/falloc.h>
>  #include <linux/fs.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
> @@ -29,6 +31,7 @@
>  #include <linux/types.h>
>  #include <linux/wait_bit.h>
>  #include <linux/workqueue.h>
> +#include <uapi/linux/fiemap.h>
>  #include <uapi/linux/landlock.h>
>  
>  #include "common.h"
> @@ -84,6 +87,186 @@ static const struct landlock_object_underops landlock_fs_underops = {
>  	.release = release_inode
>  };
>  
> +/* IOCTL helpers */
> +
> +/*
> + * These are synthetic access rights, which are only used within the kernel, but
> + * not exposed to callers in userspace.  The mapping between these access rights
> + * and IOCTL commands is defined in the get_required_ioctl_access() helper function.
> + */
> +#define LANDLOCK_ACCESS_FS_IOCTL_RW (LANDLOCK_LAST_PUBLIC_ACCESS_FS << 1)
> +#define LANDLOCK_ACCESS_FS_IOCTL_RW_FILE (LANDLOCK_LAST_PUBLIC_ACCESS_FS << 2)
> +
> +/* ioctl_groups - all synthetic access rights for IOCTL command groups */
> +/* clang-format off */
> +#define IOCTL_GROUPS (				\
> +	LANDLOCK_ACCESS_FS_IOCTL_RW |		\
> +	LANDLOCK_ACCESS_FS_IOCTL_RW_FILE)
> +/* clang-format on */
> +
> +static_assert((IOCTL_GROUPS & LANDLOCK_MASK_ACCESS_FS) == IOCTL_GROUPS);
> +
> +/**
> + * get_required_ioctl_access(): Determine required IOCTL access rights.
> + *
> + * @cmd: The IOCTL command that is supposed to be run.
> + *
> + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> + * should be considered for inclusion here.

It might be a good idea to add a similar comment in
fs/ioctl.c:do_vfs_ioctl(), just before the "default" case, to make sure
nobody forget to Cc us if a new command is added.

> + *
> + * Returns: The access rights that must be granted on an opened file in order to
> + * use the given @cmd.
> + */
> +static __attribute_const__ access_mask_t
> +get_required_ioctl_access(const unsigned int cmd)
> +{
> +	switch (cmd) {
> +	case FIOCLEX:
> +	case FIONCLEX:
> +	case FIONBIO:
> +	case FIOASYNC:
> +		/*
> +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> +		 * close-on-exec and the file's buffered-IO and async flags.
> +		 * These operations are also available through fcntl(2), and are
> +		 * unconditionally permitted in Landlock.
> +		 */
> +		return 0;
> +	case FIONREAD:
> +	case FIOQSIZE:
> +	case FIGETBSZ:
> +		/*
> +		 * FIONREAD returns the number of bytes available for reading.
> +		 * FIONREAD returns the number of immediately readable bytes for
> +		 * a file.
> +		 *
> +		 * FIOQSIZE queries the size of a file or directory.
> +		 *
> +		 * FIGETBSZ queries the file system's block size for a file or
> +		 * directory.
> +		 *
> +		 * These IOCTL commands are permitted for files which are opened
> +		 * with LANDLOCK_ACCESS_FS_READ_DIR,
> +		 * LANDLOCK_ACCESS_FS_READ_FILE, or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_RW;
> +	case FS_IOC_FIEMAP:
> +	case FIBMAP:
> +		/*
> +		 * FS_IOC_FIEMAP and FIBMAP query information about the
> +		 * allocation of blocks within a file.  They are permitted for
> +		 * files which are opened with LANDLOCK_ACCESS_FS_READ_FILE or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */
> +		fallthrough;
> +	case FIDEDUPERANGE:
> +	case FICLONE:
> +	case FICLONERANGE:
> +		/*
> +		 * FIDEDUPERANGE, FICLONE and FICLONERANGE make files share
> +		 * their underlying storage ("reflink") between source and
> +		 * destination FDs, on file systems which support that.
> +		 *
> +		 * The underlying implementations are already checking whether
> +		 * the involved files are opened with the appropriate read/write
> +		 * modes.  We rely on this being implemented correctly.
> +		 *
> +		 * These IOCTLs are permitted for files which are opened with
> +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */
> +		fallthrough;
> +	case FS_IOC_RESVSP:
> +	case FS_IOC_RESVSP64:
> +	case FS_IOC_UNRESVSP:
> +	case FS_IOC_UNRESVSP64:
> +	case FS_IOC_ZERO_RANGE:
> +		/*
> +		 * These IOCTLs reserve space, or create holes like
> +		 * fallocate(2).  We rely on the implementations checking the
> +		 * files' read/write modes.
> +		 *
> +		 * These IOCTLs are permitted for files which are opened with
> +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> +	default:
> +		/*
> +		 * Other commands are guarded by the catch-all access right.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL;
> +	}

Good documentation and better grouping!

> +}
> +
> +/**
> + * expand_ioctl() - Return the dst flags from either the src flag or the
> + * %LANDLOCK_ACCESS_FS_IOCTL flag, depending on whether the
> + * %LANDLOCK_ACCESS_FS_IOCTL and src access rights are handled or not.
> + *
> + * @handled: Handled access rights.
> + * @access: The access mask to copy values from.
> + * @src: A single access right to copy from in @access.
> + * @dst: One or more access rights to copy to.
> + *
> + * Returns: @dst, or 0.
> + */
> +static __attribute_const__ access_mask_t
> +expand_ioctl(const access_mask_t handled, const access_mask_t access,
> +	     const access_mask_t src, const access_mask_t dst)
> +{
> +	access_mask_t copy_from;
> +
> +	if (!(handled & LANDLOCK_ACCESS_FS_IOCTL))
> +		return 0;
> +
> +	copy_from = (handled & src) ? src : LANDLOCK_ACCESS_FS_IOCTL;
> +	if (access & copy_from)
> +		return dst;
> +
> +	return 0;
> +}
> +
> +/**
> + * landlock_expand_access_fs() - Returns @access with the synthetic IOCTL group
> + * flags enabled if necessary.
> + *
> + * @handled: Handled FS access rights.
> + * @access: FS access rights to expand.
> + *
> + * Returns: @access expanded by the necessary flags for the synthetic IOCTL
> + * access rights.
> + */
> +static __attribute_const__ access_mask_t landlock_expand_access_fs(
> +	const access_mask_t handled, const access_mask_t access)
> +{
> +	return access |
> +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_WRITE_FILE,
> +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_FILE,
> +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_DIR,
> +			    LANDLOCK_ACCESS_FS_IOCTL_RW);
> +}
> +
> +/**
> + * landlock_expand_handled_access_fs() - add synthetic IOCTL access rights to an
> + * access mask of handled accesses.
> + *
> + * @handled: The handled accesses of a ruleset that is being created.
> + *
> + * Returns: @handled, with the bits for the synthetic IOCTL access rights set,
> + * if %LANDLOCK_ACCESS_FS_IOCTL is handled.
> + */
> +__attribute_const__ access_mask_t
> +landlock_expand_handled_access_fs(const access_mask_t handled)
> +{
> +	return landlock_expand_access_fs(handled, handled);
> +}
> +
>  /* Ruleset management */
>  
>  static struct landlock_object *get_inode_object(struct inode *const inode)
> @@ -148,7 +331,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
>  	LANDLOCK_ACCESS_FS_EXECUTE | \
>  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
>  	LANDLOCK_ACCESS_FS_READ_FILE | \
> -	LANDLOCK_ACCESS_FS_TRUNCATE)
> +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> +	LANDLOCK_ACCESS_FS_IOCTL)
>  /* clang-format on */
>  
>  /*
> @@ -158,6 +342,7 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
>  			    const struct path *const path,
>  			    access_mask_t access_rights)
>  {
> +	access_mask_t handled;
>  	int err;
>  	struct landlock_id id = {
>  		.type = LANDLOCK_KEY_INODE,
> @@ -170,9 +355,11 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
>  	if (WARN_ON_ONCE(ruleset->num_layers != 1))
>  		return -EINVAL;
>  
> +	handled = landlock_get_fs_access_mask(ruleset, 0);
> +	/* Expands the synthetic IOCTL groups. */
> +	access_rights |= landlock_expand_access_fs(handled, access_rights);
>  	/* Transforms relative access rights to absolute ones. */
> -	access_rights |= LANDLOCK_MASK_ACCESS_FS &
> -			 ~landlock_get_fs_access_mask(ruleset, 0);
> +	access_rights |= LANDLOCK_MASK_ACCESS_FS & ~handled;
>  	id.key.object = get_inode_object(d_backing_inode(path->dentry));
>  	if (IS_ERR(id.key.object))
>  		return PTR_ERR(id.key.object);
> @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
>  {
>  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
>  	access_mask_t open_access_request, full_access_request, allowed_access;
> -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> +					      LANDLOCK_ACCESS_FS_IOCTL |
> +					      IOCTL_GROUPS;
>  	const struct landlock_ruleset *const dom = get_current_fs_domain();
>  
>  	if (!dom)
> @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
>  		}
>  	}
>  
> +	/*
> +	 * Named pipes should be treated just like anonymous pipes.
> +	 * Therefore, we permit all IOCTLs on them.
> +	 */
> +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> +	}

This should not be required, cf. other thread.

> +
>  	/*
>  	 * For operations on already opened files (i.e. ftruncate()), it is the
>  	 * access rights at the time of open() which decide whether the
> @@ -1406,6 +1605,25 @@ static int hook_file_truncate(struct file *const file)
>  	return -EACCES;
>  }
>  
> +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> +			   unsigned long arg)
> +{
> +	const access_mask_t required_access = get_required_ioctl_access(cmd);
> +	const access_mask_t allowed_access =
> +		landlock_file(file)->allowed_access;
> +
> +	/*
> +	 * It is the access rights at the time of opening the file which
> +	 * determine whether IOCTL can be used on the opened file later.
> +	 *
> +	 * The access right is attached to the opened file in hook_file_open().
> +	 */
> +	if ((allowed_access & required_access) == required_access)
> +		return 0;
> +
> +	return -EACCES;
> +}
> +
>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
>  
> @@ -1428,6 +1646,7 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
>  	LSM_HOOK_INIT(file_open, hook_file_open),
>  	LSM_HOOK_INIT(file_truncate, hook_file_truncate),
> +	LSM_HOOK_INIT(file_ioctl, hook_file_ioctl),
>  };
>  
>  __init void landlock_add_fs_hooks(void)
> diff --git a/security/landlock/fs.h b/security/landlock/fs.h
> index 488e4813680a..086576b8386b 100644
> --- a/security/landlock/fs.h
> +++ b/security/landlock/fs.h
> @@ -92,4 +92,7 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
>  			    const struct path *const path,
>  			    access_mask_t access_hierarchy);
>  
> +__attribute_const__ access_mask_t
> +landlock_expand_handled_access_fs(const access_mask_t handled);
> +
>  #endif /* _SECURITY_LANDLOCK_FS_H */
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index 93c9c6f91556..ecbdc8bbf906 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -18,7 +18,16 @@
>  #define LANDLOCK_MAX_NUM_LAYERS		16
>  #define LANDLOCK_MAX_NUM_RULES		U32_MAX
>  
> -#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_TRUNCATE
> +/*
> + * For file system access rights, Landlock distinguishes between the publicly
> + * visible access rights (1 to LANDLOCK_LAST_PUBLIC_ACCESS_FS) and the private
> + * ones which are not exposed to userspace (LANDLOCK_LAST_PUBLIC_ACCESS_FS + 1
> + * to LANDLOCK_LAST_ACCESS_FS).  The private access rights are defined in fs.c.
> + */
> +#define LANDLOCK_LAST_PUBLIC_ACCESS_FS	LANDLOCK_ACCESS_FS_IOCTL
> +#define LANDLOCK_MASK_PUBLIC_ACCESS_FS	((LANDLOCK_LAST_PUBLIC_ACCESS_FS << 1) - 1)
> +
> +#define LANDLOCK_LAST_ACCESS_FS		(LANDLOCK_LAST_PUBLIC_ACCESS_FS << 2)
>  #define LANDLOCK_MASK_ACCESS_FS		((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
>  #define LANDLOCK_NUM_ACCESS_FS		__const_hweight64(LANDLOCK_MASK_ACCESS_FS)
>  #define LANDLOCK_SHIFT_ACCESS_FS	0
> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
> index c7f1526784fd..5a28ea8e1c3d 100644
> --- a/security/landlock/ruleset.h
> +++ b/security/landlock/ruleset.h
> @@ -30,7 +30,7 @@
>  	LANDLOCK_ACCESS_FS_REFER)
>  /* clang-format on */
>  
> -typedef u16 access_mask_t;
> +typedef u32 access_mask_t;
>  /* Makes sure all filesystem access rights can be stored. */
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
>  /* Makes sure all network access rights can be stored. */
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 898358f57fa0..f0bc50003b46 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -137,7 +137,7 @@ static const struct file_operations ruleset_fops = {
>  	.write = fop_dummy_write,
>  };
>  
> -#define LANDLOCK_ABI_VERSION 4
> +#define LANDLOCK_ABI_VERSION 5
>  
>  /**
>   * sys_landlock_create_ruleset - Create a new ruleset
> @@ -192,8 +192,8 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
>  		return err;
>  
>  	/* Checks content (and 32-bits cast). */
> -	if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_ACCESS_FS) !=
> -	    LANDLOCK_MASK_ACCESS_FS)
> +	if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_PUBLIC_ACCESS_FS) !=
> +	    LANDLOCK_MASK_PUBLIC_ACCESS_FS)
>  		return -EINVAL;
>  
>  	/* Checks network content (and 32-bits cast). */
> @@ -201,6 +201,10 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
>  	    LANDLOCK_MASK_ACCESS_NET)
>  		return -EINVAL;
>  
> +	/* Expands synthetic IOCTL groups. */
> +	ruleset_attr.handled_access_fs = landlock_expand_handled_access_fs(
> +		ruleset_attr.handled_access_fs);
> +
>  	/* Checks arguments and transforms to kernel struct. */
>  	ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
>  					  ruleset_attr.handled_access_net);
> @@ -309,8 +313,13 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
>  	if (!path_beneath_attr.allowed_access)
>  		return -ENOMSG;
>  
> -	/* Checks that allowed_access matches the @ruleset constraints. */
> -	mask = landlock_get_raw_fs_access_mask(ruleset, 0);
> +	/*
> +	 * Checks that allowed_access matches the @ruleset constraints and only
> +	 * consists of publicly visible access rights (as opposed to synthetic
> +	 * ones).
> +	 */
> +	mask = landlock_get_raw_fs_access_mask(ruleset, 0) &
> +	       LANDLOCK_MASK_PUBLIC_ACCESS_FS;
>  	if ((path_beneath_attr.allowed_access | mask) != mask)
>  		return -EINVAL;
>  
> diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> index 646f778dfb1e..d292b419ccba 100644
> --- a/tools/testing/selftests/landlock/base_test.c
> +++ b/tools/testing/selftests/landlock/base_test.c
> @@ -75,7 +75,7 @@ TEST(abi_version)
>  	const struct landlock_ruleset_attr ruleset_attr = {
>  		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
>  	};
> -	ASSERT_EQ(4, landlock_create_ruleset(NULL, 0,
> +	ASSERT_EQ(5, landlock_create_ruleset(NULL, 0,
>  					     LANDLOCK_CREATE_RULESET_VERSION));
>  
>  	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
> index 2d6d9b43d958..3203f4a5bc85 100644
> --- a/tools/testing/selftests/landlock/fs_test.c
> +++ b/tools/testing/selftests/landlock/fs_test.c
> @@ -527,9 +527,10 @@ TEST_F_FORK(layout1, inval)
>  	LANDLOCK_ACCESS_FS_EXECUTE | \
>  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
>  	LANDLOCK_ACCESS_FS_READ_FILE | \
> -	LANDLOCK_ACCESS_FS_TRUNCATE)
> +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> +	LANDLOCK_ACCESS_FS_IOCTL)
>  
> -#define ACCESS_LAST LANDLOCK_ACCESS_FS_TRUNCATE
> +#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL
>  
>  #define ACCESS_ALL ( \
>  	ACCESS_FILE | \
> -- 
> 2.43.0.687.g38aa6559b0-goog
> 
>
Günther Noack Feb. 18, 2024, 8:34 a.m. UTC | #9
On Fri, Feb 16, 2024 at 04:51:40PM +0100, Mickaël Salaün wrote:
> On Fri, Feb 16, 2024 at 03:11:18PM +0100, Mickaël Salaün wrote:
> > What about /proc/*/fd/* ? We can test with open_proc_fd() to make sure
> > our assumptions are correct.
> 
> Actually, these fifo and socket checks (and related optimizations)
> should already be handled with is_nouser_or_private() called by
> is_access_to_paths_allowed(). Some new dedicated tests should help
> though.

I am generally a bit confused about how opening /proc/*/fd/* works.

Specifically:

* Do we have to worry about the scenario where the file_open hook gets
  called with the same struct file* twice (overwriting the access
  rights)?

* I had trouble finding the place in fs/proc/ where the re-opening is
  implemented.

Do you happen to understand this in more detail?  At what point do the
re-opened files start sharing the same kernel objects?  Is that at the
inode level?

The documentation I consulted unfortunately did not explain it either:

* The man page (proc_pid_fd(5), or previously proc(5)) does not
  discuss the behavior on open() much, apart from using it in some
  examples.

* Michael Kerrisk's "Linux Programming Interface" book claims that the
  behaviour of opening /dev/fd/1 is like doing dup(1) (section 5.11)
  -- that is true on other UNIXes, but on Linux the resulting file
  descriptors do not share the same struct file* apparently.  This
  makes a difference for regular files, where the two FDs subsequently
  use two separate offsets into the file (f_pos).

Thanks,
–Günther
Mickaël Salaün Feb. 19, 2024, 6:34 p.m. UTC | #10
Arn, Christian, please take a look at the following RFC patch and the
rationale explained here.

On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
> and increments the Landlock ABI version to 5.
> 
> Like the truncate right, these rights are associated with a file
> descriptor at the time of open(2), and get respected even when the
> file descriptor is used outside of the thread which it was originally
> opened in.
> 
> A newly enabled Landlock policy therefore does not apply to file
> descriptors which are already open.
> 
> If the LANDLOCK_ACCESS_FS_IOCTL right is handled, only a small number
> of safe IOCTL commands will be permitted on newly opened files.  The
> permitted IOCTLs can be configured through the ruleset in limited ways
> now.  (See documentation for details.)
> 
> Specifically, when LANDLOCK_ACCESS_FS_IOCTL is handled, granting this
> right on a file or directory will *not* permit to do all IOCTL
> commands, but only influence the IOCTL commands which are not already
> handled through other access rights.  The intent is to keep the groups
> of IOCTL commands more fine-grained.
> 
> Noteworthy scenarios which require special attention:
> 
> TTY devices are often passed into a process from the parent process,
> and so a newly enabled Landlock policy does not retroactively apply to
> them automatically.  In the past, TTY devices have often supported
> IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> letting callers control the TTY input buffer (and simulate
> keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> modern kernels though.
> 
> Some legitimate file system features, like setting up fscrypt, are
> exposed as IOCTL commands on regular files and directories -- users of
> Landlock are advised to double check that the sandboxed process does
> not need to invoke these IOCTLs.

I think we really need to allow fscrypt and fs-verity IOCTLs.

> 
> Known limitations:
> 
> The LANDLOCK_ACCESS_FS_IOCTL access right is a coarse-grained control
> over IOCTL commands.  Future work will enable a more fine-grained
> access control for IOCTLs.
> 
> In the meantime, Landlock users may use path-based restrictions in
> combination with their knowledge about the file system layout to
> control what IOCTLs can be done.  Mounting file systems with the nodev
> option can help to distinguish regular files and devices, and give
> guarantees about the affected files, which Landlock alone can not give
> yet.

I had a second though about our current approach, and it looks like we
can do simpler, more generic, and with less IOCTL commands specific
handling.

What we didn't take into account is that an IOCTL needs an opened file,
which means that the caller must already have been allowed to open this
file in read or write mode.

I think most FS-specific IOCTL commands check access rights (i.e. access
mode or required capability), other than implicit ones (at least read or
write), when appropriate.  We don't get such guarantee with device
drivers.

The main threat is IOCTLs on character or block devices because their
impact may be unknown (if we only look at the IOCTL command, not the
backing file), but we should allow IOCTLs on filesystems (e.g. fscrypt,
fs-verity, clone extents).  I think we should only implement a
LANDLOCK_ACCESS_FS_IOCTL_DEV right, which would be more explicit.  This
change would impact the IOCTLs grouping (not required anymore), but
we'll still need the list of VFS IOCTLs.


> 
> Signed-off-by: Günther Noack <gnoack@google.com>
> ---
>  include/uapi/linux/landlock.h                |  55 ++++-
>  security/landlock/fs.c                       | 227 ++++++++++++++++++-
>  security/landlock/fs.h                       |   3 +
>  security/landlock/limits.h                   |  11 +-
>  security/landlock/ruleset.h                  |   2 +-
>  security/landlock/syscalls.c                 |  19 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   5 +-
>  8 files changed, 302 insertions(+), 22 deletions(-)

> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 73997e63734f..84efea3f7c0f 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c

> @@ -84,6 +87,186 @@ static const struct landlock_object_underops landlock_fs_underops = {
>  	.release = release_inode
>  };
>  
> +/* IOCTL helpers */
> +
> +/*
> + * These are synthetic access rights, which are only used within the kernel, but
> + * not exposed to callers in userspace.  The mapping between these access rights
> + * and IOCTL commands is defined in the get_required_ioctl_access() helper function.
> + */
> +#define LANDLOCK_ACCESS_FS_IOCTL_RW (LANDLOCK_LAST_PUBLIC_ACCESS_FS << 1)
> +#define LANDLOCK_ACCESS_FS_IOCTL_RW_FILE (LANDLOCK_LAST_PUBLIC_ACCESS_FS << 2)
> +
> +/* ioctl_groups - all synthetic access rights for IOCTL command groups */
> +/* clang-format off */
> +#define IOCTL_GROUPS (				\
> +	LANDLOCK_ACCESS_FS_IOCTL_RW |		\
> +	LANDLOCK_ACCESS_FS_IOCTL_RW_FILE)
> +/* clang-format on */
> +
> +static_assert((IOCTL_GROUPS & LANDLOCK_MASK_ACCESS_FS) == IOCTL_GROUPS);
> +
> +/**
> + * get_required_ioctl_access(): Determine required IOCTL access rights.
> + *
> + * @cmd: The IOCTL command that is supposed to be run.
> + *
> + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> + * should be considered for inclusion here.
> + *
> + * Returns: The access rights that must be granted on an opened file in order to
> + * use the given @cmd.
> + */
> +static __attribute_const__ access_mask_t
> +get_required_ioctl_access(const unsigned int cmd)
> +{
> +	switch (cmd) {
> +	case FIOCLEX:
> +	case FIONCLEX:
> +	case FIONBIO:
> +	case FIOASYNC:
> +		/*
> +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> +		 * close-on-exec and the file's buffered-IO and async flags.
> +		 * These operations are also available through fcntl(2), and are
> +		 * unconditionally permitted in Landlock.
> +		 */
> +		return 0;
> +	case FIONREAD:
> +	case FIOQSIZE:
> +	case FIGETBSZ:
> +		/*
> +		 * FIONREAD returns the number of bytes available for reading.
> +		 * FIONREAD returns the number of immediately readable bytes for
> +		 * a file.
> +		 *
> +		 * FIOQSIZE queries the size of a file or directory.
> +		 *
> +		 * FIGETBSZ queries the file system's block size for a file or
> +		 * directory.
> +		 *
> +		 * These IOCTL commands are permitted for files which are opened
> +		 * with LANDLOCK_ACCESS_FS_READ_DIR,
> +		 * LANDLOCK_ACCESS_FS_READ_FILE, or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */

Because files or directories can only be opened with
LANDLOCK_ACCESS_FS_{READ,WRITE}_{FILE,DIR}, and because IOCTLs can only
be sent on a file descriptor, this means that we can always allow these
3 commands (for opened files).

> +		return LANDLOCK_ACCESS_FS_IOCTL_RW;
> +	case FS_IOC_FIEMAP:
> +	case FIBMAP:
> +		/*
> +		 * FS_IOC_FIEMAP and FIBMAP query information about the
> +		 * allocation of blocks within a file.  They are permitted for
> +		 * files which are opened with LANDLOCK_ACCESS_FS_READ_FILE or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */
> +		fallthrough;
> +	case FIDEDUPERANGE:
> +	case FICLONE:
> +	case FICLONERANGE:
> +		/*
> +		 * FIDEDUPERANGE, FICLONE and FICLONERANGE make files share
> +		 * their underlying storage ("reflink") between source and
> +		 * destination FDs, on file systems which support that.
> +		 *
> +		 * The underlying implementations are already checking whether
> +		 * the involved files are opened with the appropriate read/write
> +		 * modes.  We rely on this being implemented correctly.
> +		 *
> +		 * These IOCTLs are permitted for files which are opened with
> +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */
> +		fallthrough;
> +	case FS_IOC_RESVSP:
> +	case FS_IOC_RESVSP64:
> +	case FS_IOC_UNRESVSP:
> +	case FS_IOC_UNRESVSP64:
> +	case FS_IOC_ZERO_RANGE:
> +		/*
> +		 * These IOCTLs reserve space, or create holes like
> +		 * fallocate(2).  We rely on the implementations checking the
> +		 * files' read/write modes.
> +		 *
> +		 * These IOCTLs are permitted for files which are opened with
> +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> +		 */

These 10 commands only make sense on directories, so we could also
always allow them on file descriptors.

> +		return LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> +	default:
> +		/*
> +		 * Other commands are guarded by the catch-all access right.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL;
> +	}
> +}
> +
> +/**
> + * expand_ioctl() - Return the dst flags from either the src flag or the
> + * %LANDLOCK_ACCESS_FS_IOCTL flag, depending on whether the
> + * %LANDLOCK_ACCESS_FS_IOCTL and src access rights are handled or not.
> + *
> + * @handled: Handled access rights.
> + * @access: The access mask to copy values from.
> + * @src: A single access right to copy from in @access.
> + * @dst: One or more access rights to copy to.
> + *
> + * Returns: @dst, or 0.
> + */
> +static __attribute_const__ access_mask_t
> +expand_ioctl(const access_mask_t handled, const access_mask_t access,
> +	     const access_mask_t src, const access_mask_t dst)
> +{
> +	access_mask_t copy_from;
> +
> +	if (!(handled & LANDLOCK_ACCESS_FS_IOCTL))
> +		return 0;
> +
> +	copy_from = (handled & src) ? src : LANDLOCK_ACCESS_FS_IOCTL;
> +	if (access & copy_from)
> +		return dst;
> +
> +	return 0;
> +}
> +
> +/**
> + * landlock_expand_access_fs() - Returns @access with the synthetic IOCTL group
> + * flags enabled if necessary.
> + *
> + * @handled: Handled FS access rights.
> + * @access: FS access rights to expand.
> + *
> + * Returns: @access expanded by the necessary flags for the synthetic IOCTL
> + * access rights.
> + */
> +static __attribute_const__ access_mask_t landlock_expand_access_fs(
> +	const access_mask_t handled, const access_mask_t access)
> +{
> +	return access |
> +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_WRITE_FILE,
> +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_FILE,
> +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_DIR,
> +			    LANDLOCK_ACCESS_FS_IOCTL_RW);
> +}
> +
> +/**
> + * landlock_expand_handled_access_fs() - add synthetic IOCTL access rights to an
> + * access mask of handled accesses.
> + *
> + * @handled: The handled accesses of a ruleset that is being created.
> + *
> + * Returns: @handled, with the bits for the synthetic IOCTL access rights set,
> + * if %LANDLOCK_ACCESS_FS_IOCTL is handled.
> + */
> +__attribute_const__ access_mask_t
> +landlock_expand_handled_access_fs(const access_mask_t handled)
> +{
> +	return landlock_expand_access_fs(handled, handled);
> +}
> +
>  /* Ruleset management */
>  
>  static struct landlock_object *get_inode_object(struct inode *const inode)
> @@ -148,7 +331,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
>  	LANDLOCK_ACCESS_FS_EXECUTE | \
>  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
>  	LANDLOCK_ACCESS_FS_READ_FILE | \
> -	LANDLOCK_ACCESS_FS_TRUNCATE)
> +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> +	LANDLOCK_ACCESS_FS_IOCTL)
>  /* clang-format on */
>  
>  /*
> @@ -158,6 +342,7 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
>  			    const struct path *const path,
>  			    access_mask_t access_rights)
>  {
> +	access_mask_t handled;
>  	int err;
>  	struct landlock_id id = {
>  		.type = LANDLOCK_KEY_INODE,
> @@ -170,9 +355,11 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
>  	if (WARN_ON_ONCE(ruleset->num_layers != 1))
>  		return -EINVAL;
>  
> +	handled = landlock_get_fs_access_mask(ruleset, 0);
> +	/* Expands the synthetic IOCTL groups. */
> +	access_rights |= landlock_expand_access_fs(handled, access_rights);
>  	/* Transforms relative access rights to absolute ones. */
> -	access_rights |= LANDLOCK_MASK_ACCESS_FS &
> -			 ~landlock_get_fs_access_mask(ruleset, 0);
> +	access_rights |= LANDLOCK_MASK_ACCESS_FS & ~handled;
>  	id.key.object = get_inode_object(d_backing_inode(path->dentry));
>  	if (IS_ERR(id.key.object))
>  		return PTR_ERR(id.key.object);
> @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
>  {
>  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
>  	access_mask_t open_access_request, full_access_request, allowed_access;
> -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> +					      LANDLOCK_ACCESS_FS_IOCTL |
> +					      IOCTL_GROUPS;
>  	const struct landlock_ruleset *const dom = get_current_fs_domain();
>  
>  	if (!dom)

We should set optional_access according to the file type before
`full_access_request = open_access_request | optional_access;`

const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);

optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
if (is_device)
    optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;


Because LANDLOCK_ACCESS_FS_IOCTL_DEV is dedicated to character or block
devices, we may want landlock_add_rule() to only allow this access right
to be tied to directories, or character devices, or block devices.  Even
if it would be more consistent with constraints on directory-only access
rights, I'm not sure about that.


> @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
>  		}
>  	}
>  
> +	/*
> +	 * Named pipes should be treated just like anonymous pipes.
> +	 * Therefore, we permit all IOCTLs on them.
> +	 */
> +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> +	}

Instead of this S_ISFIFO check:

if (!is_device)
    allowed_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;

> +
>  	/*
>  	 * For operations on already opened files (i.e. ftruncate()), it is the
>  	 * access rights at the time of open() which decide whether the
> @@ -1406,6 +1605,25 @@ static int hook_file_truncate(struct file *const file)
>  	return -EACCES;
>  }
>  
> +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> +			   unsigned long arg)
> +{
> +	const access_mask_t required_access = get_required_ioctl_access(cmd);

const access_mask_t required_access = LANDLOCK_ACCESS_FS_IOCTL_DEV;


> +	const access_mask_t allowed_access =
> +		landlock_file(file)->allowed_access;
> +
> +	/*
> +	 * It is the access rights at the time of opening the file which
> +	 * determine whether IOCTL can be used on the opened file later.
> +	 *
> +	 * The access right is attached to the opened file in hook_file_open().
> +	 */
> +	if ((allowed_access & required_access) == required_access)
> +		return 0;

We could then check against the do_vfs_ioctl()'s commands, excluding
FIONREAD and file_ioctl()'s commands, to always allow VFS-related
commands:

if (vfs_masked_device_ioctl(cmd))
    return 0;

As a safeguard, we could define vfs_masked_device_ioctl(cmd) in
fs/ioctl.c and make it called by do_vfs_ioctl() as a safeguard to make
sure we keep an accurate list of VFS IOCTL commands (see next RFC patch).

The compat IOCTL hook must also be implemented.

What do you think? Any better idea?


> +
> +	return -EACCES;
> +}
> +
>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
>  
> @@ -1428,6 +1646,7 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
Günther Noack Feb. 19, 2024, 9:44 p.m. UTC | #11
Hello!

On Sun, Feb 18, 2024 at 09:34:39AM +0100, Günther Noack wrote:
> On Fri, Feb 16, 2024 at 04:51:40PM +0100, Mickaël Salaün wrote:
> > On Fri, Feb 16, 2024 at 03:11:18PM +0100, Mickaël Salaün wrote:
> > > What about /proc/*/fd/* ? We can test with open_proc_fd() to make sure
> > > our assumptions are correct.
> > 
> > Actually, these fifo and socket checks (and related optimizations)
> > should already be handled with is_nouser_or_private() called by
> > is_access_to_paths_allowed(). Some new dedicated tests should help
> > though.
> 
> I am generally a bit confused about how opening /proc/*/fd/* works.
> 
> Specifically:
> 
> * Do we have to worry about the scenario where the file_open hook gets
>   called with the same struct file* twice (overwriting the access
>   rights)?
> 
> * I had trouble finding the place in fs/proc/ where the re-opening is
>   implemented.
> 
> Do you happen to understand this in more detail?  At what point do the
> re-opened files start sharing the same kernel objects?  Is that at the
> inode level?

FYI, I figured it out —

 - every call to open(2) results in a new struct file
 - the resulting struct file refers to an existing inode
 - this is not supported for all inode types;
   a rough categorization happens in inode.c:init_special_inode()

The open(2) syscall creates a struct file and populates it
based on the origin fd's underlying inode through the .open function
in file_operations.

The procfs implementation for the lookup is in proc_pid_get_link /
proc_fd_link on the proc side.  It patches up the current task's
nameidata struct as a side effect by calling nd_jump_link().

For reference, I described this it in more detail at
https://blog.gnoack.org/post/proc-fd-is-not-dup/.

—Günther

--
Günther Noack Feb. 28, 2024, 12:57 p.m. UTC | #12
Hello Mickaël!

On Mon, Feb 19, 2024 at 07:34:42PM +0100, Mickaël Salaün wrote:
> Arn, Christian, please take a look at the following RFC patch and the
> rationale explained here.
> 
> On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> > Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
> > and increments the Landlock ABI version to 5.
> > 
> > Like the truncate right, these rights are associated with a file
> > descriptor at the time of open(2), and get respected even when the
> > file descriptor is used outside of the thread which it was originally
> > opened in.
> > 
> > A newly enabled Landlock policy therefore does not apply to file
> > descriptors which are already open.
> > 
> > If the LANDLOCK_ACCESS_FS_IOCTL right is handled, only a small number
> > of safe IOCTL commands will be permitted on newly opened files.  The
> > permitted IOCTLs can be configured through the ruleset in limited ways
> > now.  (See documentation for details.)
> > 
> > Specifically, when LANDLOCK_ACCESS_FS_IOCTL is handled, granting this
> > right on a file or directory will *not* permit to do all IOCTL
> > commands, but only influence the IOCTL commands which are not already
> > handled through other access rights.  The intent is to keep the groups
> > of IOCTL commands more fine-grained.
> > 
> > Noteworthy scenarios which require special attention:
> > 
> > TTY devices are often passed into a process from the parent process,
> > and so a newly enabled Landlock policy does not retroactively apply to
> > them automatically.  In the past, TTY devices have often supported
> > IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> > letting callers control the TTY input buffer (and simulate
> > keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> > modern kernels though.
> > 
> > Some legitimate file system features, like setting up fscrypt, are
> > exposed as IOCTL commands on regular files and directories -- users of
> > Landlock are advised to double check that the sandboxed process does
> > not need to invoke these IOCTLs.
> 
> I think we really need to allow fscrypt and fs-verity IOCTLs.
> 
> > 
> > Known limitations:
> > 
> > The LANDLOCK_ACCESS_FS_IOCTL access right is a coarse-grained control
> > over IOCTL commands.  Future work will enable a more fine-grained
> > access control for IOCTLs.
> > 
> > In the meantime, Landlock users may use path-based restrictions in
> > combination with their knowledge about the file system layout to
> > control what IOCTLs can be done.  Mounting file systems with the nodev
> > option can help to distinguish regular files and devices, and give
> > guarantees about the affected files, which Landlock alone can not give
> > yet.
> 
> I had a second though about our current approach, and it looks like we
> can do simpler, more generic, and with less IOCTL commands specific
> handling.
> 
> What we didn't take into account is that an IOCTL needs an opened file,
> which means that the caller must already have been allowed to open this
> file in read or write mode.
> 
> I think most FS-specific IOCTL commands check access rights (i.e. access
> mode or required capability), other than implicit ones (at least read or
> write), when appropriate.  We don't get such guarantee with device
> drivers.
> 
> The main threat is IOCTLs on character or block devices because their
> impact may be unknown (if we only look at the IOCTL command, not the
> backing file), but we should allow IOCTLs on filesystems (e.g. fscrypt,
> fs-verity, clone extents).  I think we should only implement a
> LANDLOCK_ACCESS_FS_IOCTL_DEV right, which would be more explicit.  This
> change would impact the IOCTLs grouping (not required anymore), but
> we'll still need the list of VFS IOCTLs.


I am fine with dropping the IOCTL grouping and going for this simpler approach.

This must have been a misunderstanding - I thought you wanted to align the
access checks in Landlock with the ones done by the kernel already, so that we
can reason about it more locally.  But I'm fine with doing it just for device
files as well, if that is what it takes.  It's definitely simpler.

Before I jump into the implementation, let me paraphrase your proposal to make
sure I understood it correctly:

 * We *only* introduce the LANDLOCK_ACCESS_FS_IOCTL_DEV right.

 * This access right governs the use of nontrivial IOCTL commands on
   character and block device files.

   * On open()ed files which are not character or block devices,
     all IOCTL commands keep working.

     This includes pipes and sockets, but also a variety of "anonymous" file
     types which are possibly openable through /proc/self/*/fd/*?

 * The trivial IOCTL commands are identified using the proposed function
   vfs_masked_device_ioctl().

   * For these commands, the implementations are in fs/ioctl.c, except for
     FIONREAD, in some cases.  We trust these implementations to check the
     file's type (dir/regular) and access rights (r/w) correctly.


Open questions I have:

* What about files which are neither devices nor regular files or directories?

  The obvious ones which can be open()ed are pipes, where only FIONREAND and two
  harmless-looking watch queue IOCTLs are implemented.

  But then I think that /proc/*/fd/* is a way through which other non-device
  files can become accessible?  What do we do for these?  (I am getting EACCES
  when trying to open some anon_inodes that way... is this something we can
  count on?)

* How did you come up with the list in vfs_masked_device_ioctl()?  I notice that
  some of these are from the switch() statement we had before, but not all of
  them are included.

  I can kind of see that for the fallocate()-like ones and for FIBMAP, because
  these **only** make sense for regular files, and IOCTLs on regular files are
  permitted anyway.

* What do we do for FIONREAD?  Your patch says that it should be forwarded to
  device implementations.  But technically, devices can implement all kinds of
  surprising behaviour for that.

  If you look at the ioctl implementations of different drivers, you can very
  quickly find a surprising amount of things that happen completely independent
  of the IOCTL command.  (Some implementations are acquiring locks and other
  resources before they even check what the cmd value is. - and we would be
  exposing that if we let devices handle FIONREAD).


Please let me know whether I understood you correctly there.

Regarding the implementation notes you left below, I think they mostly derive
from the *_IOCTL_DEV approach in a direct way.


> > +static __attribute_const__ access_mask_t
> > +get_required_ioctl_access(const unsigned int cmd)
> > +{
> > +	switch (cmd) {
> > +	case FIOCLEX:
> > +	case FIONCLEX:
> > +	case FIONBIO:
> > +	case FIOASYNC:
> > +		/*
> > +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> > +		 * close-on-exec and the file's buffered-IO and async flags.
> > +		 * These operations are also available through fcntl(2), and are
> > +		 * unconditionally permitted in Landlock.
> > +		 */
> > +		return 0;
> > +	case FIONREAD:
> > +	case FIOQSIZE:
> > +	case FIGETBSZ:
> > +		/*
> > +		 * FIONREAD returns the number of bytes available for reading.
> > +		 * FIONREAD returns the number of immediately readable bytes for
> > +		 * a file.
> > +		 *
> > +		 * FIOQSIZE queries the size of a file or directory.
> > +		 *
> > +		 * FIGETBSZ queries the file system's block size for a file or
> > +		 * directory.
> > +		 *
> > +		 * These IOCTL commands are permitted for files which are opened
> > +		 * with LANDLOCK_ACCESS_FS_READ_DIR,
> > +		 * LANDLOCK_ACCESS_FS_READ_FILE, or
> > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > +		 */
> 
> Because files or directories can only be opened with
> LANDLOCK_ACCESS_FS_{READ,WRITE}_{FILE,DIR}, and because IOCTLs can only
> be sent on a file descriptor, this means that we can always allow these
> 3 commands (for opened files).
> 
> > +		return LANDLOCK_ACCESS_FS_IOCTL_RW;
> > +	case FS_IOC_FIEMAP:
> > +	case FIBMAP:
> > +		/*
> > +		 * FS_IOC_FIEMAP and FIBMAP query information about the
> > +		 * allocation of blocks within a file.  They are permitted for
> > +		 * files which are opened with LANDLOCK_ACCESS_FS_READ_FILE or
> > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > +		 */
> > +		fallthrough;
> > +	case FIDEDUPERANGE:
> > +	case FICLONE:
> > +	case FICLONERANGE:
> > +		/*
> > +		 * FIDEDUPERANGE, FICLONE and FICLONERANGE make files share
> > +		 * their underlying storage ("reflink") between source and
> > +		 * destination FDs, on file systems which support that.
> > +		 *
> > +		 * The underlying implementations are already checking whether
> > +		 * the involved files are opened with the appropriate read/write
> > +		 * modes.  We rely on this being implemented correctly.
> > +		 *
> > +		 * These IOCTLs are permitted for files which are opened with
> > +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > +		 */
> > +		fallthrough;
> > +	case FS_IOC_RESVSP:
> > +	case FS_IOC_RESVSP64:
> > +	case FS_IOC_UNRESVSP:
> > +	case FS_IOC_UNRESVSP64:
> > +	case FS_IOC_ZERO_RANGE:
> > +		/*
> > +		 * These IOCTLs reserve space, or create holes like
> > +		 * fallocate(2).  We rely on the implementations checking the
> > +		 * files' read/write modes.
> > +		 *
> > +		 * These IOCTLs are permitted for files which are opened with
> > +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > +		 */
> 
> These 10 commands only make sense on directories, so we could also
> always allow them on file descriptors.

I imagine that's a typo?  The commands above do make sense on regular files.


> > +		return LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> > +	default:
> > +		/*
> > +		 * Other commands are guarded by the catch-all access right.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL;
> > +	}
> > +}
> > +
> > +/**
> > + * expand_ioctl() - Return the dst flags from either the src flag or the
> > + * %LANDLOCK_ACCESS_FS_IOCTL flag, depending on whether the
> > + * %LANDLOCK_ACCESS_FS_IOCTL and src access rights are handled or not.
> > + *
> > + * @handled: Handled access rights.
> > + * @access: The access mask to copy values from.
> > + * @src: A single access right to copy from in @access.
> > + * @dst: One or more access rights to copy to.
> > + *
> > + * Returns: @dst, or 0.
> > + */
> > +static __attribute_const__ access_mask_t
> > +expand_ioctl(const access_mask_t handled, const access_mask_t access,
> > +	     const access_mask_t src, const access_mask_t dst)
> > +{
> > +	access_mask_t copy_from;
> > +
> > +	if (!(handled & LANDLOCK_ACCESS_FS_IOCTL))
> > +		return 0;
> > +
> > +	copy_from = (handled & src) ? src : LANDLOCK_ACCESS_FS_IOCTL;
> > +	if (access & copy_from)
> > +		return dst;
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * landlock_expand_access_fs() - Returns @access with the synthetic IOCTL group
> > + * flags enabled if necessary.
> > + *
> > + * @handled: Handled FS access rights.
> > + * @access: FS access rights to expand.
> > + *
> > + * Returns: @access expanded by the necessary flags for the synthetic IOCTL
> > + * access rights.
> > + */
> > +static __attribute_const__ access_mask_t landlock_expand_access_fs(
> > +	const access_mask_t handled, const access_mask_t access)
> > +{
> > +	return access |
> > +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_WRITE_FILE,
> > +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> > +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> > +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_FILE,
> > +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> > +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> > +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_DIR,
> > +			    LANDLOCK_ACCESS_FS_IOCTL_RW);
> > +}
> > +
> > +/**
> > + * landlock_expand_handled_access_fs() - add synthetic IOCTL access rights to an
> > + * access mask of handled accesses.
> > + *
> > + * @handled: The handled accesses of a ruleset that is being created.
> > + *
> > + * Returns: @handled, with the bits for the synthetic IOCTL access rights set,
> > + * if %LANDLOCK_ACCESS_FS_IOCTL is handled.
> > + */
> > +__attribute_const__ access_mask_t
> > +landlock_expand_handled_access_fs(const access_mask_t handled)
> > +{
> > +	return landlock_expand_access_fs(handled, handled);
> > +}
> > +
> >  /* Ruleset management */
> >  
> >  static struct landlock_object *get_inode_object(struct inode *const inode)
> > @@ -148,7 +331,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> >  	LANDLOCK_ACCESS_FS_EXECUTE | \
> >  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
> >  	LANDLOCK_ACCESS_FS_READ_FILE | \
> > -	LANDLOCK_ACCESS_FS_TRUNCATE)
> > +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> > +	LANDLOCK_ACCESS_FS_IOCTL)
> >  /* clang-format on */
> >  
> >  /*
> > @@ -158,6 +342,7 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
> >  			    const struct path *const path,
> >  			    access_mask_t access_rights)
> >  {
> > +	access_mask_t handled;
> >  	int err;
> >  	struct landlock_id id = {
> >  		.type = LANDLOCK_KEY_INODE,
> > @@ -170,9 +355,11 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
> >  	if (WARN_ON_ONCE(ruleset->num_layers != 1))
> >  		return -EINVAL;
> >  
> > +	handled = landlock_get_fs_access_mask(ruleset, 0);
> > +	/* Expands the synthetic IOCTL groups. */
> > +	access_rights |= landlock_expand_access_fs(handled, access_rights);
> >  	/* Transforms relative access rights to absolute ones. */
> > -	access_rights |= LANDLOCK_MASK_ACCESS_FS &
> > -			 ~landlock_get_fs_access_mask(ruleset, 0);
> > +	access_rights |= LANDLOCK_MASK_ACCESS_FS & ~handled;
> >  	id.key.object = get_inode_object(d_backing_inode(path->dentry));
> >  	if (IS_ERR(id.key.object))
> >  		return PTR_ERR(id.key.object);
> > @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
> >  {
> >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> >  	access_mask_t open_access_request, full_access_request, allowed_access;
> > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> > +					      LANDLOCK_ACCESS_FS_IOCTL |
> > +					      IOCTL_GROUPS;
> >  	const struct landlock_ruleset *const dom = get_current_fs_domain();
> >  
> >  	if (!dom)
> 
> We should set optional_access according to the file type before
> `full_access_request = open_access_request | optional_access;`
> 
> const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> 
> optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> if (is_device)
>     optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> 
> Because LANDLOCK_ACCESS_FS_IOCTL_DEV is dedicated to character or block
> devices, we may want landlock_add_rule() to only allow this access right
> to be tied to directories, or character devices, or block devices.  Even
> if it would be more consistent with constraints on directory-only access
> rights, I'm not sure about that.
> 
> 
> > @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
> >  		}
> >  	}
> >  
> > +	/*
> > +	 * Named pipes should be treated just like anonymous pipes.
> > +	 * Therefore, we permit all IOCTLs on them.
> > +	 */
> > +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> > +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> > +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> > +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> > +	}
> 
> Instead of this S_ISFIFO check:
> 
> if (!is_device)
>     allowed_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> > +
> >  	/*
> >  	 * For operations on already opened files (i.e. ftruncate()), it is the
> >  	 * access rights at the time of open() which decide whether the
> > @@ -1406,6 +1605,25 @@ static int hook_file_truncate(struct file *const file)
> >  	return -EACCES;
> >  }
> >  
> > +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> > +			   unsigned long arg)
> > +{
> > +	const access_mask_t required_access = get_required_ioctl_access(cmd);
> 
> const access_mask_t required_access = LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> 
> > +	const access_mask_t allowed_access =
> > +		landlock_file(file)->allowed_access;
> > +
> > +	/*
> > +	 * It is the access rights at the time of opening the file which
> > +	 * determine whether IOCTL can be used on the opened file later.
> > +	 *
> > +	 * The access right is attached to the opened file in hook_file_open().
> > +	 */
> > +	if ((allowed_access & required_access) == required_access)
> > +		return 0;
> 
> We could then check against the do_vfs_ioctl()'s commands, excluding
> FIONREAD and file_ioctl()'s commands, to always allow VFS-related
> commands:
> 
> if (vfs_masked_device_ioctl(cmd))
>     return 0;
> 
> As a safeguard, we could define vfs_masked_device_ioctl(cmd) in
> fs/ioctl.c and make it called by do_vfs_ioctl() as a safeguard to make
> sure we keep an accurate list of VFS IOCTL commands (see next RFC patch).


> The compat IOCTL hook must also be implemented.

Thanks!  I can't believe I missed that one.


> What do you think? Any better idea?

It seems like a reasonable approach.  I'd like to double check with you that we
are on the same page about it before doing the next implementation step.  (These
iterations seems cheaper when we do them in English than when we do them in C.)

Thanks for the review!
—Günther
Mickaël Salaün March 1, 2024, 12:59 p.m. UTC | #13
On Wed, Feb 28, 2024 at 01:57:42PM +0100, Günther Noack wrote:
> Hello Mickaël!
> 
> On Mon, Feb 19, 2024 at 07:34:42PM +0100, Mickaël Salaün wrote:
> > Arn, Christian, please take a look at the following RFC patch and the
> > rationale explained here.
> > 
> > On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> > > Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
> > > and increments the Landlock ABI version to 5.
> > > 
> > > Like the truncate right, these rights are associated with a file
> > > descriptor at the time of open(2), and get respected even when the
> > > file descriptor is used outside of the thread which it was originally
> > > opened in.
> > > 
> > > A newly enabled Landlock policy therefore does not apply to file
> > > descriptors which are already open.
> > > 
> > > If the LANDLOCK_ACCESS_FS_IOCTL right is handled, only a small number
> > > of safe IOCTL commands will be permitted on newly opened files.  The
> > > permitted IOCTLs can be configured through the ruleset in limited ways
> > > now.  (See documentation for details.)
> > > 
> > > Specifically, when LANDLOCK_ACCESS_FS_IOCTL is handled, granting this
> > > right on a file or directory will *not* permit to do all IOCTL
> > > commands, but only influence the IOCTL commands which are not already
> > > handled through other access rights.  The intent is to keep the groups
> > > of IOCTL commands more fine-grained.
> > > 
> > > Noteworthy scenarios which require special attention:
> > > 
> > > TTY devices are often passed into a process from the parent process,
> > > and so a newly enabled Landlock policy does not retroactively apply to
> > > them automatically.  In the past, TTY devices have often supported
> > > IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> > > letting callers control the TTY input buffer (and simulate
> > > keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> > > modern kernels though.
> > > 
> > > Some legitimate file system features, like setting up fscrypt, are
> > > exposed as IOCTL commands on regular files and directories -- users of
> > > Landlock are advised to double check that the sandboxed process does
> > > not need to invoke these IOCTLs.
> > 
> > I think we really need to allow fscrypt and fs-verity IOCTLs.
> > 
> > > 
> > > Known limitations:
> > > 
> > > The LANDLOCK_ACCESS_FS_IOCTL access right is a coarse-grained control
> > > over IOCTL commands.  Future work will enable a more fine-grained
> > > access control for IOCTLs.
> > > 
> > > In the meantime, Landlock users may use path-based restrictions in
> > > combination with their knowledge about the file system layout to
> > > control what IOCTLs can be done.  Mounting file systems with the nodev
> > > option can help to distinguish regular files and devices, and give
> > > guarantees about the affected files, which Landlock alone can not give
> > > yet.
> > 
> > I had a second though about our current approach, and it looks like we
> > can do simpler, more generic, and with less IOCTL commands specific
> > handling.
> > 
> > What we didn't take into account is that an IOCTL needs an opened file,
> > which means that the caller must already have been allowed to open this
> > file in read or write mode.
> > 
> > I think most FS-specific IOCTL commands check access rights (i.e. access
> > mode or required capability), other than implicit ones (at least read or
> > write), when appropriate.  We don't get such guarantee with device
> > drivers.
> > 
> > The main threat is IOCTLs on character or block devices because their
> > impact may be unknown (if we only look at the IOCTL command, not the
> > backing file), but we should allow IOCTLs on filesystems (e.g. fscrypt,
> > fs-verity, clone extents).  I think we should only implement a
> > LANDLOCK_ACCESS_FS_IOCTL_DEV right, which would be more explicit.  This
> > change would impact the IOCTLs grouping (not required anymore), but
> > we'll still need the list of VFS IOCTLs.
> 
> 
> I am fine with dropping the IOCTL grouping and going for this simpler approach.
> 
> This must have been a misunderstanding - I thought you wanted to align the
> access checks in Landlock with the ones done by the kernel already, so that we
> can reason about it more locally.  But I'm fine with doing it just for device
> files as well, if that is what it takes.  It's definitely simpler.

I still think we should align existing Landlock access rights with the VFS IOCTL
semantic (i.e. mostly defined in do_vfs_ioctl(), but also in the compat
ioctl syscall).  However, according to our investigations and
discussions, it looks like the groups we defined should already be
enforced by the VFS code, which means we should not need such groups
after all.  My last proposal is to still delegate access for VFS IOCTLs
to the current Landlock access rights, but it doesn't seem required to
add specific access check if we are able to identify these VFS IOCTLs.

> 
> Before I jump into the implementation, let me paraphrase your proposal to make
> sure I understood it correctly:
> 
>  * We *only* introduce the LANDLOCK_ACCESS_FS_IOCTL_DEV right.

Yes

> 
>  * This access right governs the use of nontrivial IOCTL commands on
>    character and block device files.
> 
>    * On open()ed files which are not character or block devices,
>      all IOCTL commands keep working.

Yes

> 
>      This includes pipes and sockets, but also a variety of "anonymous" file
>      types which are possibly openable through /proc/self/*/fd/*?

Indeed, and we should document that. It should be noted that these
"anonymous" file types only comes from dedicated syscalls (which are not
currently controlled by Landlock) or from this synthetic proc interface.
One thing to keep in mind is that /proc/*/fd/* can only be opened on
tasks under the same sandbox (or a child one), so we should consider
that they are explicitly allowed by the policy the same way
pre-sandboxed inherited file descriptors are.

It might be interesting to list a few of such anonymous file types.  Are
there any that can act on global resources (like block/char devices
can)?

I also think that most anonymous file types should check for FD's read
and write mode when it makes sense (which is not the case for most
block/char IOCTLs), but I might be wrong.

I think this LANDLOCK_ACCESS_FS_IOCTL_DEV design would be good for now,
and probably enough for most use cases.  This would fill a major gap in
an easy-to-understand-and-document way.

> 
>  * The trivial IOCTL commands are identified using the proposed function
>    vfs_masked_device_ioctl().
> 
>    * For these commands, the implementations are in fs/ioctl.c, except for
>      FIONREAD, in some cases.  We trust these implementations to check the
>      file's type (dir/regular) and access rights (r/w) correctly.

FIONREAD is explicitly not part of vfs_masked_device_ioctl() because it
is only defined for regular files (and forwarded to the underlying
implementation otherwise), hence the "masked_device" name. If the
underlying filesystem handles this IOCTL command for directory that's
fine, and we don't need explicit exception.

> 
> 
> Open questions I have:
> 
> * What about files which are neither devices nor regular files or directories?
> 
>   The obvious ones which can be open()ed are pipes, where only FIONREAND and two
>   harmless-looking watch queue IOCTLs are implemented.
> 
>   But then I think that /proc/*/fd/* is a way through which other non-device
>   files can become accessible?  What do we do for these?  (I am getting EACCES
>   when trying to open some anon_inodes that way... is this something we can
>   count on?)

As explained above, /proc/*/fd/* is already restricted per sandbox
scopes, which seem enough.

> 
> * How did you come up with the list in vfs_masked_device_ioctl()?  I notice that
>   some of these are from the switch() statement we had before, but not all of
>   them are included.
> 
>   I can kind of see that for the fallocate()-like ones and for FIBMAP, because
>   these **only** make sense for regular files, and IOCTLs on regular files are
>   permitted anyway.

I took inspiration from get_required_ioctl_access(), and built this list
looking at which of the VFS IOCTLs go through the VFS implementation
(mostly do_vfs_ioctl() but also the compat syscall) for IOCTL requests
on *block and character devices*.

The initial assumption is that file systems cannot implement block nor
character device IOCTLs, which is why this approach seems safe and
consistent.

> 
> * What do we do for FIONREAD?  Your patch says that it should be forwarded to
>   device implementations.  But technically, devices can implement all kinds of
>   surprising behaviour for that.

FIONREAD should always be allowed for non-device files (which means on
allowed-to-be-opened and non-device files), and controlled with
LANDLOCK_ACCESS_FS_IOCTL_DEV for character and block devices.

> 
>   If you look at the ioctl implementations of different drivers, you can very
>   quickly find a surprising amount of things that happen completely independent
>   of the IOCTL command.  (Some implementations are acquiring locks and other
>   resources before they even check what the cmd value is. - and we would be
>   exposing that if we let devices handle FIONREAD).

Correct, which is why FIONREAD on devices should be controlled by
LANDLOCK_ACCESS_FS_IOCTL_DEV.  See my previous email (below) with the
"is_device" checks.

> 
> 
> Please let me know whether I understood you correctly there.

I think so, but I guess you missed the "is_device" part.

> 
> Regarding the implementation notes you left below, I think they mostly derive
> from the *_IOCTL_DEV approach in a direct way.

Yes

> 
> 
> > > +static __attribute_const__ access_mask_t
> > > +get_required_ioctl_access(const unsigned int cmd)
> > > +{
> > > +	switch (cmd) {
> > > +	case FIOCLEX:
> > > +	case FIONCLEX:
> > > +	case FIONBIO:
> > > +	case FIOASYNC:
> > > +		/*
> > > +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> > > +		 * close-on-exec and the file's buffered-IO and async flags.
> > > +		 * These operations are also available through fcntl(2), and are
> > > +		 * unconditionally permitted in Landlock.
> > > +		 */
> > > +		return 0;
> > > +	case FIONREAD:
> > > +	case FIOQSIZE:
> > > +	case FIGETBSZ:
> > > +		/*
> > > +		 * FIONREAD returns the number of bytes available for reading.
> > > +		 * FIONREAD returns the number of immediately readable bytes for
> > > +		 * a file.
> > > +		 *
> > > +		 * FIOQSIZE queries the size of a file or directory.
> > > +		 *
> > > +		 * FIGETBSZ queries the file system's block size for a file or
> > > +		 * directory.
> > > +		 *
> > > +		 * These IOCTL commands are permitted for files which are opened
> > > +		 * with LANDLOCK_ACCESS_FS_READ_DIR,
> > > +		 * LANDLOCK_ACCESS_FS_READ_FILE, or
> > > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > > +		 */
> > 
> > Because files or directories can only be opened with
> > LANDLOCK_ACCESS_FS_{READ,WRITE}_{FILE,DIR}, and because IOCTLs can only
> > be sent on a file descriptor, this means that we can always allow these
> > 3 commands (for opened files).
> > 
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_RW;
> > > +	case FS_IOC_FIEMAP:
> > > +	case FIBMAP:
> > > +		/*
> > > +		 * FS_IOC_FIEMAP and FIBMAP query information about the
> > > +		 * allocation of blocks within a file.  They are permitted for
> > > +		 * files which are opened with LANDLOCK_ACCESS_FS_READ_FILE or
> > > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > > +		 */
> > > +		fallthrough;
> > > +	case FIDEDUPERANGE:
> > > +	case FICLONE:
> > > +	case FICLONERANGE:
> > > +		/*
> > > +		 * FIDEDUPERANGE, FICLONE and FICLONERANGE make files share
> > > +		 * their underlying storage ("reflink") between source and
> > > +		 * destination FDs, on file systems which support that.
> > > +		 *
> > > +		 * The underlying implementations are already checking whether
> > > +		 * the involved files are opened with the appropriate read/write
> > > +		 * modes.  We rely on this being implemented correctly.
> > > +		 *
> > > +		 * These IOCTLs are permitted for files which are opened with
> > > +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> > > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > > +		 */
> > > +		fallthrough;
> > > +	case FS_IOC_RESVSP:
> > > +	case FS_IOC_RESVSP64:
> > > +	case FS_IOC_UNRESVSP:
> > > +	case FS_IOC_UNRESVSP64:
> > > +	case FS_IOC_ZERO_RANGE:
> > > +		/*
> > > +		 * These IOCTLs reserve space, or create holes like
> > > +		 * fallocate(2).  We rely on the implementations checking the
> > > +		 * files' read/write modes.
> > > +		 *
> > > +		 * These IOCTLs are permitted for files which are opened with
> > > +		 * LANDLOCK_ACCESS_FS_READ_FILE or
> > > +		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
> > > +		 */
> > 
> > These 10 commands only make sense on directories, so we could also
> > always allow them on file descriptors.
> 
> I imagine that's a typo?  The commands above do make sense on regular files.

Yes, I meant they "only make sense on regular files".

> 
> 
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> > > +	default:
> > > +		/*
> > > +		 * Other commands are guarded by the catch-all access right.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL;
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * expand_ioctl() - Return the dst flags from either the src flag or the
> > > + * %LANDLOCK_ACCESS_FS_IOCTL flag, depending on whether the
> > > + * %LANDLOCK_ACCESS_FS_IOCTL and src access rights are handled or not.
> > > + *
> > > + * @handled: Handled access rights.
> > > + * @access: The access mask to copy values from.
> > > + * @src: A single access right to copy from in @access.
> > > + * @dst: One or more access rights to copy to.
> > > + *
> > > + * Returns: @dst, or 0.
> > > + */
> > > +static __attribute_const__ access_mask_t
> > > +expand_ioctl(const access_mask_t handled, const access_mask_t access,
> > > +	     const access_mask_t src, const access_mask_t dst)
> > > +{
> > > +	access_mask_t copy_from;
> > > +
> > > +	if (!(handled & LANDLOCK_ACCESS_FS_IOCTL))
> > > +		return 0;
> > > +
> > > +	copy_from = (handled & src) ? src : LANDLOCK_ACCESS_FS_IOCTL;
> > > +	if (access & copy_from)
> > > +		return dst;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * landlock_expand_access_fs() - Returns @access with the synthetic IOCTL group
> > > + * flags enabled if necessary.
> > > + *
> > > + * @handled: Handled FS access rights.
> > > + * @access: FS access rights to expand.
> > > + *
> > > + * Returns: @access expanded by the necessary flags for the synthetic IOCTL
> > > + * access rights.
> > > + */
> > > +static __attribute_const__ access_mask_t landlock_expand_access_fs(
> > > +	const access_mask_t handled, const access_mask_t access)
> > > +{
> > > +	return access |
> > > +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_WRITE_FILE,
> > > +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> > > +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> > > +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_FILE,
> > > +			    LANDLOCK_ACCESS_FS_IOCTL_RW |
> > > +				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
> > > +	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_DIR,
> > > +			    LANDLOCK_ACCESS_FS_IOCTL_RW);
> > > +}
> > > +
> > > +/**
> > > + * landlock_expand_handled_access_fs() - add synthetic IOCTL access rights to an
> > > + * access mask of handled accesses.
> > > + *
> > > + * @handled: The handled accesses of a ruleset that is being created.
> > > + *
> > > + * Returns: @handled, with the bits for the synthetic IOCTL access rights set,
> > > + * if %LANDLOCK_ACCESS_FS_IOCTL is handled.
> > > + */
> > > +__attribute_const__ access_mask_t
> > > +landlock_expand_handled_access_fs(const access_mask_t handled)
> > > +{
> > > +	return landlock_expand_access_fs(handled, handled);
> > > +}
> > > +
> > >  /* Ruleset management */
> > >  
> > >  static struct landlock_object *get_inode_object(struct inode *const inode)
> > > @@ -148,7 +331,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> > >  	LANDLOCK_ACCESS_FS_EXECUTE | \
> > >  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
> > >  	LANDLOCK_ACCESS_FS_READ_FILE | \
> > > -	LANDLOCK_ACCESS_FS_TRUNCATE)
> > > +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> > > +	LANDLOCK_ACCESS_FS_IOCTL)
> > >  /* clang-format on */
> > >  
> > >  /*
> > > @@ -158,6 +342,7 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
> > >  			    const struct path *const path,
> > >  			    access_mask_t access_rights)
> > >  {
> > > +	access_mask_t handled;
> > >  	int err;
> > >  	struct landlock_id id = {
> > >  		.type = LANDLOCK_KEY_INODE,
> > > @@ -170,9 +355,11 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
> > >  	if (WARN_ON_ONCE(ruleset->num_layers != 1))
> > >  		return -EINVAL;
> > >  
> > > +	handled = landlock_get_fs_access_mask(ruleset, 0);
> > > +	/* Expands the synthetic IOCTL groups. */
> > > +	access_rights |= landlock_expand_access_fs(handled, access_rights);
> > >  	/* Transforms relative access rights to absolute ones. */
> > > -	access_rights |= LANDLOCK_MASK_ACCESS_FS &
> > > -			 ~landlock_get_fs_access_mask(ruleset, 0);
> > > +	access_rights |= LANDLOCK_MASK_ACCESS_FS & ~handled;
> > >  	id.key.object = get_inode_object(d_backing_inode(path->dentry));
> > >  	if (IS_ERR(id.key.object))
> > >  		return PTR_ERR(id.key.object);
> > > @@ -1333,7 +1520,9 @@ static int hook_file_open(struct file *const file)
> > >  {
> > >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> > >  	access_mask_t open_access_request, full_access_request, allowed_access;
> > > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > > +	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
> > > +					      LANDLOCK_ACCESS_FS_IOCTL |
> > > +					      IOCTL_GROUPS;
> > >  	const struct landlock_ruleset *const dom = get_current_fs_domain();
> > >  
> > >  	if (!dom)
> > 
> > We should set optional_access according to the file type before
> > `full_access_request = open_access_request | optional_access;`
> > 
> > const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> > 
> > optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > if (is_device)
> >     optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > 
> > 
> > Because LANDLOCK_ACCESS_FS_IOCTL_DEV is dedicated to character or block
> > devices, we may want landlock_add_rule() to only allow this access right
> > to be tied to directories, or character devices, or block devices.  Even
> > if it would be more consistent with constraints on directory-only access
> > rights, I'm not sure about that.
> > 
> > 
> > > @@ -1375,6 +1564,16 @@ static int hook_file_open(struct file *const file)
> > >  		}
> > >  	}
> > >  
> > > +	/*
> > > +	 * Named pipes should be treated just like anonymous pipes.
> > > +	 * Therefore, we permit all IOCTLs on them.
> > > +	 */
> > > +	if (S_ISFIFO(file_inode(file)->i_mode)) {
> > > +		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
> > > +				  LANDLOCK_ACCESS_FS_IOCTL_RW |
> > > +				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
> > > +	}
> > 
> > Instead of this S_ISFIFO check:
> > 
> > if (!is_device)
> >     allowed_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > 
> > > +
> > >  	/*
> > >  	 * For operations on already opened files (i.e. ftruncate()), it is the
> > >  	 * access rights at the time of open() which decide whether the
> > > @@ -1406,6 +1605,25 @@ static int hook_file_truncate(struct file *const file)
> > >  	return -EACCES;
> > >  }
> > >  
> > > +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> > > +			   unsigned long arg)
> > > +{
> > > +	const access_mask_t required_access = get_required_ioctl_access(cmd);
> > 
> > const access_mask_t required_access = LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > 
> > 
> > > +	const access_mask_t allowed_access =
> > > +		landlock_file(file)->allowed_access;
> > > +
> > > +	/*
> > > +	 * It is the access rights at the time of opening the file which
> > > +	 * determine whether IOCTL can be used on the opened file later.
> > > +	 *
> > > +	 * The access right is attached to the opened file in hook_file_open().
> > > +	 */
> > > +	if ((allowed_access & required_access) == required_access)
> > > +		return 0;
> > 
> > We could then check against the do_vfs_ioctl()'s commands, excluding
> > FIONREAD and file_ioctl()'s commands, to always allow VFS-related
> > commands:
> > 
> > if (vfs_masked_device_ioctl(cmd))
> >     return 0;
> > 
> > As a safeguard, we could define vfs_masked_device_ioctl(cmd) in
> > fs/ioctl.c and make it called by do_vfs_ioctl() as a safeguard to make
> > sure we keep an accurate list of VFS IOCTL commands (see next RFC patch).
> 
> 
> > The compat IOCTL hook must also be implemented.
> 
> Thanks!  I can't believe I missed that one.
> 
> 
> > What do you think? Any better idea?
> 
> It seems like a reasonable approach.  I'd like to double check with you that we
> are on the same page about it before doing the next implementation step.  (These
> iterations seems cheaper when we do them in English than when we do them in C.)

We only reached this design because of the previous iterations, reviews
and discussions.  Implementations details matter in this case and it's
good to take time to convince ourselves of the best approach (and to
understand how underlying implementations work).  Finding a "simple"
interface that makes sense to control IOCTLs in an efficient way wasn't
obvious but I'm convinced we got it now.

Thanks for your perseverance!

> 
> Thanks for the review!
> —Günther
>
Mickaël Salaün March 1, 2024, 1:38 p.m. UTC | #14
On Fri, Mar 01, 2024 at 01:59:13PM +0100, Mickaël Salaün wrote:
> On Wed, Feb 28, 2024 at 01:57:42PM +0100, Günther Noack wrote:
> > Hello Mickaël!
> > 
> > On Mon, Feb 19, 2024 at 07:34:42PM +0100, Mickaël Salaün wrote:
> > > Arn, Christian, please take a look at the following RFC patch and the
> > > rationale explained here.
> > > 
> > > On Fri, Feb 09, 2024 at 06:06:05PM +0100, Günther Noack wrote:
> > > > Introduces the LANDLOCK_ACCESS_FS_IOCTL access right
> > > > and increments the Landlock ABI version to 5.
> > > > 
> > > > Like the truncate right, these rights are associated with a file
> > > > descriptor at the time of open(2), and get respected even when the
> > > > file descriptor is used outside of the thread which it was originally
> > > > opened in.
> > > > 
> > > > A newly enabled Landlock policy therefore does not apply to file
> > > > descriptors which are already open.
> > > > 
> > > > If the LANDLOCK_ACCESS_FS_IOCTL right is handled, only a small number
> > > > of safe IOCTL commands will be permitted on newly opened files.  The
> > > > permitted IOCTLs can be configured through the ruleset in limited ways
> > > > now.  (See documentation for details.)
> > > > 
> > > > Specifically, when LANDLOCK_ACCESS_FS_IOCTL is handled, granting this
> > > > right on a file or directory will *not* permit to do all IOCTL
> > > > commands, but only influence the IOCTL commands which are not already
> > > > handled through other access rights.  The intent is to keep the groups
> > > > of IOCTL commands more fine-grained.
> > > > 
> > > > Noteworthy scenarios which require special attention:
> > > > 
> > > > TTY devices are often passed into a process from the parent process,
> > > > and so a newly enabled Landlock policy does not retroactively apply to
> > > > them automatically.  In the past, TTY devices have often supported
> > > > IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> > > > letting callers control the TTY input buffer (and simulate
> > > > keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> > > > modern kernels though.
> > > > 
> > > > Some legitimate file system features, like setting up fscrypt, are
> > > > exposed as IOCTL commands on regular files and directories -- users of
> > > > Landlock are advised to double check that the sandboxed process does
> > > > not need to invoke these IOCTLs.
> > > 
> > > I think we really need to allow fscrypt and fs-verity IOCTLs.
> > > 
> > > > 
> > > > Known limitations:
> > > > 
> > > > The LANDLOCK_ACCESS_FS_IOCTL access right is a coarse-grained control
> > > > over IOCTL commands.  Future work will enable a more fine-grained
> > > > access control for IOCTLs.
> > > > 
> > > > In the meantime, Landlock users may use path-based restrictions in
> > > > combination with their knowledge about the file system layout to
> > > > control what IOCTLs can be done.  Mounting file systems with the nodev
> > > > option can help to distinguish regular files and devices, and give
> > > > guarantees about the affected files, which Landlock alone can not give
> > > > yet.
> > > 
> > > I had a second though about our current approach, and it looks like we
> > > can do simpler, more generic, and with less IOCTL commands specific
> > > handling.
> > > 
> > > What we didn't take into account is that an IOCTL needs an opened file,
> > > which means that the caller must already have been allowed to open this
> > > file in read or write mode.
> > > 
> > > I think most FS-specific IOCTL commands check access rights (i.e. access
> > > mode or required capability), other than implicit ones (at least read or
> > > write), when appropriate.  We don't get such guarantee with device
> > > drivers.
> > > 
> > > The main threat is IOCTLs on character or block devices because their
> > > impact may be unknown (if we only look at the IOCTL command, not the
> > > backing file), but we should allow IOCTLs on filesystems (e.g. fscrypt,
> > > fs-verity, clone extents).  I think we should only implement a
> > > LANDLOCK_ACCESS_FS_IOCTL_DEV right, which would be more explicit.  This
> > > change would impact the IOCTLs grouping (not required anymore), but
> > > we'll still need the list of VFS IOCTLs.
> > 
> > 
> > I am fine with dropping the IOCTL grouping and going for this simpler approach.
> > 
> > This must have been a misunderstanding - I thought you wanted to align the
> > access checks in Landlock with the ones done by the kernel already, so that we
> > can reason about it more locally.  But I'm fine with doing it just for device
> > files as well, if that is what it takes.  It's definitely simpler.
> 
> I still think we should align existing Landlock access rights with the VFS IOCTL
> semantic (i.e. mostly defined in do_vfs_ioctl(), but also in the compat
> ioctl syscall).  However, according to our investigations and
> discussions, it looks like the groups we defined should already be
> enforced by the VFS code, which means we should not need such groups
> after all.  My last proposal is to still delegate access for VFS IOCTLs
> to the current Landlock access rights, but it doesn't seem required to
> add specific access check if we are able to identify these VFS IOCTLs.

To say it another way, at least one of the read/write Landlock rights
are already required to open a file/directory, and according to the new
get_required_ioctl_access() grouping we can simplifying it further to
fully rely on the meta "open" access right, and then replace
get_required_ioctl_access() with the file type and
vfs_masked_device_ioctl() checks.

For now, the only "optional" access right is
LANDLOCK_ACCESS_FS_TRUNCATE, and I don't think it needs to be tied to
any VFS IOCTLs.

Because the IOCTL_DEV access right comes now, future access rights that
may need to also check IOCTL (e.g. change file attribute) should be much
simpler to implement.  Indeed, they will only impact VFS IOCTLs which
would always be allowed with LANDLOCK_ACCESS_FS_IOCTL_DEV.  It should
then be trivial to add a new control layer for a subset of the VFS
IOCTLs.
diff mbox series

Patch

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 25c8d7677539..16d7d72804f8 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -128,7 +128,7 @@  struct landlock_net_port_attr {
  * files and directories.  Files or directories opened before the sandboxing
  * are not subject to these restrictions.
  *
- * A file can only receive these access rights:
+ * The following access rights apply only to files:
  *
  * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
  * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
@@ -138,12 +138,13 @@  struct landlock_net_port_attr {
  * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
  * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
  *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
- *   ``O_TRUNC``. Whether an opened file can be truncated with
- *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
- *   same way as read and write permissions are checked during
- *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
- *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
- *   third version of the Landlock ABI.
+ *   ``O_TRUNC``.  This access right is available since the third version of the
+ *   Landlock ABI.
+ *
+ * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
+ * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
+ * read and write permissions are checked during :manpage:`open(2)` using
+ * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
  *
  * A directory can receive access rights related to files or directories.  The
  * following access right is applied to the directory itself, and the
@@ -198,13 +199,50 @@  struct landlock_net_port_attr {
  *   If multiple requirements are not met, the ``EACCES`` error code takes
  *   precedence over ``EXDEV``.
  *
+ * The following access right applies both to files and directories:
+ *
+ * - %LANDLOCK_ACCESS_FS_IOCTL: Invoke :manpage:`ioctl(2)` commands on an opened
+ *   file or directory.
+ *
+ *   This access right applies to all :manpage:`ioctl(2)` commands, except of
+ *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO`` and ``FIOASYNC``.  These commands
+ *   continue to be invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL
+ *   access right.
+ *
+ *   When certain other access rights are handled in the ruleset, in addition to
+ *   %LANDLOCK_ACCESS_FS_IOCTL, granting these access rights will unlock access
+ *   to additional groups of IOCTL commands, on the affected files:
+ *
+ *   * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE unlock
+ *     access to ``FIOQSIZE``, ``FIONREAD``, ``FIGETBSZ``, ``FS_IOC_FIEMAP``,
+ *     ``FIBMAP``, ``FIDEDUPERANGE``, ``FICLONE``, ``FICLONERANGE``,
+ *     ``FS_IOC_RESVSP``, ``FS_IOC_RESVSP64``, ``FS_IOC_UNRESVSP``,
+ *     ``FS_IOC_UNRESVSP64``, ``FS_IOC_ZERO_RANGE``.
+ *
+ *   * %LANDLOCK_ACCESS_FS_READ_DIR unlocks access to ``FIOQSIZE``,
+ *     ``FIONREAD``, ``FIGETBSZ``.
+ *
+ *   When these access rights are handled in the ruleset, the availability of
+ *   the affected IOCTL commands is not governed by %LANDLOCK_ACCESS_FS_IOCTL
+ *   any more, but by the respective access right.
+ *
+ *   All other IOCTL commands are not handled specially, and are governed by
+ *   %LANDLOCK_ACCESS_FS_IOCTL.  This includes %FS_IOC_GETFLAGS and
+ *   %FS_IOC_SETFLAGS for manipulating inode flags (:manpage:`ioctl_iflags(2)`),
+ *   %FS_IOC_FSFETXATTR and %FS_IOC_FSSETXATTR for manipulating extended
+ *   attributes, as well as %FIFREEZE and %FITHAW for freezing and thawing file
+ *   systems.
+ *
+ *   This access right is available since the fifth version of the Landlock
+ *   ABI.
+ *
  * .. warning::
  *
  *   It is currently not possible to restrict some file-related actions
  *   accessible through these syscall families: :manpage:`chdir(2)`,
  *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
  *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
- *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
+ *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
  *   Future Landlock evolutions will enable to restrict them.
  */
 /* clang-format off */
@@ -223,6 +261,7 @@  struct landlock_net_port_attr {
 #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
 #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
 #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
+#define LANDLOCK_ACCESS_FS_IOCTL			(1ULL << 15)
 /* clang-format on */
 
 /**
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 73997e63734f..84efea3f7c0f 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -7,6 +7,7 @@ 
  * Copyright © 2021-2022 Microsoft Corporation
  */
 
+#include <asm/ioctls.h>
 #include <kunit/test.h>
 #include <linux/atomic.h>
 #include <linux/bitops.h>
@@ -14,6 +15,7 @@ 
 #include <linux/compiler_types.h>
 #include <linux/dcache.h>
 #include <linux/err.h>
+#include <linux/falloc.h>
 #include <linux/fs.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -29,6 +31,7 @@ 
 #include <linux/types.h>
 #include <linux/wait_bit.h>
 #include <linux/workqueue.h>
+#include <uapi/linux/fiemap.h>
 #include <uapi/linux/landlock.h>
 
 #include "common.h"
@@ -84,6 +87,186 @@  static const struct landlock_object_underops landlock_fs_underops = {
 	.release = release_inode
 };
 
+/* IOCTL helpers */
+
+/*
+ * These are synthetic access rights, which are only used within the kernel, but
+ * not exposed to callers in userspace.  The mapping between these access rights
+ * and IOCTL commands is defined in the get_required_ioctl_access() helper function.
+ */
+#define LANDLOCK_ACCESS_FS_IOCTL_RW (LANDLOCK_LAST_PUBLIC_ACCESS_FS << 1)
+#define LANDLOCK_ACCESS_FS_IOCTL_RW_FILE (LANDLOCK_LAST_PUBLIC_ACCESS_FS << 2)
+
+/* ioctl_groups - all synthetic access rights for IOCTL command groups */
+/* clang-format off */
+#define IOCTL_GROUPS (				\
+	LANDLOCK_ACCESS_FS_IOCTL_RW |		\
+	LANDLOCK_ACCESS_FS_IOCTL_RW_FILE)
+/* clang-format on */
+
+static_assert((IOCTL_GROUPS & LANDLOCK_MASK_ACCESS_FS) == IOCTL_GROUPS);
+
+/**
+ * get_required_ioctl_access(): Determine required IOCTL access rights.
+ *
+ * @cmd: The IOCTL command that is supposed to be run.
+ *
+ * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
+ * should be considered for inclusion here.
+ *
+ * Returns: The access rights that must be granted on an opened file in order to
+ * use the given @cmd.
+ */
+static __attribute_const__ access_mask_t
+get_required_ioctl_access(const unsigned int cmd)
+{
+	switch (cmd) {
+	case FIOCLEX:
+	case FIONCLEX:
+	case FIONBIO:
+	case FIOASYNC:
+		/*
+		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
+		 * close-on-exec and the file's buffered-IO and async flags.
+		 * These operations are also available through fcntl(2), and are
+		 * unconditionally permitted in Landlock.
+		 */
+		return 0;
+	case FIONREAD:
+	case FIOQSIZE:
+	case FIGETBSZ:
+		/*
+		 * FIONREAD returns the number of bytes available for reading.
+		 * FIONREAD returns the number of immediately readable bytes for
+		 * a file.
+		 *
+		 * FIOQSIZE queries the size of a file or directory.
+		 *
+		 * FIGETBSZ queries the file system's block size for a file or
+		 * directory.
+		 *
+		 * These IOCTL commands are permitted for files which are opened
+		 * with LANDLOCK_ACCESS_FS_READ_DIR,
+		 * LANDLOCK_ACCESS_FS_READ_FILE, or
+		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_RW;
+	case FS_IOC_FIEMAP:
+	case FIBMAP:
+		/*
+		 * FS_IOC_FIEMAP and FIBMAP query information about the
+		 * allocation of blocks within a file.  They are permitted for
+		 * files which are opened with LANDLOCK_ACCESS_FS_READ_FILE or
+		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
+		 */
+		fallthrough;
+	case FIDEDUPERANGE:
+	case FICLONE:
+	case FICLONERANGE:
+		/*
+		 * FIDEDUPERANGE, FICLONE and FICLONERANGE make files share
+		 * their underlying storage ("reflink") between source and
+		 * destination FDs, on file systems which support that.
+		 *
+		 * The underlying implementations are already checking whether
+		 * the involved files are opened with the appropriate read/write
+		 * modes.  We rely on this being implemented correctly.
+		 *
+		 * These IOCTLs are permitted for files which are opened with
+		 * LANDLOCK_ACCESS_FS_READ_FILE or
+		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
+		 */
+		fallthrough;
+	case FS_IOC_RESVSP:
+	case FS_IOC_RESVSP64:
+	case FS_IOC_UNRESVSP:
+	case FS_IOC_UNRESVSP64:
+	case FS_IOC_ZERO_RANGE:
+		/*
+		 * These IOCTLs reserve space, or create holes like
+		 * fallocate(2).  We rely on the implementations checking the
+		 * files' read/write modes.
+		 *
+		 * These IOCTLs are permitted for files which are opened with
+		 * LANDLOCK_ACCESS_FS_READ_FILE or
+		 * LANDLOCK_ACCESS_FS_WRITE_FILE.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
+	default:
+		/*
+		 * Other commands are guarded by the catch-all access right.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL;
+	}
+}
+
+/**
+ * expand_ioctl() - Return the dst flags from either the src flag or the
+ * %LANDLOCK_ACCESS_FS_IOCTL flag, depending on whether the
+ * %LANDLOCK_ACCESS_FS_IOCTL and src access rights are handled or not.
+ *
+ * @handled: Handled access rights.
+ * @access: The access mask to copy values from.
+ * @src: A single access right to copy from in @access.
+ * @dst: One or more access rights to copy to.
+ *
+ * Returns: @dst, or 0.
+ */
+static __attribute_const__ access_mask_t
+expand_ioctl(const access_mask_t handled, const access_mask_t access,
+	     const access_mask_t src, const access_mask_t dst)
+{
+	access_mask_t copy_from;
+
+	if (!(handled & LANDLOCK_ACCESS_FS_IOCTL))
+		return 0;
+
+	copy_from = (handled & src) ? src : LANDLOCK_ACCESS_FS_IOCTL;
+	if (access & copy_from)
+		return dst;
+
+	return 0;
+}
+
+/**
+ * landlock_expand_access_fs() - Returns @access with the synthetic IOCTL group
+ * flags enabled if necessary.
+ *
+ * @handled: Handled FS access rights.
+ * @access: FS access rights to expand.
+ *
+ * Returns: @access expanded by the necessary flags for the synthetic IOCTL
+ * access rights.
+ */
+static __attribute_const__ access_mask_t landlock_expand_access_fs(
+	const access_mask_t handled, const access_mask_t access)
+{
+	return access |
+	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_WRITE_FILE,
+			    LANDLOCK_ACCESS_FS_IOCTL_RW |
+				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
+	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_FILE,
+			    LANDLOCK_ACCESS_FS_IOCTL_RW |
+				    LANDLOCK_ACCESS_FS_IOCTL_RW_FILE) |
+	       expand_ioctl(handled, access, LANDLOCK_ACCESS_FS_READ_DIR,
+			    LANDLOCK_ACCESS_FS_IOCTL_RW);
+}
+
+/**
+ * landlock_expand_handled_access_fs() - add synthetic IOCTL access rights to an
+ * access mask of handled accesses.
+ *
+ * @handled: The handled accesses of a ruleset that is being created.
+ *
+ * Returns: @handled, with the bits for the synthetic IOCTL access rights set,
+ * if %LANDLOCK_ACCESS_FS_IOCTL is handled.
+ */
+__attribute_const__ access_mask_t
+landlock_expand_handled_access_fs(const access_mask_t handled)
+{
+	return landlock_expand_access_fs(handled, handled);
+}
+
 /* Ruleset management */
 
 static struct landlock_object *get_inode_object(struct inode *const inode)
@@ -148,7 +331,8 @@  static struct landlock_object *get_inode_object(struct inode *const inode)
 	LANDLOCK_ACCESS_FS_EXECUTE | \
 	LANDLOCK_ACCESS_FS_WRITE_FILE | \
 	LANDLOCK_ACCESS_FS_READ_FILE | \
-	LANDLOCK_ACCESS_FS_TRUNCATE)
+	LANDLOCK_ACCESS_FS_TRUNCATE | \
+	LANDLOCK_ACCESS_FS_IOCTL)
 /* clang-format on */
 
 /*
@@ -158,6 +342,7 @@  int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
 			    const struct path *const path,
 			    access_mask_t access_rights)
 {
+	access_mask_t handled;
 	int err;
 	struct landlock_id id = {
 		.type = LANDLOCK_KEY_INODE,
@@ -170,9 +355,11 @@  int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
 	if (WARN_ON_ONCE(ruleset->num_layers != 1))
 		return -EINVAL;
 
+	handled = landlock_get_fs_access_mask(ruleset, 0);
+	/* Expands the synthetic IOCTL groups. */
+	access_rights |= landlock_expand_access_fs(handled, access_rights);
 	/* Transforms relative access rights to absolute ones. */
-	access_rights |= LANDLOCK_MASK_ACCESS_FS &
-			 ~landlock_get_fs_access_mask(ruleset, 0);
+	access_rights |= LANDLOCK_MASK_ACCESS_FS & ~handled;
 	id.key.object = get_inode_object(d_backing_inode(path->dentry));
 	if (IS_ERR(id.key.object))
 		return PTR_ERR(id.key.object);
@@ -1333,7 +1520,9 @@  static int hook_file_open(struct file *const file)
 {
 	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
 	access_mask_t open_access_request, full_access_request, allowed_access;
-	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
+	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE |
+					      LANDLOCK_ACCESS_FS_IOCTL |
+					      IOCTL_GROUPS;
 	const struct landlock_ruleset *const dom = get_current_fs_domain();
 
 	if (!dom)
@@ -1375,6 +1564,16 @@  static int hook_file_open(struct file *const file)
 		}
 	}
 
+	/*
+	 * Named pipes should be treated just like anonymous pipes.
+	 * Therefore, we permit all IOCTLs on them.
+	 */
+	if (S_ISFIFO(file_inode(file)->i_mode)) {
+		allowed_access |= LANDLOCK_ACCESS_FS_IOCTL |
+				  LANDLOCK_ACCESS_FS_IOCTL_RW |
+				  LANDLOCK_ACCESS_FS_IOCTL_RW_FILE;
+	}
+
 	/*
 	 * For operations on already opened files (i.e. ftruncate()), it is the
 	 * access rights at the time of open() which decide whether the
@@ -1406,6 +1605,25 @@  static int hook_file_truncate(struct file *const file)
 	return -EACCES;
 }
 
+static int hook_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	const access_mask_t required_access = get_required_ioctl_access(cmd);
+	const access_mask_t allowed_access =
+		landlock_file(file)->allowed_access;
+
+	/*
+	 * It is the access rights at the time of opening the file which
+	 * determine whether IOCTL can be used on the opened file later.
+	 *
+	 * The access right is attached to the opened file in hook_file_open().
+	 */
+	if ((allowed_access & required_access) == required_access)
+		return 0;
+
+	return -EACCES;
+}
+
 static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
 
@@ -1428,6 +1646,7 @@  static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
 	LSM_HOOK_INIT(file_open, hook_file_open),
 	LSM_HOOK_INIT(file_truncate, hook_file_truncate),
+	LSM_HOOK_INIT(file_ioctl, hook_file_ioctl),
 };
 
 __init void landlock_add_fs_hooks(void)
diff --git a/security/landlock/fs.h b/security/landlock/fs.h
index 488e4813680a..086576b8386b 100644
--- a/security/landlock/fs.h
+++ b/security/landlock/fs.h
@@ -92,4 +92,7 @@  int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
 			    const struct path *const path,
 			    access_mask_t access_hierarchy);
 
+__attribute_const__ access_mask_t
+landlock_expand_handled_access_fs(const access_mask_t handled);
+
 #endif /* _SECURITY_LANDLOCK_FS_H */
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index 93c9c6f91556..ecbdc8bbf906 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -18,7 +18,16 @@ 
 #define LANDLOCK_MAX_NUM_LAYERS		16
 #define LANDLOCK_MAX_NUM_RULES		U32_MAX
 
-#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_TRUNCATE
+/*
+ * For file system access rights, Landlock distinguishes between the publicly
+ * visible access rights (1 to LANDLOCK_LAST_PUBLIC_ACCESS_FS) and the private
+ * ones which are not exposed to userspace (LANDLOCK_LAST_PUBLIC_ACCESS_FS + 1
+ * to LANDLOCK_LAST_ACCESS_FS).  The private access rights are defined in fs.c.
+ */
+#define LANDLOCK_LAST_PUBLIC_ACCESS_FS	LANDLOCK_ACCESS_FS_IOCTL
+#define LANDLOCK_MASK_PUBLIC_ACCESS_FS	((LANDLOCK_LAST_PUBLIC_ACCESS_FS << 1) - 1)
+
+#define LANDLOCK_LAST_ACCESS_FS		(LANDLOCK_LAST_PUBLIC_ACCESS_FS << 2)
 #define LANDLOCK_MASK_ACCESS_FS		((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
 #define LANDLOCK_NUM_ACCESS_FS		__const_hweight64(LANDLOCK_MASK_ACCESS_FS)
 #define LANDLOCK_SHIFT_ACCESS_FS	0
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index c7f1526784fd..5a28ea8e1c3d 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -30,7 +30,7 @@ 
 	LANDLOCK_ACCESS_FS_REFER)
 /* clang-format on */
 
-typedef u16 access_mask_t;
+typedef u32 access_mask_t;
 /* Makes sure all filesystem access rights can be stored. */
 static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
 /* Makes sure all network access rights can be stored. */
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 898358f57fa0..f0bc50003b46 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -137,7 +137,7 @@  static const struct file_operations ruleset_fops = {
 	.write = fop_dummy_write,
 };
 
-#define LANDLOCK_ABI_VERSION 4
+#define LANDLOCK_ABI_VERSION 5
 
 /**
  * sys_landlock_create_ruleset - Create a new ruleset
@@ -192,8 +192,8 @@  SYSCALL_DEFINE3(landlock_create_ruleset,
 		return err;
 
 	/* Checks content (and 32-bits cast). */
-	if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_ACCESS_FS) !=
-	    LANDLOCK_MASK_ACCESS_FS)
+	if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_PUBLIC_ACCESS_FS) !=
+	    LANDLOCK_MASK_PUBLIC_ACCESS_FS)
 		return -EINVAL;
 
 	/* Checks network content (and 32-bits cast). */
@@ -201,6 +201,10 @@  SYSCALL_DEFINE3(landlock_create_ruleset,
 	    LANDLOCK_MASK_ACCESS_NET)
 		return -EINVAL;
 
+	/* Expands synthetic IOCTL groups. */
+	ruleset_attr.handled_access_fs = landlock_expand_handled_access_fs(
+		ruleset_attr.handled_access_fs);
+
 	/* Checks arguments and transforms to kernel struct. */
 	ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
 					  ruleset_attr.handled_access_net);
@@ -309,8 +313,13 @@  static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
 	if (!path_beneath_attr.allowed_access)
 		return -ENOMSG;
 
-	/* Checks that allowed_access matches the @ruleset constraints. */
-	mask = landlock_get_raw_fs_access_mask(ruleset, 0);
+	/*
+	 * Checks that allowed_access matches the @ruleset constraints and only
+	 * consists of publicly visible access rights (as opposed to synthetic
+	 * ones).
+	 */
+	mask = landlock_get_raw_fs_access_mask(ruleset, 0) &
+	       LANDLOCK_MASK_PUBLIC_ACCESS_FS;
 	if ((path_beneath_attr.allowed_access | mask) != mask)
 		return -EINVAL;
 
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 646f778dfb1e..d292b419ccba 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -75,7 +75,7 @@  TEST(abi_version)
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	ASSERT_EQ(4, landlock_create_ruleset(NULL, 0,
+	ASSERT_EQ(5, landlock_create_ruleset(NULL, 0,
 					     LANDLOCK_CREATE_RULESET_VERSION));
 
 	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 2d6d9b43d958..3203f4a5bc85 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -527,9 +527,10 @@  TEST_F_FORK(layout1, inval)
 	LANDLOCK_ACCESS_FS_EXECUTE | \
 	LANDLOCK_ACCESS_FS_WRITE_FILE | \
 	LANDLOCK_ACCESS_FS_READ_FILE | \
-	LANDLOCK_ACCESS_FS_TRUNCATE)
+	LANDLOCK_ACCESS_FS_TRUNCATE | \
+	LANDLOCK_ACCESS_FS_IOCTL)
 
-#define ACCESS_LAST LANDLOCK_ACCESS_FS_TRUNCATE
+#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL
 
 #define ACCESS_ALL ( \
 	ACCESS_FILE | \