diff mbox series

[v13,01/10] landlock: Add IOCTL access right for character and block devices

Message ID 20240327131040.158777-2-gnoack@google.com (mailing list archive)
State Handled Elsewhere
Headers show
Series Landlock: IOCTL support | expand

Commit Message

Günther Noack March 27, 2024, 1:10 p.m. UTC
Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
and increments the Landlock ABI version to 5.

This access right applies to device-custom IOCTL commands
when they are invoked on block or character device files.

Like the truncate right, this right is associated with a file
descriptor at the time of open(2), and gets respected even when the
file descriptor is used outside of the thread which it was originally
opened in.

Therefore, a newly enabled Landlock policy does not apply to file
descriptors which are already open.

If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
number of safe IOCTL commands will be permitted on newly opened device
files.  These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
as other IOCTL commands for regular files which are implemented in
fs/ioctl.c.

Noteworthy scenarios which require special attention:

TTY devices are often passed into a process from the parent process,
and so a newly enabled Landlock policy does not retroactively apply to
them automatically.  In the past, TTY devices have often supported
IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
letting callers control the TTY input buffer (and simulate
keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
modern kernels though.

Known limitations:

The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
control over IOCTL commands.

Landlock users may use path-based restrictions in combination with
their knowledge about the file system layout to control what IOCTLs
can be done.

Cc: Paul Moore <paul@paul-moore.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Günther Noack <gnoack@google.com>
---
 include/uapi/linux/landlock.h                |  33 +++-
 security/landlock/fs.c                       | 183 ++++++++++++++++++-
 security/landlock/limits.h                   |   2 +-
 security/landlock/syscalls.c                 |   8 +-
 tools/testing/selftests/landlock/base_test.c |   2 +-
 tools/testing/selftests/landlock/fs_test.c   |   5 +-
 6 files changed, 216 insertions(+), 17 deletions(-)

Comments

Mickaël Salaün March 27, 2024, 4:57 p.m. UTC | #1
On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
> and increments the Landlock ABI version to 5.
> 
> This access right applies to device-custom IOCTL commands
> when they are invoked on block or character device files.
> 
> Like the truncate right, this right is associated with a file
> descriptor at the time of open(2), and gets respected even when the
> file descriptor is used outside of the thread which it was originally
> opened in.
> 
> Therefore, a newly enabled Landlock policy does not apply to file
> descriptors which are already open.
> 
> If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
> number of safe IOCTL commands will be permitted on newly opened device
> files.  These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
> as other IOCTL commands for regular files which are implemented in
> fs/ioctl.c.
> 
> Noteworthy scenarios which require special attention:
> 
> TTY devices are often passed into a process from the parent process,
> and so a newly enabled Landlock policy does not retroactively apply to
> them automatically.  In the past, TTY devices have often supported
> IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> letting callers control the TTY input buffer (and simulate
> keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> modern kernels though.
> 
> Known limitations:
> 
> The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
> control over IOCTL commands.
> 
> Landlock users may use path-based restrictions in combination with
> their knowledge about the file system layout to control what IOCTLs
> can be done.
> 
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Günther Noack <gnoack@google.com>
> ---
>  include/uapi/linux/landlock.h                |  33 +++-
>  security/landlock/fs.c                       | 183 ++++++++++++++++++-
>  security/landlock/limits.h                   |   2 +-
>  security/landlock/syscalls.c                 |   8 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   5 +-
>  6 files changed, 216 insertions(+), 17 deletions(-)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 25c8d7677539..5d90e9799eb5 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -128,7 +128,7 @@ struct landlock_net_port_attr {
>   * files and directories.  Files or directories opened before the sandboxing
>   * are not subject to these restrictions.
>   *
> - * A file can only receive these access rights:
> + * The following access rights apply only to files:
>   *
>   * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
>   * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
> @@ -138,12 +138,13 @@ struct landlock_net_port_attr {
>   * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
>   * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
>   *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
> - *   ``O_TRUNC``. Whether an opened file can be truncated with
> - *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
> - *   same way as read and write permissions are checked during
> - *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
> - *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
> - *   third version of the Landlock ABI.
> + *   ``O_TRUNC``.  This access right is available since the third version of the
> + *   Landlock ABI.
> + *
> + * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> + * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> + * read and write permissions are checked during :manpage:`open(2)` using
> + * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
>   *
>   * A directory can receive access rights related to files or directories.  The
>   * following access right is applied to the directory itself, and the
> @@ -198,13 +199,28 @@ struct landlock_net_port_attr {
>   *   If multiple requirements are not met, the ``EACCES`` error code takes
>   *   precedence over ``EXDEV``.
>   *
> + * The following access right applies both to files and directories:
> + *
> + * - %LANDLOCK_ACCESS_FS_IOCTL_DEV: Invoke :manpage:`ioctl(2)` commands on an opened
> + *   character or block device.
> + *
> + *   This access right applies to all `ioctl(2)` commands implemented by device

:manpage:`ioctl(2)`

> + *   drivers.  However, the following common IOCTL commands continue to be
> + *   invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL_DEV right:

This is good but explaining the rationale could help, something like
this (taking care of not packing lines listing commands to ease review
when a new command will be added):

IOCTL commands targetting file descriptors (``FIOCLEX``, ``FIONCLEX``),
file descriptions (``FIONBIO``, ``FIOASYNC``),
file systems (``FIOQSIZE``, ``FS_IOC_FIEMAP``, ``FICLONE``,
``FICLONERAN``, ``FIDEDUPERANGE``, ``FS_IOC_GETFLAGS``,
``FS_IOC_SETFLAGS``, ``FS_IOC_FSGETXATTR``, ``FS_IOC_FSSETXATTR``),
or superblocks (``FIFREEZE``, ``FITHAW``, ``FIGETBSZ``,
``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``)
are never denied.  However, such IOCTL commands still require an opened
file and may not be available on any file type.  Read or write
permission may be checked by the underlying implementation, as well as
capabilities.

> + *
> + *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO``, ``FIOASYNC``, ``FIFREEZE``,
> + *   ``FITHAW``, ``FIGETBSZ``, ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``
> + *
> + *   This access right is available since the fifth version of the Landlock
> + *   ABI.
> + *
>   * .. warning::
>   *
>   *   It is currently not possible to restrict some file-related actions
>   *   accessible through these syscall families: :manpage:`chdir(2)`,
>   *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
>   *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
> - *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
> + *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
>   *   Future Landlock evolutions will enable to restrict them.
>   */
>  /* clang-format off */
> @@ -223,6 +239,7 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
>  #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
>  #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
> +#define LANDLOCK_ACCESS_FS_IOCTL_DEV			(1ULL << 15)
>  /* clang-format on */
>  
>  /**
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index c15559432d3d..2ef6c57fa20b 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -7,6 +7,7 @@
>   * Copyright © 2021-2022 Microsoft Corporation
>   */
>  
> +#include <asm/ioctls.h>
>  #include <kunit/test.h>
>  #include <linux/atomic.h>
>  #include <linux/bitops.h>
> @@ -14,6 +15,7 @@
>  #include <linux/compiler_types.h>
>  #include <linux/dcache.h>
>  #include <linux/err.h>
> +#include <linux/falloc.h>
>  #include <linux/fs.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
> @@ -29,6 +31,7 @@
>  #include <linux/types.h>
>  #include <linux/wait_bit.h>
>  #include <linux/workqueue.h>
> +#include <uapi/linux/fiemap.h>
>  #include <uapi/linux/landlock.h>
>  
>  #include "common.h"
> @@ -84,6 +87,141 @@ static const struct landlock_object_underops landlock_fs_underops = {
>  	.release = release_inode
>  };
>  
> +/* IOCTL helpers */
> +
> +/**
> + * get_required_ioctl_dev_access(): Determine required access rights for IOCTLs
> + * on device files.
> + *
> + * @cmd: The IOCTL command that is supposed to be run.
> + *
> + * By default, any IOCTL on a device file requires the
> + * LANDLOCK_ACCESS_FS_IOCTL_DEV right.  We make exceptions for commands, if:
> + *
> + * 1. The command is implemented in fs/ioctl.c's do_vfs_ioctl(),
> + *    not in f_ops->unlocked_ioctl() or f_ops->compat_ioctl().
> + *
> + * 2. The command can be reasonably used on a device file at all.
> + *
> + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> + * should be considered for inclusion here.
> + *
> + * Returns: The access rights that must be granted on an opened file in order to
> + * use the given @cmd.
> + */
> +static __attribute_const__ access_mask_t
> +get_required_ioctl_dev_access(const unsigned int cmd)
> +{
> +	switch (cmd) {
> +	case FIOCLEX:
> +	case FIONCLEX:
> +	case FIONBIO:
> +	case FIOASYNC:
> +		/*
> +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> +		 * close-on-exec and the file's buffered-IO and async flags.
> +		 * These operations are also available through fcntl(2), and are
> +		 * unconditionally permitted in Landlock.
> +		 */
> +		return 0;
> +	case FIOQSIZE:
> +		/*
> +		 * FIOQSIZE queries the size of a regular file or directory.
> +		 *
> +		 * This IOCTL command only applies to regular files and
> +		 * directories.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;

This should always be allowed because do_vfs_ioctl() never returns
-ENOIOCTLCMD for this command.  That's why I wrote
vfs_masked_device_ioctl() this way [1].  I think it would be easier to
read and maintain this code with a is_masked_device_ioctl() logic.  Listing
commands that are not masked makes it difficult to review because
allowed and denied return codes are interleaved.

[1] https://lore.kernel.org/r/20240219183539.2926165-1-mic@digikod.net

Your IOCTL command explanation comments are nice and they should be kept
in is_masked_device_ioctl() (if they mask device IOCTL commands).

> +	case FIFREEZE:
> +	case FITHAW:
> +		/*
> +		 * FIFREEZE and FITHAW freeze and thaw the file system which the
> +		 * given file belongs to.  Requires CAP_SYS_ADMIN.
> +		 *
> +		 * These commands operate on the file system's superblock rather
> +		 * than on the file itself.  The same operations can also be
> +		 * done through any other file or directory on the same file
> +		 * system, so it is safe to permit these.
> +		 */
> +		return 0;
> +	case FS_IOC_FIEMAP:
> +		/*
> +		 * FS_IOC_FIEMAP queries information about the allocation of
> +		 * blocks within a file.
> +		 *
> +		 * This IOCTL command only applies to regular files.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;

Same here.

> +	case FIGETBSZ:
> +		/*
> +		 * FIGETBSZ queries the file system's block size for a file or
> +		 * directory.
> +		 *
> +		 * This command operates on the file system's superblock rather
> +		 * than on the file itself.  The same operation can also be done
> +		 * through any other file or directory on the same file system,
> +		 * so it is safe to permit it.
> +		 */
> +		return 0;
> +	case FICLONE:
> +	case FICLONERANGE:
> +	case FIDEDUPERANGE:
> +		/*
> +		 * FICLONE, FICLONERANGE and FIDEDUPERANGE make files share
> +		 * their underlying storage ("reflink") between source and
> +		 * destination FDs, on file systems which support that.
> +		 *
> +		 * These IOCTL commands only apply to regular files.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;

ditto

> +	case FIONREAD:
> +		/*
> +		 * FIONREAD returns the number of bytes available for reading.
> +		 *
> +		 * We require LANDLOCK_ACCESS_FS_IOCTL_DEV for FIONREAD, because
> +		 * devices implement it in f_ops->unlocked_ioctl().  The
> +		 * implementations of this operation have varying quality and
> +		 * complexity, so it is hard to reason about what they do.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> +	case FS_IOC_GETFLAGS:
> +	case FS_IOC_SETFLAGS:
> +	case FS_IOC_FSGETXATTR:
> +	case FS_IOC_FSSETXATTR:
> +		/*
> +		 * FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IOC_FSGETXATTR and
> +		 * FS_IOC_FSSETXATTR do not apply for devices.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> +	case FS_IOC_GETFSUUID:
> +	case FS_IOC_GETFSSYSFSPATH:
> +		/*
> +		 * FS_IOC_GETFSUUID and FS_IOC_GETFSSYSFSPATH both operate on
> +		 * the file system superblock, not on the specific file, so
> +		 * these operations are available through any other file on the
> +		 * same file system as well.
> +		 */
> +		return 0;
> +	case FIBMAP:
> +	case FS_IOC_RESVSP:
> +	case FS_IOC_RESVSP64:
> +	case FS_IOC_UNRESVSP:
> +	case FS_IOC_UNRESVSP64:
> +	case FS_IOC_ZERO_RANGE:
> +		/*
> +		 * FIBMAP, FS_IOC_RESVSP, FS_IOC_RESVSP64, FS_IOC_UNRESVSP,
> +		 * FS_IOC_UNRESVSP64 and FS_IOC_ZERO_RANGE only apply to regular
> +		 * files (as implemented in file_ioctl()).
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> +	default:
> +		/*
> +		 * Other commands are guarded by the catch-all access right.
> +		 */
> +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> +	}
> +}
> +
>  /* Ruleset management */
>  
>  static struct landlock_object *get_inode_object(struct inode *const inode)
> @@ -148,7 +286,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
>  	LANDLOCK_ACCESS_FS_EXECUTE | \
>  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
>  	LANDLOCK_ACCESS_FS_READ_FILE | \
> -	LANDLOCK_ACCESS_FS_TRUNCATE)
> +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> +	LANDLOCK_ACCESS_FS_IOCTL_DEV)
>  /* clang-format on */
>  
>  /*
> @@ -1335,8 +1474,10 @@ static int hook_file_alloc_security(struct file *const file)
>  static int hook_file_open(struct file *const file)
>  {
>  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> -	access_mask_t open_access_request, full_access_request, allowed_access;
> -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> +	access_mask_t open_access_request, full_access_request, allowed_access,
> +		optional_access;
> +	const struct inode *inode = file_inode(file);
> +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
>  	const struct landlock_ruleset *const dom =
>  		get_fs_domain(landlock_cred(file->f_cred)->domain);
>  
> @@ -1354,6 +1495,10 @@ static int hook_file_open(struct file *const file)
>  	 * We look up more access than what we immediately need for open(), so
>  	 * that we can later authorize operations on opened files.
>  	 */
> +	optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> +	if (is_device)
> +		optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> +
>  	full_access_request = open_access_request | optional_access;
>  
>  	if (is_access_to_paths_allowed(
> @@ -1410,6 +1555,36 @@ static int hook_file_truncate(struct file *const file)
>  	return -EACCES;
>  }
>  
> +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> +			   unsigned long arg)
> +{
> +	const struct inode *inode = file_inode(file);
> +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> +	access_mask_t required_access, allowed_access;

As explained in [2], I'd like not-sandboxed tasks to not have visible
performance impact because of Landlock:

  We should first check landlock_file(file)->allowed_access as in
  hook_file_truncate() to return as soon as possible for non-sandboxed
  tasks.  Any other computation should be done after that (e.g. with an
  is_device() helper).

[2] https://lore.kernel.org/r/20240311.If7ieshaegu2@digikod.net

This is_device(file) helper should also replace other is_device variables.


> +
> +	if (!is_device)
> +		return 0;
> +
> +	/*
> +	 * It is the access rights at the time of opening the file which
> +	 * determine whether IOCTL can be used on the opened file later.
> +	 *
> +	 * The access right is attached to the opened file in hook_file_open().
> +	 */
> +	required_access = get_required_ioctl_dev_access(cmd);
> +	allowed_access = landlock_file(file)->allowed_access;
> +	if ((allowed_access & required_access) == required_access)
> +		return 0;
> +
> +	return -EACCES;
> +}
> +
> +static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
> +				  unsigned long arg)
> +{
> +	return hook_file_ioctl(file, cmd, arg);

The compat-specific IOCTL commands are missing (e.g. FS_IOC_RESVSP_32).
Relying on is_masked_device_ioctl() should make this call OK though.

> +}
> +
>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
>
Mickaël Salaün March 28, 2024, 12:01 p.m. UTC | #2
On Wed, Mar 27, 2024 at 05:57:35PM +0100, Mickaël Salaün wrote:
> On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> > Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
> > and increments the Landlock ABI version to 5.
> > 
> > This access right applies to device-custom IOCTL commands
> > when they are invoked on block or character device files.
> > 
> > Like the truncate right, this right is associated with a file
> > descriptor at the time of open(2), and gets respected even when the
> > file descriptor is used outside of the thread which it was originally
> > opened in.
> > 
> > Therefore, a newly enabled Landlock policy does not apply to file
> > descriptors which are already open.
> > 
> > If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
> > number of safe IOCTL commands will be permitted on newly opened device
> > files.  These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
> > as other IOCTL commands for regular files which are implemented in
> > fs/ioctl.c.
> > 
> > Noteworthy scenarios which require special attention:
> > 
> > TTY devices are often passed into a process from the parent process,
> > and so a newly enabled Landlock policy does not retroactively apply to
> > them automatically.  In the past, TTY devices have often supported
> > IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> > letting callers control the TTY input buffer (and simulate
> > keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> > modern kernels though.
> > 
> > Known limitations:
> > 
> > The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
> > control over IOCTL commands.
> > 
> > Landlock users may use path-based restrictions in combination with
> > their knowledge about the file system layout to control what IOCTLs
> > can be done.
> > 
> > Cc: Paul Moore <paul@paul-moore.com>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Günther Noack <gnoack@google.com>
> > ---
> >  include/uapi/linux/landlock.h                |  33 +++-
> >  security/landlock/fs.c                       | 183 ++++++++++++++++++-
> >  security/landlock/limits.h                   |   2 +-
> >  security/landlock/syscalls.c                 |   8 +-
> >  tools/testing/selftests/landlock/base_test.c |   2 +-
> >  tools/testing/selftests/landlock/fs_test.c   |   5 +-
> >  6 files changed, 216 insertions(+), 17 deletions(-)
> > 
> > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > index 25c8d7677539..5d90e9799eb5 100644
> > --- a/include/uapi/linux/landlock.h
> > +++ b/include/uapi/linux/landlock.h
> > @@ -128,7 +128,7 @@ struct landlock_net_port_attr {
> >   * files and directories.  Files or directories opened before the sandboxing
> >   * are not subject to these restrictions.
> >   *
> > - * A file can only receive these access rights:
> > + * The following access rights apply only to files:
> >   *
> >   * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
> >   * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
> > @@ -138,12 +138,13 @@ struct landlock_net_port_attr {
> >   * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
> >   * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
> >   *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
> > - *   ``O_TRUNC``. Whether an opened file can be truncated with
> > - *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
> > - *   same way as read and write permissions are checked during
> > - *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
> > - *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
> > - *   third version of the Landlock ABI.
> > + *   ``O_TRUNC``.  This access right is available since the third version of the
> > + *   Landlock ABI.
> > + *
> > + * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> > + * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> > + * read and write permissions are checked during :manpage:`open(2)` using
> > + * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
> >   *
> >   * A directory can receive access rights related to files or directories.  The
> >   * following access right is applied to the directory itself, and the
> > @@ -198,13 +199,28 @@ struct landlock_net_port_attr {
> >   *   If multiple requirements are not met, the ``EACCES`` error code takes
> >   *   precedence over ``EXDEV``.
> >   *
> > + * The following access right applies both to files and directories:
> > + *
> > + * - %LANDLOCK_ACCESS_FS_IOCTL_DEV: Invoke :manpage:`ioctl(2)` commands on an opened
> > + *   character or block device.
> > + *
> > + *   This access right applies to all `ioctl(2)` commands implemented by device
> 
> :manpage:`ioctl(2)`
> 
> > + *   drivers.  However, the following common IOCTL commands continue to be
> > + *   invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL_DEV right:
> 
> This is good but explaining the rationale could help, something like
> this (taking care of not packing lines listing commands to ease review
> when a new command will be added):
> 
> IOCTL commands targetting file descriptors (``FIOCLEX``, ``FIONCLEX``),
> file descriptions (``FIONBIO``, ``FIOASYNC``),
> file systems (``FIOQSIZE``, ``FS_IOC_FIEMAP``, ``FICLONE``,
> ``FICLONERAN``, ``FIDEDUPERANGE``, ``FS_IOC_GETFLAGS``,
> ``FS_IOC_SETFLAGS``, ``FS_IOC_FSGETXATTR``, ``FS_IOC_FSSETXATTR``),
> or superblocks (``FIFREEZE``, ``FITHAW``, ``FIGETBSZ``,
> ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``)
> are never denied.  However, such IOCTL commands still require an opened
> file and may not be available on any file type.  Read or write
> permission may be checked by the underlying implementation, as well as
> capabilities.
> 
> > + *
> > + *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO``, ``FIOASYNC``, ``FIFREEZE``,
> > + *   ``FITHAW``, ``FIGETBSZ``, ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``
> > + *
> > + *   This access right is available since the fifth version of the Landlock
> > + *   ABI.
> > + *
> >   * .. warning::
> >   *
> >   *   It is currently not possible to restrict some file-related actions
> >   *   accessible through these syscall families: :manpage:`chdir(2)`,
> >   *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
> >   *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
> > - *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
> > + *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
> >   *   Future Landlock evolutions will enable to restrict them.
> >   */
> >  /* clang-format off */
> > @@ -223,6 +239,7 @@ struct landlock_net_port_attr {
> >  #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
> >  #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
> >  #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
> > +#define LANDLOCK_ACCESS_FS_IOCTL_DEV			(1ULL << 15)
> >  /* clang-format on */
> >  
> >  /**
> > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > index c15559432d3d..2ef6c57fa20b 100644
> > --- a/security/landlock/fs.c
> > +++ b/security/landlock/fs.c
> > @@ -7,6 +7,7 @@
> >   * Copyright © 2021-2022 Microsoft Corporation
> >   */
> >  
> > +#include <asm/ioctls.h>
> >  #include <kunit/test.h>
> >  #include <linux/atomic.h>
> >  #include <linux/bitops.h>
> > @@ -14,6 +15,7 @@
> >  #include <linux/compiler_types.h>
> >  #include <linux/dcache.h>
> >  #include <linux/err.h>
> > +#include <linux/falloc.h>
> >  #include <linux/fs.h>
> >  #include <linux/init.h>
> >  #include <linux/kernel.h>
> > @@ -29,6 +31,7 @@
> >  #include <linux/types.h>
> >  #include <linux/wait_bit.h>
> >  #include <linux/workqueue.h>
> > +#include <uapi/linux/fiemap.h>
> >  #include <uapi/linux/landlock.h>
> >  
> >  #include "common.h"
> > @@ -84,6 +87,141 @@ static const struct landlock_object_underops landlock_fs_underops = {
> >  	.release = release_inode
> >  };
> >  
> > +/* IOCTL helpers */
> > +
> > +/**
> > + * get_required_ioctl_dev_access(): Determine required access rights for IOCTLs
> > + * on device files.
> > + *
> > + * @cmd: The IOCTL command that is supposed to be run.
> > + *
> > + * By default, any IOCTL on a device file requires the
> > + * LANDLOCK_ACCESS_FS_IOCTL_DEV right.  We make exceptions for commands, if:
> > + *
> > + * 1. The command is implemented in fs/ioctl.c's do_vfs_ioctl(),
> > + *    not in f_ops->unlocked_ioctl() or f_ops->compat_ioctl().
> > + *
> > + * 2. The command can be reasonably used on a device file at all.
> > + *
> > + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> > + * should be considered for inclusion here.
> > + *
> > + * Returns: The access rights that must be granted on an opened file in order to
> > + * use the given @cmd.
> > + */
> > +static __attribute_const__ access_mask_t
> > +get_required_ioctl_dev_access(const unsigned int cmd)
> > +{
> > +	switch (cmd) {
> > +	case FIOCLEX:
> > +	case FIONCLEX:
> > +	case FIONBIO:
> > +	case FIOASYNC:
> > +		/*
> > +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> > +		 * close-on-exec and the file's buffered-IO and async flags.
> > +		 * These operations are also available through fcntl(2), and are
> > +		 * unconditionally permitted in Landlock.
> > +		 */
> > +		return 0;
> > +	case FIOQSIZE:
> > +		/*
> > +		 * FIOQSIZE queries the size of a regular file or directory.
> > +		 *
> > +		 * This IOCTL command only applies to regular files and
> > +		 * directories.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> This should always be allowed because do_vfs_ioctl() never returns
> -ENOIOCTLCMD for this command.  That's why I wrote
> vfs_masked_device_ioctl() this way [1].  I think it would be easier to
> read and maintain this code with a is_masked_device_ioctl() logic.  Listing
> commands that are not masked makes it difficult to review because
> allowed and denied return codes are interleaved.
> 
> [1] https://lore.kernel.org/r/20240219183539.2926165-1-mic@digikod.net
> 
> Your IOCTL command explanation comments are nice and they should be kept
> in is_masked_device_ioctl() (if they mask device IOCTL commands).
> 
> > +	case FIFREEZE:
> > +	case FITHAW:
> > +		/*
> > +		 * FIFREEZE and FITHAW freeze and thaw the file system which the
> > +		 * given file belongs to.  Requires CAP_SYS_ADMIN.
> > +		 *
> > +		 * These commands operate on the file system's superblock rather
> > +		 * than on the file itself.  The same operations can also be
> > +		 * done through any other file or directory on the same file
> > +		 * system, so it is safe to permit these.
> > +		 */
> > +		return 0;
> > +	case FS_IOC_FIEMAP:
> > +		/*
> > +		 * FS_IOC_FIEMAP queries information about the allocation of
> > +		 * blocks within a file.
> > +		 *
> > +		 * This IOCTL command only applies to regular files.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> Same here.
> 
> > +	case FIGETBSZ:
> > +		/*
> > +		 * FIGETBSZ queries the file system's block size for a file or
> > +		 * directory.
> > +		 *
> > +		 * This command operates on the file system's superblock rather
> > +		 * than on the file itself.  The same operation can also be done
> > +		 * through any other file or directory on the same file system,
> > +		 * so it is safe to permit it.
> > +		 */
> > +		return 0;
> > +	case FICLONE:
> > +	case FICLONERANGE:
> > +	case FIDEDUPERANGE:
> > +		/*
> > +		 * FICLONE, FICLONERANGE and FIDEDUPERANGE make files share
> > +		 * their underlying storage ("reflink") between source and
> > +		 * destination FDs, on file systems which support that.
> > +		 *
> > +		 * These IOCTL commands only apply to regular files.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> ditto
> 
> > +	case FIONREAD:
> > +		/*
> > +		 * FIONREAD returns the number of bytes available for reading.
> > +		 *
> > +		 * We require LANDLOCK_ACCESS_FS_IOCTL_DEV for FIONREAD, because
> > +		 * devices implement it in f_ops->unlocked_ioctl().  The
> > +		 * implementations of this operation have varying quality and
> > +		 * complexity, so it is hard to reason about what they do.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	case FS_IOC_GETFLAGS:
> > +	case FS_IOC_SETFLAGS:
> > +	case FS_IOC_FSGETXATTR:
> > +	case FS_IOC_FSSETXATTR:
> > +		/*
> > +		 * FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IOC_FSGETXATTR and
> > +		 * FS_IOC_FSSETXATTR do not apply for devices.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	case FS_IOC_GETFSUUID:
> > +	case FS_IOC_GETFSSYSFSPATH:
> > +		/*
> > +		 * FS_IOC_GETFSUUID and FS_IOC_GETFSSYSFSPATH both operate on
> > +		 * the file system superblock, not on the specific file, so
> > +		 * these operations are available through any other file on the
> > +		 * same file system as well.
> > +		 */
> > +		return 0;
> > +	case FIBMAP:
> > +	case FS_IOC_RESVSP:
> > +	case FS_IOC_RESVSP64:
> > +	case FS_IOC_UNRESVSP:
> > +	case FS_IOC_UNRESVSP64:
> > +	case FS_IOC_ZERO_RANGE:
> > +		/*
> > +		 * FIBMAP, FS_IOC_RESVSP, FS_IOC_RESVSP64, FS_IOC_UNRESVSP,
> > +		 * FS_IOC_UNRESVSP64 and FS_IOC_ZERO_RANGE only apply to regular
> > +		 * files (as implemented in file_ioctl()).
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	default:
> > +		/*
> > +		 * Other commands are guarded by the catch-all access right.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	}
> > +}
> > +
> >  /* Ruleset management */
> >  
> >  static struct landlock_object *get_inode_object(struct inode *const inode)
> > @@ -148,7 +286,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> >  	LANDLOCK_ACCESS_FS_EXECUTE | \
> >  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
> >  	LANDLOCK_ACCESS_FS_READ_FILE | \
> > -	LANDLOCK_ACCESS_FS_TRUNCATE)
> > +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> > +	LANDLOCK_ACCESS_FS_IOCTL_DEV)
> >  /* clang-format on */
> >  
> >  /*
> > @@ -1335,8 +1474,10 @@ static int hook_file_alloc_security(struct file *const file)
> >  static int hook_file_open(struct file *const file)
> >  {
> >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> > -	access_mask_t open_access_request, full_access_request, allowed_access;
> > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > +	access_mask_t open_access_request, full_access_request, allowed_access,
> > +		optional_access;
> > +	const struct inode *inode = file_inode(file);
> > +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> >  	const struct landlock_ruleset *const dom =
> >  		get_fs_domain(landlock_cred(file->f_cred)->domain);
> >  
> > @@ -1354,6 +1495,10 @@ static int hook_file_open(struct file *const file)
> >  	 * We look up more access than what we immediately need for open(), so
> >  	 * that we can later authorize operations on opened files.
> >  	 */
> > +	optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > +	if (is_device)
> > +		optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +
> >  	full_access_request = open_access_request | optional_access;
> >  
> >  	if (is_access_to_paths_allowed(
> > @@ -1410,6 +1555,36 @@ static int hook_file_truncate(struct file *const file)
> >  	return -EACCES;
> >  }
> >  
> > +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> > +			   unsigned long arg)
> > +{
> > +	const struct inode *inode = file_inode(file);
> > +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> > +	access_mask_t required_access, allowed_access;
> 
> As explained in [2], I'd like not-sandboxed tasks to not have visible
> performance impact because of Landlock:
> 
>   We should first check landlock_file(file)->allowed_access as in
>   hook_file_truncate() to return as soon as possible for non-sandboxed
>   tasks.  Any other computation should be done after that (e.g. with an
>   is_device() helper).
> 
> [2] https://lore.kernel.org/r/20240311.If7ieshaegu2@digikod.net
> 
> This is_device(file) helper should also replace other is_device variables.
> 
> 
> > +
> > +	if (!is_device)
> > +		return 0;
> > +
> > +	/*
> > +	 * It is the access rights at the time of opening the file which
> > +	 * determine whether IOCTL can be used on the opened file later.
> > +	 *
> > +	 * The access right is attached to the opened file in hook_file_open().
> > +	 */
> > +	required_access = get_required_ioctl_dev_access(cmd);
> > +	allowed_access = landlock_file(file)->allowed_access;
> > +	if ((allowed_access & required_access) == required_access)
> > +		return 0;
> > +
> > +	return -EACCES;
> > +}
> > +
> > +static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
> > +				  unsigned long arg)
> > +{
> > +	return hook_file_ioctl(file, cmd, arg);
> 
> The compat-specific IOCTL commands are missing (e.g. FS_IOC_RESVSP_32).
> Relying on is_masked_device_ioctl() should make this call OK though.

Well no, see vfs_masked_device_ioctl_compat().

> 
> > +}
> > +
> >  static struct security_hook_list landlock_hooks[] __ro_after_init = {
> >  	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
> >
Günther Noack April 2, 2024, 6:28 p.m. UTC | #3
Hello!

Thanks for the review!

On Wed, Mar 27, 2024 at 05:57:31PM +0100, Mickaël Salaün wrote:
> On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> > Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
> > and increments the Landlock ABI version to 5.
> > 
> > This access right applies to device-custom IOCTL commands
> > when they are invoked on block or character device files.
> > 
> > Like the truncate right, this right is associated with a file
> > descriptor at the time of open(2), and gets respected even when the
> > file descriptor is used outside of the thread which it was originally
> > opened in.
> > 
> > Therefore, a newly enabled Landlock policy does not apply to file
> > descriptors which are already open.
> > 
> > If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
> > number of safe IOCTL commands will be permitted on newly opened device
> > files.  These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
> > as other IOCTL commands for regular files which are implemented in
> > fs/ioctl.c.
> > 
> > Noteworthy scenarios which require special attention:
> > 
> > TTY devices are often passed into a process from the parent process,
> > and so a newly enabled Landlock policy does not retroactively apply to
> > them automatically.  In the past, TTY devices have often supported
> > IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> > letting callers control the TTY input buffer (and simulate
> > keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> > modern kernels though.
> > 
> > Known limitations:
> > 
> > The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
> > control over IOCTL commands.
> > 
> > Landlock users may use path-based restrictions in combination with
> > their knowledge about the file system layout to control what IOCTLs
> > can be done.
> > 
> > Cc: Paul Moore <paul@paul-moore.com>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Günther Noack <gnoack@google.com>
> > ---
> >  include/uapi/linux/landlock.h                |  33 +++-
> >  security/landlock/fs.c                       | 183 ++++++++++++++++++-
> >  security/landlock/limits.h                   |   2 +-
> >  security/landlock/syscalls.c                 |   8 +-
> >  tools/testing/selftests/landlock/base_test.c |   2 +-
> >  tools/testing/selftests/landlock/fs_test.c   |   5 +-
> >  6 files changed, 216 insertions(+), 17 deletions(-)
> > 
> > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > index 25c8d7677539..5d90e9799eb5 100644
> > --- a/include/uapi/linux/landlock.h
> > +++ b/include/uapi/linux/landlock.h
> > @@ -128,7 +128,7 @@ struct landlock_net_port_attr {
> >   * files and directories.  Files or directories opened before the sandboxing
> >   * are not subject to these restrictions.
> >   *
> > - * A file can only receive these access rights:
> > + * The following access rights apply only to files:
> >   *
> >   * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
> >   * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
> > @@ -138,12 +138,13 @@ struct landlock_net_port_attr {
> >   * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
> >   * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
> >   *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
> > - *   ``O_TRUNC``. Whether an opened file can be truncated with
> > - *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
> > - *   same way as read and write permissions are checked during
> > - *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
> > - *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
> > - *   third version of the Landlock ABI.
> > + *   ``O_TRUNC``.  This access right is available since the third version of the
> > + *   Landlock ABI.
> > + *
> > + * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> > + * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> > + * read and write permissions are checked during :manpage:`open(2)` using
> > + * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
> >   *
> >   * A directory can receive access rights related to files or directories.  The
> >   * following access right is applied to the directory itself, and the
> > @@ -198,13 +199,28 @@ struct landlock_net_port_attr {
> >   *   If multiple requirements are not met, the ``EACCES`` error code takes
> >   *   precedence over ``EXDEV``.
> >   *
> > + * The following access right applies both to files and directories:
> > + *
> > + * - %LANDLOCK_ACCESS_FS_IOCTL_DEV: Invoke :manpage:`ioctl(2)` commands on an opened
> > + *   character or block device.
> > + *
> > + *   This access right applies to all `ioctl(2)` commands implemented by device
> 
> :manpage:`ioctl(2)`
> 
> > + *   drivers.  However, the following common IOCTL commands continue to be
> > + *   invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL_DEV right:
> 
> This is good but explaining the rationale could help, something like
> this (taking care of not packing lines listing commands to ease review
> when a new command will be added):
> 
> IOCTL commands targetting file descriptors (``FIOCLEX``, ``FIONCLEX``),
> file descriptions (``FIONBIO``, ``FIOASYNC``),
> file systems (``FIOQSIZE``, ``FS_IOC_FIEMAP``, ``FICLONE``,
> ``FICLONERAN``, ``FIDEDUPERANGE``, ``FS_IOC_GETFLAGS``,
> ``FS_IOC_SETFLAGS``, ``FS_IOC_FSGETXATTR``, ``FS_IOC_FSSETXATTR``),
> or superblocks (``FIFREEZE``, ``FITHAW``, ``FIGETBSZ``,
> ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``)
> are never denied.  However, such IOCTL commands still require an opened
> file and may not be available on any file type.  Read or write
> permission may be checked by the underlying implementation, as well as
> capabilities.

OK, I'll add some more explanation in the next version.


> > + *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO``, ``FIOASYNC``, ``FIFREEZE``,
> > + *   ``FITHAW``, ``FIGETBSZ``, ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``
> > + *
> > + *   This access right is available since the fifth version of the Landlock
> > + *   ABI.
> > + *
> >   * .. warning::
> >   *
> >   *   It is currently not possible to restrict some file-related actions
> >   *   accessible through these syscall families: :manpage:`chdir(2)`,
> >   *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
> >   *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
> > - *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
> > + *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
> >   *   Future Landlock evolutions will enable to restrict them.
> >   */
> >  /* clang-format off */
> > @@ -223,6 +239,7 @@ struct landlock_net_port_attr {
> >  #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
> >  #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
> >  #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
> > +#define LANDLOCK_ACCESS_FS_IOCTL_DEV			(1ULL << 15)
> >  /* clang-format on */
> >  
> >  /**
> > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > index c15559432d3d..2ef6c57fa20b 100644
> > --- a/security/landlock/fs.c
> > +++ b/security/landlock/fs.c
> > @@ -7,6 +7,7 @@
> >   * Copyright © 2021-2022 Microsoft Corporation
> >   */
> >  
> > +#include <asm/ioctls.h>
> >  #include <kunit/test.h>
> >  #include <linux/atomic.h>
> >  #include <linux/bitops.h>
> > @@ -14,6 +15,7 @@
> >  #include <linux/compiler_types.h>
> >  #include <linux/dcache.h>
> >  #include <linux/err.h>
> > +#include <linux/falloc.h>
> >  #include <linux/fs.h>
> >  #include <linux/init.h>
> >  #include <linux/kernel.h>
> > @@ -29,6 +31,7 @@
> >  #include <linux/types.h>
> >  #include <linux/wait_bit.h>
> >  #include <linux/workqueue.h>
> > +#include <uapi/linux/fiemap.h>
> >  #include <uapi/linux/landlock.h>
> >  
> >  #include "common.h"
> > @@ -84,6 +87,141 @@ static const struct landlock_object_underops landlock_fs_underops = {
> >  	.release = release_inode
> >  };
> >  
> > +/* IOCTL helpers */
> > +
> > +/**
> > + * get_required_ioctl_dev_access(): Determine required access rights for IOCTLs
> > + * on device files.
> > + *
> > + * @cmd: The IOCTL command that is supposed to be run.
> > + *
> > + * By default, any IOCTL on a device file requires the
> > + * LANDLOCK_ACCESS_FS_IOCTL_DEV right.  We make exceptions for commands, if:
> > + *
> > + * 1. The command is implemented in fs/ioctl.c's do_vfs_ioctl(),
> > + *    not in f_ops->unlocked_ioctl() or f_ops->compat_ioctl().
> > + *
> > + * 2. The command can be reasonably used on a device file at all.
> > + *
> > + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> > + * should be considered for inclusion here.
> > + *
> > + * Returns: The access rights that must be granted on an opened file in order to
> > + * use the given @cmd.
> > + */
> > +static __attribute_const__ access_mask_t
> > +get_required_ioctl_dev_access(const unsigned int cmd)
> > +{
> > +	switch (cmd) {
> > +	case FIOCLEX:
> > +	case FIONCLEX:
> > +	case FIONBIO:
> > +	case FIOASYNC:
> > +		/*
> > +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> > +		 * close-on-exec and the file's buffered-IO and async flags.
> > +		 * These operations are also available through fcntl(2), and are
> > +		 * unconditionally permitted in Landlock.
> > +		 */
> > +		return 0;
> > +	case FIOQSIZE:
> > +		/*
> > +		 * FIOQSIZE queries the size of a regular file or directory.
> > +		 *
> > +		 * This IOCTL command only applies to regular files and
> > +		 * directories.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> This should always be allowed because do_vfs_ioctl() never returns
> -ENOIOCTLCMD for this command.  That's why I wrote
> vfs_masked_device_ioctl() this way [1].  I think it would be easier to
> read and maintain this code with a is_masked_device_ioctl() logic.  Listing
> commands that are not masked makes it difficult to review because
> allowed and denied return codes are interleaved.

Oh, I misunderstood you on [2], I think -- I was under the impression that you
wanted to keep the switch case in the same order (and with the same entries?) as
the original in do_vfs_ioctl.  So you'd prefer to only list the always-allowed
IOCTL commands here, as you have done in vfs_masked_device_ioctl() [3]?

[2] https://lore.kernel.org/all/20240326.ooCheem1biV2@digikod.net/
[3] https://lore.kernel.org/all/20240219183539.2926165-1-mic@digikod.net/


Can you please clarify how you make up your mind about what should be permitted
and what should not?  I have trouble understanding the rationale for the changes
that you asked for below, apart from the points that they are harmless and that
the return codes should be consistent.

The criteria that I have used in this patch set are that (a) it is implemented
in do_vfs_ioctl() rather than further below, and (b) it makes sense to use that
command on a device file.  (If we permit FIOQSIZE, FS_IOC_FIEMAP and others
here, we will get slightly more correct error codes in these cases, but the
IOCTLs will still not work, because they are not useful and not implemented for
devices. -- On the other hand, we are also increasing the exposed code surface a
bit.  For example, FS_IOC_FIEMAP is calling into inode->i_op->fiemap().  That is
probably harmless for device files, but requires us to reason at a deeper level
to convince ourselves of that.)

In your implementation at [3], you were permitting FICLONE* and FIDEDUPERANGE,
but not FS_IOC_ZERO_RANGE, which is like fallocate().  How are these cases
different to each other?  Is that on purpose?


> [1] https://lore.kernel.org/r/20240219183539.2926165-1-mic@digikod.net
> 
> Your IOCTL command explanation comments are nice and they should be kept
> in is_masked_device_ioctl() (if they mask device IOCTL commands).

OK

> 
> > +	case FIFREEZE:
> > +	case FITHAW:
> > +		/*
> > +		 * FIFREEZE and FITHAW freeze and thaw the file system which the
> > +		 * given file belongs to.  Requires CAP_SYS_ADMIN.
> > +		 *
> > +		 * These commands operate on the file system's superblock rather
> > +		 * than on the file itself.  The same operations can also be
> > +		 * done through any other file or directory on the same file
> > +		 * system, so it is safe to permit these.
> > +		 */
> > +		return 0;
> > +	case FS_IOC_FIEMAP:
> > +		/*
> > +		 * FS_IOC_FIEMAP queries information about the allocation of
> > +		 * blocks within a file.
> > +		 *
> > +		 * This IOCTL command only applies to regular files.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> Same here.
> 
> > +	case FIGETBSZ:
> > +		/*
> > +		 * FIGETBSZ queries the file system's block size for a file or
> > +		 * directory.
> > +		 *
> > +		 * This command operates on the file system's superblock rather
> > +		 * than on the file itself.  The same operation can also be done
> > +		 * through any other file or directory on the same file system,
> > +		 * so it is safe to permit it.
> > +		 */
> > +		return 0;
> > +	case FICLONE:
> > +	case FICLONERANGE:
> > +	case FIDEDUPERANGE:
> > +		/*
> > +		 * FICLONE, FICLONERANGE and FIDEDUPERANGE make files share
> > +		 * their underlying storage ("reflink") between source and
> > +		 * destination FDs, on file systems which support that.
> > +		 *
> > +		 * These IOCTL commands only apply to regular files.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> 
> ditto
> 
> > +	case FIONREAD:
> > +		/*
> > +		 * FIONREAD returns the number of bytes available for reading.
> > +		 *
> > +		 * We require LANDLOCK_ACCESS_FS_IOCTL_DEV for FIONREAD, because
> > +		 * devices implement it in f_ops->unlocked_ioctl().  The
> > +		 * implementations of this operation have varying quality and
> > +		 * complexity, so it is hard to reason about what they do.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	case FS_IOC_GETFLAGS:
> > +	case FS_IOC_SETFLAGS:
> > +	case FS_IOC_FSGETXATTR:
> > +	case FS_IOC_FSSETXATTR:
> > +		/*
> > +		 * FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IOC_FSGETXATTR and
> > +		 * FS_IOC_FSSETXATTR do not apply for devices.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	case FS_IOC_GETFSUUID:
> > +	case FS_IOC_GETFSSYSFSPATH:
> > +		/*
> > +		 * FS_IOC_GETFSUUID and FS_IOC_GETFSSYSFSPATH both operate on
> > +		 * the file system superblock, not on the specific file, so
> > +		 * these operations are available through any other file on the
> > +		 * same file system as well.
> > +		 */
> > +		return 0;
> > +	case FIBMAP:
> > +	case FS_IOC_RESVSP:
> > +	case FS_IOC_RESVSP64:
> > +	case FS_IOC_UNRESVSP:
> > +	case FS_IOC_UNRESVSP64:
> > +	case FS_IOC_ZERO_RANGE:
> > +		/*
> > +		 * FIBMAP, FS_IOC_RESVSP, FS_IOC_RESVSP64, FS_IOC_UNRESVSP,
> > +		 * FS_IOC_UNRESVSP64 and FS_IOC_ZERO_RANGE only apply to regular
> > +		 * files (as implemented in file_ioctl()).
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	default:
> > +		/*
> > +		 * Other commands are guarded by the catch-all access right.
> > +		 */
> > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +	}
> > +}
> > +
> >  /* Ruleset management */
> >  
> >  static struct landlock_object *get_inode_object(struct inode *const inode)
> > @@ -148,7 +286,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> >  	LANDLOCK_ACCESS_FS_EXECUTE | \
> >  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
> >  	LANDLOCK_ACCESS_FS_READ_FILE | \
> > -	LANDLOCK_ACCESS_FS_TRUNCATE)
> > +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> > +	LANDLOCK_ACCESS_FS_IOCTL_DEV)
> >  /* clang-format on */
> >  
> >  /*
> > @@ -1335,8 +1474,10 @@ static int hook_file_alloc_security(struct file *const file)
> >  static int hook_file_open(struct file *const file)
> >  {
> >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> > -	access_mask_t open_access_request, full_access_request, allowed_access;
> > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > +	access_mask_t open_access_request, full_access_request, allowed_access,
> > +		optional_access;
> > +	const struct inode *inode = file_inode(file);
> > +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> >  	const struct landlock_ruleset *const dom =
> >  		get_fs_domain(landlock_cred(file->f_cred)->domain);
> >  
> > @@ -1354,6 +1495,10 @@ static int hook_file_open(struct file *const file)
> >  	 * We look up more access than what we immediately need for open(), so
> >  	 * that we can later authorize operations on opened files.
> >  	 */
> > +	optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > +	if (is_device)
> > +		optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > +
> >  	full_access_request = open_access_request | optional_access;
> >  
> >  	if (is_access_to_paths_allowed(
> > @@ -1410,6 +1555,36 @@ static int hook_file_truncate(struct file *const file)
> >  	return -EACCES;
> >  }
> >  
> > +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> > +			   unsigned long arg)
> > +{
> > +	const struct inode *inode = file_inode(file);
> > +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> > +	access_mask_t required_access, allowed_access;
> 
> As explained in [2], I'd like not-sandboxed tasks to not have visible
> performance impact because of Landlock:
> 
>   We should first check landlock_file(file)->allowed_access as in
>   hook_file_truncate() to return as soon as possible for non-sandboxed
>   tasks.  Any other computation should be done after that (e.g. with an
>   is_device() helper).
> 
> [2] https://lore.kernel.org/r/20240311.If7ieshaegu2@digikod.net
> 
> This is_device(file) helper should also replace other is_device variables.

Done.

FWIW, I have doubts that it makes a performance difference - the is_device()
check is almost for free as well.  But we can pull the same check earlier for
consistency with the truncate hook, if it helps people to understand that their
own program performance should be unaffected.

> 
> 
> > +
> > +	if (!is_device)
> > +		return 0;
> > +
> > +	/*
> > +	 * It is the access rights at the time of opening the file which
> > +	 * determine whether IOCTL can be used on the opened file later.
> > +	 *
> > +	 * The access right is attached to the opened file in hook_file_open().
> > +	 */
> > +	required_access = get_required_ioctl_dev_access(cmd);
> > +	allowed_access = landlock_file(file)->allowed_access;
> > +	if ((allowed_access & required_access) == required_access)
> > +		return 0;
> > +
> > +	return -EACCES;
> > +}
> > +
> > +static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
> > +				  unsigned long arg)
> > +{
> > +	return hook_file_ioctl(file, cmd, arg);
> 
> The compat-specific IOCTL commands are missing (e.g. FS_IOC_RESVSP_32).
> Relying on is_masked_device_ioctl() should make this call OK though.

OK, I'll try to replicate the logic from your vfs_masked_device_ioctl() approach
then?

—Günther
Mickaël Salaün April 3, 2024, 11:15 a.m. UTC | #4
On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> Hello!
> 
> Thanks for the review!
> 
> On Wed, Mar 27, 2024 at 05:57:31PM +0100, Mickaël Salaün wrote:
> > On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> > > Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
> > > and increments the Landlock ABI version to 5.
> > > 
> > > This access right applies to device-custom IOCTL commands
> > > when they are invoked on block or character device files.
> > > 
> > > Like the truncate right, this right is associated with a file
> > > descriptor at the time of open(2), and gets respected even when the
> > > file descriptor is used outside of the thread which it was originally
> > > opened in.
> > > 
> > > Therefore, a newly enabled Landlock policy does not apply to file
> > > descriptors which are already open.
> > > 
> > > If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
> > > number of safe IOCTL commands will be permitted on newly opened device
> > > files.  These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
> > > as other IOCTL commands for regular files which are implemented in
> > > fs/ioctl.c.
> > > 
> > > Noteworthy scenarios which require special attention:
> > > 
> > > TTY devices are often passed into a process from the parent process,
> > > and so a newly enabled Landlock policy does not retroactively apply to
> > > them automatically.  In the past, TTY devices have often supported
> > > IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
> > > letting callers control the TTY input buffer (and simulate
> > > keypresses).  This should be restricted to CAP_SYS_ADMIN programs on
> > > modern kernels though.
> > > 
> > > Known limitations:
> > > 
> > > The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
> > > control over IOCTL commands.
> > > 
> > > Landlock users may use path-based restrictions in combination with
> > > their knowledge about the file system layout to control what IOCTLs
> > > can be done.
> > > 
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Signed-off-by: Günther Noack <gnoack@google.com>
> > > ---
> > >  include/uapi/linux/landlock.h                |  33 +++-
> > >  security/landlock/fs.c                       | 183 ++++++++++++++++++-
> > >  security/landlock/limits.h                   |   2 +-
> > >  security/landlock/syscalls.c                 |   8 +-
> > >  tools/testing/selftests/landlock/base_test.c |   2 +-
> > >  tools/testing/selftests/landlock/fs_test.c   |   5 +-
> > >  6 files changed, 216 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > > index 25c8d7677539..5d90e9799eb5 100644
> > > --- a/include/uapi/linux/landlock.h
> > > +++ b/include/uapi/linux/landlock.h
> > > @@ -128,7 +128,7 @@ struct landlock_net_port_attr {
> > >   * files and directories.  Files or directories opened before the sandboxing
> > >   * are not subject to these restrictions.
> > >   *
> > > - * A file can only receive these access rights:
> > > + * The following access rights apply only to files:
> > >   *
> > >   * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
> > >   * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
> > > @@ -138,12 +138,13 @@ struct landlock_net_port_attr {
> > >   * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
> > >   * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
> > >   *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
> > > - *   ``O_TRUNC``. Whether an opened file can be truncated with
> > > - *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
> > > - *   same way as read and write permissions are checked during
> > > - *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
> > > - *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
> > > - *   third version of the Landlock ABI.
> > > + *   ``O_TRUNC``.  This access right is available since the third version of the
> > > + *   Landlock ABI.
> > > + *
> > > + * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> > > + * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> > > + * read and write permissions are checked during :manpage:`open(2)` using
> > > + * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
> > >   *
> > >   * A directory can receive access rights related to files or directories.  The
> > >   * following access right is applied to the directory itself, and the
> > > @@ -198,13 +199,28 @@ struct landlock_net_port_attr {
> > >   *   If multiple requirements are not met, the ``EACCES`` error code takes
> > >   *   precedence over ``EXDEV``.
> > >   *
> > > + * The following access right applies both to files and directories:
> > > + *
> > > + * - %LANDLOCK_ACCESS_FS_IOCTL_DEV: Invoke :manpage:`ioctl(2)` commands on an opened
> > > + *   character or block device.
> > > + *
> > > + *   This access right applies to all `ioctl(2)` commands implemented by device
> > 
> > :manpage:`ioctl(2)`
> > 
> > > + *   drivers.  However, the following common IOCTL commands continue to be
> > > + *   invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL_DEV right:
> > 
> > This is good but explaining the rationale could help, something like
> > this (taking care of not packing lines listing commands to ease review
> > when a new command will be added):
> > 
> > IOCTL commands targetting file descriptors (``FIOCLEX``, ``FIONCLEX``),
> > file descriptions (``FIONBIO``, ``FIOASYNC``),
> > file systems (``FIOQSIZE``, ``FS_IOC_FIEMAP``, ``FICLONE``,
> > ``FICLONERAN``, ``FIDEDUPERANGE``, ``FS_IOC_GETFLAGS``,
> > ``FS_IOC_SETFLAGS``, ``FS_IOC_FSGETXATTR``, ``FS_IOC_FSSETXATTR``),
> > or superblocks (``FIFREEZE``, ``FITHAW``, ``FIGETBSZ``,
> > ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``)
> > are never denied.  However, such IOCTL commands still require an opened
> > file and may not be available on any file type.  Read or write
> > permission may be checked by the underlying implementation, as well as
> > capabilities.
> 
> OK, I'll add some more explanation in the next version.
> 
> 
> > > + *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO``, ``FIOASYNC``, ``FIFREEZE``,
> > > + *   ``FITHAW``, ``FIGETBSZ``, ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``
> > > + *
> > > + *   This access right is available since the fifth version of the Landlock
> > > + *   ABI.
> > > + *
> > >   * .. warning::
> > >   *
> > >   *   It is currently not possible to restrict some file-related actions
> > >   *   accessible through these syscall families: :manpage:`chdir(2)`,
> > >   *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
> > >   *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
> > > - *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
> > > + *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
> > >   *   Future Landlock evolutions will enable to restrict them.
> > >   */
> > >  /* clang-format off */
> > > @@ -223,6 +239,7 @@ struct landlock_net_port_attr {
> > >  #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
> > >  #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
> > >  #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
> > > +#define LANDLOCK_ACCESS_FS_IOCTL_DEV			(1ULL << 15)
> > >  /* clang-format on */
> > >  
> > >  /**
> > > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > > index c15559432d3d..2ef6c57fa20b 100644
> > > --- a/security/landlock/fs.c
> > > +++ b/security/landlock/fs.c
> > > @@ -7,6 +7,7 @@
> > >   * Copyright © 2021-2022 Microsoft Corporation
> > >   */
> > >  
> > > +#include <asm/ioctls.h>
> > >  #include <kunit/test.h>
> > >  #include <linux/atomic.h>
> > >  #include <linux/bitops.h>
> > > @@ -14,6 +15,7 @@
> > >  #include <linux/compiler_types.h>
> > >  #include <linux/dcache.h>
> > >  #include <linux/err.h>
> > > +#include <linux/falloc.h>
> > >  #include <linux/fs.h>
> > >  #include <linux/init.h>
> > >  #include <linux/kernel.h>
> > > @@ -29,6 +31,7 @@
> > >  #include <linux/types.h>
> > >  #include <linux/wait_bit.h>
> > >  #include <linux/workqueue.h>
> > > +#include <uapi/linux/fiemap.h>
> > >  #include <uapi/linux/landlock.h>
> > >  
> > >  #include "common.h"
> > > @@ -84,6 +87,141 @@ static const struct landlock_object_underops landlock_fs_underops = {
> > >  	.release = release_inode
> > >  };
> > >  
> > > +/* IOCTL helpers */
> > > +
> > > +/**
> > > + * get_required_ioctl_dev_access(): Determine required access rights for IOCTLs
> > > + * on device files.
> > > + *
> > > + * @cmd: The IOCTL command that is supposed to be run.
> > > + *
> > > + * By default, any IOCTL on a device file requires the
> > > + * LANDLOCK_ACCESS_FS_IOCTL_DEV right.  We make exceptions for commands, if:
> > > + *
> > > + * 1. The command is implemented in fs/ioctl.c's do_vfs_ioctl(),
> > > + *    not in f_ops->unlocked_ioctl() or f_ops->compat_ioctl().
> > > + *
> > > + * 2. The command can be reasonably used on a device file at all.
> > > + *
> > > + * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
> > > + * should be considered for inclusion here.
> > > + *
> > > + * Returns: The access rights that must be granted on an opened file in order to
> > > + * use the given @cmd.
> > > + */
> > > +static __attribute_const__ access_mask_t
> > > +get_required_ioctl_dev_access(const unsigned int cmd)
> > > +{
> > > +	switch (cmd) {
> > > +	case FIOCLEX:
> > > +	case FIONCLEX:
> > > +	case FIONBIO:
> > > +	case FIOASYNC:
> > > +		/*
> > > +		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
> > > +		 * close-on-exec and the file's buffered-IO and async flags.
> > > +		 * These operations are also available through fcntl(2), and are
> > > +		 * unconditionally permitted in Landlock.
> > > +		 */
> > > +		return 0;
> > > +	case FIOQSIZE:
> > > +		/*
> > > +		 * FIOQSIZE queries the size of a regular file or directory.
> > > +		 *
> > > +		 * This IOCTL command only applies to regular files and
> > > +		 * directories.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > 
> > This should always be allowed because do_vfs_ioctl() never returns
> > -ENOIOCTLCMD for this command.  That's why I wrote
> > vfs_masked_device_ioctl() this way [1].  I think it would be easier to
> > read and maintain this code with a is_masked_device_ioctl() logic.  Listing
> > commands that are not masked makes it difficult to review because
> > allowed and denied return codes are interleaved.
> 
> Oh, I misunderstood you on [2], I think -- I was under the impression that you
> wanted to keep the switch case in the same order (and with the same entries?) as
> the original in do_vfs_ioctl.  So you'd prefer to only list the always-allowed
> IOCTL commands here, as you have done in vfs_masked_device_ioctl() [3]?
> 
> [2] https://lore.kernel.org/all/20240326.ooCheem1biV2@digikod.net/
> [3] https://lore.kernel.org/all/20240219183539.2926165-1-mic@digikod.net/

That was indeed unclear.  About IOCTL commands, the same order ease
reviewing and maintenance but we don't need to list all commands,
which will limit updates of this list.  However, for the current
unused/unmasked one, we can still add them very briefly in comments as I
did with FIONREAD and file_ioctl()'s ones in vfs_masked_device_ioctl().
Only listing the "masked" ones (for device case) shorten the list, and
having a list with the same semantic ("mask device-specific IOCTLs")
ease review and maintenance as well.

> 
> Can you please clarify how you make up your mind about what should be permitted
> and what should not?  I have trouble understanding the rationale for the changes
> that you asked for below, apart from the points that they are harmless and that
> the return codes should be consistent.

The rationale is the same: all IOCTL commands that are not
passed/specific to character or block devices (i.e. IOCTLs defined in
fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
IOCTL command is not passed to the related device driver but handled by
fs/ioctl.c instead (i.e. handled by the VFS layer).

> 
> The criteria that I have used in this patch set are that (a) it is implemented
> in do_vfs_ioctl() rather than further below, and (b) it makes sense to use that
> command on a device file.  (If we permit FIOQSIZE, FS_IOC_FIEMAP and others
> here, we will get slightly more correct error codes in these cases, but the
> IOCTLs will still not work, because they are not useful and not implemented for
> devices. -- On the other hand, we are also increasing the exposed code surface a
> bit.  For example, FS_IOC_FIEMAP is calling into inode->i_op->fiemap().  That is
> probably harmless for device files, but requires us to reason at a deeper level
> to convince ourselves of that.)

FIOQSIZE is fully handled by do_vfs_ioctl(), and FS_IOC_FIEMAP is
implemented as the inode level, so it should not be passed at the struct
file/device level unless ENOIOCTLCMD is returned (but it should not,
right?).  Because it depends on the inode implementation, it looks like
this IOCTL may work (in theory) on character or block devices too.  If
this is correct, we should not deny it because the semantic of
LANDLOCK_ACCESS_FS_IOCTL_DEV is to control IOCTLs passed to device
drivers.  Furthermore, as you pointed out, error codes would be
unaltered.

It would be good to test (as you suggested IIRC) the masked commands on
a simple device (e.g. /dev/null) to check that it returns ENOTTY,
EOPNOTSUPP, or EACCES according to our expectations.

I agree that this would increase a bit the exposed code surface but I'm
pretty sure that if a sandboxed process is allowed to access a device
file, it is also allowed to access directory or other file types as well
and then would still be able to reach the FS_IOC_FIEMAP implementation.

I'd like to avoid exceptions as in the current implementation of
get_required_ioctl_dev_access() with a switch/case either returning 0 or
LANDLOCK_ACCESS_FS_IOCTL_DEV (excluding the default case of course).  An
alternative approach would be to group IOCTL command cases according to
their returned value, but I find it a bit more complex for no meaningful
gain.  What do you think?

> 
> In your implementation at [3], you were permitting FICLONE* and FIDEDUPERANGE,
> but not FS_IOC_ZERO_RANGE, which is like fallocate().  How are these cases
> different to each other?  Is that on purpose?

FICLONE* and FIDEDUPERANGE match device files and the
vfs_clone_file_range()/generic_file_rw_checks() check returns EINVAL for
device files.  So there is no need to add exceptions for these commands.

FS_IOC_ZERO_RANGE is only implemented for regular files (see
file_ioctl() call), so it is passed to device files.

> 
> 
> > [1] https://lore.kernel.org/r/20240219183539.2926165-1-mic@digikod.net
> > 
> > Your IOCTL command explanation comments are nice and they should be kept
> > in is_masked_device_ioctl() (if they mask device IOCTL commands).
> 
> OK
> 
> > 
> > > +	case FIFREEZE:
> > > +	case FITHAW:
> > > +		/*
> > > +		 * FIFREEZE and FITHAW freeze and thaw the file system which the
> > > +		 * given file belongs to.  Requires CAP_SYS_ADMIN.
> > > +		 *
> > > +		 * These commands operate on the file system's superblock rather
> > > +		 * than on the file itself.  The same operations can also be
> > > +		 * done through any other file or directory on the same file
> > > +		 * system, so it is safe to permit these.
> > > +		 */
> > > +		return 0;
> > > +	case FS_IOC_FIEMAP:
> > > +		/*
> > > +		 * FS_IOC_FIEMAP queries information about the allocation of
> > > +		 * blocks within a file.
> > > +		 *
> > > +		 * This IOCTL command only applies to regular files.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > 
> > Same here.
> > 
> > > +	case FIGETBSZ:
> > > +		/*
> > > +		 * FIGETBSZ queries the file system's block size for a file or
> > > +		 * directory.
> > > +		 *
> > > +		 * This command operates on the file system's superblock rather
> > > +		 * than on the file itself.  The same operation can also be done
> > > +		 * through any other file or directory on the same file system,
> > > +		 * so it is safe to permit it.
> > > +		 */
> > > +		return 0;
> > > +	case FICLONE:
> > > +	case FICLONERANGE:
> > > +	case FIDEDUPERANGE:
> > > +		/*
> > > +		 * FICLONE, FICLONERANGE and FIDEDUPERANGE make files share
> > > +		 * their underlying storage ("reflink") between source and
> > > +		 * destination FDs, on file systems which support that.
> > > +		 *
> > > +		 * These IOCTL commands only apply to regular files.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > 
> > ditto
> > 
> > > +	case FIONREAD:
> > > +		/*
> > > +		 * FIONREAD returns the number of bytes available for reading.
> > > +		 *
> > > +		 * We require LANDLOCK_ACCESS_FS_IOCTL_DEV for FIONREAD, because
> > > +		 * devices implement it in f_ops->unlocked_ioctl().  The
> > > +		 * implementations of this operation have varying quality and
> > > +		 * complexity, so it is hard to reason about what they do.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > +	case FS_IOC_GETFLAGS:
> > > +	case FS_IOC_SETFLAGS:
> > > +	case FS_IOC_FSGETXATTR:
> > > +	case FS_IOC_FSSETXATTR:
> > > +		/*
> > > +		 * FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IOC_FSGETXATTR and
> > > +		 * FS_IOC_FSSETXATTR do not apply for devices.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > +	case FS_IOC_GETFSUUID:
> > > +	case FS_IOC_GETFSSYSFSPATH:
> > > +		/*
> > > +		 * FS_IOC_GETFSUUID and FS_IOC_GETFSSYSFSPATH both operate on
> > > +		 * the file system superblock, not on the specific file, so
> > > +		 * these operations are available through any other file on the
> > > +		 * same file system as well.
> > > +		 */
> > > +		return 0;
> > > +	case FIBMAP:
> > > +	case FS_IOC_RESVSP:
> > > +	case FS_IOC_RESVSP64:
> > > +	case FS_IOC_UNRESVSP:
> > > +	case FS_IOC_UNRESVSP64:
> > > +	case FS_IOC_ZERO_RANGE:
> > > +		/*
> > > +		 * FIBMAP, FS_IOC_RESVSP, FS_IOC_RESVSP64, FS_IOC_UNRESVSP,
> > > +		 * FS_IOC_UNRESVSP64 and FS_IOC_ZERO_RANGE only apply to regular
> > > +		 * files (as implemented in file_ioctl()).
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > +	default:
> > > +		/*
> > > +		 * Other commands are guarded by the catch-all access right.
> > > +		 */
> > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > +	}
> > > +}
> > > +
> > >  /* Ruleset management */
> > >  
> > >  static struct landlock_object *get_inode_object(struct inode *const inode)
> > > @@ -148,7 +286,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> > >  	LANDLOCK_ACCESS_FS_EXECUTE | \
> > >  	LANDLOCK_ACCESS_FS_WRITE_FILE | \
> > >  	LANDLOCK_ACCESS_FS_READ_FILE | \
> > > -	LANDLOCK_ACCESS_FS_TRUNCATE)
> > > +	LANDLOCK_ACCESS_FS_TRUNCATE | \
> > > +	LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > >  /* clang-format on */
> > >  
> > >  /*
> > > @@ -1335,8 +1474,10 @@ static int hook_file_alloc_security(struct file *const file)
> > >  static int hook_file_open(struct file *const file)
> > >  {
> > >  	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> > > -	access_mask_t open_access_request, full_access_request, allowed_access;
> > > -	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > > +	access_mask_t open_access_request, full_access_request, allowed_access,
> > > +		optional_access;
> > > +	const struct inode *inode = file_inode(file);
> > > +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> > >  	const struct landlock_ruleset *const dom =
> > >  		get_fs_domain(landlock_cred(file->f_cred)->domain);
> > >  
> > > @@ -1354,6 +1495,10 @@ static int hook_file_open(struct file *const file)
> > >  	 * We look up more access than what we immediately need for open(), so
> > >  	 * that we can later authorize operations on opened files.
> > >  	 */
> > > +	optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
> > > +	if (is_device)
> > > +		optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > +
> > >  	full_access_request = open_access_request | optional_access;
> > >  
> > >  	if (is_access_to_paths_allowed(
> > > @@ -1410,6 +1555,36 @@ static int hook_file_truncate(struct file *const file)
> > >  	return -EACCES;
> > >  }
> > >  
> > > +static int hook_file_ioctl(struct file *file, unsigned int cmd,
> > > +			   unsigned long arg)
> > > +{
> > > +	const struct inode *inode = file_inode(file);
> > > +	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
> > > +	access_mask_t required_access, allowed_access;
> > 
> > As explained in [2], I'd like not-sandboxed tasks to not have visible
> > performance impact because of Landlock:
> > 
> >   We should first check landlock_file(file)->allowed_access as in
> >   hook_file_truncate() to return as soon as possible for non-sandboxed
> >   tasks.  Any other computation should be done after that (e.g. with an
> >   is_device() helper).
> > 
> > [2] https://lore.kernel.org/r/20240311.If7ieshaegu2@digikod.net
> > 
> > This is_device(file) helper should also replace other is_device variables.
> 
> Done.
> 
> FWIW, I have doubts that it makes a performance difference - the is_device()
> check is almost for free as well.  But we can pull the same check earlier for
> consistency with the truncate hook, if it helps people to understand that their
> own program performance should be unaffected.

Agree

> 
> > 
> > 
> > > +
> > > +	if (!is_device)
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * It is the access rights at the time of opening the file which
> > > +	 * determine whether IOCTL can be used on the opened file later.
> > > +	 *
> > > +	 * The access right is attached to the opened file in hook_file_open().
> > > +	 */
> > > +	required_access = get_required_ioctl_dev_access(cmd);
> > > +	allowed_access = landlock_file(file)->allowed_access;
> > > +	if ((allowed_access & required_access) == required_access)
> > > +		return 0;
> > > +
> > > +	return -EACCES;
> > > +}
> > > +
> > > +static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
> > > +				  unsigned long arg)
> > > +{
> > > +	return hook_file_ioctl(file, cmd, arg);
> > 
> > The compat-specific IOCTL commands are missing (e.g. FS_IOC_RESVSP_32).
> > Relying on is_masked_device_ioctl() should make this call OK though.
> 
> OK, I'll try to replicate the logic from your vfs_masked_device_ioctl() approach
> then?

Yes please, unless you catch an issue with this approach.

> 
> —Günther
>
Günther Noack April 5, 2024, 4:17 p.m. UTC | #5
On Wed, Apr 03, 2024 at 01:15:45PM +0200, Mickaël Salaün wrote:
> On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> > On Wed, Mar 27, 2024 at 05:57:31PM +0100, Mickaël Salaün wrote:
> > > On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> > > > +	case FIOQSIZE:
> > > > +		/*
> > > > +		 * FIOQSIZE queries the size of a regular file or directory.
> > > > +		 *
> > > > +		 * This IOCTL command only applies to regular files and
> > > > +		 * directories.
> > > > +		 */
> > > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > 
> > > This should always be allowed because do_vfs_ioctl() never returns
> > > -ENOIOCTLCMD for this command.  That's why I wrote
> > > vfs_masked_device_ioctl() this way [1].  I think it would be easier to
> > > read and maintain this code with a is_masked_device_ioctl() logic.  Listing
> > > commands that are not masked makes it difficult to review because
> > > allowed and denied return codes are interleaved.
> > 
> > Oh, I misunderstood you on [2], I think -- I was under the impression that you
> > wanted to keep the switch case in the same order (and with the same entries?) as
> > the original in do_vfs_ioctl.  So you'd prefer to only list the always-allowed
> > IOCTL commands here, as you have done in vfs_masked_device_ioctl() [3]?
> > 
> > [2] https://lore.kernel.org/all/20240326.ooCheem1biV2@digikod.net/
> > [3] https://lore.kernel.org/all/20240219183539.2926165-1-mic@digikod.net/
> 
> That was indeed unclear.  About IOCTL commands, the same order ease
> reviewing and maintenance but we don't need to list all commands,
> which will limit updates of this list.  However, for the current
> unused/unmasked one, we can still add them very briefly in comments as I
> did with FIONREAD and file_ioctl()'s ones in vfs_masked_device_ioctl().
> Only listing the "masked" ones (for device case) shorten the list, and
> having a list with the same semantic ("mask device-specific IOCTLs")
> ease review and maintenance as well.
> 
> > 
> > Can you please clarify how you make up your mind about what should be permitted
> > and what should not?  I have trouble understanding the rationale for the changes
> > that you asked for below, apart from the points that they are harmless and that
> > the return codes should be consistent.
> 
> The rationale is the same: all IOCTL commands that are not
> passed/specific to character or block devices (i.e. IOCTLs defined in
> fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
> IOCTL command is not passed to the related device driver but handled by
> fs/ioctl.c instead (i.e. handled by the VFS layer).

Thanks for clarifying -- this makes more sense now.  I traced the cases with
-ENOIOCTLCMD through the code more thoroughly and it is more aligned now with
what you implemented before.  The places where I ended up implementing it
differently to your vfs_masked_device_ioctl() patch are:

 * Do not blanket-permit FS_IOC_{GET,SET}{FLAGS,XATTR}.
   They fall back to the device implementation.

 * FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH are now handled.
   These return -ENOIOCTLCMD from do_vfs_ioctl(), so they do fall back to the
   handlers in struct file_operations, so we can not permit these either.

These seem like pretty clear cases to me.


> > The criteria that I have used in this patch set are that (a) it is implemented
> > in do_vfs_ioctl() rather than further below, and (b) it makes sense to use that
> > command on a device file.  (If we permit FIOQSIZE, FS_IOC_FIEMAP and others
> > here, we will get slightly more correct error codes in these cases, but the
> > IOCTLs will still not work, because they are not useful and not implemented for
> > devices. -- On the other hand, we are also increasing the exposed code surface a
> > bit.  For example, FS_IOC_FIEMAP is calling into inode->i_op->fiemap().  That is
> > probably harmless for device files, but requires us to reason at a deeper level
> > to convince ourselves of that.)
> 
> FIOQSIZE is fully handled by do_vfs_ioctl(), and FS_IOC_FIEMAP is
> implemented as the inode level, so it should not be passed at the struct
> file/device level unless ENOIOCTLCMD is returned (but it should not,
> right?).  Because it depends on the inode implementation, it looks like
> this IOCTL may work (in theory) on character or block devices too.  If
> this is correct, we should not deny it because the semantic of
> LANDLOCK_ACCESS_FS_IOCTL_DEV is to control IOCTLs passed to device
> drivers.  Furthermore, as you pointed out, error codes would be
> unaltered.
> 
> It would be good to test (as you suggested IIRC) the masked commands on
> a simple device (e.g. /dev/null) to check that it returns ENOTTY,
> EOPNOTSUPP, or EACCES according to our expectations.

Sounds good, I'll add a test.


> I agree that this would increase a bit the exposed code surface but I'm
> pretty sure that if a sandboxed process is allowed to access a device
> file, it is also allowed to access directory or other file types as well
> and then would still be able to reach the FS_IOC_FIEMAP implementation.

I assume you mean FIGETBSZ?  The FS_IOC_FIEMAP IOCTL is the one that returns
file extent maps, so that user space can reason about whether a file is stored
in a consecutive way on disk.


> I'd like to avoid exceptions as in the current implementation of
> get_required_ioctl_dev_access() with a switch/case either returning 0 or
> LANDLOCK_ACCESS_FS_IOCTL_DEV (excluding the default case of course).  An
> alternative approach would be to group IOCTL command cases according to
> their returned value, but I find it a bit more complex for no meaningful
> gain.  What do you think?

I don't have strong opinions about it, as long as we don't accidentally mess up
the fallbacks if this changes.


> > In your implementation at [3], you were permitting FICLONE* and FIDEDUPERANGE,
> > but not FS_IOC_ZERO_RANGE, which is like fallocate().  How are these cases
> > different to each other?  Is that on purpose?
> 
> FICLONE* and FIDEDUPERANGE match device files and the
> vfs_clone_file_range()/generic_file_rw_checks() check returns EINVAL for
> device files.  So there is no need to add exceptions for these commands.
> 
> FS_IOC_ZERO_RANGE is only implemented for regular files (see
> file_ioctl() call), so it is passed to device files.

Makes sense :)


—Günther
Günther Noack April 5, 2024, 4:22 p.m. UTC | #6
On Fri, Apr 05, 2024 at 06:17:17PM +0200, Günther Noack wrote:
> On Wed, Apr 03, 2024 at 01:15:45PM +0200, Mickaël Salaün wrote:
> > On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> > > Can you please clarify how you make up your mind about what should be permitted
> > > and what should not?  I have trouble understanding the rationale for the changes
> > > that you asked for below, apart from the points that they are harmless and that
> > > the return codes should be consistent.
> > 
> > The rationale is the same: all IOCTL commands that are not
> > passed/specific to character or block devices (i.e. IOCTLs defined in
> > fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
> > IOCTL command is not passed to the related device driver but handled by
> > fs/ioctl.c instead (i.e. handled by the VFS layer).
> 
> Thanks for clarifying -- this makes more sense now.  I traced the cases with
> -ENOIOCTLCMD through the code more thoroughly and it is more aligned now with
> what you implemented before.  The places where I ended up implementing it
> differently to your vfs_masked_device_ioctl() patch are:
> 
>  * Do not blanket-permit FS_IOC_{GET,SET}{FLAGS,XATTR}.
>    They fall back to the device implementation.
> 
>  * FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH are now handled.
>    These return -ENOIOCTLCMD from do_vfs_ioctl(), so they do fall back to the
>    handlers in struct file_operations, so we can not permit these either.

Kent, Amir:

Is it intentional that the new FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH IOCTLs
can fall back to a IOCTL implementation in struct file_operations?  I found this
remark by Amir which sounded vaguely like it might have been on purpose?  Did I
understand that correctly?

https://lore.kernel.org/lkml/CAOQ4uxjvEL4P4vV5SKpHVS5DtOwKpxAn4n4+Kfqawcu+H-MC5g@mail.gmail.com/

Otherwise, I am happy to send a patch to make it non-extensible (the impls in
fs/ioctl.c would need to return -ENOTTY).  This would let us reason better about
the safety of these IOCTLs for IOCTL security policies enforced by the Landlock
LSM. (Some of these file_operations IOCTL implementations do stuff before
looking at the cmd number.)

Thanks,
—Günther
Mickaël Salaün April 5, 2024, 6:01 p.m. UTC | #7
On Fri, Apr 05, 2024 at 06:17:17PM +0200, Günther Noack wrote:
> On Wed, Apr 03, 2024 at 01:15:45PM +0200, Mickaël Salaün wrote:
> > On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> > > On Wed, Mar 27, 2024 at 05:57:31PM +0100, Mickaël Salaün wrote:
> > > > On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> > > > > +	case FIOQSIZE:
> > > > > +		/*
> > > > > +		 * FIOQSIZE queries the size of a regular file or directory.
> > > > > +		 *
> > > > > +		 * This IOCTL command only applies to regular files and
> > > > > +		 * directories.
> > > > > +		 */
> > > > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > > 
> > > > This should always be allowed because do_vfs_ioctl() never returns
> > > > -ENOIOCTLCMD for this command.  That's why I wrote
> > > > vfs_masked_device_ioctl() this way [1].  I think it would be easier to
> > > > read and maintain this code with a is_masked_device_ioctl() logic.  Listing
> > > > commands that are not masked makes it difficult to review because
> > > > allowed and denied return codes are interleaved.
> > > 
> > > Oh, I misunderstood you on [2], I think -- I was under the impression that you
> > > wanted to keep the switch case in the same order (and with the same entries?) as
> > > the original in do_vfs_ioctl.  So you'd prefer to only list the always-allowed
> > > IOCTL commands here, as you have done in vfs_masked_device_ioctl() [3]?
> > > 
> > > [2] https://lore.kernel.org/all/20240326.ooCheem1biV2@digikod.net/
> > > [3] https://lore.kernel.org/all/20240219183539.2926165-1-mic@digikod.net/
> > 
> > That was indeed unclear.  About IOCTL commands, the same order ease
> > reviewing and maintenance but we don't need to list all commands,
> > which will limit updates of this list.  However, for the current
> > unused/unmasked one, we can still add them very briefly in comments as I
> > did with FIONREAD and file_ioctl()'s ones in vfs_masked_device_ioctl().
> > Only listing the "masked" ones (for device case) shorten the list, and
> > having a list with the same semantic ("mask device-specific IOCTLs")
> > ease review and maintenance as well.
> > 
> > > 
> > > Can you please clarify how you make up your mind about what should be permitted
> > > and what should not?  I have trouble understanding the rationale for the changes
> > > that you asked for below, apart from the points that they are harmless and that
> > > the return codes should be consistent.
> > 
> > The rationale is the same: all IOCTL commands that are not
> > passed/specific to character or block devices (i.e. IOCTLs defined in
> > fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
> > IOCTL command is not passed to the related device driver but handled by
> > fs/ioctl.c instead (i.e. handled by the VFS layer).
> 
> Thanks for clarifying -- this makes more sense now.  I traced the cases with
> -ENOIOCTLCMD through the code more thoroughly and it is more aligned now with
> what you implemented before.  The places where I ended up implementing it
> differently to your vfs_masked_device_ioctl() patch are:
> 
>  * Do not blanket-permit FS_IOC_{GET,SET}{FLAGS,XATTR}.
>    They fall back to the device implementation.
> 
>  * FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH are now handled.
>    These return -ENOIOCTLCMD from do_vfs_ioctl(), so they do fall back to the
>    handlers in struct file_operations, so we can not permit these either.

Good catch!

> 
> These seem like pretty clear cases to me.
> 
> 
> > > The criteria that I have used in this patch set are that (a) it is implemented
> > > in do_vfs_ioctl() rather than further below, and (b) it makes sense to use that
> > > command on a device file.  (If we permit FIOQSIZE, FS_IOC_FIEMAP and others
> > > here, we will get slightly more correct error codes in these cases, but the
> > > IOCTLs will still not work, because they are not useful and not implemented for
> > > devices. -- On the other hand, we are also increasing the exposed code surface a
> > > bit.  For example, FS_IOC_FIEMAP is calling into inode->i_op->fiemap().  That is
> > > probably harmless for device files, but requires us to reason at a deeper level
> > > to convince ourselves of that.)
> > 
> > FIOQSIZE is fully handled by do_vfs_ioctl(), and FS_IOC_FIEMAP is
> > implemented as the inode level, so it should not be passed at the struct
> > file/device level unless ENOIOCTLCMD is returned (but it should not,
> > right?).  Because it depends on the inode implementation, it looks like
> > this IOCTL may work (in theory) on character or block devices too.  If
> > this is correct, we should not deny it because the semantic of
> > LANDLOCK_ACCESS_FS_IOCTL_DEV is to control IOCTLs passed to device
> > drivers.  Furthermore, as you pointed out, error codes would be
> > unaltered.
> > 
> > It would be good to test (as you suggested IIRC) the masked commands on
> > a simple device (e.g. /dev/null) to check that it returns ENOTTY,
> > EOPNOTSUPP, or EACCES according to our expectations.
> 
> Sounds good, I'll add a test.
> 
> 
> > I agree that this would increase a bit the exposed code surface but I'm
> > pretty sure that if a sandboxed process is allowed to access a device
> > file, it is also allowed to access directory or other file types as well
> > and then would still be able to reach the FS_IOC_FIEMAP implementation.
> 
> I assume you mean FIGETBSZ?  The FS_IOC_FIEMAP IOCTL is the one that returns
> file extent maps, so that user space can reason about whether a file is stored
> in a consecutive way on disk.

I meant FS_IOC_FIEMAP for regular files.

> 
> 
> > I'd like to avoid exceptions as in the current implementation of
> > get_required_ioctl_dev_access() with a switch/case either returning 0 or
> > LANDLOCK_ACCESS_FS_IOCTL_DEV (excluding the default case of course).  An
> > alternative approach would be to group IOCTL command cases according to
> > their returned value, but I find it a bit more complex for no meaningful
> > gain.  What do you think?
> 
> I don't have strong opinions about it, as long as we don't accidentally mess up
> the fallbacks if this changes.
> 
> 
> > > In your implementation at [3], you were permitting FICLONE* and FIDEDUPERANGE,
> > > but not FS_IOC_ZERO_RANGE, which is like fallocate().  How are these cases
> > > different to each other?  Is that on purpose?
> > 
> > FICLONE* and FIDEDUPERANGE match device files and the
> > vfs_clone_file_range()/generic_file_rw_checks() check returns EINVAL for
> > device files.  So there is no need to add exceptions for these commands.
> > 
> > FS_IOC_ZERO_RANGE is only implemented for regular files (see
> > file_ioctl() call), so it is passed to device files.
> 
> Makes sense :)
> 
> 
> —Günther
>
Mickaël Salaün April 5, 2024, 6:04 p.m. UTC | #8
On Fri, Apr 05, 2024 at 06:22:52PM +0200, Günther Noack wrote:
> On Fri, Apr 05, 2024 at 06:17:17PM +0200, Günther Noack wrote:
> > On Wed, Apr 03, 2024 at 01:15:45PM +0200, Mickaël Salaün wrote:
> > > On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> > > > Can you please clarify how you make up your mind about what should be permitted
> > > > and what should not?  I have trouble understanding the rationale for the changes
> > > > that you asked for below, apart from the points that they are harmless and that
> > > > the return codes should be consistent.
> > > 
> > > The rationale is the same: all IOCTL commands that are not
> > > passed/specific to character or block devices (i.e. IOCTLs defined in
> > > fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
> > > IOCTL command is not passed to the related device driver but handled by
> > > fs/ioctl.c instead (i.e. handled by the VFS layer).
> > 
> > Thanks for clarifying -- this makes more sense now.  I traced the cases with
> > -ENOIOCTLCMD through the code more thoroughly and it is more aligned now with
> > what you implemented before.  The places where I ended up implementing it
> > differently to your vfs_masked_device_ioctl() patch are:
> > 
> >  * Do not blanket-permit FS_IOC_{GET,SET}{FLAGS,XATTR}.
> >    They fall back to the device implementation.
> > 
> >  * FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH are now handled.
> >    These return -ENOIOCTLCMD from do_vfs_ioctl(), so they do fall back to the
> >    handlers in struct file_operations, so we can not permit these either.
> 
> Kent, Amir:
> 
> Is it intentional that the new FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH IOCTLs
> can fall back to a IOCTL implementation in struct file_operations?  I found this
> remark by Amir which sounded vaguely like it might have been on purpose?  Did I
> understand that correctly?

I think the rationale is that all new VFS IOCTLs should have this fall
back because device drivers might already implement them.

> 
> https://lore.kernel.org/lkml/CAOQ4uxjvEL4P4vV5SKpHVS5DtOwKpxAn4n4+Kfqawcu+H-MC5g@mail.gmail.com/
> 
> Otherwise, I am happy to send a patch to make it non-extensible (the impls in
> fs/ioctl.c would need to return -ENOTTY).  This would let us reason better about
> the safety of these IOCTLs for IOCTL security policies enforced by the Landlock
> LSM. (Some of these file_operations IOCTL implementations do stuff before
> looking at the cmd number.)
> 
> Thanks,
> —Günther
>
Kent Overstreet April 5, 2024, 6:17 p.m. UTC | #9
On Fri, Apr 05, 2024 at 06:22:52PM +0200, Günther Noack wrote:
> On Fri, Apr 05, 2024 at 06:17:17PM +0200, Günther Noack wrote:
> > On Wed, Apr 03, 2024 at 01:15:45PM +0200, Mickaël Salaün wrote:
> > > On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> > > > Can you please clarify how you make up your mind about what should be permitted
> > > > and what should not?  I have trouble understanding the rationale for the changes
> > > > that you asked for below, apart from the points that they are harmless and that
> > > > the return codes should be consistent.
> > > 
> > > The rationale is the same: all IOCTL commands that are not
> > > passed/specific to character or block devices (i.e. IOCTLs defined in
> > > fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
> > > IOCTL command is not passed to the related device driver but handled by
> > > fs/ioctl.c instead (i.e. handled by the VFS layer).
> > 
> > Thanks for clarifying -- this makes more sense now.  I traced the cases with
> > -ENOIOCTLCMD through the code more thoroughly and it is more aligned now with
> > what you implemented before.  The places where I ended up implementing it
> > differently to your vfs_masked_device_ioctl() patch are:
> > 
> >  * Do not blanket-permit FS_IOC_{GET,SET}{FLAGS,XATTR}.
> >    They fall back to the device implementation.
> > 
> >  * FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH are now handled.
> >    These return -ENOIOCTLCMD from do_vfs_ioctl(), so they do fall back to the
> >    handlers in struct file_operations, so we can not permit these either.
> 
> Kent, Amir:
> 
> Is it intentional that the new FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH IOCTLs
> can fall back to a IOCTL implementation in struct file_operations?  I found this
> remark by Amir which sounded vaguely like it might have been on purpose?  Did I
> understand that correctly?
> 
> https://lore.kernel.org/lkml/CAOQ4uxjvEL4P4vV5SKpHVS5DtOwKpxAn4n4+Kfqawcu+H-MC5g@mail.gmail.com/
> 
> Otherwise, I am happy to send a patch to make it non-extensible (the impls in
> fs/ioctl.c would need to return -ENOTTY).  This would let us reason better about
> the safety of these IOCTLs for IOCTL security policies enforced by the Landlock
> LSM. (Some of these file_operations IOCTL implementations do stuff before
> looking at the cmd number.)

They're not supposed to be extensible - the generic implementations are
all we need.
Günther Noack April 5, 2024, 9:44 p.m. UTC | #10
On Fri, Apr 05, 2024 at 02:17:29PM -0400, Kent Overstreet wrote:
> On Fri, Apr 05, 2024 at 06:22:52PM +0200, Günther Noack wrote:
> > Kent, Amir:
> > 
> > Is it intentional that the new FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH IOCTLs
> > can fall back to a IOCTL implementation in struct file_operations?  I found this
> > remark by Amir which sounded vaguely like it might have been on purpose?  Did I
> > understand that correctly?
> > 
> > https://lore.kernel.org/lkml/CAOQ4uxjvEL4P4vV5SKpHVS5DtOwKpxAn4n4+Kfqawcu+H-MC5g@mail.gmail.com/
> > 
> > Otherwise, I am happy to send a patch to make it non-extensible (the impls in
> > fs/ioctl.c would need to return -ENOTTY).  This would let us reason better about
> > the safety of these IOCTLs for IOCTL security policies enforced by the Landlock
> > LSM. (Some of these file_operations IOCTL implementations do stuff before
> > looking at the cmd number.)
> 
> They're not supposed to be extensible - the generic implementations are
> all we need.

Thank you for confirming, Kent -- I sent you a small patch as part of the next
version of the Landlock patch series:
https://lore.kernel.org/all/20240405214040.101396-2-gnoack@google.com/

—Günther
diff mbox series

Patch

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 25c8d7677539..5d90e9799eb5 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -128,7 +128,7 @@  struct landlock_net_port_attr {
  * files and directories.  Files or directories opened before the sandboxing
  * are not subject to these restrictions.
  *
- * A file can only receive these access rights:
+ * The following access rights apply only to files:
  *
  * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
  * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. Note that
@@ -138,12 +138,13 @@  struct landlock_net_port_attr {
  * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
  * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file with :manpage:`truncate(2)`,
  *   :manpage:`ftruncate(2)`, :manpage:`creat(2)`, or :manpage:`open(2)` with
- *   ``O_TRUNC``. Whether an opened file can be truncated with
- *   :manpage:`ftruncate(2)` is determined during :manpage:`open(2)`, in the
- *   same way as read and write permissions are checked during
- *   :manpage:`open(2)` using %LANDLOCK_ACCESS_FS_READ_FILE and
- *   %LANDLOCK_ACCESS_FS_WRITE_FILE. This access right is available since the
- *   third version of the Landlock ABI.
+ *   ``O_TRUNC``.  This access right is available since the third version of the
+ *   Landlock ABI.
+ *
+ * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
+ * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
+ * read and write permissions are checked during :manpage:`open(2)` using
+ * %LANDLOCK_ACCESS_FS_READ_FILE and %LANDLOCK_ACCESS_FS_WRITE_FILE.
  *
  * A directory can receive access rights related to files or directories.  The
  * following access right is applied to the directory itself, and the
@@ -198,13 +199,28 @@  struct landlock_net_port_attr {
  *   If multiple requirements are not met, the ``EACCES`` error code takes
  *   precedence over ``EXDEV``.
  *
+ * The following access right applies both to files and directories:
+ *
+ * - %LANDLOCK_ACCESS_FS_IOCTL_DEV: Invoke :manpage:`ioctl(2)` commands on an opened
+ *   character or block device.
+ *
+ *   This access right applies to all `ioctl(2)` commands implemented by device
+ *   drivers.  However, the following common IOCTL commands continue to be
+ *   invokable independent of the %LANDLOCK_ACCESS_FS_IOCTL_DEV right:
+ *
+ *   ``FIOCLEX``, ``FIONCLEX``, ``FIONBIO``, ``FIOASYNC``, ``FIFREEZE``,
+ *   ``FITHAW``, ``FIGETBSZ``, ``FS_IOC_GETFSUUID``, ``FS_IOC_GETFSSYSFSPATH``
+ *
+ *   This access right is available since the fifth version of the Landlock
+ *   ABI.
+ *
  * .. warning::
  *
  *   It is currently not possible to restrict some file-related actions
  *   accessible through these syscall families: :manpage:`chdir(2)`,
  *   :manpage:`stat(2)`, :manpage:`flock(2)`, :manpage:`chmod(2)`,
  *   :manpage:`chown(2)`, :manpage:`setxattr(2)`, :manpage:`utime(2)`,
- *   :manpage:`ioctl(2)`, :manpage:`fcntl(2)`, :manpage:`access(2)`.
+ *   :manpage:`fcntl(2)`, :manpage:`access(2)`.
  *   Future Landlock evolutions will enable to restrict them.
  */
 /* clang-format off */
@@ -223,6 +239,7 @@  struct landlock_net_port_attr {
 #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
 #define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
 #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
+#define LANDLOCK_ACCESS_FS_IOCTL_DEV			(1ULL << 15)
 /* clang-format on */
 
 /**
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index c15559432d3d..2ef6c57fa20b 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -7,6 +7,7 @@ 
  * Copyright © 2021-2022 Microsoft Corporation
  */
 
+#include <asm/ioctls.h>
 #include <kunit/test.h>
 #include <linux/atomic.h>
 #include <linux/bitops.h>
@@ -14,6 +15,7 @@ 
 #include <linux/compiler_types.h>
 #include <linux/dcache.h>
 #include <linux/err.h>
+#include <linux/falloc.h>
 #include <linux/fs.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -29,6 +31,7 @@ 
 #include <linux/types.h>
 #include <linux/wait_bit.h>
 #include <linux/workqueue.h>
+#include <uapi/linux/fiemap.h>
 #include <uapi/linux/landlock.h>
 
 #include "common.h"
@@ -84,6 +87,141 @@  static const struct landlock_object_underops landlock_fs_underops = {
 	.release = release_inode
 };
 
+/* IOCTL helpers */
+
+/**
+ * get_required_ioctl_dev_access(): Determine required access rights for IOCTLs
+ * on device files.
+ *
+ * @cmd: The IOCTL command that is supposed to be run.
+ *
+ * By default, any IOCTL on a device file requires the
+ * LANDLOCK_ACCESS_FS_IOCTL_DEV right.  We make exceptions for commands, if:
+ *
+ * 1. The command is implemented in fs/ioctl.c's do_vfs_ioctl(),
+ *    not in f_ops->unlocked_ioctl() or f_ops->compat_ioctl().
+ *
+ * 2. The command can be reasonably used on a device file at all.
+ *
+ * Any new IOCTL commands that are implemented in fs/ioctl.c's do_vfs_ioctl()
+ * should be considered for inclusion here.
+ *
+ * Returns: The access rights that must be granted on an opened file in order to
+ * use the given @cmd.
+ */
+static __attribute_const__ access_mask_t
+get_required_ioctl_dev_access(const unsigned int cmd)
+{
+	switch (cmd) {
+	case FIOCLEX:
+	case FIONCLEX:
+	case FIONBIO:
+	case FIOASYNC:
+		/*
+		 * FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC manipulate the FD's
+		 * close-on-exec and the file's buffered-IO and async flags.
+		 * These operations are also available through fcntl(2), and are
+		 * unconditionally permitted in Landlock.
+		 */
+		return 0;
+	case FIOQSIZE:
+		/*
+		 * FIOQSIZE queries the size of a regular file or directory.
+		 *
+		 * This IOCTL command only applies to regular files and
+		 * directories.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	case FIFREEZE:
+	case FITHAW:
+		/*
+		 * FIFREEZE and FITHAW freeze and thaw the file system which the
+		 * given file belongs to.  Requires CAP_SYS_ADMIN.
+		 *
+		 * These commands operate on the file system's superblock rather
+		 * than on the file itself.  The same operations can also be
+		 * done through any other file or directory on the same file
+		 * system, so it is safe to permit these.
+		 */
+		return 0;
+	case FS_IOC_FIEMAP:
+		/*
+		 * FS_IOC_FIEMAP queries information about the allocation of
+		 * blocks within a file.
+		 *
+		 * This IOCTL command only applies to regular files.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	case FIGETBSZ:
+		/*
+		 * FIGETBSZ queries the file system's block size for a file or
+		 * directory.
+		 *
+		 * This command operates on the file system's superblock rather
+		 * than on the file itself.  The same operation can also be done
+		 * through any other file or directory on the same file system,
+		 * so it is safe to permit it.
+		 */
+		return 0;
+	case FICLONE:
+	case FICLONERANGE:
+	case FIDEDUPERANGE:
+		/*
+		 * FICLONE, FICLONERANGE and FIDEDUPERANGE make files share
+		 * their underlying storage ("reflink") between source and
+		 * destination FDs, on file systems which support that.
+		 *
+		 * These IOCTL commands only apply to regular files.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	case FIONREAD:
+		/*
+		 * FIONREAD returns the number of bytes available for reading.
+		 *
+		 * We require LANDLOCK_ACCESS_FS_IOCTL_DEV for FIONREAD, because
+		 * devices implement it in f_ops->unlocked_ioctl().  The
+		 * implementations of this operation have varying quality and
+		 * complexity, so it is hard to reason about what they do.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	case FS_IOC_GETFLAGS:
+	case FS_IOC_SETFLAGS:
+	case FS_IOC_FSGETXATTR:
+	case FS_IOC_FSSETXATTR:
+		/*
+		 * FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IOC_FSGETXATTR and
+		 * FS_IOC_FSSETXATTR do not apply for devices.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	case FS_IOC_GETFSUUID:
+	case FS_IOC_GETFSSYSFSPATH:
+		/*
+		 * FS_IOC_GETFSUUID and FS_IOC_GETFSSYSFSPATH both operate on
+		 * the file system superblock, not on the specific file, so
+		 * these operations are available through any other file on the
+		 * same file system as well.
+		 */
+		return 0;
+	case FIBMAP:
+	case FS_IOC_RESVSP:
+	case FS_IOC_RESVSP64:
+	case FS_IOC_UNRESVSP:
+	case FS_IOC_UNRESVSP64:
+	case FS_IOC_ZERO_RANGE:
+		/*
+		 * FIBMAP, FS_IOC_RESVSP, FS_IOC_RESVSP64, FS_IOC_UNRESVSP,
+		 * FS_IOC_UNRESVSP64 and FS_IOC_ZERO_RANGE only apply to regular
+		 * files (as implemented in file_ioctl()).
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	default:
+		/*
+		 * Other commands are guarded by the catch-all access right.
+		 */
+		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
+	}
+}
+
 /* Ruleset management */
 
 static struct landlock_object *get_inode_object(struct inode *const inode)
@@ -148,7 +286,8 @@  static struct landlock_object *get_inode_object(struct inode *const inode)
 	LANDLOCK_ACCESS_FS_EXECUTE | \
 	LANDLOCK_ACCESS_FS_WRITE_FILE | \
 	LANDLOCK_ACCESS_FS_READ_FILE | \
-	LANDLOCK_ACCESS_FS_TRUNCATE)
+	LANDLOCK_ACCESS_FS_TRUNCATE | \
+	LANDLOCK_ACCESS_FS_IOCTL_DEV)
 /* clang-format on */
 
 /*
@@ -1335,8 +1474,10 @@  static int hook_file_alloc_security(struct file *const file)
 static int hook_file_open(struct file *const file)
 {
 	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
-	access_mask_t open_access_request, full_access_request, allowed_access;
-	const access_mask_t optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
+	access_mask_t open_access_request, full_access_request, allowed_access,
+		optional_access;
+	const struct inode *inode = file_inode(file);
+	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
 	const struct landlock_ruleset *const dom =
 		get_fs_domain(landlock_cred(file->f_cred)->domain);
 
@@ -1354,6 +1495,10 @@  static int hook_file_open(struct file *const file)
 	 * We look up more access than what we immediately need for open(), so
 	 * that we can later authorize operations on opened files.
 	 */
+	optional_access = LANDLOCK_ACCESS_FS_TRUNCATE;
+	if (is_device)
+		optional_access |= LANDLOCK_ACCESS_FS_IOCTL_DEV;
+
 	full_access_request = open_access_request | optional_access;
 
 	if (is_access_to_paths_allowed(
@@ -1410,6 +1555,36 @@  static int hook_file_truncate(struct file *const file)
 	return -EACCES;
 }
 
+static int hook_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	const struct inode *inode = file_inode(file);
+	const bool is_device = S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode);
+	access_mask_t required_access, allowed_access;
+
+	if (!is_device)
+		return 0;
+
+	/*
+	 * It is the access rights at the time of opening the file which
+	 * determine whether IOCTL can be used on the opened file later.
+	 *
+	 * The access right is attached to the opened file in hook_file_open().
+	 */
+	required_access = get_required_ioctl_dev_access(cmd);
+	allowed_access = landlock_file(file)->allowed_access;
+	if ((allowed_access & required_access) == required_access)
+		return 0;
+
+	return -EACCES;
+}
+
+static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
+				  unsigned long arg)
+{
+	return hook_file_ioctl(file, cmd, arg);
+}
+
 static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
 
@@ -1432,6 +1607,8 @@  static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
 	LSM_HOOK_INIT(file_open, hook_file_open),
 	LSM_HOOK_INIT(file_truncate, hook_file_truncate),
+	LSM_HOOK_INIT(file_ioctl, hook_file_ioctl),
+	LSM_HOOK_INIT(file_ioctl_compat, hook_file_ioctl_compat),
 };
 
 __init void landlock_add_fs_hooks(void)
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index 93c9c6f91556..20fdb5ff3514 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -18,7 +18,7 @@ 
 #define LANDLOCK_MAX_NUM_LAYERS		16
 #define LANDLOCK_MAX_NUM_RULES		U32_MAX
 
-#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_TRUNCATE
+#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_IOCTL_DEV
 #define LANDLOCK_MASK_ACCESS_FS		((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
 #define LANDLOCK_NUM_ACCESS_FS		__const_hweight64(LANDLOCK_MASK_ACCESS_FS)
 #define LANDLOCK_SHIFT_ACCESS_FS	0
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 6788e73b6681..9ae3dfa47443 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -149,7 +149,7 @@  static const struct file_operations ruleset_fops = {
 	.write = fop_dummy_write,
 };
 
-#define LANDLOCK_ABI_VERSION 4
+#define LANDLOCK_ABI_VERSION 5
 
 /**
  * sys_landlock_create_ruleset - Create a new ruleset
@@ -321,7 +321,11 @@  static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
 	if (!path_beneath_attr.allowed_access)
 		return -ENOMSG;
 
-	/* Checks that allowed_access matches the @ruleset constraints. */
+	/*
+	 * Checks that allowed_access matches the @ruleset constraints and only
+	 * consists of publicly visible access rights (as opposed to synthetic
+	 * ones).
+	 */
 	mask = landlock_get_raw_fs_access_mask(ruleset, 0);
 	if ((path_beneath_attr.allowed_access | mask) != mask)
 		return -EINVAL;
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index a6f89aaea77d..3c1e9f35b531 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -75,7 +75,7 @@  TEST(abi_version)
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	ASSERT_EQ(4, landlock_create_ruleset(NULL, 0,
+	ASSERT_EQ(5, landlock_create_ruleset(NULL, 0,
 					     LANDLOCK_CREATE_RULESET_VERSION));
 
 	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 9a6036fbf289..418ad745a5dd 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -529,9 +529,10 @@  TEST_F_FORK(layout1, inval)
 	LANDLOCK_ACCESS_FS_EXECUTE | \
 	LANDLOCK_ACCESS_FS_WRITE_FILE | \
 	LANDLOCK_ACCESS_FS_READ_FILE | \
-	LANDLOCK_ACCESS_FS_TRUNCATE)
+	LANDLOCK_ACCESS_FS_TRUNCATE | \
+	LANDLOCK_ACCESS_FS_IOCTL_DEV)
 
-#define ACCESS_LAST LANDLOCK_ACCESS_FS_TRUNCATE
+#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL_DEV
 
 #define ACCESS_ALL ( \
 	ACCESS_FILE | \