diff mbox series

[RFC,4/9] audit: add filtering for io_uring records

Message ID 162163380685.8379.17381053199011043757.stgit@sifl (mailing list archive)
State New, archived
Headers show
Series Add LSM access controls and auditing to io_uring | expand

Commit Message

Paul Moore May 21, 2021, 9:50 p.m. UTC
WARNING - This is a work in progress and should not be merged
anywhere important.  It is almost surely not complete, and while it
probably compiles it likely hasn't been booted and will do terrible
things.  You have been warned.

This patch adds basic audit io_uring filtering, using as much of the
existing audit filtering infrastructure as possible.  In order to do
this we reuse the audit filter rule's syscall mask for the io_uring
operation and we create a new filter for io_uring operations as
AUDIT_FILTER_URING_EXIT/audit_filter_list[7].

<TODO - provide some additional guidance for the userspace tools>

Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 include/uapi/linux/audit.h |    3 +-
 kernel/auditfilter.c       |    4 ++-
 kernel/auditsc.c           |   65 ++++++++++++++++++++++++++++++++++----------
 3 files changed, 55 insertions(+), 17 deletions(-)

Comments

Richard Guy Briggs May 28, 2021, 10:35 p.m. UTC | #1
On 2021-05-21 17:50, Paul Moore wrote:
> WARNING - This is a work in progress and should not be merged
> anywhere important.  It is almost surely not complete, and while it
> probably compiles it likely hasn't been booted and will do terrible
> things.  You have been warned.
> 
> This patch adds basic audit io_uring filtering, using as much of the
> existing audit filtering infrastructure as possible.  In order to do
> this we reuse the audit filter rule's syscall mask for the io_uring
> operation and we create a new filter for io_uring operations as
> AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
> 
> <TODO - provide some additional guidance for the userspace tools>
> 
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  include/uapi/linux/audit.h |    3 +-
>  kernel/auditfilter.c       |    4 ++-
>  kernel/auditsc.c           |   65 ++++++++++++++++++++++++++++++++++----------
>  3 files changed, 55 insertions(+), 17 deletions(-)
> 
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index b26e0c435e8b..621eb3c1076e 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -167,8 +167,9 @@
>  #define AUDIT_FILTER_EXCLUDE	0x05	/* Apply rule before record creation */
>  #define AUDIT_FILTER_TYPE	AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */
>  #define AUDIT_FILTER_FS		0x06	/* Apply rule at __audit_inode_child */
> +#define AUDIT_FILTER_URING_EXIT	0x07	/* Apply rule at io_uring op exit */
>  
> -#define AUDIT_NR_FILTERS	7
> +#define AUDIT_NR_FILTERS	8
>  
>  #define AUDIT_FILTER_PREPEND	0x10	/* Prepend to front of list */
>  
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index db2c6b59dfc3..c21119c00504 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -44,7 +44,8 @@ struct list_head audit_filter_list[AUDIT_NR_FILTERS] = {
>  	LIST_HEAD_INIT(audit_filter_list[4]),
>  	LIST_HEAD_INIT(audit_filter_list[5]),
>  	LIST_HEAD_INIT(audit_filter_list[6]),
> -#if AUDIT_NR_FILTERS != 7
> +	LIST_HEAD_INIT(audit_filter_list[7]),
> +#if AUDIT_NR_FILTERS != 8
>  #error Fix audit_filter_list initialiser
>  #endif
>  };
> @@ -56,6 +57,7 @@ static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = {
>  	LIST_HEAD_INIT(audit_rules_list[4]),
>  	LIST_HEAD_INIT(audit_rules_list[5]),
>  	LIST_HEAD_INIT(audit_rules_list[6]),
> +	LIST_HEAD_INIT(audit_rules_list[7]),
>  };
>  
>  DEFINE_MUTEX(audit_filter_mutex);
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index d8aa2c690bf9..4f6ab34020fb 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
>  	return rule->mask[word] & bit;
>  }
>  
> +/**
> + * audit_filter_uring - apply filters to an io_uring operation
> + * @tsk: associated task
> + * @ctx: audit context
> + */
> +static void audit_filter_uring(struct task_struct *tsk,
> +			       struct audit_context *ctx)
> +{
> +	struct audit_entry *e;
> +	enum audit_state state;
> +
> +	if (auditd_test_task(tsk))
> +		return;

Is this necessary?  auditd and auditctl don't (intentionally) use any
io_uring functionality.  Is it possible it might inadvertantly use some
by virtue of libc or other library calls now or in the future?

> +	rcu_read_lock();
> +	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
> +				list) {
> +		if (audit_in_mask(&e->rule, ctx->uring_op) &&

While this seems like the most obvious approach given the parallels
between syscalls and io_uring operations, as coded here it won't work
due to the different mappings of syscall numbers and io_uring
operations unless we re-use the auditctl -S field with raw io_uring
operation numbers in the place of syscall numbers.  This should have
been obvious to me when I first looked at this patch.  It became obvious
when I started looking at the userspace auditctl.c.

The easy first step would be to use something like this:
	auditctl -a uring,always -S 18,28 -F key=uring_open
to monitor file open commands only.  The same is not yet possible for
the perm field, but there are so few io_uring ops at this point compared
with syscalls that it might be manageable.  The arch is irrelevant since
io_uring operation numbers are identical across all hardware as far as I
can tell.  Most of the rest of the fields should make sense if they do
for a syscall rule.  Here's a sample of userspace code to support this
patch:
	https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96
	https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0


If we abuse the syscall infrastructure at first, we'd need a transition
plan to coordinate user and kernel switchover to seperate mechanisms for
the two to work together if the need should arise to have both syscall
and uring filters in the same rule.


I'm still exploring ideas on the best place to put this translation
table, in userspace libaudit, kaudit, io_op_defs[],
include/uapi/linux/io_uring.h, ...

For speed, the best would be in userspace libaudit, but that will be the
least obvious place for any io_uring developer to look when making any
updates to the list of io_uring operations and will most likely result
in additions escaping security logging.

Next best for speed would be in kaudit when instantiating rules, to
translate the syscall numbers to operation numbers and store them
natively in a different mask (audit_in_mask).  This also runs the risk
of additions escaping security logging.

After that, they could reuse the existing audit syscall filter
infrastructure and mask and translate on the fly from the specific op in
use to its equivalent syscall number when checking the existing filter.
This is the least disruptive to the audit code and the most likely to be
updated when new io_uring operations are added, but will incur an extra
step to translate the io_uring operation number to a syscall number in
the hot path.

Given that multiple aspects of security appear to have been a complete
afterthought to this feature, necessitating it to be bolted on after the
fact, it appears that the last option might be the most attractive to
the security folks, making this as easy as possible for fs folks to
update the audit code would be necessary to maintain security.


If there isn't a direct mapping between io_uring operations and
syscalls (the reverse isn't needed), then we'll need to duplicate the
mechanism throughout the stack starting in userspace auditctl and
libaudit.  Currently the bitmap for syscalls is 2k.  The current
io_uring op list appears to be 37.


It might be wise to deliberately not support auditctl "-w" (and the
exported audit_add_watch() function) since that is currently hardcoded
to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by
audit_rule_fieldpair_data()] anyways that is more likely to be
deprecated.  It also appears to make sense not to support autrace (at
least initially).


Does any of this roughly make sense?


> +		    audit_filter_rules(tsk, &e->rule, ctx, NULL, &state,
> +				       false)) {
> +			rcu_read_unlock();
> +			ctx->current_state = state;
> +			return;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return;
> +}
> +
>  /* At syscall exit time, this filter is called if the audit_state is
>   * not low enough that auditing cannot take place, but is also not
>   * high enough that we already know we have to write an audit record
> @@ -1754,7 +1783,7 @@ static void audit_log_exit(void)
>   * __audit_free - free a per-task audit context
>   * @tsk: task whose audit context block to free
>   *
> - * Called from copy_process and do_exit
> + * Called from copy_process, do_exit, and the io_uring code
>   */
>  void __audit_free(struct task_struct *tsk)
>  {
> @@ -1772,15 +1801,21 @@ void __audit_free(struct task_struct *tsk)
>  	 * random task_struct that doesn't doesn't have any meaningful data we
>  	 * need to log via audit_log_exit().
>  	 */
> -	if (tsk == current && !context->dummy &&
> -	    context->context == AUDIT_CTX_SYSCALL) {
> +	if (tsk == current && !context->dummy) {
>  		context->return_valid = AUDITSC_INVALID;
>  		context->return_code = 0;
> -
> -		audit_filter_syscall(tsk, context);
> -		audit_filter_inodes(tsk, context);
> -		if (context->current_state == AUDIT_RECORD_CONTEXT)
> -			audit_log_exit();
> +		if (context->context == AUDIT_CTX_SYSCALL) {
> +			audit_filter_syscall(tsk, context);
> +			audit_filter_inodes(tsk, context);
> +			if (context->current_state == AUDIT_RECORD_CONTEXT)
> +				audit_log_exit();
> +		} else if (context->context == AUDIT_CTX_URING) {
> +			/* TODO: verify this case is real and valid */
> +			audit_filter_uring(tsk, context);
> +			audit_filter_inodes(tsk, context);
> +			if (context->current_state == AUDIT_RECORD_CONTEXT)
> +				audit_log_uring(context);
> +		}
>  	}
>  
>  	audit_set_context(tsk, NULL);
> @@ -1861,12 +1896,6 @@ void __audit_uring_exit(int success, long code)
>  {
>  	struct audit_context *ctx = audit_context();
>  
> -	/*
> -	 * TODO: At some point we will likely want to filter on io_uring ops
> -	 *       and other things similar to what we do for syscalls, but that
> -	 *       is something for another day; just record what we can here.
> -	 */
> -
>  	if (!ctx || ctx->dummy)
>  		goto out;
>  	if (ctx->context == AUDIT_CTX_SYSCALL) {
> @@ -1891,6 +1920,8 @@ void __audit_uring_exit(int success, long code)
>  		 * the behavior here.
>  		 */
>  		audit_filter_syscall(current, ctx);
> +		if (ctx->current_state != AUDIT_RECORD_CONTEXT)
> +			audit_filter_uring(current, ctx);
>  		audit_filter_inodes(current, ctx);
>  		if (ctx->current_state != AUDIT_RECORD_CONTEXT)
>  			return;
> @@ -1899,7 +1930,9 @@ void __audit_uring_exit(int success, long code)
>  		return;
>  	}
>  #if 1
> -	/* XXX - temporary hack to force record generation */
> +	/* XXX - temporary hack to force record generation, we are leaving this
> +	 *       enabled, but if you want to actually test the filtering you
> +	 *       need to disable this #if/#endif block */
>  	ctx->current_state = AUDIT_RECORD_CONTEXT;
>  #endif
>  
> @@ -1907,6 +1940,8 @@ void __audit_uring_exit(int success, long code)
>  	if (!list_empty(&ctx->killed_trees))
>  		audit_kill_trees(ctx);
>  
> +	/* run through both filters to ensure we set the filterkey properly */
> +	audit_filter_uring(current, ctx);
>  	audit_filter_inodes(current, ctx);
>  	if (ctx->current_state != AUDIT_RECORD_CONTEXT)
>  		goto out;
> 
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://listman.redhat.com/mailman/listinfo/linux-audit

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
Paul Moore May 30, 2021, 3:26 p.m. UTC | #2
On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2021-05-21 17:50, Paul Moore wrote:
> > WARNING - This is a work in progress and should not be merged
> > anywhere important.  It is almost surely not complete, and while it
> > probably compiles it likely hasn't been booted and will do terrible
> > things.  You have been warned.
> >
> > This patch adds basic audit io_uring filtering, using as much of the
> > existing audit filtering infrastructure as possible.  In order to do
> > this we reuse the audit filter rule's syscall mask for the io_uring
> > operation and we create a new filter for io_uring operations as
> > AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
> >
> > <TODO - provide some additional guidance for the userspace tools>
> >
> > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > ---
> >  include/uapi/linux/audit.h |    3 +-
> >  kernel/auditfilter.c       |    4 ++-
> >  kernel/auditsc.c           |   65 ++++++++++++++++++++++++++++++++++----------
> >  3 files changed, 55 insertions(+), 17 deletions(-)

...

> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index d8aa2c690bf9..4f6ab34020fb 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
> >       return rule->mask[word] & bit;
> >  }
> >
> > +/**
> > + * audit_filter_uring - apply filters to an io_uring operation
> > + * @tsk: associated task
> > + * @ctx: audit context
> > + */
> > +static void audit_filter_uring(struct task_struct *tsk,
> > +                            struct audit_context *ctx)
> > +{
> > +     struct audit_entry *e;
> > +     enum audit_state state;
> > +
> > +     if (auditd_test_task(tsk))
> > +             return;
>
> Is this necessary?  auditd and auditctl don't (intentionally) use any
> io_uring functionality.  Is it possible it might inadvertantly use some
> by virtue of libc or other library calls now or in the future?

I think the better question is what harm does it do?  Yes, I'm not
aware of an auditd implementation that currently makes use of
io_uring, but it is also not inconceivable some future implementation
might want to make use of it and given the disjoint nature of kernel
and userspace development I don't want the kernel to block such
developments.  However, if you can think of a reason why having this
check here is bad I'm listening (note: we are already in the slow path
here so having the additional check isn't an issue as far as I'm
concerned).

As a reminder, auditd_test_task() only returns true/1 if the task is
registered with the audit subsystem as an auditd connection, an
auditctl process should not cause this function to return true.

> > +     rcu_read_lock();
> > +     list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
> > +                             list) {
> > +             if (audit_in_mask(&e->rule, ctx->uring_op) &&
>
> While this seems like the most obvious approach given the parallels
> between syscalls and io_uring operations, as coded here it won't work
> due to the different mappings of syscall numbers and io_uring
> operations unless we re-use the auditctl -S field with raw io_uring
> operation numbers in the place of syscall numbers.  This should have
> been obvious to me when I first looked at this patch.  It became obvious
> when I started looking at the userspace auditctl.c.

FWIW, my intention was to treat io_uring opcodes exactly like we treat
syscall numbers.  Yes, this would potentially be an issue if we wanted
to combine syscalls and io_uring opcodes into one filter, but why
would we ever want to do that?  Combining the two into one filter not
only makes the filter lists longer than needed (we will always know if
we are filtering on a syscall or io_uring op) and complicates the
filter rule processing.

Or is there a problem with this that I'm missing?

> The easy first step would be to use something like this:
>         auditctl -a uring,always -S 18,28 -F key=uring_open
> to monitor file open commands only.  The same is not yet possible for
> the perm field, but there are so few io_uring ops at this point compared
> with syscalls that it might be manageable.  The arch is irrelevant since
> io_uring operation numbers are identical across all hardware as far as I
> can tell.  Most of the rest of the fields should make sense if they do
> for a syscall rule.

I've never been a fan of audit's "perm" filtering; I've always felt
there were better ways to handle that so I'm not overly upset that we
are skipping that functionality with this initial support.  If it
becomes a problem in the future we can always add that support at a
later date.

I currently fear that just getting io_uring and audit to coexist is
going to be a large enough problem in the immediate future.

> Here's a sample of userspace code to support this
> patch:
>         https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96
>         https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0

Great, thank you.  I haven't grabbed a copy yet for testing, but I will.

> If we abuse the syscall infrastructure at first, we'd need a transition
> plan to coordinate user and kernel switchover to seperate mechanisms for
> the two to work together if the need should arise to have both syscall
> and uring filters in the same rule.

See my comments above, I don't currently see why we would ever want
syscall and io_uring filtering to happen in the same rule.  Please
speak up if you can think of a reason why this would either be needed,
or desirable for some reason.

> It might be wise to deliberately not support auditctl "-w" (and the
> exported audit_add_watch() function) since that is currently hardcoded
> to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by
> audit_rule_fieldpair_data()] anyways that is more likely to be
> deprecated.  It also appears to make sense not to support autrace (at
> least initially).

I'm going to be honest with you and simply say that I've run out of my
email/review time in front of the computer on this holiday weekend
(blame the lockdown/bpf/lsm discussion <g>) and I need to go for
today, but this is something I'll take a look it this coming week.
Hopefully the comments above give us something to think/talk about in
the meantime.

Regardless, thanks for your help on the userspace side of the
filtering, that should make testing a lot easier moving forward.

--
paul moore
www.paul-moore.com
Richard Guy Briggs May 31, 2021, 1:44 p.m. UTC | #3
On 2021-05-30 11:26, Paul Moore wrote:
> On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2021-05-21 17:50, Paul Moore wrote:
> > > WARNING - This is a work in progress and should not be merged
> > > anywhere important.  It is almost surely not complete, and while it
> > > probably compiles it likely hasn't been booted and will do terrible
> > > things.  You have been warned.
> > >
> > > This patch adds basic audit io_uring filtering, using as much of the
> > > existing audit filtering infrastructure as possible.  In order to do
> > > this we reuse the audit filter rule's syscall mask for the io_uring
> > > operation and we create a new filter for io_uring operations as
> > > AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
> > >
> > > <TODO - provide some additional guidance for the userspace tools>
> > >
> > > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > ---
> > >  include/uapi/linux/audit.h |    3 +-
> > >  kernel/auditfilter.c       |    4 ++-
> > >  kernel/auditsc.c           |   65 ++++++++++++++++++++++++++++++++++----------
> > >  3 files changed, 55 insertions(+), 17 deletions(-)
> 
> ...
> 
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index d8aa2c690bf9..4f6ab34020fb 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
> > >       return rule->mask[word] & bit;
> > >  }
> > >
> > > +/**
> > > + * audit_filter_uring - apply filters to an io_uring operation
> > > + * @tsk: associated task
> > > + * @ctx: audit context
> > > + */
> > > +static void audit_filter_uring(struct task_struct *tsk,
> > > +                            struct audit_context *ctx)
> > > +{
> > > +     struct audit_entry *e;
> > > +     enum audit_state state;
> > > +
> > > +     if (auditd_test_task(tsk))
> > > +             return;
> >
> > Is this necessary?  auditd and auditctl don't (intentionally) use any
> > io_uring functionality.  Is it possible it might inadvertantly use some
> > by virtue of libc or other library calls now or in the future?
> 
> I think the better question is what harm does it do?  Yes, I'm not
> aware of an auditd implementation that currently makes use of
> io_uring, but it is also not inconceivable some future implementation
> might want to make use of it and given the disjoint nature of kernel
> and userspace development I don't want the kernel to block such
> developments.  However, if you can think of a reason why having this
> check here is bad I'm listening (note: we are already in the slow path
> here so having the additional check isn't an issue as far as I'm
> concerned).
> 
> As a reminder, auditd_test_task() only returns true/1 if the task is
> registered with the audit subsystem as an auditd connection, an
> auditctl process should not cause this function to return true.

My main concern was overhead, since the whole goal of io_uring is speed.

The chances that audit does use this functionality in the future suggest
to me that it is best to leave this check in.

> > > +     rcu_read_lock();
> > > +     list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
> > > +                             list) {
> > > +             if (audit_in_mask(&e->rule, ctx->uring_op) &&
> >
> > While this seems like the most obvious approach given the parallels
> > between syscalls and io_uring operations, as coded here it won't work
> > due to the different mappings of syscall numbers and io_uring
> > operations unless we re-use the auditctl -S field with raw io_uring
> > operation numbers in the place of syscall numbers.  This should have
> > been obvious to me when I first looked at this patch.  It became obvious
> > when I started looking at the userspace auditctl.c.
> 
> FWIW, my intention was to treat io_uring opcodes exactly like we treat
> syscall numbers.  Yes, this would potentially be an issue if we wanted
> to combine syscalls and io_uring opcodes into one filter, but why
> would we ever want to do that?  Combining the two into one filter not
> only makes the filter lists longer than needed (we will always know if
> we are filtering on a syscall or io_uring op) and complicates the
> filter rule processing.
> 
> Or is there a problem with this that I'm missing?

No, I think you have a good understanding of it.  I'm asking hard
questions to avoid missing something important.  If we can reuse the
syscall infrastructure for this then that is extremely helpful (if not
lazy, which isn't necessarily a bad thing).  It does mean that the
io_uring op dictionary will need to live in userspace audit the way it
is currently implemented, or provide a flag to indicate it is a syscall
number to be translated in the kernel either at the time of rule
addition or translated on the fly on rule check in the kernel adding
overhead to a critical path.

> > The easy first step would be to use something like this:
> >         auditctl -a uring,always -S 18,28 -F key=uring_open
> > to monitor file open commands only.  The same is not yet possible for
> > the perm field, but there are so few io_uring ops at this point compared
> > with syscalls that it might be manageable.  The arch is irrelevant since
> > io_uring operation numbers are identical across all hardware as far as I
> > can tell.  Most of the rest of the fields should make sense if they do
> > for a syscall rule.
> 
> I've never been a fan of audit's "perm" filtering; I've always felt
> there were better ways to handle that so I'm not overly upset that we
> are skipping that functionality with this initial support.  If it
> becomes a problem in the future we can always add that support at a
> later date.

Ok, I don't see a pressing need to add it initially, but should add a
check to block that field from being used to avoid the confusion of
unpredictable behaviour should someone try to add a perm filter to a
io_uring filter.  That should be done protectively in the kernel and
proactively in userspace.

> I currently fear that just getting io_uring and audit to coexist is
> going to be a large enough problem in the immediate future.

Agreed.

> > Here's a sample of userspace code to support this
> > patch:
> >         https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96
> >         https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0
> 
> Great, thank you.  I haven't grabbed a copy yet for testing, but I will.

I've added a perm filter block as an additional patch in userspace and
updated the tree so that first commit is no longer the top of tree but
the branch name is current.

I'll add a kernel perm filter check.

I just noticed some list checking that is missing in tree and watch in
your patch.

Suggested fixup patches to follow...

> > If we abuse the syscall infrastructure at first, we'd need a transition
> > plan to coordinate user and kernel switchover to seperate mechanisms for
> > the two to work together if the need should arise to have both syscall
> > and uring filters in the same rule.
> 
> See my comments above, I don't currently see why we would ever want
> syscall and io_uring filtering to happen in the same rule.  Please
> speak up if you can think of a reason why this would either be needed,
> or desirable for some reason.

I think they can be seperate rules for now.  Either a syscall rule
catching all io_uring ops can be added, or an io_uring rule can be added
to catch specific ops.  The scenario I was thinking of was catching
syscalls of specific io_uring ops.

> > It might be wise to deliberately not support auditctl "-w" (and the
> > exported audit_add_watch() function) since that is currently hardcoded
> > to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by
> > audit_rule_fieldpair_data()] anyways that is more likely to be
> > deprecated.  It also appears to make sense not to support autrace (at
> > least initially).
> 
> I'm going to be honest with you and simply say that I've run out of my
> email/review time in front of the computer on this holiday weekend
> (blame the lockdown/bpf/lsm discussion <g>) and I need to go for
> today, but this is something I'll take a look it this coming week.
> Hopefully the comments above give us something to think/talk about in
> the meantime.

I wasn't expecting you to work the weekend.  :-)

> Regardless, thanks for your help on the userspace side of the
> filtering, that should make testing a lot easier moving forward.

Standard RFC disclaimers apply.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
Paul Moore June 2, 2021, 1:40 a.m. UTC | #4
On Mon, May 31, 2021 at 9:44 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2021-05-30 11:26, Paul Moore wrote:
> > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2021-05-21 17:50, Paul Moore wrote:

...

> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index d8aa2c690bf9..4f6ab34020fb 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
> > > >       return rule->mask[word] & bit;
> > > >  }
> > > >
> > > > +/**
> > > > + * audit_filter_uring - apply filters to an io_uring operation
> > > > + * @tsk: associated task
> > > > + * @ctx: audit context
> > > > + */
> > > > +static void audit_filter_uring(struct task_struct *tsk,
> > > > +                            struct audit_context *ctx)
> > > > +{
> > > > +     struct audit_entry *e;
> > > > +     enum audit_state state;
> > > > +
> > > > +     if (auditd_test_task(tsk))
> > > > +             return;
> > >
> > > Is this necessary?  auditd and auditctl don't (intentionally) use any
> > > io_uring functionality.  Is it possible it might inadvertantly use some
> > > by virtue of libc or other library calls now or in the future?
> >
> > I think the better question is what harm does it do?  Yes, I'm not
> > aware of an auditd implementation that currently makes use of
> > io_uring, but it is also not inconceivable some future implementation
> > might want to make use of it and given the disjoint nature of kernel
> > and userspace development I don't want the kernel to block such
> > developments.  However, if you can think of a reason why having this
> > check here is bad I'm listening (note: we are already in the slow path
> > here so having the additional check isn't an issue as far as I'm
> > concerned).
> >
> > As a reminder, auditd_test_task() only returns true/1 if the task is
> > registered with the audit subsystem as an auditd connection, an
> > auditctl process should not cause this function to return true.
>
> My main concern was overhead, since the whole goal of io_uring is speed.

At the point where this test takes place we are already in the audit
slow path as far as io_uring is concerned.  I understand your concern,
but the advantage of being able to effectively use io_uring in the
future makes this worth keeping in my opinion.

> The chances that audit does use this functionality in the future suggest
> to me that it is best to leave this check in.

Sounds like we are in agreement.  We'll keep it for now.

> > > > +     rcu_read_lock();
> > > > +     list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
> > > > +                             list) {
> > > > +             if (audit_in_mask(&e->rule, ctx->uring_op) &&
> > >
> > > While this seems like the most obvious approach given the parallels
> > > between syscalls and io_uring operations, as coded here it won't work
> > > due to the different mappings of syscall numbers and io_uring
> > > operations unless we re-use the auditctl -S field with raw io_uring
> > > operation numbers in the place of syscall numbers.  This should have
> > > been obvious to me when I first looked at this patch.  It became obvious
> > > when I started looking at the userspace auditctl.c.
> >
> > FWIW, my intention was to treat io_uring opcodes exactly like we treat
> > syscall numbers.  Yes, this would potentially be an issue if we wanted
> > to combine syscalls and io_uring opcodes into one filter, but why
> > would we ever want to do that?  Combining the two into one filter not
> > only makes the filter lists longer than needed (we will always know if
> > we are filtering on a syscall or io_uring op) and complicates the
> > filter rule processing.
> >
> > Or is there a problem with this that I'm missing?
>
> No, I think you have a good understanding of it.  I'm asking hard
> questions to avoid missing something important.  If we can reuse the
> syscall infrastructure for this then that is extremely helpful (if not
> lazy, which isn't necessarily a bad thing).  It does mean that the
> io_uring op dictionary will need to live in userspace audit the way it
> is currently implemented ....

Which I currently believe is the right thing to do.

> > > The easy first step would be to use something like this:
> > >         auditctl -a uring,always -S 18,28 -F key=uring_open
> > > to monitor file open commands only.  The same is not yet possible for
> > > the perm field, but there are so few io_uring ops at this point compared
> > > with syscalls that it might be manageable.  The arch is irrelevant since
> > > io_uring operation numbers are identical across all hardware as far as I
> > > can tell.  Most of the rest of the fields should make sense if they do
> > > for a syscall rule.
> >
> > I've never been a fan of audit's "perm" filtering; I've always felt
> > there were better ways to handle that so I'm not overly upset that we
> > are skipping that functionality with this initial support.  If it
> > becomes a problem in the future we can always add that support at a
> > later date.
>
> Ok, I don't see a pressing need to add it initially, but should add a
> check to block that field from being used to avoid the confusion of
> unpredictable behaviour should someone try to add a perm filter to a
> io_uring filter.  That should be done protectively in the kernel and
> proactively in userspace.

Sure, that's reasonable.

> > > Here's a sample of userspace code to support this
> > > patch:
> > >         https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96
> > >         https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0
> >
> > Great, thank you.  I haven't grabbed a copy yet for testing, but I will.
>
> I've added a perm filter block as an additional patch in userspace and
> updated the tree so that first commit is no longer the top of tree but
> the branch name is current.
>
> I'll add a kernel perm filter check.
>
> I just noticed some list checking that is missing in tree and watch in
> your patch.
>
> Suggested fixup patches to follow...

I see them, thank you, comments will follow over there.  Although to
be honest I'm mostly focusing on the testing right now while we wait
to hear back from Jens on what he is willing to accept regarding audit
calls in io_issue_sqe().  If we can't do the _entry()/_exit() calls
then this work is pretty much dead and we just have to deal with it in
Kconfig.  I might make one last, clean patchset and put it in a branch
for the distros that want to carry the patchset, but it isn't clear to
me that it is something I would want to maintain long term.  Long
running out of tree patches are generally A Bad Idea.

> > > If we abuse the syscall infrastructure at first, we'd need a transition
> > > plan to coordinate user and kernel switchover to seperate mechanisms for
> > > the two to work together if the need should arise to have both syscall
> > > and uring filters in the same rule.
> >
> > See my comments above, I don't currently see why we would ever want
> > syscall and io_uring filtering to happen in the same rule.  Please
> > speak up if you can think of a reason why this would either be needed,
> > or desirable for some reason.
>
> I think they can be seperate rules for now.  Either a syscall rule
> catching all io_uring ops can be added, or an io_uring rule can be added
> to catch specific ops.  The scenario I was thinking of was catching
> syscalls of specific io_uring ops.

Perhaps I'm misunderstand you, but that scenario really shouldn't
exist.  The io_uring ops function independently of syscalls; you can
*submit* io_uring ops via io_uring_enter(), but they are not
guaranteed to be dispatched synchronously (obviously), and given the
cred shenanigans that can happen with io_uring there is no guarantee
the filters would even be applicable.

It isn't an issue of "can" the filters be separate, they *have* to be separate.
Richard Guy Briggs June 2, 2021, 3:37 p.m. UTC | #5
On 2021-06-01 21:40, Paul Moore wrote:
> On Mon, May 31, 2021 at 9:44 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2021-05-30 11:26, Paul Moore wrote:
> > > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > > On 2021-05-21 17:50, Paul Moore wrote:
> > > > If we abuse the syscall infrastructure at first, we'd need a transition
> > > > plan to coordinate user and kernel switchover to seperate mechanisms for
> > > > the two to work together if the need should arise to have both syscall
> > > > and uring filters in the same rule.
> > >
> > > See my comments above, I don't currently see why we would ever want
> > > syscall and io_uring filtering to happen in the same rule.  Please
> > > speak up if you can think of a reason why this would either be needed,
> > > or desirable for some reason.
> >
> > I think they can be seperate rules for now.  Either a syscall rule
> > catching all io_uring ops can be added, or an io_uring rule can be added
> > to catch specific ops.  The scenario I was thinking of was catching
> > syscalls of specific io_uring ops.
> 
> Perhaps I'm misunderstand you, but that scenario really shouldn't
> exist.  The io_uring ops function independently of syscalls; you can
> *submit* io_uring ops via io_uring_enter(), but they are not
> guaranteed to be dispatched synchronously (obviously), and given the
> cred shenanigans that can happen with io_uring there is no guarantee
> the filters would even be applicable.

That wasn't my understanding.  There are a number of io_uring calls
starting with at least open that are currently synchronous (but may
become async in future) that we may want to single out which would be a
specific io_uring syscall with a specific io_uring opcode.  I guess
that particular situation would be caught by the io_uring opcode
triggering an event that includes SYSCALL and URINGOP records.

> It isn't an issue of "can" the filters be separate, they *have* to be separate.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
Paul Moore June 2, 2021, 5:20 p.m. UTC | #6
On Wed, Jun 2, 2021 at 11:38 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2021-06-01 21:40, Paul Moore wrote:
> > On Mon, May 31, 2021 at 9:44 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2021-05-30 11:26, Paul Moore wrote:
> > > > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > > > On 2021-05-21 17:50, Paul Moore wrote:
> > > > > If we abuse the syscall infrastructure at first, we'd need a transition
> > > > > plan to coordinate user and kernel switchover to seperate mechanisms for
> > > > > the two to work together if the need should arise to have both syscall
> > > > > and uring filters in the same rule.
> > > >
> > > > See my comments above, I don't currently see why we would ever want
> > > > syscall and io_uring filtering to happen in the same rule.  Please
> > > > speak up if you can think of a reason why this would either be needed,
> > > > or desirable for some reason.
> > >
> > > I think they can be seperate rules for now.  Either a syscall rule
> > > catching all io_uring ops can be added, or an io_uring rule can be added
> > > to catch specific ops.  The scenario I was thinking of was catching
> > > syscalls of specific io_uring ops.
> >
> > Perhaps I'm misunderstand you, but that scenario really shouldn't
> > exist.  The io_uring ops function independently of syscalls; you can
> > *submit* io_uring ops via io_uring_enter(), but they are not
> > guaranteed to be dispatched synchronously (obviously), and given the
> > cred shenanigans that can happen with io_uring there is no guarantee
> > the filters would even be applicable.
>
> That wasn't my understanding.  There are a number of io_uring calls
> starting with at least open that are currently synchronous (but may
> become async in future) that we may want to single out which would be a
> specific io_uring syscall with a specific io_uring opcode.  I guess
> that particular situation would be caught by the io_uring opcode
> triggering an event that includes SYSCALL and URINGOP records.

The only io_uring syscalls are io_uring_setup(2), io_uring_enter(2),
etc., the stuff that is dispatched in io_issue_sqe() are the io_uring
ops/opcodes/whatever.  They *look* like syscalls but they are not and
we have to treat them differently.

> > It isn't an issue of "can" the filters be separate, they *have* to be separate.
diff mbox series

Patch

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index b26e0c435e8b..621eb3c1076e 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -167,8 +167,9 @@ 
 #define AUDIT_FILTER_EXCLUDE	0x05	/* Apply rule before record creation */
 #define AUDIT_FILTER_TYPE	AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */
 #define AUDIT_FILTER_FS		0x06	/* Apply rule at __audit_inode_child */
+#define AUDIT_FILTER_URING_EXIT	0x07	/* Apply rule at io_uring op exit */
 
-#define AUDIT_NR_FILTERS	7
+#define AUDIT_NR_FILTERS	8
 
 #define AUDIT_FILTER_PREPEND	0x10	/* Prepend to front of list */
 
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index db2c6b59dfc3..c21119c00504 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -44,7 +44,8 @@  struct list_head audit_filter_list[AUDIT_NR_FILTERS] = {
 	LIST_HEAD_INIT(audit_filter_list[4]),
 	LIST_HEAD_INIT(audit_filter_list[5]),
 	LIST_HEAD_INIT(audit_filter_list[6]),
-#if AUDIT_NR_FILTERS != 7
+	LIST_HEAD_INIT(audit_filter_list[7]),
+#if AUDIT_NR_FILTERS != 8
 #error Fix audit_filter_list initialiser
 #endif
 };
@@ -56,6 +57,7 @@  static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = {
 	LIST_HEAD_INIT(audit_rules_list[4]),
 	LIST_HEAD_INIT(audit_rules_list[5]),
 	LIST_HEAD_INIT(audit_rules_list[6]),
+	LIST_HEAD_INIT(audit_rules_list[7]),
 };
 
 DEFINE_MUTEX(audit_filter_mutex);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index d8aa2c690bf9..4f6ab34020fb 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -799,6 +799,35 @@  static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
 	return rule->mask[word] & bit;
 }
 
+/**
+ * audit_filter_uring - apply filters to an io_uring operation
+ * @tsk: associated task
+ * @ctx: audit context
+ */
+static void audit_filter_uring(struct task_struct *tsk,
+			       struct audit_context *ctx)
+{
+	struct audit_entry *e;
+	enum audit_state state;
+
+	if (auditd_test_task(tsk))
+		return;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
+				list) {
+		if (audit_in_mask(&e->rule, ctx->uring_op) &&
+		    audit_filter_rules(tsk, &e->rule, ctx, NULL, &state,
+				       false)) {
+			rcu_read_unlock();
+			ctx->current_state = state;
+			return;
+		}
+	}
+	rcu_read_unlock();
+	return;
+}
+
 /* At syscall exit time, this filter is called if the audit_state is
  * not low enough that auditing cannot take place, but is also not
  * high enough that we already know we have to write an audit record
@@ -1754,7 +1783,7 @@  static void audit_log_exit(void)
  * __audit_free - free a per-task audit context
  * @tsk: task whose audit context block to free
  *
- * Called from copy_process and do_exit
+ * Called from copy_process, do_exit, and the io_uring code
  */
 void __audit_free(struct task_struct *tsk)
 {
@@ -1772,15 +1801,21 @@  void __audit_free(struct task_struct *tsk)
 	 * random task_struct that doesn't doesn't have any meaningful data we
 	 * need to log via audit_log_exit().
 	 */
-	if (tsk == current && !context->dummy &&
-	    context->context == AUDIT_CTX_SYSCALL) {
+	if (tsk == current && !context->dummy) {
 		context->return_valid = AUDITSC_INVALID;
 		context->return_code = 0;
-
-		audit_filter_syscall(tsk, context);
-		audit_filter_inodes(tsk, context);
-		if (context->current_state == AUDIT_RECORD_CONTEXT)
-			audit_log_exit();
+		if (context->context == AUDIT_CTX_SYSCALL) {
+			audit_filter_syscall(tsk, context);
+			audit_filter_inodes(tsk, context);
+			if (context->current_state == AUDIT_RECORD_CONTEXT)
+				audit_log_exit();
+		} else if (context->context == AUDIT_CTX_URING) {
+			/* TODO: verify this case is real and valid */
+			audit_filter_uring(tsk, context);
+			audit_filter_inodes(tsk, context);
+			if (context->current_state == AUDIT_RECORD_CONTEXT)
+				audit_log_uring(context);
+		}
 	}
 
 	audit_set_context(tsk, NULL);
@@ -1861,12 +1896,6 @@  void __audit_uring_exit(int success, long code)
 {
 	struct audit_context *ctx = audit_context();
 
-	/*
-	 * TODO: At some point we will likely want to filter on io_uring ops
-	 *       and other things similar to what we do for syscalls, but that
-	 *       is something for another day; just record what we can here.
-	 */
-
 	if (!ctx || ctx->dummy)
 		goto out;
 	if (ctx->context == AUDIT_CTX_SYSCALL) {
@@ -1891,6 +1920,8 @@  void __audit_uring_exit(int success, long code)
 		 * the behavior here.
 		 */
 		audit_filter_syscall(current, ctx);
+		if (ctx->current_state != AUDIT_RECORD_CONTEXT)
+			audit_filter_uring(current, ctx);
 		audit_filter_inodes(current, ctx);
 		if (ctx->current_state != AUDIT_RECORD_CONTEXT)
 			return;
@@ -1899,7 +1930,9 @@  void __audit_uring_exit(int success, long code)
 		return;
 	}
 #if 1
-	/* XXX - temporary hack to force record generation */
+	/* XXX - temporary hack to force record generation, we are leaving this
+	 *       enabled, but if you want to actually test the filtering you
+	 *       need to disable this #if/#endif block */
 	ctx->current_state = AUDIT_RECORD_CONTEXT;
 #endif
 
@@ -1907,6 +1940,8 @@  void __audit_uring_exit(int success, long code)
 	if (!list_empty(&ctx->killed_trees))
 		audit_kill_trees(ctx);
 
+	/* run through both filters to ensure we set the filterkey properly */
+	audit_filter_uring(current, ctx);
 	audit_filter_inodes(current, ctx);
 	if (ctx->current_state != AUDIT_RECORD_CONTEXT)
 		goto out;