Message ID | 162163380685.8379.17381053199011043757.stgit@sifl (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add LSM access controls and auditing to io_uring | expand |
On 2021-05-21 17:50, Paul Moore wrote: > WARNING - This is a work in progress and should not be merged > anywhere important. It is almost surely not complete, and while it > probably compiles it likely hasn't been booted and will do terrible > things. You have been warned. > > This patch adds basic audit io_uring filtering, using as much of the > existing audit filtering infrastructure as possible. In order to do > this we reuse the audit filter rule's syscall mask for the io_uring > operation and we create a new filter for io_uring operations as > AUDIT_FILTER_URING_EXIT/audit_filter_list[7]. > > <TODO - provide some additional guidance for the userspace tools> > > Signed-off-by: Paul Moore <paul@paul-moore.com> > --- > include/uapi/linux/audit.h | 3 +- > kernel/auditfilter.c | 4 ++- > kernel/auditsc.c | 65 ++++++++++++++++++++++++++++++++++---------- > 3 files changed, 55 insertions(+), 17 deletions(-) > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h > index b26e0c435e8b..621eb3c1076e 100644 > --- a/include/uapi/linux/audit.h > +++ b/include/uapi/linux/audit.h > @@ -167,8 +167,9 @@ > #define AUDIT_FILTER_EXCLUDE 0x05 /* Apply rule before record creation */ > #define AUDIT_FILTER_TYPE AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */ > #define AUDIT_FILTER_FS 0x06 /* Apply rule at __audit_inode_child */ > +#define AUDIT_FILTER_URING_EXIT 0x07 /* Apply rule at io_uring op exit */ > > -#define AUDIT_NR_FILTERS 7 > +#define AUDIT_NR_FILTERS 8 > > #define AUDIT_FILTER_PREPEND 0x10 /* Prepend to front of list */ > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c > index db2c6b59dfc3..c21119c00504 100644 > --- a/kernel/auditfilter.c > +++ b/kernel/auditfilter.c > @@ -44,7 +44,8 @@ struct list_head audit_filter_list[AUDIT_NR_FILTERS] = { > LIST_HEAD_INIT(audit_filter_list[4]), > LIST_HEAD_INIT(audit_filter_list[5]), > LIST_HEAD_INIT(audit_filter_list[6]), > -#if AUDIT_NR_FILTERS != 7 > + LIST_HEAD_INIT(audit_filter_list[7]), > +#if AUDIT_NR_FILTERS != 8 > #error Fix audit_filter_list initialiser > #endif > }; > @@ -56,6 +57,7 @@ static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = { > LIST_HEAD_INIT(audit_rules_list[4]), > LIST_HEAD_INIT(audit_rules_list[5]), > LIST_HEAD_INIT(audit_rules_list[6]), > + LIST_HEAD_INIT(audit_rules_list[7]), > }; > > DEFINE_MUTEX(audit_filter_mutex); > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index d8aa2c690bf9..4f6ab34020fb 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val) > return rule->mask[word] & bit; > } > > +/** > + * audit_filter_uring - apply filters to an io_uring operation > + * @tsk: associated task > + * @ctx: audit context > + */ > +static void audit_filter_uring(struct task_struct *tsk, > + struct audit_context *ctx) > +{ > + struct audit_entry *e; > + enum audit_state state; > + > + if (auditd_test_task(tsk)) > + return; Is this necessary? auditd and auditctl don't (intentionally) use any io_uring functionality. Is it possible it might inadvertantly use some by virtue of libc or other library calls now or in the future? > + rcu_read_lock(); > + list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT], > + list) { > + if (audit_in_mask(&e->rule, ctx->uring_op) && While this seems like the most obvious approach given the parallels between syscalls and io_uring operations, as coded here it won't work due to the different mappings of syscall numbers and io_uring operations unless we re-use the auditctl -S field with raw io_uring operation numbers in the place of syscall numbers. This should have been obvious to me when I first looked at this patch. It became obvious when I started looking at the userspace auditctl.c. The easy first step would be to use something like this: auditctl -a uring,always -S 18,28 -F key=uring_open to monitor file open commands only. The same is not yet possible for the perm field, but there are so few io_uring ops at this point compared with syscalls that it might be manageable. The arch is irrelevant since io_uring operation numbers are identical across all hardware as far as I can tell. Most of the rest of the fields should make sense if they do for a syscall rule. Here's a sample of userspace code to support this patch: https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96 https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0 If we abuse the syscall infrastructure at first, we'd need a transition plan to coordinate user and kernel switchover to seperate mechanisms for the two to work together if the need should arise to have both syscall and uring filters in the same rule. I'm still exploring ideas on the best place to put this translation table, in userspace libaudit, kaudit, io_op_defs[], include/uapi/linux/io_uring.h, ... For speed, the best would be in userspace libaudit, but that will be the least obvious place for any io_uring developer to look when making any updates to the list of io_uring operations and will most likely result in additions escaping security logging. Next best for speed would be in kaudit when instantiating rules, to translate the syscall numbers to operation numbers and store them natively in a different mask (audit_in_mask). This also runs the risk of additions escaping security logging. After that, they could reuse the existing audit syscall filter infrastructure and mask and translate on the fly from the specific op in use to its equivalent syscall number when checking the existing filter. This is the least disruptive to the audit code and the most likely to be updated when new io_uring operations are added, but will incur an extra step to translate the io_uring operation number to a syscall number in the hot path. Given that multiple aspects of security appear to have been a complete afterthought to this feature, necessitating it to be bolted on after the fact, it appears that the last option might be the most attractive to the security folks, making this as easy as possible for fs folks to update the audit code would be necessary to maintain security. If there isn't a direct mapping between io_uring operations and syscalls (the reverse isn't needed), then we'll need to duplicate the mechanism throughout the stack starting in userspace auditctl and libaudit. Currently the bitmap for syscalls is 2k. The current io_uring op list appears to be 37. It might be wise to deliberately not support auditctl "-w" (and the exported audit_add_watch() function) since that is currently hardcoded to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by audit_rule_fieldpair_data()] anyways that is more likely to be deprecated. It also appears to make sense not to support autrace (at least initially). Does any of this roughly make sense? > + audit_filter_rules(tsk, &e->rule, ctx, NULL, &state, > + false)) { > + rcu_read_unlock(); > + ctx->current_state = state; > + return; > + } > + } > + rcu_read_unlock(); > + return; > +} > + > /* At syscall exit time, this filter is called if the audit_state is > * not low enough that auditing cannot take place, but is also not > * high enough that we already know we have to write an audit record > @@ -1754,7 +1783,7 @@ static void audit_log_exit(void) > * __audit_free - free a per-task audit context > * @tsk: task whose audit context block to free > * > - * Called from copy_process and do_exit > + * Called from copy_process, do_exit, and the io_uring code > */ > void __audit_free(struct task_struct *tsk) > { > @@ -1772,15 +1801,21 @@ void __audit_free(struct task_struct *tsk) > * random task_struct that doesn't doesn't have any meaningful data we > * need to log via audit_log_exit(). > */ > - if (tsk == current && !context->dummy && > - context->context == AUDIT_CTX_SYSCALL) { > + if (tsk == current && !context->dummy) { > context->return_valid = AUDITSC_INVALID; > context->return_code = 0; > - > - audit_filter_syscall(tsk, context); > - audit_filter_inodes(tsk, context); > - if (context->current_state == AUDIT_RECORD_CONTEXT) > - audit_log_exit(); > + if (context->context == AUDIT_CTX_SYSCALL) { > + audit_filter_syscall(tsk, context); > + audit_filter_inodes(tsk, context); > + if (context->current_state == AUDIT_RECORD_CONTEXT) > + audit_log_exit(); > + } else if (context->context == AUDIT_CTX_URING) { > + /* TODO: verify this case is real and valid */ > + audit_filter_uring(tsk, context); > + audit_filter_inodes(tsk, context); > + if (context->current_state == AUDIT_RECORD_CONTEXT) > + audit_log_uring(context); > + } > } > > audit_set_context(tsk, NULL); > @@ -1861,12 +1896,6 @@ void __audit_uring_exit(int success, long code) > { > struct audit_context *ctx = audit_context(); > > - /* > - * TODO: At some point we will likely want to filter on io_uring ops > - * and other things similar to what we do for syscalls, but that > - * is something for another day; just record what we can here. > - */ > - > if (!ctx || ctx->dummy) > goto out; > if (ctx->context == AUDIT_CTX_SYSCALL) { > @@ -1891,6 +1920,8 @@ void __audit_uring_exit(int success, long code) > * the behavior here. > */ > audit_filter_syscall(current, ctx); > + if (ctx->current_state != AUDIT_RECORD_CONTEXT) > + audit_filter_uring(current, ctx); > audit_filter_inodes(current, ctx); > if (ctx->current_state != AUDIT_RECORD_CONTEXT) > return; > @@ -1899,7 +1930,9 @@ void __audit_uring_exit(int success, long code) > return; > } > #if 1 > - /* XXX - temporary hack to force record generation */ > + /* XXX - temporary hack to force record generation, we are leaving this > + * enabled, but if you want to actually test the filtering you > + * need to disable this #if/#endif block */ > ctx->current_state = AUDIT_RECORD_CONTEXT; > #endif > > @@ -1907,6 +1940,8 @@ void __audit_uring_exit(int success, long code) > if (!list_empty(&ctx->killed_trees)) > audit_kill_trees(ctx); > > + /* run through both filters to ensure we set the filterkey properly */ > + audit_filter_uring(current, ctx); > audit_filter_inodes(current, ctx); > if (ctx->current_state != AUDIT_RECORD_CONTEXT) > goto out; > > -- > Linux-audit mailing list > Linux-audit@redhat.com > https://listman.redhat.com/mailman/listinfo/linux-audit - RGB -- Richard Guy Briggs <rgb@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635
On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote: > On 2021-05-21 17:50, Paul Moore wrote: > > WARNING - This is a work in progress and should not be merged > > anywhere important. It is almost surely not complete, and while it > > probably compiles it likely hasn't been booted and will do terrible > > things. You have been warned. > > > > This patch adds basic audit io_uring filtering, using as much of the > > existing audit filtering infrastructure as possible. In order to do > > this we reuse the audit filter rule's syscall mask for the io_uring > > operation and we create a new filter for io_uring operations as > > AUDIT_FILTER_URING_EXIT/audit_filter_list[7]. > > > > <TODO - provide some additional guidance for the userspace tools> > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > --- > > include/uapi/linux/audit.h | 3 +- > > kernel/auditfilter.c | 4 ++- > > kernel/auditsc.c | 65 ++++++++++++++++++++++++++++++++++---------- > > 3 files changed, 55 insertions(+), 17 deletions(-) ... > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > index d8aa2c690bf9..4f6ab34020fb 100644 > > --- a/kernel/auditsc.c > > +++ b/kernel/auditsc.c > > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val) > > return rule->mask[word] & bit; > > } > > > > +/** > > + * audit_filter_uring - apply filters to an io_uring operation > > + * @tsk: associated task > > + * @ctx: audit context > > + */ > > +static void audit_filter_uring(struct task_struct *tsk, > > + struct audit_context *ctx) > > +{ > > + struct audit_entry *e; > > + enum audit_state state; > > + > > + if (auditd_test_task(tsk)) > > + return; > > Is this necessary? auditd and auditctl don't (intentionally) use any > io_uring functionality. Is it possible it might inadvertantly use some > by virtue of libc or other library calls now or in the future? I think the better question is what harm does it do? Yes, I'm not aware of an auditd implementation that currently makes use of io_uring, but it is also not inconceivable some future implementation might want to make use of it and given the disjoint nature of kernel and userspace development I don't want the kernel to block such developments. However, if you can think of a reason why having this check here is bad I'm listening (note: we are already in the slow path here so having the additional check isn't an issue as far as I'm concerned). As a reminder, auditd_test_task() only returns true/1 if the task is registered with the audit subsystem as an auditd connection, an auditctl process should not cause this function to return true. > > + rcu_read_lock(); > > + list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT], > > + list) { > > + if (audit_in_mask(&e->rule, ctx->uring_op) && > > While this seems like the most obvious approach given the parallels > between syscalls and io_uring operations, as coded here it won't work > due to the different mappings of syscall numbers and io_uring > operations unless we re-use the auditctl -S field with raw io_uring > operation numbers in the place of syscall numbers. This should have > been obvious to me when I first looked at this patch. It became obvious > when I started looking at the userspace auditctl.c. FWIW, my intention was to treat io_uring opcodes exactly like we treat syscall numbers. Yes, this would potentially be an issue if we wanted to combine syscalls and io_uring opcodes into one filter, but why would we ever want to do that? Combining the two into one filter not only makes the filter lists longer than needed (we will always know if we are filtering on a syscall or io_uring op) and complicates the filter rule processing. Or is there a problem with this that I'm missing? > The easy first step would be to use something like this: > auditctl -a uring,always -S 18,28 -F key=uring_open > to monitor file open commands only. The same is not yet possible for > the perm field, but there are so few io_uring ops at this point compared > with syscalls that it might be manageable. The arch is irrelevant since > io_uring operation numbers are identical across all hardware as far as I > can tell. Most of the rest of the fields should make sense if they do > for a syscall rule. I've never been a fan of audit's "perm" filtering; I've always felt there were better ways to handle that so I'm not overly upset that we are skipping that functionality with this initial support. If it becomes a problem in the future we can always add that support at a later date. I currently fear that just getting io_uring and audit to coexist is going to be a large enough problem in the immediate future. > Here's a sample of userspace code to support this > patch: > https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96 > https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0 Great, thank you. I haven't grabbed a copy yet for testing, but I will. > If we abuse the syscall infrastructure at first, we'd need a transition > plan to coordinate user and kernel switchover to seperate mechanisms for > the two to work together if the need should arise to have both syscall > and uring filters in the same rule. See my comments above, I don't currently see why we would ever want syscall and io_uring filtering to happen in the same rule. Please speak up if you can think of a reason why this would either be needed, or desirable for some reason. > It might be wise to deliberately not support auditctl "-w" (and the > exported audit_add_watch() function) since that is currently hardcoded > to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by > audit_rule_fieldpair_data()] anyways that is more likely to be > deprecated. It also appears to make sense not to support autrace (at > least initially). I'm going to be honest with you and simply say that I've run out of my email/review time in front of the computer on this holiday weekend (blame the lockdown/bpf/lsm discussion <g>) and I need to go for today, but this is something I'll take a look it this coming week. Hopefully the comments above give us something to think/talk about in the meantime. Regardless, thanks for your help on the userspace side of the filtering, that should make testing a lot easier moving forward. -- paul moore www.paul-moore.com
On 2021-05-30 11:26, Paul Moore wrote: > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > On 2021-05-21 17:50, Paul Moore wrote: > > > WARNING - This is a work in progress and should not be merged > > > anywhere important. It is almost surely not complete, and while it > > > probably compiles it likely hasn't been booted and will do terrible > > > things. You have been warned. > > > > > > This patch adds basic audit io_uring filtering, using as much of the > > > existing audit filtering infrastructure as possible. In order to do > > > this we reuse the audit filter rule's syscall mask for the io_uring > > > operation and we create a new filter for io_uring operations as > > > AUDIT_FILTER_URING_EXIT/audit_filter_list[7]. > > > > > > <TODO - provide some additional guidance for the userspace tools> > > > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > --- > > > include/uapi/linux/audit.h | 3 +- > > > kernel/auditfilter.c | 4 ++- > > > kernel/auditsc.c | 65 ++++++++++++++++++++++++++++++++++---------- > > > 3 files changed, 55 insertions(+), 17 deletions(-) > > ... > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > index d8aa2c690bf9..4f6ab34020fb 100644 > > > --- a/kernel/auditsc.c > > > +++ b/kernel/auditsc.c > > > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val) > > > return rule->mask[word] & bit; > > > } > > > > > > +/** > > > + * audit_filter_uring - apply filters to an io_uring operation > > > + * @tsk: associated task > > > + * @ctx: audit context > > > + */ > > > +static void audit_filter_uring(struct task_struct *tsk, > > > + struct audit_context *ctx) > > > +{ > > > + struct audit_entry *e; > > > + enum audit_state state; > > > + > > > + if (auditd_test_task(tsk)) > > > + return; > > > > Is this necessary? auditd and auditctl don't (intentionally) use any > > io_uring functionality. Is it possible it might inadvertantly use some > > by virtue of libc or other library calls now or in the future? > > I think the better question is what harm does it do? Yes, I'm not > aware of an auditd implementation that currently makes use of > io_uring, but it is also not inconceivable some future implementation > might want to make use of it and given the disjoint nature of kernel > and userspace development I don't want the kernel to block such > developments. However, if you can think of a reason why having this > check here is bad I'm listening (note: we are already in the slow path > here so having the additional check isn't an issue as far as I'm > concerned). > > As a reminder, auditd_test_task() only returns true/1 if the task is > registered with the audit subsystem as an auditd connection, an > auditctl process should not cause this function to return true. My main concern was overhead, since the whole goal of io_uring is speed. The chances that audit does use this functionality in the future suggest to me that it is best to leave this check in. > > > + rcu_read_lock(); > > > + list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT], > > > + list) { > > > + if (audit_in_mask(&e->rule, ctx->uring_op) && > > > > While this seems like the most obvious approach given the parallels > > between syscalls and io_uring operations, as coded here it won't work > > due to the different mappings of syscall numbers and io_uring > > operations unless we re-use the auditctl -S field with raw io_uring > > operation numbers in the place of syscall numbers. This should have > > been obvious to me when I first looked at this patch. It became obvious > > when I started looking at the userspace auditctl.c. > > FWIW, my intention was to treat io_uring opcodes exactly like we treat > syscall numbers. Yes, this would potentially be an issue if we wanted > to combine syscalls and io_uring opcodes into one filter, but why > would we ever want to do that? Combining the two into one filter not > only makes the filter lists longer than needed (we will always know if > we are filtering on a syscall or io_uring op) and complicates the > filter rule processing. > > Or is there a problem with this that I'm missing? No, I think you have a good understanding of it. I'm asking hard questions to avoid missing something important. If we can reuse the syscall infrastructure for this then that is extremely helpful (if not lazy, which isn't necessarily a bad thing). It does mean that the io_uring op dictionary will need to live in userspace audit the way it is currently implemented, or provide a flag to indicate it is a syscall number to be translated in the kernel either at the time of rule addition or translated on the fly on rule check in the kernel adding overhead to a critical path. > > The easy first step would be to use something like this: > > auditctl -a uring,always -S 18,28 -F key=uring_open > > to monitor file open commands only. The same is not yet possible for > > the perm field, but there are so few io_uring ops at this point compared > > with syscalls that it might be manageable. The arch is irrelevant since > > io_uring operation numbers are identical across all hardware as far as I > > can tell. Most of the rest of the fields should make sense if they do > > for a syscall rule. > > I've never been a fan of audit's "perm" filtering; I've always felt > there were better ways to handle that so I'm not overly upset that we > are skipping that functionality with this initial support. If it > becomes a problem in the future we can always add that support at a > later date. Ok, I don't see a pressing need to add it initially, but should add a check to block that field from being used to avoid the confusion of unpredictable behaviour should someone try to add a perm filter to a io_uring filter. That should be done protectively in the kernel and proactively in userspace. > I currently fear that just getting io_uring and audit to coexist is > going to be a large enough problem in the immediate future. Agreed. > > Here's a sample of userspace code to support this > > patch: > > https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96 > > https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0 > > Great, thank you. I haven't grabbed a copy yet for testing, but I will. I've added a perm filter block as an additional patch in userspace and updated the tree so that first commit is no longer the top of tree but the branch name is current. I'll add a kernel perm filter check. I just noticed some list checking that is missing in tree and watch in your patch. Suggested fixup patches to follow... > > If we abuse the syscall infrastructure at first, we'd need a transition > > plan to coordinate user and kernel switchover to seperate mechanisms for > > the two to work together if the need should arise to have both syscall > > and uring filters in the same rule. > > See my comments above, I don't currently see why we would ever want > syscall and io_uring filtering to happen in the same rule. Please > speak up if you can think of a reason why this would either be needed, > or desirable for some reason. I think they can be seperate rules for now. Either a syscall rule catching all io_uring ops can be added, or an io_uring rule can be added to catch specific ops. The scenario I was thinking of was catching syscalls of specific io_uring ops. > > It might be wise to deliberately not support auditctl "-w" (and the > > exported audit_add_watch() function) since that is currently hardcoded > > to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by > > audit_rule_fieldpair_data()] anyways that is more likely to be > > deprecated. It also appears to make sense not to support autrace (at > > least initially). > > I'm going to be honest with you and simply say that I've run out of my > email/review time in front of the computer on this holiday weekend > (blame the lockdown/bpf/lsm discussion <g>) and I need to go for > today, but this is something I'll take a look it this coming week. > Hopefully the comments above give us something to think/talk about in > the meantime. I wasn't expecting you to work the weekend. :-) > Regardless, thanks for your help on the userspace side of the > filtering, that should make testing a lot easier moving forward. Standard RFC disclaimers apply. > paul moore - RGB -- Richard Guy Briggs <rgb@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635
On Mon, May 31, 2021 at 9:44 AM Richard Guy Briggs <rgb@redhat.com> wrote: > On 2021-05-30 11:26, Paul Moore wrote: > > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > > On 2021-05-21 17:50, Paul Moore wrote: ... > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > > index d8aa2c690bf9..4f6ab34020fb 100644 > > > > --- a/kernel/auditsc.c > > > > +++ b/kernel/auditsc.c > > > > @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val) > > > > return rule->mask[word] & bit; > > > > } > > > > > > > > +/** > > > > + * audit_filter_uring - apply filters to an io_uring operation > > > > + * @tsk: associated task > > > > + * @ctx: audit context > > > > + */ > > > > +static void audit_filter_uring(struct task_struct *tsk, > > > > + struct audit_context *ctx) > > > > +{ > > > > + struct audit_entry *e; > > > > + enum audit_state state; > > > > + > > > > + if (auditd_test_task(tsk)) > > > > + return; > > > > > > Is this necessary? auditd and auditctl don't (intentionally) use any > > > io_uring functionality. Is it possible it might inadvertantly use some > > > by virtue of libc or other library calls now or in the future? > > > > I think the better question is what harm does it do? Yes, I'm not > > aware of an auditd implementation that currently makes use of > > io_uring, but it is also not inconceivable some future implementation > > might want to make use of it and given the disjoint nature of kernel > > and userspace development I don't want the kernel to block such > > developments. However, if you can think of a reason why having this > > check here is bad I'm listening (note: we are already in the slow path > > here so having the additional check isn't an issue as far as I'm > > concerned). > > > > As a reminder, auditd_test_task() only returns true/1 if the task is > > registered with the audit subsystem as an auditd connection, an > > auditctl process should not cause this function to return true. > > My main concern was overhead, since the whole goal of io_uring is speed. At the point where this test takes place we are already in the audit slow path as far as io_uring is concerned. I understand your concern, but the advantage of being able to effectively use io_uring in the future makes this worth keeping in my opinion. > The chances that audit does use this functionality in the future suggest > to me that it is best to leave this check in. Sounds like we are in agreement. We'll keep it for now. > > > > + rcu_read_lock(); > > > > + list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT], > > > > + list) { > > > > + if (audit_in_mask(&e->rule, ctx->uring_op) && > > > > > > While this seems like the most obvious approach given the parallels > > > between syscalls and io_uring operations, as coded here it won't work > > > due to the different mappings of syscall numbers and io_uring > > > operations unless we re-use the auditctl -S field with raw io_uring > > > operation numbers in the place of syscall numbers. This should have > > > been obvious to me when I first looked at this patch. It became obvious > > > when I started looking at the userspace auditctl.c. > > > > FWIW, my intention was to treat io_uring opcodes exactly like we treat > > syscall numbers. Yes, this would potentially be an issue if we wanted > > to combine syscalls and io_uring opcodes into one filter, but why > > would we ever want to do that? Combining the two into one filter not > > only makes the filter lists longer than needed (we will always know if > > we are filtering on a syscall or io_uring op) and complicates the > > filter rule processing. > > > > Or is there a problem with this that I'm missing? > > No, I think you have a good understanding of it. I'm asking hard > questions to avoid missing something important. If we can reuse the > syscall infrastructure for this then that is extremely helpful (if not > lazy, which isn't necessarily a bad thing). It does mean that the > io_uring op dictionary will need to live in userspace audit the way it > is currently implemented .... Which I currently believe is the right thing to do. > > > The easy first step would be to use something like this: > > > auditctl -a uring,always -S 18,28 -F key=uring_open > > > to monitor file open commands only. The same is not yet possible for > > > the perm field, but there are so few io_uring ops at this point compared > > > with syscalls that it might be manageable. The arch is irrelevant since > > > io_uring operation numbers are identical across all hardware as far as I > > > can tell. Most of the rest of the fields should make sense if they do > > > for a syscall rule. > > > > I've never been a fan of audit's "perm" filtering; I've always felt > > there were better ways to handle that so I'm not overly upset that we > > are skipping that functionality with this initial support. If it > > becomes a problem in the future we can always add that support at a > > later date. > > Ok, I don't see a pressing need to add it initially, but should add a > check to block that field from being used to avoid the confusion of > unpredictable behaviour should someone try to add a perm filter to a > io_uring filter. That should be done protectively in the kernel and > proactively in userspace. Sure, that's reasonable. > > > Here's a sample of userspace code to support this > > > patch: > > > https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96 > > > https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0 > > > > Great, thank you. I haven't grabbed a copy yet for testing, but I will. > > I've added a perm filter block as an additional patch in userspace and > updated the tree so that first commit is no longer the top of tree but > the branch name is current. > > I'll add a kernel perm filter check. > > I just noticed some list checking that is missing in tree and watch in > your patch. > > Suggested fixup patches to follow... I see them, thank you, comments will follow over there. Although to be honest I'm mostly focusing on the testing right now while we wait to hear back from Jens on what he is willing to accept regarding audit calls in io_issue_sqe(). If we can't do the _entry()/_exit() calls then this work is pretty much dead and we just have to deal with it in Kconfig. I might make one last, clean patchset and put it in a branch for the distros that want to carry the patchset, but it isn't clear to me that it is something I would want to maintain long term. Long running out of tree patches are generally A Bad Idea. > > > If we abuse the syscall infrastructure at first, we'd need a transition > > > plan to coordinate user and kernel switchover to seperate mechanisms for > > > the two to work together if the need should arise to have both syscall > > > and uring filters in the same rule. > > > > See my comments above, I don't currently see why we would ever want > > syscall and io_uring filtering to happen in the same rule. Please > > speak up if you can think of a reason why this would either be needed, > > or desirable for some reason. > > I think they can be seperate rules for now. Either a syscall rule > catching all io_uring ops can be added, or an io_uring rule can be added > to catch specific ops. The scenario I was thinking of was catching > syscalls of specific io_uring ops. Perhaps I'm misunderstand you, but that scenario really shouldn't exist. The io_uring ops function independently of syscalls; you can *submit* io_uring ops via io_uring_enter(), but they are not guaranteed to be dispatched synchronously (obviously), and given the cred shenanigans that can happen with io_uring there is no guarantee the filters would even be applicable. It isn't an issue of "can" the filters be separate, they *have* to be separate.
On 2021-06-01 21:40, Paul Moore wrote: > On Mon, May 31, 2021 at 9:44 AM Richard Guy Briggs <rgb@redhat.com> wrote: > > On 2021-05-30 11:26, Paul Moore wrote: > > > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > > > On 2021-05-21 17:50, Paul Moore wrote: > > > > If we abuse the syscall infrastructure at first, we'd need a transition > > > > plan to coordinate user and kernel switchover to seperate mechanisms for > > > > the two to work together if the need should arise to have both syscall > > > > and uring filters in the same rule. > > > > > > See my comments above, I don't currently see why we would ever want > > > syscall and io_uring filtering to happen in the same rule. Please > > > speak up if you can think of a reason why this would either be needed, > > > or desirable for some reason. > > > > I think they can be seperate rules for now. Either a syscall rule > > catching all io_uring ops can be added, or an io_uring rule can be added > > to catch specific ops. The scenario I was thinking of was catching > > syscalls of specific io_uring ops. > > Perhaps I'm misunderstand you, but that scenario really shouldn't > exist. The io_uring ops function independently of syscalls; you can > *submit* io_uring ops via io_uring_enter(), but they are not > guaranteed to be dispatched synchronously (obviously), and given the > cred shenanigans that can happen with io_uring there is no guarantee > the filters would even be applicable. That wasn't my understanding. There are a number of io_uring calls starting with at least open that are currently synchronous (but may become async in future) that we may want to single out which would be a specific io_uring syscall with a specific io_uring opcode. I guess that particular situation would be caught by the io_uring opcode triggering an event that includes SYSCALL and URINGOP records. > It isn't an issue of "can" the filters be separate, they *have* to be separate. > paul moore - RGB -- Richard Guy Briggs <rgb@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635
On Wed, Jun 2, 2021 at 11:38 AM Richard Guy Briggs <rgb@redhat.com> wrote: > On 2021-06-01 21:40, Paul Moore wrote: > > On Mon, May 31, 2021 at 9:44 AM Richard Guy Briggs <rgb@redhat.com> wrote: > > > On 2021-05-30 11:26, Paul Moore wrote: > > > > On Fri, May 28, 2021 at 6:36 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > > > > On 2021-05-21 17:50, Paul Moore wrote: > > > > > If we abuse the syscall infrastructure at first, we'd need a transition > > > > > plan to coordinate user and kernel switchover to seperate mechanisms for > > > > > the two to work together if the need should arise to have both syscall > > > > > and uring filters in the same rule. > > > > > > > > See my comments above, I don't currently see why we would ever want > > > > syscall and io_uring filtering to happen in the same rule. Please > > > > speak up if you can think of a reason why this would either be needed, > > > > or desirable for some reason. > > > > > > I think they can be seperate rules for now. Either a syscall rule > > > catching all io_uring ops can be added, or an io_uring rule can be added > > > to catch specific ops. The scenario I was thinking of was catching > > > syscalls of specific io_uring ops. > > > > Perhaps I'm misunderstand you, but that scenario really shouldn't > > exist. The io_uring ops function independently of syscalls; you can > > *submit* io_uring ops via io_uring_enter(), but they are not > > guaranteed to be dispatched synchronously (obviously), and given the > > cred shenanigans that can happen with io_uring there is no guarantee > > the filters would even be applicable. > > That wasn't my understanding. There are a number of io_uring calls > starting with at least open that are currently synchronous (but may > become async in future) that we may want to single out which would be a > specific io_uring syscall with a specific io_uring opcode. I guess > that particular situation would be caught by the io_uring opcode > triggering an event that includes SYSCALL and URINGOP records. The only io_uring syscalls are io_uring_setup(2), io_uring_enter(2), etc., the stuff that is dispatched in io_issue_sqe() are the io_uring ops/opcodes/whatever. They *look* like syscalls but they are not and we have to treat them differently. > > It isn't an issue of "can" the filters be separate, they *have* to be separate.
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index b26e0c435e8b..621eb3c1076e 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -167,8 +167,9 @@ #define AUDIT_FILTER_EXCLUDE 0x05 /* Apply rule before record creation */ #define AUDIT_FILTER_TYPE AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */ #define AUDIT_FILTER_FS 0x06 /* Apply rule at __audit_inode_child */ +#define AUDIT_FILTER_URING_EXIT 0x07 /* Apply rule at io_uring op exit */ -#define AUDIT_NR_FILTERS 7 +#define AUDIT_NR_FILTERS 8 #define AUDIT_FILTER_PREPEND 0x10 /* Prepend to front of list */ diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c index db2c6b59dfc3..c21119c00504 100644 --- a/kernel/auditfilter.c +++ b/kernel/auditfilter.c @@ -44,7 +44,8 @@ struct list_head audit_filter_list[AUDIT_NR_FILTERS] = { LIST_HEAD_INIT(audit_filter_list[4]), LIST_HEAD_INIT(audit_filter_list[5]), LIST_HEAD_INIT(audit_filter_list[6]), -#if AUDIT_NR_FILTERS != 7 + LIST_HEAD_INIT(audit_filter_list[7]), +#if AUDIT_NR_FILTERS != 8 #error Fix audit_filter_list initialiser #endif }; @@ -56,6 +57,7 @@ static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = { LIST_HEAD_INIT(audit_rules_list[4]), LIST_HEAD_INIT(audit_rules_list[5]), LIST_HEAD_INIT(audit_rules_list[6]), + LIST_HEAD_INIT(audit_rules_list[7]), }; DEFINE_MUTEX(audit_filter_mutex); diff --git a/kernel/auditsc.c b/kernel/auditsc.c index d8aa2c690bf9..4f6ab34020fb 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val) return rule->mask[word] & bit; } +/** + * audit_filter_uring - apply filters to an io_uring operation + * @tsk: associated task + * @ctx: audit context + */ +static void audit_filter_uring(struct task_struct *tsk, + struct audit_context *ctx) +{ + struct audit_entry *e; + enum audit_state state; + + if (auditd_test_task(tsk)) + return; + + rcu_read_lock(); + list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT], + list) { + if (audit_in_mask(&e->rule, ctx->uring_op) && + audit_filter_rules(tsk, &e->rule, ctx, NULL, &state, + false)) { + rcu_read_unlock(); + ctx->current_state = state; + return; + } + } + rcu_read_unlock(); + return; +} + /* At syscall exit time, this filter is called if the audit_state is * not low enough that auditing cannot take place, but is also not * high enough that we already know we have to write an audit record @@ -1754,7 +1783,7 @@ static void audit_log_exit(void) * __audit_free - free a per-task audit context * @tsk: task whose audit context block to free * - * Called from copy_process and do_exit + * Called from copy_process, do_exit, and the io_uring code */ void __audit_free(struct task_struct *tsk) { @@ -1772,15 +1801,21 @@ void __audit_free(struct task_struct *tsk) * random task_struct that doesn't doesn't have any meaningful data we * need to log via audit_log_exit(). */ - if (tsk == current && !context->dummy && - context->context == AUDIT_CTX_SYSCALL) { + if (tsk == current && !context->dummy) { context->return_valid = AUDITSC_INVALID; context->return_code = 0; - - audit_filter_syscall(tsk, context); - audit_filter_inodes(tsk, context); - if (context->current_state == AUDIT_RECORD_CONTEXT) - audit_log_exit(); + if (context->context == AUDIT_CTX_SYSCALL) { + audit_filter_syscall(tsk, context); + audit_filter_inodes(tsk, context); + if (context->current_state == AUDIT_RECORD_CONTEXT) + audit_log_exit(); + } else if (context->context == AUDIT_CTX_URING) { + /* TODO: verify this case is real and valid */ + audit_filter_uring(tsk, context); + audit_filter_inodes(tsk, context); + if (context->current_state == AUDIT_RECORD_CONTEXT) + audit_log_uring(context); + } } audit_set_context(tsk, NULL); @@ -1861,12 +1896,6 @@ void __audit_uring_exit(int success, long code) { struct audit_context *ctx = audit_context(); - /* - * TODO: At some point we will likely want to filter on io_uring ops - * and other things similar to what we do for syscalls, but that - * is something for another day; just record what we can here. - */ - if (!ctx || ctx->dummy) goto out; if (ctx->context == AUDIT_CTX_SYSCALL) { @@ -1891,6 +1920,8 @@ void __audit_uring_exit(int success, long code) * the behavior here. */ audit_filter_syscall(current, ctx); + if (ctx->current_state != AUDIT_RECORD_CONTEXT) + audit_filter_uring(current, ctx); audit_filter_inodes(current, ctx); if (ctx->current_state != AUDIT_RECORD_CONTEXT) return; @@ -1899,7 +1930,9 @@ void __audit_uring_exit(int success, long code) return; } #if 1 - /* XXX - temporary hack to force record generation */ + /* XXX - temporary hack to force record generation, we are leaving this + * enabled, but if you want to actually test the filtering you + * need to disable this #if/#endif block */ ctx->current_state = AUDIT_RECORD_CONTEXT; #endif @@ -1907,6 +1940,8 @@ void __audit_uring_exit(int success, long code) if (!list_empty(&ctx->killed_trees)) audit_kill_trees(ctx); + /* run through both filters to ensure we set the filterkey properly */ + audit_filter_uring(current, ctx); audit_filter_inodes(current, ctx); if (ctx->current_state != AUDIT_RECORD_CONTEXT) goto out;
WARNING - This is a work in progress and should not be merged anywhere important. It is almost surely not complete, and while it probably compiles it likely hasn't been booted and will do terrible things. You have been warned. This patch adds basic audit io_uring filtering, using as much of the existing audit filtering infrastructure as possible. In order to do this we reuse the audit filter rule's syscall mask for the io_uring operation and we create a new filter for io_uring operations as AUDIT_FILTER_URING_EXIT/audit_filter_list[7]. <TODO - provide some additional guidance for the userspace tools> Signed-off-by: Paul Moore <paul@paul-moore.com> --- include/uapi/linux/audit.h | 3 +- kernel/auditfilter.c | 4 ++- kernel/auditsc.c | 65 ++++++++++++++++++++++++++++++++++---------- 3 files changed, 55 insertions(+), 17 deletions(-)