Message ID | 162294979358.65917.10329704371767126341.stgit@olly (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] audit: reduce the number of kauditd_thread wakeups | expand |
On 2021-06-05 23:23, Paul Moore wrote: > [NOTE: As this is an RFC patch, I wanted to add some commentary at > the top of the patch description explaining where this patch came > from and what testing has been done. This patch is a derivative > of another unreleased patch that removed all of the wake up calls > from the audit_log_start() and audit_log_end() functions; while > that patch worked well, there we some record losees with high > volume burst traffic in the case of `auditctl -a task,never` or > CONFIG_AUDITSYSCALL=n. As this patch doesn't completely remove > the wake up calls these two cases should behave similarly to how > they do today. As far as testing is concerned, this patch passes > both the audit-testsuite and selinux-testsuite without problem and > with expected audit queue losses (no losses outside of the tests > which attempt to generate losses). This patch also preserves the > ability for the system to continue to function, in the > `auditctl -a exit,always -S all` case, albeit very slowly.] > > When audit is enabled, the kauditd_thread() function runs in its own > kernel thread, emptying the audit record queue and sending the > entries to userspace via different means. As part of its normal > processing it goes to sleep after emptying the queue and waits for > the audit functions to wake it. > > The current audit kernel code attempts to wake the kauditd thread > after each entry is added to the queue. Considering that a single > syscall can have multiple audit records, with each wake attempt > involving locking and additional processing, this can be rather > wasteful. In an effort to limit the number of wake attempts without > unnecessarily risking the audit queue size this patch does the > following: > > * In the case of syscall auditing the wake up call is moved from > audit_log_end() to audit_log_exit() meaning that the kauditd > thread is only woken once, at the end of the syscall, after all of > the syscall's audit records have been added to the queue. > > * In the case where audit records are generated outside of a syscall > context, the wake up call in audit_log_end() is preserved in order > to ensure that these records do not suffer any additional latency > or put unnecessary pressure on the queue. This is done through > some additional checking in audit_log_start() and an additional > flag in the audit_buffer struct. > > * The audit_log_start() function never attempts to wake the kauditd > thread. In the current code this is only done when the queue is > under pressure, but at that point it is extremely likely that the > thread is already active or scheduled, do we should be able to > safely remove this wake attempt. > > * Always wake the kauditd thread after processing management and > user records in audit_receive_msg(). This should be relatively > low frequency compared to the other audit sources, and doing a > wake call here helps keep record latency low and the queue size > in check. > > * The kauditd thread itself is now a bit better at handling high > volume audit record bursts. Previously after emptying the queue > the thread would wake every process that was blocked and then go > to sleep; possibly going to sleep just a flood of new records > are added to the queue. The new kauditd thread approach wakes all > of the blocked processes and then reschedules itself in > anticipation of new records. When the thread returns to execution > it checks the queue and if there are any records present it > immediately starts processing them, if the queue is empty the > kauditd thread goes back to sleep. > > Signed-off-by: Paul Moore <paul@paul-moore.com> This looks good to me. Thank you for the thorough description. I can see how this work was provoked given some of the other work in progress and some of the minor changes that will be needed to those other works as a result. Acked-by: Richard Guy Briggs <rgb@redhat.com> > --- > kernel/audit.c | 66 ++++++++++++++++++++++++++++++++++++++---------------- > kernel/audit.h | 2 ++ > kernel/auditsc.c | 2 ++ > 3 files changed, 51 insertions(+), 19 deletions(-) > > diff --git a/kernel/audit.c b/kernel/audit.c > index 551a394bc8f42..7c3368835bb71 100644 > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -197,9 +197,10 @@ static struct audit_ctl_mutex { > * to place it on a transmit queue. Multiple audit_buffers can be in > * use simultaneously. */ > struct audit_buffer { > - struct sk_buff *skb; /* formatted skb ready to send */ > - struct audit_context *ctx; /* NULL or associated context */ > - gfp_t gfp_mask; > + struct sk_buff *skb; > + struct audit_context *ctx; > + gfp_t gfp_mask; > + unsigned int kauditd_wake:1; > }; > > struct audit_reply { > @@ -804,6 +805,17 @@ static void kauditd_send_multicast_skb(struct sk_buff *skb) > nlmsg_multicast(sock, copy, 0, AUDIT_NLGRP_READLOG, GFP_KERNEL); > } > > +/** > + * kauditd_wakeup - Wake the kauditd_thread > + * > + * Description: > + * Wake up the kauditd_thread thread so that it can process the audit queues. > + */ > +void kauditd_wakeup(void) > +{ > + wake_up_interruptible(&kauditd_wait); > +} > + > /** > * kauditd_thread - Worker thread to send audit records to userspace > * @dummy: unused > @@ -832,6 +844,7 @@ static int kauditd_thread(void *dummy) > portid = ac->portid; > rcu_read_unlock(); > > +all_queues: > /* attempt to flush the hold queue */ > rc = kauditd_send_queue(sk, portid, > &audit_hold_queue, UNICAST_RETRIES, > @@ -861,25 +874,33 @@ static int kauditd_thread(void *dummy) > kauditd_send_multicast_skb, > (sk ? > kauditd_retry_skb : kauditd_hold_skb)); > - if (ac && rc < 0) > + if (ac && rc < 0) { > auditd_reset(ac); > - sk = NULL; > + goto disconnect; > + } > + > + /* we have processed all the queues so wake everyone who might > + * be waiting, reschedule ourselves to give others a chance to > + * run, and check the queue once again when we return; if > + * everything is quiet then we can go back to sleep */ > + wake_up(&audit_backlog_wait); > + cond_resched(); > + if (skb_queue_len(&audit_queue)) > + goto all_queues; > > +disconnect: > /* drop our netns reference, no auditd sends past this line */ > + sk = NULL; > if (net) { > put_net(net); > net = NULL; > } > > - /* we have processed all the queues so wake everyone */ > - wake_up(&audit_backlog_wait); > - > /* NOTE: we want to wake up if there is anything on the queue, > * regardless of if an auditd is connected, as we need to > * do the multicast send and rotate records from the > * main queue to the retry/hold queues */ > - wait_event_freezable(kauditd_wait, > - (skb_queue_len(&audit_queue) ? 1 : 0)); > + wait_event_freezable(kauditd_wait, skb_queue_len(&audit_queue)); > } > > return 0; > @@ -1509,6 +1530,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh) > break; > } > > + /* poke the kauditd_thread to handle any records we generated */ > + kauditd_wakeup(); > + > return err < 0 ? err : 0; > } > > @@ -1750,6 +1774,7 @@ static struct audit_buffer *audit_buffer_alloc(struct audit_context *ctx, > > ab->ctx = ctx; > ab->gfp_mask = gfp_mask; > + ab->kauditd_wake = 0; > > return ab; > > @@ -1831,14 +1856,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, > > while (audit_backlog_limit && > (skb_queue_len(&audit_queue) > audit_backlog_limit)) { > - /* wake kauditd to try and flush the queue */ > - wake_up_interruptible(&kauditd_wait); > - > /* sleep if we are allowed and we haven't exhausted our > * backlog wait limit */ > if (gfpflags_allow_blocking(gfp_mask) && (stime > 0)) { > long rtime = stime; > - > DECLARE_WAITQUEUE(wait, current); > > add_wait_queue_exclusive(&audit_backlog_wait, > @@ -1864,10 +1885,14 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, > return NULL; > } > > - audit_get_stamp(ab->ctx, &t, &serial); > /* cancel dummy context to enable supporting records */ > if (ctx) > ctx->dummy = 0; > + /* force a kauditd wakeup if the normal syscall exit isn't doing it */ > + if (!ctx || !ctx->in_syscall) > + ab->kauditd_wake = 1; > + > + audit_get_stamp(ab->ctx, &t, &serial); > audit_log_format(ab, "audit(%llu.%03lu:%u): ", > (unsigned long long)t.tv_sec, t.tv_nsec/1000000, serial); > > @@ -2365,8 +2390,8 @@ int audit_signal_info(int sig, struct task_struct *t) > * > * We can not do a netlink send inside an irq context because it blocks (last > * arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on a > - * queue and a kthread is scheduled to remove them from the queue outside the > - * irq context. May be called in any context. > + * queue where a kthread processes the buffer from outside the irq context. > + * May be called in any context. > */ > void audit_log_end(struct audit_buffer *ab) > { > @@ -2385,9 +2410,12 @@ void audit_log_end(struct audit_buffer *ab) > nlh = nlmsg_hdr(skb); > nlh->nlmsg_len = skb->len - NLMSG_HDRLEN; > > - /* queue the netlink packet and poke the kauditd thread */ > + /* queue the netlink packet */ > skb_queue_tail(&audit_queue, skb); > - wake_up_interruptible(&kauditd_wait); > + > + /* only wakeup kauditd if we can't do it later */ > + if (ab->kauditd_wake) > + kauditd_wakeup(); > } else > audit_log_lost("rate limit exceeded"); > > diff --git a/kernel/audit.h b/kernel/audit.h > index 3b9c0945225a1..b5f677017e975 100644 > --- a/kernel/audit.h > +++ b/kernel/audit.h > @@ -213,6 +213,8 @@ static inline int audit_hash_ino(u32 ino) > /* Indicates that audit should log the full pathname. */ > #define AUDIT_NAME_FULL -1 > > +extern void kauditd_wakeup(void); > + > extern int audit_match_class(int class, unsigned syscall); > extern int audit_comparator(const u32 left, const u32 op, const u32 right); > extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right); > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index 47fb48f42c934..bad776497438f 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -1600,6 +1600,8 @@ static void audit_log_exit(void) > audit_log_end(ab); > if (call_panic) > audit_panic("error converting sid to string"); > + > + kauditd_wakeup(); > } > > /** > > -- > Linux-audit mailing list > Linux-audit@redhat.com > https://listman.redhat.com/mailman/listinfo/linux-audit - RGB -- Richard Guy Briggs <rgb@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635 -- Linux-audit mailing list Linux-audit@redhat.com https://listman.redhat.com/mailman/listinfo/linux-audit
On Mon, Jun 7, 2021 at 2:40 PM Richard Guy Briggs <rgb@redhat.com> wrote: > On 2021-06-05 23:23, Paul Moore wrote: > > [NOTE: As this is an RFC patch, I wanted to add some commentary at > > the top of the patch description explaining where this patch came > > from and what testing has been done. This patch is a derivative > > of another unreleased patch that removed all of the wake up calls > > from the audit_log_start() and audit_log_end() functions; while > > that patch worked well, there we some record losees with high > > volume burst traffic in the case of `auditctl -a task,never` or > > CONFIG_AUDITSYSCALL=n. As this patch doesn't completely remove > > the wake up calls these two cases should behave similarly to how > > they do today. As far as testing is concerned, this patch passes > > both the audit-testsuite and selinux-testsuite without problem and > > with expected audit queue losses (no losses outside of the tests > > which attempt to generate losses). This patch also preserves the > > ability for the system to continue to function, in the > > `auditctl -a exit,always -S all` case, albeit very slowly.] > > > > When audit is enabled, the kauditd_thread() function runs in its own > > kernel thread, emptying the audit record queue and sending the > > entries to userspace via different means. As part of its normal > > processing it goes to sleep after emptying the queue and waits for > > the audit functions to wake it. > > > > The current audit kernel code attempts to wake the kauditd thread > > after each entry is added to the queue. Considering that a single > > syscall can have multiple audit records, with each wake attempt > > involving locking and additional processing, this can be rather > > wasteful. In an effort to limit the number of wake attempts without > > unnecessarily risking the audit queue size this patch does the > > following: > > > > * In the case of syscall auditing the wake up call is moved from > > audit_log_end() to audit_log_exit() meaning that the kauditd > > thread is only woken once, at the end of the syscall, after all of > > the syscall's audit records have been added to the queue. > > > > * In the case where audit records are generated outside of a syscall > > context, the wake up call in audit_log_end() is preserved in order > > to ensure that these records do not suffer any additional latency > > or put unnecessary pressure on the queue. This is done through > > some additional checking in audit_log_start() and an additional > > flag in the audit_buffer struct. > > > > * The audit_log_start() function never attempts to wake the kauditd > > thread. In the current code this is only done when the queue is > > under pressure, but at that point it is extremely likely that the > > thread is already active or scheduled, do we should be able to > > safely remove this wake attempt. > > > > * Always wake the kauditd thread after processing management and > > user records in audit_receive_msg(). This should be relatively > > low frequency compared to the other audit sources, and doing a > > wake call here helps keep record latency low and the queue size > > in check. > > > > * The kauditd thread itself is now a bit better at handling high > > volume audit record bursts. Previously after emptying the queue > > the thread would wake every process that was blocked and then go > > to sleep; possibly going to sleep just a flood of new records > > are added to the queue. The new kauditd thread approach wakes all > > of the blocked processes and then reschedules itself in > > anticipation of new records. When the thread returns to execution > > it checks the queue and if there are any records present it > > immediately starts processing them, if the queue is empty the > > kauditd thread goes back to sleep. > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > This looks good to me. Thank you for the thorough description. I can > see how this work was provoked given some of the other work in progress > and some of the minor changes that will be needed to those other works > as a result. > > Acked-by: Richard Guy Briggs <rgb@redhat.com> Thanks for taking a look. As a FYI, I'm going to sit on this until the io_uring patches are merged as there is some overlap, and I want to possibly play with this a bit more as well. I wanted to post this to see if people had any strong feelings about this, and of course just to get it "out there" so I wouldn't forget about it :)
diff --git a/kernel/audit.c b/kernel/audit.c index 551a394bc8f42..7c3368835bb71 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -197,9 +197,10 @@ static struct audit_ctl_mutex { * to place it on a transmit queue. Multiple audit_buffers can be in * use simultaneously. */ struct audit_buffer { - struct sk_buff *skb; /* formatted skb ready to send */ - struct audit_context *ctx; /* NULL or associated context */ - gfp_t gfp_mask; + struct sk_buff *skb; + struct audit_context *ctx; + gfp_t gfp_mask; + unsigned int kauditd_wake:1; }; struct audit_reply { @@ -804,6 +805,17 @@ static void kauditd_send_multicast_skb(struct sk_buff *skb) nlmsg_multicast(sock, copy, 0, AUDIT_NLGRP_READLOG, GFP_KERNEL); } +/** + * kauditd_wakeup - Wake the kauditd_thread + * + * Description: + * Wake up the kauditd_thread thread so that it can process the audit queues. + */ +void kauditd_wakeup(void) +{ + wake_up_interruptible(&kauditd_wait); +} + /** * kauditd_thread - Worker thread to send audit records to userspace * @dummy: unused @@ -832,6 +844,7 @@ static int kauditd_thread(void *dummy) portid = ac->portid; rcu_read_unlock(); +all_queues: /* attempt to flush the hold queue */ rc = kauditd_send_queue(sk, portid, &audit_hold_queue, UNICAST_RETRIES, @@ -861,25 +874,33 @@ static int kauditd_thread(void *dummy) kauditd_send_multicast_skb, (sk ? kauditd_retry_skb : kauditd_hold_skb)); - if (ac && rc < 0) + if (ac && rc < 0) { auditd_reset(ac); - sk = NULL; + goto disconnect; + } + + /* we have processed all the queues so wake everyone who might + * be waiting, reschedule ourselves to give others a chance to + * run, and check the queue once again when we return; if + * everything is quiet then we can go back to sleep */ + wake_up(&audit_backlog_wait); + cond_resched(); + if (skb_queue_len(&audit_queue)) + goto all_queues; +disconnect: /* drop our netns reference, no auditd sends past this line */ + sk = NULL; if (net) { put_net(net); net = NULL; } - /* we have processed all the queues so wake everyone */ - wake_up(&audit_backlog_wait); - /* NOTE: we want to wake up if there is anything on the queue, * regardless of if an auditd is connected, as we need to * do the multicast send and rotate records from the * main queue to the retry/hold queues */ - wait_event_freezable(kauditd_wait, - (skb_queue_len(&audit_queue) ? 1 : 0)); + wait_event_freezable(kauditd_wait, skb_queue_len(&audit_queue)); } return 0; @@ -1509,6 +1530,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh) break; } + /* poke the kauditd_thread to handle any records we generated */ + kauditd_wakeup(); + return err < 0 ? err : 0; } @@ -1750,6 +1774,7 @@ static struct audit_buffer *audit_buffer_alloc(struct audit_context *ctx, ab->ctx = ctx; ab->gfp_mask = gfp_mask; + ab->kauditd_wake = 0; return ab; @@ -1831,14 +1856,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, while (audit_backlog_limit && (skb_queue_len(&audit_queue) > audit_backlog_limit)) { - /* wake kauditd to try and flush the queue */ - wake_up_interruptible(&kauditd_wait); - /* sleep if we are allowed and we haven't exhausted our * backlog wait limit */ if (gfpflags_allow_blocking(gfp_mask) && (stime > 0)) { long rtime = stime; - DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(&audit_backlog_wait, @@ -1864,10 +1885,14 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, return NULL; } - audit_get_stamp(ab->ctx, &t, &serial); /* cancel dummy context to enable supporting records */ if (ctx) ctx->dummy = 0; + /* force a kauditd wakeup if the normal syscall exit isn't doing it */ + if (!ctx || !ctx->in_syscall) + ab->kauditd_wake = 1; + + audit_get_stamp(ab->ctx, &t, &serial); audit_log_format(ab, "audit(%llu.%03lu:%u): ", (unsigned long long)t.tv_sec, t.tv_nsec/1000000, serial); @@ -2365,8 +2390,8 @@ int audit_signal_info(int sig, struct task_struct *t) * * We can not do a netlink send inside an irq context because it blocks (last * arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on a - * queue and a kthread is scheduled to remove them from the queue outside the - * irq context. May be called in any context. + * queue where a kthread processes the buffer from outside the irq context. + * May be called in any context. */ void audit_log_end(struct audit_buffer *ab) { @@ -2385,9 +2410,12 @@ void audit_log_end(struct audit_buffer *ab) nlh = nlmsg_hdr(skb); nlh->nlmsg_len = skb->len - NLMSG_HDRLEN; - /* queue the netlink packet and poke the kauditd thread */ + /* queue the netlink packet */ skb_queue_tail(&audit_queue, skb); - wake_up_interruptible(&kauditd_wait); + + /* only wakeup kauditd if we can't do it later */ + if (ab->kauditd_wake) + kauditd_wakeup(); } else audit_log_lost("rate limit exceeded"); diff --git a/kernel/audit.h b/kernel/audit.h index 3b9c0945225a1..b5f677017e975 100644 --- a/kernel/audit.h +++ b/kernel/audit.h @@ -213,6 +213,8 @@ static inline int audit_hash_ino(u32 ino) /* Indicates that audit should log the full pathname. */ #define AUDIT_NAME_FULL -1 +extern void kauditd_wakeup(void); + extern int audit_match_class(int class, unsigned syscall); extern int audit_comparator(const u32 left, const u32 op, const u32 right); extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right); diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 47fb48f42c934..bad776497438f 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1600,6 +1600,8 @@ static void audit_log_exit(void) audit_log_end(ab); if (call_panic) audit_panic("error converting sid to string"); + + kauditd_wakeup(); } /**
[NOTE: As this is an RFC patch, I wanted to add some commentary at the top of the patch description explaining where this patch came from and what testing has been done. This patch is a derivative of another unreleased patch that removed all of the wake up calls from the audit_log_start() and audit_log_end() functions; while that patch worked well, there we some record losees with high volume burst traffic in the case of `auditctl -a task,never` or CONFIG_AUDITSYSCALL=n. As this patch doesn't completely remove the wake up calls these two cases should behave similarly to how they do today. As far as testing is concerned, this patch passes both the audit-testsuite and selinux-testsuite without problem and with expected audit queue losses (no losses outside of the tests which attempt to generate losses). This patch also preserves the ability for the system to continue to function, in the `auditctl -a exit,always -S all` case, albeit very slowly.] When audit is enabled, the kauditd_thread() function runs in its own kernel thread, emptying the audit record queue and sending the entries to userspace via different means. As part of its normal processing it goes to sleep after emptying the queue and waits for the audit functions to wake it. The current audit kernel code attempts to wake the kauditd thread after each entry is added to the queue. Considering that a single syscall can have multiple audit records, with each wake attempt involving locking and additional processing, this can be rather wasteful. In an effort to limit the number of wake attempts without unnecessarily risking the audit queue size this patch does the following: * In the case of syscall auditing the wake up call is moved from audit_log_end() to audit_log_exit() meaning that the kauditd thread is only woken once, at the end of the syscall, after all of the syscall's audit records have been added to the queue. * In the case where audit records are generated outside of a syscall context, the wake up call in audit_log_end() is preserved in order to ensure that these records do not suffer any additional latency or put unnecessary pressure on the queue. This is done through some additional checking in audit_log_start() and an additional flag in the audit_buffer struct. * The audit_log_start() function never attempts to wake the kauditd thread. In the current code this is only done when the queue is under pressure, but at that point it is extremely likely that the thread is already active or scheduled, do we should be able to safely remove this wake attempt. * Always wake the kauditd thread after processing management and user records in audit_receive_msg(). This should be relatively low frequency compared to the other audit sources, and doing a wake call here helps keep record latency low and the queue size in check. * The kauditd thread itself is now a bit better at handling high volume audit record bursts. Previously after emptying the queue the thread would wake every process that was blocked and then go to sleep; possibly going to sleep just a flood of new records are added to the queue. The new kauditd thread approach wakes all of the blocked processes and then reschedules itself in anticipation of new records. When the thread returns to execution it checks the queue and if there are any records present it immediately starts processing them, if the queue is empty the kauditd thread goes back to sleep. Signed-off-by: Paul Moore <paul@paul-moore.com> --- kernel/audit.c | 66 ++++++++++++++++++++++++++++++++++++++---------------- kernel/audit.h | 2 ++ kernel/auditsc.c | 2 ++ 3 files changed, 51 insertions(+), 19 deletions(-) -- Linux-audit mailing list Linux-audit@redhat.com https://listman.redhat.com/mailman/listinfo/linux-audit