[0/4] audit: refactor and fix for potential deadlock

Message ID: 20230508075812.76077-1-eiichi.tsukata@nutanix.com

Message

Eiichi Tsukata May 8, 2023, 7:58 a.m. UTC
Commit 7ffb8e317bae ("audit: we don't need to
__set_current_state(TASK_RUNNING)") accidentally moved the queue-full
check before add_wait_queue_exclusive(), which introduced the following
race:

    CPU1                           CPU2
  ========                       ========
  (in audit_log_start())         (in kauditd_thread())

  queue is full
                                 wake_up(&audit_backlog_wait)
                                 wait_event_freezable()
  add_wait_queue_exclusive()
  ...
  schedule_timeout()

Once this happens, audit_log_start() and kauditd_thread() can block for
up to backlog_wait_time, each waiting for the other. To prevent the
race, this series re-checks whether the queue is full after
prepare_to_wait_exclusive().
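
Concretely, the fixed wait path looks roughly like this (a simplified
sketch, not the exact diff; timeout stands in for the remaining
backlog wait time):

    DEFINE_WAIT(wait);

    prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
                              TASK_UNINTERRUPTIBLE);
    /* Re-check *after* we are on the wait queue, so a wake_up()
     * issued between the first check and the sleep cannot be lost. */
    if (audit_backlog_limit &&
        skb_queue_len(&audit_queue) > audit_backlog_limit)
            schedule_timeout(timeout);
    finish_wait(&audit_backlog_wait, &wait);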

This patchset consists of the following parts:

Patches 1-3: refactoring and a small fix that prepare for the deadlock fix
Patch 4:     the actual deadlock fix

Eiichi Tsukata (4):
  audit: refactor queue full checks
  audit: account backlog waiting time in audit_receive()
  audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT
  audit: check if queue is full after prepare_to_wait_exclusive()

 kernel/audit.c | 71 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 28 deletions(-)

Comments

Paul Moore May 8, 2023, 2:07 p.m. UTC | #1
On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> Commit 7ffb8e317bae ("audit: we don't need to
> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> before add_wait_queue_exclusive() which introduced the following race:
>
>     CPU1                           CPU2
>   ========                       ========
>   (in audit_log_start())         (in kauditd_thread())
>
>   queue is full
>                                  wake_up(&audit_backlog_wait)
>                                  wait_event_freezable()
>   add_wait_queue_exclusive()
>   ...
>   schedule_timeout()
>
> Once this happens, both audit_log_start() and kauditd_thread() can cause
> deadlock for up to backlog_wait_time waiting for each other. To prevent
> the race, this patch adds queue full check after
> prepare_to_wait_exclusive().

Have you seen this occur in practice?  The first thing that
audit_log_start() does when detecting that @audit_queue is full is to
wake kauditd_thread, which should start immediately draining
@audit_queue (even if a daemon is not listening, the @audit_queue will
be drained into one of the other queues).  While it is hard to predict
scheduling behavior, it seems unlikely that kauditd_thread would be
able to process the entire queue before audit_log_start() progresses
from waking kauditd_thread() to sleeping.

In the absolute worst case, the task attempting to emit an audit
record sleeps for either the configured wait time or until it is
awakened by kauditd_thread(), which could itself be triggered by
another task attempting to emit an audit record.
Eiichi Tsukata May 9, 2023, 1:34 a.m. UTC | #2
> On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
> 
> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> <eiichi.tsukata@nutanix.com> wrote:
>> Commit 7ffb8e317bae ("audit: we don't need to
>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>> before add_wait_queue_exclusive() which introduced the following race:
>> 
>>    CPU1                           CPU2
>>  ========                       ========
>>  (in audit_log_start())         (in kauditd_thread())
>> 
>>  queue is full
>>                                 wake_up(&audit_backlog_wait)
>>                                 wait_event_freezable()
>>  add_wait_queue_exclusive()
>>  ...
>>  schedule_timeout()
>> 
>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>> the race, this patch adds queue full check after
>> prepare_to_wait_exclusive().
> 
> Have you seen this occur in practice?  

Yes, we have hit this issue multiple times, though it is pretty rare, as
you suggest. In our case, sshd got stuck in audit_log_user_message(),
which caused SSH connection timeouts.

Eiichi
Eiichi Tsukata May 10, 2023, 8:09 a.m. UTC | #3
> On May 9, 2023, at 10:34, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> 
> 
>> On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
>> 
>> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
>> <eiichi.tsukata@nutanix.com> wrote:
>>> Commit 7ffb8e317bae ("audit: we don't need to
>>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>>> before add_wait_queue_exclusive() which introduced the following race:
>>> 
>>>   CPU1                           CPU2
>>> ========                       ========
>>> (in audit_log_start())         (in kauditd_thread())
>>> 
>>> queue is full
>>>                                wake_up(&audit_backlog_wait)
>>>                                wait_event_freezable()
>>> add_wait_queue_exclusive()
>>> ...
>>> schedule_timeout()
>>> 
>>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>>> the race, this patch adds queue full check after
>>> prepare_to_wait_exclusive().
>> 
>> Have you seen this occur in practice?  
> 
> Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> timeout.
> 

I found another case.

kauditd_thread() issues wake_up(&audit_backlog_wait) once each time it
wakes up. Since the waiter side uses add_wait_queue_exclusive(), each
wake_up() wakes only one process at a time.

If kauditd wakes up a process sleeping in audit_log_start(), that
process will queue its skb and issue
wake_up_interruptible(&kauditd_wait); no problem there. But if kauditd
wakes up a process sleeping in audit_receive(), that process will not
try to wake up kauditd. In that case the other processes sleeping in
audit_log_start() keep sleeping even though kauditd has flushed the
queue.

At this point I'm planning to use a non-exclusive wait in
audit_receive() in v2. Let me know if we should instead use
wake_up_all() in kauditd, or if you have a better solution.
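
Roughly what I have in mind for the audit_receive() side (a simplified
sketch, not the actual v2 patch; timeout stands in for the remaining
backlog wait time):

    DEFINE_WAIT(wait);

    /* Non-exclusive wait: this waiter no longer consumes kauditd's
     * single exclusive wake-up, so an exclusive waiter in
     * audit_log_start() can still be woken by the same wake_up(). */
    prepare_to_wait(&audit_backlog_wait, &wait, TASK_UNINTERRUPTIBLE);
    if (audit_backlog_limit &&
        skb_queue_len(&audit_queue) > audit_backlog_limit)
            schedule_timeout(timeout);
    finish_wait(&audit_backlog_wait, &wait);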

Eiichi
Paul Moore May 17, 2023, 4:03 p.m. UTC | #4
On Mon, May 8, 2023 at 9:49 PM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
> > On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> > <eiichi.tsukata@nutanix.com> wrote:
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >>    CPU1                           CPU2
> >>  ========                       ========
> >>  (in audit_log_start())         (in kauditd_thread())
> >>
> >>  queue is full
> >>                                 wake_up(&audit_backlog_wait)
> >>                                 wait_event_freezable()
> >>  add_wait_queue_exclusive()
> >>  ...
> >>  schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds queue full check after
> >> prepare_to_wait_exclusive().
> >
> > Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> timeout.

Sorry for the delay; I was away at a conference last week.

Regardless of how we tweak the wait, there is always going to be a
chance that rescheduling the task with a timeout is going to impact
the application.  If you have sensitive applications where this is
proving to be a problem, I would suggest tuning your backlog size, the
wait time, or removing the backlog limit entirely.
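
For example, with the standard auditctl(8) tool (values purely
illustrative):

    auditctl -b 8192                # raise the backlog limit
    auditctl --backlog_wait_time 0  # do not wait when the queue is full
    auditctl -b 0                   # remove the backlog limit entirely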

--
paul-moore.com
Paul Moore May 17, 2023, 4:15 p.m. UTC | #5
On Wed, May 10, 2023 at 4:09 AM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 9, 2023, at 10:34, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >> On May 8, 2023, at 23:07, Paul Moore <paul@paul-moore.com> wrote:
> >> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> >> <eiichi.tsukata@nutanix.com> wrote:
> >>> Commit 7ffb8e317bae ("audit: we don't need to
> >>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >>> before add_wait_queue_exclusive() which introduced the following race:
> >>>
> >>>   CPU1                           CPU2
> >>> ========                       ========
> >>> (in audit_log_start())         (in kauditd_thread())
> >>>
> >>> queue is full
> >>>                                wake_up(&audit_backlog_wait)
> >>>                                wait_event_freezable()
> >>> add_wait_queue_exclusive()
> >>> ...
> >>> schedule_timeout()
> >>>
> >>> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >>> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >>> the race, this patch adds queue full check after
> >>> prepare_to_wait_exclusive().
> >>
> >> Have you seen this occur in practice?
> >
> > Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> > In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> > timeout.
> >
>
> I found another case.
>
> kauditd_thread issues wake_up(&audit_backlog_wait) once after wake up.
> As waiter side is using add_wait_queue_exclusive() it only wakes up one process at once.
>
> If kauditd wakes up a process which is sleeping in audit_log_start(), then the process will
> queue skb and issue wake_up_interruptible(&kauditd_wait). No problem.
> But if kauditd wakes up a process which is sleeping in audit_receive(), then the process
> won’t try to wake up kauditd. In this case other processes sleeping in audit_log_start()
> keep sleeping even if kauditd have flushed the queue.
>
> At this point I’m planning to use non-exclusive wait in audit_receive() in v2.
> Let me know if we should use wake_up_all() in kauditd or you have better solution.

The nice part about marking the waiters as WQ_FLAG_EXCLUSIVE is that
we avoid the thundering herd problem.  The waiter was put to sleep in
the first place because the system was under heavy audit pressure, so
it stands to reason that waking the waiters slowly, one at a time, is a
wise choice to avoid immediately re-entering that high-pressure state.
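
In miniature (a toy snippet, not audit code; wq, a and b are
hypothetical):

    add_wait_queue_exclusive(&wq, &a);   /* both waiters exclusive */
    add_wait_queue_exclusive(&wq, &b);

    wake_up(&wq);      /* wakes at most one exclusive waiter */
    wake_up_all(&wq);  /* wakes everyone: the thundering herd */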

I'll take a look at your v2 patch, but as things stand I still believe
that this is better addressed via runtime configuration (see my
previous email).  However, some of the refactoring you've done might be
nice; I'll reply back on that in your v2 patch.

Thanks.