Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart

HI all,

We recently found another problem when running a simple test case
where multiple processes open/read/close a single file, and a network
disruption or smb restart occurs.

After just a few minutes of restarts, a few of the processes error
out, but a few remain.  If you stop the disruption, the share is
accessible again, but the original processes are blocked indefinitely
waiting on lock_sem

Once this occurs, you'll see two threads deadlocked like the below.  Basically:
thread1: down_read(lock_sem) inside cifs_strict_readv(); does not
block, but blocks on network temporarily
thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks due to thread 1
thread1: down_read(lock_sem) inside cifs_reopen_file (after
reconnect); unblocks to reconnect, but then blocks due to thread 2

At this point there is a deadlock, both threads are waiting for each
other to release lock_sem

The root of the problem seems to be having to hold the read sem over a
network disruption, and no check when it comes back before down_read a
second time.
Inside cifs_strict_readv there's a comment that explains why we need
hold the sem over the read, and it's so no one modifies the lock list
with a brlock that prevents writing.
However, in this case at least, we're only opening files for reading,
and inside _cifsFileInfo_put it needs to down_write to delete the
outstanding lock records.
So we're not really hitting that case, but are modifying the list.

One approach to avoid the deadlock is for the reading process to drop
the lock_sem before re-aquiring it again inside cifs_reopen_file.
We would drop the lock just before calling, and then inside we'd need
to check for the condition of the conflicting brlock, and then
possibly error out somehow.
But that is probably not preferred.

Instead, can the reader only down_read one time?  cifs_relock_file is
only called from one place (cifs_reopen_file) and it's when we're
reconnecting, but
cifs_reopen_file is called from many places. I'm not sure if we there
is something in existence we can use that says we don't need to
re-aquire it, or if we need to add a flag or something?

Here is the live crash session

crash> ps -m | grep UN
[0 01:54:10.166] [UN]  PID: 13271  TASK: ffff93b8a7349e40  CPU: 2
COMMAND: "python3-openloo"
[0 01:54:10.167] [UN]  PID: 13273  TASK: ffff93b8b3310000  CPU: 2
COMMAND: "python3-openloo"
[0 01:54:10.169] [UN]  PID: 13269  TASK: ffff93b8a734dac0  CPU: 0
COMMAND: "python3-openloo"
[0 01:54:10.208] [UN]  PID: 13272  TASK: ffff93b8a7348000  CPU: 7
COMMAND: "python3-openloo"
[0 01:54:10.216] [UN]  PID: 13265  TASK: ffff93b8a7358000  CPU: 6
COMMAND: "python3-openloo"
[0 01:54:10.172] [UN]  PID: 13263  TASK: ffff93b8b3313c80  CPU: 2
COMMAND: "python3-openloo"
[0 01:54:10.229] [UN]  PID: 13267  TASK: ffff93b8a7359e40  CPU: 5
COMMAND: "python3-openloo"
[0 01:54:10.234] [UN]  PID: 13277  TASK: ffff93b8a7dddac0  CPU: 3
COMMAND: "python3-openloo"
[0 01:54:10.235] [UN]  PID: 13270  TASK: ffff93b8a734bc80  CPU: 1
COMMAND: "python3-openloo"
[0 01:54:10.244] [UN]  PID: 13276  TASK: ffff93b8a7dd8000  CPU: 6
COMMAND: "python3-openloo"
[0 01:54:10.260] [UN]  PID: 13264  TASK: ffff93b8b3315ac0  CPU: 6
COMMAND: "python3-openloo"
[0 01:54:10.257] [UN]  PID: 13274  TASK: ffff93b8a7ddbc80  CPU: 1
COMMAND: "python3-openloo"
[0 01:54:31.262] [UN]  PID: 13279  TASK: ffff93b8a7d90000  CPU: 1
COMMAND: "python3-openloo" <--- block same instant
[0 01:54:31.265] [UN]  PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4
COMMAND: "python3-openloo" <--- block same instant
crash> bt 13279
PID: 13279  TASK: ffff93b8a7d90000  CPU: 1   COMMAND: "python3-openloo"
 #0 [ffffaae1c0a77c20] __schedule at ffffffffa19d4239
 #1 [ffffaae1c0a77cb0] schedule at ffffffffa19d46b9
 #2 [ffffaae1c0a77cc8] rwsem_down_write_slowpath at ffffffffa1131ada
<------------- T+1: pid 13279 down_write(... lock_sem); blocks due to
pid 13275
 #3 [ffffaae1c0a77d98] _cifsFileInfo_put at ffffffffc0759818 [cifs]
 #4 [ffffaae1c0a77e78] cifs_close at ffffffffc0759b4f [cifs]
 #5 [ffffaae1c0a77e88] __fput at ffffffffa12f6e51
 #6 [ffffaae1c0a77ed0] task_work_run at ffffffffa10fc827
 #7 [ffffaae1c0a77f08] exit_to_usermode_loop at ffffffffa1003f83
 #8 [ffffaae1c0a77f38] do_syscall_64 at ffffffffa10043c2
 #9 [ffffaae1c0a77f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
    RIP: 00007fc6caca1778  RSP: 00007ffed0c43de8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 00007fc6bd5a5f30  RCX: 00007fc6caca1778
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000003
    RBP: 0000558ebb9ca0d0   R8: 00007fc6cab941a8   R9: 0000000000000000
    R10: 0000558ebb9ca0d0  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000000  R14: 00007fc6cab4df0c  R15: 00007ffed0c43f10
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
crash> bt 13275
PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4   COMMAND: "python3-openloo"
 #0 [ffffaae1c0a57a00] __schedule at ffffffffa19d4239
 #1 [ffffaae1c0a57a90] schedule at ffffffffa19d46b9
 #2 [ffffaae1c0a57aa8] rwsem_down_read_slowpath at ffffffffa19d72b4
 #3 [ffffaae1c0a57b60] cifs_reopen_file at ffffffffc0755352 [cifs]
<------------- T+2: pid 13275 down_read(... lock_sem); blocks due to
pid 13279
 #4 [ffffaae1c0a57c08] cifs_readpage_worker at ffffffffc07575f0 [cifs]
 #5 [ffffaae1c0a57cd0] cifs_readpage at ffffffffc0758096 [cifs]
 #6 [ffffaae1c0a57d08] generic_file_read_iter at ffffffffa123acd2
 #7 [ffffaae1c0a57e10] cifs_strict_readv at ffffffffc07606a5 [cifs]
<------------- T+0: pid 13275 down_read(... lock_sem); does not block
 #8 [ffffaae1c0a57e40] new_sync_read at ffffffffa12f336a
 #9 [ffffaae1c0a57ec8] vfs_read at ffffffffa12f5fad
#10 [ffffaae1c0a57f00] ksys_read at ffffffffa12f62df
#11 [ffffaae1c0a57f38] do_syscall_64 at ffffffffa10042bb
#12 [ffffaae1c0a57f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
    RIP: 00007f6ed0b7c055  RSP: 00007ffeea637e68  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f6ed05cd600  RCX: 00007f6ed0b7c055
    RDX: 000000000000001e  RSI: 00007f6ec3404c10  RDI: 0000000000000003
    RBP: 000000000000001e   R8: 00007f6ed0a6f1a8   R9: 0000000000000000
    R10: 00007f6ed090faf0  R11: 0000000000000246  R12: 00007f6ec3480f40
    R13: 0000000000000003  R14: 00007f6ec3404c10  R15: 000055c449ea60d0
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

Here is my tracing from the attached debug patch which shows the
conditions, and points at the bugzilla we have open:
[  233.588248] CIFS VFS: file=fs/cifs/file.c line=480 pid=13279
cifs_sem_op op=up_write sem=ffff93b8a7424bd8
[  233.589462] CIFS VFS: file=fs/cifs/file.c line=311 pid=13279
cifs_sem_op op=down_write sem=ffff93b8a7424bd8
[  233.592333] CIFS VFS: file=fs/cifs/file.c line=314 pid=13275
cifs_sem_op op=up_write sem=ffff93b8a7424bd8
[  233.612324] CIFS VFS: file=fs/cifs/file.c line=314 pid=13279
cifs_sem_op op=up_write sem=ffff93b8a7424bd8
[  233.612510] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13275
cifs_sem_op op=down_read sem=ffff93b8a7424bd8
[  233.616487] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13279
cifs_sem_op op=down_read sem=ffff93b8a7424bd8  T0:
https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
[  254.389458] CIFS VFS: Send error in read = -9
[  254.391943] CIFS VFS: file=fs/cifs/file.c line=3926 pid=13279
cifs_sem_op op=up_read sem=ffff93b8a7424bd8
[  254.397453] CIFS VFS: file=fs/cifs/file.c line=471 pid=13279
cifs_sem_op op=down_write sem=ffff93b8a7424bd8  T1:
https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
[  254.399798] CIFS VFS: file=fs/cifs/file.c line=630 pid=13275
cifs_sem_op op=down_read sem=ffff93b8a7424bd8   T2:
https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9

There was a patch that removed a lockdep warning, but it looks like
this removed a semi-valid warning, because clearly we can deadlock
here:

commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d
Author: Rabin Vincent <rabinv@axis.com>
Date:   Wed May 3 17:17:21 2017 +0200

    CIFS: silence lockdep splat in cifs_relock_file()

    cifs_relock_file() can perform a down_write() on the inode's lock_sem even
    though it was already performed in cifs_strict_readv().  Lockdep complains
    about this.  AFAICS, there is no problem here, and lockdep just needs to be
    told that this nesting is OK.

     =============================================
     [ INFO: possible recursive locking detected ]
     4.11.0+ #20 Not tainted
     ---------------------------------------------
     cat/701 is trying to acquire lock:
      (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00

     but task is already holding lock:
      (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

     other info that might help us debug this:
      Possible unsafe locking scenario:

            CPU0
            ----
       lock(&cifsi->lock_sem);
       lock(&cifsi->lock_sem);

      *** DEADLOCK ***

      May be due to missing lock nesting notation

     1 lock held by cat/701:
      #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

     stack backtrace:
     CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
     Call Trace:
      dump_stack+0x85/0xc2
      __lock_acquire+0x17dd/0x2260
      ? trace_hardirqs_on_thunk+0x1a/0x1c
      ? preempt_schedule_irq+0x6b/0x80
      lock_acquire+0xcc/0x260
      ? lock_acquire+0xcc/0x260
      ? cifs_reopen_file+0x7a7/0xc00
      down_read+0x2d/0x70
      ? cifs_reopen_file+0x7a7/0xc00
      cifs_reopen_file+0x7a7/0xc00
      ? printk+0x43/0x4b
      cifs_readpage_worker+0x327/0x8a0
      cifs_readpage+0x8c/0x2a0
      generic_file_read_iter+0x692/0xd00
      cifs_strict_readv+0x29f/0x310
      generic_file_splice_read+0x11c/0x1c0
      do_splice_to+0xa5/0xc0
      splice_direct_to_actor+0xfa/0x350
      ? generic_pipe_buf_nosteal+0x10/0x10
      do_splice_direct+0xb5/0xe0
      do_sendfile+0x278/0x3a0
      SyS_sendfile64+0xc4/0xe0
      entry_SYSCALL_64_fastpath+0x1f/0xbe

    Signed-off-by: Rabin Vincent <rabinv@axis.com>
    Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
    Signed-off-by: Steve French <smfrench@gmail.com>

Message ID	CALF+zOmFsykoOWHy6Yt6dH6i-Cn9XWWCMsnsdPZaKuLE6jjO7w@mail.gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=SN/X=YO=vger.kernel.org=linux-cifs-owner@kernel.org> Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9E1714ED for <patchwork-cifs-client@patchwork.kernel.org>; Mon, 21 Oct 2019 17:23:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7B7582086D for <patchwork-cifs-client@patchwork.kernel.org>; Mon, 21 Oct 2019 17:23:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="b1FT+Ijo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726819AbfJURXL (ORCPT <rfc822;patchwork-cifs-client@patchwork.kernel.org>); Mon, 21 Oct 2019 13:23:11 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:35279 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726672AbfJURXL (ORCPT <rfc822;linux-cifs@vger.kernel.org>); Mon, 21 Oct 2019 13:23:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571678588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type; bh=voBNVYxMip0Cua5n2Obu6rb0dGhN8+OrH1il0+qp+/0=; b=b1FT+IjoeQj4lOW11ZFP8FlMf2j6iYNgwqClerR6CdNYTpB25KfRLwN8qcy/0c499RK0UB rKvF58HMuR4PvP4mqsWwa6bj+FhOzzbGR9qebL8TzhZt3jK0KbgTE/TE1ACnHyMnjcnSm3 y5YLgZBzBpFBeWToIsoTwbiiC34v63U= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-3-4oECyb1QMLuk-58XzX0APA-1; Mon, 21 Oct 2019 13:23:06 -0400 Received: by mail-qk1-f200.google.com with SMTP id k67so14237891qkc.3 for <linux-cifs@vger.kernel.org>; Mon, 21 Oct 2019 10:23:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=f+wVXTfFuBDn7o/7l2n2lclpIkAILwyErCpXEOFxdmw=; b=gkjRk9KYc4Qp00Nhw72ZOePcOLepRS8/YoirgOtqSMbK1mdC9m0EibPgnePVuzNtmW o69NA+ZgExmOl96GlvT3X1pw0zSqW+3MvAsH00GX1xUejnf9QXDG0sPe6NoGsfR+iecH KWkeF9xtwnHnT5xcvF5n/CQ9VzlN7xVh7JgCbhd4yidmbXhvEQ1aVq3Qk4xSSY2klrJ/ K+F4AxOs1FufiE7XDVcyOOtaPMeStma1Rrf2jZ2L1mH2ih4I8hwpHGH00anB4DM/lOqX Wl0JytyoLj/eKM7wHacVonAL2jIkJt3oZbYhLjg7kYIfrHWNZqjBOfrGFiO1e2gzRXFN xhwA== X-Gm-Message-State: APjAAAWttlnlb1Yka/Z0VrDzJSBc/hbuog6DziX+z869ZbkjHoOK4U3H 3Ow47tigkgnJPzkW33ZuX9ou/2zcHSwFVqgP5PoqwnM/frWxr6HuCdQvtAm4stM4GZXXH0RC1Hq Km+htq3CdyaKAQOj0IB1faxXwbmnq2uJu5+amSw== X-Received: by 2002:ae9:f404:: with SMTP id y4mr23012810qkl.225.1571678585295; Mon, 21 Oct 2019 10:23:05 -0700 (PDT) X-Google-Smtp-Source: APXvYqxBWtbCGKrsRAeUJBC9B3hUSmHZqqgE0l4RJvm033YowJdodazbAsXdBJZ6dao15F0s/tjusK+szQAz1Obn92w= X-Received: by 2002:ae9:f404:: with SMTP id y4mr23012770qkl.225.1571678584708; Mon, 21 Oct 2019 10:23:04 -0700 (PDT) MIME-Version: 1.0 From: David Wysochanski <dwysocha@redhat.com> Date: Mon, 21 Oct 2019 13:22:27 -0400 Message-ID: <CALF+zOmFsykoOWHy6Yt6dH6i-Cn9XWWCMsnsdPZaKuLE6jjO7w@mail.gmail.com> Subject: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart To: linux-cifs <linux-cifs@vger.kernel.org> X-MC-Unique: 4oECyb1QMLuk-58XzX0APA-1 X-Mimecast-Spam-Score: 0 Content-Type: multipart/mixed; boundary="000000000000be94d005956eeff8" Sender: linux-cifs-owner@vger.kernel.org Precedence: bulk List-ID: <linux-cifs.vger.kernel.org> X-Mailing-List: linux-cifs@vger.kernel.org
Series	Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart \| expand Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disrupt…

Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart

Commit Message

Comments

Patch