From patchwork Mon Aug 22 16:31:01 2016
X-Patchwork-Submitter: Brian Foster <bfoster@redhat.com>
X-Patchwork-Id: 9293835
From: Brian Foster <bfoster@redhat.com>
To: xfs@oss.sgi.com
Subject: [PATCH] xfs: close xfs_wait_buftarg() race with buffer lookups
Date: Mon, 22 Aug 2016 12:31:01 -0400
Message-Id: <1471883461-27407-1-git-send-email-bfoster@redhat.com>
xfs_wait_buftarg() is invoked on unmount and filesystem freeze to drain and
free the buffer LRU. We have reports of filesystem freeze hangs with the
following call chain:

  ->xfs_wait_buftarg()
  ->xfs_log_quiesce()
  ->xfs_quiesce_attr()
  ->xfs_fs_freeze()
  ...

This hang can be reproduced with a long enough running fsstress instance
running in parallel with a tight freeze/unfreeze loop. The cause of the
hang is racing b_hold updates between xfs_wait_buftarg() and
_xfs_buf_find(). Specifically, the buftarg wait path checks whether a
buffer has a b_hold count greater than one to determine whether to skip the
buffer. If b_hold == 1, xfs_buftarg_wait_rele() proceeds to prepare the
buffer for final removal and ultimately calls xfs_buf_rele() to drop the
LRU reference.

The problem is that _xfs_buf_find() can acquire a b_hold reference any time
after xfs_buftarg_wait_rele() has decided it has the only remaining
reference. If this occurs, xfs_wait_buftarg() drops the LRU reference, but
the xfs_buf_rele() instance doesn't actually remove the buffer from the LRU
due to the _xfs_buf_find() hold. At this point b_hold == 1, yet the buffer
is held via the _xfs_buf_find() codepath and still remains on the LRU. Both
call chains will ultimately call xfs_buf_rele() on a buffer with
b_hold == 1. This can have several user-visible side effects, such as
use-after-free errors or the freed buffer remaining on the LRU indefinitely
with an underflowed or garbage b_hold count value.

Update xfs_buftarg_wait_rele() to properly synchronize the LRU drain
against buffer lookups. Atomically decrement b_hold and lock the perag
associated with the buffer to lock out buffer lookups before
xfs_wait_buftarg() determines whether the LRU holds the final reference.
Open code freeing of the buffer so we can remove it from the perag rbtree
before the perag lock is dropped and guarantee it cannot be looked up once
freeing is imminent. Also update xfs_buftarg_wait_rele() to drop and
reacquire the lru_lock in the correct order to avoid deadlocks with LRU
insertion.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
A couple of notes... first, there are several different ways to close this
race. I opted for this approach because xfs_wait_buftarg() is the slow/less
frequent path; see the sketch after these notes for the synchronization
idea. Second, I believe we still have an independent freeze hang upstream
due to an issue with how we use drain_workqueue(). Specifically, I see the
following warning from kernel/workqueue.c:__queue_work():

	if (unlikely(wq->flags & __WQ_DRAINING) &&
	    WARN_ON_ONCE(!is_chained_work(wq)))
		return;

... followed by a hang. I suspect this is due to the ioend work being
dropped and thus never completing the I/O, but I haven't dug into it yet.
For now, I'm testing for hangs with this fix and the above commented out.
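For anyone curious about the synchronization idea, here is a purely
hypothetical userspace analogue using pthreads and C11 atomics (the names
dec_and_lock() and lookup_hold() are mine, not kernel API). It mirrors how
atomic_dec_and_lock() closes the check-then-act window: the "last
reference" decision and any new lookup reference now serialize on the same
lock, so a lookup can no longer sneak in a hold after the wait path has
decided it holds the final reference.

	#include <stdbool.h>
	#include <pthread.h>
	#include <stdatomic.h>

	/*
	 * Decrement the count; return true with the lock held only if it
	 * dropped to zero. Mirrors atomic_dec_and_lock() semantics.
	 */
	static bool
	dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
	{
		/* fast path: count clearly > 1, an unlocked decrement is safe */
		int old = atomic_load(cnt);
		while (old > 1) {
			if (atomic_compare_exchange_weak(cnt, &old, old - 1))
				return false;
		}

		/* slow path: decrement under the lock */
		pthread_mutex_lock(lock);
		if (atomic_fetch_sub(cnt, 1) == 1)
			return true;	/* caller owns teardown, lock still held */
		pthread_mutex_unlock(lock);
		return false;
	}

	/* a lookup takes its new reference under the same lock */
	static void
	lookup_hold(atomic_int *cnt, pthread_mutex_t *lock)
	{
		pthread_mutex_lock(lock);
		atomic_fetch_add(cnt, 1);
		pthread_mutex_unlock(lock);
	}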
Brian

 fs/xfs/xfs_buf.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 607cc29..ce9c419 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1570,24 +1570,48 @@ xfs_buftarg_wait_rele(
 {
 	struct xfs_buf		*bp = container_of(item, struct xfs_buf, b_lru);
 	struct list_head	*dispose = arg;
+	struct xfs_perag	*pag = bp->b_pag;
+	int			release;
+
+	/*
+	 * Drop lru_lock to avoid deadlocks with LRU insertion. The required
+	 * lock order to safely isolate a buffer is pag_buf_lock -> b_lock
+	 * -> lru_lock.
+	 */
+	spin_unlock(lru_lock);
 
-	if (atomic_read(&bp->b_hold) > 1) {
+	/*
+	 * Atomically decrement and lock the perag to synchronize against buffer
+	 * lookups. We unconditionally decrement because a check for b_hold == 1
+	 * alone is not sufficient. _xfs_buf_find() can acquire a reference
+	 * immediately after the check.
+	 */
+	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
+	if (!release) {
 		/* need to wait, so skip it this pass */
 		trace_xfs_buf_wait_buftarg(bp, _RET_IP_);
-		return LRU_SKIP;
+		goto skip_buf;
 	}
-	if (!spin_trylock(&bp->b_lock))
-		return LRU_SKIP;
+	if (!spin_trylock(&bp->b_lock)) {
+		spin_unlock(&pag->pag_buf_lock);
+		goto skip_buf;
+	}
+	spin_lock(lru_lock);
+
+	/* remove from the rbtree to prevent further lookups */
+	rb_erase(&bp->b_rbnode, &pag->pag_buf_tree);
+	spin_unlock(&pag->pag_buf_lock);
 
-	/*
-	 * clear the LRU reference count so the buffer doesn't get
-	 * ignored in xfs_buf_rele().
-	 */
-	atomic_set(&bp->b_lru_ref, 0);
 	bp->b_state |= XFS_BSTATE_DISPOSE;
 	list_lru_isolate_move(lru, item, dispose);
 	spin_unlock(&bp->b_lock);
 	return LRU_REMOVED;
+
+skip_buf:
+	/* skipping the buf, put back the LRU hold */
+	atomic_inc(&bp->b_hold);
+	spin_lock(lru_lock);
+	return LRU_SKIP;
 }
 
 void
@@ -1629,7 +1653,15 @@ xfs_wait_buftarg(
 			xfs_alert(btp->bt_mount,
 "Please run xfs_repair to determine the extent of the problem.");
 		}
-		xfs_buf_rele(bp);
+
+		/*
+		 * The buffer has already been removed from the rbtree
+		 * and had the last reference dropped. Drop the perag
+		 * reference and free it.
+		 */
+		ASSERT(atomic_read(&bp->b_hold) == 0);
+		xfs_perag_put(bp->b_pag);
+		xfs_buf_free(bp);
 	}
 	if (loop++ != 0)
 		delay(100);
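A postscript on the locking: below is a hypothetical userspace sketch of
the lock-order dance the patch performs, using pthreads rather than kernel
spinlocks. The isolate callback is entered with the innermost lock
(lru_lock) held by the LRU walker, so it must be dropped before the outer
locks are taken and reacquired last; taking an outer lock while holding the
inner one would invert the documented pag_buf_lock -> b_lock -> lru_lock
order and can deadlock against LRU insertion. (The patch uses a trylock for
b_lock; this sketch takes it unconditionally for brevity.)

	#include <pthread.h>

	static pthread_mutex_t pag_buf_lock = PTHREAD_MUTEX_INITIALIZER; /* outer  */
	static pthread_mutex_t b_lock       = PTHREAD_MUTEX_INITIALIZER; /* middle */
	static pthread_mutex_t lru_lock     = PTHREAD_MUTEX_INITIALIZER; /* inner  */

	/* called with lru_lock held, as the LRU walker would do */
	static void
	isolate_in_order(void)
	{
		pthread_mutex_unlock(&lru_lock);	/* step out of the inner lock */

		pthread_mutex_lock(&pag_buf_lock);	/* outermost first */
		pthread_mutex_lock(&b_lock);
		pthread_mutex_lock(&lru_lock);		/* innermost reacquired last */

		/* ... isolate the buffer here ... */

		pthread_mutex_unlock(&b_lock);
		pthread_mutex_unlock(&pag_buf_lock);
		/* return to the walker with lru_lock held again */
	}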