From patchwork Sat Apr 1 06:40:03 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 9657553 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 43C3D60352 for ; Sat, 1 Apr 2017 06:40:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 356C2286A1 for ; Sat, 1 Apr 2017 06:40:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 28F66286BE; Sat, 1 Apr 2017 06:40:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 656A8286A1 for ; Sat, 1 Apr 2017 06:40:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751188AbdDAGkv (ORCPT ); Sat, 1 Apr 2017 02:40:51 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:40502 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750879AbdDAGkv (ORCPT ); Sat, 1 Apr 2017 02:40:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=References:In-Reply-To:Message-Id: Date:Subject:Cc:To:From:Sender:Reply-To:MIME-Version:Content-Type: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=/3NU8g5fExk/e1DmclEe2yhtzASUtnb9Q0ZTQA/8Znk=; b=dOXNYtFAAljlF9lVCFdFFztSt HoA9ObkzacPskBArvk3ZFUiiHkCYEM/j3NZgZ61utJ/QVO+Kt1GvRneK1bXBU4ZwGn73gVr0BcvwZ ZoXz/EUwk37DxsgciysWCLd1FuuIo95mfTXmoV65hS5HHxDVdFM1CnsS9rEoJCzqrfN6Z6uz+/bkS 2IvDyfMPdmjAhG6afC7WfYkh9Zcnf8z9vWFmdGpn0Wh1Er+cHcld1w/p+gLBm7Iq31uswtbNEKwDF bHjg2/9A6WetQTd+NmBKNS8v5OaiYibFM5kMWZGkOPOc+DYgqoQRijOFJTRKq2UxizT8CEC0E6Xs2 ROYRUUfnA==; Received: from 77.117.150.49.wireless.dyn.drei.com ([77.117.150.49] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.87 #1 (Red Hat Linux)) id 1cuCiO-0002N3-SF; Sat, 01 Apr 2017 06:40:50 +0000 From: Christoph Hellwig To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, Brian Foster , "Darrick J . Wong" Subject: [PATCH 04/27] xfs: sync eofblocks scans under iolock are livelock prone Date: Sat, 1 Apr 2017 08:40:03 +0200 Message-Id: <20170401064026.5783-5-hch@lst.de> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170401064026.5783-1-hch@lst.de> References: <20170401064026.5783-1-hch@lst.de> X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Brian Foster commit c3155097ad89a956579bc305856a1f2878494e52 upstream. The xfs_eofblocks.eof_scan_owner field is an internal field to facilitate invoking eofb scans from the kernel while under the iolock. This is necessary because the eofb scan acquires the iolock of each inode. Synchronous scans are invoked on certain buffered write failures while under iolock. In such cases, the scan owner indicates that the context for the scan already owns the particular iolock and prevents a double lock deadlock. eofblocks scans while under iolock are still livelock prone in the event of multiple parallel scans, however. If multiple buffered writes to different inodes fail and invoke eofblocks scans at the same time, each scan avoids a deadlock with its own inode by virtue of the eof_scan_owner field, but will never be able to acquire the iolock of the inode from the parallel scan. Because the low free space scans are invoked with SYNC_WAIT, the scan will not return until it has processed every tagged inode and thus both scans will spin indefinitely on the iolock being held across the opposite scan. This problem can be reproduced reliably by generic/224 on systems with higher cpu counts (x16). To avoid this problem, simplify the semantics of eofblocks scans to never invoke a scan while under iolock. This means that the buffered write context must drop the iolock before the scan. It must reacquire the lock before the write retry and also repeat the initial write checks, as the original state might no longer be valid once the iolock was dropped. Signed-off-by: Brian Foster Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 13 +++++++++---- fs/xfs/xfs_icache.c | 45 +++++++-------------------------------------- fs/xfs/xfs_icache.h | 2 -- 3 files changed, 16 insertions(+), 44 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 9a5d64b5f35a..81b7fee40b54 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -675,8 +675,10 @@ xfs_file_buffered_aio_write( struct xfs_inode *ip = XFS_I(inode); ssize_t ret; int enospc = 0; - int iolock = XFS_IOLOCK_EXCL; + int iolock; +write_retry: + iolock = XFS_IOLOCK_EXCL; xfs_rw_ilock(ip, iolock); ret = xfs_file_aio_write_checks(iocb, from, &iolock); @@ -686,7 +688,6 @@ xfs_file_buffered_aio_write( /* We can write back this queue in page reclaim */ current->backing_dev_info = inode_to_bdi(inode); -write_retry: trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos); ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops); if (likely(ret >= 0)) @@ -702,18 +703,21 @@ xfs_file_buffered_aio_write( * running at the same time. */ if (ret == -EDQUOT && !enospc) { + xfs_rw_iunlock(ip, iolock); enospc = xfs_inode_free_quota_eofblocks(ip); if (enospc) goto write_retry; enospc = xfs_inode_free_quota_cowblocks(ip); if (enospc) goto write_retry; + iolock = 0; } else if (ret == -ENOSPC && !enospc) { struct xfs_eofblocks eofb = {0}; enospc = 1; xfs_flush_inodes(ip->i_mount); - eofb.eof_scan_owner = ip->i_ino; /* for locking */ + + xfs_rw_iunlock(ip, iolock); eofb.eof_flags = XFS_EOF_FLAGS_SYNC; xfs_icache_free_eofblocks(ip->i_mount, &eofb); goto write_retry; @@ -721,7 +725,8 @@ xfs_file_buffered_aio_write( current->backing_dev_info = NULL; out: - xfs_rw_iunlock(ip, iolock); + if (iolock) + xfs_rw_iunlock(ip, iolock); return ret; } diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index e4b382aabf9f..78708d001a63 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1326,11 +1326,8 @@ xfs_inode_free_eofblocks( { int ret = 0; struct xfs_eofblocks *eofb = args; - bool need_iolock = true; int match; - ASSERT(!eofb || (eofb && eofb->eof_scan_owner != 0)); - if (!xfs_can_free_eofblocks(ip, false)) { /* inode could be preallocated or append-only */ trace_xfs_inode_free_eofblocks_invalid(ip); @@ -1358,27 +1355,19 @@ xfs_inode_free_eofblocks( if (eofb->eof_flags & XFS_EOF_FLAGS_MINFILESIZE && XFS_ISIZE(ip) < eofb->eof_min_file_size) return 0; - - /* - * A scan owner implies we already hold the iolock. Skip it here - * to avoid deadlock. - */ - if (eofb->eof_scan_owner == ip->i_ino) - need_iolock = false; } /* * If the caller is waiting, return -EAGAIN to keep the background * scanner moving and revisit the inode in a subsequent pass. */ - if (need_iolock && !xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) { + if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) { if (flags & SYNC_WAIT) ret = -EAGAIN; return ret; } ret = xfs_free_eofblocks(ip); - if (need_iolock) - xfs_iunlock(ip, XFS_IOLOCK_EXCL); + xfs_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } @@ -1425,15 +1414,10 @@ __xfs_inode_free_quota_eofblocks( struct xfs_eofblocks eofb = {0}; struct xfs_dquot *dq; - ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); - /* - * Set the scan owner to avoid a potential livelock. Otherwise, the scan - * can repeatedly trylock on the inode we're currently processing. We - * run a sync scan to increase effectiveness and use the union filter to + * Run a sync scan to increase effectiveness and use the union filter to * cover all applicable quotas in a single scan. */ - eofb.eof_scan_owner = ip->i_ino; eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC; if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) { @@ -1585,12 +1569,9 @@ xfs_inode_free_cowblocks( { int ret; struct xfs_eofblocks *eofb = args; - bool need_iolock = true; int match; struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); - ASSERT(!eofb || (eofb && eofb->eof_scan_owner != 0)); - /* * Just clear the tag if we have an empty cow fork or none at all. It's * possible the inode was fully unshared since it was originally tagged. @@ -1623,28 +1604,16 @@ xfs_inode_free_cowblocks( if (eofb->eof_flags & XFS_EOF_FLAGS_MINFILESIZE && XFS_ISIZE(ip) < eofb->eof_min_file_size) return 0; - - /* - * A scan owner implies we already hold the iolock. Skip it in - * xfs_free_eofblocks() to avoid deadlock. This also eliminates - * the possibility of EAGAIN being returned. - */ - if (eofb->eof_scan_owner == ip->i_ino) - need_iolock = false; } /* Free the CoW blocks */ - if (need_iolock) { - xfs_ilock(ip, XFS_IOLOCK_EXCL); - xfs_ilock(ip, XFS_MMAPLOCK_EXCL); - } + xfs_ilock(ip, XFS_IOLOCK_EXCL); + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF); - if (need_iolock) { - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); - xfs_iunlock(ip, XFS_IOLOCK_EXCL); - } + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); + xfs_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index a1e02f4708ab..8a7c849b4dea 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -27,7 +27,6 @@ struct xfs_eofblocks { kgid_t eof_gid; prid_t eof_prid; __u64 eof_min_file_size; - xfs_ino_t eof_scan_owner; }; #define SYNC_WAIT 0x0001 /* wait for i/o to complete */ @@ -102,7 +101,6 @@ xfs_fs_eofblocks_from_user( dst->eof_flags = src->eof_flags; dst->eof_prid = src->eof_prid; dst->eof_min_file_size = src->eof_min_file_size; - dst->eof_scan_owner = NULLFSINO; dst->eof_uid = INVALID_UID; if (src->eof_flags & XFS_EOF_FLAGS_UID) {