From patchwork Wed Jan 1 01:08:46 2020
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 11314689
Subject: [PATCH 01/10] xfs: decide if inode needs inactivation
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Tue, 31 Dec 2019 17:08:46 -0800
Message-ID: <157784092655.1362752.15438079465056974707.stgit@magnolia>
In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia>
References: <157784092020.1362752.15046503361741521784.stgit@magnolia>
List-ID: X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Add a predicate function to decide if an inode needs (deferred)
inactivation.  Any file that has been unlinked or has speculative
preallocations either for post-EOF writes or for CoW qualifies.
This function will also be used by the upcoming deferred inactivation
patch.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_inode.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h |    1 +
 2 files changed, 60 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 1187ff7035d9..097a89826ba7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1806,6 +1806,65 @@ xfs_inactive_ifree(
 	return 0;
 }
 
+/*
+ * Returns true if we need to update the on-disk metadata before we can free
+ * the memory used by this inode.  Updates include freeing post-eof
+ * preallocations; freeing COW staging extents; and marking the inode free in
+ * the inobt if it is on the unlinked list.
+ */
+bool
+xfs_inode_needs_inactivation(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_ifork	*cow_ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+
+	/*
+	 * If the inode is already free, then there can be nothing
+	 * to clean up here.
+	 */
+	if (VFS_I(ip)->i_mode == 0)
+		return false;
+
+	/* If this is a read-only mount, don't do this (would generate I/O) */
+	if (mp->m_flags & XFS_MOUNT_RDONLY)
+		return false;
+
+	/* Try to clean out the cow blocks if there are any. */
+	if (cow_ifp && cow_ifp->if_bytes > 0)
+		return true;
+
+	if (VFS_I(ip)->i_nlink != 0) {
+		int	error;
+		bool	has;
+
+		/*
+		 * force is true because we are evicting an inode from the
+		 * cache.  Post-eof blocks must be freed, lest we end up with
+		 * broken free space accounting.
+		 *
+		 * Note: don't bother with iolock here since lockdep complains
+		 * about acquiring it in reclaim context.  We have the only
+		 * reference to the inode at this point anyways.
+		 *
+		 * If the predicate errors out, send the inode through
+		 * inactivation anyway, because that's what we did before.
+		 * The inactivation worker will ignore an inode that doesn't
+		 * actually need it.
+		 */
+		if (!xfs_can_free_eofblocks(ip, true))
+			return false;
+		error = xfs_has_eofblocks(ip, &has);
+		return error != 0 || has;
+	}
+
+	/*
+	 * Link count dropped to zero, which means we have to mark the inode
+	 * free on disk and remove it from the AGI unlinked list.
+	 */
+	return true;
+}
+
 /*
  * xfs_inactive
  *
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 377e02cd3c0a..0a46548e51a8 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -498,6 +498,7 @@ extern struct kmem_zone *xfs_inode_zone;
 bool xfs_inode_verify_forks(struct xfs_inode *ip);
 int xfs_has_eofblocks(struct xfs_inode *ip, bool *has);
+bool xfs_inode_needs_inactivation(struct xfs_inode *ip);
 
 int xfs_iunlink_init(struct xfs_perag *pag);
 void xfs_iunlink_destroy(struct xfs_perag *pag);
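The decision logic of the predicate above can be modeled in a few lines of standalone C. This is a sketch, not the kernel code: `struct inode_state` and its fields are simplified stand-ins for the `xfs_inode`/`xfs_mount` state the real function inspects, and the post-EOF check collapses the `xfs_can_free_eofblocks`/`xfs_has_eofblocks` pair into a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the inode/mount state the predicate reads. */
struct inode_state {
	uint32_t mode;           /* 0 means the inode is already free */
	uint32_t nlink;          /* link count */
	bool     ro_mount;       /* read-only mount: must not generate I/O */
	int64_t  cow_bytes;      /* bytes held in the CoW staging fork */
	bool     has_eofblocks;  /* speculative post-EOF preallocations */
};

/* Mirror of xfs_inode_needs_inactivation()'s decision order. */
static bool needs_inactivation(const struct inode_state *i)
{
	if (i->mode == 0)
		return false;          /* already free: nothing to clean up */
	if (i->ro_mount)
		return false;          /* would generate I/O on a RO mount */
	if (i->cow_bytes > 0)
		return true;           /* CoW staging extents must be freed */
	if (i->nlink != 0)
		return i->has_eofblocks; /* still linked: only eofblocks matter */
	return true;                   /* unlinked: mark the inode free on disk */
}
```

Note the ordering matters: the read-only check short-circuits everything else, so even an unlinked inode on a RO mount reports no inactivation work.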
From patchwork Wed Jan 1 01:08:52 2020
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 11314695
Subject: [PATCH 02/10] xfs: track unlinked inactive inode fs summary counters
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Tue, 31 Dec 2019 17:08:52 -0800
Message-ID: <157784093263.1362752.14373360314662413051.stgit@magnolia>
In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia>
References: <157784092020.1362752.15046503361741521784.stgit@magnolia>

From: Darrick J. Wong

Set up counters to track the number of inodes and blocks that will be
freed from inactivating unlinked inodes.  We'll use this in the deferred
inactivation patch to hide the effects of deferred processing.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_inode.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h |    7 +++++++
 fs/xfs/xfs_super.c |   31 +++++++++++++++++++++++++++++-
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 097a89826ba7..2fe8f030ebb8 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1806,6 +1806,60 @@ xfs_inactive_ifree(
 	return 0;
 }
 
+/*
+ * Play some accounting tricks with deferred inactivation of unlinked inodes
+ * so that it looks like the inode got freed immediately.  The superblock
+ * maintains counts of the number of inodes, data blocks, and rt blocks that
+ * would be freed if we were to force inode inactivation.  These counts are
+ * added to the statfs free counters outside of the regular fdblocks/ifree
+ * counters.  If userspace actually demands those "free" resources we'll force
+ * an inactivation scan to free things for real.
+ *
+ * Note that we can safely skip the block accounting trickery for complicated
+ * situations (inode with blocks on both devices, inode block counts that seem
+ * wrong) since the worst that happens is that statfs resource usage decreases
+ * more slowly.
+ *
+ * Positive @direction means we're setting up the accounting trick and
+ * negative undoes it.
+ */
+static inline void
+xfs_inode_iadjust(
+	struct xfs_inode	*ip,
+	int			direction)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_filblks_t		iblocks;
+	int64_t			inodes = 0;
+	int64_t			dblocks = 0;
+	int64_t			rblocks = 0;
+
+	ASSERT(direction != 0);
+
+	if (VFS_I(ip)->i_nlink == 0) {
+		inodes = 1;
+
+		iblocks = max_t(int64_t, 0, ip->i_d.di_nblocks +
+					    ip->i_delayed_blks);
+		if (!XFS_IS_REALTIME_INODE(ip))
+			dblocks = iblocks;
+		else if (!XFS_IFORK_Q(ip) ||
+			 XFS_IFORK_FORMAT(ip, XFS_ATTR_FORK) ==
+					XFS_DINODE_FMT_LOCAL)
+			rblocks = iblocks;
+	}
+
+	if (direction < 0) {
+		inodes = -inodes;
+		dblocks = -dblocks;
+		rblocks = -rblocks;
+	}
+
+	percpu_counter_add(&mp->m_iinactive, inodes);
+	percpu_counter_add(&mp->m_dinactive, dblocks);
+	percpu_counter_add(&mp->m_rinactive, rblocks);
+}
+
 /*
  * Returns true if we need to update the on-disk metadata before we can free
  * the memory used by this inode.  Updates include freeing post-eof
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 296223c2b782..d203c922dc51 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -85,6 +85,13 @@ typedef struct xfs_mount {
 	 */
 	struct percpu_counter	m_delalloc_blks;
 
+	/* Count of inodes waiting for inactivation. */
+	struct percpu_counter	m_iinactive;
+	/* Count of data device blocks waiting for inactivation. */
+	struct percpu_counter	m_dinactive;
+	/* Count of realtime device blocks waiting for inactivation. */
+	struct percpu_counter	m_rinactive;
+
 	struct xfs_buf		*m_sb_bp;	/* buffer for superblock */
 	char			*m_rtname;	/* realtime device name */
 	char			*m_logname;	/* external log device name */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 03d95bf0952c..ed10ba2cd087 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -784,6 +784,8 @@ xfs_fs_statfs(
 	uint64_t		icount;
 	uint64_t		ifree;
 	uint64_t		fdblocks;
+	uint64_t		iinactive;
+	uint64_t		binactive;
 	xfs_extlen_t		lsize;
 	int64_t			ffree;
 
@@ -797,6 +799,7 @@ xfs_fs_statfs(
 	icount = percpu_counter_sum(&mp->m_icount);
 	ifree = percpu_counter_sum(&mp->m_ifree);
 	fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+	iinactive = percpu_counter_sum(&mp->m_iinactive);
 
 	spin_lock(&mp->m_sb_lock);
 	statp->f_bsize = sbp->sb_blocksize;
@@ -820,7 +823,7 @@ xfs_fs_statfs(
 					sbp->sb_icount);
 
 	/* make sure statp->f_ffree does not underflow */
-	ffree = statp->f_files - (icount - ifree);
+	ffree = statp->f_files - (icount - ifree) + iinactive;
 	statp->f_ffree = max_t(int64_t, ffree, 0);
 
@@ -834,7 +837,12 @@ xfs_fs_statfs(
 		statp->f_blocks = sbp->sb_rblocks;
 		statp->f_bavail = statp->f_bfree =
 			sbp->sb_frextents * sbp->sb_rextsize;
+		binactive = percpu_counter_sum(&mp->m_rinactive);
+	} else {
+		binactive = percpu_counter_sum(&mp->m_dinactive);
 	}
+	statp->f_bavail += binactive;
+	statp->f_bfree += binactive;
 
 	return 0;
 }
@@ -1024,8 +1032,26 @@ xfs_init_percpu_counters(
 	if (error)
 		goto free_fdblocks;
 
+	error = percpu_counter_init(&mp->m_iinactive, 0, GFP_KERNEL);
+	if (error)
+		goto free_delalloc;
+
+	error = percpu_counter_init(&mp->m_dinactive, 0, GFP_KERNEL);
+	if (error)
+		goto free_iinactive;
+
+	error = percpu_counter_init(&mp->m_rinactive, 0, GFP_KERNEL);
+	if (error)
+		goto free_dinactive;
+
 	return 0;
 
+free_dinactive:
+	percpu_counter_destroy(&mp->m_dinactive);
+free_iinactive:
+	percpu_counter_destroy(&mp->m_iinactive);
+free_delalloc:
+	percpu_counter_destroy(&mp->m_delalloc_blks);
 free_fdblocks:
 	percpu_counter_destroy(&mp->m_fdblocks);
 free_ifree:
@@ -1054,6 +1080,9 @@ xfs_destroy_percpu_counters(
 	ASSERT(XFS_FORCED_SHUTDOWN(mp) ||
 	       percpu_counter_sum(&mp->m_delalloc_blks) == 0);
 	percpu_counter_destroy(&mp->m_delalloc_blks);
+	percpu_counter_destroy(&mp->m_iinactive);
+	percpu_counter_destroy(&mp->m_dinactive);
+	percpu_counter_destroy(&mp->m_rinactive);
 }
 
 static void
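The statfs trick above is just signed arithmetic on a pair of counters: inactivation work is added with a positive direction when an inode is queued and subtracted with a negative direction when the work completes, and `f_ffree` adds the pending count back so userspace never sees the deferral. A toy model, with plain `int64_t` fields standing in for the kernel's percpu counters and a made-up `struct mount_counters`:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the m_icount/m_ifree/m_iinactive/m_dinactive counters. */
struct mount_counters {
	int64_t icount;     /* allocated inodes */
	int64_t ifree;      /* free inodes */
	int64_t iinactive;  /* unlinked inodes awaiting inactivation */
	int64_t dinactive;  /* data blocks awaiting inactivation */
};

/* Mirror xfs_inode_iadjust(): positive direction sets up the trick,
 * negative undoes it once inactivation has actually run. */
static void iadjust(struct mount_counters *mc, int direction,
		    int64_t inodes, int64_t dblocks)
{
	if (direction < 0) {
		inodes = -inodes;
		dblocks = -dblocks;
	}
	mc->iinactive += inodes;
	mc->dinactive += dblocks;
}

/* f_ffree as computed in xfs_fs_statfs(): pending-inactive inodes count
 * as free, and the result is clamped so it cannot underflow. */
static int64_t statfs_ffree(const struct mount_counters *mc, int64_t f_files)
{
	int64_t ffree = f_files - (mc->icount - mc->ifree) + mc->iinactive;
	return ffree > 0 ? ffree : 0;
}
```

Because setup and teardown use the same magnitudes with opposite signs, a queue/complete pair always nets to zero and the counters cannot drift.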
From patchwork Wed Jan 1 01:08:58 2020
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 11314709
Subject: [PATCH 03/10] xfs: track unlinked inactive inode quota counters
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Tue, 31 Dec 2019 17:08:58 -0800
Message-ID: <157784093877.1362752.15817417024591800990.stgit@magnolia>
In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia>
References: <157784092020.1362752.15046503361741521784.stgit@magnolia>

From: Darrick J. Wong

Set up quota counters to track the number of inodes and blocks that will
be freed from inactivating unlinked inodes.  We'll use this in the
deferred inactivation patch to hide the effects of deferred processing.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_dquot.c       |   45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_dquot.h       |   10 ++++++++++
 fs/xfs/xfs_inode.c       |    4 ++++
 fs/xfs/xfs_qm.c          |   13 +++++++++++++
 fs/xfs/xfs_qm_syscalls.c |    7 ++++---
 fs/xfs/xfs_quota.h       |    4 +++-
 fs/xfs/xfs_trace.h       |    1 +
 7 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 6429001d1895..e50c75d9d788 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1275,3 +1275,48 @@ xfs_qm_dqiterate(
 
 	return error;
 }
+
+/* Update dquot pending-inactivation counters. */
+STATIC void
+xfs_dquot_adjust(
+	struct xfs_dquot	*dqp,
+	int			direction,
+	int64_t			inodes,
+	int64_t			dblocks,
+	int64_t			rblocks)
+{
+	xfs_dqlock(dqp);
+	dqp->q_ina_total += direction;
+	dqp->q_ina_icount += inodes;
+	dqp->q_ina_bcount += dblocks;
+	dqp->q_ina_rtbcount += rblocks;
+	xfs_dqunlock(dqp);
+}
+
+/* Update pending-inactivation counters for all dquots attached to the inode. */
+void
+xfs_qm_iadjust(
+	struct xfs_inode	*ip,
+	int			direction,
+	int64_t			inodes,
+	int64_t			dblocks,
+	int64_t			rblocks)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp) ||
+	    xfs_is_quota_inode(&mp->m_sb, ip->i_ino))
+		return;
+
+	if (XFS_IS_UQUOTA_ON(mp) && ip->i_udquot)
+		xfs_dquot_adjust(ip->i_udquot, direction, inodes, dblocks,
+				rblocks);
+
+	if (XFS_IS_GQUOTA_ON(mp) && ip->i_gdquot)
+		xfs_dquot_adjust(ip->i_gdquot, direction, inodes, dblocks,
+				rblocks);
+
+	if (XFS_IS_PQUOTA_ON(mp) && ip->i_pdquot)
+		xfs_dquot_adjust(ip->i_pdquot, direction, inodes, dblocks,
+				rblocks);
+}
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index fe3e46df604b..0d58f4ae8349 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -47,6 +47,16 @@ struct xfs_dquot {
 	xfs_qcnt_t	 q_res_icount;
 	/* total realtime blks used+reserved */
 	xfs_qcnt_t	 q_res_rtbcount;
+
+	/* inactive inodes attached to this dquot */
+	uint64_t	 q_ina_total;
+	/* inactive regular nblks used+reserved */
+	xfs_qcnt_t	 q_ina_bcount;
+	/* inactive inos allocd+reserved */
+	xfs_qcnt_t	 q_ina_icount;
+	/* inactive realtime blks used+reserved */
+	xfs_qcnt_t	 q_ina_rtbcount;
+
 	xfs_qcnt_t	 q_prealloc_lo_wmark;
 	xfs_qcnt_t	 q_prealloc_hi_wmark;
 	int64_t		 q_low_space[XFS_QLOWSP_MAX];
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2fe8f030ebb8..aa019e49e512 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -36,6 +36,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_reflink.h"
 #include "xfs_health.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
 
 kmem_zone_t *xfs_inode_zone;
 
@@ -1858,6 +1860,8 @@ xfs_inode_iadjust(
 	percpu_counter_add(&mp->m_iinactive, inodes);
 	percpu_counter_add(&mp->m_dinactive, dblocks);
 	percpu_counter_add(&mp->m_rinactive, rblocks);
+
+	xfs_qm_iadjust(ip, direction, inodes, dblocks, rblocks);
 }
 
 /*
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index ed6cc943db92..fc3898f5e27d 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -424,6 +424,19 @@ xfs_qm_dquot_isolate(
 	if (!xfs_dqlock_nowait(dqp))
 		goto out_miss_busy;
 
+	/*
+	 * An inode is on the inactive list waiting to release its resources,
+	 * so remove this dquot from the freelist and try again.  We detached
+	 * the dquot from the NEEDS_INACTIVE inode so that quotaoff won't
+	 * deadlock on inactive inodes holding dquots.
+	 */
+	if (dqp->q_ina_total > 0) {
+		xfs_dqunlock(dqp);
+		trace_xfs_dqreclaim_inactive(dqp);
+		list_lru_isolate(lru, &dqp->q_lru);
+		return LRU_REMOVED;
+	}
+
 	/*
 	 * This dquot has acquired a reference in the meantime remove it from
 	 * the freelist and try again.
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index c339b7404cf3..d93bf0c39d3d 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -616,8 +616,8 @@ xfs_qm_scall_getquota_fill_qc(
 		XFS_FSB_TO_B(mp, be64_to_cpu(dqp->q_core.d_blk_softlimit));
 	dst->d_ino_hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
 	dst->d_ino_softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
-	dst->d_space = XFS_FSB_TO_B(mp, dqp->q_res_bcount);
-	dst->d_ino_count = dqp->q_res_icount;
+	dst->d_space = XFS_FSB_TO_B(mp, dqp->q_res_bcount - dqp->q_ina_bcount);
+	dst->d_ino_count = dqp->q_res_icount - dqp->q_ina_icount;
 	dst->d_spc_timer = be32_to_cpu(dqp->q_core.d_btimer);
 	dst->d_ino_timer = be32_to_cpu(dqp->q_core.d_itimer);
 	dst->d_ino_warns = be16_to_cpu(dqp->q_core.d_iwarns);
@@ -626,7 +626,8 @@ xfs_qm_scall_getquota_fill_qc(
 		XFS_FSB_TO_B(mp, be64_to_cpu(dqp->q_core.d_rtb_hardlimit));
 	dst->d_rt_spc_softlimit =
 		XFS_FSB_TO_B(mp, be64_to_cpu(dqp->q_core.d_rtb_softlimit));
-	dst->d_rt_space = XFS_FSB_TO_B(mp, dqp->q_res_rtbcount);
+	dst->d_rt_space =
+		XFS_FSB_TO_B(mp, dqp->q_res_rtbcount - dqp->q_ina_rtbcount);
 	dst->d_rt_spc_timer = be32_to_cpu(dqp->q_core.d_rtbtimer);
 	dst->d_rt_spc_warns = be16_to_cpu(dqp->q_core.d_rtbwarns);
 
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index efe42ae7a2f3..c354f01dae7b 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -106,7 +106,8 @@ extern int xfs_qm_newmount(struct xfs_mount *, uint *, uint *);
 extern void xfs_qm_mount_quotas(struct xfs_mount *);
 extern void xfs_qm_unmount(struct xfs_mount *);
 extern void xfs_qm_unmount_quotas(struct xfs_mount *);
-
+extern void xfs_qm_iadjust(struct xfs_inode *ip, int direction, int64_t inodes,
+		int64_t dblocks, int64_t rblocks);
 #else
 static inline int
 xfs_qm_vop_dqalloc(struct xfs_inode *ip, xfs_dqid_t uid, xfs_dqid_t gid,
@@ -148,6 +149,7 @@ static inline int xfs_trans_reserve_quota_bydquots(struct xfs_trans *tp,
 #define xfs_qm_mount_quotas(mp)
 #define xfs_qm_unmount(mp)
 #define xfs_qm_unmount_quotas(mp)
+#define xfs_qm_iadjust(ip, dir, inodes, dblocks, rblocks)
 #endif /* CONFIG_XFS_QUOTA */
 
 #define xfs_trans_unreserve_quota_nblks(tp, ip, nblks, ninos, flags) \
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index cee45e6cdb39..0064f4491d66 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -906,6 +906,7 @@ DEFINE_EVENT(xfs_dquot_class, name, \
 	TP_PROTO(struct xfs_dquot *dqp), \
 	TP_ARGS(dqp))
 DEFINE_DQUOT_EVENT(xfs_dqadjust);
+DEFINE_DQUOT_EVENT(xfs_dqreclaim_inactive);
DEFINE_DQUOT_EVENT(xfs_dqreclaim_want);
DEFINE_DQUOT_EVENT(xfs_dqreclaim_dirty);
DEFINE_DQUOT_EVENT(xfs_dqreclaim_busy);
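The getquota change above reports `reserved - pending_inactive` so quota usage appears to drop as soon as an unlinked inode is queued, not when its inactivation finally runs. A minimal standalone sketch, with a made-up `struct dquot_counts` shadowing the `q_res_*`/`q_ina_*` fields from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the dquot resource/pending-inactivation fields. */
struct dquot_counts {
	int64_t q_res_bcount;   /* blocks used + reserved */
	int64_t q_res_icount;   /* inodes allocated + reserved */
	int64_t q_ina_bcount;   /* blocks pending inactivation */
	int64_t q_ina_icount;   /* inodes pending inactivation */
};

/* What xfs_qm_scall_getquota_fill_qc() now reports for d_space (in blocks). */
static int64_t reported_blocks(const struct dquot_counts *q)
{
	return q->q_res_bcount - q->q_ina_bcount;
}

/* What it now reports for d_ino_count. */
static int64_t reported_inodes(const struct dquot_counts *q)
{
	return q->q_res_icount - q->q_ina_icount;
}
```

The internal reservation (`q_res_*`) is untouched; only the userspace-visible view subtracts the pending work, which is why the deferred worker can still charge the real quota when it runs.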
From patchwork Wed Jan 1 01:09:05 2020
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 11314715
Subject: [PATCH 04/10] xfs: pass per-ag structure to the xfs_ici_walk execute function
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Tue, 31 Dec 2019 17:09:05 -0800
Message-ID: <157784094496.1362752.10290761414347952592.stgit@magnolia>
In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia>
References: <157784092020.1362752.15046503361741521784.stgit@magnolia>

From: Darrick J. Wong

Pass the per-AG structure to the xfs_ici_walk execute function.  This
isn't needed now, but deferred inactivation will need it to modify some
per-ag data.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c      |   24 ++++++++++++++++--------
 fs/xfs/xfs_icache.h      |    2 +-
 fs/xfs/xfs_qm_syscalls.c |    1 +
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 1a09d4854266..d9bfc78a1b85 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -26,8 +26,10 @@
 #include 
 
-STATIC int xfs_inode_free_eofblocks(struct xfs_inode *ip, void *args);
-STATIC int xfs_inode_free_cowblocks(struct xfs_inode *ip, void *args);
+STATIC int xfs_inode_free_eofblocks(struct xfs_inode *ip, struct xfs_perag *pag,
+		void *args);
+STATIC int xfs_inode_free_cowblocks(struct xfs_inode *ip, struct xfs_perag *pag,
+		void *args);
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -798,7 +800,8 @@ STATIC int
 xfs_ici_walk_ag(
 	struct xfs_mount	*mp,
 	struct xfs_perag	*pag,
-	int			(*execute)(struct xfs_inode *ip, void *args),
+	int			(*execute)(struct xfs_inode *ip,
+					   struct xfs_perag *pag, void *args),
 	void			*args,
 	int			tag,
 	int			iter_flags)
@@ -874,7 +877,7 @@ xfs_ici_walk_ag(
 			if ((iter_flags & XFS_ICI_WALK_INEW_WAIT) &&
 			    xfs_iflags_test(batch[i], XFS_INEW))
 				xfs_inew_wait(batch[i]);
-			error = execute(batch[i], args);
+			error = execute(batch[i], pag, args);
 			xfs_irele(batch[i]);
 			if (error == -EAGAIN) {
 				skipped++;
@@ -919,7 +922,8 @@ STATIC int
 xfs_ici_walk(
 	struct xfs_mount	*mp,
 	int			iter_flags,
-	int			(*execute)(struct xfs_inode *ip, void *args),
+	int			(*execute)(struct xfs_inode *ip,
+					   struct xfs_perag *pag, void *args),
 	void			*args,
 	int			tag)
 {
@@ -950,7 +954,8 @@ xfs_ici_walk(
 int
 xfs_ici_walk_all(
 	struct xfs_mount	*mp,
-	int			(*execute)(struct xfs_inode *ip, void *args),
+	int			(*execute)(struct xfs_inode *ip,
+					   struct xfs_perag *pag, void *args),
 	void			*args)
 {
 	return xfs_ici_walk(mp, XFS_ICI_WALK_INEW_WAIT, execute, args,
@@ -977,15 +982,16 @@ xfs_queue_blockgc(
 static int
 xfs_blockgc_scan_inode(
 	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
 	void			*args)
 {
 	int			error;
 
-	error = xfs_inode_free_eofblocks(ip, args);
+	error = xfs_inode_free_eofblocks(ip, pag, args);
 	if (error && error != -EAGAIN)
 		return error;
 
-	return xfs_inode_free_cowblocks(ip, args);
+	return xfs_inode_free_cowblocks(ip, pag, args);
 }
 
 /* Scan an AG's inodes for block preallocations that we can remove. */
@@ -1528,6 +1534,7 @@ xfs_inode_matches_eofb(
 STATIC int
 xfs_inode_free_eofblocks(
 	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
 	void			*args)
 {
 	struct xfs_eofblocks	*eofb = args;
@@ -1806,6 +1813,7 @@ xfs_prep_free_cowblocks(
 STATIC int
 xfs_inode_free_cowblocks(
 	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
 	void			*args)
 {
 	struct xfs_eofblocks	*eofb = args;
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index ee4e05b59afb..d7713eb0734d 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -70,7 +70,7 @@ void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
 
 int xfs_ici_walk_all(struct xfs_mount *mp,
-	int (*execute)(struct xfs_inode *ip, void *args),
+	int (*execute)(struct xfs_inode *ip, struct xfs_perag *pag, void *args),
 	void *args);
 
 int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index d93bf0c39d3d..fa0db72f8d0d 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -733,6 +733,7 @@ struct xfs_dqrele {
 STATIC int
 xfs_dqrele_inode(
 	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
 	void			*args)
 {
 	struct xfs_dqrele	*dqr = args;
Wong" X-Patchwork-Id: 11314717 Subject: [PATCH 05/10] xfs: pass around xfs_inode_ag_walk iget/irele helper functions From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 31 Dec 2019 17:09:11 -0800 Message-ID: <157784095105.1362752.6192279595180573182.stgit@magnolia> In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia> References: <157784092020.1362752.15046503361741521784.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 From: Darrick
J. Wong Create an alternative version of xfs_ici_walk() that allows a caller to pass in custom inode grab and inode release helper functions. Deferred inode inactivation deals with xfs inodes that are still in memory but no longer visible to the vfs, which means that it has to screen and process those inodes differently. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_icache.c | 82 +++++++++++++++++++++++++++++++++++++++------------ fs/xfs/xfs_icache.h | 6 ++-- 2 files changed, 65 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index d9bfc78a1b85..01f5502d984a 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -792,20 +792,38 @@ xfs_inode_ag_walk_grab( return false; } +struct xfs_ici_walk_ops { + /* + * Examine the given inode to decide if we want to pass it to the + * execute function. If so, this function should do whatever is needed + * to prevent others from grabbing it. If not, this function should + * release the inode. + */ + bool (*igrab)(struct xfs_inode *ip, int iter_flags); + + /* Do something with the given inode. */ + xfs_ici_walk_fn iwalk; + + /* + * Release an inode after the execution function runs. This function + * is optional. + */ + void (*irele)(struct xfs_inode *ip); +}; + /* * For a given per-AG structure @pag, @grab, @execute, and @rele all incore * inodes with the given radix tree @tag.
*/ STATIC int xfs_ici_walk_ag( - struct xfs_mount *mp, struct xfs_perag *pag, - int (*execute)(struct xfs_inode *ip, - struct xfs_perag *pag, void *args), + const struct xfs_ici_walk_ops *ops, + int iter_flags, void *args, - int tag, - int iter_flags) + int tag) { + struct xfs_mount *mp = pag->pag_mount; uint32_t first_index; int last_error = 0; int skipped; @@ -846,7 +864,7 @@ xfs_ici_walk_ag( for (i = 0; i < nr_found; i++) { struct xfs_inode *ip = batch[i]; - if (done || !xfs_inode_ag_walk_grab(ip, iter_flags)) + if (done || !ops->igrab(ip, iter_flags)) batch[i] = NULL; /* @@ -877,8 +895,9 @@ xfs_ici_walk_ag( if ((iter_flags & XFS_ICI_WALK_INEW_WAIT) && xfs_iflags_test(batch[i], XFS_INEW)) xfs_inew_wait(batch[i]); - error = execute(batch[i], pag, args); - xfs_irele(batch[i]); + error = ops->iwalk(batch[i], pag, args); + if (ops->irele) + ops->irele(batch[i]); if (error == -EAGAIN) { skipped++; continue; @@ -915,15 +934,14 @@ xfs_ici_walk_get_perag( } /* - * Call the @execute function on all incore inodes matching the radix tree - * @tag. + * Call the @grab, @execute, and @rele functions on all incore inodes matching + * the radix tree @tag. */ STATIC int -xfs_ici_walk( +xfs_ici_walk_fns( struct xfs_mount *mp, + const struct xfs_ici_walk_ops *ops, int iter_flags, - int (*execute)(struct xfs_inode *ip, - struct xfs_perag *pag, void *args), void *args, int tag) { @@ -935,8 +953,7 @@ xfs_ici_walk( ag = 0; while ((pag = xfs_ici_walk_get_perag(mp, ag, tag))) { ag = pag->pag_agno + 1; - error = xfs_ici_walk_ag(mp, pag, execute, args, tag, - iter_flags); + error = xfs_ici_walk_ag(pag, ops, iter_flags, args, tag); xfs_perag_put(pag); if (error) { last_error = error; @@ -947,6 +964,27 @@ xfs_ici_walk( return last_error; } +/* + * Call the @execute function on all incore inodes matching a given radix tree + * @tag. 
+ */ +STATIC int +xfs_ici_walk( + struct xfs_mount *mp, + int iter_flags, + xfs_ici_walk_fn iwalk, + void *args, + int tag) +{ + struct xfs_ici_walk_ops ops = { + .igrab = xfs_inode_ag_walk_grab, + .iwalk = iwalk, + .irele = xfs_irele, + }; + + return xfs_ici_walk_fns(mp, &ops, iter_flags, args, tag); +} + /* * Walk all incore inodes in the filesystem. Knowledge of radix tree tags * is hidden and we always wait for INEW inodes. @@ -954,11 +992,10 @@ xfs_ici_walk( int xfs_ici_walk_all( struct xfs_mount *mp, - int (*execute)(struct xfs_inode *ip, - struct xfs_perag *pag, void *args), + xfs_ici_walk_fn iwalk, void *args) { - return xfs_ici_walk(mp, XFS_ICI_WALK_INEW_WAIT, execute, args, + return xfs_ici_walk(mp, XFS_ICI_WALK_INEW_WAIT, iwalk, args, XFS_ICI_NO_TAG); } @@ -1000,8 +1037,13 @@ xfs_blockgc_scan_pag( struct xfs_perag *pag, struct xfs_eofblocks *eofb) { - return xfs_ici_walk_ag(pag->pag_mount, pag, xfs_blockgc_scan_inode, - eofb, XFS_ICI_BLOCK_GC_TAG, 0); + static const struct xfs_ici_walk_ops ops = { + .igrab = xfs_inode_ag_walk_grab, + .iwalk = xfs_blockgc_scan_inode, + .irele = xfs_irele, + }; + + return xfs_ici_walk_ag(pag, &ops, 0, eofb, XFS_ICI_BLOCK_GC_TAG); } /* Scan all incore inodes for block preallocations that we can remove. 
*/ diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index d7713eb0734d..3c34c0e2e266 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -69,9 +69,9 @@ void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip); void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip); int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *); -int xfs_ici_walk_all(struct xfs_mount *mp, - int (*execute)(struct xfs_inode *ip, struct xfs_perag *pag, void *args), - void *args); +typedef int (*xfs_ici_walk_fn)(struct xfs_inode *ip, struct xfs_perag *pag, + void *args); +int xfs_ici_walk_all(struct xfs_mount *mp, xfs_ici_walk_fn iwalk, void *args); int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino, bool *inuse); From patchwork Wed Jan 1 01:09:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11314719 Subject: [PATCH 06/10] xfs: deferred inode inactivation From: "Darrick J.
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 31 Dec 2019 17:09:17 -0800 Message-ID: <157784095718.1362752.13211509487069295216.stgit@magnolia> In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia> References: <157784092020.1362752.15046503361741521784.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 From: Darrick J. Wong Instead of calling xfs_inactive directly from xfs_fs_destroy_inode, defer the inactivation phase to a separate workqueue. With this we avoid blocking memory reclaim on filesystem metadata updates that are necessary to free an in-core inode, such as post-eof block freeing, COW staging extent freeing, and truncating and freeing unlinked inodes. That work is now deferred to a workqueue, where we can do the freeing in batches. We introduce two new inode flags -- NEED_INACTIVE and INACTIVATING. The first flag helps our worker find inodes needing inactivation, and the second flag marks inodes that are in the process of being inactivated.
A concurrent xfs_iget on the inode can still resurrect the inode by clearing NEEDS_INACTIVE (or bailing if INACTIVATING is set). Unfortunately, deferring the inactivation has one huge downside -- eventual consistency. Since all the freeing is deferred to a worker thread, one can rm a file but the space doesn't come back immediately. This can cause some odd side effects with quota accounting and statfs, so we also force inactivation scans in order to maintain the existing behaviors, at least outwardly. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_icache.c | 430 +++++++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_icache.h | 8 + fs/xfs/xfs_inode.c | 76 ++++++++ fs/xfs/xfs_inode.h | 15 +- fs/xfs/xfs_iomap.c | 1 fs/xfs/xfs_log_recover.c | 7 + fs/xfs/xfs_mount.c | 20 ++ fs/xfs/xfs_mount.h | 5 - fs/xfs/xfs_qm_syscalls.c | 6 + fs/xfs/xfs_super.c | 56 +++++- fs/xfs/xfs_trace.h | 15 +- 11 files changed, 612 insertions(+), 27 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 01f5502d984a..13b318dc2e89 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -30,6 +30,8 @@ STATIC int xfs_inode_free_eofblocks(struct xfs_inode *ip, struct xfs_perag *pag, void *args); STATIC int xfs_inode_free_cowblocks(struct xfs_inode *ip, struct xfs_perag *pag, void *args); +static void xfs_perag_set_inactive_tag(struct xfs_perag *pag); +static void xfs_perag_clear_inactive_tag(struct xfs_perag *pag); /* * Allocate and initialise an xfs_inode. @@ -222,6 +224,18 @@ xfs_perag_clear_reclaim_tag( trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_); } +static void +__xfs_inode_set_reclaim_tag( + struct xfs_perag *pag, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + + radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino), + XFS_ICI_RECLAIM_TAG); + xfs_perag_set_reclaim_tag(pag); + __xfs_iflags_set(ip, XFS_IRECLAIMABLE); +} /* * We set the inode flag atomically with the radix tree tag. 
@@ -239,10 +253,7 @@ xfs_inode_set_reclaim_tag( spin_lock(&pag->pag_ici_lock); spin_lock(&ip->i_flags_lock); - radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino), - XFS_ICI_RECLAIM_TAG); - xfs_perag_set_reclaim_tag(pag); - __xfs_iflags_set(ip, XFS_IRECLAIMABLE); + __xfs_inode_set_reclaim_tag(pag, ip); spin_unlock(&ip->i_flags_lock); spin_unlock(&pag->pag_ici_lock); @@ -260,6 +271,40 @@ xfs_inode_clear_reclaim_tag( xfs_perag_clear_reclaim_tag(pag); } +/* Set this inode's inactive tag and set the per-AG tag. */ +void +xfs_inode_set_inactive_tag( + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_perag *pag; + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); + spin_lock(&pag->pag_ici_lock); + spin_lock(&ip->i_flags_lock); + + radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino), + XFS_ICI_INACTIVE_TAG); + xfs_perag_set_inactive_tag(pag); + __xfs_iflags_set(ip, XFS_NEED_INACTIVE); + + spin_unlock(&ip->i_flags_lock); + spin_unlock(&pag->pag_ici_lock); + xfs_perag_put(pag); +} + +/* Clear this inode's inactive tag and try to clear the AG's. */ +STATIC void +xfs_inode_clear_inactive_tag( + struct xfs_perag *pag, + xfs_ino_t ino) +{ + radix_tree_tag_clear(&pag->pag_ici_root, + XFS_INO_TO_AGINO(pag->pag_mount, ino), + XFS_ICI_INACTIVE_TAG); + xfs_perag_clear_inactive_tag(pag); +} + static void xfs_inew_wait( struct xfs_inode *ip) @@ -321,6 +366,13 @@ xfs_iget_check_free_state( struct xfs_inode *ip, int flags) { + /* + * Unlinked inodes awaiting inactivation must not be reused until we + * have a chance to clear the on-disk metadata. + */ + if (VFS_I(ip)->i_nlink == 0 && (ip->i_flags & XFS_NEED_INACTIVE)) + return -ENOENT; + if (flags & XFS_IGET_CREATE) { /* should be a free inode */ if (VFS_I(ip)->i_mode != 0) { @@ -348,6 +400,77 @@ xfs_iget_check_free_state( return 0; } +/* + * We've torn down the VFS part of this NEED_INACTIVE inode, so we need to get + * it back into working state. 
This function unlocks the i_flags_lock and RCU. + */ +static int +xfs_iget_inactive( + struct xfs_perag *pag, + struct xfs_inode *ip) + __releases(&ip->i_flags_lock) __releases(RCU) +{ + struct xfs_mount *mp = ip->i_mount; + struct inode *inode = VFS_I(ip); + int error; + + /* + * We need to set XFS_INACTIVATING to prevent xfs_inactive_inode from + * stomping over us while we recycle the inode. We can't clear the + * radix tree inactive tag yet as it requires pag_ici_lock to be held + * exclusive. + */ + ip->i_flags |= XFS_INACTIVATING; + + spin_unlock(&ip->i_flags_lock); + rcu_read_unlock(); + + /* Undo our inactivation preparation and drop the tags. */ + xfs_inode_inactivation_cleanup(ip); + + error = xfs_reinit_inode(mp, inode); + if (error) { + bool wake; + /* + * Re-initializing the inode failed, and we are in deep + * trouble. Try to re-add it to the inactive list. + */ + rcu_read_lock(); + spin_lock(&ip->i_flags_lock); + wake = !!__xfs_iflags_test(ip, XFS_INEW); + ip->i_flags &= ~(XFS_INEW | XFS_INACTIVATING); + if (wake) + wake_up_bit(&ip->i_flags, __XFS_INEW_BIT); + ASSERT(ip->i_flags & XFS_NEED_INACTIVE); + trace_xfs_iget_inactive_fail(ip); + spin_unlock(&ip->i_flags_lock); + rcu_read_unlock(); + return error; + } + + spin_lock(&pag->pag_ici_lock); + spin_lock(&ip->i_flags_lock); + + /* + * Clear the per-lifetime state in the inode as we are now effectively + * a new inode and need to return to the initial state before reuse + * occurs. 
+ */ + ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS; + ip->i_flags |= XFS_INEW; + xfs_inode_clear_inactive_tag(pag, ip->i_ino); + inode->i_state = I_NEW; + ip->i_sick = 0; + ip->i_checked = 0; + + ASSERT(!rwsem_is_locked(&inode->i_rwsem)); + init_rwsem(&inode->i_rwsem); + + spin_unlock(&ip->i_flags_lock); + spin_unlock(&pag->pag_ici_lock); + return 0; +} + /* * Check the validity of the inode we just found it the cache */ @@ -382,14 +505,14 @@ xfs_iget_cache_hit( /* * If we are racing with another cache hit that is currently * instantiating this inode or currently recycling it out of - * reclaimabe state, wait for the initialisation to complete + * reclaimable state, wait for the initialisation to complete * before continuing. * * XXX(hch): eventually we should do something equivalent to * wait_on_inode to wait for these flags to be cleared * instead of polling for it. */ - if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) { + if (ip->i_flags & (XFS_INEW | XFS_IRECLAIM | XFS_INACTIVATING)) { trace_xfs_iget_skip(ip); XFS_STATS_INC(mp, xs_ig_frecycle); error = -EAGAIN; @@ -465,6 +588,21 @@ xfs_iget_cache_hit( spin_unlock(&ip->i_flags_lock); spin_unlock(&pag->pag_ici_lock); + } else if (ip->i_flags & XFS_NEED_INACTIVE) { + /* + * If NEED_INACTIVE is set, we've torn down the VFS inode and + * need to carefully get it back into useable state. + */ + trace_xfs_iget_inactive(ip); + + if (flags & XFS_IGET_INCORE) { + error = -EAGAIN; + goto out_error; + } + + error = xfs_iget_inactive(pag, ip); + if (error) + return error; } else { /* If the VFS inode is being torn down, pause and try again. */ if (!igrab(inode)) { @@ -772,7 +910,8 @@ xfs_inode_ag_walk_grab( /* avoid new or reclaimable inodes. 
Leave for reclaim code to flush */ if ((!newinos && __xfs_iflags_test(ip, XFS_INEW)) || - __xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM)) + __xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM | + XFS_NEED_INACTIVE | XFS_INACTIVATING)) goto out_unlock_noent; spin_unlock(&ip->i_flags_lock); @@ -1052,6 +1191,10 @@ xfs_blockgc_scan( struct xfs_mount *mp, struct xfs_eofblocks *eofb) { + if (eofb->eof_flags & XFS_EOF_FLAGS_SYNC) + xfs_inactive_inodes(mp, eofb); + else + xfs_inactive_force(mp); return xfs_ici_walk(mp, 0, xfs_blockgc_scan_inode, eofb, XFS_ICI_BLOCK_GC_TAG); } @@ -1199,6 +1342,8 @@ xfs_reclaim_inode( xfs_ino_t ino = ip->i_ino; /* for radix_tree_delete */ int error; + trace_xfs_inode_reclaiming(ip); + restart: error = 0; xfs_ilock(ip, XFS_ILOCK_EXCL); @@ -1926,3 +2071,274 @@ xfs_inode_clear_cowblocks_tag( trace_xfs_inode_clear_cowblocks_tag(ip); return __xfs_inode_clear_blocks_tag(ip, XFS_ICOWBLOCKS); } + +/* + * Deferred Inode Inactivation + * =========================== + * + * Sometimes, inodes need to have work done on them once the last program has + * closed the file. Typically this means cleaning out any leftover post-eof or + * CoW staging blocks for linked files. For inodes that have been totally + * unlinked, this means unmapping data/attr/cow blocks, removing the inode + * from the unlinked buckets, and marking it free in the inobt and inode table. + * + * This process can generate many metadata updates, which shows up as close() + * and unlink() calls that take a long time. We defer all that work to a + * per-AG workqueue which means that we can batch a lot of work and do it in + * inode order for better performance. Furthermore, we can control the + * workqueue, which means that we can avoid doing inactivation work at a bad + * time, such as when the fs is frozen. + * + * Deferred inactivation introduces new inode flag states (NEED_INACTIVE and + * INACTIVATING) and adds a new INACTIVE radix tree tag for fast access. 
We + * maintain separate perag counters for both types, and move counts as inodes + * wander the state machine, which now works as follows: + * + * If the inode needs inactivation, we: + * - Set the NEED_INACTIVE inode flag + * - Increment the per-AG inactive count + * - Set the INACTIVE tag in the per-AG inode tree + * - Set the INACTIVE tag in the per-fs AG tree + * - Schedule background inode inactivation + * + * If the inode does not need inactivation, we: + * - Set the RECLAIMABLE inode flag + * - Increment the per-AG reclaim count + * - Set the RECLAIM tag in the per-AG inode tree + * - Set the RECLAIM tag in the per-fs AG tree + * - Schedule background inode reclamation + * + * When it is time for background inode inactivation, we: + * - Set the INACTIVATING inode flag + * - Make all the on-disk updates + * - Clear both INACTIVATING and NEED_INACTIVE inode flags + * - Decrement the per-AG inactive count + * - Clear the INACTIVE tag in the per-AG inode tree + * - Clear the INACTIVE tag in the per-fs AG tree if that was the last one + * - Kick the inode into reclamation per the previous paragraph. + * + * When it is time for background inode reclamation, we: + * - Set the IRECLAIM inode flag + * - Detach all the resources and remove the inode from the per-AG inode tree + * - Clear both IRECLAIM and RECLAIMABLE inode flags + * - Decrement the per-AG reclaim count + * - Clear the RECLAIM tag from the per-AG inode tree + * - Clear the RECLAIM tag from the per-fs AG tree if there are no more + * inodes waiting for reclamation or inactivation + */ + +/* Queue a new inode inactivation pass if there are reclaimable inodes. 
*/ +static void +xfs_inactive_work_queue( + struct xfs_mount *mp) +{ + rcu_read_lock(); + if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_INACTIVE_TAG)) + queue_delayed_work(mp->m_inactive_workqueue, + &mp->m_inactive_work, + msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10)); + rcu_read_unlock(); +} + +/* Remember that an AG has one more inode to inactivate. */ +static void +xfs_perag_set_inactive_tag( + struct xfs_perag *pag) +{ + struct xfs_mount *mp = pag->pag_mount; + + lockdep_assert_held(&pag->pag_ici_lock); + if (pag->pag_ici_inactive++) + return; + + /* propagate the inactive tag up into the perag radix tree */ + spin_lock(&mp->m_perag_lock); + radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno, + XFS_ICI_INACTIVE_TAG); + spin_unlock(&mp->m_perag_lock); + + /* schedule periodic background inode inactivation */ + xfs_inactive_work_queue(mp); + + trace_xfs_perag_set_inactive(mp, pag->pag_agno, -1, _RET_IP_); +} + +/* Remember that an AG has one less inode to inactivate. */ +static void +xfs_perag_clear_inactive_tag( + struct xfs_perag *pag) +{ + struct xfs_mount *mp = pag->pag_mount; + + lockdep_assert_held(&pag->pag_ici_lock); + if (--pag->pag_ici_inactive) + return; + + /* clear the inactive tag from the perag radix tree */ + spin_lock(&mp->m_perag_lock); + radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno, + XFS_ICI_INACTIVE_TAG); + spin_unlock(&mp->m_perag_lock); + trace_xfs_perag_clear_inactive(mp, pag->pag_agno, -1, _RET_IP_); +} + +/* + * Grab the inode for inactivation exclusively. + * Return true if we grabbed it. + */ +STATIC bool +xfs_inactive_grab( + struct xfs_inode *ip, + int flags) +{ + ASSERT(rcu_read_lock_held()); + + /* quick check for stale RCU freed inode */ + if (!ip->i_ino) + return false; + + /* + * The radix tree lock here protects a thread in xfs_iget from racing + * with us starting reclaim on the inode. + * + * Due to RCU lookup, we may find inodes that have been freed and only + * have XFS_IRECLAIM set. 
Indeed, we may see reallocated inodes that + * aren't candidates for reclaim at all, so we must check the + * XFS_IRECLAIMABLE is set first before proceeding to reclaim. + * Obviously if XFS_NEED_INACTIVE isn't set then we ignore this inode. + */ + spin_lock(&ip->i_flags_lock); + if (!(ip->i_flags & XFS_NEED_INACTIVE) || + (ip->i_flags & XFS_INACTIVATING)) { + /* not a inactivation candidate. */ + spin_unlock(&ip->i_flags_lock); + return false; + } + + ip->i_flags |= XFS_INACTIVATING; + spin_unlock(&ip->i_flags_lock); + return true; +} + +/* Inactivate this inode. */ +STATIC int +xfs_inactive_inode( + struct xfs_inode *ip, + struct xfs_perag *pag, + void *args) +{ + struct xfs_eofblocks *eofb = args; + + ASSERT(ip->i_mount->m_super->s_writers.frozen < SB_FREEZE_FS); + + /* + * Not a match for our passed in scan filter? Put it back on the shelf + * and move on. + */ + spin_lock(&ip->i_flags_lock); + if (!xfs_inode_matches_eofb(ip, eofb)) { + ip->i_flags &= ~XFS_INACTIVATING; + spin_unlock(&ip->i_flags_lock); + return 0; + } + spin_unlock(&ip->i_flags_lock); + + trace_xfs_inode_inactivating(ip); + + /* Update metadata prior to freeing inode. */ + xfs_inode_inactivation_cleanup(ip); + xfs_inactive(ip); + ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0); + + /* + * Clear the inactive state flags and schedule a reclaim run once + * we're done with the inactivations. We must ensure that the inode + * smoothly transitions from inactivating to reclaimable so that iget + * cannot see either data structure midway through the transition. 
+ */ + spin_lock(&pag->pag_ici_lock); + spin_lock(&ip->i_flags_lock); + + ip->i_flags &= ~(XFS_NEED_INACTIVE | XFS_INACTIVATING); + xfs_inode_clear_inactive_tag(pag, ip->i_ino); + + __xfs_inode_set_reclaim_tag(pag, ip); + + spin_unlock(&ip->i_flags_lock); + spin_unlock(&pag->pag_ici_lock); + + return 0; +} + +static const struct xfs_ici_walk_ops xfs_inactive_iwalk_ops = { + .igrab = xfs_inactive_grab, + .iwalk = xfs_inactive_inode, +}; + +/* + * Walk the AGs and reclaim the inodes in them. Even if the filesystem is + * corrupted, we still need to clear the INACTIVE iflag so that we can move + * on to reclaiming the inode. + */ +int +xfs_inactive_inodes( + struct xfs_mount *mp, + struct xfs_eofblocks *eofb) +{ + return xfs_ici_walk_fns(mp, &xfs_inactive_iwalk_ops, 0, eofb, + XFS_ICI_INACTIVE_TAG); +} + +/* Try to get inode inactivation moving. */ +void +xfs_inactive_worker( + struct work_struct *work) +{ + struct xfs_mount *mp = container_of(to_delayed_work(work), + struct xfs_mount, m_inactive_work); + int error; + + /* + * We want to skip inode inactivation while the filesystem is frozen + * because we don't want the inactivation thread to block while taking + * sb_intwrite. Therefore, we try to take sb_write for the duration + * of the inactive scan -- a freeze attempt will block until we're + * done here, and if the fs is past stage 1 freeze we'll bounce out + * until things unfreeze. If the fs goes down while frozen we'll + * still have log recovery to clean up after us. + */ + if (!sb_start_write_trylock(mp->m_super)) + return; + + error = xfs_inactive_inodes(mp, NULL); + if (error && error != -EAGAIN) + xfs_err(mp, "inode inactivation failed, error %d", error); + + sb_end_write(mp->m_super); + xfs_inactive_work_queue(mp); +} + +/* Flush all inode inactivation work that might be queued. 
*/ +void +xfs_inactive_force( + struct xfs_mount *mp) +{ + queue_delayed_work(mp->m_inactive_workqueue, &mp->m_inactive_work, 0); + flush_delayed_work(&mp->m_inactive_work); +} + +/* + * Flush all inode inactivation work that might be queued, make sure the + * delayed work item is not queued, and then make sure there aren't any more + * inodes waiting to be inactivated. + */ +void +xfs_inactive_shutdown( + struct xfs_mount *mp) +{ + cancel_delayed_work_sync(&mp->m_inactive_work); + flush_workqueue(mp->m_inactive_workqueue); + xfs_inactive_inodes(mp, NULL); + cancel_delayed_work_sync(&mp->m_reclaim_work); + xfs_reclaim_inodes(mp, SYNC_WAIT); +} diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index 3c34c0e2e266..d6e79e7b5d94 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -28,6 +28,8 @@ struct xfs_eofblocks { #define XFS_ICI_RECLAIM_TAG 0 /* inode is to be reclaimed */ /* Inode has speculative preallocations (posteof or cow) to clean. */ #define XFS_ICI_BLOCK_GC_TAG 1 +/* Inode can be inactivated. 
*/ +#define XFS_ICI_INACTIVE_TAG 2 /* * Flags for xfs_iget() @@ -56,6 +58,7 @@ int xfs_reclaim_inodes_count(struct xfs_mount *mp); long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan); void xfs_inode_set_reclaim_tag(struct xfs_inode *ip); +void xfs_inode_set_inactive_tag(struct xfs_inode *ip); bool xfs_inode_free_quota_blocks(struct xfs_inode *ip, bool sync); int xfs_inode_free_blocks(struct xfs_mount *mp, bool sync); @@ -79,4 +82,9 @@ int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp, void xfs_blockgc_stop(struct xfs_mount *mp); void xfs_blockgc_start(struct xfs_mount *mp); +void xfs_inactive_worker(struct work_struct *work); +int xfs_inactive_inodes(struct xfs_mount *mp, struct xfs_eofblocks *eofb); +void xfs_inactive_force(struct xfs_mount *mp); +void xfs_inactive_shutdown(struct xfs_mount *mp); + #endif diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index aa019e49e512..2977086e7374 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1864,6 +1864,68 @@ xfs_inode_iadjust( xfs_qm_iadjust(ip, direction, inodes, dblocks, rblocks); } +/* Clean up inode inactivation. */ +void +xfs_inode_inactivation_cleanup( + struct xfs_inode *ip) +{ + int ret; + + if (XFS_FORCED_SHUTDOWN(ip->i_mount)) + return; + + /* + * Undo the pending-inactivation counter updates since we're bringing + * this inode back to life. + */ + ret = xfs_qm_dqattach(ip); + if (ret) + xfs_err(ip->i_mount, "error %d reactivating inode quota", ret); + + xfs_inode_iadjust(ip, -1); +} + +/* Prepare inode for inactivation. */ +void +xfs_inode_inactivation_prep( + struct xfs_inode *ip) +{ + int ret; + + if (XFS_FORCED_SHUTDOWN(ip->i_mount)) + return; + + /* + * If this inode is unlinked (and now unreferenced) we need to dispose + * of it in the on disk metadata. + * + * Bump generation so that the inode can't be opened by handle now that + * the last external reference has dropped. 
Bulkstat won't return + * inodes with zero nlink so nobody will ever find this inode again. + * Then add this inode & blocks to the counts of things that will be + * freed during the next inactivation run. + */ + if (VFS_I(ip)->i_nlink == 0) + VFS_I(ip)->i_generation++; + + /* + * Increase the pending-inactivation counters so that the fs looks like + * it's free. + */ + ret = xfs_qm_dqattach(ip); + if (ret) + xfs_err(ip->i_mount, "error %d inactivating inode quota", ret); + + xfs_inode_iadjust(ip, 1); + + /* + * Detach dquots just in case someone tries a quotaoff while + * the inode is waiting on the inactive list. We'll reattach + * them (if needed) when inactivating the inode. + */ + xfs_qm_dqdetach(ip); +} + /* * Returns true if we need to update the on-disk metadata before we can free * the memory used by this inode. Updates include freeing post-eof @@ -1955,6 +2017,16 @@ xfs_inactive( if (mp->m_flags & XFS_MOUNT_RDONLY) return; + /* + * Re-attach dquots prior to freeing EOF blocks or CoW staging extents. + * We dropped the dquot prior to inactivation (because quotaoff can't + * resurrect inactive inodes to force-drop the dquot) so we /must/ + * do this before touching any block mappings. + */ + error = xfs_qm_dqattach(ip); + if (error) + return; + /* Try to clean out the cow blocks if there are any. 
*/ if (xfs_inode_has_cow_data(ip)) xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true); @@ -1980,10 +2052,6 @@ xfs_inactive( ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0)) truncate = 1; - error = xfs_qm_dqattach(ip); - if (error) - return; - if (S_ISLNK(VFS_I(ip)->i_mode)) error = xfs_inactive_symlink(ip); else if (truncate) diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 0a46548e51a8..4a3e472d7078 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -212,6 +212,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip) #define XFS_IRECLAIMABLE (1 << 2) /* inode can be reclaimed */ #define __XFS_INEW_BIT 3 /* inode has just been allocated */ #define XFS_INEW (1 << __XFS_INEW_BIT) +#define XFS_NEED_INACTIVE (1 << 4) /* see XFS_INACTIVATING below */ #define XFS_ITRUNCATED (1 << 5) /* truncated down so flush-on-close */ #define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */ #define __XFS_IFLOCK_BIT 7 /* inode is being flushed right now */ @@ -228,6 +229,15 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip) #define XFS_IRECOVERY (1 << 11) #define XFS_ICOWBLOCKS (1 << 12)/* has the cowblocks tag set */ +/* + * If we need to update on-disk metadata before this IRECLAIMABLE inode can be + * freed, then NEED_INACTIVE will be set. Once we start the updates, the + * INACTIVATING bit will be set to keep iget away from this inode. After the + * inactivation completes, both flags will be cleared and the inode is a + * plain old IRECLAIMABLE inode. + */ +#define XFS_INACTIVATING (1 << 13) + /* * Per-lifetime flags need to be reset when re-using a reclaimable inode during * inode lookup. 
This prevents unintended behaviour on the new inode from @@ -235,7 +245,8 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip) */ #define XFS_IRECLAIM_RESET_FLAGS \ (XFS_IRECLAIMABLE | XFS_IRECLAIM | \ - XFS_IDIRTY_RELEASE | XFS_ITRUNCATED) + XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \ + XFS_INACTIVATING) /* * Synchronize processes attempting to flush the in-core inode back to disk. @@ -499,6 +510,8 @@ extern struct kmem_zone *xfs_inode_zone; bool xfs_inode_verify_forks(struct xfs_inode *ip); int xfs_has_eofblocks(struct xfs_inode *ip, bool *has); bool xfs_inode_needs_inactivation(struct xfs_inode *ip); +void xfs_inode_inactivation_prep(struct xfs_inode *ip); +void xfs_inode_inactivation_cleanup(struct xfs_inode *ip); int xfs_iunlink_init(struct xfs_perag *pag); void xfs_iunlink_destroy(struct xfs_perag *pag); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index e45bee3c8faf..b398f197d748 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -475,6 +475,7 @@ xfs_iomap_prealloc_size( alloc_blocks); freesp = percpu_counter_read_positive(&mp->m_fdblocks); + freesp += percpu_counter_read_positive(&mp->m_dinactive); if (freesp < mp->m_low_space[XFS_LOWSP_5_PCNT]) { shift = 2; if (freesp < mp->m_low_space[XFS_LOWSP_4_PCNT]) diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 99ec3fba4548..730a36675b55 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -5102,6 +5102,13 @@ xlog_recover_process_iunlinks( } xfs_buf_rele(agibp); } + + /* + * Now that we've put all the iunlink inodes on the lru, let's make + * sure that we perform all the on-disk metadata updates to actually + * free those inodes. + */ + xfs_inactive_force(mp); } STATIC void diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index ea74bd3be0bf..27729a8c8c12 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1009,7 +1009,8 @@ xfs_mountfs( /* Clean out dquots that might be in memory after quotacheck. 
*/ xfs_qm_unmount(mp); /* - * Cancel all delayed reclaim work and reclaim the inodes directly. + * Shut down all pending inode inactivation work, which will also + * cancel all delayed reclaim work and reclaim the inodes directly. * We have to do this /after/ rtunmount and qm_unmount because those * two will have scheduled delayed reclaim for the rt/quota inodes. * @@ -1019,8 +1020,7 @@ xfs_mountfs( * qm_unmount_quotas and therefore rely on qm_unmount to release the * quota inodes. */ - cancel_delayed_work_sync(&mp->m_reclaim_work); - xfs_reclaim_inodes(mp, SYNC_WAIT); + xfs_inactive_shutdown(mp); xfs_health_unmount(mp); out_log_dealloc: mp->m_flags |= XFS_MOUNT_UNMOUNTING; @@ -1058,6 +1058,13 @@ xfs_unmountfs( uint64_t resblks; int error; + /* + * Perform all on-disk metadata updates required to inactivate inodes. + * Since this can involve finobt updates, do it now before we lose the + * per-AG space reservations. + */ + xfs_inactive_force(mp); + xfs_blockgc_stop(mp); xfs_fs_unreserve_ag_blocks(mp); xfs_qm_unmount_quotas(mp); @@ -1108,6 +1115,13 @@ xfs_unmountfs( xfs_qm_unmount(mp); + /* + * Kick off inode inactivation again to push the metadata inodes into + * reclamation, then flush out all the work because we're going away + * soon. + */ + xfs_inactive_shutdown(mp); + /* * Unreserve any blocks we have so that when we unmount we don't account * the reserved free space as used. 
This is really only necessary for diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index d203c922dc51..51f88b56bbbe 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -162,6 +162,7 @@ typedef struct xfs_mount { atomic_t m_active_trans; /* number trans frozen */ struct xfs_mru_cache *m_filestream; /* per-mount filestream data */ struct delayed_work m_reclaim_work; /* background inode reclaim */ + struct delayed_work m_inactive_work; /* background inode inactive */ bool m_update_sb; /* sb needs update in mount */ int64_t m_low_space[XFS_LOWSP_MAX]; /* low free space thresholds */ @@ -176,6 +177,7 @@ typedef struct xfs_mount { struct workqueue_struct *m_cil_workqueue; struct workqueue_struct *m_reclaim_workqueue; struct workqueue_struct *m_blockgc_workqueue; + struct workqueue_struct *m_inactive_workqueue; struct workqueue_struct *m_sync_workqueue; /* @@ -343,7 +345,8 @@ typedef struct xfs_perag { spinlock_t pag_ici_lock; /* incore inode cache lock */ struct radix_tree_root pag_ici_root; /* incore inode cache root */ - int pag_ici_reclaimable; /* reclaimable inodes */ + unsigned int pag_ici_reclaimable; /* reclaimable inodes */ + unsigned int pag_ici_inactive; /* inactive inodes */ struct mutex pag_ici_reclaim_lock; /* serialisation point */ unsigned long pag_ici_reclaim_cursor; /* reclaim restart point */ diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index fa0db72f8d0d..43ba4e6b5e22 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -105,6 +105,12 @@ xfs_qm_scall_quotaoff( uint inactivate_flags; struct xfs_qoff_logitem *qoffstart; + /* + * Clean up the inactive list before we turn quota off, to reduce the + * amount of quotaoff work we have to do with the mutex held. + */ + xfs_inactive_force(mp); + /* * No file system can have quotas enabled on disk but not in core. 
* Note that quota utilities (like quotaoff) _expect_ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index ed10ba2cd087..14c5d002c358 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -520,8 +520,15 @@ xfs_init_mount_workqueues( if (!mp->m_sync_workqueue) goto out_destroy_eofb; + mp->m_inactive_workqueue = alloc_workqueue("xfs-inactive/%s", + WQ_MEM_RECLAIM | WQ_FREEZABLE, 0, mp->m_super->s_id); + if (!mp->m_inactive_workqueue) + goto out_destroy_sync; + return 0; +out_destroy_sync: + destroy_workqueue(mp->m_sync_workqueue); out_destroy_eofb: destroy_workqueue(mp->m_blockgc_workqueue); out_destroy_reclaim: @@ -540,6 +547,7 @@ STATIC void xfs_destroy_mount_workqueues( struct xfs_mount *mp) { + destroy_workqueue(mp->m_inactive_workqueue); destroy_workqueue(mp->m_sync_workqueue); destroy_workqueue(mp->m_blockgc_workqueue); destroy_workqueue(mp->m_reclaim_workqueue); @@ -627,28 +635,34 @@ xfs_fs_destroy_inode( struct inode *inode) { struct xfs_inode *ip = XFS_I(inode); + struct xfs_mount *mp = ip->i_mount; + bool need_inactive; trace_xfs_destroy_inode(ip); ASSERT(!rwsem_is_locked(&inode->i_rwsem)); - XFS_STATS_INC(ip->i_mount, vn_rele); - XFS_STATS_INC(ip->i_mount, vn_remove); - - xfs_inactive(ip); - - if (!XFS_FORCED_SHUTDOWN(ip->i_mount) && ip->i_delayed_blks) { + XFS_STATS_INC(mp, vn_rele); + XFS_STATS_INC(mp, vn_remove); + + need_inactive = xfs_inode_needs_inactivation(ip); + if (need_inactive) { + trace_xfs_inode_set_need_inactive(ip); + xfs_inode_inactivation_prep(ip); + } else if (!XFS_FORCED_SHUTDOWN(ip->i_mount) && ip->i_delayed_blks) { xfs_check_delalloc(ip, XFS_DATA_FORK); xfs_check_delalloc(ip, XFS_COW_FORK); ASSERT(0); } - - XFS_STATS_INC(ip->i_mount, vn_reclaim); + XFS_STATS_INC(mp, vn_reclaim); + trace_xfs_inode_set_reclaimable(ip); /* * We should never get here with one of the reclaim flags already set. 
*/ ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE)); ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM)); + ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_NEED_INACTIVE)); + ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_INACTIVATING)); /* * We always use background reclaim here because even if the @@ -657,7 +671,10 @@ xfs_fs_destroy_inode( * this more efficiently than we can here, so simply let background * reclaim tear down all inodes. */ - xfs_inode_set_reclaim_tag(ip); + if (need_inactive) + xfs_inode_set_inactive_tag(ip); + else + xfs_inode_set_reclaim_tag(ip); } static void @@ -942,6 +959,18 @@ xfs_fs_unfreeze( return 0; } +/* + * Before we get to stage 1 of a freeze, force all the inactivation work so + * that there's less work to do if we crash during the freeze. + */ +STATIC int +xfs_fs_freeze_super( + struct super_block *sb) +{ + xfs_inactive_force(XFS_M(sb)); + return freeze_super(sb); +} + /* * This function fills in xfs_mount_t fields based on mount args. * Note: the superblock _has_ now been read in. @@ -1141,6 +1170,7 @@ static const struct super_operations xfs_super_operations = { .show_options = xfs_fs_show_options, .nr_cached_objects = xfs_fs_nr_cached_objects, .free_cached_objects = xfs_fs_free_cached_objects, + .freeze_super = xfs_fs_freeze_super, }; static int @@ -1699,6 +1729,13 @@ xfs_remount_ro( return error; } + /* + * Perform all on-disk metadata updates required to inactivate inodes. + * Since this can involve finobt updates, do it now before we lose the + * per-AG space reservations. + */ + xfs_inactive_force(mp); + /* Free the per-AG metadata reservation pool. 
*/ error = xfs_fs_unreserve_ag_blocks(mp); if (error) { @@ -1819,6 +1856,7 @@ static int xfs_init_fs_context( atomic_set(&mp->m_active_trans, 0); INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker); INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker); + INIT_DELAYED_WORK(&mp->m_inactive_work, xfs_inactive_worker); mp->m_kobj.kobject.kset = xfs_kset; /* * We don't create the finobt per-ag space reservation until after log diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 0064f4491d66..9233f51020af 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -133,6 +133,8 @@ DEFINE_PERAG_REF_EVENT(xfs_perag_set_reclaim); DEFINE_PERAG_REF_EVENT(xfs_perag_clear_reclaim); DEFINE_PERAG_REF_EVENT(xfs_perag_set_blockgc); DEFINE_PERAG_REF_EVENT(xfs_perag_clear_blockgc); +DEFINE_PERAG_REF_EVENT(xfs_perag_set_inactive); +DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inactive); DECLARE_EVENT_CLASS(xfs_ag_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno), @@ -593,14 +595,17 @@ DECLARE_EVENT_CLASS(xfs_inode_class, TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_ino_t, ino) + __field(unsigned long, iflags) ), TP_fast_assign( __entry->dev = VFS_I(ip)->i_sb->s_dev; __entry->ino = ip->i_ino; + __entry->iflags = ip->i_flags; ), - TP_printk("dev %d:%d ino 0x%llx", + TP_printk("dev %d:%d ino 0x%llx iflags 0x%lx", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->ino) + __entry->ino, + __entry->iflags) ) #define DEFINE_INODE_EVENT(name) \ @@ -610,6 +615,8 @@ DEFINE_EVENT(xfs_inode_class, name, \ DEFINE_INODE_EVENT(xfs_iget_skip); DEFINE_INODE_EVENT(xfs_iget_reclaim); DEFINE_INODE_EVENT(xfs_iget_reclaim_fail); +DEFINE_INODE_EVENT(xfs_iget_inactive); +DEFINE_INODE_EVENT(xfs_iget_inactive_fail); DEFINE_INODE_EVENT(xfs_iget_hit); DEFINE_INODE_EVENT(xfs_iget_miss); @@ -644,6 +651,10 @@ DEFINE_INODE_EVENT(xfs_inode_free_eofblocks_invalid); DEFINE_INODE_EVENT(xfs_inode_set_cowblocks_tag); DEFINE_INODE_EVENT(xfs_inode_clear_cowblocks_tag); 
DEFINE_INODE_EVENT(xfs_inode_free_cowblocks_invalid); +DEFINE_INODE_EVENT(xfs_inode_set_reclaimable); +DEFINE_INODE_EVENT(xfs_inode_reclaiming); +DEFINE_INODE_EVENT(xfs_inode_set_need_inactive); +DEFINE_INODE_EVENT(xfs_inode_inactivating); /* * ftrace's __print_symbolic requires that all enum values be wrapped in the From patchwork Wed Jan 1 01:09:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11314721
Subject: [PATCH 07/10] xfs: force inode inactivation and retry fs writes when there isn't space From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 31 Dec 2019 17:09:23 -0800 Message-ID: <157784096361.1362752.6646122257069139549.stgit@magnolia> In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia> References: <157784092020.1362752.15046503361741521784.stgit@magnolia>
From: Darrick J. Wong Any time we try to modify a file's contents and it fails due to ENOSPC or EDQUOT, force inactivation work to free up some resources and try one more time. Signed-off-by: Darrick J. 
Wong --- fs/xfs/xfs_iomap.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index b398f197d748..192852499729 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -883,6 +883,7 @@ xfs_buffered_write_iomap_begin( struct xfs_iext_cursor icur, ccur; xfs_fsblock_t prealloc_blocks = 0; bool eof = false, cow_eof = false, shared = false; + bool cleared_space = false; int allocfork = XFS_DATA_FORK; int error = 0; @@ -893,6 +894,7 @@ xfs_buffered_write_iomap_begin( ASSERT(!XFS_IS_REALTIME_INODE(ip)); +start_over: xfs_ilock(ip, XFS_ILOCK_EXCL); if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ip, XFS_DATA_FORK)) || @@ -1035,6 +1037,18 @@ xfs_buffered_write_iomap_begin( break; case -ENOSPC: case -EDQUOT: + /* + * If the delalloc reservation failed due to a lack of space, + * try to flush inactive inodes to free some space. + */ + if (!cleared_space) { + cleared_space = true; + allocfork = XFS_DATA_FORK; + xfs_iunlock(ip, XFS_ILOCK_EXCL); + xfs_inactive_force(mp); + error = 0; + goto start_over; + } /* retry without any preallocation */ trace_xfs_delalloc_enospc(ip, offset, count); if (prealloc_blocks) { From patchwork Wed Jan 1 01:09:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11314723
Subject: [PATCH 08/10] xfs: force inactivation before fallocate when space is low From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 31 Dec 2019 17:09:29 -0800 Message-ID: <157784096970.1362752.7717767986299881332.stgit@magnolia> In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia> References: <157784092020.1362752.15046503361741521784.stgit@magnolia> From: Darrick J. 
Wong If we think that inactivation will free enough blocks to make it easier to satisfy an fallocate request, force inactivation. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 6553d533d659..5b66f89d5f36 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -28,6 +28,7 @@ #include "xfs_icache.h" #include "xfs_iomap.h" #include "xfs_reflink.h" +#include "xfs_sb.h" /* Kernel only BMAP related definitions and functions */ @@ -697,6 +698,41 @@ xfs_free_eofblocks( return error; } +/* + * If we suspect that the target device isn't going to be able to satisfy the + * entire request, try forcing inode inactivation to free up space. While it's + * perfectly fine to fill a preallocation request with a bunch of short + * extents, we'd prefer to do the inactivation work now to combat long term + * fragmentation in new file data. + */ +static void +xfs_alloc_reclaim_inactive_space( + struct xfs_mount *mp, + bool is_rt, + xfs_filblks_t allocatesize_fsb) +{ + struct xfs_perag *pag; + struct xfs_sb *sbp = &mp->m_sb; + xfs_extlen_t free; + xfs_agnumber_t agno; + + if (is_rt) { + if (sbp->sb_frextents * sbp->sb_rextsize >= allocatesize_fsb) + return; + } else { + for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { + pag = xfs_perag_get(mp, agno); + free = pag->pagf_freeblks; + xfs_perag_put(pag); + + if (free >= allocatesize_fsb) + return; + } + } + + xfs_inactive_force(mp); +} + int xfs_alloc_file_space( struct xfs_inode *ip, @@ -786,6 +822,8 @@ xfs_alloc_file_space( quota_flag = XFS_QMOPT_RES_REGBLKS; } + xfs_alloc_reclaim_inactive_space(mp, rt, allocatesize_fsb); + /* * Allocate and setup the transaction. */ From patchwork Wed Jan 1 01:09:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11314725
Subject: [PATCH 09/10] xfs: parallelize inode inactivation From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 31 Dec 2019 17:09:36 -0800 Message-ID: <157784097668.1362752.16785191645786207862.stgit@magnolia> In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia> References: <157784092020.1362752.15046503361741521784.stgit@magnolia> From: Darrick J. 
Wong Split the inode inactivation work into per-AG work items so that we can take advantage of parallelization. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/common.c | 2 + fs/xfs/xfs_icache.c | 90 ++++++++++++++++++++++++++++++++++++++++++------- fs/xfs/xfs_icache.h | 2 + fs/xfs/xfs_mount.c | 3 ++ fs/xfs/xfs_mount.h | 4 ++ fs/xfs/xfs_super.c | 5 ++- 6 files changed, 90 insertions(+), 16 deletions(-) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 52fc05ee7ef8..402d42a277f4 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -910,6 +910,7 @@ xchk_stop_reaping( { sc->flags |= XCHK_REAPING_DISABLED; xfs_blockgc_stop(sc->mp); + xfs_inactive_cancel_work(sc->mp); } /* Restart background reaping of resources. */ @@ -917,6 +918,7 @@ void xchk_start_reaping( struct xfs_scrub *sc) { + xfs_inactive_schedule_now(sc->mp); xfs_blockgc_start(sc->mp); sc->flags &= ~XCHK_REAPING_DISABLED; } diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 13b318dc2e89..5240e9e517d7 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -2130,12 +2130,12 @@ xfs_inode_clear_cowblocks_tag( /* Queue a new inode inactivation pass if there are reclaimable inodes. 
*/ static void xfs_inactive_work_queue( - struct xfs_mount *mp) + struct xfs_perag *pag) { rcu_read_lock(); - if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_INACTIVE_TAG)) - queue_delayed_work(mp->m_inactive_workqueue, - &mp->m_inactive_work, + if (radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_INACTIVE_TAG)) + queue_delayed_work(pag->pag_mount->m_inactive_workqueue, + &pag->pag_inactive_work, msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10)); rcu_read_unlock(); } @@ -2158,7 +2158,7 @@ xfs_perag_set_inactive_tag( spin_unlock(&mp->m_perag_lock); /* schedule periodic background inode inactivation */ - xfs_inactive_work_queue(mp); + xfs_inactive_work_queue(pag); trace_xfs_perag_set_inactive(mp, pag->pag_agno, -1, _RET_IP_); } @@ -2275,6 +2275,19 @@ static const struct xfs_ici_walk_ops xfs_inactive_iwalk_ops = { .iwalk = xfs_inactive_inode, }; +/* + * Inactivate the inodes in an AG. Even if the filesystem is corrupted, we + * still need to clear the INACTIVE iflag so that we can move on to reclaiming + * the inode. + */ +static int +xfs_inactive_inodes_pag( + struct xfs_perag *pag) +{ + return xfs_ici_walk_ag(pag, &xfs_inactive_iwalk_ops, 0, NULL, + XFS_ICI_INACTIVE_TAG); +} + /* * Walk the AGs and reclaim the inodes in them. 
Even if the filesystem is * corrupted, we still need to clear the INACTIVE iflag so that we can move @@ -2294,8 +2307,9 @@ void xfs_inactive_worker( struct work_struct *work) { - struct xfs_mount *mp = container_of(to_delayed_work(work), - struct xfs_mount, m_inactive_work); + struct xfs_perag *pag = container_of(to_delayed_work(work), + struct xfs_perag, pag_inactive_work); + struct xfs_mount *mp = pag->pag_mount; int error; /* @@ -2310,12 +2324,31 @@ xfs_inactive_worker( if (!sb_start_write_trylock(mp->m_super)) return; - error = xfs_inactive_inodes(mp, NULL); + error = xfs_inactive_inodes_pag(pag); if (error && error != -EAGAIN) xfs_err(mp, "inode inactivation failed, error %d", error); sb_end_write(mp->m_super); - xfs_inactive_work_queue(mp); + xfs_inactive_work_queue(pag); +} + +/* Wait for all background inactivation work to finish. */ +static void +xfs_inactive_flush( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + + for_each_perag_tag(mp, agno, pag, XFS_ICI_INACTIVE_TAG) { + bool flush; + + spin_lock(&pag->pag_ici_lock); + flush = pag->pag_ici_inactive > 0; + spin_unlock(&pag->pag_ici_lock); + if (flush) + flush_delayed_work(&pag->pag_inactive_work); + } } /* Flush all inode inactivation work that might be queued. */ @@ -2323,8 +2356,8 @@ void xfs_inactive_force( struct xfs_mount *mp) { - queue_delayed_work(mp->m_inactive_workqueue, &mp->m_inactive_work, 0); - flush_delayed_work(&mp->m_inactive_work); + xfs_inactive_schedule_now(mp); + xfs_inactive_flush(mp); } /* @@ -2336,9 +2369,40 @@ void xfs_inactive_shutdown( struct xfs_mount *mp) { - cancel_delayed_work_sync(&mp->m_inactive_work); - flush_workqueue(mp->m_inactive_workqueue); + xfs_inactive_cancel_work(mp); xfs_inactive_inodes(mp, NULL); cancel_delayed_work_sync(&mp->m_reclaim_work); xfs_reclaim_inodes(mp, SYNC_WAIT); } + +/* Cancel all queued inactivation work. 
*/ +void +xfs_inactive_cancel_work( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + + for_each_perag_tag(mp, agno, pag, XFS_ICI_INACTIVE_TAG) + cancel_delayed_work_sync(&pag->pag_inactive_work); + flush_workqueue(mp->m_inactive_workqueue); +} + +/* Cancel all pending deferred inactivation work and reschedule it now. */ +void +xfs_inactive_schedule_now( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + + for_each_perag_tag(mp, agno, pag, XFS_ICI_INACTIVE_TAG) { + spin_lock(&pag->pag_ici_lock); + if (pag->pag_ici_inactive) { + cancel_delayed_work(&pag->pag_inactive_work); + queue_delayed_work(mp->m_inactive_workqueue, + &pag->pag_inactive_work, 0); + } + spin_unlock(&pag->pag_ici_lock); + } +} diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index d6e79e7b5d94..a82b473b88a2 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -86,5 +86,7 @@ void xfs_inactive_worker(struct work_struct *work); int xfs_inactive_inodes(struct xfs_mount *mp, struct xfs_eofblocks *eofb); void xfs_inactive_force(struct xfs_mount *mp); void xfs_inactive_shutdown(struct xfs_mount *mp); +void xfs_inactive_cancel_work(struct xfs_mount *mp); +void xfs_inactive_schedule_now(struct xfs_mount *mp); #endif diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 27729a8c8c12..b9b37eff4063 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -127,6 +127,7 @@ __xfs_free_perag( struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head); ASSERT(!delayed_work_pending(&pag->pag_blockgc_work)); + ASSERT(!delayed_work_pending(&pag->pag_inactive_work)); ASSERT(atomic_read(&pag->pag_ref) == 0); kmem_free(pag); } @@ -148,6 +149,7 @@ xfs_free_perag( ASSERT(pag); ASSERT(atomic_read(&pag->pag_ref) == 0); cancel_delayed_work_sync(&pag->pag_blockgc_work); + cancel_delayed_work_sync(&pag->pag_inactive_work); xfs_iunlink_destroy(pag); xfs_buf_hash_destroy(pag); mutex_destroy(&pag->pag_ici_reclaim_lock); @@ -204,6 +206,7 @@ 
xfs_initialize_perag( spin_lock_init(&pag->pag_ici_lock); mutex_init(&pag->pag_ici_reclaim_lock); INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker); + INIT_DELAYED_WORK(&pag->pag_inactive_work, xfs_inactive_worker); INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC); if (xfs_buf_hash_init(pag)) goto out_free_pag; diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 51f88b56bbbe..87a62b0543ec 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -162,7 +162,6 @@ typedef struct xfs_mount { atomic_t m_active_trans; /* number trans frozen */ struct xfs_mru_cache *m_filestream; /* per-mount filestream data */ struct delayed_work m_reclaim_work; /* background inode reclaim */ - struct delayed_work m_inactive_work; /* background inode inactive */ bool m_update_sb; /* sb needs update in mount */ int64_t m_low_space[XFS_LOWSP_MAX]; /* low free space thresholds */ @@ -366,6 +365,9 @@ typedef struct xfs_perag { /* background prealloc block trimming */ struct delayed_work pag_blockgc_work; + /* background inode inactivation */ + struct delayed_work pag_inactive_work; + /* reference count */ uint8_t pagf_refcount_level; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 14c5d002c358..fced499ecdc9 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -521,7 +521,8 @@ xfs_init_mount_workqueues( goto out_destroy_eofb; mp->m_inactive_workqueue = alloc_workqueue("xfs-inactive/%s", - WQ_MEM_RECLAIM | WQ_FREEZABLE, 0, mp->m_super->s_id); + WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_FREEZABLE, 0, + mp->m_super->s_id); if (!mp->m_inactive_workqueue) goto out_destroy_sync; @@ -1449,6 +1450,7 @@ xfs_configure_background_workqueues( max_active = min_t(unsigned int, max_active, WQ_UNBOUND_MAX_ACTIVE); workqueue_set_max_active(mp->m_blockgc_workqueue, max_active); + workqueue_set_max_active(mp->m_inactive_workqueue, max_active); } static int @@ -1856,7 +1858,6 @@ static int xfs_init_fs_context( atomic_set(&mp->m_active_trans, 0); INIT_WORK(&mp->m_flush_inodes_work, 
xfs_flush_inodes_worker); INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker); - INIT_DELAYED_WORK(&mp->m_inactive_work, xfs_inactive_worker); mp->m_kobj.kobject.kset = xfs_kset; /* * We don't create the finobt per-ag space reservation until after log From patchwork Wed Jan 1 01:09:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11314727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C1E91398 for ; Wed, 1 Jan 2020 01:09:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2063B2064B for ; Wed, 1 Jan 2020 01:09:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Gh6Ug9oE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727145AbgAABJs (ORCPT ); Tue, 31 Dec 2019 20:09:48 -0500 Received: from userp2130.oracle.com ([156.151.31.86]:53350 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727132AbgAABJs (ORCPT ); Tue, 31 Dec 2019 20:09:48 -0500 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id 001191v8109731 for ; Wed, 1 Jan 2020 01:09:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2019-08-05; bh=yM4L5lqj7e/lftvqVOX/k5304v0dG8SNaXmQQTmW+Ag=; b=Gh6Ug9oEGu0mLDz46chjYPcdH6ZC7qx0omI0poKXKDFVfASqQrGKE7QjXuwLW5D1LOXJ Xd4VuKYbNU3HUcmiVndd4/k7slffyIKwQum8itzEu+1xqyUx0ZUhlkqqEIR72tMAbSQD cpB1VN1S488gZ2OXsncOLieh3YagRaLCC4S3Hyq1cy54FY2/ok7wuwEd87ahr6l5hfKd 
From patchwork Wed Jan 1 01:09:43 2020
X-Patchwork-Id: 11314727
Subject: [PATCH 10/10] xfs: create a polled function to force inode inactivation
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Tue, 31 Dec 2019 17:09:43 -0800
Message-ID: <157784098338.1362752.12534751621591800147.stgit@magnolia>
In-Reply-To: <157784092020.1362752.15046503361741521784.stgit@magnolia>
References: <157784092020.1362752.15046503361741521784.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

Create a polled version of xfs_inactive_force so that we can force
inactivation while holding a lock (usually the umount lock) without
tripping over the softlockup timer.

This is for callers that hold vfs locks while calling inactivation,
which is currently unmount, iunlink processing during mount, and
rw->ro remount.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_icache.h |    2 ++
 fs/xfs/xfs_mount.c  |    2 +-
 fs/xfs/xfs_mount.h  |    6 ++++++
 fs/xfs/xfs_super.c  |    3 ++-
 5 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 5240e9e517d7..892bb789dcbf 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -25,6 +25,7 @@
 #include "xfs_health.h"
 
 #include
+#include
 
 STATIC int xfs_inode_free_eofblocks(struct xfs_inode *ip, struct xfs_perag *pag,
 		void *args);
@@ -2284,8 +2285,12 @@ static int
 xfs_inactive_inodes_pag(
 	struct xfs_perag	*pag)
 {
-	return xfs_ici_walk_ag(pag, &xfs_inactive_iwalk_ops, 0, NULL,
+	int			error;
+
+	error = xfs_ici_walk_ag(pag, &xfs_inactive_iwalk_ops, 0, NULL,
 			XFS_ICI_INACTIVE_TAG);
+	wake_up(&pag->pag_mount->m_inactive_wait);
+	return error;
 }
 
 /*
@@ -2298,8 +2303,12 @@ xfs_inactive_inodes(
 	struct xfs_mount	*mp,
 	struct xfs_eofblocks	*eofb)
 {
-	return xfs_ici_walk_fns(mp, &xfs_inactive_iwalk_ops, 0, eofb,
+	int			error;
+
+	error = xfs_ici_walk_fns(mp, &xfs_inactive_iwalk_ops, 0, eofb,
 			XFS_ICI_INACTIVE_TAG);
+	wake_up(&mp->m_inactive_wait);
+	return error;
 }
 
 /* Try to get inode inactivation moving. */
@@ -2406,3 +2415,42 @@ xfs_inactive_schedule_now(
 		spin_unlock(&pag->pag_ici_lock);
 	}
 }
+
+/* Return true if there are inodes still being inactivated. */
+static bool
+xfs_inactive_pending(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno = 0;
+	bool			ret = false;
+
+	while (!ret &&
+	       (pag = xfs_perag_get_tag(mp, agno, XFS_ICI_INACTIVE_TAG))) {
+		agno = pag->pag_agno + 1;
+		spin_lock(&pag->pag_ici_lock);
+		if (pag->pag_ici_inactive)
+			ret = true;
+		spin_unlock(&pag->pag_ici_lock);
+		xfs_perag_put(pag);
+	}
+
+	return ret;
+}
+
+/*
+ * Flush all pending inactivation work and poll until finished.  This function
+ * is for callers that must flush with vfs locks held, such as unmount,
+ * remount, and iunlinks processing during mount.
+ */
+void
+xfs_inactive_force_poll(
+	struct xfs_mount	*mp)
+{
+	xfs_inactive_schedule_now(mp);
+
+	while (!wait_event_timeout(mp->m_inactive_wait,
+			xfs_inactive_pending(mp) == false, HZ)) {
+		touch_softlockup_watchdog();
+	}
+}
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index a82b473b88a2..75332d4450ba 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -89,4 +89,6 @@ void xfs_inactive_shutdown(struct xfs_mount *mp);
 void xfs_inactive_cancel_work(struct xfs_mount *mp);
 void xfs_inactive_schedule_now(struct xfs_mount *mp);
 
+void xfs_inactive_force_poll(struct xfs_mount *mp);
+
 #endif
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index b9b37eff4063..5e2ce91f4ab8 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1066,7 +1066,7 @@ xfs_unmountfs(
 	 * Since this can involve finobt updates, do it now before we lose the
 	 * per-AG space reservations.
 	 */
-	xfs_inactive_force(mp);
+	xfs_inactive_force_poll(mp);
 
 	xfs_blockgc_stop(mp);
 	xfs_fs_unreserve_ag_blocks(mp);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 87a62b0543ec..237a15a136c8 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -206,6 +206,12 @@ typedef struct xfs_mount {
 	 * into a single flush.
 	 */
 	struct work_struct	m_flush_inodes_work;
+
+	/*
+	 * Use this to wait for the inode inactivation workqueue to finish
+	 * inactivating all the inodes.
+	 */
+	struct wait_queue_head	m_inactive_wait;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index fced499ecdc9..af1fe32247cf 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1736,7 +1736,7 @@ xfs_remount_ro(
 	 * Since this can involve finobt updates, do it now before we lose the
 	 * per-AG space reservations.
 	 */
-	xfs_inactive_force(mp);
+	xfs_inactive_force_poll(mp);
 
 	/* Free the per-AG metadata reservation pool. */
 	error = xfs_fs_unreserve_ag_blocks(mp);
@@ -1859,6 +1859,7 @@ static int xfs_init_fs_context(
 	INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker);
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
 	mp->m_kobj.kobject.kset = xfs_kset;
+	init_waitqueue_head(&mp->m_inactive_wait);
 
 	/*
 	 * We don't create the finobt per-ag space reservation until after log
 	 * recovery, so we must set this to true so that an ifree transaction