From patchwork Wed Apr 17 01:40:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10904339 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D7089922 for ; Wed, 17 Apr 2019 01:40:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B15E2289FB for ; Wed, 17 Apr 2019 01:40:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A571E289FC; Wed, 17 Apr 2019 01:40:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D2B0289DA for ; Wed, 17 Apr 2019 01:40:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728959AbfDQBkM (ORCPT ); Tue, 16 Apr 2019 21:40:12 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:38206 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728514AbfDQBkM (ORCPT ); Tue, 16 Apr 2019 21:40:12 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3H1ckss097381 for ; Wed, 17 Apr 2019 01:40:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=H9RxD3MllsDR/1/J6+sBzdCqqjZH1t2DbVtbttiomDI=; b=Yn8J3uII7wbWYKPVuqx+j/gth5RipGyTgR8XUfjBfR8SwumIiGgVDtz+r3LE8jm5Xs2i JBDp3SKUphpnwpOxQBrHhTgrlN/TPVAtP7V/65V9yjE/BAK9h6+zXjD1xJRAriH+KCHM 8u1GfCPRsExeHNQR5PGHzkFg9V6evhbIEHUtU1YV4Z+oOSwiZGTcOA9k2M6ESk1THPAB 2Lf1BowHJY+OUAkPD/QrrgwZyItaumo4fZ1qxied17ZDaw2QL6G+0dKbKdWaf05TTXOt NA1jZLttkhCRdIct50P78IBMPNBHV+6UakMnLTIzB35NnIg7yK5kq4Q5HH8caizuyEnm aw== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2120.oracle.com with ESMTP id 2rusnewxmb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 17 Apr 2019 01:40:10 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3H1e91C052651 for ; Wed, 17 Apr 2019 01:40:09 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserp3020.oracle.com with ESMTP id 2rv2tv320r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 17 Apr 2019 01:40:09 +0000 Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x3H1e72U022186 for ; Wed, 17 Apr 2019 01:40:07 GMT Received: from localhost (/10.159.224.90) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 16 Apr 2019 18:40:07 -0700 Subject: [PATCH 1/3] xfs: track delayed allocation reservations across From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 16 Apr 2019 18:40:06 -0700 Message-ID: <155546520611.176278.16800498529488238281.stgit@magnolia> In-Reply-To: <155546519998.176278.8300210828717055034.stgit@magnolia> References: <155546519998.176278.8300210828717055034.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9229 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904170007 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9229 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904170007 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Add a percpu counter to track the number of blocks directly reserved for delayed allocations on the data device. This counter (in contrast to i_delayed_blks) does not track allocated CoW staging extents or anything going on with the realtime device. It will be used in the upcoming summary counter scrub function to check the free block counts without having to freeze the filesystem or walk all the inodes to find the delayed allocations. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_bmap.c | 17 ++++++++++++++--- fs/xfs/xfs_mount.h | 15 +++++++++++++++ fs/xfs/xfs_super.c | 9 +++++++++ 3 files changed, 38 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 4637ae1ae91c..356ebd1cbe82 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -2009,6 +2009,9 @@ xfs_bmap_add_extent_delay_real( goto done; } + if (da_new != da_old) + xfs_mod_delalloc(mp, (int64_t)da_new - da_old); + if (bma->cur) { da_new += bma->cur->bc_private.b.allocated; bma->cur->bc_private.b.allocated = 0; @@ -2640,6 +2643,7 @@ xfs_bmap_add_extent_hole_delay( /* * Nothing to do for disk quota accounting here. */ + xfs_mod_delalloc(ip->i_mount, (int64_t)newlen - oldlen); } } @@ -3352,8 +3356,10 @@ xfs_bmap_btalloc_accounting( * already have quota reservation and there's nothing to do * yet. */ - if (ap->wasdel) + if (ap->wasdel) { + xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)args->len); return; + } /* * Otherwise, we've allocated blocks in a hole. The transaction @@ -3372,8 +3378,10 @@ xfs_bmap_btalloc_accounting( /* data/attr fork only */ ap->ip->i_d.di_nblocks += args->len; xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); - if (ap->wasdel) + if (ap->wasdel) { ap->ip->i_delayed_blks -= args->len; + xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)args->len); + } xfs_trans_mod_dquot_byino(ap->tp, ap->ip, ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT : XFS_TRANS_DQ_BCOUNT, args->len); @@ -3969,6 +3977,7 @@ xfs_bmapi_reserve_delalloc( ip->i_delayed_blks += alen; + xfs_mod_delalloc(ip->i_mount, alen + indlen); got->br_startoff = aoff; got->br_startblock = nullstartblock(indlen); @@ -4840,8 +4849,10 @@ xfs_bmap_del_extent_delay( da_diff = da_old - da_new; if (!isrt) da_diff += del->br_blockcount; - if (da_diff) + if (da_diff) { xfs_mod_fdblocks(mp, da_diff, false); + xfs_mod_delalloc(mp, -da_diff); + } return error; } diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 14fba76ab811..f5cf4eb22d8e 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -81,6 +81,12 @@ typedef struct xfs_mount { struct percpu_counter m_icount; /* allocated inodes counter */ struct percpu_counter m_ifree; /* free inodes counter */ struct percpu_counter m_fdblocks; /* free block counter */ + /* + * Count of data device blocks reserved for delayed allocations, + * including indlen blocks. Does not include allocated CoW staging + * extents or anything related to the rt device. + */ + struct percpu_counter m_delalloc_blks; struct xfs_buf *m_sb_bp; /* buffer for superblock */ char *m_fsname; /* filesystem name */ @@ -476,4 +482,13 @@ struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp, int error_class, int error); void xfs_force_summary_recalc(struct xfs_mount *mp); +/* Update the in-core delayed block counter. */ +static inline void +xfs_mod_delalloc( + struct xfs_mount *mp, + int64_t delta) +{ + percpu_counter_add(&mp->m_delalloc_blks, delta); +} + #endif /* __XFS_MOUNT_H__ */ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index df917f41ca46..3a870a194cd9 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1538,8 +1538,14 @@ xfs_init_percpu_counters( if (error) goto free_ifree; + error = percpu_counter_init(&mp->m_delalloc_blks, 0, GFP_KERNEL); + if (error) + goto free_fdblocks; + return 0; +free_fdblocks: + percpu_counter_destroy(&mp->m_fdblocks); free_ifree: percpu_counter_destroy(&mp->m_ifree); free_icount: @@ -1563,6 +1569,9 @@ xfs_destroy_percpu_counters( percpu_counter_destroy(&mp->m_icount); percpu_counter_destroy(&mp->m_ifree); percpu_counter_destroy(&mp->m_fdblocks); + ASSERT(XFS_FORCED_SHUTDOWN(mp) || + percpu_counter_sum(&mp->m_delalloc_blks) == 0); + percpu_counter_destroy(&mp->m_delalloc_blks); } static struct xfs_mount * From patchwork Wed Apr 17 01:40:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10904341 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 755E81515 for ; Wed, 17 Apr 2019 01:40:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5A7FC28905 for ; Wed, 17 Apr 2019 01:40:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4F1F928A2D; Wed, 17 Apr 2019 01:40:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF78628A24 for ; Wed, 17 Apr 2019 01:40:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729155AbfDQBkT (ORCPT ); Tue, 16 Apr 2019 21:40:19 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:38284 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728514AbfDQBkT (ORCPT ); Tue, 16 Apr 2019 21:40:19 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3H1cSnW097296 for ; Wed, 17 Apr 2019 01:40:18 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=QgubM0cV8XV4lyV3hbeS/ejnBhMa32UBnOdYXQT54lo=; b=CRivrrCOTkFG7aYvgmU8BWYPIiRXGncRIZv4m4pmNJefNbzGUNzo03yGHvevFiJ68IF5 yisyXWZPs4C9BIuK4Y4/TksDd7mR7Q2lLmdTxx0Lyz/gM0XbxO5U8gloP9+lheRZ6+XH YP9aiNNGDfG7DQNZoFmxRMLrV4Xii/L5M7CafK+hjDp6A/0HR74u337j77qe+sRmuDDB pqm6HUkUgra59RiBLdvQT00k60tQJSW+g8WSVZJYBMaoESIvxPlswJcsXoG5rmLRPPpo j/4fGzcyGnY3h0TJbp5UhaMk1ehjcpBnCJ1HLGmNk1KHgkLhn5PFaUbgAXG6m5/Duniu 1A== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 2rusnewxmm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 17 Apr 2019 01:40:17 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3H1d2VY109512 for ; Wed, 17 Apr 2019 01:40:17 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3030.oracle.com with ESMTP id 2ru4vtgwx7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 17 Apr 2019 01:40:17 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x3H1eGTS007261 for ; Wed, 17 Apr 2019 01:40:16 GMT Received: from localhost (/10.159.224.90) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 16 Apr 2019 18:40:16 -0700 Subject: [PATCH 2/3] xfs: allow scrubbers to pause background reclaim From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 16 Apr 2019 18:40:12 -0700 Message-ID: <155546521227.176278.16067700926705972688.stgit@magnolia> In-Reply-To: <155546519998.176278.8300210828717055034.stgit@magnolia> References: <155546519998.176278.8300210828717055034.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9229 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904170007 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9229 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904170007 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong The forthcoming summary counter patch races with regular filesystem activity to compute rough expected values for the counters. This design was chosen to avoid having to freeze the entire filesystem to check the counters, but while that's running we'd prefer to minimize background reclamation activity to reduce the perturbations to the incore free block count. Therefore, provide a way for scrubbers to disable background posteof and cowblock reclamation. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/common.c | 18 ++++++++++++++++++ fs/xfs/scrub/common.h | 2 ++ fs/xfs/scrub/scrub.c | 2 ++ fs/xfs/scrub/scrub.h | 1 + 4 files changed, 23 insertions(+) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 7076d5c98151..a406a22a734f 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -894,3 +894,21 @@ xchk_ilock_inverted( } return -EDEADLOCK; } + +/* Pause background reclamation and inactivation. */ +void +xchk_disable_reclaim( + struct xfs_scrub *sc) +{ + sc->flags |= XCHK_RECLAIM_DISABLED; + xfs_icache_disable_reclaim(sc->mp); +} + +/* Unpause background reclamation and inactivation. */ +void +xchk_enable_reclaim( + struct xfs_scrub *sc) +{ + xfs_icache_enable_reclaim(sc->mp); + sc->flags &= ~XCHK_RECLAIM_DISABLED; +} diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index e26a430bd466..2288f45a5606 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -137,5 +137,7 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm) int xchk_metadata_inode_forks(struct xfs_scrub *sc); int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode); +void xchk_disable_reclaim(struct xfs_scrub *sc); +void xchk_enable_reclaim(struct xfs_scrub *sc); #endif /* __XFS_SCRUB_COMMON_H__ */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 93b0f075a4d3..421c22a0bf39 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -187,6 +187,8 @@ xchk_teardown( xfs_irele(sc->ip); sc->ip = NULL; } + if (sc->flags & XCHK_RECLAIM_DISABLED) + xchk_enable_reclaim(sc); if (sc->flags & XCHK_HAS_QUOTAOFFLOCK) { mutex_unlock(&sc->mp->m_quotainfo->qi_quotaofflock); sc->flags &= ~XCHK_HAS_QUOTAOFFLOCK; diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 1b280f8f185a..1f6de7bbb9f5 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -80,6 +80,7 @@ struct xfs_scrub { /* XCHK state flags grow up from zero, XREP state flags grown down from 2^31 */ #define XCHK_TRY_HARDER (1 << 0) /* can't get resources, try again */ #define XCHK_HAS_QUOTAOFFLOCK (1 << 1) /* we hold the quotaoff lock */ +#define XCHK_RECLAIM_DISABLED (1 << 2) /* background reclaim is paused */ #define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */ /* Metadata scrubbers */ From patchwork Wed Apr 17 01:40:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10904343 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB3CE1515 for ; Wed, 17 Apr 2019 01:40:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BFC902877B for ; Wed, 17 Apr 2019 01:40:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B41A728A24; Wed, 17 Apr 2019 01:40:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0DCFE28A39 for ; Wed, 17 Apr 2019 01:40:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728373AbfDQBk1 (ORCPT ); Tue, 16 Apr 2019 21:40:27 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:38436 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728514AbfDQBk1 (ORCPT ); Tue, 16 Apr 2019 21:40:27 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3H1ccu8097335 for ; Wed, 17 Apr 2019 01:40:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=N3TXFUFIWPSLFqgGXRiv/X7zVsvQtcaEOEMqMHPCKwc=; b=Y4ioRHdJkbo2CW8K4jbSxkg+IPhSq9xB4+0teXar8JPrhrd6+oN8qZ+Uwc1DAzM/lc2c 4Z4L+8U5UGcaiUauojoeSPtp03ccZMDfsRp8X5WPj4yVInlqYDK+ZHykp9UbxDgFMab5 +cJ5VEk3Anc5BjViCI3F908KXN9zTdn9WSgNztx9wIkinFBaHIAb6s3lD3FdQFAvWKsM M2ioYNSUcj0Bv97m4ImiCUBBgS8k7/cdwCXtu1Az38Cfqej5Lw3Me9Pzl3DuTjUcxIx6 epzUY6YD7UYMMcG4UHArtDKqIB/yIzWfdrOl9MfHtZwA//hiLk/qIYNftY3ZfI0ntZ70 kg== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by userp2120.oracle.com with ESMTP id 2rusnewxmw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 17 Apr 2019 01:40:24 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3H1e6xL019487 for ; Wed, 17 Apr 2019 01:40:24 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserp3030.oracle.com with ESMTP id 2rwe7a41g2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 17 Apr 2019 01:40:24 +0000 Received: from abhmp0022.oracle.com (abhmp0022.oracle.com [141.146.116.28]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x3H1eMBI022234 for ; Wed, 17 Apr 2019 01:40:23 GMT Received: from localhost (/10.159.224.90) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 16 Apr 2019 18:40:22 -0700 Subject: [PATCH 3/3] xfs: add online scrub/repair for superblock counters From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Date: Tue, 16 Apr 2019 18:40:21 -0700 Message-ID: <155546522160.176278.12189319767486072198.stgit@magnolia> In-Reply-To: <155546519998.176278.8300210828717055034.stgit@magnolia> References: <155546519998.176278.8300210828717055034.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9229 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904170007 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9229 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904170007 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Teach online scrub and repair how to check and reset the superblock inode and block counters. The AG rebuilding functions will need these to adjust the counts if they need to change as a part of recovering from corruption. We must use the repair freeze mechanism to prevent any other changes while we do this. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_fs.h | 3 - fs/xfs/scrub/common.h | 1 fs/xfs/scrub/fscounters.c | 229 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/health.c | 1 fs/xfs/scrub/scrub.c | 6 + fs/xfs/scrub/scrub.h | 7 + fs/xfs/scrub/trace.h | 48 +++++++++ 8 files changed, 294 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/scrub/fscounters.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index b20964e26a22..1dfc6df2e2bd 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -143,6 +143,7 @@ xfs-y += $(addprefix scrub/, \ common.o \ dabtree.o \ dir.o \ + fscounters.o \ health.o \ ialloc.o \ inode.o \ diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 43a53b03247b..e7382c780ed7 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -578,9 +578,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_UQUOTA 21 /* user quotas */ #define XFS_SCRUB_TYPE_GQUOTA 22 /* group quotas */ #define XFS_SCRUB_TYPE_PQUOTA 23 /* project quotas */ +#define XFS_SCRUB_TYPE_FSCOUNTERS 24 /* fs summary counters */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 24 +#define XFS_SCRUB_TYPE_NR 25 /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1 << 0) diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 2288f45a5606..7de945eace00 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -105,6 +105,7 @@ xchk_setup_quota(struct xfs_scrub *sc, struct xfs_inode *ip) return -ENOENT; } #endif +int xchk_setup_fscounters(struct xfs_scrub *sc, struct xfs_inode *ip); void xchk_ag_free(struct xfs_scrub *sc, struct xchk_ag *sa); int xchk_ag_init(struct xfs_scrub *sc, xfs_agnumber_t agno, diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c new file mode 100644 index 000000000000..c809213d8cfe --- /dev/null +++ b/fs/xfs/scrub/fscounters.c @@ -0,0 +1,229 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2019 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_error.h" +#include "xfs_errortag.h" +#include "xfs_icache.h" +#include "xfs_health.h" +#include "xfs_bmap.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * FS Summary Counters + * =================== + * + * The basics of filesystem summary counter checking are that we iterate the + * AGs counting the number of free blocks, free space btree blocks, per-AG + * reservations, inodes, delayed allocation reservations, and free inodes. + * Then we compare what we computed against the in-core counters. + * + * However, the reality is that summary counters are a tricky beast to check. + * While we /could/ freeze the filesystem and scramble around the AGs counting + * the free blocks, in practice we prefer not do that for a scan because + * freezing is costly. To get around this, we added a per-cpu counter of the + * delalloc reservations so that we can rotor around the AGs relatively + * quickly, and we allow the counts to be slightly off because we're not + * taking any locks while we do this. + */ + +int +xchk_setup_fscounters( + struct xfs_scrub *sc, + struct xfs_inode *ip) +{ + sc->buf = kmem_zalloc(sizeof(struct xchk_fscounters), KM_SLEEP); + if (!sc->buf) + return -ENOMEM; + + /* + * Pause background reclaim while we're scrubbing to reduce the + * likelihood of background perturbations to the counters throwing + * off our calculations. + */ + xchk_disable_reclaim(sc); + + return xchk_trans_alloc(sc, 0); +} + +/* + * Calculate what the global in-core counters ought to be from the AG header + * contents. Callers can compare this to the actual in-core counters to + * calculate by how much both in-core and on-disk counters need to be + * adjusted. + */ +STATIC int +xchk_fscounters_calc( + struct xfs_scrub *sc, + struct xchk_fscounters *fsc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_buf *agi_bp; + struct xfs_buf *agf_bp; + struct xfs_agi *agi; + struct xfs_agf *agf; + struct xfs_perag *pag; + uint64_t delayed; + xfs_agnumber_t agno; + int error; + + for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { + /* Lock both AG headers. */ + error = xfs_ialloc_read_agi(mp, sc->tp, agno, &agi_bp); + if (error) + return error; + error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp); + if (error) + return error; + if (!agf_bp) + return -ENOMEM; + + /* Count all the inodes */ + agi = XFS_BUF_TO_AGI(agi_bp); + fsc->icount += be32_to_cpu(agi->agi_count); + fsc->ifree += be32_to_cpu(agi->agi_freecount); + + /* Add up the free/freelist/bnobt/cntbt blocks */ + agf = XFS_BUF_TO_AGF(agf_bp); + fsc->fdblocks += be32_to_cpu(agf->agf_freeblks); + fsc->fdblocks += be32_to_cpu(agf->agf_flcount); + fsc->fdblocks += be32_to_cpu(agf->agf_btreeblks); + + /* + * Per-AG reservations are taken out of the incore counters, + * so they must be left out of the free blocks computation. + */ + pag = xfs_perag_get(mp, agno); + fsc->fdblocks -= pag->pag_meta_resv.ar_reserved; + fsc->fdblocks -= pag->pag_rmapbt_resv.ar_orig_reserved; + xfs_perag_put(pag); + + xfs_trans_brelse(sc->tp, agf_bp); + xfs_trans_brelse(sc->tp, agi_bp); + } + + /* + * The global incore space reservation is taken from the incore + * counters, so leave that out of the computation. + */ + fsc->fdblocks -= mp->m_resblks_avail; + + /* + * Delayed allocation reservations are taken out of the incore counters + * but not recorded on disk, so leave them and their indlen blocks out + * of the computation. + */ + delayed = percpu_counter_sum(&mp->m_delalloc_blks); + fsc->fdblocks -= delayed; + + trace_xchk_fscounters_calc(mp, fsc->icount, fsc->ifree, fsc->fdblocks, + delayed); + + /* Bail out if the values we compute are totally nonsense. */ + if (!xfs_verify_icount(mp, fsc->icount) || + fsc->fdblocks > mp->m_sb.sb_dblocks || + fsc->ifree > fsc->icount) + return -EFSCORRUPTED; + + return 0; +} + +/* + * Is the @counter within an acceptable range of @expected? + * + * Currently that means 1/16th (6%) or @nr_range of the @expected value. + */ +static inline bool +xchk_fscounter_within_range( + struct xfs_scrub *sc, + struct percpu_counter *counter, + uint64_t expected, + uint64_t nr_range) +{ + int64_t value = percpu_counter_sum(counter); + uint64_t range; + + range = max_t(uint64_t, expected >> 4, nr_range); + if (value < 0) + return false; + if (range < expected && value < expected - range) + return false; + if ((int64_t)(expected + range) >= 0 && value > expected + range) + return false; + return true; +} + +/* Check the superblock counters. */ +int +xchk_fscounters( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xchk_fscounters *fsc = sc->buf; + int64_t icount, ifree, fdblocks; + int error; + + icount = percpu_counter_sum(&sc->mp->m_icount); + ifree = percpu_counter_sum(&sc->mp->m_ifree); + fdblocks = percpu_counter_sum(&sc->mp->m_fdblocks); + + if (icount < 0 || ifree < 0 || fdblocks < 0) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + + /* See if icount is obviously wrong. */ + if (!xfs_verify_icount(mp, icount)) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + + /* See if fdblocks / ifree are obviously wrong. */ + if (fdblocks > mp->m_sb.sb_dblocks) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + if (ifree > icount) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + + /* If we already know it's bad, we can skip the AG iteration. */ + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return 0; + + /* Counters seem ok, but let's count them. */ + error = xchk_fscounters_calc(sc, fsc); + if (!xchk_process_error(sc, 0, XFS_SB_BLOCK(sc->mp), &error)) + return error; + + /* + * Compare the in-core counters with whatever we counted. We'll + * consider the inode counts ok if they're within 1024 inodes, and the + * free block counts if they're within 1/64th of the filesystem size. + */ + if (!xchk_fscounter_within_range(sc, &mp->m_icount, fsc->icount, 1024)) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + + if (!xchk_fscounter_within_range(sc, &mp->m_ifree, fsc->ifree, 1024)) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + + if (!xchk_fscounter_within_range(sc, &mp->m_fdblocks, fsc->fdblocks, + mp->m_sb.sb_dblocks >> 6)) + xchk_block_set_corrupt(sc, mp->m_sb_bp); + + return 0; +} diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index 16b536aa125e..23cf8e2f25db 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -109,6 +109,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_UQUOTA] = { XHG_FS, XFS_SICK_FS_UQUOTA }, [XFS_SCRUB_TYPE_GQUOTA] = { XHG_FS, XFS_SICK_FS_GQUOTA }, [XFS_SCRUB_TYPE_PQUOTA] = { XHG_FS, XFS_SICK_FS_PQUOTA }, + [XFS_SCRUB_TYPE_FSCOUNTERS] = { XHG_FS, XFS_SICK_FS_COUNTERS }, }; /* Return the health status mask for this scrub type. */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 421c22a0bf39..4d5d00d35ef7 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -352,6 +352,12 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .scrub = xchk_quota, .repair = xrep_notsupported, }, + [XFS_SCRUB_TYPE_FSCOUNTERS] = { /* fs summary counters */ + .type = ST_FS, + .setup = xchk_setup_fscounters, + .scrub = xchk_fscounters, + .repair = xrep_notsupported, + }, }; /* This isn't a stable feature, warn once per day. */ diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 1f6de7bbb9f5..caa90ea5a22e 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -127,6 +127,7 @@ xchk_quota(struct xfs_scrub *sc) return -ENOENT; } #endif +int xchk_fscounters(struct xfs_scrub *sc); /* cross-referencing helpers */ void xchk_xref_is_used_space(struct xfs_scrub *sc, xfs_agblock_t agbno, @@ -152,4 +153,10 @@ void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno, # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0) #endif +struct xchk_fscounters { + uint64_t icount; + uint64_t ifree; + uint64_t fdblocks; +}; + #endif /* __XFS_SCRUB_SCRUB_H__ */ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 3c83e8b3b39c..7120aee4a506 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTSUM); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_UQUOTA); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_GQUOTA); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_FSCOUNTERS); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -75,7 +76,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA); { XFS_SCRUB_TYPE_RTSUM, "rtsummary" }, \ { XFS_SCRUB_TYPE_UQUOTA, "usrquota" }, \ { XFS_SCRUB_TYPE_GQUOTA, "grpquota" }, \ - { XFS_SCRUB_TYPE_PQUOTA, "prjquota" } + { XFS_SCRUB_TYPE_PQUOTA, "prjquota" }, \ + { XFS_SCRUB_TYPE_FSCOUNTERS, "fscounters" } DECLARE_EVENT_CLASS(xchk_class, TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm, @@ -590,6 +592,50 @@ TRACE_EVENT(xchk_iallocbt_check_cluster, __entry->cluster_ino) ) +TRACE_EVENT(xchk_fscounters_calc, + TP_PROTO(struct xfs_mount *mp, uint64_t icount, uint64_t ifree, + uint64_t fdblocks, uint64_t delalloc), + TP_ARGS(mp, icount, ifree, fdblocks, delalloc), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int64_t, icount_sb) + __field(int64_t, icount_percpu) + __field(uint64_t, icount_calculated) + __field(int64_t, ifree_sb) + __field(int64_t, ifree_percpu) + __field(uint64_t, ifree_calculated) + __field(int64_t, fdblocks_sb) + __field(int64_t, fdblocks_percpu) + __field(uint64_t, fdblocks_calculated) + __field(uint64_t, delalloc) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->icount_sb = mp->m_sb.sb_icount; + __entry->icount_percpu = percpu_counter_sum(&mp->m_icount); + __entry->icount_calculated = icount; + __entry->ifree_sb = mp->m_sb.sb_ifree; + __entry->ifree_percpu = percpu_counter_sum(&mp->m_ifree); + __entry->ifree_calculated = ifree; + __entry->fdblocks_sb = mp->m_sb.sb_fdblocks; + __entry->fdblocks_percpu = percpu_counter_sum(&mp->m_fdblocks); + __entry->fdblocks_calculated = fdblocks; + __entry->delalloc = delalloc; + ), + TP_printk("dev %d:%d icount %lld:%lld:%llu ifree %lld:%lld:%llu fdblocks %lld:%lld:%llu delalloc %llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->icount_sb, + __entry->icount_percpu, + __entry->icount_calculated, + __entry->ifree_sb, + __entry->ifree_percpu, + __entry->ifree_calculated, + __entry->fdblocks_sb, + __entry->fdblocks_percpu, + __entry->fdblocks_calculated, + __entry->delalloc) +) + /* repair tracepoints */ #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)