From patchwork Thu Jul 26 00:19:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10544971 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38CBB112E for ; Thu, 26 Jul 2018 00:19:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 261722AA22 for ; Thu, 26 Jul 2018 00:19:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1981D2AA30; Thu, 26 Jul 2018 00:19:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 57CC32AA22 for ; Thu, 26 Jul 2018 00:19:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728369AbeGZBdq (ORCPT ); Wed, 25 Jul 2018 21:33:46 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:44232 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728186AbeGZBdq (ORCPT ); Wed, 25 Jul 2018 21:33:46 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6Q0EmQr166679; Thu, 26 Jul 2018 00:19:36 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=EpuLO+zo1TnAA78sPR9flrGnL3R/iAxoeJBdSwKIMvI=; 
b=tCC7BkKsvepRU8yfd5FofyARl+Eeam5gf6SAgIAUuoCTgIt+6lqzpnm2NTRd5D+LCskE PXDDVyo40jWw3n/siQ7x00pZyx08GyYlHCiE3Curof26CBNls/+wZfTGpi3CVmytCCp0 23fT1GuXHilnxpv23LRTbm2JdwBWKUxkc32DrAGcUITu1yFsOexhNgKTjLjfQRplt8Xj sDNHSPtNjlpFfogR2SVEambkRgG7jpQ/syBQQelYiR6sDVZtlfQ5lmhTbomgASaS5/c1 G6D4nJxZXV8CiviV+qQkG67SycIrVSSBl1Nf/ptOe2PF3zgMb7Yrf0SS1Me0zfP6zQCW aQ== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp2130.oracle.com with ESMTP id 2kbv8t7ppa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 26 Jul 2018 00:19:36 +0000 Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w6Q0Jag0012302 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 26 Jul 2018 00:19:36 GMT Received: from abhmp0004.oracle.com (abhmp0004.oracle.com [141.146.116.10]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w6Q0JZd1018708; Thu, 26 Jul 2018 00:19:36 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 25 Jul 2018 17:19:35 -0700 Subject: [PATCH 01/16] xfs: pass transaction lock while setting up agresv on cyclic metadata From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:19:33 -0700 Message-ID: <153256437346.29021.16149947898612665862.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8965 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807260002 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Pass a transaction pointer through to all helpers that calculate the per-AG block reservation. Online repair will use this to reinitialize per-AG reservations while it still holds all the AG headers locked to the repair transaction. Signed-off-by: Darrick J. 
Wong Reviewed-by: Brian Foster --- fs/xfs/libxfs/xfs_ag_resv.c | 13 +++++++------ fs/xfs/libxfs/xfs_ag_resv.h | 2 +- fs/xfs/libxfs/xfs_ialloc_btree.c | 10 ++++++---- fs/xfs/libxfs/xfs_ialloc_btree.h | 4 ++-- fs/xfs/libxfs/xfs_refcount_btree.c | 5 +++-- fs/xfs/libxfs/xfs_refcount_btree.h | 3 ++- fs/xfs/libxfs/xfs_rmap_btree.c | 5 +++-- fs/xfs/libxfs/xfs_rmap_btree.h | 2 +- fs/xfs/xfs_fsops.c | 2 +- 9 files changed, 26 insertions(+), 20 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c index fecd187fcf2c..e701ebc36c06 100644 --- a/fs/xfs/libxfs/xfs_ag_resv.c +++ b/fs/xfs/libxfs/xfs_ag_resv.c @@ -248,7 +248,8 @@ __xfs_ag_resv_init( /* Create a per-AG block reservation. */ int xfs_ag_resv_init( - struct xfs_perag *pag) + struct xfs_perag *pag, + struct xfs_trans *tp) { struct xfs_mount *mp = pag->pag_mount; xfs_agnumber_t agno = pag->pag_agno; @@ -260,11 +261,11 @@ xfs_ag_resv_init( if (pag->pag_meta_resv.ar_asked == 0) { ask = used = 0; - error = xfs_refcountbt_calc_reserves(mp, agno, &ask, &used); + error = xfs_refcountbt_calc_reserves(mp, tp, agno, &ask, &used); if (error) goto out; - error = xfs_finobt_calc_reserves(mp, agno, &ask, &used); + error = xfs_finobt_calc_reserves(mp, tp, agno, &ask, &used); if (error) goto out; @@ -282,7 +283,7 @@ xfs_ag_resv_init( mp->m_inotbt_nores = true; - error = xfs_refcountbt_calc_reserves(mp, agno, &ask, + error = xfs_refcountbt_calc_reserves(mp, tp, agno, &ask, &used); if (error) goto out; @@ -298,7 +299,7 @@ xfs_ag_resv_init( if (pag->pag_rmapbt_resv.ar_asked == 0) { ask = used = 0; - error = xfs_rmapbt_calc_reserves(mp, agno, &ask, &used); + error = xfs_rmapbt_calc_reserves(mp, tp, agno, &ask, &used); if (error) goto out; @@ -309,7 +310,7 @@ xfs_ag_resv_init( #ifdef DEBUG /* need to read in the AGF for 
the ASSERT below to work */ - error = xfs_alloc_pagf_init(pag->pag_mount, NULL, pag->pag_agno, 0); + error = xfs_alloc_pagf_init(pag->pag_mount, tp, pag->pag_agno, 0); if (error) return error; diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h index 4619b554ee90..d1005116b43b 100644 --- a/fs/xfs/libxfs/xfs_ag_resv.h +++ b/fs/xfs/libxfs/xfs_ag_resv.h @@ -7,7 +7,7 @@ #define __XFS_AG_RESV_H__ int xfs_ag_resv_free(struct xfs_perag *pag); -int xfs_ag_resv_init(struct xfs_perag *pag); +int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp); bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type); xfs_extlen_t xfs_ag_resv_needed(struct xfs_perag *pag, diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c index 735a33252eb2..86c50208a143 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -552,6 +552,7 @@ xfs_inobt_max_size( static int xfs_inobt_count_blocks( struct xfs_mount *mp, + struct xfs_trans *tp, xfs_agnumber_t agno, xfs_btnum_t btnum, xfs_extlen_t *tree_blocks) @@ -560,14 +561,14 @@ xfs_inobt_count_blocks( struct xfs_btree_cur *cur; int error; - error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp); + error = xfs_ialloc_read_agi(mp, tp, agno, &agbp); if (error) return error; - cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno, btnum); + cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum); error = xfs_btree_count_blocks(cur, tree_blocks); xfs_btree_del_cursor(cur, error); - xfs_buf_relse(agbp); + xfs_trans_brelse(tp, agbp); return error; } @@ -578,6 +579,7 @@ xfs_inobt_count_blocks( int xfs_finobt_calc_reserves( struct xfs_mount *mp, + struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used) @@ -588,7 +590,7 @@ xfs_finobt_calc_reserves( if (!xfs_sb_version_hasfinobt(&mp->m_sb)) return 0; - error = xfs_inobt_count_blocks(mp, agno, XFS_BTNUM_FINO, &tree_len); + error = xfs_inobt_count_blocks(mp, tp, agno, XFS_BTNUM_FINO, 
&tree_len); if (error) return error; diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h index bf8f0c405e7d..ebdd0c6b8766 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.h +++ b/fs/xfs/libxfs/xfs_ialloc_btree.h @@ -60,8 +60,8 @@ int xfs_inobt_rec_check_count(struct xfs_mount *, #define xfs_inobt_rec_check_count(mp, rec) 0 #endif /* DEBUG */ -int xfs_finobt_calc_reserves(struct xfs_mount *mp, xfs_agnumber_t agno, - xfs_extlen_t *ask, xfs_extlen_t *used); +int xfs_finobt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used); extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp, unsigned long long len); diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c index b71937982c5b..bcd65ee37260 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.c +++ b/fs/xfs/libxfs/xfs_refcount_btree.c @@ -408,6 +408,7 @@ xfs_refcountbt_max_size( int xfs_refcountbt_calc_reserves( struct xfs_mount *mp, + struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used) @@ -422,14 +423,14 @@ xfs_refcountbt_calc_reserves( return 0; - error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp); + error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp); if (error) return error; agf = XFS_BUF_TO_AGF(agbp); agblocks = be32_to_cpu(agf->agf_length); tree_len = be32_to_cpu(agf->agf_refcount_blocks); - xfs_buf_relse(agbp); + xfs_trans_brelse(tp, agbp); *ask += xfs_refcountbt_max_size(mp, agblocks); *used += tree_len; diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h index d2852b6e1fa8..c868394ac02e 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.h +++ b/fs/xfs/libxfs/xfs_refcount_btree.h @@ -55,6 +55,7 @@ extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp, xfs_agblock_t agblocks); extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp, - xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used); + struct xfs_trans 
*tp, xfs_agnumber_t agno, xfs_extlen_t *ask, + xfs_extlen_t *used); #endif /* __XFS_REFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c index 221a88ea60bb..f79cf040d745 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.c +++ b/fs/xfs/libxfs/xfs_rmap_btree.c @@ -554,6 +554,7 @@ xfs_rmapbt_max_size( int xfs_rmapbt_calc_reserves( struct xfs_mount *mp, + struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used) @@ -567,14 +568,14 @@ xfs_rmapbt_calc_reserves( if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) return 0; - error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp); + error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp); if (error) return error; agf = XFS_BUF_TO_AGF(agbp); agblocks = be32_to_cpu(agf->agf_length); tree_len = be32_to_cpu(agf->agf_rmap_blocks); - xfs_buf_relse(agbp); + xfs_trans_brelse(tp, agbp); /* Reserve 1% of the AG or enough for 1 block per record. */ *ask += max(agblocks / 100, xfs_rmapbt_max_size(mp, agblocks)); diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h index 50198b6c3bb2..820d668b063d 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.h +++ b/fs/xfs/libxfs/xfs_rmap_btree.h @@ -51,7 +51,7 @@ extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp, extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp, xfs_agblock_t agblocks); -extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp, +extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used); #endif /* __XFS_RMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 3f2bd6032cf8..7c00b8bedfe3 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -536,7 +536,7 @@ xfs_fs_reserve_ag_blocks( for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { pag = xfs_perag_get(mp, agno); - err2 = xfs_ag_resv_init(pag); + err2 = xfs_ag_resv_init(pag, NULL); xfs_perag_put(pag); if (err2 && !error) error = err2; From 
patchwork Thu Jul 26 00:19:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10544975 Subject: [PATCH 02/16] xfs: move the repair extent list into its own file From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:19:39 -0700 Message-ID: <153256437975.29021.14919845411774307079.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> From: Darrick J. Wong Move the xrep_extent_list code into a separate file. Logically, this data structure is really just a clumsy bitmap, and in the next patch we'll make this more obvious. No functional changes. Signed-off-by: Darrick J. 
Wong Reviewed-by: Brian Foster --- fs/xfs/Makefile | 1 fs/xfs/scrub/bitmap.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/bitmap.h | 37 +++++++++ fs/xfs/scrub/repair.c | 194 ---------------------------------------------- fs/xfs/scrub/repair.h | 27 ------ 5 files changed, 248 insertions(+), 219 deletions(-) create mode 100644 fs/xfs/scrub/bitmap.c create mode 100644 fs/xfs/scrub/bitmap.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index a36cccbec169..57ec46951ede 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y) xfs-y += $(addprefix scrub/, \ agheader_repair.o \ + bitmap.o \ repair.o \ ) endif diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c new file mode 100644 index 000000000000..a7c2f4773f98 --- /dev/null +++ b/fs/xfs/scrub/bitmap.c @@ -0,0 +1,208 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* Collect a dead btree extent for later disposal. 
*/ +int +xrep_collect_btree_extent( + struct xfs_scrub *sc, + struct xrep_extent_list *exlist, + xfs_fsblock_t fsbno, + xfs_extlen_t len) +{ + struct xrep_extent *rex; + + trace_xrep_collect_btree_extent(sc->mp, + XFS_FSB_TO_AGNO(sc->mp, fsbno), + XFS_FSB_TO_AGBNO(sc->mp, fsbno), len); + + rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL); + if (!rex) + return -ENOMEM; + + INIT_LIST_HEAD(&rex->list); + rex->fsbno = fsbno; + rex->len = len; + list_add_tail(&rex->list, &exlist->list); + + return 0; +} + +/* + * An error happened during the rebuild so the transaction will be cancelled. + * The fs will shut down, and the administrator has to unmount and run repair. + * Therefore, free all the memory associated with the list so we can die. + */ +void +xrep_cancel_btree_extents( + struct xfs_scrub *sc, + struct xrep_extent_list *exlist) +{ + struct xrep_extent *rex; + struct xrep_extent *n; + + for_each_xrep_extent_safe(rex, n, exlist) { + list_del(&rex->list); + kmem_free(rex); + } +} + +/* Compare two btree extents. */ +static int +xrep_btree_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_extent *ap; + struct xrep_extent *bp; + + ap = container_of(a, struct xrep_extent, list); + bp = container_of(b, struct xrep_extent, list); + + if (ap->fsbno > bp->fsbno) + return 1; + if (ap->fsbno < bp->fsbno) + return -1; + return 0; +} + +/* + * Remove all the blocks mentioned in @sublist from the extents in @exlist. + * + * The intent is that callers will iterate the rmapbt for all of its records + * for a given owner to generate @exlist; and iterate all the blocks of the + * metadata structures that are not being rebuilt and have the same rmapbt + * owner to generate @sublist. This routine subtracts all the extents + * mentioned in sublist from all the extents linked in @exlist, which leaves + * @exlist as the list of blocks that are not accounted for, which we assume + * are the dead blocks of the old metadata structure. 
The blocks mentioned in + * @exlist can be reaped. + */ +#define LEFT_ALIGNED (1 << 0) +#define RIGHT_ALIGNED (1 << 1) +int +xrep_subtract_extents( + struct xfs_scrub *sc, + struct xrep_extent_list *exlist, + struct xrep_extent_list *sublist) +{ + struct list_head *lp; + struct xrep_extent *ex; + struct xrep_extent *newex; + struct xrep_extent *subex; + xfs_fsblock_t sub_fsb; + xfs_extlen_t sub_len; + int state; + int error = 0; + + if (list_empty(&exlist->list) || list_empty(&sublist->list)) + return 0; + ASSERT(!list_empty(&sublist->list)); + + list_sort(NULL, &exlist->list, xrep_btree_extent_cmp); + list_sort(NULL, &sublist->list, xrep_btree_extent_cmp); + + /* + * Now that we've sorted both lists, we iterate exlist once, rolling + * forward through sublist and/or exlist as necessary until we find an + * overlap or reach the end of either list. We do not reset lp to the + * head of exlist nor do we reset subex to the head of sublist. The + * list traversal is similar to merge sort, but we're deleting + * instead. In this manner we avoid O(n^2) operations. + */ + subex = list_first_entry(&sublist->list, struct xrep_extent, + list); + lp = exlist->list.next; + while (lp != &exlist->list) { + ex = list_entry(lp, struct xrep_extent, list); + + /* + * Advance subex and/or ex until we find a pair that + * intersect or we run out of extents. 
+ */ + while (subex->fsbno + subex->len <= ex->fsbno) { + if (list_is_last(&subex->list, &sublist->list)) + goto out; + subex = list_next_entry(subex, list); + } + if (subex->fsbno >= ex->fsbno + ex->len) { + lp = lp->next; + continue; + } + + /* trim subex to fit the extent we have */ + sub_fsb = subex->fsbno; + sub_len = subex->len; + if (subex->fsbno < ex->fsbno) { + sub_len -= ex->fsbno - subex->fsbno; + sub_fsb = ex->fsbno; + } + if (sub_len > ex->len) + sub_len = ex->len; + + state = 0; + if (sub_fsb == ex->fsbno) + state |= LEFT_ALIGNED; + if (sub_fsb + sub_len == ex->fsbno + ex->len) + state |= RIGHT_ALIGNED; + switch (state) { + case LEFT_ALIGNED: + /* Coincides with only the left. */ + ex->fsbno += sub_len; + ex->len -= sub_len; + break; + case RIGHT_ALIGNED: + /* Coincides with only the right. */ + ex->len -= sub_len; + lp = lp->next; + break; + case LEFT_ALIGNED | RIGHT_ALIGNED: + /* Total overlap, just delete ex. */ + lp = lp->next; + list_del(&ex->list); + kmem_free(ex); + break; + case 0: + /* + * Deleting from the middle: add the new right extent + * and then shrink the left extent. + */ + newex = kmem_alloc(sizeof(struct xrep_extent), + KM_MAYFAIL); + if (!newex) { + error = -ENOMEM; + goto out; + } + INIT_LIST_HEAD(&newex->list); + newex->fsbno = sub_fsb + sub_len; + newex->len = ex->fsbno + ex->len - newex->fsbno; + list_add(&newex->list, &ex->list); + ex->len = sub_fsb - ex->fsbno; + lp = lp->next; + break; + default: + ASSERT(0); + break; + } + } + +out: + return error; +} +#undef LEFT_ALIGNED +#undef RIGHT_ALIGNED diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h new file mode 100644 index 000000000000..1038157695a8 --- /dev/null +++ b/fs/xfs/scrub/bitmap.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#ifndef __XFS_SCRUB_BITMAP_H__ +#define __XFS_SCRUB_BITMAP_H__ + +struct xrep_extent { + struct list_head list; + xfs_fsblock_t fsbno; + xfs_extlen_t len; +}; + +struct xrep_extent_list { + struct list_head list; +}; + +static inline void +xrep_init_extent_list( + struct xrep_extent_list *exlist) +{ + INIT_LIST_HEAD(&exlist->list); +} + +#define for_each_xrep_extent_safe(rbe, n, exlist) \ + list_for_each_entry_safe((rbe), (n), &(exlist)->list, list) +int xrep_collect_btree_extent(struct xfs_scrub *sc, + struct xrep_extent_list *btlist, xfs_fsblock_t fsbno, + xfs_extlen_t len); +void xrep_cancel_btree_extents(struct xfs_scrub *sc, + struct xrep_extent_list *btlist); +int xrep_subtract_extents(struct xfs_scrub *sc, + struct xrep_extent_list *exlist, + struct xrep_extent_list *sublist); + +#endif /* __XFS_SCRUB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 5de1cac424ec..27a904ef6189 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -34,6 +34,7 @@ #include "scrub/common.h" #include "scrub/trace.h" #include "scrub/repair.h" +#include "scrub/bitmap.h" /* * Attempt to repair some metadata, if the metadata is corrupt and userspace @@ -380,200 +381,7 @@ xrep_init_btblock( * sublist. As with the other btrees we subtract sublist from exlist, and the * result (since the rmapbt lives in the free space) are the blocks from the * old rmapbt. - */ - -/* Collect a dead btree extent for later disposal. 
*/ -int -xrep_collect_btree_extent( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - xfs_fsblock_t fsbno, - xfs_extlen_t len) -{ - struct xrep_extent *rex; - - trace_xrep_collect_btree_extent(sc->mp, - XFS_FSB_TO_AGNO(sc->mp, fsbno), - XFS_FSB_TO_AGBNO(sc->mp, fsbno), len); - - rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL); - if (!rex) - return -ENOMEM; - - INIT_LIST_HEAD(&rex->list); - rex->fsbno = fsbno; - rex->len = len; - list_add_tail(&rex->list, &exlist->list); - - return 0; -} - -/* - * An error happened during the rebuild so the transaction will be cancelled. - * The fs will shut down, and the administrator has to unmount and run repair. - * Therefore, free all the memory associated with the list so we can die. - */ -void -xrep_cancel_btree_extents( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist) -{ - struct xrep_extent *rex; - struct xrep_extent *n; - - for_each_xrep_extent_safe(rex, n, exlist) { - list_del(&rex->list); - kmem_free(rex); - } -} - -/* Compare two btree extents. */ -static int -xrep_btree_extent_cmp( - void *priv, - struct list_head *a, - struct list_head *b) -{ - struct xrep_extent *ap; - struct xrep_extent *bp; - - ap = container_of(a, struct xrep_extent, list); - bp = container_of(b, struct xrep_extent, list); - - if (ap->fsbno > bp->fsbno) - return 1; - if (ap->fsbno < bp->fsbno) - return -1; - return 0; -} - -/* - * Remove all the blocks mentioned in @sublist from the extents in @exlist. * - * The intent is that callers will iterate the rmapbt for all of its records - * for a given owner to generate @exlist; and iterate all the blocks of the - * metadata structures that are not being rebuilt and have the same rmapbt - * owner to generate @sublist. This routine subtracts all the extents - * mentioned in sublist from all the extents linked in @exlist, which leaves - * @exlist as the list of blocks that are not accounted for, which we assume - * are the dead blocks of the old metadata structure. 
The blocks mentioned in - * @exlist can be reaped. - */ -#define LEFT_ALIGNED (1 << 0) -#define RIGHT_ALIGNED (1 << 1) -int -xrep_subtract_extents( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - struct xrep_extent_list *sublist) -{ - struct list_head *lp; - struct xrep_extent *ex; - struct xrep_extent *newex; - struct xrep_extent *subex; - xfs_fsblock_t sub_fsb; - xfs_extlen_t sub_len; - int state; - int error = 0; - - if (list_empty(&exlist->list) || list_empty(&sublist->list)) - return 0; - ASSERT(!list_empty(&sublist->list)); - - list_sort(NULL, &exlist->list, xrep_btree_extent_cmp); - list_sort(NULL, &sublist->list, xrep_btree_extent_cmp); - - /* - * Now that we've sorted both lists, we iterate exlist once, rolling - * forward through sublist and/or exlist as necessary until we find an - * overlap or reach the end of either list. We do not reset lp to the - * head of exlist nor do we reset subex to the head of sublist. The - * list traversal is similar to merge sort, but we're deleting - * instead. In this manner we avoid O(n^2) operations. - */ - subex = list_first_entry(&sublist->list, struct xrep_extent, - list); - lp = exlist->list.next; - while (lp != &exlist->list) { - ex = list_entry(lp, struct xrep_extent, list); - - /* - * Advance subex and/or ex until we find a pair that - * intersect or we run out of extents. 
- */ - while (subex->fsbno + subex->len <= ex->fsbno) { - if (list_is_last(&subex->list, &sublist->list)) - goto out; - subex = list_next_entry(subex, list); - } - if (subex->fsbno >= ex->fsbno + ex->len) { - lp = lp->next; - continue; - } - - /* trim subex to fit the extent we have */ - sub_fsb = subex->fsbno; - sub_len = subex->len; - if (subex->fsbno < ex->fsbno) { - sub_len -= ex->fsbno - subex->fsbno; - sub_fsb = ex->fsbno; - } - if (sub_len > ex->len) - sub_len = ex->len; - - state = 0; - if (sub_fsb == ex->fsbno) - state |= LEFT_ALIGNED; - if (sub_fsb + sub_len == ex->fsbno + ex->len) - state |= RIGHT_ALIGNED; - switch (state) { - case LEFT_ALIGNED: - /* Coincides with only the left. */ - ex->fsbno += sub_len; - ex->len -= sub_len; - break; - case RIGHT_ALIGNED: - /* Coincides with only the right. */ - ex->len -= sub_len; - lp = lp->next; - break; - case LEFT_ALIGNED | RIGHT_ALIGNED: - /* Total overlap, just delete ex. */ - lp = lp->next; - list_del(&ex->list); - kmem_free(ex); - break; - case 0: - /* - * Deleting from the middle: add the new right extent - * and then shrink the left extent. 
- */ - newex = kmem_alloc(sizeof(struct xrep_extent), - KM_MAYFAIL); - if (!newex) { - error = -ENOMEM; - goto out; - } - INIT_LIST_HEAD(&newex->list); - newex->fsbno = sub_fsb + sub_len; - newex->len = ex->fsbno + ex->len - newex->fsbno; - list_add(&newex->list, &ex->list); - ex->len = sub_fsb - ex->fsbno; - lp = lp->next; - break; - default: - ASSERT(0); - break; - } - } - -out: - return error; -} -#undef LEFT_ALIGNED -#undef RIGHT_ALIGNED - -/* * Disposal of Blocks from Old per-AG Btrees * * Now that we've constructed a new btree to replace the damaged one, we want diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 91355f6b0087..a3d491a438f4 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -27,33 +27,8 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb, struct xfs_buf **bpp, xfs_btnum_t btnum, const struct xfs_buf_ops *ops); -struct xrep_extent { - struct list_head list; - xfs_fsblock_t fsbno; - xfs_extlen_t len; -}; - -struct xrep_extent_list { - struct list_head list; -}; - -static inline void -xrep_init_extent_list( - struct xrep_extent_list *exlist) -{ - INIT_LIST_HEAD(&exlist->list); -} +struct xrep_extent_list; -#define for_each_xrep_extent_safe(rbe, n, exlist) \ - list_for_each_entry_safe((rbe), (n), &(exlist)->list, list) -int xrep_collect_btree_extent(struct xfs_scrub *sc, - struct xrep_extent_list *btlist, xfs_fsblock_t fsbno, - xfs_extlen_t len); -void xrep_cancel_btree_extents(struct xfs_scrub *sc, - struct xrep_extent_list *btlist); -int xrep_subtract_extents(struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - struct xrep_extent_list *sublist); int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink); int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xrep_extent_list *btlist); From patchwork Thu Jul 26 00:19:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544977 Subject: [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:19:48 -0700 Message-ID: <153256438856.29021.17191745119148616451.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> From: Darrick J. 
Wong As mentioned previously, the xrep_extent_list basically implements a bitmap with two functions: set and disjoint union. Rename all these functions to xfs_bitmap to shorten the name and make it more obvious what we're doing. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/bitmap.c | 173 +++++++++++++++++++++++++------------------------ fs/xfs/scrub/bitmap.h | 34 ++++------ fs/xfs/scrub/repair.c | 85 +++++++++++------------- fs/xfs/scrub/repair.h | 8 +- fs/xfs/scrub/trace.h | 1 5 files changed, 144 insertions(+), 157 deletions(-) diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c index a7c2f4773f98..4840f5a1e179 100644 --- a/fs/xfs/scrub/bitmap.c +++ b/fs/xfs/scrub/bitmap.c @@ -16,51 +16,53 @@ #include "scrub/repair.h" #include "scrub/bitmap.h" -/* Collect a dead btree extent for later disposal. */ +/* + * Set a range of this bitmap. Caller must ensure the range is not set. + * + * This is the logical equivalent of bitmap |= mask(fsbno, len). 
+ */ int -xrep_collect_btree_extent( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist, +xfs_bitmap_set( + struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno, - xfs_extlen_t len) + xfs_fsblock_t len) { - struct xrep_extent *rex; + struct xfs_bitmap_range *bmr; - trace_xrep_collect_btree_extent(sc->mp, - XFS_FSB_TO_AGNO(sc->mp, fsbno), - XFS_FSB_TO_AGBNO(sc->mp, fsbno), len); - - rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL); - if (!rex) + bmr = kmem_alloc(sizeof(struct xfs_bitmap_range), KM_MAYFAIL); + if (!bmr) return -ENOMEM; - INIT_LIST_HEAD(&rex->list); - rex->fsbno = fsbno; - rex->len = len; - list_add_tail(&rex->list, &exlist->list); + INIT_LIST_HEAD(&bmr->list); + bmr->fsbno = fsbno; + bmr->len = len; + list_add_tail(&bmr->list, &bitmap->list); return 0; } -/* - * An error happened during the rebuild so the transaction will be cancelled. - * The fs will shut down, and the administrator has to unmount and run repair. - * Therefore, free all the memory associated with the list so we can die. - */ +/* Free everything related to this bitmap. */ void -xrep_cancel_btree_extents( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist) +xfs_bitmap_destroy( + struct xfs_bitmap *bitmap) { - struct xrep_extent *rex; - struct xrep_extent *n; + struct xfs_bitmap_range *bmr; + struct xfs_bitmap_range *n; - for_each_xrep_extent_safe(rex, n, exlist) { - list_del(&rex->list); - kmem_free(rex); + for_each_xfs_bitmap_extent(bmr, n, bitmap) { + list_del(&bmr->list); + kmem_free(bmr); } } +/* Set up a per-AG block bitmap. */ +void +xfs_bitmap_init( + struct xfs_bitmap *bitmap) +{ + INIT_LIST_HEAD(&bitmap->list); +} + /* Compare two btree extents. 
*/ static int xrep_btree_extent_cmp( @@ -68,11 +70,11 @@ xrep_btree_extent_cmp( struct list_head *a, struct list_head *b) { - struct xrep_extent *ap; - struct xrep_extent *bp; + struct xfs_bitmap_range *ap; + struct xfs_bitmap_range *bp; - ap = container_of(a, struct xrep_extent, list); - bp = container_of(b, struct xrep_extent, list); + ap = container_of(a, struct xfs_bitmap_range, list); + bp = container_of(b, struct xfs_bitmap_range, list); if (ap->fsbno > bp->fsbno) return 1; @@ -82,117 +84,118 @@ xrep_btree_extent_cmp( } /* - * Remove all the blocks mentioned in @sublist from the extents in @exlist. + * Remove all the blocks mentioned in @sub from the extents in @bitmap. * * The intent is that callers will iterate the rmapbt for all of its records - * for a given owner to generate @exlist; and iterate all the blocks of the + * for a given owner to generate @bitmap; and iterate all the blocks of the * metadata structures that are not being rebuilt and have the same rmapbt - * owner to generate @sublist. This routine subtracts all the extents - * mentioned in sublist from all the extents linked in @exlist, which leaves - * @exlist as the list of blocks that are not accounted for, which we assume + * owner to generate @sub. This routine subtracts all the extents + * mentioned in sub from all the extents linked in @bitmap, which leaves + * @bitmap as the list of blocks that are not accounted for, which we assume * are the dead blocks of the old metadata structure. The blocks mentioned in - * @exlist can be reaped. + * @bitmap can be reaped. + * + * This is the logical equivalent of bitmap &= ~sub. 
*/ #define LEFT_ALIGNED (1 << 0) #define RIGHT_ALIGNED (1 << 1) int -xrep_subtract_extents( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - struct xrep_extent_list *sublist) +xfs_bitmap_disunion( + struct xfs_bitmap *bitmap, + struct xfs_bitmap *sub) { struct list_head *lp; - struct xrep_extent *ex; - struct xrep_extent *newex; - struct xrep_extent *subex; + struct xfs_bitmap_range *br; + struct xfs_bitmap_range *new_br; + struct xfs_bitmap_range *sub_br; xfs_fsblock_t sub_fsb; - xfs_extlen_t sub_len; + xfs_fsblock_t sub_len; int state; int error = 0; - if (list_empty(&exlist->list) || list_empty(&sublist->list)) + if (list_empty(&bitmap->list) || list_empty(&sub->list)) return 0; - ASSERT(!list_empty(&sublist->list)); + ASSERT(!list_empty(&sub->list)); - list_sort(NULL, &exlist->list, xrep_btree_extent_cmp); - list_sort(NULL, &sublist->list, xrep_btree_extent_cmp); + list_sort(NULL, &bitmap->list, xrep_btree_extent_cmp); + list_sort(NULL, &sub->list, xrep_btree_extent_cmp); /* - * Now that we've sorted both lists, we iterate exlist once, rolling - * forward through sublist and/or exlist as necessary until we find an + * Now that we've sorted both lists, we iterate bitmap once, rolling + * forward through sub and/or bitmap as necessary until we find an * overlap or reach the end of either list. We do not reset lp to the - * head of exlist nor do we reset subex to the head of sublist. The + * head of bitmap nor do we reset sub_br to the head of sub. The * list traversal is similar to merge sort, but we're deleting * instead. In this manner we avoid O(n^2) operations. 
*/ - subex = list_first_entry(&sublist->list, struct xrep_extent, + sub_br = list_first_entry(&sub->list, struct xfs_bitmap_range, list); - lp = exlist->list.next; - while (lp != &exlist->list) { - ex = list_entry(lp, struct xrep_extent, list); + lp = bitmap->list.next; + while (lp != &bitmap->list) { + br = list_entry(lp, struct xfs_bitmap_range, list); /* - * Advance subex and/or ex until we find a pair that + * Advance sub_br and/or br until we find a pair that * intersect or we run out of extents. */ - while (subex->fsbno + subex->len <= ex->fsbno) { - if (list_is_last(&subex->list, &sublist->list)) + while (sub_br->fsbno + sub_br->len <= br->fsbno) { + if (list_is_last(&sub_br->list, &sub->list)) goto out; - subex = list_next_entry(subex, list); + sub_br = list_next_entry(sub_br, list); } - if (subex->fsbno >= ex->fsbno + ex->len) { + if (sub_br->fsbno >= br->fsbno + br->len) { lp = lp->next; continue; } - /* trim subex to fit the extent we have */ - sub_fsb = subex->fsbno; - sub_len = subex->len; - if (subex->fsbno < ex->fsbno) { - sub_len -= ex->fsbno - subex->fsbno; - sub_fsb = ex->fsbno; + /* trim sub_br to fit the extent we have */ + sub_fsb = sub_br->fsbno; + sub_len = sub_br->len; + if (sub_br->fsbno < br->fsbno) { + sub_len -= br->fsbno - sub_br->fsbno; + sub_fsb = br->fsbno; } - if (sub_len > ex->len) - sub_len = ex->len; + if (sub_len > br->len) + sub_len = br->len; state = 0; - if (sub_fsb == ex->fsbno) + if (sub_fsb == br->fsbno) state |= LEFT_ALIGNED; - if (sub_fsb + sub_len == ex->fsbno + ex->len) + if (sub_fsb + sub_len == br->fsbno + br->len) state |= RIGHT_ALIGNED; switch (state) { case LEFT_ALIGNED: /* Coincides with only the left. */ - ex->fsbno += sub_len; - ex->len -= sub_len; + br->fsbno += sub_len; + br->len -= sub_len; break; case RIGHT_ALIGNED: /* Coincides with only the right. */ - ex->len -= sub_len; + br->len -= sub_len; lp = lp->next; break; case LEFT_ALIGNED | RIGHT_ALIGNED: /* Total overlap, just delete ex. 
*/ lp = lp->next; - list_del(&ex->list); - kmem_free(ex); + list_del(&br->list); + kmem_free(br); break; case 0: /* * Deleting from the middle: add the new right extent * and then shrink the left extent. */ - newex = kmem_alloc(sizeof(struct xrep_extent), + new_br = kmem_alloc(sizeof(struct xfs_bitmap_range), KM_MAYFAIL); - if (!newex) { + if (!new_br) { error = -ENOMEM; goto out; } - INIT_LIST_HEAD(&newex->list); - newex->fsbno = sub_fsb + sub_len; - newex->len = ex->fsbno + ex->len - newex->fsbno; - list_add(&newex->list, &ex->list); - ex->len = sub_fsb - ex->fsbno; + INIT_LIST_HEAD(&new_br->list); + new_br->fsbno = sub_fsb + sub_len; + new_br->len = br->fsbno + br->len - new_br->fsbno; + list_add(&new_br->list, &br->list); + br->len = sub_fsb - br->fsbno; lp = lp->next; break; default: diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h index 1038157695a8..3c39900e9269 100644 --- a/fs/xfs/scrub/bitmap.h +++ b/fs/xfs/scrub/bitmap.h @@ -6,32 +6,28 @@ #ifndef __XFS_SCRUB_BITMAP_H__ #define __XFS_SCRUB_BITMAP_H__ -struct xrep_extent { +struct xfs_bitmap_range { struct list_head list; xfs_fsblock_t fsbno; - xfs_extlen_t len; + xfs_fsblock_t len; }; -struct xrep_extent_list { +struct xfs_bitmap { struct list_head list; }; -static inline void -xrep_init_extent_list( - struct xrep_extent_list *exlist) -{ - INIT_LIST_HEAD(&exlist->list); -} +void xfs_bitmap_init(struct xfs_bitmap *bitmap); +void xfs_bitmap_destroy(struct xfs_bitmap *bitmap); -#define for_each_xrep_extent_safe(rbe, n, exlist) \ - list_for_each_entry_safe((rbe), (n), &(exlist)->list, list) -int xrep_collect_btree_extent(struct xfs_scrub *sc, - struct xrep_extent_list *btlist, xfs_fsblock_t fsbno, - xfs_extlen_t len); -void xrep_cancel_btree_extents(struct xfs_scrub *sc, - struct xrep_extent_list *btlist); -int xrep_subtract_extents(struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - struct xrep_extent_list *sublist); +#define for_each_xfs_bitmap_extent(bex, n, bitmap) \ + 
list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) + +#define for_each_xfs_bitmap_block(fsbno, bex, n, bitmap) \ + list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) \ + for (fsbno = bex->fsbno; fsbno < bex->fsbno + bex->len; fsbno++) + +int xfs_bitmap_set(struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno, + xfs_fsblock_t len); +int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub); #endif /* __XFS_SCRUB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 27a904ef6189..85b048b341a0 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -368,17 +368,17 @@ xrep_init_btblock( * * However, that leaves the matter of removing all the metadata describing the * old broken structure. For primary metadata we use the rmap data to collect - * every extent with a matching rmap owner (exlist); we then iterate all other + * every extent with a matching rmap owner (bitmap); we then iterate all other * metadata structures with the same rmap owner to collect the extents that - * cannot be removed (sublist). We then subtract sublist from exlist to + * cannot be removed (sublist). We then subtract sublist from bitmap to * derive the blocks that were used by the old btree. These blocks can be * reaped. * * For rmapbt reconstructions we must use different tactics for extent * collection. First we iterate all primary metadata (this excludes the old * rmapbt, obviously) to generate new rmap records. The gaps in the rmap - * records are collected as exlist. The bnobt records are collected as - * sublist. As with the other btrees we subtract sublist from exlist, and the + * records are collected as bitmap. The bnobt records are collected as + * sublist. As with the other btrees we subtract sublist from bitmap, and the * result (since the rmapbt lives in the free space) are the blocks from the * old rmapbt. 
* @@ -386,11 +386,11 @@ xrep_init_btblock( * * Now that we've constructed a new btree to replace the damaged one, we want * to dispose of the blocks that (we think) the old btree was using. - * Previously, we used the rmapbt to collect the extents (exlist) with the + * Previously, we used the rmapbt to collect the extents (bitmap) with the * rmap owner corresponding to the tree we rebuilt, collected extents for any * blocks with the same rmap owner that are owned by another data structure - * (sublist), and subtracted sublist from exlist. In theory the extents - * remaining in exlist are the old btree's blocks. + * (sublist), and subtracted sublist from bitmap. In theory the extents + * remaining in bitmap are the old btree's blocks. * * Unfortunately, it's possible that the btree was crosslinked with other * blocks on disk. The rmap data can tell us if there are multiple owners, so @@ -406,7 +406,7 @@ xrep_init_btblock( * If there are no rmap records at all, we also free the block. If the btree * being rebuilt lives in the free space (bnobt/cntbt/rmapbt) then there isn't * supposed to be a rmap record and everything is ok. For other btrees there - * had to have been an rmap entry for the block to have ended up on @exlist, + * had to have been an rmap entry for the block to have ended up on @bitmap, * so if it's gone now there's something wrong and the fs will shut down. * * Note: If there are multiple rmap records with only the same rmap owner as @@ -419,7 +419,7 @@ xrep_init_btblock( * The caller is responsible for locking the AG headers for the entire rebuild * operation so that nothing else can sneak in and change the AG state while * we're not looking. We also assume that the caller already invalidated any - * buffers associated with @exlist. + * buffers associated with @bitmap. 
*/ /* @@ -429,13 +429,12 @@ xrep_init_btblock( int xrep_invalidate_blocks( struct xfs_scrub *sc, - struct xrep_extent_list *exlist) + struct xfs_bitmap *bitmap) { - struct xrep_extent *rex; - struct xrep_extent *n; + struct xfs_bitmap_range *bmr; + struct xfs_bitmap_range *n; struct xfs_buf *bp; xfs_fsblock_t fsbno; - xfs_agblock_t i; /* * For each block in each extent, see if there's an incore buffer for @@ -445,18 +444,16 @@ xrep_invalidate_blocks( * because we never own those; and if we can't TRYLOCK the buffer we * assume it's owned by someone else. */ - for_each_xrep_extent_safe(rex, n, exlist) { - for (fsbno = rex->fsbno, i = rex->len; i > 0; fsbno++, i--) { - /* Skip AG headers and post-EOFS blocks */ - if (!xfs_verify_fsbno(sc->mp, fsbno)) - continue; - bp = xfs_buf_incore(sc->mp->m_ddev_targp, - XFS_FSB_TO_DADDR(sc->mp, fsbno), - XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK); - if (bp) { - xfs_trans_bjoin(sc->tp, bp); - xfs_trans_binval(sc->tp, bp); - } + for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) { + /* Skip AG headers and post-EOFS blocks */ + if (!xfs_verify_fsbno(sc->mp, fsbno)) + continue; + bp = xfs_buf_incore(sc->mp->m_ddev_targp, + XFS_FSB_TO_DADDR(sc->mp, fsbno), + XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK); + if (bp) { + xfs_trans_bjoin(sc->tp, bp); + xfs_trans_binval(sc->tp, bp); } } @@ -519,9 +516,9 @@ xrep_put_freelist( return 0; } -/* Dispose of a single metadata block. */ +/* Dispose of a single block. */ STATIC int -xrep_dispose_btree_block( +xrep_reap_block( struct xfs_scrub *sc, xfs_fsblock_t fsbno, struct xfs_owner_info *oinfo, @@ -593,41 +590,35 @@ xrep_dispose_btree_block( return error; } -/* Dispose of btree blocks from an old per-AG btree. */ +/* Dispose of every block of every extent in the bitmap. 
*/ int -xrep_reap_btree_extents( +xrep_reap_extents( struct xfs_scrub *sc, - struct xrep_extent_list *exlist, + struct xfs_bitmap *bitmap, struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type) { - struct xrep_extent *rex; - struct xrep_extent *n; + struct xfs_bitmap_range *bmr; + struct xfs_bitmap_range *n; + xfs_fsblock_t fsbno; int error = 0; ASSERT(xfs_sb_version_hasrmapbt(&sc->mp->m_sb)); - /* Dispose of every block from the old btree. */ - for_each_xrep_extent_safe(rex, n, exlist) { + for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) { ASSERT(sc->ip != NULL || - XFS_FSB_TO_AGNO(sc->mp, rex->fsbno) == sc->sa.agno); - + XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.agno); trace_xrep_dispose_btree_extent(sc->mp, - XFS_FSB_TO_AGNO(sc->mp, rex->fsbno), - XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno), rex->len); + XFS_FSB_TO_AGNO(sc->mp, fsbno), + XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1); - for (; rex->len > 0; rex->len--, rex->fsbno++) { - error = xrep_dispose_btree_block(sc, rex->fsbno, - oinfo, type); - if (error) - goto out; - } - list_del(&rex->list); - kmem_free(rex); + error = xrep_reap_block(sc, fsbno, oinfo, type); + if (error) + goto out; } out: - xrep_cancel_btree_extents(sc, exlist); + xfs_bitmap_destroy(bitmap); return error; } diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index a3d491a438f4..5a4e92221916 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -27,13 +27,11 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb, struct xfs_buf **bpp, xfs_btnum_t btnum, const struct xfs_buf_ops *ops); -struct xrep_extent_list; +struct xfs_bitmap; int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink); -int xrep_invalidate_blocks(struct xfs_scrub *sc, - struct xrep_extent_list *btlist); -int xrep_reap_btree_extents(struct xfs_scrub *sc, - struct xrep_extent_list *exlist, +int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xfs_bitmap *btlist); +int xrep_reap_extents(struct xfs_scrub *sc, struct xfs_bitmap *exlist, struct 
xfs_owner_info *oinfo, enum xfs_ag_resv_type type); struct xrep_find_ag_btree { diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 93db22c39b51..4e20f0e48232 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -511,7 +511,6 @@ DEFINE_EVENT(xrep_extent_class, name, \ xfs_agblock_t agbno, xfs_extlen_t len), \ TP_ARGS(mp, agno, agbno, len)) DEFINE_REPAIR_EXTENT_EVENT(xrep_dispose_btree_extent); -DEFINE_REPAIR_EXTENT_EVENT(xrep_collect_btree_extent); DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert); DECLARE_EVENT_CLASS(xrep_rmap_class, From patchwork Thu Jul 26 00:19:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10544979 Subject: [PATCH 04/16] xfs: repair the AGF From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:19:55 -0700 Message-ID: <153256439500.29021.5265670898665301362.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Regenerate the AGF from the rmap data. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/agheader_repair.c | 366 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.c | 27 ++- fs/xfs/scrub/repair.h | 4 fs/xfs/scrub/scrub.c | 2 4 files changed, 389 insertions(+), 10 deletions(-) diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c index 1e96621ece3a..938af216cb1c 100644 --- a/fs/xfs/scrub/agheader_repair.c +++ b/fs/xfs/scrub/agheader_repair.c @@ -17,12 +17,19 @@ #include "xfs_sb.h" #include "xfs_inode.h" #include "xfs_alloc.h" +#include "xfs_alloc_btree.h" #include "xfs_ialloc.h" +#include "xfs_ialloc_btree.h" #include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_refcount_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" +#include 
"scrub/repair.h" +#include "scrub/bitmap.h" /* Superblock */ @@ -54,3 +61,362 @@ xrep_superblock( xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1); return error; } + +/* AGF */ + +struct xrep_agf_allocbt { + struct xfs_scrub *sc; + xfs_agblock_t freeblks; + xfs_agblock_t longest; +}; + +/* Record free space shape information. */ +STATIC int +xrep_agf_walk_allocbt( + struct xfs_btree_cur *cur, + struct xfs_alloc_rec_incore *rec, + void *priv) +{ + struct xrep_agf_allocbt *raa = priv; + int error = 0; + + if (xchk_should_terminate(raa->sc, &error)) + return error; + + raa->freeblks += rec->ar_blockcount; + if (rec->ar_blockcount > raa->longest) + raa->longest = rec->ar_blockcount; + return error; +} + +/* Does this AGFL block look sane? */ +STATIC int +xrep_agf_check_agfl_block( + struct xfs_mount *mp, + xfs_agblock_t agbno, + void *priv) +{ + struct xfs_scrub *sc = priv; + + if (!xfs_verify_agbno(mp, sc->sa.agno, agbno)) + return -EFSCORRUPTED; + return 0; +} + +/* + * Offset within the xrep_find_ag_btree array for each btree type. Avoid the + * XFS_BTNUM_ names here to avoid creating a sparse array. + */ +enum { + XREP_AGF_BNOBT = 0, + XREP_AGF_CNTBT, + XREP_AGF_RMAPBT, + XREP_AGF_REFCOUNTBT, + XREP_AGF_END, + XREP_AGF_MAX +}; + +/* + * Given the btree roots described by *fab, find the roots, check them for + * sanity, and pass the root data back out via *fab. + * + * This is /also/ a chicken and egg problem because we have to use the rmapbt + * (rooted in the AGF) to find the btrees rooted in the AGF. We also have no + * idea if the btrees make any sense. If we hit obvious corruptions in those + * btrees we'll bail out. + */ +STATIC int +xrep_agf_find_btrees( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp, + struct xrep_find_ag_btree *fab, + struct xfs_buf *agfl_bp) +{ + struct xfs_agf *old_agf = XFS_BUF_TO_AGF(agf_bp); + int error; + + /* Go find the root data. 
*/ + error = xrep_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp); + if (error) + return error; + + /* We must find the bnobt, cntbt, and rmapbt roots. */ + if (fab[XREP_AGF_BNOBT].root == NULLAGBLOCK || + fab[XREP_AGF_BNOBT].height > XFS_BTREE_MAXLEVELS || + fab[XREP_AGF_CNTBT].root == NULLAGBLOCK || + fab[XREP_AGF_CNTBT].height > XFS_BTREE_MAXLEVELS || + fab[XREP_AGF_RMAPBT].root == NULLAGBLOCK || + fab[XREP_AGF_RMAPBT].height > XFS_BTREE_MAXLEVELS) + return -EFSCORRUPTED; + + /* + * We relied on the rmapbt to reconstruct the AGF. If we get a + * different root then something's seriously wrong. + */ + if (fab[XREP_AGF_RMAPBT].root != + be32_to_cpu(old_agf->agf_roots[XFS_BTNUM_RMAPi])) + return -EFSCORRUPTED; + + /* We must find the refcountbt root if that feature is enabled. */ + if (xfs_sb_version_hasreflink(&sc->mp->m_sb) && + (fab[XREP_AGF_REFCOUNTBT].root == NULLAGBLOCK || + fab[XREP_AGF_REFCOUNTBT].height > XFS_BTREE_MAXLEVELS)) + return -EFSCORRUPTED; + + return 0; +} + +/* + * Reinitialize the AGF header, making an in-core copy of the old contents so + * that we know which in-core state needs to be reinitialized. + */ +STATIC void +xrep_agf_init_header( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp, + struct xfs_agf *old_agf) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_agf *agf = XFS_BUF_TO_AGF(agf_bp); + + memcpy(old_agf, agf, sizeof(*old_agf)); + memset(agf, 0, BBTOB(agf_bp->b_length)); + agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC); + agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION); + agf->agf_seqno = cpu_to_be32(sc->sa.agno); + agf->agf_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno)); + agf->agf_flfirst = old_agf->agf_flfirst; + agf->agf_fllast = old_agf->agf_fllast; + agf->agf_flcount = old_agf->agf_flcount; + if (xfs_sb_version_hascrc(&mp->m_sb)) + uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid); + + /* Mark the incore AGF data stale until we're done fixing things. 
*/ + ASSERT(sc->sa.pag->pagf_init); + sc->sa.pag->pagf_init = 0; +} + +/* Set btree root information in an AGF. */ +STATIC void +xrep_agf_set_roots( + struct xfs_scrub *sc, + struct xfs_agf *agf, + struct xrep_find_ag_btree *fab) +{ + agf->agf_roots[XFS_BTNUM_BNOi] = + cpu_to_be32(fab[XREP_AGF_BNOBT].root); + agf->agf_levels[XFS_BTNUM_BNOi] = + cpu_to_be32(fab[XREP_AGF_BNOBT].height); + + agf->agf_roots[XFS_BTNUM_CNTi] = + cpu_to_be32(fab[XREP_AGF_CNTBT].root); + agf->agf_levels[XFS_BTNUM_CNTi] = + cpu_to_be32(fab[XREP_AGF_CNTBT].height); + + agf->agf_roots[XFS_BTNUM_RMAPi] = + cpu_to_be32(fab[XREP_AGF_RMAPBT].root); + agf->agf_levels[XFS_BTNUM_RMAPi] = + cpu_to_be32(fab[XREP_AGF_RMAPBT].height); + + if (xfs_sb_version_hasreflink(&sc->mp->m_sb)) { + agf->agf_refcount_root = + cpu_to_be32(fab[XREP_AGF_REFCOUNTBT].root); + agf->agf_refcount_level = + cpu_to_be32(fab[XREP_AGF_REFCOUNTBT].height); + } +} + +/* Update all AGF fields which derive from btree contents. */ +STATIC int +xrep_agf_calc_from_btrees( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp) +{ + struct xrep_agf_allocbt raa = { .sc = sc }; + struct xfs_btree_cur *cur = NULL; + struct xfs_agf *agf = XFS_BUF_TO_AGF(agf_bp); + struct xfs_mount *mp = sc->mp; + xfs_agblock_t btreeblks; + xfs_agblock_t blocks; + int error; + + /* Update the AGF counters from the bnobt. */ + cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno, + XFS_BTNUM_BNO); + error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa); + if (error) + goto err; + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + btreeblks = blocks - 1; + agf->agf_freeblks = cpu_to_be32(raa.freeblks); + agf->agf_longest = cpu_to_be32(raa.longest); + + /* Update the AGF counters from the cntbt. 
*/ + cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno, + XFS_BTNUM_CNT); + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + btreeblks += blocks - 1; + + /* Update the AGF counters from the rmapbt. */ + cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno); + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + agf->agf_rmap_blocks = cpu_to_be32(blocks); + btreeblks += blocks - 1; + + agf->agf_btreeblks = cpu_to_be32(btreeblks); + + /* Update the AGF counters from the refcountbt. */ + if (xfs_sb_version_hasreflink(&mp->m_sb)) { + cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp, + sc->sa.agno, NULL); + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + agf->agf_refcount_blocks = cpu_to_be32(blocks); + } + + return 0; +err: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Commit the new AGF and reinitialize the incore state. */ +STATIC int +xrep_agf_commit_new( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp) +{ + struct xfs_perag *pag; + struct xfs_agf *agf = XFS_BUF_TO_AGF(agf_bp); + + /* Trigger fdblocks recalculation */ + xfs_force_summary_recalc(sc->mp); + + /* Write this to disk. */ + xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF); + xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1); + + /* Now reinitialize the in-core counters we changed. 
*/ + pag = sc->sa.pag; + sc->sa.pag->pagf_init = 1; + pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks); + pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks); + pag->pagf_longest = be32_to_cpu(agf->agf_longest); + pag->pagf_levels[XFS_BTNUM_BNOi] = + be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]); + pag->pagf_levels[XFS_BTNUM_CNTi] = + be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]); + pag->pagf_levels[XFS_BTNUM_RMAPi] = + be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]); + pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); + + return 0; +} + +/* Repair the AGF. v5 filesystems only. */ +int +xrep_agf( + struct xfs_scrub *sc) +{ + struct xrep_find_ag_btree fab[XREP_AGF_MAX] = { + [XREP_AGF_BNOBT] = { + .rmap_owner = XFS_RMAP_OWN_AG, + .buf_ops = &xfs_allocbt_buf_ops, + .magic = XFS_ABTB_CRC_MAGIC, + }, + [XREP_AGF_CNTBT] = { + .rmap_owner = XFS_RMAP_OWN_AG, + .buf_ops = &xfs_allocbt_buf_ops, + .magic = XFS_ABTC_CRC_MAGIC, + }, + [XREP_AGF_RMAPBT] = { + .rmap_owner = XFS_RMAP_OWN_AG, + .buf_ops = &xfs_rmapbt_buf_ops, + .magic = XFS_RMAP_CRC_MAGIC, + }, + [XREP_AGF_REFCOUNTBT] = { + .rmap_owner = XFS_RMAP_OWN_REFC, + .buf_ops = &xfs_refcountbt_buf_ops, + .magic = XFS_REFC_CRC_MAGIC, + }, + [XREP_AGF_END] = { + .buf_ops = NULL, + }, + }; + struct xfs_agf old_agf; + struct xfs_mount *mp = sc->mp; + struct xfs_buf *agf_bp; + struct xfs_buf *agfl_bp; + struct xfs_agf *agf; + int error; + + /* We require the rmapbt to rebuild anything. 
*/ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp, + XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)), + XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL); + if (error) + return error; + agf_bp->b_ops = &xfs_agf_buf_ops; + agf = XFS_BUF_TO_AGF(agf_bp); + + /* + * Load the AGFL so that we can screen out OWN_AG blocks that are on + * the AGFL now; these blocks might have once been part of the + * bno/cnt/rmap btrees but are not now. This is a chicken and egg + * problem: the AGF is corrupt, so we have to trust the AGFL contents + * because we can't do any serious cross-referencing with any of the + * btrees rooted in the AGF. If the AGFL contents are obviously bad + * then we'll bail out. + */ + error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp); + if (error) + return error; + + /* + * Spot-check the AGFL blocks; if they're obviously corrupt then + * there's nothing we can do but bail out. + */ + error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp, + xrep_agf_check_agfl_block, sc); + if (error) + return error; + + /* + * Find the AGF btree roots. This is also a chicken-and-egg situation; + * see the function for more details. + */ + error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp); + if (error) + return error; + + /* Start rewriting the header and implant the btrees we found. */ + xrep_agf_init_header(sc, agf_bp, &old_agf); + xrep_agf_set_roots(sc, agf, fab); + error = xrep_agf_calc_from_btrees(sc, agf_bp); + if (error) + goto out_revert; + + /* Commit the changes and reinitialize incore state. */ + return xrep_agf_commit_new(sc, agf_bp); + +out_revert: + /* Mark the incore AGF state stale and revert the AGF. 
*/ + sc->sa.pag->pagf_init = 0; + memcpy(agf, &old_agf, sizeof(old_agf)); + return error; +} diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 85b048b341a0..17cf48564390 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -128,9 +128,12 @@ xrep_roll_ag_trans( int error; /* Keep the AG header buffers locked so we can keep going. */ - xfs_trans_bhold(sc->tp, sc->sa.agi_bp); - xfs_trans_bhold(sc->tp, sc->sa.agf_bp); - xfs_trans_bhold(sc->tp, sc->sa.agfl_bp); + if (sc->sa.agi_bp) + xfs_trans_bhold(sc->tp, sc->sa.agi_bp); + if (sc->sa.agf_bp) + xfs_trans_bhold(sc->tp, sc->sa.agf_bp); + if (sc->sa.agfl_bp) + xfs_trans_bhold(sc->tp, sc->sa.agfl_bp); /* Roll the transaction. */ error = xfs_trans_roll(&sc->tp); @@ -138,9 +141,12 @@ xrep_roll_ag_trans( goto out_release; /* Join AG headers to the new transaction. */ - xfs_trans_bjoin(sc->tp, sc->sa.agi_bp); - xfs_trans_bjoin(sc->tp, sc->sa.agf_bp); - xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp); + if (sc->sa.agi_bp) + xfs_trans_bjoin(sc->tp, sc->sa.agi_bp); + if (sc->sa.agf_bp) + xfs_trans_bjoin(sc->tp, sc->sa.agf_bp); + if (sc->sa.agfl_bp) + xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp); return 0; @@ -150,9 +156,12 @@ xrep_roll_ag_trans( * buffers will be released during teardown on our way out * of the kernel. 
*/ - xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp); - xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp); - xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp); + if (sc->sa.agi_bp) + xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp); + if (sc->sa.agf_bp) + xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp); + if (sc->sa.agfl_bp) + xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp); return error; } diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 5a4e92221916..1d283360b5ab 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc); int xrep_probe(struct xfs_scrub *sc); int xrep_superblock(struct xfs_scrub *sc); +int xrep_agf(struct xfs_scrub *sc); +int xrep_agfl(struct xfs_scrub *sc); #else @@ -81,6 +83,8 @@ xrep_calc_ag_resblks( #define xrep_probe xrep_notsupported #define xrep_superblock xrep_notsupported +#define xrep_agf xrep_notsupported +#define xrep_agfl xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 6efb926f3cf8..1e8a17c8e2b9 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_fs, .scrub = xchk_agf, - .repair = xrep_notsupported, + .repair = xrep_agf, }, [XFS_SCRUB_TYPE_AGFL]= { /* agfl */ .type = ST_PERAG, From patchwork Thu Jul 26 00:20:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544983 Subject: [PATCH 05/16] xfs: repair the AGFL From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:20:01 -0700 Message-ID: <153256440143.29021.10996727150978290944.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty From: Darrick J. Wong Repair the AGFL from the rmap data. Signed-off-by: Darrick J.
Wong --- fs/xfs/scrub/agheader_repair.c | 272 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/bitmap.c | 92 ++++++++++++++ fs/xfs/scrub/bitmap.h | 4 + fs/xfs/scrub/scrub.c | 2 4 files changed, 369 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c index 938af216cb1c..407fee8d94dd 100644 --- a/fs/xfs/scrub/agheader_repair.c +++ b/fs/xfs/scrub/agheader_repair.c @@ -420,3 +420,275 @@ xrep_agf( memcpy(agf, &old_agf, sizeof(old_agf)); return error; } + +/* AGFL */ + +struct xrep_agfl { + /* Bitmap of other OWN_AG metadata blocks. */ + struct xfs_bitmap agmetablocks; + + /* Bitmap of free space. */ + struct xfs_bitmap *freesp; + + struct xfs_scrub *sc; +}; + +/* Record all OWN_AG (free space btree) information from the rmap data. */ +STATIC int +xrep_agfl_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_agfl *ra = priv; + xfs_fsblock_t fsb; + int error = 0; + + if (xchk_should_terminate(ra->sc, &error)) + return error; + + /* Record all the OWN_AG blocks. */ + if (rec->rm_owner == XFS_RMAP_OWN_AG) { + fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, + rec->rm_startblock); + error = xfs_bitmap_set(ra->freesp, fsb, rec->rm_blockcount); + if (error) + return error; + } + + return xfs_bitmap_set_btcur_path(&ra->agmetablocks, cur); +} + +/* + * Map out all the non-AGFL OWN_AG space in this AG so that we can deduce + * which blocks belong to the AGFL. + * + * Compute the set of old AGFL blocks by subtracting from the list of OWN_AG + * blocks the list of blocks owned by all other OWN_AG metadata (bnobt, cntbt, + * rmapbt). These are the old AGFL blocks, so return that list and the number + * of blocks we're actually going to put back on the AGFL.
+ */ +STATIC int +xrep_agfl_collect_blocks( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp, + struct xfs_bitmap *agfl_extents, + xfs_agblock_t *flcount) +{ + struct xrep_agfl ra; + struct xfs_mount *mp = sc->mp; + struct xfs_btree_cur *cur; + struct xfs_bitmap_range *br; + struct xfs_bitmap_range *n; + int error; + + ra.sc = sc; + ra.freesp = agfl_extents; + xfs_bitmap_init(&ra.agmetablocks); + + /* Find all space used by the free space btrees & rmapbt. */ + cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno); + error = xfs_rmap_query_all(cur, xrep_agfl_walk_rmap, &ra); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + + /* Find all blocks currently being used by the bnobt. */ + cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno, + XFS_BTNUM_BNO); + error = xfs_bitmap_set_btblocks(&ra.agmetablocks, cur); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + + /* Find all blocks currently being used by the cntbt. */ + cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno, + XFS_BTNUM_CNT); + error = xfs_bitmap_set_btblocks(&ra.agmetablocks, cur); + if (error) + goto err; + + xfs_btree_del_cursor(cur, error); + + /* + * Drop the freesp meta blocks that are in use by btrees. + * The remaining blocks /should/ be AGFL blocks. + */ + error = xfs_bitmap_disunion(agfl_extents, &ra.agmetablocks); + xfs_bitmap_destroy(&ra.agmetablocks); + if (error) + return error; + + /* + * Calculate the new AGFL size. If we found more blocks than fit in + * the AGFL we'll free them later. + */ + *flcount = 0; + for_each_xfs_bitmap_extent(br, n, agfl_extents) { + *flcount += br->len; + if (*flcount > xfs_agfl_size(mp)) + break; + } + if (*flcount > xfs_agfl_size(mp)) + *flcount = xfs_agfl_size(mp); + return 0; + +err: + xfs_bitmap_destroy(&ra.agmetablocks); + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Update the AGF and reset the in-core state. 
*/ +STATIC int +xrep_agfl_update_agf( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp, + xfs_agblock_t flcount) +{ + struct xfs_agf *agf = XFS_BUF_TO_AGF(agf_bp); + + ASSERT(flcount <= xfs_agfl_size(sc->mp)); + + /* Trigger fdblocks recalculation */ + xfs_force_summary_recalc(sc->mp); + + /* Update the AGF counters. */ + if (sc->sa.pag->pagf_init) + sc->sa.pag->pagf_flcount = flcount; + agf->agf_flfirst = cpu_to_be32(0); + agf->agf_flcount = cpu_to_be32(flcount); + agf->agf_fllast = cpu_to_be32(flcount - 1); + + xfs_alloc_log_agf(sc->tp, agf_bp, + XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT); + return 0; +} + +/* Write out a totally new AGFL. */ +STATIC void +xrep_agfl_init_header( + struct xfs_scrub *sc, + struct xfs_buf *agfl_bp, + struct xfs_bitmap *agfl_extents, + xfs_agblock_t flcount) +{ + struct xfs_mount *mp = sc->mp; + __be32 *agfl_bno; + struct xfs_bitmap_range *br; + struct xfs_bitmap_range *n; + struct xfs_agfl *agfl; + xfs_agblock_t agbno; + unsigned int fl_off; + + ASSERT(flcount <= xfs_agfl_size(mp)); + + /* Start rewriting the header. */ + agfl = XFS_BUF_TO_AGFL(agfl_bp); + memset(agfl, 0xFF, BBTOB(agfl_bp->b_length)); + agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC); + agfl->agfl_seqno = cpu_to_be32(sc->sa.agno); + uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid); + + /* + * Fill the AGFL with the remaining blocks. If agfl_extents has more + * blocks than fit in the AGFL, they will be freed in a subsequent + * step. + */ + fl_off = 0; + agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp); + for_each_xfs_bitmap_extent(br, n, agfl_extents) { + agbno = XFS_FSB_TO_AGBNO(mp, br->fsbno); + + trace_xrep_agfl_insert(mp, sc->sa.agno, agbno, br->len); + + while (br->len > 0 && fl_off < flcount) { + agfl_bno[fl_off] = cpu_to_be32(agbno); + fl_off++; + agbno++; + br->fsbno++; + br->len--; + } + + if (br->len) + break; + list_del(&br->list); + kmem_free(br); + } + + /* Write new AGFL to disk. 
*/ + xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF); + xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1); +} + +/* Repair the AGFL. */ +int +xrep_agfl( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct xfs_bitmap agfl_extents; + struct xfs_mount *mp = sc->mp; + struct xfs_buf *agf_bp; + struct xfs_buf *agfl_bp; + xfs_agblock_t flcount; + int error; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + xfs_bitmap_init(&agfl_extents); + + /* + * Read the AGF so that we can query the rmapbt. We hope that there's + * nothing wrong with the AGF, but all the AG header repair functions + * have this chicken-and-egg problem. + */ + error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp); + if (error) + return error; + if (!agf_bp) + return -ENOMEM; + + error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp, + XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)), + XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL); + if (error) + return error; + agfl_bp->b_ops = &xfs_agfl_buf_ops; + + /* Gather all the extents we're going to put on the new AGFL. */ + error = xrep_agfl_collect_blocks(sc, agf_bp, &agfl_extents, &flcount); + if (error) + goto err; + + /* + * Update AGF and AGFL. We reset the global free block counter when + * we adjust the AGF flcount (which can fail) so avoid updating any + * buffers until we know that part works. + */ + error = xrep_agfl_update_agf(sc, agf_bp, flcount); + if (error) + goto err; + xrep_agfl_init_header(sc, agfl_bp, &agfl_extents, flcount); + + /* + * Ok, the AGFL should be ready to go now. Roll the transaction to + * make the new AGFL permanent before we start using it to return + * freespace overflow to the freespace btrees. + */ + sc->sa.agf_bp = agf_bp; + sc->sa.agfl_bp = agfl_bp; + error = xrep_roll_ag_trans(sc); + if (error) + goto err; + + /* Dump any AGFL overflow. 
*/ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG); + return xrep_reap_extents(sc, &agfl_extents, &oinfo, XFS_AG_RESV_AGFL); +err: + xfs_bitmap_destroy(&agfl_extents); + return error; +} diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c index 4840f5a1e179..046ee63982c9 100644 --- a/fs/xfs/scrub/bitmap.c +++ b/fs/xfs/scrub/bitmap.c @@ -9,6 +9,7 @@ #include "xfs_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "xfs_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -209,3 +210,94 @@ xfs_bitmap_disunion( } #undef LEFT_ALIGNED #undef RIGHT_ALIGNED + +/* + * Record all btree blocks seen while iterating all records of a btree. + * + * We know that the btree query_all function starts at the left edge and walks + * towards the right edge of the tree. Therefore, we know that we can walk up + * the btree cursor towards the root; if the pointer for a given level points + * to the first record/key in that block, we haven't seen this block before; + * and therefore we need to remember that we saw this block in the btree. + * + * So if our btree is: + * + * 4 + * / | \ + * 1 2 3 + * + * Pretend for this example that each leaf block has 100 btree records. For + * the first btree record, we'll observe that bc_ptrs[0] == 1, so we record + * that we saw block 1. Then we observe that bc_ptrs[1] == 1, so we record + * block 4. The list is [1, 4]. + * + * For the second btree record, we see that bc_ptrs[0] == 2, so we exit the + * loop. The list remains [1, 4]. + * + * For the 101st btree record, we've moved onto leaf block 2. Now + * bc_ptrs[0] == 1 again, so we record that we saw block 2. We see that + * bc_ptrs[1] == 2, so we exit the loop. The list is now [1, 4, 2]. + * + * For the 102nd record, bc_ptrs[0] == 2, so we continue. + * + * For the 201st record, we've moved on to leaf block 3. bc_ptrs[0] == 1, so + * we add 3 to the list. Now it is [1, 4, 2, 3]. 
+ * + * For the 300th record we just exit, with the list being [1, 4, 2, 3]. + */ + +/* + * Record all the buffers pointed to by the btree cursor. Callers already + * engaged in a btree walk should call this function to capture the list of + * blocks going from the leaf towards the root. + */ +int +xfs_bitmap_set_btcur_path( + struct xfs_bitmap *bitmap, + struct xfs_btree_cur *cur) +{ + struct xfs_buf *bp; + xfs_fsblock_t fsb; + int i; + int error; + + for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) { + xfs_btree_get_block(cur, i, &bp); + if (!bp) + continue; + fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn); + error = xfs_bitmap_set(bitmap, fsb, 1); + if (error) + return error; + } + + return 0; +} + +/* Collect a btree's block in the bitmap. */ +STATIC int +xfs_bitmap_collect_btblock( + struct xfs_btree_cur *cur, + int level, + void *priv) +{ + struct xfs_bitmap *bitmap = priv; + struct xfs_buf *bp; + xfs_fsblock_t fsbno; + + xfs_btree_get_block(cur, level, &bp); + if (!bp) + return 0; + + fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn); + return xfs_bitmap_set(bitmap, fsbno, 1); +} + +/* Walk the btree and mark the bitmap wherever a btree block is found. 
*/ +int +xfs_bitmap_set_btblocks( + struct xfs_bitmap *bitmap, + struct xfs_btree_cur *cur) +{ + return xfs_btree_visit_blocks(cur, xfs_bitmap_collect_btblock, bitmap); +} diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h index 3c39900e9269..b5df433b6e9c 100644 --- a/fs/xfs/scrub/bitmap.h +++ b/fs/xfs/scrub/bitmap.h @@ -29,5 +29,9 @@ void xfs_bitmap_destroy(struct xfs_bitmap *bitmap); int xfs_bitmap_set(struct xfs_bitmap *bitmap, xfs_fsblock_t fsbno, xfs_fsblock_t len); int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub); +int xfs_bitmap_set_btcur_path(struct xfs_bitmap *bitmap, + struct xfs_btree_cur *cur); +int xfs_bitmap_set_btblocks(struct xfs_bitmap *bitmap, + struct xfs_btree_cur *cur); #endif /* __XFS_SCRUB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 1e8a17c8e2b9..2670f4cf62f4 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -220,7 +220,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_fs, .scrub = xchk_agfl, - .repair = xrep_notsupported, + .repair = xrep_agfl, }, [XFS_SCRUB_TYPE_AGI] = { /* agi */ .type = ST_PERAG, From patchwork Thu Jul 26 00:20:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544981 Subject: [PATCH 06/16] xfs: repair the AGI From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:20:12 -0700 Message-ID: <153256441240.29021.18420025834984114788.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty From: Darrick J. Wong Rebuild the AGI header items with some help from the rmapbt.
Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/agheader_repair.c | 216 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 3 files changed, 219 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c index 407fee8d94dd..8d525fa28f17 100644 --- a/fs/xfs/scrub/agheader_repair.c +++ b/fs/xfs/scrub/agheader_repair.c @@ -692,3 +692,219 @@ xrep_agfl( xfs_bitmap_destroy(&agfl_extents); return error; } + +/* AGI */ + +/* + * Offset within the xrep_find_ag_btree array for each btree type. Avoid the + * XFS_BTNUM_ names here to avoid creating a sparse array. + */ +enum { + XREP_AGI_INOBT = 0, + XREP_AGI_FINOBT, + XREP_AGI_END, + XREP_AGI_MAX +}; + +/* + * Given the inode btree roots described by *fab, find the roots, check them + * for sanity, and pass the root data back out via *fab. + */ +STATIC int +xrep_agi_find_btrees( + struct xfs_scrub *sc, + struct xrep_find_ag_btree *fab) +{ + struct xfs_buf *agf_bp; + struct xfs_mount *mp = sc->mp; + int error; + + /* Read the AGF. */ + error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp); + if (error) + return error; + if (!agf_bp) + return -ENOMEM; + + /* Find the btree roots. */ + error = xrep_find_ag_btree_roots(sc, agf_bp, fab, NULL); + if (error) + return error; + + /* We must find the inobt root. */ + if (fab[XREP_AGI_INOBT].root == NULLAGBLOCK || + fab[XREP_AGI_INOBT].height > XFS_BTREE_MAXLEVELS) + return -EFSCORRUPTED; + + /* We must find the finobt root if that feature is enabled.
*/ + if (xfs_sb_version_hasfinobt(&mp->m_sb) && + (fab[XREP_AGI_FINOBT].root == NULLAGBLOCK || + fab[XREP_AGI_FINOBT].height > XFS_BTREE_MAXLEVELS)) + return -EFSCORRUPTED; + + return 0; +} + +/* + * Reinitialize the AGI header, making an in-core copy of the old contents so + * that we know which in-core state needs to be reinitialized. + */ +STATIC void +xrep_agi_init_header( + struct xfs_scrub *sc, + struct xfs_buf *agi_bp, + struct xfs_agi *old_agi) +{ + struct xfs_agi *agi = XFS_BUF_TO_AGI(agi_bp); + struct xfs_mount *mp = sc->mp; + + memcpy(old_agi, agi, sizeof(*old_agi)); + memset(agi, 0, BBTOB(agi_bp->b_length)); + agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC); + agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION); + agi->agi_seqno = cpu_to_be32(sc->sa.agno); + agi->agi_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno)); + agi->agi_newino = cpu_to_be32(NULLAGINO); + agi->agi_dirino = cpu_to_be32(NULLAGINO); + if (xfs_sb_version_hascrc(&mp->m_sb)) + uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid); + + /* We don't know how to fix the unlinked list yet. */ + memcpy(&agi->agi_unlinked, &old_agi->agi_unlinked, + sizeof(agi->agi_unlinked)); + + /* Mark the incore AGI data stale until we're done fixing things. */ + ASSERT(sc->sa.pag->pagi_init); + sc->sa.pag->pagi_init = 0; +} + +/* Set btree root information in an AGI. */ +STATIC void +xrep_agi_set_roots( + struct xfs_scrub *sc, + struct xfs_agi *agi, + struct xrep_find_ag_btree *fab) +{ + agi->agi_root = cpu_to_be32(fab[XREP_AGI_INOBT].root); + agi->agi_level = cpu_to_be32(fab[XREP_AGI_INOBT].height); + + if (xfs_sb_version_hasfinobt(&sc->mp->m_sb)) { + agi->agi_free_root = cpu_to_be32(fab[XREP_AGI_FINOBT].root); + agi->agi_free_level = cpu_to_be32(fab[XREP_AGI_FINOBT].height); + } +} + +/* Update the AGI counters.
*/ +STATIC int +xrep_agi_calc_from_btrees( + struct xfs_scrub *sc, + struct xfs_buf *agi_bp) +{ + struct xfs_btree_cur *cur; + struct xfs_agi *agi = XFS_BUF_TO_AGI(agi_bp); + struct xfs_mount *mp = sc->mp; + xfs_agino_t count; + xfs_agino_t freecount; + int error; + + cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp, sc->sa.agno, + XFS_BTNUM_INO); + error = xfs_ialloc_count_inodes(cur, &count, &freecount); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + + agi->agi_count = cpu_to_be32(count); + agi->agi_freecount = cpu_to_be32(freecount); + return 0; +err: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Trigger reinitialization of the in-core data. */ +STATIC int +xrep_agi_commit_new( + struct xfs_scrub *sc, + struct xfs_buf *agi_bp, + const struct xfs_agi *old_agi) +{ + struct xfs_perag *pag; + struct xfs_agi *agi = XFS_BUF_TO_AGI(agi_bp); + + /* Trigger inode count recalculation */ + xfs_force_summary_recalc(sc->mp); + + /* Write this to disk. */ + xfs_trans_buf_set_type(sc->tp, agi_bp, XFS_BLFT_AGI_BUF); + xfs_trans_log_buf(sc->tp, agi_bp, 0, BBTOB(agi_bp->b_length) - 1); + + /* Now reinitialize the in-core counters if necessary. */ + pag = sc->sa.pag; + sc->sa.pag->pagi_init = 1; + pag->pagi_count = be32_to_cpu(agi->agi_count); + pag->pagi_freecount = be32_to_cpu(agi->agi_freecount); + + return 0; +} + +/* Repair the AGI. */ +int +xrep_agi( + struct xfs_scrub *sc) +{ + struct xrep_find_ag_btree fab[XREP_AGI_MAX] = { + [XREP_AGI_INOBT] = { + .rmap_owner = XFS_RMAP_OWN_INOBT, + .buf_ops = &xfs_inobt_buf_ops, + .magic = XFS_IBT_CRC_MAGIC, + }, + [XREP_AGI_FINOBT] = { + .rmap_owner = XFS_RMAP_OWN_INOBT, + .buf_ops = &xfs_inobt_buf_ops, + .magic = XFS_FIBT_CRC_MAGIC, + }, + [XREP_AGI_END] = { + .buf_ops = NULL + }, + }; + struct xfs_agi old_agi; + struct xfs_mount *mp = sc->mp; + struct xfs_buf *agi_bp; + struct xfs_agi *agi; + int error; + + /* We require the rmapbt to rebuild anything. 
*/ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp, + XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGI_DADDR(mp)), + XFS_FSS_TO_BB(mp, 1), 0, &agi_bp, NULL); + if (error) + return error; + agi_bp->b_ops = &xfs_agi_buf_ops; + agi = XFS_BUF_TO_AGI(agi_bp); + + /* Find the AGI btree roots. */ + error = xrep_agi_find_btrees(sc, fab); + if (error) + return error; + + /* Start rewriting the header and implant the btrees we found. */ + xrep_agi_init_header(sc, agi_bp, &old_agi); + xrep_agi_set_roots(sc, agi, fab); + error = xrep_agi_calc_from_btrees(sc, agi_bp); + if (error) + goto out_revert; + + /* Reinitialize in-core state. */ + return xrep_agi_commit_new(sc, agi_bp, &old_agi); + +out_revert: + /* Mark the incore AGI state stale and revert the AGI. */ + sc->sa.pag->pagi_init = 0; + memcpy(agi, &old_agi, sizeof(old_agi)); + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 1d283360b5ab..9de321eee4ab 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -60,6 +60,7 @@ int xrep_probe(struct xfs_scrub *sc); int xrep_superblock(struct xfs_scrub *sc); int xrep_agf(struct xfs_scrub *sc); int xrep_agfl(struct xfs_scrub *sc); +int xrep_agi(struct xfs_scrub *sc); #else @@ -85,6 +86,7 @@ xrep_calc_ag_resblks( #define xrep_superblock xrep_notsupported #define xrep_agf xrep_notsupported #define xrep_agfl xrep_notsupported +#define xrep_agi xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 2670f4cf62f4..4bfae1e61d30 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -226,7 +226,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_fs, .scrub = xchk_agi, - .repair = xrep_notsupported, + .repair = xrep_agi, }, [XFS_SCRUB_TYPE_BNOBT] = { /* bnobt */ .type = ST_PERAG, From patchwork Thu Jul 26 00:20:18 2018 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10544985 Subject: [PATCH 07/16] xfs: repair free space btrees From: "Darrick J.
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:20:18 -0700 Message-ID: <153256441877.29021.16391351691957669894.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J.
Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/alloc.c | 1 fs/xfs/scrub/alloc_repair.c | 581 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/common.c | 8 + fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 4 fs/xfs/scrub/trace.h | 2 fs/xfs/xfs_extent_busy.c | 14 + fs/xfs/xfs_extent_busy.h | 2 9 files changed, 610 insertions(+), 5 deletions(-) create mode 100644 fs/xfs/scrub/alloc_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 57ec46951ede..44ddd112acd2 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y) xfs-y += $(addprefix scrub/, \ agheader_repair.o \ + alloc_repair.o \ bitmap.o \ repair.o \ ) diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c index 036b5c7021eb..c9b34ba312ab 100644 --- a/fs/xfs/scrub/alloc.c +++ b/fs/xfs/scrub/alloc.c @@ -15,7 +15,6 @@ #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_sb.h" -#include "xfs_alloc.h" #include "xfs_rmap.h" #include "xfs_alloc.h" #include "scrub/xfs_scrub.h" diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c new file mode 100644 index 000000000000..b228c2906de2 --- /dev/null +++ b/fs/xfs/scrub/alloc_repair.c @@ -0,0 +1,581 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J.
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_alloc_btree.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_inode.h" +#include "xfs_refcount.h" +#include "xfs_extent_busy.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Free Space Btree Repair + * ======================= + * + * The reverse mappings are supposed to record all space usage for the entire + * AG. Therefore, we can recalculate the free extents in an AG by looking for + * gaps in the physical extents recorded in the rmapbt. On a reflink + * filesystem this is a little more tricky in that we have to be aware that + * the rmap records are allowed to overlap. + * + * We derive which blocks belonged to the old bnobt/cntbt by recording all the + * OWN_AG extents and subtracting out the blocks owned by all other OWN_AG + * metadata: the rmapbt blocks visited while iterating the reverse mappings + * and the AGFL blocks. + * + * Once we have both of those pieces, we can reconstruct the bnobt and cntbt + * by blowing out the free block state and freeing all the extents that we + * found. This adds the requirement that we can't have any busy extents in + * the AG because the busy code cannot handle duplicate records. + * + * Note that we can only rebuild both free space btrees at the same time + * because the regular extent freeing infrastructure loads both btrees at the + * same time. + * + * We use the prefix 'xrep_abt' here because we regenerate both free space + * allocation btrees at the same time. 
+ */ + +struct xrep_abt_extent { + struct list_head list; + xfs_agblock_t bno; + xfs_extlen_t len; +}; + +struct xrep_abt { + /* Blocks owned by the rmapbt or the agfl. */ + struct xfs_bitmap nobtlist; + + /* All OWN_AG blocks. */ + struct xfs_bitmap *btlist; + + /* Free space extents. */ + struct list_head *extlist; + + struct xfs_scrub *sc; + + /* Length of extlist. */ + uint64_t nr_records; + + /* + * Next block we anticipate seeing in the rmap records. If the next + * rmap record is greater than next_bno, we have found unused space. + */ + xfs_agblock_t next_bno; + + /* Number of free blocks in this AG. */ + xfs_agblock_t nr_blocks; +}; + +/* Record extents that aren't in use from gaps in the rmap records. */ +STATIC int +xrep_abt_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_abt *ra = priv; + struct xrep_abt_extent *rae; + xfs_fsblock_t fsb; + int error; + + /* Record all the OWN_AG blocks... */ + if (rec->rm_owner == XFS_RMAP_OWN_AG) { + fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, + rec->rm_startblock); + error = xfs_bitmap_set(ra->btlist, fsb, rec->rm_blockcount); + if (error) + return error; + } + + /* ...and all the rmapbt blocks... */ + error = xfs_bitmap_set_btcur_path(&ra->nobtlist, cur); + if (error) + return error; + + /* ...and all the free space. 
*/ + if (rec->rm_startblock > ra->next_bno) { + trace_xrep_abt_walk_rmap(cur->bc_mp, cur->bc_private.a.agno, + ra->next_bno, rec->rm_startblock - ra->next_bno, + XFS_RMAP_OWN_NULL, 0, 0); + + rae = kmem_alloc(sizeof(struct xrep_abt_extent), KM_MAYFAIL); + if (!rae) + return -ENOMEM; + INIT_LIST_HEAD(&rae->list); + rae->bno = ra->next_bno; + rae->len = rec->rm_startblock - ra->next_bno; + list_add_tail(&rae->list, ra->extlist); + ra->nr_records++; + ra->nr_blocks += rae->len; + } + ra->next_bno = max_t(xfs_agblock_t, ra->next_bno, + rec->rm_startblock + rec->rm_blockcount); + return 0; +} + +/* Collect an AGFL block for the not-to-release list. */ +static int +xrep_abt_walk_agfl( + struct xfs_mount *mp, + xfs_agblock_t bno, + void *priv) +{ + struct xrep_abt *ra = priv; + xfs_fsblock_t fsb; + + fsb = XFS_AGB_TO_FSB(mp, ra->sc->sa.agno, bno); + return xfs_bitmap_set(&ra->nobtlist, fsb, 1); +} + +/* Compare two free space extents. */ +static int +xrep_abt_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_abt_extent *ap; + struct xrep_abt_extent *bp; + + ap = container_of(a, struct xrep_abt_extent, list); + bp = container_of(b, struct xrep_abt_extent, list); + + if (ap->bno > bp->bno) + return 1; + else if (ap->bno < bp->bno) + return -1; + return 0; +} + +/* Free an extent, which creates a record in the bnobt/cntbt. */ +STATIC int +xrep_abt_free_extent( + struct xfs_scrub *sc, + xfs_fsblock_t fsbno, + xfs_extlen_t len, + struct xfs_owner_info *oinfo) +{ + int error; + + error = xfs_free_extent(sc->tp, fsbno, len, oinfo, 0); + if (error) + return error; + error = xrep_roll_ag_trans(sc); + if (error) + return error; + return xfs_mod_fdblocks(sc->mp, -(int64_t)len, false); +} + +/* Find the longest free extent in the list. 
*/ +static struct xrep_abt_extent * +xrep_abt_get_longest( + struct list_head *free_extents) +{ + struct xrep_abt_extent *rae; + struct xrep_abt_extent *res = NULL; + + list_for_each_entry(rae, free_extents, list) { + if (!res || rae->len > res->len) + res = rae; + } + return res; +} + +/* + * Allocate a block from the (cached) first extent in the AG. In theory + * this should never fail, since we already checked that there was enough + * space to handle the new btrees. + */ +STATIC xfs_fsblock_t +xrep_abt_alloc_block( + struct xfs_scrub *sc, + struct list_head *free_extents) +{ + struct xrep_abt_extent *ext; + + /* Pull the first free space extent off the list, and... */ + ext = list_first_entry(free_extents, struct xrep_abt_extent, list); + + /* ...take its first block. */ + ext->bno++; + ext->len--; + if (ext->len == 0) { + list_del(&ext->list); + kmem_free(ext); + } + + return XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, ext->bno - 1); +} + +/* Free every record in the extent list. */ +STATIC void +xrep_abt_cancel_freelist( + struct list_head *extlist) +{ + struct xrep_abt_extent *rae; + struct xrep_abt_extent *n; + + list_for_each_entry_safe(rae, n, extlist, list) { + list_del(&rae->list); + kmem_free(rae); + } +} + +/* + * Iterate all reverse mappings to find (1) the free extents, (2) the OWN_AG + * extents, (3) the rmapbt blocks, and (4) the AGFL blocks. The free space is + * (1) + (2) - (3) - (4). Figure out if we have enough free space to + * reconstruct the free space btrees. Caller must clean up the input lists + * if something goes wrong. 
+ */ +STATIC int +xrep_abt_find_freespace( + struct xfs_scrub *sc, + struct list_head *free_extents, + struct xfs_bitmap *old_allocbt_blocks) +{ + struct xrep_abt ra; + struct xrep_abt_extent *rae; + struct xfs_btree_cur *cur; + struct xfs_mount *mp = sc->mp; + xfs_agblock_t agend; + xfs_agblock_t nr_blocks; + int error; + + ra.extlist = free_extents; + ra.btlist = old_allocbt_blocks; + xfs_bitmap_init(&ra.nobtlist); + ra.next_bno = 0; + ra.nr_records = 0; + ra.nr_blocks = 0; + ra.sc = sc; + + /* + * Iterate all the reverse mappings to find gaps in the physical + * mappings, all the OWN_AG blocks, and all the rmapbt extents. + */ + cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno); + error = xfs_rmap_query_all(cur, xrep_abt_walk_rmap, &ra); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + cur = NULL; + + /* Insert a record for space between the last rmap and EOAG. */ + agend = be32_to_cpu(XFS_BUF_TO_AGF(sc->sa.agf_bp)->agf_length); + if (ra.next_bno < agend) { + rae = kmem_alloc(sizeof(struct xrep_abt_extent), KM_MAYFAIL); + if (!rae) { + error = -ENOMEM; + goto err; + } + INIT_LIST_HEAD(&rae->list); + rae->bno = ra.next_bno; + rae->len = agend - ra.next_bno; + list_add_tail(&rae->list, free_extents); + ra.nr_records++; + ra.nr_blocks += rae->len; + } + + /* Collect all the AGFL blocks. */ + error = xfs_agfl_walk(mp, XFS_BUF_TO_AGF(sc->sa.agf_bp), + sc->sa.agfl_bp, xrep_abt_walk_agfl, &ra); + if (error) + goto err; + + /* Do we have enough space to rebuild both freespace btrees? */ + nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records); + if (!xrep_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE) || + ra.nr_blocks < nr_blocks) { + error = -ENOSPC; + goto err; + } + + /* Compute the old bnobt/cntbt blocks. 
*/ + error = xfs_bitmap_disunion(old_allocbt_blocks, &ra.nobtlist); +err: + xfs_bitmap_destroy(&ra.nobtlist); + if (cur) + xfs_btree_del_cursor(cur, error); + return error; +} + +/* + * Reset the global free block counter and the per-AG counters to make it look + * like this AG has no free space. + */ +STATIC int +xrep_abt_reset_counters( + struct xfs_scrub *sc, + int *log_flags) +{ + struct xfs_perag *pag = sc->sa.pag; + struct xfs_agf *agf; + xfs_agblock_t new_btblks; + xfs_agblock_t to_free; + int error; + + /* + * Since we're abandoning the old bnobt/cntbt, we have to decrease + * fdblocks by the # of blocks in those trees. btreeblks counts the + * non-root blocks of the free space and rmap btrees. Do this before + * resetting the AGF counters. + */ + agf = XFS_BUF_TO_AGF(sc->sa.agf_bp); + + /* rmap_blocks accounts root block, btreeblks doesn't */ + new_btblks = be32_to_cpu(agf->agf_rmap_blocks) - 1; + + /* btreeblks doesn't account bno/cnt root blocks */ + to_free = pag->pagf_btreeblks + 2; + + /* and don't account for the blocks we aren't freeing */ + to_free -= new_btblks; + + error = xfs_mod_fdblocks(sc->mp, -(int64_t)to_free, false); + if (error) + return error; + + /* + * Reset the per-AG info, both incore and ondisk. Mark the incore + * state stale in case we fail out of here. + */ + ASSERT(pag->pagf_init); + pag->pagf_init = 0; + pag->pagf_btreeblks = new_btblks; + pag->pagf_freeblks = 0; + pag->pagf_longest = 0; + + agf->agf_btreeblks = cpu_to_be32(new_btblks); + agf->agf_freeblks = 0; + agf->agf_longest = 0; + *log_flags |= XFS_AGF_BTREEBLKS | XFS_AGF_LONGEST | XFS_AGF_FREEBLKS; + + return 0; +} + +/* Initialize a new free space btree root and implant into AGF. 
*/ +STATIC int +xrep_abt_reset_btree( + struct xfs_scrub *sc, + xfs_btnum_t btnum, + struct list_head *free_extents) +{ + struct xfs_owner_info oinfo; + struct xfs_buf *bp; + struct xfs_perag *pag = sc->sa.pag; + struct xfs_mount *mp = sc->mp; + struct xfs_agf *agf = XFS_BUF_TO_AGF(sc->sa.agf_bp); + xfs_fsblock_t fsbno; + int error; + + /* Allocate new root block. */ + fsbno = xrep_abt_alloc_block(sc, free_extents); + if (fsbno == NULLFSBLOCK) + return -ENOSPC; + + /* Initialize new tree root. */ + error = xrep_init_btblock(sc, fsbno, &bp, btnum, &xfs_allocbt_buf_ops); + if (error) + return error; + + /* Implant into AGF. */ + agf->agf_roots[btnum] = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno)); + agf->agf_levels[btnum] = cpu_to_be32(1); + + /* Add rmap records for the btree roots */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG); + error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno, + XFS_FSB_TO_AGBNO(mp, fsbno), 1, &oinfo); + if (error) + return error; + + /* Reset the incore state. */ + pag->pagf_levels[btnum] = 1; + + return 0; +} + +/* Initialize new bnobt/cntbt roots and implant them into the AGF. */ +STATIC int +xrep_abt_reset_btrees( + struct xfs_scrub *sc, + struct list_head *free_extents, + int *log_flags) +{ + int error; + + error = xrep_abt_reset_btree(sc, XFS_BTNUM_BNOi, free_extents); + if (error) + return error; + error = xrep_abt_reset_btree(sc, XFS_BTNUM_CNTi, free_extents); + if (error) + return error; + + *log_flags |= XFS_AGF_ROOTS | XFS_AGF_LEVELS; + return 0; +} + +/* + * Make our new freespace btree roots permanent so that we can start freeing + * unused space back into the AG. + */ +STATIC int +xrep_abt_commit_new( + struct xfs_scrub *sc, + struct xfs_bitmap *old_allocbt_blocks, + int log_flags) +{ + int error; + + xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, log_flags); + + /* Invalidate the old freespace btree blocks and commit. 
*/ + error = xrep_invalidate_blocks(sc, old_allocbt_blocks); + if (error) + return error; + error = xrep_roll_ag_trans(sc); + if (error) + return error; + + /* Now that we've succeeded, mark the incore state valid again. */ + sc->sa.pag->pagf_init = 1; + return 0; +} + +/* Build new free space btrees and dispose of the old one. */ +STATIC int +xrep_abt_rebuild_trees( + struct xfs_scrub *sc, + struct list_head *free_extents, + struct xfs_bitmap *old_allocbt_blocks) +{ + struct xfs_owner_info oinfo; + struct xrep_abt_extent *rae; + struct xrep_abt_extent *n; + struct xrep_abt_extent *longest; + int error; + + xfs_rmap_skip_owner_update(&oinfo); + + /* + * Insert the longest free extent in case it's necessary to + * refresh the AGFL with multiple blocks. If there is no longest + * extent, we had exactly the free space we needed; we're done. + */ + longest = xrep_abt_get_longest(free_extents); + if (!longest) + goto done; + error = xrep_abt_free_extent(sc, + XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, longest->bno), + longest->len, &oinfo); + list_del(&longest->list); + kmem_free(longest); + if (error) + return error; + + /* Insert records into the new btrees. */ + list_for_each_entry_safe(rae, n, free_extents, list) { + error = xrep_abt_free_extent(sc, + XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno), + rae->len, &oinfo); + if (error) + return error; + list_del(&rae->list); + kmem_free(rae); + } + +done: + /* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG); + return xrep_reap_extents(sc, old_allocbt_blocks, &oinfo, + XFS_AG_RESV_NONE); +} + +/* Repair the freespace btrees for some AG. */ +int +xrep_allocbt( + struct xfs_scrub *sc) +{ + struct list_head free_extents; + struct xfs_bitmap old_allocbt_blocks; + struct xfs_mount *mp = sc->mp; + int log_flags = 0; + int error; + + /* We require the rmapbt to rebuild anything. 
*/ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + + /* + * Make sure the busy extent list is clear because we can't put + * extents on there twice. + */ + if (!xfs_extent_busy_list_empty(sc->sa.pag)) + return -EDEADLOCK; + + /* Collect the free space data and find the old btree blocks. */ + INIT_LIST_HEAD(&free_extents); + xfs_bitmap_init(&old_allocbt_blocks); + error = xrep_abt_find_freespace(sc, &free_extents, &old_allocbt_blocks); + if (error) + goto out; + + /* Make sure we got some free space. */ + if (list_empty(&free_extents)) { + error = -ENOSPC; + goto out; + } + + /* + * Sort the free extents by block number to avoid bnobt splits when we + * rebuild the free space btrees. + */ + list_sort(NULL, &free_extents, xrep_abt_extent_cmp); + + /* + * Blow out the old free space btrees. This is the point at which + * we are no longer able to bail out gracefully. + */ + error = xrep_abt_reset_counters(sc, &log_flags); + if (error) + goto out; + error = xrep_abt_reset_btrees(sc, &free_extents, &log_flags); + if (error) + goto out; + error = xrep_abt_commit_new(sc, &old_allocbt_blocks, log_flags); + if (error) + goto out; + + /* Now rebuild the freespace information. */ + error = xrep_abt_rebuild_trees(sc, &free_extents, &old_allocbt_blocks); +out: + xrep_abt_cancel_freelist(&free_extents); + xfs_bitmap_destroy(&old_allocbt_blocks); + return error; +} diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 6512d8fb67e1..5b910aecc59a 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -623,8 +623,14 @@ xchk_setup_ag_btree( * expensive operation should be performed infrequently and only * as a last resort. Any caller that sets force_log should * document why they need to do so. + * + * Force everything in memory out to disk if we're repairing. + * This ensures we won't get tripped up by btree blocks sitting + * in memory waiting to have LSNs stamped in. 
The AGF/AGI repair + * routines use any available rmap data to try to find a btree + * root that also passes the read verifiers. */ - if (force_log) { + if (force_log || (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) { error = xchk_checkpoint_log(mp); if (error) return error; diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 9de321eee4ab..bc1a5f1cbcdc 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -61,6 +61,7 @@ int xrep_superblock(struct xfs_scrub *sc); int xrep_agf(struct xfs_scrub *sc); int xrep_agfl(struct xfs_scrub *sc); int xrep_agi(struct xfs_scrub *sc); +int xrep_allocbt(struct xfs_scrub *sc); #else @@ -87,6 +88,7 @@ xrep_calc_ag_resblks( #define xrep_agf xrep_notsupported #define xrep_agfl xrep_notsupported #define xrep_agi xrep_notsupported +#define xrep_allocbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 4bfae1e61d30..2133a3199372 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -232,13 +232,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_ag_allocbt, .scrub = xchk_bnobt, - .repair = xrep_notsupported, + .repair = xrep_allocbt, }, [XFS_SCRUB_TYPE_CNTBT] = { /* cntbt */ .type = ST_PERAG, .setup = xchk_setup_ag_allocbt, .scrub = xchk_cntbt, - .repair = xrep_notsupported, + .repair = xrep_allocbt, }, [XFS_SCRUB_TYPE_INOBT] = { /* inobt */ .type = ST_PERAG, diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 4e20f0e48232..26bd5dc68efe 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -551,7 +551,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \ xfs_agblock_t agbno, xfs_extlen_t len, \ uint64_t owner, uint64_t offset, unsigned int flags), \ TP_ARGS(mp, agno, agbno, len, owner, offset, flags)) -DEFINE_REPAIR_RMAP_EVENT(xrep_alloc_extent_fn); +DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_ialloc_extent_fn); 
DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn); DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn); diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index 0ed68379e551..82f99633a597 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -657,3 +657,17 @@ xfs_extent_busy_ag_cmp( diff = b1->bno - b2->bno; return diff; } + +/* Are there any busy extents in this AG? */ +bool +xfs_extent_busy_list_empty( + struct xfs_perag *pag) +{ + spin_lock(&pag->pagb_lock); + if (pag->pagb_tree.rb_node) { + spin_unlock(&pag->pagb_lock); + return false; + } + spin_unlock(&pag->pagb_lock); + return true; +} diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h index 990ab3891971..2f8c73c712c6 100644 --- a/fs/xfs/xfs_extent_busy.h +++ b/fs/xfs/xfs_extent_busy.h @@ -65,4 +65,6 @@ static inline void xfs_extent_busy_sort(struct list_head *list) list_sort(NULL, list, xfs_extent_busy_ag_cmp); } +bool xfs_extent_busy_list_empty(struct xfs_perag *pag); + #endif /* __XFS_EXTENT_BUSY_H__ */ From patchwork Thu Jul 26 00:21:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544987 Subject: [PATCH 08/16] xfs: repair inode btrees From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:21:00 -0700 Message-ID: <153256446066.29021.15943259538695674884.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J.
Wong Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/common.c | 2 fs/xfs/scrub/ialloc_repair.c | 673 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.c | 20 + fs/xfs/scrub/repair.h | 11 + fs/xfs/scrub/scrub.c | 4 fs/xfs/scrub/scrub.h | 1 fs/xfs/scrub/trace.h | 4 8 files changed, 712 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/scrub/ialloc_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 44ddd112acd2..af1dc9aeb1a7 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -166,6 +166,7 @@ xfs-y += $(addprefix scrub/, \ agheader_repair.o \ alloc_repair.o \ bitmap.o \ + ialloc_repair.o \ repair.o \ ) endif diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 5b910aecc59a..d03c4df38ac8 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -516,6 +516,8 @@ xchk_ag_free( struct xchk_ag *sa) { xchk_ag_btcur_free(sa); + if (sa->pag != NULL && sc->reset_perag_resv) + xrep_reset_perag_resv(sc); if (sa->agfl_bp) { xfs_trans_brelse(sc->tp, sa->agfl_bp); sa->agfl_bp = NULL; diff --git a/fs/xfs/scrub/ialloc_repair.c b/fs/xfs/scrub/ialloc_repair.c new file mode 100644 index 000000000000..126135c1a147 --- /dev/null +++ b/fs/xfs/scrub/ialloc_repair.c @@ -0,0 +1,673 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J.
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_ialloc_btree.h" +#include "xfs_icache.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_log.h" +#include "xfs_trans_priv.h" +#include "xfs_error.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Inode Btree Repair + * ================== + * + * A quick refresher of inode btrees on a v5 filesystem: + * + * - Each inode btree record can describe a single 'inode chunk'. The chunk + * size is defined to be 64 inodes. If sparse inodes are enabled, every + * inobt record must be aligned to the chunk size. A chunk can be smaller + * than a fs block. One must be careful with 64k-block filesystems whose + * inodes are smaller than 1k. + * + * - Inode buffers are read into memory in units of 'inode clusters'. However + * many inodes fit in a cluster buffer is the smallest number of inodes that + * can be allocated or freed. Clusters are never larger than a chunk and + * never smaller than a fs block. If sparse inodes are not enabled, then + * records can be aligned to a cluster. + * + * - If sparse inodes are enabled, the holemask field will be active. Each + * bit of the holemask represents 4 potential inodes; if set, the + * corresponding space does *not* contain inodes and must be left alone. + * + * So what's the rebuild algorithm? + * + * Iterate the reverse mapping records looking for OWN_INODES and OWN_INOBT + * records. 
The OWN_INOBT records are the old inode btree blocks and will be + * cleared out after we've rebuilt the tree. Each possible inode chunk within + * an OWN_INODES record will be read in and the freemask calculated from the + * i_mode data in the inode chunk. For sparse inodes the holemask will be + * calculated by creating the properly aligned inobt record and punching out + * any chunk that's missing. Inode allocations and frees grab the AGI first, + * so repair protects itself from concurrent access by locking the AGI. + * + * Once we've reconstructed all the inode records, we can create new inode + * btree roots and reload the btrees. We rebuild both inode trees at the same + * time because they have the same rmap owner and it would be more complex to + * figure out if the other tree isn't in need of a rebuild and which OWN_INOBT + * blocks it owns. We have all the data we need to build both, so dump + * everything and start over. + * + * We use the prefix 'xrep_ibt' because we rebuild both inode btrees. + */ + +struct xrep_ibt_extent { + struct list_head list; + xfs_inofree_t freemask; + xfs_agino_t startino; + unsigned int count; + unsigned int usedcount; + uint16_t holemask; +}; + +struct xrep_ibt { + /* Reconstructed inode records. */ + struct list_head *extlist; + + /* Old inode btree blocks we found in the rmap. */ + struct xfs_bitmap *btlist; + + struct xfs_scrub *sc; + + /* Number of inode btree block records. */ + uint64_t nr_records; +}; + +/* + * Is this inode in use? If the inode is in memory we can tell from i_mode, + * otherwise we have to check di_mode in the on-disk buffer. We only care + * that the high (i.e. non-permission) bits of _mode are zero. This should be + * safe because repair keeps all AG headers locked until the end, and any process + * trying to perform an inode allocation/free must lock the AGI.
+ */ +STATIC int +xrep_ibt_check_free( + struct xfs_scrub *sc, + struct xfs_buf *bp, + xfs_ino_t fsino, + xfs_agino_t bpino, + bool *inuse) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_dinode *dip; + int error; + + /* Will the in-core inode tell us if it's in use? */ + error = xfs_icache_inode_is_allocated(mp, sc->tp, fsino, inuse); + if (!error) + return 0; + + /* Inode uncached or half assembled, read disk buffer */ + dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize); + if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC) + return -EFSCORRUPTED; + + if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino) + return -EFSCORRUPTED; + + *inuse = dip->di_mode != 0; + return 0; +} + +/* + * For each inode cluster covering the physical extent recorded by the rmapbt, + * we must calculate the properly aligned startino of that cluster, then + * iterate each cluster to fill in used and filled masks appropriately. We + * then use the (startino, used, filled) information to construct the + * appropriate inode records. + */ +STATIC int +xrep_ibt_process_cluster( + struct xrep_ibt *ri, + xfs_agblock_t agbno, + int blks_per_cluster, + xfs_agino_t rec_agino) +{ + struct xfs_imap imap; + struct xrep_ibt_extent *rie; + struct xfs_dinode *dip; + struct xfs_buf *bp; + struct xfs_scrub *sc = ri->sc; + struct xfs_mount *mp = sc->mp; + xfs_ino_t fsino; + xfs_inofree_t usedmask; + xfs_agino_t nr_inodes; + xfs_agino_t startino; + xfs_agino_t clusterino; + xfs_agino_t clusteroff; + xfs_agino_t agino; + uint16_t fillmask; + bool inuse; + int usedcount; + int error; + + /* The per-AG inum of this inode cluster. */ + agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0); + + /* The per-AG inum of the inobt record. */ + startino = rec_agino + rounddown(agino - rec_agino, + XFS_INODES_PER_CHUNK); + + /* The per-AG inum of the cluster within the inobt record. */ + clusteroff = agino - startino; + + /* Every inode in this holemask slot is filled. 
*/ + nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0); + fillmask = xfs_inobt_maskn(clusteroff / XFS_INODES_PER_HOLEMASK_BIT, + nr_inodes / XFS_INODES_PER_HOLEMASK_BIT); + + /* + * Grab the inode cluster buffer. This is safe to do with a broken + * inobt because imap_to_bp directly maps the buffer without touching + * either inode btree. + */ + imap.im_blkno = XFS_AGB_TO_DADDR(mp, sc->sa.agno, agbno); + imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster); + imap.im_boffset = 0; + error = xfs_imap_to_bp(mp, sc->tp, &imap, &dip, &bp, 0, + XFS_IGET_UNTRUSTED); + if (error) + return error; + + usedmask = 0; + usedcount = 0; + /* Which inodes within this cluster are free? */ + for (clusterino = 0; clusterino < nr_inodes; clusterino++) { + fsino = XFS_AGINO_TO_INO(mp, sc->sa.agno, agino + clusterino); + error = xrep_ibt_check_free(sc, bp, fsino, + clusterino, &inuse); + if (error) { + xfs_trans_brelse(sc->tp, bp); + return error; + } + if (inuse) { + usedcount++; + usedmask |= XFS_INOBT_MASK(clusteroff + clusterino); + } + } + xfs_trans_brelse(sc->tp, bp); + + /* + * If the last item in the list is our chunk record, + * update that. + */ + if (!list_empty(ri->extlist)) { + rie = list_last_entry(ri->extlist, struct xrep_ibt_extent, + list); + if (rie->startino + XFS_INODES_PER_CHUNK > startino) { + rie->freemask &= ~usedmask; + rie->holemask &= ~fillmask; + rie->count += nr_inodes; + rie->usedcount += usedcount; + return 0; + } + } + + /* New inode chunk; add to the list. */ + rie = kmem_alloc(sizeof(struct xrep_ibt_extent), KM_MAYFAIL); + if (!rie) + return -ENOMEM; + + INIT_LIST_HEAD(&rie->list); + rie->startino = startino; + rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask; + rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask; + rie->count = nr_inodes; + rie->usedcount = usedcount; + list_add_tail(&rie->list, ri->extlist); + ri->nr_records++; + + return 0; +} + +/* Record extents that belong to inode btrees. 
*/ +STATIC int +xrep_ibt_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_ibt *ri = priv; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + xfs_agblock_t agbno = rec->rm_startblock; + xfs_agino_t inoalign; + xfs_agino_t agino; + xfs_agino_t rec_agino; + int blks_per_cluster; + int error = 0; + + if (xchk_should_terminate(ri->sc, &error)) + return error; + + /* Fragment of the old btrees; dispose of them later. */ + if (rec->rm_owner == XFS_RMAP_OWN_INOBT) { + fsbno = XFS_AGB_TO_FSB(mp, ri->sc->sa.agno, agbno); + return xfs_bitmap_set(ri->btlist, fsbno, rec->rm_blockcount); + } + + /* Skip extents that are not inode chunks. */ + if (rec->rm_owner != XFS_RMAP_OWN_INODES) + return 0; + + blks_per_cluster = xfs_icluster_size_fsb(mp); + + if (agbno % blks_per_cluster != 0) + return -EFSCORRUPTED; + + trace_xrep_ibt_walk_rmap(mp, ri->sc->sa.agno, rec->rm_startblock, + rec->rm_blockcount, rec->rm_owner, rec->rm_offset, + rec->rm_flags); + + /* + * Determine the inode block alignment, and where the block + * ought to start if it's aligned properly. On a sparse inode + * system the rmap doesn't have to start on an alignment boundary, + * but the record does. On pre-sparse filesystems, we /must/ + * start both rmap and inobt on an alignment boundary. + */ + inoalign = xfs_ialloc_cluster_alignment(mp); + agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0); + rec_agino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0); + if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rec_agino) + return -EFSCORRUPTED; + + /* + * Set up the free/hole masks for each inode cluster that could be + * mapped by this rmap record. + */ + for (; + agbno < rec->rm_startblock + rec->rm_blockcount; + agbno += blks_per_cluster) { + error = xrep_ibt_process_cluster(ri, agbno, blks_per_cluster, + rec_agino); + if (error) + return error; + } + + return 0; +} + +/* Compare two ialloc extents.
*/ +static int +xrep_ibt_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_ibt_extent *ap; + struct xrep_ibt_extent *bp; + + ap = container_of(a, struct xrep_ibt_extent, list); + bp = container_of(b, struct xrep_ibt_extent, list); + + if (ap->startino > bp->startino) + return 1; + else if (ap->startino < bp->startino) + return -1; + return 0; +} + +/* Insert an inode chunk record into a given btree. */ +static int +xrep_ibt_insert_btrec( + struct xfs_btree_cur *cur, + struct xrep_ibt_extent *rie) +{ + int stat; + int error; + + error = xfs_inobt_lookup(cur, rie->startino, XFS_LOOKUP_EQ, &stat); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 0); + error = xfs_inobt_insert_rec(cur, rie->holemask, rie->count, + rie->count - rie->usedcount, rie->freemask, &stat); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1); + return error; +} + +/* Insert an inode chunk record into both inode btrees. */ +static int +xrep_ibt_insert_rec( + struct xfs_scrub *sc, + struct xrep_ibt_extent *rie) +{ + struct xfs_btree_cur *cur; + int error; + + trace_xrep_ibt_insert(sc->mp, sc->sa.agno, rie->startino, + rie->holemask, rie->count, rie->count - rie->usedcount, + rie->freemask); + + /* Insert into the inobt. */ + cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp, sc->sa.agno, + XFS_BTNUM_INO); + error = xrep_ibt_insert_btrec(cur, rie); + if (error) + goto out_cur; + xfs_btree_del_cursor(cur, error); + + /* Insert into the finobt if chunk has free inodes. */ + if (xfs_sb_version_hasfinobt(&sc->mp->m_sb) && + rie->count != rie->usedcount) { + cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp, + sc->sa.agno, XFS_BTNUM_FINO); + error = xrep_ibt_insert_btrec(cur, rie); + if (error) + goto out_cur; + xfs_btree_del_cursor(cur, error); + } + + return xrep_roll_ag_trans(sc); +out_cur: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Free every record in the inode list. 
*/ +STATIC void +xrep_ibt_cancel_inorecs( + struct list_head *reclist) +{ + struct xrep_ibt_extent *rie; + struct xrep_ibt_extent *n; + + list_for_each_entry_safe(rie, n, reclist, list) { + list_del(&rie->list); + kmem_free(rie); + } +} + +/* + * Iterate all reverse mappings to find the inodes (OWN_INODES) and the inode + * btrees (OWN_INOBT). Figure out if we have enough free space to reconstruct + * the inode btrees. The caller must clean up the lists if anything goes + * wrong. + */ +STATIC int +xrep_ibt_find_inodes( + struct xfs_scrub *sc, + struct list_head *inode_records, + struct xfs_bitmap *old_iallocbt_blocks) +{ + struct xrep_ibt ri; + struct xfs_mount *mp = sc->mp; + struct xfs_btree_cur *cur; + xfs_agblock_t nr_blocks; + int error; + + /* Collect all reverse mappings for inode blocks. */ + ri.extlist = inode_records; + ri.btlist = old_iallocbt_blocks; + ri.nr_records = 0; + ri.sc = sc; + + cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno); + error = xfs_rmap_query_all(cur, xrep_ibt_walk_rmap, &ri); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + + /* Do we have enough space to rebuild all inode trees? */ + nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records); + if (xfs_sb_version_hasfinobt(&mp->m_sb)) + nr_blocks *= 2; + if (!xrep_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE)) + return -ENOSPC; + + return 0; + +err: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Update the AGI counters. */ +STATIC int +xrep_ibt_reset_counters( + struct xfs_scrub *sc, + struct list_head *inode_records, + int *log_flags) +{ + struct xfs_agi *agi; + struct xrep_ibt_extent *rie; + struct xfs_perag *pag = sc->sa.pag; + unsigned int count = 0; + unsigned int usedcount = 0; + unsigned int freecount; + + /* Figure out the new counters. 
+ */ + list_for_each_entry(rie, inode_records, list) { + count += rie->count; + usedcount += rie->usedcount; + } + + agi = XFS_BUF_TO_AGI(sc->sa.agi_bp); + freecount = count - usedcount; + + /* Trigger inode count recalculation */ + xfs_force_summary_recalc(sc->mp); + + /* + * Reset the per-AG info, both incore and ondisk. Mark the incore + * state stale in case we fail out of here. + */ + ASSERT(pag->pagi_init); + pag->pagi_init = 0; + pag->pagi_count = count; + pag->pagi_freecount = freecount; + + agi->agi_count = cpu_to_be32(count); + agi->agi_freecount = cpu_to_be32(freecount); + *log_flags |= XFS_AGI_COUNT | XFS_AGI_FREECOUNT; + + return 0; +} + +/* Initialize a new inode btree root and implant it into the AGI. */ +STATIC int +xrep_ibt_reset_btree( + struct xfs_scrub *sc, + xfs_btnum_t btnum, + struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type resv, + int *log_flags) +{ + struct xfs_agi *agi; + struct xfs_buf *bp; + struct xfs_mount *mp = sc->mp; + xfs_fsblock_t fsbno; + int error; + + agi = XFS_BUF_TO_AGI(sc->sa.agi_bp); + + /* Initialize new btree root. */ + error = xrep_alloc_ag_block(sc, oinfo, &fsbno, resv); + if (error) + return error; + error = xrep_init_btblock(sc, fsbno, &bp, btnum, &xfs_inobt_buf_ops); + if (error) + return error; + + switch (btnum) { + case XFS_BTNUM_INOi: + agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno)); + agi->agi_level = cpu_to_be32(1); + *log_flags |= XFS_AGI_ROOT | XFS_AGI_LEVEL; + break; + case XFS_BTNUM_FINOi: + agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno)); + agi->agi_free_level = cpu_to_be32(1); + *log_flags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL; + break; + default: + ASSERT(0); + } + + return 0; +} + +/* Initialize new inobt/finobt roots and implant them into the AGI.
*/ +STATIC int +xrep_ibt_reset_btrees( + struct xfs_scrub *sc, + struct xfs_owner_info *oinfo, + int *log_flags) +{ + enum xfs_ag_resv_type resv; + int error; + + resv = XFS_AG_RESV_NONE; + error = xrep_ibt_reset_btree(sc, XFS_BTNUM_INO, oinfo, XFS_AG_RESV_NONE, + log_flags); + if (error || !xfs_sb_version_hasfinobt(&sc->mp->m_sb)) + return error; + + /* + * If we made a per-AG reservation for the finobt then we must account + * the new block correctly. + */ + if (!sc->mp->m_inotbt_nores) + resv = XFS_AG_RESV_METADATA; + return xrep_ibt_reset_btree(sc, XFS_BTNUM_FINO, oinfo, resv, log_flags); +} + +/* Build new inode btrees and dispose of the old one. */ +STATIC int +xrep_ibt_rebuild_trees( + struct xfs_scrub *sc, + struct list_head *inode_records, + struct xfs_owner_info *oinfo, + struct xfs_bitmap *old_iallocbt_blocks) +{ + struct xrep_ibt_extent *rie; + struct xrep_ibt_extent *n; + int error; + + /* Add all records. */ + list_sort(NULL, inode_records, xrep_ibt_extent_cmp); + list_for_each_entry_safe(rie, n, inode_records, list) { + error = xrep_ibt_insert_rec(sc, rie); + if (error) + return error; + + list_del(&rie->list); + kmem_free(rie); + } + + /* Free the old inode btree blocks if they're not in use. */ + return xrep_reap_extents(sc, old_iallocbt_blocks, oinfo, + XFS_AG_RESV_NONE); +} + +/* + * Make our new inode btree roots permanent so that we can start re-adding + * inode records back into the AG. + */ +STATIC int +xrep_ibt_commit_new( + struct xfs_scrub *sc, + struct xfs_bitmap *old_iallocbt_blocks, + int log_flags) +{ + int error; + + xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, log_flags); + + /* Invalidate all the inobt/finobt blocks in btlist. */ + error = xrep_invalidate_blocks(sc, old_iallocbt_blocks); + if (error) + return error; + error = xrep_roll_ag_trans(sc); + if (error) + return error; + + /* + * Now that we've succeeded, mark the incore state valid again. 
If the + * finobt is enabled, make sure we reinitialize the per-AG reservations + * when we're done. + */ + sc->sa.pag->pagi_init = 1; + if (xfs_sb_version_hasfinobt(&sc->mp->m_sb)) + sc->reset_perag_resv = true; + return 0; +} + +/* Repair both inode btrees. */ +int +xrep_iallocbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct list_head inode_records; + struct xfs_bitmap old_iallocbt_blocks; + struct xfs_mount *mp = sc->mp; + int log_flags = 0; + int error = 0; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + + /* Collect the free space data and find the old btree blocks. */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT); + INIT_LIST_HEAD(&inode_records); + xfs_bitmap_init(&old_iallocbt_blocks); + error = xrep_ibt_find_inodes(sc, &inode_records, &old_iallocbt_blocks); + if (error) + goto out; + + /* + * Blow out the old inode btrees. This is the point at which + * we are no longer able to bail out gracefully. + */ + error = xrep_ibt_reset_counters(sc, &inode_records, &log_flags); + if (error) + goto out; + error = xrep_ibt_reset_btrees(sc, &oinfo, &log_flags); + if (error) + goto out; + error = xrep_ibt_commit_new(sc, &old_iallocbt_blocks, log_flags); + if (error) + goto out; + + /* Now rebuild the inode information. */ + error = xrep_ibt_rebuild_trees(sc, &inode_records, &oinfo, + &old_iallocbt_blocks); + if (error) + goto out; +out: + xrep_ibt_cancel_inorecs(&inode_records); + xfs_bitmap_destroy(&old_iallocbt_blocks); + return error; +} diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 17cf48564390..a44deb6f06ab 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -880,3 +880,23 @@ xrep_ino_dqattach( return error; } + +/* + * Reinitialize the per-AG block reservation for the AG we just fixed. 
+ */ +int +xrep_reset_perag_resv( + struct xfs_scrub *sc) +{ + int error; + + ASSERT(sc->ops->type == ST_PERAG); + ASSERT(sc->tp); + + error = xfs_ag_resv_free(sc->sa.pag); + if (error) + goto out; + error = xfs_ag_resv_init(sc->sa.pag, sc->tp); +out: + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index bc1a5f1cbcdc..0cc53dee3228 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -53,6 +53,7 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp, struct xrep_find_ag_btree *btree_info, struct xfs_buf *agfl_bp); void xrep_force_quotacheck(struct xfs_scrub *sc, uint dqtype); int xrep_ino_dqattach(struct xfs_scrub *sc); +int xrep_reset_perag_resv(struct xfs_scrub *sc); /* Metadata repairers */ @@ -62,6 +63,7 @@ int xrep_agf(struct xfs_scrub *sc); int xrep_agfl(struct xfs_scrub *sc); int xrep_agi(struct xfs_scrub *sc); int xrep_allocbt(struct xfs_scrub *sc); +int xrep_iallocbt(struct xfs_scrub *sc); #else @@ -83,12 +85,21 @@ xrep_calc_ag_resblks( return 0; } +static inline int +xrep_reset_perag_resv( + struct xfs_scrub *sc) +{ + ASSERT(0); + return -EOPNOTSUPP; +} + #define xrep_probe xrep_notsupported #define xrep_superblock xrep_notsupported #define xrep_agf xrep_notsupported #define xrep_agfl xrep_notsupported #define xrep_agi xrep_notsupported #define xrep_allocbt xrep_notsupported +#define xrep_iallocbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 2133a3199372..631b0b06db99 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -244,14 +244,14 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_ag_iallocbt, .scrub = xchk_inobt, - .repair = xrep_notsupported, + .repair = xrep_iallocbt, }, [XFS_SCRUB_TYPE_FINOBT] = { /* finobt */ .type = ST_PERAG, .setup = xchk_setup_ag_iallocbt, .scrub = xchk_finobt, .has = xfs_sb_version_hasfinobt, - .repair = xrep_notsupported, + .repair = 
xrep_iallocbt, }, [XFS_SCRUB_TYPE_RMAPBT] = { /* rmapbt */ .type = ST_PERAG, diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index af323b229c4b..762db46fd696 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -64,6 +64,7 @@ struct xfs_scrub { uint ilock_flags; bool try_harder; bool has_quotaofflock; + bool reset_perag_resv; /* State tracking for single-AG operations. */ struct xchk_ag sa; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 26bd5dc68efe..9126dc66f726 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -552,7 +552,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \ uint64_t owner, uint64_t offset, unsigned int flags), \ TP_ARGS(mp, agno, agbno, len, owner, offset, flags)) DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap); -DEFINE_REPAIR_RMAP_EVENT(xrep_ialloc_extent_fn); +DEFINE_REPAIR_RMAP_EVENT(xrep_ibt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn); DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn); @@ -700,7 +700,7 @@ TRACE_EVENT(xrep_reset_counters, MAJOR(__entry->dev), MINOR(__entry->dev)) ) -TRACE_EVENT(xrep_ialloc_insert, +TRACE_EVENT(xrep_ibt_insert, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agino_t startino, uint16_t holemask, uint8_t count, uint8_t freecount, uint64_t freemask), From patchwork Thu Jul 26 00:21:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544989 Subject: [PATCH 09/16] xfs: repair refcount btrees From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:21:07 -0700 Message-ID: <153256446716.29021.11487445106131817780.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> From: Darrick J. Wong Reconstruct the refcount data from the rmap btree.
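The header comment in refcount_repair.c describes the rebuild as a sweep over rmaps sorted by start block: keep a bag of the rmaps covering the current block, and emit a refcount record whenever the bag size changes. A minimal userspace sketch of that sweep follows. It is an illustration under stated assumptions, not the kernel code: `struct rmap`, `struct refc`, `build_refcounts()` and `MAX_BAG` are hypothetical names, the bag is a fixed array rather than a list, and CoW staging extents and owner filtering are left out. Like the patch's loop, it emits the old bag size for the run that just ended and only records runs with a refcount of 2 or more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BAG 64	/* hypothetical cap on overlapping rmaps */

/* Simplified stand-ins for xfs_rmap_irec / xfs_refcount_irec. */
struct rmap { uint32_t start; uint32_t len; };
struct refc { uint32_t start; uint32_t len; uint32_t refcount; };

static uint32_t rmap_end(const struct rmap *r)
{
	return r->start + r->len;
}

/* Next block at which the bag size changes. */
static uint32_t next_change(const struct rmap *rmaps, size_t nr, size_t next,
			    const struct rmap *bag, size_t bag_sz)
{
	uint32_t np = next < nr ? rmaps[next].start : UINT32_MAX;

	for (size_t i = 0; i < bag_sz; i++)
		if (rmap_end(&bag[i]) < np)
			np = rmap_end(&bag[i]);
	return np;
}

/*
 * Sweep rmaps (sorted by start) and write shared extents to out[].
 * Returns the number of records written, or -1 if the bag or the
 * output array overflows.
 */
int build_refcounts(const struct rmap *rmaps, size_t nr,
		    struct refc *out, size_t max_out)
{
	struct rmap bag[MAX_BAG];
	size_t bag_sz = 0, old_sz, next = 0, nout = 0;
	uint32_t sp, np;

	/* The sweep only works on input sorted by start block. */
	for (size_t i = 1; i < nr; i++)
		assert(rmaps[i - 1].start <= rmaps[i].start);

	while (next < nr) {
		/* Start a new run at the next unprocessed rmap. */
		sp = rmaps[next].start;
		while (next < nr && rmaps[next].start == sp) {
			if (bag_sz == MAX_BAG)
				return -1;
			bag[bag_sz++] = rmaps[next++];
		}
		np = next_change(rmaps, nr, next, bag, bag_sz);
		old_sz = bag_sz;

		while (bag_sz > 0) {
			/* Drop every rmap that ends at np. */
			for (size_t i = 0; i < bag_sz; ) {
				if (rmap_end(&bag[i]) == np)
					bag[i] = bag[--bag_sz];
				else
					i++;
			}
			/* Add every rmap that starts at np. */
			while (next < nr && rmaps[next].start == np) {
				if (bag_sz == MAX_BAG)
					return -1;
				bag[bag_sz++] = rmaps[next++];
			}
			/* Overlap changed: emit the run if it was shared. */
			if (bag_sz != old_sz) {
				if (old_sz > 1) {
					if (nout == max_out)
						return -1;
					out[nout++] = (struct refc){
						sp, np - sp, (uint32_t)old_sz };
				}
				sp = np;
			}
			if (bag_sz == 0)
				break;
			old_sz = bag_sz;
			np = next_change(rmaps, nr, next, bag, bag_sz);
		}
	}
	return (int)nout;
}
```

For the rmaps (0, 10), (5, 10), (5, 2), (20, 5), given as (start, len), the sketch yields two records: (5, 2, 3) and (7, 3, 2). Blocks 0-4, 10-14 and 20-24 have a single owner and produce no record.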
Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/refcount_repair.c | 586 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 4 files changed, 590 insertions(+), 1 deletion(-) create mode 100644 fs/xfs/scrub/refcount_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index af1dc9aeb1a7..4ca97e026f94 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -167,6 +167,7 @@ xfs-y += $(addprefix scrub/, \ alloc_repair.o \ bitmap.o \ ialloc_repair.o \ + refcount_repair.o \ repair.o \ ) endif diff --git a/fs/xfs/scrub/refcount_repair.c b/fs/xfs/scrub/refcount_repair.c new file mode 100644 index 000000000000..549e1adc972c --- /dev/null +++ b/fs/xfs/scrub/refcount_repair.c @@ -0,0 +1,586 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_itable.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_refcount_btree.h" +#include "xfs_error.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Rebuilding the Reference Count Btree + * ==================================== + * + * This algorithm is "borrowed" from xfs_repair.
Imagine the rmap + * entries as rectangles representing extents of physical blocks, and + * that the rectangles can be laid down to allow them to overlap each + * other; then we know that we must emit a refcnt btree entry wherever + * the amount of overlap changes, i.e. the emission stimulus is + * level-triggered: + * + * - --- + * -- ----- ---- --- ------ + * -- ---- ----------- ---- --------- + * -------------------------------- ----------- + * ^ ^ ^^ ^^ ^ ^^ ^^^ ^^^^ ^ ^^ ^ ^ ^ + * 2 1 23 21 3 43 234 2123 1 01 2 3 0 + * + * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner). + * + * Note that in the actual refcnt btree we don't store the refcount < 2 + * cases because the bnobt tells us which blocks are free; single-use + * blocks aren't recorded in the bnobt or the refcntbt. If the rmapbt + * supports storing multiple entries covering a given block we could + * theoretically dispense with the refcntbt and simply count rmaps, but + * that's inefficient in the (hot) write path, so we'll take the cost of + * the extra tree to save time. Also there's no guarantee that rmap + * will be enabled. + * + * Given an array of rmaps sorted by physical block number, a starting + * physical block (sp), a bag to hold rmaps that cover sp, and the next + * physical block where the level changes (np), we can reconstruct the + * refcount btree as follows: + * + * While there are still unprocessed rmaps in the array, + * - Set sp to the physical block (pblk) of the next unprocessed rmap. + * - Add to the bag all rmaps in the array where startblock == sp. + * - Set np to the physical block where the bag size will change. This + * is the minimum of (the pblk of the next unprocessed rmap) and + * (startblock + len of each rmap in the bag). + * - Record the bag size as old_bag_size. + * + * - While the bag isn't empty, + * - Remove from the bag all rmaps where startblock + len == np. + * - Add to the bag all rmaps in the array where startblock == np. 
+ * - If the bag size isn't old_bag_size, store the refcount entry + * (sp, np - sp, bag_size) in the refcnt btree. + * - If the bag is empty, break out of the inner loop. + * - Set old_bag_size to the bag size + * - Set sp = np. + * - Set np to the physical block where the bag size will change. + * This is the minimum of (the pblk of the next unprocessed rmap) + * and (startblock + len of each rmap in the bag). + * + * Like all the other repairers, we make a list of all the refcount + * records we need, then reinitialize the refcount btree root and + * insert all the records. + */ + +struct xrep_refc_rmap { + struct list_head list; + struct xfs_rmap_irec rmap; +}; + +struct xrep_refc_extent { + struct list_head list; + struct xfs_refcount_irec refc; +}; + +struct xrep_refc { + struct list_head rmap_bag; /* rmaps we're tracking */ + struct list_head rmap_idle; /* idle rmaps */ + struct list_head *extlist; /* refcount extents */ + struct xfs_bitmap *btlist; /* old refcountbt blocks */ + struct xfs_scrub *sc; + unsigned long nr_records;/* nr refcount extents */ + xfs_extlen_t btblocks; /* # of refcountbt blocks */ +}; + +/* Grab the next record from the rmapbt. */ +STATIC int +xrep_refc_next_rmap( + struct xfs_btree_cur *cur, + struct xrep_refc *rr, + struct xfs_rmap_irec *rec, + bool *have_rec) +{ + struct xfs_rmap_irec rmap; + struct xfs_mount *mp = cur->bc_mp; + struct xrep_refc_extent *rre; + xfs_fsblock_t fsbno; + int have_gt; + int error = 0; + + *have_rec = false; + /* + * Loop through the remaining rmaps. Remember CoW staging + * extents and the refcountbt blocks from the old tree for later + * disposal. We can only share written data fork extents, so + * keep looping until we find an rmap for one. 
+ */ + do { + if (xchk_should_terminate(rr->sc, &error)) + goto out_error; + + error = xfs_btree_increment(cur, 0, &have_gt); + if (error) + goto out_error; + if (!have_gt) + return 0; + + error = xfs_rmap_get_rec(cur, &rmap, &have_gt); + if (error) + goto out_error; + XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error); + + if (rmap.rm_owner == XFS_RMAP_OWN_COW) { + /* Pass CoW staging extents right through. */ + rre = kmem_alloc(sizeof(struct xrep_refc_extent), + KM_MAYFAIL); + if (!rre) + return -ENOMEM; + + INIT_LIST_HEAD(&rre->list); + rre->refc.rc_startblock = rmap.rm_startblock + + XFS_REFC_COW_START; + rre->refc.rc_blockcount = rmap.rm_blockcount; + rre->refc.rc_refcount = 1; + list_add_tail(&rre->list, rr->extlist); + } else if (rmap.rm_owner == XFS_RMAP_OWN_REFC) { + /* refcountbt block, dump it when we're done. */ + rr->btblocks += rmap.rm_blockcount; + fsbno = XFS_AGB_TO_FSB(cur->bc_mp, + cur->bc_private.a.agno, + rmap.rm_startblock); + error = xfs_bitmap_set(rr->btlist, fsbno, + rmap.rm_blockcount); + if (error) + goto out_error; + } + } while (XFS_RMAP_NON_INODE_OWNER(rmap.rm_owner) || + xfs_internal_inum(mp, rmap.rm_owner) || + (rmap.rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK | + XFS_RMAP_UNWRITTEN))); + + *rec = rmap; + *have_rec = true; + return 0; + +out_error: + return error; +} + +/* Recycle an idle rmap or allocate a new one. */ +static struct xrep_refc_rmap * +xrep_refc_get_rmap( + struct xrep_refc *rr) +{ + struct xrep_refc_rmap *rrm; + + if (list_empty(&rr->rmap_idle)) { + rrm = kmem_alloc(sizeof(struct xrep_refc_rmap), KM_MAYFAIL); + if (!rrm) + return NULL; + INIT_LIST_HEAD(&rrm->list); + return rrm; + } + + rrm = list_first_entry(&rr->rmap_idle, struct xrep_refc_rmap, list); + list_del_init(&rrm->list); + return rrm; +} + +/* Compare two btree extents.
*/ +static int +xrep_refcount_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_refc_extent *ap; + struct xrep_refc_extent *bp; + + ap = container_of(a, struct xrep_refc_extent, list); + bp = container_of(b, struct xrep_refc_extent, list); + + if (ap->refc.rc_startblock > bp->refc.rc_startblock) + return 1; + else if (ap->refc.rc_startblock < bp->refc.rc_startblock) + return -1; + return 0; +} + +/* Record a reference count extent. */ +STATIC int +xrep_refc_new_refc( + struct xfs_scrub *sc, + struct xrep_refc *rr, + xfs_agblock_t agbno, + xfs_extlen_t len, + xfs_nlink_t refcount) +{ + struct xrep_refc_extent *rre; + struct xfs_refcount_irec irec; + + irec.rc_startblock = agbno; + irec.rc_blockcount = len; + irec.rc_refcount = refcount; + + trace_xrep_refcount_extent_fn(sc->mp, sc->sa.agno, &irec); + + rre = kmem_alloc(sizeof(struct xrep_refc_extent), KM_MAYFAIL); + if (!rre) + return -ENOMEM; + INIT_LIST_HEAD(&rre->list); + rre->refc = irec; + list_add_tail(&rre->list, rr->extlist); + + return 0; +} + +/* Iterate all the rmap records to generate reference count data. */ +#define RMAP_NEXT(r) ((r).rm_startblock + (r).rm_blockcount) +STATIC int +xrep_refc_generate_refcounts( + struct xfs_scrub *sc, + struct xrep_refc *rr) +{ + struct xfs_rmap_irec rmap; + struct xfs_btree_cur *cur; + struct xrep_refc_rmap *rrm; + struct xrep_refc_rmap *n; + xfs_agblock_t sbno; + xfs_agblock_t cbno; + xfs_agblock_t nbno; + size_t old_stack_sz; + size_t stack_sz = 0; + bool have; + int have_gt; + int error; + + /* Start the rmapbt cursor to the left of all records. */ + cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, sc->sa.agf_bp, + sc->sa.agno); + error = xfs_rmap_lookup_le(cur, 0, 0, 0, 0, 0, &have_gt); + if (error) + goto out; + ASSERT(have_gt == 0); + + /* Process reverse mappings into refcount data. 
*/ + while (xfs_btree_has_more_records(cur)) { + /* Push all rmaps with pblk == sbno onto the stack */ + error = xrep_refc_next_rmap(cur, rr, &rmap, &have); + if (error) + goto out; + if (!have) + break; + sbno = cbno = rmap.rm_startblock; + while (have && rmap.rm_startblock == sbno) { + rrm = xrep_refc_get_rmap(rr); + if (!rrm) + goto out; + rrm->rmap = rmap; + list_add_tail(&rrm->list, &rr->rmap_bag); + stack_sz++; + error = xrep_refc_next_rmap(cur, rr, &rmap, &have); + if (error) + goto out; + } + error = xfs_btree_decrement(cur, 0, &have_gt); + if (error) + goto out; + XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out); + + /* Set nbno to the bno of the next refcount change */ + nbno = have ? rmap.rm_startblock : NULLAGBLOCK; + list_for_each_entry(rrm, &rr->rmap_bag, list) + nbno = min_t(xfs_agblock_t, nbno, RMAP_NEXT(rrm->rmap)); + + ASSERT(nbno > sbno); + old_stack_sz = stack_sz; + + /* While stack isn't empty... */ + while (stack_sz) { + /* Pop all rmaps that end at nbno */ + list_for_each_entry_safe(rrm, n, &rr->rmap_bag, list) { + if (RMAP_NEXT(rrm->rmap) != nbno) + continue; + stack_sz--; + list_move(&rrm->list, &rr->rmap_idle); + } + + /* Push array items that start at nbno */ + error = xrep_refc_next_rmap(cur, rr, &rmap, &have); + if (error) + goto out; + while (have && rmap.rm_startblock == nbno) { + rrm = xrep_refc_get_rmap(rr); + if (!rrm) + goto out; + rrm->rmap = rmap; + list_add_tail(&rrm->list, &rr->rmap_bag); + stack_sz++; + error = xrep_refc_next_rmap(cur, rr, &rmap, + &have); + if (error) + goto out; + } + error = xfs_btree_decrement(cur, 0, &have_gt); + if (error) + goto out; + XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out); + + /* Emit refcount if necessary */ + ASSERT(nbno > cbno); + if (stack_sz != old_stack_sz) { + if (old_stack_sz > 1) { + error = xrep_refc_new_refc(sc, rr, cbno, + nbno - cbno, + old_stack_sz); + if (error) + goto out; + rr->nr_records++; + } + cbno = nbno; + } + + /* Stack empty, go find the next rmap */ + if (stack_sz == 0) 
+ break; + old_stack_sz = stack_sz; + sbno = nbno; + + /* Set nbno to the bno of the next refcount change */ + nbno = have ? rmap.rm_startblock : NULLAGBLOCK; + list_for_each_entry(rrm, &rr->rmap_bag, list) + nbno = min_t(xfs_agblock_t, nbno, + RMAP_NEXT(rrm->rmap)); + + ASSERT(nbno > sbno); + } + } + + /* Free all the leftover rmap records. */ + list_for_each_entry_safe(rrm, n, &rr->rmap_idle, list) { + list_del(&rrm->list); + kmem_free(rrm); + } + + ASSERT(list_empty(&rr->rmap_bag)); +out: + xfs_btree_del_cursor(cur, error); + return error; +} +#undef RMAP_NEXT + +/* + * Generate all the reference counts for this AG and a list of the old + * refcount btree blocks. Figure out if we have enough free space to + * reconstruct the refcount btree. The caller must clean up the lists if + * anything goes wrong. + */ +STATIC int +xrep_refc_find_refcounts( + struct xfs_scrub *sc, + struct list_head *refcount_records, + struct xfs_bitmap *old_refcountbt_blocks) +{ + struct xrep_refc rr; + struct xrep_refc_rmap *rrm; + struct xrep_refc_rmap *n; + struct xfs_mount *mp = sc->mp; + int error; + + INIT_LIST_HEAD(&rr.rmap_bag); + INIT_LIST_HEAD(&rr.rmap_idle); + rr.extlist = refcount_records; + rr.btlist = old_refcountbt_blocks; + rr.btblocks = 0; + rr.sc = sc; + rr.nr_records = 0; + + /* Generate all the refcount records. */ + error = xrep_refc_generate_refcounts(sc, &rr); + if (error) + goto out; + + /* Do we actually have enough space to do this? */ + if (!xrep_ag_has_space(sc->sa.pag, + xfs_refcountbt_calc_size(mp, rr.nr_records), + XFS_AG_RESV_METADATA)) { + error = -ENOSPC; + goto out; + } + +out: + list_for_each_entry_safe(rrm, n, &rr.rmap_idle, list) { + list_del(&rrm->list); + kmem_free(rrm); + } + list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) { + list_del(&rrm->list); + kmem_free(rrm); + } + return error; +} + +/* Initialize new refcountbt root and implant it into the AGF.
*/ +STATIC int +xrep_refc_reset_btree( + struct xfs_scrub *sc, + struct xfs_owner_info *oinfo, + int *log_flags) +{ + struct xfs_buf *bp; + struct xfs_agf *agf; + xfs_fsblock_t btfsb; + int error; + + agf = XFS_BUF_TO_AGF(sc->sa.agf_bp); + + /* Initialize a new refcountbt root. */ + error = xrep_alloc_ag_block(sc, oinfo, &btfsb, XFS_AG_RESV_METADATA); + if (error) + return error; + error = xrep_init_btblock(sc, btfsb, &bp, XFS_BTNUM_REFC, + &xfs_refcountbt_buf_ops); + if (error) + return error; + agf->agf_refcount_root = cpu_to_be32(XFS_FSB_TO_AGBNO(sc->mp, btfsb)); + agf->agf_refcount_level = cpu_to_be32(1); + agf->agf_refcount_blocks = cpu_to_be32(1); + *log_flags |= XFS_AGF_REFCOUNT_BLOCKS | XFS_AGF_REFCOUNT_ROOT | + XFS_AGF_REFCOUNT_LEVEL; + + return 0; +} + +/* Build new refcount btree and dispose of the old one. */ +STATIC int +xrep_refc_rebuild_tree( + struct xfs_scrub *sc, + struct list_head *refcount_records, + struct xfs_owner_info *oinfo, + struct xfs_bitmap *old_refcountbt_blocks) +{ + struct xrep_refc_extent *rre; + struct xrep_refc_extent *n; + struct xfs_mount *mp = sc->mp; + struct xfs_btree_cur *cur; + int have_gt; + int error; + + /* Add all records. */ + list_sort(NULL, refcount_records, xrep_refcount_extent_cmp); + list_for_each_entry_safe(rre, n, refcount_records, list) { + /* Insert into the refcountbt. */ + cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, + sc->sa.agno, NULL); + error = xfs_refcount_lookup_eq(cur, rre->refc.rc_startblock, + &have_gt); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(mp, have_gt == 0); + error = xfs_refcount_insert(cur, &rre->refc, &have_gt); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(mp, have_gt == 1); + xfs_btree_del_cursor(cur, error); + cur = NULL; + + error = xrep_roll_ag_trans(sc); + if (error) + return error; + + list_del(&rre->list); + kmem_free(rre); + } + + /* Free the old refcountbt blocks if they're not in use. 
*/ + return xrep_reap_extents(sc, old_refcountbt_blocks, oinfo, + XFS_AG_RESV_METADATA); +} + +/* Free every record in the refcount list. */ +STATIC void +xrep_refc_cancel_recs( + struct list_head *recs) +{ + struct xrep_refc_extent *rre; + struct xrep_refc_extent *n; + + list_for_each_entry_safe(rre, n, recs, list) { + list_del(&rre->list); + kmem_free(rre); + } +} + +/* Rebuild the refcount btree. */ +int +xrep_refcountbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct list_head refcount_records; + struct xfs_bitmap old_refcountbt_blocks; + struct xfs_mount *mp = sc->mp; + int log_flags = 0; + int error; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + + /* Collect all reference counts. */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC); + INIT_LIST_HEAD(&refcount_records); + xfs_bitmap_init(&old_refcountbt_blocks); + error = xrep_refc_find_refcounts(sc, &refcount_records, + &old_refcountbt_blocks); + if (error) + goto out; + + /* + * Blow out the old refcount btree. This is the point at which + * we are no longer able to bail out gracefully. + */ + error = xrep_refc_reset_btree(sc, &oinfo, &log_flags); + if (error) + goto out; + xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, log_flags); + + /* Invalidate all the old refcountbt blocks in btlist. */ + error = xrep_invalidate_blocks(sc, &old_refcountbt_blocks); + if (error) + goto out; + error = xrep_roll_ag_trans(sc); + if (error) + goto out; + + /* Now rebuild the refcount information.
*/ + error = xrep_refc_rebuild_tree(sc, &refcount_records, &oinfo, + &old_refcountbt_blocks); + if (error) + goto out; + sc->reset_perag_resv = true; +out: + xfs_bitmap_destroy(&old_refcountbt_blocks); + xrep_refc_cancel_recs(&refcount_records); + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 0cc53dee3228..da12c20376ae 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -64,6 +64,7 @@ int xrep_agfl(struct xfs_scrub *sc); int xrep_agi(struct xfs_scrub *sc); int xrep_allocbt(struct xfs_scrub *sc); int xrep_iallocbt(struct xfs_scrub *sc); +int xrep_refcountbt(struct xfs_scrub *sc); #else @@ -100,6 +101,7 @@ xrep_reset_perag_resv( #define xrep_agi xrep_notsupported #define xrep_allocbt xrep_notsupported #define xrep_iallocbt xrep_notsupported +#define xrep_refcountbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 631b0b06db99..843eafe0acef 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -265,7 +265,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_ag_refcountbt, .scrub = xchk_refcountbt, .has = xfs_sb_version_hasreflink, - .repair = xrep_notsupported, + .repair = xrep_refcountbt, }, [XFS_SCRUB_TYPE_INODE] = { /* inode record */ .type = ST_INODE, From patchwork Thu Jul 26 00:21:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544991
Subject: [PATCH 10/16] xfs: repair inode records
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com
Date: Wed, 25 Jul 2018 17:21:13 -0700
Message-ID: <153256447359.29021.9585208919876582516.stgit@magnolia>
In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia>
References: <153256436688.29021.4638459579042241728.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J.
Wong Try to reinitialize corrupt inodes, or clear the reflink flag if it's not needed. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_format.h | 3 fs/xfs/scrub/inode_repair.c | 659 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 5 files changed, 665 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/scrub/inode_repair.c -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 4ca97e026f94..e01b5003d543 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -167,6 +167,7 @@ xfs-y += $(addprefix scrub/, \ alloc_repair.o \ bitmap.o \ ialloc_repair.o \ + inode_repair.o \ refcount_repair.o \ repair.o \ ) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 059bc44c27e8..d4ebf1a4f3e8 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -973,7 +973,8 @@ typedef enum xfs_dinode_fmt { #define XFS_DFORK_APTR(dip) \ (XFS_DFORK_DPTR(dip) + XFS_DFORK_BOFF(dip)) #define XFS_DFORK_PTR(dip,w) \ - ((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip)) + ((void *)((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : \ + XFS_DFORK_APTR(dip))) #define XFS_DFORK_FORMAT(dip,w) \ ((w) == XFS_DATA_FORK ? \ diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c new file mode 100644 index 000000000000..5ff929ea3d11 --- /dev/null +++ b/fs/xfs/scrub/inode_repair.c @@ -0,0 +1,659 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_inode_buf.h" +#include "xfs_inode_fork.h" +#include "xfs_ialloc.h" +#include "xfs_da_format.h" +#include "xfs_reflink.h" +#include "xfs_rmap.h" +#include "xfs_bmap.h" +#include "xfs_bmap_util.h" +#include "xfs_dir2.h" +#include "xfs_quota_defs.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Inode Repair + * + * Roughly speaking, inode problems can be classified based on whether or not + * they trip the dinode verifiers. If those trip, then we won't be able to + * _iget ourselves the inode. + * + * Therefore, the xrep_dinode_* functions fix anything that will cause the + * inode buffer verifier or the dinode verifier to fail. The xrep_inode_* + * functions fix things on live incore inodes. + */ + +/* Make sure this buffer can pass the inode buffer verifier.
*/ +STATIC void +xrep_dinode_buf( + struct xfs_scrub *sc, + struct xfs_buf *bp) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_trans *tp = sc->tp; + struct xfs_dinode *dip; + xfs_agnumber_t agno; + xfs_agino_t agino; + int ioff; + int i; + int ni; + bool crc_ok; + bool magic_ok; + bool unlinked_ok; + + ni = XFS_BB_TO_FSB(mp, bp->b_length) * mp->m_sb.sb_inopblock; + agno = xfs_daddr_to_agno(mp, XFS_BUF_ADDR(bp)); + for (i = 0; i < ni; i++) { + ioff = i << mp->m_sb.sb_inodelog; + dip = xfs_buf_offset(bp, ioff); + agino = be32_to_cpu(dip->di_next_unlinked); + + unlinked_ok = magic_ok = crc_ok = false; + + if (agino == NULLAGINO || xfs_verify_agino(sc->mp, agno, agino)) + unlinked_ok = true; + + if (dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) && + xfs_dinode_good_version(mp, dip->di_version)) + magic_ok = true; + + if (xfs_verify_cksum((char *)dip, mp->m_sb.sb_inodesize, + XFS_DINODE_CRC_OFF)) + crc_ok = true; + + if (magic_ok && unlinked_ok && crc_ok) + continue; + + if (!magic_ok) { + dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC); + dip->di_version = 3; + } + if (!unlinked_ok) + dip->di_next_unlinked = cpu_to_be32(NULLAGINO); + xfs_dinode_calc_crc(mp, dip); + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF); + xfs_trans_log_buf(tp, bp, ioff, ioff + sizeof(*dip) - 1); + } +} + +/* Reinitialize things that never change in an inode. */ +STATIC void +xrep_dinode_header( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC); + if (!xfs_dinode_good_version(sc->mp, dip->di_version)) + dip->di_version = 3; + dip->di_ino = cpu_to_be64(sc->sm->sm_ino); + uuid_copy(&dip->di_uuid, &sc->mp->m_sb.sb_meta_uuid); + dip->di_gen = cpu_to_be32(sc->sm->sm_gen); +} + +/* + * Turn di_mode into /something/ recognizable. + * + * XXX: Ideally we'd try to read data block 0 to see if it's a directory. 
+ */ +STATIC void +xrep_dinode_mode( + struct xfs_dinode *dip) +{ + uint16_t mode; + + mode = be16_to_cpu(dip->di_mode); + if (mode == 0 || xfs_mode_to_ftype(mode) != XFS_DIR3_FT_UNKNOWN) + return; + + /* bad mode, so we set it to a file that only root can read */ + mode = S_IFREG; + dip->di_mode = cpu_to_be16(mode); + dip->di_uid = 0; + dip->di_gid = 0; +} + +/* Fix any conflicting flags that the verifiers complain about. */ +STATIC void +xrep_dinode_flags( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = sc->mp; + uint64_t flags2; + uint16_t mode; + uint16_t flags; + + mode = be16_to_cpu(dip->di_mode); + flags = be16_to_cpu(dip->di_flags); + flags2 = be64_to_cpu(dip->di_flags2); + + if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode)) + flags2 |= XFS_DIFLAG2_REFLINK; + else + flags2 &= ~(XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE); + if (flags & XFS_DIFLAG_REALTIME) + flags2 &= ~XFS_DIFLAG2_REFLINK; + if (flags2 & XFS_DIFLAG2_REFLINK) + flags2 &= ~XFS_DIFLAG2_DAX; + dip->di_flags = cpu_to_be16(flags); + dip->di_flags2 = cpu_to_be64(flags2); +} + +/* + * Blow out symlink; now it points to the current dir. We don't have to worry + * about incore state because this inode is failing the verifiers. + */ +STATIC void +xrep_dinode_zap_symlink( + struct xfs_dinode *dip) +{ + char *p; + + dip->di_format = XFS_DINODE_FMT_LOCAL; + dip->di_size = cpu_to_be64(1); + p = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + *p = '.'; +} + +/* + * Blow out dir, make it point to the root. In the future repair will + * reconstruct this directory for us. Note that there's no in-core directory + * inode because the sf verifier tripped, so we don't have to worry about the + * dentry cache. 
+ */ +STATIC void +xrep_dinode_zap_dir( + struct xfs_mount *mp, + struct xfs_dinode *dip) +{ + const struct xfs_dir_ops *ops; + struct xfs_dir2_sf_hdr *sfp; + int i8count; + + dip->di_format = XFS_DINODE_FMT_LOCAL; + i8count = mp->m_sb.sb_rootino > XFS_DIR2_MAX_SHORT_INUM; + ops = xfs_dir_get_ops(mp, NULL); + sfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + sfp->count = 0; + sfp->i8count = i8count; + ops->sf_put_parent_ino(sfp, mp->m_sb.sb_rootino); + dip->di_size = cpu_to_be64(xfs_dir2_sf_hdr_size(i8count)); +} + +/* Make sure we don't have a garbage file size. */ +STATIC void +xrep_dinode_size( + struct xfs_mount *mp, + struct xfs_dinode *dip) +{ + uint64_t size; + uint16_t mode; + + mode = be16_to_cpu(dip->di_mode); + size = be64_to_cpu(dip->di_size); + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + /* di_size can't be nonzero for special files */ + dip->di_size = 0; + break; + case S_IFREG: + /* Regular files can't be larger than 2^63-1 bytes. */ + dip->di_size = cpu_to_be64(size & ~(1ULL << 63)); + break; + case S_IFLNK: + /* + * Truncate ridiculously oversized symlinks. If the size is + * zero, reset it to point to the current directory. Both of + * these conditions trigger dinode verifier errors, so there + * is no in-core state to reset. + */ + if (size > XFS_SYMLINK_MAXLEN) + dip->di_size = cpu_to_be64(XFS_SYMLINK_MAXLEN); + else if (size == 0) + xrep_dinode_zap_symlink(dip); + break; + case S_IFDIR: + /* + * Directories can't have a size larger than 32G. If the size + * is zero, reset it to an empty directory. Both of these + * conditions trigger dinode verifier errors, so there is no + * in-core state to reset. + */ + if (size > XFS_DIR2_SPACE_SIZE) + dip->di_size = cpu_to_be64(XFS_DIR2_SPACE_SIZE); + else if (size == 0) + xrep_dinode_zap_dir(mp, dip); + break; + } +} + +/* Fix extent size hints. 
*/ +STATIC void +xrep_dinode_extsize_hints( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = sc->mp; + uint64_t flags2; + uint16_t flags; + uint16_t mode; + xfs_failaddr_t fa; + + mode = be16_to_cpu(dip->di_mode); + flags = be16_to_cpu(dip->di_flags); + flags2 = be64_to_cpu(dip->di_flags2); + + fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), + mode, flags); + if (fa) { + dip->di_extsize = 0; + dip->di_flags &= ~cpu_to_be16(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + } + + if (dip->di_version < 3) + return; + + fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize), + mode, flags, flags2); + if (fa) { + dip->di_cowextsize = 0; + dip->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE); + } +} + +/* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */ +STATIC int +xrep_dinode_core( + struct xfs_scrub *sc) +{ + struct xfs_imap imap; + struct xfs_buf *bp; + struct xfs_dinode *dip; + xfs_ino_t ino; + bool inuse; + int error; + + /* Map & read inode. */ + ino = sc->sm->sm_ino; + error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED); + if (error) + return error; + + error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp, + imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp, NULL); + if (error) + return error; + + /* Make absolutely sure this inode isn't in core. */ + error = xfs_icache_inode_is_allocated(sc->mp, sc->tp, ino, &inuse); + if (error == 0) { + ASSERT(0); + return -EFSCORRUPTED; + } + + /* Make sure we can pass the inode buffer verifier. */ + xrep_dinode_buf(sc, bp); + bp->b_ops = &xfs_inode_buf_ops; + + /* Fix everything the verifier will complain about. */ + dip = xfs_buf_offset(bp, imap.im_boffset); + xrep_dinode_header(sc, dip); + xrep_dinode_mode(dip); + xrep_dinode_flags(sc, dip); + xrep_dinode_size(sc->mp, dip); + xrep_dinode_extsize_hints(sc, dip); + + /* Write out the inode... 
*/ + xfs_dinode_calc_crc(sc->mp, dip); + xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_DINO_BUF); + xfs_trans_log_buf(sc->tp, bp, imap.im_boffset, + imap.im_boffset + sc->mp->m_sb.sb_inodesize - 1); + error = xfs_trans_commit(sc->tp); + if (error) + return error; + sc->tp = NULL; + + /* ...and reload it? */ + error = xfs_iget(sc->mp, sc->tp, ino, + XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &sc->ip); + if (error) + return error; + sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; + xfs_ilock(sc->ip, sc->ilock_flags); + error = xchk_trans_alloc(sc, 0); + if (error) + return error; + sc->ilock_flags |= XFS_ILOCK_EXCL; + xfs_ilock(sc->ip, XFS_ILOCK_EXCL); + + return 0; +} + +/* Fix everything xfs_dinode_verify cares about. */ +STATIC int +xrep_dinode_problems( + struct xfs_scrub *sc) +{ + int error; + + error = xrep_dinode_core(sc); + if (error) + return error; + + /* We had to fix a totally busted inode, schedule quotacheck. */ + if (XFS_IS_UQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_USER); + if (XFS_IS_GQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_GROUP); + if (XFS_IS_PQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_PROJ); + + return 0; +} + +/* + * Fix problems that the verifiers don't care about. In general these are + * errors that don't cause problems elsewhere in the kernel that we can easily + * detect, so we don't check them all that rigorously. + */ + +/* Make sure block and extent counts are ok. */ +STATIC int +xrep_inode_blockcounts( + struct xfs_scrub *sc) +{ + xfs_filblks_t count; + xfs_filblks_t acount; + xfs_extnum_t nextents; + int error; + + /* Set data fork counters from the data fork mappings. 
*/ + error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, + &nextents, &count); + if (error) + return error; + if (XFS_IS_REALTIME_INODE(sc->ip)) { + if (count >= sc->mp->m_sb.sb_rblocks) + return -EFSCORRUPTED; + } else if (!xfs_sb_version_hasreflink(&sc->mp->m_sb)) { + if (count >= sc->mp->m_sb.sb_dblocks) + return -EFSCORRUPTED; + } + sc->ip->i_d.di_nextents = nextents; + + /* Set attr fork counters from the attr fork mappings. */ + error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, + &nextents, &acount); + if (error) + return error; + if (acount >= sc->mp->m_sb.sb_dblocks) + return -EFSCORRUPTED; + if (nextents >= (uint16_t)-1U) + return -EFSCORRUPTED; + sc->ip->i_d.di_anextents = nextents; + + sc->ip->i_d.di_nblocks = count + acount; + + /* + * If we found attr fork extents but no attr fork root, zero the + * attr fork extent count so that the attr fork repair will run. + */ + if (sc->ip->i_d.di_anextents != 0 && sc->ip->i_d.di_forkoff == 0) + sc->ip->i_d.di_anextents = 0; + + return 0; +} + +/* Check for invalid uid/gid. Note that a -1U projid is allowed. */ +STATIC void +xrep_inode_ids( + struct xfs_scrub *sc) +{ + if (sc->ip->i_d.di_uid == -1U) { + sc->ip->i_d.di_uid = 0; + VFS_I(sc->ip)->i_mode &= ~(S_ISUID | S_ISGID); + if (XFS_IS_UQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_USER); + } + + if (sc->ip->i_d.di_gid == -1U) { + sc->ip->i_d.di_gid = 0; + VFS_I(sc->ip)->i_mode &= ~(S_ISUID | S_ISGID); + if (XFS_IS_GQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_GROUP); + } +} + +/* Nanosecond counters can't have more than 1 billion.
*/ +STATIC void +xrep_inode_timestamps( + struct xfs_inode *ip) +{ + if ((unsigned long)VFS_I(ip)->i_atime.tv_nsec >= NSEC_PER_SEC) + VFS_I(ip)->i_atime.tv_nsec = 0; + if ((unsigned long)VFS_I(ip)->i_mtime.tv_nsec >= NSEC_PER_SEC) + VFS_I(ip)->i_mtime.tv_nsec = 0; + if ((unsigned long)VFS_I(ip)->i_ctime.tv_nsec >= NSEC_PER_SEC) + VFS_I(ip)->i_ctime.tv_nsec = 0; + if (ip->i_d.di_version > 2 && + (unsigned long)ip->i_d.di_crtime.t_nsec >= NSEC_PER_SEC) + ip->i_d.di_crtime.t_nsec = 0; +} + +/* Fix inode flags that don't make sense together. */ +STATIC void +xrep_inode_flags( + struct xfs_scrub *sc) +{ + uint16_t mode; + + mode = VFS_I(sc->ip)->i_mode; + + /* Clear junk flags */ + if (sc->ip->i_d.di_flags & ~XFS_DIFLAG_ANY) + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_ANY; + + /* NEWRTBM only applies to realtime bitmaps */ + if (sc->ip->i_ino == sc->mp->m_sb.sb_rbmino) + sc->ip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM; + else + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_NEWRTBM; + + /* These only make sense for directories. */ + if (!S_ISDIR(mode)) + sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_RTINHERIT | + XFS_DIFLAG_EXTSZINHERIT | + XFS_DIFLAG_PROJINHERIT | + XFS_DIFLAG_NOSYMLINKS); + + /* These only make sense for files. */ + if (!S_ISREG(mode)) + sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_REALTIME | + XFS_DIFLAG_EXTSIZE); + + /* These only make sense for non-rt files. */ + if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME) + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_FILESTREAM; + + /* Immutable and append only? Drop the append. */ + if ((sc->ip->i_d.di_flags & XFS_DIFLAG_IMMUTABLE) && + (sc->ip->i_d.di_flags & XFS_DIFLAG_APPEND)) + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_APPEND; + + if (sc->ip->i_d.di_version < 3) + return; + + /* Clear junk flags. */ + if (sc->ip->i_d.di_flags2 & ~XFS_DIFLAG2_ANY) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_ANY; + + /* No reflink flag unless we support it and it's a file. 
*/ + if (!xfs_sb_version_hasreflink(&sc->mp->m_sb) || + !S_ISREG(mode)) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + + /* DAX only applies to files and dirs. */ + if (!(S_ISREG(mode) || S_ISDIR(mode))) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX; + + /* No reflink files on the realtime device. */ + if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + + /* No mixing reflink and DAX yet. */ + if (sc->ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX; +} + +/* Fix size problems with block/node format directories. */ +STATIC int +xrep_inode_blockdir_size( + struct xfs_scrub *sc) +{ + struct xfs_iext_cursor icur; + struct xfs_bmbt_irec got; + struct xfs_ifork *ifp; + xfs_fileoff_t off; + int error; + + /* Find the last block before 32G; this is the dir size. */ + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + if (!(ifp->if_flags & XFS_IFEXTENTS)) { + error = xfs_iread_extents(sc->tp, sc->ip, XFS_DATA_FORK); + if (error) + return error; + } + + off = XFS_B_TO_FSB(sc->mp, XFS_DIR2_SPACE_SIZE); + if (!xfs_iext_lookup_extent_before(sc->ip, ifp, &off, &icur, &got)) { + /* zero-extents directory? */ + return -EFSCORRUPTED; + } + + off = got.br_startoff + got.br_blockcount; + sc->ip->i_d.di_size = min_t(loff_t, XFS_DIR2_SPACE_SIZE, + XFS_FSB_TO_B(sc->mp, off)); + return 0; +} + +/* Fix size problems with short format directories. */ +STATIC int +xrep_inode_sfdir_size( + struct xfs_scrub *sc) +{ + struct xfs_ifork *ifp; + + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + sc->ip->i_d.di_size = ifp->if_bytes; + return 0; +} + +/* + * Fix any irregularities in an inode's size now that we can iterate extent + * maps and access other regular inode data. + */ +STATIC int +xrep_inode_size( + struct xfs_scrub *sc) +{ + /* + * Currently we only support fixing size on extents or btree format + * directories. 
Files can be any size and sizes for the other inode + * special types are fixed by xrep_dinode_size. + */ + if (!S_ISDIR(VFS_I(sc->ip)->i_mode)) + return 0; + switch (XFS_IFORK_FORMAT(sc->ip, XFS_DATA_FORK)) { + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + return xrep_inode_blockdir_size(sc); + case XFS_DINODE_FMT_LOCAL: + return xrep_inode_sfdir_size(sc); + default: + return 0; + } +} + +/* Fix any irregularities in an inode that the verifiers don't catch. */ +STATIC int +xrep_inode_problems( + struct xfs_scrub *sc) +{ + int error; + + error = xrep_inode_blockcounts(sc); + if (error) + return error; + xrep_inode_timestamps(sc->ip); + xrep_inode_flags(sc); + xrep_inode_ids(sc); + error = xrep_inode_size(sc); + if (error) + return error; + xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + return xfs_trans_roll_inode(&sc->tp, sc->ip); +} + +/* Repair an inode's fields. */ +int +xrep_inode( + struct xfs_scrub *sc) +{ + int error = 0; + + /* + * No inode? That means we failed the _iget verifiers. Repair all + * the things that the inode verifiers care about, then retry _iget. + */ + if (!sc->ip) { + error = xrep_dinode_problems(sc); + if (error) + goto out; + } + + /* By this point we had better have a working incore inode. */ + ASSERT(sc->ip); + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* If we found corruption of any kind, try to fix it. */ + if ((sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) || + (sc->sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)) { + error = xrep_inode_problems(sc); + if (error) + goto out; + } + + /* See if we can clear the reflink flag. 
*/ + if (xfs_is_reflink_inode(sc->ip)) + return xfs_reflink_clear_inode_flag(sc->ip, &sc->tp); + +out: + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index da12c20376ae..20e449c7a0df 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -65,6 +65,7 @@ int xrep_agi(struct xfs_scrub *sc); int xrep_allocbt(struct xfs_scrub *sc); int xrep_iallocbt(struct xfs_scrub *sc); int xrep_refcountbt(struct xfs_scrub *sc); +int xrep_inode(struct xfs_scrub *sc); #else @@ -102,6 +103,7 @@ xrep_reset_perag_resv( #define xrep_allocbt xrep_notsupported #define xrep_iallocbt xrep_notsupported #define xrep_refcountbt xrep_notsupported +#define xrep_inode xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 843eafe0acef..ae922801808d 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -271,7 +271,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_inode, .scrub = xchk_inode, - .repair = xrep_notsupported, + .repair = xrep_inode, }, [XFS_SCRUB_TYPE_BMBTD] = { /* inode data fork */ .type = ST_INODE, From patchwork Thu Jul 26 00:21:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544993 Subject: [PATCH 11/16] xfs: zap broken inode forks From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:21:20 -0700 Message-ID: <153256447998.29021.6678908057246548154.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> From: Darrick J. 
Wong Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_attr_leaf.c | 32 ++- fs/xfs/libxfs/xfs_attr_leaf.h | 2 fs/xfs/libxfs/xfs_bmap.c | 21 ++ fs/xfs/libxfs/xfs_bmap.h | 2 fs/xfs/scrub/inode_repair.c | 401 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 439 insertions(+), 19 deletions(-) diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index a673037c7d37..54e24001a2d0 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -898,23 +898,16 @@ xfs_attr_shortform_allfit( return xfs_attr_shortform_bytesfit(dp, bytes); } -/* Verify the consistency of an inline attribute fork. */ +/* Verify the consistency of a raw inline attribute fork. */ xfs_failaddr_t -xfs_attr_shortform_verify( - struct xfs_inode *ip) +xfs_attr_shortform_verify_struct( + struct xfs_attr_shortform *sfp, + size_t size) { - struct xfs_attr_shortform *sfp; struct xfs_attr_sf_entry *sfep; struct xfs_attr_sf_entry *next_sfep; char *endp; - struct xfs_ifork *ifp; int i; - int size; - - ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); - ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); - sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data; - size = ifp->if_bytes; /* * Give up if the attribute is way too short. @@ -972,6 +965,23 @@ xfs_attr_shortform_verify( return NULL; } +/* Verify the consistency of an inline attribute fork. 
*/ +xfs_failaddr_t +xfs_attr_shortform_verify( + struct xfs_inode *ip) +{ + struct xfs_attr_shortform *sfp; + struct xfs_ifork *ifp; + int size; + + ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); + ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); + sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data; + size = ifp->if_bytes; + + return xfs_attr_shortform_verify_struct(sfp, size); +} + /* * Convert a leaf attribute list to shortform attribute list */ diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h index 7b74e18becff..728af25a1738 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.h +++ b/fs/xfs/libxfs/xfs_attr_leaf.h @@ -41,6 +41,8 @@ int xfs_attr_shortform_to_leaf(struct xfs_da_args *args, int xfs_attr_shortform_remove(struct xfs_da_args *args); int xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp); int xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes); +xfs_failaddr_t xfs_attr_shortform_verify_struct(struct xfs_attr_shortform *sfp, + size_t size); xfs_failaddr_t xfs_attr_shortform_verify(struct xfs_inode *ip); void xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp); diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 50119b54a2b5..cf89c5cfd8f6 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6201,18 +6201,16 @@ xfs_bmap_finish_one( return error; } -/* Check that an inode's extent does not have invalid flags or bad ranges. */ +/* Check that an extent does not have invalid flags or bad ranges. 
*/ xfs_failaddr_t -xfs_bmap_validate_extent( - struct xfs_inode *ip, +xfs_bmap_validate_extent_raw( + struct xfs_mount *mp, + bool isrt, int whichfork, struct xfs_bmbt_irec *irec) { - struct xfs_mount *mp = ip->i_mount; xfs_fsblock_t endfsb; - bool isrt; - isrt = XFS_IS_REALTIME_INODE(ip); endfsb = irec->br_startblock + irec->br_blockcount - 1; if (isrt) { if (!xfs_verify_rtbno(mp, irec->br_startblock)) @@ -6236,3 +6234,14 @@ xfs_bmap_validate_extent( } return NULL; } + +/* Check that an inode's extent does not have invalid flags or bad ranges. */ +xfs_failaddr_t +xfs_bmap_validate_extent( + struct xfs_inode *ip, + int whichfork, + struct xfs_bmbt_irec *irec) +{ + return xfs_bmap_validate_extent_raw(ip->i_mount, + XFS_IS_REALTIME_INODE(ip), whichfork, irec); +} diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 9b49ddf99c41..d2c15b2f0fc9 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -284,6 +284,8 @@ static inline int xfs_bmap_fork_to_state(int whichfork) } } +xfs_failaddr_t xfs_bmap_validate_extent_raw(struct xfs_mount *mp, bool isrt, + int whichfork, struct xfs_bmbt_irec *irec); xfs_failaddr_t xfs_bmap_validate_extent(struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *irec); diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 5ff929ea3d11..48918f09ebc9 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -22,11 +22,15 @@ #include "xfs_ialloc.h" #include "xfs_da_format.h" #include "xfs_reflink.h" +#include "xfs_alloc.h" #include "xfs_rmap.h" +#include "xfs_rmap_btree.h" #include "xfs_bmap.h" +#include "xfs_bmap_btree.h" #include "xfs_bmap_util.h" #include "xfs_dir2.h" #include "xfs_quota_defs.h" +#include "xfs_attr_leaf.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -139,7 +143,8 @@ xrep_dinode_mode( STATIC void xrep_dinode_flags( struct xfs_scrub *sc, - struct xfs_dinode *dip) + struct xfs_dinode *dip, + bool is_rt_file) { 
struct xfs_mount *mp = sc->mp; uint64_t flags2; @@ -150,6 +155,11 @@ xrep_dinode_flags( flags = be16_to_cpu(dip->di_flags); flags2 = be64_to_cpu(dip->di_flags2); + if (is_rt_file) + flags |= XFS_DIFLAG_REALTIME; + else + flags &= ~XFS_DIFLAG_REALTIME; + if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode)) flags2 |= XFS_DIFLAG2_REFLINK; else @@ -288,11 +298,392 @@ xrep_dinode_extsize_hints( } } +/* Blocks and extents associated with an inode, according to rmap records. */ +struct xrep_dinode_stats { + struct xfs_scrub *sc; + + /* Blocks in use on the data device by data extents or bmbt blocks. */ + xfs_rfsblock_t data_blocks; + + /* Blocks in use on the rt device. */ + xfs_rfsblock_t rt_blocks; + + /* Blocks in use by the attr fork. */ + xfs_rfsblock_t attr_blocks; + + /* Number of data device extents for the data fork. */ + xfs_extnum_t data_extents; + + /* + * Number of realtime device extents for the data fork. If + * data_extents and rt_extents indicate that the data fork has extents + * on both devices, we'll just back away slowly. + */ + xfs_extnum_t rt_extents; + + /* Number of (data device) extents for the attr fork. */ + xfs_aextnum_t attr_extents; +}; + +/* Count extents and blocks for an inode given an rmap. */ +STATIC int +xrep_dinode_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_dinode_stats *dis = priv; + + /* Is this even the right fork? */ + if (rec->rm_owner != dis->sc->sm->sm_ino) + return 0; + if (rec->rm_flags & XFS_RMAP_ATTR_FORK) { + dis->attr_blocks += rec->rm_blockcount; + if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + dis->attr_extents++; + } else { + dis->data_blocks += rec->rm_blockcount; + if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + dis->data_extents++; + } + return 0; +} + +/* Count extents and blocks for an inode from all AG rmap data. 
*/ +STATIC int +xrep_dinode_count_ag_rmaps( + struct xrep_dinode_stats *dis, + xfs_agnumber_t agno) +{ + struct xfs_btree_cur *cur; + struct xfs_buf *agf; + int error; + + error = xfs_alloc_read_agf(dis->sc->mp, dis->sc->tp, agno, 0, &agf); + if (error) + return error; + + cur = xfs_rmapbt_init_cursor(dis->sc->mp, dis->sc->tp, agf, agno); + if (!cur) { + error = -ENOMEM; + goto out_agf; + } + + error = xfs_rmap_query_all(cur, xrep_dinode_walk_rmap, dis); + if (error == XFS_BTREE_QUERY_RANGE_ABORT) + error = 0; + + xfs_btree_del_cursor(cur, error); +out_agf: + xfs_trans_brelse(dis->sc->tp, agf); + return error; +} + +/* Count extents and blocks for a given inode from all rmap data. */ +STATIC int +xrep_dinode_count_rmaps( + struct xrep_dinode_stats *dis) +{ + xfs_agnumber_t agno; + int error; + + if (!xfs_sb_version_hasrmapbt(&dis->sc->mp->m_sb) || + xfs_sb_version_hasrealtime(&dis->sc->mp->m_sb)) + return -EOPNOTSUPP; + + /* XXX: find rt blocks too */ + if (dis->rt_extents != 0) { + ASSERT(0); + return -EOPNOTSUPP; + } + + for (agno = 0; agno < dis->sc->mp->m_sb.sb_agcount; agno++) { + error = xrep_dinode_count_ag_rmaps(dis, agno); + if (error) + return error; + } + + /* Can't have extents on both the rt and the data device. */ + if (dis->data_extents && dis->rt_extents) + return -EFSCORRUPTED; + + return 0; +} + +/* Return true if this extents-format ifork looks like garbage. 
*/ +STATIC bool +xrep_dinode_bad_extents_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + int dfork_size, + int whichfork) +{ + struct xfs_bmbt_irec new; + struct xfs_bmbt_rec *dp; + bool isrt; + int i; + int nex; + int fork_size; + + nex = XFS_DFORK_NEXTENTS(dip, whichfork); + fork_size = nex * sizeof(struct xfs_bmbt_rec); + if (fork_size < 0 || fork_size > dfork_size) + return true; + if (whichfork == XFS_ATTR_FORK && nex > ((uint16_t)-1U)) + return true; + dp = XFS_DFORK_PTR(dip, whichfork); + + isrt = dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME); + for (i = 0; i < nex; i++, dp++) { + xfs_failaddr_t fa; + + xfs_bmbt_disk_get_all(dp, &new); + fa = xfs_bmap_validate_extent_raw(sc->mp, isrt, whichfork, + &new); + if (fa) + return true; + } + + return false; +} + +/* Return true if this btree-format ifork looks like garbage. */ +STATIC bool +xrep_dinode_bad_btree_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + int dfork_size, + int whichfork) +{ + struct xfs_bmdr_block *dfp; + int nrecs; + int level; + + if (XFS_DFORK_NEXTENTS(dip, whichfork) <= + dfork_size / sizeof(struct xfs_bmbt_rec)) + return true; + + dfp = XFS_DFORK_PTR(dip, whichfork); + nrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size) + return true; + if (level == 0 || level > XFS_BTREE_MAXLEVELS) + return true; + return false; +} + +/* + * Check the data fork for things that will fail the ifork verifiers or the + * ifork formatters. 
+ */ +STATIC bool +xrep_dinode_check_dfork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + uint16_t mode) +{ + uint64_t size; + unsigned int fmt; + int dfork_size; + + fmt = XFS_DFORK_FORMAT(dip, XFS_DATA_FORK); + size = be64_to_cpu(dip->di_size); + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + if (fmt != XFS_DINODE_FMT_DEV) + return true; + break; + case S_IFREG: + if (fmt == XFS_DINODE_FMT_LOCAL) + return true; + /* fall through */ + case S_IFLNK: + case S_IFDIR: + switch (fmt) { + case XFS_DINODE_FMT_LOCAL: + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + break; + default: + return true; + } + break; + default: + return true; + } + dfork_size = XFS_DFORK_SIZE(dip, sc->mp, XFS_DATA_FORK); + switch (fmt) { + case XFS_DINODE_FMT_DEV: + break; + case XFS_DINODE_FMT_LOCAL: + if (size > dfork_size) + return true; + break; + case XFS_DINODE_FMT_EXTENTS: + if (xrep_dinode_bad_extents_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; + case XFS_DINODE_FMT_BTREE: + if (xrep_dinode_bad_btree_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; + default: + return true; + } + + return false; +} + +/* Reset the data fork to something sane. */ +STATIC void +xrep_dinode_zap_dfork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + uint16_t mode, + struct xrep_dinode_stats *dis) +{ + /* Special files always get reset to DEV */ + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + dip->di_format = XFS_DINODE_FMT_DEV; + dip->di_size = 0; + return; + } + + /* + * If we have data extents, reset to an empty map and hope the user + * will run the bmapbtd checker next. + */ + if (dis->data_extents || dis->rt_extents || S_ISREG(mode)) { + dip->di_format = XFS_DINODE_FMT_EXTENTS; + dip->di_nextents = 0; + return; + } + + /* Otherwise, reset the local format to the minimum. 
*/ + switch (mode & S_IFMT) { + case S_IFLNK: + xrep_dinode_zap_symlink(dip); + break; + case S_IFDIR: + xrep_dinode_zap_dir(sc->mp, dip); + break; + } +} + +/* + * Check the attr fork for things that will fail the ifork verifiers or the + * ifork formatters. + */ +STATIC bool +xrep_dinode_check_afork( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + struct xfs_attr_shortform *sfp; + int size; + + if (XFS_DFORK_BOFF(dip) == 0) + return dip->di_aformat != XFS_DINODE_FMT_EXTENTS || + dip->di_anextents != 0; + + size = XFS_DFORK_SIZE(dip, sc->mp, XFS_ATTR_FORK); + switch (XFS_DFORK_FORMAT(dip, XFS_ATTR_FORK)) { + case XFS_DINODE_FMT_LOCAL: + sfp = XFS_DFORK_PTR(dip, XFS_ATTR_FORK); + return xfs_attr_shortform_verify_struct(sfp, size) != NULL; + case XFS_DINODE_FMT_EXTENTS: + if (xrep_dinode_bad_extents_fork(sc, dip, size, XFS_ATTR_FORK)) + return true; + break; + case XFS_DINODE_FMT_BTREE: + if (xrep_dinode_bad_btree_fork(sc, dip, size, XFS_ATTR_FORK)) + return true; + break; + default: + return true; + } + + return false; +} + +/* Reset the attr fork to something sane. */ +STATIC void +xrep_dinode_zap_afork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + struct xrep_dinode_stats *dis) +{ + dip->di_aformat = XFS_DINODE_FMT_EXTENTS; + dip->di_anextents = 0; + /* + * We leave a nonzero forkoff so that the bmap scrub will look for + * attr rmaps. + */ + dip->di_forkoff = dis->attr_extents ? 1 : 0; +} + +/* + * Zap the data/attr forks if we spot anything that isn't going to pass the + * ifork verifiers or the ifork formatters, because we need to get the inode + * into good enough shape that the higher level repair functions can run. + */ +STATIC void +xrep_dinode_zap_forks( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + struct xrep_dinode_stats *dis) +{ + uint16_t mode; + bool zap_datafork = false; + bool zap_attrfork = false; + + mode = be16_to_cpu(dip->di_mode); + + /* Inode counters don't make sense? 
*/ + if (be32_to_cpu(dip->di_nextents) > be64_to_cpu(dip->di_nblocks)) + zap_datafork = true; + if (be16_to_cpu(dip->di_anextents) > be64_to_cpu(dip->di_nblocks)) + zap_attrfork = true; + if (be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) > + be64_to_cpu(dip->di_nblocks)) + zap_datafork = zap_attrfork = true; + + if (!zap_datafork) + zap_datafork = xrep_dinode_check_dfork(sc, dip, mode); + if (!zap_attrfork) + zap_attrfork = xrep_dinode_check_afork(sc, dip); + + /* Zap whatever's bad. */ + if (zap_attrfork) + xrep_dinode_zap_afork(sc, dip, dis); + if (zap_datafork) + xrep_dinode_zap_dfork(sc, dip, mode, dis); + dip->di_nblocks = 0; + if (!zap_attrfork) + be64_add_cpu(&dip->di_nblocks, dis->attr_blocks); + if (!zap_datafork) { + be64_add_cpu(&dip->di_nblocks, dis->data_blocks); + be64_add_cpu(&dip->di_nblocks, dis->rt_blocks); + } +} + /* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */ STATIC int xrep_dinode_core( struct xfs_scrub *sc) { + struct xrep_dinode_stats dis = { .sc = sc }; struct xfs_imap imap; struct xfs_buf *bp; struct xfs_dinode *dip; @@ -300,6 +691,11 @@ xrep_dinode_core( bool inuse; int error; + /* Figure out what this inode had mapped in both forks. */ + error = xrep_dinode_count_rmaps(&dis); + if (error) + return error; + /* Map & read inode. */ ino = sc->sm->sm_ino; error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED); @@ -326,9 +722,10 @@ xrep_dinode_core( dip = xfs_buf_offset(bp, imap.im_boffset); xrep_dinode_header(sc, dip); xrep_dinode_mode(dip); - xrep_dinode_flags(sc, dip); + xrep_dinode_flags(sc, dip, dis.rt_extents > 0); xrep_dinode_size(sc->mp, dip); xrep_dinode_extsize_hints(sc, dip); + xrep_dinode_zap_forks(sc, dip, &dis); /* Write out the inode... */ xfs_dinode_calc_crc(sc->mp, dip); From patchwork Thu Jul 26 00:21:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10544995 Subject: [PATCH 12/16] xfs: repair inode block maps From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:21:26 -0700 Message-ID: <153256448640.29021.7539784905847600388.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> From: Darrick J. Wong Use the reverse-mapping btree information to rebuild an inode fork. 
Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/bmap.c | 22 ++ fs/xfs/scrub/bmap_repair.c | 520 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 4 fs/xfs/scrub/scrub.c | 4 fs/xfs/scrub/trace.h | 2 fs/xfs/xfs_trans.c | 54 +++++ fs/xfs/xfs_trans.h | 2 8 files changed, 606 insertions(+), 3 deletions(-) create mode 100644 fs/xfs/scrub/bmap_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index e01b5003d543..7f5467bb18b9 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -166,6 +166,7 @@ xfs-y += $(addprefix scrub/, \ agheader_repair.o \ alloc_repair.o \ bitmap.o \ + bmap_repair.o \ ialloc_repair.o \ inode_repair.o \ refcount_repair.o \ diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index e1d11f3223e3..6659f41e7b4c 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -37,6 +37,7 @@ xchk_setup_inode_bmap( struct xfs_scrub *sc, struct xfs_inode *ip) { + bool is_repair = false; int error; error = xchk_get_inode(sc, ip); @@ -46,6 +47,10 @@ xchk_setup_inode_bmap( sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; xfs_ilock(sc->ip, sc->ilock_flags); +#ifdef CONFIG_XFS_ONLINE_REPAIR + is_repair = (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR); +#endif + /* * We don't want any ephemeral data fork updates sitting around * while we inspect block mappings, so wait for directio to finish @@ -53,10 +58,27 @@ xchk_setup_inode_bmap( */ if (S_ISREG(VFS_I(sc->ip)->i_mode) && sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) { + /* Break all our leases, we're going to mess with things. 
*/ + if (is_repair) { + error = xfs_break_layouts(VFS_I(sc->ip), + &sc->ilock_flags, BREAK_UNMAP); + if (error) + goto out; + } + inode_dio_wait(VFS_I(sc->ip)); error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping); if (error) goto out; + + /* Drop the page cache if we're repairing block mappings. */ + if (is_repair) { + error = invalidate_inode_pages2( + VFS_I(sc->ip)->i_mapping); + if (error) + goto out; + } + } /* Got the inode, lock it and we're ready to go. */ diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c new file mode 100644 index 000000000000..9b0172be0335 --- /dev/null +++ b/fs/xfs/scrub/bmap_repair.c @@ -0,0 +1,520 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_inode_fork.h" +#include "xfs_alloc.h" +#include "xfs_rtalloc.h" +#include "xfs_bmap.h" +#include "xfs_bmap_util.h" +#include "xfs_bmap_btree.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_quota.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Inode fork block mapping (BMBT) repair. + * + * Basically, we gather all the rmap records for the inode and fork we're + * fixing, reset the incore fork, then re-add all the records. + */ + +struct xrep_bmap_extent { + struct list_head list; + struct xfs_rmap_irec rmap; + xfs_agnumber_t agno; +}; + +struct xrep_bmap { + /* List of new bmap records. 
*/ + struct list_head *extlist; + + /* Old bmbt blocks */ + struct xfs_bitmap *btlist; + + struct xfs_scrub *sc; + + /* Inode we're fixing. */ + xfs_ino_t ino; + + /* How many blocks did we find in the other fork? */ + xfs_rfsblock_t otherfork_blocks; + + /* How many bmbt blocks did we find for this fork? */ + xfs_rfsblock_t bmbt_blocks; + + /* How many extents did we find for this fork? */ + xfs_extnum_t extents; + + /* Which fork are we fixing? */ + int whichfork; +}; + +/* Record extents that belong to this inode's fork. */ +STATIC int +xrep_bmap_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_bmap *rb = priv; + struct xrep_bmap_extent *rbe; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + int error = 0; + + if (xchk_should_terminate(rb->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rb->ino) { + return 0; + } else if (rb->whichfork == XFS_DATA_FORK && + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) { + rb->otherfork_blocks += rec->rm_blockcount; + return 0; + } else if (rb->whichfork == XFS_ATTR_FORK && + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) { + rb->otherfork_blocks += rec->rm_blockcount; + return 0; + } + + /* Delete the old bmbt blocks later. */ + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) { + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_private.a.agno, + rec->rm_startblock); + rb->bmbt_blocks += rec->rm_blockcount; + return xfs_bitmap_set(rb->btlist, fsbno, rec->rm_blockcount); + } + + /* Remember this rmap. 
*/ + rb->extents++; + trace_xrep_bmap_walk_rmap(mp, cur->bc_private.a.agno, + rec->rm_startblock, rec->rm_blockcount, rec->rm_owner, + rec->rm_offset, rec->rm_flags); + + rbe = kmem_alloc(sizeof(struct xrep_bmap_extent), KM_MAYFAIL); + if (!rbe) + return -ENOMEM; + + INIT_LIST_HEAD(&rbe->list); + rbe->rmap = *rec; + rbe->agno = cur->bc_private.a.agno; + list_add_tail(&rbe->list, rb->extlist); + + return 0; +} + +/* Compare two bmap extents. */ +static int +xrep_bmap_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_bmap_extent *ap; + struct xrep_bmap_extent *bp; + + ap = container_of(a, struct xrep_bmap_extent, list); + bp = container_of(b, struct xrep_bmap_extent, list); + + if (ap->rmap.rm_offset > bp->rmap.rm_offset) + return 1; + else if (ap->rmap.rm_offset < bp->rmap.rm_offset) + return -1; + return 0; +} + +/* Scan one AG for reverse mappings that we can turn into extent maps. */ +STATIC int +xrep_bmap_scan_ag( + struct xrep_bmap *rb, + xfs_agnumber_t agno) +{ + struct xfs_scrub *sc = rb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_buf *agf_bp = NULL; + struct xfs_btree_cur *cur; + int error; + + error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp); + if (error) + return error; + if (!agf_bp) + return -ENOMEM; + cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, agno); + error = xfs_rmap_query_all(cur, xrep_bmap_walk_rmap, rb); + if (error == XFS_BTREE_QUERY_RANGE_ABORT) + error = 0; + xfs_btree_del_cursor(cur, error); + xfs_trans_brelse(sc->tp, agf_bp); + return error; +} + +/* Insert bmap records into an inode fork, given an rmap. */ +STATIC int +xrep_bmap_insert_rec( + struct xfs_scrub *sc, + struct xrep_bmap_extent *rbe, + int baseflags) +{ + struct xfs_bmbt_irec bmap; + struct xfs_defer_ops dfops; + xfs_fsblock_t firstfsb; + xfs_extlen_t extlen; + int flags; + int error = 0; + + /* Form the "new" mapping... 
*/ + bmap.br_startblock = XFS_AGB_TO_FSB(sc->mp, rbe->agno, + rbe->rmap.rm_startblock); + bmap.br_startoff = rbe->rmap.rm_offset; + + flags = 0; + if (rbe->rmap.rm_flags & XFS_RMAP_UNWRITTEN) + flags = XFS_BMAPI_PREALLOC; + while (rbe->rmap.rm_blockcount > 0) { + xfs_defer_init(&dfops, &firstfsb); + extlen = min_t(xfs_extlen_t, rbe->rmap.rm_blockcount, + MAXEXTLEN); + bmap.br_blockcount = extlen; + + /* Re-add the extent to the fork. */ + error = xfs_bmapi_remap(sc->tp, sc->ip, bmap.br_startoff, + extlen, bmap.br_startblock, &dfops, + baseflags | flags); + if (error) + goto out_cancel; + + bmap.br_startblock += extlen; + bmap.br_startoff += extlen; + rbe->rmap.rm_blockcount -= extlen; + error = xfs_defer_ijoin(&dfops, sc->ip); + if (error) + goto out_cancel; + error = xfs_defer_finish(&sc->tp, &dfops); + if (error) + goto out; + /* Make sure we roll the transaction. */ + error = xfs_trans_roll_inode(&sc->tp, sc->ip); + if (error) + goto out; + } + + return 0; +out_cancel: + xfs_defer_cancel(&dfops); +out: + return error; +} + +/* Check for garbage inputs. */ +STATIC int +xrep_bmap_check_inputs( + struct xfs_scrub *sc, + int whichfork) +{ + ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK); + + /* Don't know how to repair the other fork formats. */ + if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS && + XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE) + return -EOPNOTSUPP; + + /* + * If there's no attr fork area in the inode, there's no attr fork to + * rebuild. + */ + if (whichfork == XFS_ATTR_FORK) { + if (!XFS_IFORK_Q(sc->ip)) + return -ENOENT; + return 0; + } + + /* Only files, symlinks, and directories get to have data forks. */ + switch (VFS_I(sc->ip)->i_mode & S_IFMT) { + case S_IFREG: + case S_IFDIR: + case S_IFLNK: + /* ok */ + break; + default: + return -EINVAL; + } + + /* If we somehow have delalloc extents, forget it. 
*/ + if (sc->ip->i_delayed_blks) + return -EBUSY; + + /* Don't know how to rebuild realtime data forks. */ + if (XFS_IS_REALTIME_INODE(sc->ip)) + return -EOPNOTSUPP; + + return 0; +} + +/* + * Collect block mappings for this fork of this inode and decide if we have + * enough space to rebuild. Caller is responsible for cleaning up the list if + * anything goes wrong. + */ +STATIC int +xrep_bmap_find_mappings( + struct xfs_scrub *sc, + int whichfork, + struct list_head *mapping_records, + struct xfs_bitmap *old_bmbt_blocks, + xfs_rfsblock_t *old_bmbt_block_count, + xfs_rfsblock_t *otherfork_blocks) +{ + struct xrep_bmap rb; + xfs_agnumber_t agno; + unsigned int resblks; + int error; + + memset(&rb, 0, sizeof(rb)); + rb.extlist = mapping_records; + rb.btlist = old_bmbt_blocks; + rb.ino = sc->ip->i_ino; + rb.whichfork = whichfork; + rb.sc = sc; + + /* Iterate the rmaps for extents. */ + for (agno = 0; agno < sc->mp->m_sb.sb_agcount; agno++) { + error = xrep_bmap_scan_ag(&rb, agno); + if (error) + return error; + } + + /* + * Guess how many blocks we're going to need to rebuild an entire bmap + * from the number of extents we found, and pump up our transaction to + * have sufficient block reservation. + */ + resblks = xfs_bmbt_calc_size(sc->mp, rb.extents); + error = xfs_trans_reserve_more(sc->tp, resblks, 0); + if (error) + return error; + + *otherfork_blocks = rb.otherfork_blocks; + *old_bmbt_block_count = rb.bmbt_blocks; + return 0; +} + +/* Update the inode counters. */ +STATIC int +xrep_bmap_reset_counters( + struct xfs_scrub *sc, + xfs_rfsblock_t old_bmbt_block_count, + xfs_rfsblock_t otherfork_blocks, + int *log_flags) +{ + int error; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* + * We're going to use the bmap routines to reconstruct a fork from rmap + * records. Those functions increment di_nblocks for us, so we need to + * subtract out all the data and bmbt blocks from the fork we're about + * to rebuild. 
otherfork_blocks reflects all the data and bmbt blocks + * for the other fork, so this assignment effectively performs the + * subtraction for us. + */ + sc->ip->i_d.di_nblocks = otherfork_blocks; + *log_flags |= XFS_ILOG_CORE; + + if (!old_bmbt_block_count) + return 0; + + /* Release quota counts for the old bmbt blocks. */ + error = xrep_ino_dqattach(sc); + if (error) + return error; + xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT, + -(int64_t)old_bmbt_block_count); + return 0; +} + +/* Initialize a new fork and implant it in the inode. */ +STATIC void +xrep_bmap_reset_fork( + struct xfs_scrub *sc, + int whichfork, + bool has_mappings, + int *log_flags) +{ + /* Set us back to extents format with zero records. */ + XFS_IFORK_FMT_SET(sc->ip, whichfork, XFS_DINODE_FMT_EXTENTS); + XFS_IFORK_NEXT_SET(sc->ip, whichfork, 0); + + /* Reinitialize the in-core fork. */ + if (XFS_IFORK_PTR(sc->ip, whichfork) != NULL) + xfs_idestroy_fork(sc->ip, whichfork); + if (whichfork == XFS_DATA_FORK) { + memset(&sc->ip->i_df, 0, sizeof(struct xfs_ifork)); + sc->ip->i_df.if_flags |= XFS_IFEXTENTS; + } else if (whichfork == XFS_ATTR_FORK) { + if (has_mappings) { + sc->ip->i_afp = NULL; + } else { + sc->ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, + KM_SLEEP); + sc->ip->i_afp->if_flags |= XFS_IFEXTENTS; + } + } + + /* + * Now that we've reinitialized the in-memory fork and set the inode + * back to extents format with zero extents, any extents that we + * subsequently map into the file will reinitialize the on-disk fork + * area for us. All we have to do is log the inode core to preserve + * the format and extent count fields. + */ + *log_flags |= XFS_ILOG_CORE; +} + +/* Make our changes permanent so that we can start rebuilding the fork. 
*/ +STATIC int +xrep_bmap_commit_new( + struct xfs_scrub *sc, + int log_flags) +{ + xfs_trans_log_inode(sc->tp, sc->ip, log_flags); + return xfs_trans_roll_inode(&sc->tp, sc->ip); +} + +/* Build new fork mappings and dispose of the old bmbt blocks. */ +STATIC int +xrep_bmap_rebuild_tree( + struct xfs_scrub *sc, + int whichfork, + struct list_head *mapping_records, + struct xfs_bitmap *old_bmbt_blocks) +{ + struct xfs_owner_info oinfo; + struct xrep_bmap_extent *rbe; + struct xrep_bmap_extent *n; + int baseflags; + int error; + + baseflags = XFS_BMAPI_NORMAP; + if (whichfork == XFS_ATTR_FORK) + baseflags |= XFS_BMAPI_ATTRFORK; + + /* "Remap" the extents into the fork. */ + list_sort(NULL, mapping_records, xrep_bmap_extent_cmp); + list_for_each_entry_safe(rbe, n, mapping_records, list) { + error = xrep_bmap_insert_rec(sc, rbe, baseflags); + if (error) + return error; + list_del(&rbe->list); + kmem_free(rbe); + } + + /* Dispose of all the old bmbt blocks. */ + xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, whichfork); + return xrep_reap_extents(sc, old_bmbt_blocks, &oinfo, + XFS_AG_RESV_NONE); +} + +/* Free every record in the mapping list. */ +STATIC void +xrep_bmap_cancel_recs( + struct list_head *recs) +{ + struct xrep_bmap_extent *rbe; + struct xrep_bmap_extent *n; + + list_for_each_entry_safe(rbe, n, recs, list) { + list_del(&rbe->list); + kmem_free(rbe); + } +} + +/* Repair an inode fork. */ +STATIC int +xrep_bmap( + struct xfs_scrub *sc, + int whichfork) +{ + struct list_head mapping_records; + struct xfs_bitmap old_bmbt_blocks; + xfs_rfsblock_t old_bmbt_block_count; + xfs_rfsblock_t otherfork_blocks; + int log_flags = 0; + int error = 0; + + error = xrep_bmap_check_inputs(sc, whichfork); + if (error) + return error; + + /* Collect all reverse mappings for this fork's extents. 
*/ + INIT_LIST_HEAD(&mapping_records); + xfs_bitmap_init(&old_bmbt_blocks); + error = xrep_bmap_find_mappings(sc, whichfork, &mapping_records, + &old_bmbt_blocks, &old_bmbt_block_count, + &otherfork_blocks); + if (error) + goto out; + + /* + * Blow out the in-core fork and zero the on-disk fork. This is the + * point at which we are no longer able to bail out gracefully. + */ + error = xrep_bmap_reset_counters(sc, old_bmbt_block_count, + otherfork_blocks, &log_flags); + if (error) + goto out; + xrep_bmap_reset_fork(sc, whichfork, list_empty(&mapping_records), + &log_flags); + error = xrep_bmap_commit_new(sc, log_flags); + if (error) + goto out; + + /* Now rebuild the fork extent map information. */ + error = xrep_bmap_rebuild_tree(sc, whichfork, &mapping_records, + &old_bmbt_blocks); +out: + xfs_bitmap_destroy(&old_bmbt_blocks); + xrep_bmap_cancel_recs(&mapping_records); + return error; +} + +/* Repair an inode's data fork. */ +int +xrep_bmap_data( + struct xfs_scrub *sc) +{ + return xrep_bmap(sc, XFS_DATA_FORK); +} + +/* Repair an inode's attr fork. 
*/ +int +xrep_bmap_attr( + struct xfs_scrub *sc) +{ + return xrep_bmap(sc, XFS_ATTR_FORK); +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 20e449c7a0df..38444fec70db 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -66,6 +66,8 @@ int xrep_allocbt(struct xfs_scrub *sc); int xrep_iallocbt(struct xfs_scrub *sc); int xrep_refcountbt(struct xfs_scrub *sc); int xrep_inode(struct xfs_scrub *sc); +int xrep_bmap_data(struct xfs_scrub *sc); +int xrep_bmap_attr(struct xfs_scrub *sc); #else @@ -104,6 +106,8 @@ xrep_reset_perag_resv( #define xrep_iallocbt xrep_notsupported #define xrep_refcountbt xrep_notsupported #define xrep_inode xrep_notsupported +#define xrep_bmap_data xrep_notsupported +#define xrep_bmap_attr xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index ae922801808d..45af20a3ab50 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -277,13 +277,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_inode_bmap, .scrub = xchk_bmap_data, - .repair = xrep_notsupported, + .repair = xrep_bmap_data, }, [XFS_SCRUB_TYPE_BMBTA] = { /* inode attr fork */ .type = ST_INODE, .setup = xchk_setup_inode_bmap, .scrub = xchk_bmap_attr, - .repair = xrep_notsupported, + .repair = xrep_bmap_attr, }, [XFS_SCRUB_TYPE_BMBTC] = { /* inode CoW fork */ .type = ST_INODE, diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 9126dc66f726..3383b14fd0c0 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -554,7 +554,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \ DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_ibt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn); -DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn); +DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_walk_rmap); TRACE_EVENT(xrep_refcount_extent_fn, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, diff --git a/fs/xfs/xfs_trans.c 
b/fs/xfs/xfs_trans.c index 524f543c5b82..c08785cf83a9 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -126,6 +126,60 @@ xfs_trans_dup( return ntp; } +/* + * Try to reserve more blocks for a transaction. The single use case we + * support is for online repair -- use a transaction to gather data without + * fear of btree cycle deadlocks; calculate how many blocks we really need + * from that data; and only then start modifying data. This can fail due to + * ENOSPC, so we have to be able to cancel the transaction. + */ +int +xfs_trans_reserve_more( + struct xfs_trans *tp, + uint blocks, + uint rtextents) +{ + struct xfs_mount *mp = tp->t_mountp; + bool rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0; + int error = 0; + + ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY)); + + /* + * Attempt to reserve the needed disk blocks by decrementing + * the number needed from the number available. This will + * fail if the count would go below zero. + */ + if (blocks > 0) { + error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); + if (error) + return -ENOSPC; + tp->t_blk_res += blocks; + } + + /* + * Attempt to reserve the needed realtime extents by decrementing + * the number needed from the number available. This will + * fail if the count would go below zero. + */ + if (rtextents > 0) { + error = xfs_mod_frextents(mp, -((int64_t)rtextents)); + if (error) { + error = -ENOSPC; + goto out_blocks; + } + tp->t_rtx_res += rtextents; + } + + return 0; +out_blocks: + if (blocks > 0) { + xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd); + tp->t_blk_res -= blocks; + } + return error; +} + /* * This is called to reserve free disk blocks and log space for the * given transaction. 
This must be done before allocating any resources diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 6526314f0b8f..bdbd3d5fd7b0 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -153,6 +153,8 @@ typedef struct xfs_trans { int xfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp, uint blocks, uint rtextents, uint flags, struct xfs_trans **tpp); +int xfs_trans_reserve_more(struct xfs_trans *tp, uint blocks, + uint rtextents); int xfs_trans_alloc_empty(struct xfs_mount *mp, struct xfs_trans **tpp); void xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t); From patchwork Thu Jul 26 00:21:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10544997 Subject: [PATCH 13/16] xfs: repair damaged symlinks From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:21:32 -0700 Message-ID: <153256449277.29021.6908013033818398944.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 From: Darrick J. Wong Repair inconsistent symbolic link data. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/symlink.c | 5 + fs/xfs/scrub/symlink_repair.c | 254 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_symlink.c | 151 ++++++++++++++---------- fs/xfs/xfs_symlink.h | 4 + 7 files changed, 350 insertions(+), 69 deletions(-) create mode 100644 fs/xfs/scrub/symlink_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 7f5467bb18b9..e25cde969d99 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -171,6 +171,7 @@ xfs-y += $(addprefix scrub/, \ inode_repair.o \ refcount_repair.o \ repair.o \ + symlink_repair.o \ ) endif endif diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 38444fec70db..17769efb20d9 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -68,6 +68,7 @@ int 
xrep_refcountbt(struct xfs_scrub *sc); int xrep_inode(struct xfs_scrub *sc); int xrep_bmap_data(struct xfs_scrub *sc); int xrep_bmap_attr(struct xfs_scrub *sc); +int xrep_symlink(struct xfs_scrub *sc); #else @@ -108,6 +109,7 @@ xrep_reset_perag_resv( #define xrep_inode xrep_notsupported #define xrep_bmap_data xrep_notsupported #define xrep_bmap_attr xrep_notsupported +#define xrep_symlink xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 45af20a3ab50..0a8eea77e58f 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -307,7 +307,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_symlink, .scrub = xchk_symlink, - .repair = xrep_notsupported, + .repair = xrep_symlink, }, [XFS_SCRUB_TYPE_PARENT] = { /* parent pointers */ .type = ST_INODE, diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c index f7ebaa946999..ee968c62d0f2 100644 --- a/fs/xfs/scrub/symlink.c +++ b/fs/xfs/scrub/symlink.c @@ -29,12 +29,15 @@ xchk_setup_symlink( struct xfs_scrub *sc, struct xfs_inode *ip) { + uint resblks; + /* Allocate the buffer without the inode lock held. */ sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP); if (!sc->buf) return -ENOMEM; - return xchk_setup_inode_contents(sc, ip, 0); + resblks = xfs_symlink_blocks(sc->mp, XFS_SYMLINK_MAXLEN); + return xchk_setup_inode_contents(sc, ip, resblks); } /* Symbolic links. */ diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c new file mode 100644 index 000000000000..6ebf2d5913ed --- /dev/null +++ b/fs/xfs/scrub/symlink_repair.c @@ -0,0 +1,254 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_inode_fork.h" +#include "xfs_symlink.h" +#include "xfs_bmap.h" +#include "xfs_quota.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_bmap_btree.h" +#include "xfs_trans_space.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Symbolic Link Repair + * ==================== + * + * There's not much we can do to repair symbolic links -- we truncate them to + * the first NUL byte and reinitialize the target. Zero-length symlinks are + * turned into links to the current dir. + */ + +/* Try to salvage the pathname from rmt blocks. */ +STATIC int +xrep_symlink_salvage_remote( + struct xfs_scrub *sc) +{ + struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; + struct xfs_inode *ip = sc->ip; + struct xfs_buf *bp; + char *target_buf = sc->buf; + xfs_failaddr_t fa; + xfs_filblks_t fsblocks; + xfs_daddr_t d; + loff_t len; + loff_t offset; + unsigned int byte_cnt; + bool magic_ok; + bool hdr_ok; + int n; + int nmaps = XFS_SYMLINK_MAPS; + int error; + + /* We'll only read until the buffer is full. */ + len = min_t(loff_t, ip->i_d.di_size, XFS_SYMLINK_MAXLEN); + fsblocks = xfs_symlink_blocks(sc->mp, len); + error = xfs_bmapi_read(ip, 0, fsblocks, mval, &nmaps, 0); + if (error) + return error; + + offset = 0; + for (n = 0; n < nmaps; n++) { + struct xfs_dsymlink_hdr *dsl; + + d = XFS_FSB_TO_DADDR(sc->mp, mval[n].br_startblock); + + /* Read the rmt block. We'll run the verifiers manually. 
*/ + error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp, + d, XFS_FSB_TO_BB(sc->mp, mval[n].br_blockcount), + 0, &bp, NULL); + if (error) + return error; + bp->b_ops = &xfs_symlink_buf_ops; + + /* How many bytes do we expect to get out of this buffer? */ + byte_cnt = XFS_FSB_TO_B(sc->mp, mval[n].br_blockcount); + byte_cnt = XFS_SYMLINK_BUF_SPACE(sc->mp, byte_cnt); + byte_cnt = min_t(unsigned int, byte_cnt, len); + + /* + * See if the verifiers accept this block. We're willing to + * salvage if the offset/byte/ino are ok and either the + * verifier passed or the magic is ok. Anything else and we + * stop dead in our tracks. + */ + fa = bp->b_ops->verify_struct(bp); + dsl = bp->b_addr; + magic_ok = dsl->sl_magic == cpu_to_be32(XFS_SYMLINK_MAGIC); + hdr_ok = xfs_symlink_hdr_ok(ip->i_ino, offset, byte_cnt, bp); + if (!hdr_ok || (fa != NULL && !magic_ok)) + break; + + memcpy(target_buf + offset, dsl + 1, byte_cnt); + + len -= byte_cnt; + offset += byte_cnt; + } + + /* Ensure we have a zero at the end, and /some/ contents. */ + if (offset == 0) + sprintf(target_buf, "."); + else + target_buf[offset] = 0; + return 0; +} + +/* + * Try to salvage an inline symlink's contents. Empty symlinks become a link + * to the current directory. + */ +STATIC void +xrep_symlink_salvage_inline( + struct xfs_scrub *sc) +{ + struct xfs_inode *ip = sc->ip; + struct xfs_ifork *ifp; + + ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + if (ifp->if_u1.if_data) + strncpy(sc->buf, ifp->if_u1.if_data, XFS_IFORK_DSIZE(ip)); + if (strlen(sc->buf) == 0) + sprintf(sc->buf, "."); +} + +/* Reset an inline symlink to its fresh configuration. */ +STATIC void +xrep_symlink_truncate_inline( + struct xfs_inode *ip) +{ + xfs_idestroy_fork(ip, XFS_DATA_FORK); + ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS; + ip->i_d.di_nextents = 0; + memset(&ip->i_df, 0, sizeof(struct xfs_ifork)); + ip->i_df.if_flags |= XFS_IFEXTENTS; +} + +/* + * Salvage an inline symlink's contents and reset data fork. 
+ * Returns with the inode joined to the transaction. + */ +STATIC int +xrep_symlink_inline( + struct xfs_scrub *sc) +{ + /* Salvage whatever link target information we can find. */ + xrep_symlink_salvage_inline(sc); + + /* Truncate the symlink. */ + xrep_symlink_truncate_inline(sc->ip); + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + return 0; +} + +/* + * Salvage a remote symlink's contents and truncate the data fork. + * Returns with the inode joined to the transaction. + */ +STATIC int +xrep_symlink_remote( + struct xfs_scrub *sc) +{ + int error; + + /* Salvage whatever link target information we can find. */ + error = xrep_symlink_salvage_remote(sc); + if (error) + return error; + + /* Truncate the symlink. */ + xfs_trans_ijoin(sc->tp, sc->ip, 0); + return xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK, 0); +} + +/* + * Reinitialize a link target. Caller must ensure the inode is joined to + * the transaction. + */ +STATIC int +xrep_symlink_reinitialize( + struct xfs_scrub *sc) +{ + struct xfs_defer_ops dfops; + xfs_fsblock_t first_block; + xfs_fsblock_t fs_blocks; + unsigned int target_len; + uint resblks; + int error; + + /* How many blocks do we need? */ + target_len = strlen(sc->buf); + ASSERT(target_len != 0); + if (target_len == 0 || target_len > XFS_SYMLINK_MAXLEN) + return -EFSCORRUPTED; + + /* Set up to reinitialize the target. */ + xfs_defer_init(&dfops, &first_block); + + fs_blocks = xfs_symlink_blocks(sc->mp, target_len); + resblks = XFS_SYMLINK_SPACE_RES(sc->mp, target_len, fs_blocks); + error = xfs_trans_reserve_quota_nblks(sc->tp, sc->ip, resblks, 0, + XFS_QMOPT_RES_REGBLKS); + + /* Try to write the new target back out. */ + xfs_defer_ijoin(&dfops, sc->ip); + error = xfs_symlink_write_target(sc->tp, sc->ip, &dfops, sc->buf, + target_len, &first_block, fs_blocks, resblks); + if (error) + goto err; + + /* Finish up any block mapping activities. 
*/ + error = xfs_defer_finish(&sc->tp, &dfops); + if (error) + goto err; + return 0; +err: + xfs_defer_cancel(&dfops); + return error; +} + +/* Repair a symbolic link. */ +int +xrep_symlink( + struct xfs_scrub *sc) +{ + struct xfs_ifork *ifp; + int error; + + error = xfs_qm_dqattach_locked(sc->ip, false); + if (error) + return error; + + /* Salvage whatever we can of the target. */ + *((char *)sc->buf) = 0; + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + if (ifp->if_flags & XFS_IFINLINE) + error = xrep_symlink_inline(sc); + else + error = xrep_symlink_remote(sc); + if (error) + return error; + + /* Now reset the target. */ + return xrep_symlink_reinitialize(sc); +} diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 7f85342a09e6..ea689a0e502d 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -150,6 +150,86 @@ xfs_readlink( return error; } +/* Write the symlink target into the inode. */ +int +xfs_symlink_write_target( + struct xfs_trans *tp, + struct xfs_inode *ip, + struct xfs_defer_ops *dfops, + const char *target_path, + int pathlen, + xfs_fsblock_t *first_block, + xfs_fsblock_t fs_blocks, + uint resblks) +{ + struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; + struct xfs_mount *mp = tp->t_mountp; + const char *cur_chunk; + struct xfs_buf *bp; + xfs_daddr_t d; + int byte_cnt; + int nmaps; + int offset; + int n; + int error; + + /* + * If the symlink will fit into the inode, write it inline. + */ + if (pathlen <= XFS_IFORK_DSIZE(ip)) { + xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen); + + ip->i_d.di_size = pathlen; + ip->i_d.di_format = XFS_DINODE_FMT_LOCAL; + xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE); + + return 0; + } + + /* Write target to remote blocks. 
*/ + nmaps = XFS_SYMLINK_MAPS; + error = xfs_bmapi_write(tp, ip, 0, fs_blocks, XFS_BMAPI_METADATA, + first_block, resblks, mval, &nmaps, dfops); + if (error) + return error; + + ip->i_d.di_size = pathlen; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + + cur_chunk = target_path; + offset = 0; + for (n = 0; n < nmaps; n++) { + char *buf; + + d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock); + byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount); + bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, d, + BTOBB(byte_cnt), 0); + if (!bp) + return -ENOMEM; + bp->b_ops = &xfs_symlink_buf_ops; + + byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); + byte_cnt = min(byte_cnt, pathlen); + + buf = bp->b_addr; + buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset, + byte_cnt, bp); + + memcpy(buf, cur_chunk, byte_cnt); + + cur_chunk += byte_cnt; + pathlen -= byte_cnt; + offset += byte_cnt; + + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SYMLINK_BUF); + xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) - + (char *)bp->b_addr); + } + ASSERT(pathlen == 0); + return 0; +} + int xfs_symlink( struct xfs_inode *dp, @@ -166,15 +246,7 @@ xfs_symlink( struct xfs_defer_ops dfops; xfs_fsblock_t first_block; bool unlock_dp_on_error = false; - xfs_fileoff_t first_fsb; xfs_filblks_t fs_blocks; - int nmaps; - struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; - xfs_daddr_t d; - const char *cur_chunk; - int byte_cnt; - int n; - xfs_buf_t *bp; prid_t prid; struct xfs_dquot *udqp = NULL; struct xfs_dquot *gdqp = NULL; @@ -274,66 +346,11 @@ xfs_symlink( if (resblks) resblks -= XFS_IALLOC_SPACE_RES(mp); - /* - * If the symlink will fit into the inode, write it inline. 
- */ - if (pathlen <= XFS_IFORK_DSIZE(ip)) { - xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen); - - ip->i_d.di_size = pathlen; - ip->i_d.di_format = XFS_DINODE_FMT_LOCAL; - xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE); - } else { - int offset; - - first_fsb = 0; - nmaps = XFS_SYMLINK_MAPS; - - error = xfs_bmapi_write(tp, ip, first_fsb, fs_blocks, - XFS_BMAPI_METADATA, &first_block, resblks, - mval, &nmaps, &dfops); - if (error) - goto out_bmap_cancel; - if (resblks) - resblks -= fs_blocks; - ip->i_d.di_size = pathlen; - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - - cur_chunk = target_path; - offset = 0; - for (n = 0; n < nmaps; n++) { - char *buf; - - d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock); - byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount); - bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, d, - BTOBB(byte_cnt), 0); - if (!bp) { - error = -ENOMEM; - goto out_bmap_cancel; - } - bp->b_ops = &xfs_symlink_buf_ops; - - byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); - byte_cnt = min(byte_cnt, pathlen); - - buf = bp->b_addr; - buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset, - byte_cnt, bp); - - memcpy(buf, cur_chunk, byte_cnt); - - cur_chunk += byte_cnt; - pathlen -= byte_cnt; - offset += byte_cnt; - - xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SYMLINK_BUF); - xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) - - (char *)bp->b_addr); - } - ASSERT(pathlen == 0); - } + error = xfs_symlink_write_target(tp, ip, &dfops, target_path, pathlen, + &first_block, fs_blocks, resblks); + if (error) + goto out_bmap_cancel; /* * Create the directory entry for the symlink. 
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h index 9743d8c9394b..fd7eaa155939 100644 --- a/fs/xfs/xfs_symlink.h +++ b/fs/xfs/xfs_symlink.h @@ -12,5 +12,9 @@ int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name, int xfs_readlink_bmap_ilocked(struct xfs_inode *ip, char *link); int xfs_readlink(struct xfs_inode *ip, char *link); int xfs_inactive_symlink(struct xfs_inode *ip); +int xfs_symlink_write_target(struct xfs_trans *tp, struct xfs_inode *ip, + struct xfs_defer_ops *dfops, const char *target_path, + int pathlen, xfs_fsblock_t *first_block, + xfs_fsblock_t fs_blocks, uint resblks); #endif /* __XFS_SYMLINK_H */ From patchwork Thu Jul 26 00:21:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10544999 Subject: [PATCH 14/16] xfs: repair extended attributes From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com Date: Wed, 25 Jul 2018 17:21:39 -0700 Message-ID: <153256449926.29021.12551448240961968713.stgit@magnolia> In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia> References: <153256436688.29021.4638459579042241728.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8965 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807260002 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, zap the attr tree, and re-add the values. Signed-off-by: Darrick J. 
Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/attr.c | 2 fs/xfs/scrub/attr_repair.c | 611 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/scrub.h | 3 6 files changed, 619 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/scrub/attr_repair.c -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index e25cde969d99..c3963c88f952 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y) xfs-y += $(addprefix scrub/, \ agheader_repair.o \ + attr_repair.o \ alloc_repair.o \ bitmap.o \ bmap_repair.o \ diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c index 81d5e90547a1..e20074c241b5 100644 --- a/fs/xfs/scrub/attr.c +++ b/fs/xfs/scrub/attr.c @@ -125,7 +125,7 @@ xchk_xattr_listent( * Within a char, the lowest bit of the char represents the byte with * the smallest address */ -STATIC bool +bool xchk_xattr_set_map( struct xfs_scrub *sc, unsigned long *map, diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c new file mode 100644 index 000000000000..5bacfb88f25e --- /dev/null +++ b/fs/xfs/scrub/attr_repair.c @@ -0,0 +1,611 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_dir2.h" +#include "xfs_attr.h" +#include "xfs_attr_leaf.h" +#include "xfs_attr_sf.h" +#include "xfs_attr_remote.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Extended Attribute Repair + * ========================= + * + * We repair extended attributes by reading the attribute fork blocks looking + * for keys and values, then truncate the entire attr fork and reinsert all + * the attributes. Unfortunately, there's no secondary copy of most extended + * attribute data, which means that if we blow up midway through there's + * little we can do. + */ + +struct xrep_xattr_key { + struct list_head list; + unsigned char *value; + int valuelen; + int flags; + int namelen; + unsigned char name[0]; +}; + +#define XREP_XATTR_KEY_LEN(namelen) \ + (sizeof(struct xrep_xattr_key) + (namelen) + 1) + +struct xrep_xattr { + struct list_head *attrlist; + struct xfs_scrub *sc; +}; + +/* + * Iterate each block in an attr fork extent. The m_attr_geo fsbcount is + * always 1 for now, but code defensively in case this ever changes. + */ +#define for_each_xfs_attr_block(mp, irec, dabno) \ + for ((dabno) = roundup((xfs_dablk_t)(irec)->br_startoff, \ + (mp)->m_attr_geo->fsbcount); \ + (dabno) < (irec)->br_startoff + (irec)->br_blockcount; \ + (dabno) += (mp)->m_attr_geo->fsbcount) + +/* + * Decide if we want to salvage this attribute. We don't bother with + * incomplete or oversized keys or values. 
+ */
+STATIC int
+xrep_xattr_want_salvage(
+	int			flags,
+	int			namelen,
+	int			valuelen)
+{
+	if (flags & XFS_ATTR_INCOMPLETE)
+		return false;
+	if (namelen > XATTR_NAME_MAX || namelen <= 0)
+		return false;
+	if (valuelen > XATTR_SIZE_MAX || valuelen < 0)
+		return false;
+	return true;
+}
+
+/* Allocate an in-core record to hold xattrs while we rebuild the xattr data. */
+STATIC struct xrep_xattr_key *
+xrep_xattr_salvage_key(
+	int			flags,
+	unsigned char		*name,
+	int			namelen,
+	int			valuelen)
+{
+	struct xrep_xattr_key	*key;
+
+	/* Store attr key. */
+	key = kmem_alloc(XREP_XATTR_KEY_LEN(namelen), KM_MAYFAIL);
+	if (!key)
+		return NULL;
+	INIT_LIST_HEAD(&key->list);
+	key->valuelen = valuelen;
+	key->flags = flags & (ATTR_ROOT | ATTR_SECURE);
+	key->namelen = namelen;
+	key->name[namelen] = 0;
+	memcpy(key->name, name, namelen);
+	key->value = NULL;
+	if (valuelen) {
+		key->value = kmem_alloc_large(valuelen, KM_MAYFAIL);
+		if (!key->value) {
+			kmem_free(key);
+			return NULL;
+		}
+	}
+	return key;
+}
+
+/*
+ * Record a shortform extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+STATIC int
+xrep_xattr_salvage_sf_attr(
+	struct xrep_xattr		*rx,
+	struct xfs_attr_sf_entry	*sfe)
+{
+	unsigned char		*value = &sfe->nameval[sfe->namelen];
+	struct xrep_xattr_key	*key;
+
+	if (!xrep_xattr_want_salvage(sfe->flags, sfe->namelen, sfe->valuelen))
+		return 0;
+	key = xrep_xattr_salvage_key(sfe->flags, sfe->nameval, sfe->namelen,
+			sfe->valuelen);
+	if (!key)
+		return -ENOMEM;
+	if (sfe->valuelen)
+		memcpy(key->value, value, sfe->valuelen);
+	list_add_tail(&key->list, rx->attrlist);
+	return 0;
+}
+
+/*
+ * Record a local format extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+STATIC int
+xrep_xattr_salvage_local_attr(
+	struct xrep_xattr		*rx,
+	struct xfs_attr_leaf_entry	*ent,
+	unsigned int			nameidx,
+	const char			*buf_end,
+	struct xfs_attr_leaf_name_local	*lentry)
+{
+	struct xrep_xattr_key		*key;
+	unsigned long			*usedmap = rx->sc->buf;
+	unsigned int			valuelen;
+	unsigned int			namesize;
+
+	/*
+	 * Decode the leaf local entry format. If something seems wrong, we
+	 * junk the attribute.
+	 */
+	valuelen = be16_to_cpu(lentry->valuelen);
+	namesize = xfs_attr_leaf_entsize_local(lentry->namelen, valuelen);
+	if ((char *)lentry + namesize > buf_end)
+		return 0;
+	if (!xrep_xattr_want_salvage(ent->flags, lentry->namelen, valuelen))
+		return 0;
+	if (!xchk_xattr_set_map(rx->sc, usedmap, nameidx, namesize))
+		return 0;
+
+	/* Try to save this attribute. */
+	key = xrep_xattr_salvage_key(ent->flags, lentry->nameval,
+			lentry->namelen, valuelen);
+	if (!key)
+		return -ENOMEM;
+	if (valuelen)
+		memcpy(key->value, &lentry->nameval[lentry->namelen], valuelen);
+	list_add_tail(&key->list, rx->attrlist);
+	return 0;
+}
+
+/*
+ * Record a remote format extended attribute key & value for later reinsertion
+ * into the inode.
+ */
+STATIC int
+xrep_xattr_salvage_remote_attr(
+	struct xrep_xattr			*rx,
+	struct xfs_attr_leaf_entry		*ent,
+	unsigned int				nameidx,
+	const char				*buf_end,
+	struct xfs_attr_leaf_name_remote	*rentry,
+	unsigned int				ent_idx,
+	struct xfs_buf				*leaf_bp)
+{
+	struct xfs_da_args		args = {
+		.trans			= rx->sc->tp,
+		.dp			= rx->sc->ip,
+		.index			= ent_idx,
+		.geo			= rx->sc->mp->m_attr_geo,
+	};
+	struct xrep_xattr_key		*key;
+	unsigned long			*usedmap = rx->sc->buf;
+	unsigned int			valuelen;
+	unsigned int			namesize;
+	int				error;
+
+	/*
+	 * Decode the leaf remote entry format. If something seems wrong, we
+	 * junk the attribute. Note that we should never find a zero-length
+	 * remote attribute value.
+	 */
+	valuelen = be32_to_cpu(rentry->valuelen);
+	namesize = xfs_attr_leaf_entsize_remote(rentry->namelen);
+	if ((char *)rentry + namesize > buf_end)
+		return 0;
+	if (valuelen == 0 ||
+	    !xrep_xattr_want_salvage(ent->flags, rentry->namelen, valuelen))
+		return 0;
+	if (!xchk_xattr_set_map(rx->sc, usedmap, nameidx, namesize))
+		return 0;
+
+	/* Try to save this attribute. */
+	key = xrep_xattr_salvage_key(ent->flags, rentry->name, rentry->namelen,
+			valuelen);
+	if (!key)
+		return -ENOMEM;
+
+	/* Look up the remote value and stash it for reconstruction. */
+	args.valuelen = valuelen;
+	args.namelen = rentry->namelen;
+	args.name = key->name;
+	args.value = key->value;
+	error = xfs_attr3_leaf_getvalue(leaf_bp, &args);
+	if (error || args.rmtblkno == 0)
+		goto err_free;
+
+	error = xfs_attr_rmtval_get(&args);
+	if (error == 0) {
+		/* Got the value, add the attr and get out. */
+		list_add_tail(&key->list, rx->attrlist);
+		return 0;
+	}
+
+err_free:
+	/* remote value was garbage, junk it */
+	if (error == -EFSBADCRC || error == -EFSCORRUPTED)
+		error = 0;
+	kmem_free(key->value);
+	kmem_free(key);
+	return error;
+}
+
+/* Extract every xattr key that we can from this attr fork block. */
+STATIC int
+xrep_xattr_recover_leaf(
+	struct xrep_xattr		*rx,
+	struct xfs_buf			*bp)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct xfs_scrub		*sc = rx->sc;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_attr_leafblock	*leaf;
+	unsigned long			*usedmap = sc->buf;
+	struct xfs_attr_leaf_name_local	*lentry;
+	struct xfs_attr_leaf_name_remote *rentry;
+	struct xfs_attr_leaf_entry	*ent;
+	struct xfs_attr_leaf_entry	*entries;
+	char				*buf_end;
+	size_t				off;
+	unsigned int			nameidx;
+	unsigned int			hdrsize;
+	int				i;
+	int				error = 0;
+
+	bitmap_zero(usedmap, mp->m_attr_geo->blksize);
+
+	/* Check the leaf header */
+	leaf = bp->b_addr;
+	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	hdrsize = xfs_attr3_leaf_hdr_size(leaf);
+	xchk_xattr_set_map(sc, usedmap, 0, hdrsize);
+	entries = xfs_attr3_leaf_entryp(leaf);
+
+	buf_end = (char *)bp->b_addr + mp->m_attr_geo->blksize;
+	for (i = 0, ent = entries; i < leafhdr.count; ent++, i++) {
+		/* Skip key if it conflicts with something else? */
+		off = (char *)ent - (char *)leaf;
+		if (!xchk_xattr_set_map(sc, usedmap, off,
+				sizeof(xfs_attr_leaf_entry_t)))
+			continue;
+
+		/* Check the name information. */
+		nameidx = be16_to_cpu(ent->nameidx);
+		if (nameidx < leafhdr.firstused ||
+		    nameidx >= mp->m_attr_geo->blksize)
+			continue;
+
+		if (ent->flags & XFS_ATTR_LOCAL) {
+			lentry = xfs_attr3_leaf_name_local(leaf, i);
+			error = xrep_xattr_salvage_local_attr(rx, ent, nameidx,
+					buf_end, lentry);
+		} else {
+			rentry = xfs_attr3_leaf_name_remote(leaf, i);
+			error = xrep_xattr_salvage_remote_attr(rx, ent, nameidx,
+					buf_end, rentry, i, bp);
+		}
+		if (error)
+			break;
+	}
+
+	return error;
+}
+
+/* Try to recover shortform attrs. */
+STATIC int
+xrep_xattr_recover_sf(
+	struct xrep_xattr		*rx)
+{
+	struct xfs_attr_shortform	*sf;
+	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_attr_sf_entry	*next;
+	struct xfs_ifork		*ifp;
+	unsigned char			*end;
+	int				i;
+	int				error;
+
+	ifp = XFS_IFORK_PTR(rx->sc->ip, XFS_ATTR_FORK);
+	sf = (struct xfs_attr_shortform *)rx->sc->ip->i_afp->if_u1.if_data;
+	end = (unsigned char *)ifp->if_u1.if_data + ifp->if_bytes;
+
+	for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+		next = XFS_ATTR_SF_NEXTENTRY(sfe);
+		if ((unsigned char *)next > end)
+			break;
+
+		/* Ok, let's save this key/value. */
+		error = xrep_xattr_salvage_sf_attr(rx, sfe);
+		if (error)
+			return error;
+
+		sfe = next;
+	}
+
+	return 0;
+}
+
+/* Extract as many attribute keys and values as we can. */
+STATIC int
+xrep_xattr_recover(
+	struct xrep_xattr		*rx)
+{
+	struct xfs_iext_cursor		icur;
+	struct xfs_bmbt_irec		got;
+	struct xfs_scrub		*sc = rx->sc;
+	struct xfs_ifork		*ifp;
+	struct xfs_da_blkinfo		*info;
+	struct xfs_buf			*bp;
+	xfs_dablk_t			dabno;
+	int				error = 0;
+
+	if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL)
+		return xrep_xattr_recover_sf(rx);
+
+	/* Iterate each attr block in the attr fork. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	for_each_xfs_iext(ifp, &icur, &got) {
+		for_each_xfs_attr_block(sc->mp, &got, dabno) {
+			/*
+			 * Try to read buffer. We invalidate them in the next
+			 * step so we don't bother to set a buffer type or
+			 * ops.
+			 */
+			error = xfs_da_read_buf(sc->tp, sc->ip, dabno, -1, &bp,
+					XFS_ATTR_FORK, NULL);
+			if (error || !bp)
+				continue;
+
+			/* Screen out non-leaves & other garbage. */
+			info = bp->b_addr;
+			if (info->magic != cpu_to_be16(XFS_ATTR3_LEAF_MAGIC) ||
+			    xfs_attr3_leaf_buf_ops.verify_struct(bp) != NULL)
+				continue;
+
+			error = xrep_xattr_recover_leaf(rx, bp);
+			if (error)
+				return error;
+		}
+	}
+
+	return error;
+}
+
+/* Free all the attribute fork blocks and delete the fork. */
+STATIC int
+xrep_xattr_reset_btree(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	struct xfs_ifork	*ifp;
+	struct xfs_buf		*bp;
+	xfs_fileoff_t		lblk;
+	int			error;
+
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+
+	if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL)
+		goto out_fork_remove;
+
+	/* Invalidate each attr block in the attr fork. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	for_each_xfs_iext(ifp, &icur, &got) {
+		for_each_xfs_attr_block(sc->mp, &got, lblk) {
+			error = xfs_da_get_buf(sc->tp, sc->ip, lblk, -1, &bp,
+					XFS_ATTR_FORK);
+			if (error || !bp)
+				continue;
+			xfs_trans_binval(sc->tp, bp);
+			error = xfs_trans_roll_inode(&sc->tp, sc->ip);
+			if (error)
+				return error;
+		}
+	}
+
+	/* Now free all the blocks. */
+	error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_ATTR_FORK, 0);
+	if (error)
+		return error;
+
+out_fork_remove:
+	/* Reset the attribute fork - this also destroys the in-core fork */
+	xfs_attr_fork_remove(sc->ip, sc->tp);
+	return 0;
+}
+
+/*
+ * Compare two xattr keys. ATTR_SECURE keys come before ATTR_ROOT and
+ * ATTR_ROOT keys come before user attrs. Otherwise sort in hash order.
+ */
+static int
+xrep_xattr_key_cmp(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xrep_xattr_key	*ap;
+	struct xrep_xattr_key	*bp;
+	uint			ahash;
+	uint			bhash;
+
+	ap = container_of(a, struct xrep_xattr_key, list);
+	bp = container_of(b, struct xrep_xattr_key, list);
+
+	if (ap->flags > bp->flags)
+		return 1;
+	else if (ap->flags < bp->flags)
+		return -1;
+
+	ahash = xfs_da_hashname(ap->name, ap->namelen);
+	bhash = xfs_da_hashname(bp->name, bp->namelen);
+	if (ahash > bhash)
+		return 1;
+	else if (ahash < bhash)
+		return -1;
+	return 0;
+}
+
+/*
+ * Find all the extended attributes for this inode by scraping them out of the
+ * attribute key blocks by hand. The caller must clean up the lists if
+ * anything goes wrong.
+ */
+STATIC int
+xrep_xattr_find_attributes(
+	struct xfs_scrub	*sc,
+	struct list_head	*attrlist)
+{
+	struct xrep_xattr	rx;
+	struct xfs_ifork	*ifp;
+	int			error;
+
+	error = xrep_ino_dqattach(sc);
+	if (error)
+		return error;
+
+	/* Extent map should be loaded. */
+	ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK);
+	if (XFS_IFORK_FORMAT(sc->ip, XFS_ATTR_FORK) != XFS_DINODE_FMT_LOCAL &&
+	    !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(sc->tp, sc->ip, XFS_ATTR_FORK);
+		if (error)
+			return error;
+	}
+
+	rx.attrlist = attrlist;
+	rx.sc = sc;
+
+	/* Read every attr key and value and record them in memory. */
+	return xrep_xattr_recover(&rx);
+}
+
+/* Free all the attributes. */
+STATIC void
+xrep_xattr_cancel_attrs(
+	struct list_head	*attrlist)
+{
+	struct xrep_xattr_key	*key;
+	struct xrep_xattr_key	*n;
+
+	list_for_each_entry_safe(key, n, attrlist, list) {
+		list_del(&key->list);
+		kmem_free(key->value);
+		kmem_free(key);
+	}
+}
+
+/*
+ * Insert all the attributes that we collected.
+ *
+ * Commit the repair transaction and drop the ilock because the attribute
+ * setting code needs to be able to allocate special transactions and take the
+ * ilock on its own. Some day we'll have deferred attribute setting, at which
+ * point we'll be able to use that to replace the attributes atomically and
+ * safely.
+ */
+STATIC int
+xrep_xattr_rebuild_tree(
+	struct xfs_scrub	*sc,
+	struct list_head	*attrlist)
+{
+	struct xrep_xattr_key	*key;
+	struct xrep_xattr_key	*n;
+	int			error;
+
+	error = xfs_trans_commit(sc->tp);
+	sc->tp = NULL;
+	if (error)
+		return error;
+
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
+
+	/* Re-add every attr to the file. */
+	list_sort(NULL, attrlist, xrep_xattr_key_cmp);
+	list_for_each_entry_safe(key, n, attrlist, list) {
+		error = xfs_attr_set(sc->ip, key->name, key->value,
+				key->valuelen, key->flags);
+		if (error)
+			return error;
+
+		/*
+		 * If the attr value is larger than a single page, free the
+		 * key now so that we aren't hogging memory while doing a lot
+		 * of metadata updates. Otherwise, we want to spend as little
+		 * time reconstructing the attrs as we possibly can.
+		 */
+		if (key->valuelen <= PAGE_SIZE)
+			continue;
+		list_del(&key->list);
+		kmem_free(key->value);
+		kmem_free(key);
+	}
+
+	xrep_xattr_cancel_attrs(attrlist);
+	return 0;
+}
+
+/*
+ * Repair the extended attribute metadata.
+ *
+ * XXX: Remote attribute value buffers encompass the entire (up to 64k) buffer.
+ * The buffer cache in XFS can't handle aliased multiblock buffers, so this
+ * might misbehave if the attr fork is crosslinked with other filesystem
+ * metadata.
+ */
+int
+xrep_xattr(
+	struct xfs_scrub	*sc)
+{
+	struct list_head	attrlist;
+	int			error;
+
+	if (!xfs_inode_hasattr(sc->ip))
+		return -ENOENT;
+
+	/* Collect extended attributes by parsing raw blocks. */
+	INIT_LIST_HEAD(&attrlist);
+	error = xrep_xattr_find_attributes(sc, &attrlist);
+	if (error)
+		goto out;
+
+	/*
+	 * Invalidate and truncate all attribute fork extents. This is the
+	 * point at which we are no longer able to bail out gracefully.
+	 * We commit the transaction here because xfs_attr_set allocates its
+	 * own transactions.
+	 */
+	error = xrep_xattr_reset_btree(sc);
+	if (error)
+		goto out;
+
+	/* Now rebuild the attribute information. */
+	error = xrep_xattr_rebuild_tree(sc, &attrlist);
+out:
+	xrep_xattr_cancel_attrs(&attrlist);
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 17769efb20d9..b630084d0f39 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -69,6 +69,7 @@ int xrep_inode(struct xfs_scrub *sc);
 int xrep_bmap_data(struct xfs_scrub *sc);
 int xrep_bmap_attr(struct xfs_scrub *sc);
 int xrep_symlink(struct xfs_scrub *sc);
+int xrep_xattr(struct xfs_scrub *sc);
 
 #else
 
@@ -110,6 +111,7 @@ xrep_reset_perag_resv(
 #define xrep_bmap_data			xrep_notsupported
 #define xrep_bmap_attr			xrep_notsupported
 #define xrep_symlink			xrep_notsupported
+#define xrep_xattr			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 0a8eea77e58f..537636d789fb 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -301,7 +301,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_xattr,
 		.scrub	= xchk_xattr,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_xattr,
 	},
 	[XFS_SCRUB_TYPE_SYMLINK] = {	/* symbolic link */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 762db46fd696..d7ad8fad9318 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -139,4 +139,7 @@ void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno,
 # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0)
 #endif
 
+bool xchk_xattr_set_map(struct xfs_scrub *sc, unsigned long *map,
+		unsigned int start, unsigned int len);
+
 #endif /* __XFS_SCRUB_SCRUB_H__ */

From patchwork Thu Jul 26 00:21:45 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10545003
Subject: [PATCH 15/16] xfs: scrub should set preen if attr leaf has holes
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com, Dave Chinner
Date: Wed, 25 Jul 2018 17:21:45 -0700
Message-ID: <153256450567.29021.15041438768630274061.stgit@magnolia>
In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia>
References: <153256436688.29021.4638459579042241728.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

If an attr block indicates that it could use compaction, set the preen
flag to have the attr fork rebuilt, since the attr fork rebuilder can
take care of that for us.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/scrub/attr.c    |    2 ++
 fs/xfs/scrub/dabtree.c |   15 +++++++++++++++
 fs/xfs/scrub/dabtree.h |    1 +
 fs/xfs/scrub/trace.h   |    1 +
 4 files changed, 19 insertions(+)

diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index e20074c241b5..0956d4588dc5 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -293,6 +293,8 @@ xchk_xattr_block(
 		xchk_da_set_corrupt(ds, level);
 	if (!xchk_xattr_set_map(ds->sc, usedmap, 0, hdrsize))
 		xchk_da_set_corrupt(ds, level);
+	if (leafhdr.holes)
+		xchk_da_set_preen(ds, level);
 
 	if (ds->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 		goto out;
diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c
index f1260b4bfdee..e2ecf9c77010 100644
--- a/fs/xfs/scrub/dabtree.c
+++ b/fs/xfs/scrub/dabtree.c
@@ -85,6 +85,21 @@ xchk_da_set_corrupt(
 			__return_address);
 }
 
+/* Flag a da btree node in need of optimization. */
+void
+xchk_da_set_preen(
+	struct xchk_da_btree	*ds,
+	int			level)
+{
+	struct xfs_scrub	*sc = ds->sc;
+
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN;
+	trace_xchk_fblock_preen(sc, ds->dargs.whichfork,
+			xfs_dir2_da_to_db(ds->dargs.geo,
+				ds->state->path.blk[level].blkno),
+			__return_address);
+}
+
 /* Find an entry at a certain level in a da btree. */
 STATIC void *
 xchk_da_btree_entry(
diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h
index cb3f0003245b..b367bf87a183 100644
--- a/fs/xfs/scrub/dabtree.h
+++ b/fs/xfs/scrub/dabtree.h
@@ -36,6 +36,7 @@ bool xchk_da_process_error(struct xchk_da_btree *ds, int level, int *error);
 
 /* Check for da btree corruption. */
 void xchk_da_set_corrupt(struct xchk_da_btree *ds, int level);
+void xchk_da_set_preen(struct xchk_da_btree *ds, int level);
 
 int xchk_da_btree_hash(struct xchk_da_btree *ds, int level, __be32 *hashp);
 int xchk_da_btree(struct xfs_scrub *sc, int whichfork,
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3383b14fd0c0..d7133d1d23d6 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -230,6 +230,7 @@ DEFINE_EVENT(xchk_fblock_error_class, name, \
 
 DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_error);
 DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_warning);
+DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_preen);
 
 TRACE_EVENT(xchk_incomplete,
 	TP_PROTO(struct xfs_scrub *sc, void *ret_ip),

From patchwork Thu Jul 26 00:21:52 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10545001
Subject: [PATCH 16/16] xfs: repair quotas
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, allison.henderson@oracle.com
Date: Wed, 25 Jul 2018 17:21:52 -0700
Message-ID: <153256451206.29021.2669792675237328105.stgit@magnolia>
In-Reply-To: <153256436688.29021.4638459579042241728.stgit@magnolia>
References: <153256436688.29021.4638459579042241728.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

Fix anything that causes the quota verifiers to fail.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/Makefile             |    1
 fs/xfs/scrub/attr_repair.c  |    2
 fs/xfs/scrub/common.h       |    9 +
 fs/xfs/scrub/quota.c        |    2
 fs/xfs/scrub/quota_repair.c |  363 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.c       |   58 +++++++
 fs/xfs/scrub/repair.h       |    8 +
 fs/xfs/scrub/scrub.c        |   11 +
 8 files changed, 446 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/scrub/quota_repair.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c3963c88f952..ed1fc827ed15 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -174,5 +174,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   repair.o \
 				   symlink_repair.o \
 				   )
+xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota_repair.o
 endif
 endif
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 5bacfb88f25e..e01ca4350857 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -395,7 +395,7 @@ xrep_xattr_recover(
 }
 
 /* Free all the attribute fork blocks and delete the fork. */
-STATIC int
+int
 xrep_xattr_reset_btree(
 	struct xfs_scrub	*sc)
 {
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 2d4324d12f9a..aab82f7f9a67 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -138,4 +138,13 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
 int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode);
 
+/* Do we need to invoke the repair tool? */
+static inline bool xfs_scrub_needs_repair(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT |
+			       XFS_SCRUB_OFLAG_PREEN);
+}
+uint xchk_quota_to_dqtype(struct xfs_scrub *sc);
+
 #endif /* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
index 782d582d3edd..0e5578ab088e 100644
--- a/fs/xfs/scrub/quota.c
+++ b/fs/xfs/scrub/quota.c
@@ -29,7 +29,7 @@
 #include "scrub/trace.h"
 
 /* Convert a scrub type code to a DQ flag, or return 0 if error. */
-static inline uint
+uint
 xchk_quota_to_dqtype(
 	struct xfs_scrub	*sc)
 {
diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
new file mode 100644
index 000000000000..36635f7ca217
--- /dev/null
+++ b/fs/xfs/scrub/quota_repair.c
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2018 Oracle. All Rights Reserved.
+ * Author: Darrick J. Wong
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_inode_fork.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_qm.h"
+#include "xfs_dquot.h"
+#include "xfs_dquot_item.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Quota Repair
+ * ============
+ *
+ * Quota repairs are fairly simplistic; we fix everything that the dquot
+ * verifiers complain about, cap any counters or limits that make no sense,
+ * and schedule a quotacheck if we had to fix anything. We also repair any
+ * data fork extent records that don't apply to metadata files.
+ */ + +struct xrep_quota_info { + struct xfs_scrub *sc; + bool need_quotacheck; +}; + +/* Scrub the fields in an individual quota item. */ +STATIC int +xrep_quota_item( + struct xfs_dquot *dq, + uint dqtype, + void *priv) +{ + struct xrep_quota_info *rqi = priv; + struct xfs_scrub *sc = rqi->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_disk_dquot *d = &dq->q_core; + unsigned long long bsoft; + unsigned long long isoft; + unsigned long long rsoft; + unsigned long long bhard; + unsigned long long ihard; + unsigned long long rhard; + unsigned long long bcount; + unsigned long long icount; + unsigned long long rcount; + xfs_ino_t fs_icount; + bool dirty = false; + int error; + + /* Did we get the dquot type we wanted? */ + if (dqtype != (d->d_flags & XFS_DQ_ALLTYPES)) { + d->d_flags = dqtype; + dirty = true; + } + + if (d->d_pad0 || d->d_pad) { + d->d_pad0 = 0; + d->d_pad = 0; + dirty = true; + } + + /* Check the limits. */ + bhard = be64_to_cpu(d->d_blk_hardlimit); + ihard = be64_to_cpu(d->d_ino_hardlimit); + rhard = be64_to_cpu(d->d_rtb_hardlimit); + + bsoft = be64_to_cpu(d->d_blk_softlimit); + isoft = be64_to_cpu(d->d_ino_softlimit); + rsoft = be64_to_cpu(d->d_rtb_softlimit); + + if (bsoft > bhard) { + d->d_blk_softlimit = d->d_blk_hardlimit; + dirty = true; + } + + if (isoft > ihard) { + d->d_ino_softlimit = d->d_ino_hardlimit; + dirty = true; + } + + if (rsoft > rhard) { + d->d_rtb_softlimit = d->d_rtb_hardlimit; + dirty = true; + } + + /* Check the resource counts. */ + bcount = be64_to_cpu(d->d_bcount); + icount = be64_to_cpu(d->d_icount); + rcount = be64_to_cpu(d->d_rtbcount); + fs_icount = percpu_counter_sum(&mp->m_icount); + + /* + * Check that usage doesn't exceed physical limits. However, on + * a reflink filesystem we're allowed to exceed physical space + * if there are no quota limits. We don't know what the real number + * is, but we can make quotacheck find out for us. 
+ */ + if (!xfs_sb_version_hasreflink(&mp->m_sb) && + mp->m_sb.sb_dblocks < bcount) { + dq->q_res_bcount -= be64_to_cpu(dq->q_core.d_bcount); + dq->q_res_bcount += mp->m_sb.sb_dblocks; + d->d_bcount = cpu_to_be64(mp->m_sb.sb_dblocks); + rqi->need_quotacheck = true; + dirty = true; + } + if (icount > fs_icount) { + dq->q_res_icount -= be64_to_cpu(dq->q_core.d_icount); + dq->q_res_icount += fs_icount; + d->d_icount = cpu_to_be64(fs_icount); + rqi->need_quotacheck = true; + dirty = true; + } + if (rcount > mp->m_sb.sb_rblocks) { + dq->q_res_rtbcount -= be64_to_cpu(dq->q_core.d_rtbcount); + dq->q_res_rtbcount += mp->m_sb.sb_rblocks; + d->d_rtbcount = cpu_to_be64(mp->m_sb.sb_rblocks); + rqi->need_quotacheck = true; + dirty = true; + } + + if (!dirty) + return 0; + + dq->dq_flags |= XFS_DQ_DIRTY; + xfs_trans_dqjoin(sc->tp, dq); + xfs_trans_log_dquot(sc->tp, dq); + error = xfs_trans_roll(&sc->tp); + xfs_dqlock(dq); + return error; +} + +/* Fix a quota timer so that we can pass the verifier. */ +STATIC void +xrep_quota_fix_timer( + __be64 softlimit, + __be64 countnow, + __be32 *timer, + time_t timelimit) +{ + uint64_t soft = be64_to_cpu(softlimit); + uint64_t count = be64_to_cpu(countnow); + + if (soft && count > soft && *timer == 0) + *timer = cpu_to_be32(get_seconds() + timelimit); +} + +/* Fix anything the verifiers complain about. 
*/ +STATIC int +xrep_quota_block( + struct xfs_scrub *sc, + struct xfs_buf *bp, + uint dqtype, + xfs_dqid_t id) +{ + struct xfs_dqblk *d = (struct xfs_dqblk *)bp->b_addr; + struct xfs_disk_dquot *ddq; + struct xfs_quotainfo *qi = sc->mp->m_quotainfo; + enum xfs_blft buftype = 0; + int i; + + bp->b_ops = &xfs_dquot_buf_ops; + for (i = 0; i < qi->qi_dqperchunk; i++) { + ddq = &d[i].dd_diskdq; + + ddq->d_magic = cpu_to_be16(XFS_DQUOT_MAGIC); + ddq->d_version = XFS_DQUOT_VERSION; + ddq->d_flags = dqtype; + ddq->d_id = cpu_to_be32(id + i); + + xrep_quota_fix_timer(ddq->d_blk_softlimit, + ddq->d_bcount, &ddq->d_btimer, + qi->qi_btimelimit); + xrep_quota_fix_timer(ddq->d_ino_softlimit, + ddq->d_icount, &ddq->d_itimer, + qi->qi_itimelimit); + xrep_quota_fix_timer(ddq->d_rtb_softlimit, + ddq->d_rtbcount, &ddq->d_rtbtimer, + qi->qi_rtbtimelimit); + + /* We only support v5 filesystems so always set these. */ + uuid_copy(&d->dd_uuid, &sc->mp->m_sb.sb_meta_uuid); + xfs_update_cksum((char *)d, sizeof(struct xfs_dqblk), + XFS_DQUOT_CRC_OFF); + d->dd_lsn = 0; + } + switch (dqtype) { + case XFS_DQ_USER: + buftype = XFS_BLFT_UDQUOT_BUF; + break; + case XFS_DQ_GROUP: + buftype = XFS_BLFT_GDQUOT_BUF; + break; + case XFS_DQ_PROJ: + buftype = XFS_BLFT_PDQUOT_BUF; + break; + } + xfs_trans_buf_set_type(sc->tp, bp, buftype); + xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1); + return xfs_trans_roll(&sc->tp); +} + +/* Repair quota's data fork. */ +STATIC int +xrep_quota_data_fork( + struct xfs_scrub *sc, + uint dqtype) +{ + struct xfs_bmbt_irec irec = { 0 }; + struct xfs_iext_cursor icur; + struct xfs_quotainfo *qi = sc->mp->m_quotainfo; + struct xfs_ifork *ifp; + struct xfs_buf *bp; + struct xfs_dqblk *d; + xfs_dqid_t id; + xfs_fileoff_t max_dqid_off; + xfs_fileoff_t off; + xfs_fsblock_t fsbno; + bool truncate = false; + int error = 0; + + error = xrep_metadata_inode_forks(sc); + if (error) + goto out; + + /* Check for data fork problems that apply only to quota files. 
*/ + max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk; + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + for_each_xfs_iext(ifp, &icur, &irec) { + if (isnullstartblock(irec.br_startblock)) { + error = -EFSCORRUPTED; + goto out; + } + + if (irec.br_startoff > max_dqid_off || + irec.br_startoff + irec.br_blockcount - 1 > max_dqid_off) { + truncate = true; + break; + } + } + if (truncate) { + error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK, + max_dqid_off * sc->mp->m_sb.sb_blocksize); + if (error) + goto out; + } + + /* Now go fix anything that fails the verifiers. */ + for_each_xfs_iext(ifp, &icur, &irec) { + for (fsbno = irec.br_startblock, off = irec.br_startoff; + fsbno < irec.br_startblock + irec.br_blockcount; + fsbno += XFS_DQUOT_CLUSTER_SIZE_FSB, + off += XFS_DQUOT_CLUSTER_SIZE_FSB) { + id = off * qi->qi_dqperchunk; + error = xfs_trans_read_buf(sc->mp, sc->tp, + sc->mp->m_ddev_targp, + XFS_FSB_TO_DADDR(sc->mp, fsbno), + qi->qi_dqchunklen, + 0, &bp, &xfs_dquot_buf_ops); + if (error == 0) { + d = (struct xfs_dqblk *)bp->b_addr; + if (id == be32_to_cpu(d->dd_diskdq.d_id)) { + xfs_trans_brelse(sc->tp, bp); + continue; + } + error = -EFSCORRUPTED; + xfs_trans_brelse(sc->tp, bp); + } + if (error != -EFSBADCRC && error != -EFSCORRUPTED) + goto out; + + /* Failed verifier, try again. */ + error = xfs_trans_read_buf(sc->mp, sc->tp, + sc->mp->m_ddev_targp, + XFS_FSB_TO_DADDR(sc->mp, fsbno), + qi->qi_dqchunklen, + 0, &bp, NULL); + if (error) + goto out; + + /* + * Fix the quota block, which will roll our transaction + * and release bp. + */ + error = xrep_quota_block(sc, bp, dqtype, id); + if (error) + goto out; + } + } + +out: + return error; +} + +/* + * Go fix anything in the quota items that we could have been mad about. Now + * that we've checked the quota inode data fork we have to drop ILOCK_EXCL to + * use the regular dquot functions. 
+ */ +STATIC int +xrep_quota_problems( + struct xfs_scrub *sc, + uint dqtype) +{ + struct xrep_quota_info rqi; + int error; + + rqi.sc = sc; + rqi.need_quotacheck = false; + error = xfs_qm_dqiterate(sc->mp, dqtype, xrep_quota_item, &rqi); + if (error) + return error; + + /* Make a quotacheck happen. */ + if (rqi.need_quotacheck) + xrep_force_quotacheck(sc, dqtype); + return 0; +} + +/* Repair all of a quota type's items. */ +int +xrep_quota( + struct xfs_scrub *sc) +{ + uint dqtype; + int error; + + dqtype = xchk_quota_to_dqtype(sc); + + /* Fix problematic data fork mappings. */ + error = xrep_quota_data_fork(sc, dqtype); + if (error) + goto out; + + /* Unlock quota inode; we play only with dquots from now on. */ + xfs_iunlock(sc->ip, sc->ilock_flags); + sc->ilock_flags = 0; + + /* Fix anything the dquot verifiers complain about. */ + error = xrep_quota_problems(sc, dqtype); +out: + return error; +} diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index a44deb6f06ab..27cc50178d86 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -29,6 +29,8 @@ #include "xfs_ag_resv.h" #include "xfs_trans_space.h" #include "xfs_quota.h" +#include "xfs_attr.h" +#include "xfs_reflink.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -900,3 +902,59 @@ xrep_reset_perag_resv( out: return error; } + +/* + * Repair the attr/data forks of a metadata inode. The metadata inode must be + * pointed to by sc->ip and the ILOCK must be held. + */ +int +xrep_metadata_inode_forks( + struct xfs_scrub *sc) +{ + __u32 smtype; + __u32 smflags; + int error; + + smtype = sc->sm->sm_type; + smflags = sc->sm->sm_flags; + + /* Let's see if the forks need repair. */ + sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT; + error = xchk_metadata_inode_forks(sc); + if (error || !xfs_scrub_needs_repair(sc->sm)) + goto out; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Clear the reflink flag & attr forks that we shouldn't have. 
*/ + if (xfs_is_reflink_inode(sc->ip)) { + error = xfs_reflink_clear_inode_flag(sc->ip, &sc->tp); + if (error) + goto out; + } + + if (xfs_inode_hasattr(sc->ip)) { + error = xrep_xattr_reset_btree(sc); + if (error) + goto out; + } + + /* Repair the data fork. */ + sc->sm->sm_type = XFS_SCRUB_TYPE_BMBTD; + error = xrep_bmap_data(sc); + sc->sm->sm_type = smtype; + if (error) + goto out; + + /* Bail out if we still need repairs. */ + sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT; + error = xchk_metadata_inode_forks(sc); + if (error) + goto out; + if (xfs_scrub_needs_repair(sc->sm)) + error = -EFSCORRUPTED; +out: + sc->sm->sm_type = smtype; + sc->sm->sm_flags = smflags; + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index b630084d0f39..aa032a7b99d0 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -54,6 +54,8 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp, void xrep_force_quotacheck(struct xfs_scrub *sc, uint dqtype); int xrep_ino_dqattach(struct xfs_scrub *sc); int xrep_reset_perag_resv(struct xfs_scrub *sc); +int xrep_xattr_reset_btree(struct xfs_scrub *sc); +int xrep_metadata_inode_forks(struct xfs_scrub *sc); /* Metadata repairers */ @@ -70,6 +72,11 @@ int xrep_bmap_data(struct xfs_scrub *sc); int xrep_bmap_attr(struct xfs_scrub *sc); int xrep_symlink(struct xfs_scrub *sc); int xrep_xattr(struct xfs_scrub *sc); +#ifdef CONFIG_XFS_QUOTA +int xrep_quota(struct xfs_scrub *sc); +#else +# define xrep_quota xrep_notsupported +#endif /* CONFIG_XFS_QUOTA */ #else @@ -112,6 +119,7 @@ xrep_reset_perag_resv( #define xrep_bmap_attr xrep_notsupported #define xrep_symlink xrep_notsupported #define xrep_xattr xrep_notsupported +#define xrep_quota xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 537636d789fb..a9f969214e69 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -333,19 +333,19 @@ static const struct xchk_meta_ops 
meta_scrub_ops[] = { .type = ST_FS, .setup = xchk_setup_quota, .scrub = xchk_quota, - .repair = xrep_notsupported, + .repair = xrep_quota, }, [XFS_SCRUB_TYPE_GQUOTA] = { /* group quota */ .type = ST_FS, .setup = xchk_setup_quota, .scrub = xchk_quota, - .repair = xrep_notsupported, + .repair = xrep_quota, }, [XFS_SCRUB_TYPE_PQUOTA] = { /* project quota */ .type = ST_FS, .setup = xchk_setup_quota, .scrub = xchk_quota, - .repair = xrep_notsupported, + .repair = xrep_quota, }, }; @@ -539,9 +539,8 @@ xfs_scrub_metadata( if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) sc.sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT; - needs_fix = (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT | - XFS_SCRUB_OFLAG_XCORRUPT | - XFS_SCRUB_OFLAG_PREEN)); + needs_fix = xfs_scrub_needs_repair(sc.sm); + /* * If userspace asked for a repair but it wasn't necessary, * report that back to userspace.