From patchwork Mon Jul 30 05:47:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10548421 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A309139A for ; Mon, 30 Jul 2018 05:48:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C9698293FF for ; Mon, 30 Jul 2018 05:48:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD7A62951B; Mon, 30 Jul 2018 05:48:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 87E9F293FF for ; Mon, 30 Jul 2018 05:48:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726087AbeG3HVW (ORCPT ); Mon, 30 Jul 2018 03:21:22 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:53306 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726259AbeG3HVW (ORCPT ); Mon, 30 Jul 2018 03:21:22 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5hbpc004943; Mon, 30 Jul 2018 05:47:58 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=RgkSOHE6jUi6HzPbtrY9BNxjZf8j5y4UJJZkI98oOLk=; 
b=qkWcPP8mLD0nof8KVQQTmvAoQxkVuPHy8wO07W9031El1LlPrjKAUi6Fa/YK+HNcw5CP RUIXcpUggj+OgQVIAh4Tbra9Xr+uroNHJWlZatyMcRJXDS0La/reDb8UlZXgahNaNJGL IMml1JYcv3BVUp436iIp07EuQZ9S6sjUBhNWqyj8ahp01HwPCAiQkQ8F2iYkhRTv1mIr OulsM4uzquvlk5NRZLYWjnV8ZjFuV3Xt2ZK4LmLbau3MyqMlLxWkC+Gc3KevNyv//9Un RhrkfzwXvuD4SMS7HkPTPsJts5AOwTqc3mRBrGyNZB0wjzF3gue+jj4sMBF2bERBtspZ eQ== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2130.oracle.com with ESMTP id 2kgfwstx0x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:47:58 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5lvAd016493 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:47:57 GMT Received: from abhmp0002.oracle.com (abhmp0002.oracle.com [141.146.116.8]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w6U5lv9A031138; Mon, 30 Jul 2018 05:47:57 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:47:56 -0700 Subject: [PATCH 01/14] xfs: refactor the xrep_extent_list into xfs_bitmap From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:47:54 -0700 Message-ID: <153292967384.24509.6676118910447905441.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong As mentioned previously, the xrep_extent_list basically implements a bitmap with two operations: set and disunion (subtracting one bitmap from another). Rename the structure and its functions to use the xfs_bitmap prefix to shorten the names and make it more obvious what we're doing. Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster --- fs/xfs/scrub/bitmap.c | 183 +++++++++++++++++++++++++------------------------ fs/xfs/scrub/bitmap.h | 35 ++++----- fs/xfs/scrub/repair.c | 85 ++++++++++------------- fs/xfs/scrub/repair.h | 8 +- fs/xfs/scrub/trace.h | 1 5 files changed, 149 insertions(+), 163 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c index a7c2f4773f98..c770e2d0b6aa 100644 --- a/fs/xfs/scrub/bitmap.c +++ b/fs/xfs/scrub/bitmap.c @@ -16,183 +16,186 @@ #include "scrub/repair.h" #include "scrub/bitmap.h" -/* Collect a dead btree extent for later disposal. 
*/ +/* + * Set a range of this bitmap. Caller must ensure the range is not set. + * + * This is the logical equivalent of bitmap |= mask(start, len). + */ int -xrep_collect_btree_extent( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - xfs_fsblock_t fsbno, - xfs_extlen_t len) +xfs_bitmap_set( + struct xfs_bitmap *bitmap, + uint64_t start, + uint64_t len) { - struct xrep_extent *rex; + struct xfs_bitmap_range *bmr; - trace_xrep_collect_btree_extent(sc->mp, - XFS_FSB_TO_AGNO(sc->mp, fsbno), - XFS_FSB_TO_AGBNO(sc->mp, fsbno), len); - - rex = kmem_alloc(sizeof(struct xrep_extent), KM_MAYFAIL); - if (!rex) + bmr = kmem_alloc(sizeof(struct xfs_bitmap_range), KM_MAYFAIL); + if (!bmr) return -ENOMEM; - INIT_LIST_HEAD(&rex->list); - rex->fsbno = fsbno; - rex->len = len; - list_add_tail(&rex->list, &exlist->list); + INIT_LIST_HEAD(&bmr->list); + bmr->start = start; + bmr->len = len; + list_add_tail(&bmr->list, &bitmap->list); return 0; } -/* - * An error happened during the rebuild so the transaction will be cancelled. - * The fs will shut down, and the administrator has to unmount and run repair. - * Therefore, free all the memory associated with the list so we can die. - */ +/* Free everything related to this bitmap. */ void -xrep_cancel_btree_extents( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist) +xfs_bitmap_destroy( + struct xfs_bitmap *bitmap) { - struct xrep_extent *rex; - struct xrep_extent *n; + struct xfs_bitmap_range *bmr; + struct xfs_bitmap_range *n; - for_each_xrep_extent_safe(rex, n, exlist) { - list_del(&rex->list); - kmem_free(rex); + for_each_xfs_bitmap_extent(bmr, n, bitmap) { + list_del(&bmr->list); + kmem_free(bmr); } } +/* Set up a per-AG block bitmap. */ +void +xfs_bitmap_init( + struct xfs_bitmap *bitmap) +{ + INIT_LIST_HEAD(&bitmap->list); +} + /* Compare two btree extents. 
*/ static int -xrep_btree_extent_cmp( +xfs_bitmap_range_cmp( void *priv, struct list_head *a, struct list_head *b) { - struct xrep_extent *ap; - struct xrep_extent *bp; + struct xfs_bitmap_range *ap; + struct xfs_bitmap_range *bp; - ap = container_of(a, struct xrep_extent, list); - bp = container_of(b, struct xrep_extent, list); + ap = container_of(a, struct xfs_bitmap_range, list); + bp = container_of(b, struct xfs_bitmap_range, list); - if (ap->fsbno > bp->fsbno) + if (ap->start > bp->start) return 1; - if (ap->fsbno < bp->fsbno) + if (ap->start < bp->start) return -1; return 0; } /* - * Remove all the blocks mentioned in @sublist from the extents in @exlist. + * Remove all the blocks mentioned in @sub from the extents in @bitmap. * * The intent is that callers will iterate the rmapbt for all of its records - * for a given owner to generate @exlist; and iterate all the blocks of the + * for a given owner to generate @bitmap; and iterate all the blocks of the * metadata structures that are not being rebuilt and have the same rmapbt - * owner to generate @sublist. This routine subtracts all the extents - * mentioned in sublist from all the extents linked in @exlist, which leaves - * @exlist as the list of blocks that are not accounted for, which we assume + * owner to generate @sub. This routine subtracts all the extents + * mentioned in sub from all the extents linked in @bitmap, which leaves + * @bitmap as the list of blocks that are not accounted for, which we assume * are the dead blocks of the old metadata structure. The blocks mentioned in - * @exlist can be reaped. + * @bitmap can be reaped. + * + * This is the logical equivalent of bitmap &= ~sub. 
*/ #define LEFT_ALIGNED (1 << 0) #define RIGHT_ALIGNED (1 << 1) int -xrep_subtract_extents( - struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - struct xrep_extent_list *sublist) +xfs_bitmap_disunion( + struct xfs_bitmap *bitmap, + struct xfs_bitmap *sub) { struct list_head *lp; - struct xrep_extent *ex; - struct xrep_extent *newex; - struct xrep_extent *subex; - xfs_fsblock_t sub_fsb; - xfs_extlen_t sub_len; + struct xfs_bitmap_range *br; + struct xfs_bitmap_range *new_br; + struct xfs_bitmap_range *sub_br; + uint64_t sub_start; + uint64_t sub_len; int state; int error = 0; - if (list_empty(&exlist->list) || list_empty(&sublist->list)) + if (list_empty(&bitmap->list) || list_empty(&sub->list)) return 0; - ASSERT(!list_empty(&sublist->list)); + ASSERT(!list_empty(&sub->list)); - list_sort(NULL, &exlist->list, xrep_btree_extent_cmp); - list_sort(NULL, &sublist->list, xrep_btree_extent_cmp); + list_sort(NULL, &bitmap->list, xfs_bitmap_range_cmp); + list_sort(NULL, &sub->list, xfs_bitmap_range_cmp); /* - * Now that we've sorted both lists, we iterate exlist once, rolling - * forward through sublist and/or exlist as necessary until we find an + * Now that we've sorted both lists, we iterate bitmap once, rolling + * forward through sub and/or bitmap as necessary until we find an * overlap or reach the end of either list. We do not reset lp to the - * head of exlist nor do we reset subex to the head of sublist. The + * head of bitmap nor do we reset sub_br to the head of sub. The * list traversal is similar to merge sort, but we're deleting * instead. In this manner we avoid O(n^2) operations. 
*/ - subex = list_first_entry(&sublist->list, struct xrep_extent, + sub_br = list_first_entry(&sub->list, struct xfs_bitmap_range, list); - lp = exlist->list.next; - while (lp != &exlist->list) { - ex = list_entry(lp, struct xrep_extent, list); + lp = bitmap->list.next; + while (lp != &bitmap->list) { + br = list_entry(lp, struct xfs_bitmap_range, list); /* - * Advance subex and/or ex until we find a pair that + * Advance sub_br and/or br until we find a pair that * intersect or we run out of extents. */ - while (subex->fsbno + subex->len <= ex->fsbno) { - if (list_is_last(&subex->list, &sublist->list)) + while (sub_br->start + sub_br->len <= br->start) { + if (list_is_last(&sub_br->list, &sub->list)) goto out; - subex = list_next_entry(subex, list); + sub_br = list_next_entry(sub_br, list); } - if (subex->fsbno >= ex->fsbno + ex->len) { + if (sub_br->start >= br->start + br->len) { lp = lp->next; continue; } - /* trim subex to fit the extent we have */ - sub_fsb = subex->fsbno; - sub_len = subex->len; - if (subex->fsbno < ex->fsbno) { - sub_len -= ex->fsbno - subex->fsbno; - sub_fsb = ex->fsbno; + /* trim sub_br to fit the extent we have */ + sub_start = sub_br->start; + sub_len = sub_br->len; + if (sub_br->start < br->start) { + sub_len -= br->start - sub_br->start; + sub_start = br->start; } - if (sub_len > ex->len) - sub_len = ex->len; + if (sub_len > br->len) + sub_len = br->len; state = 0; - if (sub_fsb == ex->fsbno) + if (sub_start == br->start) state |= LEFT_ALIGNED; - if (sub_fsb + sub_len == ex->fsbno + ex->len) + if (sub_start + sub_len == br->start + br->len) state |= RIGHT_ALIGNED; switch (state) { case LEFT_ALIGNED: /* Coincides with only the left. */ - ex->fsbno += sub_len; - ex->len -= sub_len; + br->start += sub_len; + br->len -= sub_len; break; case RIGHT_ALIGNED: /* Coincides with only the right. */ - ex->len -= sub_len; + br->len -= sub_len; lp = lp->next; break; case LEFT_ALIGNED | RIGHT_ALIGNED: /* Total overlap, just delete ex. 
*/ lp = lp->next; - list_del(&ex->list); - kmem_free(ex); + list_del(&br->list); + kmem_free(br); break; case 0: /* * Deleting from the middle: add the new right extent * and then shrink the left extent. */ - newex = kmem_alloc(sizeof(struct xrep_extent), + new_br = kmem_alloc(sizeof(struct xfs_bitmap_range), KM_MAYFAIL); - if (!newex) { + if (!new_br) { error = -ENOMEM; goto out; } - INIT_LIST_HEAD(&newex->list); - newex->fsbno = sub_fsb + sub_len; - newex->len = ex->fsbno + ex->len - newex->fsbno; - list_add(&newex->list, &ex->list); - ex->len = sub_fsb - ex->fsbno; + INIT_LIST_HEAD(&new_br->list); + new_br->start = sub_start + sub_len; + new_br->len = br->start + br->len - new_br->start; + list_add(&new_br->list, &br->list); + br->len = sub_start - br->start; lp = lp->next; break; default: diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h index 1038157695a8..dad652ee9177 100644 --- a/fs/xfs/scrub/bitmap.h +++ b/fs/xfs/scrub/bitmap.h @@ -6,32 +6,27 @@ #ifndef __XFS_SCRUB_BITMAP_H__ #define __XFS_SCRUB_BITMAP_H__ -struct xrep_extent { +struct xfs_bitmap_range { struct list_head list; - xfs_fsblock_t fsbno; - xfs_extlen_t len; + uint64_t start; + uint64_t len; }; -struct xrep_extent_list { +struct xfs_bitmap { struct list_head list; }; -static inline void -xrep_init_extent_list( - struct xrep_extent_list *exlist) -{ - INIT_LIST_HEAD(&exlist->list); -} +void xfs_bitmap_init(struct xfs_bitmap *bitmap); +void xfs_bitmap_destroy(struct xfs_bitmap *bitmap); -#define for_each_xrep_extent_safe(rbe, n, exlist) \ - list_for_each_entry_safe((rbe), (n), &(exlist)->list, list) -int xrep_collect_btree_extent(struct xfs_scrub *sc, - struct xrep_extent_list *btlist, xfs_fsblock_t fsbno, - xfs_extlen_t len); -void xrep_cancel_btree_extents(struct xfs_scrub *sc, - struct xrep_extent_list *btlist); -int xrep_subtract_extents(struct xfs_scrub *sc, - struct xrep_extent_list *exlist, - struct xrep_extent_list *sublist); +#define for_each_xfs_bitmap_extent(bex, n, bitmap) \ + 
list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) + +#define for_each_xfs_bitmap_block(b, bex, n, bitmap) \ + list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) \ + for ((b) = (bex)->start; (b) < (bex)->start + (bex)->len; (b)++) + +int xfs_bitmap_set(struct xfs_bitmap *bitmap, uint64_t start, uint64_t len); +int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub); #endif /* __XFS_SCRUB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 27a904ef6189..85b048b341a0 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -368,17 +368,17 @@ xrep_init_btblock( * * However, that leaves the matter of removing all the metadata describing the * old broken structure. For primary metadata we use the rmap data to collect - * every extent with a matching rmap owner (exlist); we then iterate all other + * every extent with a matching rmap owner (bitmap); we then iterate all other * metadata structures with the same rmap owner to collect the extents that - * cannot be removed (sublist). We then subtract sublist from exlist to + * cannot be removed (sublist). We then subtract sublist from bitmap to * derive the blocks that were used by the old btree. These blocks can be * reaped. * * For rmapbt reconstructions we must use different tactics for extent * collection. First we iterate all primary metadata (this excludes the old * rmapbt, obviously) to generate new rmap records. The gaps in the rmap - * records are collected as exlist. The bnobt records are collected as - * sublist. As with the other btrees we subtract sublist from exlist, and the + * records are collected as bitmap. The bnobt records are collected as + * sublist. As with the other btrees we subtract sublist from bitmap, and the * result (since the rmapbt lives in the free space) are the blocks from the * old rmapbt. 
* @@ -386,11 +386,11 @@ xrep_init_btblock( * * Now that we've constructed a new btree to replace the damaged one, we want * to dispose of the blocks that (we think) the old btree was using. - * Previously, we used the rmapbt to collect the extents (exlist) with the + * Previously, we used the rmapbt to collect the extents (bitmap) with the * rmap owner corresponding to the tree we rebuilt, collected extents for any * blocks with the same rmap owner that are owned by another data structure - * (sublist), and subtracted sublist from exlist. In theory the extents - * remaining in exlist are the old btree's blocks. + * (sublist), and subtracted sublist from bitmap. In theory the extents + * remaining in bitmap are the old btree's blocks. * * Unfortunately, it's possible that the btree was crosslinked with other * blocks on disk. The rmap data can tell us if there are multiple owners, so @@ -406,7 +406,7 @@ xrep_init_btblock( * If there are no rmap records at all, we also free the block. If the btree * being rebuilt lives in the free space (bnobt/cntbt/rmapbt) then there isn't * supposed to be a rmap record and everything is ok. For other btrees there - * had to have been an rmap entry for the block to have ended up on @exlist, + * had to have been an rmap entry for the block to have ended up on @bitmap, * so if it's gone now there's something wrong and the fs will shut down. * * Note: If there are multiple rmap records with only the same rmap owner as @@ -419,7 +419,7 @@ xrep_init_btblock( * The caller is responsible for locking the AG headers for the entire rebuild * operation so that nothing else can sneak in and change the AG state while * we're not looking. We also assume that the caller already invalidated any - * buffers associated with @exlist. + * buffers associated with @bitmap. 
*/ /* @@ -429,13 +429,12 @@ xrep_init_btblock( int xrep_invalidate_blocks( struct xfs_scrub *sc, - struct xrep_extent_list *exlist) + struct xfs_bitmap *bitmap) { - struct xrep_extent *rex; - struct xrep_extent *n; + struct xfs_bitmap_range *bmr; + struct xfs_bitmap_range *n; struct xfs_buf *bp; xfs_fsblock_t fsbno; - xfs_agblock_t i; /* * For each block in each extent, see if there's an incore buffer for @@ -445,18 +444,16 @@ xrep_invalidate_blocks( * because we never own those; and if we can't TRYLOCK the buffer we * assume it's owned by someone else. */ - for_each_xrep_extent_safe(rex, n, exlist) { - for (fsbno = rex->fsbno, i = rex->len; i > 0; fsbno++, i--) { - /* Skip AG headers and post-EOFS blocks */ - if (!xfs_verify_fsbno(sc->mp, fsbno)) - continue; - bp = xfs_buf_incore(sc->mp->m_ddev_targp, - XFS_FSB_TO_DADDR(sc->mp, fsbno), - XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK); - if (bp) { - xfs_trans_bjoin(sc->tp, bp); - xfs_trans_binval(sc->tp, bp); - } + for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) { + /* Skip AG headers and post-EOFS blocks */ + if (!xfs_verify_fsbno(sc->mp, fsbno)) + continue; + bp = xfs_buf_incore(sc->mp->m_ddev_targp, + XFS_FSB_TO_DADDR(sc->mp, fsbno), + XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK); + if (bp) { + xfs_trans_bjoin(sc->tp, bp); + xfs_trans_binval(sc->tp, bp); } } @@ -519,9 +516,9 @@ xrep_put_freelist( return 0; } -/* Dispose of a single metadata block. */ +/* Dispose of a single block. */ STATIC int -xrep_dispose_btree_block( +xrep_reap_block( struct xfs_scrub *sc, xfs_fsblock_t fsbno, struct xfs_owner_info *oinfo, @@ -593,41 +590,35 @@ xrep_dispose_btree_block( return error; } -/* Dispose of btree blocks from an old per-AG btree. */ +/* Dispose of every block of every extent in the bitmap. 
*/ int -xrep_reap_btree_extents( +xrep_reap_extents( struct xfs_scrub *sc, - struct xrep_extent_list *exlist, + struct xfs_bitmap *bitmap, struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type) { - struct xrep_extent *rex; - struct xrep_extent *n; + struct xfs_bitmap_range *bmr; + struct xfs_bitmap_range *n; + xfs_fsblock_t fsbno; int error = 0; ASSERT(xfs_sb_version_hasrmapbt(&sc->mp->m_sb)); - /* Dispose of every block from the old btree. */ - for_each_xrep_extent_safe(rex, n, exlist) { + for_each_xfs_bitmap_block(fsbno, bmr, n, bitmap) { ASSERT(sc->ip != NULL || - XFS_FSB_TO_AGNO(sc->mp, rex->fsbno) == sc->sa.agno); - + XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.agno); trace_xrep_dispose_btree_extent(sc->mp, - XFS_FSB_TO_AGNO(sc->mp, rex->fsbno), - XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno), rex->len); + XFS_FSB_TO_AGNO(sc->mp, fsbno), + XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1); - for (; rex->len > 0; rex->len--, rex->fsbno++) { - error = xrep_dispose_btree_block(sc, rex->fsbno, - oinfo, type); - if (error) - goto out; - } - list_del(&rex->list); - kmem_free(rex); + error = xrep_reap_block(sc, fsbno, oinfo, type); + if (error) + goto out; } out: - xrep_cancel_btree_extents(sc, exlist); + xfs_bitmap_destroy(bitmap); return error; } diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index a3d491a438f4..5a4e92221916 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -27,13 +27,11 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb, struct xfs_buf **bpp, xfs_btnum_t btnum, const struct xfs_buf_ops *ops); -struct xrep_extent_list; +struct xfs_bitmap; int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink); -int xrep_invalidate_blocks(struct xfs_scrub *sc, - struct xrep_extent_list *btlist); -int xrep_reap_btree_extents(struct xfs_scrub *sc, - struct xrep_extent_list *exlist, +int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xfs_bitmap *btlist); +int xrep_reap_extents(struct xfs_scrub *sc, struct xfs_bitmap *exlist, struct 
xfs_owner_info *oinfo, enum xfs_ag_resv_type type); struct xrep_find_ag_btree { diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 93db22c39b51..4e20f0e48232 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -511,7 +511,6 @@ DEFINE_EVENT(xrep_extent_class, name, \ xfs_agblock_t agbno, xfs_extlen_t len), \ TP_ARGS(mp, agno, agbno, len)) DEFINE_REPAIR_EXTENT_EVENT(xrep_dispose_btree_extent); -DEFINE_REPAIR_EXTENT_EVENT(xrep_collect_btree_extent); DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert); DECLARE_EVENT_CLASS(xrep_rmap_class,
From patchwork Mon Jul 30 05:48:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10548423 Subject: [PATCH 02/14] xfs: repair the AGF From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:48:02 -0700 Message-ID: <153292968232.24509.16936110804102265045.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Regenerate the AGF from the rmap data. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/agheader_repair.c | 370 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.c | 27 ++- fs/xfs/scrub/repair.h | 4 fs/xfs/scrub/scrub.c | 2 4 files changed, 393 insertions(+), 10 deletions(-) diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c index 1e96621ece3a..4842fc598c9b 100644 --- a/fs/xfs/scrub/agheader_repair.c +++ b/fs/xfs/scrub/agheader_repair.c @@ -17,12 +17,19 @@ #include "xfs_sb.h" #include "xfs_inode.h" #include "xfs_alloc.h" +#include "xfs_alloc_btree.h" #include "xfs_ialloc.h" +#include "xfs_ialloc_btree.h" #include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_refcount_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" /* Superblock */ @@ -54,3 +61,366 @@ xrep_superblock( xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1); return error; } + +/* AGF */ + +struct xrep_agf_allocbt { + struct xfs_scrub *sc; + xfs_agblock_t freeblks; + xfs_agblock_t longest; +}; + +/* Record free space shape information. */ +STATIC int +xrep_agf_walk_allocbt( + struct xfs_btree_cur *cur, + struct xfs_alloc_rec_incore *rec, + void *priv) +{ + struct xrep_agf_allocbt *raa = priv; + int error = 0; + + if (xchk_should_terminate(raa->sc, &error)) + return error; + + raa->freeblks += rec->ar_blockcount; + if (rec->ar_blockcount > raa->longest) + raa->longest = rec->ar_blockcount; + return error; +} + +/* Does this AGFL block look sane? 
*/ +STATIC int +xrep_agf_check_agfl_block( + struct xfs_mount *mp, + xfs_agblock_t agbno, + void *priv) +{ + struct xfs_scrub *sc = priv; + + if (!xfs_verify_agbno(mp, sc->sa.agno, agbno)) + return -EFSCORRUPTED; + return 0; +} + +/* + * Offset within the xrep_find_ag_btree array for each btree type. Avoid the + * XFS_BTNUM_ names here to avoid creating a sparse array. + */ +enum { + XREP_AGF_BNOBT = 0, + XREP_AGF_CNTBT, + XREP_AGF_RMAPBT, + XREP_AGF_REFCOUNTBT, + XREP_AGF_END, + XREP_AGF_MAX +}; + +/* + * Given the btree roots described by *fab, find the roots, check them for + * sanity, and pass the root data back out via *fab. + * + * This is /also/ a chicken and egg problem because we have to use the rmapbt + * (rooted in the AGF) to find the btrees rooted in the AGF. We also have no + * idea if the btrees make any sense. If we hit obvious corruptions in those + * btrees we'll bail out. + */ +STATIC int +xrep_agf_find_btrees( + struct xfs_scrub *sc, + struct xfs_buf *agf_bp, + struct xrep_find_ag_btree *fab, + struct xfs_buf *agfl_bp) +{ + struct xfs_agf *old_agf = XFS_BUF_TO_AGF(agf_bp); + int error; + + /* Go find the root data. */ + error = xrep_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp); + if (error) + return error; + + /* We must find the bnobt, cntbt, and rmapbt roots. */ + if (fab[XREP_AGF_BNOBT].root == NULLAGBLOCK || + fab[XREP_AGF_BNOBT].height > XFS_BTREE_MAXLEVELS || + fab[XREP_AGF_CNTBT].root == NULLAGBLOCK || + fab[XREP_AGF_CNTBT].height > XFS_BTREE_MAXLEVELS || + fab[XREP_AGF_RMAPBT].root == NULLAGBLOCK || + fab[XREP_AGF_RMAPBT].height > XFS_BTREE_MAXLEVELS) + return -EFSCORRUPTED; + + /* + * We relied on the rmapbt to reconstruct the AGF. If we get a + * different root then something's seriously wrong. + */ + if (fab[XREP_AGF_RMAPBT].root != + be32_to_cpu(old_agf->agf_roots[XFS_BTNUM_RMAPi])) + return -EFSCORRUPTED; + + /* We must find the refcountbt root if that feature is enabled. 
*/
+	if (xfs_sb_version_hasreflink(&sc->mp->m_sb) &&
+	    (fab[XREP_AGF_REFCOUNTBT].root == NULLAGBLOCK ||
+	     fab[XREP_AGF_REFCOUNTBT].height > XFS_BTREE_MAXLEVELS))
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+/*
+ * Reinitialize the AGF header, making an in-core copy of the old contents so
+ * that we know which in-core state needs to be reinitialized.
+ */
+STATIC void
+xrep_agf_init_header(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp,
+	struct xfs_agf		*old_agf)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+
+	memcpy(old_agf, agf, sizeof(*old_agf));
+	memset(agf, 0, BBTOB(agf_bp->b_length));
+	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
+	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
+	agf->agf_seqno = cpu_to_be32(sc->sa.agno);
+	agf->agf_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
+	agf->agf_flfirst = old_agf->agf_flfirst;
+	agf->agf_fllast = old_agf->agf_fllast;
+	agf->agf_flcount = old_agf->agf_flcount;
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/* Mark the incore AGF data stale until we're done fixing things. */
+	ASSERT(sc->sa.pag->pagf_init);
+	sc->sa.pag->pagf_init = 0;
+}
+
+/* Set btree root information in an AGF. */
+STATIC void
+xrep_agf_set_roots(
+	struct xfs_scrub		*sc,
+	struct xfs_agf			*agf,
+	struct xrep_find_ag_btree	*fab)
+{
+	agf->agf_roots[XFS_BTNUM_BNOi] =
+			cpu_to_be32(fab[XREP_AGF_BNOBT].root);
+	agf->agf_levels[XFS_BTNUM_BNOi] =
+			cpu_to_be32(fab[XREP_AGF_BNOBT].height);
+
+	agf->agf_roots[XFS_BTNUM_CNTi] =
+			cpu_to_be32(fab[XREP_AGF_CNTBT].root);
+	agf->agf_levels[XFS_BTNUM_CNTi] =
+			cpu_to_be32(fab[XREP_AGF_CNTBT].height);
+
+	agf->agf_roots[XFS_BTNUM_RMAPi] =
+			cpu_to_be32(fab[XREP_AGF_RMAPBT].root);
+	agf->agf_levels[XFS_BTNUM_RMAPi] =
+			cpu_to_be32(fab[XREP_AGF_RMAPBT].height);
+
+	if (xfs_sb_version_hasreflink(&sc->mp->m_sb)) {
+		agf->agf_refcount_root =
+				cpu_to_be32(fab[XREP_AGF_REFCOUNTBT].root);
+		agf->agf_refcount_level =
+				cpu_to_be32(fab[XREP_AGF_REFCOUNTBT].height);
+	}
+}
+
+/* Update all AGF fields which derive from btree contents. */
+STATIC int
+xrep_agf_calc_from_btrees(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp)
+{
+	struct xrep_agf_allocbt	raa = { .sc = sc };
+	struct xfs_btree_cur	*cur = NULL;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+	struct xfs_mount	*mp = sc->mp;
+	xfs_agblock_t		btreeblks;
+	xfs_agblock_t		blocks;
+	int			error;
+
+	/* Update the AGF counters from the bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
+	if (error)
+		goto err;
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	btreeblks = blocks - 1;
+	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
+	agf->agf_longest = cpu_to_be32(raa.longest);
+
+	/* Update the AGF counters from the cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	btreeblks += blocks - 1;
+
+	/* Update the AGF counters from the rmapbt. */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	agf->agf_rmap_blocks = cpu_to_be32(blocks);
+	btreeblks += blocks - 1;
+
+	agf->agf_btreeblks = cpu_to_be32(btreeblks);
+
+	/* Update the AGF counters from the refcountbt. */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
+				sc->sa.agno);
+		error = xfs_btree_count_blocks(cur, &blocks);
+		if (error)
+			goto err;
+		xfs_btree_del_cursor(cur, error);
+		agf->agf_refcount_blocks = cpu_to_be32(blocks);
+	}
+
+	return 0;
+err:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Commit the new AGF and reinitialize the incore state. */
+STATIC int
+xrep_agf_commit_new(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp)
+{
+	struct xfs_perag	*pag;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+
+	/* Trigger fdblocks recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/* Write this to disk. */
+	xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
+	xfs_trans_log_buf(sc->tp, agf_bp, 0, BBTOB(agf_bp->b_length) - 1);
+
+	/* Now reinitialize the in-core counters we changed. */
+	pag = sc->sa.pag;
+	pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
+	pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
+	pag->pagf_longest = be32_to_cpu(agf->agf_longest);
+	pag->pagf_levels[XFS_BTNUM_BNOi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
+	pag->pagf_levels[XFS_BTNUM_CNTi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+	pag->pagf_levels[XFS_BTNUM_RMAPi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+	pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
+	pag->pagf_init = 1;
+
+	return 0;
+}
+
+/* Repair the AGF. v5 filesystems only. */
+int
+xrep_agf(
+	struct xfs_scrub		*sc)
+{
+	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
+		[XREP_AGF_BNOBT] = {
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_allocbt_buf_ops,
+			.magic = XFS_ABTB_CRC_MAGIC,
+		},
+		[XREP_AGF_CNTBT] = {
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_allocbt_buf_ops,
+			.magic = XFS_ABTC_CRC_MAGIC,
+		},
+		[XREP_AGF_RMAPBT] = {
+			.rmap_owner = XFS_RMAP_OWN_AG,
+			.buf_ops = &xfs_rmapbt_buf_ops,
+			.magic = XFS_RMAP_CRC_MAGIC,
+		},
+		[XREP_AGF_REFCOUNTBT] = {
+			.rmap_owner = XFS_RMAP_OWN_REFC,
+			.buf_ops = &xfs_refcountbt_buf_ops,
+			.magic = XFS_REFC_CRC_MAGIC,
+		},
+		[XREP_AGF_END] = {
+			.buf_ops = NULL,
+		},
+	};
+	struct xfs_agf			old_agf;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agf_bp;
+	struct xfs_buf			*agfl_bp;
+	struct xfs_agf			*agf;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+	/*
+	 * Make sure we have the AGF buffer, as scrub might have decided it
+	 * was corrupt after xfs_alloc_read_agf failed with -EFSCORRUPTED.
+	 */
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
+	if (error)
+		return error;
+	agf_bp->b_ops = &xfs_agf_buf_ops;
+	agf = XFS_BUF_TO_AGF(agf_bp);
+
+	/*
+	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
+	 * the AGFL now; these blocks might have once been part of the
+	 * bno/cnt/rmap btrees but are not now. This is a chicken and egg
+	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
+	 * because we can't do any serious cross-referencing with any of the
+	 * btrees rooted in the AGF. If the AGFL contents are obviously bad
+	 * then we'll bail out.
+	 */
+	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
+	if (error)
+		return error;
+
+	/*
+	 * Spot-check the AGFL blocks; if they're obviously corrupt then
+	 * there's nothing we can do but bail out.
+	 */
+	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
+			xrep_agf_check_agfl_block, sc);
+	if (error)
+		return error;
+
+	/*
+	 * Find the AGF btree roots. This is also a chicken-and-egg situation;
+	 * see the function for more details.
+	 */
+	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
+	if (error)
+		return error;
+
+	/* Start rewriting the header and implant the btrees we found. */
+	xrep_agf_init_header(sc, agf_bp, &old_agf);
+	xrep_agf_set_roots(sc, agf, fab);
+	error = xrep_agf_calc_from_btrees(sc, agf_bp);
+	if (error)
+		goto out_revert;
+
+	/* Commit the changes and reinitialize incore state. */
+	return xrep_agf_commit_new(sc, agf_bp);
+
+out_revert:
+	/* Mark the incore AGF state stale and revert the AGF. */
+	sc->sa.pag->pagf_init = 0;
+	memcpy(agf, &old_agf, sizeof(old_agf));
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 85b048b341a0..17cf48564390 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -128,9 +128,12 @@ xrep_roll_ag_trans(
 	int			error;
 
 	/* Keep the AG header buffers locked so we can keep going. */
-	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
-	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
-	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
+	if (sc->sa.agi_bp)
+		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
+	if (sc->sa.agf_bp)
+		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
+	if (sc->sa.agfl_bp)
+		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
 
 	/* Roll the transaction. */
 	error = xfs_trans_roll(&sc->tp);
@@ -138,9 +141,12 @@ xrep_roll_ag_trans(
 		goto out_release;
 
 	/* Join AG headers to the new transaction. */
-	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
-	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
-	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
+	if (sc->sa.agi_bp)
+		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
+	if (sc->sa.agf_bp)
+		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
+	if (sc->sa.agfl_bp)
+		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
 
 	return 0;
 
@@ -150,9 +156,12 @@ xrep_roll_ag_trans(
 	 * buffers will be released during teardown on our way out
 	 * of the kernel.
 	 */
-	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
-	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
-	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
+	if (sc->sa.agi_bp)
+		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
+	if (sc->sa.agf_bp)
+		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
+	if (sc->sa.agfl_bp)
+		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
 
 	return error;
 }
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 5a4e92221916..1d283360b5ab 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
 
 int xrep_probe(struct xfs_scrub *sc);
 int xrep_superblock(struct xfs_scrub *sc);
+int xrep_agf(struct xfs_scrub *sc);
+int xrep_agfl(struct xfs_scrub *sc);
 
 #else
 
@@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
 
 #define xrep_probe			xrep_notsupported
 #define xrep_superblock			xrep_notsupported
+#define xrep_agf			xrep_notsupported
+#define xrep_agfl			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 6efb926f3cf8..1e8a17c8e2b9 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_fs,
 		.scrub	= xchk_agf,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_agf,
 	},
 	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
 		.type	= ST_PERAG,

From patchwork Mon Jul 30 05:48:08 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter:
"Darrick J. Wong"
X-Patchwork-Id: 10548425
Subject: [PATCH 03/14] xfs: repair the AGFL
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com
Date: Sun, 29 Jul 2018 22:48:08 -0700
Message-ID: <153292968888.24509.5021796491286828939.stgit@magnolia>
In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia>
References: <153292966714.24509.15809693393247424274.stgit@magnolia>

From: Darrick J. Wong

Repair the AGFL from the rmap data.
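
As a rough illustration of the idea (this is not the code in the patch): the
AGFL candidates are whatever the rmapbt marks as OWN_AG, minus the blocks still
in use by the bnobt, cntbt and rmapbt, capped at the AGFL size. The sketch
below models both sets as per-block flags. demo_collect_agfl(), DEMO_AG_BLOCKS
and the flag arrays are hypothetical stand-ins for the xfs_bitmap machinery and
do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy AG size; a real AG has far more blocks. */
#define DEMO_AG_BLOCKS	16

/*
 * own_ag[b] is true if the rmap data says block b belongs to OWN_AG.
 * btree[b] is true if block b is in use by the bnobt, cntbt or rmapbt.
 * Any block that is OWN_AG but not a btree block is assumed to be an
 * old AGFL block.  At most agfl_size block numbers are written to
 * agfl[]; the return value is the new flcount and *overflow receives
 * the number of leftover blocks that would have to be freed later.
 */
static size_t
demo_collect_agfl(
	const bool	*own_ag,
	const bool	*btree,
	unsigned int	*agfl,
	size_t		agfl_size,
	size_t		*overflow)
{
	size_t		flcount = 0;
	unsigned int	b;

	assert(overflow != NULL);
	*overflow = 0;

	for (b = 0; b < DEMO_AG_BLOCKS; b++) {
		/* Skip blocks that are not OWN_AG or are btree blocks. */
		if (!own_ag[b] || btree[b])
			continue;
		if (flcount < agfl_size)
			agfl[flcount++] = b;
		else
			(*overflow)++;
	}
	return flcount;
}
```

With OWN_AG blocks 1-7, btree blocks 1-3 and room for three entries, this
picks blocks 4, 5 and 6 and reports one overflow block. In the patch itself the
overflow is returned to the free space btrees after the transaction is rolled.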

Signed-off-by: Darrick J. Wong
---
 fs/xfs/scrub/agheader_repair.c |  276 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/bitmap.c          |   92 +++++++++++++
 fs/xfs/scrub/bitmap.h          |    4 +
 fs/xfs/scrub/scrub.c           |    2
 4 files changed, 373 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 4842fc598c9b..bfef066c87c3 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -424,3 +424,279 @@ xrep_agf(
 	memcpy(agf, &old_agf, sizeof(old_agf));
 	return error;
 }
+
+/* AGFL */
+
+struct xrep_agfl {
+	/* Bitmap of other OWN_AG metadata blocks. */
+	struct xfs_bitmap	agmetablocks;
+
+	/* Bitmap of free space. */
+	struct xfs_bitmap	*freesp;
+
+	struct xfs_scrub	*sc;
+};
+
+/* Record all OWN_AG (free space btree) information from the rmap data. */
+STATIC int
+xrep_agfl_walk_rmap(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*rec,
+	void			*priv)
+{
+	struct xrep_agfl	*ra = priv;
+	xfs_fsblock_t		fsb;
+	int			error = 0;
+
+	if (xchk_should_terminate(ra->sc, &error))
+		return error;
+
+	/* Record all the OWN_AG blocks. */
+	if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+		fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+				rec->rm_startblock);
+		error = xfs_bitmap_set(ra->freesp, fsb, rec->rm_blockcount);
+		if (error)
+			return error;
+	}
+
+	return xfs_bitmap_set_btcur_path(&ra->agmetablocks, cur);
+}
+
+/*
+ * Map out all the non-AGFL OWN_AG space in this AG so that we can deduce
+ * which blocks belong to the AGFL.
+ *
+ * Compute the set of old AGFL blocks by subtracting from the list of OWN_AG
+ * blocks the list of blocks owned by all other OWN_AG metadata (bnobt, cntbt,
+ * rmapbt). These are the old AGFL blocks, so return that list and the number
+ * of blocks we're actually going to put back on the AGFL.
+ */
+STATIC int
+xrep_agfl_collect_blocks(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp,
+	struct xfs_bitmap	*agfl_extents,
+	xfs_agblock_t		*flcount)
+{
+	struct xrep_agfl	ra;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_bitmap_range	*br;
+	struct xfs_bitmap_range	*n;
+	int			error;
+
+	ra.sc = sc;
+	ra.freesp = agfl_extents;
+	xfs_bitmap_init(&ra.agmetablocks);
+
+	/* Find all space used by the free space btrees & rmapbt. */
+	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+	error = xfs_rmap_query_all(cur, xrep_agfl_walk_rmap, &ra);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	/* Find all blocks currently being used by the bnobt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_BNO);
+	error = xfs_bitmap_set_btblocks(&ra.agmetablocks, cur);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	/* Find all blocks currently being used by the cntbt. */
+	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+			XFS_BTNUM_CNT);
+	error = xfs_bitmap_set_btblocks(&ra.agmetablocks, cur);
+	if (error)
+		goto err;
+
+	xfs_btree_del_cursor(cur, error);
+
+	/*
+	 * Drop the freesp meta blocks that are in use by btrees.
+	 * The remaining blocks /should/ be AGFL blocks.
+	 */
+	error = xfs_bitmap_disunion(agfl_extents, &ra.agmetablocks);
+	xfs_bitmap_destroy(&ra.agmetablocks);
+	if (error)
+		return error;
+
+	/*
+	 * Calculate the new AGFL size. If we found more blocks than fit in
+	 * the AGFL we'll free them later.
+	 */
+	*flcount = 0;
+	for_each_xfs_bitmap_extent(br, n, agfl_extents) {
+		*flcount += br->len;
+		if (*flcount > xfs_agfl_size(mp))
+			break;
+	}
+	if (*flcount > xfs_agfl_size(mp))
+		*flcount = xfs_agfl_size(mp);
+	return 0;
+
+err:
+	xfs_bitmap_destroy(&ra.agmetablocks);
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Update the AGF and reset the in-core state. */
+STATIC int
+xrep_agfl_update_agf(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agf_bp,
+	xfs_agblock_t		flcount)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
+
+	ASSERT(flcount <= xfs_agfl_size(sc->mp));
+
+	/* Trigger fdblocks recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/* Update the AGF counters. */
+	if (sc->sa.pag->pagf_init)
+		sc->sa.pag->pagf_flcount = flcount;
+	agf->agf_flfirst = cpu_to_be32(0);
+	agf->agf_flcount = cpu_to_be32(flcount);
+	agf->agf_fllast = cpu_to_be32(flcount - 1);
+
+	xfs_alloc_log_agf(sc->tp, agf_bp,
+			XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
+	return 0;
+}
+
+/* Write out a totally new AGFL. */
+STATIC void
+xrep_agfl_init_header(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agfl_bp,
+	struct xfs_bitmap	*agfl_extents,
+	xfs_agblock_t		flcount)
+{
+	struct xfs_mount	*mp = sc->mp;
+	__be32			*agfl_bno;
+	struct xfs_bitmap_range	*br;
+	struct xfs_bitmap_range	*n;
+	struct xfs_agfl		*agfl;
+	xfs_agblock_t		agbno;
+	unsigned int		fl_off;
+
+	ASSERT(flcount <= xfs_agfl_size(mp));
+
+	/* Start rewriting the header. */
+	agfl = XFS_BUF_TO_AGFL(agfl_bp);
+	memset(agfl, 0xFF, BBTOB(agfl_bp->b_length));
+	agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
+	agfl->agfl_seqno = cpu_to_be32(sc->sa.agno);
+	uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/*
+	 * Fill the AGFL with the remaining blocks. If agfl_extents has more
+	 * blocks than fit in the AGFL, they will be freed in a subsequent
+	 * step.
+	 */
+	fl_off = 0;
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
+	for_each_xfs_bitmap_extent(br, n, agfl_extents) {
+		agbno = XFS_FSB_TO_AGBNO(mp, br->start);
+
+		trace_xrep_agfl_insert(mp, sc->sa.agno, agbno, br->len);
+
+		while (br->len > 0 && fl_off < flcount) {
+			agfl_bno[fl_off] = cpu_to_be32(agbno);
+			fl_off++;
+			agbno++;
+			br->start++;
+			br->len--;
+		}
+
+		if (br->len)
+			break;
+		list_del(&br->list);
+		kmem_free(br);
+	}
+
+	/* Write new AGFL to disk. */
+	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
+	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
+}
+
+/* Repair the AGFL. */
+int
+xrep_agfl(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_owner_info	oinfo;
+	struct xfs_bitmap	agfl_extents;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_buf		*agf_bp;
+	struct xfs_buf		*agfl_bp;
+	xfs_agblock_t		flcount;
+	int			error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+	xfs_bitmap_init(&agfl_extents);
+
+	/*
+	 * Read the AGF so that we can query the rmapbt. We hope that there's
+	 * nothing wrong with the AGF, but all the AG header repair functions
+	 * have this chicken-and-egg problem.
+	 */
+	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+
+	/*
+	 * Make sure we have the AGFL buffer, as scrub might have decided it
+	 * was corrupt after xfs_alloc_read_agfl failed with -EFSCORRUPTED.
+	 */
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
+	if (error)
+		return error;
+	agfl_bp->b_ops = &xfs_agfl_buf_ops;
+
+	/* Gather all the extents we're going to put on the new AGFL. */
+	error = xrep_agfl_collect_blocks(sc, agf_bp, &agfl_extents, &flcount);
+	if (error)
+		goto err;
+
+	/*
+	 * Update AGF and AGFL. We reset the global free block counter when
+	 * we adjust the AGF flcount (which can fail) so avoid updating any
+	 * buffers until we know that part works.
+	 */
+	error = xrep_agfl_update_agf(sc, agf_bp, flcount);
+	if (error)
+		goto err;
+	xrep_agfl_init_header(sc, agfl_bp, &agfl_extents, flcount);
+
+	/*
+	 * Ok, the AGFL should be ready to go now. Roll the transaction to
+	 * make the new AGFL permanent before we start using it to return
+	 * freespace overflow to the freespace btrees.
+	 */
+	sc->sa.agf_bp = agf_bp;
+	sc->sa.agfl_bp = agfl_bp;
+	error = xrep_roll_ag_trans(sc);
+	if (error)
+		goto err;
+
+	/* Dump any AGFL overflow. */
+	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+	return xrep_reap_extents(sc, &agfl_extents, &oinfo, XFS_AG_RESV_AGFL);
+err:
+	xfs_bitmap_destroy(&agfl_extents);
+	return error;
+}
diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index c770e2d0b6aa..fdadc9e1dc49 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -9,6 +9,7 @@
 #include "xfs_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_btree.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -209,3 +210,94 @@ xfs_bitmap_disunion(
 }
 #undef LEFT_ALIGNED
 #undef RIGHT_ALIGNED
+
+/*
+ * Record all btree blocks seen while iterating all records of a btree.
+ *
+ * We know that the btree query_all function starts at the left edge and walks
+ * towards the right edge of the tree. Therefore, we know that we can walk up
+ * the btree cursor towards the root; if the pointer for a given level points
+ * to the first record/key in that block, we haven't seen this block before;
+ * and therefore we need to remember that we saw this block in the btree.
+ *
+ * So if our btree is:
+ *
+ *    4
+ *  / | \
+ * 1  2  3
+ *
+ * Pretend for this example that each leaf block has 100 btree records. For
+ * the first btree record, we'll observe that bc_ptrs[0] == 1, so we record
+ * that we saw block 1. Then we observe that bc_ptrs[1] == 1, so we record
+ * block 4. The list is [1, 4].
+ *
+ * For the second btree record, we see that bc_ptrs[0] == 2, so we exit the
+ * loop. The list remains [1, 4].
+ *
+ * For the 101st btree record, we've moved onto leaf block 2. Now
+ * bc_ptrs[0] == 1 again, so we record that we saw block 2. We see that
+ * bc_ptrs[1] == 2, so we exit the loop. The list is now [1, 4, 2].
+ *
+ * For the 102nd record, bc_ptrs[0] == 2, so we continue.
+ *
+ * For the 201st record, we've moved on to leaf block 3. bc_ptrs[0] == 1, so
+ * we add 3 to the list. Now it is [1, 4, 2, 3].
+ *
+ * For the 300th record we just exit, with the list being [1, 4, 2, 3].
+ */
+
+/*
+ * Record all the buffers pointed to by the btree cursor. Callers already
+ * engaged in a btree walk should call this function to capture the list of
+ * blocks going from the leaf towards the root.
+ */
+int
+xfs_bitmap_set_btcur_path(
+	struct xfs_bitmap	*bitmap,
+	struct xfs_btree_cur	*cur)
+{
+	struct xfs_buf		*bp;
+	xfs_fsblock_t		fsb;
+	int			i;
+	int			error;
+
+	for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+		xfs_btree_get_block(cur, i, &bp);
+		if (!bp)
+			continue;
+		fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+		error = xfs_bitmap_set(bitmap, fsb, 1);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Collect a btree's block in the bitmap. */
+STATIC int
+xfs_bitmap_collect_btblock(
+	struct xfs_btree_cur	*cur,
+	int			level,
+	void			*priv)
+{
+	struct xfs_bitmap	*bitmap = priv;
+	struct xfs_buf		*bp;
+	xfs_fsblock_t		fsbno;
+
+	xfs_btree_get_block(cur, level, &bp);
+	if (!bp)
+		return 0;
+
+	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+	return xfs_bitmap_set(bitmap, fsbno, 1);
+}
+
+/* Walk the btree and mark the bitmap wherever a btree block is found. */
+int
+xfs_bitmap_set_btblocks(
+	struct xfs_bitmap	*bitmap,
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_btree_visit_blocks(cur, xfs_bitmap_collect_btblock, bitmap);
+}
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
index dad652ee9177..ae8ecbce6fa6 100644
--- a/fs/xfs/scrub/bitmap.h
+++ b/fs/xfs/scrub/bitmap.h
@@ -28,5 +28,9 @@ void xfs_bitmap_destroy(struct xfs_bitmap *bitmap);
 
 int xfs_bitmap_set(struct xfs_bitmap *bitmap, uint64_t start, uint64_t len);
 int xfs_bitmap_disunion(struct xfs_bitmap *bitmap, struct xfs_bitmap *sub);
+int xfs_bitmap_set_btcur_path(struct xfs_bitmap *bitmap,
+		struct xfs_btree_cur *cur);
+int xfs_bitmap_set_btblocks(struct xfs_bitmap *bitmap,
+		struct xfs_btree_cur *cur);
 
 #endif	/* __XFS_SCRUB_BITMAP_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 1e8a17c8e2b9..2670f4cf62f4 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -220,7 +220,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_fs,
 		.scrub	= xchk_agfl,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_agfl,
 	},
 	[XFS_SCRUB_TYPE_AGI] = {	/* agi */
 		.type	= ST_PERAG,

From patchwork Mon Jul 30 05:48:15 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
Wong"
X-Patchwork-Id: 10548427
Subject: [PATCH 04/14] xfs: repair the AGI
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com
Date: Sun, 29 Jul 2018 22:48:15 -0700
Message-ID: <153292969532.24509.17576845400762793279.stgit@magnolia>
In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia>
References: <153292966714.24509.15809693393247424274.stgit@magnolia>

From: Darrick J.
Wong

Rebuild the AGI header items with some help from the rmapbt.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/scrub/agheader_repair.c |  220 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h          |    2
 fs/xfs/scrub/scrub.c           |    2
 3 files changed, 223 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index bfef066c87c3..921e7d42a2ef 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -700,3 +700,223 @@ xrep_agfl(
 	xfs_bitmap_destroy(&agfl_extents);
 	return error;
 }
+
+/* AGI */
+
+/*
+ * Offset within the xrep_find_ag_btree array for each btree type. Avoid the
+ * XFS_BTNUM_ names here to avoid creating a sparse array.
+ */
+enum {
+	XREP_AGI_INOBT = 0,
+	XREP_AGI_FINOBT,
+	XREP_AGI_END,
+	XREP_AGI_MAX
+};
+
+/*
+ * Given the inode btree roots described by *fab, find the roots, check them
+ * for sanity, and pass the root data back out via *fab.
+ */
+STATIC int
+xrep_agi_find_btrees(
+	struct xfs_scrub		*sc,
+	struct xrep_find_ag_btree	*fab)
+{
+	struct xfs_buf			*agf_bp;
+	struct xfs_mount		*mp = sc->mp;
+	int				error;
+
+	/* Read the AGF. */
+	error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+	if (error)
+		return error;
+	if (!agf_bp)
+		return -ENOMEM;
+
+	/* Find the btree roots. */
+	error = xrep_find_ag_btree_roots(sc, agf_bp, fab, NULL);
+	if (error)
+		return error;
+
+	/* We must find the inobt root. */
+	if (fab[XREP_AGI_INOBT].root == NULLAGBLOCK ||
+	    fab[XREP_AGI_INOBT].height > XFS_BTREE_MAXLEVELS)
+		return -EFSCORRUPTED;
+
+	/* We must find the finobt root if that feature is enabled. */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb) &&
+	    (fab[XREP_AGI_FINOBT].root == NULLAGBLOCK ||
+	     fab[XREP_AGI_FINOBT].height > XFS_BTREE_MAXLEVELS))
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+
+/*
+ * Reinitialize the AGI header, making an in-core copy of the old contents so
+ * that we know which in-core state needs to be reinitialized.
+ */
+STATIC void
+xrep_agi_init_header(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp,
+	struct xfs_agi		*old_agi)
+{
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(agi_bp);
+	struct xfs_mount	*mp = sc->mp;
+
+	memcpy(old_agi, agi, sizeof(*old_agi));
+	memset(agi, 0, BBTOB(agi_bp->b_length));
+	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
+	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
+	agi->agi_seqno = cpu_to_be32(sc->sa.agno);
+	agi->agi_length = cpu_to_be32(xfs_ag_block_count(mp, sc->sa.agno));
+	agi->agi_newino = cpu_to_be32(NULLAGINO);
+	agi->agi_dirino = cpu_to_be32(NULLAGINO);
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
+
+	/* We don't know how to fix the unlinked list yet. */
+	memcpy(&agi->agi_unlinked, &old_agi->agi_unlinked,
+			sizeof(agi->agi_unlinked));
+
+	/* Mark the incore AGI data stale until we're done fixing things. */
+	ASSERT(sc->sa.pag->pagi_init);
+	sc->sa.pag->pagi_init = 0;
+}
+
+/* Set btree root information in an AGI. */
+STATIC void
+xrep_agi_set_roots(
+	struct xfs_scrub		*sc,
+	struct xfs_agi			*agi,
+	struct xrep_find_ag_btree	*fab)
+{
+	agi->agi_root = cpu_to_be32(fab[XREP_AGI_INOBT].root);
+	agi->agi_level = cpu_to_be32(fab[XREP_AGI_INOBT].height);
+
+	if (xfs_sb_version_hasfinobt(&sc->mp->m_sb)) {
+		agi->agi_free_root = cpu_to_be32(fab[XREP_AGI_FINOBT].root);
+		agi->agi_free_level = cpu_to_be32(fab[XREP_AGI_FINOBT].height);
+	}
+}
+
+/* Update the AGI counters. */
+STATIC int
+xrep_agi_calc_from_btrees(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(agi_bp);
+	struct xfs_mount	*mp = sc->mp;
+	xfs_agino_t		count;
+	xfs_agino_t		freecount;
+	int			error;
+
+	cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp, sc->sa.agno,
+			XFS_BTNUM_INO);
+	error = xfs_ialloc_count_inodes(cur, &count, &freecount);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+
+	agi->agi_count = cpu_to_be32(count);
+	agi->agi_freecount = cpu_to_be32(freecount);
+	return 0;
+err:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/* Trigger reinitialization of the in-core data. */
+STATIC int
+xrep_agi_commit_new(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp,
+	const struct xfs_agi	*old_agi)
+{
+	struct xfs_perag	*pag;
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(agi_bp);
+
+	/* Trigger inode count recalculation */
+	xfs_force_summary_recalc(sc->mp);
+
+	/* Write this to disk. */
+	xfs_trans_buf_set_type(sc->tp, agi_bp, XFS_BLFT_AGI_BUF);
+	xfs_trans_log_buf(sc->tp, agi_bp, 0, BBTOB(agi_bp->b_length) - 1);
+
+	/* Now reinitialize the in-core counters if necessary. */
+	pag = sc->sa.pag;
+	sc->sa.pag->pagi_init = 1;
+	pag->pagi_count = be32_to_cpu(agi->agi_count);
+	pag->pagi_freecount = be32_to_cpu(agi->agi_freecount);
+
+	return 0;
+}
+
+/* Repair the AGI. */
+int
+xrep_agi(
+	struct xfs_scrub		*sc)
+{
+	struct xrep_find_ag_btree	fab[XREP_AGI_MAX] = {
+		[XREP_AGI_INOBT] = {
+			.rmap_owner = XFS_RMAP_OWN_INOBT,
+			.buf_ops = &xfs_inobt_buf_ops,
+			.magic = XFS_IBT_CRC_MAGIC,
+		},
+		[XREP_AGI_FINOBT] = {
+			.rmap_owner = XFS_RMAP_OWN_INOBT,
+			.buf_ops = &xfs_inobt_buf_ops,
+			.magic = XFS_FIBT_CRC_MAGIC,
+		},
+		[XREP_AGI_END] = {
+			.buf_ops = NULL
+		},
+	};
+	struct xfs_agi			old_agi;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_buf			*agi_bp;
+	struct xfs_agi			*agi;
+	int				error;
+
+	/* We require the rmapbt to rebuild anything. */
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	xchk_perag_get(sc->mp, &sc->sa);
+	/*
+	 * Make sure we have the AGI buffer, as scrub might have decided it
+	 * was corrupt after xfs_ialloc_read_agi failed with -EFSCORRUPTED.
+	 */
+	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGI_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0, &agi_bp, NULL);
+	if (error)
+		return error;
+	agi_bp->b_ops = &xfs_agi_buf_ops;
+	agi = XFS_BUF_TO_AGI(agi_bp);
+
+	/* Find the AGI btree roots. */
+	error = xrep_agi_find_btrees(sc, fab);
+	if (error)
+		return error;
+
+	/* Start rewriting the header and implant the btrees we found. */
+	xrep_agi_init_header(sc, agi_bp, &old_agi);
+	xrep_agi_set_roots(sc, agi, fab);
+	error = xrep_agi_calc_from_btrees(sc, agi_bp);
+	if (error)
+		goto out_revert;
+
+	/* Reinitialize in-core state. */
+	return xrep_agi_commit_new(sc, agi_bp, &old_agi);
+
+out_revert:
+	/* Mark the incore AGI state stale and revert the AGI. */
+	sc->sa.pag->pagi_init = 0;
+	memcpy(agi, &old_agi, sizeof(old_agi));
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 1d283360b5ab..9de321eee4ab 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -60,6 +60,7 @@ int xrep_probe(struct xfs_scrub *sc);
 int xrep_superblock(struct xfs_scrub *sc);
 int xrep_agf(struct xfs_scrub *sc);
 int xrep_agfl(struct xfs_scrub *sc);
+int xrep_agi(struct xfs_scrub *sc);
 
 #else
 
@@ -85,6 +86,7 @@ xrep_calc_ag_resblks(
 #define xrep_superblock			xrep_notsupported
 #define xrep_agf			xrep_notsupported
 #define xrep_agfl			xrep_notsupported
+#define xrep_agi			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 2670f4cf62f4..4bfae1e61d30 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -226,7 +226,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_PERAG,
 		.setup	= xchk_setup_fs,
 		.scrub	= xchk_agi,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_agi,
 	},
 	[XFS_SCRUB_TYPE_BNOBT] = {	/* bnobt */
 		.type	= ST_PERAG,

From patchwork Mon Jul 30 05:48:21 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 10548429 Subject: [PATCH 05/14] xfs: repair free space btrees From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:48:21 -0700 Message-ID: <153292970169.24509.4581630892233165448.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. 
Wong Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/alloc.c | 1 fs/xfs/scrub/alloc_repair.c | 581 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/common.c | 8 + fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 4 fs/xfs/scrub/trace.h | 2 fs/xfs/xfs_extent_busy.c | 14 + fs/xfs/xfs_extent_busy.h | 2 9 files changed, 610 insertions(+), 5 deletions(-) create mode 100644 fs/xfs/scrub/alloc_repair.c -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 57ec46951ede..44ddd112acd2 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y) xfs-y += $(addprefix scrub/, \ agheader_repair.o \ + alloc_repair.o \ bitmap.o \ repair.o \ ) diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c index 036b5c7021eb..c9b34ba312ab 100644 --- a/fs/xfs/scrub/alloc.c +++ b/fs/xfs/scrub/alloc.c @@ -15,7 +15,6 @@ #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_sb.h" -#include "xfs_alloc.h" #include "xfs_rmap.h" #include "xfs_alloc.h" #include "scrub/xfs_scrub.h" diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c new file mode 100644 index 000000000000..b228c2906de2 --- /dev/null +++ b/fs/xfs/scrub/alloc_repair.c @@ -0,0 +1,581 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_alloc_btree.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_inode.h" +#include "xfs_refcount.h" +#include "xfs_extent_busy.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Free Space Btree Repair + * ======================= + * + * The reverse mappings are supposed to record all space usage for the entire + * AG. Therefore, we can recalculate the free extents in an AG by looking for + * gaps in the physical extents recorded in the rmapbt. On a reflink + * filesystem this is a little more tricky in that we have to be aware that + * the rmap records are allowed to overlap. + * + * We derive which blocks belonged to the old bnobt/cntbt by recording all the + * OWN_AG extents and subtracting out the blocks owned by all other OWN_AG + * metadata: the rmapbt blocks visited while iterating the reverse mappings + * and the AGFL blocks. + * + * Once we have both of those pieces, we can reconstruct the bnobt and cntbt + * by blowing out the free block state and freeing all the extents that we + * found. This adds the requirement that we can't have any busy extents in + * the AG because the busy code cannot handle duplicate records. + * + * Note that we can only rebuild both free space btrees at the same time + * because the regular extent freeing infrastructure loads both btrees at the + * same time. + * + * We use the prefix 'xrep_abt' here because we regenerate both free space + * allocation btrees at the same time. 
+ */ + +struct xrep_abt_extent { + struct list_head list; + xfs_agblock_t bno; + xfs_extlen_t len; +}; + +struct xrep_abt { + /* Blocks owned by the rmapbt or the agfl. */ + struct xfs_bitmap nobtlist; + + /* All OWN_AG blocks. */ + struct xfs_bitmap *btlist; + + /* Free space extents. */ + struct list_head *extlist; + + struct xfs_scrub *sc; + + /* Length of extlist. */ + uint64_t nr_records; + + /* + * Next block we anticipate seeing in the rmap records. If the next + * rmap record is greater than next_bno, we have found unused space. + */ + xfs_agblock_t next_bno; + + /* Number of free blocks in this AG. */ + xfs_agblock_t nr_blocks; +}; + +/* Record extents that aren't in use from gaps in the rmap records. */ +STATIC int +xrep_abt_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_abt *ra = priv; + struct xrep_abt_extent *rae; + xfs_fsblock_t fsb; + int error; + + /* Record all the OWN_AG blocks... */ + if (rec->rm_owner == XFS_RMAP_OWN_AG) { + fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno, + rec->rm_startblock); + error = xfs_bitmap_set(ra->btlist, fsb, rec->rm_blockcount); + if (error) + return error; + } + + /* ...and all the rmapbt blocks... */ + error = xfs_bitmap_set_btcur_path(&ra->nobtlist, cur); + if (error) + return error; + + /* ...and all the free space. 
*/ + if (rec->rm_startblock > ra->next_bno) { + trace_xrep_abt_walk_rmap(cur->bc_mp, cur->bc_private.a.agno, + ra->next_bno, rec->rm_startblock - ra->next_bno, + XFS_RMAP_OWN_NULL, 0, 0); + + rae = kmem_alloc(sizeof(struct xrep_abt_extent), KM_MAYFAIL); + if (!rae) + return -ENOMEM; + INIT_LIST_HEAD(&rae->list); + rae->bno = ra->next_bno; + rae->len = rec->rm_startblock - ra->next_bno; + list_add_tail(&rae->list, ra->extlist); + ra->nr_records++; + ra->nr_blocks += rae->len; + } + ra->next_bno = max_t(xfs_agblock_t, ra->next_bno, + rec->rm_startblock + rec->rm_blockcount); + return 0; +} + +/* Collect an AGFL block for the not-to-release list. */ +static int +xrep_abt_walk_agfl( + struct xfs_mount *mp, + xfs_agblock_t bno, + void *priv) +{ + struct xrep_abt *ra = priv; + xfs_fsblock_t fsb; + + fsb = XFS_AGB_TO_FSB(mp, ra->sc->sa.agno, bno); + return xfs_bitmap_set(&ra->nobtlist, fsb, 1); +} + +/* Compare two free space extents. */ +static int +xrep_abt_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_abt_extent *ap; + struct xrep_abt_extent *bp; + + ap = container_of(a, struct xrep_abt_extent, list); + bp = container_of(b, struct xrep_abt_extent, list); + + if (ap->bno > bp->bno) + return 1; + else if (ap->bno < bp->bno) + return -1; + return 0; +} + +/* Free an extent, which creates a record in the bnobt/cntbt. */ +STATIC int +xrep_abt_free_extent( + struct xfs_scrub *sc, + xfs_fsblock_t fsbno, + xfs_extlen_t len, + struct xfs_owner_info *oinfo) +{ + int error; + + error = xfs_free_extent(sc->tp, fsbno, len, oinfo, 0); + if (error) + return error; + error = xrep_roll_ag_trans(sc); + if (error) + return error; + return xfs_mod_fdblocks(sc->mp, -(int64_t)len, false); +} + +/* Find the longest free extent in the list. 
*/ +static struct xrep_abt_extent * +xrep_abt_get_longest( + struct list_head *free_extents) +{ + struct xrep_abt_extent *rae; + struct xrep_abt_extent *res = NULL; + + list_for_each_entry(rae, free_extents, list) { + if (!res || rae->len > res->len) + res = rae; + } + return res; +} + +/* + * Allocate a block from the (cached) first extent in the AG. In theory + * this should never fail, since we already checked that there was enough + * space to handle the new btrees. + */ +STATIC xfs_fsblock_t +xrep_abt_alloc_block( + struct xfs_scrub *sc, + struct list_head *free_extents) +{ + struct xrep_abt_extent *ext; + xfs_agblock_t agbno; + + /* Pull the first free space extent off the list, and... */ + ext = list_first_entry(free_extents, struct xrep_abt_extent, list); + + /* ...take its first block. */ + agbno = ext->bno++; + if (--ext->len == 0) { + list_del(&ext->list); + kmem_free(ext); + } + + return XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, agbno); +} + +/* Free every record in the extent list. */ +STATIC void +xrep_abt_cancel_freelist( + struct list_head *extlist) +{ + struct xrep_abt_extent *rae; + struct xrep_abt_extent *n; + + list_for_each_entry_safe(rae, n, extlist, list) { + list_del(&rae->list); + kmem_free(rae); + } +} + +/* + * Iterate all reverse mappings to find (1) the free extents, (2) the OWN_AG + * extents, (3) the rmapbt blocks, and (4) the AGFL blocks. The free space is + * (1) + (2) - (3) - (4). Figure out if we have enough free space to + * reconstruct the free space btrees. Caller must clean up the input lists + * if something goes wrong. 
+ */ +STATIC int +xrep_abt_find_freespace( + struct xfs_scrub *sc, + struct list_head *free_extents, + struct xfs_bitmap *old_allocbt_blocks) +{ + struct xrep_abt ra; + struct xrep_abt_extent *rae; + struct xfs_btree_cur *cur; + struct xfs_mount *mp = sc->mp; + xfs_agblock_t agend; + xfs_agblock_t nr_blocks; + int error; + + ra.extlist = free_extents; + ra.btlist = old_allocbt_blocks; + xfs_bitmap_init(&ra.nobtlist); + ra.next_bno = 0; + ra.nr_records = 0; + ra.nr_blocks = 0; + ra.sc = sc; + + /* + * Iterate all the reverse mappings to find gaps in the physical + * mappings, all the OWN_AG blocks, and all the rmapbt extents. + */ + cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno); + error = xfs_rmap_query_all(cur, xrep_abt_walk_rmap, &ra); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + cur = NULL; + + /* Insert a record for space between the last rmap and EOAG. */ + agend = be32_to_cpu(XFS_BUF_TO_AGF(sc->sa.agf_bp)->agf_length); + if (ra.next_bno < agend) { + rae = kmem_alloc(sizeof(struct xrep_abt_extent), KM_MAYFAIL); + if (!rae) { + error = -ENOMEM; + goto err; + } + INIT_LIST_HEAD(&rae->list); + rae->bno = ra.next_bno; + rae->len = agend - ra.next_bno; + list_add_tail(&rae->list, free_extents); + ra.nr_records++; + ra.nr_blocks += rae->len; + } + + /* Collect all the AGFL blocks. */ + error = xfs_agfl_walk(mp, XFS_BUF_TO_AGF(sc->sa.agf_bp), + sc->sa.agfl_bp, xrep_abt_walk_agfl, &ra); + if (error) + goto err; + + /* Do we have enough space to rebuild both freespace btrees? */ + nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records); + if (!xrep_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE) || + ra.nr_blocks < nr_blocks) { + error = -ENOSPC; + goto err; + } + + /* Compute the old bnobt/cntbt blocks. 
*/ + error = xfs_bitmap_disunion(old_allocbt_blocks, &ra.nobtlist); +err: + xfs_bitmap_destroy(&ra.nobtlist); + if (cur) + xfs_btree_del_cursor(cur, error); + return error; +} + +/* + * Reset the global free block counter and the per-AG counters to make it look + * like this AG has no free space. + */ +STATIC int +xrep_abt_reset_counters( + struct xfs_scrub *sc, + int *log_flags) +{ + struct xfs_perag *pag = sc->sa.pag; + struct xfs_agf *agf; + xfs_agblock_t new_btblks; + xfs_agblock_t to_free; + int error; + + /* + * Since we're abandoning the old bnobt/cntbt, we have to decrease + * fdblocks by the # of blocks in those trees. btreeblks counts the + * non-root blocks of the free space and rmap btrees. Do this before + * resetting the AGF counters. + */ + agf = XFS_BUF_TO_AGF(sc->sa.agf_bp); + + /* rmap_blocks accounts root block, btreeblks doesn't */ + new_btblks = be32_to_cpu(agf->agf_rmap_blocks) - 1; + + /* btreeblks doesn't account bno/cnt root blocks */ + to_free = pag->pagf_btreeblks + 2; + + /* and don't account for the blocks we aren't freeing */ + to_free -= new_btblks; + + error = xfs_mod_fdblocks(sc->mp, -(int64_t)to_free, false); + if (error) + return error; + + /* + * Reset the per-AG info, both incore and ondisk. Mark the incore + * state stale in case we fail out of here. + */ + ASSERT(pag->pagf_init); + pag->pagf_init = 0; + pag->pagf_btreeblks = new_btblks; + pag->pagf_freeblks = 0; + pag->pagf_longest = 0; + + agf->agf_btreeblks = cpu_to_be32(new_btblks); + agf->agf_freeblks = 0; + agf->agf_longest = 0; + *log_flags |= XFS_AGF_BTREEBLKS | XFS_AGF_LONGEST | XFS_AGF_FREEBLKS; + + return 0; +} + +/* Initialize a new free space btree root and implant into AGF. 
*/ +STATIC int +xrep_abt_reset_btree( + struct xfs_scrub *sc, + xfs_btnum_t btnum, + struct list_head *free_extents) +{ + struct xfs_owner_info oinfo; + struct xfs_buf *bp; + struct xfs_perag *pag = sc->sa.pag; + struct xfs_mount *mp = sc->mp; + struct xfs_agf *agf = XFS_BUF_TO_AGF(sc->sa.agf_bp); + xfs_fsblock_t fsbno; + int error; + + /* Allocate new root block. */ + fsbno = xrep_abt_alloc_block(sc, free_extents); + if (fsbno == NULLFSBLOCK) + return -ENOSPC; + + /* Initialize new tree root. */ + error = xrep_init_btblock(sc, fsbno, &bp, btnum, &xfs_allocbt_buf_ops); + if (error) + return error; + + /* Implant into AGF. */ + agf->agf_roots[btnum] = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno)); + agf->agf_levels[btnum] = cpu_to_be32(1); + + /* Add rmap records for the btree roots */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG); + error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno, + XFS_FSB_TO_AGBNO(mp, fsbno), 1, &oinfo); + if (error) + return error; + + /* Reset the incore state. */ + pag->pagf_levels[btnum] = 1; + + return 0; +} + +/* Initialize new bnobt/cntbt roots and implant them into the AGF. */ +STATIC int +xrep_abt_reset_btrees( + struct xfs_scrub *sc, + struct list_head *free_extents, + int *log_flags) +{ + int error; + + error = xrep_abt_reset_btree(sc, XFS_BTNUM_BNOi, free_extents); + if (error) + return error; + error = xrep_abt_reset_btree(sc, XFS_BTNUM_CNTi, free_extents); + if (error) + return error; + + *log_flags |= XFS_AGF_ROOTS | XFS_AGF_LEVELS; + return 0; +} + +/* + * Make our new freespace btree roots permanent so that we can start freeing + * unused space back into the AG. + */ +STATIC int +xrep_abt_commit_new( + struct xfs_scrub *sc, + struct xfs_bitmap *old_allocbt_blocks, + int log_flags) +{ + int error; + + xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, log_flags); + + /* Invalidate the old freespace btree blocks and commit. 
*/ + error = xrep_invalidate_blocks(sc, old_allocbt_blocks); + if (error) + return error; + error = xrep_roll_ag_trans(sc); + if (error) + return error; + + /* Now that we've succeeded, mark the incore state valid again. */ + sc->sa.pag->pagf_init = 1; + return 0; +} + +/* Build new free space btrees and dispose of the old ones. */ +STATIC int +xrep_abt_rebuild_trees( + struct xfs_scrub *sc, + struct list_head *free_extents, + struct xfs_bitmap *old_allocbt_blocks) +{ + struct xfs_owner_info oinfo; + struct xrep_abt_extent *rae; + struct xrep_abt_extent *n; + struct xrep_abt_extent *longest; + int error; + + xfs_rmap_skip_owner_update(&oinfo); + + /* + * Insert the longest free extent in case it's necessary to + * refresh the AGFL with multiple blocks. If there is no longest + * extent, we had exactly the free space we needed; we're done. + */ + longest = xrep_abt_get_longest(free_extents); + if (!longest) + goto done; + error = xrep_abt_free_extent(sc, + XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, longest->bno), + longest->len, &oinfo); + list_del(&longest->list); + kmem_free(longest); + if (error) + return error; + + /* Insert records into the new btrees. */ + list_for_each_entry_safe(rae, n, free_extents, list) { + error = xrep_abt_free_extent(sc, + XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno), + rae->len, &oinfo); + if (error) + return error; + list_del(&rae->list); + kmem_free(rae); + } + +done: + /* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG); + return xrep_reap_extents(sc, old_allocbt_blocks, &oinfo, + XFS_AG_RESV_NONE); +} + +/* Repair the freespace btrees for some AG. */ +int +xrep_allocbt( + struct xfs_scrub *sc) +{ + struct list_head free_extents; + struct xfs_bitmap old_allocbt_blocks; + struct xfs_mount *mp = sc->mp; + int log_flags = 0; + int error; + + /* We require the rmapbt to rebuild anything. 
*/ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + + /* + * Make sure the busy extent list is clear because we can't put + * extents on there twice. + */ + if (!xfs_extent_busy_list_empty(sc->sa.pag)) + return -EDEADLOCK; + + /* Collect the free space data and find the old btree blocks. */ + INIT_LIST_HEAD(&free_extents); + xfs_bitmap_init(&old_allocbt_blocks); + error = xrep_abt_find_freespace(sc, &free_extents, &old_allocbt_blocks); + if (error) + goto out; + + /* Make sure we got some free space. */ + if (list_empty(&free_extents)) { + error = -ENOSPC; + goto out; + } + + /* + * Sort the free extents by block number to avoid bnobt splits when we + * rebuild the free space btrees. + */ + list_sort(NULL, &free_extents, xrep_abt_extent_cmp); + + /* + * Blow out the old free space btrees. This is the point at which + * we are no longer able to bail out gracefully. + */ + error = xrep_abt_reset_counters(sc, &log_flags); + if (error) + goto out; + error = xrep_abt_reset_btrees(sc, &free_extents, &log_flags); + if (error) + goto out; + error = xrep_abt_commit_new(sc, &old_allocbt_blocks, log_flags); + if (error) + goto out; + + /* Now rebuild the freespace information. */ + error = xrep_abt_rebuild_trees(sc, &free_extents, &old_allocbt_blocks); +out: + xrep_abt_cancel_freelist(&free_extents); + xfs_bitmap_destroy(&old_allocbt_blocks); + return error; +} diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 346b02abccf7..0fb949afaca9 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -623,8 +623,14 @@ xchk_setup_ag_btree( * expensive operation should be performed infrequently and only * as a last resort. Any caller that sets force_log should * document why they need to do so. + * + * Force everything in memory out to disk if we're repairing. + * This ensures we won't get tripped up by btree blocks sitting + * in memory waiting to have LSNs stamped in. 
The AGF/AGI repair + * routines use any available rmap data to try to find a btree + * root that also passes the read verifiers. */ - if (force_log) { + if (force_log || (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) { error = xchk_checkpoint_log(mp); if (error) return error; diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 9de321eee4ab..bc1a5f1cbcdc 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -61,6 +61,7 @@ int xrep_superblock(struct xfs_scrub *sc); int xrep_agf(struct xfs_scrub *sc); int xrep_agfl(struct xfs_scrub *sc); int xrep_agi(struct xfs_scrub *sc); +int xrep_allocbt(struct xfs_scrub *sc); #else @@ -87,6 +88,7 @@ xrep_calc_ag_resblks( #define xrep_agf xrep_notsupported #define xrep_agfl xrep_notsupported #define xrep_agi xrep_notsupported +#define xrep_allocbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 4bfae1e61d30..2133a3199372 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -232,13 +232,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_ag_allocbt, .scrub = xchk_bnobt, - .repair = xrep_notsupported, + .repair = xrep_allocbt, }, [XFS_SCRUB_TYPE_CNTBT] = { /* cntbt */ .type = ST_PERAG, .setup = xchk_setup_ag_allocbt, .scrub = xchk_cntbt, - .repair = xrep_notsupported, + .repair = xrep_allocbt, }, [XFS_SCRUB_TYPE_INOBT] = { /* inobt */ .type = ST_PERAG, diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 4e20f0e48232..26bd5dc68efe 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -551,7 +551,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \ xfs_agblock_t agbno, xfs_extlen_t len, \ uint64_t owner, uint64_t offset, unsigned int flags), \ TP_ARGS(mp, agno, agbno, len, owner, offset, flags)) -DEFINE_REPAIR_RMAP_EVENT(xrep_alloc_extent_fn); +DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_ialloc_extent_fn); 
DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn); DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn); diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index 0ed68379e551..82f99633a597 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -657,3 +657,17 @@ xfs_extent_busy_ag_cmp( diff = b1->bno - b2->bno; return diff; } + +/* Are there any busy extents in this AG? */ +bool +xfs_extent_busy_list_empty( + struct xfs_perag *pag) +{ + spin_lock(&pag->pagb_lock); + if (pag->pagb_tree.rb_node) { + spin_unlock(&pag->pagb_lock); + return false; + } + spin_unlock(&pag->pagb_lock); + return true; +} diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h index 990ab3891971..2f8c73c712c6 100644 --- a/fs/xfs/xfs_extent_busy.h +++ b/fs/xfs/xfs_extent_busy.h @@ -65,4 +65,6 @@ static inline void xfs_extent_busy_sort(struct list_head *list) list_sort(NULL, list, xfs_extent_busy_ag_cmp); } +bool xfs_extent_busy_list_empty(struct xfs_perag *pag); + #endif /* __XFS_EXTENT_BUSY_H__ */ From patchwork Mon Jul 30 05:48:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548431 Subject: [PATCH 06/14] xfs: repair inode btrees From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:48:28 -0700 Message-ID: <153292970836.24509.597298447307205186.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. 
Wong Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/common.c | 2 fs/xfs/scrub/ialloc_repair.c | 673 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.c | 20 + fs/xfs/scrub/repair.h | 11 + fs/xfs/scrub/scrub.c | 4 fs/xfs/scrub/scrub.h | 1 fs/xfs/scrub/trace.h | 4 8 files changed, 712 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/scrub/ialloc_repair.c -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 44ddd112acd2..af1dc9aeb1a7 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -166,6 +166,7 @@ xfs-y += $(addprefix scrub/, \ agheader_repair.o \ alloc_repair.o \ bitmap.o \ + ialloc_repair.o \ repair.o \ ) endif diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 0fb949afaca9..67df7ea8798d 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -516,6 +516,8 @@ xchk_ag_free( struct xchk_ag *sa) { xchk_ag_btcur_free(sa); + if (sa->pag != NULL && sc->reset_perag_resv) + xrep_reset_perag_resv(sc); if (sa->agfl_bp) { xfs_trans_brelse(sc->tp, sa->agfl_bp); sa->agfl_bp = NULL; diff --git a/fs/xfs/scrub/ialloc_repair.c b/fs/xfs/scrub/ialloc_repair.c new file mode 100644 index 000000000000..126135c1a147 --- /dev/null +++ b/fs/xfs/scrub/ialloc_repair.c @@ -0,0 +1,673 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_ialloc_btree.h" +#include "xfs_icache.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_log.h" +#include "xfs_trans_priv.h" +#include "xfs_error.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Inode Btree Repair + * ================== + * + * A quick refresher of inode btrees on a v5 filesystem: + * + * - Each inode btree record can describe a single 'inode chunk'. The chunk + * size is defined to be 64 inodes. If sparse inodes are enabled, every + * inobt record must be aligned to the chunk size. A chunk can be smaller + * than a fs block. One must be careful with 64k-block filesystems whose + * inodes are smaller than 1k. + * + * - Inode buffers are read into memory in units of 'inode clusters'. However + * many inodes fit in a cluster buffer is the smallest number of inodes that + * can be allocated or freed. Clusters are never larger than a chunk and + * never smaller than a fs block. If sparse inodes are not enabled, then + * records can be aligned to a cluster. + * + * - If sparse inodes are enabled, the holemask field will be active. Each + * bit of the holemask represents 4 potential inodes; if set, the + * corresponding space does *not* contain inodes and must be left alone. + * + * So what's the rebuild algorithm? + * + * Iterate the reverse mapping records looking for OWN_INODES and OWN_INOBT + * records. 
The OWN_INOBT records are the old inode btree blocks and will be + * cleared out after we've rebuilt the tree. Each possible inode chunk within + * an OWN_INODES record will be read in and the freemask calculated from the + * i_mode data in the inode chunk. For sparse inodes the holemask will be + * calculated by creating the properly aligned inobt record and punching out + * any chunk that's missing. Inode allocations and frees grab the AGI first, + * so repair protects itself from concurrent access by locking the AGI. + * + * Once we've reconstructed all the inode records, we can create new inode + * btree roots and reload the btrees. We rebuild both inode trees at the same + * time because they have the same rmap owner and it would be more complex to + * figure out if the other tree isn't in need of a rebuild and which OWN_INOBT + * blocks it owns. We have all the data we need to build both, so dump + * everything and start over. + * + * We use the prefix 'xrep_ibt' because we rebuild both inode btrees. + */ + +struct xrep_ibt_extent { + struct list_head list; + xfs_inofree_t freemask; + xfs_agino_t startino; + unsigned int count; + unsigned int usedcount; + uint16_t holemask; +}; + +struct xrep_ibt { + /* Reconstructed inode records. */ + struct list_head *extlist; + + /* Old inode btree blocks we found in the rmap. */ + struct xfs_bitmap *btlist; + + struct xfs_scrub *sc; + + /* Number of reconstructed inode records. */ + uint64_t nr_records; +}; + +/* + * Is this inode in use? If the inode is in memory we can tell from i_mode, + * otherwise we have to check di_mode in the on-disk buffer. We only care + * that the high (i.e. non-permission) bits of _mode are zero. This should be + * safe because repair keeps all AG headers locked until the end, and any process + * trying to perform an inode allocation/free must lock the AGI.
+ */ +STATIC int +xrep_ibt_check_free( + struct xfs_scrub *sc, + struct xfs_buf *bp, + xfs_ino_t fsino, + xfs_agino_t bpino, + bool *inuse) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_dinode *dip; + int error; + + /* Will the in-core inode tell us if it's in use? */ + error = xfs_icache_inode_is_allocated(mp, sc->tp, fsino, inuse); + if (!error) + return 0; + + /* Inode uncached or half assembled, read disk buffer */ + dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize); + if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC) + return -EFSCORRUPTED; + + if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino) + return -EFSCORRUPTED; + + *inuse = dip->di_mode != 0; + return 0; +} + +/* + * For each inode cluster covering the physical extent recorded by the rmapbt, + * we must calculate the properly aligned startino of that cluster, then + * iterate each cluster to fill in used and filled masks appropriately. We + * then use the (startino, used, filled) information to construct the + * appropriate inode records. + */ +STATIC int +xrep_ibt_process_cluster( + struct xrep_ibt *ri, + xfs_agblock_t agbno, + int blks_per_cluster, + xfs_agino_t rec_agino) +{ + struct xfs_imap imap; + struct xrep_ibt_extent *rie; + struct xfs_dinode *dip; + struct xfs_buf *bp; + struct xfs_scrub *sc = ri->sc; + struct xfs_mount *mp = sc->mp; + xfs_ino_t fsino; + xfs_inofree_t usedmask; + xfs_agino_t nr_inodes; + xfs_agino_t startino; + xfs_agino_t clusterino; + xfs_agino_t clusteroff; + xfs_agino_t agino; + uint16_t fillmask; + bool inuse; + int usedcount; + int error; + + /* The per-AG inum of this inode cluster. */ + agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0); + + /* The per-AG inum of the inobt record. */ + startino = rec_agino + rounddown(agino - rec_agino, + XFS_INODES_PER_CHUNK); + + /* The per-AG inum of the cluster within the inobt record. */ + clusteroff = agino - startino; + + /* Every inode in this holemask slot is filled. 
*/ + nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0); + fillmask = xfs_inobt_maskn(clusteroff / XFS_INODES_PER_HOLEMASK_BIT, + nr_inodes / XFS_INODES_PER_HOLEMASK_BIT); + + /* + * Grab the inode cluster buffer. This is safe to do with a broken + * inobt because imap_to_bp directly maps the buffer without touching + * either inode btree. + */ + imap.im_blkno = XFS_AGB_TO_DADDR(mp, sc->sa.agno, agbno); + imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster); + imap.im_boffset = 0; + error = xfs_imap_to_bp(mp, sc->tp, &imap, &dip, &bp, 0, + XFS_IGET_UNTRUSTED); + if (error) + return error; + + usedmask = 0; + usedcount = 0; + /* Which inodes within this cluster are free? */ + for (clusterino = 0; clusterino < nr_inodes; clusterino++) { + fsino = XFS_AGINO_TO_INO(mp, sc->sa.agno, agino + clusterino); + error = xrep_ibt_check_free(sc, bp, fsino, + clusterino, &inuse); + if (error) { + xfs_trans_brelse(sc->tp, bp); + return error; + } + if (inuse) { + usedcount++; + usedmask |= XFS_INOBT_MASK(clusteroff + clusterino); + } + } + xfs_trans_brelse(sc->tp, bp); + + /* + * If the last item in the list is our chunk record, + * update that. + */ + if (!list_empty(ri->extlist)) { + rie = list_last_entry(ri->extlist, struct xrep_ibt_extent, + list); + if (rie->startino + XFS_INODES_PER_CHUNK > startino) { + rie->freemask &= ~usedmask; + rie->holemask &= ~fillmask; + rie->count += nr_inodes; + rie->usedcount += usedcount; + return 0; + } + } + + /* New inode chunk; add to the list. */ + rie = kmem_alloc(sizeof(struct xrep_ibt_extent), KM_MAYFAIL); + if (!rie) + return -ENOMEM; + + INIT_LIST_HEAD(&rie->list); + rie->startino = startino; + rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask; + rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask; + rie->count = nr_inodes; + rie->usedcount = usedcount; + list_add_tail(&rie->list, ri->extlist); + ri->nr_records++; + + return 0; +} + +/* Record extents that belong to inode btrees. 
*/ +STATIC int +xrep_ibt_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_ibt *ri = priv; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + xfs_agblock_t agbno = rec->rm_startblock; + xfs_agino_t inoalign; + xfs_agino_t agino; + xfs_agino_t rec_agino; + int blks_per_cluster; + int error = 0; + + if (xchk_should_terminate(ri->sc, &error)) + return error; + + /* Fragment of the old btrees; dispose of them later. */ + if (rec->rm_owner == XFS_RMAP_OWN_INOBT) { + fsbno = XFS_AGB_TO_FSB(mp, ri->sc->sa.agno, agbno); + return xfs_bitmap_set(ri->btlist, fsbno, rec->rm_blockcount); + } + + /* Skip extents that do not hold inode chunks. */ + if (rec->rm_owner != XFS_RMAP_OWN_INODES) + return 0; + + blks_per_cluster = xfs_icluster_size_fsb(mp); + + if (agbno % blks_per_cluster != 0) + return -EFSCORRUPTED; + + trace_xrep_ibt_walk_rmap(mp, ri->sc->sa.agno, rec->rm_startblock, + rec->rm_blockcount, rec->rm_owner, rec->rm_offset, + rec->rm_flags); + + /* + * Determine the inode block alignment, and where the block + * ought to start if it's aligned properly. On a sparse inode + * system the rmap doesn't have to start on an alignment boundary, + * but the record does. On pre-sparse filesystems, we /must/ + * start both rmap and inobt on an alignment boundary. + */ + inoalign = xfs_ialloc_cluster_alignment(mp); + agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0); + rec_agino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0); + if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rec_agino) + return -EFSCORRUPTED; + + /* + * Set up the free/hole masks for each inode cluster that could be + * mapped by this rmap record. + */ + for (; + agbno < rec->rm_startblock + rec->rm_blockcount; + agbno += blks_per_cluster) { + error = xrep_ibt_process_cluster(ri, agbno, blks_per_cluster, + rec_agino); + if (error) + return error; + } + + return 0; +} + +/* Compare two ialloc extents.
*/ +static int +xrep_ibt_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_ibt_extent *ap; + struct xrep_ibt_extent *bp; + + ap = container_of(a, struct xrep_ibt_extent, list); + bp = container_of(b, struct xrep_ibt_extent, list); + + if (ap->startino > bp->startino) + return 1; + else if (ap->startino < bp->startino) + return -1; + return 0; +} + +/* Insert an inode chunk record into a given btree. */ +static int +xrep_ibt_insert_btrec( + struct xfs_btree_cur *cur, + struct xrep_ibt_extent *rie) +{ + int stat; + int error; + + error = xfs_inobt_lookup(cur, rie->startino, XFS_LOOKUP_EQ, &stat); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 0); + error = xfs_inobt_insert_rec(cur, rie->holemask, rie->count, + rie->count - rie->usedcount, rie->freemask, &stat); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1); + return error; +} + +/* Insert an inode chunk record into both inode btrees. */ +static int +xrep_ibt_insert_rec( + struct xfs_scrub *sc, + struct xrep_ibt_extent *rie) +{ + struct xfs_btree_cur *cur; + int error; + + trace_xrep_ibt_insert(sc->mp, sc->sa.agno, rie->startino, + rie->holemask, rie->count, rie->count - rie->usedcount, + rie->freemask); + + /* Insert into the inobt. */ + cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp, sc->sa.agno, + XFS_BTNUM_INO); + error = xrep_ibt_insert_btrec(cur, rie); + if (error) + goto out_cur; + xfs_btree_del_cursor(cur, error); + + /* Insert into the finobt if chunk has free inodes. */ + if (xfs_sb_version_hasfinobt(&sc->mp->m_sb) && + rie->count != rie->usedcount) { + cur = xfs_inobt_init_cursor(sc->mp, sc->tp, sc->sa.agi_bp, + sc->sa.agno, XFS_BTNUM_FINO); + error = xrep_ibt_insert_btrec(cur, rie); + if (error) + goto out_cur; + xfs_btree_del_cursor(cur, error); + } + + return xrep_roll_ag_trans(sc); +out_cur: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Free every record in the inode list. 
*/ +STATIC void +xrep_ibt_cancel_inorecs( + struct list_head *reclist) +{ + struct xrep_ibt_extent *rie; + struct xrep_ibt_extent *n; + + list_for_each_entry_safe(rie, n, reclist, list) { + list_del(&rie->list); + kmem_free(rie); + } +} + +/* + * Iterate all reverse mappings to find the inodes (OWN_INODES) and the inode + * btrees (OWN_INOBT). Figure out if we have enough free space to reconstruct + * the inode btrees. The caller must clean up the lists if anything goes + * wrong. + */ +STATIC int +xrep_ibt_find_inodes( + struct xfs_scrub *sc, + struct list_head *inode_records, + struct xfs_bitmap *old_iallocbt_blocks) +{ + struct xrep_ibt ri; + struct xfs_mount *mp = sc->mp; + struct xfs_btree_cur *cur; + xfs_agblock_t nr_blocks; + int error; + + /* Collect all reverse mappings for inode blocks. */ + ri.extlist = inode_records; + ri.btlist = old_iallocbt_blocks; + ri.nr_records = 0; + ri.sc = sc; + + cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno); + error = xfs_rmap_query_all(cur, xrep_ibt_walk_rmap, &ri); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + + /* Do we have enough space to rebuild all inode trees? */ + nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records); + if (xfs_sb_version_hasfinobt(&mp->m_sb)) + nr_blocks *= 2; + if (!xrep_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE)) + return -ENOSPC; + + return 0; + +err: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* Update the AGI counters. */ +STATIC int +xrep_ibt_reset_counters( + struct xfs_scrub *sc, + struct list_head *inode_records, + int *log_flags) +{ + struct xfs_agi *agi; + struct xrep_ibt_extent *rie; + struct xfs_perag *pag = sc->sa.pag; + unsigned int count = 0; + unsigned int usedcount = 0; + unsigned int freecount; + + /* Figure out the new counters. 
*/ + list_for_each_entry(rie, inode_records, list) { + count += rie->count; + usedcount += rie->usedcount; + } + + agi = XFS_BUF_TO_AGI(sc->sa.agi_bp); + freecount = count - usedcount; + + /* Trigger inode count recalculation */ + xfs_force_summary_recalc(sc->mp); + + /* + * Reset the per-AG info, both incore and ondisk. Mark the incore + * state stale in case we fail out of here. + */ + ASSERT(pag->pagi_init); + pag->pagi_init = 0; + pag->pagi_count = count; + pag->pagi_freecount = freecount; + + agi->agi_count = cpu_to_be32(count); + agi->agi_freecount = cpu_to_be32(freecount); + *log_flags |= XFS_AGI_COUNT | XFS_AGI_FREECOUNT; + + return 0; +} + +/* Initialize a new inode btree root and implant it into the AGI. */ +STATIC int +xrep_ibt_reset_btree( + struct xfs_scrub *sc, + xfs_btnum_t btnum, + struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type resv, + int *log_flags) +{ + struct xfs_agi *agi; + struct xfs_buf *bp; + struct xfs_mount *mp = sc->mp; + xfs_fsblock_t fsbno; + int error; + + agi = XFS_BUF_TO_AGI(sc->sa.agi_bp); + + /* Initialize new btree root. */ + error = xrep_alloc_ag_block(sc, oinfo, &fsbno, resv); + if (error) + return error; + error = xrep_init_btblock(sc, fsbno, &bp, btnum, &xfs_inobt_buf_ops); + if (error) + return error; + + switch (btnum) { + case XFS_BTNUM_INOi: + agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno)); + agi->agi_level = cpu_to_be32(1); + *log_flags |= XFS_AGI_ROOT | XFS_AGI_LEVEL; + break; + case XFS_BTNUM_FINOi: + agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, fsbno)); + agi->agi_free_level = cpu_to_be32(1); + *log_flags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL; + break; + default: + ASSERT(0); + } + + return 0; +} + +/* Initialize new inobt/finobt roots and implant them into the AGI.
*/ +STATIC int +xrep_ibt_reset_btrees( + struct xfs_scrub *sc, + struct xfs_owner_info *oinfo, + int *log_flags) +{ + enum xfs_ag_resv_type resv; + int error; + + resv = XFS_AG_RESV_NONE; + error = xrep_ibt_reset_btree(sc, XFS_BTNUM_INO, oinfo, XFS_AG_RESV_NONE, + log_flags); + if (error || !xfs_sb_version_hasfinobt(&sc->mp->m_sb)) + return error; + + /* + * If we made a per-AG reservation for the finobt then we must account + * the new block correctly. + */ + if (!sc->mp->m_inotbt_nores) + resv = XFS_AG_RESV_METADATA; + return xrep_ibt_reset_btree(sc, XFS_BTNUM_FINO, oinfo, resv, log_flags); +} + +/* Build new inode btrees and dispose of the old one. */ +STATIC int +xrep_ibt_rebuild_trees( + struct xfs_scrub *sc, + struct list_head *inode_records, + struct xfs_owner_info *oinfo, + struct xfs_bitmap *old_iallocbt_blocks) +{ + struct xrep_ibt_extent *rie; + struct xrep_ibt_extent *n; + int error; + + /* Add all records. */ + list_sort(NULL, inode_records, xrep_ibt_extent_cmp); + list_for_each_entry_safe(rie, n, inode_records, list) { + error = xrep_ibt_insert_rec(sc, rie); + if (error) + return error; + + list_del(&rie->list); + kmem_free(rie); + } + + /* Free the old inode btree blocks if they're not in use. */ + return xrep_reap_extents(sc, old_iallocbt_blocks, oinfo, + XFS_AG_RESV_NONE); +} + +/* + * Make our new inode btree roots permanent so that we can start re-adding + * inode records back into the AG. + */ +STATIC int +xrep_ibt_commit_new( + struct xfs_scrub *sc, + struct xfs_bitmap *old_iallocbt_blocks, + int log_flags) +{ + int error; + + xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, log_flags); + + /* Invalidate all the inobt/finobt blocks in btlist. */ + error = xrep_invalidate_blocks(sc, old_iallocbt_blocks); + if (error) + return error; + error = xrep_roll_ag_trans(sc); + if (error) + return error; + + /* + * Now that we've succeeded, mark the incore state valid again. 
If the + * finobt is enabled, make sure we reinitialize the per-AG reservations + * when we're done. + */ + sc->sa.pag->pagi_init = 1; + if (xfs_sb_version_hasfinobt(&sc->mp->m_sb)) + sc->reset_perag_resv = true; + return 0; +} + +/* Repair both inode btrees. */ +int +xrep_iallocbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct list_head inode_records; + struct xfs_bitmap old_iallocbt_blocks; + struct xfs_mount *mp = sc->mp; + int log_flags = 0; + int error = 0; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + + /* Collect the free space data and find the old btree blocks. */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT); + INIT_LIST_HEAD(&inode_records); + xfs_bitmap_init(&old_iallocbt_blocks); + error = xrep_ibt_find_inodes(sc, &inode_records, &old_iallocbt_blocks); + if (error) + goto out; + + /* + * Blow out the old inode btrees. This is the point at which + * we are no longer able to bail out gracefully. + */ + error = xrep_ibt_reset_counters(sc, &inode_records, &log_flags); + if (error) + goto out; + error = xrep_ibt_reset_btrees(sc, &oinfo, &log_flags); + if (error) + goto out; + error = xrep_ibt_commit_new(sc, &old_iallocbt_blocks, log_flags); + if (error) + goto out; + + /* Now rebuild the inode information. */ + error = xrep_ibt_rebuild_trees(sc, &inode_records, &oinfo, + &old_iallocbt_blocks); + if (error) + goto out; +out: + xrep_ibt_cancel_inorecs(&inode_records); + xfs_bitmap_destroy(&old_iallocbt_blocks); + return error; +} diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 17cf48564390..a44deb6f06ab 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -880,3 +880,23 @@ xrep_ino_dqattach( return error; } + +/* + * Reinitialize the per-AG block reservation for the AG we just fixed. 
+ */ +int +xrep_reset_perag_resv( + struct xfs_scrub *sc) +{ + int error; + + ASSERT(sc->ops->type == ST_PERAG); + ASSERT(sc->tp); + + error = xfs_ag_resv_free(sc->sa.pag); + if (error) + goto out; + error = xfs_ag_resv_init(sc->sa.pag, sc->tp); +out: + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index bc1a5f1cbcdc..0cc53dee3228 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -53,6 +53,7 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp, struct xrep_find_ag_btree *btree_info, struct xfs_buf *agfl_bp); void xrep_force_quotacheck(struct xfs_scrub *sc, uint dqtype); int xrep_ino_dqattach(struct xfs_scrub *sc); +int xrep_reset_perag_resv(struct xfs_scrub *sc); /* Metadata repairers */ @@ -62,6 +63,7 @@ int xrep_agf(struct xfs_scrub *sc); int xrep_agfl(struct xfs_scrub *sc); int xrep_agi(struct xfs_scrub *sc); int xrep_allocbt(struct xfs_scrub *sc); +int xrep_iallocbt(struct xfs_scrub *sc); #else @@ -83,12 +85,21 @@ xrep_calc_ag_resblks( return 0; } +static inline int +xrep_reset_perag_resv( + struct xfs_scrub *sc) +{ + ASSERT(0); + return -EOPNOTSUPP; +} + #define xrep_probe xrep_notsupported #define xrep_superblock xrep_notsupported #define xrep_agf xrep_notsupported #define xrep_agfl xrep_notsupported #define xrep_agi xrep_notsupported #define xrep_allocbt xrep_notsupported +#define xrep_iallocbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 2133a3199372..631b0b06db99 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -244,14 +244,14 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_PERAG, .setup = xchk_setup_ag_iallocbt, .scrub = xchk_inobt, - .repair = xrep_notsupported, + .repair = xrep_iallocbt, }, [XFS_SCRUB_TYPE_FINOBT] = { /* finobt */ .type = ST_PERAG, .setup = xchk_setup_ag_iallocbt, .scrub = xchk_finobt, .has = xfs_sb_version_hasfinobt, - .repair = xrep_notsupported, + .repair = 
xrep_iallocbt, }, [XFS_SCRUB_TYPE_RMAPBT] = { /* rmapbt */ .type = ST_PERAG, diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index af323b229c4b..762db46fd696 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -64,6 +64,7 @@ struct xfs_scrub { uint ilock_flags; bool try_harder; bool has_quotaofflock; + bool reset_perag_resv; /* State tracking for single-AG operations. */ struct xchk_ag sa; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 26bd5dc68efe..9126dc66f726 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -552,7 +552,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \ uint64_t owner, uint64_t offset, unsigned int flags), \ TP_ARGS(mp, agno, agbno, len, owner, offset, flags)) DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap); -DEFINE_REPAIR_RMAP_EVENT(xrep_ialloc_extent_fn); +DEFINE_REPAIR_RMAP_EVENT(xrep_ibt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn); DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn); @@ -700,7 +700,7 @@ TRACE_EVENT(xrep_reset_counters, MAJOR(__entry->dev), MINOR(__entry->dev)) ) -TRACE_EVENT(xrep_ialloc_insert, +TRACE_EVENT(xrep_ibt_insert, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agino_t startino, uint16_t holemask, uint8_t count, uint8_t freecount, uint64_t freemask), From patchwork Mon Jul 30 05:48:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548433 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C406B139A for ; Mon, 30 Jul 2018 05:48:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B14282991D for ; Mon, 30 Jul 2018 05:48:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A527829932; Mon, 30 Jul 2018 05:48:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7580E2991D for ; Mon, 30 Jul 2018 05:48:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726344AbeG3HWK (ORCPT ); Mon, 30 Jul 2018 03:22:10 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:58750 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726259AbeG3HWK (ORCPT ); Mon, 30 Jul 2018 03:22:10 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5hxXU194537; Mon, 30 Jul 2018 05:48:46 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=PvCUaJn2p0lRHLn8GcD/R0yd0i+8cClaelqKWQlE3NU=; b=pa/IPUYUglzhHOoqAaF+3lmdWN2KMSpGC18SibaJ2P4Ci//2YZjfy3oRv5LBG4kOgig0 ulCBdWikDSn+GJUUpH2wHVaeRpsCW9vGzwxtdnXz+7Qyl0NZAm3FLnVpHQDletnjGgRH V7BQVud3S/7V4xz3kVEuPTMO6w4d4Oye8nbAlEbuSOdoeCERzxqk4wCTcogoVY/hvsL6 
j1C9DXYY7SLoeoHS/ey6lENAnFV190J0sGJNKP6sagSJEfozmkgazuZeYPLPG6SIip8L mg4IDpXbD0Yvc9EMCsvNOfJlogqMIg8ccCJXnlfuAVqqHrKiLJvivn1Pf7spw/AgZTjZ /g== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2120.oracle.com with ESMTP id 2kgh4ptt2e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:48:45 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5mjmZ017983 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:48:45 GMT Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w6U5mjX7031604; Mon, 30 Jul 2018 05:48:45 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:48:44 -0700 Subject: [PATCH 07/14] xfs: repair refcount btrees From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:48:34 -0700 Message-ID: <153292971493.24509.8505497430689944881.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. 
Wong Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/refcount_repair.c | 586 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 4 files changed, 590 insertions(+), 1 deletion(-) create mode 100644 fs/xfs/scrub/refcount_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index af1dc9aeb1a7..4ca97e026f94 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -167,6 +167,7 @@ xfs-y += $(addprefix scrub/, \ alloc_repair.o \ bitmap.o \ ialloc_repair.o \ + refcount_repair.o \ repair.o \ ) endif diff --git a/fs/xfs/scrub/refcount_repair.c b/fs/xfs/scrub/refcount_repair.c new file mode 100644 index 000000000000..f6076083dd94 --- /dev/null +++ b/fs/xfs/scrub/refcount_repair.c @@ -0,0 +1,586 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_itable.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_refcount_btree.h" +#include "xfs_error.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Rebuilding the Reference Count Btree + * ==================================== + * + * This algorithm is "borrowed" from xfs_repair.
Imagine the rmap + * entries as rectangles representing extents of physical blocks, and + * that the rectangles can be laid down to allow them to overlap each + * other; then we know that we must emit a refcnt btree entry wherever + * the amount of overlap changes, i.e. the emission stimulus is + * level-triggered: + * + * - --- + * -- ----- ---- --- ------ + * -- ---- ----------- ---- --------- + * -------------------------------- ----------- + * ^ ^ ^^ ^^ ^ ^^ ^^^ ^^^^ ^ ^^ ^ ^ ^ + * 2 1 23 21 3 43 234 2123 1 01 2 3 0 + * + * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner). + * + * Note that in the actual refcnt btree we don't store the refcount < 2 + * cases because the bnobt tells us which blocks are free; single-use + * blocks aren't recorded in the bnobt or the refcntbt. If the rmapbt + * supports storing multiple entries covering a given block we could + * theoretically dispense with the refcntbt and simply count rmaps, but + * that's inefficient in the (hot) write path, so we'll take the cost of + * the extra tree to save time. Also there's no guarantee that rmap + * will be enabled. + * + * Given an array of rmaps sorted by physical block number, a starting + * physical block (sp), a bag to hold rmaps that cover sp, and the next + * physical block where the level changes (np), we can reconstruct the + * refcount btree as follows: + * + * While there are still unprocessed rmaps in the array, + * - Set sp to the physical block (pblk) of the next unprocessed rmap. + * - Add to the bag all rmaps in the array where startblock == sp. + * - Set np to the physical block where the bag size will change. This + * is the minimum of (the pblk of the next unprocessed rmap) and + * (startblock + len of each rmap in the bag). + * - Record the bag size as old_bag_size. + * + * - While the bag isn't empty, + * - Remove from the bag all rmaps where startblock + len == np. + * - Add to the bag all rmaps in the array where startblock == np. 
+ * - If the bag size isn't old_bag_size, store the refcount entry + * (sp, np - sp, bag_size) in the refcnt btree. + * - If the bag is empty, break out of the inner loop. + * - Set old_bag_size to the bag size + * - Set sp = np. + * - Set np to the physical block where the bag size will change. + * This is the minimum of (the pblk of the next unprocessed rmap) + * and (startblock + len of each rmap in the bag). + * + * Like all the other repairers, we make a list of all the refcount + * records we need, then reinitialize the refcount btree root and + * insert all the records. + */ + +struct xrep_refc_rmap { + struct list_head list; + struct xfs_rmap_irec rmap; +}; + +struct xrep_refc_extent { + struct list_head list; + struct xfs_refcount_irec refc; +}; + +struct xrep_refc { + struct list_head rmap_bag; /* rmaps we're tracking */ + struct list_head rmap_idle; /* idle rmaps */ + struct list_head *extlist; /* refcount extents */ + struct xfs_bitmap *btlist; /* old refcountbt blocks */ + struct xfs_scrub *sc; + unsigned long nr_records;/* nr refcount extents */ + xfs_extlen_t btblocks; /* # of refcountbt blocks */ +}; + +/* Grab the next record from the rmapbt. */ +STATIC int +xrep_refc_next_rmap( + struct xfs_btree_cur *cur, + struct xrep_refc *rr, + struct xfs_rmap_irec *rec, + bool *have_rec) +{ + struct xfs_rmap_irec rmap; + struct xfs_mount *mp = cur->bc_mp; + struct xrep_refc_extent *rre; + xfs_fsblock_t fsbno; + int have_gt; + int error = 0; + + *have_rec = false; + /* + * Loop through the remaining rmaps. Remember CoW staging + * extents and the refcountbt blocks from the old tree for later + * disposal. We can only share written data fork extents, so + * keep looping until we find an rmap for one. 
+ */ + do { + if (xchk_should_terminate(rr->sc, &error)) + goto out_error; + + error = xfs_btree_increment(cur, 0, &have_gt); + if (error) + goto out_error; + if (!have_gt) + return 0; + + error = xfs_rmap_get_rec(cur, &rmap, &have_gt); + if (error) + goto out_error; + XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error); + + if (rmap.rm_owner == XFS_RMAP_OWN_COW) { + /* Pass CoW staging extents right through. */ + rre = kmem_alloc(sizeof(struct xrep_refc_extent), + KM_MAYFAIL); + if (!rre) + return -ENOMEM; + + INIT_LIST_HEAD(&rre->list); + rre->refc.rc_startblock = rmap.rm_startblock + + XFS_REFC_COW_START; + rre->refc.rc_blockcount = rmap.rm_blockcount; + rre->refc.rc_refcount = 1; + list_add_tail(&rre->list, rr->extlist); + } else if (rmap.rm_owner == XFS_RMAP_OWN_REFC) { + /* refcountbt block, dump it when we're done. */ + rr->btblocks += rmap.rm_blockcount; + fsbno = XFS_AGB_TO_FSB(cur->bc_mp, + cur->bc_private.a.agno, + rmap.rm_startblock); + error = xfs_bitmap_set(rr->btlist, fsbno, + rmap.rm_blockcount); + if (error) + goto out_error; + } + } while (XFS_RMAP_NON_INODE_OWNER(rmap.rm_owner) || + xfs_internal_inum(mp, rmap.rm_owner) || + (rmap.rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK | + XFS_RMAP_UNWRITTEN))); + + *rec = rmap; + *have_rec = true; + return 0; + +out_error: + return error; +} + +/* Recycle an idle rmap or allocate a new one. */ +static struct xrep_refc_rmap * +xrep_refc_get_rmap( + struct xrep_refc *rr) +{ + struct xrep_refc_rmap *rrm; + + if (list_empty(&rr->rmap_idle)) { + rrm = kmem_alloc(sizeof(struct xrep_refc_rmap), KM_MAYFAIL); + if (!rrm) + return NULL; + INIT_LIST_HEAD(&rrm->list); + return rrm; + } + + rrm = list_first_entry(&rr->rmap_idle, struct xrep_refc_rmap, list); + list_del_init(&rrm->list); + return rrm; +} + +/* Compare two btree extents.
*/ +static int +xrep_refcount_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_refc_extent *ap; + struct xrep_refc_extent *bp; + + ap = container_of(a, struct xrep_refc_extent, list); + bp = container_of(b, struct xrep_refc_extent, list); + + if (ap->refc.rc_startblock > bp->refc.rc_startblock) + return 1; + else if (ap->refc.rc_startblock < bp->refc.rc_startblock) + return -1; + return 0; +} + +/* Record a reference count extent. */ +STATIC int +xrep_refc_new_refc( + struct xfs_scrub *sc, + struct xrep_refc *rr, + xfs_agblock_t agbno, + xfs_extlen_t len, + xfs_nlink_t refcount) +{ + struct xrep_refc_extent *rre; + struct xfs_refcount_irec irec; + + irec.rc_startblock = agbno; + irec.rc_blockcount = len; + irec.rc_refcount = refcount; + + trace_xrep_refcount_extent_fn(sc->mp, sc->sa.agno, &irec); + + rre = kmem_alloc(sizeof(struct xrep_refc_extent), KM_MAYFAIL); + if (!rre) + return -ENOMEM; + INIT_LIST_HEAD(&rre->list); + rre->refc = irec; + list_add_tail(&rre->list, rr->extlist); + + return 0; +} + +/* Iterate all the rmap records to generate reference count data. */ +#define RMAP_NEXT(r) ((r).rm_startblock + (r).rm_blockcount) +STATIC int +xrep_refc_generate_refcounts( + struct xfs_scrub *sc, + struct xrep_refc *rr) +{ + struct xfs_rmap_irec rmap; + struct xfs_btree_cur *cur; + struct xrep_refc_rmap *rrm; + struct xrep_refc_rmap *n; + xfs_agblock_t sbno; + xfs_agblock_t cbno; + xfs_agblock_t nbno; + size_t old_stack_sz; + size_t stack_sz = 0; + bool have; + int have_gt; + int error; + + /* Start the rmapbt cursor to the left of all records. */ + cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, sc->sa.agf_bp, + sc->sa.agno); + error = xfs_rmap_lookup_le(cur, 0, 0, 0, 0, 0, &have_gt); + if (error) + goto out; + ASSERT(have_gt == 0); + + /* Process reverse mappings into refcount data. 
*/ + while (xfs_btree_has_more_records(cur)) { + /* Push all rmaps with pblk == sbno onto the stack */ + error = xrep_refc_next_rmap(cur, rr, &rmap, &have); + if (error) + goto out; + if (!have) + break; + sbno = cbno = rmap.rm_startblock; + while (have && rmap.rm_startblock == sbno) { + rrm = xrep_refc_get_rmap(rr); + if (!rrm) + goto out; + rrm->rmap = rmap; + list_add_tail(&rrm->list, &rr->rmap_bag); + stack_sz++; + error = xrep_refc_next_rmap(cur, rr, &rmap, &have); + if (error) + goto out; + } + error = xfs_btree_decrement(cur, 0, &have_gt); + if (error) + goto out; + XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out); + + /* Set nbno to the bno of the next refcount change */ + nbno = have ? rmap.rm_startblock : NULLAGBLOCK; + list_for_each_entry(rrm, &rr->rmap_bag, list) + nbno = min_t(xfs_agblock_t, nbno, RMAP_NEXT(rrm->rmap)); + + ASSERT(nbno > sbno); + old_stack_sz = stack_sz; + + /* While stack isn't empty... */ + while (stack_sz) { + /* Pop all rmaps that end at nbno */ + list_for_each_entry_safe(rrm, n, &rr->rmap_bag, list) { + if (RMAP_NEXT(rrm->rmap) != nbno) + continue; + stack_sz--; + list_move(&rrm->list, &rr->rmap_idle); + } + + /* Push array items that start at nbno */ + error = xrep_refc_next_rmap(cur, rr, &rmap, &have); + if (error) + goto out; + while (have && rmap.rm_startblock == nbno) { + rrm = xrep_refc_get_rmap(rr); + if (!rrm) + goto out; + rrm->rmap = rmap; + list_add_tail(&rrm->list, &rr->rmap_bag); + stack_sz++; + error = xrep_refc_next_rmap(cur, rr, &rmap, + &have); + if (error) + goto out; + } + error = xfs_btree_decrement(cur, 0, &have_gt); + if (error) + goto out; + XFS_WANT_CORRUPTED_GOTO(sc->mp, have_gt, out); + + /* Emit refcount if necessary */ + ASSERT(nbno > cbno); + if (stack_sz != old_stack_sz) { + if (old_stack_sz > 1) { + error = xrep_refc_new_refc(sc, rr, cbno, + nbno - cbno, + old_stack_sz); + if (error) + goto out; + rr->nr_records++; + } + cbno = nbno; + } + + /* Stack empty, go find the next rmap */ + if (stack_sz == 0) 
+ break; + old_stack_sz = stack_sz; + sbno = nbno; + + /* Set nbno to the bno of the next refcount change */ + nbno = have ? rmap.rm_startblock : NULLAGBLOCK; + list_for_each_entry(rrm, &rr->rmap_bag, list) + nbno = min_t(xfs_agblock_t, nbno, + RMAP_NEXT(rrm->rmap)); + + ASSERT(nbno > sbno); + } + } + + /* Free all the leftover rmap records. */ + list_for_each_entry_safe(rrm, n, &rr->rmap_idle, list) { + list_del(&rrm->list); + kmem_free(rrm); + } + + ASSERT(list_empty(&rr->rmap_bag)); +out: + xfs_btree_del_cursor(cur, error); + return error; +} +#undef RMAP_NEXT + +/* + * Generate all the reference counts for this AG and a list of the old + * refcount btree blocks. Figure out if we have enough free space to + * reconstruct the refcount btree. The caller must clean up the lists if + * anything goes wrong. + */ +STATIC int +xrep_refc_find_refcounts( + struct xfs_scrub *sc, + struct list_head *refcount_records, + struct xfs_bitmap *old_refcountbt_blocks) +{ + struct xrep_refc rr; + struct xrep_refc_rmap *rrm; + struct xrep_refc_rmap *n; + struct xfs_mount *mp = sc->mp; + int error; + + INIT_LIST_HEAD(&rr.rmap_bag); + INIT_LIST_HEAD(&rr.rmap_idle); + rr.extlist = refcount_records; + rr.btlist = old_refcountbt_blocks; + rr.btblocks = 0; + rr.sc = sc; + rr.nr_records = 0; + + /* Generate all the refcount records. */ + error = xrep_refc_generate_refcounts(sc, &rr); + if (error) + goto out; + + /* Do we actually have enough space to do this? */ + if (!xrep_ag_has_space(sc->sa.pag, + xfs_refcountbt_calc_size(mp, rr.nr_records), + XFS_AG_RESV_METADATA)) { + error = -ENOSPC; + goto out; + } + +out: + list_for_each_entry_safe(rrm, n, &rr.rmap_idle, list) { + list_del(&rrm->list); + kmem_free(rrm); + } + list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) { + list_del(&rrm->list); + kmem_free(rrm); + } + return error; +} + +/* Initialize new refcountbt root and implant it into the AGF.
*/ +STATIC int +xrep_refc_reset_btree( + struct xfs_scrub *sc, + struct xfs_owner_info *oinfo, + int *log_flags) +{ + struct xfs_buf *bp; + struct xfs_agf *agf; + xfs_fsblock_t btfsb; + int error; + + agf = XFS_BUF_TO_AGF(sc->sa.agf_bp); + + /* Initialize a new refcountbt root. */ + error = xrep_alloc_ag_block(sc, oinfo, &btfsb, XFS_AG_RESV_METADATA); + if (error) + return error; + error = xrep_init_btblock(sc, btfsb, &bp, XFS_BTNUM_REFC, + &xfs_refcountbt_buf_ops); + if (error) + return error; + agf->agf_refcount_root = cpu_to_be32(XFS_FSB_TO_AGBNO(sc->mp, btfsb)); + agf->agf_refcount_level = cpu_to_be32(1); + agf->agf_refcount_blocks = cpu_to_be32(1); + *log_flags |= XFS_AGF_REFCOUNT_BLOCKS | XFS_AGF_REFCOUNT_ROOT | + XFS_AGF_REFCOUNT_LEVEL; + + return 0; +} + +/* Build new refcount btree and dispose of the old one. */ +STATIC int +xrep_refc_rebuild_tree( + struct xfs_scrub *sc, + struct list_head *refcount_records, + struct xfs_owner_info *oinfo, + struct xfs_bitmap *old_refcountbt_blocks) +{ + struct xrep_refc_extent *rre; + struct xrep_refc_extent *n; + struct xfs_mount *mp = sc->mp; + struct xfs_btree_cur *cur; + int have_gt; + int error; + + /* Add all records. */ + list_sort(NULL, refcount_records, xrep_refcount_extent_cmp); + list_for_each_entry_safe(rre, n, refcount_records, list) { + /* Insert into the refcountbt. */ + cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, + sc->sa.agno); + error = xfs_refcount_lookup_eq(cur, rre->refc.rc_startblock, + &have_gt); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(mp, have_gt == 0); + error = xfs_refcount_insert(cur, &rre->refc, &have_gt); + if (error) + return error; + XFS_WANT_CORRUPTED_RETURN(mp, have_gt == 1); + xfs_btree_del_cursor(cur, error); + cur = NULL; + + error = xrep_roll_ag_trans(sc); + if (error) + return error; + + list_del(&rre->list); + kmem_free(rre); + } + + /* Free the old refcountbt blocks if they're not in use. 
*/ + return xrep_reap_extents(sc, old_refcountbt_blocks, oinfo, + XFS_AG_RESV_METADATA); +} + +/* Free every record in the refcount list. */ +STATIC void +xrep_refc_cancel_recs( + struct list_head *recs) +{ + struct xrep_refc_extent *rre; + struct xrep_refc_extent *n; + + list_for_each_entry_safe(rre, n, recs, list) { + list_del(&rre->list); + kmem_free(rre); + } +} + +/* Rebuild the refcount btree. */ +int +xrep_refcountbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct list_head refcount_records; + struct xfs_bitmap old_refcountbt_blocks; + struct xfs_mount *mp = sc->mp; + int log_flags = 0; + int error; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) + return -EOPNOTSUPP; + + xchk_perag_get(sc->mp, &sc->sa); + + /* Collect all reference counts. */ + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC); + INIT_LIST_HEAD(&refcount_records); + xfs_bitmap_init(&old_refcountbt_blocks); + error = xrep_refc_find_refcounts(sc, &refcount_records, + &old_refcountbt_blocks); + if (error) + goto out; + + /* + * Blow out the old refcount btrees. This is the point at which + * we are no longer able to bail out gracefully. + */ + error = xrep_refc_reset_btree(sc, &oinfo, &log_flags); + if (error) + goto out; + xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, log_flags); + + /* Invalidate all the old refcountbt blocks. */ + error = xrep_invalidate_blocks(sc, &old_refcountbt_blocks); + if (error) + goto out; + error = xrep_roll_ag_trans(sc); + if (error) + goto out; + + /* Now rebuild the refcount information.
*/ + error = xrep_refc_rebuild_tree(sc, &refcount_records, &oinfo, + &old_refcountbt_blocks); + if (error) + goto out; + sc->reset_perag_resv = true; +out: + xfs_bitmap_destroy(&old_refcountbt_blocks); + xrep_refc_cancel_recs(&refcount_records); + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 0cc53dee3228..da12c20376ae 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -64,6 +64,7 @@ int xrep_agfl(struct xfs_scrub *sc); int xrep_agi(struct xfs_scrub *sc); int xrep_allocbt(struct xfs_scrub *sc); int xrep_iallocbt(struct xfs_scrub *sc); +int xrep_refcountbt(struct xfs_scrub *sc); #else @@ -100,6 +101,7 @@ xrep_reset_perag_resv( #define xrep_agi xrep_notsupported #define xrep_allocbt xrep_notsupported #define xrep_iallocbt xrep_notsupported +#define xrep_refcountbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 631b0b06db99..843eafe0acef 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -265,7 +265,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_ag_refcountbt, .scrub = xchk_refcountbt, .has = xfs_sb_version_hasreflink, - .repair = xrep_notsupported, + .repair = xrep_refcountbt, }, [XFS_SCRUB_TYPE_INODE] = { /* inode record */ .type = ST_INODE, From patchwork Mon Jul 30 05:48:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548435 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 565571751 for ; Mon, 30 Jul 2018 05:49:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 40B442991D for ; Mon, 30 Jul 2018 05:49:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3507E29932; Mon, 30 Jul 2018 05:49:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D0FC72991D for ; Mon, 30 Jul 2018 05:49:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726087AbeG3HWR (ORCPT ); Mon, 30 Jul 2018 03:22:17 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:53842 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWR (ORCPT ); Mon, 30 Jul 2018 03:22:17 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5i3Jf005205; Mon, 30 Jul 2018 05:48:52 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=hNdAP7J3CvHL6O+Sgcc/cNcGY+B5Ueu3wxORlmCgqNM=; b=foYGO/7AT/6/57Xh0K72oITtnkMczaVKKPkjtR7NewQQZ+qpFG/c+46rB+lRjD+d88Vy ZqkdsRu2tKqo8RA0gFJH+90Ut0ucsc0blXnggKB4+NF9Bv22Fw1LIjQe+nRMLqJOJIIx re0iRSz8UpRAKvxRlE6Um7mRwB4diSqsQMt+v1XZn+l2ckWm6qRr1gQGt8qJZIVHURlN 
IdcmwU4bZfUF2lwd4ng5DLd1ikW63X1CmTPqn4o4DUIqaJCFCBpF7Y47N5dXB2WwZglu 7+lfklkIar/3EtuUX4yq776byWjDqXHn97huOC+SMl8zN3OOlCIGld0MiDbV1LMnLCLI RA== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp2130.oracle.com with ESMTP id 2kgfwstx2h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:48:52 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5mpLg020289 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:48:51 GMT Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5mpr4004288; Mon, 30 Jul 2018 05:48:51 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:48:51 -0700 Subject: [PATCH 08/14] xfs: repair inode records From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:48:50 -0700 Message-ID: <153292973001.24509.13133591727522566817.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. 
Wong Try to reinitialize corrupt inodes, or clear the reflink flag if it's not needed. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_format.h | 3 fs/xfs/scrub/inode_repair.c | 659 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 5 files changed, 665 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/scrub/inode_repair.c -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 4ca97e026f94..e01b5003d543 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -167,6 +167,7 @@ xfs-y += $(addprefix scrub/, \ alloc_repair.o \ bitmap.o \ ialloc_repair.o \ + inode_repair.o \ refcount_repair.o \ repair.o \ ) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 059bc44c27e8..d4ebf1a4f3e8 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -973,7 +973,8 @@ typedef enum xfs_dinode_fmt { #define XFS_DFORK_APTR(dip) \ (XFS_DFORK_DPTR(dip) + XFS_DFORK_BOFF(dip)) #define XFS_DFORK_PTR(dip,w) \ - ((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip)) + ((void *)((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : \ + XFS_DFORK_APTR(dip))) #define XFS_DFORK_FORMAT(dip,w) \ ((w) == XFS_DATA_FORK ? \ diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c new file mode 100644 index 000000000000..ec9d94d1e5d8 --- /dev/null +++ b/fs/xfs/scrub/inode_repair.c @@ -0,0 +1,659 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_inode_buf.h" +#include "xfs_inode_fork.h" +#include "xfs_ialloc.h" +#include "xfs_da_format.h" +#include "xfs_reflink.h" +#include "xfs_rmap.h" +#include "xfs_bmap.h" +#include "xfs_bmap_util.h" +#include "xfs_dir2.h" +#include "xfs_quota_defs.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Inode Repair + * + * Roughly speaking, inode problems can be classified based on whether or not + * they trip the dinode verifiers. If those trip, then we won't be able to + * _iget ourselves the inode. + * + * Therefore, the xrep_dinode_* functions fix anything that will cause the + * inode buffer verifier or the dinode verifier to fail. The xrep_inode_* + * functions fix things on live incore inodes. + */ + +/* Make sure this buffer can pass the inode buffer verifier.
*/ +STATIC void +xrep_dinode_buf( + struct xfs_scrub *sc, + struct xfs_buf *bp) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_trans *tp = sc->tp; + struct xfs_dinode *dip; + xfs_agnumber_t agno; + xfs_agino_t agino; + int ioff; + int i; + int ni; + bool crc_ok; + bool magic_ok; + bool unlinked_ok; + + ni = XFS_BB_TO_FSB(mp, bp->b_length) * mp->m_sb.sb_inopblock; + agno = xfs_daddr_to_agno(mp, XFS_BUF_ADDR(bp)); + for (i = 0; i < ni; i++) { + ioff = i << mp->m_sb.sb_inodelog; + dip = xfs_buf_offset(bp, ioff); + agino = be32_to_cpu(dip->di_next_unlinked); + + unlinked_ok = magic_ok = crc_ok = false; + + if (agino == NULLAGINO || xfs_verify_agino(sc->mp, agno, agino)) + unlinked_ok = true; + + if (dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) && + xfs_dinode_good_version(mp, dip->di_version)) + magic_ok = true; + + if (xfs_verify_cksum((char *)dip, mp->m_sb.sb_inodesize, + XFS_DINODE_CRC_OFF)) + crc_ok = true; + + if (magic_ok && unlinked_ok && crc_ok) + continue; + + if (!magic_ok) { + dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC); + dip->di_version = 3; + } + if (!unlinked_ok) + dip->di_next_unlinked = cpu_to_be32(NULLAGINO); + xfs_dinode_calc_crc(mp, dip); + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF); + xfs_trans_log_buf(tp, bp, ioff, ioff + sizeof(*dip) - 1); + } +} + +/* Reinitialize things that never change in an inode. */ +STATIC void +xrep_dinode_header( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC); + if (!xfs_dinode_good_version(sc->mp, dip->di_version)) + dip->di_version = 3; + dip->di_ino = cpu_to_be64(sc->sm->sm_ino); + uuid_copy(&dip->di_uuid, &sc->mp->m_sb.sb_meta_uuid); + dip->di_gen = cpu_to_be32(sc->sm->sm_gen); +} + +/* + * Turn di_mode into /something/ recognizable. + * + * XXX: Ideally we'd try to read data block 0 to see if it's a directory. 
+ */ +STATIC void +xrep_dinode_mode( + struct xfs_dinode *dip) +{ + uint16_t mode; + + mode = be16_to_cpu(dip->di_mode); + if (mode == 0 || xfs_mode_to_ftype(mode) != XFS_DIR3_FT_UNKNOWN) + return; + + /* bad mode, so we set it to a file that only root can read */ + mode = S_IFREG; + dip->di_mode = cpu_to_be16(mode); + dip->di_uid = 0; + dip->di_gid = 0; +} + +/* Fix any conflicting flags that the verifiers complain about. */ +STATIC void +xrep_dinode_flags( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = sc->mp; + uint64_t flags2; + uint16_t mode; + uint16_t flags; + + mode = be16_to_cpu(dip->di_mode); + flags = be16_to_cpu(dip->di_flags); + flags2 = be64_to_cpu(dip->di_flags2); + + if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode)) + flags2 |= XFS_DIFLAG2_REFLINK; + else + flags2 &= ~(XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE); + if (flags & XFS_DIFLAG_REALTIME) + flags2 &= ~XFS_DIFLAG2_REFLINK; + if (flags2 & XFS_DIFLAG2_REFLINK) + flags2 &= ~XFS_DIFLAG2_DAX; + dip->di_flags = cpu_to_be16(flags); + dip->di_flags2 = cpu_to_be64(flags2); +} + +/* + * Blow out symlink; now it points to the current dir. We don't have to worry + * about incore state because this inode is failing the verifiers. + */ +STATIC void +xrep_dinode_zap_symlink( + struct xfs_dinode *dip) +{ + char *p; + + dip->di_format = XFS_DINODE_FMT_LOCAL; + dip->di_size = cpu_to_be64(1); + p = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + *p = '.'; +} + +/* + * Blow out dir, make it point to the root. In the future repair will + * reconstruct this directory for us. Note that there's no in-core directory + * inode because the sf verifier tripped, so we don't have to worry about the + * dentry cache. 
+ */ +STATIC void +xrep_dinode_zap_dir( + struct xfs_mount *mp, + struct xfs_dinode *dip) +{ + const struct xfs_dir_ops *ops; + struct xfs_dir2_sf_hdr *sfp; + int i8count; + + dip->di_format = XFS_DINODE_FMT_LOCAL; + i8count = mp->m_sb.sb_rootino > XFS_DIR2_MAX_SHORT_INUM; + ops = xfs_dir_get_ops(mp, NULL); + sfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + sfp->count = 0; + sfp->i8count = i8count; + ops->sf_put_parent_ino(sfp, mp->m_sb.sb_rootino); + dip->di_size = cpu_to_be64(xfs_dir2_sf_hdr_size(i8count)); +} + +/* Make sure we don't have a garbage file size. */ +STATIC void +xrep_dinode_size( + struct xfs_mount *mp, + struct xfs_dinode *dip) +{ + uint64_t size; + uint16_t mode; + + mode = be16_to_cpu(dip->di_mode); + size = be64_to_cpu(dip->di_size); + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + /* di_size can't be nonzero for special files */ + dip->di_size = 0; + break; + case S_IFREG: + /* Regular files can't be larger than 2^63-1 bytes. */ + dip->di_size = cpu_to_be64(size & ~(1ULL << 63)); + break; + case S_IFLNK: + /* + * Truncate ridiculously oversized symlinks. If the size is + * zero, reset it to point to the current directory. Both of + * these conditions trigger dinode verifier errors, so there + * is no in-core state to reset. + */ + if (size > XFS_SYMLINK_MAXLEN) + dip->di_size = cpu_to_be64(XFS_SYMLINK_MAXLEN); + else if (size == 0) + xrep_dinode_zap_symlink(dip); + break; + case S_IFDIR: + /* + * Directories can't have a size larger than 32G. If the size + * is zero, reset it to an empty directory. Both of these + * conditions trigger dinode verifier errors, so there is no + * in-core state to reset. + */ + if (size > XFS_DIR2_SPACE_SIZE) + dip->di_size = cpu_to_be64(XFS_DIR2_SPACE_SIZE); + else if (size == 0) + xrep_dinode_zap_dir(mp, dip); + break; + } +} + +/* Fix extent size hints. 
*/ +STATIC void +xrep_dinode_extsize_hints( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = sc->mp; + uint64_t flags2; + uint16_t flags; + uint16_t mode; + xfs_failaddr_t fa; + + mode = be16_to_cpu(dip->di_mode); + flags = be16_to_cpu(dip->di_flags); + flags2 = be64_to_cpu(dip->di_flags2); + + fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), + mode, flags); + if (fa) { + dip->di_extsize = 0; + dip->di_flags &= ~cpu_to_be16(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + } + + if (dip->di_version < 3) + return; + + fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize), + mode, flags, flags2); + if (fa) { + dip->di_cowextsize = 0; + dip->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE); + } +} + +/* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */ +STATIC int +xrep_dinode_core( + struct xfs_scrub *sc) +{ + struct xfs_imap imap; + struct xfs_buf *bp; + struct xfs_dinode *dip; + xfs_ino_t ino; + bool inuse; + int error; + + /* Map & read inode. */ + ino = sc->sm->sm_ino; + error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED); + if (error) + return error; + + error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp, + imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp, NULL); + if (error) + return error; + + /* Make absolutely sure this inode isn't in core. */ + error = xfs_icache_inode_is_allocated(sc->mp, sc->tp, ino, &inuse); + if (error == 0) { + ASSERT(0); + return -EFSCORRUPTED; + } + + /* Make sure we can pass the inode buffer verifier. */ + xrep_dinode_buf(sc, bp); + bp->b_ops = &xfs_inode_buf_ops; + + /* Fix everything the verifier will complain about. */ + dip = xfs_buf_offset(bp, imap.im_boffset); + xrep_dinode_header(sc, dip); + xrep_dinode_mode(dip); + xrep_dinode_flags(sc, dip); + xrep_dinode_size(sc->mp, dip); + xrep_dinode_extsize_hints(sc, dip); + + /* Write out the inode... 
*/ + xfs_dinode_calc_crc(sc->mp, dip); + xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_DINO_BUF); + xfs_trans_log_buf(sc->tp, bp, imap.im_boffset, + imap.im_boffset + sc->mp->m_sb.sb_inodesize - 1); + error = xfs_trans_commit(sc->tp); + if (error) + return error; + sc->tp = NULL; + + /* ...and reload it? */ + error = xfs_iget(sc->mp, sc->tp, ino, + XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &sc->ip); + if (error) + return error; + sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; + xfs_ilock(sc->ip, sc->ilock_flags); + error = xchk_trans_alloc(sc, 0); + if (error) + return error; + sc->ilock_flags |= XFS_ILOCK_EXCL; + xfs_ilock(sc->ip, XFS_ILOCK_EXCL); + + return 0; +} + +/* Fix everything xfs_dinode_verify cares about. */ +STATIC int +xrep_dinode_problems( + struct xfs_scrub *sc) +{ + int error; + + error = xrep_dinode_core(sc); + if (error) + return error; + + /* We had to fix a totally busted inode, schedule quotacheck. */ + if (XFS_IS_UQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_USER); + if (XFS_IS_GQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_GROUP); + if (XFS_IS_PQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_PROJ); + + return 0; +} + +/* + * Fix problems that the verifiers don't care about. In general these are + * errors that don't cause problems elsewhere in the kernel that we can easily + * detect, so we don't check them all that rigorously. + */ + +/* Make sure block and extent counts are ok. */ +STATIC int +xrep_inode_blockcounts( + struct xfs_scrub *sc) +{ + xfs_filblks_t count; + xfs_filblks_t acount; + xfs_extnum_t nextents; + int error; + + /* Set data fork counters from the data fork mappings. 
*/ + error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, + &nextents, &count); + if (error) + return error; + if (XFS_IS_REALTIME_INODE(sc->ip)) { + if (count >= sc->mp->m_sb.sb_rblocks) + return -EFSCORRUPTED; + } else if (!xfs_sb_version_hasreflink(&sc->mp->m_sb)) { + if (count >= sc->mp->m_sb.sb_dblocks) + return -EFSCORRUPTED; + } + sc->ip->i_d.di_nextents = nextents; + + /* Set attr fork counters from the attr fork mappings. */ + error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, + &nextents, &acount); + if (error) + return error; + if (acount >= sc->mp->m_sb.sb_dblocks) + return -EFSCORRUPTED; + if (nextents >= (uint16_t)-1U) + return -EFSCORRUPTED; + sc->ip->i_d.di_anextents = nextents; + + sc->ip->i_d.di_nblocks = count + acount; + + /* + * If we found attr fork extents but no attr fork root, zero the + * attr fork extent count so that the attr fork repair will run. + */ + if (sc->ip->i_d.di_anextents != 0 && sc->ip->i_d.di_forkoff == 0) + sc->ip->i_d.di_anextents = 0; + + return 0; +} + +/* Check for invalid uid/gid. Note that a -1U projid is allowed. */ +STATIC void +xrep_inode_ids( + struct xfs_scrub *sc) +{ + if (sc->ip->i_d.di_uid == -1U) { + sc->ip->i_d.di_uid = 0; + VFS_I(sc->ip)->i_mode &= ~(S_ISUID | S_ISGID); + if (XFS_IS_UQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_USER); + } + + if (sc->ip->i_d.di_gid == -1U) { + sc->ip->i_d.di_gid = 0; + VFS_I(sc->ip)->i_mode &= ~(S_ISUID | S_ISGID); + if (XFS_IS_GQUOTA_ON(sc->mp)) + xrep_force_quotacheck(sc, XFS_DQ_GROUP); + } +} + +/* Nanosecond counters must be less than 1 billion.
*/ +STATIC void +xrep_inode_timestamps( + struct xfs_inode *ip) +{ + if ((unsigned long)VFS_I(ip)->i_atime.tv_nsec >= NSEC_PER_SEC) + VFS_I(ip)->i_atime.tv_nsec = 0; + if ((unsigned long)VFS_I(ip)->i_mtime.tv_nsec >= NSEC_PER_SEC) + VFS_I(ip)->i_mtime.tv_nsec = 0; + if ((unsigned long)VFS_I(ip)->i_ctime.tv_nsec >= NSEC_PER_SEC) + VFS_I(ip)->i_ctime.tv_nsec = 0; + if (ip->i_d.di_version > 2 && + (unsigned long)ip->i_d.di_crtime.t_nsec >= NSEC_PER_SEC) + ip->i_d.di_crtime.t_nsec = 0; +} + +/* Fix inode flags that don't make sense together. */ +STATIC void +xrep_inode_flags( + struct xfs_scrub *sc) +{ + uint16_t mode; + + mode = VFS_I(sc->ip)->i_mode; + + /* Clear junk flags */ + if (sc->ip->i_d.di_flags & ~XFS_DIFLAG_ANY) + sc->ip->i_d.di_flags &= XFS_DIFLAG_ANY; + + /* NEWRTBM only applies to realtime bitmaps */ + if (sc->ip->i_ino == sc->mp->m_sb.sb_rbmino) + sc->ip->i_d.di_flags |= XFS_DIFLAG_NEWRTBM; + else + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_NEWRTBM; + + /* These only make sense for directories. */ + if (!S_ISDIR(mode)) + sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_RTINHERIT | + XFS_DIFLAG_EXTSZINHERIT | + XFS_DIFLAG_PROJINHERIT | + XFS_DIFLAG_NOSYMLINKS); + + /* These only make sense for files. */ + if (!S_ISREG(mode)) + sc->ip->i_d.di_flags &= ~(XFS_DIFLAG_REALTIME | + XFS_DIFLAG_EXTSIZE); + + /* These only make sense for non-rt files. */ + if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME) + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_FILESTREAM; + + /* Immutable and append only? Drop the append. */ + if ((sc->ip->i_d.di_flags & XFS_DIFLAG_IMMUTABLE) && + (sc->ip->i_d.di_flags & XFS_DIFLAG_APPEND)) + sc->ip->i_d.di_flags &= ~XFS_DIFLAG_APPEND; + + if (sc->ip->i_d.di_version < 3) + return; + + /* Clear junk flags. */ + if (sc->ip->i_d.di_flags2 & ~XFS_DIFLAG2_ANY) + sc->ip->i_d.di_flags2 &= XFS_DIFLAG2_ANY; + + /* No reflink flag unless we support it and it's a file.
*/ + if (!xfs_sb_version_hasreflink(&sc->mp->m_sb) || + !S_ISREG(mode)) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + + /* DAX only applies to files and dirs. */ + if (!(S_ISREG(mode) || S_ISDIR(mode))) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX; + + /* No reflink files on the realtime device. */ + if (sc->ip->i_d.di_flags & XFS_DIFLAG_REALTIME) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + + /* No mixing reflink and DAX yet. */ + if (sc->ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) + sc->ip->i_d.di_flags2 &= ~XFS_DIFLAG2_DAX; +} + +/* + * Fix size problems with block/node format directories. If we fail to find + * the extent list, just bail out and let the bmapbtd repair functions clean + * up that mess. + */ +STATIC void +xrep_inode_blockdir_size( + struct xfs_scrub *sc) +{ + struct xfs_iext_cursor icur; + struct xfs_bmbt_irec got; + struct xfs_ifork *ifp; + xfs_fileoff_t off; + int error; + + /* Find the last block before 32G; this is the dir size. */ + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + if (!(ifp->if_flags & XFS_IFEXTENTS)) { + error = xfs_iread_extents(sc->tp, sc->ip, XFS_DATA_FORK); + if (error) + return; + } + + off = XFS_B_TO_FSB(sc->mp, XFS_DIR2_SPACE_SIZE); + if (!xfs_iext_lookup_extent_before(sc->ip, ifp, &off, &icur, &got)) { + /* zero-extents directory? */ + return; + } + + off = got.br_startoff + got.br_blockcount; + sc->ip->i_d.di_size = min_t(loff_t, XFS_DIR2_SPACE_SIZE, + XFS_FSB_TO_B(sc->mp, off)); +} + +/* Fix size problems with short format directories. */ +STATIC void +xrep_inode_sfdir_size( + struct xfs_scrub *sc) +{ + struct xfs_ifork *ifp; + + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + sc->ip->i_d.di_size = ifp->if_bytes; +} + +/* + * Fix any irregularities in an inode's size now that we can iterate extent + * maps and access other regular inode data. + */ +STATIC void +xrep_inode_size( + struct xfs_scrub *sc) +{ + /* + * Currently we only support fixing size on extents or btree format + * directories. 
Files can be any size and sizes for the other inode + * special types are fixed by xrep_dinode_size. + */ + if (!S_ISDIR(VFS_I(sc->ip)->i_mode)) + return; + switch (XFS_IFORK_FORMAT(sc->ip, XFS_DATA_FORK)) { + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + xrep_inode_blockdir_size(sc); + break; + case XFS_DINODE_FMT_LOCAL: + xrep_inode_sfdir_size(sc); + break; + } +} + +/* Fix any irregularities in an inode that the verifiers don't catch. */ +STATIC int +xrep_inode_problems( + struct xfs_scrub *sc) +{ + int error; + + error = xrep_inode_blockcounts(sc); + if (error) + return error; + xrep_inode_timestamps(sc->ip); + xrep_inode_flags(sc); + xrep_inode_ids(sc); + xrep_inode_size(sc); + xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + return xfs_trans_roll_inode(&sc->tp, sc->ip); +} + +/* Repair an inode's fields. */ +int +xrep_inode( + struct xfs_scrub *sc) +{ + int error = 0; + + /* + * No inode? That means we failed the _iget verifiers. Repair all + * the things that the inode verifiers care about, then retry _iget. + */ + if (!sc->ip) { + error = xrep_dinode_problems(sc); + if (error) + goto out; + } + + /* By this point we had better have a working incore inode. */ + ASSERT(sc->ip); + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* If we found corruption of any kind, try to fix it. */ + if ((sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) || + (sc->sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)) { + error = xrep_inode_problems(sc); + if (error) + goto out; + } + + /* See if we can clear the reflink flag. 
*/ + if (xfs_is_reflink_inode(sc->ip)) + return xfs_reflink_clear_inode_flag(sc->ip, &sc->tp); + +out: + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index da12c20376ae..20e449c7a0df 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -65,6 +65,7 @@ int xrep_agi(struct xfs_scrub *sc); int xrep_allocbt(struct xfs_scrub *sc); int xrep_iallocbt(struct xfs_scrub *sc); int xrep_refcountbt(struct xfs_scrub *sc); +int xrep_inode(struct xfs_scrub *sc); #else @@ -102,6 +103,7 @@ xrep_reset_perag_resv( #define xrep_allocbt xrep_notsupported #define xrep_iallocbt xrep_notsupported #define xrep_refcountbt xrep_notsupported +#define xrep_inode xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 843eafe0acef..ae922801808d 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -271,7 +271,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_inode, .scrub = xchk_inode, - .repair = xrep_notsupported, + .repair = xrep_inode, }, [XFS_SCRUB_TYPE_BMBTD] = { /* inode data fork */ .type = ST_INODE, From patchwork Mon Jul 30 05:48:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548437 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7B2751751 for ; Mon, 30 Jul 2018 05:49:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 684FB2991D for ; Mon, 30 Jul 2018 05:49:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5C5CE29932; Mon, 30 Jul 2018 05:49:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 48D682991D for ; Mon, 30 Jul 2018 05:49:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726259AbeG3HWX (ORCPT ); Mon, 30 Jul 2018 03:22:23 -0400 Received: from aserp2130.oracle.com ([141.146.126.79]:59706 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWW (ORCPT ); Mon, 30 Jul 2018 03:22:22 -0400 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5ihLE027849; Mon, 30 Jul 2018 05:49:00 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=csaP7FQJYjxkKA4HAUIDfau6DPYlUMET5OlfFHdqd0c=; b=HdgGrzHDRFFFfX51u4cnk9qZukIjchk+zH+T9NmKoGPFNvcUtbcuQDOUSjlEqxneA0n6 moP56wYw2I6FX101oWvPsVrKHCD244kwGi2Q6ZdNSIVhHMGgEt9z5j/4T+jZDr8I39V4 2vrkF+vHzpXn+X7TYS0w0LhBUyaGWMF7ZQubmJUmLgWllvPxjnalw7EcO3ssHJWuti9w 
NHxaJ+8vEGOCOlKEdETg1R1wsS8/s9mSr7aoh2p81l+sfiO0Bts4j1bQxupOH4CIfbNO ZSxOE9X+boS0JMnnOoUM5O5onWBXUNbaw+J2W/FFT738p8Qo466KnLGTE103b7qjIy/H Qw== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp2130.oracle.com with ESMTP id 2kge0cu0pc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:48:59 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5mwrU015427 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:48:58 GMT Received: from abhmp0005.oracle.com (abhmp0005.oracle.com [141.146.116.11]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5mwX5004463; Mon, 30 Jul 2018 05:48:58 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:48:57 -0700 Subject: [PATCH 09/14] xfs: zap broken inode forks From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:48:56 -0700 Message-ID: <153292973654.24509.9109401449984743806.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. 
Wong Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_attr_leaf.c | 32 ++- fs/xfs/libxfs/xfs_attr_leaf.h | 2 fs/xfs/libxfs/xfs_bmap.c | 21 ++ fs/xfs/libxfs/xfs_bmap.h | 2 fs/xfs/scrub/inode_repair.c | 401 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 439 insertions(+), 19 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 088ffcd22fa2..51c62dfe3059 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -896,23 +896,16 @@ xfs_attr_shortform_allfit( return xfs_attr_shortform_bytesfit(dp, bytes); } -/* Verify the consistency of an inline attribute fork. */ +/* Verify the consistency of a raw inline attribute fork. */ xfs_failaddr_t -xfs_attr_shortform_verify( - struct xfs_inode *ip) +xfs_attr_shortform_verify_struct( + struct xfs_attr_shortform *sfp, + size_t size) { - struct xfs_attr_shortform *sfp; struct xfs_attr_sf_entry *sfep; struct xfs_attr_sf_entry *next_sfep; char *endp; - struct xfs_ifork *ifp; int i; - int size; - - ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); - ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); - sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data; - size = ifp->if_bytes; /* * Give up if the attribute is way too short. @@ -970,6 +963,23 @@ xfs_attr_shortform_verify( return NULL; } +/* Verify the consistency of an inline attribute fork. 
*/ +xfs_failaddr_t +xfs_attr_shortform_verify( + struct xfs_inode *ip) +{ + struct xfs_attr_shortform *sfp; + struct xfs_ifork *ifp; + int size; + + ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); + ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); + sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data; + size = ifp->if_bytes; + + return xfs_attr_shortform_verify_struct(sfp, size); +} + /* * Convert a leaf attribute list to shortform attribute list */ diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h index 7b74e18becff..728af25a1738 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.h +++ b/fs/xfs/libxfs/xfs_attr_leaf.h @@ -41,6 +41,8 @@ int xfs_attr_shortform_to_leaf(struct xfs_da_args *args, int xfs_attr_shortform_remove(struct xfs_da_args *args); int xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp); int xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes); +xfs_failaddr_t xfs_attr_shortform_verify_struct(struct xfs_attr_shortform *sfp, + size_t size); xfs_failaddr_t xfs_attr_shortform_verify(struct xfs_inode *ip); void xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp); diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 92cd064a2589..649ce0a407dc 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6096,18 +6096,16 @@ xfs_bmap_finish_one( return error; } -/* Check that an inode's extent does not have invalid flags or bad ranges. */ +/* Check that an extent does not have invalid flags or bad ranges. 
*/ xfs_failaddr_t -xfs_bmap_validate_extent( - struct xfs_inode *ip, +xfs_bmap_validate_extent_raw( + struct xfs_mount *mp, + bool isrt, int whichfork, struct xfs_bmbt_irec *irec) { - struct xfs_mount *mp = ip->i_mount; xfs_fsblock_t endfsb; - bool isrt; - isrt = XFS_IS_REALTIME_INODE(ip); endfsb = irec->br_startblock + irec->br_blockcount - 1; if (isrt) { if (!xfs_verify_rtbno(mp, irec->br_startblock)) @@ -6131,3 +6129,14 @@ xfs_bmap_validate_extent( } return NULL; } + +/* Check that an inode's extent does not have invalid flags or bad ranges. */ +xfs_failaddr_t +xfs_bmap_validate_extent( + struct xfs_inode *ip, + int whichfork, + struct xfs_bmbt_irec *irec) +{ + return xfs_bmap_validate_extent_raw(ip->i_mount, + XFS_IS_REALTIME_INODE(ip), whichfork, irec); +} diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 2e8555c1229a..a1bf6d1fec2d 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -273,6 +273,8 @@ static inline int xfs_bmap_fork_to_state(int whichfork) } } +xfs_failaddr_t xfs_bmap_validate_extent_raw(struct xfs_mount *mp, bool isrt, + int whichfork, struct xfs_bmbt_irec *irec); xfs_failaddr_t xfs_bmap_validate_extent(struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *irec); diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index ec9d94d1e5d8..3c9ac9e046fd 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -22,11 +22,15 @@ #include "xfs_ialloc.h" #include "xfs_da_format.h" #include "xfs_reflink.h" +#include "xfs_alloc.h" #include "xfs_rmap.h" +#include "xfs_rmap_btree.h" #include "xfs_bmap.h" +#include "xfs_bmap_btree.h" #include "xfs_bmap_util.h" #include "xfs_dir2.h" #include "xfs_quota_defs.h" +#include "xfs_attr_leaf.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -139,7 +143,8 @@ xrep_dinode_mode( STATIC void xrep_dinode_flags( struct xfs_scrub *sc, - struct xfs_dinode *dip) + struct xfs_dinode *dip, + bool is_rt_file) { 
struct xfs_mount *mp = sc->mp; uint64_t flags2; @@ -150,6 +155,11 @@ xrep_dinode_flags( flags = be16_to_cpu(dip->di_flags); flags2 = be64_to_cpu(dip->di_flags2); + if (is_rt_file) + flags |= XFS_DIFLAG_REALTIME; + else + flags &= ~XFS_DIFLAG_REALTIME; + if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode)) flags2 |= XFS_DIFLAG2_REFLINK; else @@ -288,11 +298,392 @@ xrep_dinode_extsize_hints( } } +/* Blocks and extents associated with an inode, according to rmap records. */ +struct xrep_dinode_stats { + struct xfs_scrub *sc; + + /* Blocks in use on the data device by data extents or bmbt blocks. */ + xfs_rfsblock_t data_blocks; + + /* Blocks in use on the rt device. */ + xfs_rfsblock_t rt_blocks; + + /* Blocks in use by the attr fork. */ + xfs_rfsblock_t attr_blocks; + + /* Number of data device extents for the data fork. */ + xfs_extnum_t data_extents; + + /* + * Number of realtime device extents for the data fork. If + * data_extents and rt_extents indicate that the data fork has extents + * on both devices, we'll just back away slowly. + */ + xfs_extnum_t rt_extents; + + /* Number of (data device) extents for the attr fork. */ + xfs_aextnum_t attr_extents; +}; + +/* Count extents and blocks for an inode given an rmap. */ +STATIC int +xrep_dinode_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_dinode_stats *dis = priv; + + /* Is this even the right fork? */ + if (rec->rm_owner != dis->sc->sm->sm_ino) + return 0; + if (rec->rm_flags & XFS_RMAP_ATTR_FORK) { + dis->attr_blocks += rec->rm_blockcount; + if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + dis->attr_extents++; + } else { + dis->data_blocks += rec->rm_blockcount; + if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + dis->data_extents++; + } + return 0; +} + +/* Count extents and blocks for an inode from all AG rmap data. 
*/ +STATIC int +xrep_dinode_count_ag_rmaps( + struct xrep_dinode_stats *dis, + xfs_agnumber_t agno) +{ + struct xfs_btree_cur *cur; + struct xfs_buf *agf; + int error; + + error = xfs_alloc_read_agf(dis->sc->mp, dis->sc->tp, agno, 0, &agf); + if (error) + return error; + + cur = xfs_rmapbt_init_cursor(dis->sc->mp, dis->sc->tp, agf, agno); + if (!cur) { + error = -ENOMEM; + goto out_agf; + } + + error = xfs_rmap_query_all(cur, xrep_dinode_walk_rmap, dis); + if (error == XFS_BTREE_QUERY_RANGE_ABORT) + error = 0; + + xfs_btree_del_cursor(cur, error); +out_agf: + xfs_trans_brelse(dis->sc->tp, agf); + return error; +} + +/* Count extents and blocks for a given inode from all rmap data. */ +STATIC int +xrep_dinode_count_rmaps( + struct xrep_dinode_stats *dis) +{ + xfs_agnumber_t agno; + int error; + + if (!xfs_sb_version_hasrmapbt(&dis->sc->mp->m_sb) || + xfs_sb_version_hasrealtime(&dis->sc->mp->m_sb)) + return -EOPNOTSUPP; + + /* XXX: find rt blocks too */ + if (dis->rt_extents != 0) { + ASSERT(0); + return -EOPNOTSUPP; + } + + for (agno = 0; agno < dis->sc->mp->m_sb.sb_agcount; agno++) { + error = xrep_dinode_count_ag_rmaps(dis, agno); + if (error) + return error; + } + + /* Can't have extents on both the rt and the data device. */ + if (dis->data_extents && dis->rt_extents) + return -EFSCORRUPTED; + + return 0; +} + +/* Return true if this extents-format ifork looks like garbage. 
*/ +STATIC bool +xrep_dinode_bad_extents_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + int dfork_size, + int whichfork) +{ + struct xfs_bmbt_irec new; + struct xfs_bmbt_rec *dp; + bool isrt; + int i; + int nex; + int fork_size; + + nex = XFS_DFORK_NEXTENTS(dip, whichfork); + fork_size = nex * sizeof(struct xfs_bmbt_rec); + if (fork_size < 0 || fork_size > dfork_size) + return true; + if (whichfork == XFS_ATTR_FORK && nex > ((uint16_t)-1U)) + return true; + dp = XFS_DFORK_PTR(dip, whichfork); + + isrt = dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME); + for (i = 0; i < nex; i++, dp++) { + xfs_failaddr_t fa; + + xfs_bmbt_disk_get_all(dp, &new); + fa = xfs_bmap_validate_extent_raw(sc->mp, isrt, whichfork, + &new); + if (fa) + return true; + } + + return false; +} + +/* Return true if this btree-format ifork looks like garbage. */ +STATIC bool +xrep_dinode_bad_btree_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + int dfork_size, + int whichfork) +{ + struct xfs_bmdr_block *dfp; + int nrecs; + int level; + + if (XFS_DFORK_NEXTENTS(dip, whichfork) <= + dfork_size / sizeof(struct xfs_bmbt_rec)) + return true; + + dfp = XFS_DFORK_PTR(dip, whichfork); + nrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size) + return true; + if (level == 0 || level > XFS_BTREE_MAXLEVELS) + return true; + return false; +} + +/* + * Check the data fork for things that will fail the ifork verifiers or the + * ifork formatters.
+ */ +STATIC bool +xrep_dinode_check_dfork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + uint16_t mode) +{ + uint64_t size; + unsigned int fmt; + int dfork_size; + + fmt = XFS_DFORK_FORMAT(dip, XFS_DATA_FORK); + size = be64_to_cpu(dip->di_size); + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + if (fmt != XFS_DINODE_FMT_DEV) + return true; + break; + case S_IFREG: + if (fmt == XFS_DINODE_FMT_LOCAL) + return true; + /* fall through */ + case S_IFLNK: + case S_IFDIR: + switch (fmt) { + case XFS_DINODE_FMT_LOCAL: + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + break; + default: + return true; + } + break; + default: + return true; + } + dfork_size = XFS_DFORK_SIZE(dip, sc->mp, XFS_DATA_FORK); + switch (fmt) { + case XFS_DINODE_FMT_DEV: + break; + case XFS_DINODE_FMT_LOCAL: + if (size > dfork_size) + return true; + break; + case XFS_DINODE_FMT_EXTENTS: + if (xrep_dinode_bad_extents_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; + case XFS_DINODE_FMT_BTREE: + if (xrep_dinode_bad_btree_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; + default: + return true; + } + + return false; +} + +/* Reset the data fork to something sane. */ +STATIC void +xrep_dinode_zap_dfork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + uint16_t mode, + struct xrep_dinode_stats *dis) +{ + /* Special files always get reset to DEV */ + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + dip->di_format = XFS_DINODE_FMT_DEV; + dip->di_size = 0; + return; + } + + /* + * If we have data extents, reset to an empty map and hope the user + * will run the bmapbtd checker next. + */ + if (dis->data_extents || dis->rt_extents || S_ISREG(mode)) { + dip->di_format = XFS_DINODE_FMT_EXTENTS; + dip->di_nextents = 0; + return; + } + + /* Otherwise, reset the local format to the minimum. 
*/ + switch (mode & S_IFMT) { + case S_IFLNK: + xrep_dinode_zap_symlink(dip); + break; + case S_IFDIR: + xrep_dinode_zap_dir(sc->mp, dip); + break; + } +} + +/* + * Check the attr fork for things that will fail the ifork verifiers or the + * ifork formatters. + */ +STATIC bool +xrep_dinode_check_afork( + struct xfs_scrub *sc, + struct xfs_dinode *dip) +{ + struct xfs_attr_shortform *sfp; + int size; + + if (XFS_DFORK_BOFF(dip) == 0) + return dip->di_aformat != XFS_DINODE_FMT_EXTENTS || + dip->di_anextents != 0; + + size = XFS_DFORK_SIZE(dip, sc->mp, XFS_ATTR_FORK); + switch (XFS_DFORK_FORMAT(dip, XFS_ATTR_FORK)) { + case XFS_DINODE_FMT_LOCAL: + sfp = XFS_DFORK_PTR(dip, XFS_ATTR_FORK); + return xfs_attr_shortform_verify_struct(sfp, size) != NULL; + case XFS_DINODE_FMT_EXTENTS: + if (xrep_dinode_bad_extents_fork(sc, dip, size, XFS_ATTR_FORK)) + return true; + break; + case XFS_DINODE_FMT_BTREE: + if (xrep_dinode_bad_btree_fork(sc, dip, size, XFS_ATTR_FORK)) + return true; + break; + default: + return true; + } + + return false; +} + +/* Reset the attr fork to something sane. */ +STATIC void +xrep_dinode_zap_afork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + struct xrep_dinode_stats *dis) +{ + dip->di_aformat = XFS_DINODE_FMT_EXTENTS; + dip->di_anextents = 0; + /* + * We leave a nonzero forkoff so that the bmap scrub will look for + * attr rmaps. + */ + dip->di_forkoff = dis->attr_extents ? 1 : 0; +} + +/* + * Zap the data/attr forks if we spot anything that isn't going to pass the + * ifork verifiers or the ifork formatters, because we need to get the inode + * into good enough shape that the higher level repair functions can run. + */ +STATIC void +xrep_dinode_zap_forks( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + struct xrep_dinode_stats *dis) +{ + uint16_t mode; + bool zap_datafork = false; + bool zap_attrfork = false; + + mode = be16_to_cpu(dip->di_mode); + + /* Inode counters don't make sense? 
*/ + if (be32_to_cpu(dip->di_nextents) > be64_to_cpu(dip->di_nblocks)) + zap_datafork = true; + if (be16_to_cpu(dip->di_anextents) > be64_to_cpu(dip->di_nblocks)) + zap_attrfork = true; + if (be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) > + be64_to_cpu(dip->di_nblocks)) + zap_datafork = zap_attrfork = true; + + if (!zap_datafork) + zap_datafork = xrep_dinode_check_dfork(sc, dip, mode); + if (!zap_attrfork) + zap_attrfork = xrep_dinode_check_afork(sc, dip); + + /* Zap whatever's bad. */ + if (zap_attrfork) + xrep_dinode_zap_afork(sc, dip, dis); + if (zap_datafork) + xrep_dinode_zap_dfork(sc, dip, mode, dis); + dip->di_nblocks = 0; + if (!zap_attrfork) + be64_add_cpu(&dip->di_nblocks, dis->attr_blocks); + if (!zap_datafork) { + be64_add_cpu(&dip->di_nblocks, dis->data_blocks); + be64_add_cpu(&dip->di_nblocks, dis->rt_blocks); + } +} + /* Inode didn't pass verifiers, so fix the raw buffer and retry iget. */ STATIC int xrep_dinode_core( struct xfs_scrub *sc) { + struct xrep_dinode_stats dis = { .sc = sc }; struct xfs_imap imap; struct xfs_buf *bp; struct xfs_dinode *dip; @@ -300,6 +691,11 @@ xrep_dinode_core( bool inuse; int error; + /* Figure out what this inode had mapped in both forks. */ + error = xrep_dinode_count_rmaps(&dis); + if (error) + return error; + /* Map & read inode. */ ino = sc->sm->sm_ino; error = xfs_imap(sc->mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED); @@ -326,9 +722,10 @@ xrep_dinode_core( dip = xfs_buf_offset(bp, imap.im_boffset); xrep_dinode_header(sc, dip); xrep_dinode_mode(dip); - xrep_dinode_flags(sc, dip); + xrep_dinode_flags(sc, dip, dis.rt_extents > 0); xrep_dinode_size(sc->mp, dip); xrep_dinode_extsize_hints(sc, dip); + xrep_dinode_zap_forks(sc, dip, &dis); /* Write out the inode... */ xfs_dinode_calc_crc(sc->mp, dip); From patchwork Mon Jul 30 05:49:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548439 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 627C5139A for ; Mon, 30 Jul 2018 05:49:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4FCF72991D for ; Mon, 30 Jul 2018 05:49:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4380A29932; Mon, 30 Jul 2018 05:49:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E78292991D for ; Mon, 30 Jul 2018 05:49:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726322AbeG3HWa (ORCPT ); Mon, 30 Jul 2018 03:22:30 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:54222 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWa (ORCPT ); Mon, 30 Jul 2018 03:22:30 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5iduN005579; Mon, 30 Jul 2018 05:49:06 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=TnbedHGwFpNbgwdE9DgVZnGiStkr7Z2NmwKP4rUfQT8=; b=KF5yxccan/KPQcY5Wx6zSiVv+U1LNOmRrS1AXGFWqg0uKnVsC4C3smfwlam+Q+OYy1RA Ax1pfkDhvsxOt3i5Kdv7amqH+xZhcZgYfmldW0JiC8lfaJO2vVda0jYcTd6Z3qzZge+r m1LaF7r0SMF01Sl4n/8JLaTAwxcdZ9qdVfjbrkEYhd+73sFkX9jd/k2DZUiPB9QneY8t 
MsuNgdR5o6KRygW+WB98dV5we65U27V/djxAwH1hG9Gf7QeEVutqhCubLE4nKopFk9vQ KPZQvERcEvrCEIu92DwuqjS21eN6iuK5xAnhiMeVwm3WUepPX1+HRHwwSbbF4QUMbS/f RQ== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp2130.oracle.com with ESMTP id 2kgfwstx51-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:05 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5n5d2015719 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:05 GMT Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5n4IP004523; Mon, 30 Jul 2018 05:49:04 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:49:04 -0700 Subject: [PATCH 10/14] xfs: repair inode block maps From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:49:03 -0700 Message-ID: <153292974314.24509.8286797647259797021.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. 
Wong Use the reverse-mapping btree information to rebuild an inode fork. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/bmap.c | 22 ++ fs/xfs/scrub/bmap_repair.c | 513 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 4 fs/xfs/scrub/scrub.c | 4 fs/xfs/scrub/trace.h | 2 fs/xfs/xfs_trans.c | 54 +++++ fs/xfs/xfs_trans.h | 2 8 files changed, 599 insertions(+), 3 deletions(-) create mode 100644 fs/xfs/scrub/bmap_repair.c -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index e01b5003d543..7f5467bb18b9 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -166,6 +166,7 @@ xfs-y += $(addprefix scrub/, \ agheader_repair.o \ alloc_repair.o \ bitmap.o \ + bmap_repair.o \ ialloc_repair.o \ inode_repair.o \ refcount_repair.o \ diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index e1d11f3223e3..6659f41e7b4c 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -37,6 +37,7 @@ xchk_setup_inode_bmap( struct xfs_scrub *sc, struct xfs_inode *ip) { + bool is_repair = false; int error; error = xchk_get_inode(sc, ip); @@ -46,6 +47,10 @@ xchk_setup_inode_bmap( sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; xfs_ilock(sc->ip, sc->ilock_flags); +#ifdef CONFIG_XFS_ONLINE_REPAIR + is_repair = (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR); +#endif + /* * We don't want any ephemeral data fork updates sitting around * while we inspect block mappings, so wait for directio to finish @@ -53,10 +58,27 @@ xchk_setup_inode_bmap( */ if (S_ISREG(VFS_I(sc->ip)->i_mode) && sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) { + /* Break all our leases, we're going to mess with things.
*/ + if (is_repair) { + error = xfs_break_layouts(VFS_I(sc->ip), + &sc->ilock_flags, BREAK_UNMAP); + if (error) + goto out; + } + inode_dio_wait(VFS_I(sc->ip)); error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping); if (error) goto out; + + /* Drop the page cache if we're repairing block mappings. */ + if (is_repair) { + error = invalidate_inode_pages2( + VFS_I(sc->ip)->i_mapping); + if (error) + goto out; + } + } /* Got the inode, lock it and we're ready to go. */ diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c new file mode 100644 index 000000000000..00907b97512e --- /dev/null +++ b/fs/xfs/scrub/bmap_repair.c @@ -0,0 +1,513 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_inode_fork.h" +#include "xfs_alloc.h" +#include "xfs_rtalloc.h" +#include "xfs_bmap.h" +#include "xfs_bmap_util.h" +#include "xfs_bmap_btree.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_quota.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" + +/* + * Inode fork block mapping (BMBT) repair. + * + * Basically, we gather all the rmap records for the inode and fork we're + * fixing, reset the incore fork, then re-add all the records. + */ + +struct xrep_bmap_extent { + struct list_head list; + struct xfs_rmap_irec rmap; + xfs_agnumber_t agno; +}; + +struct xrep_bmap { + /* List of new bmap records. 
*/ + struct list_head *extlist; + + /* Old bmbt blocks */ + struct xfs_bitmap *btlist; + + struct xfs_scrub *sc; + + /* Inode we're fixing. */ + xfs_ino_t ino; + + /* How many blocks did we find in the other fork? */ + xfs_rfsblock_t otherfork_blocks; + + /* How many bmbt blocks did we find for this fork? */ + xfs_rfsblock_t bmbt_blocks; + + /* How many extents did we find for this fork? */ + xfs_extnum_t extents; + + /* Which fork are we fixing? */ + int whichfork; +}; + +/* Record extents that belong to this inode's fork. */ +STATIC int +xrep_bmap_walk_rmap( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_bmap *rb = priv; + struct xrep_bmap_extent *rbe; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + int error = 0; + + if (xchk_should_terminate(rb->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rb->ino) { + return 0; + } else if (rb->whichfork == XFS_DATA_FORK && + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) { + rb->otherfork_blocks += rec->rm_blockcount; + return 0; + } else if (rb->whichfork == XFS_ATTR_FORK && + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) { + rb->otherfork_blocks += rec->rm_blockcount; + return 0; + } + + /* Delete the old bmbt blocks later. */ + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) { + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_private.a.agno, + rec->rm_startblock); + rb->bmbt_blocks += rec->rm_blockcount; + return xfs_bitmap_set(rb->btlist, fsbno, rec->rm_blockcount); + } + + /* Remember this rmap. 
*/ + rb->extents++; + trace_xrep_bmap_walk_rmap(mp, cur->bc_private.a.agno, + rec->rm_startblock, rec->rm_blockcount, rec->rm_owner, + rec->rm_offset, rec->rm_flags); + + rbe = kmem_alloc(sizeof(struct xrep_bmap_extent), KM_MAYFAIL); + if (!rbe) + return -ENOMEM; + + INIT_LIST_HEAD(&rbe->list); + rbe->rmap = *rec; + rbe->agno = cur->bc_private.a.agno; + list_add_tail(&rbe->list, rb->extlist); + + return 0; +} + +/* Compare two bmap extents. */ +static int +xrep_bmap_extent_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_bmap_extent *ap; + struct xrep_bmap_extent *bp; + + ap = container_of(a, struct xrep_bmap_extent, list); + bp = container_of(b, struct xrep_bmap_extent, list); + + if (ap->rmap.rm_offset > bp->rmap.rm_offset) + return 1; + else if (ap->rmap.rm_offset < bp->rmap.rm_offset) + return -1; + return 0; +} + +/* Scan one AG for reverse mappings that we can turn into extent maps. */ +STATIC int +xrep_bmap_scan_ag( + struct xrep_bmap *rb, + xfs_agnumber_t agno) +{ + struct xfs_scrub *sc = rb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_buf *agf_bp = NULL; + struct xfs_btree_cur *cur; + int error; + + error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp); + if (error) + return error; + if (!agf_bp) + return -ENOMEM; + cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, agno); + error = xfs_rmap_query_all(cur, xrep_bmap_walk_rmap, rb); + if (error == XFS_BTREE_QUERY_RANGE_ABORT) + error = 0; + xfs_btree_del_cursor(cur, error); + xfs_trans_brelse(sc->tp, agf_bp); + return error; +} + +/* Insert bmap records into an inode fork, given an rmap. */ +STATIC int +xrep_bmap_insert_rec( + struct xfs_scrub *sc, + struct xrep_bmap_extent *rbe, + int baseflags) +{ + struct xfs_bmbt_irec bmap; + xfs_extlen_t extlen; + int flags; + int error = 0; + + /* Form the "new" mapping... 
*/ + bmap.br_startblock = XFS_AGB_TO_FSB(sc->mp, rbe->agno, + rbe->rmap.rm_startblock); + bmap.br_startoff = rbe->rmap.rm_offset; + + flags = 0; + if (rbe->rmap.rm_flags & XFS_RMAP_UNWRITTEN) + flags = XFS_BMAPI_PREALLOC; + while (rbe->rmap.rm_blockcount > 0) { + extlen = min_t(xfs_extlen_t, rbe->rmap.rm_blockcount, + MAXEXTLEN); + bmap.br_blockcount = extlen; + + /* Re-add the extent to the fork. */ + error = xfs_bmapi_remap(sc->tp, sc->ip, bmap.br_startoff, + extlen, bmap.br_startblock, baseflags | flags); + if (error) + goto out; + + bmap.br_startblock += extlen; + bmap.br_startoff += extlen; + rbe->rmap.rm_blockcount -= extlen; + error = xfs_defer_ijoin(sc->tp->t_dfops, sc->ip); + if (error) + goto out; + error = xfs_defer_finish(&sc->tp); + if (error) + goto out; + /* Make sure we roll the transaction. */ + error = xfs_trans_roll_inode(&sc->tp, sc->ip); + if (error) + goto out; + } + +out: + return error; +} + +/* Check for garbage inputs. */ +STATIC int +xrep_bmap_check_inputs( + struct xfs_scrub *sc, + int whichfork) +{ + ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK); + + /* Don't know how to repair the other fork formats. */ + if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS && + XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE) + return -EOPNOTSUPP; + + /* + * If there's no attr fork area in the inode, there's no attr fork to + * rebuild. + */ + if (whichfork == XFS_ATTR_FORK) { + if (!XFS_IFORK_Q(sc->ip)) + return -ENOENT; + return 0; + } + + /* Only files, symlinks, and directories get to have data forks. */ + switch (VFS_I(sc->ip)->i_mode & S_IFMT) { + case S_IFREG: + case S_IFDIR: + case S_IFLNK: + /* ok */ + break; + default: + return -EINVAL; + } + + /* If we somehow have delalloc extents, forget it. */ + if (sc->ip->i_delayed_blks) + return -EBUSY; + + /* Don't know how to rebuild realtime data forks. 
*/ + if (XFS_IS_REALTIME_INODE(sc->ip)) + return -EOPNOTSUPP; + + return 0; +} + +/* + * Collect block mappings for this fork of this inode and decide if we have + * enough space to rebuild. Caller is responsible for cleaning up the list if + * anything goes wrong. + */ +STATIC int +xrep_bmap_find_mappings( + struct xfs_scrub *sc, + int whichfork, + struct list_head *mapping_records, + struct xfs_bitmap *old_bmbt_blocks, + xfs_rfsblock_t *old_bmbt_block_count, + xfs_rfsblock_t *otherfork_blocks) +{ + struct xrep_bmap rb; + xfs_agnumber_t agno; + unsigned int resblks; + int error; + + memset(&rb, 0, sizeof(rb)); + rb.extlist = mapping_records; + rb.btlist = old_bmbt_blocks; + rb.ino = sc->ip->i_ino; + rb.whichfork = whichfork; + rb.sc = sc; + + /* Iterate the rmaps for extents. */ + for (agno = 0; agno < sc->mp->m_sb.sb_agcount; agno++) { + error = xrep_bmap_scan_ag(&rb, agno); + if (error) + return error; + } + + /* + * Guess how many blocks we're going to need to rebuild an entire bmap + * from the number of extents we found, and pump up our transaction to + * have sufficient block reservation. + */ + resblks = xfs_bmbt_calc_size(sc->mp, rb.extents); + error = xfs_trans_reserve_more(sc->tp, resblks, 0); + if (error) + return error; + + *otherfork_blocks = rb.otherfork_blocks; + *old_bmbt_block_count = rb.bmbt_blocks; + return 0; +} + +/* Update the inode counters. */ +STATIC int +xrep_bmap_reset_counters( + struct xfs_scrub *sc, + xfs_rfsblock_t old_bmbt_block_count, + xfs_rfsblock_t otherfork_blocks, + int *log_flags) +{ + int error; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* + * We're going to use the bmap routines to reconstruct a fork from rmap + * records. Those functions increment di_nblocks for us, so we need to + * subtract out all the data and bmbt blocks from the fork we're about + * to rebuild. otherfork_blocks reflects all the data and bmbt blocks + * for the other fork, so this assignment effectively performs the + * subtraction for us. 
+ */ + sc->ip->i_d.di_nblocks = otherfork_blocks; + *log_flags |= XFS_ILOG_CORE; + + if (!old_bmbt_block_count) + return 0; + + /* Release quota counts for the old bmbt blocks. */ + error = xrep_ino_dqattach(sc); + if (error) + return error; + xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT, + -(int64_t)old_bmbt_block_count); + return 0; +} + +/* Initialize a new fork and implant it in the inode. */ +STATIC void +xrep_bmap_reset_fork( + struct xfs_scrub *sc, + int whichfork, + bool has_mappings, + int *log_flags) +{ + /* Set us back to extents format with zero records. */ + XFS_IFORK_FMT_SET(sc->ip, whichfork, XFS_DINODE_FMT_EXTENTS); + XFS_IFORK_NEXT_SET(sc->ip, whichfork, 0); + + /* Reinitialize the in-core fork. */ + if (XFS_IFORK_PTR(sc->ip, whichfork) != NULL) + xfs_idestroy_fork(sc->ip, whichfork); + if (whichfork == XFS_DATA_FORK) { + memset(&sc->ip->i_df, 0, sizeof(struct xfs_ifork)); + sc->ip->i_df.if_flags |= XFS_IFEXTENTS; + } else if (whichfork == XFS_ATTR_FORK) { + if (has_mappings) { + sc->ip->i_afp = NULL; + } else { + sc->ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, + KM_SLEEP); + sc->ip->i_afp->if_flags |= XFS_IFEXTENTS; + } + } + + /* + * Now that we've reinitialized the in-memory fork and set the inode + * back to extents format with zero extents, any extents that we + * subsequently map into the file will reinitialize the on-disk fork + * area for us. All we have to do is log the inode core to preserve + * the format and extent count fields. + */ + *log_flags |= XFS_ILOG_CORE; +} + +/* Make our changes permanent so that we can start rebuilding the fork. */ +STATIC int +xrep_bmap_commit_new( + struct xfs_scrub *sc, + int log_flags) +{ + xfs_trans_log_inode(sc->tp, sc->ip, log_flags); + return xfs_trans_roll_inode(&sc->tp, sc->ip); +} + +/* Build new fork mappings and dispose of the old bmbt blocks. 
*/ +STATIC int +xrep_bmap_rebuild_tree( + struct xfs_scrub *sc, + int whichfork, + struct list_head *mapping_records, + struct xfs_bitmap *old_bmbt_blocks) +{ + struct xfs_owner_info oinfo; + struct xrep_bmap_extent *rbe; + struct xrep_bmap_extent *n; + int baseflags; + int error; + + baseflags = XFS_BMAPI_NORMAP; + if (whichfork == XFS_ATTR_FORK) + baseflags |= XFS_BMAPI_ATTRFORK; + + /* "Remap" the extents into the fork. */ + list_sort(NULL, mapping_records, xrep_bmap_extent_cmp); + list_for_each_entry_safe(rbe, n, mapping_records, list) { + error = xrep_bmap_insert_rec(sc, rbe, baseflags); + if (error) + return error; + list_del(&rbe->list); + kmem_free(rbe); + } + + /* Dispose of all the old bmbt blocks. */ + xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, whichfork); + return xrep_reap_extents(sc, old_bmbt_blocks, &oinfo, + XFS_AG_RESV_NONE); +} + +/* Free every record in the mapping list. */ +STATIC void +xrep_bmap_cancel_recs( + struct list_head *recs) +{ + struct xrep_bmap_extent *rbe; + struct xrep_bmap_extent *n; + + list_for_each_entry_safe(rbe, n, recs, list) { + list_del(&rbe->list); + kmem_free(rbe); + } +} + +/* Repair an inode fork. */ +STATIC int +xrep_bmap( + struct xfs_scrub *sc, + int whichfork) +{ + struct list_head mapping_records; + struct xfs_bitmap old_bmbt_blocks; + xfs_rfsblock_t old_bmbt_block_count; + xfs_rfsblock_t otherfork_blocks; + int log_flags = 0; + int error = 0; + + error = xrep_bmap_check_inputs(sc, whichfork); + if (error) + return error; + + /* Collect all reverse mappings for this fork's extents. */ + INIT_LIST_HEAD(&mapping_records); + xfs_bitmap_init(&old_bmbt_blocks); + error = xrep_bmap_find_mappings(sc, whichfork, &mapping_records, + &old_bmbt_blocks, &old_bmbt_block_count, + &otherfork_blocks); + if (error) + goto out; + + /* + * Blow out the in-core fork and zero the on-disk fork. This is the + * point at which we are no longer able to bail out gracefully. 
+ */ + error = xrep_bmap_reset_counters(sc, old_bmbt_block_count, + otherfork_blocks, &log_flags); + if (error) + goto out; + xrep_bmap_reset_fork(sc, whichfork, list_empty(&mapping_records), + &log_flags); + error = xrep_bmap_commit_new(sc, log_flags); + if (error) + goto out; + + /* Now rebuild the fork extent map information. */ + error = xrep_bmap_rebuild_tree(sc, whichfork, &mapping_records, + &old_bmbt_blocks); +out: + xfs_bitmap_destroy(&old_bmbt_blocks); + xrep_bmap_cancel_recs(&mapping_records); + return error; +} + +/* Repair an inode's data fork. */ +int +xrep_bmap_data( + struct xfs_scrub *sc) +{ + return xrep_bmap(sc, XFS_DATA_FORK); +} + +/* Repair an inode's attr fork. */ +int +xrep_bmap_attr( + struct xfs_scrub *sc) +{ + return xrep_bmap(sc, XFS_ATTR_FORK); +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 20e449c7a0df..38444fec70db 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -66,6 +66,8 @@ int xrep_allocbt(struct xfs_scrub *sc); int xrep_iallocbt(struct xfs_scrub *sc); int xrep_refcountbt(struct xfs_scrub *sc); int xrep_inode(struct xfs_scrub *sc); +int xrep_bmap_data(struct xfs_scrub *sc); +int xrep_bmap_attr(struct xfs_scrub *sc); #else @@ -104,6 +106,8 @@ xrep_reset_perag_resv( #define xrep_iallocbt xrep_notsupported #define xrep_refcountbt xrep_notsupported #define xrep_inode xrep_notsupported +#define xrep_bmap_data xrep_notsupported +#define xrep_bmap_attr xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index ae922801808d..45af20a3ab50 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -277,13 +277,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_inode_bmap, .scrub = xchk_bmap_data, - .repair = xrep_notsupported, + .repair = xrep_bmap_data, }, [XFS_SCRUB_TYPE_BMBTA] = { /* inode attr fork */ .type = ST_INODE, .setup = xchk_setup_inode_bmap, .scrub = xchk_bmap_attr, - .repair = 
xrep_notsupported, + .repair = xrep_bmap_attr, }, [XFS_SCRUB_TYPE_BMBTC] = { /* inode CoW fork */ .type = ST_INODE, diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 9126dc66f726..3383b14fd0c0 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -554,7 +554,7 @@ DEFINE_EVENT(xrep_rmap_class, name, \ DEFINE_REPAIR_RMAP_EVENT(xrep_abt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_ibt_walk_rmap); DEFINE_REPAIR_RMAP_EVENT(xrep_rmap_extent_fn); -DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_extent_fn); +DEFINE_REPAIR_RMAP_EVENT(xrep_bmap_walk_rmap); TRACE_EVENT(xrep_refcount_extent_fn, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 7bf5c1202719..226a30465def 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -133,6 +133,60 @@ xfs_trans_dup( return ntp; } +/* + * Try to reserve more blocks for a transaction. The single use case we + * support is for online repair -- use a transaction to gather data without + * fear of btree cycle deadlocks; calculate how many blocks we really need + * from that data; and only then start modifying data. This can fail due to + * ENOSPC, so we have to be able to cancel the transaction. + */ +int +xfs_trans_reserve_more( + struct xfs_trans *tp, + uint blocks, + uint rtextents) +{ + struct xfs_mount *mp = tp->t_mountp; + bool rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0; + int error = 0; + + ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY)); + + /* + * Attempt to reserve the needed disk blocks by decrementing + * the number needed from the number available. This will + * fail if the count would go below zero. + */ + if (blocks > 0) { + error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); + if (error) + return -ENOSPC; + tp->t_blk_res += blocks; + } + + /* + * Attempt to reserve the needed realtime extents by decrementing + * the number needed from the number available. This will + * fail if the count would go below zero. 
+ */ + if (rtextents > 0) { + error = xfs_mod_frextents(mp, -((int64_t)rtextents)); + if (error) { + error = -ENOSPC; + goto out_blocks; + } + tp->t_rtx_res += rtextents; + } + + return 0; +out_blocks: + if (blocks > 0) { + xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd); + tp->t_blk_res -= blocks; + } + return error; +} + /* * This is called to reserve free disk blocks and log space for the * given transaction. This must be done before allocating any resources diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 5170e89bec02..8708bd2bbdba 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -169,6 +169,8 @@ typedef struct xfs_trans { int xfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp, uint blocks, uint rtextents, uint flags, struct xfs_trans **tpp); +int xfs_trans_reserve_more(struct xfs_trans *tp, uint blocks, + uint rtextents); int xfs_trans_alloc_empty(struct xfs_mount *mp, struct xfs_trans **tpp); void xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t); From patchwork Mon Jul 30 05:49:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548441 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 439761751 for ; Mon, 30 Jul 2018 05:49:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 31C8E2991D for ; Mon, 30 Jul 2018 05:49:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 261F029932; Mon, 30 Jul 2018 05:49:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 23F142991D for ; Mon, 30 Jul 2018 05:49:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726087AbeG3HWg (ORCPT ); Mon, 30 Jul 2018 03:22:36 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:43686 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWf (ORCPT ); Mon, 30 Jul 2018 03:22:35 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5hbUL034093; Mon, 30 Jul 2018 05:49:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=dyVZYYrRWs1FuGf5dHPuCswpLC9NYlChn2cveAThUTg=; b=LUVvPH1P8bzDVqdeUi+TqcKC2SqDdl1fTJvs7dxKSH9WMpOh9psdlqRWQAX6UJXzRN/7 lgIFGuHYidtgo6nLa/OmqCevjEa7RqeiyCtcLiH1SqyTncqAYymkpUki6uVFmx6S02Ha dcxESg+WoekY3vC6nNwxZV6BX02cKgZ8EbCKi6pt90GXflr8rAcZZIKczghasdYlkBXG 
a7pSYr9Tutk19TSZomDo/Dn5SheKuDZLNlChS6d+oxMkyBIy4jE7m81YzrnTnl6/hDEf 1txxv/P0NEAuPJVaIEVNx2kb5hnyG+Wi1dM5JiY/8k4s2jI96it3UKQK/L5hGMv6UpIY 7g== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp2120.oracle.com with ESMTP id 2kggentvc8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:12 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5nBci021304 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:12 GMT Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w6U5nBpO003393; Mon, 30 Jul 2018 05:49:11 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:49:11 -0700 Subject: [PATCH 11/14] xfs: repair damaged symlinks From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:49:09 -0700 Message-ID: <153292974980.24509.14967171523048953889.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300065 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Repair inconsistent symbolic link data. 
Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/symlink.c | 5 + fs/xfs/scrub/symlink_repair.c | 244 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_symlink.c | 150 ++++++++++++++----------- fs/xfs/xfs_symlink.h | 3 + 7 files changed, 339 insertions(+), 68 deletions(-) create mode 100644 fs/xfs/scrub/symlink_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 7f5467bb18b9..e25cde969d99 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -171,6 +171,7 @@ xfs-y += $(addprefix scrub/, \ inode_repair.o \ refcount_repair.o \ repair.o \ + symlink_repair.o \ ) endif endif diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 38444fec70db..17769efb20d9 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -68,6 +68,7 @@ int xrep_refcountbt(struct xfs_scrub *sc); int xrep_inode(struct xfs_scrub *sc); int xrep_bmap_data(struct xfs_scrub *sc); int xrep_bmap_attr(struct xfs_scrub *sc); +int xrep_symlink(struct xfs_scrub *sc); #else @@ -108,6 +109,7 @@ xrep_reset_perag_resv( #define xrep_inode xrep_notsupported #define xrep_bmap_data xrep_notsupported #define xrep_bmap_attr xrep_notsupported +#define xrep_symlink xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 45af20a3ab50..0a8eea77e58f 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -307,7 +307,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_symlink, .scrub = xchk_symlink, - .repair = xrep_notsupported, + .repair = xrep_symlink, }, [XFS_SCRUB_TYPE_PARENT] = { /* parent pointers */ .type = ST_INODE, diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c index f7ebaa946999..ee968c62d0f2
100644 --- a/fs/xfs/scrub/symlink.c +++ b/fs/xfs/scrub/symlink.c @@ -29,12 +29,15 @@ xchk_setup_symlink( struct xfs_scrub *sc, struct xfs_inode *ip) { + uint resblks; + /* Allocate the buffer without the inode lock held. */ sc->buf = kmem_zalloc_large(XFS_SYMLINK_MAXLEN + 1, KM_SLEEP); if (!sc->buf) return -ENOMEM; - return xchk_setup_inode_contents(sc, ip, 0); + resblks = xfs_symlink_blocks(sc->mp, XFS_SYMLINK_MAXLEN); + return xchk_setup_inode_contents(sc, ip, resblks); } /* Symbolic links. */ diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c new file mode 100644 index 000000000000..6888094cf941 --- /dev/null +++ b/fs/xfs/scrub/symlink_repair.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_inode_fork.h" +#include "xfs_symlink.h" +#include "xfs_bmap.h" +#include "xfs_quota.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_bmap_btree.h" +#include "xfs_trans_space.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Symbolic Link Repair + * ==================== + * + * There's not much we can do to repair symbolic links -- we truncate them to + * the first NUL byte and reinitialize the target. Zero-length symlinks are + * turned into links to the current dir. + */ + +/* Try to salvage the pathname from rmt blocks.
*/ +STATIC int +xrep_symlink_salvage_remote( + struct xfs_scrub *sc) +{ + struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; + struct xfs_inode *ip = sc->ip; + struct xfs_buf *bp; + char *target_buf = sc->buf; + xfs_failaddr_t fa; + xfs_filblks_t fsblocks; + xfs_daddr_t d; + loff_t len; + loff_t offset; + unsigned int byte_cnt; + bool magic_ok; + bool hdr_ok; + int n; + int nmaps = XFS_SYMLINK_MAPS; + int error; + + /* We'll only read until the buffer is full. */ + len = min_t(loff_t, ip->i_d.di_size, XFS_SYMLINK_MAXLEN); + fsblocks = xfs_symlink_blocks(sc->mp, len); + error = xfs_bmapi_read(ip, 0, fsblocks, mval, &nmaps, 0); + if (error) + return error; + + offset = 0; + for (n = 0; n < nmaps; n++) { + struct xfs_dsymlink_hdr *dsl; + + d = XFS_FSB_TO_DADDR(sc->mp, mval[n].br_startblock); + + /* Read the rmt block. We'll run the verifiers manually. */ + error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp, + d, XFS_FSB_TO_BB(sc->mp, mval[n].br_blockcount), + 0, &bp, NULL); + if (error) + return error; + bp->b_ops = &xfs_symlink_buf_ops; + + /* How many bytes do we expect to get out of this buffer? */ + byte_cnt = XFS_FSB_TO_B(sc->mp, mval[n].br_blockcount); + byte_cnt = XFS_SYMLINK_BUF_SPACE(sc->mp, byte_cnt); + byte_cnt = min_t(unsigned int, byte_cnt, len); + + /* + * See if the verifiers accept this block. We're willing to + * salvage if the offset/byte/ino are ok and either the + * verifier passed or the magic is ok. Anything else and we + * stop dead in our tracks. + */ + fa = bp->b_ops->verify_struct(bp); + dsl = bp->b_addr; + magic_ok = dsl->sl_magic == cpu_to_be32(XFS_SYMLINK_MAGIC); + hdr_ok = xfs_symlink_hdr_ok(ip->i_ino, offset, byte_cnt, bp); + if (!hdr_ok || (fa != NULL && !magic_ok)) + break; + + memcpy(target_buf + offset, dsl + 1, byte_cnt); + + len -= byte_cnt; + offset += byte_cnt; + } + + /* Ensure we have a zero at the end, and /some/ contents.
*/ + if (offset == 0) + sprintf(target_buf, "."); + else + target_buf[offset] = 0; + return 0; +} + +/* + * Try to salvage an inline symlink's contents. Empty symlinks become a link + * to the current directory. + */ +STATIC void +xrep_symlink_salvage_inline( + struct xfs_scrub *sc) +{ + struct xfs_inode *ip = sc->ip; + struct xfs_ifork *ifp; + + ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + if (ifp->if_u1.if_data) + strncpy(sc->buf, ifp->if_u1.if_data, XFS_IFORK_DSIZE(ip)); + if (strlen(sc->buf) == 0) + sprintf(sc->buf, "."); +} + +/* Reset an inline symlink to its fresh configuration. */ +STATIC void +xrep_symlink_truncate_inline( + struct xfs_inode *ip) +{ + xfs_idestroy_fork(ip, XFS_DATA_FORK); + ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS; + ip->i_d.di_nextents = 0; + memset(&ip->i_df, 0, sizeof(struct xfs_ifork)); + ip->i_df.if_flags |= XFS_IFEXTENTS; +} + +/* + * Salvage an inline symlink's contents and reset the data fork. + * Returns with the inode joined to the transaction. + */ +STATIC int +xrep_symlink_inline( + struct xfs_scrub *sc) +{ + /* Salvage whatever link target information we can find. */ + xrep_symlink_salvage_inline(sc); + + /* Truncate the symlink. */ + xrep_symlink_truncate_inline(sc->ip); + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + return 0; +} + +/* + * Salvage a remote symlink's contents and truncate the data fork. + * Returns with the inode joined to the transaction. + */ +STATIC int +xrep_symlink_remote( + struct xfs_scrub *sc) +{ + int error; + + /* Salvage whatever link target information we can find. */ + error = xrep_symlink_salvage_remote(sc); + if (error) + return error; + + /* Truncate the symlink. */ + xfs_trans_ijoin(sc->tp, sc->ip, 0); + return xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK, 0); +} + +/* + * Reinitialize a link target. Caller must ensure the inode is joined to + * the transaction.
+ */ +STATIC int +xrep_symlink_reinitialize( + struct xfs_scrub *sc) +{ + xfs_fsblock_t fs_blocks; + unsigned int target_len; + uint resblks; + int error; + + /* How many blocks do we need? */ + target_len = strlen(sc->buf); + ASSERT(target_len != 0); + if (target_len == 0 || target_len > XFS_SYMLINK_MAXLEN) + return -EFSCORRUPTED; + + /* Set up to reinitialize the target. */ + fs_blocks = xfs_symlink_blocks(sc->mp, target_len); + resblks = XFS_SYMLINK_SPACE_RES(sc->mp, target_len, fs_blocks); + error = xfs_trans_reserve_quota_nblks(sc->tp, sc->ip, resblks, 0, + XFS_QMOPT_RES_REGBLKS); + if (error) + return error; + + /* Try to write the new target back out. */ + xfs_defer_ijoin(sc->tp->t_dfops, sc->ip); + error = xfs_symlink_write_target(sc->tp, sc->ip, sc->buf, target_len, + fs_blocks, resblks); + if (error) + return error; + + /* Finish up any block mapping activities. */ + return xfs_defer_finish(&sc->tp); +} + +/* Repair a symbolic link. */ +int +xrep_symlink( + struct xfs_scrub *sc) +{ + struct xfs_ifork *ifp; + int error; + + error = xfs_qm_dqattach_locked(sc->ip, false); + if (error) + return error; + + /* Salvage whatever we can of the target. */ + *((char *)sc->buf) = 0; + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + if (ifp->if_flags & XFS_IFINLINE) + error = xrep_symlink_inline(sc); + else + error = xrep_symlink_remote(sc); + if (error) + return error; + + /* Now reset the target. */ + return xrep_symlink_reinitialize(sc); +} diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 2bfe7fbbedb2..1097d9062ef3 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -150,6 +150,86 @@ xfs_readlink( return error; } +/* Write the symlink target into the inode.
*/ +int +xfs_symlink_write_target( + struct xfs_trans *tp, + struct xfs_inode *ip, + const char *target_path, + int pathlen, + xfs_fsblock_t fs_blocks, + uint resblks) +{ + struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; + struct xfs_mount *mp = tp->t_mountp; + const char *cur_chunk; + struct xfs_buf *bp; + xfs_daddr_t d; + int byte_cnt; + int nmaps; + int offset; + int n; + int error; + + /* + * If the symlink will fit into the inode, write it inline. + */ + if (pathlen <= XFS_IFORK_DSIZE(ip)) { + xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen); + + ip->i_d.di_size = pathlen; + ip->i_d.di_format = XFS_DINODE_FMT_LOCAL; + xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE); + + return 0; + } + + /* Write target to remote blocks. */ + nmaps = XFS_SYMLINK_MAPS; + error = xfs_bmapi_write(tp, ip, 0, fs_blocks, XFS_BMAPI_METADATA, + resblks, mval, &nmaps); + if (error) + return error; + + if (resblks) + resblks -= fs_blocks; + ip->i_d.di_size = pathlen; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + + cur_chunk = target_path; + offset = 0; + for (n = 0; n < nmaps; n++) { + char *buf; + + d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock); + byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount); + bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, d, + BTOBB(byte_cnt), 0); + if (!bp) + return -ENOMEM; + bp->b_ops = &xfs_symlink_buf_ops; + + byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); + byte_cnt = min(byte_cnt, pathlen); + + buf = bp->b_addr; + buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset, + byte_cnt, bp); + + memcpy(buf, cur_chunk, byte_cnt); + + cur_chunk += byte_cnt; + pathlen -= byte_cnt; + offset += byte_cnt; + + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SYMLINK_BUF); + xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) - + (char *)bp->b_addr); + } + ASSERT(pathlen == 0); + return 0; +} + int xfs_symlink( struct xfs_inode *dp, @@ -164,15 +244,7 @@ xfs_symlink( int error = 0; int pathlen; bool unlock_dp_on_error = false; - xfs_fileoff_t first_fsb; 
xfs_filblks_t fs_blocks; - int nmaps; - struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS]; - xfs_daddr_t d; - const char *cur_chunk; - int byte_cnt; - int n; - xfs_buf_t *bp; prid_t prid; struct xfs_dquot *udqp = NULL; struct xfs_dquot *gdqp = NULL; @@ -265,65 +337,11 @@ xfs_symlink( if (resblks) resblks -= XFS_IALLOC_SPACE_RES(mp); - /* - * If the symlink will fit into the inode, write it inline. - */ - if (pathlen <= XFS_IFORK_DSIZE(ip)) { - xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen); - - ip->i_d.di_size = pathlen; - ip->i_d.di_format = XFS_DINODE_FMT_LOCAL; - xfs_trans_log_inode(tp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE); - } else { - int offset; - - first_fsb = 0; - nmaps = XFS_SYMLINK_MAPS; - - error = xfs_bmapi_write(tp, ip, first_fsb, fs_blocks, - XFS_BMAPI_METADATA, resblks, mval, &nmaps); - if (error) - goto out_trans_cancel; - - if (resblks) - resblks -= fs_blocks; - ip->i_d.di_size = pathlen; - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - - cur_chunk = target_path; - offset = 0; - for (n = 0; n < nmaps; n++) { - char *buf; - - d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock); - byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount); - bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, d, - BTOBB(byte_cnt), 0); - if (!bp) { - error = -ENOMEM; - goto out_trans_cancel; - } - bp->b_ops = &xfs_symlink_buf_ops; - - byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); - byte_cnt = min(byte_cnt, pathlen); - - buf = bp->b_addr; - buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset, - byte_cnt, bp); - - memcpy(buf, cur_chunk, byte_cnt); - cur_chunk += byte_cnt; - pathlen -= byte_cnt; - offset += byte_cnt; - - xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SYMLINK_BUF); - xfs_trans_log_buf(tp, bp, 0, (buf + byte_cnt - 1) - - (char *)bp->b_addr); - } - ASSERT(pathlen == 0); - } + error = xfs_symlink_write_target(tp, ip, target_path, pathlen, + fs_blocks, resblks); + if (error) + goto out_trans_cancel; /* * Create the directory entry for the symlink. 
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h index 9743d8c9394b..d7252f9cab41 100644 --- a/fs/xfs/xfs_symlink.h +++ b/fs/xfs/xfs_symlink.h @@ -12,5 +12,8 @@ int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name, int xfs_readlink_bmap_ilocked(struct xfs_inode *ip, char *link); int xfs_readlink(struct xfs_inode *ip, char *link); int xfs_inactive_symlink(struct xfs_inode *ip); +int xfs_symlink_write_target(struct xfs_trans *tp, struct xfs_inode *ip, + const char *target_path, int pathlen, xfs_fsblock_t fs_blocks, + uint resblks); #endif /* __XFS_SYMLINK_H */ From patchwork Mon Jul 30 05:49:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10548443 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A0EB1751 for ; Mon, 30 Jul 2018 05:49:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 16F432991D for ; Mon, 30 Jul 2018 05:49:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0B07429932; Mon, 30 Jul 2018 05:49:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C45022991D for ; Mon, 30 Jul 2018 05:49:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726344AbeG3HWm (ORCPT ); Mon, 30 Jul 2018 03:22:42 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:43812 "EHLO aserp2120.oracle.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWm (ORCPT ); Mon, 30 Jul 2018 03:22:42 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5nHO6038337; Mon, 30 Jul 2018 05:49:19 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=6BIEo3sfmCg49Ju+XfnXNUfEKuWmWtKyRCVADKw/YtQ=; b=4l7mM5NnON0FttdyIxQLn2DBT6hXS4/5B5YT7BwpmXOCjTNHo9M/7V1NLf2eYJE6hLxv qITe6NW/XT4XbOagrWbQ1Qk2Ci5NGH4SK5IovMxeI22CAhk6IBqPLpWL3iJh0nMGegtU z42tmCCDazHQ+WvvJApXUXbOZzVE559q6jbwY15dVlnHTMP9E5uNX+UqCXqkv9j3xm0p stMyYAi1KS1DhLu71SzDU5MUFM9R3egXkYdS90ETl/SZ9OZg1AGxA2pqeKIngH04ZGQt eo/JL3mfdDjpkDzK3ZmfC1ESN5cUxp77Zhc1v1X6g3/y3QZQb2izvu8QhZZVenHhufoo sA== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp2120.oracle.com with ESMTP id 2kggentvcv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:19 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5nITu016252 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:18 GMT Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5nINv004585; Mon, 30 Jul 2018 05:49:18 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:49:17 -0700 Subject: [PATCH 12/14] xfs: repair extended attributes From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:49:16 -0700 Message-ID: <153292975639.24509.12593511556451726584.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300066 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, zap the attr tree, and re-add the values. Signed-off-by: Darrick J. 
Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/attr.c | 2 fs/xfs/scrub/attr_repair.c | 611 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 2 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/scrub.h | 3 6 files changed, 619 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/scrub/attr_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index e25cde969d99..c3963c88f952 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -164,6 +164,7 @@ xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y) xfs-y += $(addprefix scrub/, \ agheader_repair.o \ + attr_repair.o \ alloc_repair.o \ bitmap.o \ bmap_repair.o \ diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c index 81d5e90547a1..e20074c241b5 100644 --- a/fs/xfs/scrub/attr.c +++ b/fs/xfs/scrub/attr.c @@ -125,7 +125,7 @@ xchk_xattr_listent( * Within a char, the lowest bit of the char represents the byte with * the smallest address */ -STATIC bool +bool xchk_xattr_set_map( struct xfs_scrub *sc, unsigned long *map, diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c new file mode 100644 index 000000000000..5bacfb88f25e --- /dev/null +++ b/fs/xfs/scrub/attr_repair.c @@ -0,0 +1,611 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_dir2.h" +#include "xfs_attr.h" +#include "xfs_attr_leaf.h" +#include "xfs_attr_sf.h" +#include "xfs_attr_remote.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Extended Attribute Repair + * ========================= + * + * We repair extended attributes by reading the attribute fork blocks looking + * for keys and values, then truncate the entire attr fork and reinsert all + * the attributes. Unfortunately, there's no secondary copy of most extended + * attribute data, which means that if we blow up midway through there's + * little we can do. + */ + +struct xrep_xattr_key { + struct list_head list; + unsigned char *value; + int valuelen; + int flags; + int namelen; + unsigned char name[0]; +}; + +#define XREP_XATTR_KEY_LEN(namelen) \ + (sizeof(struct xrep_xattr_key) + (namelen) + 1) + +struct xrep_xattr { + struct list_head *attrlist; + struct xfs_scrub *sc; +}; + +/* + * Iterate each block in an attr fork extent. The m_attr_geo fsbcount is + * always 1 for now, but code defensively in case this ever changes. + */ +#define for_each_xfs_attr_block(mp, irec, dabno) \ + for ((dabno) = roundup((xfs_dablk_t)(irec)->br_startoff, \ + (mp)->m_attr_geo->fsbcount); \ + (dabno) < (irec)->br_startoff + (irec)->br_blockcount; \ + (dabno) += (mp)->m_attr_geo->fsbcount) + +/* + * Decide if we want to salvage this attribute. We don't bother with + * incomplete or oversized keys or values. 
+ */ +STATIC int +xrep_xattr_want_salvage( + int flags, + int namelen, + int valuelen) +{ + if (flags & XFS_ATTR_INCOMPLETE) + return false; + if (namelen > XATTR_NAME_MAX || namelen <= 0) + return false; + if (valuelen > XATTR_SIZE_MAX || valuelen < 0) + return false; + return true; +} + +/* Allocate an in-core record to hold xattrs while we rebuild the xattr data. */ +STATIC struct xrep_xattr_key * +xrep_xattr_salvage_key( + int flags, + unsigned char *name, + int namelen, + int valuelen) +{ + struct xrep_xattr_key *key; + + /* Store attr key. */ + key = kmem_alloc(XREP_XATTR_KEY_LEN(namelen), KM_MAYFAIL); + if (!key) + return NULL; + INIT_LIST_HEAD(&key->list); + key->valuelen = valuelen; + key->flags = flags & (ATTR_ROOT | ATTR_SECURE); + key->namelen = namelen; + key->name[namelen] = 0; + memcpy(key->name, name, namelen); + key->value = NULL; + if (valuelen) { + key->value = kmem_alloc_large(valuelen, KM_MAYFAIL); + if (!key->value) { + kmem_free(key); + return NULL; + } + } + return key; +} + +/* + * Record a shortform extended attribute key & value for later reinsertion + * into the inode. + */ +STATIC int +xrep_xattr_salvage_sf_attr( + struct xrep_xattr *rx, + struct xfs_attr_sf_entry *sfe) +{ + unsigned char *value = &sfe->nameval[sfe->namelen]; + struct xrep_xattr_key *key; + + if (!xrep_xattr_want_salvage(sfe->flags, sfe->namelen, sfe->valuelen)) + return 0; + key = xrep_xattr_salvage_key(sfe->flags, sfe->nameval, sfe->namelen, + sfe->valuelen); + if (!key) + return -ENOMEM; + if (sfe->valuelen) + memcpy(key->value, value, sfe->valuelen); + list_add_tail(&key->list, rx->attrlist); + return 0; +} + +/* + * Record a local format extended attribute key & value for later reinsertion + * into the inode. 
+ */ +STATIC int +xrep_xattr_salvage_local_attr( + struct xrep_xattr *rx, + struct xfs_attr_leaf_entry *ent, + unsigned int nameidx, + const char *buf_end, + struct xfs_attr_leaf_name_local *lentry) +{ + struct xrep_xattr_key *key; + unsigned long *usedmap = rx->sc->buf; + unsigned int valuelen; + unsigned int namesize; + + /* + * Decode the leaf local entry format. If something seems wrong, we + * junk the attribute. + */ + valuelen = be16_to_cpu(lentry->valuelen); + namesize = xfs_attr_leaf_entsize_local(lentry->namelen, valuelen); + if ((char *)lentry + namesize > buf_end) + return 0; + if (!xrep_xattr_want_salvage(ent->flags, lentry->namelen, valuelen)) + return 0; + if (!xchk_xattr_set_map(rx->sc, usedmap, nameidx, namesize)) + return 0; + + /* Try to save this attribute. */ + key = xrep_xattr_salvage_key(ent->flags, lentry->nameval, + lentry->namelen, valuelen); + if (!key) + return -ENOMEM; + if (valuelen) + memcpy(key->value, &lentry->nameval[lentry->namelen], valuelen); + list_add_tail(&key->list, rx->attrlist); + return 0; +} + +/* + * Record a remote format extended attribute key & value for later reinsertion + * into the inode. + */ +STATIC int +xrep_xattr_salvage_remote_attr( + struct xrep_xattr *rx, + struct xfs_attr_leaf_entry *ent, + unsigned int nameidx, + const char *buf_end, + struct xfs_attr_leaf_name_remote *rentry, + unsigned int ent_idx, + struct xfs_buf *leaf_bp) +{ + struct xfs_da_args args = { + .trans = rx->sc->tp, + .dp = rx->sc->ip, + .index = ent_idx, + .geo = rx->sc->mp->m_attr_geo, + }; + struct xrep_xattr_key *key; + unsigned long *usedmap = rx->sc->buf; + unsigned int valuelen; + unsigned int namesize; + int error; + + /* + * Decode the leaf remote entry format. If something seems wrong, we + * junk the attribute. Note that we should never find a zero-length + * remote attribute value. 
+ */ + valuelen = be32_to_cpu(rentry->valuelen); + namesize = xfs_attr_leaf_entsize_remote(rentry->namelen); + if ((char *)rentry + namesize > buf_end) + return 0; + if (valuelen == 0 || + !xrep_xattr_want_salvage(ent->flags, rentry->namelen, valuelen)) + return 0; + if (!xchk_xattr_set_map(rx->sc, usedmap, nameidx, namesize)) + return 0; + + /* Try to save this attribute. */ + key = xrep_xattr_salvage_key(ent->flags, rentry->name, rentry->namelen, + valuelen); + if (!key) + return -ENOMEM; + + /* Look up the remote value and stash it for reconstruction. */ + args.valuelen = valuelen; + args.namelen = rentry->namelen; + args.name = key->name; + args.value = key->value; + error = xfs_attr3_leaf_getvalue(leaf_bp, &args); + if (error || args.rmtblkno == 0) + goto err_free; + + error = xfs_attr_rmtval_get(&args); + if (error == 0) { + /* Got the value, add the attr and get out. */ + list_add_tail(&key->list, rx->attrlist); + return 0; + } + +err_free: + /* remote value was garbage, junk it */ + if (error == -EFSBADCRC || error == -EFSCORRUPTED) + error = 0; + kmem_free(key->value); + kmem_free(key); + return error; +} + +/* Extract every xattr key that we can from this attr fork block. 
*/ +STATIC int +xrep_xattr_recover_leaf( + struct xrep_xattr *rx, + struct xfs_buf *bp) +{ + struct xfs_attr3_icleaf_hdr leafhdr; + struct xfs_scrub *sc = rx->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_attr_leafblock *leaf; + unsigned long *usedmap = sc->buf; + struct xfs_attr_leaf_name_local *lentry; + struct xfs_attr_leaf_name_remote *rentry; + struct xfs_attr_leaf_entry *ent; + struct xfs_attr_leaf_entry *entries; + char *buf_end; + size_t off; + unsigned int nameidx; + unsigned int hdrsize; + int i; + int error = 0; + + bitmap_zero(usedmap, mp->m_attr_geo->blksize); + + /* Check the leaf header */ + leaf = bp->b_addr; + xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf); + hdrsize = xfs_attr3_leaf_hdr_size(leaf); + xchk_xattr_set_map(sc, usedmap, 0, hdrsize); + entries = xfs_attr3_leaf_entryp(leaf); + + buf_end = (char *)bp->b_addr + mp->m_attr_geo->blksize; + for (i = 0, ent = entries; i < leafhdr.count; ent++, i++) { + /* Skip key if it conflicts with something else? */ + off = (char *)ent - (char *)leaf; + if (!xchk_xattr_set_map(sc, usedmap, off, + sizeof(xfs_attr_leaf_entry_t))) + continue; + + /* Check the name information. */ + nameidx = be16_to_cpu(ent->nameidx); + if (nameidx < leafhdr.firstused || + nameidx >= mp->m_attr_geo->blksize) + continue; + + if (ent->flags & XFS_ATTR_LOCAL) { + lentry = xfs_attr3_leaf_name_local(leaf, i); + error = xrep_xattr_salvage_local_attr(rx, ent, nameidx, + buf_end, lentry); + } else { + rentry = xfs_attr3_leaf_name_remote(leaf, i); + error = xrep_xattr_salvage_remote_attr(rx, ent, nameidx, + buf_end, rentry, i, bp); + } + if (error) + break; + } + + return error; +} + +/* Try to recover shortform attrs. 
*/ +STATIC int +xrep_xattr_recover_sf( + struct xrep_xattr *rx) +{ + struct xfs_attr_shortform *sf; + struct xfs_attr_sf_entry *sfe; + struct xfs_attr_sf_entry *next; + struct xfs_ifork *ifp; + unsigned char *end; + int i; + int error; + + ifp = XFS_IFORK_PTR(rx->sc->ip, XFS_ATTR_FORK); + sf = (struct xfs_attr_shortform *)rx->sc->ip->i_afp->if_u1.if_data; + end = (unsigned char *)ifp->if_u1.if_data + ifp->if_bytes; + + for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) { + next = XFS_ATTR_SF_NEXTENTRY(sfe); + if ((unsigned char *)next > end) + break; + + /* Ok, let's save this key/value. */ + error = xrep_xattr_salvage_sf_attr(rx, sfe); + if (error) + return error; + + sfe = next; + } + + return 0; +} + +/* Extract as many attribute keys and values as we can. */ +STATIC int +xrep_xattr_recover( + struct xrep_xattr *rx) +{ + struct xfs_iext_cursor icur; + struct xfs_bmbt_irec got; + struct xfs_scrub *sc = rx->sc; + struct xfs_ifork *ifp; + struct xfs_da_blkinfo *info; + struct xfs_buf *bp; + xfs_dablk_t dabno; + int error = 0; + + if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) + return xrep_xattr_recover_sf(rx); + + /* Iterate each attr block in the attr fork. */ + ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK); + for_each_xfs_iext(ifp, &icur, &got) { + for_each_xfs_attr_block(sc->mp, &got, dabno) { + /* + * Try to read buffer. We invalidate them in the next + * step so we don't bother to set a buffer type or + * ops. + */ + error = xfs_da_read_buf(sc->tp, sc->ip, dabno, -1, &bp, + XFS_ATTR_FORK, NULL); + if (error || !bp) + continue; + + /* Screen out non-leaves & other garbage. */ + info = bp->b_addr; + if (info->magic != cpu_to_be16(XFS_ATTR3_LEAF_MAGIC) || + xfs_attr3_leaf_buf_ops.verify_struct(bp) != NULL) + continue; + + error = xrep_xattr_recover_leaf(rx, bp); + if (error) + return error; + } + } + + return error; +} + +/* Free all the attribute fork blocks and delete the fork. 
*/ +STATIC int +xrep_xattr_reset_btree( + struct xfs_scrub *sc) +{ + struct xfs_iext_cursor icur; + struct xfs_bmbt_irec got; + struct xfs_ifork *ifp; + struct xfs_buf *bp; + xfs_fileoff_t lblk; + int error; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + if (sc->ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) + goto out_fork_remove; + + /* Invalidate each attr block in the attr fork. */ + ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK); + for_each_xfs_iext(ifp, &icur, &got) { + for_each_xfs_attr_block(sc->mp, &got, lblk) { + error = xfs_da_get_buf(sc->tp, sc->ip, lblk, -1, &bp, + XFS_ATTR_FORK); + if (error || !bp) + continue; + xfs_trans_binval(sc->tp, bp); + error = xfs_trans_roll_inode(&sc->tp, sc->ip); + if (error) + return error; + } + } + + /* Now free all the blocks. */ + error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_ATTR_FORK, 0); + if (error) + return error; + +out_fork_remove: + /* Reset the attribute fork - this also destroys the in-core fork */ + xfs_attr_fork_remove(sc->ip, sc->tp); + return 0; +} + +/* + * Compare two xattr keys. Sort by ascending flags value, so user attrs + * (flags == 0) come first. Keys with equal flags sort in hash order. + */ +static int +xrep_xattr_key_cmp( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xrep_xattr_key *ap; + struct xrep_xattr_key *bp; + uint ahash; + uint bhash; + + ap = container_of(a, struct xrep_xattr_key, list); + bp = container_of(b, struct xrep_xattr_key, list); + + if (ap->flags > bp->flags) + return 1; + else if (ap->flags < bp->flags) + return -1; + + ahash = xfs_da_hashname(ap->name, ap->namelen); + bhash = xfs_da_hashname(bp->name, bp->namelen); + if (ahash > bhash) + return 1; + else if (ahash < bhash) + return -1; + return 0; +} + +/* + * Find all the extended attributes for this inode by scraping them out of the + * attribute key blocks by hand. The caller must clean up the lists if + * anything goes wrong. 
+ */ +STATIC int +xrep_xattr_find_attributes( + struct xfs_scrub *sc, + struct list_head *attrlist) +{ + struct xrep_xattr rx; + struct xfs_ifork *ifp; + int error; + + error = xrep_ino_dqattach(sc); + if (error) + return error; + + /* Extent map should be loaded. */ + ifp = XFS_IFORK_PTR(sc->ip, XFS_ATTR_FORK); + if (XFS_IFORK_FORMAT(sc->ip, XFS_ATTR_FORK) != XFS_DINODE_FMT_LOCAL && + !(ifp->if_flags & XFS_IFEXTENTS)) { + error = xfs_iread_extents(sc->tp, sc->ip, XFS_ATTR_FORK); + if (error) + return error; + } + + rx.attrlist = attrlist; + rx.sc = sc; + + /* Read every attr key and value and record them in memory. */ + return xrep_xattr_recover(&rx); +} + +/* Free all the attributes. */ +STATIC void +xrep_xattr_cancel_attrs( + struct list_head *attrlist) +{ + struct xrep_xattr_key *key; + struct xrep_xattr_key *n; + + list_for_each_entry_safe(key, n, attrlist, list) { + list_del(&key->list); + kmem_free(key->value); + kmem_free(key); + } +} + +/* + * Insert all the attributes that we collected. + * + * Commit the repair transaction and drop the ilock because the attribute + * setting code needs to be able to allocate special transactions and take the + * ilock on its own. Some day we'll have deferred attribute setting, at which + * point we'll be able to use that to replace the attributes atomically and + * safely. + */ +STATIC int +xrep_xattr_rebuild_tree( + struct xfs_scrub *sc, + struct list_head *attrlist) +{ + struct xrep_xattr_key *key; + struct xrep_xattr_key *n; + int error; + + error = xfs_trans_commit(sc->tp); + sc->tp = NULL; + if (error) + return error; + + xfs_iunlock(sc->ip, XFS_ILOCK_EXCL); + sc->ilock_flags &= ~XFS_ILOCK_EXCL; + + /* Re-add every attr to the file. 
*/ + list_sort(NULL, attrlist, xrep_xattr_key_cmp); + list_for_each_entry_safe(key, n, attrlist, list) { + error = xfs_attr_set(sc->ip, key->name, key->value, + key->valuelen, key->flags); + if (error) + return error; + + /* + * If the attr value is larger than a single page, free the + * key now so that we aren't hogging memory while doing a lot + * of metadata updates. Otherwise, we want to spend as little + * time reconstructing the attrs as we possibly can. + */ + if (key->valuelen <= PAGE_SIZE) + continue; + list_del(&key->list); + kmem_free(key->value); + kmem_free(key); + } + + xrep_xattr_cancel_attrs(attrlist); + return 0; +} + +/* + * Repair the extended attribute metadata. + * + * XXX: Remote attribute value buffers encompass the entire (up to 64k) buffer. + * The buffer cache in XFS can't handle aliased multiblock buffers, so this + * might misbehave if the attr fork is crosslinked with other filesystem + * metadata. + */ +int +xrep_xattr( + struct xfs_scrub *sc) +{ + struct list_head attrlist; + int error; + + if (!xfs_inode_hasattr(sc->ip)) + return -ENOENT; + + /* Collect extended attributes by parsing raw blocks. */ + INIT_LIST_HEAD(&attrlist); + error = xrep_xattr_find_attributes(sc, &attrlist); + if (error) + goto out; + + /* + * Invalidate and truncate all attribute fork extents. This is the + * point at which we are no longer able to bail out gracefully. + * We commit the transaction here because xfs_attr_set allocates its + * own transactions. + */ + error = xrep_xattr_reset_btree(sc); + if (error) + goto out; + + /* Now rebuild the attribute information. 
*/ + error = xrep_xattr_rebuild_tree(sc, &attrlist); +out: + xrep_xattr_cancel_attrs(&attrlist); + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 17769efb20d9..b630084d0f39 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -69,6 +69,7 @@ int xrep_inode(struct xfs_scrub *sc); int xrep_bmap_data(struct xfs_scrub *sc); int xrep_bmap_attr(struct xfs_scrub *sc); int xrep_symlink(struct xfs_scrub *sc); +int xrep_xattr(struct xfs_scrub *sc); #else @@ -110,6 +111,7 @@ xrep_reset_perag_resv( #define xrep_bmap_data xrep_notsupported #define xrep_bmap_attr xrep_notsupported #define xrep_symlink xrep_notsupported +#define xrep_xattr xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 0a8eea77e58f..537636d789fb 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -301,7 +301,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .type = ST_INODE, .setup = xchk_setup_xattr, .scrub = xchk_xattr, - .repair = xrep_notsupported, + .repair = xrep_xattr, }, [XFS_SCRUB_TYPE_SYMLINK] = { /* symbolic link */ .type = ST_INODE, diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 762db46fd696..d7ad8fad9318 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -139,4 +139,7 @@ void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno, # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0) #endif +bool xchk_xattr_set_map(struct xfs_scrub *sc, unsigned long *map, + unsigned int start, unsigned int len); + #endif /* __XFS_SCRUB_SCRUB_H__ */ From patchwork Mon Jul 30 05:49:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 10548445 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EDA75139A for ; Mon, 30 Jul 2018 05:49:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DDE9A2991D for ; Mon, 30 Jul 2018 05:49:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D23D429932; Mon, 30 Jul 2018 05:49:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6F5552991D for ; Mon, 30 Jul 2018 05:49:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726259AbeG3HWr (ORCPT ); Mon, 30 Jul 2018 03:22:47 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:59372 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWr (ORCPT ); Mon, 30 Jul 2018 03:22:47 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5nHQp002596; Mon, 30 Jul 2018 05:49:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=d8337X/gYRbmUvYRUIQzoYOrBpH5Eo+4nw9WPwYznUk=; b=yzDCw3e7vo56gx/oVbEi70hJD5yf7UA3r5tEx5R9OTTNju/fNb5fWAxXDNW2QkW/qiv5 C8Y5tZWvdyus94aJvGjhPSCHAdUtLIh0b3XKDsR6WGf8JIohTlMqwVoIkmgFFZA/yLBL v2XBgvtX1U9g17JNhtKX5fsuc8GgmOPl1XHKfvBfK/qxsjOvyPOQsnPnuoTzRXtijlT0 
JF8YJfbTHH11mzsNGmmMsIJQ2fSSDS9GkuGmkweoMOCwlPu5aQqb+fJs4X3jr53i6AGo V2KEZ08pl/SbJT0d4KkYYJKAK4kx6jVbS1UrhXf6rsIjofAm4Oq+pYYscxCtuRFHYzCq 3w== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2120.oracle.com with ESMTP id 2kgh4ptt6r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:25 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5nO3x019483 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:25 GMT Received: from abhmp0004.oracle.com (abhmp0004.oracle.com [141.146.116.10]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w6U5nOlb003626; Mon, 30 Jul 2018 05:49:24 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:49:24 -0700 Subject: [PATCH 13/14] xfs: scrub should set preen if attr leaf has holes From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com, Dave Chinner Date: Sun, 29 Jul 2018 22:49:23 -0700 Message-ID: <153292976302.24509.5442706644295723757.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8969 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807300066 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. 
Wong If an attr block indicates that it could use compaction, set the preen flag to have the attr fork rebuilt, since the attr fork rebuilder can take care of that for us. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner --- fs/xfs/scrub/attr.c | 2 ++ fs/xfs/scrub/dabtree.c | 15 +++++++++++++++ fs/xfs/scrub/dabtree.h | 1 + fs/xfs/scrub/trace.h | 1 + 4 files changed, 19 insertions(+) diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c index e20074c241b5..0956d4588dc5 100644 --- a/fs/xfs/scrub/attr.c +++ b/fs/xfs/scrub/attr.c @@ -293,6 +293,8 @@ xchk_xattr_block( xchk_da_set_corrupt(ds, level); if (!xchk_xattr_set_map(ds->sc, usedmap, 0, hdrsize)) xchk_da_set_corrupt(ds, level); + if (leafhdr.holes) + xchk_da_set_preen(ds, level); if (ds->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) goto out; diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c index f1260b4bfdee..e2ecf9c77010 100644 --- a/fs/xfs/scrub/dabtree.c +++ b/fs/xfs/scrub/dabtree.c @@ -85,6 +85,21 @@ xchk_da_set_corrupt( __return_address); } +/* Flag a da btree node in need of optimization. */ +void +xchk_da_set_preen( + struct xchk_da_btree *ds, + int level) +{ + struct xfs_scrub *sc = ds->sc; + + sc->sm->sm_flags |= XFS_SCRUB_OFLAG_PREEN; + trace_xchk_fblock_preen(sc, ds->dargs.whichfork, + xfs_dir2_da_to_db(ds->dargs.geo, + ds->state->path.blk[level].blkno), + __return_address); +} + /* Find an entry at a certain level in a da btree. */ STATIC void * xchk_da_btree_entry( diff --git a/fs/xfs/scrub/dabtree.h b/fs/xfs/scrub/dabtree.h index cb3f0003245b..b367bf87a183 100644 --- a/fs/xfs/scrub/dabtree.h +++ b/fs/xfs/scrub/dabtree.h @@ -36,6 +36,7 @@ bool xchk_da_process_error(struct xchk_da_btree *ds, int level, int *error); /* Check for da btree corruption. 
*/ void xchk_da_set_corrupt(struct xchk_da_btree *ds, int level); +void xchk_da_set_preen(struct xchk_da_btree *ds, int level); int xchk_da_btree_hash(struct xchk_da_btree *ds, int level, __be32 *hashp); int xchk_da_btree(struct xfs_scrub *sc, int whichfork, diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 3383b14fd0c0..d7133d1d23d6 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -230,6 +230,7 @@ DEFINE_EVENT(xchk_fblock_error_class, name, \ DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_error); DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_warning); +DEFINE_SCRUB_FBLOCK_ERROR_EVENT(xchk_fblock_preen); TRACE_EVENT(xchk_incomplete, TP_PROTO(struct xfs_scrub *sc, void *ret_ip), From patchwork Mon Jul 30 05:49:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10548447 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CABAA1751 for ; Mon, 30 Jul 2018 05:49:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B81852991D for ; Mon, 30 Jul 2018 05:49:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AC21029932; Mon, 30 Jul 2018 05:49:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 96CBD2991D for ; Mon, 30 Jul 2018 05:49:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726087AbeG3HWz (ORCPT ); Mon, 30 Jul 2018 
03:22:55 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:54586 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726227AbeG3HWz (ORCPT ); Mon, 30 Jul 2018 03:22:55 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w6U5nWwN009196; Mon, 30 Jul 2018 05:49:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=xz5npab1tGwtT30zhWthKxkduUPv74kUz4y370QK1eU=; b=IQvm+Jk8UW+MSdmlZLZ9rRr0jPcjDRNxP0zVvxCo2MH/91cG8Dqzd5+jDfLzQyYkpPRG cEGLmS4+IwAOcMB69LpgTcVir7/FWqATAxuWhlB2eE5uG7CvUhdP2+PjWz8LMuo404AB E8WPGFo7eoP4xQ1VJI+qinGuRge1d8jbhrtr2ViDk/hiqkkwgWPs+ZC1Mczv06zTCruB 1fHVGI2MEWHG7bMH1eRlo1feWth72oK/qd9hwgZA3/vm2S4mUa0NBmclZZcqiSAj19RU jllmv+jXmrVxkN5FuuQoR+m7eiXPdx0nXibdsLESHVmzrGtY8QTOzFdiRn4JahOlhuji jg== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp2130.oracle.com with ESMTP id 2kgfwstx6w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:31 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w6U5nURb016997 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Jul 2018 05:49:31 GMT Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w6U5nU7n031980; Mon, 30 Jul 2018 05:49:30 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 29 Jul 2018 22:49:30 -0700 Subject: [PATCH 14/14] xfs: repair quotas From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, david@fromorbit.com, allison.henderson@oracle.com Date: Sun, 29 Jul 2018 22:49:29 -0700 Message-ID: <153292976946.24509.5875092286243407619.stgit@magnolia> In-Reply-To: <153292966714.24509.15809693393247424274.stgit@magnolia> References: <153292966714.24509.15809693393247424274.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. 
Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/attr_repair.c | 2 fs/xfs/scrub/common.h | 9 + fs/xfs/scrub/quota.c | 2 fs/xfs/scrub/quota_repair.c | 363 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.c | 58 +++++++ fs/xfs/scrub/repair.h | 8 + fs/xfs/scrub/scrub.c | 11 + 8 files changed, 446 insertions(+), 8 deletions(-) create mode 100644 fs/xfs/scrub/quota_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index c3963c88f952..ed1fc827ed15 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -174,5 +174,6 @@ xfs-y += $(addprefix scrub/, \ repair.o \ symlink_repair.o \ ) +xfs-$(CONFIG_XFS_QUOTA) += scrub/quota_repair.o endif endif diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c index 5bacfb88f25e..e01ca4350857 100644 --- a/fs/xfs/scrub/attr_repair.c +++ b/fs/xfs/scrub/attr_repair.c @@ -395,7 +395,7 @@ xrep_xattr_recover( } /* Free all the attribute fork blocks and delete the fork. */ -STATIC int +int xrep_xattr_reset_btree( struct xfs_scrub *sc) { diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 2d4324d12f9a..aab82f7f9a67 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -138,4 +138,13 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm) int xchk_metadata_inode_forks(struct xfs_scrub *sc); int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode); +/* Do we need to invoke the repair tool? 
*/ +static inline bool xfs_scrub_needs_repair(struct xfs_scrub_metadata *sm) +{ + return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT | + XFS_SCRUB_OFLAG_XCORRUPT | + XFS_SCRUB_OFLAG_PREEN); +} +uint xchk_quota_to_dqtype(struct xfs_scrub *sc); + #endif /* __XFS_SCRUB_COMMON_H__ */ diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c index 782d582d3edd..0e5578ab088e 100644 --- a/fs/xfs/scrub/quota.c +++ b/fs/xfs/scrub/quota.c @@ -29,7 +29,7 @@ #include "scrub/trace.h" /* Convert a scrub type code to a DQ flag, or return 0 if error. */ -static inline uint +uint xchk_quota_to_dqtype( struct xfs_scrub *sc) { diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c new file mode 100644 index 000000000000..36635f7ca217 --- /dev/null +++ b/fs/xfs/scrub/quota_repair.c @@ -0,0 +1,363 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2018 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_inode_fork.h" +#include "xfs_alloc.h" +#include "xfs_bmap.h" +#include "xfs_quota.h" +#include "xfs_qm.h" +#include "xfs_dquot.h" +#include "xfs_dquot_item.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/trace.h" +#include "scrub/repair.h" + +/* + * Quota Repair + * ============ + * + * Quota repairs are fairly simplistic; we fix everything that the dquot + * verifiers complain about, cap any counters or limits that make no sense, + * and schedule a quotacheck if we had to fix anything. We also repair any + * data fork extent records that don't apply to metadata files. 
+ */ + +struct xrep_quota_info { + struct xfs_scrub *sc; + bool need_quotacheck; +}; + +/* Repair the fields in an individual quota item. */ +STATIC int +xrep_quota_item( + struct xfs_dquot *dq, + uint dqtype, + void *priv) +{ + struct xrep_quota_info *rqi = priv; + struct xfs_scrub *sc = rqi->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_disk_dquot *d = &dq->q_core; + unsigned long long bsoft; + unsigned long long isoft; + unsigned long long rsoft; + unsigned long long bhard; + unsigned long long ihard; + unsigned long long rhard; + unsigned long long bcount; + unsigned long long icount; + unsigned long long rcount; + xfs_ino_t fs_icount; + bool dirty = false; + int error; + + /* Did we get the dquot type we wanted? */ + if (dqtype != (d->d_flags & XFS_DQ_ALLTYPES)) { + d->d_flags = dqtype; + dirty = true; + } + + if (d->d_pad0 || d->d_pad) { + d->d_pad0 = 0; + d->d_pad = 0; + dirty = true; + } + + /* Check the limits. */ + bhard = be64_to_cpu(d->d_blk_hardlimit); + ihard = be64_to_cpu(d->d_ino_hardlimit); + rhard = be64_to_cpu(d->d_rtb_hardlimit); + + bsoft = be64_to_cpu(d->d_blk_softlimit); + isoft = be64_to_cpu(d->d_ino_softlimit); + rsoft = be64_to_cpu(d->d_rtb_softlimit); + + if (bsoft > bhard) { + d->d_blk_softlimit = d->d_blk_hardlimit; + dirty = true; + } + + if (isoft > ihard) { + d->d_ino_softlimit = d->d_ino_hardlimit; + dirty = true; + } + + if (rsoft > rhard) { + d->d_rtb_softlimit = d->d_rtb_hardlimit; + dirty = true; + } + + /* Check the resource counts. */ + bcount = be64_to_cpu(d->d_bcount); + icount = be64_to_cpu(d->d_icount); + rcount = be64_to_cpu(d->d_rtbcount); + fs_icount = percpu_counter_sum(&mp->m_icount); + + /* + * Check that usage doesn't exceed physical limits. However, on + * a reflink filesystem we're allowed to exceed physical space + * if there are no quota limits. We don't know what the real number + * is, but we can make quotacheck find out for us. 
+ */ + if (!xfs_sb_version_hasreflink(&mp->m_sb) && + mp->m_sb.sb_dblocks < bcount) { + dq->q_res_bcount -= be64_to_cpu(dq->q_core.d_bcount); + dq->q_res_bcount += mp->m_sb.sb_dblocks; + d->d_bcount = cpu_to_be64(mp->m_sb.sb_dblocks); + rqi->need_quotacheck = true; + dirty = true; + } + if (icount > fs_icount) { + dq->q_res_icount -= be64_to_cpu(dq->q_core.d_icount); + dq->q_res_icount += fs_icount; + d->d_icount = cpu_to_be64(fs_icount); + rqi->need_quotacheck = true; + dirty = true; + } + if (rcount > mp->m_sb.sb_rblocks) { + dq->q_res_rtbcount -= be64_to_cpu(dq->q_core.d_rtbcount); + dq->q_res_rtbcount += mp->m_sb.sb_rblocks; + d->d_rtbcount = cpu_to_be64(mp->m_sb.sb_rblocks); + rqi->need_quotacheck = true; + dirty = true; + } + + if (!dirty) + return 0; + + dq->dq_flags |= XFS_DQ_DIRTY; + xfs_trans_dqjoin(sc->tp, dq); + xfs_trans_log_dquot(sc->tp, dq); + error = xfs_trans_roll(&sc->tp); + xfs_dqlock(dq); + return error; +} + +/* Fix a quota timer so that we can pass the verifier. */ +STATIC void +xrep_quota_fix_timer( + __be64 softlimit, + __be64 countnow, + __be32 *timer, + time_t timelimit) +{ + uint64_t soft = be64_to_cpu(softlimit); + uint64_t count = be64_to_cpu(countnow); + + if (soft && count > soft && *timer == 0) + *timer = cpu_to_be32(get_seconds() + timelimit); +} + +/* Fix anything the verifiers complain about. 
*/ +STATIC int +xrep_quota_block( + struct xfs_scrub *sc, + struct xfs_buf *bp, + uint dqtype, + xfs_dqid_t id) +{ + struct xfs_dqblk *d = (struct xfs_dqblk *)bp->b_addr; + struct xfs_disk_dquot *ddq; + struct xfs_quotainfo *qi = sc->mp->m_quotainfo; + enum xfs_blft buftype = 0; + int i; + + bp->b_ops = &xfs_dquot_buf_ops; + for (i = 0; i < qi->qi_dqperchunk; i++) { + ddq = &d[i].dd_diskdq; + + ddq->d_magic = cpu_to_be16(XFS_DQUOT_MAGIC); + ddq->d_version = XFS_DQUOT_VERSION; + ddq->d_flags = dqtype; + ddq->d_id = cpu_to_be32(id + i); + + xrep_quota_fix_timer(ddq->d_blk_softlimit, + ddq->d_bcount, &ddq->d_btimer, + qi->qi_btimelimit); + xrep_quota_fix_timer(ddq->d_ino_softlimit, + ddq->d_icount, &ddq->d_itimer, + qi->qi_itimelimit); + xrep_quota_fix_timer(ddq->d_rtb_softlimit, + ddq->d_rtbcount, &ddq->d_rtbtimer, + qi->qi_rtbtimelimit); + + /* We only support v5 filesystems so always set these. */ + uuid_copy(&d[i].dd_uuid, &sc->mp->m_sb.sb_meta_uuid); + xfs_update_cksum((char *)&d[i], sizeof(struct xfs_dqblk), + XFS_DQUOT_CRC_OFF); + d[i].dd_lsn = 0; + } + switch (dqtype) { + case XFS_DQ_USER: + buftype = XFS_BLFT_UDQUOT_BUF; + break; + case XFS_DQ_GROUP: + buftype = XFS_BLFT_GDQUOT_BUF; + break; + case XFS_DQ_PROJ: + buftype = XFS_BLFT_PDQUOT_BUF; + break; + } + xfs_trans_buf_set_type(sc->tp, bp, buftype); + xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1); + return xfs_trans_roll(&sc->tp); +} + +/* Repair quota's data fork. */ +STATIC int +xrep_quota_data_fork( + struct xfs_scrub *sc, + uint dqtype) +{ + struct xfs_bmbt_irec irec = { 0 }; + struct xfs_iext_cursor icur; + struct xfs_quotainfo *qi = sc->mp->m_quotainfo; + struct xfs_ifork *ifp; + struct xfs_buf *bp; + struct xfs_dqblk *d; + xfs_dqid_t id; + xfs_fileoff_t max_dqid_off; + xfs_fileoff_t off; + xfs_fsblock_t fsbno; + bool truncate = false; + int error = 0; + + error = xrep_metadata_inode_forks(sc); + if (error) + goto out; + + /* Check for data fork problems that apply only to quota files. 
*/ + max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk; + ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + for_each_xfs_iext(ifp, &icur, &irec) { + if (isnullstartblock(irec.br_startblock)) { + error = -EFSCORRUPTED; + goto out; + } + + if (irec.br_startoff > max_dqid_off || + irec.br_startoff + irec.br_blockcount - 1 > max_dqid_off) { + truncate = true; + break; + } + } + if (truncate) { + error = xfs_itruncate_extents(&sc->tp, sc->ip, XFS_DATA_FORK, + max_dqid_off * sc->mp->m_sb.sb_blocksize); + if (error) + goto out; + } + + /* Now go fix anything that fails the verifiers. */ + for_each_xfs_iext(ifp, &icur, &irec) { + for (fsbno = irec.br_startblock, off = irec.br_startoff; + fsbno < irec.br_startblock + irec.br_blockcount; + fsbno += XFS_DQUOT_CLUSTER_SIZE_FSB, + off += XFS_DQUOT_CLUSTER_SIZE_FSB) { + id = off * qi->qi_dqperchunk; + error = xfs_trans_read_buf(sc->mp, sc->tp, + sc->mp->m_ddev_targp, + XFS_FSB_TO_DADDR(sc->mp, fsbno), + qi->qi_dqchunklen, + 0, &bp, &xfs_dquot_buf_ops); + if (error == 0) { + d = (struct xfs_dqblk *)bp->b_addr; + if (id == be32_to_cpu(d->dd_diskdq.d_id)) { + xfs_trans_brelse(sc->tp, bp); + continue; + } + error = -EFSCORRUPTED; + xfs_trans_brelse(sc->tp, bp); + } + if (error != -EFSBADCRC && error != -EFSCORRUPTED) + goto out; + + /* Failed verifier, try again. */ + error = xfs_trans_read_buf(sc->mp, sc->tp, + sc->mp->m_ddev_targp, + XFS_FSB_TO_DADDR(sc->mp, fsbno), + qi->qi_dqchunklen, + 0, &bp, NULL); + if (error) + goto out; + + /* + * Fix the quota block, which will roll our transaction + * and release bp. + */ + error = xrep_quota_block(sc, bp, dqtype, id); + if (error) + goto out; + } + } + +out: + return error; +} + +/* + * Go fix anything in the quota items that we could have been mad about. Now + * that we've checked the quota inode data fork we have to drop ILOCK_EXCL to + * use the regular dquot functions. 
+ */ +STATIC int +xrep_quota_problems( + struct xfs_scrub *sc, + uint dqtype) +{ + struct xrep_quota_info rqi; + int error; + + rqi.sc = sc; + rqi.need_quotacheck = false; + error = xfs_qm_dqiterate(sc->mp, dqtype, xrep_quota_item, &rqi); + if (error) + return error; + + /* Make a quotacheck happen. */ + if (rqi.need_quotacheck) + xrep_force_quotacheck(sc, dqtype); + return 0; +} + +/* Repair all of a quota type's items. */ +int +xrep_quota( + struct xfs_scrub *sc) +{ + uint dqtype; + int error; + + dqtype = xchk_quota_to_dqtype(sc); + + /* Fix problematic data fork mappings. */ + error = xrep_quota_data_fork(sc, dqtype); + if (error) + goto out; + + /* Unlock quota inode; we play only with dquots from now on. */ + xfs_iunlock(sc->ip, sc->ilock_flags); + sc->ilock_flags = 0; + + /* Fix anything the dquot verifiers complain about. */ + error = xrep_quota_problems(sc, dqtype); +out: + return error; +} diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index a44deb6f06ab..27cc50178d86 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -29,6 +29,8 @@ #include "xfs_ag_resv.h" #include "xfs_trans_space.h" #include "xfs_quota.h" +#include "xfs_attr.h" +#include "xfs_reflink.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -900,3 +902,59 @@ xrep_reset_perag_resv( out: return error; } + +/* + * Repair the attr/data forks of a metadata inode. The metadata inode must be + * pointed to by sc->ip and the ILOCK must be held. + */ +int +xrep_metadata_inode_forks( + struct xfs_scrub *sc) +{ + __u32 smtype; + __u32 smflags; + int error; + + smtype = sc->sm->sm_type; + smflags = sc->sm->sm_flags; + + /* Let's see if the forks need repair. */ + sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT; + error = xchk_metadata_inode_forks(sc); + if (error || !xfs_scrub_needs_repair(sc->sm)) + goto out; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Clear the reflink flag & attr forks that we shouldn't have. 
*/ + if (xfs_is_reflink_inode(sc->ip)) { + error = xfs_reflink_clear_inode_flag(sc->ip, &sc->tp); + if (error) + goto out; + } + + if (xfs_inode_hasattr(sc->ip)) { + error = xrep_xattr_reset_btree(sc); + if (error) + goto out; + } + + /* Repair the data fork. */ + sc->sm->sm_type = XFS_SCRUB_TYPE_BMBTD; + error = xrep_bmap_data(sc); + sc->sm->sm_type = smtype; + if (error) + goto out; + + /* Bail out if we still need repairs. */ + sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT; + error = xchk_metadata_inode_forks(sc); + if (error) + goto out; + if (xfs_scrub_needs_repair(sc->sm)) + error = -EFSCORRUPTED; +out: + sc->sm->sm_type = smtype; + sc->sm->sm_flags = smflags; + return error; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index b630084d0f39..aa032a7b99d0 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -54,6 +54,8 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp, void xrep_force_quotacheck(struct xfs_scrub *sc, uint dqtype); int xrep_ino_dqattach(struct xfs_scrub *sc); int xrep_reset_perag_resv(struct xfs_scrub *sc); +int xrep_xattr_reset_btree(struct xfs_scrub *sc); +int xrep_metadata_inode_forks(struct xfs_scrub *sc); /* Metadata repairers */ @@ -70,6 +72,11 @@ int xrep_bmap_data(struct xfs_scrub *sc); int xrep_bmap_attr(struct xfs_scrub *sc); int xrep_symlink(struct xfs_scrub *sc); int xrep_xattr(struct xfs_scrub *sc); +#ifdef CONFIG_XFS_QUOTA +int xrep_quota(struct xfs_scrub *sc); +#else +# define xrep_quota xrep_notsupported +#endif /* CONFIG_XFS_QUOTA */ #else @@ -112,6 +119,7 @@ xrep_reset_perag_resv( #define xrep_bmap_attr xrep_notsupported #define xrep_symlink xrep_notsupported #define xrep_xattr xrep_notsupported +#define xrep_quota xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 537636d789fb..a9f969214e69 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -333,19 +333,19 @@ static const struct xchk_meta_ops 
meta_scrub_ops[] = { .type = ST_FS, .setup = xchk_setup_quota, .scrub = xchk_quota, - .repair = xrep_notsupported, + .repair = xrep_quota, }, [XFS_SCRUB_TYPE_GQUOTA] = { /* group quota */ .type = ST_FS, .setup = xchk_setup_quota, .scrub = xchk_quota, - .repair = xrep_notsupported, + .repair = xrep_quota, }, [XFS_SCRUB_TYPE_PQUOTA] = { /* project quota */ .type = ST_FS, .setup = xchk_setup_quota, .scrub = xchk_quota, - .repair = xrep_notsupported, + .repair = xrep_quota, }, }; @@ -539,9 +539,8 @@ xfs_scrub_metadata( if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) sc.sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT; - needs_fix = (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT | - XFS_SCRUB_OFLAG_XCORRUPT | - XFS_SCRUB_OFLAG_PREEN)); + needs_fix = xfs_scrub_needs_repair(sc.sm); + /* * If userspace asked for a repair but it wasn't necessary, * report that back to userspace.