From patchwork Thu Aug 25 23:50:19 2016
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 9300525
Subject: [PATCH 35/71] xfs: preallocate blocks for worst-case btree expansion
From: "Darrick J. Wong"
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, Christoph Hellwig, xfs@oss.sgi.com
Date: Thu, 25 Aug 2016 16:50:19 -0700
Message-ID: <147216901995.4420.9308541290373322813.stgit@birch.djwong.org>
In-Reply-To: <147216879156.4420.2446767701729565218.stgit@birch.djwong.org>
References: <147216879156.4420.2446767701729565218.stgit@birch.djwong.org>
User-Agent: StGit/0.17.1-dirty

To gracefully handle the situation where a CoW operation turns a
single refcount extent into a lot of tiny ones and then runs out of
space when a tree split has to happen, use the per-AG reserved block
pool to pre-allocate all the space we'll ever need for a maximal
btree.  For a 4K block size, this only costs an overhead of 0.3% of
available disk space.

When reflink is enabled, we have an unfortunate problem with rmap --
since we can share a block billions of times, this means that the
reverse mapping btree can expand basically infinitely.  When an AG is
so full that there are no free blocks with which to expand the rmapbt,
the filesystem will shut down hard.

This is rather annoying to the user, so use the AG reservation code to
reserve a "reasonable" amount of space for rmap.  We'll prevent
reflinks and CoW operations if we think we're getting close to
exhausting an AG's free space rather than shutting down, but this
permanent reservation should be enough for "most" users.  Hopefully.

v2: Simplify the return value from xfs_perag_pool_free_block to a bool
so that we can easily call xfs_trans_binval for both the per-AG pool
and the real freeing case.  Without this we fail to invalidate the
btree buffer and will trip over the write verifier on a shrinking
refcount btree.

v3: Convert to the new per-AG reservation code.

v4: Combine this patch with the one that adds the rmapbt reservation,
since the rmapbt reservation is only needed for reflink filesystems.

Signed-off-by: Darrick J. Wong
[hch@lst.de: ensure that we invalidate the freed btree buffer]
Signed-off-by: Christoph Hellwig
---
 libxfs/xfs_ag_resv.c        |   11 ++++++++
 libxfs/xfs_refcount_btree.c |   45 ++++++++++++++++++++++++++++++--
 libxfs/xfs_refcount_btree.h |    3 ++
 libxfs/xfs_rmap_btree.c     |   60 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap_btree.h     |    7 +++++
 5 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/libxfs/xfs_ag_resv.c b/libxfs/xfs_ag_resv.c
index af69235..9d338bd 100644
--- a/libxfs/xfs_ag_resv.c
+++ b/libxfs/xfs_ag_resv.c
@@ -37,6 +37,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_btree.h"
+#include "xfs_refcount_btree.h"
 
 /*
  * Per-AG Block Reservations
@@ -229,6 +230,11 @@ xfs_ag_resv_init(
 	/* Create the metadata reservation. */
 	ask = used = 0;
 
+	err2 = xfs_refcountbt_calc_reserves(pag->pag_mount, pag->pag_agno,
+			&ask, &used);
+	if (err2 && !error)
+		error = err2;
+
 	err2 = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA, ask, used);
 	if (err2 && !error)
 		error = err2;
@@ -240,6 +246,11 @@ init_agfl:
 	/* Create the AGFL metadata reservation */
 	ask = used = 0;
 
+	err2 = xfs_rmapbt_calc_reserves(pag->pag_mount, pag->pag_agno,
+			&ask, &used);
+	if (err2 && !error)
+		error = err2;
+
 	err2 = __xfs_ag_resv_init(pag, XFS_AG_RESV_AGFL, ask, used);
 	if (err2 && !error)
 		error = err2;
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index 568a2f8..d2dbdbd 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -78,6 +78,8 @@ xfs_refcountbt_alloc_block(
 	struct xfs_alloc_arg	args;		/* block allocation args */
 	int			error;		/* error return value */
 
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
+
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
@@ -87,6 +89,7 @@ xfs_refcountbt_alloc_block(
 	args.firstblock = args.fsbno;
 	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_REFC);
 	args.minlen = args.maxlen = args.prod = 1;
+	args.resv = XFS_AG_RESV_METADATA;
 
 	error = xfs_alloc_vextent(&args);
 	if (error)
@@ -124,16 +127,19 @@ xfs_refcountbt_free_block(
 	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
 	struct xfs_owner_info	oinfo;
+	int			error;
 
 	trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_private.a.agno,
 			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1);
 	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
 	be32_add_cpu(&agf->agf_refcount_blocks, -1);
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS);
-	xfs_bmap_add_free(mp, cur->bc_private.a.dfops, fsbno, 1,
-			&oinfo);
+	error = xfs_free_extent(cur->bc_tp, fsbno, 1, &oinfo,
+			XFS_AG_RESV_METADATA);
+	if (error)
+		return error;
 
-	return 0;
+	return error;
 }
 
 STATIC int
@@ -403,3 +409,36 @@ xfs_refcountbt_max_size(
 
 	return xfs_refcountbt_calc_size(mp, mp->m_sb.sb_agblocks);
 }
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+int
+xfs_refcountbt_calc_reserves(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*ask,
+	xfs_extlen_t		*used)
+{
+	struct xfs_buf	*agbp;
+	struct xfs_agf	*agf;
+	xfs_extlen_t	tree_len;
+	int		error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	*ask += xfs_refcountbt_max_size(mp);
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		return error;
+
+	agf = XFS_BUF_TO_AGF(agbp);
+	tree_len = be32_to_cpu(agf->agf_refcount_blocks);
+	xfs_buf_relse(agbp);
+
+	*used += tree_len;
+
+	return error;
+}
diff --git a/libxfs/xfs_refcount_btree.h b/libxfs/xfs_refcount_btree.h
index 780b02f..3be7768 100644
--- a/libxfs/xfs_refcount_btree.h
+++ b/libxfs/xfs_refcount_btree.h
@@ -68,4 +68,7 @@ extern xfs_extlen_t xfs_refcountbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp);
 
+extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
+		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
+
 #endif /* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index 29a4dd6..02ceace 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -33,6 +33,7 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_trace.h"
 #include "xfs_cksum.h"
+#include "xfs_ag_resv.h"
 
 /*
  * Reverse map btree.
@@ -516,3 +517,62 @@ xfs_rmapbt_compute_maxlevels(
 	mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
 			mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
 }
+
+/* Calculate the rmap btree size for some records. */
+xfs_extlen_t
+xfs_rmapbt_calc_size(
+	struct xfs_mount	*mp,
+	unsigned long long	len)
+{
+	return xfs_btree_calc_size(mp, mp->m_rmap_mnr, len);
+}
+
+/*
+ * Calculate the maximum rmap btree size.
+ */
+xfs_extlen_t
+xfs_rmapbt_max_size(
+	struct xfs_mount	*mp)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_rmap_mxr[0] == 0)
+		return 0;
+
+	return xfs_rmapbt_calc_size(mp, mp->m_sb.sb_agblocks);
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+int
+xfs_rmapbt_calc_reserves(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*ask,
+	xfs_extlen_t		*used)
+{
+	struct xfs_buf	*agbp;
+	struct xfs_agf	*agf;
+	xfs_extlen_t	pool_len;
+	xfs_extlen_t	tree_len;
+	int		error;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	/* Reserve 1% of the AG or enough for 1 block per record. */
+	pool_len = max(mp->m_sb.sb_agblocks / 100, xfs_rmapbt_max_size(mp));
+	*ask += pool_len;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		return error;
+
+	agf = XFS_BUF_TO_AGF(agbp);
+	tree_len = be32_to_cpu(agf->agf_rmap_blocks);
+	xfs_buf_relse(agbp);
+
+	*used += tree_len;
+
+	return error;
+}
diff --git a/libxfs/xfs_rmap_btree.h b/libxfs/xfs_rmap_btree.h
index 5ff9cfa..f3137a3 100644
--- a/libxfs/xfs_rmap_btree.h
+++ b/libxfs/xfs_rmap_btree.h
@@ -58,4 +58,11 @@ struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
 int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
 extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
 
+extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
+		unsigned long long len);
+extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp);
+
+extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
+		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
+
 #endif /* __XFS_RMAP_BTREE_H__ */
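
Note on the sizing rule: xfs_rmapbt_calc_reserves() asks for the larger of
1% of the AG and the worst-case rmap btree size. The standalone sketch below
illustrates that arithmetic in userspace. It is not the kernel code: the
sketch_* names are hypothetical, and the loop only assumes that
xfs_btree_calc_size() sums the blocks needed at each level when every block
holds the minimum number of records, which this patch does not show.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for xfs_btree_calc_size(): count the blocks a btree
 * needs to index `len` records when each leaf holds `leaf_recs` records and
 * each interior node holds `node_recs` pointers (the sparsest legal tree).
 */
static uint64_t
sketch_btree_calc_size(unsigned int leaf_recs, unsigned int node_recs,
		       uint64_t len)
{
	unsigned int	recs = leaf_recs;
	uint64_t	blocks = 0;

	/* node_recs must be at least 2 or the tree would never converge. */
	assert(leaf_recs >= 1 && node_recs >= 2);

	/* Walk up from the leaves until a single root block remains. */
	while (len > 1) {
		len = (len + recs - 1) / recs;	/* blocks at this level */
		blocks += len;
		recs = node_recs;
	}
	return blocks;
}

/* Pool size: the larger of 1% of the AG and the worst-case tree size. */
static uint64_t
sketch_rmapbt_pool_len(uint64_t agblocks, unsigned int leaf_recs,
		       unsigned int node_recs)
{
	uint64_t	one_percent = agblocks / 100;
	uint64_t	max_tree;

	max_tree = sketch_btree_calc_size(leaf_recs, node_recs, agblocks);
	return one_percent > max_tree ? one_percent : max_tree;
}
```

With an assumed 1,000,000-block AG and illustrative minimum record counts of
84 per leaf and 45 per node (values chosen for the example, not taken from
the patch), sketch_rmapbt_pool_len() returns 12177, so the worst-case tree
size rather than the 1% floor sets the reservation.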