From patchwork Wed Sep 26 19:35:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10616649
Subject: [PATCH 15/24] docs: add XFS refcount btree structure to DS&A book
From: "Darrick J. Wong"
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Wed, 26 Sep 2018 12:35:55 -0700
Message-ID: <153799055508.31202.18152484638333180720.stgit@magnolia>
In-Reply-To: <153799045443.31202.17537455000771265705.stgit@magnolia>
References: <153799045443.31202.17537455000771265705.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Signed-off-by: Darrick J. Wong
---
 .../filesystems/xfs/ondisk/allocation_groups.rst |    1 
 .../filesystems/xfs/ondisk/refcountbt.rst          |  154 ++++++++++++++++++++
 2 files changed, 155 insertions(+)
 create mode 100644 Documentation/filesystems/xfs/ondisk/refcountbt.rst

diff --git a/Documentation/filesystems/xfs/ondisk/allocation_groups.rst b/Documentation/filesystems/xfs/ondisk/allocation_groups.rst
index 296c520a2fbb..622cef4577bb 100644
--- a/Documentation/filesystems/xfs/ondisk/allocation_groups.rst
+++ b/Documentation/filesystems/xfs/ondisk/allocation_groups.rst
@@ -1381,3 +1381,4 @@ None of the XFS per-AG B+trees are involved with real time files. It is not
 possible for real time files to share data blocks.
 
 .. include:: rmapbt.rst
+.. include:: refcountbt.rst
diff --git a/Documentation/filesystems/xfs/ondisk/refcountbt.rst b/Documentation/filesystems/xfs/ondisk/refcountbt.rst
new file mode 100644
index 000000000000..c76f4ec7e543
--- /dev/null
+++ b/Documentation/filesystems/xfs/ondisk/refcountbt.rst
@@ -0,0 +1,154 @@
+.. SPDX-License-Identifier: CC-BY-SA-3.0+
+
+Reference Count B+tree
+~~~~~~~~~~~~~~~~~~~~~~
+
+To support the sharing of file data blocks (reflink), each allocation group
+has its own reference count B+tree, which grows in the allocated space like
+the inode B+trees. This data could be gleaned by performing an interval query
+of the reverse-mapping B+tree, but doing so would come at a huge performance
+penalty. Therefore, this data structure is a cache of computable information.
+
+This B+tree is only present if the XFS\_SB\_FEAT\_RO\_COMPAT\_REFLINK feature
+is enabled. The feature requires a version 5 filesystem.
+
+Each record in the reference count B+tree has the following structure:
+
+.. code:: c
+
+    struct xfs_refcount_rec {
+        __be32 rc_startblock;
+        __be32 rc_blockcount;
+        __be32 rc_refcount;
+    };
+
+**rc\_startblock**
+    AG block number of this record. The high bit is set for all records
+    referring to an extent that is being used to stage a copy on write
+    operation. This reduces recovery time during mount operations. The
+    reference count of these staging extents must be exactly 1.
+
+**rc\_blockcount**
+    The length of this extent, in filesystem blocks.
+
+**rc\_refcount**
+    Number of mappings of this filesystem extent.
+
+Node pointers are AG relative block pointers, and node keys are as follows:
+
+.. code:: c
+
+    struct xfs_refcount_key {
+        __be32 rc_startblock;
+    };
+
+- As the reference counting is AG relative, all the block numbers are only
+  32 bits.
+
+- The bb\_magic value is "R3FC" (0x52334643).
+
+- The xfs\_btree\_sblock\_t header is used for intermediate B+tree nodes as
+  well as the leaves.
+
+xfs\_db refcntbt Example
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For this example, an XFS filesystem was populated with a root filesystem and a
+deduplication program was run to create shared blocks:
+
+::
+
+    xfs_db> agf 0
+    xfs_db> addr refcntroot
+    xfs_db> p
+    magic = 0x52334643
+    level = 1
+    numrecs = 6
+    leftsib = null
+    rightsib = null
+    bno = 36892
+    lsn = 0x200004ec2
+    uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+    owner = 0
+    crc = 0x75f35128 (correct)
+    keys[1-6] = [startblock] 1:[14] 2:[65633] 3:[65780] 4:[94571] 5:[117201] 6:[152442]
+    ptrs[1-6] = 1:7 2:25836 3:25835 4:18447 5:18445 6:18449
+    xfs_db> addr ptrs[3]
+    xfs_db> p
+    magic = 0x52334643
+    level = 0
+    numrecs = 80
+    leftsib = 25836
+    rightsib = 18447
+    bno = 51670
+    lsn = 0x200004ec2
+    uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+    owner = 0
+    crc = 0xc3962813 (correct)
+    recs[1-80] = [startblock,blockcount,refcount,cowflag]
+            1:[65780,1,2,0] 2:[65781,1,3,0] 3:[65785,2,2,0] 4:[66640,1,2,0]
+            5:[69602,4,2,0] 6:[72256,16,2,0] 7:[72871,4,2,0] 8:[72879,20,2,0]
+            9:[73395,4,2,0] 10:[75063,4,2,0] 11:[79093,4,2,0] 12:[86344,16,2,0]
+            ...
+            80:[35235,10,1,1]
+
+Notice record 80. The copy on write flag is set and the reference count is 1,
+which indicates that blocks 35,235 - 35,244 are being used to stage a copy
+on write operation. The "cowflag" field is the high bit of rc\_startblock.
+
+Record 6 in the reference count B+tree for AG 0 indicates that the AG extent
+starting at block 72,256 and running for 16 blocks has a reference count of 2.
+This means that there are two files sharing these blocks:
+
+::
+
+    xfs_db> blockget -n
+    xfs_db> fsblock 72256
+    xfs_db> blockuse
+    block 72256 (0/72256) type rldata inode 25169197
+
+The blockuse type changes to "rldata" to indicate that the block is shared
+data. Unfortunately, blockuse only tells us about one block owner. If we
+happen to have enabled the reverse-mapping B+tree, we can use it to find all
+inodes that own this block:
+
+::
+
+    xfs_db> agf 0
+    xfs_db> addr rmaproot
+    ...
+    xfs_db> addr ptrs[3]
+    ...
+    xfs_db> addr ptrs[7]
+    xfs_db> p
+    magic = 0x524d4233
+    level = 0
+    numrecs = 22
+    leftsib = 65057
+    rightsib = 65058
+    bno = 291478
+    lsn = 0x200004ec2
+    uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+    owner = 0
+    crc = 0xed7da3f7 (correct)
+    recs[1-22] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+            1:[68957,8,3201,0,0,0,0] 2:[68965,4,25260953,0,0,0,0]
+            ...
+            18:[72232,58,3227,0,0,0,0] 19:[72256,16,25169197,24,0,0,0]
+            20:[72290,75,3228,0,0,0,0] 21:[72365,46,3229,0,0,0,0]
+
+Records 18 and 19 both cover block 72,256; they tell us that inodes 3,227
+and 25,169,197 both claim ownership. Let us confirm this:
+
+::
+
+    xfs_db> inode 25169197
+    xfs_db> bmap
+    data offset 0 startblock 12632259 (3/49347) count 24 flag 0
+    data offset 24 startblock 72256 (0/72256) count 16 flag 0
+    data offset 40 startblock 12632299 (3/49387) count 18 flag 0
+    xfs_db> inode 3227
+    xfs_db> bmap
+    data offset 0 startblock 72232 (0/72232) count 58 flag 0
+
+Inodes 25,169,197 and 3,227 both contain mappings to block 0/72,256.
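
Not part of the patch: the rc\_startblock encoding described in the new file (the high bit doubling as the "cowflag" that xfs\_db prints) may be easier to follow with a small decoder. The following is a minimal sketch under stated assumptions, not kernel or xfsprogs code. The names refc_rec, refc_decode, refc_last_block, refc_cow_ok and REFC_COW_FLAG are hypothetical and exist only for illustration. The only facts taken from the text are that the three fields are big-endian 32-bit values, that the flag is the high bit of rc\_startblock, and that a staging extent must have a reference count of 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: high bit of rc_startblock, as described in the text. */
#define REFC_COW_FLAG 0x80000000u

/* Hypothetical host-endian view of one on-disk refcount record. */
struct refc_rec {
    uint32_t startblock; /* AG block number with the CoW flag stripped */
    uint32_t blockcount; /* extent length in blocks */
    uint32_t refcount;   /* number of mappings of this extent */
    bool     cow;        /* true if the extent stages a copy on write */
};

/* Read a big-endian 32-bit value from a byte buffer. */
uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the 12 on-disk bytes laid out as struct xfs_refcount_rec. */
struct refc_rec refc_decode(const unsigned char disk[12])
{
    uint32_t raw = be32_at(disk);
    struct refc_rec r;

    r.startblock = raw & ~REFC_COW_FLAG;
    r.blockcount = be32_at(disk + 4);
    r.refcount = be32_at(disk + 8);
    r.cow = (raw & REFC_COW_FLAG) != 0;
    return r;
}

/* Last AG block covered by the extent (inclusive). */
uint32_t refc_last_block(const struct refc_rec *r)
{
    assert(r->blockcount > 0);
    return r->startblock + r->blockcount - 1;
}

/* Check the one rule the text states: staging extents have refcount 1. */
bool refc_cow_ok(const struct refc_rec *r)
{
    return !r->cow || r->refcount == 1;
}
```

Fed the bytes that would encode record 80 from the example (start block 35235 with the high bit set, length 10, reference count 1), refc_decode should report a staging extent and refc_last_block should return 35244, matching the range quoted in the text.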