[08/10] btrfs: speedup checking for extent sharedness during fiemap

Message ID 5e696c29b65f6558b8012596aa513101ed04a21a.1662022922.git.fdmanana@suse.com (mailing list archive)
State New, archived
Series btrfs: make lseek and fiemap much more efficient

Commit Message

Filipe Manana Sept. 1, 2022, 1:18 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

One of the most expensive tasks performed during fiemap is to check if
an extent is shared. This task has two major steps:

1) Check if the data extent is shared. This implies checking the extent
   item in the extent tree, checking delayed references, etc. If we
   find the data extent is directly shared, we terminate immediately;

2) If the data extent is not directly shared (its extent item has a
   refcount of 1), then it may be shared if we have snapshots that share
   subtrees of the inode's subvolume b+tree. So we check if the leaf
   containing the file extent item is shared, then its parent node, then
   the parent node of the parent node, etc, until we reach the root node
   or we find one of them is shared - in which case we stop immediately.
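
In rough pseudocode, the flow looks like the following (a simplified
sketch, not the actual btrfs_is_data_extent_shared() code; the helper
names here are illustrative, not real kernel functions):

    /* Level -1 is the data extent itself, 0 and up are extent buffers. */
    int is_extent_shared(u64 bytenr)
    {
            int level = -1;

            while (1) {
                    /* Expensive: extent tree search, delayed refs, etc. */
                    if (backref_walk_finds_extra_ref(bytenr))
                            return 1;   /* shared - stop immediately */
                    if (is_root_node(bytenr))
                            return 0;   /* reached the root - not shared */
                    bytenr = parent_node_bytenr(bytenr);
                    level++;
            }
    }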

During fiemap we process the extents of a file from left to right, from
file offset 0 to eof. This means that we iterate b+tree leaves from left
to right, which implies that we keep repeating that second step
above several times for the same b+tree path of the inode's subvolume
b+tree.

For example, if we have two file extent items in leaf X, and the path to
leaf X is A -> B -> C -> X, then when we try to determine if the data
extent referenced by the first extent item is shared, we check if the data
extent is shared - if it's not, then we check if leaf X is shared, if not,
then we check if node C is shared, if not, then check if node B is shared,
if not, then check if node A is shared. When we move to the next file
extent item, after determining the data extent is not shared, we repeat
the checks for X, C, B and A - doing all the expensive searches in the
extent tree, delayed refs, etc. If we have thousands of file extents, then
we keep repeating the sharedness checks for the same paths over and over.

On a file that has no shared extents, or only a small portion of them, it's easy
to see that this scales terribly with the number of extents in the file
and the sizes of the extent and subvolume b+trees.

This change eliminates the repeated sharedness check on extent buffers
by caching the results of the last path used. The results can be used as
long as no snapshots were created since they were cached (for non-shared
extent buffers) or no roots were dropped since they were cached (for
shared extent buffers). This greatly reduces the time spent by fiemap for
files with thousands of extents and/or large extent and subvolume b+trees.
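
Condensed, the per-level validity rule the cache applies looks like
this (a simplified sketch of the logic in lookup_backref_shared_cache()
in the patch below, not the literal kernel code):

    /* Trust a cached entry only if its generation is still current
     * for the kind of result it holds. */
    bool cache_entry_valid(const struct btrfs_backref_shared_cache_entry *entry,
                           u64 bytenr, u64 last_snapshot_gen, u64 last_drop_gen)
    {
            /* Unused entry, or used for some other extent buffer. */
            if (entry->bytenr != bytenr)
                    return false;
            /* A dropped root may have been sharing this extent buffer. */
            if (entry->is_shared)
                    return entry->gen == last_drop_gen;
            /* A new snapshot may have made this extent buffer shared. */
            return entry->gen == last_snapshot_gen;
    }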

Example performance test:

    $ cat fiemap-perf-test.sh
    #!/bin/bash

    DEV=/dev/sdi
    MNT=/mnt/sdi

    mkfs.btrfs -f $DEV
    mount -o compress=lzo $DEV $MNT

    # 40G gives 327680 128K file extents (due to compression).
    xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar

    umount $MNT
    mount -o compress=lzo $DEV $MNT

    start=$(date +%s%N)
    filefrag $MNT/foobar
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo "fiemap took $dur milliseconds (metadata not cached)"

    start=$(date +%s%N)
    filefrag $MNT/foobar
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo "fiemap took $dur milliseconds (metadata cached)"

    umount $MNT

Before this patch:

    $ ./fiemap-perf-test.sh
    (...)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 3597 milliseconds (metadata not cached)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 2107 milliseconds (metadata cached)

After this patch:

    $ ./fiemap-perf-test.sh
    (...)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 1646 milliseconds (metadata not cached)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 698 milliseconds (metadata cached)

That's about 2.2x faster when no metadata is cached, and about 3x faster
when all metadata is cached. On a real filesystem with many other files,
data, directories, etc, the b+trees will be 2 or 3 levels higher,
therefore this optimization will have a higher impact.
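
As a back-of-the-envelope illustration (using the 4 extent buffers of
the A -> B -> C -> X example above), the old code does up to 5 backref
walks per file extent item - the data extent plus 4 extent buffers -
i.e. up to 327680 * 5 = 1638400 walks for the test file, while with the
cache the extent buffer walks are repeated only when the path changes
at the respective level, which is on the order of the number of leaves
rather than the number of file extents.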

Reports of slow fiemap show up often; the two Link tags below refer to
two recent such reports. This patch, together with the next ones in the
series, is meant to address that.

Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/backref.c     | 122 ++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/backref.h     |  17 +++++-
 fs/btrfs/ctree.h       |  18 ++++++
 fs/btrfs/extent-tree.c |  10 +++-
 fs/btrfs/extent_io.c   |  11 ++--
 5 files changed, 170 insertions(+), 8 deletions(-)

Comments

Josef Bacik Sept. 1, 2022, 2:23 p.m. UTC | #1
On Thu, Sep 01, 2022 at 02:18:28PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> One of the most expensive tasks performed during fiemap is to check if
> an extent is shared. This task has two major steps:
> 
> 1) Check if the data extent is shared. This implies checking the extent
>    item in the extent tree, checking delayed references, etc. If we
>    find the data extent is directly shared, we terminate immediately;
> 
> 2) If the data extent is not directly shared (its extent item has a
>    refcount of 1), then it may be shared if we have snapshots that share
>    subtrees of the inode's subvolume b+tree. So we check if the leaf
>    containing the file extent item is shared, then its parent node, then
>    the parent node of the parent node, etc, until we reach the root node
>    or we find one of them is shared - in which case we stop immediately.
> 
> During fiemap we process the extents of a file from left to right, from
> file offset 0 to eof. This means that we iterate b+tree leaves from left
> to right, which implies that we keep repeating that second step
> above several times for the same b+tree path of the inode's subvolume
> b+tree.
> 
> For example, if we have two file extent items in leaf X, and the path to
> leaf X is A -> B -> C -> X, then when we try to determine if the data
> extent referenced by the first extent item is shared, we check if the data
> extent is shared - if it's not, then we check if leaf X is shared, if not,
> then we check if node C is shared, if not, then check if node B is shared,
> if not, then check if node A is shared. When we move to the next file
> extent item, after determining the data extent is not shared, we repeat
> the checks for X, C, B and A - doing all the expensive searches in the
> extent tree, delayed refs, etc. If we have thousands of file extents, then
> we keep repeating the sharedness checks for the same paths over and over.
> 
> On a file that has no shared extents, or only a small portion of them, it's easy
> to see that this scales terribly with the number of extents in the file
> and the sizes of the extent and subvolume b+trees.
> 
> This change eliminates the repeated sharedness check on extent buffers
> by caching the results of the last path used. The results can be used as
> long as no snapshots were created since they were cached (for non-shared
> extent buffers) or no roots were dropped since they were cached (for
> shared extent buffers). This greatly reduces the time spent by fiemap for
> files with thousands of extents and/or large extent and subvolume b+trees.
> 
> Example performance test:
> 
>     $ cat fiemap-perf-test.sh
>     #!/bin/bash
> 
>     DEV=/dev/sdi
>     MNT=/mnt/sdi
> 
>     mkfs.btrfs -f $DEV
>     mount -o compress=lzo $DEV $MNT
> 
>     # 40G gives 327680 128K file extents (due to compression).
>     xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
> 
>     umount $MNT
>     mount -o compress=lzo $DEV $MNT
> 
>     start=$(date +%s%N)
>     filefrag $MNT/foobar
>     end=$(date +%s%N)
>     dur=$(( (end - start) / 1000000 ))
>     echo "fiemap took $dur milliseconds (metadata not cached)"
> 
>     start=$(date +%s%N)
>     filefrag $MNT/foobar
>     end=$(date +%s%N)
>     dur=$(( (end - start) / 1000000 ))
>     echo "fiemap took $dur milliseconds (metadata cached)"
> 
>     umount $MNT
> 
> Before this patch:
> 
>     $ ./fiemap-perf-test.sh
>     (...)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 3597 milliseconds (metadata not cached)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 2107 milliseconds (metadata cached)
> 
> After this patch:
> 
>     $ ./fiemap-perf-test.sh
>     (...)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 1646 milliseconds (metadata not cached)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 698 milliseconds (metadata cached)
> 
> That's about 2.2x faster when no metadata is cached, and about 3x faster
> when all metadata is cached. On a real filesystem with many other files,
> data, directories, etc, the b+trees will be 2 or 3 levels higher,
> therefore this optimization will have a higher impact.
> 
> Reports of slow fiemap show up often; the two Link tags below refer to
> two recent such reports. This patch, together with the next ones in the
> series, is meant to address that.
> 
> Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Qu Wenruo Sept. 1, 2022, 10:50 p.m. UTC | #2
On 2022/9/1 21:18, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> One of the most expensive tasks performed during fiemap is to check if
> an extent is shared. This task has two major steps:
>
> 1) Check if the data extent is shared. This implies checking the extent
>     item in the extent tree, checking delayed references, etc. If we
>     find the data extent is directly shared, we terminate immediately;
>
> 2) If the data extent is not directly shared (its extent item has a
>     refcount of 1), then it may be shared if we have snapshots that share
>     subtrees of the inode's subvolume b+tree. So we check if the leaf
>     containing the file extent item is shared, then its parent node, then
>     the parent node of the parent node, etc, until we reach the root node
>     or we find one of them is shared - in which case we stop immediately.
>
> During fiemap we process the extents of a file from left to right, from
> file offset 0 to eof. This means that we iterate b+tree leaves from left
> to right, which implies that we keep repeating that second step
> above several times for the same b+tree path of the inode's subvolume
> b+tree.
>
> For example, if we have two file extent items in leaf X, and the path to
> leaf X is A -> B -> C -> X, then when we try to determine if the data
> extent referenced by the first extent item is shared, we check if the data
> extent is shared - if it's not, then we check if leaf X is shared, if not,
> then we check if node C is shared, if not, then check if node B is shared,
> if not, then check if node A is shared. When we move to the next file
> extent item, after determining the data extent is not shared, we repeat
> the checks for X, C, B and A - doing all the expensive searches in the
> extent tree, delayed refs, etc. If we have thousands of file extents, then
> we keep repeating the sharedness checks for the same paths over and over.
>
> On a file that has no shared extents, or only a small portion of them, it's easy
> to see that this scales terribly with the number of extents in the file
> and the sizes of the extent and subvolume b+trees.
>
> This change eliminates the repeated sharedness check on extent buffers
> by caching the results of the last path used. The results can be used as
> long as no snapshots were created since they were cached (for non-shared
> extent buffers) or no roots were dropped since they were cached (for
> shared extent buffers). This greatly reduces the time spent by fiemap for
> files with thousands of extents and/or large extent and subvolume b+trees.

This sounds pretty much like what the existing btrfs_backref_cache is doing.

It stores a map to speed up the backref lookup.

But a quick search didn't turn up calls like btrfs_backref_edge() or
btrfs_backref_cache().

Would it be possible to reuse the existing facility to do the same thing?

Thanks,
Qu
>
> Example performance test:
>
>      $ cat fiemap-perf-test.sh
>      #!/bin/bash
>
>      DEV=/dev/sdi
>      MNT=/mnt/sdi
>
>      mkfs.btrfs -f $DEV
>      mount -o compress=lzo $DEV $MNT
>
>      # 40G gives 327680 128K file extents (due to compression).
>      xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
>
>      umount $MNT
>      mount -o compress=lzo $DEV $MNT
>
>      start=$(date +%s%N)
>      filefrag $MNT/foobar
>      end=$(date +%s%N)
>      dur=$(( (end - start) / 1000000 ))
>      echo "fiemap took $dur milliseconds (metadata not cached)"
>
>      start=$(date +%s%N)
>      filefrag $MNT/foobar
>      end=$(date +%s%N)
>      dur=$(( (end - start) / 1000000 ))
>      echo "fiemap took $dur milliseconds (metadata cached)"
>
>      umount $MNT
>
> Before this patch:
>
>      $ ./fiemap-perf-test.sh
>      (...)
>      /mnt/sdi/foobar: 327680 extents found
>      fiemap took 3597 milliseconds (metadata not cached)
>      /mnt/sdi/foobar: 327680 extents found
>      fiemap took 2107 milliseconds (metadata cached)
>
> After this patch:
>
>      $ ./fiemap-perf-test.sh
>      (...)
>      /mnt/sdi/foobar: 327680 extents found
>      fiemap took 1646 milliseconds (metadata not cached)
>      /mnt/sdi/foobar: 327680 extents found
>      fiemap took 698 milliseconds (metadata cached)
>
> That's about 2.2x faster when no metadata is cached, and about 3x faster
> when all metadata is cached. On a real filesystem with many other files,
> data, directories, etc, the b+trees will be 2 or 3 levels higher,
> therefore this optimization will have a higher impact.
>
> Reports of slow fiemap show up often; the two Link tags below refer to
> two recent such reports. This patch, together with the next ones in the
> series, is meant to address that.
>
> Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/backref.c     | 122 ++++++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/backref.h     |  17 +++++-
>   fs/btrfs/ctree.h       |  18 ++++++
>   fs/btrfs/extent-tree.c |  10 +++-
>   fs/btrfs/extent_io.c   |  11 ++--
>   5 files changed, 170 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index e2ac10a695b6..40b48abb6978 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -1511,6 +1511,105 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
>   	return ret;
>   }
>
> +/*
> + * The caller has joined a transaction or is holding a read lock on the
> + * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
> + * snapshot field changing while updating or checking the cache.
> + */
> +static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
> +					struct btrfs_root *root,
> +					u64 bytenr, int level, bool *is_shared)
> +{
> +	struct btrfs_backref_shared_cache_entry *entry;
> +
> +	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
> +		return false;
> +
> +	/*
> +	 * Level -1 is used for the data extent, which is not reliable to cache
> +	 * because its reference count can increase or decrease without us
> +	 * realizing. We cache results only for extent buffers that lead from
> +	 * the root node down to the leaf with the file extent item.
> +	 */
> +	ASSERT(level >= 0);
> +
> +	entry = &cache->entries[level];
> +
> +	/* Unused cache entry or being used for some other extent buffer. */
> +	if (entry->bytenr != bytenr)
> +		return false;
> +
> +	/*
> +	 * We cached a false result, but the last snapshot generation of the
> +	 * root changed, so we now have a snapshot. Don't trust the result.
> +	 */
> +	if (!entry->is_shared &&
> +	    entry->gen != btrfs_root_last_snapshot(&root->root_item))
> +		return false;
> +
> +	/*
> +	 * If we cached a true result and the last generation used for dropping
> +	 * a root changed, we can not trust the result, because the dropped root
> +	 * could be a snapshot sharing this extent buffer.
> +	 */
> +	if (entry->is_shared &&
> +	    entry->gen != btrfs_get_last_root_drop_gen(root->fs_info))
> +		return false;
> +
> +	*is_shared = entry->is_shared;
> +
> +	return true;
> +}
> +
> +/*
> + * The caller has joined a transaction or is holding a read lock on the
> + * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
> + * snapshot field changing while updating or checking the cache.
> + */
> +static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
> +				       struct btrfs_root *root,
> +				       u64 bytenr, int level, bool is_shared)
> +{
> +	struct btrfs_backref_shared_cache_entry *entry;
> +	u64 gen;
> +
> +	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
> +		return;
> +
> +	/*
> +	 * Level -1 is used for the data extent, which is not reliable to cache
> +	 * because its reference count can increase or decrease without us
> +	 * realizing. We cache results only for extent buffers that lead from
> +	 * the root node down to the leaf with the file extent item.
> +	 */
> +	ASSERT(level >= 0);
> +
> +	if (is_shared)
> +		gen = btrfs_get_last_root_drop_gen(root->fs_info);
> +	else
> +		gen = btrfs_root_last_snapshot(&root->root_item);
> +
> +	entry = &cache->entries[level];
> +	entry->bytenr = bytenr;
> +	entry->is_shared = is_shared;
> +	entry->gen = gen;
> +
> +	/*
> +	 * If we found an extent buffer is shared, set the cache result for all
> +	 * extent buffers below it to true. As nodes in the path are COWed,
> +	 * their sharedness is moved to their children, and if a leaf is COWed,
> +	 * then the sharedness of a data extent becomes direct: the refcount of
> +	 * the data extent is increased in the extent item in the extent tree.
> +	 */
> +	if (is_shared) {
> +		for (int i = 0; i < level; i++) {
> +			entry = &cache->entries[i];
> +			entry->is_shared = is_shared;
> +			entry->gen = gen;
> +		}
> +	}
> +}
> +
>   /**
>    * Check if a data extent is shared or not.
>    *
> @@ -1519,6 +1618,7 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
>    * @bytenr: logical bytenr of the extent we are checking
>    * @roots:  list of roots this extent is shared among
>    * @tmp:    temporary list used for iteration
> + * @cache:  a backref lookup result cache
>    *
>    * btrfs_is_data_extent_shared uses the backref walking code but will short
>    * circuit as soon as it finds a root or inode that doesn't match the
> @@ -1532,7 +1632,8 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
>    * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error.
>    */
>   int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
> -				struct ulist *roots, struct ulist *tmp)
> +				struct ulist *roots, struct ulist *tmp,
> +				struct btrfs_backref_shared_cache *cache)
>   {
>   	struct btrfs_fs_info *fs_info = root->fs_info;
>   	struct btrfs_trans_handle *trans;
> @@ -1545,6 +1646,7 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
>   		.inum = inum,
>   		.share_count = 0,
>   	};
> +	int level;
>
>   	ulist_init(roots);
>   	ulist_init(tmp);
> @@ -1561,22 +1663,40 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
>   		btrfs_get_tree_mod_seq(fs_info, &elem);
>   	}
>
> +	/* -1 means we are in the bytenr of the data extent. */
> +	level = -1;
>   	ULIST_ITER_INIT(&uiter);
>   	while (1) {
> +		bool is_shared;
> +		bool cached;
> +
>   		ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp,
>   					roots, NULL, &shared, false);
>   		if (ret == BACKREF_FOUND_SHARED) {
>   			/* this is the only condition under which we return 1 */
>   			ret = 1;
> +			if (level >= 0)
> +				store_backref_shared_cache(cache, root, bytenr,
> +							   level, true);
>   			break;
>   		}
>   		if (ret < 0 && ret != -ENOENT)
>   			break;
>   		ret = 0;
> +		if (level >= 0)
> +			store_backref_shared_cache(cache, root, bytenr,
> +						   level, false);
>   		node = ulist_next(tmp, &uiter);
>   		if (!node)
>   			break;
>   		bytenr = node->val;
> +		level++;
> +		cached = lookup_backref_shared_cache(cache, root, bytenr, level,
> +						     &is_shared);
> +		if (cached) {
> +			ret = is_shared ? 1 : 0;
> +			break;
> +		}
>   		shared.share_count = 0;
>   		cond_resched();
>   	}
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> index 08354394b1bb..797ba5371d55 100644
> --- a/fs/btrfs/backref.h
> +++ b/fs/btrfs/backref.h
> @@ -17,6 +17,20 @@ struct inode_fs_paths {
>   	struct btrfs_data_container	*fspath;
>   };
>
> +struct btrfs_backref_shared_cache_entry {
> +	u64 bytenr;
> +	u64 gen;
> +	bool is_shared;
> +};
> +
> +struct btrfs_backref_shared_cache {
> +	/*
> +	 * A path from a root to a leaf that has a file extent item pointing to
> +	 * a given data extent should never exceed the maximum b+tree height.
> +	 */
> +	struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL];
> +};
> +
>   typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
>   		void *ctx);
>
> @@ -63,7 +77,8 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
>   			  struct btrfs_inode_extref **ret_extref,
>   			  u64 *found_off);
>   int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
> -				struct ulist *roots, struct ulist *tmp);
> +				struct ulist *roots, struct ulist *tmp,
> +				struct btrfs_backref_shared_cache *cache);
>
>   int __init btrfs_prelim_ref_init(void);
>   void __cold btrfs_prelim_ref_exit(void);
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3dc30f5e6fd0..f7fe7f633eb5 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1095,6 +1095,13 @@ struct btrfs_fs_info {
>   	/* Updates are not protected by any lock */
>   	struct btrfs_commit_stats commit_stats;
>
> +	/*
> +	 * Last generation where we dropped a non-relocation root.
> +	 * Use btrfs_set_last_root_drop_gen() and btrfs_get_last_root_drop_gen()
> +	 * to change it and to read it, respectively.
> +	 */
> +	u64 last_root_drop_gen;
> +
>   	/*
>   	 * Annotations for transaction events (structures are empty when
>   	 * compiled without lockdep).
> @@ -1119,6 +1126,17 @@ struct btrfs_fs_info {
>   #endif
>   };
>
> +static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
> +						u64 gen)
> +{
> +	WRITE_ONCE(fs_info->last_root_drop_gen, gen);
> +}
> +
> +static inline u64 btrfs_get_last_root_drop_gen(const struct btrfs_fs_info *fs_info)
> +{
> +	return READ_ONCE(fs_info->last_root_drop_gen);
> +}
> +
>   static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
>   {
>   	return sb->s_fs_info;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index bcd0e72cded3..9818285dface 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5635,6 +5635,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
>    */
>   int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>   {
> +	const bool is_reloc_root = (root->root_key.objectid ==
> +				    BTRFS_TREE_RELOC_OBJECTID);
>   	struct btrfs_fs_info *fs_info = root->fs_info;
>   	struct btrfs_path *path;
>   	struct btrfs_trans_handle *trans;
> @@ -5794,6 +5796,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>   				goto out_end_trans;
>   			}
>
> +			if (!is_reloc_root)
> +				btrfs_set_last_root_drop_gen(fs_info, trans->transid);
> +
>   			btrfs_end_transaction_throttle(trans);
>   			if (!for_reloc && btrfs_need_cleaner_sleep(fs_info)) {
>   				btrfs_debug(fs_info,
> @@ -5828,7 +5833,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>   		goto out_end_trans;
>   	}
>
> -	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
> +	if (!is_reloc_root) {
>   		ret = btrfs_find_root(tree_root, &root->root_key, path,
>   				      NULL, NULL);
>   		if (ret < 0) {
> @@ -5860,6 +5865,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>   		btrfs_put_root(root);
>   	root_dropped = true;
>   out_end_trans:
> +	if (!is_reloc_root)
> +		btrfs_set_last_root_drop_gen(fs_info, trans->transid);
> +
>   	btrfs_end_transaction_throttle(trans);
>   out_free:
>   	kfree(wc);
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index a47710516ecf..781436cc373c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5519,6 +5519,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   	struct btrfs_path *path;
>   	struct btrfs_root *root = inode->root;
>   	struct fiemap_cache cache = { 0 };
> +	struct btrfs_backref_shared_cache *backref_cache;
>   	struct ulist *roots;
>   	struct ulist *tmp_ulist;
>   	int end = 0;
> @@ -5526,13 +5527,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   	u64 em_len = 0;
>   	u64 em_end = 0;
>
> +	backref_cache = kzalloc(sizeof(*backref_cache), GFP_KERNEL);
>   	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -
>   	roots = ulist_alloc(GFP_KERNEL);
>   	tmp_ulist = ulist_alloc(GFP_KERNEL);
> -	if (!roots || !tmp_ulist) {
> +	if (!backref_cache || !path || !roots || !tmp_ulist) {
>   		ret = -ENOMEM;
>   		goto out_free_ulist;
>   	}
> @@ -5658,7 +5657,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   			 */
>   			ret = btrfs_is_data_extent_shared(root, btrfs_ino(inode),
>   							  bytenr, roots,
> -							  tmp_ulist);
> +							  tmp_ulist,
> +							  backref_cache);
>   			if (ret < 0)
>   				goto out_free;
>   			if (ret)
> @@ -5710,6 +5710,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   			     &cached_state);
>
>   out_free_ulist:
> +	kfree(backref_cache);
>   	btrfs_free_path(path);
>   	ulist_free(roots);
>   	ulist_free(tmp_ulist);
Filipe Manana Sept. 2, 2022, 8:46 a.m. UTC | #3
On Thu, Sep 1, 2022 at 11:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2022/9/1 21:18, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > One of the most expensive tasks performed during fiemap is to check if
> > an extent is shared. This task has two major steps:
> >
> > 1) Check if the data extent is shared. This implies checking the extent
> >     item in the extent tree, checking delayed references, etc. If we
> >     find the data extent is directly shared, we terminate immediately;
> >
> > 2) If the data extent is not directly shared (its extent item has a
> >     refcount of 1), then it may be shared if we have snapshots that share
> >     subtrees of the inode's subvolume b+tree. So we check if the leaf
> >     containing the file extent item is shared, then its parent node, then
> >     the parent node of the parent node, etc, until we reach the root node
> >     or we find one of them is shared - in which case we stop immediately.
> >
> > During fiemap we process the extents of a file from left to right, from
> > file offset 0 to eof. This means that we iterate b+tree leaves from left
> > to right, which implies that we keep repeating that second step
> > above several times for the same b+tree path of the inode's subvolume
> > b+tree.
> >
> > For example, if we have two file extent items in leaf X, and the path to
> > leaf X is A -> B -> C -> X, then when we try to determine if the data
> > extent referenced by the first extent item is shared, we check if the data
> > extent is shared - if it's not, then we check if leaf X is shared, if not,
> > then we check if node C is shared, if not, then check if node B is shared,
> > if not, then check if node A is shared. When we move to the next file
> > extent item, after determining the data extent is not shared, we repeat
> > the checks for X, C, B and A - doing all the expensive searches in the
> > extent tree, delayed refs, etc. If we have thousands of file extents, then
> > we keep repeating the sharedness checks for the same paths over and over.
> >
> > On a file that has no shared extents, or only a small portion of them, it's easy
> > to see that this scales terribly with the number of extents in the file
> > and the sizes of the extent and subvolume b+trees.
> >
> > This change eliminates the repeated sharedness check on extent buffers
> > by caching the results of the last path used. The results can be used as
> > long as no snapshots were created since they were cached (for non-shared
> > extent buffers) or no roots were dropped since they were cached (for
> > shared extent buffers). This greatly reduces the time spent by fiemap for
> > files with thousands of extents and/or large extent and subvolume b+trees.
>
> This sounds pretty much like what the existing btrfs_backref_cache is doing.
>
> It stores a map to speed up the backref lookup.
>
> But a quick search didn't turn up calls like btrfs_backref_edge() or
> btrfs_backref_cache().
>
> Would it be possible to reuse the existing facility to do the same thing?

Nope, the existing facility is heavy and is meant to collect all
backreferences for an extent. It stores the backreferences in an rb
tree and has to allocate memory for them, etc.

btrfs_check_shared() (now renamed to btrfs_is_data_extent_shared()) is
a much simpler thing than collecting backreferences - it just stops
once it finds an extent is shared.

The cache I introduced is equally simple and the best fit for it: it
has a fixed size, and there is no need to collect backreferences,
allocate memory for them, add them to a red black tree, etc. All it
needs is a single path that doesn't change very often, and when it
does, it's just one extent buffer at a time. Also, it allows us to
short circuit on 'not shared' results right at level 0, which is very
common and makes a huge difference.

Thanks.

>
> Thanks,
> Qu
> >
> > Example performance test:
> >
> >      $ cat fiemap-perf-test.sh
> >      #!/bin/bash
> >
> >      DEV=/dev/sdi
> >      MNT=/mnt/sdi
> >
> >      mkfs.btrfs -f $DEV
> >      mount -o compress=lzo $DEV $MNT
> >
> >      # 40G gives 327680 128K file extents (due to compression).
> >      xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
> >
> >      umount $MNT
> >      mount -o compress=lzo $DEV $MNT
> >
> >      start=$(date +%s%N)
> >      filefrag $MNT/foobar
> >      end=$(date +%s%N)
> >      dur=$(( (end - start) / 1000000 ))
> >      echo "fiemap took $dur milliseconds (metadata not cached)"
> >
> >      start=$(date +%s%N)
> >      filefrag $MNT/foobar
> >      end=$(date +%s%N)
> >      dur=$(( (end - start) / 1000000 ))
> >      echo "fiemap took $dur milliseconds (metadata cached)"
> >
> >      umount $MNT
> >
> > Before this patch:
> >
> >      $ ./fiemap-perf-test.sh
> >      (...)
> >      /mnt/sdi/foobar: 327680 extents found
> >      fiemap took 3597 milliseconds (metadata not cached)
> >      /mnt/sdi/foobar: 327680 extents found
> >      fiemap took 2107 milliseconds (metadata cached)
> >
> > After this patch:
> >
> >      $ ./fiemap-perf-test.sh
> >      (...)
> >      /mnt/sdi/foobar: 327680 extents found
> >      fiemap took 1646 milliseconds (metadata not cached)
> >      /mnt/sdi/foobar: 327680 extents found
> >      fiemap took 698 milliseconds (metadata cached)
> >
> > That's about 2.2x faster when no metadata is cached, and about 3x faster
> > when all metadata is cached. On a real filesystem with many other files,
> > data, directories, etc, the b+trees will be 2 or 3 levels higher,
> > therefore this optimization will have a higher impact.
> >
> > Reports of slow fiemap show up often; the two Link tags below refer
> > to two recent such reports. This patch, together with the next ones
> > in the series, is meant to address that.
> >
> > Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> > Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/backref.c     | 122 ++++++++++++++++++++++++++++++++++++++++-
> >   fs/btrfs/backref.h     |  17 +++++-
> >   fs/btrfs/ctree.h       |  18 ++++++
> >   fs/btrfs/extent-tree.c |  10 +++-
> >   fs/btrfs/extent_io.c   |  11 ++--
> >   5 files changed, 170 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> > index e2ac10a695b6..40b48abb6978 100644
> > --- a/fs/btrfs/backref.c
> > +++ b/fs/btrfs/backref.c
> > @@ -1511,6 +1511,105 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
> >       return ret;
> >   }
> >
> > +/*
> > + * The caller has joined a transaction or is holding a read lock on the
> > + * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
> > + * snapshot field changing while updating or checking the cache.
> > + */
> > +static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
> > +                                     struct btrfs_root *root,
> > +                                     u64 bytenr, int level, bool *is_shared)
> > +{
> > +     struct btrfs_backref_shared_cache_entry *entry;
> > +
> > +     if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
> > +             return false;
> > +
> > +     /*
> > +      * Level -1 is used for the data extent, which is not reliable to cache
> > +      * because its reference count can increase or decrease without us
> > +      * realizing. We cache results only for extent buffers that lead from
> > +      * the root node down to the leaf with the file extent item.
> > +      */
> > +     ASSERT(level >= 0);
> > +
> > +     entry = &cache->entries[level];
> > +
> > +     /* Unused cache entry or being used for some other extent buffer. */
> > +     if (entry->bytenr != bytenr)
> > +             return false;
> > +
> > +     /*
> > +      * We cached a false result, but the last snapshot generation of the
> > +      * root changed, so we now have a snapshot. Don't trust the result.
> > +      */
> > +     if (!entry->is_shared &&
> > +         entry->gen != btrfs_root_last_snapshot(&root->root_item))
> > +             return false;
> > +
> > +     /*
> > +      * If we cached a true result and the last generation used for dropping
> > +      * a root changed, we can not trust the result, because the dropped root
> > +      * could be a snapshot sharing this extent buffer.
> > +      */
> > +     if (entry->is_shared &&
> > +         entry->gen != btrfs_get_last_root_drop_gen(root->fs_info))
> > +             return false;
> > +
> > +     *is_shared = entry->is_shared;
> > +
> > +     return true;
> > +}
> > +
> > +/*
> > + * The caller has joined a transaction or is holding a read lock on the
> > + * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
> > + * snapshot field changing while updating or checking the cache.
> > + */
> > +static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
> > +                                    struct btrfs_root *root,
> > +                                    u64 bytenr, int level, bool is_shared)
> > +{
> > +     struct btrfs_backref_shared_cache_entry *entry;
> > +     u64 gen;
> > +
> > +     if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
> > +             return;
> > +
> > +     /*
> > +      * Level -1 is used for the data extent, which is not reliable to cache
> > +      * because its reference count can increase or decrease without us
> > +      * realizing. We cache results only for extent buffers that lead from
> > +      * the root node down to the leaf with the file extent item.
> > +      */
> > +     ASSERT(level >= 0);
> > +
> > +     if (is_shared)
> > +             gen = btrfs_get_last_root_drop_gen(root->fs_info);
> > +     else
> > +             gen = btrfs_root_last_snapshot(&root->root_item);
> > +
> > +     entry = &cache->entries[level];
> > +     entry->bytenr = bytenr;
> > +     entry->is_shared = is_shared;
> > +     entry->gen = gen;
> > +
> > +     /*
> > +      * If we found an extent buffer is shared, set the cache result for all
> > +      * extent buffers below it to true. As nodes in the path are COWed,
> > +      * their sharedness is moved to their children, and if a leaf is COWed,
> > +      * then the sharedness of a data extent becomes direct: the refcount of
> > +      * the data extent is increased in the extent item in the extent tree.
> > +      */
> > +     if (is_shared) {
> > +             for (int i = 0; i < level; i++) {
> > +                     entry = &cache->entries[i];
> > +                     entry->is_shared = is_shared;
> > +                     entry->gen = gen;
> > +             }
> > +     }
> > +}
> > +
> >   /**
> >    * Check if a data extent is shared or not.
> >    *
> > @@ -1519,6 +1618,7 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
> >    * @bytenr: logical bytenr of the extent we are checking
> >    * @roots:  list of roots this extent is shared among
> >    * @tmp:    temporary list used for iteration
> > + * @cache:  a backref lookup result cache
> >    *
> >    * btrfs_is_data_extent_shared uses the backref walking code but will short
> >    * circuit as soon as it finds a root or inode that doesn't match the
> > @@ -1532,7 +1632,8 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
> >    * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error.
> >    */
> >   int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
> > -                             struct ulist *roots, struct ulist *tmp)
> > +                             struct ulist *roots, struct ulist *tmp,
> > +                             struct btrfs_backref_shared_cache *cache)
> >   {
> >       struct btrfs_fs_info *fs_info = root->fs_info;
> >       struct btrfs_trans_handle *trans;
> > @@ -1545,6 +1646,7 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
> >               .inum = inum,
> >               .share_count = 0,
> >       };
> > +     int level;
> >
> >       ulist_init(roots);
> >       ulist_init(tmp);
> > @@ -1561,22 +1663,40 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
> >               btrfs_get_tree_mod_seq(fs_info, &elem);
> >       }
> >
> > +     /* -1 means we are in the bytenr of the data extent. */
> > +     level = -1;
> >       ULIST_ITER_INIT(&uiter);
> >       while (1) {
> > +             bool is_shared;
> > +             bool cached;
> > +
> >               ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp,
> >                                       roots, NULL, &shared, false);
> >               if (ret == BACKREF_FOUND_SHARED) {
> >                       /* this is the only condition under which we return 1 */
> >                       ret = 1;
> > +                     if (level >= 0)
> > +                             store_backref_shared_cache(cache, root, bytenr,
> > +                                                        level, true);
> >                       break;
> >               }
> >               if (ret < 0 && ret != -ENOENT)
> >                       break;
> >               ret = 0;
> > +             if (level >= 0)
> > +                     store_backref_shared_cache(cache, root, bytenr,
> > +                                                level, false);
> >               node = ulist_next(tmp, &uiter);
> >               if (!node)
> >                       break;
> >               bytenr = node->val;
> > +             level++;
> > +             cached = lookup_backref_shared_cache(cache, root, bytenr, level,
> > +                                                  &is_shared);
> > +             if (cached) {
> > +                     ret = is_shared ? 1 : 0;
> > +                     break;
> > +             }
> >               shared.share_count = 0;
> >               cond_resched();
> >       }
> > diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> > index 08354394b1bb..797ba5371d55 100644
> > --- a/fs/btrfs/backref.h
> > +++ b/fs/btrfs/backref.h
> > @@ -17,6 +17,20 @@ struct inode_fs_paths {
> >       struct btrfs_data_container     *fspath;
> >   };
> >
> > +struct btrfs_backref_shared_cache_entry {
> > +     u64 bytenr;
> > +     u64 gen;
> > +     bool is_shared;
> > +};
> > +
> > +struct btrfs_backref_shared_cache {
> > +     /*
> > +      * A path from a root to a leaf that has a file extent item pointing to
> > +      * a given data extent should never exceed the maximum b+tree height.
> > +      */
> > +     struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL];
> > +};
> > +
> >   typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
> >               void *ctx);
> >
> > @@ -63,7 +77,8 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> >                         struct btrfs_inode_extref **ret_extref,
> >                         u64 *found_off);
> >   int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
> > -                             struct ulist *roots, struct ulist *tmp);
> > +                             struct ulist *roots, struct ulist *tmp,
> > +                             struct btrfs_backref_shared_cache *cache);
> >
> >   int __init btrfs_prelim_ref_init(void);
> >   void __cold btrfs_prelim_ref_exit(void);
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 3dc30f5e6fd0..f7fe7f633eb5 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -1095,6 +1095,13 @@ struct btrfs_fs_info {
> >       /* Updates are not protected by any lock */
> >       struct btrfs_commit_stats commit_stats;
> >
> > +     /*
> > +      * Last generation where we dropped a non-relocation root.
> > +      * Use btrfs_set_last_root_drop_gen() and btrfs_get_last_root_drop_gen()
> > +      * to change it and to read it, respectively.
> > +      */
> > +     u64 last_root_drop_gen;
> > +
> >       /*
> >        * Annotations for transaction events (structures are empty when
> >        * compiled without lockdep).
> > @@ -1119,6 +1126,17 @@ struct btrfs_fs_info {
> >   #endif
> >   };
> >
> > +static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
> > +                                             u64 gen)
> > +{
> > +     WRITE_ONCE(fs_info->last_root_drop_gen, gen);
> > +}
> > +
> > +static inline u64 btrfs_get_last_root_drop_gen(const struct btrfs_fs_info *fs_info)
> > +{
> > +     return READ_ONCE(fs_info->last_root_drop_gen);
> > +}
> > +
> >   static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
> >   {
> >       return sb->s_fs_info;
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index bcd0e72cded3..9818285dface 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -5635,6 +5635,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
> >    */
> >   int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >   {
> > +     const bool is_reloc_root = (root->root_key.objectid ==
> > +                                 BTRFS_TREE_RELOC_OBJECTID);
> >       struct btrfs_fs_info *fs_info = root->fs_info;
> >       struct btrfs_path *path;
> >       struct btrfs_trans_handle *trans;
> > @@ -5794,6 +5796,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >                               goto out_end_trans;
> >                       }
> >
> > +                     if (!is_reloc_root)
> > +                             btrfs_set_last_root_drop_gen(fs_info, trans->transid);
> > +
> >                       btrfs_end_transaction_throttle(trans);
> >                       if (!for_reloc && btrfs_need_cleaner_sleep(fs_info)) {
> >                               btrfs_debug(fs_info,
> > @@ -5828,7 +5833,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >               goto out_end_trans;
> >       }
> >
> > -     if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
> > +     if (!is_reloc_root) {
> >               ret = btrfs_find_root(tree_root, &root->root_key, path,
> >                                     NULL, NULL);
> >               if (ret < 0) {
> > @@ -5860,6 +5865,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >               btrfs_put_root(root);
> >       root_dropped = true;
> >   out_end_trans:
> > +     if (!is_reloc_root)
> > +             btrfs_set_last_root_drop_gen(fs_info, trans->transid);
> > +
> >       btrfs_end_transaction_throttle(trans);
> >   out_free:
> >       kfree(wc);
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index a47710516ecf..781436cc373c 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -5519,6 +5519,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
> >       struct btrfs_path *path;
> >       struct btrfs_root *root = inode->root;
> >       struct fiemap_cache cache = { 0 };
> > +     struct btrfs_backref_shared_cache *backref_cache;
> >       struct ulist *roots;
> >       struct ulist *tmp_ulist;
> >       int end = 0;
> > @@ -5526,13 +5527,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
> >       u64 em_len = 0;
> >       u64 em_end = 0;
> >
> > +     backref_cache = kzalloc(sizeof(*backref_cache), GFP_KERNEL);
> >       path = btrfs_alloc_path();
> > -     if (!path)
> > -             return -ENOMEM;
> > -
> >       roots = ulist_alloc(GFP_KERNEL);
> >       tmp_ulist = ulist_alloc(GFP_KERNEL);
> > -     if (!roots || !tmp_ulist) {
> > +     if (!backref_cache || !path || !roots || !tmp_ulist) {
> >               ret = -ENOMEM;
> >               goto out_free_ulist;
> >       }
> > @@ -5658,7 +5657,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
> >                        */
> >                       ret = btrfs_is_data_extent_shared(root, btrfs_ino(inode),
> >                                                         bytenr, roots,
> > -                                                       tmp_ulist);
> > +                                                       tmp_ulist,
> > +                                                       backref_cache);
> >                       if (ret < 0)
> >                               goto out_free;
> >                       if (ret)
> > @@ -5710,6 +5710,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
> >                            &cached_state);
> >
> >   out_free_ulist:
> > +     kfree(backref_cache);
> >       btrfs_free_path(path);
> >       ulist_free(roots);
> >       ulist_free(tmp_ulist);

Patch

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2ac10a695b6..40b48abb6978 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1511,6 +1511,105 @@  int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+/*
+ * The caller has joined a transaction or is holding a read lock on the
+ * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
+ * snapshot field changing while updating or checking the cache.
+ */
+static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
+					struct btrfs_root *root,
+					u64 bytenr, int level, bool *is_shared)
+{
+	struct btrfs_backref_shared_cache_entry *entry;
+
+	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
+		return false;
+
+	/*
+	 * Level -1 is used for the data extent, which is not reliable to cache
+	 * because its reference count can increase or decrease without us
+	 * realizing. We cache results only for extent buffers that lead from
+	 * the root node down to the leaf with the file extent item.
+	 */
+	ASSERT(level >= 0);
+
+	entry = &cache->entries[level];
+
+	/* Unused cache entry or being used for some other extent buffer. */
+	if (entry->bytenr != bytenr)
+		return false;
+
+	/*
+	 * We cached a false result, but the last snapshot generation of the
+	 * root changed, so we now have a snapshot. Don't trust the result.
+	 */
+	if (!entry->is_shared &&
+	    entry->gen != btrfs_root_last_snapshot(&root->root_item))
+		return false;
+
+	/*
+	 * If we cached a true result and the last generation used for dropping
+	 * a root changed, we can not trust the result, because the dropped root
+	 * could be a snapshot sharing this extent buffer.
+	 */
+	if (entry->is_shared &&
+	    entry->gen != btrfs_get_last_root_drop_gen(root->fs_info))
+		return false;
+
+	*is_shared = entry->is_shared;
+
+	return true;
+}
+
+/*
+ * The caller has joined a transaction or is holding a read lock on the
+ * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
+ * snapshot field changing while updating or checking the cache.
+ */
+static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
+				       struct btrfs_root *root,
+				       u64 bytenr, int level, bool is_shared)
+{
+	struct btrfs_backref_shared_cache_entry *entry;
+	u64 gen;
+
+	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
+		return;
+
+	/*
+	 * Level -1 is used for the data extent, which is not reliable to cache
+	 * because its reference count can increase or decrease without us
+	 * realizing. We cache results only for extent buffers that lead from
+	 * the root node down to the leaf with the file extent item.
+	 */
+	ASSERT(level >= 0);
+
+	if (is_shared)
+		gen = btrfs_get_last_root_drop_gen(root->fs_info);
+	else
+		gen = btrfs_root_last_snapshot(&root->root_item);
+
+	entry = &cache->entries[level];
+	entry->bytenr = bytenr;
+	entry->is_shared = is_shared;
+	entry->gen = gen;
+
+	/*
+	 * If we found an extent buffer is shared, set the cache result for all
+	 * extent buffers below it to true. As nodes in the path are COWed,
+	 * their sharedness is moved to their children, and if a leaf is COWed,
+	 * then the sharedness of a data extent becomes direct: the refcount of
+	 * the data extent is increased in the extent item in the extent tree.
+	 */
+	if (is_shared) {
+		for (int i = 0; i < level; i++) {
+			entry = &cache->entries[i];
+			entry->is_shared = is_shared;
+			entry->gen = gen;
+		}
+	}
+}
+
 /**
  * Check if a data extent is shared or not.
  *
@@ -1519,6 +1618,7 @@  int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
  * @bytenr: logical bytenr of the extent we are checking
  * @roots:  list of roots this extent is shared among
  * @tmp:    temporary list used for iteration
+ * @cache:  a backref lookup result cache
  *
  * btrfs_is_data_extent_shared uses the backref walking code but will short
  * circuit as soon as it finds a root or inode that doesn't match the
@@ -1532,7 +1632,8 @@  int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
  * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error.
  */
 int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
-				struct ulist *roots, struct ulist *tmp)
+				struct ulist *roots, struct ulist *tmp,
+				struct btrfs_backref_shared_cache *cache)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_trans_handle *trans;
@@ -1545,6 +1646,7 @@  int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
 		.inum = inum,
 		.share_count = 0,
 	};
+	int level;
 
 	ulist_init(roots);
 	ulist_init(tmp);
@@ -1561,22 +1663,40 @@  int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
 		btrfs_get_tree_mod_seq(fs_info, &elem);
 	}
 
+	/* -1 means we are in the bytenr of the data extent. */
+	level = -1;
 	ULIST_ITER_INIT(&uiter);
 	while (1) {
+		bool is_shared;
+		bool cached;
+
 		ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp,
 					roots, NULL, &shared, false);
 		if (ret == BACKREF_FOUND_SHARED) {
 			/* this is the only condition under which we return 1 */
 			ret = 1;
+			if (level >= 0)
+				store_backref_shared_cache(cache, root, bytenr,
+							   level, true);
 			break;
 		}
 		if (ret < 0 && ret != -ENOENT)
 			break;
 		ret = 0;
+		if (level >= 0)
+			store_backref_shared_cache(cache, root, bytenr,
+						   level, false);
 		node = ulist_next(tmp, &uiter);
 		if (!node)
 			break;
 		bytenr = node->val;
+		level++;
+		cached = lookup_backref_shared_cache(cache, root, bytenr, level,
+						     &is_shared);
+		if (cached) {
+			ret = is_shared ? 1 : 0;
+			break;
+		}
 		shared.share_count = 0;
 		cond_resched();
 	}
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index 08354394b1bb..797ba5371d55 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -17,6 +17,20 @@  struct inode_fs_paths {
 	struct btrfs_data_container	*fspath;
 };
 
+struct btrfs_backref_shared_cache_entry {
+	u64 bytenr;
+	u64 gen;
+	bool is_shared;
+};
+
+struct btrfs_backref_shared_cache {
+	/*
+	 * A path from a root to a leaf that has a file extent item pointing to
+	 * a given data extent should never exceed the maximum b+tree height.
+	 */
+	struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL];
+};
+
 typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
 		void *ctx);
 
@@ -63,7 +77,8 @@  int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
 			  struct btrfs_inode_extref **ret_extref,
 			  u64 *found_off);
 int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
-				struct ulist *roots, struct ulist *tmp);
+				struct ulist *roots, struct ulist *tmp,
+				struct btrfs_backref_shared_cache *cache);
 
 int __init btrfs_prelim_ref_init(void);
 void __cold btrfs_prelim_ref_exit(void);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3dc30f5e6fd0..f7fe7f633eb5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1095,6 +1095,13 @@  struct btrfs_fs_info {
 	/* Updates are not protected by any lock */
 	struct btrfs_commit_stats commit_stats;
 
+	/*
+	 * Last generation where we dropped a non-relocation root.
+	 * Use btrfs_set_last_root_drop_gen() and btrfs_get_last_root_drop_gen()
+	 * to change it and to read it, respectively.
+	 */
+	u64 last_root_drop_gen;
+
 	/*
 	 * Annotations for transaction events (structures are empty when
 	 * compiled without lockdep).
@@ -1119,6 +1126,17 @@  struct btrfs_fs_info {
 #endif
 };
 
+static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
+						u64 gen)
+{
+	WRITE_ONCE(fs_info->last_root_drop_gen, gen);
+}
+
+static inline u64 btrfs_get_last_root_drop_gen(const struct btrfs_fs_info *fs_info)
+{
+	return READ_ONCE(fs_info->last_root_drop_gen);
+}
+
 static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
 {
 	return sb->s_fs_info;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index bcd0e72cded3..9818285dface 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5635,6 +5635,8 @@  static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
  */
 int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 {
+	const bool is_reloc_root = (root->root_key.objectid ==
+				    BTRFS_TREE_RELOC_OBJECTID);
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_path *path;
 	struct btrfs_trans_handle *trans;
@@ -5794,6 +5796,9 @@  int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 				goto out_end_trans;
 			}
 
+			if (!is_reloc_root)
+				btrfs_set_last_root_drop_gen(fs_info, trans->transid);
+
 			btrfs_end_transaction_throttle(trans);
 			if (!for_reloc && btrfs_need_cleaner_sleep(fs_info)) {
 				btrfs_debug(fs_info,
@@ -5828,7 +5833,7 @@  int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 		goto out_end_trans;
 	}
 
-	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
+	if (!is_reloc_root) {
 		ret = btrfs_find_root(tree_root, &root->root_key, path,
 				      NULL, NULL);
 		if (ret < 0) {
@@ -5860,6 +5865,9 @@  int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 		btrfs_put_root(root);
 	root_dropped = true;
 out_end_trans:
+	if (!is_reloc_root)
+		btrfs_set_last_root_drop_gen(fs_info, trans->transid);
+
 	btrfs_end_transaction_throttle(trans);
 out_free:
 	kfree(wc);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a47710516ecf..781436cc373c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5519,6 +5519,7 @@  int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	struct btrfs_path *path;
 	struct btrfs_root *root = inode->root;
 	struct fiemap_cache cache = { 0 };
+	struct btrfs_backref_shared_cache *backref_cache;
 	struct ulist *roots;
 	struct ulist *tmp_ulist;
 	int end = 0;
@@ -5526,13 +5527,11 @@  int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	u64 em_len = 0;
 	u64 em_end = 0;
 
+	backref_cache = kzalloc(sizeof(*backref_cache), GFP_KERNEL);
 	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
 	roots = ulist_alloc(GFP_KERNEL);
 	tmp_ulist = ulist_alloc(GFP_KERNEL);
-	if (!roots || !tmp_ulist) {
+	if (!backref_cache || !path || !roots || !tmp_ulist) {
 		ret = -ENOMEM;
 		goto out_free_ulist;
 	}
@@ -5658,7 +5657,8 @@  int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 			 */
 			ret = btrfs_is_data_extent_shared(root, btrfs_ino(inode),
 							  bytenr, roots,
-							  tmp_ulist);
+							  tmp_ulist,
+							  backref_cache);
 			if (ret < 0)
 				goto out_free;
 			if (ret)
@@ -5710,6 +5710,7 @@  int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 			     &cached_state);
 
 out_free_ulist:
+	kfree(backref_cache);
 	btrfs_free_path(path);
 	ulist_free(roots);
 	ulist_free(tmp_ulist);