
btrfs: add extra ending condition for indirect data backref resolution

Message ID 1578044681-25562-1-git-send-email-ethanwu@synology.com (mailing list archive)
State New, archived

Commit Message

ethanwu Jan. 3, 2020, 9:44 a.m. UTC
Btrfs has two types of data backrefs.
For the BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
exact block number. Therefore, we need to call resolve_indirect_refs,
which uses btrfs_search_slot to locate the leaf block. After that,
we need to walk through the leaves to search for the EXTENT_DATA items
whose disk bytenr matches the extent item (add_all_parents).

The only conditions under which we stop searching are:
1. We find a different object id, or the type is not EXTENT_DATA
2. We've already got all the refs we want (total_refs)
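
Roughly, that walk looks like the following (a condensed sketch of
add_all_parents() in fs/btrfs/backref.c, not the exact code; error handling
and the extent_item_pos checks are left out):

    while (!ret && count < total_refs) {    /* stop condition 2 */
        eb = path->nodes[0];
        if (path->slots[0] >= btrfs_header_nritems(eb)) {
            ret = btrfs_next_old_leaf(root, path, time_seq);
            continue;
        }
        btrfs_item_key_to_cpu(eb, &key, path->slots[0]);

        /* stop condition 1: different inode or not an EXTENT_DATA item */
        if (key.objectid != key_for_search->objectid ||
            key.type != BTRFS_EXTENT_DATA_KEY)
            break;

        fi = btrfs_item_ptr(eb, path->slots[0],
                            struct btrfs_file_extent_item);
        disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
        if (disk_byte == wanted_disk_byte) {
            /* record eb->start as a parent of this ref */
            count++;
        }
        ret = btrfs_next_old_item(root, path, time_seq);
    }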

Take the following EXTENT_ITEM as an example:
item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
    extent refs 24 gen 7302 flags DATA
    extent data backref root 257 objectid 260 offset 65536 count 5 #backref entry 1
    extent data backref root 258 objectid 265 offset 0 count 9 #backref entry 2
    shared data backref parent 394985472 count 10 #backref entry 3

If we want to search for backref entry 1, total_refs here would be 24 rather
than the entry's own count of 5.

The reason to use 24 is that some EXTENT_DATA items in block 394985472 of
backref entry 3 also point to EXTENT_ITEM 40831553536. If that block also
belongs to root 257 and lies between the 5 items of backref entry 1, and we
used total_refs = 5, we would end up missing some refs from backref entry 1.

But using total_refs=24 is not accurate either. We'll never find the extent
data keys of backref entry 2, since we searched root 257, not 258. We'll never
reach block 394985472 either, if that block is not a leaf in root 257.
As a result, the loop keeps going until we reach the end of that inode.

Since we're searching for the parent blocks of backref entry 1, we're 100%
sure we'll never find any EXTENT_DATA beyond file offset 65536 + 4194304 =
4259840 that matches this entry. If there's any EXTENT_DATA beyond this range
using this extent item, its backref must be stored in a different backref
entry, and that EXTENT_DATA will be handled when we process that entry.

Fix this by breaking out of the loop once we reach offset + (size of the
EXTENT_ITEM).

btrfs send uses backrefs to search for clone candidates.
Without this patch, performance drops when running the following script.
The script creates a 10G file whose extents are all 64K in size.
It then generates a shared backref for each data extent, and those backrefs
cannot be found when doing btrfs_resolve_indirect_refs.

item 87 key (11843469312 EXTENT_ITEM 65536) itemoff 10475 itemsize 66
    refs 3 gen 74 flags DATA
    extent data backref root 256 objectid 260 offset 10289152 count 2
    # This shared backref couldn't be found when resolving
    # indirect ref from snapshot of sub 256
    shared data backref parent 2303049728 count 1

btrfs subvolume create /volume1/sub1
for i in `seq 1 163840`; do dd if=/dev/zero of=/volume1/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct 2>/dev/null; done
btrfs subvolume snapshot /volume1/sub1 /volume1/sub2
for i in `seq 1 163840`; do dd if=/dev/zero of=/volume1/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct 2>/dev/null; done
btrfs subvolume snapshot -r /volume1/sub1 /volume1/snap1
time btrfs send /volume1/snap1 | btrfs receive /volume2

without this patch
real 69m48.124s
user 0m50.199s
sys  70m15.600s

with this patch
real 1m31.498s
user 0m35.858s
sys  2m55.544s

Signed-off-by: ethanwu <ethanwu@synology.com>
---
 fs/btrfs/backref.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Qu Wenruo Jan. 3, 2020, 10:15 a.m. UTC | #1
On 2020/1/3 5:44 PM, ethanwu wrote:
> Btrfs has two types of data backref.
> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
> exact block number. Therefore, we need to call resolve_indirect_refs
> which uses btrfs_search_slot to locate the leaf block. After that,
> we need to walk through the leafs to search for the EXTENT_DATA items
> that have disk bytenr matching the extent item(add_all_parents).
> 
> The only conditions we'll stop searching are
> 1. We find different object id or type is not EXTENT_DATA
> 2. We've already got all the refs we want(total_refs)
> 
> Take the following EXTENT_ITEM as example:
> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
>     extent refs 24 gen 7302 flags DATA
>     extent data backref root 257 objectid 260 offset 65536 count 5 #backref entry 1
>     extent data backref root 258 objectid 265 offset 0 count 9 #backref entry 2
>     shared data backref parent 394985472 count 10 #backref entry 3
> 
> If we want to search for backref entry 1, total_refs here would be 24 rather
> than its count 5.
> 
> The reason to use 24 is because some EXTENT_DATA in backref entry 3 block
> 394985472 also points to EXTENT_ITEM 40831553536, if this block also belongs to
> root 257 and lies between these 5 items of backref entry 1,
> and we use total_refs = 5, we'll end up missing some refs from backref
> entry 1.

Indeed looks like a problem.

> 
> But using total_refs=24 is not accurate. We'll never find extent data keys in
> backref entry 2, since we searched root 257 not 258. We'll never reach block
> 394985472 either if this block is not a leaf in root 257.
> As a result, the loop keeps on going until we reach the end of that inode.
> 
> Since we're searching for parent block of this backref entry 1,
> we're 100% sure we'll never find any EXTENT_DATA beyond (65536 + 4194304) that
> matching this entry.

The backref offset has always been a bug-prone member, so I'd like to
double-check this.

What if the backref offset already underflows?
Like this:
  item 10 key (13631488 EXTENT_ITEM 1048576) itemoff 15860 itemsize 111
       refs 3 gen 6 flags DATA
       extent data backref root FS_TREE objectid 259 offset 18446744073709547520 count 1 <<<
       extent data backref root FS_TREE objectid 257 offset 0 count 1
       extent data backref root FS_TREE objectid 258 offset 4096 count 1


Since the backref offset is not the file offset, but rather
file offset - file_extent_item::offset, it can be a super large number for
reflinked extents.
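
As a concrete sketch of such an underflow (hypothetical numbers, only to show
the u64 wraparound, matching the dump above):

    u64 file_pos = 0;            /* key.offset of the EXTENT_DATA item */
    u64 extent_offset = 4096;    /* file_extent_item::offset */
    u64 backref_offset = file_pos - extent_offset;
    /* backref_offset == 18446744073709547520, i.e. (u64)-4096 */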


The current kernel handles this with a very ugly but working hack: resetting
key_for_search.offset to 0 in add_prelim_ref() if it detects such a case.

Then this would screw up your check, causing an unexpected early exit.

I guess we have to find a new method to solve the problem then.

Thanks,
Qu

> If there's any EXTENT_DATA with offset beyond this range
> using this extent item, its backref must be stored at different backref entry.
> That EXTENT_DATA will be handled when we process that backref entry.
> 
> Fix this by breaking from loop if we reach offset + (size of EXTENT_ITEM).
> 
> btrfs send use backref to search for clone candidate.
> Without this patch, performance drops when running following script.
> This script creates a 10G file with all of its extent size 64K.
> Then it generates shared backref for each data extent, and
> those backrefs could not be found when doing btrfs_resolve_indirect_refs.
> 
> item 87 key (11843469312 EXTENT_ITEM 65536) itemoff 10475 itemsize 66
>     refs 3 gen 74 flags DATA
>     extent data backref root 256 objectid 260 offset 10289152 count 2
>     # This shared backref couldn't be found when resolving
>     # indirect ref from snapshot of sub 256
>     shared data backref parent 2303049728 count 1
> 
> btrfs subvolume create /volume1/sub1
> for i in `seq 1 163840`; do dd if=/dev/zero of=/volume1/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct 2>/dev/null; done
> btrfs subvolume snapshot /volume1/sub1 /volume1/sub2
> for i in `seq 1 163840`; do dd if=/dev/zero of=/volume1/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct 2>/dev/null; done
> btrfs subvolume snapshot -r /volume1/sub1 /volume1/snap1
> time btrfs send /volume1/snap1 | btrfs receive /volume2
> 
> without this patch
> real 69m48.124s
> user 0m50.199s
> sys  70m15.600s
> 
> with this patch
> real 1m31.498s
> user 0m35.858s
> sys  2m55.544s
> 
> Signed-off-by: ethanwu <ethanwu@synology.com>
> ---
>  fs/btrfs/backref.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index e5d8531..ae64995 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -412,7 +412,7 @@ static int add_indirect_ref(const struct btrfs_fs_info *fs_info,
>  static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path,
>  			   struct ulist *parents, struct prelim_ref *ref,
>  			   int level, u64 time_seq, const u64 *extent_item_pos,
> -			   u64 total_refs, bool ignore_offset)
> +			   u64 total_refs, bool ignore_offset, u64 num_bytes)
>  {
>  	int ret = 0;
>  	int slot;
> @@ -458,6 +458,9 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path,
>  		fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
>  		disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
>  
> +		if (key_for_search->type == BTRFS_EXTENT_DATA_KEY &&
> +		    key.offset >= key_for_search->offset + num_bytes)
> +		       break;
>  		if (disk_byte == wanted_disk_byte) {
>  			eie = NULL;
>  			old = NULL;
> @@ -504,7 +507,7 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
>  				struct btrfs_path *path, u64 time_seq,
>  				struct prelim_ref *ref, struct ulist *parents,
>  				const u64 *extent_item_pos, u64 total_refs,
> -				bool ignore_offset)
> +				bool ignore_offset, u64 num_bytes)
>  {
>  	struct btrfs_root *root;
>  	struct btrfs_key root_key;
> @@ -575,7 +578,8 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
>  	}
>  
>  	ret = add_all_parents(root, path, parents, ref, level, time_seq,
> -			      extent_item_pos, total_refs, ignore_offset);
> +			      extent_item_pos, total_refs, ignore_offset,
> +			      num_bytes);
>  out:
>  	path->lowest_level = 0;
>  	btrfs_release_path(path);
> @@ -610,7 +614,8 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
>  				 struct btrfs_path *path, u64 time_seq,
>  				 struct preftrees *preftrees,
>  				 const u64 *extent_item_pos, u64 total_refs,
> -				 struct share_check *sc, bool ignore_offset)
> +				 struct share_check *sc, bool ignore_offset,
> +				 u64 num_bytes)
>  {
>  	int err;
>  	int ret = 0;
> @@ -655,7 +660,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
>  		}
>  		err = resolve_indirect_ref(fs_info, path, time_seq, ref,
>  					   parents, extent_item_pos,
> -					   total_refs, ignore_offset);
> +					   total_refs, ignore_offset, num_bytes);
>  		/*
>  		 * we can only tolerate ENOENT,otherwise,we should catch error
>  		 * and return directly.
> @@ -1127,6 +1132,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>  	struct extent_inode_elem *eie = NULL;
>  	/* total of both direct AND indirect refs! */
>  	u64 total_refs = 0;
> +	u64 num_bytes = SZ_256M;
>  	struct preftrees preftrees = {
>  		.direct = PREFTREE_INIT,
>  		.indirect = PREFTREE_INIT,
> @@ -1194,6 +1200,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>  				goto again;
>  			}
>  			spin_unlock(&delayed_refs->lock);
> +			num_bytes = head->num_bytes;
>  			ret = add_delayed_refs(fs_info, head, time_seq,
>  					       &preftrees, &total_refs, sc);
>  			mutex_unlock(&head->mutex);
> @@ -1215,6 +1222,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>  		if (key.objectid == bytenr &&
>  		    (key.type == BTRFS_EXTENT_ITEM_KEY ||
>  		     key.type == BTRFS_METADATA_ITEM_KEY)) {
> +			num_bytes = key.offset;
>  			ret = add_inline_refs(fs_info, path, bytenr,
>  					      &info_level, &preftrees,
>  					      &total_refs, sc);
> @@ -1236,7 +1244,8 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>  	WARN_ON(!RB_EMPTY_ROOT(&preftrees.indirect_missing_keys.root.rb_root));
>  
>  	ret = resolve_indirect_refs(fs_info, path, time_seq, &preftrees,
> -				    extent_item_pos, total_refs, sc, ignore_offset);
> +				    extent_item_pos, total_refs, sc, ignore_offset,
> +				    num_bytes);
>  	if (ret)
>  		goto out;
>  
>
ethanwu Jan. 3, 2020, 11:37 a.m. UTC | #2
Qu Wenruo wrote on 2020-01-03 18:15:
> On 2020/1/3 5:44 PM, ethanwu wrote:
>> Btrfs has two types of data backref.
>> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
>> exact block number. Therefore, we need to call resolve_indirect_refs
>> which uses btrfs_search_slot to locate the leaf block. After that,
>> we need to walk through the leafs to search for the EXTENT_DATA items
>> that have disk bytenr matching the extent item(add_all_parents).
>> 
>> The only conditions we'll stop searching are
>> 1. We find different object id or type is not EXTENT_DATA
>> 2. We've already got all the refs we want(total_refs)
>> 
>> Take the following EXTENT_ITEM as example:
>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 
>> 95
>>     extent refs 24 gen 7302 flags DATA
>>     extent data backref root 257 objectid 260 offset 65536 count 5 
>> #backref entry 1
>>     extent data backref root 258 objectid 265 offset 0 count 9 
>> #backref entry 2
>>     shared data backref parent 394985472 count 10 #backref entry 3
>> 
>> If we want to search for backref entry 1, total_refs here would be 24 
>> rather
>> than its count 5.
>> 
>> The reason to use 24 is because some EXTENT_DATA in backref entry 3 
>> block
>> 394985472 also points to EXTENT_ITEM 40831553536, if this block also 
>> belongs to
>> root 257 and lies between these 5 items of backref entry 1,
>> and we use total_refs = 5, we'll end up missing some refs from backref
>> entry 1.
> 
> Indeed looks like a problem.
> 
>> 
>> But using total_refs=24 is not accurate. We'll never find extent data 
>> keys in
>> backref entry 2, since we searched root 257 not 258. We'll never reach 
>> block
>> 394985472 either if this block is not a leaf in root 257.
>> As a result, the loop keeps on going until we reach the end of that 
>> inode.
>> 
>> Since we're searching for parent block of this backref entry 1,
>> we're 100% sure we'll never find any EXTENT_DATA beyond (65536 + 
>> 4194304) that
>> matching this entry.
> 
> Backref offset is always a bug-prone member, thus I hope to double 
> check
> on this.
> 
> What if the backref offset already underflows?
> Like this:
>   item 10 key (13631488 EXTENT_ITEM 1048576) itemoff 15860 itemsize 111
>        refs 3 gen 6 flags DATA
>        extent data backref root FS_TREE objectid 259 offset
> 18446744073709547520 count 1 <<<
>        extent data backref root FS_TREE objectid 257 offset 0 count 1
>        extent data backref root FS_TREE objectid 258 offset 4096 count 
> 1
> 
> 
> Since backref offset is not file offset, but file_extent_item::offset -
> file_offset, it can be a super large number for reflinked extents.
> 
> 
> Current kernel handles this by a very ugly but working hack: resetting
> key_for_search.offset to 0 in add_prelim_ref() if it detects such case.
> 
> Then this would screw up your check, causing unexpected early exit.

Thanks for the reminder.
I think in this case the check won't fail. Even if we revert that working
hack in the future, it still works, as long as the calculation is done in
u64 (the addition simply wraps around).

(u64) 18446744073709547520 = (s64) -4096

Suppose this very large offset is equal to X
            The next line is the original file view.
            [                                 ]
            ^           ^                     ^     ......    ^
            0           (u64)X + num_bytes    EOF             X
       [----oooooooooooo]  Original range to check. - part is
       ^                   the very large offset where no file extents
       X in terms of s64   exist, so actually the range [0,X+num_bytes)
            [oooooooooooooooo]  range to check after hack X=>0
                             ^
                             0 + num_bytes

With my patch, applying this hack only makes my check condition looser,
causing a larger range to be checked (represented by o) compared to no hack.

The only way I think this check could fail would be:
a file at offset 2^64 - 4096 uses offset 0 of a 1MB data extent, so
key_for_search->offset + num_bytes = 2^64 - 4096 + 1048576 = 1044480 (mod 2^64).
Therefore, when iterating through the leaves, we'd break early at
offset 1044480 and leave the EXTENT_DATA key at 2^64 - 4096 behind.
But AFAIK, a file of that size is not allowed in btrfs.
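
For completeness, the wraparound arithmetic of that hypothetical case as C
(made-up values, not from a real dump):

    u64 offset = 18446744073709547520ULL;    /* 2^64 - 4096 */
    u64 num_bytes = 1048576;                 /* 1MB data extent */
    u64 end = offset + num_bytes;            /* wraps around to 1044480 */
    /* any EXTENT_DATA key.offset >= 1044480 would hit the early break */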

Thanks,
ethanwu
> 
> I guess we have to find a new method to solve the problem then.
> 
> Thanks,
> Qu
> 
>> If there's any EXTENT_DATA with offset beyond this range
>> using this extent item, its backref must be stored at different 
>> backref entry.
>> That EXTENT_DATA will be handled when we process that backref entry.
>> 
>> Fix this by breaking from loop if we reach offset + (size of 
>> EXTENT_ITEM).
>> 
>> btrfs send use backref to search for clone candidate.
>> Without this patch, performance drops when running following script.
>> This script creates a 10G file with all of its extent size 64K.
>> Then it generates shared backref for each data extent, and
>> those backrefs could not be found when doing 
>> btrfs_resolve_indirect_refs.
>> 
>> item 87 key (11843469312 EXTENT_ITEM 65536) itemoff 10475 itemsize 66
>>     refs 3 gen 74 flags DATA
>>     extent data backref root 256 objectid 260 offset 10289152 count 2
>>     # This shared backref couldn't be found when resolving
>>     # indirect ref from snapshot of sub 256
>>     shared data backref parent 2303049728 count 1
>> 
>> btrfs subvolume create /volume1/sub1
>> for i in `seq 1 163840`; do dd if=/dev/zero of=/volume1/sub1/file 
>> bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct 2>/dev/null; 
>> done
>> btrfs subvolume snapshot /volume1/sub1 /volume1/sub2
>> for i in `seq 1 163840`; do dd if=/dev/zero of=/volume1/sub1/file 
>> bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct 
>> 2>/dev/null; done
>> btrfs subvolume snapshot -r /volume1/sub1 /volume1/snap1
>> time btrfs send /volume1/snap1 | btrfs receive /volume2
>> 
>> without this patch
>> real 69m48.124s
>> user 0m50.199s
>> sys  70m15.600s
>> 
>> with this patch
>> real 1m31.498s
>> user 0m35.858s
>> sys  2m55.544s
>> 
>> Signed-off-by: ethanwu <ethanwu@synology.com>
>> ---
>>  fs/btrfs/backref.c | 21 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>> 
>> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
>> index e5d8531..ae64995 100644
>> --- a/fs/btrfs/backref.c
>> +++ b/fs/btrfs/backref.c
>> @@ -412,7 +412,7 @@ static int add_indirect_ref(const struct 
>> btrfs_fs_info *fs_info,
>>  static int add_all_parents(struct btrfs_root *root, struct btrfs_path 
>> *path,
>>  			   struct ulist *parents, struct prelim_ref *ref,
>>  			   int level, u64 time_seq, const u64 *extent_item_pos,
>> -			   u64 total_refs, bool ignore_offset)
>> +			   u64 total_refs, bool ignore_offset, u64 num_bytes)
>>  {
>>  	int ret = 0;
>>  	int slot;
>> @@ -458,6 +458,9 @@ static int add_all_parents(struct btrfs_root 
>> *root, struct btrfs_path *path,
>>  		fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
>>  		disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
>> 
>> +		if (key_for_search->type == BTRFS_EXTENT_DATA_KEY &&
>> +		    key.offset >= key_for_search->offset + num_bytes)
>> +		       break;
>>  		if (disk_byte == wanted_disk_byte) {
>>  			eie = NULL;
>>  			old = NULL;
>> @@ -504,7 +507,7 @@ static int resolve_indirect_ref(struct 
>> btrfs_fs_info *fs_info,
>>  				struct btrfs_path *path, u64 time_seq,
>>  				struct prelim_ref *ref, struct ulist *parents,
>>  				const u64 *extent_item_pos, u64 total_refs,
>> -				bool ignore_offset)
>> +				bool ignore_offset, u64 num_bytes)
>>  {
>>  	struct btrfs_root *root;
>>  	struct btrfs_key root_key;
>> @@ -575,7 +578,8 @@ static int resolve_indirect_ref(struct 
>> btrfs_fs_info *fs_info,
>>  	}
>> 
>>  	ret = add_all_parents(root, path, parents, ref, level, time_seq,
>> -			      extent_item_pos, total_refs, ignore_offset);
>> +			      extent_item_pos, total_refs, ignore_offset,
>> +			      num_bytes);
>>  out:
>>  	path->lowest_level = 0;
>>  	btrfs_release_path(path);
>> @@ -610,7 +614,8 @@ static int resolve_indirect_refs(struct 
>> btrfs_fs_info *fs_info,
>>  				 struct btrfs_path *path, u64 time_seq,
>>  				 struct preftrees *preftrees,
>>  				 const u64 *extent_item_pos, u64 total_refs,
>> -				 struct share_check *sc, bool ignore_offset)
>> +				 struct share_check *sc, bool ignore_offset,
>> +				 u64 num_bytes)
>>  {
>>  	int err;
>>  	int ret = 0;
>> @@ -655,7 +660,7 @@ static int resolve_indirect_refs(struct 
>> btrfs_fs_info *fs_info,
>>  		}
>>  		err = resolve_indirect_ref(fs_info, path, time_seq, ref,
>>  					   parents, extent_item_pos,
>> -					   total_refs, ignore_offset);
>> +					   total_refs, ignore_offset, num_bytes);
>>  		/*
>>  		 * we can only tolerate ENOENT,otherwise,we should catch error
>>  		 * and return directly.
>> @@ -1127,6 +1132,7 @@ static int find_parent_nodes(struct 
>> btrfs_trans_handle *trans,
>>  	struct extent_inode_elem *eie = NULL;
>>  	/* total of both direct AND indirect refs! */
>>  	u64 total_refs = 0;
>> +	u64 num_bytes = SZ_256M;
>>  	struct preftrees preftrees = {
>>  		.direct = PREFTREE_INIT,
>>  		.indirect = PREFTREE_INIT,
>> @@ -1194,6 +1200,7 @@ static int find_parent_nodes(struct 
>> btrfs_trans_handle *trans,
>>  				goto again;
>>  			}
>>  			spin_unlock(&delayed_refs->lock);
>> +			num_bytes = head->num_bytes;
>>  			ret = add_delayed_refs(fs_info, head, time_seq,
>>  					       &preftrees, &total_refs, sc);
>>  			mutex_unlock(&head->mutex);
>> @@ -1215,6 +1222,7 @@ static int find_parent_nodes(struct 
>> btrfs_trans_handle *trans,
>>  		if (key.objectid == bytenr &&
>>  		    (key.type == BTRFS_EXTENT_ITEM_KEY ||
>>  		     key.type == BTRFS_METADATA_ITEM_KEY)) {
>> +			num_bytes = key.offset;
>>  			ret = add_inline_refs(fs_info, path, bytenr,
>>  					      &info_level, &preftrees,
>>  					      &total_refs, sc);
>> @@ -1236,7 +1244,8 @@ static int find_parent_nodes(struct 
>> btrfs_trans_handle *trans,
>>  
>> 	WARN_ON(!RB_EMPTY_ROOT(&preftrees.indirect_missing_keys.root.rb_root));
>> 
>>  	ret = resolve_indirect_refs(fs_info, path, time_seq, &preftrees,
>> -				    extent_item_pos, total_refs, sc, ignore_offset);
>> +				    extent_item_pos, total_refs, sc, ignore_offset,
>> +				    num_bytes);
>>  	if (ret)
>>  		goto out;
>> 
>>
Qu Wenruo Jan. 3, 2020, 12:32 p.m. UTC | #3
On 2020/1/3 7:37 PM, ethanwu wrote:
> Qu Wenruo wrote on 2020-01-03 18:15:
>> On 2020/1/3 5:44 PM, ethanwu wrote:
[snip]
>> What if the backref offset already underflows?
>> Like this:
>>   item 10 key (13631488 EXTENT_ITEM 1048576) itemoff 15860 itemsize 111
>>        refs 3 gen 6 flags DATA
>>        extent data backref root FS_TREE objectid 259 offset
>> 18446744073709547520 count 1 <<<
>>        extent data backref root FS_TREE objectid 257 offset 0 count 1
>>        extent data backref root FS_TREE objectid 258 offset 4096 count 1
>>
>>
>> Since backref offset is not file offset, but file_extent_item::offset -
>> file_offset, it can be a super large number for reflinked extents.
>>
>>
>> Current kernel handles this by a very ugly but working hack: resetting
>> key_for_search.offset to 0 in add_prelim_ref() if it detects such case.
>>
>> Then this would screw up your check, causing unexpected early exit.
> 
> Thanks for the reminder.
> I think in this case the check won't fail. Even when we revert the
> working hack
> in the future, it still works unless we use u64 to do the calculation.
> 
> (u64) 18446744073709547520 = (s64) -4096
> 
> Suppose this very large offset is equal to X
>            The next line is the original file view.
>            [                                 ]
>            ^           ^                     ^     ......    ^
>            0           (u64)X + num_bytes    EOF             X
>       [----oooooooooooo]  Original range to check. - part is
>       ^                   the very large offset where no file extents
>       X in terms of s64   exist, so actually the range [0,X+num_bytes)
>            [oooooooooooooooo]  range to check after hack X=>0
>                             ^
>                             0 + num_bytes
> 
> With my patch, applying this hack will only make my check condition looser.
> Causing more range to be checked (represented by o) compared to no hack.

Ah, you're right.

Since file_extent_item::offset can never be larger than the extent size, the
backref offset can only be in the range (file_pos - extent_size, file_pos].

If we get a negative backref offset, it means file_pos -
file_extent_item::offset < 0, which means file_pos < file_extent_item::offset.
And since file_extent_item::offset < extent_size, file_pos < extent_size.

Thus even with the current hack the check still works, as we search from
file_pos 0 and end at file_pos extent_size, which covers the file_extent_item.

Great explanation and patch.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> 
> The only way I think this check would fail would be:
> File at offset 2^64 - 4096 uses offset 0 of a 1MB data extent,
> key_for_search->offset + num_bytes = 2^64 - 4096 + 1048576 = 1044480
> Therefore, when iterating through the leafs, we'll break early at
> offset 1044480, leave the EXTENT_DATA key @2^64 - 4096 behind.
> But AFAIK, file of that size is not allowed in btrfs.
> 
> Thanks,
> ethanwu
>>
>> I guess we have to find a new method to solve the problem then.
>>
>> Thanks,
>> Qu
>>
[...]
>>> +        if (key_for_search->type == BTRFS_EXTENT_DATA_KEY &&
>>> +            key.offset >= key_for_search->offset + num_bytes)
>>> +               break;
>>>          if (disk_byte == wanted_disk_byte) {
Josef Bacik Jan. 3, 2020, 4:31 p.m. UTC | #4
On 1/3/20 4:44 AM, ethanwu wrote:
> Btrfs has two types of data backref.
> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
> exact block number. Therefore, we need to call resolve_indirect_refs
> which uses btrfs_search_slot to locate the leaf block. After that,
> we need to walk through the leafs to search for the EXTENT_DATA items
> that have disk bytenr matching the extent item(add_all_parents).
> 
> The only conditions we'll stop searching are
> 1. We find different object id or type is not EXTENT_DATA
> 2. We've already got all the refs we want(total_refs)
> 
> Take the following EXTENT_ITEM as example:
> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
>      extent refs 24 gen 7302 flags DATA
>      extent data backref root 257 objectid 260 offset 65536 count 5 #backref entry 1
>      extent data backref root 258 objectid 265 offset 0 count 9 #backref entry 2
>      shared data backref parent 394985472 count 10 #backref entry 3
> 
> If we want to search for backref entry 1, total_refs here would be 24 rather
> than its count 5.
> 
> The reason to use 24 is because some EXTENT_DATA in backref entry 3 block
> 394985472 also points to EXTENT_ITEM 40831553536, if this block also belongs to
> root 257 and lies between these 5 items of backref entry 1,
> and we use total_refs = 5, we'll end up missing some refs from backref
> entry 1.
> 

This seems like the crux of the problem here.  The backref stuff is just blindly 
looking for counts, without keeping track of which counts matter.  So for full 
refs we should only be looking down paths where generation > the snapshot 
generation.  And then for the shared refs it should be anything that comes from 
that shared block.  That would be the proper way to fix the problem, not put 
some arbitrary limit on how far into the inode we can search.

That's not to say what you are doing here is wrong; we really won't have
anything past the given extent size, so we can definitely break out earlier.
But what I worry about is: say 394985472 _was_ in between the leaves while
searching down for backref entry #1; we'd end up with duplicate entries and
not catch some of the other entries.  This feels like we need to fix the
backref logic to know whether it's looking for direct refs, and thus only go
down paths with generation > snapshot generation, or shared refs, and thus
only count things that directly point to the parent block.  Thanks,

Josef
ethanwu Jan. 6, 2020, 3:45 a.m. UTC | #5
Josef Bacik wrote on 2020-01-04 00:31:
> On 1/3/20 4:44 AM, ethanwu wrote:
>> Btrfs has two types of data backref.
>> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
>> exact block number. Therefore, we need to call resolve_indirect_refs
>> which uses btrfs_search_slot to locate the leaf block. After that,
>> we need to walk through the leafs to search for the EXTENT_DATA items
>> that have disk bytenr matching the extent item(add_all_parents).
>> 
>> The only conditions we'll stop searching are
>> 1. We find different object id or type is not EXTENT_DATA
>> 2. We've already got all the refs we want(total_refs)
>> 
>> Take the following EXTENT_ITEM as example:
>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 
>> 95
>>      extent refs 24 gen 7302 flags DATA
>>      extent data backref root 257 objectid 260 offset 65536 count 5 
>> #backref entry 1
>>      extent data backref root 258 objectid 265 offset 0 count 9 
>> #backref entry 2
>>      shared data backref parent 394985472 count 10 #backref entry 3
>> 
>> If we want to search for backref entry 1, total_refs here would be 24 
>> rather
>> than its count 5.
>> 
>> The reason to use 24 is because some EXTENT_DATA in backref entry 3 
>> block
>> 394985472 also points to EXTENT_ITEM 40831553536, if this block also 
>> belongs to
>> root 257 and lies between these 5 items of backref entry 1,
>> and we use total_refs = 5, we'll end up missing some refs from backref
>> entry 1.
>> 
> 
> This seems like the crux of the problem here.  The backref stuff is
> just blindly looking for counts, without keeping track of which counts
> matter.  So for full refs we should only be looking down paths where
> generation > the snapshot generation.  And then for the shared refs it
> should be anything that comes from that shared block.  That would be
> the proper way to fix the problem, not put some arbitrary limit on how
> far into the inode we can search.
> 

I am not sure generation can be used to skip blocks for full (indirect)
backrefs.

For example:
create a data extent in subvol 257 at generation 10.
At generation 11, take a snapshot (suppose the snapshot id is 258) of
subvol 257.

When we send snapshot 258, all the tree blocks it searches come from
subvol 257. Since a snapshot only copies the root node from its source,
none of the tree blocks in subvol 257 has generation (all <= 10) > snapshot
generation (11).

Or am I missing something?

> That's not to say what you are doing here is wrong, we really won't
> have anything past the given extent size so we can definitely break
> out earlier.  But what I worry about is say 394985472 _was_ in between
> the leaves while searching down for backref entry #1, we'd end up with
> duplicate entries and not catch some of the other entries.  This feels

This patch doesn't adjust total_refs. Is there any example where this
patch would break the backref walking?

> like we need to fix the backref logic to know if it's looking for
> direct refs, and thus only go down paths with generation > snapshot
> generation, or shared refs and thus only count things that directly
> point to the parent block.  Thanks,
> 

OK, I agree, my patch doesn't solve the original problem:
when resolving indirect refs, we could take entries that don't belong to
the backref entry we are searching for right now.

If this needs to be fixed, I think it could be done in the following way:

item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
    extent refs 24 gen 7302 flags DATA
    shared data backref parent 394985472 count 10 #backref entry 1
    extent data backref root 257 objectid 260 offset 1048576 count 3 #backref entry 2
    extent data backref root 256 objectid 260 offset 65536 count 6 #backref entry 3
    extent data backref root 257 objectid 260 offset 65536 count 5 #backref entry 4

When searching for entry 4, the EXTENT_DATA entries that match the
EXTENT_ITEM bytenr will be in one of the following situations:

1. A shared block that just happens to be part of root 257. For every leaf
   we run into, check its bytenr to see if it is a shared data backref
   entry, and if so skip it. We may need an extra list or rb tree to store
   this information.
2. Same subvol and inode, but a different offset. Right now in
   add_all_parents we only check whether the bytenr matches. Add an extra
   check that the backref offset is also the same (here backref entry 4:
   65536 != entry 2: 1048576).
3. This might happen if subvol 257 is a snapshot of subvol 256: check the
   leaf owner, and if it is not 257, skip it.
4. None of the above; it belongs to backref entry 4, which is what we want,
   so add it!

In this way we only count the entries that matter, and total_refs could be
changed from the total refs of that extent item to the count of the
individual entry, so we could break out of the loop as soon as possible.
A rough sketch of such filtering is below.
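
Something along these lines (hypothetical and untested; is_shared_leaf() and
shared_parents are made-up helpers backed by the extra list/rbtree from
point 1):

    /* inside the add_all_parents() walk, for each EXTENT_DATA item */
    eb = path->nodes[0];

    /* 3. leaf not owned by the root we are resolving: skip it */
    if (btrfs_header_owner(eb) != ref->root_id)
        goto next;

    /* 1. leaf reachable only through a shared data backref: skip it */
    if (is_shared_leaf(shared_parents, eb->start))
        goto next;

    /* 2. and 4. only accept the exact (disk bytenr, backref offset) match */
    if (disk_byte == wanted_disk_byte &&
        key.offset - btrfs_file_extent_offset(eb, fi) ==
        ref->key_for_search.offset)
        count++;    /* this EXTENT_DATA really belongs to this backref */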

Will this look better?

thanks,
ethanwu

> Josef
Josef Bacik Jan. 6, 2020, 4:05 p.m. UTC | #6
On 1/5/20 10:45 PM, ethanwu wrote:
> Josef Bacik wrote on 2020-01-04 00:31:
>> On 1/3/20 4:44 AM, ethanwu wrote:
>>> Btrfs has two types of data backref.
>>> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
>>> exact block number. Therefore, we need to call resolve_indirect_refs
>>> which uses btrfs_search_slot to locate the leaf block. After that,
>>> we need to walk through the leafs to search for the EXTENT_DATA items
>>> that have disk bytenr matching the extent item(add_all_parents).
>>>
>>> The only conditions we'll stop searching are
>>> 1. We find different object id or type is not EXTENT_DATA
>>> 2. We've already got all the refs we want(total_refs)
>>>
>>> Take the following EXTENT_ITEM as example:
>>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
>>>      extent refs 24 gen 7302 flags DATA
>>>      extent data backref root 257 objectid 260 offset 65536 count 5 #backref 
>>> entry 1
>>>      extent data backref root 258 objectid 265 offset 0 count 9 #backref entry 2
>>>      shared data backref parent 394985472 count 10 #backref entry 3
>>>
>>> If we want to search for backref entry 1, total_refs here would be 24 rather
>>> than its count 5.
>>>
>>> The reason to use 24 is because some EXTENT_DATA in backref entry 3 block
>>> 394985472 also points to EXTENT_ITEM 40831553536, if this block also belongs to
>>> root 257 and lies between these 5 items of backref entry 1,
>>> and we use total_refs = 5, we'll end up missing some refs from backref
>>> entry 1.
>>>
>>
>> This seems like the crux of the problem here.  The backref stuff is
>> just blindly looking for counts, without keeping track of which counts
>> matter.  So for full refs we should only be looking down paths where
>> generation > the snapshot generation.  And then for the shared refs it
>> should be anything that comes from that shared block.  That would be
>> the proper way to fix the problem, not put some arbitrary limit on how
>> far into the inode we can search.
>>
> 
> I am not sure if generation could be used to skip blocks for full(indirect) 
> backref.
> 
> For exmple:
> create a data extent in subvol id 257 at generation 10
> At generation 11, take snapshot(suppose the snapshot id is 258) from subvol 257.
> 
> When we send snapshot 258, all the tree blocks it searches comes from subvol 257,
> since snapshot only copy root node from its source,
> none of tree blocks in subvol 257 has generation(all <= 10) > snapshot 
> generation(11)
> 
> Or do I miss something?

Nope, I was saying it wrong, sorry about that.  What I should have said is:
for "backref entry 1" we should _only_ walk down paths that belong to root
257, and then for root 258 we should _only_ walk down paths that belong to
258, and then we do our normal dance for indirect refs.

> 
>> That's not to say what you are doing here is wrong, we really won't
>> have anything past the given extent size so we can definitely break
>> out earlier.  But what I worry about is say 394985472 _was_ in between
>> the leaves while searching down for backref entry #1, we'd end up with
>> duplicate entries and not catch some of the other entries.  This feels
> 
> This patch doesn't adjust the total_refs. Is there any example that
> this patch will ruin the backref walking?

No I'm talking about a general failure of the current code, your patch doesn't 
make it better or worse.

> 
>> like we need to fix the backref logic to know if it's looking for
>> direct refs, and thus only go down paths with generation > snapshot
>> generation, or shared refs and thus only count things that directly
>> point to the parent block.  Thanks,
>>
> 
> Ok, I agree, my patch doesn't solve the original problem:
> When resolving indirect refs, we could take entries that don't
> belong to the backref entry we are searching for right now.
> 
> If this need to be fixed, I think it could be done by the following way
> 
> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
>          extent refs 24 gen 7302 flags DATA
>          shared data backref parent 394985472 count 10 #backref entry 1
>          extent data backref root 257 objectid 260 offset 1048576 count 3 
> #backref entry 2
>          extent data backref root 256 objectid 260 offset 65536 count 6 #backref 
> entry 3
>          extent data backref root 257 objectid 260 offset 65536 count 5 #backref 
> entry 4
> 
> When searching for entry 4, the EXTENT_DATA entries that match the EXTENT_ITEM 
> bytenr
> will be in one of the following situations:
> 
> 1. shared block that just happens to be part of root 257. For every leaf we run 
> into,
>     check its bytenr to see if it is a shared data backref entry, if so skip it.
>     We may need an extra list or rb tree to store this information.

We don't need to worry about this case, because if we have a normal ref then
the whole path down to that bytenr belongs wholly to that root.  The full
backref will only be in paths that were not touched by the referencing root.

> 2. same subvol, inode but different offset. Right now in add_all_parents, we only
>     check if bytenr matches. Adding extra check to see if backref offset is the 
> same
>     (here backref entry 1: 65536 != entry 3: 1048576)

Yeah we definitely need to do this because clone can change the offset and point 
at the same bytenr, so we for sure want to only match on matching offsets.

> 3. This might happen if subvol 257 is a snapshot from subvol 256, check leaf 
> owner, if
>     not 257 skip it.

Yup, this is the "only walk down paths owned by the root" thing.

> 4. None of the above, it's type 4 backref entry, this is what we want, add it!
> 
> In this way, we only count entries that matter, and total_refs could be
> changed from total refs of that extent item to number of each entry.
> Then, we could break from loop as soon as possible.
> 
> Will this look better?
>

Yup this is what I want, because then we are for sure always getting exactly the 
right refs.  I would do

1) Make a path->only_this_root (or something better named) that _only_ walks
down paths that are owned by the original root (a rough sketch follows this
list).  We use this for non-full backref walking (backref entries 2, 3, and 4
in the example above).  If we have a normal extent backref that references a
real root, then we know for sure that if we walk down to that bytenr from
that root, the whole path will be owned by that root.

2) Match on the offset in the extent data ref.  We don't want to find unrelated 
clones, we want exactly the right match.

3) Keep track of how many refs for that backref.  For the backref entry #4 
example we'd use the above strategies and break as soon as we found 5 entries.

4) Take a good hard look at the indirect backref resolution code.  I feel like 
it's probably mostly ok, we just fix the above things for normal backref lookups 
and it'll just work, so we'll definitely only find references that come from 
that indirect backref.  But I haven't looked too closely so I could be wrong.
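
As a very rough illustration of point 1 (only_this_root is the hypothetical
field named above; this only shows the leaf-level form of the check, not real
kernel code):

    /* in the add_all_parents() leaf walk: a leaf that was never COWed
     * into this root cannot carry a normal (non-shared) ref from it,
     * so skip the whole leaf
     */
    if (path->only_this_root &&
        btrfs_header_owner(path->nodes[0]) != root->root_key.objectid) {
        ret = btrfs_next_old_leaf(root, path, time_seq);
        continue;
    }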

I think we're on the same page now, hopefully it's not too much extra work.  But 
it will be by far more robust and reliable.  Thanks,

Josef
ethanwu Jan. 17, 2020, 10:44 a.m. UTC | #7
Josef Bacik wrote on 2020-01-07 00:05:
> On 1/5/20 10:45 PM, ethanwu wrote:
>> Josef Bacik wrote on 2020-01-04 00:31:
>>> On 1/3/20 4:44 AM, ethanwu wrote:
>>>> Btrfs has two types of data backref.
>>>> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
>>>> exact block number. Therefore, we need to call resolve_indirect_refs
>>>> which uses btrfs_search_slot to locate the leaf block. After that,
>>>> we need to walk through the leafs to search for the EXTENT_DATA 
>>>> items
>>>> that have disk bytenr matching the extent item(add_all_parents).
>>>> 
>>>> The only conditions we'll stop searching are
>>>> 1. We find different object id or type is not EXTENT_DATA
>>>> 2. We've already got all the refs we want(total_refs)
>>>> 
>>>> Take the following EXTENT_ITEM as example:
>>>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 
>>>> 95
>>>>      extent refs 24 gen 7302 flags DATA
>>>>      extent data backref root 257 objectid 260 offset 65536 count 5 
>>>> #backref entry 1
>>>>      extent data backref root 258 objectid 265 offset 0 count 9 
>>>> #backref entry 2
>>>>      shared data backref parent 394985472 count 10 #backref entry 3
>>>> 
>>>> If we want to search for backref entry 1, total_refs here would be 
>>>> 24 rather
>>>> than its count 5.
>>>> 
>>>> The reason to use 24 is because some EXTENT_DATA in backref entry 3 
>>>> block
>>>> 394985472 also points to EXTENT_ITEM 40831553536, if this block also 
>>>> belongs to
>>>> root 257 and lies between these 5 items of backref entry 1,
>>>> and we use total_refs = 5, we'll end up missing some refs from 
>>>> backref
>>>> entry 1.
>>>> 
>>> 
>>> This seems like the crux of the problem here.  The backref stuff is
>>> just blindly looking for counts, without keeping track of which 
>>> counts
>>> matter.  So for full refs we should only be looking down paths where
>>> generation > the snapshot generation.  And then for the shared refs 
>>> it
>>> should be anything that comes from that shared block.  That would be
>>> the proper way to fix the problem, not put some arbitrary limit on 
>>> how
>>> far into the inode we can search.
>>> 
>> 
>> I am not sure if generation could be used to skip blocks for 
>> full(indirect) backref.
>> 
>> For exmple:
>> create a data extent in subvol id 257 at generation 10
>> At generation 11, take snapshot(suppose the snapshot id is 258) from 
>> subvol 257.
>> 
>> When we send snapshot 258, all the tree blocks it searches comes from 
>> subvol 257,
>> since snapshot only copy root node from its source,
>> none of tree blocks in subvol 257 has generation(all <= 10) > snapshot 
>> generation(11)
>> 
>> Or do I miss something?
> 
> Nope I was saying it wrong, sorry about that.  What I should say is
> for "backref entry 1" we should _only_ walk down paths that belong to
> root 257, and then for root 258 we _only_ walk down paths that belong
> to 258, and then we do our normal dance for indirect refs.
> 
>> 
>>> That's not to say what you are doing here is wrong, we really won't
>>> have anything past the given extent size so we can definitely break
>>> out earlier.  But what I worry about is say 394985472 _was_ in 
>>> between
>>> the leaves while searching down for backref entry #1, we'd end up 
>>> with
>>> duplicate entries and not catch some of the other entries.  This 
>>> feels
>> 
>> This patch doesn't adjust the total_refs. Is there any example that
>> this patch will ruin the backref walking?
> 
> No I'm talking about a general failure of the current code, your patch
> doesn't make it better or worse.
> 
>> 
>>> like we need to fix the backref logic to know if it's looking for
>>> direct refs, and thus only go down paths with generation > snapshot
>>> generation, or shared refs and thus only count things that directly
>>> point to the parent block.  Thanks,
>>> 
>> 
>> Ok, I agree, my patch doesn't solve the original problem:
>> When resolving indirect refs, we could take entries that don't
>> belong to the backref entry we are searching for right now.
>> 
>> If this need to be fixed, I think it could be done by the following 
>> way
>> 
>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
>>          extent refs 24 gen 7302 flags DATA
>>          shared data backref parent 394985472 count 10 #backref entry 
>> 1
>>          extent data backref root 257 objectid 260 offset 1048576 
>> count 3 #backref entry 2
>>          extent data backref root 256 objectid 260 offset 65536 count 
>> 6 #backref entry 3
>>          extent data backref root 257 objectid 260 offset 65536 count 
>> 5 #backref entry 4
>> 
>> When searching for entry 4, the EXTENT_DATA entries that match the 
>> EXTENT_ITEM bytenr
>> will be in one of the following situations:
>> 
>> 1. shared block that just happens to be part of root 257. For every 
>> leaf we run into,
>>     check its bytenr to see if it is a shared data backref entry, if 
>> so skip it.
>>     We may need an extra list or rb tree to store this information.
> 
> We don't need to worry about this case, because if we have a normal
> ref then the whole path down to that bytenr belongs wholey to that
> root.  The full backref will only be in paths that were not touched by
> the referencing root.
> 

Thank you for the review.
I didn't fully understand the way shared data backrefs are used in btrfs, so
it took me a while to check the backref code and do some experiments.
One place shared backrefs are used is balance: all the items used by the
relocation tree use shared backrefs.

After running balance, if an EXTENT_ITEM is moved, all the back references
of that newly-relocated EXTENT_ITEM become shared, and the block owner is
exactly the original root.

We can then produce a normal reference by just COWing a tree block while
leaving some of the shared backrefs unchanged (dd a 128MB extent and COW
every other 4K block, so these items span many leaves and COWing one block
leaves the other shared backrefs untouched).

In the end, we have both types of back reference from the same root, and yet
the owner of all these blocks is that same root.

Therefore, I think this condition is still needed.

>> 2. same subvol, inode but different offset. Right now in 
>> add_all_parents, we only
>>     check if bytenr matches. Adding extra check to see if backref 
>> offset is the same
>>     (here backref entry 1: 65536 != entry 3: 1048576)
> 
> Yeah we definitely need to do this because clone can change the offset
> and point at the same bytenr, so we for sure want to only match on
> matching offsets.
> 
>> 3. This might happen if subvol 257 is a snapshot from subvol 256, 
>> check leaf owner, if
>>     not 257 skip it.
> 
> Yup, this is the "only walk down paths owned by the root" thing.
> 
>> 4. None of the above, it's type 4 backref entry, this is what we want, 
>> add it!
>> 
>> In this way, we only count entries that matter, and total_refs could 
>> be
>> changed from total refs of that extent item to number of each entry.
>> Then, we could break from loop as soon as possible.
>> 
>> Will this look better?
>> 
> 
> Yup this is what I want, because then we are for sure always getting
> exactly the right refs.  I would do
> 
> 1) Make a path->only_this_root (or something better named) that _only_
> walks down paths that are owned by the original root.  We use this for
> non-full backref walking (backref entry 2, 3, and 4 in the example
> above).  If we have a normal extent backref that references a real
> root then we know for sure if we walk down to that bytenr from that
> root the whole path will be owned by that root.
> 
> 2) Match on the offset in the extent data ref.  We don't want to find
> unrelated clones, we want exactly the right match.
> 
> 3) Keep track of how many refs for that backref.  For the backref
> entry #4 example we'd use the above strategies and break as soon as we
> found 5 entries.
> 
> 4) Take a good hard look at the indirect backref resolution code.  I
> feel like it's probably mostly ok, we just fix the above things for
> normal backref lookups and it'll just work, so we'll definitely only
> find references that come from that indirect backref.  But I haven't
> looked too closely so I could be wrong.
> 
> I think we're on the same page now, hopefully it's not too much extra
> work.  But it will be by far more robust and reliable.  Thanks,
> 

Thanks for the steps. I've been looking at the backref code, and so far I
haven't found anything that might break the idea. I'll start working on it
and do some tests.

Thanks,
ethanwu

> Josef
Josef Bacik Jan. 17, 2020, 2:21 p.m. UTC | #8
On 1/17/20 5:44 AM, ethanwu wrote:
> Josef Bacik wrote on 2020-01-07 00:05:
>> On 1/5/20 10:45 PM, ethanwu wrote:
>>> Josef Bacik wrote on 2020-01-04 00:31:
>>>> On 1/3/20 4:44 AM, ethanwu wrote:
>>>>> Btrfs has two types of data backref.
>>>>> For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
>>>>> exact block number. Therefore, we need to call resolve_indirect_refs
>>>>> which uses btrfs_search_slot to locate the leaf block. After that,
>>>>> we need to walk through the leafs to search for the EXTENT_DATA items
>>>>> that have disk bytenr matching the extent item(add_all_parents).
>>>>>
>>>>> The only conditions we'll stop searching are
>>>>> 1. We find different object id or type is not EXTENT_DATA
>>>>> 2. We've already got all the refs we want(total_refs)
>>>>>
>>>>> Take the following EXTENT_ITEM as example:
>>>>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
>>>>>      extent refs 24 gen 7302 flags DATA
>>>>>      extent data backref root 257 objectid 260 offset 65536 count 5 
>>>>> #backref entry 1
>>>>>      extent data backref root 258 objectid 265 offset 0 count 9 #backref 
>>>>> entry 2
>>>>>      shared data backref parent 394985472 count 10 #backref entry 3
>>>>>
>>>>> If we want to search for backref entry 1, total_refs here would be 24 rather
>>>>> than its count 5.
>>>>>
>>>>> The reason to use 24 is because some EXTENT_DATA in backref entry 3 block
>>>>> 394985472 also points to EXTENT_ITEM 40831553536, if this block also 
>>>>> belongs to
>>>>> root 257 and lies between these 5 items of backref entry 1,
>>>>> and we use total_refs = 5, we'll end up missing some refs from backref
>>>>> entry 1.
>>>>>
>>>>
>>>> This seems like the crux of the problem here.  The backref stuff is
>>>> just blindly looking for counts, without keeping track of which counts
>>>> matter.  So for full refs we should only be looking down paths where
>>>> generation > the snapshot generation.  And then for the shared refs it
>>>> should be anything that comes from that shared block.  That would be
>>>> the proper way to fix the problem, not put some arbitrary limit on how
>>>> far into the inode we can search.
>>>>
>>>
>>> I am not sure if generation could be used to skip blocks for full(indirect) 
>>> backref.
>>>
>>> For exmple:
>>> create a data extent in subvol id 257 at generation 10
>>> At generation 11, take snapshot(suppose the snapshot id is 258) from subvol 257.
>>>
>>> When we send snapshot 258, all the tree blocks it searches comes from subvol 
>>> 257,
>>> since snapshot only copy root node from its source,
>>> none of tree blocks in subvol 257 has generation(all <= 10) > snapshot 
>>> generation(11)
>>>
>>> Or do I miss something?
>>
>> Nope I was saying it wrong, sorry about that.  What I should say is
>> for "backref entry 1" we should _only_ walk down paths that belong to
>> root 257, and then for root 258 we _only_ walk down paths that belong
>> to 258, and then we do our normal dance for indirect refs.
>>
>>>
>>>> That's not to say what you are doing here is wrong, we really won't
>>>> have anything past the given extent size so we can definitely break
>>>> out earlier.  But what I worry about is say 394985472 _was_ in between
>>>> the leaves while searching down for backref entry #1, we'd end up with
>>>> duplicate entries and not catch some of the other entries.  This feels
>>>
>>> This patch doesn't adjust the total_refs. Is there any example that
>>> this patch will ruin the backref walking?
>>
>> No I'm talking about a general failure of the current code, your patch
>> doesn't make it better or worse.
>>
>>>
>>>> like we need to fix the backref logic to know if it's looking for
>>>> direct refs, and thus only go down paths with generation > snapshot
>>>> generation, or shared refs and thus only count things that directly
>>>> point to the parent block.  Thanks,
>>>>
>>>
>>> Ok, I agree, my patch doesn't solve the original problem:
>>> When resolving indirect refs, we could take entries that don't
>>> belong to the backref entry we are searching for right now.
>>>
>>> If this need to be fixed, I think it could be done by the following way
>>>
>>> item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
>>>          extent refs 24 gen 7302 flags DATA
>>>          shared data backref parent 394985472 count 10 #backref entry 1
>>>          extent data backref root 257 objectid 260 offset 1048576 count 3 
>>> #backref entry 2
>>>          extent data backref root 256 objectid 260 offset 65536 count 6 
>>> #backref entry 3
>>>          extent data backref root 257 objectid 260 offset 65536 count 5 
>>> #backref entry 4
>>>
>>> When searching for entry 4, the EXTENT_DATA entries that match the 
>>> EXTENT_ITEM bytenr
>>> will be in one of the following situations:
>>>
>>> 1. shared block that just happens to be part of root 257. For every leaf we 
>>> run into,
>>>     check its bytenr to see if it is a shared data backref entry, if so skip it.
>>>     We may need an extra list or rb tree to store this information.
>>
>> We don't need to worry about this case, because if we have a normal
>> ref then the whole path down to that bytenr belongs wholey to that
>> root.  The full backref will only be in paths that were not touched by
>> the referencing root.
>>
> 
> Thank you for the review,
> I don't fully understand the way shared data backref is used in btrfs.
> It took me a while to check the backref code and do some experiment.
> One way shared backref will be used is balance, all the items used
> by relocation tree use shared backref.
> 
> After running balance, if any EXTENT_ITEM is moved during balance,
> all the back reference of that newly-located EXTENT_ITEM will become
> shared, and the block owner is exactly the original root.
> 
> We could then produce normal reference by just COWIng the tree block,
> and leaving some of the shared backrefs unchanged,(dd an 128MB extent
> and cow every other 4K blocks, so these items span across many
> leafs and COWing one block leaves the other shared backrefs untouched)
> 
> In the end, we have two types of back reference from the same root,
> and yet the owner of all these blocks are the same root.
> 
> Therefore, I think this condition is still needed.

Yes, you can definitely have a shared ref pointing back to a root that you
have a real ref for, but my point is that you treat this separately.  If you
have a shared block, you _won't_ have a real ref for the items in that leaf
from the same root.  They should be exclusive of any real reference you have.
Thanks,

Josef

Patch

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e5d8531..ae64995 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -412,7 +412,7 @@  static int add_indirect_ref(const struct btrfs_fs_info *fs_info,
 static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path,
 			   struct ulist *parents, struct prelim_ref *ref,
 			   int level, u64 time_seq, const u64 *extent_item_pos,
-			   u64 total_refs, bool ignore_offset)
+			   u64 total_refs, bool ignore_offset, u64 num_bytes)
 {
 	int ret = 0;
 	int slot;
@@ -458,6 +458,9 @@  static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path,
 		fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
 		disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
 
+		if (key_for_search->type == BTRFS_EXTENT_DATA_KEY &&
+		    key.offset >= key_for_search->offset + num_bytes)
+		       break;
 		if (disk_byte == wanted_disk_byte) {
 			eie = NULL;
 			old = NULL;
@@ -504,7 +507,7 @@  static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 				struct btrfs_path *path, u64 time_seq,
 				struct prelim_ref *ref, struct ulist *parents,
 				const u64 *extent_item_pos, u64 total_refs,
-				bool ignore_offset)
+				bool ignore_offset, u64 num_bytes)
 {
 	struct btrfs_root *root;
 	struct btrfs_key root_key;
@@ -575,7 +578,8 @@  static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 	}
 
 	ret = add_all_parents(root, path, parents, ref, level, time_seq,
-			      extent_item_pos, total_refs, ignore_offset);
+			      extent_item_pos, total_refs, ignore_offset,
+			      num_bytes);
 out:
 	path->lowest_level = 0;
 	btrfs_release_path(path);
@@ -610,7 +614,8 @@  static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 				 struct btrfs_path *path, u64 time_seq,
 				 struct preftrees *preftrees,
 				 const u64 *extent_item_pos, u64 total_refs,
-				 struct share_check *sc, bool ignore_offset)
+				 struct share_check *sc, bool ignore_offset,
+				 u64 num_bytes)
 {
 	int err;
 	int ret = 0;
@@ -655,7 +660,7 @@  static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 		}
 		err = resolve_indirect_ref(fs_info, path, time_seq, ref,
 					   parents, extent_item_pos,
-					   total_refs, ignore_offset);
+					   total_refs, ignore_offset, num_bytes);
 		/*
 		 * we can only tolerate ENOENT,otherwise,we should catch error
 		 * and return directly.
@@ -1127,6 +1132,7 @@  static int find_parent_nodes(struct btrfs_trans_handle *trans,
 	struct extent_inode_elem *eie = NULL;
 	/* total of both direct AND indirect refs! */
 	u64 total_refs = 0;
+	u64 num_bytes = SZ_256M;
 	struct preftrees preftrees = {
 		.direct = PREFTREE_INIT,
 		.indirect = PREFTREE_INIT,
@@ -1194,6 +1200,7 @@  static int find_parent_nodes(struct btrfs_trans_handle *trans,
 				goto again;
 			}
 			spin_unlock(&delayed_refs->lock);
+			num_bytes = head->num_bytes;
 			ret = add_delayed_refs(fs_info, head, time_seq,
 					       &preftrees, &total_refs, sc);
 			mutex_unlock(&head->mutex);
@@ -1215,6 +1222,7 @@  static int find_parent_nodes(struct btrfs_trans_handle *trans,
 		if (key.objectid == bytenr &&
 		    (key.type == BTRFS_EXTENT_ITEM_KEY ||
 		     key.type == BTRFS_METADATA_ITEM_KEY)) {
+			num_bytes = key.offset;
 			ret = add_inline_refs(fs_info, path, bytenr,
 					      &info_level, &preftrees,
 					      &total_refs, sc);
@@ -1236,7 +1244,8 @@  static int find_parent_nodes(struct btrfs_trans_handle *trans,
 	WARN_ON(!RB_EMPTY_ROOT(&preftrees.indirect_missing_keys.root.rb_root));
 
 	ret = resolve_indirect_refs(fs_info, path, time_seq, &preftrees,
-				    extent_item_pos, total_refs, sc, ignore_offset);
+				    extent_item_pos, total_refs, sc, ignore_offset,
+				    num_bytes);
 	if (ret)
 		goto out;