
[v3,3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time

Message ID 20191010023928.24586-4-wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time

Commit Message

Qu Wenruo Oct. 10, 2019, 2:39 a.m. UTC
The overall idea of the new BG_TREE is pretty simple:
Put BLOCK_GROUP_ITEMS into a separate tree.
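
In practice (as the patch below does at each affected call site), block
group item handling just picks a different root depending on the incompat
bit, while the key layout and item format stay unchanged:

	if (btrfs_fs_incompat(fs_info, BG_TREE))
		root = fs_info->bg_root;
	else
		root = fs_info->extent_root;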

This brings one obvious enhancement:
- Reduce mount time of large fs

Although it would be possible to accept BLOCK_GROUP_ITEMs in either
tree (extent root or bg root), I'll leave that in-kernel convert as an
alternative to the offline convert, as a next step if there is enough
interest in it.

So for now, if an existing fs wants to take advantage of the BG_TREE
feature, btrfs-progs will provide an offline conversion tool.

[[Benchmark]]
Physical device:	NVMe SSD
VM device:		VirtIO block device, backed by a sparse file
Nodesize:		4K  (to bump up tree height)
Extent data size:	4M
Fs size used:		1T

All file extents on disk are 4M in size and preallocated to reduce space
usage (as the VM uses a loopback block device backed by a sparse file).

Without the patchset:
Using ftrace function graph:

 7)               |  open_ctree [btrfs]() {
 7)               |    btrfs_read_block_groups [btrfs]() {
 7) @ 805851.8 us |    }
 7) @ 911890.2 us |  }

 btrfs_read_block_groups() takes 88% of the total mount time.

With the patchset, using the -O bg-tree mkfs option:

 6)               |  open_ctree [btrfs]() {
 6)               |    btrfs_read_block_groups [btrfs]() {
 6) * 91204.69 us |    }
 6) @ 192039.5 us |  }

  open_ctree() time is only 21% of the original mount time, and
  btrfs_read_block_groups() now takes only 47% of the total open_ctree()
  execution time.

The reason is pretty obvious when considering how many tree blocks need
to be read from disk:
- Original extent tree:
  nodes:	55
  leaves:	1025
  total:	1080
- Block group tree:
  nodes:	1
  leaves:	13
  total:	14
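
For reference, the block group tree numbers roughly match a
back-of-envelope estimate (assuming the default 1G data block group
size, and on-disk struct sizes of 101 bytes for the header, 25 bytes
per item header and 24 bytes per block group item):

	1T used / 1G per block group    ~= 1024 BLOCK_GROUP_ITEMs
	(4096 - 101) / (25 + 24)        ~=   81 items per 4K leaf
	1024 / 81                       ~=   13 leaves

so the whole block group tree fits in one node plus ~13 leaves.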

Not to mention that tree block readahead works pretty well for the bg
tree, as we will read every item anyway.
Readahead for the extent tree, on the other hand, is just a disaster, as
block group items are scattered across the whole extent tree.

The reduction in mount time is already obvious even on a super fast NVMe
disk with memory cache.
It would be even more obvious if the fs were on spinning rust.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c          | 104 ++++++++++++++++++++++++++------
 fs/btrfs/ctree.h                |   5 +-
 fs/btrfs/disk-io.c              |  13 ++++
 fs/btrfs/sysfs.c                |   2 +
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |   3 +
 6 files changed, 110 insertions(+), 18 deletions(-)

Comments

Naohiro Aota Oct. 10, 2019, 5:21 a.m. UTC | #1
On Thu, Oct 10, 2019 at 10:39:28AM +0800, Qu Wenruo wrote:
>diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
>index b65c7ee75bc7..1d1e50c42d0e 100644
>--- a/include/uapi/linux/btrfs_tree.h
>+++ b/include/uapi/linux/btrfs_tree.h
>@@ -48,6 +48,9 @@
> /* tracks free space in block groups. */
> #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
>
>+/* sotre BLOCK_GROUP_ITEMs in a seprate tree */
>+#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
>+

nit: "store" and "separate"?

> /* device stats in the device tree */
> #define BTRFS_DEV_STATS_OBJECTID 0ULL
>
>-- 
>2.23.0
>
Josef Bacik Oct. 11, 2019, 1:23 p.m. UTC | #2
On Thu, Oct 10, 2019 at 10:39:28AM +0800, Qu Wenruo wrote:
> [ full commit message and benchmark numbers snipped ]

You need to add

fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;

to btrfs_init_global_block_rsv, otherwise bad things will happen.  Thanks,

Josef
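
(For context, a minimal sketch of where such an assignment could sit in
btrfs_init_global_block_rsv(), next to the existing extent root hookup;
the incompat guard is an assumption, since bg_root is only populated
when the BG_TREE feature is set:)

	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;
	if (btrfs_fs_incompat(fs_info, BG_TREE))
		fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;
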
Anand Jain Oct. 14, 2019, 9:08 a.m. UTC | #3
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 044981cf6df9..1c5728e6a660 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1570,6 +1570,8 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>   	if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID)
>   		return fs_info->free_space_root ? fs_info->free_space_root :
>   						  ERR_PTR(-ENOENT);
> +	if (location->objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID)
> +		return fs_info->bg_root ? fs_info->bg_root: ERR_PTR(-ENOENT);

  An explicit check is better than an implicit one, for example:
------------
return btrfs_fs_incompat(fs_info, BG_TREE) ? \
	fs_info->bg_root : ERR_PTR(-ENOENT);
------------


Thanks, Anand

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 0c5eef0610fa..0101fd42b586 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -861,7 +861,7 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			     u64 group_start, struct extent_map *em)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_root *root;
 	struct btrfs_path *path;
 	struct btrfs_block_group_cache *block_group;
 	struct btrfs_free_cluster *cluster;
@@ -880,6 +880,11 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	BUG_ON(!block_group);
 	BUG_ON(!block_group->ro);
 
+	if (btrfs_fs_incompat(fs_info, BG_TREE))
+		root = fs_info->bg_root;
+	else
+		root = fs_info->extent_root;
+
 	trace_btrfs_remove_block_group(block_group);
 	/*
 	 * Free the reserved super bytes from this block group before
@@ -1790,6 +1795,56 @@  static int read_one_block_group(struct btrfs_fs_info *info,
 	return ret;
 }
 
+static int read_block_group_tree(struct btrfs_fs_info *info, int need_clear)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret;
+
+	key.objectid = 0;
+	key.offset = 0;
+	key.type = 0;
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(NULL, info->bg_root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+	if (ret == 0) {
+		btrfs_err(info,
+			  "found invalid block group bytenr %llu len %llu",
+			  key.objectid, key.offset);
+		ret = -EUCLEAN;
+		goto out;
+	}
+
+	while (1) {
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		if (key.type != BTRFS_BLOCK_GROUP_ITEM_KEY) {
+			btrfs_err(info,
+		"found invalid key (%llu, %u, %llu) in block group tree",
+				  key.objectid, key.type, key.offset);
+			ret = -EUCLEAN;
+			goto out;
+		}
+
+		ret = read_one_block_group(info, path, need_clear);
+		if (ret < 0)
+			goto out;
+		ret = btrfs_next_item(info->bg_root, path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -1815,20 +1870,26 @@  int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	if (btrfs_test_opt(info, CLEAR_CACHE))
 		need_clear = 1;
 
-	while (1) {
-		ret = find_first_block_group(info, path, &key);
-		if (ret > 0)
-			break;
-		if (ret != 0)
-			goto error;
-
-		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-		ret = read_one_block_group(info, path, need_clear);
+	if (btrfs_fs_incompat(info, BG_TREE)) {
+		ret = read_block_group_tree(info, need_clear);
 		if (ret < 0)
 			goto error;
-		key.objectid += key.offset;
-		key.offset = 0;
-		btrfs_release_path(path);
+	} else {
+		while (1) {
+			ret = find_first_block_group(info, path, &key);
+			if (ret > 0)
+				break;
+			if (ret != 0)
+				goto error;
+
+			btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+			ret = read_one_block_group(info, path, need_clear);
+			if (ret < 0)
+				goto error;
+			key.objectid += key.offset;
+			key.offset = 0;
+			btrfs_release_path(path);
+		}
 	}
 
 	list_for_each_entry_rcu(space_info, &info->space_info, list) {
@@ -1863,7 +1924,7 @@  void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_block_group_cache *block_group;
-	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_root *root;
 	struct btrfs_block_group_item item;
 	struct btrfs_key key;
 	int ret = 0;
@@ -1871,6 +1932,11 @@  void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 	if (!trans->can_flush_pending_bgs)
 		return;
 
+	if (btrfs_fs_incompat(fs_info, BG_TREE))
+		root = fs_info->bg_root;
+	else
+		root = fs_info->extent_root;
+
 	while (!list_empty(&trans->new_bgs)) {
 		block_group = list_first_entry(&trans->new_bgs,
 					       struct btrfs_block_group_cache,
@@ -1883,7 +1949,7 @@  void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		memcpy(&key, &block_group->key, sizeof(key));
 		spin_unlock(&block_group->lock);
 
-		ret = btrfs_insert_item(trans, extent_root, &key, &item,
+		ret = btrfs_insert_item(trans, root, &key, &item,
 					sizeof(item));
 		if (ret)
 			btrfs_abort_transaction(trans, ret);
@@ -2119,11 +2185,15 @@  static int write_one_cache_group(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int ret;
-	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_root *root;
 	unsigned long bi;
 	struct extent_buffer *leaf;
 
-	ret = btrfs_search_slot(trans, extent_root, &cache->key, path, 0, 1);
+	if (btrfs_fs_incompat(fs_info, BG_TREE))
+		root = fs_info->bg_root;
+	else
+		root = fs_info->extent_root;
+	ret = btrfs_search_slot(trans, root, &cache->key, path, 0, 1);
 	if (ret) {
 		if (ret > 0)
 			ret = -ENOENT;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 19d669d12ca1..c98d008fb42f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -291,7 +291,8 @@  struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
+	 BTRFS_FEATURE_INCOMPAT_BG_TREE)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -538,6 +539,7 @@  struct btrfs_fs_info {
 	struct btrfs_root *quota_root;
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *free_space_root;
+	struct btrfs_root *bg_root;
 
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
@@ -2661,6 +2663,7 @@  static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 	kfree(fs_info->quota_root);
 	kfree(fs_info->uuid_root);
 	kfree(fs_info->free_space_root);
+	kfree(fs_info->bg_root);
 	kfree(fs_info->super_copy);
 	kfree(fs_info->super_for_commit);
 	kvfree(fs_info);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 044981cf6df9..1c5728e6a660 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1570,6 +1570,8 @@  struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 	if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID)
 		return fs_info->free_space_root ? fs_info->free_space_root :
 						  ERR_PTR(-ENOENT);
+	if (location->objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID)
+		return fs_info->bg_root ? fs_info->bg_root: ERR_PTR(-ENOENT);
 again:
 	root = btrfs_lookup_fs_root(fs_info, location->objectid);
 	if (root) {
@@ -2041,6 +2043,7 @@  static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
 	if (chunk_root)
 		free_root_extent_buffers(info->chunk_root);
 	free_root_extent_buffers(info->free_space_root);
+	free_root_extent_buffers(info->bg_root);
 }
 
 void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info)
@@ -2333,6 +2336,16 @@  static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
 	set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
 	fs_info->extent_root = root;
 
+	if (btrfs_fs_incompat(fs_info, BG_TREE)) {
+		location.objectid = BTRFS_BLOCK_GROUP_TREE_OBJECTID;
+		root = btrfs_read_tree_root(tree_root, &location);
+		if (IS_ERR(root)) {
+			ret = PTR_ERR(root);
+			goto out;
+		}
+		set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+		fs_info->bg_root = root;
+	}
 	location.objectid = BTRFS_DEV_TREE_OBJECTID;
 	root = btrfs_read_tree_root(tree_root, &location);
 	if (IS_ERR(root)) {
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index f6d3c80f2e28..799f7e18d6ee 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -257,6 +257,7 @@  BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
 BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
+BTRFS_FEAT_ATTR_INCOMPAT(bg_tree, BG_TREE);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 
 static struct attribute *btrfs_supported_feature_attrs[] = {
@@ -272,6 +273,7 @@  static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
+	BTRFS_FEAT_ATTR_PTR(bg_tree),
 	NULL
 };
 
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 3ee0678c0a83..0f91ebc07064 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -270,6 +270,7 @@  struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_BG_TREE		(1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index b65c7ee75bc7..1d1e50c42d0e 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -48,6 +48,9 @@ 
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* sotre BLOCK_GROUP_ITEMs in a seprate tree */
+#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL