
[v3,3/3] btrfs: statfs: Use virtual chunk allocation to calculate available data space

Message ID 20200106061343.18772-4-wqu@suse.com (mailing list archive)
State New, archived
Series Introduce per-profile available space array to avoid over-confident can_overcommit()

Commit Message

Qu Wenruo Jan. 6, 2020, 6:13 a.m. UTC
Although btrfs_calc_avail_data_space() tries to estimate how many data
chunks can still be allocated, the estimation is far from perfect:

- Metadata over-commit is not considered at all
- Chunk allocation doesn't take RAID5/6 into consideration

Although the current per-profile available space array is not able to
handle metadata over-commit by itself, the virtual chunk infrastructure
can be re-used to address the above problems.

This patch will change btrfs_calc_avail_data_space() to do the following
things:
- Do metadata virtual chunk allocation first
  This is to address the over-commit behavior.
  If current metadata chunks have enough free space, we can completely
  skip this step.

- Allocate as many data virtual chunks as possible
  Just like what we do in the per-profile available space estimation.
  Here we only need to calculate one profile, since the statfs() call is
  a relatively cold path.

Now statfs() should be able to report a near-perfect estimation of the
available data space, and can handle RAID5/6 better.
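
To illustrate the idea, here is a stand-alone user-space sketch of the
two phases (not the kernel code; the device table, the fixed 256M step
and the raid parameters are made-up simplifications of what
btrfs_alloc_virtual_chunk() does):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NDEV 5
static uint64_t dev_free[NDEV];		/* unallocated bytes per device */

static int cmp_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
	return (x < y) - (x > y);
}

/*
 * Grab one "virtual chunk": an equal stripe from every device that still
 * has room (at least devs_min of them, at most devs_max if non-zero).
 * Returns the usable bytes gained, or 0 when nothing fits (the -ENOSPC case).
 */
static uint64_t alloc_virtual_chunk(int devs_min, int devs_max,
				    int nparity, int ncopies,
				    uint64_t stripe_len)
{
	int num = 0;

	qsort(dev_free, NDEV, sizeof(dev_free[0]), cmp_desc);
	while (num < NDEV && dev_free[num] >= stripe_len)
		num++;
	if (devs_max && num > devs_max)
		num = devs_max;
	if (num < devs_min)
		return 0;
	for (int i = 0; i < num; i++)
		dev_free[i] -= stripe_len;
	return (uint64_t)(num - nparity) * stripe_len / ncopies;
}

int main(void)
{
	uint64_t meta_rsv = 1ULL << 30;	/* over-committed metadata reservation */
	uint64_t free_meta = 0;		/* free space in existing meta chunks  */
	uint64_t chunk = 256ULL << 20;	/* fixed per-step chunk size (toy)     */
	uint64_t got, data_avail = 0;

	for (int i = 0; i < NDEV; i++)	/* 1G..5G devices, as in the benchmark */
		dev_free[i] = (uint64_t)(i + 1) << 30;

	/* Phase 1: back the not-yet-covered metadata with RAID1 virtual chunks */
	while (meta_rsv > free_meta) {
		got = alloc_virtual_chunk(2, 2, 0, 2, chunk);
		if (!got)		/* can't even hold metadata: report 0 */
			goto out;
		free_meta += got;
	}

	/* Phase 2: fill what is left with RAID5 data chunks, as wide as possible */
	while ((got = alloc_virtual_chunk(2, 0, 1, 1, chunk)) > 0)
		data_avail += got;
out:
	printf("estimated available data space: %llu MiB\n",
	       (unsigned long long)(data_avail >> 20));
	return 0;
}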

[BENCHMARK]
For the performance difference, here is the benchmark:
 Disk layout:
 - devid 1:	1G
 - devid 2:	2G
 - devid 3:	3G
 - devid 4:	4G
 - devid 5:	5G
 metadata:	RAID1
 data:		RAID5

 This layout should be the worst case for RAID5, as the allocation can
 go from a 5-disk RAID5 down to a 2-disk RAID5, leaving some space
 unusable.

 Then use ftrace to trace the execution time of btrfs_statfs() after
 allocating a 1G data chunk. Both runs have 12 samples.
				avg		std
 Patched:			17.59 us	7.04
 Without patch (v5.5-rc2):	14.98 us	6.16

When the fs is cold, there is a small performance drop for this
particular case, as we need to do several more iterations to calculate
the correct RAID5 data space.
But it's still pretty good, and won't block the chunk allocator for any
observable time.

When the fs is hot, the performance bottleneck is the chunk_mutex, where
the most common and longest holder would be __btrfs_chunk_alloc().
In that case, we may sleep much longer as __btrfs_chunk_alloc() can
trigger IO.

Since the new implementation is not observably slower than the old one,
and won't cause any meaningful delay for the chunk allocator, it should
be more or less OK.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/super.c   | 190 ++++++++++++++++-----------------------------
 fs/btrfs/volumes.c |  12 +--
 fs/btrfs/volumes.h |   4 +
 3 files changed, 79 insertions(+), 127 deletions(-)

Comments

Josef Bacik Jan. 6, 2020, 2:44 p.m. UTC | #1
On 1/6/20 1:13 AM, Qu Wenruo wrote:
> [...]

Sorry I should have been more specific.  Taking the chunk_mutex here means all 
the people running statfs at the same time are going to be serialized.  Can we 
just do what overcommit does and take the already calculated free space instead 
of doing the calculation again?  Thanks,

Josef
Qu Wenruo Jan. 7, 2020, 1:03 a.m. UTC | #2
On 2020/1/6 10:44 PM, Josef Bacik wrote:
> On 1/6/20 1:13 AM, Qu Wenruo wrote:
>> [...]
> 
> Sorry I should have been more specific.  Taking the chunk_mutex here
> means all the people running statfs at the same time are going to be
> serialized.  Can we just do what overcommit does and take the already
> calculated free space instead of doing the calculation again?  Thanks,

If we want to take metadata over-commit into consideration, then we can't.

E.g. if we have only a 256M metadata chunk, but over-commit to reserve
1G of metadata space, we should account for that 768M in the metadata
profile allocation before we do the data allocation.

And since metadata can use a completely different profile, we have to
go through this complex calculation again.
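
A minimal sketch of that accounting, using the numbers above (names and
values here are illustrative; in the patch this is the meta_rsv/free_meta
handling in btrfs_calc_avail_data_space()):

	u64 meta_rsv  = SZ_1G;		/* reserved via metadata over-commit      */
	u64 free_meta = SZ_256M;	/* free space in existing metadata chunks */
	u64 need_meta = 0;

	if (meta_rsv > free_meta)
		need_meta = meta_rsv - free_meta;	/* 768M to be carved out of
							   unallocated device space
							   with the metadata profile
							   before estimating data */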

Thanks,
Qu

> 
> Josef

Patch

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f452a94abdc3..4c0ba0f633ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1893,119 +1893,88 @@  static inline void btrfs_descending_sort_devices(
 /*
  * The helper to calc the free space on the devices that can be used to store
  * file data.
+ *
+ * The calculation will:
+ * - Allocate enough metadata virtual chunks to fulfill over-commit
+ *   To ensure we still have enough space to contain metadata chunks
+ * - Allocate as many data virtual chunks as possible
+ *   To get a true estimation on available data free space.
+ *
+ * Only with such a comprehensive check can we get a good result considering
+ * all the uneven disk layouts.
  */
 static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
-					      u64 *free_bytes)
+					      u64 free_meta, u64 *result)
 {
 	struct btrfs_device_info *devices_info;
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *device;
-	u64 type;
-	u64 avail_space;
-	u64 min_stripe_size;
-	int num_stripes = 1;
-	int i = 0, nr_devices;
-	const struct btrfs_raid_attr *rattr;
+	u64 meta_index;
+	u64 data_index;
+	u64 meta_rsv;
+	u64 meta_allocated = 0;
+	u64 data_allocated = 0;
+	u64 allocated;
+	int nr_devices;
+	int ret = 0;
 
-	/*
-	 * We aren't under the device list lock, so this is racy-ish, but good
-	 * enough for our purposes.
-	 */
-	nr_devices = fs_info->fs_devices->open_devices;
-	if (!nr_devices) {
-		smp_mb();
-		nr_devices = fs_info->fs_devices->open_devices;
-		ASSERT(nr_devices);
-		if (!nr_devices) {
-			*free_bytes = 0;
-			return 0;
-		}
-	}
+	spin_lock(&fs_info->global_block_rsv.lock);
+	meta_rsv = fs_info->global_block_rsv.size;
+	spin_unlock(&fs_info->global_block_rsv.lock);
 
+	mutex_lock(&fs_info->chunk_mutex);
+	nr_devices = fs_devices->rw_devices;
 	devices_info = kmalloc_array(nr_devices, sizeof(*devices_info),
 			       GFP_KERNEL);
-	if (!devices_info)
-		return -ENOMEM;
-
-	/* calc min stripe number for data space allocation */
-	type = btrfs_data_alloc_profile(fs_info);
-	rattr = &btrfs_raid_array[btrfs_bg_flags_to_raid_index(type)];
-
-	if (type & BTRFS_BLOCK_GROUP_RAID0)
-		num_stripes = nr_devices;
-	else if (type & BTRFS_BLOCK_GROUP_RAID1)
-		num_stripes = 2;
-	else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
-		num_stripes = 3;
-	else if (type & BTRFS_BLOCK_GROUP_RAID1C4)
-		num_stripes = 4;
-	else if (type & BTRFS_BLOCK_GROUP_RAID10)
-		num_stripes = 4;
-
-	/* Adjust for more than 1 stripe per device */
-	min_stripe_size = rattr->dev_stripes * BTRFS_STRIPE_LEN;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
-		if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
-						&device->dev_state) ||
-		    !device->bdev ||
-		    test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
-			continue;
-
-		if (i >= nr_devices)
-			break;
-
-		avail_space = device->total_bytes - device->bytes_used;
-
-		/* align with stripe_len */
-		avail_space = rounddown(avail_space, BTRFS_STRIPE_LEN);
-
-		/*
-		 * In order to avoid overwriting the superblock on the drive,
-		 * btrfs starts at an offset of at least 1MB when doing chunk
-		 * allocation.
-		 *
-		 * This ensures we have at least min_stripe_size free space
-		 * after excluding 1MB.
-		 */
-		if (avail_space <= SZ_1M + min_stripe_size)
-			continue;
-
-		avail_space -= SZ_1M;
-
-		devices_info[i].dev = device;
-		devices_info[i].max_avail = avail_space;
-
-		i++;
+	if (!devices_info) {
+		ret = -ENOMEM;
+		goto out;
 	}
-	rcu_read_unlock();
-
-	nr_devices = i;
 
-	btrfs_descending_sort_devices(devices_info, nr_devices);
-
-	i = nr_devices - 1;
-	avail_space = 0;
-	while (nr_devices >= rattr->devs_min) {
-		num_stripes = min(num_stripes, nr_devices);
-
-		if (devices_info[i].max_avail >= min_stripe_size) {
-			int j;
-			u64 alloc_size;
-
-			avail_space += devices_info[i].max_avail * num_stripes;
-			alloc_size = devices_info[i].max_avail;
-			for (j = i + 1 - num_stripes; j <= i; j++)
-				devices_info[j].max_avail -= alloc_size;
-		}
-		i--;
-		nr_devices--;
+	data_index = btrfs_bg_flags_to_raid_index(
+			btrfs_data_alloc_profile(fs_info));
+	meta_index = btrfs_bg_flags_to_raid_index(
+			btrfs_metadata_alloc_profile(fs_info));
+
+	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
+		device->virtual_allocated = 0;
+
+	/* Current metadata space is enough, no need to bother meta space */
+	if (meta_rsv <= free_meta)
+		goto data_only;
+
+	/* Allocate space for exceeding meta space */
+	while (meta_allocated < meta_rsv - free_meta) {
+		ret = btrfs_alloc_virtual_chunk(fs_info, devices_info,
+				meta_index,
+				meta_rsv - free_meta - meta_allocated,
+				&allocated);
+		if (ret < 0)
+			goto out;
+		meta_allocated += allocated;
+	}
+data_only:
+	/*
+	 * meta virtual chunks have been allocated, now allocate data virtual
+	 * chunks
+	 */
+	while (ret == 0) {
+		ret = btrfs_alloc_virtual_chunk(fs_info, devices_info,
+				data_index, -1, &allocated);
+		if (ret < 0)
+			goto out;
+		data_allocated += allocated;
 	}
 
+out:
+	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
+		device->virtual_allocated = 0;
+	mutex_unlock(&fs_info->chunk_mutex);
 	kfree(devices_info);
-	*free_bytes = avail_space;
-	return 0;
+	*result = data_allocated;
+	if (ret == -ENOSPC)
+		ret = 0;
+	return ret;
 }
 
 /*
@@ -2034,7 +2003,6 @@  static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	unsigned factor = 1;
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	int ret;
-	u64 thresh = 0;
 	int mixed = 0;
 
 	rcu_read_lock();
@@ -2082,31 +2050,11 @@  static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 		buf->f_bfree = 0;
 	spin_unlock(&block_rsv->lock);
 
-	buf->f_bavail = div_u64(total_free_data, factor);
-	ret = btrfs_calc_avail_data_space(fs_info, &total_free_data);
+	ret = btrfs_calc_avail_data_space(fs_info, total_free_meta,
+					  &buf->f_bavail);
 	if (ret)
 		return ret;
-	buf->f_bavail += div_u64(total_free_data, factor);
 	buf->f_bavail = buf->f_bavail >> bits;
-
-	/*
-	 * We calculate the remaining metadata space minus global reserve. If
-	 * this is (supposedly) smaller than zero, there's no space. But this
-	 * does not hold in practice, the exhausted state happens where's still
-	 * some positive delta. So we apply some guesswork and compare the
-	 * delta to a 4M threshold.  (Practically observed delta was ~2M.)
-	 *
-	 * We probably cannot calculate the exact threshold value because this
-	 * depends on the internal reservations requested by various
-	 * operations, so some operations that consume a few metadata will
-	 * succeed even if the Avail is zero. But this is better than the other
-	 * way around.
-	 */
-	thresh = SZ_4M;
-
-	if (!mixed && total_free_meta - thresh < block_rsv->size)
-		buf->f_bavail = 0;
-
 	buf->f_type = BTRFS_SUPER_MAGIC;
 	buf->f_bsize = dentry->d_sb->s_blocksize;
 	buf->f_namelen = BTRFS_NAME_LEN;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e38930390e89..135948343932 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2658,10 +2658,10 @@  static int btrfs_cmp_device_info(const void *a, const void *b)
  *    Such virtual chunks won't take on-disk space, thus called virtual, and
  *    only affects per-profile available space calulation.
  */
-static int alloc_virtual_chunk(struct btrfs_fs_info *fs_info,
-			       struct btrfs_device_info *devices_info,
-			       enum btrfs_raid_types type,
-			       u64 to_alloc, u64 *allocated)
+int btrfs_alloc_virtual_chunk(struct btrfs_fs_info *fs_info,
+			      struct btrfs_device_info *devices_info,
+			      enum btrfs_raid_types type,
+			      u64 to_alloc, u64 *allocated)
 {
 	const struct btrfs_raid_attr *raid_attr = &btrfs_raid_array[type];
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
@@ -2763,8 +2763,8 @@  static int calc_one_profile_avail(struct btrfs_fs_info *fs_info,
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
 		device->virtual_allocated = 0;
 	while (ret == 0) {
-		ret = alloc_virtual_chunk(fs_info, devices_info, type,
-					  (u64)-1, &allocated);
+		ret = btrfs_alloc_virtual_chunk(fs_info, devices_info, type,
+						(u64)-1, &allocated);
 		if (ret == 0)
 			result += allocated;
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5cddfe7cfee8..0b4fe2603b0e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -460,6 +460,10 @@  int btrfs_grow_device(struct btrfs_trans_handle *trans,
 		      struct btrfs_device *device, u64 new_size);
 struct btrfs_device *btrfs_find_device(struct btrfs_fs_devices *fs_devices,
 				       u64 devid, u8 *uuid, u8 *fsid, bool seed);
+int btrfs_alloc_virtual_chunk(struct btrfs_fs_info *fs_info,
+			      struct btrfs_device_info *devices_info,
+			      enum btrfs_raid_types type,
+			      u64 to_alloc, u64 *allocated);
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *path);
 int btrfs_balance(struct btrfs_fs_info *fs_info,