From patchwork Sat Jun 3 18:52:19 2023
X-Patchwork-Submitter: iker vagyok
X-Patchwork-Id: 13266223
From: iker vagyok
Date: Sat, 3 Jun 2023 20:52:19 +0200
Subject: Different sized disks, allocation strategy
To: linux-btrfs@vger.kernel.org

Hello! I am curious if the multiple disk allocation strategy for btrfs could be improved for my use case.
The current situation is:

# btrfs filesystem usage -T /
Overall:
    Device size:                  32.74TiB
    Device allocated:             15.66TiB
    Device unallocated:           17.08TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         15.57TiB
    Free (estimated):              8.58TiB      (min: 4.31TiB)
    Free (statfs, df):             6.12TiB
    Data ratio:                       2.00
    Metadata ratio:                   4.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data     Metadata System
Id Path      RAID1    RAID1C4  RAID1C4  Unallocated Total    Slack
-- --------- -------- -------- -------- ----------- -------- -----
 2 /dev/sdc2  4.18TiB 13.00GiB 32.00MiB     4.91TiB  9.09TiB     -
 3 /dev/sda2        -  7.00GiB        -     2.72TiB  2.72TiB     -
 6 /dev/sde2        -  6.00GiB 32.00MiB     2.72TiB  2.72TiB     -
 8 /dev/sdb2  5.72TiB 13.00GiB 32.00MiB     3.37TiB  9.09TiB     -
 9 /dev/sdd2  5.71TiB 13.00GiB 32.00MiB     3.37TiB  9.09TiB     -
-- --------- -------- -------- -------- ----------- -------- -----
   Total      7.80TiB 13.00GiB 32.00MiB    17.08TiB 32.74TiB 0.00B
   Used       7.77TiB  9.95GiB  1.19MiB

As you can see, my server has 2*3TB and 3*10TB HDDs and uses RAID1 for data and RAID1C4 for metadata. This works fine, and the smaller devices are even used for RAID1C4 (metadata), as there are not enough big drives to hold four copies. But the smaller drives are not used for RAID1 (data) until the bigger disks have filled up to the point where each has less than 3TB remaining free; only then do all the disks take part in the RAID setup (see the P.S. at the end for how the current comparator produces this behavior).

I propose a simple change to the strategy: btrfs would still fill up the big drives faster, but would not wait until ~7TB is used on each big drive before it starts to use the 3TB drives as well. It would be nice if all disks were used in proportion to their relative free space. IMO this wouldn't change the behavior for big setups with same-sized devices, and it would make rebuilds and drive wear much easier to anticipate on smaller setups.

I browsed through the kernel code and believe that this could be a very simple and contained change - but my C knowledge is limited. Something in the ballpark of the diff below.

What do you think about the proposal?

Thank you and best regards,
Balázs

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 841e799dece5..db36c01a62be 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5049,6 +5049,18 @@ static int btrfs_cmp_device_info(const void *a, const void *b)
 	const struct btrfs_device_info *di_a = a;
 	const struct btrfs_device_info *di_b = b;
 
+	if (di_a->total_avail > 0 && di_b->total_avail > 0) {
+		/* Scale before dividing (div64_u64 is from linux/math64.h):
+		 * a plain u64 division of max_avail / total_avail would
+		 * truncate the ratio to 0 on any non-empty device. */
+		u64 ratio_a = div64_u64(di_a->max_avail * 1000, di_a->total_avail);
+		u64 ratio_b = div64_u64(di_b->max_avail * 1000, di_b->total_avail);
+
+		if (ratio_a > ratio_b)
+			return -1;
+		if (ratio_a < ratio_b)
+			return 1;
+	}
 	if (di_a->max_avail > di_b->max_avail)
 		return -1;
 	if (di_a->max_avail < di_b->max_avail)
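
P.S. For reference, this is the current comparator in fs/btrfs/volumes.c as I read it in mainline (shortened to the relevant function; treat it as my possibly imperfect reading rather than authoritative). Devices are sorted purely by max_avail, the largest contiguous allocatable space, with total_avail only as a tie-breaker - which is why every new chunk lands on the 10TB drives until their free space has shrunk to the level of the 3TB drives:

/*
 * Sort devices in descending order by max_avail, then total_avail.
 * The hunk above inserts the ratio comparison before these checks.
 */
static int btrfs_cmp_device_info(const void *a, const void *b)
{
	const struct btrfs_device_info *di_a = a;
	const struct btrfs_device_info *di_b = b;

	/* Biggest contiguous allocatable space wins... */
	if (di_a->max_avail > di_b->max_avail)
		return -1;
	if (di_a->max_avail < di_b->max_avail)
		return 1;
	/* ...then most total free space as a tie-breaker. */
	if (di_a->total_avail > di_b->total_avail)
		return -1;
	if (di_a->total_avail < di_b->total_avail)
		return 1;
	return 0;
}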