From patchwork Fri Oct 23 13:58:04 2020
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 1/8] btrfs: do not shorten unpin len for caching block groups
Date: Fri, 23 Oct 2020 09:58:04 -0400
Message-Id: <1e88615a596a6d811954832a744d105f94e42645.1603460665.git.josef@toxicpanda.com>

While fixing up our ->last_byte_to_unpin locking I noticed that we will
shorten len based on ->last_byte_to_unpin if we're caching when we're
adding back the free space.  This is correct for the free space, as we
cannot unpin more than ->last_byte_to_unpin.  However, we use len to
adjust the ->bytes_pinned counters and such, which need to track the
actual pinned usage.  This could result in
WARN_ON(space_info->bytes_pinned) triggering at unmount time.

Fix this by using a local variable for the amount to add to the free
space cache, and leave len untouched in this case.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
 fs/btrfs/extent-tree.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5fd60b13f4f8..a98f484a2fc1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,10 +2816,10 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		len = cache->start + cache->length - start;
 		len = min(len, end + 1 - start);
 
-		if (start < cache->last_byte_to_unpin) {
-			len = min(len, cache->last_byte_to_unpin - start);
-			if (return_free_space)
-				btrfs_add_free_space(cache, start, len);
+		if (start < cache->last_byte_to_unpin && return_free_space) {
+			u64 add_len = min(len,
+					  cache->last_byte_to_unpin - start);
+			btrfs_add_free_space(cache, start, add_len);
 		}
 
 		start += len;
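To make the accounting invariant concrete, here is a minimal userspace C
sketch of the bug (illustrative only, not kernel code; bytes_pinned and
free_space are stand-ins for space_info->bytes_pinned and the free space
cache, and the byte values are made up):

#include <assert.h>
#include <stdint.h>

static uint64_t bytes_pinned;	/* stand-in for space_info->bytes_pinned */
static uint64_t free_space;	/* stand-in for the free space cache */

static void unpin_range(uint64_t start, uint64_t len,
			uint64_t last_byte_to_unpin)
{
	if (start < last_byte_to_unpin) {
		/* Only bytes below last_byte_to_unpin may be added back
		 * to the free space cache while caching is in progress. */
		uint64_t add_len = len;

		if (last_byte_to_unpin - start < add_len)
			add_len = last_byte_to_unpin - start;
		free_space += add_len;
	}
	/* The pinned counter must drop by the full unpinned length;
	 * shortening len here is what left bytes_pinned non-zero at
	 * unmount and tripped the WARN_ON. */
	bytes_pinned -= len;
}

int main(void)
{
	bytes_pinned = 8;
	unpin_range(0, 8, 4);		/* only [0,4) goes to the cache */
	assert(free_space == 4);
	assert(bytes_pinned == 0);	/* the old code would leave 4 here */
	return 0;
}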
From patchwork Fri Oct 23 13:58:05 2020
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/8] btrfs: update last_byte_to_unpin in switch_commit_roots
Date: Fri, 23 Oct 2020 09:58:05 -0400

While writing an explanation for the need of the commit_root_sem for
btrfs_prepare_extent_commit, I realized we have a slight hole that could
result in leaked space if we have to do the old style caching.  Consider
the following scenario

 commit root
 +----+----+----+----+----+----+----+
 |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
 +----+----+----+----+----+----+----+
 0    1    2    3    4    5    6    7

 new commit root
 +----+----+----+----+----+----+----+
 |    |    |    |\\\\|    |    |\\\\|
 +----+----+----+----+----+----+----+
 0    1    2    3    4    5    6    7

Prior to this patch, we run btrfs_prepare_extent_commit, which updates
the last_byte_to_unpin, and then we subsequently run
switch_commit_roots.  In this example let's assume that
caching_ctl->progress == 1 at btrfs_prepare_extent_commit() time, which
means that cache->last_byte_to_unpin == 1.  Then we go and do the
switch_commit_roots(), but in the meantime the caching thread has made
some more progress, because we dropped the commit_root_sem and
re-acquired it.  Now caching_ctl->progress == 3.  We swap out the commit
root and carry on to unpin.

In the unpin code we have last_byte_to_unpin == 1, so we unpin [0,1),
but do not unpin [2,3).  However, because caching_ctl->progress == 3 we
do not see the newly freed section of [2,3), and thus do not add it to
our free space cache.  This results in us missing a chunk of free space
in memory.

Fix this by making sure the ->last_byte_to_unpin is set at the same time
that we swap the commit roots; this ensures that we will always be
consistent.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
 fs/btrfs/ctree.h       |  1 -
 fs/btrfs/extent-tree.c | 25 -------------------------
 fs/btrfs/transaction.c | 41 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8a83bce3225c..41c76db65c8e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2592,7 +2592,6 @@ int btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
 				   u64 start, u64 len, int delalloc);
 int btrfs_pin_reserved_extent(struct btrfs_trans_handle *trans, u64 start,
 			      u64 len);
-void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info);
 int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans);
 int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_ref *generic_ref);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a98f484a2fc1..ee7bceace8b3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2730,31 +2730,6 @@ btrfs_inc_block_group_reservations(struct btrfs_block_group *bg)
 	atomic_inc(&bg->reservations);
 }
 
-void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_caching_control *next;
-	struct btrfs_caching_control *caching_ctl;
-	struct btrfs_block_group *cache;
-
-	down_write(&fs_info->commit_root_sem);
-
-	list_for_each_entry_safe(caching_ctl, next,
-				 &fs_info->caching_block_groups, list) {
-		cache = caching_ctl->block_group;
-		if (btrfs_block_group_done(cache)) {
-			cache->last_byte_to_unpin = (u64)-1;
-			list_del_init(&caching_ctl->list);
-			btrfs_put_caching_control(caching_ctl);
-		} else {
-			cache->last_byte_to_unpin = caching_ctl->progress;
-		}
-	}
-
-	up_write(&fs_info->commit_root_sem);
-
-	btrfs_update_global_block_rsv(fs_info);
-}
-
 /*
  * Returns the free cluster for the given space info and sets empty_cluster to
  * what it should be based on the mount options.
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 52ada47aff50..9ef6cba1eb59 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -155,6 +155,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
 	struct btrfs_transaction *cur_trans = trans->transaction;
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *root, *tmp;
+	struct btrfs_caching_control *caching_ctl, *next;
 
 	down_write(&fs_info->commit_root_sem);
 	list_for_each_entry_safe(root, tmp, &cur_trans->switch_commits,
@@ -180,6 +181,44 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
 		spin_lock(&cur_trans->dropped_roots_lock);
 	}
 	spin_unlock(&cur_trans->dropped_roots_lock);
+
+	/*
+	 * We have to update the last_byte_to_unpin under the commit_root_sem,
+	 * at the same time we swap out the commit roots.
+	 *
+	 * This is because we must have a real view of the last spot the caching
+	 * kthreads were while caching.  Consider the following views of the
+	 * extent tree for a block group
+	 *
+	 * commit root
+	 * +----+----+----+----+----+----+----+
+	 * |\\\\|    |\\\\|\\\\|    |\\\\|\\\\|
+	 * +----+----+----+----+----+----+----+
+	 * 0    1    2    3    4    5    6    7
+	 *
+	 * new commit root
+	 * +----+----+----+----+----+----+----+
+	 * |    |    |    |\\\\|    |    |\\\\|
+	 * +----+----+----+----+----+----+----+
+	 * 0    1    2    3    4    5    6    7
+	 *
+	 * If the cache_ctl->progress was at 3, then we are only allowed to
+	 * unpin [0,1) and [2,3], because the caching thread has already
+	 * processed those extents.  We are not allowed to unpin [5,6), because
+	 * the caching thread will re-start its search from 3, and thus find
+	 * the hole from [4,6) to add to the free space cache.
+	 */
+	list_for_each_entry_safe(caching_ctl, next,
+				 &fs_info->caching_block_groups, list) {
+		struct btrfs_block_group *cache = caching_ctl->block_group;
+		if (btrfs_block_group_done(cache)) {
+			cache->last_byte_to_unpin = (u64)-1;
+			list_del_init(&caching_ctl->list);
+			btrfs_put_caching_control(caching_ctl);
+		} else {
+			cache->last_byte_to_unpin = caching_ctl->progress;
+		}
+	}
+
 	up_write(&fs_info->commit_root_sem);
 }
 
@@ -2293,8 +2332,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto unlock_tree_log;
 	}
 
-	btrfs_prepare_extent_commit(fs_info);
-
 	cur_trans = fs_info->running_transaction;
 
 	btrfs_set_root_node(&fs_info->tree_root->root_item,
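The window the patch closes can be shown with a short userspace C sketch
(illustrative only; progress and unpin_limit stand in for
caching_ctl->progress and cache->last_byte_to_unpin, and the steps are a
sequential trace of what the two critical sections allow):

#include <stdio.h>

int main(void)
{
	unsigned long progress = 1;	/* caching_ctl->progress */
	unsigned long unpin_limit;	/* cache->last_byte_to_unpin */

	/* Old scheme: btrfs_prepare_extent_commit() samples progress... */
	unpin_limit = progress;

	/* ...the commit_root_sem is dropped, so the caching thread can
	 * advance before switch_commit_roots() runs... */
	progress = 3;

	/* ...then the commit root is swapped.  The range
	 * [unpin_limit, progress) == [1,3) is now invisible to both the
	 * unpin path and the caching scan: leaked in-memory free space. */
	printf("leaked range: [%lu,%lu)\n", unpin_limit, progress);
	return 0;
}

Doing the sample and the swap in one write-locked section, as the patch
does, makes such a trace impossible: progress cannot move between the
two steps.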
From patchwork Fri Oct 23 13:58:06 2020
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 3/8] btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range
Date: Fri, 23 Oct 2020 09:58:06 -0400
Message-Id: <129622d0259e8e3209d4c9f9fe9a44e58a011b93.1603460665.git.josef@toxicpanda.com>

Currently unpin_extent_range happens in the transaction commit context,
so we are protected from ->last_byte_to_unpin changing while we're
unpinning, because any new transactions would have to wait for us to
complete before modifying ->last_byte_to_unpin.

However, in the future we may want to change how this works, for
instance with async unpinning or other such TODO items.  To prepare for
that future, explicitly protect ->last_byte_to_unpin with the
commit_root_sem so we are sure it won't change while we're doing our
work.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
 fs/btrfs/extent-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ee7bceace8b3..5d3564b077bf 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2791,11 +2791,13 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		len = cache->start + cache->length - start;
 		len = min(len, end + 1 - start);
 
+		down_read(&fs_info->commit_root_sem);
 		if (start < cache->last_byte_to_unpin && return_free_space) {
 			u64 add_len = min(len,
 					  cache->last_byte_to_unpin - start);
 			btrfs_add_free_space(cache, start, add_len);
 		}
+		up_read(&fs_info->commit_root_sem);
 
 		start += len;
 		total_unpinned += len;
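The locking rule this makes explicit is a classic reader/writer split:
the unpin path only reads ->last_byte_to_unpin, the commit-root switch
writes it.  A minimal userspace sketch of the same rule, using a POSIX
rwlock in place of the kernel rwsem (illustrative; the names mirror the
kernel code but nothing here is btrfs itself):

#include <pthread.h>

static pthread_rwlock_t commit_root_sem = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long last_byte_to_unpin;

/* unpin path: sample the limit under the shared (read) lock */
static unsigned long read_unpin_limit(void)
{
	unsigned long limit;

	pthread_rwlock_rdlock(&commit_root_sem);
	limit = last_byte_to_unpin;
	pthread_rwlock_unlock(&commit_root_sem);
	return limit;
}

/* commit path: update the limit under the exclusive (write) lock */
static void switch_commit_roots(unsigned long progress)
{
	pthread_rwlock_wrlock(&commit_root_sem);
	last_byte_to_unpin = progress;
	pthread_rwlock_unlock(&commit_root_sem);
}

int main(void)
{
	switch_commit_roots(3);
	return read_unpin_limit() == 3 ? 0 : 1;
}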
From patchwork Fri Oct 23 13:58:07 2020
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 4/8] btrfs: cleanup btrfs_discard_update_discardable usage
Date: Fri, 23 Oct 2020 09:58:07 -0400

This function passes in both the block_group and the free_space_ctl, but
we can get the ctl from the block group itself.  Part of this is because
we call it from __load_free_space_cache, which can be called for the
inode cache as well.  Move that call into the block group specific load
section, wrap it in the right lock that we need, and fix up the
arguments to only take the block group.  Add a lockdep_assert as well
for good measure to make sure we don't mess up the locking again.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
 fs/btrfs/discard.c          |  7 ++++---
 fs/btrfs/discard.h          |  3 +--
 fs/btrfs/free-space-cache.c | 14 ++++++++------
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 741c7e19c32f..5a88b584276f 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -563,15 +563,14 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
 /**
  * btrfs_discard_update_discardable - propagate discard counters
  * @block_group: block_group of interest
- * @ctl: free_space_ctl of @block_group
  *
  * This propagates deltas of counters up to the discard_ctl.  It maintains a
  * current counter and a previous counter passing the delta up to the global
  * stat.  Then the current counter value becomes the previous counter value.
  */
-void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
-				      struct btrfs_free_space_ctl *ctl)
+void btrfs_discard_update_discardable(struct btrfs_block_group *block_group)
 {
+	struct btrfs_free_space_ctl *ctl;
 	struct btrfs_discard_ctl *discard_ctl;
 	s32 extents_delta;
 	s64 bytes_delta;
@@ -581,8 +580,10 @@ void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
 	    !btrfs_is_block_group_data_only(block_group))
 		return;
 
+	ctl = block_group->free_space_ctl;
 	discard_ctl = &block_group->fs_info->discard_ctl;
 
+	lockdep_assert_held(&ctl->tree_lock);
 	extents_delta = ctl->discardable_extents[BTRFS_STAT_CURR] -
 			ctl->discardable_extents[BTRFS_STAT_PREV];
 	if (extents_delta) {
diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 353228d62f5a..57b9202f427f 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -28,8 +28,7 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl);
 
 /* Update operations */
 void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl);
-void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
-				      struct btrfs_free_space_ctl *ctl);
+void btrfs_discard_update_discardable(struct btrfs_block_group *block_group);
 
 /* Setup/cleanup operations */
 void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 5ea36a06e514..0787339c7b93 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -828,7 +828,6 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 	merge_space_tree(ctl);
 	ret = 1;
 out:
-	btrfs_discard_update_discardable(ctl->private, ctl);
 	io_ctl_free(&io_ctl);
 	return ret;
 free_cache:
@@ -929,6 +928,9 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
 			   block_group->start);
 	}
 
+	spin_lock(&ctl->tree_lock);
+	btrfs_discard_update_discardable(block_group);
+	spin_unlock(&ctl->tree_lock);
 	iput(inode);
 	return ret;
 }
@@ -2508,7 +2510,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 	if (ret)
 		kmem_cache_free(btrfs_free_space_cachep, info);
 out:
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 
 	if (ret) {
@@ -2643,7 +2645,7 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
 		goto again;
 	}
 out_lock:
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 out:
 	return ret;
@@ -2779,7 +2781,7 @@ void __btrfs_remove_free_space_cache(struct btrfs_free_space_ctl *ctl)
 	spin_lock(&ctl->tree_lock);
 	__btrfs_remove_free_space_cache_locked(ctl);
 	if (ctl->private)
-		btrfs_discard_update_discardable(ctl->private, ctl);
+		btrfs_discard_update_discardable(ctl->private);
 	spin_unlock(&ctl->tree_lock);
 }
 
@@ -2801,7 +2803,7 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group *block_group)
 		cond_resched_lock(&ctl->tree_lock);
 	}
 	__btrfs_remove_free_space_cache_locked(ctl);
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 
 }
@@ -2885,7 +2887,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 		link_free_space(ctl, entry);
 	}
 out:
-	btrfs_discard_update_discardable(block_group, ctl);
+	btrfs_discard_update_discardable(block_group);
 	spin_unlock(&ctl->tree_lock);
 
 	if (align_gap_len)
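The lockdep_assert_held() added here is the real kernel helper; the idea
is that a function with a locking precondition documents and enforces it
instead of taking extra parameters.  A userspace sketch of the same
convention (illustrative; tree_lock_held is a crude stand-in for
lockdep's bookkeeping, which the kernel does for you):

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct ctl {
	pthread_mutex_t tree_lock;
	bool tree_lock_held;	/* stand-in for lockdep's tracking */
};

/* Must be called with ctl->tree_lock held, and now says so. */
static void update_discardable(struct ctl *ctl)
{
	assert(ctl->tree_lock_held);	/* ~ lockdep_assert_held() */
	/* ... propagate counter deltas here ... */
}

static void caller(struct ctl *ctl)
{
	pthread_mutex_lock(&ctl->tree_lock);
	ctl->tree_lock_held = true;

	update_discardable(ctl);	/* OK: lock is held */

	ctl->tree_lock_held = false;
	pthread_mutex_unlock(&ctl->tree_lock);
}

int main(void)
{
	struct ctl ctl = { .tree_lock = PTHREAD_MUTEX_INITIALIZER };

	caller(&ctl);
	return 0;
}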
From patchwork Fri Oct 23 13:58:08 2020
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 5/8] btrfs: load free space cache into a temporary ctl
Date: Fri, 23 Oct 2020 09:58:08 -0400

The free space cache has been special in that we would load it right
away instead of farming the work off to a worker thread.  This resulted
in some weirdness that had to be taken into account, namely that if we
ever found a block group being cached the fast way we had to wait for it
to finish, because we could get the cache before it had been validated
and we might throw the cache away.

To handle this particular case, instead create a temporary
btrfs_free_space_ctl to load the free space cache into.  Then, once
we've validated that it makes sense, copy its contents into the actual
block_group->free_space_ctl.  This allows us to avoid the problem of
needing to wait for the caching to complete, we can clean up the discard
extent handling stuff in __load_free_space_cache, and we no longer need
to do the merge_space_tree(), because the space is added one by one into
the real free_space_ctl.  This will allow further reworks of how we
handle loading the free space cache.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
 fs/btrfs/block-group.c       |  29 +------
 fs/btrfs/free-space-cache.c  | 155 +++++++++++++++--------------------
 fs/btrfs/free-space-cache.h  |   3 +-
 fs/btrfs/tests/btrfs-tests.c |   2 +-
 4 files changed, 70 insertions(+), 119 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bb6685711824..adbd18dc08a1 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -695,33 +695,6 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
 
 	spin_lock(&cache->lock);
-	/*
-	 * This should be a rare occasion, but this could happen I think in the
-	 * case where one thread starts to load the space cache info, and then
-	 * some other thread starts a transaction commit which tries to do an
-	 * allocation while the other thread is still loading the space cache
-	 * info.  The previous loop should have kept us from choosing this block
-	 * group, but if we've moved to the state where we will wait on caching
-	 * block groups we need to first check if we're doing a fast load here,
-	 * so we can wait for it to finish, otherwise we could end up allocating
-	 * from a block group who's cache gets evicted for one reason or
-	 * another.
-	 */
-	while (cache->cached == BTRFS_CACHE_FAST) {
-		struct btrfs_caching_control *ctl;
-
-		ctl = cache->caching_ctl;
-		refcount_inc(&ctl->count);
-		prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
-		spin_unlock(&cache->lock);
-
-		schedule();
-
-		finish_wait(&ctl->wait, &wait);
-		btrfs_put_caching_control(ctl);
-		spin_lock(&cache->lock);
-	}
-
 	if (cache->cached != BTRFS_CACHE_NO) {
 		spin_unlock(&cache->lock);
 		kfree(caching_ctl);
@@ -1805,7 +1778,7 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(
 	INIT_LIST_HEAD(&cache->discard_list);
 	INIT_LIST_HEAD(&cache->dirty_list);
 	INIT_LIST_HEAD(&cache->io_list);
-	btrfs_init_free_space_ctl(cache);
+	btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
 	atomic_set(&cache->frozen, 0);
 	mutex_init(&cache->free_space_lock);
 	btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0787339c7b93..58bd2d3e54db 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -33,8 +33,6 @@ struct btrfs_trim_range {
 	struct list_head list;
 };
 
-static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl,
-				struct btrfs_free_space *bitmap_info);
 static int link_free_space(struct btrfs_free_space_ctl *ctl,
 			   struct btrfs_free_space *info);
 static void unlink_free_space(struct btrfs_free_space_ctl *ctl,
@@ -43,6 +41,14 @@ static int btrfs_wait_cache_io_root(struct btrfs_root *root,
 				    struct btrfs_trans_handle *trans,
 				    struct btrfs_io_ctl *io_ctl,
 				    struct btrfs_path *path);
+static int search_bitmap(struct btrfs_free_space_ctl *ctl,
+			 struct btrfs_free_space *bitmap_info, u64 *offset,
+			 u64 *bytes, bool for_alloc);
+static void free_bitmap(struct btrfs_free_space_ctl *ctl,
+			struct btrfs_free_space *bitmap_info);
+static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl,
+			      struct btrfs_free_space *info, u64 offset,
+			      u64 bytes);
 
 static struct inode *__lookup_free_space_inode(struct btrfs_root *root,
 					       struct btrfs_path *path,
@@ -625,44 +631,6 @@ static int io_ctl_read_bitmap(struct btrfs_io_ctl *io_ctl,
 	return 0;
 }
 
-/*
- * Since we attach pinned extents after the fact we can have contiguous sections
- * of free space that are split up in entries.  This poses a problem with the
- * tree logging stuff since it could have allocated across what appears to be 2
- * entries since we would have merged the entries when adding the pinned extents
- * back to the free space cache.  So run through the space cache that we just
- * loaded and merge contiguous entries.  This will make the log replay stuff not
- * blow up and it will make for nicer allocator behavior.
- */
-static void merge_space_tree(struct btrfs_free_space_ctl *ctl)
-{
-	struct btrfs_free_space *e, *prev = NULL;
-	struct rb_node *n;
-
-again:
-	spin_lock(&ctl->tree_lock);
-	for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) {
-		e = rb_entry(n, struct btrfs_free_space, offset_index);
-		if (!prev)
-			goto next;
-		if (e->bitmap || prev->bitmap)
-			goto next;
-		if (prev->offset + prev->bytes == e->offset) {
-			unlink_free_space(ctl, prev);
-			unlink_free_space(ctl, e);
-			prev->bytes += e->bytes;
-			kmem_cache_free(btrfs_free_space_cachep, e);
-			link_free_space(ctl, prev);
-			prev = NULL;
-			spin_unlock(&ctl->tree_lock);
-			goto again;
-		}
-next:
-		prev = e;
-	}
-	spin_unlock(&ctl->tree_lock);
-}
-
 static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 				   struct btrfs_free_space_ctl *ctl,
 				   struct btrfs_path *path, u64 offset)
@@ -753,16 +721,6 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 			goto free_cache;
 		}
 
-		/*
-		 * Sync discard ensures that the free space cache is always
-		 * trimmed.  So when reading this in, the state should reflect
-		 * that.  We also do this for async as a stop gap for lack of
-		 * persistence.
-		 */
-		if (btrfs_test_opt(fs_info, DISCARD_SYNC) ||
-		    btrfs_test_opt(fs_info, DISCARD_ASYNC))
-			e->trim_state = BTRFS_TRIM_STATE_TRIMMED;
-
 		if (!e->bytes) {
 			kmem_cache_free(btrfs_free_space_cachep, e);
 			goto free_cache;
@@ -816,16 +774,9 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 		ret = io_ctl_read_bitmap(&io_ctl, e);
 		if (ret)
 			goto free_cache;
-		e->bitmap_extents = count_bitmap_extents(ctl, e);
-		if (!btrfs_free_space_trimmed(e)) {
-			ctl->discardable_extents[BTRFS_STAT_CURR] +=
-				e->bitmap_extents;
-			ctl->discardable_bytes[BTRFS_STAT_CURR] += e->bytes;
-		}
 	}
 
 	io_ctl_drop_pages(&io_ctl);
-	merge_space_tree(ctl);
 	ret = 1;
 out:
 	io_ctl_free(&io_ctl);
@@ -836,16 +787,59 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 	goto out;
 }
 
+static int copy_free_space_cache(struct btrfs_block_group *block_group,
+				 struct btrfs_free_space_ctl *ctl)
+{
+	struct btrfs_free_space *info;
+	struct rb_node *n;
+	int ret = 0;
+
+	while (!ret && (n = rb_first(&ctl->free_space_offset)) != NULL) {
+		info = rb_entry(n, struct btrfs_free_space, offset_index);
+		if (!info->bitmap) {
+			unlink_free_space(ctl, info);
+			ret = btrfs_add_free_space(block_group, info->offset,
+						   info->bytes);
+			kmem_cache_free(btrfs_free_space_cachep, info);
+		} else {
+			u64 offset = info->offset;
+			u64 bytes = ctl->unit;
+
+			while (search_bitmap(ctl, info, &offset, &bytes,
+					     false) == 0) {
+				ret = btrfs_add_free_space(block_group, offset,
+							   bytes);
+				if (ret)
+					break;
+				bitmap_clear_bits(ctl, info, offset, bytes);
+				offset = info->offset;
+				bytes = ctl->unit;
+			}
+			free_bitmap(ctl, info);
+		}
+		cond_resched();
+	}
+	return ret;
+}
+
 int load_free_space_cache(struct btrfs_block_group *block_group)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_free_space_ctl tmp_ctl = {};
 	struct inode *inode;
 	struct btrfs_path *path;
 	int ret = 0;
 	bool matched;
 	u64 used = block_group->used;
 
+	/*
+	 * Because we could potentially discard our loaded free space, we want
+	 * to load everything into a temporary structure first, and then if it's
+	 * valid copy it all into the actual free space ctl.
+	 */
+	btrfs_init_free_space_ctl(block_group, &tmp_ctl);
+
 	/*
 	 * If this block group has been marked to be cleared for one reason or
 	 * another then we can't trust the on disk cache, so just return.
@@ -897,19 +891,25 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
 	}
 	spin_unlock(&block_group->lock);
 
-	ret = __load_free_space_cache(fs_info->tree_root, inode, ctl,
+	ret = __load_free_space_cache(fs_info->tree_root, inode, &tmp_ctl,
 				      path, block_group->start);
 	btrfs_free_path(path);
 	if (ret <= 0)
 		goto out;
 
-	spin_lock(&ctl->tree_lock);
-	matched = (ctl->free_space == (block_group->length - used -
-				       block_group->bytes_super));
-	spin_unlock(&ctl->tree_lock);
+	matched = (tmp_ctl.free_space == (block_group->length - used -
+					  block_group->bytes_super));
 
-	if (!matched) {
-		__btrfs_remove_free_space_cache(ctl);
+	if (matched) {
+		ret = copy_free_space_cache(block_group, &tmp_ctl);
+		/*
+		 * ret == 1 means we successfully loaded the free space cache,
+		 * so we need to re-set it here.
+		 */
+		if (ret == 0)
+			ret = 1;
+	} else {
+		__btrfs_remove_free_space_cache(&tmp_ctl);
 		btrfs_warn(fs_info,
 			   "block group %llu has wrong amount of free space",
 			   block_group->start);
@@ -1914,29 +1914,6 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes,
 	return NULL;
 }
 
-static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl,
-				struct btrfs_free_space *bitmap_info)
-{
-	struct btrfs_block_group *block_group = ctl->private;
-	u64 bytes = bitmap_info->bytes;
-	unsigned int rs, re;
-	int count = 0;
-
-	if (!block_group || !bytes)
-		return count;
-
-	bitmap_for_each_set_region(bitmap_info->bitmap, rs, re, 0,
-				   BITS_PER_BITMAP) {
-		bytes -= (rs - re) * ctl->unit;
-		count++;
-
-		if (!bytes)
-			break;
-	}
-
-	return count;
-}
-
 static void add_new_bitmap(struct btrfs_free_space_ctl *ctl,
 			   struct btrfs_free_space *info, u64 offset)
 {
@@ -2676,10 +2653,10 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
 		   "%d blocks of free space at or bigger than bytes is", count);
 }
 
-void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group)
+void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
+			       struct btrfs_free_space_ctl *ctl)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
-	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 
 	spin_lock_init(&ctl->tree_lock);
 	ctl->unit = fs_info->sectorsize;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index e3d5e0ad8f8e..bf8d127d2407 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -109,7 +109,8 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 			      struct btrfs_path *path,
 			      struct inode *inode);
 
-void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group);
+void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
+			       struct btrfs_free_space_ctl *ctl);
 int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 			   struct btrfs_free_space_ctl *ctl,
 			   u64 bytenr, u64 size,
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 999c14e5d0bd..8519f7746b2e 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -224,7 +224,7 @@ btrfs_alloc_dummy_block_group(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&cache->list);
 	INIT_LIST_HEAD(&cache->cluster_list);
 	INIT_LIST_HEAD(&cache->bg_list);
-	btrfs_init_free_space_ctl(cache);
+	btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
 	mutex_init(&cache->free_space_lock);
 
 	return cache;
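The shape of the change is the classic load-validate-commit pattern:
parse into a scratch structure, check the totals, and only then move
entries into the live structure.  A minimal userspace sketch of that
pattern (illustrative; the singly linked list and the numbers are
stand-ins for the rbtree/bitmap entries and real block group sizes):

#include <stdio.h>
#include <stdlib.h>

struct entry { unsigned long offset, bytes; struct entry *next; };
struct ctl { struct entry *head; unsigned long free_space; };

static void ctl_add(struct ctl *c, unsigned long offset, unsigned long bytes)
{
	struct entry *e = malloc(sizeof(*e));

	e->offset = offset;
	e->bytes = bytes;
	e->next = c->head;
	c->head = e;
	c->free_space += bytes;
}

/* Drain the scratch ctl into the live one, entry by entry,
 * like copy_free_space_cache() does with btrfs_add_free_space(). */
static void copy_cache(struct ctl *live, struct ctl *tmp)
{
	while (tmp->head) {
		struct entry *e = tmp->head;

		tmp->head = e->next;
		ctl_add(live, e->offset, e->bytes);
		free(e);
	}
}

int main(void)
{
	struct ctl live = { 0 }, tmp = { 0 };
	unsigned long expected = 96;	/* ~ length - used - bytes_super */

	ctl_add(&tmp, 0, 64);		/* pretend these came from disk */
	ctl_add(&tmp, 128, 32);

	if (tmp.free_space == expected)	/* validate before committing */
		copy_cache(&live, &tmp);
	else
		fprintf(stderr, "wrong amount of free space, discarding\n");
	return 0;
}

Because nothing touches the live ctl until validation passes, readers
never observe a half-loaded or soon-to-be-discarded cache.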
From patchwork Fri Oct 23 13:58:09 2020
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 6/8] btrfs: load the free space cache inode extents from commit root
Date: Fri, 23 Oct 2020 09:58:09 -0400
Message-Id: <5e4e7c68ef710c23034d6a7a180e8912d6ebbc7d.1603460665.git.josef@toxicpanda.com>

Historically we've allowed recursive locking specifically for the free
space inode.  This is because we are only doing reads and know that it's
safe.  However, we don't actually need this feature; we can get away
with reading the commit root for the extents.  In fact, if we want to
allow asynchronous loading of the free space cache, we have to use the
commit root, otherwise we will deadlock.

Switch to using the commit root for the file extents.  These are only
read at load time, and are replaced as soon as we start writing the
cache out to disk.  The cache is never read again, so this is
legitimate.  This matches what we do for the inode itself, as we read
that from the commit root as well.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
 fs/btrfs/inode.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1dcccd212809..53d6a66670d3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6577,7 +6577,15 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	 */
 	path->leave_spinning = 1;
 
-	path->recurse = btrfs_is_free_space_inode(inode);
+	/*
+	 * The same explanation in load_free_space_cache applies here as well,
+	 * we only read when we're loading the free space cache, and at that
+	 * point the commit_root has everything we need.
+	 */
+	if (btrfs_is_free_space_inode(inode)) {
+		path->search_commit_root = 1;
+		path->skip_locking = 1;
+	}
 
 	ret = btrfs_lookup_file_extent(NULL, root, path, objectid, start, 0);
 	if (ret < 0) {
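The reason reading the commit root needs no tree locks is that the last
committed snapshot is immutable until the next commit publishes a new
one.  A small double-buffer model of that idea in userspace C
(illustrative only; real btrfs commit roots are COW b-trees, not an
array of two structs):

#include <stdio.h>

struct root { int generation; /* ... tree contents ... */ };

static struct root roots[2];
static int committed;		/* index of the immutable snapshot */

/* lock-free reader: the committed snapshot never changes underneath us */
static const struct root *read_commit_root(void)
{
	return &roots[committed];
}

/* commit: publish the working root as the new immutable snapshot */
static void switch_commit_roots(void)
{
	committed ^= 1;
}

int main(void)
{
	roots[0].generation = 1;
	roots[1].generation = 2;
	printf("reading generation %d\n", read_commit_root()->generation);
	switch_commit_roots();
	printf("reading generation %d\n", read_commit_root()->generation);
	return 0;
}

A slightly stale but self-consistent view is exactly what the free space
cache load can tolerate, since the cache is rewritten before it would
ever be read again.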
(cpe-174-109-172-136.nc.res.rr.com. [174.109.172.136]) by smtp.gmail.com with ESMTPSA id e1sm751270qkm.35.2020.10.23.06.58.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 06:58:24 -0700 (PDT) From: Josef Bacik To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 7/8] btrfs: async load free space cache Date: Fri, 23 Oct 2020 09:58:10 -0400 Message-Id: X-Mailer: git-send-email 2.26.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org While documenting the usage of the commit_root_sem, I noticed that we do not actually take the commit_root_sem in the case of the free space cache. This is problematic because we're supposed to hold that sem while we're reading the commit roots, which is what we do for the free space cache. The reason I did it inline when I originally wrote the code was because there's the case of unpinning where we need to make sure that the free space cache is loaded if we're going to use the free space cache. But we can accomplish the same thing by simply waiting for the cache to be loaded. Rework this code to load the free space cache asynchronously. This allows us to greatly cleanup the caching code because now it's all shared by the various caching methods. We also are now in a position to have the commit_root semaphore held while we're loading the free space cache. And finally our modification of ->last_byte_to_unpin is removed because it can be handled in the proper way on commit. Signed-off-by: Josef Bacik Reviewed-by: Filipe Manana --- fs/btrfs/block-group.c | 123 ++++++++++++++++++----------------------- 1 file changed, 53 insertions(+), 70 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index adbd18dc08a1..ba6564f67d9a 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -424,6 +424,23 @@ int btrfs_wait_block_group_cache_done(struct btrfs_block_group *cache) return ret; } +static bool space_cache_v1_done(struct btrfs_block_group *cache) +{ + bool ret; + + spin_lock(&cache->lock); + ret = cache->cached != BTRFS_CACHE_FAST; + spin_unlock(&cache->lock); + + return ret; +} + +static void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache, + struct btrfs_caching_control *caching_ctl) +{ + wait_event(caching_ctl->wait, space_cache_v1_done(cache)); +} + #ifdef CONFIG_BTRFS_DEBUG static void fragment_free_space(struct btrfs_block_group *block_group) { @@ -639,11 +656,28 @@ static noinline void caching_thread(struct btrfs_work *work) mutex_lock(&caching_ctl->mutex); down_read(&fs_info->commit_root_sem); + if (btrfs_test_opt(fs_info, SPACE_CACHE)) { + ret = load_free_space_cache(block_group); + if (ret == 1) { + ret = 0; + goto done; + } + + /* + * We failed to load the space cache, set ourselves to + * CACHE_STARTED and carry on. + */ + spin_lock(&block_group->lock); + block_group->cached = BTRFS_CACHE_STARTED; + spin_unlock(&block_group->lock); + wake_up(&caching_ctl->wait); + } + if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) ret = load_free_space_tree(caching_ctl); else ret = load_extent_tree_free(caching_ctl); - +done: spin_lock(&block_group->lock); block_group->caching_ctl = NULL; block_group->cached = ret ? 
BTRFS_CACHE_ERROR : BTRFS_CACHE_FINISHED; @@ -679,7 +713,7 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only { DEFINE_WAIT(wait); struct btrfs_fs_info *fs_info = cache->fs_info; - struct btrfs_caching_control *caching_ctl; + struct btrfs_caching_control *caching_ctl = NULL; int ret = 0; caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); @@ -691,84 +725,28 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only init_waitqueue_head(&caching_ctl->wait); caching_ctl->block_group = cache; caching_ctl->progress = cache->start; - refcount_set(&caching_ctl->count, 1); + refcount_set(&caching_ctl->count, 2); btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL); spin_lock(&cache->lock); if (cache->cached != BTRFS_CACHE_NO) { - spin_unlock(&cache->lock); kfree(caching_ctl); - return 0; + + caching_ctl = cache->caching_ctl; + if (caching_ctl) + refcount_inc(&caching_ctl->count); + spin_unlock(&cache->lock); + goto out; } WARN_ON(cache->caching_ctl); cache->caching_ctl = caching_ctl; - cache->cached = BTRFS_CACHE_FAST; + if (btrfs_test_opt(fs_info, SPACE_CACHE)) + cache->cached = BTRFS_CACHE_FAST; + else + cache->cached = BTRFS_CACHE_STARTED; + cache->has_caching_ctl = 1; spin_unlock(&cache->lock); - if (btrfs_test_opt(fs_info, SPACE_CACHE)) { - mutex_lock(&caching_ctl->mutex); - ret = load_free_space_cache(cache); - - spin_lock(&cache->lock); - if (ret == 1) { - cache->caching_ctl = NULL; - cache->cached = BTRFS_CACHE_FINISHED; - cache->last_byte_to_unpin = (u64)-1; - caching_ctl->progress = (u64)-1; - } else { - if (load_cache_only) { - cache->caching_ctl = NULL; - cache->cached = BTRFS_CACHE_NO; - } else { - cache->cached = BTRFS_CACHE_STARTED; - cache->has_caching_ctl = 1; - } - } - spin_unlock(&cache->lock); -#ifdef CONFIG_BTRFS_DEBUG - if (ret == 1 && - btrfs_should_fragment_free_space(cache)) { - u64 bytes_used; - - spin_lock(&cache->space_info->lock); - spin_lock(&cache->lock); - bytes_used = cache->length - cache->used; - cache->space_info->bytes_used += bytes_used >> 1; - spin_unlock(&cache->lock); - spin_unlock(&cache->space_info->lock); - fragment_free_space(cache); - } -#endif - mutex_unlock(&caching_ctl->mutex); - - wake_up(&caching_ctl->wait); - if (ret == 1) { - btrfs_put_caching_control(caching_ctl); - btrfs_free_excluded_extents(cache); - return 0; - } - } else { - /* - * We're either using the free space tree or no caching at all. - * Set cached to the appropriate value and wakeup any waiters. 
-                 */
-                spin_lock(&cache->lock);
-                if (load_cache_only) {
-                        cache->caching_ctl = NULL;
-                        cache->cached = BTRFS_CACHE_NO;
-                } else {
-                        cache->cached = BTRFS_CACHE_STARTED;
-                        cache->has_caching_ctl = 1;
-                }
-                spin_unlock(&cache->lock);
-                wake_up(&caching_ctl->wait);
-        }
-
-        if (load_cache_only) {
-                btrfs_put_caching_control(caching_ctl);
-                return 0;
-        }
-
         down_write(&fs_info->commit_root_sem);
         refcount_inc(&caching_ctl->count);
         list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
@@ -777,6 +755,11 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
         btrfs_get_block_group(cache);
 
         btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work);
+out:
+        if (load_cache_only && caching_ctl)
+                btrfs_wait_space_cache_v1_finished(cache, caching_ctl);
+        if (caching_ctl)
+                btrfs_put_caching_control(caching_ctl);
 
         return ret;
 }
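[Editor's note: another aside, not part of the patch. The hunk above
initializes caching_ctl->count to 2 rather than 1: one reference belongs
to the background worker, and one is handed back to the caller, who may
wait on the ctl and must drop its reference at the out: label. A small
C11-atomics model of that lifecycle, with illustrative names:]

/* Userspace sketch of the caching_ctl refcount scheme; not btrfs code. */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct caching_control {
        atomic_int count;
};

static struct caching_control *ctl_alloc(void)
{
        struct caching_control *ctl = malloc(sizeof(*ctl));

        /* One ref for the worker, one handed back to the caller. */
        atomic_init(&ctl->count, 2);
        return ctl;
}

static void ctl_put(struct caching_control *ctl)
{
        /* Whichever side drops the last reference frees the object. */
        if (atomic_fetch_sub(&ctl->count, 1) == 1) {
                printf("last reference dropped, freeing\n");
                free(ctl);
        }
}

int main(void)
{
        struct caching_control *ctl = ctl_alloc();

        ctl_put(ctl);   /* worker finishes */
        ctl_put(ctl);   /* caller's put at the out: label */
        return 0;
}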
From patchwork Fri Oct 23 13:58:11 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11853549
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 8/8] btrfs: protect the fs_info->caching_block_groups differently
Date: Fri, 23 Oct 2020 09:58:11 -0400
Message-Id: <7f656118637ade71f45d1a3faca617ccbea9f61f.1603460665.git.josef@toxicpanda.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: 
References: 
Precedence: bulk
List-ID: 
X-Mailing-List: linux-btrfs@vger.kernel.org

While testing another lockdep fix, I got the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #101 Not tainted
------------------------------------------------------
btrfs-cleaner/3445 is trying to acquire lock:
ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

but task is already holding lock:
ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
       down_write+0x3d/0x70
       btrfs_cache_block_group+0x2d5/0x510
       find_free_extent+0xb6e/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&space_info->groups_sem){++++}-{3:3}:
       down_read+0x40/0x130
       find_free_extent+0x2ed/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0xb9/0x3d0
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x32/0x170
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x614/0x9d0
       btrfs_find_root+0x35/0x1b0
       btrfs_read_tree_root+0x61/0x120
       btrfs_get_root_ref+0x14b/0x600
       find_parent_nodes+0x3e6/0x1b30
       btrfs_find_all_roots_safe+0xb4/0x130
       btrfs_find_all_roots+0x60/0x80
       btrfs_qgroup_trace_extent_post+0x27/0x40
       btrfs_add_delayed_data_ref+0x3fd/0x460
       btrfs_free_extent+0x42/0x100
       __btrfs_mod_ref+0x1d7/0x2f0
       walk_up_proc+0x11c/0x400
       walk_up_tree+0xf0/0x180
       btrfs_drop_snapshot+0x1c7/0x780
       btrfs_clean_one_deleted_snapshot+0xfb/0x110
       cleaner_kthread+0xd4/0x140
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->commit_root_sem);
                               lock(&space_info->groups_sem);
                               lock(&fs_info->commit_root_sem);
  lock(btrfs-root-00);

 *** DEADLOCK ***

3 locks held by btrfs-cleaner/3445:
 #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
 #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
 #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

stack backtrace:
CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 __lock_acquire+0x1167/0x2150
 ? __bfs+0x42/0x210
 lock_acquire+0xb9/0x3d0
 ? __btrfs_tree_read_lock+0x32/0x170
 down_read_nested+0x43/0x130
 ? __btrfs_tree_read_lock+0x32/0x170
 __btrfs_tree_read_lock+0x32/0x170
 __btrfs_read_lock_root_node+0x3a/0x50
 btrfs_search_slot+0x614/0x9d0
 ? find_held_lock+0x2b/0x80
 btrfs_find_root+0x35/0x1b0
 ? do_raw_spin_unlock+0x4b/0xa0
 btrfs_read_tree_root+0x61/0x120
 btrfs_get_root_ref+0x14b/0x600
 find_parent_nodes+0x3e6/0x1b30
 btrfs_find_all_roots_safe+0xb4/0x130
 btrfs_find_all_roots+0x60/0x80
 btrfs_qgroup_trace_extent_post+0x27/0x40
 btrfs_add_delayed_data_ref+0x3fd/0x460
 btrfs_free_extent+0x42/0x100
 __btrfs_mod_ref+0x1d7/0x2f0
 walk_up_proc+0x11c/0x400
 walk_up_tree+0xf0/0x180
 btrfs_drop_snapshot+0x1c7/0x780
 ? btrfs_clean_one_deleted_snapshot+0x73/0x110
 btrfs_clean_one_deleted_snapshot+0xfb/0x110
 cleaner_kthread+0xd4/0x140
 ? btrfs_alloc_root+0x50/0x50
 kthread+0x13a/0x150
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30

This happens because we are using the commit_root_sem to protect
fs_info->caching_block_groups, which creates a groups_sem ->
commit_root_sem dependency. That is problematic because we can allocate
blocks while holding tree root locks, which closes the cycle. Fix this
by protecting the list itself with fs_info->block_group_cache_lock.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
---
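[Editor's note: a minimal userspace sketch, not part of the original
patch, of the pattern the fix relies on: the caching-control list gets
its own narrow lock that is only ever taken as the innermost lock, so it
cannot participate in an ordering cycle the way the widely-held
commit_root_sem could. All names are illustrative, and a pthread mutex
stands in for the kernel spinlock.]

/* Userspace sketch of a leaf lock guarding only a list; not btrfs code. */
#include <pthread.h>
#include <stdio.h>

struct node {
        struct node *next;
        int id;
};

/* Dedicated leaf lock: never held while acquiring any other lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *caching_list;

static void caching_list_add(struct node *n)
{
        /* Hold list_lock only around the pointer updates. */
        pthread_mutex_lock(&list_lock);
        n->next = caching_list;
        caching_list = n;
        pthread_mutex_unlock(&list_lock);
}

static struct node *caching_list_pop(void)
{
        struct node *n;

        pthread_mutex_lock(&list_lock);
        n = caching_list;
        if (n)
                caching_list = n->next;
        pthread_mutex_unlock(&list_lock);
        return n;
}

int main(void)
{
        struct node a = { .id = 1 }, b = { .id = 2 };

        caching_list_add(&a);
        caching_list_add(&b);
        printf("popped %d\n", caching_list_pop()->id);
        return 0;
}

Because list_lock is always the innermost lock, lockdep-style ordering
analysis can never find a cycle through it, which is exactly what the
switch from commit_root_sem to block_group_cache_lock buys below.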
 fs/btrfs/block-group.c | 12 ++++++------
 fs/btrfs/transaction.c |  2 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index ba6564f67d9a..f19fabae4754 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -747,10 +747,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
         cache->has_caching_ctl = 1;
         spin_unlock(&cache->lock);
 
-        down_write(&fs_info->commit_root_sem);
+        spin_lock(&fs_info->block_group_cache_lock);
         refcount_inc(&caching_ctl->count);
         list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
-        up_write(&fs_info->commit_root_sem);
+        spin_unlock(&fs_info->block_group_cache_lock);
 
         btrfs_get_block_group(cache);
 
@@ -999,7 +999,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
         if (block_group->cached == BTRFS_CACHE_STARTED)
                 btrfs_wait_block_group_cache_done(block_group);
         if (block_group->has_caching_ctl) {
-                down_write(&fs_info->commit_root_sem);
+                spin_lock(&fs_info->block_group_cache_lock);
                 if (!caching_ctl) {
                         struct btrfs_caching_control *ctl;
 
@@ -1013,7 +1013,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
                 }
                 if (caching_ctl)
                         list_del_init(&caching_ctl->list);
-                up_write(&fs_info->commit_root_sem);
+                spin_unlock(&fs_info->block_group_cache_lock);
                 if (caching_ctl) {
                         /* Once for the caching bgs list and once for us. */
                         btrfs_put_caching_control(caching_ctl);
@@ -3311,14 +3311,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
         struct btrfs_caching_control *caching_ctl;
         struct rb_node *n;
 
-        down_write(&info->commit_root_sem);
+        spin_lock(&info->block_group_cache_lock);
         while (!list_empty(&info->caching_block_groups)) {
                 caching_ctl = list_entry(info->caching_block_groups.next,
                                          struct btrfs_caching_control, list);
                 list_del(&caching_ctl->list);
                 btrfs_put_caching_control(caching_ctl);
         }
-        up_write(&info->commit_root_sem);
+        spin_unlock(&info->block_group_cache_lock);
 
         spin_lock(&info->unused_bgs_lock);
         while (!list_empty(&info->unused_bgs)) {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 9ef6cba1eb59..a0cf0e0c4085 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -208,6 +208,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
          * the caching thread will re-start it's search from 3, and thus find
          * the hole from [4,6) to add to the free space cache.
          */
+        spin_lock(&fs_info->block_group_cache_lock);
         list_for_each_entry_safe(caching_ctl, next,
                                  &fs_info->caching_block_groups, list) {
                 struct btrfs_block_group *cache = caching_ctl->block_group;
@@ -219,6 +220,7 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
                         cache->last_byte_to_unpin = caching_ctl->progress;
                 }
         }
+        spin_unlock(&fs_info->block_group_cache_lock);
 
         up_write(&fs_info->commit_root_sem);
 }