From patchwork Sun Mar 19 17:18:59 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Lyakas X-Patchwork-Id: 9632823 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 0BB756020B for ; Sun, 19 Mar 2017 17:19:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EC64020881 for ; Sun, 19 Mar 2017 17:19:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E0AB8281B7; Sun, 19 Mar 2017 17:19:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.6 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,STOX_REPLY_TYPE,STOX_REPLY_TYPE_WITHOUT_QUOTES, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17B3720881 for ; Sun, 19 Mar 2017 17:19:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752367AbdCSRTF (ORCPT ); Sun, 19 Mar 2017 13:19:05 -0400 Received: from mail-wm0-f46.google.com ([74.125.82.46]:37584 "EHLO mail-wm0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752327AbdCSRTE (ORCPT ); Sun, 19 Mar 2017 13:19:04 -0400 Received: by mail-wm0-f46.google.com with SMTP id n11so48616322wma.0 for ; Sun, 19 Mar 2017 10:19:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zadarastorage-com.20150623.gappssmtp.com; s=20150623; h=message-id:from:to:cc:subject:date:mime-version :content-transfer-encoding:importance; bh=tCHmNV8Rr11RponVh4G/x4IHW3d+BXOG3F/1jLxw0Zo=; b=izpF93e9ptecVnnlvHNPQt+rWj1JQNnaWGr+HpTPs7/P2pwDoUU3hLNKjVYDRSgXJh 3OfNOaHV1AxFz9tSifQ7AnxtQz1kW3OWM4AUBXlLTQbhMui5oD11tLdWSWOkGvjWnC9M vRbsVAr/ed6EC7qRyjsf2nIr6hc8kwZ9lKnSmLyyl0rwBqa9Spct2yYaiKpt5myeq8v1 lNPEOs4PKW6gl/8uq9Tbi39175XnPZGZmKkHHh9Ore/KdUah+thffCbgNBezut/waoTn wpRiYqxth5yAZy22s9NSa8XjdvS51aAlsIf8rTFr+e7hsF3jAPLn25HzJU3UiK6mJ8w9 ddqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:from:to:cc:subject:date:mime-version :content-transfer-encoding:importance; bh=tCHmNV8Rr11RponVh4G/x4IHW3d+BXOG3F/1jLxw0Zo=; b=Svbzw7C43pbAEwb6TXht6nsD6duOR1LEtx54LT9evECQzIxfrXZs+tx33DBFShZNJC 2qI8RERQBL6hjDuOikABGb//6xbPSk/hnyHUfHPp1ciTSxC4QQfasRx/4cE6Zc1lA+d4 cOVovVY0PioRuNMjMTUhTvvdZxgzEXBPikm6V274qssx+h9+FM9z5cDFY+bwQGXGva8Z 2NXuNAzOgsNiRgiyXlkan/JyHd2risSRmdmKCv7qtnVZtbtDqQpjh5qUW9lzQn8fKsHq +Q7PoORVV59NaEP6gw3MhLux20WhD0TetdOxlIyLBCiMDvsXh96tTtK6h0QtDLB++IqJ /tDg== X-Gm-Message-State: AFeK/H2yLx++wAmAHdcAU2QKJuyAEm86yjARgWP89FMLfvrXxUG0XmiWPwg1gy65a08WCA== X-Received: by 10.28.167.203 with SMTP id q194mr6601119wme.111.1489943942482; Sun, 19 Mar 2017 10:19:02 -0700 (PDT) Received: from alyakaslap ([82.166.81.77]) by smtp.gmail.com with ESMTPSA id z21sm17752103wrz.31.2017.03.19.10.19.01 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 19 Mar 2017 10:19:01 -0700 (PDT) Message-ID: <7D943753182E46848AC6B688BF365906@alyakaslap> From: "Alex Lyakas" To: Cc: , Subject: [PATCH RFC] btrfs: introduce a separate mutex for caching_block_groups list Date: Sun, 19 Mar 2017 19:18:59 +0200 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 16.4.3528.331 X-MimeOLE: Produced By Microsoft MimeOLE V16.4.3528.331 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We have a commit_root_sem, which is a read-write semaphore that protects the commit roots. But it is also used to protect the list of caching block groups. As a result, while doing "slow" caching, the following issue is seen: Some of the caching threads are scanning the extent tree with commit_root_sem acquired in shared mode, with stack like: [] read_extent_buffer_pages+0x2d2/0x300 [btrfs] [] btree_read_extent_buffer_pages.constprop.50+0xb7/0x1e0 [btrfs] [] read_tree_block+0x40/0x70 [btrfs] [] read_block_for_search.isra.33+0x12c/0x370 [btrfs] [] btrfs_search_slot+0x3c6/0xb10 [btrfs] [] caching_thread+0x1b9/0x820 [btrfs] [] normal_work_helper+0xc6/0x340 [btrfs] [] btrfs_cache_helper+0x12/0x20 [btrfs] IO requests that want to allocate space are waiting in cache_block_group() to acquire the commit_root_sem in exclusive mode. But they only want to add the caching control structure to the list of caching block-groups: [] schedule+0x29/0x70 [] rwsem_down_write_failed+0x145/0x320 [] call_rwsem_down_write_failed+0x13/0x20 [] cache_block_group+0x25b/0x450 [btrfs] [] find_free_extent+0xd16/0xdb0 [btrfs] [] btrfs_reserve_extent+0xaf/0x160 [btrfs] Other caching threads want to continue their scanning, and for that they are waiting to acquire commit_root_sem in shared mode. But since there are IO threads that want the exclusive lock, the caching threads are unable to continue the scanning, because (I presume) rw_semaphore guarantees some fairness: [] schedule+0x29/0x70 [] rwsem_down_read_failed+0xc5/0x120 [] call_rwsem_down_read_failed+0x14/0x30 [] caching_thread+0x1a1/0x820 [btrfs] [] normal_work_helper+0xc6/0x340 [btrfs] [] btrfs_cache_helper+0x12/0x20 [btrfs] [] process_one_work+0x146/0x410 This causes slowness of the IO, especially when there are many block groups that need to be scanned for free space. In some cases it takes minutes until a single IO thread is able to allocate free space. I don't see a deadlock here, because the caching threads that were able to acquire the commit_root_sem will call rwsem_is_contended() and should give up the semaphore, so that IO threads are able to acquire it in exclusive mode. However, introducing a separate mutex that protects only the list of caching block groups makes things move forward much faster. This patch is based on kernel 3.18. Unfortunately, I am not able to submit a patch based on one of the latest kernels, because here btrfs is part of the larger system, and upgrading the kernel is a significant effort. Hence marking the patch as RFC. Hopefully, this patch still has some value to the community. Signed-off-by: Alex Lyakas @@ -5693,6 +5693,7 @@ void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans, down_write(&fs_info->commit_root_sem); + mutex_lock(&fs_info->caching_block_groups_mutex); list_for_each_entry_safe(caching_ctl, next, &fs_info->caching_block_groups, list) { cache = caching_ctl->block_group; @@ -5704,6 +5705,7 @@ void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans, cache->last_byte_to_unpin = caching_ctl->progress; } } + mutex_unlock(&fs_info->caching_block_groups_mutex); if (fs_info->pinned_extents == &fs_info->freed_extents[0]) fs_info->pinned_extents = &fs_info->freed_extents[1]; @@ -8849,14 +8851,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) struct btrfs_caching_control *caching_ctl; struct rb_node *n; - down_write(&info->commit_root_sem); + mutex_lock(&info->caching_block_groups_mutex); while (!list_empty(&info->caching_block_groups)) { caching_ctl = list_entry(info->caching_block_groups.next, struct btrfs_caching_control, list); list_del(&caching_ctl->list); put_caching_control(caching_ctl); } - up_write(&info->commit_root_sem); + mutex_unlock(&info->caching_block_groups_mutex); spin_lock(&info->unused_bgs_lock); while (!list_empty(&info->unused_bgs)) { --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 42d11e7..74feacb 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1490,6 +1490,8 @@ struct btrfs_fs_info { struct list_head trans_list; struct list_head dead_roots; struct list_head caching_block_groups; + /* protects the above list */ + struct mutex caching_block_groups_mutex; spinlock_t delayed_iput_lock; struct list_head delayed_iputs; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 5177954..130ec58 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2229,6 +2229,7 @@ int open_ctree(struct super_block *sb, INIT_LIST_HEAD(&fs_info->delayed_iputs); INIT_LIST_HEAD(&fs_info->delalloc_roots); INIT_LIST_HEAD(&fs_info->caching_block_groups); + mutex_init(&fs_info->caching_block_groups_mutex); spin_lock_init(&fs_info->delalloc_root_lock); spin_lock_init(&fs_info->trans_lock); spin_lock_init(&fs_info->fs_roots_radix_lock); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a067065..906fb08 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -637,10 +637,10 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, return 0; } - down_write(&fs_info->commit_root_sem); + mutex_lock(&fs_info->caching_block_groups_mutex); atomic_inc(&caching_ctl->count); list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups); - up_write(&fs_info->commit_root_sem); + mutex_unlock(&fs_info->caching_block_groups_mutex); btrfs_get_block_group(cache);