From patchwork Thu Dec 2 10:30:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12652165 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D2A8C433F5 for ; Thu, 2 Dec 2021 10:30:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356989AbhLBKeM (ORCPT ); Thu, 2 Dec 2021 05:34:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356980AbhLBKeK (ORCPT ); Thu, 2 Dec 2021 05:34:10 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [IPv6:2604:1380:40e1:4800::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D26CC06174A for ; Thu, 2 Dec 2021 02:30:48 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id F07B4CE21E3 for ; Thu, 2 Dec 2021 10:30:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A188EC53FD0 for ; Thu, 2 Dec 2021 10:30:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1638441044; bh=jgw/cETeeOb/H13VPaef1031Zu3sMrNlDvhHn3BCOoE=; h=From:To:Subject:Date:In-Reply-To:References:From; b=SYvNjUw2s9PL/3n0q3sR2r5ACldlGi7vUiR8x2fz9RKBpocq19gkdizpnMz7g/xal EO/GKRsiS7Pm9xNEZN5EW9pOuCSk86MnmZWhzaKjVNScp7yDOWV3lS6u2qehXIs/a1 hSO6GrBWR9uo3RbvmwINIZ2uEhgm61k8ydOfRGk1u5yE9biFehXtuUeQvsJkzkwWUT CBXLmsjMKxupOPjquDzul80fmhCAc6oq4g2wcO7S9Rm4+apLp92jYHzYxhfmF2OG3W MQSZf+bLeaBzsn3axh3dC6ibtHV5o8+RUqPRSQiLoouFkrkq7xSHFTXRTCAw//zjeA 8IkUeqkY32s1g== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/6] btrfs: allow generic_bin_search() to take low boundary as an argument Date: Thu, 2 Dec 2021 10:30:35 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Right now generic_bin_search() always uses a low boundary slot of 0, but in the next patch we'll want to often skip slot 0 when searching for a key. So make generic_bin_search() have the low boundary slot specified as an argument, and move the check for the extent buffer level from btrfs_bin_search() to generic_bin_search() to avoid adding another wrapper around generic_bin_search(). Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 43 +++++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index d2297e449072..9329c8a3c855 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -726,21 +726,23 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans, } /* - * search for key in the extent_buffer. The items start at offset p, - * and they are item_size apart. + * Search for a key in the given extent_buffer. * - * the slot in the array is returned via slot, and it points to - * the place where you would insert key if it is not found in - * the array. + * The lower boundary for the search is specified by the slot number @low. Use a + * value of 0 to search over the whole extent buffer. * - * Slot may point to total number of items if the key is bigger than - * all of the keys + * The slot in the extent buffer is returned via @slot. If the key exists in the + * extent buffer, then @slot will point to the slot where the key is, otherwise + * it points to the slot where you would insert the key. + * + * Slot may point to the total number of items (i.e. one position beyond the last + * key) if the key is bigger than the last key in the extent buffer. */ -static noinline int generic_bin_search(struct extent_buffer *eb, - unsigned long p, int item_size, +static noinline int generic_bin_search(struct extent_buffer *eb, int low, const struct btrfs_key *key, int *slot) { - int low = 0; + unsigned long p; + int item_size; int high = btrfs_header_nritems(eb); int ret; const int key_size = sizeof(struct btrfs_disk_key); @@ -753,6 +755,14 @@ static noinline int generic_bin_search(struct extent_buffer *eb, return -EINVAL; } + if (btrfs_header_level(eb) == 0) { + p = offsetof(struct btrfs_leaf, items); + item_size = sizeof(struct btrfs_item); + } else { + p = offsetof(struct btrfs_node, ptrs); + item_size = sizeof(struct btrfs_key_ptr); + } + while (low < high) { unsigned long oip; unsigned long offset; @@ -791,20 +801,13 @@ static noinline int generic_bin_search(struct extent_buffer *eb, } /* - * simple bin_search frontend that does the right thing for - * leaves vs nodes + * Simple binary search on an extent buffer. Works for both leaves and nodes, and + * always searches over the whole range of keys (slot 0 to slot 'nritems - 1'). */ int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, int *slot) { - if (btrfs_header_level(eb) == 0) - return generic_bin_search(eb, - offsetof(struct btrfs_leaf, items), - sizeof(struct btrfs_item), key, slot); - else - return generic_bin_search(eb, - offsetof(struct btrfs_node, ptrs), - sizeof(struct btrfs_key_ptr), key, slot); + return generic_bin_search(eb, 0, key, slot); } static void root_add_used(struct btrfs_root *root, u32 size) From patchwork Thu Dec 2 10:30:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12652169 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4ABABC4332F for ; Thu, 2 Dec 2021 10:30:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356975AbhLBKeN (ORCPT ); Thu, 2 Dec 2021 05:34:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356984AbhLBKeL (ORCPT ); Thu, 2 Dec 2021 05:34:11 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [IPv6:2604:1380:40e1:4800::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D17D2C061756 for ; Thu, 2 Dec 2021 02:30:48 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id E7336CE21E2 for ; Thu, 2 Dec 2021 10:30:46 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 966F4C53FCD for ; Thu, 2 Dec 2021 10:30:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1638441045; bh=nMUliZflM3jlUb/Qotz/AL1IPH3zx0ltY0oeCCAt688=; h=From:To:Subject:Date:In-Reply-To:References:From; b=aks64HpN8H9rJK3H2mx3puEclnl3cYwK9EpKt7xDP8JgPyJUOQBFRPvffao9NDMjZ SnNJ8BxD/AyM/+hO7XV0u6qGI2PbxMpy/huXl0nYlFbJoc5Ds5gr+rA4xwiMcJAYTL 2uPj6zjlzDbceLuqQCOrvyngelPAXjhWDvz0LqT8iyG+T74t95SHf72YH415AF9cbW 45y8/w4LsrIllb0H8ONEGgCGa7H2CFKtvRVGqmuBw7SARUDknzlss//3yvcHf7/AyP MnIW/1ntnZpJBjsReymfF73sL11+/wSaAIt21syyaoF/Gmf159Jffeixv/jQ19553f aLPMuZ/vlK+Yg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/6] btrfs: try to unlock parent nodes earlier when inserting a key Date: Thu, 2 Dec 2021 10:30:36 +0000 Message-Id: <6b19d8920fb24d301a43eee0628d7c3789546b30.1638440535.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When inserting a new key, we release the write lock on the leaf's parent only after doing the binary search on the leaf. This is because if the key ends up at slot 0, we will have to update the key at slot 0 of the parent node. The same reasoning applies to any other upper level nodes when their slot is 0. We also need to keep the parent locked in case the leaf does not have enough free space to insert the new key/item, because in that case we will split the leaf and we will need to add a new key to the parent due to a new leaf resulting from the split operation. However if the leaf has enough space for the new key and the key does not end up at slot 0 of the leaf we could release our write lock on the parent before doing the binary search on the leaf to figure out the destination slot. That leads to reducing the amount of time other tasks are blocked waiting to lock the parent, therefore increasing parallelism when there are other tasks that are trying to access other leaves accessible through the same parent. This also applies to other upper nodes besides the immediate parent, when their slot is 0, since we keep locks on them until we figure out if the leaf slot is slot 0 or not. In fact, having the key ending at up slot 0 when is rare. Typically it only happens when the key is less than or equals to the smallest, the "left most", key of the entire btree, during a split attempt when we try to push to the right sibling leaf or when the caller just wants to update the item of an existing key. It's also very common that a leaf has enough space to insert a new key, since after a split we move about half of the keys from one into the new leaf. So unlock the parent, and any other upper level nodes, when during a key insertion we notice the key is greater then the first key in the leaf and the leaf has enough free space. After unlocking the upper level nodes, do the binary search using a low boundary of slot 1 and not slot 0, to figure out the slot where the key will be inserted (or where the key already is in case it exists and the caller wants to modify its item data). This extra comparison, with the first key, is cheap and the key is very likely already in a cache line because it immediatelly follows the header of the extent buffer and we have recently read the level field of the header (which in fact is the last field of the header). The following fs_mark test was run on a non-debug kernel (debian's default kernel config), with a 12 cores intel cpu, and using a nvme device: $ cat run-fsmark.sh #!/bin/bash DEV=/dev/nvme0n1 MNT=/mnt/nvme0n1 MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-O no-holes -R free-space-tree" FILES=100000 THREADS=$(nproc --all) FILE_SIZE=0 echo "performance" | \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor mkfs.btrfs -f $MKFS_OPTIONS $DEV mount $MOUNT_OPTIONS $DEV $MNT OPTS="-S 0 -L 10 -n $FILES -s $FILE_SIZE -t $THREADS -k" for ((i = 1; i <= $THREADS; i++)); do OPTS="$OPTS -d $MNT/d$i" done fs_mark $OPTS umount $MNT Before this change: FSUse% Count Size Files/sec App Overhead 0 1200000 0 165273.6 5958381 0 2400000 0 190938.3 6284477 0 3600000 0 181429.1 6044059 0 4800000 0 173979.2 6223418 0 6000000 0 139288.0 6384560 0 7200000 0 163000.4 6520083 1 8400000 0 57799.2 5388544 1 9600000 0 66461.6 5552969 2 10800000 0 49593.5 5163675 2 12000000 0 57672.1 4889398 After this change: FSUse% Count Size Files/sec App Overhead 0 1200000 0 167987.3 (+1.6%) 6272730 0 2400000 0 198563.9 (+4.0%) 6048847 0 3600000 0 197436.6 (+8.8%) 6163637 0 4800000 0 202880.7 (+16.6%) 6371771 1 6000000 0 167275.9 (+20.1%) 6556733 1 7200000 0 204051.2 (+25.2%) 6817091 1 8400000 0 69622.8 (+20.5%) 5525675 1 9600000 0 69384.5 (+4.4%) 5700723 1 10800000 0 61454.1 (+23.9%) 5363754 3 12000000 0 61908.7 (+7.3%) 5370196 Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 137 ++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 118 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 9329c8a3c855..62066c034363 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1679,6 +1679,27 @@ static int finish_need_commit_sem_search(struct btrfs_path *path) return 0; } +static inline int search_for_key_slot(struct extent_buffer *eb, + int search_low_slot, + const struct btrfs_key *key, + int prev_cmp, + int *slot) +{ + /* + * If a previous call to btrfs_bin_search() on a parent node returned an + * exact match (prev_cmp == 0), we can safely assume the target key will + * always be at slot 0 on lower levels, since each key pointer + * (struct btrfs_key_ptr) refers to the lowest key acessible from the + * subtree it points to. Thus we can skip searching lower levels. + */ + if (prev_cmp == 0) { + *slot = 0; + return 0; + } + + return generic_bin_search(eb, search_low_slot, key, slot); +} + /* * btrfs_search_slot - look for a key in a tree and perform necessary * modifications to preserve tree invariants. @@ -1839,25 +1860,98 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, } } - /* - * If btrfs_bin_search returns an exact match (prev_cmp == 0) - * we can safely assume the target key will always be in slot 0 - * on lower levels due to the invariants BTRFS' btree provides, - * namely that a btrfs_key_ptr entry always points to the - * lowest key in the child node, thus we can skip searching - * lower levels - */ - if (prev_cmp == 0) { - slot = 0; - ret = 0; - } else { - ret = btrfs_bin_search(b, key, &slot); - prev_cmp = ret; + if (level == 0) { + int leaf_free_space = 0; + int search_low_slot = 0; + + /* + * If we are doing an insertion, the leaf has enough free + * space and the destination slot for the key is not slot + * 0, then we can unlock our write lock on the parent, and + * any other upper nodes, before doing the binary search + * on the leaf (with search_for_key_slot()), allowing other + * tasks to lock the parent and any other upper nodes. + */ + if (ins_len > 0) { + struct btrfs_disk_key first_key; + + /* + * Cache the leaf free space, since we will need it + * later and it will not change until then. + */ + leaf_free_space = btrfs_leaf_free_space(b); + + /* + * !p->locks[1] means we have a single node tree, + * the leaf is the root of the tree. + */ + if (!p->locks[1] || leaf_free_space < ins_len) + goto leaf_search; + + ASSERT(btrfs_header_nritems(b) > 0); + btrfs_item_key(b, &first_key, 0); + + /* + * Doing the extra comparison with the first key + * is cheap, taking into account that the first + * key is very likely already in a cache line + * because it immediattely follows the extent + * buffer's header and we have recently accessed + * the header's level field. + */ + ret = comp_keys(&first_key, key); + if (ret < 0) { + /* + * The first key is smaller than the key + * we want to insert, so we are safe to + * unlock all upper nodes and we have to + * do the binary search. + * + * We do use btrfs_unlock_up_safe() and + * not unlock_up() because the later does + * not unlock nodes with a slot of 0. + * We can safely unlock any node even if + * its slot is 0 since in this case the + * key does not end up at slot 0 of the + * leaf and there's also no need to split + * the leaf. + */ + btrfs_unlock_up_safe(p, 1); + search_low_slot = 1; + } else { + /* + * The first key is >= then the key we + * want to insert, so we can skip the + * binary search as the target key will + * be at slot 0. + * + * We can not unlock upper nodes when + * the key is less than the first key, + * because we will need to update the key + * at slot 0 of the parent node and + * possibly of other upper nodes too. + * If the key matches the first key, then + * we can unlock all the upper nodes, + * using btrfs_unlock_up_safe() instead + * of unlock_up() as stated above. + */ + if (ret == 0) + btrfs_unlock_up_safe(p, 1); + slot = 0; + /* + * ret is already 0 or 1, matching the + * result of a btrfs_bin_search() call, + * so there is no need to adjust it. + */ + goto skip_leaf_search; + } + } +leaf_search: + ret = search_for_key_slot(b, search_low_slot, key, + prev_cmp, &slot); if (ret < 0) goto done; - } - - if (level == 0) { +skip_leaf_search: p->slots[level] = slot; /* * Item key already exists. In this case, if we are @@ -1873,8 +1967,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, ASSERT(ins_len >= sizeof(struct btrfs_item)); ins_len -= sizeof(struct btrfs_item); } - if (ins_len > 0 && - btrfs_leaf_free_space(b) < ins_len) { + if (ins_len > 0 && leaf_free_space < ins_len) { if (write_lock_level < 1) { write_lock_level = 1; btrfs_release_path(p); @@ -1895,6 +1988,12 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, min_write_lock_level, NULL); goto done; } + + ret = search_for_key_slot(b, 0, key, prev_cmp, &slot); + if (ret < 0) + goto done; + prev_cmp = ret; + if (ret && slot > 0) { dec = 1; slot--; From patchwork Thu Dec 2 10:30:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12652167 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E00D5C433FE for ; Thu, 2 Dec 2021 10:30:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356991AbhLBKeM (ORCPT ); Thu, 2 Dec 2021 05:34:12 -0500 Received: from sin.source.kernel.org ([145.40.73.55]:50514 "EHLO sin.source.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356975AbhLBKeL (ORCPT ); Thu, 2 Dec 2021 05:34:11 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id D9D33CE2207 for ; Thu, 2 Dec 2021 10:30:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8C449C53FCB for ; Thu, 2 Dec 2021 10:30:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1638441046; bh=59xhs2+h70b9yFRcwtESXYasR/N9z/s/vqBCBRwZtVI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Dxo+XrORERohez+0j6tdkCVtwTbq+doVnCEoNUIpNrrJ2RXhNrLt6mUvh48YV/EQK 6YLhHmGdjv8kEZJe1nzBnPdTJAKbxfcg3zc0H3t1suKYD5ZD9EKLE/GUuf8MFrGFVs iMm+HN5/mG41tVLj5lszuE6K5HHALa88qL+2qba8xINrGvBpneTn0zTeJQOxCLjDy3 SfuAsxQxxp0K9JFn6enNWxHyfhiLCgcpgyO2aHYAoqvrGE0T5ImkpNwI1n4yPhz8+6 sBxBNXUkBkXuEdgS6WKTmtHELwS0QrSoIe8JM9dagNwDRP8xgldZtKYWIUty7/hrW5 /KNP5q7vZZQCQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/6] btrfs: remove useless condition check before splitting leaf Date: Thu, 2 Dec 2021 10:30:37 +0000 Message-Id: <0f1bed5dede2ac134033ca79e899ddd5dec833b1.1638440535.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When inserting a key, we check if the write_lock_level is less than 1, and if so we set it to 1, release the path and retry the tree traversal. However that is unnecessary, because when ins_len is greater than 0, we know that write_lock_level can never be less than 1. The logic to retry is also buggy, because in case ins_len was decremented, due to an exact key match and the search is not meant for item extension (path->search_for_extension is 0), we retry without incrementing ins_len, which would make the next retry decrement it again by the same amount. So remove the check for write_lock_level being less than 1 and add an assertion to assert it's always >= 1. Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 62066c034363..9439c8606835 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1968,11 +1968,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, ins_len -= sizeof(struct btrfs_item); } if (ins_len > 0 && leaf_free_space < ins_len) { - if (write_lock_level < 1) { - write_lock_level = 1; - btrfs_release_path(p); - goto again; - } + ASSERT(write_lock_level >= 1); err = split_leaf(trans, root, key, p, ins_len, ret == 0); From patchwork Thu Dec 2 10:30:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12652173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2F7BC433EF for ; Thu, 2 Dec 2021 10:30:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356993AbhLBKeN (ORCPT ); Thu, 2 Dec 2021 05:34:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356985AbhLBKeM (ORCPT ); Thu, 2 Dec 2021 05:34:12 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0E62C06174A for ; Thu, 2 Dec 2021 02:30:49 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 4D129B82219 for ; Thu, 2 Dec 2021 10:30:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 85696C00446 for ; Thu, 2 Dec 2021 10:30:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1638441047; bh=FvoYpk+g3WC6lnf7SHkCyzcHU/We+QkiqLRltYHZHeQ=; h=From:To:Subject:Date:In-Reply-To:References:From; b=fllKlrfiY4TodEo5tsDjAdmsEJ9LasP6C0RUiwfDJuZ4bq3P1A2JFuAxosPUznH71 fSTSe+CfpqU7Ieek+6lTRhW0Ut5QMzHroyus3lP+lRVJy4sm+raGBzE+r89HffHj8t LL8e2DR6GTaR8mS7mbFjtYZtboozXe1PKaKLCePduPV3JHL7WDpJDkN2qBXMGGdeuy HsKa6TaRFkcxNoFlkRBLayx9tMnxhsqsuIoWOjPVAPfn2Ih9wB+7mwmwVG9Nh+EoRp AwpLMq/QhhVuIH9Q06vr8GRHGDGJ5R4eEt57qNRyBu1fohODRLIkr16BcBVv7sqeya 69yHvjQ1uo1og== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/6] btrfs: move leaf search logic out of btrfs_search_slot() Date: Thu, 2 Dec 2021 10:30:38 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana There's quite a significant amount of code for doing the key search for a leaf at btrfs_search_slot(), with a couple labels and gotos in it, plus btrfs_search_slot() is already big enough. So move the logic that does the key search on a leaf into a new helper function. This makes it better organized, removing the need for the labels and the gotos, as well as reducing the indentation level and the size of btrfs_search_slot(). Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 244 +++++++++++++++++++++++++---------------------- 1 file changed, 128 insertions(+), 116 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 9439c8606835..15357274a0c4 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1700,6 +1700,132 @@ static inline int search_for_key_slot(struct extent_buffer *eb, return generic_bin_search(eb, search_low_slot, key, slot); } +static int search_leaf(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + const struct btrfs_key *key, + struct btrfs_path *path, + int ins_len, + int prev_cmp) +{ + struct extent_buffer *leaf = path->nodes[0]; + int leaf_free_space = -1; + int search_low_slot = 0; + int ret; + bool do_bin_search = true; + + /* + * If we are doing an insertion, the leaf has enough free space and the + * destination slot for the key is not slot 0, then we can unlock our + * write lock on the parent, and any other upper nodes, before doing the + * binary search on the leaf (with search_for_key_slot()), allowing other + * tasks to lock the parent and any other upper nodes. + */ + if (ins_len > 0) { + /* + * Cache the leaf free space, since we will need it later and it + * will not change until then. + */ + leaf_free_space = btrfs_leaf_free_space(leaf); + + /* + * !path->locks[1] means we have a single node tree, the leaf is + * the root of the tree. + */ + if (path->locks[1] && leaf_free_space >= ins_len) { + struct btrfs_disk_key first_key; + + ASSERT(btrfs_header_nritems(leaf) > 0); + btrfs_item_key(leaf, &first_key, 0); + + /* + * Doing the extra comparison with the first key is cheap, + * taking into account that the first key is very likely + * already in a cache line because it immediattely follows + * the extent buffer's header and we have recently accessed + * the header's level field. + */ + ret = comp_keys(&first_key, key); + if (ret < 0) { + /* + * The first key is smaller than the key we want + * to insert, so we are safe to unlock all upper + * nodes and we have to do the binary search. + * + * We do use btrfs_unlock_up_safe() and not + * unlock_up() because the later does not unlock + * nodes with a slot of 0 - we can safely unlock + * any node even if its slot is 0 since in this + * case the key does not end up at slot 0 of the + * leaf and there's no need to split the leaf. + */ + btrfs_unlock_up_safe(path, 1); + search_low_slot = 1; + } else { + /* + * The first key is >= then the key we want to + * insert, so we can skip the binary search as + * the target key will be at slot 0. + * + * We can not unlock upper nodes when the key is + * less than the first key, because we will need + * to update the key at slot 0 of the parent node + * and possibly of other upper nodes too. + * If the key matches the first key, then we can + * unlock all the upper nodes, using + * btrfs_unlock_up_safe() instead of unlock_up() + * as stated above. + */ + if (ret == 0) + btrfs_unlock_up_safe(path, 1); + /* + * ret is already 0 or 1, matching the result of + * a btrfs_bin_search() call, so there is no need + * to adjust it. + */ + do_bin_search = false; + path->slots[0] = 0; + } + } + } + + if (do_bin_search) { + ret = search_for_key_slot(leaf, search_low_slot, key, + prev_cmp, &path->slots[0]); + if (ret < 0) + return ret; + } + + if (ins_len > 0) { + /* + * Item key already exists. In this case, if we are allowed to + * insert the item (for example, in dir_item case, item key + * collision is allowed), it will be merged with the original + * item. Only the item size grows, no new btrfs item will be + * added. If search_for_extension is not set, ins_len already + * accounts the size btrfs_item, deduct it here so leaf space + * check will be correct. + */ + if (ret == 0 && !path->search_for_extension) { + ASSERT(ins_len >= sizeof(struct btrfs_item)); + ins_len -= sizeof(struct btrfs_item); + } + + ASSERT(leaf_free_space >= 0); + + if (leaf_free_space < ins_len) { + int err; + + err = split_leaf(trans, root, key, path, ins_len, + (ret == 0)); + BUG_ON(err > 0); + if (err) + ret = err; + } + } + + return ret; +} + /* * btrfs_search_slot - look for a key in a tree and perform necessary * modifications to preserve tree invariants. @@ -1861,124 +1987,10 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, } if (level == 0) { - int leaf_free_space = 0; - int search_low_slot = 0; - - /* - * If we are doing an insertion, the leaf has enough free - * space and the destination slot for the key is not slot - * 0, then we can unlock our write lock on the parent, and - * any other upper nodes, before doing the binary search - * on the leaf (with search_for_key_slot()), allowing other - * tasks to lock the parent and any other upper nodes. - */ - if (ins_len > 0) { - struct btrfs_disk_key first_key; - - /* - * Cache the leaf free space, since we will need it - * later and it will not change until then. - */ - leaf_free_space = btrfs_leaf_free_space(b); - - /* - * !p->locks[1] means we have a single node tree, - * the leaf is the root of the tree. - */ - if (!p->locks[1] || leaf_free_space < ins_len) - goto leaf_search; - - ASSERT(btrfs_header_nritems(b) > 0); - btrfs_item_key(b, &first_key, 0); - - /* - * Doing the extra comparison with the first key - * is cheap, taking into account that the first - * key is very likely already in a cache line - * because it immediattely follows the extent - * buffer's header and we have recently accessed - * the header's level field. - */ - ret = comp_keys(&first_key, key); - if (ret < 0) { - /* - * The first key is smaller than the key - * we want to insert, so we are safe to - * unlock all upper nodes and we have to - * do the binary search. - * - * We do use btrfs_unlock_up_safe() and - * not unlock_up() because the later does - * not unlock nodes with a slot of 0. - * We can safely unlock any node even if - * its slot is 0 since in this case the - * key does not end up at slot 0 of the - * leaf and there's also no need to split - * the leaf. - */ - btrfs_unlock_up_safe(p, 1); - search_low_slot = 1; - } else { - /* - * The first key is >= then the key we - * want to insert, so we can skip the - * binary search as the target key will - * be at slot 0. - * - * We can not unlock upper nodes when - * the key is less than the first key, - * because we will need to update the key - * at slot 0 of the parent node and - * possibly of other upper nodes too. - * If the key matches the first key, then - * we can unlock all the upper nodes, - * using btrfs_unlock_up_safe() instead - * of unlock_up() as stated above. - */ - if (ret == 0) - btrfs_unlock_up_safe(p, 1); - slot = 0; - /* - * ret is already 0 or 1, matching the - * result of a btrfs_bin_search() call, - * so there is no need to adjust it. - */ - goto skip_leaf_search; - } - } -leaf_search: - ret = search_for_key_slot(b, search_low_slot, key, - prev_cmp, &slot); - if (ret < 0) - goto done; -skip_leaf_search: - p->slots[level] = slot; - /* - * Item key already exists. In this case, if we are - * allowed to insert the item (for example, in dir_item - * case, item key collision is allowed), it will be - * merged with the original item. Only the item size - * grows, no new btrfs item will be added. If - * search_for_extension is not set, ins_len already - * accounts the size btrfs_item, deduct it here so leaf - * space check will be correct. - */ - if (ret == 0 && ins_len > 0 && !p->search_for_extension) { - ASSERT(ins_len >= sizeof(struct btrfs_item)); - ins_len -= sizeof(struct btrfs_item); - } - if (ins_len > 0 && leaf_free_space < ins_len) { + if (ins_len > 0) ASSERT(write_lock_level >= 1); - err = split_leaf(trans, root, key, - p, ins_len, ret == 0); - - BUG_ON(err > 0); - if (err) { - ret = err; - goto done; - } - } + ret = search_leaf(trans, root, key, p, ins_len, prev_cmp); if (!p->search_for_split) unlock_up(p, level, lowest_unlock, min_write_lock_level, NULL); From patchwork Thu Dec 2 10:30:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12652175 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79551C433FE for ; Thu, 2 Dec 2021 10:30:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356995AbhLBKeO (ORCPT ); Thu, 2 Dec 2021 05:34:14 -0500 Received: from sin.source.kernel.org ([145.40.73.55]:50524 "EHLO sin.source.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356980AbhLBKeN (ORCPT ); Thu, 2 Dec 2021 05:34:13 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id C29F9CE2206 for ; Thu, 2 Dec 2021 10:30:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 78EE0C53FD0 for ; Thu, 2 Dec 2021 10:30:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1638441048; bh=FH5/aMjQb7dh7n92dAky3VGE4WCmnzptcGr/oaeoFuw=; h=From:To:Subject:Date:In-Reply-To:References:From; b=PrVvwgKw0S3kcPOqPeDCSqOHt5xacxkwoirsak6V98qKNGalR7u3ctvGwNyz71gJy cBGwcUuxR5M8iTmMR+EXQVAEvzDWCP8xVPKhgY89bfqMsBCDkIOrAbsLvRi+T544HM 9Om6ECDEER/MdIDyRabL61wlwyF2UCEoqYjXEFRntXZ15VXyNXNF4yVqS1Gm8Va1F2 AmTQzXlXO3I1y4hg6lXVdiJHmYxEo1w2FiC9vRiIcLYSr/rP8WJf3bHKpqXPJtxY7V Rmh4w8VxyjO82TthdbLAEhfewXH7xvjCJ3yh+1u0cWitcDf/FcVGutJm8NThpUXLNO cAHDBzBvig/FQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 5/6] btrfs: remove BUG_ON() after splitting leaf Date: Thu, 2 Dec 2021 10:30:39 +0000 Message-Id: <966e008cdf776b803b96a87756aa58917212a0c6.1638440535.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana After calling split_leaf() we BUG_ON() if the returned value is greater than zero. However split_leaf() only returns 0, in case of success, or a negative value in case of an error. The reason for the BUG_ON() is that if we ever get a positive return value from split_leaf(), we can not simply propagate it to the callers of btrfs_search_slot(), as that would be interpreted as "key not found" and not as an error. That means it could result in callers ending up causing some potential silent corruption. So change the BUG_ON() to an ASSERT(), and in case assertions are disabled, produce a warning and set the return value to an error, to make it not possible to get into a silent corruption and having the error not noticed. Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 15357274a0c4..bcf99c787d68 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1817,7 +1817,9 @@ static int search_leaf(struct btrfs_trans_handle *trans, err = split_leaf(trans, root, key, path, ins_len, (ret == 0)); - BUG_ON(err > 0); + ASSERT(err <= 0); + if (WARN_ON(err > 0)) + err = -EUCLEAN; if (err) ret = err; } From patchwork Thu Dec 2 10:30:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12652171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CCCAC43217 for ; Thu, 2 Dec 2021 10:30:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356994AbhLBKeO (ORCPT ); Thu, 2 Dec 2021 05:34:14 -0500 Received: from sin.source.kernel.org ([145.40.73.55]:50518 "EHLO sin.source.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356986AbhLBKeM (ORCPT ); Thu, 2 Dec 2021 05:34:12 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 59FC7CE2131 for ; Thu, 2 Dec 2021 10:30:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6DB28C53FCB for ; Thu, 2 Dec 2021 10:30:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1638441048; bh=a6114suPyocTPMf1t9V0aVxrusR6FaURfkzzrIje8l0=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Ln9V0BFVcjnkPKO2rfGuTIb6SZ1na1+xCm80KI7A4G7KikNc+15bRBzCaglUnaiHA l6rMt33kfwYEGXbrdiOaGlkS/oowhmIJNDq49GRTtQicKDtoW0d9hoExbPKEyZTNlZ AX4If1eo+GenGlAERXm5il2RXdNluHTWfA+95abkg3/Dt8o/JXi1KxlQu7UmiaO1ZD IWMNfL+BlQbyZRZuk1/GZnQ/c5BxVbGLsJf5MK9+BYOW5I0u62B6072adq1YDC9gbv mNppbNV/oH3cJluLGX9xkt82zEmQz/agqJ1uGzhVPO1fNLa1pLjG3B7LSjF6Pkyhz+ 0TLZnnHGiIpvw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/6] btrfs: remove stale comment about locking at btrfs_search_slot() Date: Thu, 2 Dec 2021 10:30:40 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The comment refers to the old extent buffer locking code, where we used to have custom locks that had blocking and spinning behaviour modes. That is not the case anymore, since we have transitioned to rw semaphores, so the comment does not offer any value anymore. Remove it. Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index bcf99c787d68..fdd286ab1259 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1963,10 +1963,6 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, } cow_done: p->nodes[level] = b; - /* - * Leave path with blocking locks to avoid massive - * lock context switch, this is made on purpose. - */ /* * we have a lock on b and as long as we aren't changing