
[2/6] btrfs: try to unlock parent nodes earlier when inserting a key

Message ID 6b19d8920fb24d301a43eee0628d7c3789546b30.1638440535.git.fdmanana@suse.com (mailing list archive)
State New, archived
Series btrfs: optimize btree insertions and some cleanups

Commit Message

Filipe Manana Dec. 2, 2021, 10:30 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When inserting a new key, we release the write lock on the leaf's parent
only after doing the binary search on the leaf. This is because if the
key ends up at slot 0, we will have to update the key at slot 0 of the
parent node. The same reasoning applies to any other upper level nodes
when their slot is 0. We also need to keep the parent locked in case the
leaf does not have enough free space to insert the new key/item, because
in that case we will split the leaf and we will need to add a new key to
the parent due to a new leaf resulting from the split operation.
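
To illustrate the invariant behind that, here is a toy userspace sketch
(not btrfs code; the toy_leaf structure and insert_key() helper are made
up for illustration) showing why an insertion at slot 0 of a child forces
an update of the parent's copy of its lowest key:

  /*
   * Toy sketch: each key in an internal node is a copy of the lowest key
   * of the child it points to, so an insertion that lands at slot 0 of
   * the child makes that copy stale and the parent has to be updated too
   * (which is why it must stay write locked).
   */
  #include <stdio.h>
  #include <string.h>

  struct toy_leaf {
  	int keys[8];
  	int nritems;
  };

  /* parent_key mirrors leaf->keys[0], like a btrfs_key_ptr does. */
  static void insert_key(int *parent_key, struct toy_leaf *leaf, int slot,
  		       int key)
  {
  	memmove(&leaf->keys[slot + 1], &leaf->keys[slot],
  		(leaf->nritems - slot) * sizeof(int));
  	leaf->keys[slot] = key;
  	leaf->nritems++;

  	/* Slot 0 case: the parent's copy of the lowest key is now stale. */
  	if (slot == 0)
  		*parent_key = key;
  }

  int main(void)
  {
  	struct toy_leaf leaf = { .keys = { 50, 60, 70 }, .nritems = 3 };
  	int parent_key = 50;    /* copy of the leaf's lowest key */

  	insert_key(&parent_key, &leaf, 0, 40);          /* new smallest key */
  	printf("parent key is now %d\n", parent_key);   /* prints: 40 */
  	return 0;
  }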

However, if the leaf has enough space for the new key and the key does not
end up at slot 0 of the leaf, we could release our write lock on the parent
before doing the binary search on the leaf to figure out the destination
slot. That reduces the amount of time other tasks are blocked waiting to
lock the parent, therefore increasing parallelism when there are other
tasks trying to access other leaves accessible through the same parent.
This also applies to other upper level nodes besides the immediate parent,
when their slot is 0, since we keep locks on them until we figure out if
the leaf slot is slot 0 or not.

In fact, having the key end up at slot 0 is rare. Typically it only
happens when the key is less than or equal to the smallest key (the
"left most" key) of the entire btree, during a split attempt when we try
to push to the right sibling leaf, or when the caller just wants to
update the item of an existing key. It's also very common that a leaf has
enough space to insert a new key, since after a split we move about half
of the keys from one leaf into the new leaf.

So unlock the parent, and any other upper level nodes, when during a key
insertion we notice the key is greater than the first key in the leaf and
the leaf has enough free space. After unlocking the upper level nodes, do
the binary search using a low boundary of slot 1 and not slot 0, to figure
out the slot where the key will be inserted (or where the key already is
in case it exists and the caller wants to modify its item data).
This extra comparison with the first key is cheap, and the key is very
likely already in a cache line because it immediately follows the header
of the extent buffer and we have recently read the level field of the
header (which in fact is the last field of the header).
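
To make that flow concrete, here is a minimal userspace sketch of the
decision (toy code under simplifying assumptions, not the kernel
implementation; the integer keys, the bin_search() helper and the
leaf_has_room flag are made up for illustration):

  /*
   * Toy sketch of the fast path: before binary searching the leaf,
   * compare the new key against the leaf's first key. If the new key is
   * greater and the leaf has room, the parent lock is no longer needed
   * and the binary search can start at slot 1 instead of slot 0.
   */
  #include <stdbool.h>
  #include <stdio.h>

  /* Mimics btrfs_bin_search(): returns 0 on an exact match, 1 otherwise. */
  static int bin_search(const int *keys, int nr, int low, int key, int *slot)
  {
  	int high = nr;

  	while (low < high) {
  		int mid = low + (high - low) / 2;

  		if (keys[mid] < key)
  			low = mid + 1;
  		else
  			high = mid;
  	}
  	*slot = low;
  	return (low < nr && keys[low] == key) ? 0 : 1;
  }

  int main(void)
  {
  	int leaf[] = { 10, 20, 30, 40 };   /* keys already in the leaf */
  	int nr = sizeof(leaf) / sizeof(leaf[0]);
  	int new_key = 25;
  	bool leaf_has_room = true;         /* stand-in for the free space check */
  	int low = 0;
  	int slot;

  	if (leaf_has_room && new_key > leaf[0]) {
  		/* The real code calls btrfs_unlock_up_safe(path, 1) here. */
  		printf("parent unlocked before the leaf search\n");
  		low = 1;
  	}

  	bin_search(leaf, nr, low, new_key, &slot);
  	printf("insert at slot %d\n", slot);   /* prints: insert at slot 2 */
  	return 0;
  }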

The following fs_mark test was run on a non-debug kernel (Debian's default
kernel config), with a 12 core Intel CPU and using an NVMe device:

  $ cat run-fsmark.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-O no-holes -R free-space-tree"
  FILES=100000
  THREADS=$(nproc --all)
  FILE_SIZE=0

  echo "performance" | \
	tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  OPTS="-S 0 -L 10 -n $FILES -s $FILE_SIZE -t $THREADS -k"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

Before this change:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1200000            0     165273.6          5958381
     0      2400000            0     190938.3          6284477
     0      3600000            0     181429.1          6044059
     0      4800000            0     173979.2          6223418
     0      6000000            0     139288.0          6384560
     0      7200000            0     163000.4          6520083
     1      8400000            0      57799.2          5388544
     1      9600000            0      66461.6          5552969
     2     10800000            0      49593.5          5163675
     2     12000000            0      57672.1          4889398

After this change:

FSUse%        Count         Size    Files/sec            App Overhead
     0      1200000            0     167987.3 (+1.6%)         6272730
     0      2400000            0     198563.9 (+4.0%)         6048847
     0      3600000            0     197436.6 (+8.8%)         6163637
     0      4800000            0     202880.7 (+16.6%)        6371771
     1      6000000            0     167275.9 (+20.1%)        6556733
     1      7200000            0     204051.2 (+25.2%)        6817091
     1      8400000            0      69622.8 (+20.5%)        5525675
     1      9600000            0      69384.5 (+4.4%)         5700723
     1     10800000            0      61454.1 (+23.9%)        5363754
     3     12000000            0      61908.7 (+7.3%)         5370196

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 137 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 118 insertions(+), 19 deletions(-)

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9329c8a3c855..62066c034363 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1679,6 +1679,27 @@  static int finish_need_commit_sem_search(struct btrfs_path *path)
 	return 0;
 }
 
+static inline int search_for_key_slot(struct extent_buffer *eb,
+				      int search_low_slot,
+				      const struct btrfs_key *key,
+				      int prev_cmp,
+				      int *slot)
+{
+	/*
+	 * If a previous call to btrfs_bin_search() on a parent node returned an
+	 * exact match (prev_cmp == 0), we can safely assume the target key will
+	 * always be at slot 0 on lower levels, since each key pointer
+	 * (struct btrfs_key_ptr) refers to the lowest key accessible from the
+	 * subtree it points to. Thus we can skip searching lower levels.
+	 */
+	if (prev_cmp == 0) {
+		*slot = 0;
+		return 0;
+	}
+
+	return generic_bin_search(eb, search_low_slot, key, slot);
+}
+
 /*
  * btrfs_search_slot - look for a key in a tree and perform necessary
  * modifications to preserve tree invariants.
@@ -1839,25 +1860,98 @@  int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 			}
 		}
 
-		/*
-		 * If btrfs_bin_search returns an exact match (prev_cmp == 0)
-		 * we can safely assume the target key will always be in slot 0
-		 * on lower levels due to the invariants BTRFS' btree provides,
-		 * namely that a btrfs_key_ptr entry always points to the
-		 * lowest key in the child node, thus we can skip searching
-		 * lower levels
-		 */
-		if (prev_cmp == 0) {
-			slot = 0;
-			ret = 0;
-		} else {
-			ret = btrfs_bin_search(b, key, &slot);
-			prev_cmp = ret;
+		if (level == 0) {
+			int leaf_free_space = 0;
+			int search_low_slot = 0;
+
+			/*
+			 * If we are doing an insertion, the leaf has enough free
+			 * space and the destination slot for the key is not slot
+			 * 0, then we can unlock our write lock on the parent, and
+			 * any other upper nodes, before doing the binary search
+			 * on the leaf (with search_for_key_slot()), allowing other
+			 * tasks to lock the parent and any other upper nodes.
+			 */
+			if (ins_len > 0) {
+				struct btrfs_disk_key first_key;
+
+				/*
+				 * Cache the leaf free space, since we will need it
+				 * later and it will not change until then.
+				 */
+				leaf_free_space = btrfs_leaf_free_space(b);
+
+				/*
+				 * !p->locks[1] means we have a single node tree,
+				 * the leaf is the root of the tree.
+				 */
+				if (!p->locks[1] || leaf_free_space < ins_len)
+					goto leaf_search;
+
+				ASSERT(btrfs_header_nritems(b) > 0);
+				btrfs_item_key(b, &first_key, 0);
+
+				/*
+				 * Doing the extra comparison with the first key
+				 * is cheap, taking into account that the first
+				 * key is very likely already in a cache line
+				 * because it immediately follows the extent
+				 * buffer's header and we have recently accessed
+				 * the header's level field.
+				 */
+				ret = comp_keys(&first_key, key);
+				if (ret < 0) {
+					/*
+					 * The first key is smaller than the key
+					 * we want to insert, so we are safe to
+					 * unlock all upper nodes and we have to
+					 * do the binary search.
+					 *
+					 * We do use btrfs_unlock_up_safe() and
+					 * not unlock_up() because the latter does
+					 * not unlock nodes with a slot of 0.
+					 * We can safely unlock any node even if
+					 * its slot is 0 since in this case the
+					 * key does not end up at slot 0 of the
+					 * leaf and there's also no need to split
+					 * the leaf.
+					 */
+					btrfs_unlock_up_safe(p, 1);
+					search_low_slot = 1;
+				} else {
+					/*
+					 * The first key is >= the key we
+					 * want to insert, so we can skip the
+					 * binary search as the target key will
+					 * be at slot 0.
+					 *
+					 * We can not unlock upper nodes when
+					 * the key is less than the first key,
+					 * because we will need to update the key
+					 * at slot 0 of the parent node and
+					 * possibly of other upper nodes too.
+					 * If the key matches the first key, then
+					 * we can unlock all the upper nodes,
+					 * using btrfs_unlock_up_safe() instead
+					 * of unlock_up() as stated above.
+					 */
+					if (ret == 0)
+						btrfs_unlock_up_safe(p, 1);
+					slot = 0;
+					/*
+					 * ret is already 0 or 1, matching the
+					 * result of a btrfs_bin_search() call,
+					 * so there is no need to adjust it.
+					 */
+					goto skip_leaf_search;
+				}
+			}
+leaf_search:
+			ret = search_for_key_slot(b, search_low_slot, key,
+						  prev_cmp, &slot);
 			if (ret < 0)
 				goto done;
-		}
-
-		if (level == 0) {
+skip_leaf_search:
 			p->slots[level] = slot;
 			/*
 			 * Item key already exists. In this case, if we are
@@ -1873,8 +1967,7 @@  int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 				ASSERT(ins_len >= sizeof(struct btrfs_item));
 				ins_len -= sizeof(struct btrfs_item);
 			}
-			if (ins_len > 0 &&
-			    btrfs_leaf_free_space(b) < ins_len) {
+			if (ins_len > 0 && leaf_free_space < ins_len) {
 				if (write_lock_level < 1) {
 					write_lock_level = 1;
 					btrfs_release_path(p);
@@ -1895,6 +1988,12 @@  int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 					  min_write_lock_level, NULL);
 			goto done;
 		}
+
+		ret = search_for_key_slot(b, 0, key, prev_cmp, &slot);
+		if (ret < 0)
+			goto done;
+		prev_cmp = ret;
+
 		if (ret && slot > 0) {
 			dec = 1;
 			slot--;