Btrfs: bring back key search optimization to btrfs_search_old_slot()

Message ID 20181116110845.28561-1-fdmanana@kernel.org
State New

Commit Message

Filipe Manana Nov. 16, 2018, 11:08 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

Commit d7396f07358a ("Btrfs: optimize key searches in btrfs_search_slot"),
dated from August 2013, introduced an optimization to search for keys in a
node/leaf to both btrfs_search_slot() and btrfs_search_old_slot(). For the
latter, it ended up being reverted in commit d4b4087c43cc ("Btrfs: do a
full search everytime in btrfs_search_old_slot"), from September 2013,
because the contents of extent buffers were often inconsistent during
replay. It turned out that the reason why they were often inconsistent was
because the extent buffer replay stopped being done atomically, and got
broken after commit c8cc63416537 ("Btrfs: stop using GFP_ATOMIC for the
tree mod log allocations"), introduced in July 2013. The extent buffer
replay issue was then found and fixed by commit 5de865eebb83 ("Btrfs: fix
tree mod logging"), dated from December 2013.

So bring back the optimization to btrfs_search_old_slot(), as skipping it
and the comment about disabling it are confusing. After all, if unwinding
extent buffers resulted in some inconsistency, the normal searches (binary
searches) would also not always work.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
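
For readers unfamiliar with the optimization discussed above, here is a minimal userspace sketch of the idea, not the kernel code. It assumes a toy node made of a sorted array of integer keys; the names toy_node, toy_bin_search() and toy_key_search() are hypothetical and exist only for illustration. The point it shows: once a search at one level returned an exact match (prev_cmp == 0), the search at the next level down skips the binary search and takes slot 0, on the assumption that the child's first key equals the key being searched for.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a tree node or leaf: a sorted array of integer keys. */
struct toy_node {
	const int *keys;
	size_t nr;
};

/*
 * Plain binary search. Returns 0 and stores the matching index in *slot
 * on an exact match; otherwise returns 1 and stores the index at which
 * the key would have to be inserted.
 */
static int toy_bin_search(const struct toy_node *n, int key, size_t *slot)
{
	size_t low = 0, high = n->nr;

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (n->keys[mid] < key) {
			low = mid + 1;
		} else if (n->keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}

/*
 * Search with the shortcut. If the previous level did not match exactly
 * (*prev_cmp != 0), do a real binary search and remember its result.
 * If it did match exactly, trust that the key sits in slot 0 of this
 * node and do not look at the node's contents at all.
 */
static int toy_key_search(const struct toy_node *n, int key,
			  int *prev_cmp, size_t *slot)
{
	assert(n && prev_cmp && slot);

	if (*prev_cmp != 0) {
		*prev_cmp = toy_bin_search(n, key, slot);
		return *prev_cmp;
	}

	*slot = 0;
	return 0;
}
```

Setting prev_cmp to -1 before each call, as the removed hunk does, forces a real binary search every time. Setting it only once per descent lets the shortcut apply after the first exact match. Note that the shortcut never inspects the node, so in this toy model it reports slot 0 even when the first key is not the one being searched for; which invariant fails in the real tree mod log case is not stated in this thread.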

Comments

Filipe Manana Nov. 26, 2018, 4:53 p.m. UTC | #1
On Fri, Nov 16, 2018 at 11:09 AM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> Commit d7396f07358a ("Btrfs: optimize key searches in btrfs_search_slot"),
> dated from August 2013, introduced an optimization to search for keys in a
> node/leaf to both btrfs_search_slot() and btrfs_search_old_slot(). For the
> latter, it ended up being reverted in commit d4b4087c43cc ("Btrfs: do a
> full search everytime in btrfs_search_old_slot"), from September 2013,
> because the contents of extent buffers were often inconsistent during
> replay. It turned out that the reason why they were often inconsistent was
> because the extent buffer replay stopped being done atomically, and got
> broken after commit c8cc63416537 ("Btrfs: stop using GFP_ATOMIC for the
> tree mod log allocations"), introduced in July 2013. The extent buffer
> replay issue was then found and fixed by commit 5de865eebb83 ("Btrfs: fix
> tree mod logging"), dated from December 2013.
>
> So bring back the optimization to btrfs_search_old_slot(), as skipping it
> and the comment about disabling it are confusing. After all, if unwinding
> extent buffers resulted in some inconsistency, the normal searches (binary
> searches) would also not always work.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

David, please remove this change from the integration branch.

It turns out that after 3 weeks of stress tests it finally triggered an
assertion failure (hard to hit), and it's indeed not reliable to use the
search optimization because of how the mod log tree currently works.
The idea was just to not make it different from btrfs_search_slot().
Use of the mod log tree is limited to some cases where an occasional
faster search wouldn't bring much benefit.

Thanks.

> ---
>  fs/btrfs/ctree.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 089b46c4d97f..cf5487a79c96 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2966,7 +2966,7 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
>         int level;
>         int lowest_unlock = 1;
>         u8 lowest_level = 0;
> -       int prev_cmp = -1;
> +       int prev_cmp;
>
>         lowest_level = p->lowest_level;
>         WARN_ON(p->nodes[0] != NULL);
> @@ -2977,6 +2977,7 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
>         }
>
>  again:
> +       prev_cmp = -1;
>         b = get_old_root(root, time_seq);
>         level = btrfs_header_level(b);
>         p->locks[level] = BTRFS_READ_LOCK;
> @@ -2994,11 +2995,6 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
>                  */
>                 btrfs_unlock_up_safe(p, level + 1);
>
> -               /*
> -                * Since we can unwind ebs we want to do a real search every
> -                * time.
> -                */
> -               prev_cmp = -1;
>                 ret = key_search(b, key, level, &prev_cmp, &slot);
>
>                 if (level != 0) {
> --
> 2.11.0
>
David Sterba Nov. 29, 2018, 2:50 p.m. UTC | #2
On Mon, Nov 26, 2018 at 04:53:11PM +0000, Filipe Manana wrote:
> On Fri, Nov 16, 2018 at 11:09 AM <fdmanana@kernel.org> wrote:
> >
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Commit d7396f07358a ("Btrfs: optimize key searches in btrfs_search_slot"),
> > dated from August 2013, introduced an optimization to search for keys in a
> > node/leaf to both btrfs_search_slot() and btrfs_search_old_slot(). For the
> > latter, it ended up being reverted in commit d4b4087c43cc ("Btrfs: do a
> > full search everytime in btrfs_search_old_slot"), from September 2013,
> > because the contents of extent buffers were often inconsistent during
> > replay. It turned out that the reason why they were often inconsistent was
> > because the extent buffer replay stopped being done atomically, and got
> > broken after commit c8cc63416537 ("Btrfs: stop using GFP_ATOMIC for the
> > tree mod log allocations"), introduced in July 2013. The extent buffer
> > replay issue was then found and fixed by commit 5de865eebb83 ("Btrfs: fix
> > tree mod logging"), dated from December 2013.
> >
> > So bring back the optimization to btrfs_search_old_slot(), as skipping it
> > and the comment about disabling it are confusing. After all, if unwinding
> > extent buffers resulted in some inconsistency, the normal searches (binary
> > searches) would also not always work.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> 
> David, please remove this change from the integration branch.
> 
> It turns out that after 3 weeks of stress tests it finally triggered an
> assertion failure (hard to hit), and it's indeed not reliable to use the
> search optimization because of how the mod log tree currently works.
> The idea was just to not make it different from btrfs_search_slot().
> Use of the mod log tree is limited to some cases where an occasional
> faster search wouldn't bring much benefit.

Understood, thanks. Patch removed from misc-next.

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 089b46c4d97f..cf5487a79c96 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2966,7 +2966,7 @@  int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 	int level;
 	int lowest_unlock = 1;
 	u8 lowest_level = 0;
-	int prev_cmp = -1;
+	int prev_cmp;
 
 	lowest_level = p->lowest_level;
 	WARN_ON(p->nodes[0] != NULL);
@@ -2977,6 +2977,7 @@  int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 	}
 
 again:
+	prev_cmp = -1;
 	b = get_old_root(root, time_seq);
 	level = btrfs_header_level(b);
 	p->locks[level] = BTRFS_READ_LOCK;
@@ -2994,11 +2995,6 @@  int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 		 */
 		btrfs_unlock_up_safe(p, level + 1);
 
-		/*
-		 * Since we can unwind ebs we want to do a real search every
-		 * time.
-		 */
-		prev_cmp = -1;
 		ret = key_search(b, key, level, &prev_cmp, &slot);
 
 		if (level != 0) {