[v3,4/4] Btrfs: make ranged full fsyncs more efficient

From: Filipe Manana <fdmanana@suse.com>

Commit 0c713cbab6200b ("Btrfs: fix race between ranged fsync and writeback
of adjacent ranges") fixed a bug where we could end up with file extent
items in a log tree that represent file ranges that overlap due to a race
between the hole detection of a ranged full fsync and writeback for a
different file range.

The problem was solved by forcing any ranged full fsync to become a
non-ranged full fsync - setting the range start to 0 and the end offset to
LLONG_MAX. This was a simple solution because the code that detected and
marked holes was very complex: it used to be done at copy_items() and
required several searches on the fs/subvolume tree. The drawback of that
solution was that we started to flush delalloc for the entire file and
wait for all the ordered extents to complete for ranged full fsyncs
(including ordered extents covering ranges completely outside the given
range). Fortunately, ranged full fsyncs are not the most common case
(hopefully for most workloads).

However, a later fix for detecting and marking holes was made by commit
0e56315ca147b3 ("Btrfs: fix missing hole after hole punching and fsync
when using NO_HOLES"), which greatly simplified the detection of holes:
copy_items() no longer does it, and it is now done in a much simpler way
at btrfs_log_holes().

This makes it possible for the code that detects holes to operate only on
the initial range instead of the whole file, while also avoiding the need
to flush delalloc for the entire file and to wait for ordered extents that
cover ranges that don't overlap the given range.
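
As a rough illustration of range-limited hole detection, the sketch below
is a user-space model and not the kernel code: struct extent_model and
hole_bytes_in_range() are hypothetical names, extents are assumed to be
sorted and non-overlapping, and the range end is exclusive here for
simplicity. It only looks at the part of the file inside the given range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a file extent covering [start, end). */
struct extent_model {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Count the bytes inside [range_start, range_end) that are not covered
 * by any extent. The array must be sorted by start offset and must not
 * contain overlapping extents. Anything outside the range is ignored.
 */
uint64_t hole_bytes_in_range(const struct extent_model *ext, size_t n,
			     uint64_t range_start, uint64_t range_end)
{
	uint64_t cursor = range_start;	/* first byte not yet accounted for */
	uint64_t holes = 0;

	assert(range_start <= range_end);

	for (size_t i = 0; i < n; i++) {
		if (ext[i].end <= cursor)
			continue;	/* ends before the part still unchecked */
		if (ext[i].start >= range_end)
			break;		/* starts after the range, so do later ones */
		if (ext[i].start > cursor)
			holes += ext[i].start - cursor;	/* gap before this extent */
		cursor = ext[i].end;
		if (cursor >= range_end)
			break;
	}
	if (cursor < range_end)
		holes += range_end - cursor;	/* trailing gap inside the range */

	return holes;
}
```

With extents at [0, 4K), [8K, 12K) and [20K, 24K), a query for the range
[4K, 16K) reports two 4K gaps and never considers the gap at [12K, 20K)
beyond the range end.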

Another case needing special care is that we must skip file extent items
that fall entirely outside the fsync range when copying inode items from
the fs/subvolume tree into the log tree. This is to avoid races with
ordered extent completion for extents falling outside the fsync range,
which could cause us to end up with file extent items in the log tree
that have overlapping ranges. For example, if the fsync range is
[1M, 2M], when we copy inode items we could copy an extent item for the
range [0, 512K], then release the search path and, before moving to the
next leaf, an ordered extent for the range [256K, 512K] completes. This
would cause us to copy the new extent item for the range [256K, 512K]
into the log tree after we have copied one for the range [0, 512K]. The
extents overlap, resulting in a corruption.
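
The skip condition can be sketched as a simple interval test. This is a
minimal user-space sketch, not the code from the patch: the function name
extent_outside_fsync_range() is hypothetical, the extent end is taken as
exclusive and the fsync range end as inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the extent [extent_start, extent_end) lies completely
 * outside the inclusive fsync range [range_start, range_end], meaning a
 * copy loop modelled on the description above would skip it.
 */
bool extent_outside_fsync_range(uint64_t extent_start, uint64_t extent_end,
				uint64_t range_start, uint64_t range_end)
{
	assert(extent_start < extent_end);
	assert(range_start <= range_end);

	/* Entirely before the range, or entirely after it. */
	return extent_end <= range_start || extent_start > range_end;
}
```

In the example scenario, both the extent for [0, 512K] and the later one
for [256K, 512K] are outside a fsync range starting at 1M, so neither
would be copied and the overlapping pair cannot end up in the log tree.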

So this change just does these steps:

1) When the NO_HOLES feature is enabled it leaves the initial range
   intact - it no longer sets it to [0, LLONG_MAX] when the full sync bit
   is set in the inode. If NO_HOLES is not enabled, it always sets the
   range to the full file range, just like before this change, to avoid
   missing file extent items representing holes after replaying the log
   (for both full and fast fsyncs);

2) Make the hole detection code operate only on the fsync range;

3) Make the code that copies items from the fs/subvolume tree skip
   copying file extent items that cover a range completely outside the
   range of the fsync.
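
Step 1 can be pictured with the following sketch. It is an illustration
under stated assumptions, not the actual change to tree-log.c: struct
fsync_range and effective_fsync_range() are hypothetical names, and the
NO_HOLES state is passed in as a plain boolean.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical pair of inclusive byte offsets describing a fsync range. */
struct fsync_range {
	long long start;
	long long end;
};

/*
 * Pick the range to log. With NO_HOLES the caller's range is kept as is,
 * even for a full fsync. Without NO_HOLES the range is widened to the
 * whole file so that file extent items representing holes are not missed.
 */
struct fsync_range effective_fsync_range(long long start, long long end,
					 bool no_holes)
{
	struct fsync_range r = { start, end };

	assert(start >= 0 && start <= end);

	if (!no_holes) {
		r.start = 0;
		r.end = LLONG_MAX;
	}
	return r;
}
```

Under this model, a request for [1M, 2M) stays untouched on a filesystem
with NO_HOLES and becomes [0, LLONG_MAX] otherwise.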

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c     | 13 --------
 fs/btrfs/tree-log.c | 93 +++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 79 insertions(+), 27 deletions(-)

Message-Id: <20200309124108.18952-5-fdmanana@kernel.org> (mailing list archive)
State: New, archived
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Cc: josef@toxicpanda.com
Subject: [PATCH v3 4/4] Btrfs: make ranged full fsyncs more efficient
Date: Mon, 9 Mar 2020 12:41:08 +0000
In-Reply-To: <20200309124108.18952-1-fdmanana@kernel.org>
Series: Btrfs: make ranged fsyncs always respect the given range
  [v3,0/4] Btrfs: make ranged fsyncs always respect the given range
  [v3,1/4] Btrfs: fix missing file extent item for hole after ranged fsync
  [v3,2/4] Btrfs: add helper to get the end offset of a file extent item
  [v3,3/4] Btrfs: factor out inode items copy loop from btrfs_log_inode()
  [v3,4/4] Btrfs: make ranged full fsyncs more efficient
