[v5,10/11] btrfs: fix a subpage relocation data corruption

[BUG]
When using the following script, btrfs will report data corruption after
one data balance with subpage support:

  mkfs.btrfs -f -s 4k $dev
  mount $dev -o nospace_cache $mnt
  $fsstress -w -n 8 -s 1620948986 -d $mnt/ -v > /tmp/fsstress
  sync
  btrfs balance start -d $mnt
  btrfs scrub start -B $mnt

A similar problem is easy to observe with the btrfs/028 test case, where
many balance runs fail with -EIO.

[CAUSE]
The above fsstress run results in the following data extent layout in the
extent tree:
  item 10 key (13631488 EXTENT_ITEM 98304) itemoff 15889 itemsize 82
    refs 2 gen 7 flags DATA
    extent data backref root FS_TREE objectid 259 offset 1339392 count 1
    extent data backref root FS_TREE objectid 259 offset 647168 count 1
  item 11 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 15865 itemsize 24
    block group used 102400 chunk_objectid 256 flags DATA
  item 12 key (13733888 EXTENT_ITEM 4096) itemoff 15812 itemsize 53
    refs 1 gen 7 flags DATA
    extent data backref root FS_TREE objectid 259 offset 729088 count 1

When the data reloc inode is created for this block group, it looks like
this:

	0	32K	64K	96K 100K	104K
	|<------ Extent A ----->|   |<- Ext B ->|
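
The offsets in the diagram follow from the extent keys above, taken
relative to the block group start (13631488). A minimal userspace sketch
of that arithmetic; the struct and helper names are illustrative only and
are not btrfs code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a data extent as (bytenr, length), as in the keys above. */
struct extent {
	uint64_t bytenr;
	uint64_t len;
};

/* Start offset of the extent relative to the block group start. */
static uint64_t reloc_start(uint64_t bg_start, const struct extent *e)
{
	assert(e->bytenr >= bg_start);
	return e->bytenr - bg_start;
}

/* Exclusive end offset of the extent relative to the block group start. */
static uint64_t reloc_end(uint64_t bg_start, const struct extent *e)
{
	return reloc_start(bg_start, e) + e->len;
}
```

With bg_start = 13631488, extent A (13631488, 98304) maps to [0, 96K) and
extent B (13733888, 4096) maps to [100K, 104K), which matches the diagram.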

When we first relocate extent A, we set up the data reloc inode with
isize 96K, then read both page [0, 64K) and page [64K, 128K).

For the page at 64K, since the isize is just 96K, we fill range
[96K, 128K) with 0 and set it uptodate.

When we come to extent B, we update isize to 104K, then try to read
page [64K, 128K).
We find the page is already uptodate, so we skip the read.
But range [96K, 128K) is filled with 0, not the real data.

Then we write back the data reloc inode to disk, with 0 filling range
[96K, 128K), corrupting the content of extent B.

The behavior is caused by the fact that we still do full page reads in
the subpage case.

The bug does not happen when sectorsize equals the page size, as one page
then contains only one sector.
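
The sequence above can be reproduced with a small userspace model. This is
a sketch under simplifying assumptions, not kernel code: `struct model`,
`model_init()` and `read_full_page()` are hypothetical names, the page
cache is a flat buffer, and there is a single uptodate flag per 64K page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SZ_1K		1024u
#define PAGE_SZ		(64 * SZ_1K)	/* 64K page size */
#define NR_PAGES	2u

/* Hypothetical model: one uptodate flag per page (full page read). */
struct model {
	uint8_t disk[NR_PAGES * PAGE_SZ];	/* data of the old extents */
	uint8_t cache[NR_PAGES * PAGE_SZ];	/* page cache of the reloc inode */
	bool uptodate[NR_PAGES];
};

static void model_init(struct model *m)
{
	memset(m, 0, sizeof(*m));
	memset(m->disk, 0xAA, 96 * SZ_1K);		/* extent A: [0, 96K) */
	memset(m->disk + 100 * SZ_1K, 0xBB, 4 * SZ_1K);	/* extent B: [100K, 104K) */
}

/*
 * Full page read: copy data up to isize, zero the rest of the page and
 * mark the whole page uptodate. Returns true if a read was done, false
 * if it was skipped because the page was already uptodate.
 */
static bool read_full_page(struct model *m, unsigned int idx, uint32_t isize)
{
	uint32_t start = idx * PAGE_SZ;
	uint32_t end = start + PAGE_SZ;
	uint32_t valid_end = isize < end ? isize : end;

	assert(idx < NR_PAGES);
	if (m->uptodate[idx])
		return false;

	memset(m->cache + start, 0, PAGE_SZ);
	if (valid_end > start)
		memcpy(m->cache + start, m->disk + start, valid_end - start);
	m->uptodate[idx] = true;
	return true;
}
```

Reading pages 0 and 1 with isize 96K and then page 1 again with isize 104K
skips the second read, so the cached byte at offset 100K stays 0 while the
on-disk byte is 0xBB.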

[FIX]
Fix the problem by invalidating range [isize, PAGE_END] in
prealloc_file_extent_cluster().

This way, in the example above, when we preallocate the file extent for
extent B, we clear the uptodate bits for range [96K, 128K), allowing a
later relocate_one_page() to re-read the needed range.
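
The effect on the example can be sketched with a per-sector uptodate
bitmap. This is an illustrative userspace model assuming 4K sectors, 64K
pages and a sector aligned isize; the helper names are hypothetical and
this is not the code added to prealloc_file_extent_cluster().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTORSIZE	4096u
#define PAGE_SZ		65536u	/* 16 sectors per page, fits a uint16_t */

/*
 * Hypothetical helper: clear the uptodate bits for [isize, end of page)
 * in the page containing isize. Bit 0 is the first sector of the page.
 * Returns the new bitmap.
 */
static uint16_t clear_uptodate_from_isize(uint16_t bitmap, uint32_t isize)
{
	uint32_t first = (isize % PAGE_SZ) / SECTORSIZE;

	assert(isize % SECTORSIZE == 0);
	/* isize on a page boundary: no partial page to invalidate. */
	if (isize % PAGE_SZ == 0)
		return bitmap;
	return bitmap & (uint16_t)((1u << first) - 1);
}

/* Is the sector containing file_offset marked uptodate in its page? */
static bool sector_uptodate(uint16_t bitmap, uint32_t file_offset)
{
	return bitmap & (1u << ((file_offset % PAGE_SZ) / SECTORSIZE));
}
```

For page [64K, 128K) with all 16 bits set and isize 96K, the bitmap becomes
0x00ff: the sectors of [64K, 96K) stay uptodate, and the sector at 100K
(extent B) is no longer uptodate, so it has to be read again.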

There is one caveat about the invalidation.

Since we're not calling the real btrfs_invalidatepage(), but just clearing
the subpage and page uptodate bits, we can leave a page half dirty and
half out of date.

Reading such a page can make btrfs deadlock, as we normally expect a
dirty page to be fully uptodate.

Thus we flush and wait for the data reloc inode before doing the hacked
invalidation.
This won't cause extra overhead, as we're going to write back the data
later anyway.
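
The ordering requirement can be illustrated with a toy state model. It is
a sketch only: `struct page_state` and the helpers are hypothetical
stand-ins, and `flush_and_wait()` merely models the fact that a page is
clean once writeback has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FULL_UPTODATE	0xFFFFu	/* 16 sectors of 4K in a 64K page */

/* Hypothetical page state, not the kernel's struct page. */
struct page_state {
	bool dirty;
	uint16_t uptodate;	/* one bit per sector */
};

/* The expectation described above: a dirty page is fully uptodate. */
static bool page_state_valid(const struct page_state *p)
{
	return !p->dirty || p->uptodate == FULL_UPTODATE;
}

/* Stand-in for flushing and waiting: the page is clean afterwards. */
static void flush_and_wait(struct page_state *p)
{
	p->dirty = false;
}

/*
 * Clear the uptodate bits selected by mask. Flushing first guarantees
 * the page is never both dirty and partially uptodate.
 */
static void invalidate_sectors(struct page_state *p, uint16_t mask)
{
	flush_and_wait(p);
	p->uptodate &= (uint16_t)~mask;
	assert(page_state_valid(p));
}
```

Without the flush, clearing bits on a dirty page would leave it in the
state that page_state_valid() rejects.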

Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 59 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

Message-Id: <20210618072437.207550-11-wqu@suse.com>
Date: Fri, 18 Jun 2021 15:24:36 +0800