[RFC] btrfs: scrub: Mandatory RO block group for device replace

[BUG]
btrfs/06[45] btrfs/071 could fail by finding csum error.
The reproducibility is not high, around 1/20~1/100, needs to run them in
loops.

And the profile doesn't make much difference, SINGLE/SINGLE can also
reproduce the problem.

The bug is observable after commit b12de52896c0 ("btrfs: scrub: Don't
check free space before marking a block group RO")

[CAUSE]
Device replace reuses scrub code to iterate existing extents.

It adds scrub_write_block_to_dev_replace() to scrub_block_complete(), so
that scrub read can write the verified data to target device.

Device replace also utilizes "write duplication" to write new data to
both source and target device.

However those two write can conflict and may lead to data corruption:
- Scrub writes old data from commit root
  Both extent location and csum are fetched from commit root, which
  is not always the up-to-date data.

- Write duplication is always duplicating latest data

This means there could be a race, that "write duplication" writes the
latest data to disk, then scrub write back the old data, causing data
corruption.

In theory, this should only affects data, not metadata.
Metadata write back only happens when committing transaction, thus it's
always after scrub writes.

[FIX]
Make dev-replace to require mandatory RO for target block group.

And to be extra safe, for dev-replace, wait for all exiting writes to
finish before scrubbing the chunk.

This patch will mostly revert commit 76a8efa171bf ("btrfs: Continue replace
when set_block_ro failed").
ENOSPC for dev-replace is still much better than data corruption.

Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed")
Fixes: b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Not concretely confirmed, mostly through guess, thus it has RFC tag.

My first guess is race at the dev-replace starting point, but related
code is in fact very safe.
---
 fs/btrfs/scrub.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

Message ID	20200122083628.16331-1-wqu@suse.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=6xoG=3L=vger.kernel.org=linux-btrfs-owner@kernel.org> Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9D32392A for <patchwork-linux-btrfs@patchwork.kernel.org>; Wed, 22 Jan 2020 08:37:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7917224656 for <patchwork-linux-btrfs@patchwork.kernel.org>; Wed, 22 Jan 2020 08:37:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726004AbgAVIhC (ORCPT <rfc822;patchwork-linux-btrfs@patchwork.kernel.org>); Wed, 22 Jan 2020 03:37:02 -0500 Received: from mx2.suse.de ([195.135.220.15]:34892 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725868AbgAVIhC (ORCPT <rfc822;linux-btrfs@vger.kernel.org>); Wed, 22 Jan 2020 03:37:02 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 987DAAE0D; Wed, 22 Jan 2020 08:36:59 +0000 (UTC) From: Qu Wenruo <wqu@suse.com> To: linux-btrfs@vger.kernel.org Cc: Filipe Manana <fdmanana@suse.com> Subject: [PATCH RFC] btrfs: scrub: Mandatory RO block group for device replace Date: Wed, 22 Jan 2020 16:36:28 +0800 Message-Id: <20200122083628.16331-1-wqu@suse.com> X-Mailer: git-send-email 2.25.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: <linux-btrfs.vger.kernel.org> X-Mailing-List: linux-btrfs@vger.kernel.org
Series	[RFC] btrfs: scrub: Mandatory RO block group for device replace \| expand [RFC] btrfs: scrub: Mandatory RO block group for device replace

[RFC] btrfs: scrub: Mandatory RO block group for device replace

Commit Message

Comments

Patch