
[RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

Message ID 20190212070319.30619-1-wqu@suse.com (mailing list archive)
State New, archived
Series [RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

Commit Message

Qu Wenruo Feb. 12, 2019, 7:03 a.m. UTC
[PROBLEM]
The following script can easily create unnecessary SINGLE or DUP chunks:
  #!/bin/bash

  dev1="/dev/test/scratch1"
  dev2="/dev/test/scratch2"
  dev3="/dev/test/scratch3"
  mnt="/mnt/btrfs"

  umount $dev1 $dev2 $dev3 $mnt &> /dev/null

  mkfs.btrfs -f $dev1 $dev2 -d raid1 -m raid1

  mount $dev1 $mnt
  umount $dev1

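  # wipe $dev2 to simulate losing that device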
  wipefs -fa $dev2

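  # mount degraded and replace the now-missing devid 2 with $dev3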
  mount $dev1 -o degraded $mnt
  btrfs replace start -Bf 2 $dev3 $mnt
  umount $dev1
  btrfs ins dump-tree -t chunk $dev1

This leaves the following chunks in the chunk tree:
  leaf 3016753152 items 11 free space 14900 generation 9 owner CHUNK_TREE
  leaf 3016753152 flags 0x1(WRITTEN) backref revision 1
  fs uuid 7c5fc730-5c16-4a2b-ad39-c26e85951426
  chunk uuid 1c64265b-253e-411e-b164-b935a45d474b
  	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
  	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
  	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15975 itemsize 112
  		length 8388608 owner 2 stripe_len 65536 type SYSTEM|RAID1
  		...
  	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15863 itemsize 112
  		length 1073741824 owner 2 stripe_len 65536 type METADATA|RAID1
  	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 1104150528) itemoff 15751 itemsize 112
  		length 1073741824 owner 2 stripe_len 65536 type DATA|RAID1
  	item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 2177892352) itemoff 15671 itemsize 80
  		length 268435456 owner 2 stripe_len 65536 type METADATA
  							       ^^^ SINGLE
  	item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2446327808) itemoff 15591 itemsize 80
  		length 33554432 owner 2 stripe_len 65536 type SYSTEM
  							      ^^^ SINGLE
  	item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 2479882240) itemoff 15511 itemsize 80
  		length 536870912 owner 2 stripe_len 65536 type DATA
  							       ^^^ SINGLE
  	item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 3016753152) itemoff 15399 itemsize 112
  		length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
  							      ^^^ DUP
  	item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 3050307584) itemoff 15287 itemsize 112
  		length 268435456 owner 2 stripe_len 65536 type METADATA|DUP
  							       ^^^ DUP
  	item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 3318743040) itemoff 15175 itemsize 112
  		length 536870912 owner 2 stripe_len 65536 type DATA|DUP
  							       ^^^ DUP

[CAUSE]
When mounted degraded, no matter whether the mount is RW or RO, missing
devices are never considered writable, so btrfs behaves as if it only
had one rw device.

So any write to the degraded fs will cause btrfs to create new SINGLE or
DUP chunks to store the newly written data.
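
A quick way to spot those unwanted profiles on an unpatched kernel,
without dumping the chunk tree, is the per-profile usage summary (mount
point as in the reproducer above):

  # after new chunks have been allocated during a degraded RW mount,
  # extra "single"/"DUP" lines show up next to the original RAID1 ones
  btrfs filesystem df /mnt/btrfs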

[FIX]
At mount time, btrfs has already done the chunk-level degradation check,
thus we can write to degraded chunks without problems.

So we only need to consider missing devices as writable, and calculate
our chunk allocation profile with missing devices included.

Then everything should work as expected, without annoying SINGLE/DUP
chunks blocking a later degraded mount.

With the fix applied, the above replace results in the following chunk
layout instead:
leaf 22036480 items 5 free space 15626 generation 5 owner CHUNK_TREE
leaf 22036480 flags 0x1(WRITTEN) backref revision 1
fs uuid 7b825e77-e694-4474-9bfe-7bd7565fde0e
chunk uuid 2c2d9e94-a819-4479-8f16-ab529c0a4f62
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15975 itemsize 112
		length 8388608 owner 2 stripe_len 65536 type SYSTEM|RAID1
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15863 itemsize 112
		length 1073741824 owner 2 stripe_len 65536 type METADATA|RAID1
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 1104150528) itemoff 15751 itemsize 112
		length 1073741824 owner 2 stripe_len 65536 type DATA|RAID1

Reported-by: Jakob Schöttl <jschoett@gmail.com>
Cc: Jakob Schöttl <jschoett@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 13 +++++++++++++
 fs/btrfs/volumes.c     |  7 +++++++
 2 files changed, 20 insertions(+)

Comments

Remi Gauvin Feb. 12, 2019, 7:20 a.m. UTC | #1
On 2019-02-12 2:03 a.m., Qu Wenruo wrote:

> So we only need to consider missing devices as writable, and calculate
> our chunk allocation profile with missing devices included.
> 
> Then everything should work as expected, without annoying SINGLE/DUP
> chunks blocking a later degraded mount.
> 
>

Does this mean you would rely on scrub/CSUM to repair the missing data
if device is restored?
Qu Wenruo Feb. 12, 2019, 7:22 a.m. UTC | #2
On 2019/2/12 3:20 PM, Remi Gauvin wrote:
> On 2019-02-12 2:03 a.m., Qu Wenruo wrote:
> 
>> So we only need to consider missing devices as writable, and calculate
>> our chunk allocation profile with missing devices included.
>>
>> Then everything should work as expected, without annoying SINGLE/DUP
>> chunks blocking a later degraded mount.
>>
>>
> 
> Does this mean you would rely on scrub/CSUM to repair the missing data
> if device is restored?

Yes, just as btrfs usually does.
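
For reference, the repair in that case is an ordinary scrub once the
device is back (mount point is only an example):

  # re-read everything and rewrite bad or missing copies from the good
  # mirror
  btrfs scrub start -B /mnt/btrfs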

Thanks,
Qu
Remi Gauvin Feb. 12, 2019, 7:43 a.m. UTC | #3
On 2019-02-12 2:22 a.m., Qu Wenruo wrote:

>> Does this mean you would rely on scrub/CSUM to repair the missing data
>> if device is restored?
> 
> Yes, just as btrfs usually does.
> 

I don't really understand the implications of the problems with mounting
fs when single/dup data chunk are allocated on raid1, but I would think
that would actually be a preferable situation than filling a drive with
'data' we know is completely bogus... converting single/dup data to raid
should be much faster than tripping on CSUM errors, and less prone to
missed errors?
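
The conversion in question is an ordinary balance with convert filters,
roughly (filter syntax as in current btrfs-progs, run once both devices
are healthy again; mount point is only an example):

  # rewrite any single/DUP chunks as RAID1; "soft" skips chunks that are
  # already in the target profile
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs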
Qu Wenruo Feb. 12, 2019, 7:47 a.m. UTC | #4
On 2019/2/12 3:43 PM, Remi Gauvin wrote:
> On 2019-02-12 2:22 a.m., Qu Wenruo wrote:
> 
>>> Does this mean you would rely on scrub/CSUM to repair the missing data
>>> if device is restored?
>>
>> Yes, just as btrfs usually does.
>>
> 
> I don't really understand the implications of the problems with mounting
> fs when single/dup data chunk are allocated on raid1,

Consider this use case:

One btrfs with 2 devices, RAID1 for data and metadata.

One day devid 2 got failure, and before replacement arrives, user can
only use devid 1 alone. (Maybe that's the root fs).

Then new disk arrived, user replaced the missing device, caused SINGLE
or DUP chunks on devid 1, and more importantly, some metadata/data is
already in DUP/SINGLE chunks.

Then some days later, devid 1 get failure too, now user is unable to
mount the fs degraded RW any more, since SINGLE/DUP chunks are all on
devid 1, and no way to replace devid 1.
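
A condensed sketch of that sequence (device names are placeholders):

  mkfs.btrfs -f -d raid1 -m raid1 /dev/sda /dev/sdb
  # /dev/sdb dies; the fs keeps running degraded for a while, and any
  # chunks allocated in that period are SINGLE/DUP on /dev/sda
  mount -o degraded /dev/sda /mnt
  # the replacement arrives and takes over devid 2
  btrfs replace start -B 2 /dev/sdc /mnt
  # later /dev/sda fails as well; the SINGLE/DUP chunks lived only on
  # /dev/sda, so a degraded RW mount from /dev/sdc is refused
  mount -o degraded /dev/sdc /mnt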

Thanks,
Qu

> but I would think
> that would actually be a preferable situation than filling a drive with
> 'data' we know is completely bogus... converting single/dup data to raid
> should be much faster than tripping on CSUM errors, and less prone to
> missed errors?
> 
>
Remi Gauvin Feb. 12, 2019, 7:55 a.m. UTC | #5
On 2019-02-12 2:47 a.m., Qu Wenruo wrote:
>
> 
> Consider this use case:
> 
> One btrfs with 2 devices, RAID1 for data and metadata.
> 
> One day devid 2 got failure, and before replacement arrives, user can
> only use devid 1 alone. (Maybe that's the root fs).
> 
> Then new disk arrived, user replaced the missing device, caused SINGLE
> or DUP chunks on devid 1, and more importantly, some metadata/data is
> already in DUP/SINGLE chunks.
> 


Maybe the btrfs-replace command can have some magic logic attached that
checks for single/dup chunks after completion and either launches an
automatic convert, or prompts the user that one is probably needed?
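
A rough userspace sketch of such a check (purely hypothetical, not an
existing btrfs-progs feature; mount point is only an example):

  # after the replace finishes, warn if any data/metadata chunks are
  # still not redundant so the user can run a convert balance
  if btrfs filesystem df /mnt/btrfs | grep -Eiq '^(data|metadata), (single|dup)'; then
      echo "WARNING: single/DUP chunks remain, consider converting to raid1" >&2
  fi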
Qu Wenruo Feb. 12, 2019, 7:57 a.m. UTC | #6
On 2019/2/12 3:55 PM, Remi Gauvin wrote:
> On 2019-02-12 2:47 a.m., Qu Wenruo wrote:
>>
>>
>> Consider this use case:
>>
>> One btrfs with 2 devices, RAID1 for data and metadata.
>>
>> One day devid 2 got failure, and before replacement arrives, user can
>> only use devid 1 alone. (Maybe that's the root fs).
>>
>> Then new disk arrived, user replaced the missing device, caused SINGLE
>> or DUP chunks on devid 1, and more importantly, some metadata/data is
>> already in DUP/SINGLE chunks.
>>
> 
> 
> Maybe the btrfs-replace command can have some magic logic attached that
> checks for single/dup chunks after completion and either launches an
> automatic convert, or prompts the user that one is probably needed?

The problem is, how would btrfs-progs know which chunks are old existing
chunks and which are new chunks created while the fs was mounted degraded?

I considered this method, but a kernel fix that avoids creating the
unnecessary chunks in the first place is much cleaner.

Thanks,
Qu
> 
> 
>
Andrei Borzenkov Feb. 12, 2019, 6:42 p.m. UTC | #7
12.02.2019 10:47, Qu Wenruo wrote:
> 
> 
> On 2019/2/12 下午3:43, Remi Gauvin wrote:
>> On 2019-02-12 2:22 a.m., Qu Wenruo wrote:
>>
>>>> Does this mean you would rely on scrub/CSUM to repair the missing data
>>>> if device is restored?
>>>
>>> Yes, just as btrfs usually does.
>>>
>>
>> I don't really understand the implications of the problems with mounting
>> fs when single/dup data chunk are allocated on raid1,
> 
> Consider this use case:
> 
> One btrfs with 2 devices, RAID1 for data and metadata.
> 
> One day devid 2 got failure, and before replacement arrives, user can
> only use devid 1 alone. (Maybe that's the root fs).
> 
> Then new disk arrived, user replaced the missing device, caused SINGLE
> or DUP chunks on devid 1, and more importantly, some metadata/data is
> already in DUP/SINGLE chunks.
> 
> Then some days later, devid 1 get failure too, now user is unable to
> mount the fs degraded RW any more, since SINGLE/DUP chunks are all on
> devid 1, and no way to replace devid 1.
> 

But if I understand what happens after your patch correctly, replacement
device still does not contain valid data until someone does scrub. So in
either case manual step is required to restore full redundancy.

Or does "btrfs replace" restore content on replacement device automatically?

> Thanks,
> Qu
> 
>> but I would think
>> that would actually be a preferable situation than filling a drive with
>> 'data' we know is completely bogus... converting single/dup data to raid
>> should be much faster than tripping on CSUM errors, and less prone to
>> missed errors?
>>
>>
>
Remi Gauvin Feb. 12, 2019, 7:09 p.m. UTC | #8
On 2019-02-12 1:42 p.m., Andrei Borzenkov wrote:

>>
> 
> But if I understand what happens after your patch correctly, replacement
> device still does not contain valid data until someone does scrub. So in
> either case manual step is required to restore full redundancy.
> 
> Or does "btrfs replace" restore content on replacement device automatically?
> 

It should.... Qu's example assumes the drive being replaced stays
missing, in which case replace is copying all the data that should be on
the missing drive from its mirror source.  There should be no
difference between the data that used to be there and the data that
didn't ever even get written there in the first place.

For some reason, my mind last night got stuck on the hypothetical
situation where a volume is mounted and used degraded, but then the
missing device, presumably not faulty, gets re-added... then things
would go bad quickly.  But I concede that scenario falls squarely in the
"That's stupid don't do that" category.
Qu Wenruo Feb. 13, 2019, 12:44 a.m. UTC | #9
On 2019/2/13 2:42 AM, Andrei Borzenkov wrote:
> 12.02.2019 10:47, Qu Wenruo wrote:
>>
>>
>> On 2019/2/12 下午3:43, Remi Gauvin wrote:
>>> On 2019-02-12 2:22 a.m., Qu Wenruo wrote:
>>>
>>>>> Does this mean you would rely on scrub/CSUM to repair the missing data
>>>>> if device is restored?
>>>>
>>>> Yes, just as btrfs usually does.
>>>>
>>>
>>> I don't really understand the implications of the problems with mounting
>>> fs when single/dup data chunk are allocated on raid1,
>>
>> Consider this use case:
>>
>> One btrfs with 2 devices, RAID1 for data and metadata.
>>
>> One day devid 2 got failure, and before replacement arrives, user can
>> only use devid 1 alone. (Maybe that's the root fs).
>>
>> Then new disk arrived, user replaced the missing device, caused SINGLE
>> or DUP chunks on devid 1, and more importantly, some metadata/data is
>> already in DUP/SINGLE chunks.
>>
>> Then some days later, devid 1 get failure too, now user is unable to
>> mount the fs degraded RW any more, since SINGLE/DUP chunks are all on
>> devid 1, and no way to replace devid 1.
>>
> 
> But if I understand what happens after your patch correctly, replacement
> device still does not contain valid data until someone does scrub.

Nope. That's incorrect.

We still go through the normal device replace routine, which copies the
data from the degraded chunks to the new device.

Thanks,
Qu

> So in
> either case manual step is required to restore full redundancy.
> 
> Or does "btrfs replace" restore content on replacement device automatically?
> 
>> Thanks,
>> Qu
>>
>>> but I would think
>>> that would actually be a preferable situation than filling a drive with
>>> 'data' we know is completely bogus... converting single/dup data to raid
>>> should be much faster than tripping on CSUM errors, and less prone to
>>> missed errors?
>>>
>>>
>>
> 
>

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0dde0cbc1622..bf691ecb6c70 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4081,6 +4081,13 @@  static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags)
 	u64 raid_type;
 	u64 allowed = 0;
 
+	/*
+	 * For degraded mount, still count missing devices as rw devices
+	 * to avoid alloc SINGLE/DUP chunks
+	 */
+	if (btrfs_test_opt(fs_info, DEGRADED))
+		num_devices += fs_info->fs_devices->missing_devices;
+
 	/*
 	 * see if restripe for this chunk_type is in progress, if so
 	 * try to reduce to the target profile
@@ -9626,6 +9633,12 @@  static u64 update_block_group_flags(struct btrfs_fs_info *fs_info, u64 flags)
 		return extended_to_chunk(stripped);
 
 	num_devices = fs_info->fs_devices->rw_devices;
+	/*
+	 * For degraded mount, still count missing devices as rw devices
+	 * to avoid alloc SINGLE/DUP chunks
+	 */
+	if (btrfs_test_opt(fs_info, DEGRADED))
+		num_devices += fs_info->fs_devices->missing_devices;
 
 	stripped = BTRFS_BLOCK_GROUP_RAID0 |
 		BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 03f223aa7194..8e8b3581877f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6660,6 +6660,13 @@  static struct btrfs_device *add_missing_dev(struct btrfs_fs_devices *fs_devices,
 	set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
 	fs_devices->missing_devices++;
 
+	/*
+	 * For degraded mount, still count missing devices as writable to
+	 * avoid unnecessary SINGLE/DUP chunks
+	 */
+	if (btrfs_test_opt(fs_devices->fs_info, DEGRADED))
+		set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+
 	return device;
 }