[09/19] btrfs: limit super block locations in HMZONED mode

Message ID	20190607131025.31996-10-naohiro.aota@wdc.com (mailing list archive)
State	New, archived
Headers	From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>, Qu Wenruo <wqu@suse.com>, Nikolay Borisov <nborisov@suse.com>, linux-kernel@vger.kernel.org, Hannes Reinecke <hare@suse.com>, linux-fsdevel@vger.kernel.org, Damien Le Moal <damien.lemoal@wdc.com>, Matias Bjørling <mb@lightnvm.io>, Johannes Thumshirn <jthumshirn@suse.de>, Bart Van Assche <bvanassche@acm.org>, Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH 09/19] btrfs: limit super block locations in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:15 +0900
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
Series	btrfs zoned block device support

Naohiro Aota June 7, 2019, 1:10 p.m. UTC

When in HMZONED mode, make sure that device super blocks are located in
randomly writable zones of zoned block devices. That is, do not write super
blocks to sequential write required zones of host-managed zoned block
devices, because they could not be updated in place there.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/disk-io.c     | 11 +++++++++++
 fs/btrfs/disk-io.h     |  1 +
 fs/btrfs/extent-tree.c |  4 ++++
 fs/btrfs/scrub.c       |  2 ++
 4 files changed, 18 insertions(+)

Josef Bacik June 13, 2019, 2:12 p.m. UTC | #1

On Fri, Jun 07, 2019 at 10:10:15PM +0900, Naohiro Aota wrote:
> When in HMZONED mode, make sure that device super blocks are located in
> randomly writable zones of zoned block devices. That is, do not write super
> blocks in sequential write required zones of host-managed zoned block
> devices as update would not be possible.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/disk-io.c     | 11 +++++++++++
>  fs/btrfs/disk-io.h     |  1 +
>  fs/btrfs/extent-tree.c |  4 ++++
>  fs/btrfs/scrub.c       |  2 ++
>  4 files changed, 18 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 7c1404c76768..ddbb02906042 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3466,6 +3466,13 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>  	return latest;
>  }
>  
> +int btrfs_check_super_location(struct btrfs_device *device, u64 pos)
> +{
> +	/* any address is good on a regular (zone_size == 0) device */
> +	/* non-SEQUENTIAL WRITE REQUIRED zones are capable on a zoned device */

This is not how you do multi-line comments in the kernel.  Thanks,

Josef
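
For reference, the kernel coding style puts a multi-line comment in a single block with a leading asterisk on each line. The following is a minimal user-space sketch of the helper from the patch with the comment reformatted that way. `struct toy_device`, its `seq_start` field, and `toy_dev_is_sequential()` are hypothetical stand-ins for the kernel structures, not the real btrfs API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_device. */
struct toy_device {
	uint64_t zone_size;	/* 0 on a regular (non-zoned) device */
	uint64_t seq_start;	/* assumed: offsets >= seq_start are sequential-only */
};

/* Hypothetical stand-in for btrfs_dev_is_sequential(). */
static bool toy_dev_is_sequential(const struct toy_device *dev, uint64_t pos)
{
	return pos >= dev->seq_start;
}

/*
 * Any address is good on a regular (zone_size == 0) device. On a zoned
 * device, only zones that are not SEQUENTIAL WRITE REQUIRED can hold a
 * super block.
 */
static bool toy_check_super_location(const struct toy_device *dev, uint64_t pos)
{
	return dev->zone_size == 0 || !toy_dev_is_sequential(dev, pos);
}
```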

David Sterba June 17, 2019, 10:53 p.m. UTC | #2

On Fri, Jun 07, 2019 at 10:10:15PM +0900, Naohiro Aota wrote:
> When in HMZONED mode, make sure that device super blocks are located in
> randomly writable zones of zoned block devices. That is, do not write super
> blocks in sequential write required zones of host-managed zoned block
> devices as update would not be possible.

This could be explained in more detail. My understanding is that the 1st
and 2nd superblock copies are skipped at write time, but the zone
containing them is not excluded from allocations. I.e., regular
data can appear in the place where the superblocks would exist on a
non-HMZONED filesystem. Is that correct?

The other option is to completely exclude the zone that contains the
superblock copies.

primary sb			 64K
1st copy			 64M
2nd copy			256G

Depends on the drives, but I think the size of the random write zone
will very often cover primary and 1st copy. So there's at least some
backup copy.

The 2nd copy will be in the sequential-only zone, so the whole zone
needs to be excluded in exclude_super_stripes. But it's not, so this
means data can go there.  I think the zone should be left empty.
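
The three offsets listed above can be reproduced with a few lines of arithmetic. This is a user-space sketch modelled on my understanding of the kernel's `btrfs_sb_offset()` (a 16K base shifted left by 12 bits per mirror, with the primary copy fixed at 64K). The names `sketch_sb_offset()` and `sketch_zone_index()` are made up for illustration, and any zone size passed in (for example 256 MiB) is an assumption, since the thread notes it depends on the drive.

```c
#include <stdint.h>

#define SKETCH_SUPER_INFO_OFFSET	(64ULL * 1024)	/* primary copy at 64K */
#define SKETCH_SUPER_MIRROR_SHIFT	12

/* Byte offset of super block copy @mirror (0 = primary, 1 and 2 = backups). */
static uint64_t sketch_sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror)
		return start << (SKETCH_SUPER_MIRROR_SHIFT * mirror);
	return SKETCH_SUPER_INFO_OFFSET;
}

/* Index of the zone containing @pos; @zone_size is in bytes and non-zero. */
static uint64_t sketch_zone_index(uint64_t pos, uint64_t zone_size)
{
	return pos / zone_size;
}
```

With a zone size of 256 MiB, the primary copy and the 1st copy both fall in zone 0, while the 2nd copy falls far beyond the first zones of the device, which matches the concern that it will usually sit in a sequential-only zone.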

Naohiro Aota June 18, 2019, 8:51 a.m. UTC | #3

On 2019/06/13 23:13, Josef Bacik wrote:
> On Fri, Jun 07, 2019 at 10:10:15PM +0900, Naohiro Aota wrote:
>> When in HMZONED mode, make sure that device super blocks are located in
>> randomly writable zones of zoned block devices. That is, do not write super
>> blocks in sequential write required zones of host-managed zoned block
>> devices as update would not be possible.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>>   fs/btrfs/disk-io.c     | 11 +++++++++++
>>   fs/btrfs/disk-io.h     |  1 +
>>   fs/btrfs/extent-tree.c |  4 ++++
>>   fs/btrfs/scrub.c       |  2 ++
>>   4 files changed, 18 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 7c1404c76768..ddbb02906042 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3466,6 +3466,13 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>>   	return latest;
>>   }
>>   
>> +int btrfs_check_super_location(struct btrfs_device *device, u64 pos)
>> +{
>> +	/* any address is good on a regular (zone_size == 0) device */
>> +	/* non-SEQUENTIAL WRITE REQUIRED zones are capable on a zoned device */
> 
> This is not how you do multi-line comments in the kernel.  Thanks,
> 
> Josef
> 

Thanks. I'll fix the style.
(I thought checkpatch would catch this ...)

Naohiro Aota June 18, 2019, 9:01 a.m. UTC | #4

On 2019/06/18 7:53, David Sterba wrote:
> On Fri, Jun 07, 2019 at 10:10:15PM +0900, Naohiro Aota wrote:
>> When in HMZONED mode, make sure that device super blocks are located in
>> randomly writable zones of zoned block devices. That is, do not write super
>> blocks in sequential write required zones of host-managed zoned block
>> devices as update would not be possible.
> 
> This could be explained in more detail. My understanding is that the 1st
> and 2nd copy superblocks is skipped at write time but the zone
> containing the superblocks is not excluded from allocations. Ie. regular
> data can appear in place where the superblocks would exist on
> non-hmzoned filesystem. Is that correct?

Correct. On an HMZONED fs, you can see regular data stored at the usual SB locations.

> The other option is to completely exclude the zone that contains the
> superblock copies.
> 
> primary sb			 64K
> 1st copy			 64M
> 2nd copy			256G
> 
> Depends on the drives, but I think the size of the random write zone
> will very often cover primary and 1st copy. So there's at least some
> backup copy.
> 
> The 2nd copy will be in the sequential-only zone, so the whole zone
> needs to be excluded in exclude_super_stripes. But it's not, so this
> means data can go there.  I think the zone should be left empty.
> 

I see. That's safer for older kernels/userland, right? By keeping that zone empty,
we can prevent old ones from misinterpreting data as an SB.

Alright, I will change the code to do so.
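
Excluding the whole zone, as agreed here, amounts to rounding the super block offset down to a zone boundary and reserving one zone from there. The sketch below shows only that arithmetic. It is not the follow-up patch, and `struct zone_range` and `sb_zone_range()` are hypothetical names.

```c
#include <stdint.h>

/* Half-open byte range [start, end). */
struct zone_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the range of the zone that contains @sb_pos, assuming all zones
 * have the same non-zero size @zone_size.
 */
static struct zone_range sb_zone_range(uint64_t sb_pos, uint64_t zone_size)
{
	struct zone_range r;

	r.start = (sb_pos / zone_size) * zone_size;
	r.end = r.start + zone_size;
	return r;
}
```

An allocator that keeps this range out of its free space would leave the zone empty, so nothing that could be mistaken for a super block ever lands at the well-known offset.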

David Sterba June 27, 2019, 3:35 p.m. UTC | #5

On Tue, Jun 18, 2019 at 09:01:35AM +0000, Naohiro Aota wrote:
> On 2019/06/18 7:53, David Sterba wrote:
> > On Fri, Jun 07, 2019 at 10:10:15PM +0900, Naohiro Aota wrote:
> >> When in HMZONED mode, make sure that device super blocks are located in
> >> randomly writable zones of zoned block devices. That is, do not write super
> >> blocks in sequential write required zones of host-managed zoned block
> >> devices as update would not be possible.
> > 
> > This could be explained in more detail. My understanding is that the 1st
> > and 2nd copy superblocks is skipped at write time but the zone
> > containing the superblocks is not excluded from allocations. Ie. regular
> > data can appear in place where the superblocks would exist on
> > non-hmzoned filesystem. Is that correct?
> 
> Correct. You can see regular data stored at usually SB location on HMZONED fs.
> 
> > The other option is to completely exclude the zone that contains the
> > superblock copies.
> > 
> > primary sb			 64K
> > 1st copy			 64M
> > 2nd copy			256G
> > 
> > Depends on the drives, but I think the size of the random write zone
> > will very often cover primary and 1st copy. So there's at least some
> > backup copy.
> > 
> > The 2nd copy will be in the sequential-only zone, so the whole zone
> > needs to be excluded in exclude_super_stripes. But it's not, so this
> > means data can go there.  I think the zone should be left empty.
> > 
> 
> I see. That's more safe for the older kernel/userland, right? By keeping that zone empty,
> we can avoid old ones to mis-interpret data to be SB.

That's not only for older kernels. The superblock locations are known,
and their contents should not depend on the type of device on which the
filesystem was created. This can be considered part of the on-disk format.

Anand Jain June 28, 2019, 3:55 a.m. UTC | #6

On 7/6/19 9:10 PM, Naohiro Aota wrote:
> When in HMZONED mode, make sure that device super blocks are located in
> randomly writable zones of zoned block devices. That is, do not write super
> blocks in sequential write required zones of host-managed zoned block
> devices as update would not be possible.

  By design, all copies of the SB must be updated at each transaction.
  As they are redundant copies, they must match at the end of
  each transaction.

  Instead of skipping the SB updates, why not alter the number of
  copies at the time of mkfs.btrfs?

Thanks, Anand


> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/disk-io.c     | 11 +++++++++++
>   fs/btrfs/disk-io.h     |  1 +
>   fs/btrfs/extent-tree.c |  4 ++++
>   fs/btrfs/scrub.c       |  2 ++
>   4 files changed, 18 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 7c1404c76768..ddbb02906042 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3466,6 +3466,13 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>   	return latest;
>   }
>   
> +int btrfs_check_super_location(struct btrfs_device *device, u64 pos)
> +{
> +	/* any address is good on a regular (zone_size == 0) device */
> +	/* non-SEQUENTIAL WRITE REQUIRED zones are capable on a zoned device */
> +	return device->zone_size == 0 || !btrfs_dev_is_sequential(device, pos);
> +}
> +
>   /*
>    * Write superblock @sb to the @device. Do not wait for completion, all the
>    * buffer heads we write are pinned.
> @@ -3495,6 +3502,8 @@ static int write_dev_supers(struct btrfs_device *device,
>   		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
>   		    device->commit_total_bytes)
>   			break;
> +		if (!btrfs_check_super_location(device, bytenr))
> +			continue;
>   
>   		btrfs_set_super_bytenr(sb, bytenr);
>   
> @@ -3561,6 +3570,8 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
>   		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
>   		    device->commit_total_bytes)
>   			break;
> +		if (!btrfs_check_super_location(device, bytenr))
> +			continue;
>   
>   		bh = __find_get_block(device->bdev,
>   				      bytenr / BTRFS_BDEV_BLOCKSIZE,
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index a0161aa1ea0b..70e97cd6fa76 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -141,6 +141,7 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
>   		struct page *page, size_t pg_offset, u64 start, u64 len,
>   		int create);
>   int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
> +int btrfs_check_super_location(struct btrfs_device *device, u64 pos);
>   int __init btrfs_end_io_wq_init(void);
>   void __cold btrfs_end_io_wq_exit(void);
>   
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d41d840fe5c..ae2c895d08c4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -267,6 +267,10 @@ static int exclude_super_stripes(struct btrfs_block_group_cache *cache)
>   			return ret;
>   	}
>   
> +	/* we won't have super stripes in sequential zones */
> +	if (cache->alloc_type == BTRFS_ALLOC_SEQ)
> +		return 0;
> +
>   	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   		bytenr = btrfs_sb_offset(i);
>   		ret = btrfs_rmap_block(fs_info, cache->key.objectid,
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index f7b29f9db5e2..36ad4fad7eaf 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3720,6 +3720,8 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>   		if (bytenr + BTRFS_SUPER_INFO_SIZE >
>   		    scrub_dev->commit_total_bytes)
>   			break;
> +		if (!btrfs_check_super_location(scrub_dev, bytenr))
> +			continue;
>   
>   		ret = scrub_pages(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
>   				  scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
>
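
The control flow added to `write_dev_supers()` and `wait_dev_supers()` in the quoted diff can be modelled in user space as follows: `break` when a copy would not fit on the device, `continue` when its location is in a sequential zone. This is a simplified sketch, not the kernel code. The `model_` names and the single `seq_start` boundary are assumptions made for illustration.

```c
#include <stdint.h>

#define MODEL_SUPER_INFO_SIZE	4096ULL
#define MODEL_SUPER_MIRROR_MAX	3

/* Hypothetical stand-in for struct btrfs_device. */
struct model_device {
	uint64_t total_bytes;
	uint64_t zone_size;	/* 0 = regular device */
	uint64_t seq_start;	/* assumed: offsets >= seq_start are sequential-only */
};

/* Same arithmetic as the kernel's btrfs_sb_offset(), as far as I know. */
static uint64_t model_sb_offset(int mirror)
{
	if (mirror)
		return (16ULL * 1024) << (12 * mirror);
	return 64ULL * 1024;
}

static int model_check_super_location(const struct model_device *dev, uint64_t pos)
{
	return dev->zone_size == 0 || pos < dev->seq_start;
}

/* Return a bitmask with bit i set if super block copy i would be written. */
static unsigned int model_written_mirrors(const struct model_device *dev)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < MODEL_SUPER_MIRROR_MAX; i++) {
		uint64_t bytenr = model_sb_offset(i);

		/* Copy does not fit on the device: no later copy will either. */
		if (bytenr + MODEL_SUPER_INFO_SIZE >= dev->total_bytes)
			break;
		/* Sequential zone: skip this copy but keep checking the rest. */
		if (!model_check_super_location(dev, bytenr))
			continue;
		mask |= 1u << i;
	}
	return mask;
}
```

On a regular device large enough for all three copies, every bit is set. On a zoned device whose conventional zones end before 256G, only the primary and 1st copy are written, so the copies that do exist still match after each transaction.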

Naohiro Aota June 28, 2019, 6:39 a.m. UTC | #7

On 2019/06/28 12:56, Anand Jain wrote:
> On 7/6/19 9:10 PM, Naohiro Aota wrote:
>> When in HMZONED mode, make sure that device super blocks are located in
>> randomly writable zones of zoned block devices. That is, do not write super
>> blocks in sequential write required zones of host-managed zoned block
>> devices as update would not be possible.
> 
>    By design all copies of SB must be updated at each transaction,
>    as they are redundant copies they must match at the end of
>    each transaction.
> 
>    Instead of skipping the sb updates, why not alter number of
>    copies at the time of mkfs.btrfs?
> 
> Thanks, Anand

That is exactly what the patched code does. It updates all the SB
copies, except that it avoids writing a copy to a sequential write
required zone. mkfs.btrfs does the same. So, all the available SB
copies always match after a transaction. At an SB location in a
sequential write required zone, you will see a zeroed region (in the
next version of the patch series), but that is easy to ignore: it
lacks even BTRFS_MAGIC.

The number of SB copies available on an HMZONED device will vary
with its zone size and zone layout.

Thanks,
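
To illustrate why a zeroed region is easy to ignore, here is a minimal sketch of a magic check on a raw buffer. The byte string `_BHRfS_M` and its offset of 64 bytes into the super block reflect my understanding of the btrfs on-disk format and are not stated in this thread. `looks_like_btrfs_super()` is a hypothetical helper, not a kernel or btrfs-progs function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed on-disk layout: 8 magic bytes at offset 64 of the super block. */
#define SKETCH_MAGIC_OFFSET	64

static const unsigned char sketch_magic[8] = {
	'_', 'B', 'H', 'R', 'f', 'S', '_', 'M'
};

/* Return true if @buf carries the btrfs magic at the expected offset. */
static bool looks_like_btrfs_super(const unsigned char *buf, size_t len)
{
	if (len < SKETCH_MAGIC_OFFSET + sizeof(sketch_magic))
		return false;
	return memcmp(buf + SKETCH_MAGIC_OFFSET, sketch_magic,
		      sizeof(sketch_magic)) == 0;
}
```

A region of zeroes fails this comparison immediately, so a reader scanning the well-known offsets would reject it before looking at any other field.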


Anand Jain June 28, 2019, 6:52 a.m. UTC | #8

> On 28 Jun 2019, at 2:39 PM, Naohiro Aota <Naohiro.Aota@wdc.com> wrote:
> 
> On 2019/06/28 12:56, Anand Jain wrote:
>> On 7/6/19 9:10 PM, Naohiro Aota wrote:
>>> When in HMZONED mode, make sure that device super blocks are located in
>>> randomly writable zones of zoned block devices. That is, do not write super
>>> blocks in sequential write required zones of host-managed zoned block
>>> devices as update would not be possible.
>> 
>>   By design all copies of SB must be updated at each transaction,
>>   as they are redundant copies they must match at the end of
>>   each transaction.
>> 
>>   Instead of skipping the sb updates, why not alter number of
>>   copies at the time of mkfs.btrfs?
>> 
>> Thanks, Anand
> 
> That is exactly what the patched code does. It updates all the SB
> copies, but it just avoids writing a copy to sequential writing
> required zones. Mkfs.btrfs do the same. So, all the available SB
> copies always match after a transaction. At the SB location in a
> sequential write required zone, you will see zeroed region (in the
> next version of the patch series), but that is easy to ignore: it
> lacks even BTRFS_MAGIC.
> 

 Right, I saw the related btrfs-progs patches only later;
 there are piles of emails after a vacation. ;-)

> The number of SB copy available on HMZONED device will vary
> by its zone size and its zone layout.


Thanks, Anand

