[RFC] btrfs: self heal from SB fail

Message ID	20171208075705.23462-1-anand.jain@oracle.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-btrfs-owner@kernel.org> From: Anand Jain <anand.jain@oracle.com> To: linux-btrfs@vger.kernel.org Subject: [PATCH RFC] btrfs: self heal from SB fail Date: Fri, 8 Dec 2017 15:57:05 +0800 Message-Id: <20171208075705.23462-1-anand.jain@oracle.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk

Anand Jain Dec. 8, 2017, 7:57 a.m. UTC

-EXPERIMENTAL-
As of now when primary SB fails we won't self heal and would fail mount,
this is an experimental patch which thinks why not go and read backup
copy.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c |  8 +++++++-
 fs/btrfs/volumes.c | 10 +++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

Qu Wenruo Dec. 8, 2017, 8:17 a.m. UTC | #1

On 2017年12月08日 15:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.

Just curious about in which real world case that backup super block can
help.
At least from what I see in mail list, only few cases where backup super
helps.

Despite that self super heal seems good, although I agree with David, we
need a weaker but necessary check (magic and fsid from primary super?)
to ensure it's a valid btrfs before we use the backup supers.

Thanks,
Qu


> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/disk-io.c |  8 +++++++-
>  fs/btrfs/volumes.c | 10 +++++++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>  	 * So, we need to add a special mount option to scan for
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
> -	for (i = 0; i < 1; i++) {
> +	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>  		ret = btrfs_read_dev_one_super(bdev, i, &bh);
>  		if (ret)
>  			continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>  		ret = -EINVAL;
>  	}
>  
> +#if 0
> +	/*
> +	 * Need a way to check for any copy of SB, as its not a
> +	 * strong check, just ignore this for now.
> +	 */
>  	if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>  		btrfs_err(fs_info, "super offset mismatch %llu != %u",
>  			  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>  		ret = -EINVAL;
>  	}
> +#endif
>  
>  	/*
>  	 * Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  {
>  	struct btrfs_super_block *disk_super;
>  	struct block_device *bdev;
> -	struct page *page;
> +	struct buffer_head *sb_bh;
>  	int ret = -EINVAL;
>  	u64 devid;
>  	u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  		goto error;
>  	}
>  
> -	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> +	sb_bh = btrfs_read_dev_super(bdev);
> +	if (IS_ERR(sb_bh)) {
> +		ret = PTR_ERR(sb_bh);
>  		goto error_bdev_put;
> +	}
> +	disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>  
>  	devid = btrfs_stack_device_id(&disk_super->dev_item);
>  	transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  	if (!ret && fs_devices_ret)
>  		(*fs_devices_ret)->total_devices = total_devices;
>  
> -	btrfs_release_disk_super(page);
> +	brelse(sb_bh);
>  
>  error_bdev_put:
>  	blkdev_put(bdev, flags);
>

Nikolay Borisov Dec. 8, 2017, 8:40 a.m. UTC | #2

On  8.12.2017 09:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/disk-io.c |  8 +++++++-
>  fs/btrfs/volumes.c | 10 +++++++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>  	 * So, we need to add a special mount option to scan for
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
> -	for (i = 0; i < 1; i++) {
> +	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>  		ret = btrfs_read_dev_one_super(bdev, i, &bh);
>  		if (ret)
>  			continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>  		ret = -EINVAL;
>  	}
>  
> +#if 0
> +	/*
> +	 * Need a way to check for any copy of SB, as its not a
> +	 * strong check, just ignore this for now.
> +	 */
>  	if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>  		btrfs_err(fs_info, "super offset mismatch %llu != %u",
>  			  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>  		ret = -EINVAL;
>  	}
> +#endif
>  
>  	/*
>  	 * Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  {
>  	struct btrfs_super_block *disk_super;
>  	struct block_device *bdev;
> -	struct page *page;
> +	struct buffer_head *sb_bh;
>  	int ret = -EINVAL;
>  	u64 devid;
>  	u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  		goto error;
>  	}
>  
> -	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> +	sb_bh = btrfs_read_dev_super(bdev);

This patch prompts another question: why do we have a page-based and a
bufferhead-based interface to reading the super block? I did prototype
switching the bufferheads to page based but the resulting code wasn't
any cleaner. I believe there is also open the question what happens when
btrfs is run on a 64k page machine. I.e. we are going to read a single
page and the sb is going to be in the first 4k but what about the rest
60, they could potentially contain other metadata. The page will have to
be freed asap so as not to peg the neighboring metadata?


> +	if (IS_ERR(sb_bh)) {
> +		ret = PTR_ERR(sb_bh);
>  		goto error_bdev_put;
> +	}
> +	disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>  
>  	devid = btrfs_stack_device_id(&disk_super->dev_item);
>  	transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>  	if (!ret && fs_devices_ret)
>  		(*fs_devices_ret)->total_devices = total_devices;
>  
> -	btrfs_release_disk_super(page);
> +	brelse(sb_bh);
>  
>  error_bdev_put:
>  	blkdev_put(bdev, flags);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anand Jain Dec. 8, 2017, 10:33 a.m. UTC | #3

On 12/08/2017 04:40 PM, Nikolay Borisov wrote:
> 
> 
> On  8.12.2017 09:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/disk-io.c |  8 +++++++-
>>   fs/btrfs/volumes.c | 10 +++++++---
>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>>   	 * So, we need to add a special mount option to scan for
>>   	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>   	 */
>> -	for (i = 0; i < 1; i++) {
>> +	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>   		ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>   		if (ret)
>>   			continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>>   		ret = -EINVAL;
>>   	}
>>   
>> +#if 0
>> +	/*
>> +	 * Need a way to check for any copy of SB, as its not a
>> +	 * strong check, just ignore this for now.
>> +	 */
>>   	if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>   		btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>   			  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>   		ret = -EINVAL;
>>   	}
>> +#endif
>>   
>>   	/*
>>   	 * Obvious sys_chunk_array corruptions, it must hold at least one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>   {
>>   	struct btrfs_super_block *disk_super;
>>   	struct block_device *bdev;
>> -	struct page *page;
>> +	struct buffer_head *sb_bh;
>>   	int ret = -EINVAL;
>>   	u64 devid;
>>   	u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>   		goto error;
>>   	}
>>   
>> -	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +	sb_bh = btrfs_read_dev_super(bdev);
> 
> This patch prompts another question: why do we have a page-based and a
> bufferhead-based interface to reading the super block?

  Right. we need to know that. Sorry I just saw this.

  I have a very old patch to converge them and clean this up, but haven't
  sent it because there are some missing information on why it ended up
  like that in the first place.

Thanks, Anand


> I did prototype
> switching the bufferheads to page based but the resulting code wasn't
> any cleaner. I believe there is also open the question what happens when
> btrfs is run on a 64k page machine. I.e. we are going to read a single
> page and the sb is going to be in the first 4k but what about the rest
> 60, they could potentially contain other metadata. The page will have to
> be freed asap so as not to peg the neighboring metadata?


> 
>> +	if (IS_ERR(sb_bh)) {
>> +		ret = PTR_ERR(sb_bh);
>>   		goto error_bdev_put;
>> +	}
>> +	disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>   
>>   	devid = btrfs_stack_device_id(&disk_super->dev_item);
>>   	transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>   	if (!ret && fs_devices_ret)
>>   		(*fs_devices_ret)->total_devices = total_devices;
>>   
>> -	btrfs_release_disk_super(page);
>> +	brelse(sb_bh);
>>   
>>   error_bdev_put:
>>   	blkdev_put(bdev, flags);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anand Jain Dec. 8, 2017, 10:39 a.m. UTC | #4

On 12/08/2017 04:17 PM, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 15:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> 
> Just curious about in which real world case that backup super block can
> help.
> At least from what I see in mail list, only few cases where backup super
> helps.

  Theoretical design helps. I ended up in this situation though. And
  ext4 has -o sb flag to manage this part. When we can expect EIO on
  any part of the disk block why not on the LBA which contains primary
  SB. And should we fail the mount for that reason ? No.

> Despite that self super heal seems good, although I agree with David, we
> need a weaker but necessary check (magic and fsid from primary super?)
> to ensure it's a valid btrfs before we use the backup supers.

  Of course, we already have btrfs_check_super_valid() to verify the SB,
  I don't understand why - how do we verify the SB should be the point of
  concern here, at all.

Thanks, Anand

> Thanks,
> Qu
> 
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/disk-io.c |  8 +++++++-
>>   fs/btrfs/volumes.c | 10 +++++++---
>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>>   	 * So, we need to add a special mount option to scan for
>>   	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>   	 */
>> -	for (i = 0; i < 1; i++) {
>> +	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>   		ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>   		if (ret)
>>   			continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>>   		ret = -EINVAL;
>>   	}
>>   
>> +#if 0
>> +	/*
>> +	 * Need a way to check for any copy of SB, as its not a
>> +	 * strong check, just ignore this for now.
>> +	 */
>>   	if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>   		btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>   			  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>   		ret = -EINVAL;
>>   	}
>> +#endif
>>   
>>   	/*
>>   	 * Obvious sys_chunk_array corruptions, it must hold at least one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>   {
>>   	struct btrfs_super_block *disk_super;
>>   	struct block_device *bdev;
>> -	struct page *page;
>> +	struct buffer_head *sb_bh;
>>   	int ret = -EINVAL;
>>   	u64 devid;
>>   	u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>   		goto error;
>>   	}
>>   
>> -	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +	sb_bh = btrfs_read_dev_super(bdev);
>> +	if (IS_ERR(sb_bh)) {
>> +		ret = PTR_ERR(sb_bh);
>>   		goto error_bdev_put;
>> +	}
>> +	disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>   
>>   	devid = btrfs_stack_device_id(&disk_super->dev_item);
>>   	transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>   	if (!ret && fs_devices_ret)
>>   		(*fs_devices_ret)->total_devices = total_devices;
>>   
>> -	btrfs_release_disk_super(page);
>> +	brelse(sb_bh);
>>   
>>   error_bdev_put:
>>   	blkdev_put(bdev, flags);
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Dec. 8, 2017, 11:01 a.m. UTC | #5

On 2017年12月08日 18:39, Anand Jain wrote:
> 
> 
> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 15:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>>
>> Just curious about in which real world case that backup super block can
>> help.
>> At least from what I see in mail list, only few cases where backup super
>> helps.
> 
>  Theoretical design helps. I ended up in this situation though. And
>  ext4 has -o sb flag to manage this part. When we can expect EIO on
>  any part of the disk block why not on the LBA which contains primary
>  SB. And should we fail the mount for that reason ? No.

And how do you ensure it's a btrfs?

> 
>> Despite that self super heal seems good, although I agree with David, we
>> need a weaker but necessary check (magic and fsid from primary super?)
>> to ensure it's a valid btrfs before we use the backup supers.
> 
>  Of course, we already have btrfs_check_super_valid() to verify the SB,
>  I don't understand why - how do we verify the SB should be the point of
>  concern here, at all.

The point here is, to distinguish an old and invalid btrfs (some other
valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
primary fs.

This the main concern here.
The difference between recovery and recognizing a bad btrfs is quite
important.

Thanks,
Qu

> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++++++-
>>>   fs/btrfs/volumes.c | 10 +++++++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>        * So, we need to add a special mount option to scan for
>>>        * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>        */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>           ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>           if (ret)
>>>               continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>           ret = -EINVAL;
>>>       }
>>>   +#if 0
>>> +    /*
>>> +     * Need a way to check for any copy of SB, as its not a
>>> +     * strong check, just ignore this for now.
>>> +     */
>>>       if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>           btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>                 btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>           ret = -EINVAL;
>>>       }
>>> +#endif
>>>         /*
>>>        * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>       struct btrfs_super_block *disk_super;
>>>       struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>       int ret = -EINVAL;
>>>       u64 devid;
>>>       u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>           goto error;
>>>       }
>>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +        ret = PTR_ERR(sb_bh);
>>>           goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>         devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>       transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>       if (!ret && fs_devices_ret)
>>>           (*fs_devices_ret)->total_devices = total_devices;
>>>   -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>     error_bdev_put:
>>>       blkdev_put(bdev, flags);
>>>
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anand Jain Dec. 8, 2017, 11:48 a.m. UTC | #6

On 12/08/2017 07:01 PM, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 18:39, Anand Jain wrote:
>>
>>
>> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2017年12月08日 15:57, Anand Jain wrote:
>>>> -EXPERIMENTAL-
>>>> As of now when primary SB fails we won't self heal and would fail mount,
>>>> this is an experimental patch which thinks why not go and read backup
>>>> copy.
>>>
>>> Just curious about in which real world case that backup super block can
>>> help.
>>> At least from what I see in mail list, only few cases where backup super
>>> helps.
>>
>>   Theoretical design helps. I ended up in this situation though. And
>>   ext4 has -o sb flag to manage this part. When we can expect EIO on
>>   any part of the disk block why not on the LBA which contains primary
>>   SB. And should we fail the mount for that reason ? No.
> 
> And how do you ensure it's a btrfs?

  Hmm. You mean outside of btrfs ? I did experiment with wipe and then
  using /etc/fstab to mount, and it did lead to btrfs, is that your
  concern that it shouldn't have been. That looked surprising to me as
  well, but then problem points at wipefs instead.

> 
>>
>>> Despite that self super heal seems good, although I agree with David, we
>>> need a weaker but necessary check (magic and fsid from primary super?)
>>> to ensure it's a valid btrfs before we use the backup supers.
>>
>>   Of course, we already have btrfs_check_super_valid() to verify the SB,
>>   I don't understand why - how do we verify the SB should be the point of
>>   concern here, at all.
> 
> The point here is, to distinguish an old and invalid btrfs (some other
> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
> primary fs.

  Ok. When you check all the SBs you would pick the one which has passed
  btrfs_check_super_valid() and has highest generation. Did I ans your
  concern ? If a SB does not pass btrfs_check_super_valid() its not a
  valid btrfs SB at all.


> This the main concern here.
> The difference between recovery and recognizing a bad btrfs is quite
> important.

  btrfs_check_super_valid() is already doing that right ? The point here
  is, should we use the backup SB when btrfs_check_super_valid() fails on
  primary SB.

Thanks, Anand

> Thanks,
> Qu
> 
>>
>> Thanks, Anand
>>
>>> Thanks,
>>> Qu
>>>
>>>
>>>>
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>    fs/btrfs/disk-io.c |  8 +++++++-
>>>>    fs/btrfs/volumes.c | 10 +++++++---
>>>>    2 files changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>>> block_device *bdev)
>>>>         * So, we need to add a special mount option to scan for
>>>>         * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>>         */
>>>> -    for (i = 0; i < 1; i++) {
>>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>>            ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>>            if (ret)
>>>>                continue;
>>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>>> btrfs_fs_info *fs_info)
>>>>            ret = -EINVAL;
>>>>        }
>>>>    +#if 0
>>>> +    /*
>>>> +     * Need a way to check for any copy of SB, as its not a
>>>> +     * strong check, just ignore this for now.
>>>> +     */
>>>>        if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>>            btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>>                  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>>            ret = -EINVAL;
>>>>        }
>>>> +#endif
>>>>          /*
>>>>         * Obvious sys_chunk_array corruptions, it must hold at least
>>>> one key
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 9fa2539a8493..f368db94d62b 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>    {
>>>>        struct btrfs_super_block *disk_super;
>>>>        struct block_device *bdev;
>>>> -    struct page *page;
>>>> +    struct buffer_head *sb_bh;
>>>>        int ret = -EINVAL;
>>>>        u64 devid;
>>>>        u64 transid;
>>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>            goto error;
>>>>        }
>>>>    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>>> +    if (IS_ERR(sb_bh)) {
>>>> +        ret = PTR_ERR(sb_bh);
>>>>            goto error_bdev_put;
>>>> +    }
>>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>>          devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>>        transid = btrfs_super_generation(disk_super);
>>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>        if (!ret && fs_devices_ret)
>>>>            (*fs_devices_ret)->total_devices = total_devices;
>>>>    -    btrfs_release_disk_super(page);
>>>> +    brelse(sb_bh);
>>>>      error_bdev_put:
>>>>        blkdev_put(bdev, flags);
>>>>
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Dec. 8, 2017, 12:02 p.m. UTC | #7

On 2017年12月08日 19:48, Anand Jain wrote:
> 
> 
> On 12/08/2017 07:01 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 18:39, Anand Jain wrote:
>>>
>>>
>>> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2017年12月08日 15:57, Anand Jain wrote:
>>>>> -EXPERIMENTAL-
>>>>> As of now when primary SB fails we won't self heal and would fail
>>>>> mount,
>>>>> this is an experimental patch which thinks why not go and read backup
>>>>> copy.
>>>>
>>>> Just curious about in which real world case that backup super block can
>>>> help.
>>>> At least from what I see in mail list, only few cases where backup
>>>> super
>>>> helps.
>>>
>>>   Theoretical design helps. I ended up in this situation though. And
>>>   ext4 has -o sb flag to manage this part. When we can expect EIO on
>>>   any part of the disk block why not on the LBA which contains primary
>>>   SB. And should we fail the mount for that reason ? No.
>>
>> And how do you ensure it's a btrfs?
> 
>  Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>  using /etc/fstab to mount, and it did lead to btrfs, is that your
>  concern that it shouldn't have been. That looked surprising to me as
>  well, but then problem points at wipefs instead.

It's closer, but still doesn't reach the point.
It can be a mkfs, other than wipefs.

It's can be another case like:

One user used btrfs for a while
But bugs made him/her unhappy, and he/she turned to use xfs (whatever
the fs is) instead.

While he/she forgot to change its fstab, and rebooted the system.

And suddenly, it's btrfs again!
What a surprise.


> 
>>
>>>
>>>> Despite that self super heal seems good, although I agree with
>>>> David, we
>>>> need a weaker but necessary check (magic and fsid from primary super?)
>>>> to ensure it's a valid btrfs before we use the backup supers.
>>>
>>>   Of course, we already have btrfs_check_super_valid() to verify the SB,
>>>   I don't understand why - how do we verify the SB should be the
>>> point of
>>>   concern here, at all.
>>
>> The point here is, to distinguish an old and invalid btrfs (some other
>> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
>> primary fs.
> 
>  Ok. When you check all the SBs you would pick the one which has passed
>  btrfs_check_super_valid() and has highest generation. Did I ans your
>  concern ? If a SB does not pass btrfs_check_super_valid() its not a
>  valid btrfs SB at all.

No, not really.

What if the first SB is a XFS one or even a fs you didn't ever hear?

Skip it and use the 2nd one?
This filesystem is not even btrfs anymore.

Thanks,
Qu

> 
> 
>> This the main concern here.
>> The difference between recovery and recognizing a bad btrfs is quite
>> important.
> 
>  btrfs_check_super_valid() is already doing that right ? The point here
>  is, should we use the backup SB when btrfs_check_super_valid() fails on
>  primary SB.
> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>>> ---
>>>>>    fs/btrfs/disk-io.c |  8 +++++++-
>>>>>    fs/btrfs/volumes.c | 10 +++++++---
>>>>>    2 files changed, 14 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>>>> block_device *bdev)
>>>>>         * So, we need to add a special mount option to scan for
>>>>>         * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>>>         */
>>>>> -    for (i = 0; i < 1; i++) {
>>>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>>>            ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>>>            if (ret)
>>>>>                continue;
>>>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>>>> btrfs_fs_info *fs_info)
>>>>>            ret = -EINVAL;
>>>>>        }
>>>>>    +#if 0
>>>>> +    /*
>>>>> +     * Need a way to check for any copy of SB, as its not a
>>>>> +     * strong check, just ignore this for now.
>>>>> +     */
>>>>>        if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>>>            btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>>>                  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>>>            ret = -EINVAL;
>>>>>        }
>>>>> +#endif
>>>>>          /*
>>>>>         * Obvious sys_chunk_array corruptions, it must hold at least
>>>>> one key
>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>> index 9fa2539a8493..f368db94d62b 100644
>>>>> --- a/fs/btrfs/volumes.c
>>>>> +++ b/fs/btrfs/volumes.c
>>>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>>>> fmode_t flags, void *holder,
>>>>>    {
>>>>>        struct btrfs_super_block *disk_super;
>>>>>        struct block_device *bdev;
>>>>> -    struct page *page;
>>>>> +    struct buffer_head *sb_bh;
>>>>>        int ret = -EINVAL;
>>>>>        u64 devid;
>>>>>        u64 transid;
>>>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>>>> fmode_t flags, void *holder,
>>>>>            goto error;
>>>>>        }
>>>>>    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>>>> +    if (IS_ERR(sb_bh)) {
>>>>> +        ret = PTR_ERR(sb_bh);
>>>>>            goto error_bdev_put;
>>>>> +    }
>>>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>>>          devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>>>        transid = btrfs_super_generation(disk_super);
>>>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>>>> fmode_t flags, void *holder,
>>>>>        if (!ret && fs_devices_ret)
>>>>>            (*fs_devices_ret)->total_devices = total_devices;
>>>>>    -    btrfs_release_disk_super(page);
>>>>> +    brelse(sb_bh);
>>>>>      error_bdev_put:
>>>>>        blkdev_put(bdev, flags);
>>>>>
>>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nikolay Borisov Dec. 8, 2017, 12:12 p.m. UTC | #8

On  8.12.2017 12:33, Anand Jain wrote:
> 
> 
> On 12/08/2017 04:40 PM, Nikolay Borisov wrote:
>>
>>
>> On  8.12.2017 09:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++++++-
>>>   fs/btrfs/volumes.c | 10 +++++++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>        * So, we need to add a special mount option to scan for
>>>        * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>        */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>           ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>           if (ret)
>>>               continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>           ret = -EINVAL;
>>>       }
>>>   +#if 0
>>> +    /*
>>> +     * Need a way to check for any copy of SB, as its not a
>>> +     * strong check, just ignore this for now.
>>> +     */
>>>       if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>           btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>                 btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>           ret = -EINVAL;
>>>       }
>>> +#endif
>>>         /*
>>>        * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>       struct btrfs_super_block *disk_super;
>>>       struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>       int ret = -EINVAL;
>>>       u64 devid;
>>>       u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>           goto error;
>>>       }
>>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>
>> This patch prompts another question: why do we have a page-based and a
>> bufferhead-based interface to reading the super block?
> 
>  Right. we need to know that. Sorry I just saw this.

FWIW unless we explicitly need a per-block state tracking (which I don't
think we do) we should ideally switch to page-based mechanism. Buffer
heads are considered deprecated. Also the only reason why we do have
btrfsic_submit_bh is for the superblock bufer head. So potentially by
removing the only requirement in the kernel where bh are used we can
simplify the btrfsic code as well . Just something to keep in mind.

> 
>  I have a very old patch to converge them and clean this up, but haven't
>  sent it because there are some missing information on why it ended up
>  like that in the first place.
> 
> Thanks, Anand
> 
> 
>> I did prototype
>> switching the bufferheads to page based but the resulting code wasn't
>> any cleaner. I believe there is also open the question what happens when
>> btrfs is run on a 64k page machine. I.e. we are going to read a single
>> page and the sb is going to be in the first 4k but what about the rest
>> 60, they could potentially contain other metadata. The page will have to
>> be freed asap so as not to peg the neighboring metadata?
> 
> 
>>
>>> +    if (IS_ERR(sb_bh)) {
>>> +        ret = PTR_ERR(sb_bh);
>>>           goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>         devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>       transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>       if (!ret && fs_devices_ret)
>>>           (*fs_devices_ret)->total_devices = total_devices;
>>>   -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>     error_bdev_put:
>>>       blkdev_put(bdev, flags);
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hans van Kranenburg Dec. 8, 2017, 12:40 p.m. UTC | #9

On 12/08/2017 09:17 AM, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 15:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> 
> Just curious about in which real world case that backup super block can
> help.
> At least from what I see in mail list, only few cases where backup super
> helps.

That's about using 'backup roots' with old info about previous
generations than the current one on disk, which is a different thing
than using one of the other copies of the super block itself.

> Despite that self super heal seems good, although I agree with David, we
> need a weaker but necessary check (magic and fsid from primary super?)
> to ensure it's a valid btrfs before we use the backup supers.
> 
> Thanks,
> Qu
> 
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>  fs/btrfs/disk-io.c |  8 +++++++-
>>  fs/btrfs/volumes.c | 10 +++++++---
>>  2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>>  	 * So, we need to add a special mount option to scan for
>>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>  	 */
>> -	for (i = 0; i < 1; i++) {
>> +	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>  		ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>  		if (ret)
>>  			continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>>  		ret = -EINVAL;
>>  	}
>>  
>> +#if 0
>> +	/*
>> +	 * Need a way to check for any copy of SB, as its not a
>> +	 * strong check, just ignore this for now.
>> +	 */
>>  	if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>  		btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>  			  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>  		ret = -EINVAL;
>>  	}
>> +#endif
>>  
>>  	/*
>>  	 * Obvious sys_chunk_array corruptions, it must hold at least one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>  {
>>  	struct btrfs_super_block *disk_super;
>>  	struct block_device *bdev;
>> -	struct page *page;
>> +	struct buffer_head *sb_bh;
>>  	int ret = -EINVAL;
>>  	u64 devid;
>>  	u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>  		goto error;
>>  	}
>>  
>> -	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +	sb_bh = btrfs_read_dev_super(bdev);
>> +	if (IS_ERR(sb_bh)) {
>> +		ret = PTR_ERR(sb_bh);
>>  		goto error_bdev_put;
>> +	}
>> +	disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>  
>>  	devid = btrfs_stack_device_id(&disk_super->dev_item);
>>  	transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>>  	if (!ret && fs_devices_ret)
>>  		(*fs_devices_ret)->total_devices = total_devices;
>>  
>> -	btrfs_release_disk_super(page);
>> +	brelse(sb_bh);
>>  
>>  error_bdev_put:
>>  	blkdev_put(bdev, flags);
>>
>

Anand Jain Dec. 8, 2017, 12:41 p.m. UTC | #10

On 12/08/2017 08:02 PM, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 19:48, Anand Jain wrote:
>>
>>
>> On 12/08/2017 07:01 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2017年12月08日 18:39, Anand Jain wrote:
>>>>
>>>>
>>>> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2017年12月08日 15:57, Anand Jain wrote:
>>>>>> -EXPERIMENTAL-
>>>>>> As of now when primary SB fails we won't self heal and would fail
>>>>>> mount,
>>>>>> this is an experimental patch which thinks why not go and read backup
>>>>>> copy.
>>>>>
>>>>> Just curious about in which real world case that backup super block can
>>>>> help.
>>>>> At least from what I see in mail list, only few cases where backup
>>>>> super
>>>>> helps.
>>>>
>>>>    Theoretical design helps. I ended up in this situation though. And
>>>>    ext4 has -o sb flag to manage this part. When we can expect EIO on
>>>>    any part of the disk block why not on the LBA which contains primary
>>>>    SB. And should we fail the mount for that reason ? No.
>>>
>>> And how do you ensure it's a btrfs?
>>
>>   Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>>   using /etc/fstab to mount, and it did lead to btrfs, is that your
>>   concern that it shouldn't have been. That looked surprising to me as
>>   well, but then problem points at wipefs instead.
> 
> It's closer, but still doesn't reach the point.
> It can be a mkfs, other than wipefs.
> 
> It's can be another case like:
> 
> One user used btrfs for a while
> But bugs made him/her unhappy, and he/she turned to use xfs (whatever
> the fs is) instead.
> 
> While he/she forgot to change its fstab, and rebooted the system.
> 
> And suddenly, it's btrfs again!
> What a surprise.
> 

  I have worked quite a lot on defining the problem statements before
  btrfs. Here the user has to be blamed to have forgotten to update
  the fstab. I don't understand why should we workaround in btrfs for
  the reason that user may miss to update fstab. We don't design
  a stuff that like that, we design for the purpose, here backup SB
  is for the purpose that we all know, if we don't use backup SB, then
  its an incomplete design.

>>
>>>
>>>>
>>>>> Despite that self super heal seems good, although I agree with
>>>>> David, we
>>>>> need a weaker but necessary check (magic and fsid from primary super?)
>>>>> to ensure it's a valid btrfs before we use the backup supers.
>>>>
>>>>    Of course, we already have btrfs_check_super_valid() to verify the SB,
>>>>    I don't understand why - how do we verify the SB should be the
>>>> point of
>>>>    concern here, at all.
>>>
>>> The point here is, to distinguish an old and invalid btrfs (some other
>>> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
>>> primary fs.
>>
>>   Ok. When you check all the SBs you would pick the one which has passed
>>   btrfs_check_super_valid() and has highest generation. Did I ans your
>>   concern ? If a SB does not pass btrfs_check_super_valid() its not a
>>   valid btrfs SB at all.
> 
> No, not really.
> 
> What if the first SB is a XFS one or even a fs you didn't ever hear?
> 
> Skip it and use the 2nd one?
> This filesystem is not even btrfs anymore.

  There is no problem is user does the correct thing that is to
  update the fstab. OR using a complete mount command. If user
  using -t btrfs OR unupdated btrfs then it indicates his intentions.

Thanks, Anand

> Thanks,
> Qu
> 
>>
>>
>>> This the main concern here.
>>> The difference between recovery and recognizing a bad btrfs is quite
>>> important.
>>
>>   btrfs_check_super_valid() is already doing that right ? The point here
>>   is, should we use the backup SB when btrfs_check_super_valid() fails on
>>   primary SB.
>>
>> Thanks, Anand
>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Thanks, Anand
>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>>>> ---
>>>>>>     fs/btrfs/disk-io.c |  8 +++++++-
>>>>>>     fs/btrfs/volumes.c | 10 +++++++---
>>>>>>     2 files changed, 14 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>>>>> --- a/fs/btrfs/disk-io.c
>>>>>> +++ b/fs/btrfs/disk-io.c
>>>>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>>>>> block_device *bdev)
>>>>>>          * So, we need to add a special mount option to scan for
>>>>>>          * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>>>>          */
>>>>>> -    for (i = 0; i < 1; i++) {
>>>>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>>>>             ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>>>>             if (ret)
>>>>>>                 continue;
>>>>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>>>>> btrfs_fs_info *fs_info)
>>>>>>             ret = -EINVAL;
>>>>>>         }
>>>>>>     +#if 0
>>>>>> +    /*
>>>>>> +     * Need a way to check for any copy of SB, as its not a
>>>>>> +     * strong check, just ignore this for now.
>>>>>> +     */
>>>>>>         if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>>>>             btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>>>>                   btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>>>>             ret = -EINVAL;
>>>>>>         }
>>>>>> +#endif
>>>>>>           /*
>>>>>>          * Obvious sys_chunk_array corruptions, it must hold at least
>>>>>> one key
>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>> index 9fa2539a8493..f368db94d62b 100644
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>>>>> fmode_t flags, void *holder,
>>>>>>     {
>>>>>>         struct btrfs_super_block *disk_super;
>>>>>>         struct block_device *bdev;
>>>>>> -    struct page *page;
>>>>>> +    struct buffer_head *sb_bh;
>>>>>>         int ret = -EINVAL;
>>>>>>         u64 devid;
>>>>>>         u64 transid;
>>>>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>>>>> fmode_t flags, void *holder,
>>>>>>             goto error;
>>>>>>         }
>>>>>>     -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>>>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>>>>> +    if (IS_ERR(sb_bh)) {
>>>>>> +        ret = PTR_ERR(sb_bh);
>>>>>>             goto error_bdev_put;
>>>>>> +    }
>>>>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>>>>           devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>>>>         transid = btrfs_super_generation(disk_super);
>>>>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>>>>> fmode_t flags, void *holder,
>>>>>>         if (!ret && fs_devices_ret)
>>>>>>             (*fs_devices_ret)->total_devices = total_devices;
>>>>>>     -    btrfs_release_disk_super(page);
>>>>>> +    brelse(sb_bh);
>>>>>>       error_bdev_put:
>>>>>>         blkdev_put(bdev, flags);
>>>>>>
>>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Austin S. Hemmelgarn Dec. 8, 2017, 12:51 p.m. UTC | #11

On 2017-12-08 02:57, Anand Jain wrote:
> -EXPERIMENTAL-
> As of now when primary SB fails we won't self heal and would fail mount,
> this is an experimental patch which thinks why not go and read backup
> copy.
I like the concept, and actually think this should be default behavior 
on a filesystem that's already mounted (we fix other errors, why not 
SB's), but I don't think it should be default behavior at mount time for 
the reasons Qu has outlined (picking up old BTRFS SB's after 
reformatting is bad).  However, I do think it's useful to be able to ask 
for this behavior on mount, so that you don't need to fight with the 
programs to get a filesystem to mount when the first SB is missing 
(perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/disk-io.c |  8 +++++++-
>   fs/btrfs/volumes.c | 10 +++++++---
>   2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9b20c1f3563b..a791b8dfe8a8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
>   	 * So, we need to add a special mount option to scan for
>   	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>   	 */
> -	for (i = 0; i < 1; i++) {
> +	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   		ret = btrfs_read_dev_one_super(bdev, i, &bh);
>   		if (ret)
>   			continue;
> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>   		ret = -EINVAL;
>   	}
>   
> +#if 0
> +	/*
> +	 * Need a way to check for any copy of SB, as its not a
> +	 * strong check, just ignore this for now.
> +	 */
>   	if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>   		btrfs_err(fs_info, "super offset mismatch %llu != %u",
>   			  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>   		ret = -EINVAL;
>   	}
> +#endif
>   
>   	/*
>   	 * Obvious sys_chunk_array corruptions, it must hold at least one key
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9fa2539a8493..f368db94d62b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>   {
>   	struct btrfs_super_block *disk_super;
>   	struct block_device *bdev;
> -	struct page *page;
> +	struct buffer_head *sb_bh;
>   	int ret = -EINVAL;
>   	u64 devid;
>   	u64 transid;
> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>   		goto error;
>   	}
>   
> -	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
> +	sb_bh = btrfs_read_dev_super(bdev);
> +	if (IS_ERR(sb_bh)) {
> +		ret = PTR_ERR(sb_bh);
>   		goto error_bdev_put;
> +	}
> +	disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>   
>   	devid = btrfs_stack_device_id(&disk_super->dev_item);
>   	transid = btrfs_super_generation(disk_super);
> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
>   	if (!ret && fs_devices_ret)
>   		(*fs_devices_ret)->total_devices = total_devices;
>   
> -	btrfs_release_disk_super(page);
> +	brelse(sb_bh);
>   
>   error_bdev_put:
>   	blkdev_put(bdev, flags);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Dec. 8, 2017, 12:52 p.m. UTC | #12

On 2017年12月08日 20:41, Anand Jain wrote:
> 
> 
> On 12/08/2017 08:02 PM, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 19:48, Anand Jain wrote:
>>>
>>>
>>> On 12/08/2017 07:01 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2017年12月08日 18:39, Anand Jain wrote:
>>>>>
>>>>>
>>>>> On 12/08/2017 04:17 PM, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2017年12月08日 15:57, Anand Jain wrote:
>>>>>>> -EXPERIMENTAL-
>>>>>>> As of now when primary SB fails we won't self heal and would fail
>>>>>>> mount,
>>>>>>> this is an experimental patch which thinks why not go and read
>>>>>>> backup
>>>>>>> copy.
>>>>>>
>>>>>> Just curious about in which real world case that backup super
>>>>>> block can
>>>>>> help.
>>>>>> At least from what I see in mail list, only few cases where backup
>>>>>> super
>>>>>> helps.
>>>>>
>>>>>    Theoretical design helps. I ended up in this situation though. And
>>>>>    ext4 has -o sb flag to manage this part. When we can expect EIO on
>>>>>    any part of the disk block why not on the LBA which contains
>>>>> primary
>>>>>    SB. And should we fail the mount for that reason ? No.
>>>>
>>>> And how do you ensure it's a btrfs?
>>>
>>>   Hmm. You mean outside of btrfs ? I did experiment with wipe and then
>>>   using /etc/fstab to mount, and it did lead to btrfs, is that your
>>>   concern that it shouldn't have been. That looked surprising to me as
>>>   well, but then problem points at wipefs instead.
>>
>> It's closer, but still doesn't reach the point.
>> It can be a mkfs, other than wipefs.
>>
>> It's can be another case like:
>>
>> One user used btrfs for a while
>> But bugs made him/her unhappy, and he/she turned to use xfs (whatever
>> the fs is) instead.
>>
>> While he/she forgot to change its fstab, and rebooted the system.
>>
>> And suddenly, it's btrfs again!
>> What a surprise.
>>
> 
>  I have worked quite a lot on defining the problem statements before
>  btrfs. Here the user has to be blamed to have forgotten to update
>  the fstab. I don't understand why should we workaround in btrfs for
>  the reason that user may miss to update fstab. We don't design
>  a stuff that like that, we design for the purpose, here backup SB
>  is for the purpose that we all know, if we don't use backup SB, then
>  its an incomplete design.

Then implement something like ext*, using explicit mount option sb=n.

And since it's already called "backup", it's something used for
recovery, not for normal operation.
We already have rescue tool to recover sb from backup, so it's not a
incomplete design.

Personally speaking, I prefer the current way and leave backup just as
backup.
You can my example of a user fault, but the real world needs us to
handle user's fault.

Or there will be no need for mkfs -f options or rm --no-preserve-root
options at all.

Thanks,
Qu

> 
>>>
>>>>
>>>>>
>>>>>> Despite that self super heal seems good, although I agree with
>>>>>> David, we
>>>>>> need a weaker but necessary check (magic and fsid from primary
>>>>>> super?)
>>>>>> to ensure it's a valid btrfs before we use the backup supers.
>>>>>
>>>>>    Of course, we already have btrfs_check_super_valid() to verify
>>>>> the SB,
>>>>>    I don't understand why - how do we verify the SB should be the
>>>>> point of
>>>>>    concern here, at all.
>>>>
>>>> The point here is, to distinguish an old and invalid btrfs (some other
>>>> valid fs mkfs beyond the old fs) from a valid btrfs with corrupted
>>>> primary fs.
>>>
>>>   Ok. When you check all the SBs you would pick the one which has passed
>>>   btrfs_check_super_valid() and has highest generation. Did I ans your
>>>   concern ? If a SB does not pass btrfs_check_super_valid() its not a
>>>   valid btrfs SB at all.
>>
>> No, not really.
>>
>> What if the first SB is a XFS one or even a fs you didn't ever hear?
>>
>> Skip it and use the 2nd one?
>> This filesystem is not even btrfs anymore.
> 
>  There is no problem is user does the correct thing that is to
>  update the fstab. OR using a complete mount command. If user
>  using -t btrfs OR unupdated btrfs then it indicates his intentions.
> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>>
>>>> This the main concern here.
>>>> The difference between recovery and recognizing a bad btrfs is quite
>>>> important.
>>>
>>>   btrfs_check_super_valid() is already doing that right ? The point here
>>>   is, should we use the backup SB when btrfs_check_super_valid()
>>> fails on
>>>   primary SB.
>>>
>>> Thanks, Anand
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Thanks, Anand
>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>>>>> ---
>>>>>>>     fs/btrfs/disk-io.c |  8 +++++++-
>>>>>>>     fs/btrfs/volumes.c | 10 +++++++---
>>>>>>>     2 files changed, 14 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>>>>>> --- a/fs/btrfs/disk-io.c
>>>>>>> +++ b/fs/btrfs/disk-io.c
>>>>>>> @@ -3190,7 +3190,7 @@ struct buffer_head
>>>>>>> *btrfs_read_dev_super(struct
>>>>>>> block_device *bdev)
>>>>>>>          * So, we need to add a special mount option to scan for
>>>>>>>          * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>>>>>          */
>>>>>>> -    for (i = 0; i < 1; i++) {
>>>>>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>>>>>             ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>>>>>             if (ret)
>>>>>>>                 continue;
>>>>>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>>>>>> btrfs_fs_info *fs_info)
>>>>>>>             ret = -EINVAL;
>>>>>>>         }
>>>>>>>     +#if 0
>>>>>>> +    /*
>>>>>>> +     * Need a way to check for any copy of SB, as its not a
>>>>>>> +     * strong check, just ignore this for now.
>>>>>>> +     */
>>>>>>>         if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>>>>>             btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>>>>>                   btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>>>>>             ret = -EINVAL;
>>>>>>>         }
>>>>>>> +#endif
>>>>>>>           /*
>>>>>>>          * Obvious sys_chunk_array corruptions, it must hold at
>>>>>>> least
>>>>>>> one key
>>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>>> index 9fa2539a8493..f368db94d62b 100644
>>>>>>> --- a/fs/btrfs/volumes.c
>>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>>>>>> fmode_t flags, void *holder,
>>>>>>>     {
>>>>>>>         struct btrfs_super_block *disk_super;
>>>>>>>         struct block_device *bdev;
>>>>>>> -    struct page *page;
>>>>>>> +    struct buffer_head *sb_bh;
>>>>>>>         int ret = -EINVAL;
>>>>>>>         u64 devid;
>>>>>>>         u64 transid;
>>>>>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>>>>>> fmode_t flags, void *holder,
>>>>>>>             goto error;
>>>>>>>         }
>>>>>>>     -    if (btrfs_read_disk_super(bdev, bytenr, &page,
>>>>>>> &disk_super))
>>>>>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>>>>>> +    if (IS_ERR(sb_bh)) {
>>>>>>> +        ret = PTR_ERR(sb_bh);
>>>>>>>             goto error_bdev_put;
>>>>>>> +    }
>>>>>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>>>>>           devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>>>>>         transid = btrfs_super_generation(disk_super);
>>>>>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>>>>>> fmode_t flags, void *holder,
>>>>>>>         if (!ret && fs_devices_ret)
>>>>>>>             (*fs_devices_ret)->total_devices = total_devices;
>>>>>>>     -    btrfs_release_disk_super(page);
>>>>>>> +    brelse(sb_bh);
>>>>>>>       error_bdev_put:
>>>>>>>         blkdev_put(bdev, flags);
>>>>>>>
>>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Dec. 8, 2017, 12:59 p.m. UTC | #13

On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
> On 2017-12-08 02:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> I like the concept, and actually think this should be default behavior
> on a filesystem that's already mounted (we fix other errors, why not
> SB's), but I don't think it should be default behavior at mount time for
> the reasons Qu has outlined (picking up old BTRFS SB's after
> reformatting is bad).  However, I do think it's useful to be able to ask
> for this behavior on mount, so that you don't need to fight with the
> programs to get a filesystem to mount when the first SB is missing
> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).

Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
option than do it automatically.

Thanks,
Qu

>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/disk-io.c |  8 +++++++-
>>   fs/btrfs/volumes.c | 10 +++++++---
>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>> block_device *bdev)
>>        * So, we need to add a special mount option to scan for
>>        * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>        */
>> -    for (i = 0; i < 1; i++) {
>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>           ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>           if (ret)
>>               continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>> btrfs_fs_info *fs_info)
>>           ret = -EINVAL;
>>       }
>>   +#if 0
>> +    /*
>> +     * Need a way to check for any copy of SB, as its not a
>> +     * strong check, just ignore this for now.
>> +     */
>>       if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>           btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>                 btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>           ret = -EINVAL;
>>       }
>> +#endif
>>         /*
>>        * Obvious sys_chunk_array corruptions, it must hold at least
>> one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>   {
>>       struct btrfs_super_block *disk_super;
>>       struct block_device *bdev;
>> -    struct page *page;
>> +    struct buffer_head *sb_bh;
>>       int ret = -EINVAL;
>>       u64 devid;
>>       u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>           goto error;
>>       }
>>   -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +    sb_bh = btrfs_read_dev_super(bdev);
>> +    if (IS_ERR(sb_bh)) {
>> +        ret = PTR_ERR(sb_bh);
>>           goto error_bdev_put;
>> +    }
>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>         devid = btrfs_stack_device_id(&disk_super->dev_item);
>>       transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>> fmode_t flags, void *holder,
>>       if (!ret && fs_devices_ret)
>>           (*fs_devices_ret)->total_devices = total_devices;
>>   -    btrfs_release_disk_super(page);
>> +    brelse(sb_bh);
>>     error_bdev_put:
>>       blkdev_put(bdev, flags);
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anand Jain Dec. 8, 2017, 1:02 p.m. UTC | #14

On 12/08/2017 08:12 PM, Nikolay Borisov wrote:
> 
> 
> On  8.12.2017 12:33, Anand Jain wrote:
>>
>>
>> On 12/08/2017 04:40 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On  8.12.2017 09:57, Anand Jain wrote:
>>>> -EXPERIMENTAL-
>>>> As of now when primary SB fails we won't self heal and would fail mount,
>>>> this is an experimental patch which thinks why not go and read backup
>>>> copy.
>>>>
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>    fs/btrfs/disk-io.c |  8 +++++++-
>>>>    fs/btrfs/volumes.c | 10 +++++++---
>>>>    2 files changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>>> block_device *bdev)
>>>>         * So, we need to add a special mount option to scan for
>>>>         * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>>         */
>>>> -    for (i = 0; i < 1; i++) {
>>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>>            ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>>            if (ret)
>>>>                continue;
>>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>>> btrfs_fs_info *fs_info)
>>>>            ret = -EINVAL;
>>>>        }
>>>>    +#if 0
>>>> +    /*
>>>> +     * Need a way to check for any copy of SB, as its not a
>>>> +     * strong check, just ignore this for now.
>>>> +     */
>>>>        if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>>            btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>>                  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>>            ret = -EINVAL;
>>>>        }
>>>> +#endif
>>>>          /*
>>>>         * Obvious sys_chunk_array corruptions, it must hold at least
>>>> one key
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 9fa2539a8493..f368db94d62b 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>    {
>>>>        struct btrfs_super_block *disk_super;
>>>>        struct block_device *bdev;
>>>> -    struct page *page;
>>>> +    struct buffer_head *sb_bh;
>>>>        int ret = -EINVAL;
>>>>        u64 devid;
>>>>        u64 transid;
>>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>            goto error;
>>>>        }
>>>>    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>>
>>> This patch prompts another question: why do we have a page-based and a
>>> bufferhead-based interface to reading the super block?
>>
>>   Right. we need to know that. Sorry I just saw this.
> 
> FWIW unless we explicitly need a per-block state tracking (which I don't
> think we do) we should ideally switch to page-based mechanism. Buffer
> heads are considered deprecated. Also the only reason why we do have
> btrfsic_submit_bh is for the superblock bufer head. So potentially by
> removing the only requirement in the kernel where bh are used we can
> simplify the btrfsic code as well . Just something to keep in mind.

  Thanks. These are related to the other email thread [1]
  [1]
  [Question] Two variants of SB reads can we move into bio read instead ?
  We can follow up there. ? Could we ? That cleanup is due for a long
  time now.

Thanks, Anand

>>
>>   I have a very old patch to converge them and clean this up, but haven't
>>   sent it because there are some missing information on why it ended up
>>   like that in the first place.
>>
>> Thanks, Anand
>>
>>
>>> I did prototype
>>> switching the bufferheads to page based but the resulting code wasn't
>>> any cleaner. I believe there is also open the question what happens when
>>> btrfs is run on a 64k page machine. I.e. we are going to read a single
>>> page and the sb is going to be in the first 4k but what about the rest
>>> 60, they could potentially contain other metadata. The page will have to
>>> be freed asap so as not to peg the neighboring metadata?
>>
>>
>>>
>>>> +    if (IS_ERR(sb_bh)) {
>>>> +        ret = PTR_ERR(sb_bh);
>>>>            goto error_bdev_put;
>>>> +    }
>>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>>          devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>>        transid = btrfs_super_generation(disk_super);
>>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>        if (!ret && fs_devices_ret)
>>>>            (*fs_devices_ret)->total_devices = total_devices;
>>>>    -    btrfs_release_disk_super(page);
>>>> +    brelse(sb_bh);
>>>>      error_bdev_put:
>>>>        blkdev_put(bdev, flags);
>>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Austin S. Hemmelgarn Dec. 8, 2017, 1:05 p.m. UTC | #15

On 2017-12-08 07:59, Qu Wenruo wrote:
> 
> 
> On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
>> On 2017-12-08 02:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>> I like the concept, and actually think this should be default behavior
>> on a filesystem that's already mounted (we fix other errors, why not
>> SB's), but I don't think it should be default behavior at mount time for
>> the reasons Qu has outlined (picking up old BTRFS SB's after
>> reformatting is bad).  However, I do think it's useful to be able to ask
>> for this behavior on mount, so that you don't need to fight with the
>> programs to get a filesystem to mount when the first SB is missing
>> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).
> 
> Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
> option than do it automatically.
I still think there should be an option to do automatic detection (it's 
not particularly hard, and it's not likely to do the wrong thing in most 
cases), but being able to explicitly specify a particular superblock for 
a mount is definitely a step in the right direction.
> 
> Thanks,
> Qu
> 
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>    fs/btrfs/disk-io.c |  8 +++++++-
>>>    fs/btrfs/volumes.c | 10 +++++++---
>>>    2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>         * So, we need to add a special mount option to scan for
>>>         * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>         */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>            ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>            if (ret)
>>>                continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>            ret = -EINVAL;
>>>        }
>>>    +#if 0
>>> +    /*
>>> +     * Need a way to check for any copy of SB, as its not a
>>> +     * strong check, just ignore this for now.
>>> +     */
>>>        if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>            btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>                  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>            ret = -EINVAL;
>>>        }
>>> +#endif
>>>          /*
>>>         * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>    {
>>>        struct btrfs_super_block *disk_super;
>>>        struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>        int ret = -EINVAL;
>>>        u64 devid;
>>>        u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>            goto error;
>>>        }
>>>    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +        ret = PTR_ERR(sb_bh);
>>>            goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>          devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>        transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>        if (!ret && fs_devices_ret)
>>>            (*fs_devices_ret)->total_devices = total_devices;
>>>    -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>      error_bdev_put:
>>>        blkdev_put(bdev, flags);
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Dec. 8, 2017, 1:09 p.m. UTC | #16

On 2017年12月08日 21:05, Austin S. Hemmelgarn wrote:
> On 2017-12-08 07:59, Qu Wenruo wrote:
>>
>>
>> On 2017年12月08日 20:51, Austin S. Hemmelgarn wrote:
>>> On 2017-12-08 02:57, Anand Jain wrote:
>>>> -EXPERIMENTAL-
>>>> As of now when primary SB fails we won't self heal and would fail
>>>> mount,
>>>> this is an experimental patch which thinks why not go and read backup
>>>> copy.
>>> I like the concept, and actually think this should be default behavior
>>> on a filesystem that's already mounted (we fix other errors, why not
>>> SB's), but I don't think it should be default behavior at mount time for
>>> the reasons Qu has outlined (picking up old BTRFS SB's after
>>> reformatting is bad).  However, I do think it's useful to be able to ask
>>> for this behavior on mount, so that you don't need to fight with the
>>> programs to get a filesystem to mount when the first SB is missing
>>> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).
>>
>> Yeah, I also like the idea of 'usebackupsuper'/'usebackupsuper=n' mount
>> option than do it automatically.
> I still think there should be an option to do automatic detection (it's
> not particularly hard, and it's not likely to do the wrong thing in most
> cases), but being able to explicitly specify a particular superblock for
> a mount is definitely a step in the right direction.

usebackupsuper <- Auto
usebackupsuper=n <- Manual
usebackupsuper=n,m <- Manual multi, in given order
(The last one seems a little overkilled though)

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>
>>>>
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>    fs/btrfs/disk-io.c |  8 +++++++-
>>>>    fs/btrfs/volumes.c | 10 +++++++---
>>>>    2 files changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>>> block_device *bdev)
>>>>         * So, we need to add a special mount option to scan for
>>>>         * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>>         */
>>>> -    for (i = 0; i < 1; i++) {
>>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>>            ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>>            if (ret)
>>>>                continue;
>>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>>> btrfs_fs_info *fs_info)
>>>>            ret = -EINVAL;
>>>>        }
>>>>    +#if 0
>>>> +    /*
>>>> +     * Need a way to check for any copy of SB, as its not a
>>>> +     * strong check, just ignore this for now.
>>>> +     */
>>>>        if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>>            btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>>                  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>>            ret = -EINVAL;
>>>>        }
>>>> +#endif
>>>>          /*
>>>>         * Obvious sys_chunk_array corruptions, it must hold at least
>>>> one key
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 9fa2539a8493..f368db94d62b 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>    {
>>>>        struct btrfs_super_block *disk_super;
>>>>        struct block_device *bdev;
>>>> -    struct page *page;
>>>> +    struct buffer_head *sb_bh;
>>>>        int ret = -EINVAL;
>>>>        u64 devid;
>>>>        u64 transid;
>>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>            goto error;
>>>>        }
>>>>    -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>>> +    if (IS_ERR(sb_bh)) {
>>>> +        ret = PTR_ERR(sb_bh);
>>>>            goto error_bdev_put;
>>>> +    }
>>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>>          devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>>        transid = btrfs_super_generation(disk_super);
>>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>>> fmode_t flags, void *holder,
>>>>        if (!ret && fs_devices_ret)
>>>>            (*fs_devices_ret)->total_devices = total_devices;
>>>>    -    btrfs_release_disk_super(page);
>>>> +    brelse(sb_bh);
>>>>      error_bdev_put:
>>>>        blkdev_put(bdev, flags);
>>>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anand Jain Dec. 8, 2017, 2:02 p.m. UTC | #17

On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote:
> On 2017-12-08 02:57, Anand Jain wrote:
>> -EXPERIMENTAL-
>> As of now when primary SB fails we won't self heal and would fail mount,
>> this is an experimental patch which thinks why not go and read backup
>> copy.
> I like the concept, and actually think this should be default behavior 
> on a filesystem that's already mounted (we fix other errors, why not 
> SB's), 


> but I don't think it should be default behavior at mount time for 
> the reasons Qu has outlined (picking up old BTRFS SB's after 
> reformatting is bad).

  If the primary SB is not btrfs, then unless forced with -t btrfs
  option, this mount won't be in btrfs at all. That means it goes to
  the respective FS by default.

  If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
  is not overwriting all the copies of SB then its a mkfs.btrfs bug.

  Which means there is already mount -t option to redirect to the FS
  module that is needed by the user, when primary SB is corrupted.
  We should use this feature.

Thanks, Anand


> However, I do think it's useful to be able to ask 
> for this behavior on mount, so that you don't need to fight with the 
> programs to get a filesystem to mount when the first SB is missing 
> (perhaps add a 'usebackupsb' option to mirror 'usebackuproot'?).
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/disk-io.c |  8 +++++++-
>>   fs/btrfs/volumes.c | 10 +++++++---
>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9b20c1f3563b..a791b8dfe8a8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct 
>> block_device *bdev)
>>        * So, we need to add a special mount option to scan for
>>        * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>        */
>> -    for (i = 0; i < 1; i++) {
>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>           ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>           if (ret)
>>               continue;
>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
>>           ret = -EINVAL;
>>       }
>> +#if 0
>> +    /*
>> +     * Need a way to check for any copy of SB, as its not a
>> +     * strong check, just ignore this for now.
>> +     */
>>       if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>           btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>                 btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>           ret = -EINVAL;
>>       }
>> +#endif
>>       /*
>>        * Obvious sys_chunk_array corruptions, it must hold at least 
>> one key
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9fa2539a8493..f368db94d62b 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path, 
>> fmode_t flags, void *holder,
>>   {
>>       struct btrfs_super_block *disk_super;
>>       struct block_device *bdev;
>> -    struct page *page;
>> +    struct buffer_head *sb_bh;
>>       int ret = -EINVAL;
>>       u64 devid;
>>       u64 transid;
>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path, 
>> fmode_t flags, void *holder,
>>           goto error;
>>       }
>> -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>> +    sb_bh = btrfs_read_dev_super(bdev);
>> +    if (IS_ERR(sb_bh)) {
>> +        ret = PTR_ERR(sb_bh);
>>           goto error_bdev_put;
>> +    }
>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>       devid = btrfs_stack_device_id(&disk_super->dev_item);
>>       transid = btrfs_super_generation(disk_super);
>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path, 
>> fmode_t flags, void *holder,
>>       if (!ret && fs_devices_ret)
>>           (*fs_devices_ret)->total_devices = total_devices;
>> -    btrfs_release_disk_super(page);
>> +    brelse(sb_bh);
>>   error_bdev_put:
>>       blkdev_put(bdev, flags);
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paul Jones Dec. 8, 2017, 11:40 p.m. UTC | #18

> -----Original Message-----

> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-

> owner@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn

> Sent: Friday, 8 December 2017 11:51 PM

> To: Anand Jain <anand.jain@oracle.com>; linux-btrfs@vger.kernel.org

> Subject: Re: [PATCH RFC] btrfs: self heal from SB fail

> 

> On 2017-12-08 02:57, Anand Jain wrote:

> > -EXPERIMENTAL-

> > As of now when primary SB fails we won't self heal and would fail

> > mount, this is an experimental patch which thinks why not go and read

> > backup copy.

> I like the concept, and actually think this should be default behavior on a

> filesystem that's already mounted (we fix other errors, why not SB's), but I

> don't think it should be default behavior at mount time for the reasons Qu

> has outlined (picking up old BTRFS SB's after reformatting is bad).  However, I

> do think it's useful to be able to ask for this behavior on mount, so that you

> don't need to fight with the programs to get a filesystem to mount when the

> first SB is missing (perhaps add a 'usebackupsb' option to mirror

> 'usebackuproot'?).


I agree with this. The behaviour I'd like to see would be refusal to mount (without additional mount options) but also: print the needed info to the kernel log so the user can add the required mount option or read the wiki for more information, and print some diagnostic info on the primary + secondary super blocks.

Paul.

Qu Wenruo Dec. 9, 2017, 1:48 a.m. UTC | #19

On 2017年12月08日 22:02, Anand Jain wrote:
> 
> 
> On 12/08/2017 08:51 PM, Austin S. Hemmelgarn wrote:
>> On 2017-12-08 02:57, Anand Jain wrote:
>>> -EXPERIMENTAL-
>>> As of now when primary SB fails we won't self heal and would fail mount,
>>> this is an experimental patch which thinks why not go and read backup
>>> copy.
>> I like the concept, and actually think this should be default behavior
>> on a filesystem that's already mounted (we fix other errors, why not
>> SB's), 
> 
> 
>> but I don't think it should be default behavior at mount time for the
>> reasons Qu has outlined (picking up old BTRFS SB's after reformatting
>> is bad).
> 
>  If the primary SB is not btrfs, then unless forced with -t btrfs
>  option, this mount won't be in btrfs at all. That means it goes to
>  the respective FS by default.
> 
>  If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
>  is not overwriting all the copies of SB then its a mkfs.btrfs bug.

Nope.

If the new btrfs is a smaller one than original btrfs, the 2nd super can
still be there.

(AFAIK all these examples have been given by David for several times)

Thanks,
Qu
> 
>  Which means there is already mount -t option to redirect to the FS
>  module that is needed by the user, when primary SB is corrupted.
>  We should use this feature.
> 
> Thanks, Anand
> 
> 
>> However, I do think it's useful to be able to ask for this behavior on
>> mount, so that you don't need to fight with the programs to get a
>> filesystem to mount when the first SB is missing (perhaps add a
>> 'usebackupsb' option to mirror 'usebackuproot'?).
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>   fs/btrfs/disk-io.c |  8 +++++++-
>>>   fs/btrfs/volumes.c | 10 +++++++---
>>>   2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 9b20c1f3563b..a791b8dfe8a8 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3190,7 +3190,7 @@ struct buffer_head *btrfs_read_dev_super(struct
>>> block_device *bdev)
>>>        * So, we need to add a special mount option to scan for
>>>        * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>>        */
>>> -    for (i = 0; i < 1; i++) {
>>> +    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>>           ret = btrfs_read_dev_one_super(bdev, i, &bh);
>>>           if (ret)
>>>               continue;
>>> @@ -4015,11 +4015,17 @@ static int btrfs_check_super_valid(struct
>>> btrfs_fs_info *fs_info)
>>>           ret = -EINVAL;
>>>       }
>>> +#if 0
>>> +    /*
>>> +     * Need a way to check for any copy of SB, as its not a
>>> +     * strong check, just ignore this for now.
>>> +     */
>>>       if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>>>           btrfs_err(fs_info, "super offset mismatch %llu != %u",
>>>                 btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>>>           ret = -EINVAL;
>>>       }
>>> +#endif
>>>       /*
>>>        * Obvious sys_chunk_array corruptions, it must hold at least
>>> one key
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 9fa2539a8493..f368db94d62b 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1369,7 +1369,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>   {
>>>       struct btrfs_super_block *disk_super;
>>>       struct block_device *bdev;
>>> -    struct page *page;
>>> +    struct buffer_head *sb_bh;
>>>       int ret = -EINVAL;
>>>       u64 devid;
>>>       u64 transid;
>>> @@ -1392,8 +1392,12 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>           goto error;
>>>       }
>>> -    if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
>>> +    sb_bh = btrfs_read_dev_super(bdev);
>>> +    if (IS_ERR(sb_bh)) {
>>> +        ret = PTR_ERR(sb_bh);
>>>           goto error_bdev_put;
>>> +    }
>>> +    disk_super = (struct btrfs_super_block *) sb_bh->b_data;
>>>       devid = btrfs_stack_device_id(&disk_super->dev_item);
>>>       transid = btrfs_super_generation(disk_super);
>>> @@ -1413,7 +1417,7 @@ int btrfs_scan_one_device(const char *path,
>>> fmode_t flags, void *holder,
>>>       if (!ret && fs_devices_ret)
>>>           (*fs_devices_ret)->total_devices = total_devices;
>>> -    btrfs_release_disk_super(page);
>>> +    brelse(sb_bh);
>>>   error_bdev_put:
>>>       blkdev_put(bdev, flags);
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Anand Jain Dec. 11, 2017, 1:49 p.m. UTC | #20

>>   If the old btrfs is over written with newer btrfs SB, and if mkfs.btrfs
>>   is not overwriting all the copies of SB then its a mkfs.btrfs bug.
> 
> Nope.
> 
> If the new btrfs is a smaller one than original btrfs, the 2nd super can
> still be there.

  Oh right, the mkfs.btrfs -b <SIZE> option, where we don't
  use entire device.

  Here the source of the problem is having the stale SB which is
  created by previous btrfs, and at the time of mkfs we do notice
  that there is a stale SB, so we already know how many copy of SB
  were there, but we are bit conservative not to clean all SBs.
  Its important to fix the problem end (mkfs), not the victim end
  (sb recovery).

  But looks like not everyone agrees to this view, and sb self heal
  will go wrong if mkfs does not clean all copy of SB, so now the
  only way to recover it is by user intervention, so I am ok to add
  mount -o sb_copy_num=<0|1|2>. Suggestions for any better name is
  welcome.

  Side check, any idea what's the use case around mkfs.btrfs -b option
  other than for testing ?

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC] btrfs: self heal from SB fail

Commit Message

Comments

Patch