[19/22] ext4: don't check before replay

Message ID	1563758631-29550-20-git-send-email-jsimmons@infradead.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <lustre-devel-bounces@lists.lustre.org> From: James Simmons <jsimmons@infradead.org> To: Andreas Dilger <adilger@whamcloud.com>, Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.com>, Shaun Tancheff <stancheff@cray.com>, Li Dongyang <dongyangli@ddn.com>, Artem Blagodarenko <c17828@cray.com>, Yang Sheng <ys@whamcloud.com> Date: Sun, 21 Jul 2019 21:23:48 -0400 Message-Id: <1563758631-29550-20-git-send-email-jsimmons@infradead.org> In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/22] ext4: don't check before replay Precedence: list Cc: Lustre Development List <lustre-devel@lists.lustre.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" <lustre-devel-bounces@lists.lustre.org>
Series	ldiskfs patches against 5.2-rc2+

James Simmons July 22, 2019, 1:23 a.m. UTC

When ldiskfs runs in failover mode with a read-only disk,
part of the allocation updates are lost and ldiskfs may fail
while mounting. This is due to an inconsistent state of the
group descriptors. Move the group descriptor check to after
journal replay.

Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/ext4/super.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

NeilBrown July 22, 2019, 5:29 a.m. UTC | #1

On Sun, Jul 21 2019, James Simmons wrote:

> When ldiskfs runs in failover mode with a read-only disk,
> part of the allocation updates are lost and ldiskfs may fail
> while mounting. This is due to an inconsistent state of the
> group descriptors. Move the group descriptor check to after
> journal replay.

I think this needs to be enabled by a mount option or super-block flag.

NeilBrown


>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  fs/ext4/super.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index a3179b2..b818acb 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4255,11 +4255,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  		}
>  	}
>  	sbi->s_gdb_count = db_count;
> -	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
> -		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
> -		ret = -EFSCORRUPTED;
> -		goto failed_mount2;
> -	}
>  
>  	timer_setup(&sbi->s_err_report, print_daily_error_info, 0);
>  
> @@ -4401,6 +4396,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
>  
>  no_journal:
> +	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
> +		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
> +		ret = -EFSCORRUPTED;
> +		goto failed_mount_wq;
> +	}
> +
>  	if (!test_opt(sb, NO_MBCACHE)) {
>  		sbi->s_ea_block_cache = ext4_xattr_create_cache();
>  		if (!sbi->s_ea_block_cache) {
> -- 
> 1.8.3.1

NeilBrown July 22, 2019, 6:46 a.m. UTC | #2

On Mon, Jul 22 2019, Alexey Lyashkov wrote:

> Why?
> The purpose of this patch is simple and it does not address failover in general.
> A crash can occur at commit time, when the journal is only _partially_ flushed to the FS. Checking any FS metadata at that point is wrong, because we have no guarantee that it is consistent.
> But checking after journal replay is fine: the check then verifies that no corruption has hit and the FS is fine.

If the corruption can occur in non-ldiskfs usage, and would be fixed by
a journal replay, then yes - the patch looks like a good idea.

Possibly I misunderstood the source of the corruption... maybe if that
could be made clearer in the commit message, that would help.

Thanks,
NeilBrown



Oleg Drokin July 22, 2019, 6:56 a.m. UTC | #3

> On Jul 22, 2019, at 2:46 AM, NeilBrown <neilb@suse.com> wrote:
> 
> On Mon, Jul 22 2019, Alexey Lyashkov wrote:
> 
>> Why?
>> The purpose of this patch is simple and it does not address failover in general.
>> A crash can occur at commit time, when the journal is only _partially_ flushed to the FS. Checking any FS metadata at that point is wrong, because we have no guarantee that it is consistent.
>> But checking after journal replay is fine: the check then verifies that no corruption has hit and the FS is fine.
> 
> If the corruption can occur in non-ldiskfs usage, and would be fixed by
> a journal replay, then yes - the patch looks like a good idea.
> 
> Possibly I misunderstood the source of the corruption... maybe if that
> could be made clearer in the commit message, that would help.

I think the argument is: wtf do we even look at metadata (or anything, really)
before journal replay.
The patch is actually good because it does move the read after the journal replay
though?



Alexey Lyashkov July 22, 2019, 9:51 a.m. UTC | #4

> On 22 July 2019, at 9:56, Oleg Drokin <green@whamcloud.com> wrote:
> 
> 
> 
>> On Jul 22, 2019, at 2:46 AM, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Mon, Jul 22 2019, Alexey Lyashkov wrote:
>> 
>>> Why?
>>> The purpose of this patch is simple and it does not address failover in general.
>>> A crash can occur at commit time, when the journal is only _partially_ flushed to the FS. Checking any FS metadata at that point is wrong, because we have no guarantee that it is consistent.
>>> But checking after journal replay is fine: the check then verifies that no corruption has hit and the FS is fine.
>> 
>> If the corruption can occur in non-ldiskfs usage, and would be fixed by
>> a journal replay, then yes - the patch looks like a good idea.
Yes, the bug should be fixed by journal replay. BUT this check was run _before_ journal replay, and the patch moves the group descriptor check to _after_ journal replay is done.


>> 
>> Possibly I misunderstood the source of the corruption... maybe if that
>> could be made clearer in the commit message, that would help.
> 
> I think the argument is: wtf do we even look at metadata (or anything, really)
> before journal replay.
> The patch is actually good because it does move the read after the journal replay
> though?

Oleg,

you are right. 

Andreas Dilger July 23, 2019, 1:57 a.m. UTC | #5

Actually, I think this patch would be OK to push upstream. 

Cheers, Andreas

> On Jul 21, 2019, at 23:29, NeilBrown <neilb@suse.com> wrote:
> 
>> On Sun, Jul 21 2019, James Simmons wrote:
>> 
>> When ldiskfs runs in failover mode with a read-only disk,
>> part of the allocation updates are lost and ldiskfs may fail
>> while mounting. This is due to an inconsistent state of the
>> group descriptors. Move the group descriptor check to after
>> journal replay.
> 
> I think this needs to be enabled by a mount option or super-block flag.
> 
> NeilBrown

Oleg Drokin July 23, 2019, 2:01 a.m. UTC | #6

What I think needs to happen is a better description.

Something like:

In a crash, group descriptors might not be written completely
in place, which would lead to an FS error message on the subsequent mount.

Move the check to after journal replay to ensure we are
dealing with up-to-date (and hopefully correct) information
before declaring the FS as bad.

