diff mbox series

ocfs2: checkpoint appending truncate log transaction before flushing

Message ID 1550116993-17084-1-git-send-email-ge.changwei@h3c.com (mailing list archive)
State New, archived

Commit Message

Changwei Ge Feb. 14, 2019, 4:03 a.m. UTC
Appending to the truncate log (TA) and flushing the truncate log (TF) are
two separate transactions. Both can be committed but not yet
checkpointed. If a crash occurs at that point, both transactions will be
replayed, with several clusters already released to the global bitmap.
The truncate log will then be replayed, resulting in a cluster double free.

To reproduce this issue, just crash the host while punching holes in files.

Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
---
 fs/ocfs2/alloc.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

piaojun Feb. 14, 2019, 8:24 a.m. UTC | #1
Hi Changwei,

On 2019/2/14 12:03, Changwei Ge wrote:
> Appending to the truncate log (TA) and flushing the truncate log (TF) are
> two separate transactions. Both can be committed but not yet
> checkpointed. If a crash occurs at that point, both transactions will be
> replayed, with several clusters already released to the global bitmap.

Do you mean that both transactions will release clusters to the
global bitmap? I think TA won't give clusters back to the global
bitmap.

> The truncate log will then be replayed, resulting in a cluster double free.

Does this problem only cause an error message, as below?

ocfs2_replay_truncate_records
  ocfs2_free_clusters
    _ocfs2_free_clusters
      _ocfs2_free_suballoc_bits
        ocfs2_block_group_clear_bits
          "Trying to clear %u bits at offset %u in group descriptor"

Thanks,
Jun

Changwei Ge Feb. 14, 2019, 8:53 a.m. UTC | #2
Hi Jun,

Thanks for looking into this :-)

On 2019/2/14 16:24, piaojun wrote:
> Hi Changwei,
> 
> On 2019/2/14 12:03, Changwei Ge wrote:
>> Appending to the truncate log (TA) and flushing the truncate log (TF) are
>> two separate transactions. Both can be committed but not yet
>> checkpointed. If a crash occurs at that point, both transactions will be
>> replayed, with several clusters already released to the global bitmap.
> 
> Do you mean that both transactions will release clusters to the
> global bitmap? I think TA won't give clusters back to the global
> bitmap.
> 

No, I don't mean that both TA and TF release clusters to the global bitmap.

But consider how clusters are reclaimed: they are first recorded in the truncate
log and then returned to the global bitmap, which involves the TA and TF jbd2 transactions.

TA's job is to append cluster records to the truncate log, which lets us avoid a potential space leak.
TF's job is to return clusters to the global bitmap.

It's possible that TA and TF are both committed to JBD but neither of them has been checkpointed.
So journal replay needs to replay both TA and TF during the next mount.
Then a record remains in the truncate log representing a cluster
that has already been returned to the global bitmap by replaying TF.

Now the double free shows up.
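
To make the committed-versus-checkpointed distinction concrete, here is a minimal toy model in C. It is not jbd2 or ocfs2 code, and every name in it (toy_journal, toy_commit, toy_flush, toy_replay_count) is hypothetical. It only assumes that crash recovery replays every transaction that was committed but not yet checkpointed, and that a full journal flush checkpoints everything committed so far.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TX 8	/* hypothetical journal capacity */

/* The two transaction kinds discussed in this thread. */
enum toy_tx_kind { TOY_TX_APPEND /* TA */, TOY_TX_FLUSH /* TF */ };

/* Transactions that are committed to the journal but not yet checkpointed. */
struct toy_journal {
	enum toy_tx_kind committed[TOY_MAX_TX];
	size_t nr_committed;
};

/* Commit a transaction: it is durable in the journal only. */
static int toy_commit(struct toy_journal *j, enum toy_tx_kind kind)
{
	if (j->nr_committed == TOY_MAX_TX)
		return -1;
	j->committed[j->nr_committed++] = kind;
	return 0;
}

/* Checkpoint everything: changes reach their home location, journal empties. */
static void toy_flush(struct toy_journal *j)
{
	j->nr_committed = 0;
}

/* Number of transactions of one kind that crash recovery would replay. */
static size_t toy_replay_count(const struct toy_journal *j,
			       enum toy_tx_kind kind)
{
	size_t i, n = 0;

	for (i = 0; i < j->nr_committed; i++)
		if (j->committed[i] == kind)
			n++;
	return n;
}
```

In this model, committing TA and then TF without a flush leaves both in the replay set after a crash. Calling toy_flush() between the two, which is the ordering the patch enforces, leaves only TF to replay.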


>> The truncate log will then be replayed, resulting in a cluster double free.
> 
> Does this problem only cause an error message, as below?
> 
> ocfs2_replay_truncate_records
>    ocfs2_free_clusters
>      _ocfs2_free_clusters
>        _ocfs2_free_suballoc_bits
>          ocfs2_block_group_clear_bits
>            "Trying to clear %u bits at offset %u in group descriptor"
> 

Exactly. When the issue occurs, the message above is printed.
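
A minimal sketch of why a second free of the same clusters can be detected at this point, assuming a bitmap in which a set bit means "allocated". The names toy_group and toy_clear_bits are hypothetical. This is not the ocfs2 implementation of ocfs2_block_group_clear_bits; it only illustrates the kind of check that would report an attempt to clear bits that are already clear.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_GROUP_BITS 64u	/* hypothetical size of one block group */

/* Toy stand-in for a group descriptor bitmap: bit set = cluster allocated. */
struct toy_group {
	uint64_t bitmap;
};

/*
 * Clear 'count' bits starting at 'offset'. Returns 0 on success, or -1 if
 * any bit in the range is already clear (the clusters were already freed),
 * in which case the bitmap is left untouched.
 */
static int toy_clear_bits(struct toy_group *g, unsigned int offset,
			  unsigned int count)
{
	unsigned int i;

	assert(count > 0 && offset < TOY_GROUP_BITS &&
	       count <= TOY_GROUP_BITS - offset);

	/* First pass: refuse the whole request if any bit is already clear. */
	for (i = offset; i < offset + count; i++) {
		if (!(g->bitmap & (UINT64_C(1) << i))) {
			fprintf(stderr,
				"toy: clearing %u bits at offset %u, bit %u already clear\n",
				count, offset, i);
			return -1;
		}
	}

	/* Second pass: all bits were set, so release them. */
	for (i = offset; i < offset + count; i++)
		g->bitmap &= ~(UINT64_C(1) << i);
	return 0;
}
```

Freeing the same range twice succeeds the first time and fails the second time, which corresponds to the double free reported through the call chain quoted above.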

Thanks,
Changwei

piaojun Feb. 14, 2019, 10:06 a.m. UTC | #3
Hi Changwei,

On 2019/2/14 16:53, Changwei Ge wrote:
> No, I don't mean that both TA and TF release clusters to the global bitmap.
> 
> But consider how clusters are reclaimed: they are first recorded in the truncate
> log and then returned to the global bitmap, which involves the TA and TF jbd2 transactions.
> 
> TA's job is to append cluster records to the truncate log, which lets us avoid a potential space leak.
> TF's job is to return clusters to the global bitmap.
> 
> It's possible that TA and TF are both committed to JBD but neither of them has been checkpointed.
> So journal replay needs to replay both TA and TF during the next mount.
> Then a record remains in the truncate log representing a cluster
> that has already been returned to the global bitmap by replaying TF.
> 
> Now the double free shows up.

Do you mean that on the next mount, truncate log recovery will find a
record in the truncate log for clusters that were already released? But after
the TF transaction is replayed during mount, the truncate log won't be
recovered, as tl->tl_used is less than tl->tl_count.

Thanks,
Jun

Changwei Ge Feb. 14, 2019, 10:23 a.m. UTC | #4
On 2019/2/14 18:06, piaojun wrote:
> Do you mean that on the next mount, truncate log recovery will find a
> record in the truncate log for clusters that were already released? But after
> the TF transaction is replayed during mount, the truncate log won't be
> recovered, as tl->tl_used is less than tl->tl_count.

Um, it is not just truncate log replay; it also involves a jbd2 transaction recording the last append operation.
That operation may meet the flush condition (ocfs2_truncate_log_needs_flush).
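
A minimal sketch of such a flush condition, assuming it simply means "the log is full", that is, tl_used has reached tl_count. The field names follow the tl_used and tl_count mentioned earlier in this thread. The struct and both functions are hypothetical simplifications, and the real ocfs2_truncate_log_needs_flush may check more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy truncate log header (hypothetical, not the on-disk ocfs2 layout). */
struct toy_truncate_log {
	uint16_t tl_count;	/* capacity, in records */
	uint16_t tl_used;	/* records currently in use */
};

/* Assumed flush condition: no free record slots are left. */
static bool toy_needs_flush(const struct toy_truncate_log *tl)
{
	assert(tl->tl_used <= tl->tl_count);
	return tl->tl_used == tl->tl_count;
}

/*
 * Append one record. Returns true if this append filled the log, meaning a
 * flush is needed before the next append.
 */
static bool toy_append(struct toy_truncate_log *tl)
{
	assert(tl->tl_used < tl->tl_count);
	tl->tl_used++;
	return toy_needs_flush(tl);
}
```

Under this assumption, the last append before a crash is the one most likely to leave the log in a state where a flush is due.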

Thanks,
Changwei

Changwei Ge Feb. 15, 2019, 8:27 a.m. UTC | #5
Hi Jun,

Do you have any other questions, advice, or concerns?
I would appreciate explicit feedback (ack/nack) if you already understand the problem and my way of fixing it.

Thanks,
Changwei

> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
piaojun Feb. 15, 2019, 9:21 a.m. UTC | #6
Hi Changwei,

I just need more time to review this.

Thanks,
Jun

Joseph Qi Sept. 16, 2019, 1:41 a.m. UTC | #7
On 19/2/14 12:03, Changwei Ge wrote:
> Appending to the truncate log (TA) and flushing the truncate log (TF) are
> two separate transactions. Both can be committed but not yet
> checkpointed. If a crash occurs at that point, both transactions will be
> replayed, with several clusters already released to the global bitmap.
> The truncate log will then be replayed, resulting in a cluster double free.
> 
> To reproduce this issue, just crash the host while punching holes in files.
> 
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>

Looks good to me.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


Patch

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index d1cbb27..29bc777 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6007,6 +6007,7 @@  int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
 	struct buffer_head *data_alloc_bh = NULL;
 	struct ocfs2_dinode *di;
 	struct ocfs2_truncate_log *tl;
+	struct ocfs2_journal *journal = osb->journal;
 
 	BUG_ON(inode_trylock(tl_inode));
 
@@ -6027,6 +6028,20 @@  int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
 		goto out;
 	}
 
+	/* Appending to the truncate log (TA) and flushing the truncate log (TF)
+	 * are two separate transactions. Both can be committed but not yet
+	 * checkpointed. If a crash occurs at that point, both transactions are
+	 * replayed with several clusters already released to the global bitmap.
+	 * The truncate log is then replayed, causing a cluster double free.
+	 */
+	jbd2_journal_lock_updates(journal->j_journal);
+	status = jbd2_journal_flush(journal->j_journal);
+	jbd2_journal_unlock_updates(journal->j_journal);
+	if (status < 0) {
+		mlog_errno(status);
+		goto out;
+	}
+
 	data_alloc_inode = ocfs2_get_system_file_inode(osb,
 						       GLOBAL_BITMAP_SYSTEM_INODE,
 						       OCFS2_INVALID_SLOT);