
[V2] ocfs2: don't clear bh uptodate for block read

Message ID 20181114000040.8154-1-junxiao.bi@oracle.com (mailing list archive)
State New, archived
Series: [V2] ocfs2: don't clear bh uptodate for block read

Commit Message

Junxiao Bi Nov. 14, 2018, midnight UTC
For a sync io read in ocfs2_read_blocks_sync(), the bh uptodate flag is
first cleared and the io submitted; the code then waits for the io to
complete, and finally checks whether the bh is uptodate. If it is not,
an io error is returned.

If two sync ios for the same bh are issued, the first io may complete
and set the uptodate flag, but just before that flag is checked, the
second io comes in and clears it, so ocfs2_read_blocks_sync() for the
first io will return an IO error.

Indeed it is not necessary to clear the uptodate flag, as the io completion
handler end_buffer_read_sync() will set or clear it according to whether
the io succeeded or failed.

The following messages were found on an nfs server even though the
underlying storage returned no error.

[4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
[4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
[4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
[4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
[4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5

The same issue exists in the non-sync read ocfs2_read_blocks(); it is fixed as well.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
---

v1 -> v2:

  - fixed the non-sync read ocfs2_read_blocks() as well.

---
 fs/ocfs2/buffer_head_io.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Changwei Ge Nov. 14, 2018, 1:42 a.m. UTC | #1
Hi Junxiao,

Actually, I think we still have another race around the block reading methods.
Consider the scenario below:

Two parallel read IOs target the same block.
The first succeeds while the second fails.
The second will clear the _UPTODATE_ state in end_buffer_read_sync(), thus making the first one fail as well.

So it would be better for us not to use the generic end_buffer_read_sync() but an ocfs2-private handler.
But that has nothing to do with this patch.
I am adding my Reviewed-by.

Thanks,
Changwei

On 2018/11/14 8:04, Junxiao Bi wrote:
> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
> and submit the io, second wait io done, last check whether bh uptodate,
> if not return io error.
> 
> If two sync io for the same bh were issued, it could be the first io done
> and set uptodate flag, but just before check that flag, the second io came
> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
> will return IO error.
> 
> Indeed it's not necessary to clear uptodate flag, as the io end handler
> end_buffer_read_sync() will set or clear it based on io succeed or failed.
> 
> The following message was found from a nfs server but the underlying
> storage returned no error.
> 
> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
> 
> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: Changwei Ge <ge.changwei@h3c.com>

Reviewed-by: Changwei Ge <ge.changwei@h3c.com>

> ---
> 
> v1 -> v2:
> 
>    - fixed non sync read ocfs2_read_blocks() as well.
> 
> ---
>   fs/ocfs2/buffer_head_io.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index 4ebbd57cbf84..f9b84f7a3e4b 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>   #endif
>   		}
>   
> -		clear_buffer_uptodate(bh);
>   		get_bh(bh); /* for end_buffer_read_sync() */
>   		bh->b_end_io = end_buffer_read_sync;
>   		submit_bh(REQ_OP_READ, 0, bh);
> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>   				continue;
>   			}
>   
> -			clear_buffer_uptodate(bh);
>   			get_bh(bh); /* for end_buffer_read_sync() */
>   			if (validate)
>   				set_buffer_needs_validate(bh);
>
Junxiao Bi Nov. 14, 2018, 3:15 a.m. UTC | #2
Hi Changwei,

Thank you for the review.

See comment inlined

On 11/14/18 9:42 AM, Changwei Ge wrote:
> Hi Junxiao,
>
> Actually, I think we still have another race around reading blocks methods.
> Consider below scenario:
>
> Two parallel reading IO towards the same block.
> The first succeeds while the second fails.
> The second will clear _UPTODATE_ state in end_buffer_read_sync() thus making the first one failed as well.
>
> So better for us to not use generic end_buffer_read_sync() but ocfs2 own ones.

Is it necessary to do something that complicated? It seems it would make 
no difference; in the end an IO error will be returned for this very block.

thanks,

Junxiao.

> But it has nothing to do with this patch.
> I am adding my Reviewed-by.
>
> Thanks,
> Changwei
>
> On 2018/11/14 8:04, Junxiao Bi wrote:
>> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
>> and submit the io, second wait io done, last check whether bh uptodate,
>> if not return io error.
>>
>> If two sync io for the same bh were issued, it could be the first io done
>> and set uptodate flag, but just before check that flag, the second io came
>> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
>> will return IO error.
>>
>> Indeed it's not necessary to clear uptodate flag, as the io end handler
>> end_buffer_read_sync() will set or clear it based on io succeed or failed.
>>
>> The following message was found from a nfs server but the underlying
>> storage returned no error.
>>
>> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
>> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
>> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
>> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
>> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
>>
>> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Cc: Changwei Ge <ge.changwei@h3c.com>
> Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
>
>> ---
>>
>> v1 -> v2:
>>
>>     - fixed non sync read ocfs2_read_blocks() as well.
>>
>> ---
>>    fs/ocfs2/buffer_head_io.c | 2 --
>>    1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index 4ebbd57cbf84..f9b84f7a3e4b 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>>    #endif
>>    		}
>>    
>> -		clear_buffer_uptodate(bh);
>>    		get_bh(bh); /* for end_buffer_read_sync() */
>>    		bh->b_end_io = end_buffer_read_sync;
>>    		submit_bh(REQ_OP_READ, 0, bh);
>> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>    				continue;
>>    			}
>>    
>> -			clear_buffer_uptodate(bh);
>>    			get_bh(bh); /* for end_buffer_read_sync() */
>>    			if (validate)
>>    				set_buffer_needs_validate(bh);
>>
Jiangyiwen Nov. 19, 2018, 3:05 a.m. UTC | #3
On 2018/11/14 8:00, Junxiao Bi wrote:
> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
> and submit the io, second wait io done, last check whether bh uptodate,
> if not return io error.
> 
> If two sync io for the same bh were issued, it could be the first io done
> and set uptodate flag, but just before check that flag, the second io came
> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
> will return IO error.
> 
> Indeed it's not necessary to clear uptodate flag, as the io end handler
> end_buffer_read_sync() will set or clear it based on io succeed or failed.
> 
> The following message was found from a nfs server but the underlying
> storage returned no error.
> 
> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
> 
> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: Changwei Ge <ge.changwei@h3c.com>
> ---
> 
> v1 -> v2:
> 
>   - fixed non sync read ocfs2_read_blocks() as well.
> 
> ---
>  fs/ocfs2/buffer_head_io.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index 4ebbd57cbf84..f9b84f7a3e4b 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>  #endif
>  		}
>  
> -		clear_buffer_uptodate(bh);

Should we use mutex_lock to keep the reads in sync?

Thanks.

>  		get_bh(bh); /* for end_buffer_read_sync() */
>  		bh->b_end_io = end_buffer_read_sync;
>  		submit_bh(REQ_OP_READ, 0, bh);
> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  				continue;
>  			}
>  
> -			clear_buffer_uptodate(bh);

Actually, this one doesn't need to change, because ocfs2_metadata_cache_io_lock() is taken and
ensures only one thread reads the buffer.

>  			get_bh(bh); /* for end_buffer_read_sync() */
>  			if (validate)
>  				set_buffer_needs_validate(bh);
>
Junxiao Bi Nov. 19, 2018, 5:46 a.m. UTC | #4
On 11/19/18 11:05 AM, jiangyiwen wrote:
> On 2018/11/14 8:00, Junxiao Bi wrote:
>> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
>> and submit the io, second wait io done, last check whether bh uptodate,
>> if not return io error.
>>
>> If two sync io for the same bh were issued, it could be the first io done
>> and set uptodate flag, but just before check that flag, the second io came
>> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
>> will return IO error.
>>
>> Indeed it's not necessary to clear uptodate flag, as the io end handler
>> end_buffer_read_sync() will set or clear it based on io succeed or failed.
>>
>> The following message was found from a nfs server but the underlying
>> storage returned no error.
>>
>> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
>> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
>> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
>> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
>> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
>>
>> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Cc: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>
>> v1 -> v2:
>>
>>    - fixed non sync read ocfs2_read_blocks() as well.
>>
>> ---
>>   fs/ocfs2/buffer_head_io.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index 4ebbd57cbf84..f9b84f7a3e4b 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>>   #endif
>>   		}
>>   
>> -		clear_buffer_uptodate(bh);
> If we should use mutex_lock to keep sync?
>
> Thanks.
As my last email in this thread mentioned, it will make no difference. 
The race here is between processes accessing the same block. In the end 
the io error will be returned for this very block. I don't think we need 
to bother using a mutex to sync them.
>
>>   		get_bh(bh); /* for end_buffer_read_sync() */
>>   		bh->b_end_io = end_buffer_read_sync;
>>   		submit_bh(REQ_OP_READ, 0, bh);
>> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				continue;
>>   			}
>>   
>> -			clear_buffer_uptodate(bh);
> Actually, this don't need to change, because we will add ocfs2_metadata_cache_io_lock() and
> ensure only one thread to read buffer.

Not every block has this lock used, only inode and refcount using it.

Thanks,

Junxiao.

>
>>   			get_bh(bh); /* for end_buffer_read_sync() */
>>   			if (validate)
>>   				set_buffer_needs_validate(bh);
>>
>
Jiangyiwen Nov. 19, 2018, 6:14 a.m. UTC | #5
On 2018/11/19 13:46, Junxiao Bi wrote:
> 
> On 11/19/18 11:05 AM, jiangyiwen wrote:
>> On 2018/11/14 8:00, Junxiao Bi wrote:
>>> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
>>> and submit the io, second wait io done, last check whether bh uptodate,
>>> if not return io error.
>>>
>>> If two sync io for the same bh were issued, it could be the first io done
>>> and set uptodate flag, but just before check that flag, the second io came
>>> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
>>> will return IO error.
>>>
>>> Indeed it's not necessary to clear uptodate flag, as the io end handler
>>> end_buffer_read_sync() will set or clear it based on io succeed or failed.
>>>
>>> The following message was found from a nfs server but the underlying
>>> storage returned no error.
>>>
>>> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
>>> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
>>> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
>>> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
>>> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
>>>
>>> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
>>>
>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> Cc: Changwei Ge <ge.changwei@h3c.com>
>>> ---
>>>
>>> v1 -> v2:
>>>
>>>    - fixed non sync read ocfs2_read_blocks() as well.
>>>
>>> ---
>>>   fs/ocfs2/buffer_head_io.c | 2 --
>>>   1 file changed, 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>> index 4ebbd57cbf84..f9b84f7a3e4b 100644
>>> --- a/fs/ocfs2/buffer_head_io.c
>>> +++ b/fs/ocfs2/buffer_head_io.c
>>> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>>>   #endif
>>>           }
>>>   -        clear_buffer_uptodate(bh);
>> If we should use mutex_lock to keep sync?
>>
>> Thanks.
> As my last email in this thread mentioned, it will not cause difference. Here race is between processes accessing the same block. At last the io error will be return for this very block. I don't think we need bother using mutex to sync them.

I have a question: why doesn't this scenario happen in the
underlying storage?

Thanks.

>>
>>>           get_bh(bh); /* for end_buffer_read_sync() */
>>>           bh->b_end_io = end_buffer_read_sync;
>>>           submit_bh(REQ_OP_READ, 0, bh);
>>> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>                   continue;
>>>               }
>>>   -            clear_buffer_uptodate(bh);
>> Actually, this don't need to change, because we will add ocfs2_metadata_cache_io_lock() and
>> ensure only one thread to read buffer.
> 
> Not every block has this lock used, only inode and refcount using it.
> 

Actually, ci must not be empty, so ocfs2_read_blocks() will
be serialized for the same block.

> Thanks,
> 
> Junxiao.
> 
>>
>>>               get_bh(bh); /* for end_buffer_read_sync() */
>>>               if (validate)
>>>                   set_buffer_needs_validate(bh);
>>>
>>
> 
> .
>
Junxiao Bi Nov. 19, 2018, 6:34 a.m. UTC | #6
On 11/19/18 2:14 PM, jiangyiwen wrote:

> On 2018/11/19 13:46, Junxiao Bi wrote:
>> On 11/19/18 11:05 AM, jiangyiwen wrote:
>>> On 2018/11/14 8:00, Junxiao Bi wrote:
>>>> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
>>>> and submit the io, second wait io done, last check whether bh uptodate,
>>>> if not return io error.
>>>>
>>>> If two sync io for the same bh were issued, it could be the first io done
>>>> and set uptodate flag, but just before check that flag, the second io came
>>>> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
>>>> will return IO error.
>>>>
>>>> Indeed it's not necessary to clear uptodate flag, as the io end handler
>>>> end_buffer_read_sync() will set or clear it based on io succeed or failed.
>>>>
>>>> The following message was found from a nfs server but the underlying
>>>> storage returned no error.
>>>>
>>>> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
>>>> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
>>>> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
>>>> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
>>>> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
>>>>
>>>> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
>>>>
>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Cc: Changwei Ge <ge.changwei@h3c.com>
>>>> ---
>>>>
>>>> v1 -> v2:
>>>>
>>>>     - fixed non sync read ocfs2_read_blocks() as well.
>>>>
>>>> ---
>>>>    fs/ocfs2/buffer_head_io.c | 2 --
>>>>    1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>>> index 4ebbd57cbf84..f9b84f7a3e4b 100644
>>>> --- a/fs/ocfs2/buffer_head_io.c
>>>> +++ b/fs/ocfs2/buffer_head_io.c
>>>> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>>>>    #endif
>>>>            }
>>>>    -        clear_buffer_uptodate(bh);
>>> If we should use mutex_lock to keep sync?
>>>
>>> Thanks.
>> As my last email in this thread mentioned, it will not cause difference. Here race is between processes accessing the same block. At last the io error will be return for this very block. I don't think we need bother using mutex to sync them.
> I have a question, why doesn't this scenario happen in
> underlying storage?
I didn't follow; could you elaborate on your question?
>
> Thanks.
>
>>>>            get_bh(bh); /* for end_buffer_read_sync() */
>>>>            bh->b_end_io = end_buffer_read_sync;
>>>>            submit_bh(REQ_OP_READ, 0, bh);
>>>> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>>                    continue;
>>>>                }
>>>>    -            clear_buffer_uptodate(bh);
>>> Actually, this don't need to change, because we will add ocfs2_metadata_cache_io_lock() and
>>> ensure only one thread to read buffer.
>> Not every block has this lock used, only inode and refcount using it.
>>
> Actually ci must be not empty, so ocfs2_read_block() will
> be called sync for the same block.

OK. Anyway, this will not cause harm and aligns better with the sync read 
code.

Thanks,

Junxiao.

>
>> Thanks,
>>
>> Junxiao.
>>
>>>>                get_bh(bh); /* for end_buffer_read_sync() */
>>>>                if (validate)
>>>>                    set_buffer_needs_validate(bh);
>>>>
>> .
>>
>
Jiangyiwen Nov. 19, 2018, 7:01 a.m. UTC | #7
On 2018/11/19 14:34, Junxiao Bi wrote:
> On 11/19/18 2:14 PM, jiangyiwen wrote:
> 
>> On 2018/11/19 13:46, Junxiao Bi wrote:
>>> On 11/19/18 11:05 AM, jiangyiwen wrote:
>>>> On 2018/11/14 8:00, Junxiao Bi wrote:
>>>>> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
>>>>> and submit the io, second wait io done, last check whether bh uptodate,
>>>>> if not return io error.
>>>>>
>>>>> If two sync io for the same bh were issued, it could be the first io done
>>>>> and set uptodate flag, but just before check that flag, the second io came
>>>>> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
>>>>> will return IO error.
>>>>>
>>>>> Indeed it's not necessary to clear uptodate flag, as the io end handler
>>>>> end_buffer_read_sync() will set or clear it based on io succeed or failed.
>>>>>
>>>>> The following message was found from a nfs server but the underlying
>>>>> storage returned no error.
>>>>>
>>>>> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
>>>>> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
>>>>> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
>>>>> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
>>>>> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
>>>>>
>>>>> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
>>>>>
>>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>

>>>>> Cc: Changwei Ge <ge.changwei@h3c.com>
>>>>> ---
>>>>>
>>>>> v1 -> v2:
>>>>>
>>>>>     - fixed non sync read ocfs2_read_blocks() as well.
>>>>>
>>>>> ---
>>>>>    fs/ocfs2/buffer_head_io.c | 2 --
>>>>>    1 file changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>>>> index 4ebbd57cbf84..f9b84f7a3e4b 100644
>>>>> --- a/fs/ocfs2/buffer_head_io.c
>>>>> +++ b/fs/ocfs2/buffer_head_io.c
>>>>> @@ -161,7 +161,6 @@ int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
>>>>>    #endif
>>>>>            }
>>>>>    -        clear_buffer_uptodate(bh);
>>>> If we should use mutex_lock to keep sync?
>>>>
>>>> Thanks.
>>> As my last email in this thread mentioned, it will not cause difference. Here race is between processes accessing the same block. At last the io error will be return for this very block. I don't think we need bother using mutex to sync them.
>> I have a question, why doesn't this scenario happen in
>> underlying storage?
> didn't follow, could you detail your question?

Sorry, it was my misunderstanding; I thought this
case only happened with "nfs + ocfs2" before.

Then there is no problem.

Thanks,
Yiwen.

>>
>> Thanks.
>>
>>>>>            get_bh(bh); /* for end_buffer_read_sync() */
>>>>>            bh->b_end_io = end_buffer_read_sync;
>>>>>            submit_bh(REQ_OP_READ, 0, bh);
>>>>> @@ -341,7 +340,6 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>>>                    continue;
>>>>>                }
>>>>>    -            clear_buffer_uptodate(bh);
>>>> Actually, this don't need to change, because we will add ocfs2_metadata_cache_io_lock() and
>>>> ensure only one thread to read buffer.
>>> Not every block has this lock used, only inode and refcount using it.
>>>
>> Actually ci must be not empty, so ocfs2_read_block() will
>> be called sync for the same block.
> 
> OK.  Any way this will not cause harm and more align with the sync read code.
> 
> Thanks,
> 
> Junxiao.
> 
>>
>>> Thanks,
>>>
>>> Junxiao.
>>>
>>>>>                get_bh(bh); /* for end_buffer_read_sync() */
>>>>>                if (validate)
>>>>>                    set_buffer_needs_validate(bh);
>>>>>
>>> .
>>>
>>
> 
> .
>
Junxiao Bi Nov. 19, 2018, 7:12 a.m. UTC | #8
On 11/19/18 3:01 PM, jiangyiwen wrote:

> On 2018/11/19 14:34, Junxiao Bi wrote:
>> On 11/19/18 2:14 PM, jiangyiwen wrote:
>>
>>> On 2018/11/19 13:46, Junxiao Bi wrote:
>>>> On 11/19/18 11:05 AM, jiangyiwen wrote:
>>>>> On 2018/11/14 8:00, Junxiao Bi wrote:
>>>>>> For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
>>>>>> and submit the io, second wait io done, last check whether bh uptodate,
>>>>>> if not return io error.
>>>>>>
>>>>>> If two sync io for the same bh were issued, it could be the first io done
>>>>>> and set uptodate flag, but just before check that flag, the second io came
>>>>>> in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
>>>>>> will return IO error.
>>>>>>
>>>>>> Indeed it's not necessary to clear uptodate flag, as the io end handler
>>>>>> end_buffer_read_sync() will set or clear it based on io succeed or failed.
>>>>>>
>>>>>> The following message was found from a nfs server but the underlying
>>>>>> storage returned no error.
>>>>>>
>>>>>> [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
>>>>>> [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
>>>>>> [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
>>>>>> [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
>>>>>> [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
>>>>>>
>>>>>> Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
>>>>>>
>>>>>> Signed-off-by: Junxiao Bi<junxiao.bi@oracle.com>
> Reviewed-by: Yiwen Jiang<jiangyiwen@huawei.com>

Yiwen, many thanks for your review.


>

Patch

diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
index 4ebbd57cbf84..f9b84f7a3e4b 100644
--- a/fs/ocfs2/buffer_head_io.c
+++ b/fs/ocfs2/buffer_head_io.c
@@ -161,7 +161,6 @@  int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
 #endif
 		}
 
-		clear_buffer_uptodate(bh);
 		get_bh(bh); /* for end_buffer_read_sync() */
 		bh->b_end_io = end_buffer_read_sync;
 		submit_bh(REQ_OP_READ, 0, bh);
@@ -341,7 +340,6 @@  int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
 				continue;
 			}
 
-			clear_buffer_uptodate(bh);
 			get_bh(bh); /* for end_buffer_read_sync() */
 			if (validate)
 				set_buffer_needs_validate(bh);