diff mbox

ocfs2: don't evaluate buffer head to NULL managed by caller

Message ID 1522289162-31693-1-git-send-email-ge.changwei@h3c.com (mailing list archive)
State New, archived
Headers show

Commit Message

Changwei Ge March 29, 2018, 2:06 a.m. UTC
ocfs2_read_blocks() is used to read several blocks from disk.
Currently, the input argument *bhs* can be NULL or NOT. It depends on
the caller's behavior. If the function fails in reading blocks from
disk, the corresponding bh will be assigned to NULL and put.

Obviously, above process for non-NULL input bh is not appropriate.
Because the caller doesn't even know its bhs are put and re-assigned.

If buffer head is managed by caller, ocfs2_read_blocks should not
evaluate it to NULL. It will cause caller accessing illegal memory,
thus crash.

Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
---
 fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

Comments

Gang He March 29, 2018, 2:35 a.m. UTC | #1
Hello Changwei,


Do you have the related crash backtrace?
Maybe I feel that new adding check is not necessary.
since the below code has make sure all buffer head is NOT NULL before reading block.
216         ocfs2_metadata_cache_io_lock(ci);
217         for (i = 0 ; i < nr ; i++) {
218                 if (bhs[i] == NULL) {
219                         bhs[i] = sb_getblk(sb, block++);   <<= here
220                         if (bhs[i] == NULL) {
221                                 ocfs2_metadata_cache_io_unlock(ci);
222                                 status = -ENOMEM;
223                                 mlog_errno(status);
224                                 goto bail;
225                         }
226                 }
227                 bh = bhs[i];


Thanks
Gang 


>>> 
> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
> 
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
> 
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.
> 
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
> ---
>  fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index d9ebe11..17329b6 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>  	int i, ignore_cache = 0;
>  	struct buffer_head *bh;
>  	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
> +	int new_bh = 0;
>  
>  	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>  
> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>  		goto bail;
>  	}
>  
> +	/* Use below trick to check if all bhs are NULL or assigned.
> +	 * Basically, we hope all bhs are consistent so that we can
> +	 * handle exception easily.
> +	 */
> +	new_bh = (bhs[0] == NULL);
> +	for (i = 1 ; i < nr ; i++) {
> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
> +			WARN(1, "Not all bhs are consistent\n");
> +			break;
> +		}
> +	}
> +
>  	ocfs2_metadata_cache_io_lock(ci);
>  	for (i = 0 ; i < nr ; i++) {
>  		if (bhs[i] == NULL) {
> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>  		if (!(flags & OCFS2_BH_READAHEAD)) {
>  			if (status) {
>  				/* Clear the rest of the buffers on error */
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
>  				continue;
>  			}
>  			/* We know this can't have changed as we hold the
> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>  				 * for this bh as it's not marked locally
>  				 * uptodate. */
>  				status = -EIO;
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
>  				continue;
>  			}
>  
> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>  				clear_buffer_needs_validate(bh);
>  				status = validate(sb, bh);
>  				if (status) {
> -					put_bh(bh);
> -					bhs[i] = NULL;
> +					if (new_bh) {
> +						put_bh(bh);
> +						bhs[i] = NULL;
> +					}
>  					continue;
>  				}
>  			}
> -- 
> 2.7.4
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com 
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Changwei Ge March 29, 2018, 3:01 a.m. UTC | #2
Hi Gang,

On 2018/3/29 10:36, Gang He wrote:
> Hello Changwei,
> 
> 
> Do you have the related crash backtrace?
This patch has been pending in my tree for quite a long time and sadly I can't 
find the back trace right now. But we can still find the risk by reviewing 
related code. :)

> Maybe I feel that new adding check is not necessary.

Very true, but the check I add is for debug purpose.
We can see that there are many places calling ocfs2_read_blocks(), some of them 
are passing only one bh while others are not.
In order to handle potential exception easily, it's better for callers to pass 
bhs which are all null or assigned. So I add that trick to tell if some callers 
are doing stupid things.

Thanks,
Changwei

> since the below code has make sure all buffer head is NOT NULL before reading block.
> 216         ocfs2_metadata_cache_io_lock(ci);
> 217         for (i = 0 ; i < nr ; i++) {
> 218                 if (bhs[i] == NULL) {
> 219                         bhs[i] = sb_getblk(sb, block++);   <<= here
> 220                         if (bhs[i] == NULL) {
> 221                                 ocfs2_metadata_cache_io_unlock(ci);
> 222                                 status = -ENOMEM;
> 223                                 mlog_errno(status);
> 224                                 goto bail;
> 225                         }
> 226                 }
> 227                 bh = bhs[i];
> 
> 
> Thanks
> Gang
> 
> 
>>>>
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index d9ebe11..17329b6 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>> block, int nr,
>>   	int i, ignore_cache = 0;
>>   	struct buffer_head *bh;
>>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>> +	int new_bh = 0;
>>   
>>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>   
>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>> block, int nr,
>>   		goto bail;
>>   	}
>>   
>> +	/* Use below trick to check if all bhs are NULL or assigned.
>> +	 * Basically, we hope all bhs are consistent so that we can
>> +	 * handle exception easily.
>> +	 */
>> +	new_bh = (bhs[0] == NULL);
>> +	for (i = 1 ; i < nr ; i++) {
>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>> +			WARN(1, "Not all bhs are consistent\n");
>> +			break;
>> +		}
>> +	}
>> +
>>   	ocfs2_metadata_cache_io_lock(ci);
>>   	for (i = 0 ; i < nr ; i++) {
>>   		if (bhs[i] == NULL) {
>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>> block, int nr,
>>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>>   			if (status) {
>>   				/* Clear the rest of the buffers on error */
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>   				continue;
>>   			}
>>   			/* We know this can't have changed as we hold the
>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>> block, int nr,
>>   				 * for this bh as it's not marked locally
>>   				 * uptodate. */
>>   				status = -EIO;
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>   				continue;
>>   			}
>>   
>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>> block, int nr,
>>   				clear_buffer_needs_validate(bh);
>>   				status = validate(sb, bh);
>>   				if (status) {
>> -					put_bh(bh);
>> -					bhs[i] = NULL;
>> +					if (new_bh) {
>> +						put_bh(bh);
>> +						bhs[i] = NULL;
>> +					}
>>   					continue;
>>   				}
>>   			}
>> -- 
>> 2.7.4
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
>
Gang He March 29, 2018, 3:21 a.m. UTC | #3
Hi Changwei,


>>> 
> Hi Gang,
> 
> On 2018/3/29 10:36, Gang He wrote:
>> Hello Changwei,
>> 
>> 
>> Do you have the related crash backtrace?
> This patch has been pending in my tree for quite a long time and sadly I 
> can't 
> find the back trace right now. But we can still find the risk by reviewing 
> related code. :)
> 
>> Maybe I feel that new adding check is not necessary.
> 
> Very true, but the check I add is for debug purpose.
> We can see that there are many places calling ocfs2_read_blocks(), some of 
> them 
> are passing only one bh while others are not.
> In order to handle potential exception easily, it's better for callers to 
> pass 
> bhs which are all null or assigned. So I add that trick to tell if some 
> callers 
> are doing stupid things.
> 
> Thanks,
> Changwei
> 
>> since the below code has make sure all buffer head is NOT NULL before 
> reading block.
>> 216         ocfs2_metadata_cache_io_lock(ci);
>> 217         for (i = 0 ; i < nr ; i++) {
>> 218                 if (bhs[i] == NULL) {
>> 219                         bhs[i] = sb_getblk(sb, block++);   <<= here
>> 220                         if (bhs[i] == NULL) {
>> 221                                 ocfs2_metadata_cache_io_unlock(ci);
>> 222                                 status = -ENOMEM;
>> 223                                 mlog_errno(status);
>> 224                                 goto bail;
>> 225                         }
>> 226                 }
>> 227                 bh = bhs[i];
>> 
>> 
>> Thanks
>> Gang
>> 
>> 
>>>>>
>>> ocfs2_read_blocks() is used to read several blocks from disk.
>>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>>> the caller's behavior. If the function fails in reading blocks from
>>> disk, the corresponding bh will be assigned to NULL and put.
>>>
>>> Obviously, above process for non-NULL input bh is not appropriate.
>>> Because the caller doesn't even know its bhs are put and re-assigned.
>>>
>>> If buffer head is managed by caller, ocfs2_read_blocks should not
>>> evaluate it to NULL. It will cause caller accessing illegal memory,
>>> thus crash.
>>>
>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>> ---
>>>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>> index d9ebe11..17329b6 100644
>>> --- a/fs/ocfs2/buffer_head_io.c
>>> +++ b/fs/ocfs2/buffer_head_io.c
>>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>> block, int nr,
>>>   	int i, ignore_cache = 0;
>>>   	struct buffer_head *bh;
>>>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>>> +	int new_bh = 0;
>>>   
>>>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>>   
>>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>> block, int nr,
>>>   		goto bail;
>>>   	}
>>>   
>>> +	/* Use below trick to check if all bhs are NULL or assigned.
>>> +	 * Basically, we hope all bhs are consistent so that we can
>>> +	 * handle exception easily.
>>> +	 */
>>> +	new_bh = (bhs[0] == NULL);
>>> +	for (i = 1 ; i < nr ; i++) {
>>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>>> +			WARN(1, "Not all bhs are consistent\n");
>>> +			break;
>>> +		}
>>> +	}
Maybe just adding a buffer head array check is OK?
If not consistent, give a warning.
why do we need the below code change?
since all head buffers are always NOT NULL.

Thanks
Gang

>>> +
>>>   	ocfs2_metadata_cache_io_lock(ci);
>>>   	for (i = 0 ; i < nr ; i++) {
>>>   		if (bhs[i] == NULL) {
>>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>> block, int nr,
>>>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>>>   			if (status) {
>>>   				/* Clear the rest of the buffers on error */
>>> -				put_bh(bh);
>>> -				bhs[i] = NULL;
>>> +				if (new_bh) {
>>> +					put_bh(bh);
>>> +					bhs[i] = NULL;
>>> +				}
>>>   				continue;
>>>   			}
>>>   			/* We know this can't have changed as we hold the
>>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>> block, int nr,
>>>   				 * for this bh as it's not marked locally
>>>   				 * uptodate. */
>>>   				status = -EIO;
>>> -				put_bh(bh);
>>> -				bhs[i] = NULL;
>>> +				if (new_bh) {
>>> +					put_bh(bh);
>>> +					bhs[i] = NULL;
>>> +				}
>>>   				continue;
>>>   			}
>>>   
>>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>> block, int nr,
>>>   				clear_buffer_needs_validate(bh);
>>>   				status = validate(sb, bh);
>>>   				if (status) {
>>> -					put_bh(bh);
>>> -					bhs[i] = NULL;
>>> +					if (new_bh) {
>>> +						put_bh(bh);
>>> +						bhs[i] = NULL;
>>> +					}
>>>   					continue;
>>>   				}
>>>   			}
>>> -- 
>>> 2.7.4
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel@oss.oracle.com 
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>> 
>>
Larry Chen March 29, 2018, 3:36 a.m. UTC | #4
Hi Changwei,

I found that your patch call put_bh function only if new_bh==1,
Will it cause buffer_head use count inconsistent??

Thanks
Larry


On 03/29/2018 10:06 AM, Changwei Ge wrote:
> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
>
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
>
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.
>
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
> ---
>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>   1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index d9ebe11..17329b6 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>   	int i, ignore_cache = 0;
>   	struct buffer_head *bh;
>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
> +	int new_bh = 0;
>   
>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>   
> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>   		goto bail;
>   	}
>   
> +	/* Use below trick to check if all bhs are NULL or assigned.
> +	 * Basically, we hope all bhs are consistent so that we can
> +	 * handle exception easily.
> +	 */
> +	new_bh = (bhs[0] == NULL);
> +	for (i = 1 ; i < nr ; i++) {
> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
> +			WARN(1, "Not all bhs are consistent\n");
> +			break;
> +		}
> +	}
> +
>   	ocfs2_metadata_cache_io_lock(ci);
>   	for (i = 0 ; i < nr ; i++) {
>   		if (bhs[i] == NULL) {
> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>   			if (status) {
>   				/* Clear the rest of the buffers on error */
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
>   				continue;
>   			}
>   			/* We know this can't have changed as we hold the
> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>   				 * for this bh as it's not marked locally
>   				 * uptodate. */
>   				status = -EIO;
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
>   				continue;
>   			}
>   
> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>   				clear_buffer_needs_validate(bh);
>   				status = validate(sb, bh);
>   				if (status) {
> -					put_bh(bh);
> -					bhs[i] = NULL;
> +					if (new_bh) {
> +						put_bh(bh);
> +						bhs[i] = NULL;
> +					}
>   					continue;
>   				}
>   			}
Changwei Ge March 29, 2018, 6:25 a.m. UTC | #5
Hi Gang,

On 2018/3/29 11:22, Gang He wrote:
> Hi Changwei,
> 
> 
>>>>
>> Hi Gang,
>>
>> On 2018/3/29 10:36, Gang He wrote:
>>> Hello Changwei,
>>>
>>>
>>> Do you have the related crash backtrace?
>> This patch has been pending in my tree for quite a long time and sadly I
>> can't
>> find the back trace right now. But we can still find the risk by reviewing
>> related code. :)
>>
>>> Maybe I feel that new adding check is not necessary.
>>
>> Very true, but the check I add is for debug purpose.
>> We can see that there are many places calling ocfs2_read_blocks(), some of
>> them
>> are passing only one bh while others are not.
>> In order to handle potential exception easily, it's better for callers to
>> pass
>> bhs which are all null or assigned. So I add that trick to tell if some
>> callers
>> are doing stupid things.
>>
>> Thanks,
>> Changwei
>>
>>> since the below code has make sure all buffer head is NOT NULL before
>> reading block.
>>> 216         ocfs2_metadata_cache_io_lock(ci);
>>> 217         for (i = 0 ; i < nr ; i++) {
>>> 218                 if (bhs[i] == NULL) {
>>> 219                         bhs[i] = sb_getblk(sb, block++);   <<= here
>>> 220                         if (bhs[i] == NULL) {
>>> 221                                 ocfs2_metadata_cache_io_unlock(ci);
>>> 222                                 status = -ENOMEM;
>>> 223                                 mlog_errno(status);
>>> 224                                 goto bail;
>>> 225                         }
>>> 226                 }
>>> 227                 bh = bhs[i];
>>>
>>>
>>> Thanks
>>> Gang
>>>
>>>
>>>>>>
>>>> ocfs2_read_blocks() is used to read several blocks from disk.
>>>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>>>> the caller's behavior. If the function fails in reading blocks from
>>>> disk, the corresponding bh will be assigned to NULL and put.
>>>>
>>>> Obviously, above process for non-NULL input bh is not appropriate.
>>>> Because the caller doesn't even know its bhs are put and re-assigned.
>>>>
>>>> If buffer head is managed by caller, ocfs2_read_blocks should not
>>>> evaluate it to NULL. It will cause caller accessing illegal memory,
>>>> thus crash.
>>>>
>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>> ---
>>>>    fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>>>    1 file changed, 25 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>>> index d9ebe11..17329b6 100644
>>>> --- a/fs/ocfs2/buffer_head_io.c
>>>> +++ b/fs/ocfs2/buffer_head_io.c
>>>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>>> block, int nr,
>>>>    	int i, ignore_cache = 0;
>>>>    	struct buffer_head *bh;
>>>>    	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>>>> +	int new_bh = 0;
>>>>    
>>>>    	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>>>    
>>>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>>> block, int nr,
>>>>    		goto bail;
>>>>    	}
>>>>    
>>>> +	/* Use below trick to check if all bhs are NULL or assigned.
>>>> +	 * Basically, we hope all bhs are consistent so that we can
>>>> +	 * handle exception easily.
>>>> +	 */
>>>> +	new_bh = (bhs[0] == NULL);
>>>> +	for (i = 1 ; i < nr ; i++) {
>>>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>>>> +			WARN(1, "Not all bhs are consistent\n");
>>>> +			break;
>>>> +		}
>>>> +	}
> Maybe just adding a buffer head array check is OK?
> If not consistent, give a warning.
> why do we need the below code change?
> since all head buffers are always NOT NULL.

Thanks for your review.
I will elaborate my intention and the reason doing so further.

There are *two* kinds of customers of ocfs2_read_blocks().

One kind like _slot map_ uses this function with *buffer head* allocated in 
advance. For this type, ocfs2_read_blocks() will not allocate *buffer head* via 
sb_getblk(). Because _slot map_ has reserved some buffer heads during its 
initialization. In other words, the input argument *bhs* should be an array with 
all entries assigned to non-NULL.
You can refer to code path:
ocfs2_refresh_slot_info -> ocfs2_read_blocks

The other kind doesn't reserve buffer head in advance, it relies on 
ocfs2_read_blocks() to allocate buffer head for following read from disk. This 
is why ocfs2_read_blocks() checks if bhs[i] is NULL.

For the first type, if ocfs2_read_blocks fails in reading from disk.
Current code will assign bhs[i] to NULL and put it, which my patch wants to fix.
Because the customer doesn't know what ocfs2_read_blocks() did to its bhs.
The customer like _slot map_ will still try to reference those bhs.

Thanks,
Changwei

> 
> Thanks
> Gang
> 
>>>> +
>>>>    	ocfs2_metadata_cache_io_lock(ci);
>>>>    	for (i = 0 ; i < nr ; i++) {
>>>>    		if (bhs[i] == NULL) {
>>>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>>> block, int nr,
>>>>    		if (!(flags & OCFS2_BH_READAHEAD)) {
>>>>    			if (status) {
>>>>    				/* Clear the rest of the buffers on error */
>>>> -				put_bh(bh);
>>>> -				bhs[i] = NULL;
>>>> +				if (new_bh) {
>>>> +					put_bh(bh);
>>>> +					bhs[i] = NULL;
>>>> +				}
>>>>    				continue;
>>>>    			}
>>>>    			/* We know this can't have changed as we hold the
>>>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>>> block, int nr,
>>>>    				 * for this bh as it's not marked locally
>>>>    				 * uptodate. */
>>>>    				status = -EIO;
>>>> -				put_bh(bh);
>>>> -				bhs[i] = NULL;
>>>> +				if (new_bh) {
>>>> +					put_bh(bh);
>>>> +					bhs[i] = NULL;
>>>> +				}
>>>>    				continue;
>>>>    			}
>>>>    
>>>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64
>>>> block, int nr,
>>>>    				clear_buffer_needs_validate(bh);
>>>>    				status = validate(sb, bh);
>>>>    				if (status) {
>>>> -					put_bh(bh);
>>>> -					bhs[i] = NULL;
>>>> +					if (new_bh) {
>>>> +						put_bh(bh);
>>>> +						bhs[i] = NULL;
>>>> +					}
>>>>    					continue;
>>>>    				}
>>>>    			}
>>>> -- 
>>>> 2.7.4
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel@oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>
>>>
>
Changwei Ge March 29, 2018, 6:27 a.m. UTC | #6
Hi Larry,

On 2018/3/29 11:37, Larry Chen wrote:
> Hi Changwei,
> 
> I found that your patch call put_bh function only if new_bh==1,
> Will it cause buffer_head use count inconsistent??

We don't have to worry about that since sb_getblk() should never be invoked in 
that case.

Thanks,
Changwei

> 
> Thanks
> Larry
> 
> 
> On 03/29/2018 10:06 AM, Changwei Ge wrote:
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>    fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>    1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index d9ebe11..17329b6 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>    	int i, ignore_cache = 0;
>>    	struct buffer_head *bh;
>>    	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>> +	int new_bh = 0;
>>    
>>    	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>    
>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>    		goto bail;
>>    	}
>>    
>> +	/* Use below trick to check if all bhs are NULL or assigned.
>> +	 * Basically, we hope all bhs are consistent so that we can
>> +	 * handle exception easily.
>> +	 */
>> +	new_bh = (bhs[0] == NULL);
>> +	for (i = 1 ; i < nr ; i++) {
>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>> +			WARN(1, "Not all bhs are consistent\n");
>> +			break;
>> +		}
>> +	}
>> +
>>    	ocfs2_metadata_cache_io_lock(ci);
>>    	for (i = 0 ; i < nr ; i++) {
>>    		if (bhs[i] == NULL) {
>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>    		if (!(flags & OCFS2_BH_READAHEAD)) {
>>    			if (status) {
>>    				/* Clear the rest of the buffers on error */
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>    				continue;
>>    			}
>>    			/* We know this can't have changed as we hold the
>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>    				 * for this bh as it's not marked locally
>>    				 * uptodate. */
>>    				status = -EIO;
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>    				continue;
>>    			}
>>    
>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>    				clear_buffer_needs_validate(bh);
>>    				status = validate(sb, bh);
>>    				if (status) {
>> -					put_bh(bh);
>> -					bhs[i] = NULL;
>> +					if (new_bh) {
>> +						put_bh(bh);
>> +						bhs[i] = NULL;
>> +					}
>>    					continue;
>>    				}
>>    			}
> 
>
piaojun March 29, 2018, 9:50 a.m. UTC | #7
Hi Changwei,

On 2018/3/29 10:06, Changwei Ge wrote:
> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
> 
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
> 
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.
> 
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
> ---
>  fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index d9ebe11..17329b6 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  	int i, ignore_cache = 0;
>  	struct buffer_head *bh;
>  	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
> +	int new_bh = 0;
>  
>  	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>  
> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  		goto bail;
>  	}
>  
> +	/* Use below trick to check if all bhs are NULL or assigned.
> +	 * Basically, we hope all bhs are consistent so that we can
> +	 * handle exception easily.
> +	 */
> +	new_bh = (bhs[0] == NULL);
> +	for (i = 1 ; i < nr ; i++) {
> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
> +			WARN(1, "Not all bhs are consistent\n");
> +			break;
> +		}
> +	}
> +
>  	ocfs2_metadata_cache_io_lock(ci);
>  	for (i = 0 ; i < nr ; i++) {
>  		if (bhs[i] == NULL) {
> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  		if (!(flags & OCFS2_BH_READAHEAD)) {
>  			if (status) {
>  				/* Clear the rest of the buffers on error */
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
>  				continue;
>  			}
>  			/* We know this can't have changed as we hold the
> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  				 * for this bh as it's not marked locally
>  				 * uptodate. */
>  				status = -EIO;
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
How to make suer 'bhs[i]' is not allocated by user according to 'new_bh'?
'new_bh' equis 1 only means 'bhs[0]' is allocated by ocfs2_read_blocks()
and we should put it here, right?

thanks,
Jun
>  				continue;
>  			}
>  
> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  				clear_buffer_needs_validate(bh);
>  				status = validate(sb, bh);
>  				if (status) {
> -					put_bh(bh);
> -					bhs[i] = NULL;
> +					if (new_bh) {
> +						put_bh(bh);
> +						bhs[i] = NULL;
> +					}
>  					continue;
>  				}
>  			}
>
Larry Chen March 29, 2018, 10:32 a.m. UTC | #8
Hi Changwei,

On 03/29/2018 05:50 PM, piaojun wrote:
> Hi Changwei,
>
> On 2018/3/29 10:06, Changwei Ge wrote:
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index d9ebe11..17329b6 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   	int i, ignore_cache = 0;
>>   	struct buffer_head *bh;
>>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>> +	int new_bh = 0;
>>   
>>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>   
>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   		goto bail;
>>   	}
>>   
>> +	/* Use below trick to check if all bhs are NULL or assigned.
>> +	 * Basically, we hope all bhs are consistent so that we can
>> +	 * handle exception easily.
>> +	 */
>> +	new_bh = (bhs[0] == NULL);
>> +	for (i = 1 ; i < nr ; i++) {
>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>> +			WARN(1, "Not all bhs are consistent\n");
>> +			break;
>> +		}
>> +	}
>> +
>>   	ocfs2_metadata_cache_io_lock(ci);
>>   	for (i = 0 ; i < nr ; i++) {
>>   		if (bhs[i] == NULL) {
>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>>   			if (status) {
>>   				/* Clear the rest of the buffers on error */
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>   				continue;
>>   			}
>>   			/* We know this can't have changed as we hold the
>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				 * for this bh as it's not marked locally
>>   				 * uptodate. */
>>   				status = -EIO;
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
> How to make suer 'bhs[i]' is not allocated by user according to 'new_bh'?
> 'new_bh' equis 1 only means 'bhs[0]' is allocated by ocfs2_read_blocks()
> and we should put it here, right?
Does your patch assumes that bhs refers to either an all-NULL-elements 
array or
an all-preallocated-elements array?

Thanks
Larry
> thanks,
> Jun
>>   				continue;
>>   			}
>>   
>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				clear_buffer_needs_validate(bh);
>>   				status = validate(sb, bh);
>>   				if (status) {
>> -					put_bh(bh);
>> -					bhs[i] = NULL;
>> +					if (new_bh) {
>> +						put_bh(bh);
>> +						bhs[i] = NULL;
>> +					}
>>   					continue;
>>   				}
>>   			}
>>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
>
Changwei Ge March 29, 2018, 12:03 p.m. UTC | #9
Hi Jun,

On 2018/3/29 17:51, piaojun wrote:
> Hi Changwei,
> 
> On 2018/3/29 10:06, Changwei Ge wrote:
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index d9ebe11..17329b6 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   	int i, ignore_cache = 0;
>>   	struct buffer_head *bh;
>>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>> +	int new_bh = 0;
>>   
>>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>   
>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   		goto bail;
>>   	}
>>   
>> +	/* Use below trick to check if all bhs are NULL or assigned.
>> +	 * Basically, we hope all bhs are consistent so that we can
>> +	 * handle exception easily.
>> +	 */
>> +	new_bh = (bhs[0] == NULL);
>> +	for (i = 1 ; i < nr ; i++) {
>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>> +			WARN(1, "Not all bhs are consistent\n");
>> +			break;
>> +		}
>> +	}
>> +
>>   	ocfs2_metadata_cache_io_lock(ci);
>>   	for (i = 0 ; i < nr ; i++) {
>>   		if (bhs[i] == NULL) {
>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>>   			if (status) {
>>   				/* Clear the rest of the buffers on error */
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>   				continue;
>>   			}
>>   			/* We know this can't have changed as we hold the
>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				 * for this bh as it's not marked locally
>>   				 * uptodate. */
>>   				status = -EIO;
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
> How to make suer 'bhs[i]' is not allocated by user according to 'new_bh'?
> 'new_bh' equis 1 only means 'bhs[0]' is allocated by ocfs2_read_blocks()
> and we should put it here, right?

Thanks for your review.
If I understand correctly, you mean *new_bh* only represents that only bhs[0] is 
allocated by ocfs2_read_blocks() while other bhs[i] can't be ensured NULL as 
well as input arugment?
I suppose every single elements in bhs[i] should be NULL or non-NULL.
This should be guaranteed by caller.

So I add a trick to check if condition is met, which you can find under a comment.

Thanks,
Changwei

> 
> thanks,
> Jun
>>   				continue;
>>   			}
>>   
>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				clear_buffer_needs_validate(bh);
>>   				status = validate(sb, bh);
>>   				if (status) {
>> -					put_bh(bh);
>> -					bhs[i] = NULL;
>> +					if (new_bh) {
>> +						put_bh(bh);
>> +						bhs[i] = NULL;
>> +					}
>>   					continue;
>>   				}
>>   			}
>>
>
Changwei Ge March 29, 2018, 12:04 p.m. UTC | #10
Hi Larry,

On 2018/3/29 18:33, Larry Chen wrote:
> Hi Changwei,
> 
> On 03/29/2018 05:50 PM, piaojun wrote:
>> Hi Changwei,
>>
>> On 2018/3/29 10:06, Changwei Ge wrote:
>>> ocfs2_read_blocks() is used to read several blocks from disk.
>>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>>> the caller's behavior. If the function fails in reading blocks from
>>> disk, the corresponding bh will be assigned to NULL and put.
>>>
>>> Obviously, above process for non-NULL input bh is not appropriate.
>>> Because the caller doesn't even know its bhs are put and re-assigned.
>>>
>>> If buffer head is managed by caller, ocfs2_read_blocks should not
>>> evaluate it to NULL. It will cause caller accessing illegal memory,
>>> thus crash.
>>>
>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>> ---
>>>    fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>>    1 file changed, 25 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>> index d9ebe11..17329b6 100644
>>> --- a/fs/ocfs2/buffer_head_io.c
>>> +++ b/fs/ocfs2/buffer_head_io.c
>>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>    	int i, ignore_cache = 0;
>>>    	struct buffer_head *bh;
>>>    	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>>> +	int new_bh = 0;
>>>    
>>>    	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>>    
>>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>    		goto bail;
>>>    	}
>>>    
>>> +	/* Use below trick to check if all bhs are NULL or assigned.
>>> +	 * Basically, we hope all bhs are consistent so that we can
>>> +	 * handle exception easily.
>>> +	 */
>>> +	new_bh = (bhs[0] == NULL);
>>> +	for (i = 1 ; i < nr ; i++) {
>>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>>> +			WARN(1, "Not all bhs are consistent\n");
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>>    	ocfs2_metadata_cache_io_lock(ci);
>>>    	for (i = 0 ; i < nr ; i++) {
>>>    		if (bhs[i] == NULL) {
>>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>    		if (!(flags & OCFS2_BH_READAHEAD)) {
>>>    			if (status) {
>>>    				/* Clear the rest of the buffers on error */
>>> -				put_bh(bh);
>>> -				bhs[i] = NULL;
>>> +				if (new_bh) {
>>> +					put_bh(bh);
>>> +					bhs[i] = NULL;
>>> +				}
>>>    				continue;
>>>    			}
>>>    			/* We know this can't have changed as we hold the
>>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>    				 * for this bh as it's not marked locally
>>>    				 * uptodate. */
>>>    				status = -EIO;
>>> -				put_bh(bh);
>>> -				bhs[i] = NULL;
>>> +				if (new_bh) {
>>> +					put_bh(bh);
>>> +					bhs[i] = NULL;
>>> +				}
>> How to make suer 'bhs[i]' is not allocated by user according to 'new_bh'?
>> 'new_bh' equis 1 only means 'bhs[0]' is allocated by ocfs2_read_blocks()
>> and we should put it here, right?
> Does your patch assumes that bhs refers to either an all-NULL-elements
> array or
> an all-preallocated-elements array?

True, I mean that. :)

Thanks,
Changwei

> 
> Thanks
> Larry
>> thanks,
>> Jun
>>>    				continue;
>>>    			}
>>>    
>>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>    				clear_buffer_needs_validate(bh);
>>>    				status = validate(sb, bh);
>>>    				if (status) {
>>> -					put_bh(bh);
>>> -					bhs[i] = NULL;
>>> +					if (new_bh) {
>>> +						put_bh(bh);
>>> +						bhs[i] = NULL;
>>> +					}
>>>    					continue;
>>>    				}
>>>    			}
>>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
>>
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
Andrew Morton March 29, 2018, 9:45 p.m. UTC | #11
On Thu, 29 Mar 2018 10:06:02 +0800 Changwei Ge <ge.changwei@h3c.com> wrote:

> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
> 
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
> 
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.

(What about ocfs2_read_blocks_sync()?)

Passing non-NULL entries in bhs[] looks like a weird thing to do.  Do
any callers actually do this?  And of they do, do they actually care
about the alteration of bhs[] if the call failed?
Changwei Ge March 30, 2018, 12:50 a.m. UTC | #12
Hi Andrew,

On 2018/3/30 5:45, Andrew Morton wrote:
> On Thu, 29 Mar 2018 10:06:02 +0800 Changwei Ge <ge.changwei@h3c.com> wrote:
> 
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
> 
> (What about ocfs2_read_blocks_sync()?)

ocfs2_read_blocks_sync() seems to have the same issue,too.

> 
> Passing non-NULL entries in bhs[] looks like a weird thing to do.  Do
> any callers actually do this?  And of they do, do they actually care

Yes, some callers actually pass non-NULL entries in bhs[].
In ocfs2, _slot map_ keeps the mapping relationship between *node number* and 
*slot number* which identifies a dedicated disk resource(usually metadata or 
journal).

_Slot map_ is loaded from disk during mount in function ocfs2_map_slot_buffers() 
where ->si_bh[] are allocated with NULL filled. Then it invokes 
ocfs2_read_blocks() to read a block from disk and assign the returned bh to 
->si_bh[i].

If ocfs2 needs to refresh _slot map_ via ocfs2_refresh_slot_info(), ->si_bh is 
directly passed in. So the weird thing happens. :(

> about the alteration of bhs[] if the call failed?

Unfortunately, the alternation is ignored and that's what my patch wants to fix.

A thing deserved to be noticed is that a caller may pass bhs[] with mixed NULL 
and non-NULL entries in. That really bothers me, so I add a WARN check to notice 
the caller to pass a proper pattern of bhs[] in. After that, ocfs2_read_blocks() 
can handle read failure easily.
And do you have any advice for how to fix this?

Thanks,
Changwei

> 
>
Joseph Qi March 30, 2018, 1:26 a.m. UTC | #13
On 18/3/29 10:06, Changwei Ge wrote:
> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
> 
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
> 
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.
> 
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
> ---
>  fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index d9ebe11..17329b6 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  	int i, ignore_cache = 0;
>  	struct buffer_head *bh;
>  	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
> +	int new_bh = 0;
>  
>  	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>  
> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  		goto bail;
>  	}
>  
> +	/* Use below trick to check if all bhs are NULL or assigned.
> +	 * Basically, we hope all bhs are consistent so that we can
> +	 * handle exception easily.
> +	 */
> +	new_bh = (bhs[0] == NULL);
> +	for (i = 1 ; i < nr ; i++) {
> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
> +			WARN(1, "Not all bhs are consistent\n");
> +			break;
> +		}
> +	}
> +
>  	ocfs2_metadata_cache_io_lock(ci);
>  	for (i = 0 ; i < nr ; i++) {
>  		if (bhs[i] == NULL) {
> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  		if (!(flags & OCFS2_BH_READAHEAD)) {
>  			if (status) {
>  				/* Clear the rest of the buffers on error */
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}

Since we assume caller has to pass either all NULL or all non-NULL,
here we will only put bh internal allocated. Am I missing something?

Thanks,
Joseph

>  				continue;
>  			}
>  			/* We know this can't have changed as we hold the
> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  				 * for this bh as it's not marked locally
>  				 * uptodate. */
>  				status = -EIO;
> -				put_bh(bh);
> -				bhs[i] = NULL;
> +				if (new_bh) {
> +					put_bh(bh);
> +					bhs[i] = NULL;
> +				}
>  				continue;
>  			}
>  
> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>  				clear_buffer_needs_validate(bh);
>  				status = validate(sb, bh);
>  				if (status) {
> -					put_bh(bh);
> -					bhs[i] = NULL;
> +					if (new_bh) {
> +						put_bh(bh);
> +						bhs[i] = NULL;
> +					}
>  					continue;
>  				}
>  			}
>
Changwei Ge March 30, 2018, 1:31 a.m. UTC | #14
Hi Joseph,

On 2018/3/30 9:27, Joseph Qi wrote:
> 
> 
> On 18/3/29 10:06, Changwei Ge wrote:
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index d9ebe11..17329b6 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   	int i, ignore_cache = 0;
>>   	struct buffer_head *bh;
>>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>> +	int new_bh = 0;
>>   
>>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>   
>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   		goto bail;
>>   	}
>>   
>> +	/* Use below trick to check if all bhs are NULL or assigned.
>> +	 * Basically, we hope all bhs are consistent so that we can
>> +	 * handle exception easily.
>> +	 */
>> +	new_bh = (bhs[0] == NULL);
>> +	for (i = 1 ; i < nr ; i++) {
>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>> +			WARN(1, "Not all bhs are consistent\n");
>> +			break;
>> +		}
>> +	}
>> +
>>   	ocfs2_metadata_cache_io_lock(ci);
>>   	for (i = 0 ; i < nr ; i++) {
>>   		if (bhs[i] == NULL) {
>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>>   			if (status) {
>>   				/* Clear the rest of the buffers on error */
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
> 
> Since we assume caller has to pass either all NULL or all non-NULL,
> here we will only put bh internal allocated. Am I missing something?

Thanks for your review.
Yes, we will only put bh internally allocated.
If bh is reserved in advance, we will not put it and re-assign it to NULL.

Thanks,
Changwei

> 
> Thanks,
> Joseph
> 
>>   				continue;
>>   			}
>>   			/* We know this can't have changed as we hold the
>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				 * for this bh as it's not marked locally
>>   				 * uptodate. */
>>   				status = -EIO;
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>   				continue;
>>   			}
>>   
>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>   				clear_buffer_needs_validate(bh);
>>   				status = validate(sb, bh);
>>   				if (status) {
>> -					put_bh(bh);
>> -					bhs[i] = NULL;
>> +					if (new_bh) {
>> +						put_bh(bh);
>> +						bhs[i] = NULL;
>> +					}
>>   					continue;
>>   				}
>>   			}
>>
>
Joseph Qi March 30, 2018, 2:03 a.m. UTC | #15
On 18/3/30 09:31, Changwei Ge wrote:
> Hi Joseph,
> 
> On 2018/3/30 9:27, Joseph Qi wrote:
>>
>>
>> On 18/3/29 10:06, Changwei Ge wrote:
>>> ocfs2_read_blocks() is used to read several blocks from disk.
>>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>>> the caller's behavior. If the function fails in reading blocks from
>>> disk, the corresponding bh will be assigned to NULL and put.
>>>
>>> Obviously, above process for non-NULL input bh is not appropriate.
>>> Because the caller doesn't even know its bhs are put and re-assigned.
>>>
>>> If buffer head is managed by caller, ocfs2_read_blocks should not
>>> evaluate it to NULL. It will cause caller accessing illegal memory,
>>> thus crash.
>>>
>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>> ---
>>>   fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>> index d9ebe11..17329b6 100644
>>> --- a/fs/ocfs2/buffer_head_io.c
>>> +++ b/fs/ocfs2/buffer_head_io.c
>>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>   	int i, ignore_cache = 0;
>>>   	struct buffer_head *bh;
>>>   	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>>> +	int new_bh = 0;
>>>   
>>>   	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>>   
>>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>   		goto bail;
>>>   	}
>>>   
>>> +	/* Use below trick to check if all bhs are NULL or assigned.
>>> +	 * Basically, we hope all bhs are consistent so that we can
>>> +	 * handle exception easily.
>>> +	 */
>>> +	new_bh = (bhs[0] == NULL);
>>> +	for (i = 1 ; i < nr ; i++) {
>>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>>> +			WARN(1, "Not all bhs are consistent\n");
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>>   	ocfs2_metadata_cache_io_lock(ci);
>>>   	for (i = 0 ; i < nr ; i++) {
>>>   		if (bhs[i] == NULL) {
>>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>   		if (!(flags & OCFS2_BH_READAHEAD)) {
>>>   			if (status) {
>>>   				/* Clear the rest of the buffers on error */
>>> -				put_bh(bh);
>>> -				bhs[i] = NULL;
>>> +				if (new_bh) {
>>> +					put_bh(bh);
>>> +					bhs[i] = NULL;
>>> +				}
>>
>> Since we assume caller has to pass either all NULL or all non-NULL,
>> here we will only put bh internal allocated. Am I missing something?
> 
> Thanks for your review.
> Yes, we will only put bh internally allocated.
> If bh is reserved in advance, we will not put it and re-assign it to NULL.
> 

So this branch won't have risk, right?

Thanks,
Joseph
piaojun March 30, 2018, 2:16 a.m. UTC | #16
Hi Joseph and Changwei,

On 2018/3/30 9:26, Joseph Qi wrote:
> 
> 
> On 18/3/29 10:06, Changwei Ge wrote:
>> ocfs2_read_blocks() is used to read several blocks from disk.
>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>> the caller's behavior. If the function fails in reading blocks from
>> disk, the corresponding bh will be assigned to NULL and put.
>>
>> Obviously, above process for non-NULL input bh is not appropriate.
>> Because the caller doesn't even know its bhs are put and re-assigned.
>>
>> If buffer head is managed by caller, ocfs2_read_blocks should not
>> evaluate it to NULL. It will cause caller accessing illegal memory,
>> thus crash.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>  fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>  1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>> index d9ebe11..17329b6 100644
>> --- a/fs/ocfs2/buffer_head_io.c
>> +++ b/fs/ocfs2/buffer_head_io.c
>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>  	int i, ignore_cache = 0;
>>  	struct buffer_head *bh;
>>  	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>> +	int new_bh = 0;
>>  
>>  	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>  
>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>  		goto bail;
>>  	}
>>  
>> +	/* Use below trick to check if all bhs are NULL or assigned.
>> +	 * Basically, we hope all bhs are consistent so that we can
>> +	 * handle exception easily.
>> +	 */
>> +	new_bh = (bhs[0] == NULL);
>> +	for (i = 1 ; i < nr ; i++) {
>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>> +			WARN(1, "Not all bhs are consistent\n");
>> +			break;
>> +		}
>> +	}
>> +
>>  	ocfs2_metadata_cache_io_lock(ci);
>>  	for (i = 0 ; i < nr ; i++) {
>>  		if (bhs[i] == NULL) {
>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>  		if (!(flags & OCFS2_BH_READAHEAD)) {
>>  			if (status) {
>>  				/* Clear the rest of the buffers on error */
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
> 
> Since we assume caller has to pass either all NULL or all non-NULL,
> here we will only put bh internal allocated. Am I missing something?
I think this branch will put bh external allocated as 'new_bh' only means
bhs[0] is internal allocated. So this branch seems inappropriate.

thanks,
Jun
> 
> Thanks,
> Joseph
> 
>>  				continue;
>>  			}
>>  			/* We know this can't have changed as we hold the
>> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>  				 * for this bh as it's not marked locally
>>  				 * uptodate. */
>>  				status = -EIO;
>> -				put_bh(bh);
>> -				bhs[i] = NULL;
>> +				if (new_bh) {
>> +					put_bh(bh);
>> +					bhs[i] = NULL;
>> +				}
>>  				continue;
>>  			}
>>  
>> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>  				clear_buffer_needs_validate(bh);
>>  				status = validate(sb, bh);
>>  				if (status) {
>> -					put_bh(bh);
>> -					bhs[i] = NULL;
>> +					if (new_bh) {
>> +						put_bh(bh);
>> +						bhs[i] = NULL;
>> +					}
>>  					continue;
>>  				}
>>  			}
>>
> .
>
Changwei Ge March 30, 2018, 2:17 a.m. UTC | #17
Hi Joseph,

On 2018/3/30 10:04, Joseph Qi wrote:
> 
> 
> On 18/3/30 09:31, Changwei Ge wrote:
>> Hi Joseph,
>>
>> On 2018/3/30 9:27, Joseph Qi wrote:
>>>
>>>
>>> On 18/3/29 10:06, Changwei Ge wrote:
>>>> ocfs2_read_blocks() is used to read several blocks from disk.
>>>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>>>> the caller's behavior. If the function fails in reading blocks from
>>>> disk, the corresponding bh will be assigned to NULL and put.
>>>>
>>>> Obviously, above process for non-NULL input bh is not appropriate.
>>>> Because the caller doesn't even know its bhs are put and re-assigned.
>>>>
>>>> If buffer head is managed by caller, ocfs2_read_blocks should not
>>>> evaluate it to NULL. It will cause caller accessing illegal memory,
>>>> thus crash.
>>>>
>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>> ---
>>>>    fs/ocfs2/buffer_head_io.c | 31 +++++++++++++++++++++++++------
>>>>    1 file changed, 25 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>>> index d9ebe11..17329b6 100644
>>>> --- a/fs/ocfs2/buffer_head_io.c
>>>> +++ b/fs/ocfs2/buffer_head_io.c
>>>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>>    	int i, ignore_cache = 0;
>>>>    	struct buffer_head *bh;
>>>>    	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>>>> +	int new_bh = 0;
>>>>    
>>>>    	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>>>    
>>>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>>    		goto bail;
>>>>    	}
>>>>    
>>>> +	/* Use below trick to check if all bhs are NULL or assigned.
>>>> +	 * Basically, we hope all bhs are consistent so that we can
>>>> +	 * handle exception easily.
>>>> +	 */
>>>> +	new_bh = (bhs[0] == NULL);
>>>> +	for (i = 1 ; i < nr ; i++) {
>>>> +		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>>>> +			WARN(1, "Not all bhs are consistent\n");
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +
>>>>    	ocfs2_metadata_cache_io_lock(ci);
>>>>    	for (i = 0 ; i < nr ; i++) {
>>>>    		if (bhs[i] == NULL) {
>>>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
>>>>    		if (!(flags & OCFS2_BH_READAHEAD)) {
>>>>    			if (status) {
>>>>    				/* Clear the rest of the buffers on error */
>>>> -				put_bh(bh);
>>>> -				bhs[i] = NULL;
>>>> +				if (new_bh) {
>>>> +					put_bh(bh);
>>>> +					bhs[i] = NULL;
>>>> +				}
>>>
>>> Since we assume caller has to pass either all NULL or all non-NULL,
>>> here we will only put bh internal allocated. Am I missing something?
>>
>> Thanks for your review.
>> Yes, we will only put bh internally allocated.
>> If bh is reserved in advance, we will not put it and re-assign it to NULL.
>>
> 
> So this branch won't have risk, right?

Sorry... I'm not sure if I understand you correctly.
This branch will be walked through when previous part of bhs[] faces a read 
failure in order to put bh allocated in ocfs2_read_blocks().
And we assume all bh should be NULL or non-NULL, if new_bh is set, the back part 
should also be put to release those buffer heads.

If I made a mistake or misunderstand you, please let me know.

Thanks,
Changwei


> 
> Thanks,
> Joseph
>
Joseph Qi March 30, 2018, 2:37 a.m. UTC | #18
On 18/3/30 10:17, Changwei Ge wrote:
>>>> Since we assume caller has to pass either all NULL or all non-NULL,
>>>> here we will only put bh internal allocated. Am I missing something?
>>> Thanks for your review.
>>> Yes, we will only put bh internally allocated.
>>> If bh is reserved in advance, we will not put it and re-assign it to NULL.
>>>
>> So this branch won't have risk, right?
> Sorry... I'm not sure if I understand you correctly.
> This branch will be walked through when previous part of bhs[] faces a read 
> failure in order to put bh allocated in ocfs2_read_blocks().
> And we assume all bh should be NULL or non-NULL, if new_bh is set, the back part 
> should also be put to release those buffer heads.
> 
> If I made a mistake or misunderstand you, please let me know.


I'm saying that sb_getblk() will only be called if bh hasn't been
allocated yet. That means if it fails, the bh to be put can be
guaranteed internal allocated.
Also I don't think the WARN check is necessary as this is common path
and will bring additional cpu consumption. We can make it clear at
comments of ocfs2_read_blocks() that either all NULL or non-NULL bhs
is prerequisite for the caller. And then we make sure we won't put bh
that is allocated outside.

Thanks,
Joseph
Changwei Ge March 31, 2018, 2:06 a.m. UTC | #19
On 2018/3/30 10:38, Joseph Qi wrote:
> 
> 
> On 18/3/30 10:17, Changwei Ge wrote:
>>>>> Since we assume caller has to pass either all NULL or all non-NULL,
>>>>> here we will only put bh internal allocated. Am I missing something?
>>>> Thanks for your review.
>>>> Yes, we will only put bh internally allocated.
>>>> If bh is reserved in advance, we will not put it and re-assign it to NULL.
>>>>
>>> So this branch won't have risk, right?
>> Sorry... I'm not sure if I understand you correctly.
>> This branch will be walked through when previous part of bhs[] faces a read
>> failure in order to put bh allocated in ocfs2_read_blocks().
>> And we assume all bh should be NULL or non-NULL, if new_bh is set, the back part
>> should also be put to release those buffer heads.
>>
>> If I made a mistake or misunderstand you, please let me know.
> 
> 
> I'm saying that sb_getblk() will only be called if bh hasn't been
> allocated yet. That means if it fails, the bh to be put can be
> guaranteed internal allocated.

If sb_getblk() fails, code will bail out rather than goes into this branch 
check. Particularly speaking, the *second* for loop will be jumped.
And I do think we need to check new_bh here to put the rest of buffer heads if 
one buffer head encounters read failure. If below code snippet is walked in, the 
mentioned code branch can work.
			
			if (!buffer_uptodate(bh)) {
				/* Status won't be cleared from here on out,
				 * so we can safely record this and loop back
				 * to cleanup the other buffers. Don't need to
				 * remove the clustered uptodate information
				 * for this bh as it's not marked locally
				 * uptodate. */
				status = -EIO;
				put_bh(bh);
				bhs[i] = NULL;
				continue;
			}

After our discussion I find another problem residing in ocfs2_read_blocks() too.
I will send a patch set to fix them all later.

> Also I don't think the WARN check is necessary as this is common path
> and will bring additional cpu consumption. We can make it clear at
> comments of ocfs2_read_blocks() that either all NULL or non-NULL bhs
> is prerequisite for the caller. And then we make sure we won't put bh
> that is allocated outside.

Agree.

Thanks,
Changwei

> 
> Thanks,
> Joseph
>
diff mbox

Patch

diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
index d9ebe11..17329b6 100644
--- a/fs/ocfs2/buffer_head_io.c
+++ b/fs/ocfs2/buffer_head_io.c
@@ -188,6 +188,7 @@  int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
 	int i, ignore_cache = 0;
 	struct buffer_head *bh;
 	struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
+	int new_bh = 0;
 
 	trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
 
@@ -213,6 +214,18 @@  int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
 		goto bail;
 	}
 
+	/* Use below trick to check if all bhs are NULL or assigned.
+	 * Basically, we hope all bhs are consistent so that we can
+	 * handle exception easily.
+	 */
+	new_bh = (bhs[0] == NULL);
+	for (i = 1 ; i < nr ; i++) {
+		if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
+			WARN(1, "Not all bhs are consistent\n");
+			break;
+		}
+	}
+
 	ocfs2_metadata_cache_io_lock(ci);
 	for (i = 0 ; i < nr ; i++) {
 		if (bhs[i] == NULL) {
@@ -324,8 +337,10 @@  int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
 		if (!(flags & OCFS2_BH_READAHEAD)) {
 			if (status) {
 				/* Clear the rest of the buffers on error */
-				put_bh(bh);
-				bhs[i] = NULL;
+				if (new_bh) {
+					put_bh(bh);
+					bhs[i] = NULL;
+				}
 				continue;
 			}
 			/* We know this can't have changed as we hold the
@@ -342,8 +357,10 @@  int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
 				 * for this bh as it's not marked locally
 				 * uptodate. */
 				status = -EIO;
-				put_bh(bh);
-				bhs[i] = NULL;
+				if (new_bh) {
+					put_bh(bh);
+					bhs[i] = NULL;
+				}
 				continue;
 			}
 
@@ -355,8 +372,10 @@  int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 block, int nr,
 				clear_buffer_needs_validate(bh);
 				status = validate(sb, bh);
 				if (status) {
-					put_bh(bh);
-					bhs[i] = NULL;
+					if (new_bh) {
+						put_bh(bh);
+						bhs[i] = NULL;
+					}
 					continue;
 				}
 			}