ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

Message ID 5ABB110A.3050808@huawei.com
State New

Commit Message

piaojun March 28, 2018, 3:50 a.m. UTC
We need to check the length returned by bio_add_page() to make sure the bio
has been set up correctly; otherwise we may submit incorrect data to the device.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
---
 fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Joseph Qi March 28, 2018, 4:58 a.m. UTC | #1
On 18/3/28 11:50, piaojun wrote:
> We need check len for bio_add_page() to make sure the bio has been set up
> correctly, otherwise we may submit incorrect data to device.
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>  fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index ea8c551..43ad79f 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>  		     current_page, vec_len, vec_start);
> 
>  		len = bio_add_page(bio, page, vec_len, vec_start);
> -		if (len != vec_len) break;
> +		if (len != vec_len) {
> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
> +			     "page %p, len %d, vec_len %u, vec_start %u, "
> +			     "bi_sector %llu\n", current_page, page, len,
> +			     vec_len, vec_start,
> +			     (unsigned long long)bio->bi_iter.bi_sector);
> +			bio_put(bio);
> +			bio = ERR_PTR(-EFAULT);

IMO, EFAULT is not an appropriate error code here.
When __bio_add_page returns 0, some of those cases are caused by a failed bio check.
I've also noticed that several other callers just use ENOMEM, so I think
EINVAL or ENOMEM may be better.

Thanks,
Joseph

> +			return bio;
> +		}
> 
>  		cs += vec_len / (PAGE_SIZE/spp);
>  		vec_start = 0;
>
piaojun March 28, 2018, 7:02 a.m. UTC | #2
Hi Joseph,

On 2018/3/28 12:58, Joseph Qi wrote:
> 
> 
> On 18/3/28 11:50, piaojun wrote:
>> We need check len for bio_add_page() to make sure the bio has been set up
>> correctly, otherwise we may submit incorrect data to device.
>>
>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> ---
>>  fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>> index ea8c551..43ad79f 100644
>> --- a/fs/ocfs2/cluster/heartbeat.c
>> +++ b/fs/ocfs2/cluster/heartbeat.c
>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>  		     current_page, vec_len, vec_start);
>>
>>  		len = bio_add_page(bio, page, vec_len, vec_start);
>> -		if (len != vec_len) break;
>> +		if (len != vec_len) {
>> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>> +			     "page %p, len %d, vec_len %u, vec_start %u, "
>> +			     "bi_sector %llu\n", current_page, page, len,
>> +			     vec_len, vec_start,
>> +			     (unsigned long long)bio->bi_iter.bi_sector);
>> +			bio_put(bio);
>> +			bio = ERR_PTR(-EFAULT);
> 
> IMO, EFAULT is not an appropriate error code here.
> If __bio_add_page returns 0, some are caused by bio checking failed.
> Also I've noticed that several other callers just use ENOMEM, so I think
> EINVAL or ENOMEM may be better.

__bio_add_page was removed in commit c66a14d07c13, and I notice that
other callers always use -EFAULT or -EIO. I'm afraid we are not working from
the same kernel source.

thanks,
Jun
> 
> Thanks,
> Joseph
> 
>> +			return bio;
>> +		}
>>
>>  		cs += vec_len / (PAGE_SIZE/spp);
>>  		vec_start = 0;
>>
> .
>
Joseph Qi March 28, 2018, 9:50 a.m. UTC | #3
On 18/3/28 15:02, piaojun wrote:
> Hi Joseph,
> 
> On 2018/3/28 12:58, Joseph Qi wrote:
>>
>>
>> On 18/3/28 11:50, piaojun wrote:
>>> We need check len for bio_add_page() to make sure the bio has been set up
>>> correctly, otherwise we may submit incorrect data to device.
>>>
>>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>>> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
>>> ---
>>>  fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>> index ea8c551..43ad79f 100644
>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>  		     current_page, vec_len, vec_start);
>>>
>>>  		len = bio_add_page(bio, page, vec_len, vec_start);
>>> -		if (len != vec_len) break;
>>> +		if (len != vec_len) {
>>> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>> +			     "page %p, len %d, vec_len %u, vec_start %u, "
>>> +			     "bi_sector %llu\n", current_page, page, len,
>>> +			     vec_len, vec_start,
>>> +			     (unsigned long long)bio->bi_iter.bi_sector);
>>> +			bio_put(bio);
>>> +			bio = ERR_PTR(-EFAULT);
>>
>> IMO, EFAULT is not an appropriate error code here.
>> If __bio_add_page returns 0, some are caused by bio checking failed.
>> Also I've noticed that several other callers just use ENOMEM, so I think
>> EINVAL or ENOMEM may be better.
> 
> __bio_add_page has been deleted in patch c66a14d07c13, and I notice that
> other callers always use -EFAULT or -EIO. I'm afraid we are not basing on
> the same kernel source.
> 

Oops... Yes, I was looking at an old kernel...
EIO sounds reasonable, but I don't see why EFAULT would fit, since it means "Bad address".

Thanks,
Joseph
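
[Editor's note] To make the error-code discussion concrete, here is a minimal user-space sketch (not kernel code) that looks up the C library's description of an errno value. The helper name describe_errno() is hypothetical and introduced only for illustration; strerror() and the EFAULT/EIO constants are standard. On Linux, EFAULT is 14 and EIO is 5.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only: return the C library's
 * human-readable description of an errno value. The caller passes the
 * positive errno constant (e.g. EFAULT), not the negated kernel-style value.
 */
static const char *describe_errno(int err)
{
    return strerror(err);
}
```

Calling describe_errno(EFAULT) returns "Bad address", which is the meaning Joseph refers to; the exact wording for EIO differs slightly between C libraries.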
Changwei Ge March 29, 2018, 1:09 a.m. UTC | #4
Hi Jun,

On 2018/3/28 17:51, Joseph Qi wrote:
> 
> 
> On 18/3/28 15:02, piaojun wrote:
>> Hi Joseph,
>>
>> On 2018/3/28 12:58, Joseph Qi wrote:
>>>
>>>
>>> On 18/3/28 11:50, piaojun wrote:
>>>> We need check len for bio_add_page() to make sure the bio has been set up
>>>> correctly, otherwise we may submit incorrect data to device.
>>>>
>>>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>>>> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
>>>> ---
>>>>   fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>>> index ea8c551..43ad79f 100644
>>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>>   		     current_page, vec_len, vec_start);
>>>>
>>>>   		len = bio_add_page(bio, page, vec_len, vec_start);
>>>> -		if (len != vec_len) break;
>>>> +		if (len != vec_len) {
>>>> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>>> +			     "page %p, len %d, vec_len %u, vec_start %u, "
>>>> +			     "bi_sector %llu\n", current_page, page, len,
>>>> +			     vec_len, vec_start,
>>>> +			     (unsigned long long)bio->bi_iter.bi_sector);
>>>> +			bio_put(bio);
>>>> +			bio = ERR_PTR(-EFAULT);
>>>
>>> IMO, EFAULT is not an appropriate error code here.
>>> If __bio_add_page returns 0, some are caused by bio checking failed.
>>> Also I've noticed that several other callers just use ENOMEM, so I think
>>> EINVAL or ENOMEM may be better.
>>
>> __bio_add_page has been deleted in patch c66a14d07c13, and I notice that
>> other callers always use -EFAULT or -EIO. I'm afraid we are not basing on
>> the same kernel source.
>>
> 
> Oops... Yes, I was looking an old kernel...
> EIO sounds reasonable, but I don't know why EFAULT since it means "Bad address".

I agree with Joseph that EFAULT seems unreasonable for the failure caught here.
But your approach looks good to me.
After switching to a more appropriate error number, please feel free to add my:
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>

Thanks,
Changwei


> 
> Thanks,
> Joseph
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
piaojun March 29, 2018, 1:12 a.m. UTC | #5
Hi Changwei and Joseph,

EIO sounds more reasonable, thanks a lot for your suggestions, and I will
send patch v2 later.

thanks,
Jun

On 2018/3/29 9:09, Changwei Ge wrote:
> Hi Jun,
> 
> On 2018/3/28 17:51, Joseph Qi wrote:
>>
>>
>> On 18/3/28 15:02, piaojun wrote:
>>> Hi Joseph,
>>>
>>> On 2018/3/28 12:58, Joseph Qi wrote:
>>>>
>>>>
>>>> On 18/3/28 11:50, piaojun wrote:
>>>>> We need check len for bio_add_page() to make sure the bio has been set up
>>>>> correctly, otherwise we may submit incorrect data to device.
>>>>>
>>>>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>>>>> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
>>>>> ---
>>>>>   fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
>>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>>>> index ea8c551..43ad79f 100644
>>>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>>>   		     current_page, vec_len, vec_start);
>>>>>
>>>>>   		len = bio_add_page(bio, page, vec_len, vec_start);
>>>>> -		if (len != vec_len) break;
>>>>> +		if (len != vec_len) {
>>>>> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>>>> +			     "page %p, len %d, vec_len %u, vec_start %u, "
>>>>> +			     "bi_sector %llu\n", current_page, page, len,
>>>>> +			     vec_len, vec_start,
>>>>> +			     (unsigned long long)bio->bi_iter.bi_sector);
>>>>> +			bio_put(bio);
>>>>> +			bio = ERR_PTR(-EFAULT);
>>>>
>>>> IMO, EFAULT is not an appropriate error code here.
>>>> If __bio_add_page returns 0, some are caused by bio checking failed.
>>>> Also I've noticed that several other callers just use ENOMEM, so I think
>>>> EINVAL or ENOMEM may be better.
>>>
>>> __bio_add_page has been deleted in patch c66a14d07c13, and I notice that
>>> other callers always use -EFAULT or -EIO. I'm afraid we are not basing on
>>> the same kernel source.
>>>
>>
>> Oops... Yes, I was looking an old kernel...
>> EIO sounds reasonable, but I don't know why EFAULT since it means "Bad address".
> 
> I agree with Joseph that EFAULT seems unreasonable for this exception cached.
> But your trick looks good to me.
> After applying a more appropriate error number, please feel free to add my:
> Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
> 
> Thanks,
> Changwei
> 
> 
>>
>> Thanks,
>> Joseph
>>
>>
> .
>
Changwei Ge April 11, 2018, 2:14 a.m. UTC | #6
Hi Jun,

Thanks for your patch.

I just applied your patch to my tree and ran ocfs2-test.
Unfortunately, the very first case fails while making the file system, since
the bio can't accommodate more than 16 vecs.

Of course, this is not introduced by your patch. Your patch just makes this
hidden issue visible.

I just want to point out that if this patch is applied, the cluster scale
can't exceed 16 nodes.

I will try to post a patch to fix it.

Log attached:

Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329330] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329331] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329332] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329333] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329334] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329335] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329336] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329337] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329338] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329339] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329339] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329340] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329341] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329342] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329343] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329344] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329345] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329346] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0, bi_sector 8192
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329357] (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329361] (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329364] (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5
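
[Editor's note] The failure pattern in the log above (pages 0 to 15 accepted, page[16] rejected with len 0) can be modelled in user space. This is a toy sketch under stated assumptions, not the kernel's bio implementation: struct toy_bio, toy_bio_add_page() and toy_fill() are hypothetical names, and the only behaviour borrowed from the thread is that a full container adds nothing and returns 0, so the caller must compare the returned length with the requested one.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical stand-in for a bio with a fixed number of vector slots. */
struct toy_bio {
    unsigned int max_vecs; /* capacity fixed when the container is created */
    unsigned int vcnt;     /* slots used so far */
    unsigned int size;     /* bytes queued so far */
};

/*
 * Toy model of the return convention discussed in the thread:
 * returns the number of bytes added, or 0 when no slot is left.
 */
static unsigned int toy_bio_add_page(struct toy_bio *bio, unsigned int len)
{
    if (bio->vcnt >= bio->max_vecs)
        return 0;
    bio->vcnt++;
    bio->size += len;
    return len;
}

/*
 * Try to queue nr_pages full pages. Returns -1 if every page fit,
 * otherwise the index of the first page that could not be added.
 */
static int toy_fill(struct toy_bio *bio, unsigned int nr_pages)
{
    unsigned int i;

    for (i = 0; i < nr_pages; i++) {
        /* The length check: a short add means the container is incomplete. */
        if (toy_bio_add_page(bio, TOY_PAGE_SIZE) != TOY_PAGE_SIZE)
            return (int)i;
    }
    return -1;
}
```

With a capacity of 16 slots, asking toy_fill() for 17 pages reports index 16 as the first failure, mirroring the "Adding page[16] to bio failed ... len 0" line. Without the length check, the caller would carry on with only 16 pages queued and never notice.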


On 2018/3/28 11:52, piaojun wrote:
> We need check len for bio_add_page() to make sure the bio has been set up
> correctly, otherwise we may submit incorrect data to device.
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>   fs/ocfs2/cluster/heartbeat.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index ea8c551..43ad79f 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>   		     current_page, vec_len, vec_start);
> 
>   		len = bio_add_page(bio, page, vec_len, vec_start);
> -		if (len != vec_len) break;
> +		if (len != vec_len) {
> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
> +			     "page %p, len %d, vec_len %u, vec_start %u, "
> +			     "bi_sector %llu\n", current_page, page, len,
> +			     vec_len, vec_start,
> +			     (unsigned long long)bio->bi_iter.bi_sector);
> +			bio_put(bio);
> +			bio = ERR_PTR(-EFAULT);
> +			return bio;
> +		}
> 
>   		cs += vec_len / (PAGE_SIZE/spp);
>   		vec_start = 0;
>

Patch

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index ea8c551..43ad79f 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -570,7 +570,16 @@  static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
 		     current_page, vec_len, vec_start);

 		len = bio_add_page(bio, page, vec_len, vec_start);
-		if (len != vec_len) break;
+		if (len != vec_len) {
+			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
+			     "page %p, len %d, vec_len %u, vec_start %u, "
+			     "bi_sector %llu\n", current_page, page, len,
+			     vec_len, vec_start,
+			     (unsigned long long)bio->bi_iter.bi_sector);
+			bio_put(bio);
+			bio = ERR_PTR(-EFAULT);
+			return bio;
+		}

 		cs += vec_len / (PAGE_SIZE/spp);
 		vec_start = 0;
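
[Editor's note] The hunk above releases the partially built bio and returns an error encoded in the pointer, and the reviewers agreed that -EIO fits better than -EFAULT for a v2. The following user-space sketch illustrates that return convention only. It is not the v2 patch: toy_err_ptr(), toy_ptr_err(), toy_is_err(), struct toy_buf and toy_setup() are hypothetical simplifications of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers, written here so the example compiles outside the kernel.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (simplified user-space analogue). */
static void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the negative errno value from an error pointer. */
static long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer value falls in the top TOY_MAX_ERRNO addresses. */
static int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical container with a fixed capacity. */
struct toy_buf {
    unsigned int used;
    unsigned int cap;
};

/*
 * Build a container holding `want` items. If an item cannot be added,
 * free the partial object and return an encoded -EIO instead of handing
 * back an incomplete container.
 */
static struct toy_buf *toy_setup(unsigned int cap, unsigned int want)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (!b)
        return toy_err_ptr(-ENOMEM);
    b->used = 0;
    b->cap = cap;

    while (b->used < want) {
        if (b->used >= b->cap) {
            free(b);                  /* analogous to dropping the bio */
            return toy_err_ptr(-EIO); /* error code the thread settled on */
        }
        b->used++;
    }
    return b;
}
```

A caller checks the result with toy_is_err() before using it and extracts the code with toy_ptr_err(), so a short add surfaces as an explicit error instead of a silently truncated request.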