diff mbox series

ceph: redirty the folio/page when offset and size are not aligned

Message ID 20220505105808.35214-1-xiubli@redhat.com (mailing list archive)
State New, archived
Headers show
Series ceph: redirty the folio/page when offset and size are not aligned | expand

Commit Message

Xiubo Li May 5, 2022, 10:58 a.m. UTC
At the same time fix another buggy code because in writepages_finish
if the opcode doesn't equal to CEPH_OSD_OP_WRITE the request memory
must have been corrupted.

URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jeff Layton May 5, 2022, 11:32 a.m. UTC | #1
On Thu, 2022-05-05 at 18:58 +0800, Xiubo Li wrote:
> At the same time fix another buggy code because in writepages_finish
> if the opcode doesn't equal to CEPH_OSD_OP_WRITE the request memory
> must have been corrupted.
> 
> URL: https://tracker.ceph.com/issues/55421
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index e52b62407b10..ae224135440b 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -146,6 +146,8 @@ static void ceph_invalidate_folio(struct folio *folio, size_t offset,
>  	if (offset != 0 || length != folio_size(folio)) {
>  		dout("%p invalidate_folio idx %lu partial dirty page %zu~%zu\n",
>  		     inode, folio->index, offset, length);
> +		filemap_dirty_folio(folio->mapping, folio);
> +		folio_account_redirty(folio);
>  		return;
>  	}
>  

This looks wrong to me. 

How do you know the page was dirty in the first place? The caller should
not have cleaned a dirty page without writing it back first so it should
still be dirty if it hasn't. I don't see that we need to do anything
like this.

> @@ -733,8 +735,7 @@ static void writepages_finish(struct ceph_osd_request *req)
>  
>  	/* clean all pages */
>  	for (i = 0; i < req->r_num_ops; i++) {
> -		if (req->r_ops[i].op != CEPH_OSD_OP_WRITE)
> -			break;
> +		BUG_ON(req->r_ops[i].op != CEPH_OSD_OP_WRITE);

I'd prefer we not BUG here. This does mean the data in the incoming
frame was probably scrambled, but I don't see that as a good reason to
crash the box.

Throwing a warning message would be fine here, but a WARN_ON is probably
not terribly helpful. Maybe add a pr_warn that dumps some info about the
request in this situation (index, tid, flags, rval, etc...) ?


>  
>  		osd_data = osd_req_op_extent_osd_data(req, i);
>  		BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);
Xiubo Li May 5, 2022, 11:58 a.m. UTC | #2
On 5/5/22 7:32 PM, Jeff Layton wrote:
> On Thu, 2022-05-05 at 18:58 +0800, Xiubo Li wrote:
>> At the same time fix another buggy code because in writepages_finish
>> if the opcode doesn't equal to CEPH_OSD_OP_WRITE the request memory
>> must have been corrupted.
>>
>> URL: https://tracker.ceph.com/issues/55421
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/addr.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index e52b62407b10..ae224135440b 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -146,6 +146,8 @@ static void ceph_invalidate_folio(struct folio *folio, size_t offset,
>>   	if (offset != 0 || length != folio_size(folio)) {
>>   		dout("%p invalidate_folio idx %lu partial dirty page %zu~%zu\n",
>>   		     inode, folio->index, offset, length);
>> +		filemap_dirty_folio(folio->mapping, folio);
>> +		folio_account_redirty(folio);
>>   		return;
>>   	}
>>   
> This looks wrong to me.
>
> How do you know the page was dirty in the first place? The caller should
> not have cleaned a dirty page without writing it back first so it should
> still be dirty if it hasn't. I don't see that we need to do anything
> like this.

Correct, check the history of the related commits and the vfs code, it's 
possible the page is none-dirty.

 From ceph_writepages_start() and writepage_nounlock() they will clear 
the dirty bit, and the same time the offset and size will be aligned 
too, so from ceph callers it should be okay.

The other case is from  mm/truncate.c, they don't care about the dirty bit.

>> @@ -733,8 +735,7 @@ static void writepages_finish(struct ceph_osd_request *req)
>>   
>>   	/* clean all pages */
>>   	for (i = 0; i < req->r_num_ops; i++) {
>> -		if (req->r_ops[i].op != CEPH_OSD_OP_WRITE)
>> -			break;
>> +		BUG_ON(req->r_ops[i].op != CEPH_OSD_OP_WRITE);
> I'd prefer we not BUG here. This does mean the data in the incoming
> frame was probably scrambled, but I don't see that as a good reason to
> crash the box.
>
> Throwing a warning message would be fine here, but a WARN_ON is probably
> not terribly helpful. Maybe add a pr_warn that dumps some info about the
> request in this situation (index, tid, flags, rval, etc...) ?
>
Looks good. Will fix it.

-- Xiubo

>>   
>>   		osd_data = osd_req_op_extent_osd_data(req, i);
>>   		BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);
diff mbox series

Patch

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e52b62407b10..ae224135440b 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -146,6 +146,8 @@  static void ceph_invalidate_folio(struct folio *folio, size_t offset,
 	if (offset != 0 || length != folio_size(folio)) {
 		dout("%p invalidate_folio idx %lu partial dirty page %zu~%zu\n",
 		     inode, folio->index, offset, length);
+		filemap_dirty_folio(folio->mapping, folio);
+		folio_account_redirty(folio);
 		return;
 	}
 
@@ -733,8 +735,7 @@  static void writepages_finish(struct ceph_osd_request *req)
 
 	/* clean all pages */
 	for (i = 0; i < req->r_num_ops; i++) {
-		if (req->r_ops[i].op != CEPH_OSD_OP_WRITE)
-			break;
+		BUG_ON(req->r_ops[i].op != CEPH_OSD_OP_WRITE);
 
 		osd_data = osd_req_op_extent_osd_data(req, i);
 		BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_PAGES);