diff mbox series

[RFC] ceph: fix out-of-bound array access when doing a file read

Message ID 20240822150113.14274-1-luis.henriques@linux.dev (mailing list archive)
State New, archived
Headers show
Series [RFC] ceph: fix out-of-bound array access when doing a file read | expand

Commit Message

Luis Henriques Aug. 22, 2024, 3:01 p.m. UTC
If, while doing a read, the inode is updated and the size is set to zero,
__ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
NULL pointer dereferrence by continuously reading a file while, on another
client, we keep truncating and writing new data into it.

This patch fixes the issue by adding extra checks to avoid integer overflows
for the case of a zero size inode.  This will prevent the loop doing page
copies from running and thus accessing the pages[] array beyond num_pages.

Link: https://tracker.ceph.com/issues/67524
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
---
Hi!

Please note that this patch is only lightly tested and, to be honest, I'm
not sure if this is the correct way to fix this bug.  For example, if the
inode size is 0, then maybe ceph_osdc_wait_request() should have returned
0 and the problem would be solved.  However, it seems to be returning the
size of the reply message and that's not something easy to change.  Or maybe
I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
other ideas.

Also, the tracker contains a simple testcase for crashing the client.

 fs/ceph/file.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Xiubo Li Aug. 23, 2024, 1:48 a.m. UTC | #1
On 8/22/24 23:01, Luis Henriques (SUSE) wrote:
> If, while doing a read, the inode is updated and the size is set to zero,
> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
> NULL pointer dereferrence by continuously reading a file while, on another
> client, we keep truncating and writing new data into it.
>
> This patch fixes the issue by adding extra checks to avoid integer overflows
> for the case of a zero size inode.  This will prevent the loop doing page
> copies from running and thus accessing the pages[] array beyond num_pages.
>
> Link: https://tracker.ceph.com/issues/67524
> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
> ---
> Hi!
>
> Please note that this patch is only lightly tested and, to be honest, I'm
> not sure if this is the correct way to fix this bug.  For example, if the
> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
> 0 and the problem would be solved.  However, it seems to be returning the
> size of the reply message and that's not something easy to change.  Or maybe
> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
> other ideas.
>
> Also, the tracker contains a simple testcase for crashing the client.
>
>   fs/ceph/file.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 4b8d59ebda00..dc23d5e5b11e 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1200,9 +1200,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   		}
>   
>   		idx = 0;
> -		if (ret <= 0)
> +		if ((ret <= 0) || (i_size == 0))

Hi Luis,

This change looks incorrect to me.

As I mentioned before when the 'IFILE' lock is in MIX state the 'Frw' 
caps could be issued to multiple clients at the same time. Which means 
the file could be updated by another client and the local 'i_size' may 
haven't been changed in time. So in this case the 'ret' will be larger 
than '0' and the 'i_size' could be '0'.


>   			left = 0;
> -		else if (off + ret > i_size)
> +		else if ((i_size >= off) && (off + ret > i_size))

And the 'off' also could equal to little than the 'i_size'.

BTW, could you reproduce the crash issue ?

Thanks

- Xiubo

>   			left = i_size - off;
>   		else
>   			left = ret;
> @@ -1210,6 +1210,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   			size_t plen, copied;
>   
>   			plen = min_t(size_t, left, PAGE_SIZE - page_off);
> +			WARN_ON_ONCE(idx >= num_pages);
>   			SetPageUptodate(pages[idx]);
>   			copied = copy_page_to_iter(pages[idx++],
>   						   page_off, plen, to);
> @@ -1234,7 +1235,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   	}
>   
>   	if (ret > 0) {
> -		if (off >= i_size) {
> +		if ((i_size >= *ki_pos) && (off >= i_size)) {
>   			*retry_op = CHECK_EOF;
>   			ret = i_size - *ki_pos;
>   			*ki_pos = i_size;
>
Luis Henriques Aug. 23, 2024, 7:25 a.m. UTC | #2
On Fri, Aug 23 2024, Xiubo Li wrote:

> On 8/22/24 23:01, Luis Henriques (SUSE) wrote:
>> If, while doing a read, the inode is updated and the size is set to zero,
>> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
>> NULL pointer dereferrence by continuously reading a file while, on another
>> client, we keep truncating and writing new data into it.
>>
>> This patch fixes the issue by adding extra checks to avoid integer overflows
>> for the case of a zero size inode.  This will prevent the loop doing page
>> copies from running and thus accessing the pages[] array beyond num_pages.
>>
>> Link: https://tracker.ceph.com/issues/67524
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>> ---
>> Hi!
>>
>> Please note that this patch is only lightly tested and, to be honest, I'm
>> not sure if this is the correct way to fix this bug.  For example, if the
>> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
>> 0 and the problem would be solved.  However, it seems to be returning the
>> size of the reply message and that's not something easy to change.  Or maybe
>> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
>> other ideas.
>>
>> Also, the tracker contains a simple testcase for crashing the client.
>>
>>   fs/ceph/file.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 4b8d59ebda00..dc23d5e5b11e 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1200,9 +1200,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   		}
>>     		idx = 0;
>> -		if (ret <= 0)
>> +		if ((ret <= 0) || (i_size == 0))
>
> Hi Luis,
>
> This change looks incorrect to me.
>
> As I mentioned before when the 'IFILE' lock is in MIX state the 'Frw' caps could
> be issued to multiple clients at the same time. Which means the file could be
> updated by another client and the local 'i_size' may haven't been changed in
> time. So in this case the 'ret' will be larger than '0' and the 'i_size' could
> be '0'.
>
>
>>   			left = 0;
>> -		else if (off + ret > i_size)
>> +		else if ((i_size >= off) && (off + ret > i_size))
>
> And the 'off' also could equal to little than the 'i_size'.
>
> BTW, could you reproduce the crash issue ?

Yes, 100% reproducible :-)

See https://tracker.ceph.com/issues/67524

Cheers,
Luis Henriques Aug. 23, 2024, 10:33 a.m. UTC | #3
On Fri, Aug 23 2024, Xiubo Li wrote:

> On 8/22/24 23:01, Luis Henriques (SUSE) wrote:
>> If, while doing a read, the inode is updated and the size is set to zero,
>> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
>> NULL pointer dereferrence by continuously reading a file while, on another
>> client, we keep truncating and writing new data into it.
>>
>> This patch fixes the issue by adding extra checks to avoid integer overflows
>> for the case of a zero size inode.  This will prevent the loop doing page
>> copies from running and thus accessing the pages[] array beyond num_pages.
>>
>> Link: https://tracker.ceph.com/issues/67524
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>> ---
>> Hi!
>>
>> Please note that this patch is only lightly tested and, to be honest, I'm
>> not sure if this is the correct way to fix this bug.  For example, if the
>> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
>> 0 and the problem would be solved.  However, it seems to be returning the
>> size of the reply message and that's not something easy to change.  Or maybe
>> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
>> other ideas.
>>
>> Also, the tracker contains a simple testcase for crashing the client.
>>
>>   fs/ceph/file.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 4b8d59ebda00..dc23d5e5b11e 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1200,9 +1200,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   		}
>>     		idx = 0;
>> -		if (ret <= 0)
>> +		if ((ret <= 0) || (i_size == 0))
>
> Hi Luis,
>
> This change looks incorrect to me.
>
> As I mentioned before when the 'IFILE' lock is in MIX state the 'Frw' caps could
> be issued to multiple clients at the same time. Which means the file could be
> updated by another client and the local 'i_size' may haven't been changed in
> time. So in this case the 'ret' will be larger than '0' and the 'i_size' could
> be '0'.
>
>
>>   			left = 0;
>> -		else if (off + ret > i_size)
>> +		else if ((i_size >= off) && (off + ret > i_size))
>
> And the 'off' also could equal to little than the 'i_size'.

(I forgot to comment here.)

This change is _exactly_ what will prevent the NULL pointer from
occurring, because if 'i_size' is 0, then:

		left = i_size - off;

will leave 'left' with a huge value.  And the loop 'while (left > 0) {}'
will execute until the access to 'pages[idx]' crashes.

Cheers,
Xiubo Li Aug. 25, 2024, 11:53 p.m. UTC | #4
On 8/23/24 15:25, Luis Henriques wrote:
> On Fri, Aug 23 2024, Xiubo Li wrote:
>
>> On 8/22/24 23:01, Luis Henriques (SUSE) wrote:
>>> If, while doing a read, the inode is updated and the size is set to zero,
>>> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
>>> NULL pointer dereferrence by continuously reading a file while, on another
>>> client, we keep truncating and writing new data into it.
>>>
>>> This patch fixes the issue by adding extra checks to avoid integer overflows
>>> for the case of a zero size inode.  This will prevent the loop doing page
>>> copies from running and thus accessing the pages[] array beyond num_pages.
>>>
>>> Link: https://tracker.ceph.com/issues/67524
>>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>>> ---
>>> Hi!
>>>
>>> Please note that this patch is only lightly tested and, to be honest, I'm
>>> not sure if this is the correct way to fix this bug.  For example, if the
>>> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
>>> 0 and the problem would be solved.  However, it seems to be returning the
>>> size of the reply message and that's not something easy to change.  Or maybe
>>> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
>>> other ideas.
>>>
>>> Also, the tracker contains a simple testcase for crashing the client.
>>>
>>>    fs/ceph/file.c | 7 ++++---
>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index 4b8d59ebda00..dc23d5e5b11e 100644
>>> --- a/fs/ceph/file.c
>>> +++ b/fs/ceph/file.c
>>> @@ -1200,9 +1200,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>>    		}
>>>      		idx = 0;
>>> -		if (ret <= 0)
>>> +		if ((ret <= 0) || (i_size == 0))
>> Hi Luis,
>>
>> This change looks incorrect to me.
>>
>> As I mentioned before when the 'IFILE' lock is in MIX state the 'Frw' caps could
>> be issued to multiple clients at the same time. Which means the file could be
>> updated by another client and the local 'i_size' may haven't been changed in
>> time. So in this case the 'ret' will be larger than '0' and the 'i_size' could
>> be '0'.
>>
>>
>>>    			left = 0;
>>> -		else if (off + ret > i_size)
>>> +		else if ((i_size >= off) && (off + ret > i_size))
>> And the 'off' also could equal to little than the 'i_size'.
>>
>> BTW, could you reproduce the crash issue ?
> Yes, 100% reproducible :-)
>
> See https://tracker.ceph.com/issues/67524

Okay, Let me have a try about this.

Thanks


>
> Cheers,
Luis Henriques Aug. 27, 2024, 1:36 p.m. UTC | #5
On Thu, Aug 22 2024, Luis Henriques (SUSE) wrote:

> If, while doing a read, the inode is updated and the size is set to zero,
> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
> NULL pointer dereferrence by continuously reading a file while, on another
> client, we keep truncating and writing new data into it.
>
> This patch fixes the issue by adding extra checks to avoid integer overflows
> for the case of a zero size inode.  This will prevent the loop doing page
> copies from running and thus accessing the pages[] array beyond num_pages.
>
> Link: https://tracker.ceph.com/issues/67524
> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
> ---
> Hi!
>
> Please note that this patch is only lightly tested and, to be honest, I'm
> not sure if this is the correct way to fix this bug.  For example, if the
> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
> 0 and the problem would be solved.  However, it seems to be returning the
> size of the reply message and that's not something easy to change.  Or maybe
> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
> other ideas.
>
> Also, the tracker contains a simple testcase for crashing the client.

Just for the record, I've done a quick bisect as this bug is easily
reproducible.  The issue was introduced in v6.9-rc1, with commit
1065da21e5df ("ceph: stop copying to iter at EOF on sync reads").
Reverting it makes the crash go away.

Cheers,
Xiubo Li Aug. 28, 2024, 5:47 a.m. UTC | #6
On 8/27/24 21:36, Luis Henriques wrote:
> On Thu, Aug 22 2024, Luis Henriques (SUSE) wrote:
>
>> If, while doing a read, the inode is updated and the size is set to zero,
>> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
>> NULL pointer dereferrence by continuously reading a file while, on another
>> client, we keep truncating and writing new data into it.
>>
>> This patch fixes the issue by adding extra checks to avoid integer overflows
>> for the case of a zero size inode.  This will prevent the loop doing page
>> copies from running and thus accessing the pages[] array beyond num_pages.
>>
>> Link: https://tracker.ceph.com/issues/67524
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>> ---
>> Hi!
>>
>> Please note that this patch is only lightly tested and, to be honest, I'm
>> not sure if this is the correct way to fix this bug.  For example, if the
>> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
>> 0 and the problem would be solved.  However, it seems to be returning the
>> size of the reply message and that's not something easy to change.  Or maybe
>> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
>> other ideas.
>>
>> Also, the tracker contains a simple testcase for crashing the client.
> Just for the record, I've done a quick bisect as this bug is easily
> reproducible.  The issue was introduced in v6.9-rc1, with commit
> 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads").
> Reverting it makes the crash go away.

Thanks very much Luis.

So let's try to find the root cause of it and then improve the patch.

Thanks

- Xiubo


> Cheers,
Luis Henriques Aug. 28, 2024, 3:48 p.m. UTC | #7
On Wed, Aug 28 2024, Xiubo Li wrote:

> On 8/27/24 21:36, Luis Henriques wrote:
>> On Thu, Aug 22 2024, Luis Henriques (SUSE) wrote:
>>
>>> If, while doing a read, the inode is updated and the size is set to zero,
>>> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
>>> NULL pointer dereferrence by continuously reading a file while, on another
>>> client, we keep truncating and writing new data into it.
>>>
>>> This patch fixes the issue by adding extra checks to avoid integer overflows
>>> for the case of a zero size inode.  This will prevent the loop doing page
>>> copies from running and thus accessing the pages[] array beyond num_pages.
>>>
>>> Link: https://tracker.ceph.com/issues/67524
>>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>>> ---
>>> Hi!
>>>
>>> Please note that this patch is only lightly tested and, to be honest, I'm
>>> not sure if this is the correct way to fix this bug.  For example, if the
>>> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
>>> 0 and the problem would be solved.  However, it seems to be returning the
>>> size of the reply message and that's not something easy to change.  Or maybe
>>> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
>>> other ideas.
>>>
>>> Also, the tracker contains a simple testcase for crashing the client.
>> Just for the record, I've done a quick bisect as this bug is easily
>> reproducible.  The issue was introduced in v6.9-rc1, with commit
>> 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads").
>> Reverting it makes the crash go away.
>
> Thanks very much Luis.
>
> So let's try to find the root cause of it and then improve the patch.

What's happening is that we have an inode with size 0, but we are not
checking it's size.  The bug is easy to trigger (at least in my test
environment), and the conditions for it are:

 1) the inode size has to be 0, and
 2) the read has to return data ('ret = ceph_osdc_wait_request()').

This will lead to 'left' being set to huge values due to the overflow in:

	left = i_size - off;

However, some times (maybe most of the time) __ceph_sync_read() will not
crash and will return -EFAULT instead.  In the 'while (left > 0) { ... }'
loop, the condition '(copied < plen)' will be true and this error is
returned in the first iteration of the loop.

So, here's a much simpler approach to fix this issue: to bailout if we
have a 0-sized inode.  What do you think?

Cheers,
Xiubo Li Sept. 6, 2024, 11:08 a.m. UTC | #8
Hi Luis,

Sorry for late reply.

On 8/28/24 23:48, Luis Henriques wrote:
> On Wed, Aug 28 2024, Xiubo Li wrote:
>
>> On 8/27/24 21:36, Luis Henriques wrote:
>>> On Thu, Aug 22 2024, Luis Henriques (SUSE) wrote:
>>>
>>>> If, while doing a read, the inode is updated and the size is set to zero,
>>>> __ceph_sync_read() may not be able to handle it.  It is thus easy to hit a
>>>> NULL pointer dereferrence by continuously reading a file while, on another
>>>> client, we keep truncating and writing new data into it.
>>>>
>>>> This patch fixes the issue by adding extra checks to avoid integer overflows
>>>> for the case of a zero size inode.  This will prevent the loop doing page
>>>> copies from running and thus accessing the pages[] array beyond num_pages.
>>>>
>>>> Link: https://tracker.ceph.com/issues/67524
>>>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>>>> ---
>>>> Hi!
>>>>
>>>> Please note that this patch is only lightly tested and, to be honest, I'm
>>>> not sure if this is the correct way to fix this bug.  For example, if the
>>>> inode size is 0, then maybe ceph_osdc_wait_request() should have returned
>>>> 0 and the problem would be solved.  However, it seems to be returning the
>>>> size of the reply message and that's not something easy to change.  Or maybe
>>>> I'm just reading it wrong.  Anyway, this is just an RFC to see if there's
>>>> other ideas.
>>>>
>>>> Also, the tracker contains a simple testcase for crashing the client.
>>> Just for the record, I've done a quick bisect as this bug is easily
>>> reproducible.  The issue was introduced in v6.9-rc1, with commit
>>> 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads").
>>> Reverting it makes the crash go away.
>> Thanks very much Luis.
>>
>> So let's try to find the root cause of it and then improve the patch.
> What's happening is that we have an inode with size 0, but we are not
> checking it's size.  The bug is easy to trigger (at least in my test
> environment), and the conditions for it are:
>
>   1) the inode size has to be 0, and
>   2) the read has to return data ('ret = ceph_osdc_wait_request()').
>
> This will lead to 'left' being set to huge values due to the overflow in:
>
> 	left = i_size - off;
>
> However, some times (maybe most of the time) __ceph_sync_read() will not
> crash and will return -EFAULT instead.  In the 'while (left > 0) { ... }'
> loop, the condition '(copied < plen)' will be true and this error is
> returned in the first iteration of the loop.
>
> So, here's a much simpler approach to fix this issue: to bailout if we
> have a 0-sized inode.  What do you think?

I saw your V2 let's discuss there.

Thanks

- Xiubo


>
> Cheers,
diff mbox series

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..dc23d5e5b11e 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1200,9 +1200,9 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		}
 
 		idx = 0;
-		if (ret <= 0)
+		if ((ret <= 0) || (i_size == 0))
 			left = 0;
-		else if (off + ret > i_size)
+		else if ((i_size >= off) && (off + ret > i_size))
 			left = i_size - off;
 		else
 			left = ret;
@@ -1210,6 +1210,7 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			size_t plen, copied;
 
 			plen = min_t(size_t, left, PAGE_SIZE - page_off);
+			WARN_ON_ONCE(idx >= num_pages);
 			SetPageUptodate(pages[idx]);
 			copied = copy_page_to_iter(pages[idx++],
 						   page_off, plen, to);
@@ -1234,7 +1235,7 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 	}
 
 	if (ret > 0) {
-		if (off >= i_size) {
+		if ((i_size >= *ki_pos) && (off >= i_size)) {
 			*retry_op = CHECK_EOF;
 			ret = i_size - *ki_pos;
 			*ki_pos = i_size;