[linux-cifs-client] cifs: Fix insufficient memory allocation for nativeFileSystem field

Message ID 49D9A9D6.10003@suse.de
State New, archived

Commit Message

Suresh Jayaraman April 6, 2009, 7:05 a.m. UTC
The upstream commit b363b3304bcf68c4541683b2eff70b29f0446a5b attempted
to fix a memory overwrite during tree connect response processing while
mounting. However, the memory allocated may still be insufficient, as a
UTF-8 string can be up to 4X as big as the UCS one. So, would it be
safe to allocate memory that is 4X instead of 2X?

Noticed by Marcus Meissner <meissner@suse.de>.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/cifs/connect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Jeff Layton April 6, 2009, 11:53 a.m. UTC | #1
On Mon, 06 Apr 2009 12:35:58 +0530
Suresh Jayaraman <sjayaraman@suse.de> wrote:

> The upstream commit b363b3304bcf68c4541683b2eff70b29f0446a5b attempted
> to fix a memory overwrite during tree connect response processing while
> mounting. However, the memory allocated may still be insufficient, as a
> UTF-8 string can be up to 4X as big as the UCS one. So, would it be
> safe to allocate memory that is 4X instead of 2X?
> 
> Noticed by Marcus Meissner <meissner@suse.de>.
> 
> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
> ---
>  fs/cifs/connect.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 0de3b56..b361be0 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
>  			    BCC(smb_buffer_response)) {
>  				kfree(tcon->nativeFileSystem);
>  				tcon->nativeFileSystem =
> -				    kzalloc(2*(length + 1), GFP_KERNEL);
> +				    kzalloc((4 * length) + 2, GFP_KERNEL);
>  				if (tcon->nativeFileSystem)
>  					cifs_strfromUCS_le(
>  						tcon->nativeFileSystem,

Wait...is this even enough? It looks like nls.h defines this:

/* this value hold the maximum octet of charset */
#define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */

...it really looks like this needs to use the same constant.

There are other places in this code that make this sort of allocation.
Could you audit and fix them too? A better solution is really needed
here.

A helper function that basically does the allocation and buffer-length
limited conversion would be ideal. We have some functions that sort of
do this, but none of them seem to be quite right. Maybe the best thing
is just to fix cifs_strncpy_to_host() so that it's right and fix most
of the places that do this allocation manually to do it using that
function instead.
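The helper Jeff describes would pair a worst-case size calculation with the conversion. A minimal user-space sketch of just the sizing half, using the NLS_MAX_CHARSET_SIZE bound quoted above (the function name is hypothetical, not an existing kernel symbol):

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the value in include/linux/nls.h */
#define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */

/* Worst-case buffer size for converting "len" UCS-2 characters into a
 * NUL-terminated string in an arbitrary NLS charset: each input
 * character may expand to NLS_MAX_CHARSET_SIZE output bytes. */
static size_t ucs2_conv_alloc_bound(size_t len)
{
        return len * NLS_MAX_CHARSET_SIZE + 1;
}
```

With something like this in place, call sites such as the CIFSTCon() hunk in the patch would allocate ucs2_conv_alloc_bound(length) bytes instead of open-coding a multiplier.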
Suresh Jayaraman April 6, 2009, 12:50 p.m. UTC | #2
Jeff Layton wrote:
> On Mon, 06 Apr 2009 12:35:58 +0530
> Suresh Jayaraman <sjayaraman@suse.de> wrote:

>> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
>> index 0de3b56..b361be0 100644
>> --- a/fs/cifs/connect.c
>> +++ b/fs/cifs/connect.c
>> @@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
>>  			    BCC(smb_buffer_response)) {
>>  				kfree(tcon->nativeFileSystem);
>>  				tcon->nativeFileSystem =
>> -				    kzalloc(2*(length + 1), GFP_KERNEL);
>> +				    kzalloc((4 * length) + 2, GFP_KERNEL);
>>  				if (tcon->nativeFileSystem)
>>  					cifs_strfromUCS_le(
>>  						tcon->nativeFileSystem,
> 
> Wait...is this even enough? It looks like nls.h defines this:
> 
> /* this value hold the maximum octet of charset */
> #define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */
> 
> ...it really looks like this needs to use the same constant.

True. Seems I was influenced by a comment in fs/cifs/sess.c

313         /* UTF-8 string will not grow more than four times as big as
UCS-16 */

> There are other places in this code that make this sort of allocation.
> Could you audit and fix them too? A better solution is really needed
> here.

Yeah, there are other files (e.g. sess.c) as well. Looks like we
intentionally limit maxlen to 512; maybe we want to limit it in the NLS
case too? Wondering what the rationale behind this could be..

> A helper function that basically does the allocation and buffer-length
> limited conversion would be ideal. We have some functions that sort of
> do this, but none of them seem to be quite right. Maybe the best thing
> is just to fix cifs_strncpy_to_host() so that it's right and fix most
> of the places that do this allocation manually to do it using that
> function instead.
> 

I agree. This seems the right thing to me, too. I'll try to fix and
resend this patch.


Thanks,
Jeff Layton April 6, 2009, 1:22 p.m. UTC | #3
On Mon, 06 Apr 2009 18:20:21 +0530
Suresh Jayaraman <sjayaraman@suse.de> wrote:

> Jeff Layton wrote:
> > On Mon, 06 Apr 2009 12:35:58 +0530
> > Suresh Jayaraman <sjayaraman@suse.de> wrote:
> 
> >> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> >> index 0de3b56..b361be0 100644
> >> --- a/fs/cifs/connect.c
> >> +++ b/fs/cifs/connect.c
> >> @@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
> >>  			    BCC(smb_buffer_response)) {
> >>  				kfree(tcon->nativeFileSystem);
> >>  				tcon->nativeFileSystem =
> >> -				    kzalloc(2*(length + 1), GFP_KERNEL);
> >> +				    kzalloc((4 * length) + 2, GFP_KERNEL);
> >>  				if (tcon->nativeFileSystem)
> >>  					cifs_strfromUCS_le(
> >>  						tcon->nativeFileSystem,
> > 
> > Wait...is this even enough? It looks like nls.h defines this:
> > 
> > /* this value hold the maximum octet of charset */
> > #define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */
> > 
> > ...it really looks like this needs to use the same constant.
> 
> True. Seems I was influenced by a comment in fs/cifs/sess.c
> 
> 313         /* UTF-8 string will not grow more than four times as big as
> UCS-16 */
> 

That looks like that's wrong (or at least potentially so). AFAICT, UTF-8
allows up to 6 bytes per character. I suppose that it's possible that
none of the characters allowed in UCS-16 will ever translate to a
character that's more than 4 bytes, but I'd like to see that confirmed
before we depend on it.

> > There are other places in this code that make this sort of allocation.
> > Could you audit and fix them too? A better solution is really needed
> > here.
> 
> Yeah, there are other files (for e.g sess.c) as well. Looks like we
> intentionally limit maxlen to 512, may be we want to limit in NL case
> too? Wondering what could be the rationale behind this..
> 

Good Q. This is the problem with sprinkling "magic numbers" around the
code.

> > A helper function that basically does the allocation and buffer-length
> > limited conversion would be ideal. We have some functions that sort of
> > do this, but none of them seem to be quite right. Maybe the best thing
> > is just to fix cifs_strncpy_to_host() so that it's right and fix most
> > of the places that do this allocation manually to do it using that
> > function instead.
> > 
> 
> I agree. This seems the right thing to me, too. I'll try to fix and
> resend this patch.
> 
> 

Excellent. I look forward to seeing it.
simo April 6, 2009, 2:02 p.m. UTC | #4
On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
> > True. Seems I was influenced by a comment in fs/cifs/sess.c
> > 
> > 313         /* UTF-8 string will not grow more than four times as
> big as
> > UCS-16 */
> > 
> 
> That looks like that's wrong (or at least potentially so). AFAICT,
> UTF-8
> allows up to 6 bytes per character. I suppose that it's possible that
> none of the characters allowed in UCS-16 will ever translate to a
> character that's more than 4 bytes, but I'd like to see that confirmed
> before we depend on it.

I wonder what's UCS-16 tho, UCS-16 does not exist :)

It may be either UTF-16 or UCS-2.
Both these charsets have a base length of 2 bytes per character. UCS-2 is
limited to 65535 values, while UTF-16 is a multi-word charset.

If the comment above is to be read as "UTF-8 string will not grow more
than four times as big as UCS-2/UTF-16", then what it is saying is that
at maximum a UTF-8 char can be 4 words (or 8 bytes) long.
IIRC a UTF-8 char is at most 6 bytes long, so an 8-byte-per-char max
estimate seems correct.

If "length" in the code was the length in bytes, and not the number of
characters, of a UCS-2/UTF-16 string, then 4*length should indeed be
long enough.

Simo.
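The byte math above can be checked directly: every code point a UCS-2 string can hold lives in the BMP, and no BMP code point needs more than 3 UTF-8 bytes, so the bounds under discussion are safe but loose. A user-space sketch (not kernel code; names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes UTF-8 needs to encode a single Unicode code point. */
static size_t utf8_len(unsigned int cp)
{
        if (cp < 0x80)
                return 1;
        if (cp < 0x800)
                return 2;
        if (cp < 0x10000)
                return 3;
        return 4;
}

/* Worst case over every code point a UCS-2 string can carry
 * (surrogate halves excluded; they are not characters). */
static size_t bmp_worst_case(void)
{
        size_t max = 0;
        unsigned int cp;

        for (cp = 0; cp < 0x10000; cp++) {
                if (cp >= 0xD800 && cp <= 0xDFFF)
                        continue;
                if (utf8_len(cp) > max)
                        max = utf8_len(cp);
        }
        return max;
}
```

The loop reports 3, i.e. 1.5x the UCS-2 byte length (plus a NUL) already covers the worst case, so 4*length over-allocates but can never be too small.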
Jeff Layton April 6, 2009, 2:08 p.m. UTC | #5
On Mon, 06 Apr 2009 14:02:25 +0000
simo <idra@samba.org> wrote:

> On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
> > > True. Seems I was influenced by a comment in fs/cifs/sess.c
> > > 
> > > 313         /* UTF-8 string will not grow more than four times as
> > big as
> > > UCS-16 */
> > > 
> > 
> > That looks like that's wrong (or at least potentially so). AFAICT,
> > UTF-8
> > allows up to 6 bytes per character. I suppose that it's possible that
> > none of the characters allowed in UCS-16 will ever translate to a
> > character that's more than 4 bytes, but I'd like to see that confirmed
> > before we depend on it.
> 
> I wonder whats UCS-16 tho, UCS-16 does not exist :)
> 
> It may be either UTF16 or UCS2.
> Both these charsets have a base length of 2 bytes per character. UCS2 is
> limited to 65535 values, while UTF-16 is a multi-word charset.
> 
> If the comment above is to be read as "UTF-8 string will not grow more
> than four times as big as UCS-2/UTF-16" then what it is saying is that
> at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
> IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
> estimate seem correct.
> 
> If "length" in the code was the length in bytes, and not the number of
> characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
> long enough.
> 

Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
bytes. I guess though this means that 4*length is allocating too
much...wouldn't 3*length then be right (assuming NULL termination is
also accounted for) ?

Regardless of the math, I'd like to see all of this moved into some
nice, well commented helper functions instead of being open-coded all
over the place. It's just too easy to get this stuff wrong. Let's solve
this in a way that makes it easier in the future.
simo April 6, 2009, 2:30 p.m. UTC | #6
On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
> On Mon, 06 Apr 2009 14:02:25 +0000
> simo <idra@samba.org> wrote:
> 
> > On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
> > > > True. Seems I was influenced by a comment in fs/cifs/sess.c
> > > > 
> > > > 313         /* UTF-8 string will not grow more than four times as
> > > big as
> > > > UCS-16 */
> > > > 
> > > 
> > > That looks like that's wrong (or at least potentially so). AFAICT,
> > > UTF-8
> > > allows up to 6 bytes per character. I suppose that it's possible that
> > > none of the characters allowed in UCS-16 will ever translate to a
> > > character that's more than 4 bytes, but I'd like to see that confirmed
> > > before we depend on it.
> > 
> > I wonder whats UCS-16 tho, UCS-16 does not exist :)
> > 
> > It may be either UTF16 or UCS2.
> > Both these charsets have a base length of 2 bytes per character. UCS2 is
> > limited to 65535 values, while UTF-16 is a multi-word charset.
> > 
> > If the comment above is to be read as "UTF-8 string will not grow more
> > than four times as big as UCS-2/UTF-16" then what it is saying is that
> > at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
> > IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
> > estimate seem correct.
> > 
> > If "length" in the code was the length in bytes, and not the number of
> > characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
> > long enough.
> > 
> 
> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
> bytes. I guess though this means that 4*length is allocating too
> much...wouldn't 3*length then be right (assuming NULL termination is
> also accounted for) ?
> 
> Regardless of the math, I'd like to see all of this moved into some
> nice, well commented helper functions instead of being open-coded all
> over the place. It's just too easy to get this stuff wrong. Let's solve
> this in a way that makes it easier in the future.

Yes, this is the way to go.

Simo.
Steve French April 6, 2009, 4:41 p.m. UTC | #7
I don't think that we should be using these size assumptions
(multiples of UCS stringlen).    A new UCS helper function should be
created that calculates how much memory would be needed for a
converted string - and we need to use this before we do the malloc and
string conversion.  In effect a strlen and strnlen function that takes
a target code page argument.  For strings that will never be more than
a hundred bytes this may not be needed, and we can use the length
assumption, but since mallocs in the kernel can be so expensive I would
rather calculate the actual string length needed for the target.

On Mon, Apr 6, 2009 at 9:30 AM, simo <idra@samba.org> wrote:
> On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
>> On Mon, 06 Apr 2009 14:02:25 +0000
>> simo <idra@samba.org> wrote:
>>
>> > On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
>> > > > True. Seems I was influenced by a comment in fs/cifs/sess.c
>> > > >
>> > > > 313         /* UTF-8 string will not grow more than four times as
>> > > big as
>> > > > UCS-16 */
>> > > >
>> > >
>> > > That looks like that's wrong (or at least potentially so). AFAICT,
>> > > UTF-8
>> > > allows up to 6 bytes per character. I suppose that it's possible that
>> > > none of the characters allowed in UCS-16 will ever translate to a
>> > > character that's more than 4 bytes, but I'd like to see that confirmed
>> > > before we depend on it.
>> >
>> > I wonder whats UCS-16 tho, UCS-16 does not exist :)
>> >
>> > It may be either UTF16 or UCS2.
>> > Both these charsets have a base length of 2 bytes per character. UCS2 is
>> > limited to 65535 values, while UTF-16 is a multi-word charset.
>> >
>> > If the comment above is to be read as "UTF-8 string will not grow more
>> > than four times as big as UCS-2/UTF-16" then what it is saying is that
>> > at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
>> > IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
>> > estimate seem correct.
>> >
>> > If "length" in the code was the length in bytes, and not the number of
>> > characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
>> > long enough.
>> >
>>
>> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
>> bytes. I guess though this means that 4*length is allocating too
>> much...wouldn't 3*length then be right (assuming NULL termination is
>> also accounted for) ?
>>
>> Regardless of the math, I'd like to see all of this moved into some
>> nice, well commented helper functions instead of being open-coded all
>> over the place. It's just too easy to get this stuff wrong. Let's solve
>> this in a way that makes it easier in the future.
>
> Yes, this is the way to go.
>
> Simo.
>
> --
> Simo Sorce
> Samba Team GPL Compliance Officer <simo@samba.org>
> Principal Software Engineer at Red Hat, Inc. <simo@redhat.com>
>
>
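The exact-size calculation Steve describes amounts to a first conversion pass that only counts bytes. A user-space sketch under that assumption (function name illustrative; real kernel code would go through the nls tables):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pass 1 of a two-pass conversion: walk a little-endian UTF-16 buffer
 * and count the bytes its UTF-8 form would need, allocating nothing.
 * nbytes is the input length in bytes, excluding any NUL terminator. */
static size_t utf16le_to_utf8_len(const unsigned char *s, size_t nbytes)
{
        size_t i = 0, out = 0;

        while (i + 1 < nbytes) {
                uint32_t cp = s[i] | (s[i + 1] << 8);

                i += 2;
                /* combine a valid surrogate pair into one code point */
                if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < nbytes) {
                        uint32_t lo = s[i] | (s[i + 1] << 8);

                        if (lo >= 0xDC00 && lo <= 0xDFFF) {
                                cp = 0x10000 + ((cp - 0xD800) << 10)
                                             + (lo - 0xDC00);
                                i += 2;
                        }
                }
                if (cp < 0x80)
                        out += 1;
                else if (cp < 0x800)
                        out += 2;
                else if (cp < 0x10000)
                        out += 3;
                else
                        out += 4;
        }
        return out;
}
```

Pass 2 would then allocate exactly that many bytes plus one for the NUL and run the real conversion into the buffer.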
Suresh Jayaraman April 6, 2009, 5:03 p.m. UTC | #8
Steve French wrote:
> I don't think that we should be using these size assumptions
> (multiples of UCS stringlen).    A new UCS helper function should be
> created that calculates how much memory would be needed for a
> converted string - and we need to use this before we do the malloc and
> string conversion.  In effect a strlen and strnlen function that takes
> a target code page argument.  For strings that will never be more than
> a hundred bytes this may not be needed, and we can use the length
> assumption, but since mallocs in kernel can be so expensive I would
> rather calculate the actual string length needed for the target.

Ah, ok. I thought of writing a little function based on
cifs_strncpy_to_host() and adding a comment like below:

/* UniStrnlen() returns length in 16 bit Unicode  characters
 * (UCS-2) with base length of 2 bytes per character. An UTF-8
 * character can be up to 8 bytes maximum, so we need to
 * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
 * UTF-8 string */

and make the callers use this. Maybe Steve's idea is the better one.
However, I could not find any utility function inside the kernel source
that calculates the length of a UTF-8 string, so it's not clear how I
would compute it.

Thanks,

> On Mon, Apr 6, 2009 at 9:30 AM, simo <idra@samba.org> wrote:
>> On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
>>> On Mon, 06 Apr 2009 14:02:25 +0000
>>> simo <idra@samba.org> wrote:
>>>
>>>> On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
>>>>>> True. Seems I was influenced by a comment in fs/cifs/sess.c
>>>>>>
>>>>>> 313         /* UTF-8 string will not grow more than four times as
>>>>> big as
>>>>>> UCS-16 */
>>>>>>
>>>>> That looks like that's wrong (or at least potentially so). AFAICT,
>>>>> UTF-8
>>>>> allows up to 6 bytes per character. I suppose that it's possible that
>>>>> none of the characters allowed in UCS-16 will ever translate to a
>>>>> character that's more than 4 bytes, but I'd like to see that confirmed
>>>>> before we depend on it.
>>>> I wonder whats UCS-16 tho, UCS-16 does not exist :)
>>>>
>>>> It may be either UTF16 or UCS2.
>>>> Both these charsets have a base length of 2 bytes per character. UCS2 is
>>>> limited to 65535 values, while UTF-16 is a multi-word charset.
>>>>
>>>> If the comment above is to be read as "UTF-8 string will not grow more
>>>> than four times as big as UCS-2/UTF-16" then what it is saying is that
>>>> at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
>>>> IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
>>>> estimate seem correct.
>>>>
>>>> If "length" in the code was the length in bytes, and not the number of
>>>> characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
>>>> long enough.
>>>>
>>> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
>>> bytes. I guess though this means that 4*length is allocating too
>>> much...wouldn't 3*length then be right (assuming NULL termination is
>>> also accounted for) ?
>>>
>>> Regardless of the math, I'd like to see all of this moved into some
>>> nice, well commented helper functions instead of being open-coded all
>>> over the place. It's just too easy to get this stuff wrong. Let's solve
>>> this in a way that makes it easier in the future.
>> Yes, this is the way to go.
>>
Jeff Layton April 6, 2009, 5:25 p.m. UTC | #9
On Mon, 06 Apr 2009 22:33:09 +0530
Suresh Jayaraman <sjayaraman@suse.de> wrote:

> Steve French wrote:
> > I don't think that we should be using these size assumptions
> > (multiples of UCS stringlen).    A new UCS helper function should be
> > created that calculates how much memory would be needed for a
> > converted string - and we need to use this before we do the malloc and
> > string conversion.  In effect a strlen and strnlen function that takes
> > a target code page argument.  For strings that will never be more than
> > a hundred bytes this may not be needed, and we can use the length
> > assumption, but since mallocs in kernel can be so expensive I would
> > rather calculate the actual string length needed for the target.
> 
> Ah, ok. I thought of writing a little function based on
> cifs_strncpy_to_host() and adding a comment like below:
> 
> /* UniStrnlen() returns length in 16 bit Unicode  characters
>  * (UCS-2) with base length of 2 bytes per character. An UTF-8
>  * character can be up to 8 bytes maximum, so we need to
>  * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
>  * UTF-8 string */
> 
> and make the callers use this. May be Steve's idea is a better one.
> However, I could find any utility function inside kernel source that
> calculates length of a UTF-8 string. It's not clear how can I find the
> length of a UTF-8 string.
> 
> Thanks,
> 

I think you'll have to basically do the conversion twice. Walk the
string once, converting each character to determine its length, and then
discard it. Get the total and allocate that many bytes (plus the null
termination), and do the conversion again into the buffer.

Another option might be to have a static buffer already allocated
somewhere (maybe even a whole page?) and do the conversion into that
while keeping a count of bytes. Then memcpy it into the destination
buffer after you allocate it.

I'm not truly convinced this is really necessary though. You have to
figure that kmalloc is a power-of-two allocator. If you kmalloc 17
bytes, you get 32 anyway. You'll probably end up using roughly the same
amount of memory that you would have had you just estimated the size.
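Jeff's allocator point can be illustrated with a crude model of kmalloc's size classes (a user-space sketch; the real slab allocator also has e.g. 96- and 192-byte caches, ignored here):

```c
#include <assert.h>
#include <stddef.h>

/* Round a requested size up to the next power-of-two cache size,
 * mimicking kmalloc's general-purpose caches in rough outline.
 * The 8-byte minimum cache size is an assumption of this model. */
static size_t kmalloc_rounded(size_t size)
{
        size_t cache = 8;

        while (cache < size)
                cache *= 2;
        return cache;
}
```

In this model an exact estimate of 42 bytes and a looser one of 60 bytes both land in the 64-byte cache, while 42 versus 86 would land in different caches, so precise sizing saves memory only when the two estimates straddle a cache boundary.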

Patch

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 0de3b56..b361be0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -3674,7 +3674,7 @@  CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
 			    BCC(smb_buffer_response)) {
 				kfree(tcon->nativeFileSystem);
 				tcon->nativeFileSystem =
-				    kzalloc(2*(length + 1), GFP_KERNEL);
+				    kzalloc((4 * length) + 2, GFP_KERNEL);
 				if (tcon->nativeFileSystem)
 					cifs_strfromUCS_le(
 						tcon->nativeFileSystem,