From patchwork Tue Apr  7 13:15:46 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suresh Jayaraman <sjayaraman@suse.de>
X-Patchwork-Id: 16814
Received: from lists.samba.org (mail.samba.org [66.70.73.150])
	by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n37DGSHp027069
	for <patchwork-cifs-client@patchwork.kernel.org>;
	Tue, 7 Apr 2009 13:16:28 GMT
Received: from dp.samba.org (localhost [127.0.0.1])
	by lists.samba.org (Postfix) with ESMTP id 528AC163C1D
	for <patchwork-cifs-client@patchwork.kernel.org>;
	Tue,  7 Apr 2009 13:16:09 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on dp.samba.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.5 required=3.8 tests=AWL, BAYES_00 autolearn=ham
	version=3.1.7
X-Original-To: linux-cifs-client@lists.samba.org
Delivered-To: linux-cifs-client@lists.samba.org
Received: from victor.provo.novell.com (victor.provo.novell.com
	[137.65.250.26])
	by lists.samba.org (Postfix) with ESMTP id 04DAB163BA0
	for <linux-cifs-client@lists.samba.org>;
	Tue,  7 Apr 2009 13:15:39 +0000 (GMT)
Received: from [164.99.138.63] (prv-ext-foundry1.gns.novell.com
	[137.65.251.240])
	by victor.provo.novell.com with ESMTP; Tue, 07 Apr 2009 07:15:51 -0600
Message-ID: <49DB5202.4030601@suse.de>
Date: Tue, 07 Apr 2009 18:45:46 +0530
From: Suresh Jayaraman <sjayaraman@suse.de>
User-Agent: Thunderbird 2.0.0.19 (X11/20081227)
MIME-Version: 1.0
To: Jeff Layton <jlayton@redhat.com>
Subject: Re: [linux-cifs-client] [PATCH] cifs: Fix insufficient memory
	allocation for nativeFileSystem field
References: <49D9A9D6.10003@suse.de>
	<20090406075351.5e9e32f0@tleilax.poochiereds.net>	<49D9FA8D.4090607@suse.de>
	<20090406092238.10aeccb7@tleilax.poochiereds.net>
	<1239026545.7649.46.camel@pico.li.ssimo.org>
	<20090406100846.54d6938c@tleilax.poochiereds.net>
	<1239028218.7649.49.camel@pico.li.ssimo.org>
	<524f69650904060941j78102e9aue71d7348e0938573@mail.gmail.com>
	<49DA35CD.20804@suse.de>
	<20090406132539.1f37322c@tleilax.poochiereds.net>
In-Reply-To: <20090406132539.1f37322c@tleilax.poochiereds.net>
X-Enigmail-Version: 0.95.7
Cc: Steve French <smfrench@gmail.com>, "linux-cifs-client@lists.samba.org"
	<linux-cifs-client@lists.samba.org>
X-BeenThere: linux-cifs-client@lists.samba.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: The Linux CIFS VFS client <linux-cifs-client.lists.samba.org>
List-Unsubscribe: 
 <https://lists.samba.org/mailman/listinfo/linux-cifs-client>,
	<mailto:linux-cifs-client-request@lists.samba.org?subject=unsubscribe>
List-Archive: <http://lists.samba.org/archive/linux-cifs-client>
List-Post: <mailto:linux-cifs-client@lists.samba.org>
List-Help: <mailto:linux-cifs-client-request@lists.samba.org?subject=help>
List-Subscribe: <https://lists.samba.org/mailman/listinfo/linux-cifs-client>,
	<mailto:linux-cifs-client-request@lists.samba.org?subject=subscribe>
Sender: 
 linux-cifs-client-bounces+patchwork-cifs-client=patchwork.kernel.org@lists.samba.org
Errors-To: 
 linux-cifs-client-bounces+patchwork-cifs-client=patchwork.kernel.org@lists.samba.org

Jeff Layton wrote:
> On Mon, 06 Apr 2009 22:33:09 +0530
> Suresh Jayaraman <sjayaraman@suse.de> wrote:
> 
>> Steve French wrote:
>>> I don't think that we should be using these size assumptions
>>> (multiples of UCS stringlen).    A new UCS helper function should be
>>> created that calculates how much memory would be needed for a
>>> converted string - and we need to use this before we do the malloc and
>>> string conversion.  In effect a strlen and strnlen function that takes
>>> a target code page argument.  For strings that will never be more than
>>> a hundred bytes this may not be needed, and we can use the length
>>> assumption, but since mallocs in kernel can be so expensive I would
>>> rather calculate the actual string length needed for the target.
>> Ah, ok. I thought of writing a little function based on
>> cifs_strncpy_to_host() and adding a comment like below:
>>
>> /* UniStrnlen() returns length in 16 bit Unicode  characters
>>  * (UCS-2) with base length of 2 bytes per character. An UTF-8
>>  * character can be up to 8 bytes maximum, so we need to
>>  * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
>>  * UTF-8 string */
>>
> 
> I think you'll have to basically do the conversion twice. Walk the
> string once and convert each character determine its length and then
> discard it. Get the total and allocate that many bytes (plus the null

Thanks for explaining. It seems adding a new UCS helper that computes
length in bytes like the below would be good enough and make use of it
to compute length for memory allocation.

> termination), and do the conversion again into the buffer.

Do we still need this conversion again?


> 
> I'm not truly convinced this is really necessary though. You have to
> figure that kmalloc is a power-of-two allocator. If you kmalloc 17
> bytes, you get 32 anyway. You'll probably end up using roughly the same
> amount of memory that you would have had you just estimated the size.
> 

Yes, it seems for most cases we would end up with roughly the same
amount of memory.

Both the approaches sounds plausible to me. If we have a concensus on
one of these two, I can convert the code in audited places and resend
the patch.

Thanks,

diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
index 14eb9a2..0396bdc 100644
--- a/fs/cifs/cifs_unicode.h
+++ b/fs/cifs/cifs_unicode.h
@@ -159,6 +159,23 @@ UniStrnlen(const wchar_t *ucs1, int maxlen)
 }

 /*
+ * UniStrnlenBytes: Return the length in bytes of a UTF-8 string
+ */
+static inline size_t
+UniStrnlenBytes(const unsigned char *str, int maxlen)
+{
+       size_t nbytes = 0;
+       wchar_t *uni;
+
+       while (*str++) {
+               /* convert each char, find its length and add to nbytes */
+               if (char2uni(str, maxlen, uni) > 0)
+                       nbytes += strnlen(uni, NLS_MAX_CHARSET_SIZE);
+       }
+       return nbytes;
+}
+
+/*

We would still be needing the version (UniStrnlen) that returns length
in characters also.