From patchwork Tue Apr 7 13:15:46 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suresh Jayaraman X-Patchwork-Id: 16814 Received: from lists.samba.org (mail.samba.org [66.70.73.150]) by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n37DGSHp027069 for ; Tue, 7 Apr 2009 13:16:28 GMT Received: from dp.samba.org (localhost [127.0.0.1]) by lists.samba.org (Postfix) with ESMTP id 528AC163C1D for ; Tue, 7 Apr 2009 13:16:09 +0000 (GMT) X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on dp.samba.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.8 tests=AWL, BAYES_00 autolearn=ham version=3.1.7 X-Original-To: linux-cifs-client@lists.samba.org Delivered-To: linux-cifs-client@lists.samba.org Received: from victor.provo.novell.com (victor.provo.novell.com [137.65.250.26]) by lists.samba.org (Postfix) with ESMTP id 04DAB163BA0 for ; Tue, 7 Apr 2009 13:15:39 +0000 (GMT) Received: from [164.99.138.63] (prv-ext-foundry1.gns.novell.com [137.65.251.240]) by victor.provo.novell.com with ESMTP; Tue, 07 Apr 2009 07:15:51 -0600 Message-ID: <49DB5202.4030601@suse.de> Date: Tue, 07 Apr 2009 18:45:46 +0530 From: Suresh Jayaraman User-Agent: Thunderbird 2.0.0.19 (X11/20081227) MIME-Version: 1.0 To: Jeff Layton Subject: Re: [linux-cifs-client] [PATCH] cifs: Fix insufficient memory allocation for nativeFileSystem field References: <49D9A9D6.10003@suse.de> <20090406075351.5e9e32f0@tleilax.poochiereds.net> <49D9FA8D.4090607@suse.de> <20090406092238.10aeccb7@tleilax.poochiereds.net> <1239026545.7649.46.camel@pico.li.ssimo.org> <20090406100846.54d6938c@tleilax.poochiereds.net> <1239028218.7649.49.camel@pico.li.ssimo.org> <524f69650904060941j78102e9aue71d7348e0938573@mail.gmail.com> <49DA35CD.20804@suse.de> <20090406132539.1f37322c@tleilax.poochiereds.net> In-Reply-To: <20090406132539.1f37322c@tleilax.poochiereds.net> X-Enigmail-Version: 0.95.7 Cc: Steve French , "linux-cifs-client@lists.samba.org" X-BeenThere: linux-cifs-client@lists.samba.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: The Linux CIFS VFS client List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-cifs-client-bounces+patchwork-cifs-client=patchwork.kernel.org@lists.samba.org Errors-To: linux-cifs-client-bounces+patchwork-cifs-client=patchwork.kernel.org@lists.samba.org Jeff Layton wrote: > On Mon, 06 Apr 2009 22:33:09 +0530 > Suresh Jayaraman wrote: > >> Steve French wrote: >>> I don't think that we should be using these size assumptions >>> (multiples of UCS stringlen). A new UCS helper function should be >>> created that calculates how much memory would be needed for a >>> converted string - and we need to use this before we do the malloc and >>> string conversion. In effect a strlen and strnlen function that takes >>> a target code page argument. For strings that will never be more than >>> a hundred bytes this may not be needed, and we can use the length >>> assumption, but since mallocs in kernel can be so expensive I would >>> rather calculate the actual string length needed for the target. >> Ah, ok. I thought of writing a little function based on >> cifs_strncpy_to_host() and adding a comment like below: >> >> /* UniStrnlen() returns length in 16 bit Unicode characters >> * (UCS-2) with base length of 2 bytes per character. An UTF-8 >> * character can be up to 8 bytes maximum, so we need to >> * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the >> * UTF-8 string */ >> > > I think you'll have to basically do the conversion twice. Walk the > string once and convert each character determine its length and then > discard it. Get the total and allocate that many bytes (plus the null Thanks for explaining. It seems adding a new UCS helper that computes length in bytes like the below would be good enough and make use of it to compute length for memory allocation. > termination), and do the conversion again into the buffer. Do we still need this conversion again? > > I'm not truly convinced this is really necessary though. You have to > figure that kmalloc is a power-of-two allocator. If you kmalloc 17 > bytes, you get 32 anyway. You'll probably end up using roughly the same > amount of memory that you would have had you just estimated the size. > Yes, it seems for most cases we would end up with roughly the same amount of memory. Both the approaches sounds plausible to me. If we have a concensus on one of these two, I can convert the code in audited places and resend the patch. Thanks, diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h index 14eb9a2..0396bdc 100644 --- a/fs/cifs/cifs_unicode.h +++ b/fs/cifs/cifs_unicode.h @@ -159,6 +159,23 @@ UniStrnlen(const wchar_t *ucs1, int maxlen) } /* + * UniStrnlenBytes: Return the length in bytes of a UTF-8 string + */ +static inline size_t +UniStrnlenBytes(const unsigned char *str, int maxlen) +{ + size_t nbytes = 0; + wchar_t *uni; + + while (*str++) { + /* convert each char, find its length and add to nbytes */ + if (char2uni(str, maxlen, uni) > 0) + nbytes += strnlen(uni, NLS_MAX_CHARSET_SIZE); + } + return nbytes; +} + +/* We would still be needing the version (UniStrnlen) that returns length in characters also.