From patchwork Fri Jul 27 19:13:42 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frediano Ziglio <frediano.ziglio@citrix.com>
X-Patchwork-Id: 1250571
Return-Path: <linux-cifs-owner@vger.kernel.org>
X-Original-To: patchwork-cifs-client@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork2.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by patchwork2.kernel.org (Postfix) with ESMTP id 5F9B2E00A7
	for <patchwork-cifs-client@patchwork.kernel.org>;
	Fri, 27 Jul 2012 19:13:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752089Ab2G0TNo (ORCPT
	<rfc822;patchwork-cifs-client@patchwork.kernel.org>);
	Fri, 27 Jul 2012 15:13:44 -0400
Received: from smtp.ctxuk.citrix.com ([62.200.22.115]:41079 "EHLO
	SMTP.EU.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751965Ab2G0TNo (ORCPT
	<rfc822; linux-cifs@vger.kernel.org>); Fri, 27 Jul 2012 15:13:44 -0400
X-IronPort-AV: E=Sophos;i="4.77,668,1336348800"; d="scan'208";a="13740736"
Received: from lonpmailmx02.citrite.net ([10.30.203.163])
	by LONPIPO01.EU.CITRIX.COM with ESMTP/TLS/RC4-MD5;
	27 Jul 2012 19:13:42 +0000
Received: from LONPMAILBOX01.citrite.net ([10.30.224.161]) by
	LONPMAILMX02.citrite.net ([10.30.203.163]) with mapi; Fri, 27 Jul 2012
	20:13:42 +0100
From: Frediano Ziglio <frediano.ziglio@citrix.com>
To: "sfrench@samba.org" <sfrench@samba.org>
CC: "linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>
Date: Fri, 27 Jul 2012 20:13:42 +0100
Subject: ALPHA: advice for a patch to CIFS
Thread-Topic: ALPHA: advice for a patch to CIFS
Thread-Index: Ac1sK++fKCWvnpFCRl+IPh6lrDISDw==
Message-ID: 
 <7CE799CC0E4DE04B88D5FDF226E18AC2CDF8E31455@LONPMAILBOX01.citrite.net>
Accept-Language: en-US
Content-Language: en-US
Sender: linux-cifs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-cifs.vger.kernel.org>
X-Mailing-List: linux-cifs@vger.kernel.org

Hi,
  I'm currently trying to add support for UTF-16 characters outside plane 0 (the Basic Multilingual Plane).

I have ended up with the patch below. It is not against the latest
kernel, but the problem is still present in the current git tree.

wchar_t is currently 16 bits wide, so converting a UTF-8 encoded
character outside plane 0 (code point >= 0x10000) to wchar_t (that is,
calling char2uni) fails with -EINVAL. This patch detects the utf8
codepage in cifs_strtoUCS, calls utf8_to_utf32 directly in that case,
and stores code points above 0xFFFF as a UTF-16 surrogate pair.

Does this sound like a good patch, or just a bad hack? Perhaps it would
be better to change char2uni to convert to unicode_t (32 bits) instead
of wchar_t, but then a lot of code would have to be checked to make sure
the change does not lead to wrong conversions, overflows or other problems.

Is it worth pursuing this hackish approach? I'd like to get this patch
upstream.


Signed-off-by: "Frediano Ziglio" <frediano.ziglio@citrix.com>

Regards,
  Frediano

diff -r c2325d754e8d fs/cifs/cifs_unicode.c
--- a/fs/cifs/cifs_unicode.c	Fri Jul 27 15:12:23 2012 +0100
+++ b/fs/cifs/cifs_unicode.c	Fri Jul 27 19:09:04 2012 +0100
@@ -192,22 +192,40 @@ cifs_strtoUCS(__le16 *to, const char *fr
 {
 	int charlen;
 	int i;
-	wchar_t *wchar_to = (wchar_t *)to; /* needed to quiet sparse */
+	int is_utf8 = !strcmp(codepage->charset, "utf8");
+	wchar_t wchar_to; /* UTF-16 unit to store in to[i] */
+	unicode_t uni;
 
 	for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
 
 		/* works for 2.4.0 kernel or later */
-		charlen = codepage->char2uni(from, len, &wchar_to[i]);
+		if (is_utf8) {
+			charlen = utf8_to_utf32((const u8 *)from, len, &uni);
+		} else {
+			charlen = codepage->char2uni(from, len, &wchar_to);
+			uni = wchar_to;
+		}
+
 		if (charlen < 1) {
 			cERROR(1,
 			       ("strtoUCS: char2uni of %d returned %d",
 				(int)*from, charlen));
 			/* A question mark */
-			to[i] = cpu_to_le16(0x003f);
+			wchar_to = 0x003f;
 			charlen = 1;
-		} else
-			to[i] = cpu_to_le16(wchar_to[i]);
-
+		} else if (uni < 0x10000) {
+			wchar_to = uni;
+		} else if (uni < 0x110000) {
+			uni -= 0x10000;
+			to[i++] = cpu_to_le16(0xD800 | (uni >> 10));
+			wchar_to = 0xDC00 | (uni & 0x3FF);
+		} else {
+			cERROR(1,
+			       ("strtoUCS: code point 0x%x out of range",
+				(unsigned int)uni));
+			wchar_to = 0x003f;
+		}
+		to[i] = cpu_to_le16(wchar_to);
 	}
 
 	to[i] = 0;