From patchwork Tue Apr 17 12:55:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 10344957 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 9D71260216 for ; Tue, 17 Apr 2018 12:55:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90877286B6 for ; Tue, 17 Apr 2018 12:55:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 853AC28A91; Tue, 17 Apr 2018 12:55:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0AAEF286B6 for ; Tue, 17 Apr 2018 12:55:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753321AbeDQMzj (ORCPT ); Tue, 17 Apr 2018 08:55:39 -0400 Received: from mx2.suse.de ([195.135.220.15]:43665 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752359AbeDQMzi (ORCPT ); Tue, 17 Apr 2018 08:55:38 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id B7229ACEE; Tue, 17 Apr 2018 12:55:36 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 804C21E0A79; Tue, 17 Apr 2018 14:55:36 +0200 (CEST) From: Jan Kara To: Cc: Andrew Gabbasov , arthur200126@gmail.com, Jan Kara Subject: [PATCH 3/7] udf: Use UTF-32 <-> UTF-8 conversion functions from NLS Date: Tue, 17 Apr 2018 14:55:25 +0200 Message-Id: <20180417125529.6975-4-jack@suse.cz> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180417125529.6975-1-jack@suse.cz> References: <20180417125529.6975-1-jack@suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of implementing our own functions converting to and from UTF-8, use the ones provided by NLS. Signed-off-by: Jan Kara --- fs/udf/unicode.c | 80 ++++++++++++-------------------------------------------- 1 file changed, 17 insertions(+), 63 deletions(-) diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c index 16a8ad21b77e..18df831afd3d 100644 --- a/fs/udf/unicode.c +++ b/fs/udf/unicode.c @@ -28,6 +28,7 @@ #include "udf_sb.h" +#define UNICODE_MAX 0x10ffff #define SURROGATE_MASK 0xfffff800 #define SURROGATE_PAIR 0x0000d800 @@ -40,22 +41,12 @@ static int udf_uni2char_utf8(wchar_t uni, if (boundlen <= 0) return -ENAMETOOLONG; - if ((uni & SURROGATE_MASK) == SURROGATE_PAIR) - return -EINVAL; - - if (uni < 0x80) { - out[u_len++] = (unsigned char)uni; - } else if (uni < 0x800) { - if (boundlen < 2) - return -ENAMETOOLONG; - out[u_len++] = (unsigned char)(0xc0 | (uni >> 6)); - out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); - } else { - if (boundlen < 3) - return -ENAMETOOLONG; - out[u_len++] = (unsigned char)(0xe0 | (uni >> 12)); - out[u_len++] = (unsigned char)(0x80 | ((uni >> 6) & 0x3f)); - out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); + u_len = utf32_to_utf8(uni, out, boundlen); + if (u_len < 0) { + if (uni > UNICODE_MAX || + (uni & SURROGATE_MASK) == SURROGATE_PAIR) + return -EINVAL; + return -ENAMETOOLONG; } return u_len; } @@ -64,56 +55,19 @@ static int udf_char2uni_utf8(const unsigned char *in, int boundlen, wchar_t *uni) { - unsigned int utf_char; - unsigned char c; - int utf_cnt, u_len; - - utf_char = 0; - utf_cnt = 0; - for (u_len = 0; u_len < boundlen;) { - c = in[u_len++]; - - /* Complete a multi-byte UTF-8 character */ - if (utf_cnt) { - utf_char = (utf_char << 6) | (c & 0x3f); - if (--utf_cnt) - continue; - } else { - /* Check for a multi-byte UTF-8 character */ - if (c & 0x80) { - /* Start a multi-byte UTF-8 character */ - if ((c & 0xe0) == 0xc0) { - utf_char = c & 0x1f; - utf_cnt = 1; - } else if ((c & 0xf0) == 0xe0) { - utf_char = c & 0x0f; - utf_cnt = 2; - } else if ((c & 0xf8) == 0xf0) { - utf_char = c & 0x07; - utf_cnt = 3; - } else if ((c & 0xfc) == 0xf8) { - utf_char = c & 0x03; - utf_cnt = 4; - } else if ((c & 0xfe) == 0xfc) { - utf_char = c & 0x01; - utf_cnt = 5; - } else { - utf_cnt = -1; - break; - } - continue; - } else { - /* Single byte UTF-8 character (most common) */ - utf_char = c; - } - } - *uni = utf_char; - break; - } - if (utf_cnt) { + int u_len; + unicode_t c; + + u_len = utf8_to_utf32(in, boundlen, &c); + if (u_len < 0) { *uni = '?'; return -EINVAL; } + + if (c > MAX_WCHAR_T) + *uni = '?'; + else + *uni = c; return u_len; }