From patchwork Tue Jul 17 22:09:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ernesto_A=2E_Fern=C3=A1ndez?= X-Patchwork-Id: 10530897 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 208A56020A for ; Tue, 17 Jul 2018 22:10:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0C02F29357 for ; Tue, 17 Jul 2018 22:10:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F25A429365; Tue, 17 Jul 2018 22:10:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BD16929357 for ; Tue, 17 Jul 2018 22:09:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731258AbeGQWok (ORCPT ); Tue, 17 Jul 2018 18:44:40 -0400 Received: from mail-qt0-f193.google.com ([209.85.216.193]:35769 "EHLO mail-qt0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730232AbeGQWok (ORCPT ); Tue, 17 Jul 2018 18:44:40 -0400 Received: by mail-qt0-f193.google.com with SMTP id a5-v6so2375229qtp.2 for ; Tue, 17 Jul 2018 15:09:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:mime-version:content-disposition :content-transfer-encoding; bh=auJW8QMgoCLP6i7xZsySox3HHqBW/4MbN/XvT562i8M=; b=mvDZf62d8YMOJoYZMbTgrr4fAK35aNvqKbDhYcfCDbMYA9oITQjcD0DvOJkbba3EDe JtG2uoTkqrkJ8bTKFWDb52RgC0xYhMGAfQvT6zR281h6c4ZAiU/avJ+t2D2BZV8VOty2 1EWANzO+sYna+Ct/BcpXm3jfjc6wRd0VZ1vnuHQrwAJcFqzsRtboH860wTyE60r03NhQ YO1IgSLxj5iiluvISR9yvy/LCUF4PI70OGllzU9RgEIDHzqiHtA7jU1PDmpsesBbDlW9 xV/aFw++ksFVCBhjXDPZ13jjCXI/Z8X9M1qhjr9hJ/w11sqSmHX/dJ5QHiTh/Hrvqqm8 4etg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:mime-version :content-disposition:content-transfer-encoding; bh=auJW8QMgoCLP6i7xZsySox3HHqBW/4MbN/XvT562i8M=; b=dMLB7Xvq7yEMbgBKPFvnmFBBPRCo8y8Rj4aGPssk2ZiqdKkUIcZuPXGz7p+lmpRxH3 tb1vxndetvoQ/G1cN3Xs0kDrrzwFgfEsFqyMz8JXS32VrkAdBgR+ThEEC5qO3PVZOZBw XG3SKkH7IGLYPxylvEgfEoZbTdlFAABAQJUt2FlZxwNSxLznJ+CNNRAz6X/6iPF3OFNW VywrHp9QYgzwkPxtkbcXqzHbOnUjxzgusnQ+Eta1MH4gq5mqQW3lLtY8fwXaSSETMGFD ILddgu9nUzW0EkzfRUmR/Xk8vlTY5Kz3uExDXIz0NUxKjq5+Yvcv08Pg73YRoxsvgPIp GBMg== X-Gm-Message-State: AOUpUlFLRluI739hzz5WoIAhsIhtciC/Jznblx66dl8WYrRjvXF+IpIh uJ3yYIjMSWeEiaxytU0VfPCmQw== X-Google-Smtp-Source: AAOMgpe10sEQFqAzOmGI43q4UOr7700RJbwQW5kwrQb/SrlFn/BH5B6gKeIYSNdWoD28RYPehyRXtA== X-Received: by 2002:a0c:becb:: with SMTP id f11-v6mr3860757qvj.217.1531865397440; Tue, 17 Jul 2018 15:09:57 -0700 (PDT) Received: from eaf ([181.47.179.0]) by smtp.gmail.com with ESMTPSA id s39-v6sm1561575qts.42.2018.07.17.15.09.55 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 17 Jul 2018 15:09:56 -0700 (PDT) Date: Tue, 17 Jul 2018 19:09:51 -0300 From: Ernesto =?utf-8?Q?A=2E_Fern=C3=A1ndez?= To: linux-fsdevel@vger.kernel.org Cc: Andrew Morton , Ting-Chang Hou Subject: [PATCH] hfsplus: fix decomposition of Hangul characters Message-ID: <20180717220951.p6qqrgautc4pxvzu@eaf> MIME-Version: 1.0 Content-Disposition: inline Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Files created under macOS cannot be opened under linux if their names contain Korean characters, and vice versa. The Korean alphabet is special because its normalization is done without a table. The module deals with it correctly when composing, but forgets about it for the decomposition. Fix this using the Hangul decomposition function provided in the Unicode Standard. The code fits a bit awkwardly because it requires a buffer, while all the other normalizations are returned as pointers to the decomposition table. This is actually also a bug because reordering may still be needed, but for now leave it as it is. The patch will cause trouble for Hangul filenames already created by the module in the past. This shouldn't really be concern because its main purpose was always sharing with macOS. If a user actually needs to access such a file the nodecompose mount option should be enough. Reported-and-tested-by: Ting-Chang Hou Signed-off-by: Ernesto A. Fernández --- fs/hfsplus/unicode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 56 insertions(+), 6 deletions(-) diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c index dfa90c21948f..c8d1b2be7854 100644 --- a/fs/hfsplus/unicode.c +++ b/fs/hfsplus/unicode.c @@ -272,8 +272,8 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len, return size; } -/* Decomposes a single unicode character. */ -static inline u16 *decompose_unichar(wchar_t uc, int *size) +/* Decomposes a non-Hangul unicode character. */ +static u16 *hfsplus_decompose_nonhangul(wchar_t uc, int *size) { int off; @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size) return hfsplus_decompose_table + (off / 4); } +/* + * Try to decompose a unicode character as Hangul. Return 0 if @uc is not + * precomposed Hangul, otherwise return the length of the decomposition. + * + * This function was adapted from sample code from the Unicode Standard + * Annex #15: Unicode Normalization Forms, version 3.2.0. + * + * Copyright (C) 1991-2018 Unicode, Inc. All rights reserved. Distributed + * under the Terms of Use in http://www.unicode.org/copyright.html. + */ +static int hfsplus_try_decompose_hangul(wchar_t uc, u16 *result) +{ + int index; + int l, v, t; + + index = uc - Hangul_SBase; + if (index < 0 || index >= Hangul_SCount) + return 0; + + l = Hangul_LBase + index / Hangul_NCount; + v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount; + t = Hangul_TBase + index % Hangul_TCount; + + result[0] = l; + result[1] = v; + if (t != Hangul_TBase) { + result[2] = t; + return 3; + } + return 2; +} + +/* Decomposes a single unicode character. */ +static u16 *decompose_unichar(wchar_t uc, int *size, u16 *hangul_buffer) +{ + u16 *result; + + /* Hangul is handled separately */ + result = hangul_buffer; + *size = hfsplus_try_decompose_hangul(uc, result); + if (*size == 0) + result = hfsplus_decompose_nonhangul(uc, size); + return result; +} + int hfsplus_asc2uni(struct super_block *sb, struct hfsplus_unistr *ustr, int max_unistr_len, const char *astr, int len) @@ -303,13 +348,14 @@ int hfsplus_asc2uni(struct super_block *sb, int size, dsize, decompose; u16 *dstr, outlen = 0; wchar_t c; + u16 dhangul[3]; decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); while (outlen < max_unistr_len && len > 0) { size = asc2unichar(sb, astr, len, &c); if (decompose) - dstr = decompose_unichar(c, &dsize); + dstr = decompose_unichar(c, &dsize, dhangul); else dstr = NULL; if (dstr) { @@ -344,6 +390,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) unsigned long hash; wchar_t c; u16 c2; + u16 dhangul[3]; casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); @@ -357,7 +404,7 @@ int hfsplus_hash_dentry(const struct dentry *dentry, struct qstr *str) len -= size; if (decompose) - dstr = decompose_unichar(c, &dsize); + dstr = decompose_unichar(c, &dsize, dhangul); else dstr = NULL; if (dstr) { @@ -396,6 +443,7 @@ int hfsplus_compare_dentry(const struct dentry *dentry, const char *astr1, *astr2; u16 c1, c2; wchar_t c; + u16 dhangul_1[3], dhangul_2[3]; casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags); decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags); @@ -413,7 +461,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, len1 -= size; if (decompose) - dstr1 = decompose_unichar(c, &dsize1); + dstr1 = decompose_unichar(c, &dsize1, + dhangul_1); if (!decompose || !dstr1) { c1 = c; dstr1 = &c1; @@ -427,7 +476,8 @@ int hfsplus_compare_dentry(const struct dentry *dentry, len2 -= size; if (decompose) - dstr2 = decompose_unichar(c, &dsize2); + dstr2 = decompose_unichar(c, &dsize2, + dhangul_2); if (!decompose || !dstr2) { c2 = c; dstr2 = &c2;