From patchwork Mon Mar 18 20:27:35 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gabriel Krisman Bertazi
X-Patchwork-Id: 10858521
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
    by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0D1C1390
    for ; Mon, 18 Mar 2019 20:28:07 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
    by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D3CA32953F
    for ; Mon, 18 Mar 2019 20:28:07 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
    id C128F28565; Mon, 18 Mar 2019 20:28:07 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
    pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
    RCVD_IN_DNSWL_HI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
    by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 93B9B29527
    for ; Mon, 18 Mar 2019 20:28:06 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1727461AbfCRU2F (ORCPT ); Mon, 18 Mar 2019 16:28:05 -0400
Received: from bhuna.collabora.co.uk ([46.235.227.227]:32952 "EHLO
    bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1726998AbfCRU2F (ORCPT );
    Mon, 18 Mar 2019 16:28:05 -0400
Received: from [127.0.0.1] (localhost [127.0.0.1])
    (Authenticated sender: krisman) with ESMTPSA id 2A31728119C
From: Gabriel Krisman Bertazi
To: tytso@mit.edu
Cc: linux-ext4@vger.kernel.org, sfrench@samba.org, darrick.wong@oracle.com,
    jlayton@kernel.org, bfields@fieldses.org, paulus@samba.org,
    linux-fsdevel@vger.kernel.org, Olaf Weber,
    Gabriel Krisman Bertazi
Subject: [PATCH RFC v6 01/11] unicode: Add unicode character database files
Date: Mon, 18 Mar 2019 16:27:35 -0400
Message-Id:
<20190318202745.5200-2-krisman@collabora.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190318202745.5200-1-krisman@collabora.com>
References: <20190318202745.5200-1-krisman@collabora.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

Add files from the Unicode Character Database, version 11.0.0, to the
source.

A helper program that generates a trie used for normalization from
these files is part of a separate commit.

- Notes on the update from 8.0.0 to 11.0.0:

Neither the structure of the UCD files nor the special cases changed
between versions 8.0.0 and 11.0.0.

8.0.0 saw the addition of Cherokee LC characters, which is an
interesting case for case-folding. The update is accompanied by new
tests in the test_ucd module to catch specific cases. No changes to
the mkutf8data script were required for the update.

The actual files are not part of the commit submitted to the list
because they are too big and would bounce.
Still, they can be obtained by the following script:

FILES="CaseFolding.txt DerivedAge.txt extracted/DerivedCombiningClass.txt
    DerivedCoreProperties.txt NormalizationCorrections.txt
    NormalizationTest.txt UnicodeData.txt"
VERSION=11.0.0
BASE=http://www.unicode.org/Public/${VERSION}/ucd

for i in ${FILES} ; do
  wget "${BASE}/$i" -O fs/unicode/ucd/$(basename ${i} .txt)-${VERSION}.txt
done

Signed-off-by: Olaf Weber
Signed-off-by: Gabriel Krisman Bertazi
[Move ucd directory to fs/unicode/]
[Update to Unicode 11.0.0]
---
 fs/unicode/ucd/README | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 fs/unicode/ucd/README

diff --git a/fs/unicode/ucd/README b/fs/unicode/ucd/README
new file mode 100644
index 000000000000..5f89017b35ee
--- /dev/null
+++ b/fs/unicode/ucd/README
@@ -0,0 +1,33 @@
+The files in this directory are part of the Unicode Character Database
+for version 11.0.0 of the Unicode standard.
+
+The full set of files can be found here:
+
+  http://www.unicode.org/Public/11.0.0/ucd/
+
+The latest released version of the UCD can be found here:
+
+  http://www.unicode.org/Public/UCD/latest/
+
+The files in this directory are identical, except that they have been
+renamed with a suffix indicating the unicode version.
+
+Individual source links:
+
+  http://www.unicode.org/Public/11.0.0/ucd/CaseFolding.txt
+  http://www.unicode.org/Public/11.0.0/ucd/DerivedAge.txt
+  http://www.unicode.org/Public/11.0.0/ucd/extracted/DerivedCombiningClass.txt
+  http://www.unicode.org/Public/11.0.0/ucd/DerivedCoreProperties.txt
+  http://www.unicode.org/Public/11.0.0/ucd/NormalizationCorrections.txt
+  http://www.unicode.org/Public/11.0.0/ucd/NormalizationTest.txt
+  http://www.unicode.org/Public/11.0.0/ucd/UnicodeData.txt
+
+md5sums
+
+  414436796cf097df55f798e1585448ee  CaseFolding-11.0.0.txt
+  6032a595fbb782694456491d86eecfac  DerivedAge-11.0.0.txt
+  3240997d671297ac754ab0d27577acf7  DerivedCombiningClass-11.0.0.txt
+  2a4fe257d9d8184518e036194d2248ec  DerivedCoreProperties-11.0.0.txt
+  4e7d383fa0dd3cd9d49d64e5b7b7c9e0  NormalizationCorrections-11.0.0.txt
+  c9500c5b8b88e584469f056023ecc3f2  NormalizationTest-11.0.0.txt
+  acc291106c3758d2025f8d7bd5518bee  UnicodeData-11.0.0.txt
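
Since the UCD files are fetched separately, the md5sums listed in the README can be used to check the downloaded copies. The following is only a minimal sketch, assuming an md5sum that supports -c (as in GNU coreutils) and a POSIX awk. verify_ucd is a hypothetical helper and is not part of this series. It assumes the README sits in the same directory as the renamed files.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): check the files in a UCD
# directory against the md5sums section of that directory's README.
# Usage: verify_ucd <ucd-dir> [<readme>]
verify_ucd() {
    dir=$1
    readme=${2:-$dir/README}
    # Keep only lines of the form "<32 hex digits> <filename>" and rewrite
    # them in the "<md5>  <filename>" layout that md5sum -c expects.
    awk 'NF == 2 && length($1) == 32 && $1 ~ /^[0-9a-f]+$/ { print $1 "  " $2 }' "$readme" |
        # md5sum -c resolves names relative to the current directory, so run
        # it inside the UCD directory; the subshell leaves the caller's cwd alone.
        ( cd "$dir" && md5sum -c - )
}
```

From the top of the kernel tree, after running the download script, this could be invoked as `verify_ucd fs/unicode/ucd`. The return status is that of md5sum, so a non-zero status indicates a mismatch or a missing file.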