From patchwork Tue Jul 30 01:06:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746095
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id D57BA10E3
 for ; Tue, 30 Jul 2024 01:06:03 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301563; cv=none;
 b=ilDPy3yBUcJ8a9vLVL8Hu3JVpk6R14Y0PhVKYZ5ZEvZ7LUeN/+ViZbJZZCbhRyGZTiWqI+MiI9Jg2CMN1M/mLPlJZ+5EV91RDV2Z1NZVxH8BfzUemI1lTWaC50zXQPIJo/kQwOjEkw/hHwyE6HWEe4UljEm1kpQZ6CbFHajqtLw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301563; c=relaxed/simple;
 bh=R4CY6a2ZyyDFRcuYkRRgw7GLMPxTN2gVjmTEgt5FKvM=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=erJvviZDfm1UxjTCK1za/7DZKnh0S1lh3bHuiUuj3r+cIc0NdMJnr482f4dkRQRsyJ+QflFMqmwOcy+mw4Z9q7o6pXgfo1YSsMaXIy2REZ9JWUGBNmoEcLWNjo91Y7+UOb9FkBBdHP6BbJzgeK2Nvbvsyzt2f4HkNtBbCkKnY5E=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=c3FUK9bz;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="c3FUK9bz"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id AD3F5C32786;
 Tue, 30 Jul 2024 01:06:03 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301563;
 bh=R4CY6a2ZyyDFRcuYkRRgw7GLMPxTN2gVjmTEgt5FKvM=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=c3FUK9bzTPRURuVSk6fn60MjjkYnGNXSIW8DAUgbmji909y3ntYASyYVEbRSXF/4C
 sJTVnynbW+lCALh0k55DxQuIZyKbFyhWs37s42cu60JAkHkPgLeY9dmUNl1klyrGN4
 l9MGCNzi4pmpzewAihr7g2u5z7ghDjaVn0Hh8LetK//1LDzBA2M1I/pqx+fZRSa0H2
 FsDMJdxR++ni3PCBFTj89KPQdJb4ZNcrIZ5sbXLSAs7qEjDiI/KcUjLNia33mQ032U
 No6pd9zABjVLvAOtMnmXVonR6tgfa2rm8MqO4iZb3g6LZFwb7FyhNCVqoH3B/9fKmR
 5uuuRlC1P/tCw==
Date: Mon, 29 Jul 2024 18:06:03 -0700
Subject: [PATCH 01/13] xfs_scrub: use proper UChar string iterators
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847546.1348850.8998764683709347135.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

For code that wants to examine a UChar string, use libicu's string
iterators to walk UChar strings, instead of the open-coded U16_NEXT*
macros that perform no typechecking.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index dd3016435..02a1b94ef 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -330,13 +330,12 @@ name_entry_examine(
 	struct name_entry	*entry,
 	unsigned int		*badflags)
 {
+	UCharIterator		uiter;
 	UChar32			uchr;
-	int32_t			i;
 	uint8_t			mask = 0;
 
-	for (i = 0; i < entry->normstrlen;) {
-		U16_NEXT_UNSAFE(entry->normstr, i, uchr);
-
+	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
 		case 0x200B:	/* zero width space */

From patchwork Tue Jul 30 01:06:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746106
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D28110E3
 for ; Tue, 30 Jul 2024 01:06:19 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301579; cv=none;
 b=RCZMoTQecvJQHkaHnzFh1m0VoZbSvJyhch9Kl4SwGb1S7SRPKYsfWN+2HsAZhGIEM8jgmw5u82XfPHEE7nm265fMR4o+tK5F1m6FErX+MVmjV7Fr/890iIqYxOZ3mc6oetjZptcDQ21B8Uh9gyZDeNHfPNJeaXYbqIh09RRG+fI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301579; c=relaxed/simple;
 bh=MkoONVRwz64FQqQTIhCGD7I7WiyVqX90RiC4rWsyR6c=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=DhMNHNB66588Am5MP7dhh90948h+SDEjjsY8qcqA+2+UIRkFGFArrm3RlHpxU6nFCBCPmgV86dqXZROqKRk5AFsblrlhGP4xRrCdCT34TuUtYC2UhdjewTebwJGoUK7JWbKaNBSZ50gt26lNvJBh98sGDS28hACPOVNLKUQ63TE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=HUi+31jf;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="HUi+31jf"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 512CDC32786;
 Tue, 30 Jul 2024 01:06:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301579;
 bh=MkoONVRwz64FQqQTIhCGD7I7WiyVqX90RiC4rWsyR6c=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=HUi+31jfS5JK0pHLg2pHzd/fUeNum7lh4hwFGxbKrJ+ibEaGxMiGIPq3Uo9gaEoCg
 WxAmKy3j/Loyi9uG7fKsE7yHsZh9A9pk52aMWo8W1I79Exz1gZiRrgHSlgh0QcitbH
 Dmx0ocPZG5dlBZraJifHYaUodrRYCtYBcmqFzdCuY61N0780bKnD1ED+YScCeeqULx
 I4lxZL9oEvVGFH/A97u16NEJF2pthaR/+AP1tVZd/1/iyCD6vxQAPjs4wsY8KInLSF
 e2ixXOgrXisf+a4fNRzLuF4NdYcblHiJMVQ5tI17vqQYFgMevMYq+UEjrncYJBUXra
 WC0MjXRv2OYjQ==
Date: Mon, 29 Jul 2024 18:06:18 -0700
Subject: [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847561.1348850.2823932170105684878.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Hoist the loop that removes "ignorable" code points from the skeleton
string into a separate function and give the UChar cursors names that
are easier to understand. Convert the code to use the safe versions of
the U16_ accessor functions.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 02a1b94ef..96e20114c 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -145,6 +145,31 @@ is_utf8_locale(void)
 	return answer;
 }
 
+/*
+ * Remove control/formatting characters from this string and return its new
+ * length. UChar32 is required for U16_NEXT, despite the name.
+ */
+static int32_t
+remove_ignorable(
+	UChar		*ustr,
+	int32_t		ustrlen)
+{
+	UChar32		uchr;
+	int32_t		src, dest;
+
+	for (src = 0, dest = 0; src < ustrlen; dest = src) {
+		U16_NEXT(ustr, src, ustrlen, uchr);
+		if (!u_isIDIgnorable(uchr))
+			continue;
+		memmove(&ustr[dest], &ustr[src],
+				(ustrlen - src + 1) * sizeof(UChar));
+		ustrlen -= (src - dest);
+		src = dest;
+	}
+
+	return dest;
+}
+
 /*
  * Generate normalized form and skeleton of the name. If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -160,9 +185,6 @@ name_entry_compute_checknames(
 	int32_t			normstrlen;
 	int32_t			unistrlen;
 	int32_t			skelstrlen;
-	UChar32			uchr;
-	int32_t			i, j;
-
 	UErrorCode		uerr = U_ZERO_ERROR;
 
 	/* Convert bytestr to unistr for normalization */
@@ -206,16 +228,7 @@ name_entry_compute_checknames(
 	if (U_FAILURE(uerr))
 		goto out_skelstr;
 
-	/* Remove control/formatting characters from skeleton. */
-	for (i = 0, j = 0; i < skelstrlen; j = i) {
-		U16_NEXT_UNSAFE(skelstr, i, uchr);
-		if (!u_isIDIgnorable(uchr))
-			continue;
-		memmove(&skelstr[j], &skelstr[i],
-				(skelstrlen - i + 1) * sizeof(UChar));
-		skelstrlen -= (i - j);
-		i = j;
-	}
+	skelstrlen = remove_ignorable(skelstr, skelstrlen);
 
 	entry->skelstr = skelstr;
 	entry->skelstrlen = skelstrlen;

From patchwork Tue Jul 30 01:06:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746107
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 201B24C97
 for ; Tue, 30 Jul 2024 01:06:35 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301595; cv=none;
 b=VynRd14IQn4GehOlajXiBSN08rGWLR6rp1+ivcVizLwqQCILx/ccbVE+OTg6EybbbQyqYQpauXs4Fn02P4DgRwM+3cOl0wmDIwuapg9kxcz/zLCI+O4C5LN2qQK4mDFHpe/HFBh3cefwKU/3ZosTHc5MFqKPE8v0QmS82BSFOaQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301595; c=relaxed/simple;
 bh=clb0cWF+E7Yy91hZ2f4Ob0arAKmdCLfWGVgIDpqRJtI=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=E5ufy4RvxITaljvId8STYQ9mRm6Ct82QqjQsJxEyIos2GtA5m1N+WXbm42Dd3Zi9XfPHjfmRZbx3jerzJVa+X8/3BIjDW4SS3WxwwrlSGBXpTZZgCEJJBiH3A+YNqcCXm68GB6Uh3BNJeIf3Dg5x52ggn70D+oUUXGO2pT+TDT4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=M71CI13C;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="M71CI13C"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id F19ABC32786;
 Tue, 30 Jul 2024 01:06:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301595;
 bh=clb0cWF+E7Yy91hZ2f4Ob0arAKmdCLfWGVgIDpqRJtI=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=M71CI13CGZjSj49SCGnBs1FLmOu4JP/lyODgkXj5v00PAW8TD7BFRq5rYG6mi3Ywm
 Fd0NqWHiQJdDuB5ZZ30iAbt0h0Zme3kmj4A39Jlefgj7+ri9GreTb5USWRaZ9c7Gdf
 0tDOXUAVdUwNJL5+AMElEp1t03EFvsW8nZrOMcdtTh+yYb0fHzO7xip0Band8dmFjM
 K5uhDZtS6bGWi2/6U7AI342UMlhSeITGXkXB14wF/9PBLuJDd0n9+m15dKHQmU9sKA
 hk9pm2AE68aIXrpW+x9QfElCZfzXlX0ZfbP5XpauQFkX2VSUVLeoRTjRoqg6IlGwsn
 /CsuZtOEFoKmw==
Date: Mon, 29 Jul 2024 18:06:34 -0700
Subject: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847576.1348850.17804705325546372553.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

I missed a few non-rendering code points in the "zero width"
classification code. Add them now, and sort the list.

Finding them is an annoyingly manual process because there are various
code points that are not supposed to affect the rendering of a string of
text but are not explicitly named as such. There are other code points
that, when surrounded by code points from the same chart, actually /do/
affect the rendering.

IOWs, the only way to figure this out is to grep the likely code points
and then go figure out how each of them renders by reading the Unicode
spec or trying it.

$ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
$ grep -E '(separator|zero width|invisible|joiner|application)' -i UnicodeData.txt

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 96e20114c..edc32d55c 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -351,15 +351,19 @@ name_entry_examine(
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
+		case 0x034F:	/* combining grapheme joiner */
 		case 0x200B:	/* zero width space */
 		case 0x200C:	/* zero width non-joiner */
 		case 0x200D:	/* zero width joiner */
-		case 0xFEFF:	/* zero width non breaking space */
+		case 0x2028:	/* line separator */
+		case 0x2029:	/* paragraph separator */
 		case 0x2060:	/* word joiner */
 		case 0x2061:	/* function application */
 		case 0x2062:	/* invisible times (multiply) */
 		case 0x2063:	/* invisible separator (comma) */
 		case 0x2064:	/* invisible plus (addition) */
+		case 0x2D7F:	/* tifinagh consonant joiner */
+		case 0xFEFF:	/* zero width non breaking space */
 			*badflags |= UNICRASH_ZERO_WIDTH;
 			break;
 		}

From patchwork Tue Jul 30 01:06:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746108
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F3CB7464
 for ; Tue, 30 Jul 2024 01:06:50 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301611; cv=none;
 b=svFh1aikdFHQvAOCdZBZQcBJUdKSgzAeLow/o2yNSl0cr4yo6h9GfMVhtVnmOEBR28ZfAkqkOpjO/ZpE6ElgvA70nTtoLk0xOb/92b0NQaeZxfMfRx+sgWBxydOLAUA6jpOOmsbYOsg0MMw3IuCbCKu6UDXlJEwskwmLwEXXCho=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301611; c=relaxed/simple;
 bh=rGE3BySm7eCOCtJgnWsCCfcISzTN6G7a/AXDRMNmv7g=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=lLu1XdCJBRYkux/9BTqpmE74CTBsTtyipqi2vYX7bePEPSGHmPq7MT3Xg9/jjvTE1A+gsdLAHtfc9ObnpLjonCUZc3UfNAJ5Deph0niQnguIgqivh+8QEedkGh/7qtc/gSjY8q4GmzDeXc6LouOmLv8KQ88V+K03T4OebPEPom0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JCC8Vfpe;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JCC8Vfpe"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9E757C32786;
 Tue, 30 Jul 2024 01:06:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301610;
 bh=rGE3BySm7eCOCtJgnWsCCfcISzTN6G7a/AXDRMNmv7g=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=JCC8Vfpea31Wm/P/DOzHcyHGieDBaV1Qo1cmjhvaHMLqanPw3FAkBe0gfxEcXJwrr
 bKUHbYS1wV+URJE8N5HI6LADKMkQUODqU5VY54B6reIwnjF18LErNImRMiectXWMUT
 DdtQgntQQj8/Ieyi8PpzdiITrzuQJzmLs9KtNu97yiSiheUSuHqcgT1ZKJZ9nX8RF2
 GKnKB4VLl6XOYNS6CMlAeXduH+EccrF1+SY0dDWQnPTXu7F9+5+acBwOid6i47cPAK
 Kve/gdsHArRH5dudome9TUZThqjUD93bJ9ukPaYJP0iGhGTWzPuyCf4SMKw9kQryoM
 Q256ZxX0OOnDQ==
Date: Mon, 29 Jul 2024 18:06:50 -0700
Subject: [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847591.1348850.7263037580833766617.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Change the function declaration of unicrash_add to set the caller's
@new_entry to NULL if we detect an updated name entry and do not wish to
continue processing. This avoids a theoretical UAF if the unicrash_add
caller were to accidentally continue using the pointer.

This isn't an /actual/ UAF because the function formerly set @badflags
to zero, but let's be a little defensive.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index edc32d55c..4517e2bce 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -628,10 +628,11 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."),
 static void
 unicrash_add(
 	struct unicrash		*uc,
-	struct name_entry	*new_entry,
+	struct name_entry	**new_entryp,
 	unsigned int		*badflags,
 	struct name_entry	**existing_entry)
 {
+	struct name_entry	*new_entry = *new_entryp;
 	struct name_entry	*entry;
 	size_t			bucket;
 	xfs_dahash_t		hash;
@@ -654,7 +655,7 @@ unicrash_add(
 			entry->ino = new_entry->ino;
 			uc->buckets[bucket] = new_entry->next;
 			name_entry_free(new_entry);
-			*badflags = 0;
+			*new_entryp = NULL;
 			return;
 		}
 
@@ -697,8 +698,8 @@ __unicrash_check_name(
 		return 0;
 
 	name_entry_examine(new_entry, &badflags);
-	unicrash_add(uc, new_entry, &badflags, &dup_entry);
-	if (badflags)
+	unicrash_add(uc, &new_entry, &badflags, &dup_entry);
+	if (new_entry && badflags)
 		unicrash_complain(uc, dsc, namedescr, new_entry, badflags,
 				dup_entry);
 

From patchwork Tue Jul 30 01:07:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746109
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E63479CC
 for ; Tue, 30 Jul 2024 01:07:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301626; cv=none;
 b=WRvnGGzuDFZHu9GB2bXJ2felSNkrkKzizM22PjeumHtZoJpCDY7tDcoRJ8InPUS98AzXDJTCOhxDsa59H9lxLTqP5UoeDhoZ7tqMuEDVItEacugdoi9F/kveSOmCsu2fDELjfT3rp7GENheZ2KeTC4wlEzUr3fKWU5/SR/vcMR4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301626; c=relaxed/simple;
 bh=HV883gwiwCo86qWToZxiQrD1tSoKqRiYuZ4qTl1jfno=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=csrGD/TJ64CCDNqDcoDUDudioPR2XxZBAavr8gnD/FoPadU50UxTd7OjnQzhOBXh3rpMYZF53bMN5Zw3puQ3LDOYva8JSwaMtWOCN2eoDWCMctQqkwOwx8tQCM4guGQojTRLbaqnIJJIyRCk7OL68eKMHrt0PKwf+pu07kXUp/M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qG0ad+px;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qG0ad+px"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 49715C4AF0B;
 Tue, 30 Jul 2024 01:07:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301626;
 bh=HV883gwiwCo86qWToZxiQrD1tSoKqRiYuZ4qTl1jfno=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=qG0ad+pxKIjAjg74HoYHarHddbobMWWkT1pr1St/i5c8uBKEawWdWJDyEIY/Yrm3P
 hX43Y4YOghunzRVH44FO52BlddbNbJ28meJYyub7+gW/hWvSVC+vVYaw9Y/NqS+izW
 5RBnauhtty6Rba/qUhgSNRes6Odyg+6sdMXCBji7SNqtaZ378g0bj1uJxB9Nbgju62
 80GVKbCCrCY/fk/wEJq3OMURwzWTpIOeyoxts1/AKMVqXNzoBn60jqilOQgrUCAC+M
 NfeICrzXpi/hjPl/pWSoU1heUTiBspOz2/yfU8BfGbF7FLlYcL6vWSlkhoh/s6cEoG
 WKMW8JTfmi1Uw==
Date: Mon, 29 Jul 2024 18:07:05 -0700
Subject: [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847607.1348850.1430158585543881123.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

The libicu functions u_strFromUTF8, unorm2_normalize, and
uspoof_getSkeleton report string lengths as int32_t values (the first
through an output parameter, the other two as return values). Guard
against negative lengths, even though the library itself never produces
them.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 4517e2bce..456caec27 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -189,7 +189,7 @@ name_entry_compute_checknames(
 
 	/* Convert bytestr to unistr for normalization */
 	u_strFromUTF8(NULL, 0, &unistrlen, entry->name, entry->namelen, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || unistrlen < 0)
 		return false;
 	uerr = U_ZERO_ERROR;
 	unistr = calloc(unistrlen + 1, sizeof(UChar));
@@ -203,7 +203,7 @@ name_entry_compute_checknames(
 	/* Normalize the string. */
 	normstrlen = unorm2_normalize(uc->normalizer, unistr, unistrlen, NULL,
 			0, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || normstrlen < 0)
 		goto out_unistr;
 	uerr = U_ZERO_ERROR;
 	normstr = calloc(normstrlen + 1, sizeof(UChar));
@@ -217,7 +217,7 @@ name_entry_compute_checknames(
 	/* Compute skeleton. */
 	skelstrlen = uspoof_getSkeleton(uc->spoof, 0, unistr, unistrlen, NULL,
 			0, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || skelstrlen < 0)
 		goto out_normstr;
 	uerr = U_ZERO_ERROR;
 	skelstr = calloc(skelstrlen + 1, sizeof(UChar));

From patchwork Tue Jul 30 01:07:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746110
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 147C58BF0
 for ; Tue, 30 Jul 2024 01:07:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301642; cv=none;
 b=LM8wey076asjmdnDETNsqI6TBHpdBd9b3iFT25MgjkK4kKY+tsEIzDgCr+5DcHGBqGoej22SkqIrtBVN9GQUoCKC9kf+ehftcXQrUQmv+bptsGaHE/a9fRGzsNf1TW6QmrBeis+LjyrzUSCYU/dSGpVMYmh3poB9zyYIrUDN2pw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301642; c=relaxed/simple;
 bh=8m8UtefwAmoeq70hmZKMNubb1EQJy079huBxObUcYUY=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=WO2XzbwUWEmBW4DSLt+ybGAzYSQupkY3r5W0yWtQTfXvCubk2OWsLIsxpIeRNqSzA+oBGXgwGgLABjqyd+5A8eG3+DA89KfiCx59iiOXYiUnBAC+fzlCZ+lRCfYnNMlXlupIATZVoQnFBn67i8a3FF8IG063irFZxfwNh4RjshM=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org
 header.i=@kernel.org header.b=MQSpBGob;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MQSpBGob"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id E1B8FC32786;
 Tue, 30 Jul 2024 01:07:21 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301641;
 bh=8m8UtefwAmoeq70hmZKMNubb1EQJy079huBxObUcYUY=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=MQSpBGob3VETbiovm6+i7H9aiXftDznIXQA82HGhMZFPu6vxAMQXLJPzHR6WyjWWF
 xIJCc5Z5CWUB2i2/5V7WgzFVbEm4sfngrHCFdaVibfvNoYTwbV12i8OPqEo6MwP9RT
 Zga0zxoFiUMCWq+2Yfnk8+K/kzji9P0cVaGT7/9oQqiU6b86r6VTvhmn8LVMqIOm42
 cSwiEEq7i4eWLW0HKZl2V1VWzBPg/xxLv1X6OGFovQDeDMcufDQLb3obALfajV6Kpf
 zi6LvKcgBS6TtF/lxQKouWmRYbOeW1/KPlJblPWfdHPmyoM18ASXGSOPrXWnnqWr3B
 /n+3II9j8Bjsw==
Date: Mon, 29 Jul 2024 18:07:21 -0700
Subject: [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847622.1348850.4728182139864049922.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Hoist this predicate code into its own function; we're going to use it
elsewhere later on. While we're at it, document how we generated this
list in the first place.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 49 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 456caec27..1a86b5f8c 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -170,6 +170,36 @@ remove_ignorable(
 	return dest;
 }
 
+/*
+ * Certain unicode codepoints are formatting hints that are not themselves
+ * supposed to be rendered by a display system. These codepoints can be
+ * encoded in file names to try to confuse users.
+ *
+ * Download https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt and
+ * $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
+ */
+static inline bool is_nonrendering(UChar32 uchr)
+{
+	switch (uchr) {
+	case 0x034F:	/* combining grapheme joiner */
+	case 0x200B:	/* zero width space */
+	case 0x200C:	/* zero width non-joiner */
+	case 0x200D:	/* zero width joiner */
+	case 0x2028:	/* line separator */
+	case 0x2029:	/* paragraph separator */
+	case 0x2060:	/* word joiner */
+	case 0x2061:	/* function application */
+	case 0x2062:	/* invisible times (multiply) */
+	case 0x2063:	/* invisible separator (comma) */
+	case 0x2064:	/* invisible plus (addition) */
+	case 0x2D7F:	/* tifinagh consonant joiner */
+	case 0xFEFF:	/* zero width non breaking space */
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Generate normalized form and skeleton of the name. If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -349,24 +379,9 @@ name_entry_examine(
 
 	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
-		/* zero width character sequences */
-		switch (uchr) {
-		case 0x034F:	/* combining grapheme joiner */
-		case 0x200B:	/* zero width space */
-		case 0x200C:	/* zero width non-joiner */
-		case 0x200D:	/* zero width joiner */
-		case 0x2028:	/* line separator */
-		case 0x2029:	/* paragraph separator */
-		case 0x2060:	/* word joiner */
-		case 0x2061:	/* function application */
-		case 0x2062:	/* invisible times (multiply) */
-		case 0x2063:	/* invisible separator (comma) */
-		case 0x2064:	/* invisible plus (addition) */
-		case 0x2D7F:	/* tifinagh consonant joiner */
-		case 0xFEFF:	/* zero width non breaking space */
+		/* characters are invisible */
+		if (is_nonrendering(uchr))
 			*badflags |= UNICRASH_ZERO_WIDTH;
-			break;
-		}
 
 		/* control characters */
 		if (u_iscntrl(uchr))

From patchwork Tue Jul 30 01:07:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13746111
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF3E48BF0
 for ; Tue, 30 Jul 2024 01:07:37 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1722301657; cv=none;
 b=SwOj69QdWaPzYYCTBayc9TTwhnZyBk1c5JGyYTZADG3Q1RxuftS7I8mSoiF1Mb3VJM2kDX4zvn/u60IEPUEgYmCRgu82D1APMkP8mdfcVeOhqAe+mdYlWJHnCqU+OElZe8IQmi4F2pwMHQn0jr0eNOsU1a/ywwjZLbxAMfBWois=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1722301657; c=relaxed/simple;
 bh=jpLr/KwytkeSaemdnI8DqHQ1S73Vk7y7pXf9ZuV9x3E=;
 h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
 MIME-Version:Content-Type;
 b=rPj8/cRgLNyiw/fT4tpmmlxtAqyOSnrNy0SfXv59qgTSvjd09EvZUrk4jsRdTc2Lp7mciZsEvB2ghaJM0xsA/+10zC3QuDXt5wdj/4St6Mz65lCFo7KhE/l0Fm0QXdpiVx5g9GdwCNJBzvHekGpF9Wi848mQoaY3icJSIcWtY8k=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bGeZjkum;
 arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bGeZjkum"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8B73AC32786;
 Tue, 30 Jul 2024 01:07:37 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1722301657;
 bh=jpLr/KwytkeSaemdnI8DqHQ1S73Vk7y7pXf9ZuV9x3E=;
 h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
 b=bGeZjkumSZ3bXw8SSyNTWRpA53n8v1cqjF4e8ewuBMUVkLwotSe09izlz41VpA2b/
 bIphWXIbPxqsheOL7Hznny88ZY+HSATzRKSW4LoE5XcPDlB8qlMJPfTSKNw9//2wOc
 QosDgRppKkaJTJ8Zw+ExwEFUeXIAf15mnY7zPYciqkl2c7CgOb7xt12szj5vn81fab
 a7Y2OaHeeDXZiHJS5D4sBO9dsA+HzVDF4gPyGP4H6bLrqHDuskV4Lyo7IWN7qN6/mS
 PqGra2I4Q+/kGcKzGQGtyRS+xgGG01JjfTlbnbCruc9kr8gTny3APY3QlPy4y/rBut
 ulVt0kuziPIAg==
Date: Mon, 29 Jul 2024 18:07:37 -0700
Subject: [PATCH 07/13] xfs_scrub: store bad flags with the name entry
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847636.1348850.13236039975048852780.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

When scrub is checking unicode names, there are certain properties of
the directory/attribute/label name itself that it can complain about.
Store these in struct name_entry so that the confusable names detector
can pick this up later.

This restructuring enables a subsequent patch to detect suspicious
sequences in the NFC normalized form of the name without needing to hang
on to that NFC form until the end of processing. IOWs, it's a memory
usage optimization.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 122 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 64 insertions(+), 58 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 1a86b5f8c..e98f850ab 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -69,6 +69,9 @@ struct name_entry {
 
 	xfs_ino_t		ino;
 
+	/* Everything that we don't like about this name. */
+	unsigned int		badflags;
+
 	/* Raw dirent name */
 	size_t			namelen;
 	char			name[0];
@@ -276,6 +279,55 @@ name_entry_compute_checknames(
 	return false;
 }
 
+/*
+ * Check a name for suspicious elements that have appeared in filename
+ * spoofing attacks. This includes names that mixed directions or contain
+ * direction overrides control characters, both of which have appeared in
+ * filename spoofing attacks.
+ */
+static unsigned int
+name_entry_examine(
+	const struct name_entry	*entry)
+{
+	UCharIterator		uiter;
+	UChar32			uchr;
+	uint8_t			mask = 0;
+	unsigned int		ret = 0;
+
+	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
+		/* characters are invisible */
+		if (is_nonrendering(uchr))
+			ret |= UNICRASH_ZERO_WIDTH;
+
+		/* control characters */
+		if (u_iscntrl(uchr))
+			ret |= UNICRASH_CONTROL_CHAR;
+
+		switch (u_charDirection(uchr)) {
+		case U_LEFT_TO_RIGHT:
+			mask |= 0x01;
+			break;
+		case U_RIGHT_TO_LEFT:
+			mask |= 0x02;
+			break;
+		case U_RIGHT_TO_LEFT_OVERRIDE:
+			ret |= UNICRASH_BIDI_OVERRIDE;
+			break;
+		case U_LEFT_TO_RIGHT_OVERRIDE:
+			ret |= UNICRASH_BIDI_OVERRIDE;
+			break;
+		default:
+			break;
+		}
+	}
+
+	/* mixing left-to-right and right-to-left chars */
+	if (mask == 0x3)
+		ret |= UNICRASH_BIDI_MIXED;
+	return ret;
+}
+
 /* Create a new name entry, returns false if we could not succeed. */
 static bool
 name_entry_create(
@@ -301,6 +353,7 @@ name_entry_create(
 	if (!name_entry_compute_checknames(uc, new_entry))
 		goto out;
 
+	new_entry->badflags = name_entry_examine(new_entry);
 	*entry = new_entry;
 	return true;
 
@@ -362,54 +415,6 @@ name_entry_hash(
 	}
 }
 
-/*
- * Check a name for suspicious elements that have appeared in filename
- * spoofing attacks. This includes names that mixed directions or contain
- * direction overrides control characters, both of which have appeared in
- * filename spoofing attacks.
- */ -static void -name_entry_examine( - struct name_entry *entry, - unsigned int *badflags) -{ - UCharIterator uiter; - UChar32 uchr; - uint8_t mask = 0; - - uiter_setString(&uiter, entry->normstr, entry->normstrlen); - while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) { - /* characters are invisible */ - if (is_nonrendering(uchr)) - *badflags |= UNICRASH_ZERO_WIDTH; - - /* control characters */ - if (u_iscntrl(uchr)) - *badflags |= UNICRASH_CONTROL_CHAR; - - switch (u_charDirection(uchr)) { - case U_LEFT_TO_RIGHT: - mask |= 0x01; - break; - case U_RIGHT_TO_LEFT: - mask |= 0x02; - break; - case U_RIGHT_TO_LEFT_OVERRIDE: - *badflags |= UNICRASH_BIDI_OVERRIDE; - break; - case U_LEFT_TO_RIGHT_OVERRIDE: - *badflags |= UNICRASH_BIDI_OVERRIDE; - break; - default: - break; - } - } - - /* mixing left-to-right and right-to-left chars */ - if (mask == 0x3) - *badflags |= UNICRASH_BIDI_MIXED; -} - /* Initialize the collision detector. */ static int unicrash_init( @@ -640,17 +645,17 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."), * must be skeletonized according to Unicode TR39 to detect names that * could be visually confused with each other. */ -static void +static unsigned int unicrash_add( struct unicrash *uc, struct name_entry **new_entryp, - unsigned int *badflags, struct name_entry **existing_entry) { struct name_entry *new_entry = *new_entryp; struct name_entry *entry; size_t bucket; xfs_dahash_t hash; + unsigned int badflags = new_entry->badflags; /* Store name in hashtable. */ hash = name_entry_hash(new_entry); @@ -671,28 +676,30 @@ unicrash_add( uc->buckets[bucket] = new_entry->next; name_entry_free(new_entry); *new_entryp = NULL; - return; + return 0; } /* Same normalization? */ if (new_entry->normstrlen == entry->normstrlen && !u_strcmp(new_entry->normstr, entry->normstr) && (uc->compare_ino ? 
entry->ino != new_entry->ino : true)) { - *badflags |= UNICRASH_NOT_UNIQUE; + badflags |= UNICRASH_NOT_UNIQUE; *existing_entry = entry; - return; + break; } /* Confusable? */ if (new_entry->skelstrlen == entry->skelstrlen && !u_strcmp(new_entry->skelstr, entry->skelstr) && (uc->compare_ino ? entry->ino != new_entry->ino : true)) { - *badflags |= UNICRASH_CONFUSABLE; + badflags |= UNICRASH_CONFUSABLE; *existing_entry = entry; - return; + break; } entry = entry->next; } + + return badflags; } /* Check a name for unicode normalization problems or collisions. */ @@ -706,14 +713,13 @@ __unicrash_check_name( { struct name_entry *dup_entry = NULL; struct name_entry *new_entry = NULL; - unsigned int badflags = 0; + unsigned int badflags; /* If we can't create entry data, just skip it. */ if (!name_entry_create(uc, name, ino, &new_entry)) return 0; - name_entry_examine(new_entry, &badflags); - unicrash_add(uc, &new_entry, &badflags, &dup_entry); + badflags = unicrash_add(uc, &new_entry, &dup_entry); if (new_entry && badflags) unicrash_complain(uc, dsc, namedescr, new_entry, badflags, dup_entry); From patchwork Tue Jul 30 01:07:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13746112 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B3F18BF0 for ; Tue, 30 Jul 2024 01:07:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301673; cv=none; b=svM8kWj4KVlrPCRtFsgCAnLPEaanc4cZ171yiAenCPNgnGtc+wKJiWB2a4HAffFDQKs9I1Kba3wxC2b3dlkb0C8FcvqLOmA+8AK1nkJOXbMIrVG9Naj7VKaI6lFhYIbT1KoVfF+S3b7VMnoZ+lOskWrFQ99R0/5xmkSNqTUryIo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301673; c=relaxed/simple; bh=BYsCQ2kUyIy6LU2tU4ROVz/qsex4Kp3i7vMDKYyzdmk=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ioCpXTX7Roy06zOmHsK+JuhRLu7w2z3M/NhTk12+pUIx4xWAe/s2SnyvbKUSJKqyJbWJwuBBOdV3zs81XtH1VaQpOaBPTSqLS3ViWmzNl/JTQJXnaDtcuuxJaTm9+WHSDSAbazROTcw8ROSfXP2lLYHvu758edjj7i4CC7tRpyY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=MjzLn3iw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MjzLn3iw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 62923C32786; Tue, 30 Jul 2024 01:07:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1722301673; bh=BYsCQ2kUyIy6LU2tU4ROVz/qsex4Kp3i7vMDKYyzdmk=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=MjzLn3iw8wNVE+xIYEBfIKuBB+Uv8gcDs8b4uYQWyfgBu22brNPcIzD+ThKlgNDpN NSHwHUQGe0CZqFfIwmihhPD0TbjV94/9RpGiQklKo0UovCEpvCBsEuePHDE2FFXK/z dlkhyqlHtc6uta3areUa5Y6IbM2SIQOjoTrqk9mP4YIrkelPOt9eZdby+8OQtrnH6d 
wHfb82A1hQjPZSx2ofuFJeBHYuo2OpCM/fk4z3wpghh9f8ovDvJNruqt/YubfOgB8y taegUju47XSNAz9TUNIqeg8SgT0+DlynfWJ2yutt6bvDTrZK9x/KFtub3FXf2jFDWq qkXElpIYfyRmw== Date: Mon, 29 Jul 2024 18:07:52 -0700 Subject: [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: Christoph Hellwig , linux-xfs@vger.kernel.org Message-ID: <172229847652.1348850.6582462888057187832.stgit@frogsfrogsfrogs> In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong "Zero width" doesn't fully describe what the flag represents -- it gets set for any codepoint that doesn't render. Rename it accordingly. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index e98f850ab..5447d94f0 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -109,7 +109,7 @@ struct unicrash { #define UNICRASH_CONTROL_CHAR (1 << 3) /* Invisible characters. Only a problem if we have collisions. */ -#define UNICRASH_ZERO_WIDTH (1 << 4) +#define UNICRASH_INVISIBLE (1 << 4) /* Multiple names resolve to the same skeleton string. */ #define UNICRASH_CONFUSABLE (1 << 5) @@ -298,7 +298,7 @@ name_entry_examine( while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) { /* characters are invisible */ if (is_nonrendering(uchr)) - ret |= UNICRASH_ZERO_WIDTH; + ret |= UNICRASH_INVISIBLE; /* control characters */ if (u_iscntrl(uchr)) @@ -582,7 +582,7 @@ _("Unicode name \"%s\" in %s renders identically to \"%s\"."), * confused with another name as a result, we should complain. * "moocow" and "moocow" are misleading. 
*/ - if ((badflags & UNICRASH_ZERO_WIDTH) && + if ((badflags & UNICRASH_INVISIBLE) && (badflags & UNICRASH_CONFUSABLE)) { str_warn(uc->ctx, descr_render(dsc), _("Unicode name \"%s\" in %s could be confused with '%s' due to invisible characters."), From patchwork Tue Jul 30 01:08:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13746113 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 44D4B8BF0 for ; Tue, 30 Jul 2024 01:08:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301689; cv=none; b=hwTPVMtge9+Nyra+8PrCnA7nwZQrzJoSQYaaX5SdC/+0LlfLmLjViVvW4ncbBgVJ1i4jm39+0OQKVKrYkeBMntFKmyIwg+IP2XAI+nvnXF0WggoqRy5NcdtgMlxngfU52fIz5+9siLCfKLnz2iW3y27EHkSf6jgh1gMWL/MA1TU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301689; c=relaxed/simple; bh=+T2svynhSRS4pq4cYgWTHmIIHeR5985xnhZugHYBkkc=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hMV7RZKY5aar2Uq5BqjPFxOaMF5DC7nXH3gsHGY8CgcCWnG069zaDutZQgQuTftBteQP+kuA9P5PsbTaNLFf7vY0jKTE73E+gzBODMGcRaXtEFj3OVvOjfYn8Dl//pod1I2qRmBZQVVOKp8eN8tbmtmaIfhHcgQKS2ld6EI0nIs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fWfuA8qR; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fWfuA8qR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1951EC32786; Tue, 30 Jul 2024 01:08:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=kernel.org; s=k20201202; t=1722301689; bh=+T2svynhSRS4pq4cYgWTHmIIHeR5985xnhZugHYBkkc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=fWfuA8qR/zQcpQWAuLCY8l4zeQOpvIslTRIk/+HVqo/0uL/15r+ADuiWMezwWFOqp 1ko26iyQqjtuqOL1Ud/SaqKHQ4tIgw8+nMaUHdEx3z67PJGDuR/hN1lqDlmPRMeYs8 VwTue8GMzu39HmNsV2zobfIT7pcRp4e/er7yW7UYGUgfQZ1aOail+1FpAL8xq2rVAn YA3y7jkDH31GSBftxIvI1EF+qc+3mx+x0L3uPudO03mi+8H07WykgcWSJ5MR8O4PET 2sF6GKt9JCHaFKjIThFZGRN2KjaeVhs8dbyWwY+jBVYcrfMlhA6DjuRxXaJYE+wGN9 yJCuIJPTC8Ysw== Date: Mon, 29 Jul 2024 18:08:08 -0700 Subject: [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: Christoph Hellwig , linux-xfs@vger.kernel.org Message-ID: <172229847667.1348850.1406694167703531231.stgit@frogsfrogsfrogs> In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Promote this type to something that we can type-check. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index 5447d94f0..63694c39a 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -4,6 +4,7 @@ * Author: Darrick J. Wong */ #include "xfs.h" +#include "xfs_arch.h" #include #include #include @@ -56,6 +57,8 @@ * In other words, skel = remove_invisible(nfd(remap_confusables(nfd(name)))). */ +typedef unsigned int __bitwise badname_t; + struct name_entry { struct name_entry *next; @@ -70,7 +73,7 @@ struct name_entry { xfs_ino_t ino; /* Everything that we don't like about this name. 
*/ - unsigned int badflags; + badname_t badflags; /* Raw dirent name */ size_t namelen; @@ -93,26 +96,29 @@ struct unicrash { /* Things to complain about in Unicode naming. */ +/* Everything is ok */ +#define UNICRASH_OK ((__force badname_t)0) + /* * Multiple names resolve to the same normalized string and therefore render * identically. */ -#define UNICRASH_NOT_UNIQUE (1 << 0) +#define UNICRASH_NOT_UNIQUE ((__force badname_t)(1U << 0)) /* Name contains directional overrides. */ -#define UNICRASH_BIDI_OVERRIDE (1 << 1) +#define UNICRASH_BIDI_OVERRIDE ((__force badname_t)(1U << 1)) /* Name mixes left-to-right and right-to-left characters. */ -#define UNICRASH_BIDI_MIXED (1 << 2) +#define UNICRASH_BIDI_MIXED ((__force badname_t)(1U << 2)) /* Control characters in name. */ -#define UNICRASH_CONTROL_CHAR (1 << 3) +#define UNICRASH_CONTROL_CHAR ((__force badname_t)(1U << 3)) /* Invisible characters. Only a problem if we have collisions. */ -#define UNICRASH_INVISIBLE (1 << 4) +#define UNICRASH_INVISIBLE ((__force badname_t)(1U << 4)) /* Multiple names resolve to the same skeleton string. */ -#define UNICRASH_CONFUSABLE (1 << 5) +#define UNICRASH_CONFUSABLE ((__force badname_t)(1U << 5)) /* * We only care about validating utf8 collisions if the underlying @@ -542,7 +548,7 @@ unicrash_complain( struct descr *dsc, const char *what, struct name_entry *entry, - unsigned int badflags, + badname_t badflags, struct name_entry *dup_entry) { char *bad1 = NULL; @@ -645,7 +651,7 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."), * must be skeletonized according to Unicode TR39 to detect names that * could be visually confused with each other. */ -static unsigned int +static badname_t unicrash_add( struct unicrash *uc, struct name_entry **new_entryp, @@ -655,7 +661,7 @@ unicrash_add( struct name_entry *entry; size_t bucket; xfs_dahash_t hash; - unsigned int badflags = new_entry->badflags; + badname_t badflags = new_entry->badflags; /* Store name in hashtable. 
*/ hash = name_entry_hash(new_entry); @@ -713,14 +719,14 @@ __unicrash_check_name( { struct name_entry *dup_entry = NULL; struct name_entry *new_entry = NULL; - unsigned int badflags; + badname_t badflags; /* If we can't create entry data, just skip it. */ if (!name_entry_create(uc, name, ino, &new_entry)) return 0; badflags = unicrash_add(uc, &new_entry, &dup_entry); - if (new_entry && badflags) + if (new_entry && badflags != UNICRASH_OK) unicrash_complain(uc, dsc, namedescr, new_entry, badflags, dup_entry); From patchwork Tue Jul 30 01:08:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13746114 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E6E828BF0 for ; Tue, 30 Jul 2024 01:08:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301705; cv=none; b=g66qtxU1kEGtYc8LsBqaAQqFnTNzdL1IerT8b4qi0xBjZSykdeNFaAAat+PawHJE5eTUvqqHxIe1KXODpZeI3IkdNS6jS06bQwYUbdVc2Cyp6tqQsObtenHQPZaDcuYbmzbFX75QlTZLl9sotLPE7rR8bS9ZtuvzdNmrga5Dxkg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301705; c=relaxed/simple; bh=ITS7lNaXEw3O6E5pGSkr2qOTyaAOFABUt3QXmb2c5fY=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=E3B7pZa9ZKn/KNv5R6QpKfyWd+rZa8TDDVvv/DJs9SpAL98cHlXGmm44p0I03/rI6f94p53FzoMNYOCpWXEpgqBjkNPx4uXVXL/jIwouIg5sozNvz8EWLly33yhDpWexYCdAFdzWd3eiapD5qM1+eFGyCgg7baTPH7/Oojkjs0U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=tzzT2DxE; arc=none smtp.client-ip=10.30.226.201 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tzzT2DxE" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B8E12C32786; Tue, 30 Jul 2024 01:08:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1722301704; bh=ITS7lNaXEw3O6E5pGSkr2qOTyaAOFABUt3QXmb2c5fY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=tzzT2DxEoeZRAiASpSvjaIxSVWxbeuZAth+IBegb6Q+V9r9EIcnEqRKf3NErCacSx tmG5j8gD8Uio4NBztq6cpLG8Lcgb2VbcEmxOBrw6+rfuDvViRZjaEV/+YbQxe28E2B 2/8kNaKGkiDu2f0FBG6EyAwliigUUr35vcUTNLNxEQEJnnicknU/a+YryjAsVoKsF7 E2YN/yyejkuaVqxrljJ714OOKWEKVoLGsLUFlONJEyRsMkAg8/z6GQuhttb899fO7P 3Rq44D8WIoQhZ7J4Wx3CfirFlbYh13pOvtmbUFrl2joIlDVi+kgX+OZ4rCj5tNbvD2 Ec7WrZcWMvCtQ== Date: Mon, 29 Jul 2024 18:08:24 -0700 Subject: [PATCH 10/13] xfs_scrub: reduce size of struct name_entry From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: Christoph Hellwig , linux-xfs@vger.kernel.org Message-ID: <172229847681.1348850.5157188790717732185.stgit@frogsfrogsfrogs> In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong libicu doesn't support processing strings longer than 2GB in length, and we never feed the unicrash code a name longer than about 300 bytes. Rearrange the structure to reduce the head structure size from 56 bytes to 44 bytes. Signed-off-by: Darrick J. 
Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index 63694c39a..74c0fe1f9 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -57,18 +57,20 @@ * In other words, skel = remove_invisible(nfd(remap_confusables(nfd(name)))). */ -typedef unsigned int __bitwise badname_t; +typedef uint16_t __bitwise badname_t; struct name_entry { struct name_entry *next; /* NFKC normalized name */ UChar *normstr; - size_t normstrlen; /* Unicode skeletonized name */ UChar *skelstr; - size_t skelstrlen; + + /* Lengths for normstr and skelstr */ + int32_t normstrlen; + int32_t skelstrlen; xfs_ino_t ino; @@ -76,7 +78,7 @@ struct name_entry { badname_t badflags; /* Raw dirent name */ - size_t namelen; + uint16_t namelen; char name[0]; }; #define NAME_ENTRY_SZ(nl) (sizeof(struct name_entry) + 1 + \ @@ -345,6 +347,12 @@ name_entry_create( struct name_entry *new_entry; size_t namelen = strlen(name); + /* should never happen */ + if (namelen > UINT16_MAX) { + ASSERT(namelen <= UINT16_MAX); + return false; + } + /* Create new entry */ new_entry = calloc(NAME_ENTRY_SZ(namelen), 1); if (!new_entry) From patchwork Tue Jul 30 01:08:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13746115 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E53AD10E3 for ; Tue, 30 Jul 2024 01:08:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301721; cv=none; b=DXd9+P+dvAmmO6lM4ecNaB0eGo3l7z9R0gTVhwg3W0Ev1vvtaFMWsDB8tFaSdut35siCtRtLvAMRvIQ/6T53JWCnARDYjhEMBDBZx7J1VDTc1rPSgqjrh4z/1upEQEFyRk/DwHIR/nzRQ937a09RVCdl+oDU2FICMryVj+eO5o4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301721; c=relaxed/simple; bh=dOyOyTnSQVlRDMxGss5Ydyf63y30U3aB4uSHZb9bq84=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=EkKf6VCPLwd5GIZrlsHMCKWEOablOuFirnkHgns8m1UfFkM+rc0LbB5efeEeBmVJS/jvaXHtVBKzQwaD84Y9eoeg0XWbuRslQvqjUWlQ/5n5INIrAQqYFcNe20Q9ktPTTJRNS61TM1x4EfX4yI0ld5t8UtgrMx4AoUhV3gjN0+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dOYrtNBL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dOYrtNBL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6D0A8C32786; Tue, 30 Jul 2024 01:08:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1722301720; bh=dOyOyTnSQVlRDMxGss5Ydyf63y30U3aB4uSHZb9bq84=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=dOYrtNBLWztuXrh7h1Y7txQ3MdXbnjl1sfmcEVU/ijNBYWW05CRlK0eRfBWwNXCQY //mMHmpmNCSzXWrMaFaowvCkfqGU3J44dhF/2dGJARpToVr7FYjBH1pMs253hv4Z3f iIEly2DjruGhsC+gx3rYE/KsRr15jOq6OXZFt2zYSZzEKqsXyRo16fld+jnscH0qwK 
4W60DlNtcMX50ERgvg350c/p39WH96BJ2/ifhhLsPmTio+wqjb/p7W3pbpQKv4lopO oR7iUOEOAD3WphLV/stXs4mzQsPs7c4QdgxdOyURKIxccgf4MvXMS4c4RSfVRU4J9j NUnZu7d+vcDVA== Date: Mon, 29 Jul 2024 18:08:39 -0700 Subject: [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: Christoph Hellwig , linux-xfs@vger.kernel.org Message-ID: <172229847697.1348850.293417564068933959.stgit@frogsfrogsfrogs> In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong We're about to introduce a second normalizer, so change the name of the existing one to reflect the algorithm that you'll get if you use it. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index 74c0fe1f9..9cde9afff 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -87,7 +87,7 @@ struct name_entry { struct unicrash { struct scrub_ctx *ctx; USpoofChecker *spoof; - const UNormalizer2 *normalizer; + const UNormalizer2 *nfkc; bool compare_ino; bool is_only_root_writeable; size_t nr_buckets; @@ -242,7 +242,7 @@ name_entry_compute_checknames( goto out_unistr; /* Normalize the string. 
*/ - normstrlen = unorm2_normalize(uc->normalizer, unistr, unistrlen, NULL, + normstrlen = unorm2_normalize(uc->nfkc, unistr, unistrlen, NULL, 0, &uerr); if (uerr != U_BUFFER_OVERFLOW_ERROR || normstrlen < 0) goto out_unistr; @@ -250,7 +250,7 @@ name_entry_compute_checknames( normstr = calloc(normstrlen + 1, sizeof(UChar)); if (!normstr) goto out_unistr; - unorm2_normalize(uc->normalizer, unistr, unistrlen, normstr, normstrlen, + unorm2_normalize(uc->nfkc, unistr, unistrlen, normstr, normstrlen, &uerr); if (U_FAILURE(uerr)) goto out_normstr; @@ -457,7 +457,7 @@ unicrash_init( p->ctx = ctx; p->nr_buckets = nr_buckets; p->compare_ino = compare_ino; - p->normalizer = unorm2_getNFKCInstance(&uerr); + p->nfkc = unorm2_getNFKCInstance(&uerr); if (U_FAILURE(uerr)) goto out_free; p->spoof = uspoof_open(&uerr); From patchwork Tue Jul 30 01:08:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13746116 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4231A4C97 for ; Tue, 30 Jul 2024 01:08:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301736; cv=none; b=uz8aTe/aTKiA6+Hylkb9kgY9/VYeW6fCIpdwIa3W0omHwb0SkJVGKQRujavx/Pd3NnkaxeY9ucGxTWwEI/u/XNUpjxct2v7wBqutqULsplmN2sWO/aaLJSm/B8Olj6OL9EVA4uKEzTN1hmXrJvM4Ckam/YnCox5K3+w6aXXmPoY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301736; c=relaxed/simple; bh=wJ6U9I2Kitxnbrk+33yep4gbpPEKeRssdnMl0E9uifg=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=Tts6Ib7ho/LeQWzAcVYQbKbGE+w5GYlvXlmlOid4x+mF+6HTqx0DbWbeIeP0Gwuwj91BU2Zby/L8xo/t4uCmzG/8ZxvXnX2RjN/HxUTk4l1+mzdEDKTNRdP9/PvOK+tYeWgOxQPppLWM0nCc9uHrU14wBK8iaZTqwPgMPPHRfP0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=V69yFdiA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="V69yFdiA" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1A15FC4AF07; Tue, 30 Jul 2024 01:08:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1722301736; bh=wJ6U9I2Kitxnbrk+33yep4gbpPEKeRssdnMl0E9uifg=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=V69yFdiAmm6BJRZ9U57L3Y9vUEcLZDniNhcqJI8MyxyPBCSn07ILwvF0r7t+Kjw7u qqLMKffFsDodo6Xf5PJwP9xZKJOjoS2K+YQwBbpClCcGt6ZmipLRpvBASZZx7akO5z +v7ShveGGJgBe4NkUb6HWypBDASFQmNTqDnDhNmgnd4GOhDt3sDCEqZv4rDwqePMvW pAdQ51Z6/RhLlkffEuVY6jNnEt1l++HUg4mYvtkfIvy5JzCyqAu3AL4U3AnFCd7AQz CNylP31Egx4Ik4tvqacRKEwmbuxhuRFcsGjGV8cQhV1aqc8G0/NfoJQAKnlPzUcQRb mKqstG65f+YEg== Date: Mon, 29 Jul 2024 18:08:55 -0700 Subject: [PATCH 12/13] xfs_scrub: report deceptive file extensions From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: Christoph Hellwig , linux-xfs@vger.kernel.org Message-ID: <172229847711.1348850.157975623845704769.stgit@frogsfrogsfrogs> In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Earlier this year, ESET revealed that Linux users had been tricked into opening executables containing malware payloads. The trickery came in the form of a malicious zip file containing a filename with the string "job offer․pdf". 
Note that the filename does *not* denote a real pdf file, since the last four codepoints in the file name are "ONE DOT LEADER", p, d, and f. Not period (ok, FULL STOP), p, d, f like you'd normally expect. Teach xfs_scrub to look for codepoints that could be confused with a period followed by alphanumerics. Link: https://www.welivesecurity.com/2023/04/20/linux-malware-strengthens-links-lazarus-3cx-supply-chain-attack/ Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 215 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 214 insertions(+), 1 deletion(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index 9cde9afff..8a896f33c 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -88,6 +88,7 @@ struct unicrash { struct scrub_ctx *ctx; USpoofChecker *spoof; const UNormalizer2 *nfkc; + const UNormalizer2 *nfc; bool compare_ino; bool is_only_root_writeable; size_t nr_buckets; @@ -122,6 +123,12 @@ struct unicrash { /* Multiple names resolve to the same skeleton string. */ #define UNICRASH_CONFUSABLE ((__force badname_t)(1U << 5)) +/* Possible phony file extension. */ +#define UNICRASH_PHONY_EXTENSION ((__force badname_t)(1U << 6)) + +/* FULL STOP (aka period), 0x2E */ +#define UCHAR_PERIOD ((UChar32)'.') + /* * We only care about validating utf8 collisions if the underlying * system configuration says we're using utf8. If the language @@ -211,6 +218,193 @@ static inline bool is_nonrendering(UChar32 uchr) return false; } +/* + * Decide if this unicode codepoint looks similar enough to a period (".") + * to fool users into thinking that any subsequent alphanumeric sequence is + * the file extension. Most of the fullstop characters do not do this. 
+ * + * $ grep -i 'full stop' UnicodeData.txt + */ +static inline bool is_fullstop_lookalike(UChar32 uchr) +{ + switch (uchr) { + case 0x0701: /* syriac supralinear full stop */ + case 0x0702: /* syriac sublinear full stop */ + case 0x2024: /* one dot leader */ + case 0xA4F8: /* lisu letter tone mya ti */ + case 0xFE52: /* small full stop */ + case 0xFF61: /* halfwidth ideographic full stop */ + case 0xFF0E: /* fullwidth full stop */ + return true; + } + + return false; +} + +/* How many UChar do we need to fit a full UChar32 codepoint? */ +#define UCHAR_PER_UCHAR32 2 + +/* Format this UChar32 into a UChar buffer. */ +static inline int32_t +uchar32_to_uchar( + UChar32 uchr, + UChar *buf) +{ + int32_t i = 0; + bool err = false; + + U16_APPEND(buf, i, UCHAR_PER_UCHAR32, uchr, err); + if (err) + return 0; + return i; +} + +/* Extract a single UChar32 code point from this UChar string. */ +static inline UChar32 +uchar_to_uchar32( + UChar *buf, + int32_t buflen) +{ + UChar32 ret; + int32_t i = 0; + + U16_NEXT(buf, i, buflen, ret); + return ret; +} + +/* + * For characters that are not themselves a full stop (0x2E), let's see if the + * compatibility normalization (NFKC) will turn it into a full stop. If so, + * then this could be the start of a phony file extension. + */ +static bool +is_period_lookalike( + struct unicrash *uc, + UChar32 uchr) +{ + UChar uchrstr[UCHAR_PER_UCHAR32]; + UChar nfkcstr[UCHAR_PER_UCHAR32]; + int32_t uchrstrlen, nfkcstrlen; + UChar32 nfkc_uchr; + UErrorCode uerr = U_ZERO_ERROR; + + if (uchr == UCHAR_PERIOD) + return false; + + uchrstrlen = uchar32_to_uchar(uchr, uchrstr); + if (!uchrstrlen) + return false; + + /* + * Normalize the UChar string to NFKC form, which does all the + * compatibility transformations.
+ */ + nfkcstrlen = unorm2_normalize(uc->nfkc, uchrstr, uchrstrlen, NULL, + 0, &uerr); + if (uerr == U_BUFFER_OVERFLOW_ERROR) + return false; + + uerr = U_ZERO_ERROR; + unorm2_normalize(uc->nfkc, uchrstr, uchrstrlen, nfkcstr, nfkcstrlen, + &uerr); + if (U_FAILURE(uerr)) + return false; + + nfkc_uchr = uchar_to_uchar32(nfkcstr, nfkcstrlen); + return nfkc_uchr == UCHAR_PERIOD; +} + +/* + * Detect directory entry names that contain deceptive sequences that look like + * file extensions but are not. This we define as a sequence that begins with + * a code point that renders like a period ("full stop" in unicode parlance) + * but is not actually a period, followed by any number of alphanumeric code + * points or a period, all the way to the end. + * + * The 3cx attack used a zip file containing an executable file named "job + * offer․pdf". Note that the dot mark in the extension is /not/ a period but + * the Unicode codepoint "one dot leader". The file was also marked executable + * inside the zip file, which meant that naïve file explorers could inflate + * the file and restore the execute bit. If a user double-clicked on the file, + * the binary would open a decoy pdf while infecting the system. + * + * For this check, we need to normalize with canonical (and not compatibility) + * decomposition, because compatibility mode will turn certain code points + * (e.g. one dot leader, 0x2024) into actual periods (0x2e). The NFC + * composition is not needed after this, so we save some memory by keeping this + * a separate function from name_entry_examine. + */ +static badname_t +name_entry_phony_extension( + struct unicrash *uc, + const UChar *unistr, + int32_t unistrlen) +{ + UCharIterator uiter; + UChar *nfcstr; + int32_t nfcstrlen; + UChar32 uchr; + bool maybe_phony_extension = false; + badname_t ret = UNICRASH_OK; + UErrorCode uerr = U_ZERO_ERROR; + + /* Normalize with NFC.
*/ + nfcstrlen = unorm2_normalize(uc->nfc, unistr, unistrlen, NULL, + 0, &uerr); + if (uerr != U_BUFFER_OVERFLOW_ERROR || nfcstrlen < 0) + return ret; + uerr = U_ZERO_ERROR; + nfcstr = calloc(nfcstrlen + 1, sizeof(UChar)); + if (!nfcstr) + return ret; + unorm2_normalize(uc->nfc, unistr, unistrlen, nfcstr, nfcstrlen, + &uerr); + if (U_FAILURE(uerr)) + goto out_nfcstr; + + /* Examine the NFC normalized string... */ + uiter_setString(&uiter, nfcstr, nfcstrlen); + while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) { + /* + * If this *looks* like, but is not, a full stop (0x2E), this + * could be the start of a phony file extension. + */ + if (is_period_lookalike(uc, uchr)) { + maybe_phony_extension = true; + continue; + } + + if (is_fullstop_lookalike(uchr)) { + /* + * The normalizer above should catch most of these + * codepoints that look like periods, but record the + * ones known to have been used in attacks. + */ + maybe_phony_extension = true; + } else if (uchr == UCHAR_PERIOD) { + /* + * Due to the propensity of file explorers to obscure + * file extensions in the name of "user friendliness", + * this classifier ignores periods. + */ + } else { + /* + * File extensions (as far as the author knows) tend + * only to use ascii alphanumerics. + */ + if (maybe_phony_extension && + !u_isalnum(uchr) && !is_nonrendering(uchr)) + maybe_phony_extension = false; + } + } + if (maybe_phony_extension) + ret |= UNICRASH_PHONY_EXTENSION; + +out_nfcstr: + free(nfcstr); + return ret; +} + /* * Generate normalized form and skeleton of the name. If this fails, just * forget everything and return false; this is an advisory checker. @@ -271,6 +465,11 @@ name_entry_compute_checknames( skelstrlen = remove_ignorable(skelstr, skelstrlen); + /* Check for deceptive file extensions in directory entry names. 
*/ + if (entry->ino) + entry->badflags |= name_entry_phony_extension(uc, unistr, + unistrlen); + entry->skelstr = skelstr; entry->skelstrlen = skelstrlen; entry->normstr = normstr; @@ -367,7 +566,7 @@ name_entry_create( if (!name_entry_compute_checknames(uc, new_entry)) goto out; - new_entry->badflags = name_entry_examine(new_entry); + new_entry->badflags |= name_entry_examine(new_entry); *entry = new_entry; return true; @@ -458,6 +657,9 @@ unicrash_init( p->nr_buckets = nr_buckets; p->compare_ino = compare_ino; p->nfkc = unorm2_getNFKCInstance(&uerr); + if (U_FAILURE(uerr)) + goto out_free; + p->nfc = unorm2_getNFCInstance(&uerr); if (U_FAILURE(uerr)) goto out_free; p->spoof = uspoof_open(&uerr); @@ -604,6 +806,17 @@ _("Unicode name \"%s\" in %s could be confused with '%s' due to invisible charac goto out; } + /* + * Fake looking file extensions have tricked Linux users into thinking + * that an executable is actually a pdf. See Lazarus 3cx attack. + */ + if (badflags & UNICRASH_PHONY_EXTENSION) { + str_warn(uc->ctx, descr_render(dsc), +_("Unicode name \"%s\" in %s contains a possibly deceptive file extension."), + bad1, what); + goto out; + } + /* * Unfiltered control characters can mess up your terminal and render * invisibly in filechooser UIs. From patchwork Tue Jul 30 01:09:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13746117 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6F15C121 for ; Tue, 30 Jul 2024 01:09:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301751; cv=none; b=G/lMcvBFtVvyovtMu8l0C34idKkECzyRTfL4dXdhYVK3lhokarHKj+QOa5dPCYCxNXFJc79mWmLetGjBW7oMeyazhTMQzhGUueP6E4DoUOY7H5K4wcmUnLJ/abE1elkoBpgCOkdhJKjwwgJK6PSMWhuKgu8ewI7+tGXA16V4ySE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722301751; c=relaxed/simple; bh=Qg6Rqr05i0QzsWtHkoH73PQVdQ1BBfoqgXOUHxS6Vn4=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=h+HqXsH+j6Ls3qMSEGi5MDGkFzG4RIK7w2ZHeX+Y33iVKQjvhLK/ib879siPRsdwIAlPrjZd764Z4PooX+eAzDyrUr5H7wQftas62/FhNq4o+HJoX2pJI5j3s21wU8D6hINU1e4aj/bjky+aJhhxRWxKz73bX+zMQSDDtzq3K40= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=L45DIodz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="L45DIodz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B0ED9C32786; Tue, 30 Jul 2024 01:09:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1722301751; bh=Qg6Rqr05i0QzsWtHkoH73PQVdQ1BBfoqgXOUHxS6Vn4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=L45DIodzbNF12K9B1skk2bWceWm91NHhJqLAhiacsaDXpe4uOpuzTWIXaiQ+HmO8j FLu9+YaiPxddZEbZe2J4buKw3MPJOe9hID0Js8GNg0M9DrmGqZD3oiSsixNv8brLa2 50GFiDud6L978ODrR5go/XBa5o9cpLTAWBVyXbzmVchcihQDDloEtbQJg1XtZ3J6N8 
QWZ7zQcnleSSALpFT7a6SGVAT5aVBqC5pCycoaiS6QI9W7JyFGw3EEqDKF7XNYK39J tFgBhO5YcSkuF6xINW4Q2Y0trxGa58eP6rCsHaPSzDzfkhbE7/2wLjyTxM7FCaUlKm +ufEUrTrIuuxQ==
Date: Mon, 29 Jul 2024 18:09:11 -0700
Subject: [PATCH 13/13] xfs_scrub: dump unicode points
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229847727.1348850.14998443199466261121.stgit@frogsfrogsfrogs>
In-Reply-To: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
References: <172229847517.1348850.11238185324580578408.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Add some debug functions to make it easier to query unicode character
properties.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 8a896f33c..143060b56 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -5,6 +5,7 @@
  */
 #include "xfs.h"
 #include "xfs_arch.h"
+#include "list.h"
 #include
 #include
 #include
@@ -1003,14 +1004,68 @@ unicrash_check_fs_label(
 			label, 0);
 }

+/* Dump a unicode code point and its properties. */
+static inline void dump_uchar32(UChar32 c)
+{
+	UChar		uchrstr[UCHAR_PER_UCHAR32];
+	const char	*descr;
+	char		buf[16];
+	int32_t		uchrstrlen, buflen;
+	UProperty	p;
+	UErrorCode	uerr = U_ZERO_ERROR;
+
+	printf("Unicode point 0x%x:", c);
+
+	/* Convert UChar32 to UTF8 representation. */
+	uchrstrlen = uchar32_to_uchar(c, uchrstr);
+	if (!uchrstrlen)
+		return;
+
+	u_strToUTF8(buf, sizeof(buf), &buflen, uchrstr, uchrstrlen, &uerr);
+	if (!U_FAILURE(uerr) && buflen > 0) {
+		int32_t i;
+
+		printf(" \"");
+		for (i = 0; i < buflen; i++)
+			printf("\\x%02x", (unsigned char)buf[i]);
+		printf("\"");
+	}
+	printf("\n");
+
+	for (p = 0; p < UCHAR_BINARY_LIMIT; p++) {
+		int	has;
+
+		descr = u_getPropertyName(p, U_LONG_PROPERTY_NAME);
+		if (!descr)
+			descr = u_getPropertyName(p, U_SHORT_PROPERTY_NAME);
+
+		has = u_hasBinaryProperty(c, p) ? 1 : 0;
+		if (descr) {
+			printf(" %s(%u) = %d\n", descr, p, has);
+		} else {
+			printf(" ?(%u) = %d\n", p, has);
+		}
+	}
+}
+
 /* Load libicu and initialize it. */
 bool
 unicrash_load(void)
 {
-	UErrorCode uerr = U_ZERO_ERROR;
+	char		*dbgstr;
+	UChar32		uchr;
+	UErrorCode	uerr = U_ZERO_ERROR;

 	u_init(&uerr);
-	return U_FAILURE(uerr);
+	if (U_FAILURE(uerr))
+		return true;
+
+	dbgstr = getenv("XFS_SCRUB_DUMP_CHAR");
+	if (dbgstr) {
+		uchr = strtol(dbgstr, NULL, 0);
+		dump_uchar32(uchr);
+	}
+	return false;
 }

 /* Unload libicu once we're done with it. */
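
For readers who want to poke at the phony-extension classifier from earlier
in the series without building against libicu, here is a rough standalone
sketch of the same state machine. It is not the code in scrub/unicrash.c.
Every sketch_* name is hypothetical, input is assumed to be already decoded
into code points, the lookalike set is reduced to U+2024 (one dot leader),
u_isalnum() is swapped for an ASCII-only test, and the is_nonrendering()
exemption is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lookalike checks; only U+2024 is listed. */
static bool sketch_is_period_lookalike(uint32_t cp)
{
	return cp == 0x2024;	/* ONE DOT LEADER */
}

/* ASCII-only substitute for u_isalnum(); an assumption of this sketch. */
static bool sketch_is_ascii_alnum(uint32_t cp)
{
	return (cp >= '0' && cp <= '9') ||
	       (cp >= 'A' && cp <= 'Z') ||
	       (cp >= 'a' && cp <= 'z');
}

/*
 * Return true if the name ends in a run that starts with a period
 * lookalike and is followed only by alphanumerics or real periods.
 */
static bool sketch_has_phony_extension(const uint32_t *cps, size_t len)
{
	bool	maybe_phony = false;
	size_t	i;

	for (i = 0; i < len; i++) {
		if (sketch_is_period_lookalike(cps[i])) {
			/* Possible start of a phony extension. */
			maybe_phony = true;
		} else if (cps[i] == 0x2E) {
			/* Real periods neither start nor end a candidate. */
			continue;
		} else if (!sketch_is_ascii_alnum(cps[i])) {
			/* Anything else ends the candidate run. */
			maybe_phony = false;
		}
	}
	return maybe_phony;
}
```

With the name from the commit message ("job offer", U+2024, "pdf") the
sketch returns true. The same name with a real 0x2E returns false, because
nothing ever arms the flag.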