Date: Mon, 01 Jul 2024 17:57:39 -0700
Subject: [PATCH 01/13] xfs_scrub: use proper UChar string iterators
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

For code that wants to examine a UChar string, use libicu's string
iterators to walk UChar strings, instead of the open-coded U16_NEXT*
macros that perform no typechecking.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index dd30164354e3..02a1b94efb4d 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -330,13 +330,12 @@ name_entry_examine(
 	struct name_entry	*entry,
 	unsigned int		*badflags)
 {
+	UCharIterator		uiter;
 	UChar32			uchr;
-	int32_t			i;
 	uint8_t			mask = 0;

-	for (i = 0; i < entry->normstrlen;) {
-		U16_NEXT_UNSAFE(entry->normstr, i, uchr);
-
+	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
 		case 0x200B: /* zero width space */
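For readers without libicu at hand, here is a minimal, ICU-free sketch of what a bounds-checked UTF-16 iterator does that U16_NEXT_UNSAFE does not: it stops at the buffer length, and it only joins a lead surrogate with a trail surrogate that is actually present. The struct u16_iter type, the u16_iter_* helpers and U16_ITER_SENTINEL are hypothetical stand-ins for UCharIterator, uiter_setString(), uiter_next32() and U_SENTINEL; this is an illustration, not libicu's implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for libicu's UCharIterator over a UTF-16 buffer. */
struct u16_iter {
	const uint16_t	*s;
	int32_t		len;
	int32_t		pos;
};

/* Plays the role of U_SENTINEL: returned once the buffer is exhausted. */
#define U16_ITER_SENTINEL	(-1)

static void u16_iter_init(struct u16_iter *it, const uint16_t *s, int32_t len)
{
	it->s = s;
	it->len = len;
	it->pos = 0;
}

/*
 * Return the next code point, or U16_ITER_SENTINEL at the end of the buffer.
 * A lead surrogate is combined with the next unit only if that unit is in
 * bounds and is a trail surrogate; unpaired surrogates come back as-is.
 */
static int32_t u16_iter_next32(struct u16_iter *it)
{
	int32_t lead, trail;

	if (it->pos >= it->len)
		return U16_ITER_SENTINEL;
	lead = it->s[it->pos++];
	if (lead >= 0xD800 && lead <= 0xDBFF && it->pos < it->len) {
		trail = it->s[it->pos];
		if (trail >= 0xDC00 && trail <= 0xDFFF) {
			it->pos++;
			return 0x10000 + ((lead - 0xD800) << 10) +
			       (trail - 0xDC00);
		}
	}
	return lead;
}
```

A caller follows the same shape as the loop in the patch: initialize once, then loop with `while ((c = u16_iter_next32(&it)) != U16_ITER_SENTINEL)`.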
Date: Mon, 01 Jul 2024 17:57:54 -0700
Subject: [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

Hoist the loop that removes "ignorable" code points from the skeleton
string into a separate function and give the UChar cursors names that
are easier to understand.  Convert the code to use the safe versions of
the U16_ accessor functions.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 02a1b94efb4d..96e20114c484 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -145,6 +145,31 @@ is_utf8_locale(void)
 	return answer;
 }

+/*
+ * Remove control/formatting characters from this string and return its new
+ * length.  UChar32 is required for U16_NEXT, despite the name.
+ */
+static int32_t
+remove_ignorable(
+	UChar		*ustr,
+	int32_t		ustrlen)
+{
+	UChar32		uchr;
+	int32_t		src, dest;
+
+	for (src = 0, dest = 0; src < ustrlen; dest = src) {
+		U16_NEXT(ustr, src, ustrlen, uchr);
+		if (!u_isIDIgnorable(uchr))
+			continue;
+		memmove(&ustr[dest], &ustr[src],
+				(ustrlen - src + 1) * sizeof(UChar));
+		ustrlen -= (src - dest);
+		src = dest;
+	}
+
+	return dest;
+}
+
 /*
  * Generate normalized form and skeleton of the name.  If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -160,9 +185,6 @@ name_entry_compute_checknames(
 	int32_t			normstrlen;
 	int32_t			unistrlen;
 	int32_t			skelstrlen;
-	UChar32			uchr;
-	int32_t			i, j;
-
 	UErrorCode		uerr = U_ZERO_ERROR;

 	/* Convert bytestr to unistr for normalization */
@@ -206,16 +228,7 @@ name_entry_compute_checknames(
 	if (U_FAILURE(uerr))
 		goto out_skelstr;

-	/* Remove control/formatting characters from skeleton. */
-	for (i = 0, j = 0; i < skelstrlen; j = i) {
-		U16_NEXT_UNSAFE(skelstr, i, uchr);
-		if (!u_isIDIgnorable(uchr))
-			continue;
-		memmove(&skelstr[j], &skelstr[i],
-				(skelstrlen - i + 1) * sizeof(UChar));
-		skelstrlen -= (i - j);
-		i = j;
-	}
+	skelstrlen = remove_ignorable(skelstr, skelstrlen);

 	entry->skelstr = skelstr;
 	entry->skelstrlen = skelstrlen;
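remove_ignorable() compacts the string in place with one memmove() per ignorable code point. For comparison only, the sketch below performs the same in-place compaction as a single pass with separate read and write cursors; that is a different technique from the one in the patch. It avoids libicu: demo_is_ignorable() is a hypothetical predicate that covers only a handful of format characters and is not a replacement for u_isIDIgnorable(), and the buffer is assumed to have room for a terminating NUL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, tiny subset of what libicu's u_isIDIgnorable() reports. */
static bool demo_is_ignorable(int32_t c)
{
	return c == 0x00AD || (c >= 0x200B && c <= 0x200F) || c == 0xFEFF;
}

/*
 * Single-pass compaction: decode code points at src and copy the code units
 * of the ones we keep down to dest.  Returns the new length in code units
 * and NUL-terminates, so the buffer must hold at least len + 1 units.
 */
static int32_t demo_remove_ignorable(uint16_t *s, int32_t len)
{
	int32_t src = 0, dest = 0;

	while (src < len) {
		int32_t start = src;
		int32_t c = s[src++];

		/* Join a surrogate pair only if the trail unit is in bounds. */
		if (c >= 0xD800 && c <= 0xDBFF && src < len &&
		    s[src] >= 0xDC00 && s[src] <= 0xDFFF) {
			c = 0x10000 + ((c - 0xD800) << 10) + (s[src] - 0xDC00);
			src++;
		}
		if (demo_is_ignorable(c))
			continue;
		/* dest never passes start, so copying forward is safe. */
		while (start < src)
			s[dest++] = s[start++];
	}
	s[dest] = 0;
	return dest;
}
```

Each kept code unit is copied at most once, so this pass is linear in the string length, whereas the memmove() loop can go quadratic when many code points are removed; for file-name-sized strings either approach is fine.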
Date: Mon, 01 Jul 2024 17:58:10 -0700
Subject: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

I missed a few non-rendering code points in the "zero width"
classification code.  Add them now, and sort the list.

$ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
$ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 96e20114c484..fc1adb2caab7 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -351,15 +351,17 @@ name_entry_examine(
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
+		case 0x034F: /* combining grapheme joiner */
 		case 0x200B: /* zero width space */
 		case 0x200C: /* zero width non-joiner */
 		case 0x200D: /* zero width joiner */
-		case 0xFEFF: /* zero width non breaking space */
 		case 0x2060: /* word joiner */
 		case 0x2061: /* function application */
 		case 0x2062: /* invisible times (multiply) */
 		case 0x2063: /* invisible separator (comma) */
 		case 0x2064: /* invisible plus (addition) */
+		case 0x2D7F: /* tifinagh consonant joiner */
+		case 0xFEFF: /* zero width non breaking space */
 			*badflags |= UNICRASH_ZERO_WIDTH;
 			break;
 		}
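Keeping the list sorted also makes it easy to check mechanically. As a hypothetical alternative to the switch statement (not what the patch does), the same code points could live in an ascending table that is searched with the C library's bsearch(), together with a check that the table stays sorted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The code points from the patch, in ascending order. */
static const uint32_t zero_width_cps[] = {
	0x034F,	/* combining grapheme joiner */
	0x200B,	/* zero width space */
	0x200C,	/* zero width non-joiner */
	0x200D,	/* zero width joiner */
	0x2060,	/* word joiner */
	0x2061,	/* function application */
	0x2062,	/* invisible times (multiply) */
	0x2063,	/* invisible separator (comma) */
	0x2064,	/* invisible plus (addition) */
	0x2D7F,	/* tifinagh consonant joiner */
	0xFEFF,	/* zero width non breaking space */
};
#define NR_ZERO_WIDTH	(sizeof(zero_width_cps) / sizeof(zero_width_cps[0]))

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* bsearch() is only correct on a strictly ascending table; verify that. */
static bool zero_width_table_sorted(void)
{
	size_t i;

	for (i = 1; i < NR_ZERO_WIDTH; i++)
		if (zero_width_cps[i - 1] >= zero_width_cps[i])
			return false;
	return true;
}

static bool is_zero_width(uint32_t c)
{
	return bsearch(&c, zero_width_cps, NR_ZERO_WIDTH,
		       sizeof(zero_width_cps[0]), cmp_u32) != NULL;
}
```

With only eleven entries the switch statement is perfectly adequate; a table mainly pays off if the list grows or is generated from UnicodeData.txt.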
Date: Mon, 01 Jul 2024 17:58:26 -0700
Subject: [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

Change the function declaration of unicrash_add to set the caller's
@new_entry to NULL if we detect an updated name entry and do not wish to
continue processing.  This avoids a theoretical UAF if the unicrash_add
caller were to accidentally continue using the pointer.

This isn't an /actual/ UAF because the function formerly set @badflags
to zero, but let's be a little defensive.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index fc1adb2caab7..5a61d69705bd 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -626,10 +626,11 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."),
 static void
 unicrash_add(
 	struct unicrash		*uc,
-	struct name_entry	*new_entry,
+	struct name_entry	**new_entryp,
 	unsigned int		*badflags,
 	struct name_entry	**existing_entry)
 {
+	struct name_entry	*new_entry = *new_entryp;
 	struct name_entry	*entry;
 	size_t			bucket;
 	xfs_dahash_t		hash;
@@ -652,7 +653,7 @@ unicrash_add(
 			entry->ino = new_entry->ino;
 			uc->buckets[bucket] = new_entry->next;
 			name_entry_free(new_entry);
-			*badflags = 0;
+			*new_entryp = NULL;
 			return;
 		}

@@ -695,8 +696,8 @@ __unicrash_check_name(
 		return 0;

 	name_entry_examine(new_entry, &badflags);
-	unicrash_add(uc, new_entry, &badflags, &dup_entry);
-	if (badflags)
+	unicrash_add(uc, &new_entry, &badflags, &dup_entry);
+	if (new_entry && badflags)
 		unicrash_complain(uc, dsc, namedescr, new_entry, badflags,
 				dup_entry);
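The pattern here is to pass a pointer to the caller's pointer so that the callee can clear it when it frees the object. Below is a minimal, self-contained sketch of that idea on a plain linked list; struct item and item_add() are hypothetical and much simpler than unicrash_add(), which hashes names into buckets.

```c
#include <stdlib.h>

struct item {
	struct item	*next;
	unsigned long	key;
	unsigned long	ino;
};

/*
 * Insert *newp at the head of *headp unless an item with the same key is
 * already present.  In that case, copy the inode number into the existing
 * item, free the new one, and NULL the caller's pointer so that any later
 * use shows up as a NULL dereference instead of a use-after-free.
 */
static void item_add(struct item **headp, struct item **newp)
{
	struct item *new_item = *newp;
	struct item *it;

	for (it = *headp; it != NULL; it = it->next) {
		if (it->key == new_item->key) {
			it->ino = new_item->ino;
			free(new_item);
			*newp = NULL;
			return;
		}
	}
	new_item->next = *headp;
	*headp = new_item;
}
```

After the call the caller only needs a NULL check, which is the same test that __unicrash_check_name() now makes on new_entry before calling unicrash_complain().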
Date: Mon, 01 Jul 2024 17:58:41 -0700
Subject: [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

The libicu functions u_strFromUTF8, unorm2_normalize, and
uspoof_getSkeleton return int32_t values.  Guard against negative return
values, even though the library itself never does this.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 5a61d69705bd..1c0597e52f76 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -189,7 +189,7 @@ name_entry_compute_checknames(

 	/* Convert bytestr to unistr for normalization */
 	u_strFromUTF8(NULL, 0, &unistrlen, entry->name, entry->namelen, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || unistrlen < 0)
 		return false;
 	uerr = U_ZERO_ERROR;
 	unistr = calloc(unistrlen + 1, sizeof(UChar));
@@ -203,7 +203,7 @@ name_entry_compute_checknames(
 	/* Normalize the string. */
 	normstrlen = unorm2_normalize(uc->normalizer, unistr, unistrlen, NULL,
 			0, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || normstrlen < 0)
 		goto out_unistr;
 	uerr = U_ZERO_ERROR;
 	normstr = calloc(normstrlen + 1, sizeof(UChar));
@@ -217,7 +217,7 @@ name_entry_compute_checknames(
 	/* Compute skeleton. */
 	skelstrlen = uspoof_getSkeleton(uc->spoof, 0, unistr, unistrlen, NULL,
 			0, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || skelstrlen < 0)
 		goto out_normstr;
 	uerr = U_ZERO_ERROR;
 	skelstr = calloc(skelstrlen + 1, sizeof(UChar));
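All three libicu calls use a preflighting convention: call once with a NULL buffer to learn the required length, allocate, then call again to fill the buffer. A negative length would be dangerous at the allocation step, because calloc() takes a size_t: a length of -1 turns calloc(len + 1, ...) into a zero-element request, and anything more negative converts to an enormous one. The sketch below applies the same guard to the standard C snprintf() preflight idiom, which really can return a negative value on error; format_ino() is a hypothetical helper, not part of xfs_scrub.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Preflight, allocate, then fill: the same shape as the libicu calls in
 * name_entry_compute_checknames(), but using snprintf() so it runs anywhere.
 */
static char *format_ino(unsigned long long ino)
{
	int len;
	char *buf;

	/* Preflight: with a NULL buffer, snprintf() returns the needed length. */
	len = snprintf(NULL, 0, "inode %llu", ino);
	if (len < 0)
		return NULL;	/* never let a negative length reach calloc() */

	buf = calloc((size_t)len + 1, 1);
	if (!buf)
		return NULL;

	/* Second call fills the buffer; its result must match the preflight. */
	if (snprintf(buf, (size_t)len + 1, "inode %llu", ino) != len) {
		free(buf);
		return NULL;
	}
	return buf;
}
```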
Date: Mon, 01 Jul 2024 17:58:57 -0700
Subject: [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

Hoist this predicate code into its own function; we're going to use it
elsewhere later on.  While we're at it, document how we generated this
list in the first place.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 1c0597e52f76..385e42c6acc9 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -170,6 +170,34 @@ remove_ignorable(
 	return dest;
 }

+/*
+ * Certain unicode codepoints are formatting hints that are not themselves
+ * supposed to be rendered by a display system.  These codepoints can be
+ * encoded in file names to try to confuse users.
+ *
+ * Download https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt and
+ * $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
+ */
+static inline bool is_nonrendering(UChar32 uchr)
+{
+	switch (uchr) {
+	case 0x034F: /* combining grapheme joiner */
+	case 0x200B: /* zero width space */
+	case 0x200C: /* zero width non-joiner */
+	case 0x200D: /* zero width joiner */
+	case 0x2060: /* word joiner */
+	case 0x2061: /* function application */
+	case 0x2062: /* invisible times (multiply) */
+	case 0x2063: /* invisible separator (comma) */
+	case 0x2064: /* invisible plus (addition) */
+	case 0x2D7F: /* tifinagh consonant joiner */
+	case 0xFEFF: /* zero width non breaking space */
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Generate normalized form and skeleton of the name.  If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -349,22 +377,9 @@ name_entry_examine(

 	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
-		/* zero width character sequences */
-		switch (uchr) {
-		case 0x034F: /* combining grapheme joiner */
-		case 0x200B: /* zero width space */
-		case 0x200C: /* zero width non-joiner */
-		case 0x200D: /* zero width joiner */
-		case 0x2060: /* word joiner */
-		case 0x2061: /* function application */
-		case 0x2062: /* invisible times (multiply) */
-		case 0x2063: /* invisible separator (comma) */
-		case 0x2064: /* invisible plus (addition) */
-		case 0x2D7F: /* tifinagh consonant joiner */
-		case 0xFEFF: /* zero width non breaking space */
+		/* characters are invisible */
+		if (is_nonrendering(uchr))
 			*badflags |= UNICRASH_ZERO_WIDTH;
-			break;
-		}

 		/* control characters */
 		if (u_iscntrl(uchr))
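Because the hoisted predicate depends only on the code point value, it can be exercised without libicu. The sketch below copies is_nonrendering() from the patch, substituting int32_t for UChar32 (libicu defines UChar32 as a signed 32-bit integer), and adds a hypothetical second caller, count_nonrendering(), to illustrate the kind of reuse the commit message anticipates; the actual later callers in this series may look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t demo_uchar32;	/* stands in for libicu's UChar32 */

/* Copied from the patch: formatting hints that should not render. */
static inline bool is_nonrendering(demo_uchar32 uchr)
{
	switch (uchr) {
	case 0x034F: /* combining grapheme joiner */
	case 0x200B: /* zero width space */
	case 0x200C: /* zero width non-joiner */
	case 0x200D: /* zero width joiner */
	case 0x2060: /* word joiner */
	case 0x2061: /* function application */
	case 0x2062: /* invisible times (multiply) */
	case 0x2063: /* invisible separator (comma) */
	case 0x2064: /* invisible plus (addition) */
	case 0x2D7F: /* tifinagh consonant joiner */
	case 0xFEFF: /* zero width non breaking space */
		return true;
	}

	return false;
}

/* Hypothetical second caller: count the invisible code points in a name. */
static size_t count_nonrendering(const demo_uchar32 *cps, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (is_nonrendering(cps[i]))
			count++;
	return count;
}
```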
Date: Mon, 01 Jul 2024 17:59:12 -0700
Subject: [PATCH 07/13] xfs_scrub: store bad flags with the name entry
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, hch@lst.de

From: Darrick J. Wong

When scrub is checking unicode names, there are certain properties of
the directory/attribute/label name itself that it can complain about.
Store these in struct name_entry so that the confusable names detector
can pick this up later.

This restructuring enables a subsequent patch to detect suspicious
sequences in the NFC normalized form of the name without needing to hang
on to that NFC form until the end of processing.  IOWs, it's a memory
usage optimization.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/unicrash.c | 122 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 64 insertions(+), 58 deletions(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 385e42c6acc9..a770d0d7aae4 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -69,6 +69,9 @@ struct name_entry {

 	xfs_ino_t		ino;

+	/* Everything that we don't like about this name. */
+	unsigned int		badflags;
+
 	/* Raw dirent name */
 	size_t			namelen;
 	char			name[0];
@@ -274,6 +277,55 @@ name_entry_compute_checknames(
 	return false;
 }

+/*
+ * Check a name for suspicious elements that have appeared in filename
+ * spoofing attacks.  This includes names that mixed directions or contain
+ * direction overrides control characters, both of which have appeared in
+ * filename spoofing attacks.
+ */
+static unsigned int
+name_entry_examine(
+	const struct name_entry	*entry)
+{
+	UCharIterator		uiter;
+	UChar32			uchr;
+	uint8_t			mask = 0;
+	unsigned int		ret = 0;
+
+	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
+		/* characters are invisible */
+		if (is_nonrendering(uchr))
+			ret |= UNICRASH_ZERO_WIDTH;
+
+		/* control characters */
+		if (u_iscntrl(uchr))
+			ret |= UNICRASH_CONTROL_CHAR;
+
+		switch (u_charDirection(uchr)) {
+		case U_LEFT_TO_RIGHT:
+			mask |= 0x01;
+			break;
+		case U_RIGHT_TO_LEFT:
+			mask |= 0x02;
+			break;
+		case U_RIGHT_TO_LEFT_OVERRIDE:
+			ret |= UNICRASH_BIDI_OVERRIDE;
+			break;
+		case U_LEFT_TO_RIGHT_OVERRIDE:
+			ret |= UNICRASH_BIDI_OVERRIDE;
+			break;
+		default:
+			break;
+		}
+	}
+
+	/* mixing left-to-right and right-to-left chars */
+	if (mask == 0x3)
+		ret |= UNICRASH_BIDI_MIXED;
+	return ret;
+}
+
 /* Create a new name entry, returns false if we could not succeed. */
 static bool
 name_entry_create(
@@ -299,6 +351,7 @@ name_entry_create(
 	if (!name_entry_compute_checknames(uc, new_entry))
 		goto out;

+	new_entry->badflags = name_entry_examine(new_entry);
 	*entry = new_entry;
 	return true;

@@ -360,54 +413,6 @@ name_entry_hash(
 	}
 }

-/*
- * Check a name for suspicious elements that have appeared in filename
- * spoofing attacks.  This includes names that mixed directions or contain
- * direction overrides control characters, both of which have appeared in
- * filename spoofing attacks.
- */
-static void
-name_entry_examine(
-	struct name_entry	*entry,
-	unsigned int		*badflags)
-{
-	UCharIterator		uiter;
-	UChar32			uchr;
-	uint8_t			mask = 0;
-
-	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
-	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
-		/* characters are invisible */
-		if (is_nonrendering(uchr))
-			*badflags |= UNICRASH_ZERO_WIDTH;
-
-		/* control characters */
-		if (u_iscntrl(uchr))
-			*badflags |= UNICRASH_CONTROL_CHAR;
-
-		switch (u_charDirection(uchr)) {
-		case U_LEFT_TO_RIGHT:
-			mask |= 0x01;
-			break;
-		case U_RIGHT_TO_LEFT:
-			mask |= 0x02;
-			break;
-		case U_RIGHT_TO_LEFT_OVERRIDE:
-			*badflags |= UNICRASH_BIDI_OVERRIDE;
-			break;
-		case U_LEFT_TO_RIGHT_OVERRIDE:
-			*badflags |= UNICRASH_BIDI_OVERRIDE;
-			break;
-		default:
-			break;
-		}
-	}
-
-	/* mixing left-to-right and right-to-left chars */
-	if (mask == 0x3)
-		*badflags |= UNICRASH_BIDI_MIXED;
-}
-
 /* Initialize the collision detector. */
 static int
 unicrash_init(
@@ -638,17 +643,17 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."),
  * must be skeletonized according to Unicode TR39 to detect names that
  * could be visually confused with each other.
  */
-static void
+static unsigned int
 unicrash_add(
 	struct unicrash		*uc,
 	struct name_entry	**new_entryp,
-	unsigned int		*badflags,
 	struct name_entry	**existing_entry)
 {
 	struct name_entry	*new_entry = *new_entryp;
 	struct name_entry	*entry;
 	size_t			bucket;
 	xfs_dahash_t		hash;
+	unsigned int		badflags = new_entry->badflags;

 	/* Store name in hashtable. */
 	hash = name_entry_hash(new_entry);
@@ -669,28 +674,30 @@ unicrash_add(
 			uc->buckets[bucket] = new_entry->next;
 			name_entry_free(new_entry);
 			*new_entryp = NULL;
-			return;
+			return 0;
 		}

 		/* Same normalization? */
 		if (new_entry->normstrlen == entry->normstrlen &&
 		    !u_strcmp(new_entry->normstr, entry->normstr) &&
 		    (uc->compare_ino ? entry->ino != new_entry->ino : true)) {
-			*badflags |= UNICRASH_NOT_UNIQUE;
+			badflags |= UNICRASH_NOT_UNIQUE;
 			*existing_entry = entry;
-			return;
+			break;
 		}

 		/* Confusable? */
 		if (new_entry->skelstrlen == entry->skelstrlen &&
 		    !u_strcmp(new_entry->skelstr, entry->skelstr) &&
 		    (uc->compare_ino ? entry->ino != new_entry->ino : true)) {
-			*badflags |= UNICRASH_CONFUSABLE;
+			badflags |= UNICRASH_CONFUSABLE;
 			*existing_entry = entry;
-			return;
+			break;
 		}
 		entry = entry->next;
 	}
+
+	return badflags;
 }

 /* Check a name for unicode normalization problems or collisions. */
@@ -704,14 +711,13 @@ __unicrash_check_name(
 {
 	struct name_entry	*dup_entry = NULL;
 	struct name_entry	*new_entry = NULL;
-	unsigned int		badflags = 0;
+	unsigned int		badflags;

 	/* If we can't create entry data, just skip it. */
 	if (!name_entry_create(uc, name, ino, &new_entry))
 		return 0;

-	name_entry_examine(new_entry, &badflags);
-	unicrash_add(uc, &new_entry, &badflags, &dup_entry);
+	badflags = unicrash_add(uc, &new_entry, &dup_entry);
 	if (new_entry && badflags)
 		unicrash_complain(uc, dsc, namedescr, new_entry, badflags,
 				dup_entry);
Wong" X-Patchwork-Id: 13718773 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 234837462 for ; Tue, 2 Jul 2024 00:59:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719881969; cv=none; b=HzPGQxpAS5CryavLGKlfNOBx5TjUbv6wpni/C9K+sdGk0xz6PyM4m7xwsXp6otOnfHIn+176jWazUAULvSQn4zHDtzYuvgtbZehnREZ1cuthM29kqBtanrDZby6pooEZZs/yMqBUYCxHEAdFA23sBdjgPr49ey+sKLFokyjLN9M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719881969; c=relaxed/simple; bh=hkh3NKr8wsVVv9geAdW9TiwPJmH6dJl+JeUZVCf7ahg=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=rGnJGcXDJSBmKS21El3wvg18ERLyomtiAX0CZwMhnx6i/CxcC2Ch+wlE6SDUHl0VURMee0Tr0mYa8uGInK7VwxtsyexzAkayVzMVJ4bbfS1m/7e8xcbeMN7dxTCahaxV0U5fnSak3SkITMrv/YDHkge5qK6U7yWSobMAKmaGgjg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Y4mL5Rs6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Y4mL5Rs6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E6DB9C116B1; Tue, 2 Jul 2024 00:59:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719881969; bh=hkh3NKr8wsVVv9geAdW9TiwPJmH6dJl+JeUZVCf7ahg=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Y4mL5Rs60SnRWUoAQ/KaVbtdRxjX55pIXVhGJ/dweOoPGkoFMpWcR0NhqWuwPYJV1 czMWIZn0GcoeGaujNkkdgWNkVKEpsQ9Ghhb8l34axkg2UVHQa5WJ4WFj8h++QofheP mk6KfbhUcvfiuFG7I4OqNtrk9WoB+YXMC4a944iUDtmSxyXOAG1gkjmF4c9hAeCqAT 
dDuY6dgiJI95CvPLUdVCMhFpycN1BvSi+W2VQaF958BaEmopPHiZ5USnRwf9CmnxQI cUbl/iZdj0JrFCrU0JVX1yIZ/pghVLqS33OFAyqVPR+KJ5i7lm0dQVwWf/SMHRomRG S0PGKlZRr2SCA== Date: Mon, 01 Jul 2024 17:59:28 -0700 Subject: [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de Message-ID: <171988117732.2007123.5575573821176131888.stgit@frogsfrogsfrogs> In-Reply-To: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> References: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong "Zero width" doesn't fully describe what the flag represents -- it gets set for any codepoint that doesn't render. Rename it accordingly. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index a770d0d7aae4..b2baa47ad6ca 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -109,7 +109,7 @@ struct unicrash { #define UNICRASH_CONTROL_CHAR (1 << 3) /* Invisible characters. Only a problem if we have collisions. */ -#define UNICRASH_ZERO_WIDTH (1 << 4) +#define UNICRASH_INVISIBLE (1 << 4) /* Multiple names resolve to the same skeleton string. */ #define UNICRASH_CONFUSABLE (1 << 5) @@ -296,7 +296,7 @@ name_entry_examine( while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) { /* characters are invisible */ if (is_nonrendering(uchr)) - ret |= UNICRASH_ZERO_WIDTH; + ret |= UNICRASH_INVISIBLE; /* control characters */ if (u_iscntrl(uchr)) @@ -580,7 +580,7 @@ _("Unicode name \"%s\" in %s renders identically to \"%s\"."), * confused with another name as a result, we should complain. * "moocow" and "moocow" are misleading. 
*/ - if ((badflags & UNICRASH_ZERO_WIDTH) && + if ((badflags & UNICRASH_INVISIBLE) && (badflags & UNICRASH_CONFUSABLE)) { str_warn(uc->ctx, descr_render(dsc), _("Unicode name \"%s\" in %s could be confused with '%s' due to invisible characters."), From patchwork Tue Jul 2 00:59:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13718774 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC17B8479 for ; Tue, 2 Jul 2024 00:59:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719881984; cv=none; b=aH24cQmmP3fRh7C3MZ9x3R+f/lpGznqCnO/4rHXtaOPHqOHkULrGd+bqCaxcEL0VwJfNbC7A6urrhe/2LN/XuE+0AL3mmhDbesNyjAC2hrK5+NrGiOjob3dMzkd30QfvBj5tC+BTVamKZcW0hAjZaMBZaioBh4d8XX/C8UoCknI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719881984; c=relaxed/simple; bh=o6YvHOsCb717yDORuUZxyEPCVhre7S8QNnCYxLRZcL4=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=IZzCg2IVhFw0HzZLDermeQ6d60i+OwkhnXSMFI7Gdj8tBPh+ub9DfrRtvb1B3DobRzmjTh/h4zRe6CIGlG8PEXRm6L3uUeqlLWBIVa7sPjstjYHafm5UXj4gAP1eH62n+HOywF699XOjkLPP7/NNUlHibUVr7eLkZi6s6R2YC8E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KIkL0jL7; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KIkL0jL7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90389C116B1; Tue, 2 Jul 2024 00:59:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=kernel.org; s=k20201202; t=1719881984; bh=o6YvHOsCb717yDORuUZxyEPCVhre7S8QNnCYxLRZcL4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=KIkL0jL7gwTtp4JtisXCzaEhwC9KmHwY9Od9KmVIax1Z0hkHqiBH3EOzyKAAh1j3d zPy/jYgEngPUZkD7E9ZRlpAgql7NQDspRMbU9M2McTNDU0w0DdAYa7FtvZz+yY0Dmb AwIOgZ58Kdd+/oJJbtSeUPPVKZttOEWsUPjcjluT36p6+AGq+/mD9L4o8pHKjRofJW MkVfxuL3mze55G4uOB/JyJ9RNtdqPmP4TTx7dwm0jNxPwBdUo7ysgZMJ+0K9hjD/c0 1T/QtkWYWEVBX217fe6M2VCKAOczkaw90Q2D35Y7H2fOpzhg98InebzxPMw1DIbnZv 82LjzCX9s3rDg== Date: Mon, 01 Jul 2024 17:59:44 -0700 Subject: [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de Message-ID: <171988117748.2007123.4070255254924781803.stgit@frogsfrogsfrogs> In-Reply-To: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> References: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Promote this type to something that we can type-check. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index b2baa47ad6ca..25f562b0a36f 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -4,6 +4,7 @@ * Author: Darrick J. Wong */ #include "xfs.h" +#include "xfs_arch.h" #include #include #include @@ -56,6 +57,8 @@ * In other words, skel = remove_invisible(nfd(remap_confusables(nfd(name)))). */ +typedef unsigned int __bitwise badname_t; + struct name_entry { struct name_entry *next; @@ -70,7 +73,7 @@ struct name_entry { xfs_ino_t ino; /* Everything that we don't like about this name. 
*/ - unsigned int badflags; + badname_t badflags; /* Raw dirent name */ size_t namelen; @@ -93,26 +96,29 @@ struct unicrash { /* Things to complain about in Unicode naming. */ +/* Everything is ok */ +#define UNICRASH_OK ((__force badname_t)0) + /* * Multiple names resolve to the same normalized string and therefore render * identically. */ -#define UNICRASH_NOT_UNIQUE (1 << 0) +#define UNICRASH_NOT_UNIQUE ((__force badname_t)(1U << 0)) /* Name contains directional overrides. */ -#define UNICRASH_BIDI_OVERRIDE (1 << 1) +#define UNICRASH_BIDI_OVERRIDE ((__force badname_t)(1U << 1)) /* Name mixes left-to-right and right-to-left characters. */ -#define UNICRASH_BIDI_MIXED (1 << 2) +#define UNICRASH_BIDI_MIXED ((__force badname_t)(1U << 2)) /* Control characters in name. */ -#define UNICRASH_CONTROL_CHAR (1 << 3) +#define UNICRASH_CONTROL_CHAR ((__force badname_t)(1U << 3)) /* Invisible characters. Only a problem if we have collisions. */ -#define UNICRASH_INVISIBLE (1 << 4) +#define UNICRASH_INVISIBLE ((__force badname_t)(1U << 4)) /* Multiple names resolve to the same skeleton string. */ -#define UNICRASH_CONFUSABLE (1 << 5) +#define UNICRASH_CONFUSABLE ((__force badname_t)(1U << 5)) /* * We only care about validating utf8 collisions if the underlying @@ -540,7 +546,7 @@ unicrash_complain( struct descr *dsc, const char *what, struct name_entry *entry, - unsigned int badflags, + badname_t badflags, struct name_entry *dup_entry) { char *bad1 = NULL; @@ -643,7 +649,7 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."), * must be skeletonized according to Unicode TR39 to detect names that * could be visually confused with each other. */ -static unsigned int +static badname_t unicrash_add( struct unicrash *uc, struct name_entry **new_entryp, @@ -653,7 +659,7 @@ unicrash_add( struct name_entry *entry; size_t bucket; xfs_dahash_t hash; - unsigned int badflags = new_entry->badflags; + badname_t badflags = new_entry->badflags; /* Store name in hashtable. 
*/ hash = name_entry_hash(new_entry); @@ -711,14 +717,14 @@ __unicrash_check_name( { struct name_entry *dup_entry = NULL; struct name_entry *new_entry = NULL; - unsigned int badflags; + badname_t badflags; /* If we can't create entry data, just skip it. */ if (!name_entry_create(uc, name, ino, &new_entry)) return 0; badflags = unicrash_add(uc, &new_entry, &dup_entry); - if (new_entry && badflags) + if (new_entry && badflags != UNICRASH_OK) unicrash_complain(uc, dsc, namedescr, new_entry, badflags, dup_entry); From patchwork Tue Jul 2 00:59:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13718775 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C8468830 for ; Tue, 2 Jul 2024 01:00:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719882000; cv=none; b=BqCtYq7YtiLSNHhrTwY88/UG78vPOAYi9eTjbIoq7e5qT16vPwYaBQf2gD1SpLMupFimziAYJ1HMXOHyaLXVcZQofkuzYUa6wYQ7PZGDbqSCtSuguVyHXmUsE/msns9yx9HFg7b8P6idX5K6JPYMgWR+G97Bx57XOHmavhxvDLU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719882000; c=relaxed/simple; bh=LF8G5U2AVAVEpiE2egVc1o0Vl5IdwO7RVF51oGPt84Q=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XjyJkJoIpjw9NSYteyQNCd3rdGNBXKjUaLz2RyqU8q1QWytPHnAlatAjdyiw7bad6IFaZvhu0hwVkg2gW63eRz4vlbFuqzDGR04ynzNBJ7XeTn12DEOcV7+XaPn8rPMI9p7yRDtf+MLiEyf7Ttp+dJi1JmAraf8d+6TrjEEbEgo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Le19T169; arc=none smtp.client-ip=10.30.226.201 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Le19T169" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 35659C116B1; Tue, 2 Jul 2024 01:00:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719882000; bh=LF8G5U2AVAVEpiE2egVc1o0Vl5IdwO7RVF51oGPt84Q=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Le19T169KOisJjNaPGzvnWkcZPtcgt8Za9m2aVbiCCso7OrpwIaL8QGCtiefcAacD nrDbJ/x4I68ebGrzVv5F6puVcUiOxCX/AdUMENX++b0/mMUaDy5YD9EyXBAC6SGz6a FUIs/MIVsmfzxkPwhBesLPavTm0u65jMXOSxJJr6Q7oGRE7RHWWR4Ng7r9Gi+5OpcP FKnSRquhAT7puvcmgY8V6YMeqfypZv+bxDzf3egA16fQiULuGFJmxZIxBGDY2Gw7Wn cgKZ25qjfc21g8d6nfuOF7pxzvyoDfsu7oG65Hc5t7wPGDHipiijD8h+72pk6NiaLr 0pcAT8NZFXRgg== Date: Mon, 01 Jul 2024 17:59:59 -0700 Subject: [PATCH 10/13] xfs_scrub: reduce size of struct name_entry From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de Message-ID: <171988117763.2007123.5929393677337209049.stgit@frogsfrogsfrogs> In-Reply-To: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> References: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong libicu doesn't support processing strings longer than 2GB in length, and we never feed the unicrash code a name longer than about 300 bytes. Rearrange the structure to reduce the head structure size from 56 bytes to 44 bytes. Signed-off-by: Darrick J. 
Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index 25f562b0a36f..dfa798b09b0e 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -57,18 +57,20 @@ * In other words, skel = remove_invisible(nfd(remap_confusables(nfd(name)))). */ -typedef unsigned int __bitwise badname_t; +typedef uint16_t __bitwise badname_t; struct name_entry { struct name_entry *next; /* NFKC normalized name */ UChar *normstr; - size_t normstrlen; /* Unicode skeletonized name */ UChar *skelstr; - size_t skelstrlen; + + /* Lengths for normstr and skelstr */ + int32_t normstrlen; + int32_t skelstrlen; xfs_ino_t ino; @@ -76,7 +78,7 @@ struct name_entry { badname_t badflags; /* Raw dirent name */ - size_t namelen; + uint16_t namelen; char name[0]; }; #define NAME_ENTRY_SZ(nl) (sizeof(struct name_entry) + 1 + \ @@ -343,6 +345,12 @@ name_entry_create( struct name_entry *new_entry; size_t namelen = strlen(name); + /* should never happen */ + if (namelen > UINT16_MAX) { + ASSERT(namelen <= UINT16_MAX); + return false; + } + /* Create new entry */ new_entry = calloc(NAME_ENTRY_SZ(namelen), 1); if (!new_entry) From patchwork Tue Jul 2 01:00:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13718776 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 73EBC7F9 for ; Tue, 2 Jul 2024 01:00:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719882016; cv=none; b=ENB0IdMDEtAB8+OPU2m7Ww0SQxcmdpTfar4u0cjtY6wn1FGdpP57ct9Uzy0Q2Cgref9+4/7aEXXCHF01nelFiC1MSav61usa8f3RvCMdfSeLUEHdt98+V9+xwAFLTMc3+Ky3vSpopfnBPl/q1t1vaEyPD/MNMzbHlvYh+DKHcLE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719882016; c=relaxed/simple; bh=pHtRb5eB1u+mkgAPbJcz9Jg7RY64spRKgWfuUM2OLrU=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=TxSL7W2o0DnrNOVunX5VbzKevgF7TqKZyE+UswaJTbWfYgQsdkMLGyc+Pa91e9KlgBzqR2gcgBcNiyseBKbBIL55+FW3jdNSnDy8qfjmCNiyFDb9oxXLrn5vyxAYjK0NEvtQI5l8i+mEydxneZYzW6PZ3nWaJNyVXA5Gre6VVkQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=roMWiFzx; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="roMWiFzx" Received: by smtp.kernel.org (Postfix) with ESMTPSA id ED7DFC116B1; Tue, 2 Jul 2024 01:00:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719882016; bh=pHtRb5eB1u+mkgAPbJcz9Jg7RY64spRKgWfuUM2OLrU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=roMWiFzxfoRP49IBDDBt4uACAohMK3JiWmrIDiEo2ECdJBONYwH9raSuQhS5MnxNr a50+HiiZKIj0JwSdrin76FVMTdySWh8nfLkHHAmZHOONZqmzlTq+Vxt+UlAmBMOq8p F4WPzlc7Bp3gNKN9qI0m6kcm9w3PbstYnLht0uLnv+rbiokFezt8rhQLe/Q2wWmAwT 
3CwyJLOFB9OeXeLZgdDBWj7VFbdCOnP8d+MGz4z+t7KQFBFX91hc6Go1tZECCbs6vp 4zBCQiY2GDv1KD7DrnAnKujja/omp0WCfS2XMX6lMu6zPjjtSEodbaWmAEwbtl22JH SXJKHi8vgi1Og== Date: Mon, 01 Jul 2024 18:00:15 -0700 Subject: [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de Message-ID: <171988117778.2007123.8328418916342708343.stgit@frogsfrogsfrogs> In-Reply-To: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> References: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong We're about to introduce a second normalizer, so change the name of the existing one to reflect the algorithm that you'll get if you use it. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index dfa798b09b0e..f6b53276c05d 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -87,7 +87,7 @@ struct name_entry { struct unicrash { struct scrub_ctx *ctx; USpoofChecker *spoof; - const UNormalizer2 *normalizer; + const UNormalizer2 *nfkc; bool compare_ino; bool is_only_root_writeable; size_t nr_buckets; @@ -240,7 +240,7 @@ name_entry_compute_checknames( goto out_unistr; /* Normalize the string. 
*/ - normstrlen = unorm2_normalize(uc->normalizer, unistr, unistrlen, NULL, + normstrlen = unorm2_normalize(uc->nfkc, unistr, unistrlen, NULL, 0, &uerr); if (uerr != U_BUFFER_OVERFLOW_ERROR || normstrlen < 0) goto out_unistr; @@ -248,7 +248,7 @@ name_entry_compute_checknames( normstr = calloc(normstrlen + 1, sizeof(UChar)); if (!normstr) goto out_unistr; - unorm2_normalize(uc->normalizer, unistr, unistrlen, normstr, normstrlen, + unorm2_normalize(uc->nfkc, unistr, unistrlen, normstr, normstrlen, &uerr); if (U_FAILURE(uerr)) goto out_normstr; @@ -455,7 +455,7 @@ unicrash_init( p->ctx = ctx; p->nr_buckets = nr_buckets; p->compare_ino = compare_ino; - p->normalizer = unorm2_getNFKCInstance(&uerr); + p->nfkc = unorm2_getNFKCInstance(&uerr); if (U_FAILURE(uerr)) goto out_free; p->spoof = uspoof_open(&uerr); From patchwork Tue Jul 2 01:00:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13718777 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CFDAC7F9 for ; Tue, 2 Jul 2024 01:00:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719882031; cv=none; b=dDhT2GE1L1sO6xjqL8kNwdjWIOHExd+mfOmbeXVXRmgmN8vMFwhwCEhoC+bpBmUKQo/YhRdBxrEJK+m8DMcfpoHMNCWvbuL71RAG3DOyePuavhVflmzRdzUzBbqgRDZGbV8VcoIWgxLD92rAewELw0OJnWzTqv9cJWBSY7/4M30= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719882031; c=relaxed/simple; bh=FuNwZhvz9K96BHmaG5Etyv6C53LAsvG/bF0bTHS+O9o=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=DS43PNJlxXXlxzs2/EOeyJh/Joye6td/1+LQqBQmLdynRQboBnnww+Tjj/F1XqLabDRrfkbFOlCbulppgWha+Rbb7PGtEZfJ8xfUlH0CXxDfqrttmWPzWx1MmB1T9idxPQO/PYy7C7VQ+7BizL/f02fJuJXbaxaxm0NBVHlzS+E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mLlrZ6+Y; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mLlrZ6+Y" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A9040C116B1; Tue, 2 Jul 2024 01:00:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719882031; bh=FuNwZhvz9K96BHmaG5Etyv6C53LAsvG/bF0bTHS+O9o=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=mLlrZ6+YUNdCwT11icFvw4QB4XVvg1lwH9nBwqw+nlUJuT0hkJ8AmiHlD3V0euZl+ 2odP9jzm8xvQmVs2eB0MsIlU2qvCBHRmDa4ygVOz/RacdXk45//Md8Uud7bCcCq+22 XE5P8dJ3/ebIKgfy7tuasUDfAAtXGacCQpjNj2aJOc4oXARZ+yYA0vW5PRutL7Bi5J +BAh3nDrZmm+MjFhCA7mYg6cXyjDeZ68L9t5muBYpq5HlgfqjjdBz1m+MMPZ4Led06 AJXYp4rFyQBMhb047fxZOEwUk5Axsmpd9oJ0lbz8ZGgDv7TqUIUYZum1LpMdAmUN3Y obCoAA+SB3EFQ== Date: Mon, 01 Jul 2024 18:00:31 -0700 Subject: [PATCH 12/13] xfs_scrub: report deceptive file extensions From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de Message-ID: <171988117794.2007123.11468032897655925875.stgit@frogsfrogsfrogs> In-Reply-To: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> References: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong In April 2023, ESET revealed that Linux users had been tricked into opening executables containing malware payloads. The trickery came in the form of a malicious zip file containing a filename with the string "job offer․pdf".
Note that the filename does *not* denote a real pdf file, since the last four codepoints in the file name are "ONE DOT LEADER", p, d, and f. Not period (ok, FULL STOP), p, d, f like you'd normally expect. Teach xfs_scrub to look for codepoints that could be confused with a period followed by alphanumerics. Link: https://www.welivesecurity.com/2023/04/20/linux-malware-strengthens-links-lazarus-3cx-supply-chain-attack/ Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 215 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 214 insertions(+), 1 deletion(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index f6b53276c05d..e895afe32aab 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -88,6 +88,7 @@ struct unicrash { struct scrub_ctx *ctx; USpoofChecker *spoof; const UNormalizer2 *nfkc; + const UNormalizer2 *nfc; bool compare_ino; bool is_only_root_writeable; size_t nr_buckets; @@ -122,6 +123,12 @@ struct unicrash { /* Multiple names resolve to the same skeleton string. */ #define UNICRASH_CONFUSABLE ((__force badname_t)(1U << 5)) +/* Possible phony file extension. */ +#define UNICRASH_PHONY_EXTENSION ((__force badname_t)(1U << 6)) + +/* FULL STOP (aka period), 0x2E */ +#define UCHAR_PERIOD ((UChar32)'.') + /* * We only care about validating utf8 collisions if the underlying * system configuration says we're using utf8. If the language @@ -209,6 +216,193 @@ static inline bool is_nonrendering(UChar32 uchr) return false; } +/* + * Decide if this unicode codepoint looks similar enough to a period (".") + * to fool users into thinking that any subsequent alphanumeric sequence is + * the file extension. Most of the fullstop characters do not do this. 
+ * + * $ grep -i 'full stop' UnicodeData.txt + */ +static inline bool is_fullstop_lookalike(UChar32 uchr) +{ + switch (uchr) { + case 0x0701: /* syriac supralinear full stop */ + case 0x0702: /* syriac sublinear full stop */ + case 0x2024: /* one dot leader */ + case 0xA4F8: /* lisu letter tone mya ti */ + case 0xFE52: /* small full stop */ + case 0xFF61: /* halfwidth ideographic full stop */ + case 0xFF0E: /* fullwidth full stop */ + return true; + } + + return false; +} + +/* How many UChar do we need to fit a full UChar32 codepoint? */ +#define UCHAR_PER_UCHAR32 2 + +/* Format this UChar32 into a UChar buffer. */ +static inline int32_t +uchar32_to_uchar( + UChar32 uchr, + UChar *buf) +{ + int32_t i = 0; + bool err = false; + + U16_APPEND(buf, i, UCHAR_PER_UCHAR32, uchr, err); + if (err) + return 0; + return i; +} + +/* Extract a single UChar32 code point from this UChar string. */ +static inline UChar32 +uchar_to_uchar32( + UChar *buf, + int32_t buflen) +{ + UChar32 ret; + int32_t i = 0; + + U16_NEXT(buf, i, buflen, ret); + return ret; +} + +/* + * For characters that are not themselves a full stop (0x2E), let's see if the + * compatibility normalization (NFKC) will turn it into a full stop. If so, + * then this could be the start of a phony file extension. + */ +static bool +is_period_lookalike( + struct unicrash *uc, + UChar32 uchr) +{ + UChar uchrstr[UCHAR_PER_UCHAR32]; + UChar nfkcstr[UCHAR_PER_UCHAR32]; + int32_t uchrstrlen, nfkcstrlen; + UChar32 nfkc_uchr; + UErrorCode uerr = U_ZERO_ERROR; + + if (uchr == UCHAR_PERIOD) + return false; + + uchrstrlen = uchar32_to_uchar(uchr, uchrstr); + if (!uchrstrlen) + return false; + + /* + * Normalize the UChar string to NFKC form, which does all the + * compatibility transformations.
+ */ + nfkcstrlen = unorm2_normalize(uc->nfkc, uchrstr, uchrstrlen, NULL, + 0, &uerr); + if (uerr == U_BUFFER_OVERFLOW_ERROR) + return false; + + uerr = U_ZERO_ERROR; + unorm2_normalize(uc->nfkc, uchrstr, uchrstrlen, nfkcstr, nfkcstrlen, + &uerr); + if (U_FAILURE(uerr)) + return false; + + nfkc_uchr = uchar_to_uchar32(nfkcstr, nfkcstrlen); + return nfkc_uchr == UCHAR_PERIOD; +} + +/* + * Detect directory entry names that contain deceptive sequences that look like + * file extensions but are not. We define this as a sequence that begins with + * a code point that renders like a period ("full stop" in unicode parlance) + * but is not actually a period, followed by any number of alphanumeric code + * points or a period, all the way to the end. + * + * The 3cx attack used a zip file containing an executable file named "job + * offer․pdf". Note that the dot mark in the extension is /not/ a period but + * the Unicode codepoint "one dot leader". The file was also marked executable + * inside the zip file, which meant that naïve file explorers could inflate + * the file and restore the execute bit. If a user double-clicked on the file, + * the binary would open a decoy pdf while infecting the system. + * + * For this check, we need to normalize with canonical (and not compatibility) + * decomposition, because compatibility mode will turn certain code points + * (e.g. one dot leader, 0x2024) into actual periods (0x2e). The NFC + * composition is not needed after this, so we save some memory by keeping this + * a separate function from name_entry_examine. + */ +static badname_t +name_entry_phony_extension( + struct unicrash *uc, + const UChar *unistr, + int32_t unistrlen) +{ + UCharIterator uiter; + UChar *nfcstr; + int32_t nfcstrlen; + UChar32 uchr; + bool maybe_phony_extension = false; + badname_t ret = UNICRASH_OK; + UErrorCode uerr = U_ZERO_ERROR; + + /* Normalize with NFC.
+ */ + nfcstrlen = unorm2_normalize(uc->nfc, unistr, unistrlen, NULL, + 0, &uerr); + if (uerr != U_BUFFER_OVERFLOW_ERROR || nfcstrlen < 0) + return ret; + uerr = U_ZERO_ERROR; + nfcstr = calloc(nfcstrlen + 1, sizeof(UChar)); + if (!nfcstr) + return ret; + unorm2_normalize(uc->nfc, unistr, unistrlen, nfcstr, nfcstrlen, + &uerr); + if (U_FAILURE(uerr)) + goto out_nfcstr; + + /* Examine the NFC normalized string... */ + uiter_setString(&uiter, nfcstr, nfcstrlen); + while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) { + /* + * If this *looks* like, but is not, a full stop (0x2E), this + * could be the start of a phony file extension. + */ + if (is_period_lookalike(uc, uchr)) { + maybe_phony_extension = true; + continue; + } + + if (is_fullstop_lookalike(uchr)) { + /* + * The normalizer above should catch most of these + * codepoints that look like periods, but record the + * ones known to have been used in attacks. + */ + maybe_phony_extension = true; + } else if (uchr == UCHAR_PERIOD) { + /* + * Due to the propensity of file explorers to obscure + * file extensions in the name of "user friendliness", + * this classifier ignores periods. + */ + } else { + /* + * File extensions (as far as the author knows) tend + * only to use ascii alphanumerics. + */ + if (maybe_phony_extension && + !u_isalnum(uchr) && !is_nonrendering(uchr)) + maybe_phony_extension = false; + } + } + if (maybe_phony_extension) + ret |= UNICRASH_PHONY_EXTENSION; + +out_nfcstr: + free(nfcstr); + return ret; +} + /* * Generate normalized form and skeleton of the name. If this fails, just * forget everything and return false; this is an advisory checker. @@ -269,6 +463,11 @@ name_entry_compute_checknames( skelstrlen = remove_ignorable(skelstr, skelstrlen); + /* Check for deceptive file extensions in directory entry names.
*/ + if (entry->ino) + entry->badflags |= name_entry_phony_extension(uc, unistr, + unistrlen); + entry->skelstr = skelstr; entry->skelstrlen = skelstrlen; entry->normstr = normstr; @@ -365,7 +564,7 @@ name_entry_create( if (!name_entry_compute_checknames(uc, new_entry)) goto out; - new_entry->badflags = name_entry_examine(new_entry); + new_entry->badflags |= name_entry_examine(new_entry); *entry = new_entry; return true; @@ -456,6 +655,9 @@ unicrash_init( p->nr_buckets = nr_buckets; p->compare_ino = compare_ino; p->nfkc = unorm2_getNFKCInstance(&uerr); + if (U_FAILURE(uerr)) + goto out_free; + p->nfc = unorm2_getNFCInstance(&uerr); if (U_FAILURE(uerr)) goto out_free; p->spoof = uspoof_open(&uerr); @@ -602,6 +804,17 @@ _("Unicode name \"%s\" in %s could be confused with '%s' due to invisible charac goto out; } + /* + * Fake looking file extensions have tricked Linux users into thinking + * that an executable is actually a pdf. See Lazarus 3cx attack. + */ + if (badflags & UNICRASH_PHONY_EXTENSION) { + str_warn(uc->ctx, descr_render(dsc), +_("Unicode name \"%s\" in %s contains a possibly deceptive file extension."), + bad1, what); + goto out; + } + /* * Unfiltered control characters can mess up your terminal and render * invisibly in filechooser UIs. From patchwork Tue Jul 2 01:00:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13718778 Date: Mon, 01 Jul 2024 18:00:46 -0700 Subject: [PATCH 13/13] xfs_scrub: dump unicode points From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de Message-ID: <171988117809.2007123.1252324748291441519.stgit@frogsfrogsfrogs> In-Reply-To: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> References: <171988117591.2007123.4966781934074641923.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org MIME-Version: 1.0 From: Darrick J. Wong Add some debug functions to make it easier to query unicode character properties. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- scrub/unicrash.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 57 insertions(+), 2 deletions(-) diff --git a/scrub/unicrash.c b/scrub/unicrash.c index e895afe32aab..119656b0b9d5 100644 --- a/scrub/unicrash.c +++ b/scrub/unicrash.c @@ -5,6 +5,7 @@ */ #include "xfs.h" #include "xfs_arch.h" +#include "list.h" #include #include #include @@ -1001,14 +1002,68 @@ unicrash_check_fs_label( label, 0); } +/* Dump a unicode code point and its properties. */ +static inline void dump_uchar32(UChar32 c) +{ + UChar uchrstr[UCHAR_PER_UCHAR32]; + const char *descr; + char buf[16]; + int32_t uchrstrlen, buflen; + UProperty p; + UErrorCode uerr = U_ZERO_ERROR; + + printf("Unicode point 0x%x:", c); + + /* Convert UChar32 to UTF8 representation.
*/ + uchrstrlen = uchar32_to_uchar(c, uchrstr); + if (!uchrstrlen) + return; + + u_strToUTF8(buf, sizeof(buf), &buflen, uchrstr, uchrstrlen, &uerr); + if (!U_FAILURE(uerr) && buflen > 0) { + int32_t i; + + printf(" \""); + for (i = 0; i < buflen; i++) + printf("\\x%02x", (unsigned char)buf[i]); + printf("\""); + } + printf("\n"); + + for (p = 0; p < UCHAR_BINARY_LIMIT; p++) { + int has; + + descr = u_getPropertyName(p, U_LONG_PROPERTY_NAME); + if (!descr) + descr = u_getPropertyName(p, U_SHORT_PROPERTY_NAME); + + has = u_hasBinaryProperty(c, p) ? 1 : 0; + if (descr) { + printf(" %s(%u) = %d\n", descr, p, has); + } else { + printf(" ?(%u) = %d\n", p, has); + } + } +} + /* Load libicu and initialize it. */ bool unicrash_load(void) { - UErrorCode uerr = U_ZERO_ERROR; + char *dbgstr; + UChar32 uchr; + UErrorCode uerr = U_ZERO_ERROR; u_init(&uerr); - return U_FAILURE(uerr); + if (U_FAILURE(uerr)) + return true; + + dbgstr = getenv("XFS_SCRUB_DUMP_CHAR"); + if (dbgstr) { + uchr = strtol(dbgstr, NULL, 0); + dump_uchar32(uchr); + } + return false; } /* Unload libicu once we're done with it. */