diff mbox series

[2/3] fsck: actually detect bad file modes in trees

Message ID YvQc9egGzwFomQbw@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 53602a937dc9eacd67b6afcd781f7b15bb02682f
Headers show
Series actually detect bad file modes in fsck | expand

Commit Message

Jeff King Aug. 10, 2022, 9:02 p.m. UTC
We use the normal tree_desc code to iterate over trees in fsck, meaning
we only see the canonicalized modes it returns. And hence we'd never see
anything unexpected, since it will coerce literally any garbage into one
of our normal and accepted modes.

We can use the new RAW_MODES flag to see the real modes, and then use
the existing code to actually analyze them. The existing code is written
as allow-known-good, so there's not much point in testing a variety of
breakages. The one tested here should be S_IFREG but with nonsense
permissions.

Do note that the error-reporting here isn't great. We don't mention the
specific bad mode, but just that the tree has one or more broken modes.
But when you go to look at it with "git ls-tree", we'll report the
canonicalized mode! This isn't ideal, but given that this should come up
rarely, and that any number of other tree corruptions might force you
into looking at the binary bytes via "cat-file", it's not the end of the
world. And it's something we can improve on top later if we choose.

Reported-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c          |  2 +-
 t/t1450-fsck.sh | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Aug. 10, 2022, 9:35 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> We use the normal tree_desc code to iterate over trees in fsck, meaning
> we only see the canonicalized modes it returns. And hence we'd never see
> anything unexpected, since it will coerce literally any garbage into one
> of our normal and accepted modes.

Wow.  I did know canon_mode() deliberately discarding the extra
permission bits on trees and blobs, but it was that bad to mark
whatever it does not understand as a gitlink.  That is simply
horrible.

> -	if (init_tree_desc_gently(&desc, buffer, size, 0)) {
> +	if (init_tree_desc_gently(&desc, buffer, size, TREE_DESC_RAW_MODES)) {
>  		retval += report(options, tree_oid, OBJ_TREE,
>  				 FSCK_MSG_BAD_TREE,
>  				 "cannot be parsed as a tree");

OK, so we'll let desc.entry.mode carry whatever bogus bit pattern we
got out of buffer and the downstream code already knows what to do
with them.  That's a clean and minimum way to do this.

Thanks.

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index ab7f31f1dc..53c2aa10b7 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -364,6 +364,20 @@ test_expect_success 'tree entry with type mismatch' '
>  	test_i18ngrep ! "dangling blob" out
>  '
>  
> +test_expect_success 'tree entry with bogus mode' '
> +	test_when_finished "remove_object \$blob" &&
> +	test_when_finished "remove_object \$tree" &&
> +	blob=$(echo blob | git hash-object -w --stdin) &&
> +	blob_oct=$(echo $blob | hex2oct) &&
> +	tree=$(printf "100000 foo\0${blob_oct}" |
> +	       git hash-object -t tree --stdin -w --literally) &&
> +	git fsck 2>err &&
> +	cat >expect <<-EOF &&
> +	warning in tree $tree: badFilemode: contains bad file modes
> +	EOF
> +	test_cmp expect err
> +'
> +
>  test_expect_success 'tag pointing to nonexistent' '
>  	badoid=$(test_oid deadbeef) &&
>  	cat >invalid-tag <<-EOF &&
Jeff King Aug. 10, 2022, 9:46 p.m. UTC | #2
On Wed, Aug 10, 2022 at 02:35:32PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We use the normal tree_desc code to iterate over trees in fsck, meaning
> > we only see the canonicalized modes it returns. And hence we'd never see
> > anything unexpected, since it will coerce literally any garbage into one
> > of our normal and accepted modes.
> 
> Wow.  I did know canon_mode() deliberately discarding the extra
> permission bits on trees and blobs, but it was that bad to mark
> whatever it does not understand as a gitlink.  That is simply
> horrible.

Yeah, I noticed that, as well. It might actually be reasonable to put a
die() at the end of that canon_mode() function. It's rather unfriendly,
but then, treating nonsense as a gitlink is likely to just result in the
caller ending up aborting anyway.

Outside the scope of my patch, though. :)

-Peff
Junio C Hamano Aug. 10, 2022, 9:54 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> Outside the scope of my patch, though. :)

Absolutely.
diff mbox series

Patch

diff --git a/fsck.c b/fsck.c
index 5acc982a7c..b3da1d68c0 100644
--- a/fsck.c
+++ b/fsck.c
@@ -578,7 +578,7 @@  static int fsck_tree(const struct object_id *tree_oid,
 	const char *o_name;
 	struct name_stack df_dup_candidates = { NULL };
 
-	if (init_tree_desc_gently(&desc, buffer, size, 0)) {
+	if (init_tree_desc_gently(&desc, buffer, size, TREE_DESC_RAW_MODES)) {
 		retval += report(options, tree_oid, OBJ_TREE,
 				 FSCK_MSG_BAD_TREE,
 				 "cannot be parsed as a tree");
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index ab7f31f1dc..53c2aa10b7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -364,6 +364,20 @@  test_expect_success 'tree entry with type mismatch' '
 	test_i18ngrep ! "dangling blob" out
 '
 
+test_expect_success 'tree entry with bogus mode' '
+	test_when_finished "remove_object \$blob" &&
+	test_when_finished "remove_object \$tree" &&
+	blob=$(echo blob | git hash-object -w --stdin) &&
+	blob_oct=$(echo $blob | hex2oct) &&
+	tree=$(printf "100000 foo\0${blob_oct}" |
+	       git hash-object -t tree --stdin -w --literally) &&
+	git fsck 2>err &&
+	cat >expect <<-EOF &&
+	warning in tree $tree: badFilemode: contains bad file modes
+	EOF
+	test_cmp expect err
+'
+
 test_expect_success 'tag pointing to nonexistent' '
 	badoid=$(test_oid deadbeef) &&
 	cat >invalid-tag <<-EOF &&