diff mbox series

[3/3] fsck: downgrade tree badFilemode to "info"

Message ID YvQdR3sDqDMCIjIE@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 4dd3b045f528b8d9cbbb4a50e371affb0543f37d
Headers show
Series actually detect bad file modes in fsck | expand

Commit Message

Jeff King Aug. 10, 2022, 9:04 p.m. UTC
The previous commit un-broke the "badFileMode" check; before then it was
literally testing nothing. And as far as I can tell, it has been so
since the very initial version of fsck.

The current severity of "badFileMode" is just "warning". But in the
--strict mode used by transfer.fsckObjects, that is elevated to an
error. This will potentially cause hassle for users, because historical
objects with bad modes will suddenly start causing pushes to many server
operators to be rejected.

At the same time, these bogus modes aren't actually a big risk. Because
we canonicalize them everywhere besides fsck, they can't cause too much
mischief in the real world. The worst thing you can do is end up with
two almost-identical trees that have different hashes but are
interpreted the same. That will generally cause things to be inefficient
rather than wrong, and is a bug somebody working on a Git implementation
would want to fix, but probably not worth inconveniencing users by
refusing to push or fetch.

So let's downgrade this to "info" by default, which is our setting for
"mention this when fscking, but don't ever reject, even under strict
mode". If somebody really wants to be paranoid, they can still adjust
the level using config.

Suggested-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.h                          |  2 +-
 t/t5504-fetch-receive-strict.sh | 17 +++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Aug. 10, 2022, 10:08 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> ... That will generally cause things to be inefficient
> rather than wrong, and is a bug somebody working on a Git implementation
> would want to fix, but probably not worth inconveniencing users by
> refusing to push or fetch.
>
> So let's downgrade this to "info" by default, which is our setting for
> "mention this when fscking, but don't ever reject, even under strict
> mode". If somebody really wants to be paranoid, they can still adjust
> the level using config.

IIRC, I heard that every reimplementation of Git got the tree entry
wrong one way or another at least once.  I think this is prudent.

I was almost sure that before we "unified" the codepath for normal
tree reading and the one used for fsck in a mistaken way, which was
fixed in this series, we were catching these anomalous mode bits,
but the suspected regression is too long ago that it does not make a
practical difference if it was always broken or it was broken long
time ago.  The risk to start complaining on existing projects is the
same either way.

Thanks for working on this series.
Jeff King Aug. 11, 2022, 8:33 a.m. UTC | #2
On Wed, Aug 10, 2022 at 03:08:14PM -0700, Junio C Hamano wrote:

> I was almost sure that before we "unified" the codepath for normal
> tree reading and the one used for fsck in a mistaken way, which was
> fixed in this series, we were catching these anomalous mode bits,
> but the suspected regression is too long ago that it does not make a
> practical difference if it was always broken or it was broken long
> time ago.  The risk to start complaining on existing projects is the
> same either way.

I agree with the "it was so long ago it does not matter", but for the
sake of posterity, here's what my digging found:

  - we got the mode fsck checks in 64071805ed (git-fsck-cache: be
    stricter about "tree" objects, 2005-07-27), though there is a proto
    version that is even a little older. Back then we were using a
    linked list to hold the parsed tree entries (!), but it was parsed
    by a central spot in parse_tree_buffer().

  - that linked list code went away in 15b5536ee4 (Remove last vestiges
    of generic tree_entry_list, 2006-05-29). But...

  - ...by then we already had 1b0c7174a1 (tree/diff header cleanup.,
    2006-03-29), which had tree_entry_extract(). And that commit
    introduced canon_mode, including the same "set unexpected things to
    a default", though of course back then it waas S_IFDIR since
    gitlinks didn't exist. ;)

  - that canon_mode() was just a rename from DIFF_FILE_CANON_MODE(),
    which ultimately came from 67574c403f ([PATCH] diff: mode bits
    fixes, 2005-06-01)

So some form of canon_mode() does predate the fsck checks, but I _think_
the fsck code was using the old linked-list version until 15b5536ee4,
and would not have been affected. So yes, there were probably 10 months
in 2005-2006 where we would have detected these. :)

Again, probably not important, but it was interesting for me at least to
see the evolution of the tree code. Most of those changes predate my
involvement with the code.

-Peff
Junio C Hamano Aug. 11, 2022, 4:24 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> I agree with the "it was so long ago it does not matter", but for the
> sake of posterity, here's what my digging found:
> ...

That's indeed an ancient history.  Thanks for a funny find.  I
forgot all about the linked list of tree entries.

> Again, probably not important, but it was interesting for me at least to
> see the evolution of the tree code. Most of those changes predate my
> involvement with the code.

Yeah, your name didn't appear in the history leading to the version
with linked list of tree entries, but by the time it was converted
to more modern "update_tree_entry()" interface, you had a handful of
commits already.

The main codepath never cared about zero-padded octal (we used to
use sscanf "%o", and these days get_mode() does it manually), but
there used to be Git implementations that zero-padded the mode
(e.g. "040000") that we need to warn about.  Both in code before and
after 15b5536ee4, we peek into the bytes directly and check '0',
independent from parsing the octal string into 'mode'.  I found that
interesting (no, there is nothing to fix, just a fun read).

Thanks.
diff mbox series

Patch

diff --git a/fsck.h b/fsck.h
index d07f7a2459..6f801e53b1 100644
--- a/fsck.h
+++ b/fsck.h
@@ -56,7 +56,6 @@  enum fsck_msg_type {
 	FUNC(GITMODULES_PATH, ERROR) \
 	FUNC(GITMODULES_UPDATE, ERROR) \
 	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
 	FUNC(FULL_PATHNAME, WARN) \
 	FUNC(HAS_DOT, WARN) \
@@ -66,6 +65,7 @@  enum fsck_msg_type {
 	FUNC(ZERO_PADDED_FILEMODE, WARN) \
 	FUNC(NUL_IN_COMMIT, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(BAD_FILEMODE, INFO) \
 	FUNC(GITMODULES_PARSE, INFO) \
 	FUNC(GITIGNORE_SYMLINK, INFO) \
 	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index b0b795aca9..ac4099ca89 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -352,4 +352,21 @@  test_expect_success \
 	grep "Cannot demote unterminatedheader" act
 '
 
+test_expect_success 'badFilemode is not a strict error' '
+	git init --bare badmode.git &&
+	tree=$(
+		cd badmode.git &&
+		blob=$(echo blob | git hash-object -w --stdin | hex2oct) &&
+		printf "123456 foo\0${blob}" |
+		git hash-object -t tree --stdin -w --literally
+	) &&
+
+	rm -rf dst.git &&
+	git init --bare dst.git &&
+	git -C dst.git config transfer.fsckObjects true &&
+
+	git -C badmode.git push ../dst.git $tree:refs/tags/tree 2>err &&
+	grep "$tree: badFilemode" err
+'
+
 test_done