Message ID | YvQdR3sDqDMCIjIE@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 4dd3b045f528b8d9cbbb4a50e371affb0543f37d |
Headers | show |
Series | actually detect bad file modes in fsck | expand |
Jeff King <peff@peff.net> writes: > ... That will generally cause things to be inefficient > rather than wrong, and is a bug somebody working on a Git implementation > would want to fix, but probably not worth inconveniencing users by > refusing to push or fetch. > > So let's downgrade this to "info" by default, which is our setting for > "mention this when fscking, but don't ever reject, even under strict > mode". If somebody really wants to be paranoid, they can still adjust > the level using config. IIRC, I heard that every reimplementation of Git got the tree entry wrong one way or another at least once. I think this is prudent. I was almost sure that before we "unified" the codepath for normal tree reading and the one used for fsck in a mistaken way, which was fixed in this series, we were catching these anomalous mode bits, but the suspected regression is too long ago that it does not make a practical difference if it was always broken or it was broken long time ago. The risk to start complaining on existing projects is the same either way. Thanks for working on this series.
On Wed, Aug 10, 2022 at 03:08:14PM -0700, Junio C Hamano wrote: > I was almost sure that before we "unified" the codepath for normal > tree reading and the one used for fsck in a mistaken way, which was > fixed in this series, we were catching these anomalous mode bits, > but the suspected regression is too long ago that it does not make a > practical difference if it was always broken or it was broken long > time ago. The risk to start complaining on existing projects is the > same either way. I agree with the "it was so long ago it does not matter", but for the sake of posterity, here's what my digging found: - we got the mode fsck checks in 64071805ed (git-fsck-cache: be stricter about "tree" objects, 2005-07-27), though there is a proto version that is even a little older. Back then we were using a linked list to hold the parsed tree entries (!), but it was parsed by a central spot in parse_tree_buffer(). - that linked list code went away in 15b5536ee4 (Remove last vestiges of generic tree_entry_list, 2006-05-29). But... - ...by then we already had 1b0c7174a1 (tree/diff header cleanup., 2006-03-29), which had tree_entry_extract(). And that commit introduced canon_mode, including the same "set unexpected things to a default", though of course back then it waas S_IFDIR since gitlinks didn't exist. ;) - that canon_mode() was just a rename from DIFF_FILE_CANON_MODE(), which ultimately came from 67574c403f ([PATCH] diff: mode bits fixes, 2005-06-01) So some form of canon_mode() does predate the fsck checks, but I _think_ the fsck code was using the old linked-list version until 15b5536ee4, and would not have been affected. So yes, there were probably 10 months in 2005-2006 where we would have detected these. :) Again, probably not important, but it was interesting for me at least to see the evolution of the tree code. Most of those changes predate my involvement with the code. -Peff
Jeff King <peff@peff.net> writes: > I agree with the "it was so long ago it does not matter", but for the > sake of posterity, here's what my digging found: > ... That's indeed an ancient history. Thanks for a funny find. I forgot all about the linked list of tree entries. > Again, probably not important, but it was interesting for me at least to > see the evolution of the tree code. Most of those changes predate my > involvement with the code. Yeah, your name didn't appear in the history leading to the version with linked list of tree entries, but by the time it was converted to more modern "update_tree_entry()" interface, you had a handful of commits already. The main codepath never cared about zero-padded octal (we used to use sscanf "%o", and these days get_mode() does it manually), but there used to be Git implementations that zero-padded the mode (e.g. "040000") that we need to warn about. Both in code before and after 15b5536ee4, we peek into the bytes directly and check '0', independent from parsing the octal string into 'mode'. I found that interesting (no, there is nothing to fix, just a fun read). Thanks.
diff --git a/fsck.h b/fsck.h index d07f7a2459..6f801e53b1 100644 --- a/fsck.h +++ b/fsck.h @@ -56,7 +56,6 @@ enum fsck_msg_type { FUNC(GITMODULES_PATH, ERROR) \ FUNC(GITMODULES_UPDATE, ERROR) \ /* warnings */ \ - FUNC(BAD_FILEMODE, WARN) \ FUNC(EMPTY_NAME, WARN) \ FUNC(FULL_PATHNAME, WARN) \ FUNC(HAS_DOT, WARN) \ @@ -66,6 +65,7 @@ enum fsck_msg_type { FUNC(ZERO_PADDED_FILEMODE, WARN) \ FUNC(NUL_IN_COMMIT, WARN) \ /* infos (reported as warnings, but ignored by default) */ \ + FUNC(BAD_FILEMODE, INFO) \ FUNC(GITMODULES_PARSE, INFO) \ FUNC(GITIGNORE_SYMLINK, INFO) \ FUNC(GITATTRIBUTES_SYMLINK, INFO) \ diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh index b0b795aca9..ac4099ca89 100755 --- a/t/t5504-fetch-receive-strict.sh +++ b/t/t5504-fetch-receive-strict.sh @@ -352,4 +352,21 @@ test_expect_success \ grep "Cannot demote unterminatedheader" act ' +test_expect_success 'badFilemode is not a strict error' ' + git init --bare badmode.git && + tree=$( + cd badmode.git && + blob=$(echo blob | git hash-object -w --stdin | hex2oct) && + printf "123456 foo\0${blob}" | + git hash-object -t tree --stdin -w --literally + ) && + + rm -rf dst.git && + git init --bare dst.git && + git -C dst.git config transfer.fsckObjects true && + + git -C badmode.git push ../dst.git $tree:refs/tags/tree 2>err && + grep "$tree: badFilemode" err +' + test_done
The previous commit un-broke the "badFileMode" check; before then it was literally testing nothing. And as far as I can tell, it has been so since the very initial version of fsck. The current severity of "badFileMode" is just "warning". But in the --strict mode used by transfer.fsckObjects, that is elevated to an error. This will potentially cause hassle for users, because historical objects with bad modes will suddenly start causing pushes to many server operators to be rejected. At the same time, these bogus modes aren't actually a big risk. Because we canonicalize them everywhere besides fsck, they can't cause too much mischief in the real world. The worst thing you can do is end up with two almost-identical trees that have different hashes but are interpreted the same. That will generally cause things to be inefficient rather than wrong, and is a bug somebody working on a Git implementation would want to fix, but probably not worth inconveniencing users by refusing to push or fetch. So let's downgrade this to "info" by default, which is our setting for "mention this when fscking, but don't ever reject, even under strict mode". If somebody really wants to be paranoid, they can still adjust the level using config. Suggested-by: Xavier Morel <xavier.morel@masklinn.net> Signed-off-by: Jeff King <peff@peff.net> --- fsck.h | 2 +- t/t5504-fetch-receive-strict.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+), 1 deletion(-)