Message ID | Zvj-sBX-0AFsuFDC@ArchLinux (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | add ref content check for files backend | expand |
On Sun, Sep 29, 2024 at 03:16:00PM +0800, shejialuo wrote: > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index 22c385ea22..e310b5bce9 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -179,6 +179,14 @@ > `unknownType`:: > (ERROR) Found an unknown object type. > > +`unofficialFormattedRef`:: > + (INFO) The content of a loose ref file is not in the official > + format such as not having a LF at the end or having trailing > + garbage. As valid implementations of Git never created such a > + loose ref file, it may become an error in the future. Report > + to the git@vger.kernel.org mailing list if you see this error, > + as we need to know what tools created such a file. > + I find "unofficial" to be a tad weird. Do we rather want to say something like "badRefTrailingGarbage"? > @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store, > goto cleanup; > } > > + if (!(type & REF_ISSYMREF)) { > + if (!*trailing) { > + ret = fsck_report_ref(o, &report, > + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, > + "misses LF at the end"); > + goto cleanup; > + } > + if (*trailing != '\n' || *(trailing + 1)) { > + ret = fsck_report_ref(o, &report, > + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, > + "has trailing garbage: '%s'", trailing); > + goto cleanup; > + } > + } > + I think we should discern these two error cases and provide different message IDs. Patrick
On Mon, Oct 07, 2024 at 08:58:37AM +0200, Patrick Steinhardt wrote: > On Sun, Sep 29, 2024 at 03:16:00PM +0800, shejialuo wrote: > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > > index 22c385ea22..e310b5bce9 100644 > > --- a/Documentation/fsck-msgids.txt > > +++ b/Documentation/fsck-msgids.txt > > @@ -179,6 +179,14 @@ > > `unknownType`:: > > (ERROR) Found an unknown object type. > > > > +`unofficialFormattedRef`:: > > + (INFO) The content of a loose ref file is not in the official > > + format such as not having a LF at the end or having trailing > > + garbage. As valid implementations of Git never created such a > > + loose ref file, it may become an error in the future. Report > > + to the git@vger.kernel.org mailing list if you see this error, > > + as we need to know what tools created such a file. > > + > > I find "unofficial" to be a tad weird. Do we rather want to say > something like "badRefTrailingGarbage"? > Well, I will answer this question just in below question together. > > @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store, > > goto cleanup; > > } > > > > + if (!(type & REF_ISSYMREF)) { > > + if (!*trailing) { > > + ret = fsck_report_ref(o, &report, > > + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, > > + "misses LF at the end"); > > + goto cleanup; > > + } > > + if (*trailing != '\n' || *(trailing + 1)) { > > + ret = fsck_report_ref(o, &report, > > + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, > > + "has trailing garbage: '%s'", trailing); > > + goto cleanup; > > + } > > + } > > + > > I think we should discern these two error cases and provide different > message IDs. > Actually, in the previous versions, I have mapped one message id to one error case. But, in the v4, Junio asked a question Not limited to this patch, but isn't fsck_report_ref() misdesigned, or is it just they are used poorly in these patches? In these two callsites, the message string parameter does not give any more information than what the FSCK_MSG_* enum gives. That is what I meant by "misdesigned"---if one message enum always corresponds to one human-readable message, there is not much point in forcing callers to supply both, is there? In my opinion, we should have only one case here for trailing garbage and not end with a newline. When writing the code, I chose the name "unofficialFormattedRef" for the following reason: 1. If we use two message ids here, for every message id, we need write to info the user "please report this to git mailing list". 2. If we decide to make this as an error. We could just classify them into "badRefContent" message category. 3. The semantic is correct here, they are truly curious formatted refs, and eventually we will give the info to the user what is curious. So, I think we should not always map one message to one error case. > Patrick Thanks, Jialuo
On Mon, Oct 07, 2024 at 04:44:16PM +0800, shejialuo wrote: > On Mon, Oct 07, 2024 at 08:58:37AM +0200, Patrick Steinhardt wrote: > > On Sun, Sep 29, 2024 at 03:16:00PM +0800, shejialuo wrote: > > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > > > index 22c385ea22..e310b5bce9 100644 > > > --- a/Documentation/fsck-msgids.txt > > > +++ b/Documentation/fsck-msgids.txt > > > @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store, > > > goto cleanup; > > > } > > > > > > + if (!(type & REF_ISSYMREF)) { > > > + if (!*trailing) { > > > + ret = fsck_report_ref(o, &report, > > > + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, > > > + "misses LF at the end"); > > > + goto cleanup; > > > + } > > > + if (*trailing != '\n' || *(trailing + 1)) { > > > + ret = fsck_report_ref(o, &report, > > > + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, > > > + "has trailing garbage: '%s'", trailing); > > > + goto cleanup; > > > + } > > > + } > > > + > > > > I think we should discern these two error cases and provide different > > message IDs. > > > > Actually, in the previous versions, I have mapped one message id to one > error case. But, in the v4, Junio asked a question > > Not limited to this patch, but isn't fsck_report_ref() misdesigned, > or is it just they are used poorly in these patches? In these two > callsites, the message string parameter does not give any more > information than what the FSCK_MSG_* enum gives. > > That is what I meant by "misdesigned"---if one message enum always > corresponds to one human-readable message, there is not much point > in forcing callers to supply both, is there? > > In my opinion, we should have only one case here for trailing garbage > and not end with a newline. When writing the code, I chose the name > "unofficialFormattedRef" for the following reason: > > 1. If we use two message ids here, for every message id, we need write > to info the user "please report this to git mailing list". > > 2. If we decide to make this as an error. We could just classify them > into "badRefContent" message category. > > 3. The semantic is correct here, they are truly curious formatted > refs, and eventually we will give the info to the user what is > curious. > > So, I think we should not always map one message to one error case. From my point of view the error codes should be the single source of truth, as this is what a user can use to disable specific checks. So if one code maps to multiple messages they have the problem that they can only disable all of those messages. I don't disagree with what Junio is saying. It is somewhat duplicate that the user has to pass both a code and a message in the current form-- it should be sufficient for them to pass the code, and the message can then e.g. be extracted from a central array that maps codes to messages. But you can also make the reverse argument: messages can be dynamic, so that the caller may include additional details around why specfically the check failed. The code and message would still be 1:1, but we may include additional details like that to guide the user. Patrick
On Mon, Oct 07, 2024 at 11:25:17AM +0200, Patrick Steinhardt wrote: [snip] > > > > Actually, in the previous versions, I have mapped one message id to one > > error case. But, in the v4, Junio asked a question > > > > Not limited to this patch, but isn't fsck_report_ref() misdesigned, > > or is it just they are used poorly in these patches? In these two > > callsites, the message string parameter does not give any more > > information than what the FSCK_MSG_* enum gives. > > > > That is what I meant by "misdesigned"---if one message enum always > > corresponds to one human-readable message, there is not much point > > in forcing callers to supply both, is there? > > > > In my opinion, we should have only one case here for trailing garbage > > and not end with a newline. When writing the code, I chose the name > > "unofficialFormattedRef" for the following reason: > > > > 1. If we use two message ids here, for every message id, we need write > > to info the user "please report this to git mailing list". > > > > 2. If we decide to make this as an error. We could just classify them > > into "badRefContent" message category. > > > > 3. The semantic is correct here, they are truly curious formatted > > refs, and eventually we will give the info to the user what is > > curious. > > > > So, I think we should not always map one message to one error case. > > From my point of view the error codes should be the single source of > truth, as this is what a user can use to disable specific checks. So if > one code maps to multiple messages they have the problem that they can > only disable all of those messages. > Thanks for your remind here. I totally forgot this. I have changed my mind now, we should use one to one mapping here. As you said, if we do not, we will give the user the bad experience. > I don't disagree with what Junio is saying. It is somewhat duplicate > that the user has to pass both a code and a message in the current > form-- it should be sufficient for them to pass the code, and the > message can then e.g. be extracted from a central array that maps codes > to messages. > > But you can also make the reverse argument: messages can be dynamic, so > that the caller may include additional details around why specfically > the check failed. The code and message would still be 1:1, but we may > include additional details like that to guide the user. > Yes, I will refactor the "fsck_report" to allow the user pass the "NULL" message if the fsck message id is clear enough to indicate the error case. So, more things to do here. > Patrick Thanks, Jialuo
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index 22c385ea22..e310b5bce9 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -179,6 +179,14 @@ `unknownType`:: (ERROR) Found an unknown object type. +`unofficialFormattedRef`:: + (INFO) The content of a loose ref file is not in the official + format such as not having a LF at the end or having trailing + garbage. As valid implementations of Git never created such a + loose ref file, it may become an error in the future. Report + to the git@vger.kernel.org mailing list if you see this error, + as we need to know what tools created such a file. + `unterminatedHeader`:: (FATAL) Missing end-of-line in the object header. diff --git a/fsck.h b/fsck.h index 0d99a87911..7420add5c0 100644 --- a/fsck.h +++ b/fsck.h @@ -85,6 +85,7 @@ enum fsck_msg_type { FUNC(MAILMAP_SYMLINK, INFO) \ FUNC(BAD_TAG_NAME, INFO) \ FUNC(MISSING_TAGGER_ENTRY, INFO) \ + FUNC(UNOFFICIAL_FORMATTED_REF, INFO) \ /* ignored (elevated when requested) */ \ FUNC(EXTRA_HEADER_ENTRY, IGNORE) diff --git a/refs.c b/refs.c index 5f729ed412..6ba1bb1aa1 100644 --- a/refs.c +++ b/refs.c @@ -1788,7 +1788,7 @@ static int refs_read_special_head(struct ref_store *ref_store, } result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf, - oid, referent, type, failure_errno); + oid, referent, type, NULL, failure_errno); done: strbuf_release(&full_path); diff --git a/refs/files-backend.c b/refs/files-backend.c index 35b3fa983e..b2a790c884 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -568,7 +568,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname, buf = sb_contents.buf; ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf, - oid, referent, type, &myerr); + oid, referent, type, NULL, &myerr); out: if (ret && !myerr) @@ -605,7 +605,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn int parse_loose_ref_contents(const struct git_hash_algo *algop, const char *buf, struct object_id *oid, struct strbuf *referent, unsigned int *type, - int *failure_errno) + const char **trailing, int *failure_errno) { const char *p; if (skip_prefix(buf, "ref:", &buf)) { @@ -627,6 +627,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop, *failure_errno = EINVAL; return -1; } + + if (trailing) + *trailing = p; + return 0; } @@ -3513,6 +3517,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, struct strbuf referent = STRBUF_INIT; struct strbuf refname = STRBUF_INIT; struct fsck_ref_report report = { 0 }; + const char *trailing = NULL; unsigned int type = 0; int failure_errno = 0; struct object_id oid; @@ -3533,7 +3538,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, if (parse_loose_ref_contents(ref_store->repo->hash_algo, ref_content.buf, &oid, &referent, - &type, &failure_errno)) { + &type, &trailing, &failure_errno)) { strbuf_rtrim(&ref_content); ret = fsck_report_ref(o, &report, FSCK_MSG_BAD_REF_CONTENT, @@ -3541,6 +3546,21 @@ static int files_fsck_refs_content(struct ref_store *ref_store, goto cleanup; } + if (!(type & REF_ISSYMREF)) { + if (!*trailing) { + ret = fsck_report_ref(o, &report, + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, + "misses LF at the end"); + goto cleanup; + } + if (*trailing != '\n' || *(trailing + 1)) { + ret = fsck_report_ref(o, &report, + FSCK_MSG_UNOFFICIAL_FORMATTED_REF, + "has trailing garbage: '%s'", trailing); + goto cleanup; + } + } + cleanup: strbuf_release(&refname); strbuf_release(&ref_content); diff --git a/refs/refs-internal.h b/refs/refs-internal.h index 2313c830d8..73b05f971b 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -715,7 +715,7 @@ struct ref_store { int parse_loose_ref_contents(const struct git_hash_algo *algop, const char *buf, struct object_id *oid, struct strbuf *referent, unsigned int *type, - int *failure_errno); + const char **trailing, int *failure_errno); /* * Fill in the generic part of refs and add it to our collection of diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 628f9bcc46..2f5c4a1926 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -185,6 +185,61 @@ test_expect_success 'regular ref content should be checked (individual)' ' error: refs/heads/a/b/branch-bad: badRefContent: $bad_content EOF rm $branch_dir_prefix/a/b/branch-bad && + test_cmp expect err && + + printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/branch-no-newline: unofficialFormattedRef: misses LF at the end + EOF + rm $branch_dir_prefix/branch-no-newline && + test_cmp expect err && + + printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\'' + EOF + rm $branch_dir_prefix/branch-garbage && + test_cmp expect err && + + printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/tags/tag-garbage-1: unofficialFormattedRef: has trailing garbage: '\'' + + + '\'' + EOF + rm $tag_dir_prefix/tag-garbage-1 && + test_cmp expect err && + + printf "%s\n\n\n garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/tags/tag-garbage-2: unofficialFormattedRef: has trailing garbage: '\'' + + + garbage'\'' + EOF + rm $tag_dir_prefix/tag-garbage-2 && + test_cmp expect err && + + printf "%s garbage\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/tags/tag-garbage-3: unofficialFormattedRef: has trailing garbage: '\'' garbage + a'\'' + EOF + rm $tag_dir_prefix/tag-garbage-3 && + test_cmp expect err && + + printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 && + test_must_fail git -c fsck.unofficialFormattedRef=error refs verify 2>err && + cat >expect <<-EOF && + error: refs/tags/tag-garbage-4: unofficialFormattedRef: has trailing garbage: '\'' garbage'\'' + EOF + rm $tag_dir_prefix/tag-garbage-4 && test_cmp expect err ' @@ -203,12 +258,16 @@ test_expect_success 'regular ref content should be checked (aggregate)' ' printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 && printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 && printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad && + printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline && + printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage && test_must_fail git refs verify 2>err && cat >expect <<-EOF && error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3 error: refs/tags/tag-bad-1: badRefContent: $bad_content_1 error: refs/tags/tag-bad-2: badRefContent: $bad_content_2 + warning: refs/heads/branch-garbage: unofficialFormattedRef: has trailing garbage: '\'' garbage'\'' + warning: refs/heads/branch-no-newline: unofficialFormattedRef: misses LF at the end EOF sort err >sorted_err && test_cmp expect sorted_err
We have already used "parse_loose_ref_contents" function to check whether the ref content is valid in files backend. However, by using "parse_loose_ref_contents", we allow the ref's content to end with garbage or without a newline. Even though we never create such loose refs ourselves, we have accepted such loose refs. So, it is entirely possible that some third-party tools may rely on such loose refs being valid. We should not report an error fsck message at current. We should notify the users about such "curiously formatted" loose refs so that adequate care is taken before we decide to tighten the rules in the future. And it's not suitable either to report a warn fsck message to the user. We don't yet want the "--strict" flag that controls this bit to end up generating errors for such weirdly-formatted reference contents, as we first want to assess whether this retroactive tightening will cause issues for any tools out there. It may cause compatibility issues which may break the repository. So we add the "unofficialFormattedRef(INFO)" fsck message to represent the situation where the ref format is not officially created by us and notify the users it may become an error in the future. It might appear that we can't provide the user with any warnings by using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert FSCK_INFO to FSCK_WARN and we can still warn the user about these situations when using "git refs verify" without introducing compatibility issues. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- Documentation/fsck-msgids.txt | 8 +++++ fsck.h | 1 + refs.c | 2 +- refs/files-backend.c | 26 +++++++++++++-- refs/refs-internal.h | 2 +- t/t0602-reffiles-fsck.sh | 59 +++++++++++++++++++++++++++++++++++ 6 files changed, 93 insertions(+), 5 deletions(-)