Message ID | ZuRzwKTFd65RL4HC@ArchLinux (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4,1/5] ref: initialize "fsck_ref_report" with zero | expand |
shejialuo <shejialuo@gmail.com> writes: > We implicitly rely on "git-fsck(1)" to check the consistency of regular > refs. However, we have already set up the infrastructure of the ref > consistency checks. We need to port original checks from "git-fsck(1)". > Thus, we could clean the "git-fsck(1)" code by removing these implicit > checks. The above reads as if you are, in preparation to "port" the checks we have in "fsck" to elsewhere (presumably to "refs verify"), you are removing the checks that _will_ become redundant from "fsck". But that does not seem to be what is happening. Let me try to paraphrase, in order to check my understanding of what you wanted to say: "git-fsck(1) has some consistency checks for regular refs. As we want to align the checks "git refs verify" performs with them (and eventually call the unified code that checks refs from both), port the logic "git fsck" has to "git refs verify". If we fail to achieve the "a single unified code to check called by both fsck and refs-verify" at the end of this series, and instead end up with duplicated code that implements the checks in two separate code, risking them to be slightly different and drift away over time from each other, that is fine, as long as our intention is to continue the effort for unification in a follow up series. But such a plan needs to be spelled out. > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 890d0324e1..b1ed2e5c04 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3430,6 +3430,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, > const char *refs_check_dir, > struct dir_iterator *iter); > > +static int files_fsck_refs_content(struct ref_store *ref_store, > + struct fsck_options *o, > + const char *refs_check_dir, > + struct dir_iterator *iter) > +{ > + struct strbuf ref_content = STRBUF_INIT; > + struct strbuf referent = STRBUF_INIT; > + struct strbuf refname = STRBUF_INIT; > + struct fsck_ref_report report = {0}; > + unsigned int type = 0; > + int failure_errno = 0; > + struct object_id oid; > + int ret = 0; > + > + strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path); > + report.path = refname.buf; > + > + if (S_ISLNK(iter->st.st_mode)) > + goto cleanup; "symbolic links are OK" for now. We'll add sanity checks for them in later steps. OK. > + if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) { > + ret = error_errno(_("unable to read ref '%s/%s'"), > + refs_check_dir, iter->relative_path); Is there a reason why we cannot to use report.path aka refname.buf, and instead we have to recompute the same path again? Should this error be propagated back to the caller, not just to the end-user, by a call to fsck_report_ref(), like you do for a ref file that has questionable contents? If ref iteration (like for-each-ref) claims there is this ref, and you cannot read its value when you try to use it, it is just as bad as having a loose ref file that has unusable contents, isn't it? It is a separate matter if such a failure mode deserves its own error code (FSCK_MSG_UNREADABLE_REF) or can be rolled into the same FSCK_MSG_BAD_REF_CONTENT. I can see arguments for both sides and offhand have no strong preference either way. Thanks. > + goto cleanup; > + } > + > + if (parse_loose_ref_contents(ref_store->repo->hash_algo, > + ref_content.buf, &oid, &referent, > + &type, &failure_errno)) { > + ret = fsck_report_ref(o, &report, > + FSCK_MSG_BAD_REF_CONTENT, > + "invalid ref content"); > + goto cleanup; > + } > + > +cleanup: > + strbuf_release(&refname); > + strbuf_release(&ref_content); > + strbuf_release(&referent); > + return ret; > +}
On Wed, Sep 18, 2024 at 11:59:45AM -0700, Junio C Hamano wrote: [snip] > The above reads as if you are, in preparation to "port" the checks > we have in "fsck" to elsewhere (presumably to "refs verify"), you > are removing the checks that _will_ become redundant from "fsck". > > But that does not seem to be what is happening. Let me try to > paraphrase, in order to check my understanding of what you wanted to > say: > > "git-fsck(1) has some consistency checks for regular refs. As > we want to align the checks "git refs verify" performs with > them (and eventually call the unified code that checks refs from > both), port the logic "git fsck" has to "git refs verify". > Thanks, I have re-read my words, I did not explain this thing well. > > + if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) { > > + ret = error_errno(_("unable to read ref '%s/%s'"), > > + refs_check_dir, iter->relative_path); > > Is there a reason why we cannot to use report.path aka refname.buf, > and instead we have to recompute the same path again? > Thanks for pointing out this, because this part I wrote a long time ago and I think it's unrelated to the fsck part. So, I forgot to change. > Should this error be propagated back to the caller, not just to the > end-user, by a call to fsck_report_ref(), like you do for a ref file > that has questionable contents? If ref iteration (like for-each-ref) > claims there is this ref, and you cannot read its value when you try > to use it, it is just as bad as having a loose ref file that has > unusable contents, isn't it? > I agree. The initial motivation for this design is that I think this is OS-specific issue (It may be read successfully in the next time). So, I don't put it into the fsck part. But It make senses that we should report this. > It is a separate matter if such a failure mode deserves its own > error code (FSCK_MSG_UNREADABLE_REF) or can be rolled into the same > FSCK_MSG_BAD_REF_CONTENT. I can see arguments for both sides and > offhand have no strong preference either way. > We could just use "FSCK_MSG_BAD_REF_CONTENT" and add a message "cannot open this file". I guess this should be enough.
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index 68a2801f15..22c385ea22 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -19,6 +19,9 @@ `badParentSha1`:: (ERROR) A commit object has a bad parent sha1. +`badRefContent`:: + (ERROR) A ref has bad content. + `badRefFiletype`:: (ERROR) A ref has a bad file type. diff --git a/fsck.h b/fsck.h index 500b4c04d2..0d99a87911 100644 --- a/fsck.h +++ b/fsck.h @@ -31,6 +31,7 @@ enum fsck_msg_type { FUNC(BAD_NAME, ERROR) \ FUNC(BAD_OBJECT_SHA1, ERROR) \ FUNC(BAD_PARENT_SHA1, ERROR) \ + FUNC(BAD_REF_CONTENT, ERROR) \ FUNC(BAD_REF_FILETYPE, ERROR) \ FUNC(BAD_REF_NAME, ERROR) \ FUNC(BAD_TIMEZONE, ERROR) \ diff --git a/refs/files-backend.c b/refs/files-backend.c index 890d0324e1..b1ed2e5c04 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3430,6 +3430,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, const char *refs_check_dir, struct dir_iterator *iter); +static int files_fsck_refs_content(struct ref_store *ref_store, + struct fsck_options *o, + const char *refs_check_dir, + struct dir_iterator *iter) +{ + struct strbuf ref_content = STRBUF_INIT; + struct strbuf referent = STRBUF_INIT; + struct strbuf refname = STRBUF_INIT; + struct fsck_ref_report report = {0}; + unsigned int type = 0; + int failure_errno = 0; + struct object_id oid; + int ret = 0; + + strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path); + report.path = refname.buf; + + if (S_ISLNK(iter->st.st_mode)) + goto cleanup; + + if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) { + ret = error_errno(_("unable to read ref '%s/%s'"), + refs_check_dir, iter->relative_path); + goto cleanup; + } + + if (parse_loose_ref_contents(ref_store->repo->hash_algo, + ref_content.buf, &oid, &referent, + &type, &failure_errno)) { + ret = fsck_report_ref(o, &report, + FSCK_MSG_BAD_REF_CONTENT, + "invalid ref content"); + goto cleanup; + } + +cleanup: + strbuf_release(&refname); + strbuf_release(&ref_content); + strbuf_release(&referent); + return ret; +} + static int files_fsck_refs_name(struct ref_store *ref_store UNUSED, struct fsck_options *o, const char *refs_check_dir, @@ -3512,6 +3554,7 @@ static int files_fsck_refs(struct ref_store *ref_store, { files_fsck_refs_fn fsck_refs_fn[]= { files_fsck_refs_name, + files_fsck_refs_content, NULL, }; diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 71a4d1a5ae..a1205b3a3b 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -89,4 +89,64 @@ test_expect_success 'ref name check should be adapted into fsck messages' ' test_must_be_empty err ' +test_expect_success 'regular ref content should be checked (individual)' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + test_commit default && + mkdir -p "$branch_dir_prefix/a/b" && + + git refs verify 2>err && + test_must_be_empty err && + + printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/tags/tag-bad-1: badRefContent: invalid ref content + EOF + rm $tag_dir_prefix/tag-bad-1 && + test_cmp expect err && + + printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/tags/tag-bad-2: badRefContent: invalid ref content + EOF + rm $tag_dir_prefix/tag-bad-2 && + test_cmp expect err && + + printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content + EOF + rm $branch_dir_prefix/a/b/branch-bad && + test_cmp expect err +' + +test_expect_success 'regular ref content should be checked (aggregate)' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + test_commit default && + mkdir -p "$branch_dir_prefix/a/b" && + + printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 && + printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 && + printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad && + + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content + error: refs/tags/tag-bad-1: badRefContent: invalid ref content + error: refs/tags/tag-bad-2: badRefContent: invalid ref content + EOF + sort err >sorted_err && + test_cmp expect sorted_err +' + test_done
We implicitly rely on "git-fsck(1)" to check the consistency of regular refs. However, we have already set up the infrastructure of the ref consistency checks. We need to port original checks from "git-fsck(1)". Thus, we could clean the "git-fsck(1)" code by removing these implicit checks. The "git-fsck(1)" command reports an error when the ref content is invalid. Following this, add a similar check to "git refs verify". Add a new fsck error message called "badRefContent(ERROR)" to represent that a ref has an invalid content. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- Documentation/fsck-msgids.txt | 3 ++ fsck.h | 1 + refs/files-backend.c | 43 +++++++++++++++++++++++++ t/t0602-reffiles-fsck.sh | 60 +++++++++++++++++++++++++++++++++++ 4 files changed, 107 insertions(+)