Message ID | Zs3558scHssaG_XS@ArchLinux (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | add ref content check for files backend | expand |
shejialuo <shejialuo@gmail.com> writes: > We have already introduced the checks for regular refs. There is no need > to check the consistency of the target which the symbolic ref points to. > Instead, we just check the content of the symbolic ref itself. Just in case you need it in the future, if you ever need to refer to a symbolic ref in a way that it is clear which of the two kinds you are talking about, you can say "textual symref" (a regular file whose contents is "ref: " followed by the target), to contrast them with "symbolic link used as symref". In the proposed log message of this commit, all references to "symbolic ref" talk about textual ones, so I do not see any need to be extra explicit by saying "textual symref". > In order to check the content of the symbolic ref, create a function > "files_fsck_symref_target". It will first check whether the "pointee" is > under the "refs/" directory and then we will check the "pointee" itself. Hmph, as the pointee must be within the usual places that you would find refs (either in refs/ directory or pseudo ref files immediately below $GIT_DIR), wouldn't we check the pointee when fsck (or "git refs verify") run and check everything? The pointee will have its turn to be checked, and I am not sure why you need to check the pointee when you find a symbolic ref is pointing at it, which will lead for it to be checked twice (or more). I however did not find an additional code to "check the pointee itself" in the patch, so perhaps it is OK---the only thing that needs fixing may be the above paragraph if that is the case. > There is no specification about the content of the symbolic ref. > Although we do write "ref: %s\n" to create a symbolic ref by using > "git-symbolic-ref(1)" command. However, this is not mandatory. We still > accept symbolic refs with null trailing garbage. Put it more specific, > the following are correct: > > 1. "ref: refs/heads/master " > 2. "ref: refs/heads/master \n \n" > 3. "ref: refs/heads/master\n\n" > > But we do not allow any non-null trailing garbage. Your use of word "null" is probably too confusing to contributors to this project. None of the above has NUL bytes in them. I think you want to say something like this: A regular file is accepted as a textual symbolic ref if it begins with "ref:", followed by zero or more whitespaces, followed by the full refname (e.g. "refs/heads/master", "refs/tags/v1.0"), followed only by whitespace characters. We always write a single SP after "ref:" and a single LF after the full refname, but third-party reimplementations of Git may have taken advantage of the looser syntax that is allowed as above. > The following are bad > symbolic contents which will be reported as fsck error by "git-fsck(1)". > > 1. "ref: refs/heads/master garbage\n" > 2. "ref: refs/heads/master \n\n\n garbage " > > In order to provide above checks, we will use "strrchr" to check whether > we have newline in the ref content. strrchr() to look for only LF is overly strict. You need to match what refs/files-backend.c:read_ref_internal() does to the contents read from such a loose ref file, i.e. strbuf_rtrim(). Any isspace() bytes are trimmed at the end, including SP, HT, CR and LF. > +static int files_fsck_symref_target(struct fsck_options *o, > + struct fsck_ref_report *report, > + const char *refname, > + struct strbuf *pointee_name, > + struct strbuf *pointee_path) > +{ > + const char *newline_pos = NULL; > + const char *p = NULL; > + struct stat st; > + int ret = 0; > + > + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { > + > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to ref outside the refs directory"); > + goto out; > + } > + > + newline_pos = strrchr(p, '\n'); > + if (!newline_pos || *(newline_pos + 1)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_REF_MISSING_NEWLINE, > + "missing newline"); If newline_pos is NULL, it is truly a "missing newline" situation. If I am reading the code correctly, the severity level is set to INFO, which is good. If newline_pos is not NULL but newline_pos[1] is not NUL, however, that is not a "missing newline". "refs: refs/heads/master\n " would trigger this report, for example. As far as I can tell, such a textual symbolic ref is taken as a valid symbolic ref pointing at "refs/heads/master" by refs/files-backend.c:read_ref_internal(), so we are trying to detect a valid but curiously formatted textual symbolic ref file with the above code? And strrchr() to find the last LF is not sufficient for that purpose. We would never write "refs: refs/head/master \n", but the above code will find the LF, be satisified that the LF is followed by NUL, without realizing that SP there is not something we would have written! I am not sure if that is worth detecting that if it is something we would have written, but if that were the case, then you would probably need to do (1) check the last byte of pointee_name.buf[] to make sure that it is LF; and (2) remember pointee_name.len, run strbuf_rtrim() on pointee_name, and that LF at the end was the only thing that was trimmed by checking the pointee_name.len after trimming. or something like that. Then you do not have to have an ugly "oh we need to check again"---the production code would not do that, either. > + if (check_refname_format(pointee_name->buf, 0)) { > + /* > + * When containing null-garbage, "check_refname_format" will > + * fail, we should trim the "pointee" to check again. > + */ > + strbuf_rtrim(pointee_name); > + if (!check_refname_format(pointee_name->buf, 0)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_TRAILING_REF_CONTENT, > + "trailing null-garbage"); > + goto out; > + } IOW, the above "let's retry" feels totally wrong. You shouldn't have to do so, and that comes from running check_refname_format() before rtrimming the pointee_name. > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to refname with invalid format"); > + } Good. With this check, we know that the referent, if exists, is well-formed. The contents of the referent will then be checked just like all other refs that may not be pointed by any symbolic ref. > + /* > + * Missing target should not be treated as any error worthy event and > + * not even warn. It is a common case that a symbolic ref points to a > + * ref that does not exist yet. If the target ref does not exist, just > + * skip the check for the file type. > + */ > + if (lstat(pointee_path->buf, &st) < 0) > + goto out; Good. > + if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to an invalid file type"); > + goto out; I do not think it is wrong per se, but I am not sure if this check is needed, either. When "git fsck" or "git refs verify" is told to check the loose refs, wouldn't it walk the refs directory and report such an unusual filesystem entity that is not a regular file, symbolic link, or a directory as "there is unusual cruft exist here"?
On Wed, Aug 28, 2024 at 12:08:07AM +0800, shejialuo wrote: > We have already introduced the checks for regular refs. There is no need > to check the consistency of the target which the symbolic ref points to. > Instead, we just check the content of the symbolic ref itself. > > In order to check the content of the symbolic ref, create a function > "files_fsck_symref_target". It will first check whether the "pointee" is > under the "refs/" directory and then we will check the "pointee" itself. > > There is no specification about the content of the symbolic ref. > Although we do write "ref: %s\n" to create a symbolic ref by using > "git-symbolic-ref(1)" command. However, this is not mandatory. We still > accept symbolic refs with null trailing garbage. Put it more specific, > the following are correct: > > 1. "ref: refs/heads/master " > 2. "ref: refs/heads/master \n \n" > 3. "ref: refs/heads/master\n\n" Now that we're talking about tightening the rules for direct refs, I wonder whether we'd also want to apply the same rules to symrefs. Namely, when there is trailing whitespace we should generate an INFO-level message about that, too. This is mostly for the sake of consistency. [snip] > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index fc074fc571..85fd058c81 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -28,6 +28,9 @@ > `badRefName`:: > (ERROR) A ref has an invalid format. > > +`badSymrefPointee`:: > + (ERROR) The pointee of a symref is bad. I think we'd want to clarify what "bad" is supposed to mean. Like, is a missing symref pointee bad? If this is about the format of the pointee's name, we might want to call this "badSymrefPointeeName". Also, I think we don't typically call the value of a symbolic ref "pointee", but "target". Searching for "pointee" in our codebase only gives a single hit, and that one is not related to symbolic refs. > diff --git a/fsck.h b/fsck.h > index b85072df57..cbe837f84c 100644 > --- a/fsck.h > +++ b/fsck.h > @@ -34,6 +34,7 @@ enum fsck_msg_type { > FUNC(BAD_REF_CONTENT, ERROR) \ > FUNC(BAD_REF_FILETYPE, ERROR) \ > FUNC(BAD_REF_NAME, ERROR) \ > + FUNC(BAD_SYMREF_POINTEE, ERROR) \ > FUNC(BAD_TIMEZONE, ERROR) \ > FUNC(BAD_TREE, ERROR) \ > FUNC(BAD_TREE_SHA1, ERROR) \ > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 69c00073eb..382c73fcf7 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3434,11 +3434,81 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, > const char *refs_check_dir, > struct dir_iterator *iter); > > +/* > + * Check the symref "pointee_name" and "pointee_path". The caller should > + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name" > + * would be the content after "refs:". > + */ > +static int files_fsck_symref_target(struct fsck_options *o, > + struct fsck_ref_report *report, > + const char *refname, > + struct strbuf *pointee_name, > + struct strbuf *pointee_path) > +{ > + const char *newline_pos = NULL; > + const char *p = NULL; > + struct stat st; > + int ret = 0; > + > + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { > + > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to ref outside the refs directory"); > + goto out; > + } > + > + newline_pos = strrchr(p, '\n'); > + if (!newline_pos || *(newline_pos + 1)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_REF_MISSING_NEWLINE, > + "missing newline"); > + } The second condition `*(newline_pos + 1)` checks whether there is any data after the newline, doesn't it? That indicates a different kind of error than "missing newline", namely that there is trailing garbage. I guess we'd want to report a separate info-level message for this. Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're only checking for trailing garbage after the _last_ newline, not after the first one. > + if (check_refname_format(pointee_name->buf, 0)) { > + /* > + * When containing null-garbage, "check_refname_format" will > + * fail, we should trim the "pointee" to check again. > + */ > + strbuf_rtrim(pointee_name); > + if (!check_refname_format(pointee_name->buf, 0)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_TRAILING_REF_CONTENT, > + "trailing null-garbage"); > + goto out; > + } Ah, I didn't get at first that we're doing the check a second time here. As mentioned above, I think we should check for trailing garbage further up already and more explicitly. > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh > index 7c1910d784..69280795ca 100755 > --- a/t/t0602-reffiles-fsck.sh > +++ b/t/t0602-reffiles-fsck.sh > @@ -176,4 +176,58 @@ test_expect_success 'regular ref content should be checked' ' > test_cmp expect err > ' > > +test_expect_success 'symbolic ref content should be checked' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + branch_dir_prefix=.git/refs/heads && > + tag_dir_prefix=.git/refs/tags && > + cd repo && > + git commit --allow-empty -m initial && > + git checkout -b branch-1 && > + git tag tag-1 && > + git checkout -b a/b/branch-2 && > + > + printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline && > + git refs verify 2>err && > + cat >expect <<-EOF && > + warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline > + EOF > + rm $branch_dir_prefix/branch-1-no-newline && > + test_cmp expect err && Same comments here as in the preceding patch for the tests. Patrick
On Tue, Aug 27, 2024 at 12:19:11PM -0700, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > > In order to check the content of the symbolic ref, create a function > > "files_fsck_symref_target". It will first check whether the "pointee" is > > under the "refs/" directory and then we will check the "pointee" itself. > > Hmph, as the pointee must be within the usual places that you would > find refs (either in refs/ directory or pseudo ref files immediately > below $GIT_DIR), wouldn't we check the pointee when fsck (or "git > refs verify") run and check everything? The pointee will have its > turn to be checked, and I am not sure why you need to check the > pointee when you find a symbolic ref is pointing at it, which will > lead for it to be checked twice (or more). > > I however did not find an additional code to "check the pointee itself" > in the patch, so perhaps it is OK---the only thing that needs fixing > may be the above paragraph if that is the case. > Yes, "we will check the 'pointee'" itself makes the reader confused. I will fix the above paragraph. Actually we do not check the "pointee", but check the symref content. Will fix this in the next version. > > There is no specification about the content of the symbolic ref. > > Although we do write "ref: %s\n" to create a symbolic ref by using > > "git-symbolic-ref(1)" command. However, this is not mandatory. We still > > accept symbolic refs with null trailing garbage. Put it more specific, > > the following are correct: > > > > 1. "ref: refs/heads/master " > > 2. "ref: refs/heads/master \n \n" > > 3. "ref: refs/heads/master\n\n" > > > > But we do not allow any non-null trailing garbage. > > Your use of word "null" is probably too confusing to contributors to > this project. None of the above has NUL bytes in them. I think you > want to say something like this: > > A regular file is accepted as a textual symbolic ref if it > begins with "ref:", followed by zero or more whitespaces, > followed by the full refname (e.g. "refs/heads/master", > "refs/tags/v1.0"), followed only by whitespace characters. We > always write a single SP after "ref:" and a single LF after the > full refname, but third-party reimplementations of Git may have > taken advantage of the looser syntax that is allowed as above. > Thanks for your suggestion. I will improve this in the next version. > > The following are bad > > symbolic contents which will be reported as fsck error by "git-fsck(1)". > > > > 1. "ref: refs/heads/master garbage\n" > > 2. "ref: refs/heads/master \n\n\n garbage " > > > > In order to provide above checks, we will use "strrchr" to check whether > > we have newline in the ref content. > > strrchr() to look for only LF is overly strict. You need to match > what refs/files-backend.c:read_ref_internal() does to the contents > read from such a loose ref file, i.e. strbuf_rtrim(). Any isspace() > bytes are trimmed at the end, including SP, HT, CR and LF. > I will look into how "strbuf_rtrim" does to see whether we can reuse some functions to avoid repetition. > > +static int files_fsck_symref_target(struct fsck_options *o, > > + struct fsck_ref_report *report, > > + const char *refname, > > + struct strbuf *pointee_name, > > + struct strbuf *pointee_path) > > +{ > > + const char *newline_pos = NULL; > > + const char *p = NULL; > > + struct stat st; > > + int ret = 0; > > + > > + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { > > + > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_SYMREF_POINTEE, > > + "points to ref outside the refs directory"); > > + goto out; > > + } > > + > > + newline_pos = strrchr(p, '\n'); > > + if (!newline_pos || *(newline_pos + 1)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_REF_MISSING_NEWLINE, > > + "missing newline"); > > If newline_pos is NULL, it is truly a "missing newline" situation. > If I am reading the code correctly, the severity level is set to > INFO, which is good. > > If newline_pos is not NULL but newline_pos[1] is not NUL, however, > that is not a "missing newline". "refs: refs/heads/master\n " would > trigger this report, for example. > When I design this, I actually consider "ref: refs/heads/master\n " is still missing the newline. And then we also report that it has garbage. I think "ref: refs/heads/master\n \n" is not missing the newline. But, I don't think this is good. I will find a good way to handle this. > As far as I can tell, such a textual symbolic ref is taken as a > valid symbolic ref pointing at "refs/heads/master" by > refs/files-backend.c:read_ref_internal(), so we are trying to detect > a valid but curiously formatted textual symbolic ref file with the > above code? Yes, these situations will be taken as a valid symbolic ref but actually there are something wrong. So this is what we need to care about. > > And strrchr() to find the last LF is not sufficient for that > purpose. We would never write "refs: refs/head/master \n", > but the above code will find the LF, be satisified that the LF is > followed by NUL, without realizing that SP there is not something we > would have written! I totally ignored this situation, and in current patch, we cannot check this. I know why Patrick lets me use "strchr" but not "strrchr". I think we should find the last '\n'. But instead we need to find the first '\n'. However, in this example, we will still fail by using "strchr". This part should be totally re-designed. > > I am not sure if that is worth detecting that if it is something we > would have written, but if that were the case, then you would > probably need to do > > (1) check the last byte of pointee_name.buf[] to make sure that > it is LF; and > (2) remember pointee_name.len, run strbuf_rtrim() on pointee_name, > and that LF at the end was the only thing that was trimmed by > checking the pointee_name.len after trimming. > > or something like that. Then you do not have to have an ugly "oh we > need to check again"---the production code would not do that, either. > Yes, this is a good idea. > > + if (check_refname_format(pointee_name->buf, 0)) { > > + /* > > + * When containing null-garbage, "check_refname_format" will > > + * fail, we should trim the "pointee" to check again. > > + */ > > + strbuf_rtrim(pointee_name); > > + if (!check_refname_format(pointee_name->buf, 0)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_TRAILING_REF_CONTENT, > > + "trailing null-garbage"); > > + goto out; > > + } > > IOW, the above "let's retry" feels totally wrong. You shouldn't > have to do so, and that comes from running check_refname_format() > before rtrimming the pointee_name. > Yes, actually, I have thought I could compare the length change after executing the "strbuf_rtrim". I don't want to create two new variables, so I call "check_refname_format" twice. Will fix this in the next version. > > + if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_SYMREF_POINTEE, > > + "points to an invalid file type"); > > + goto out; > > I do not think it is wrong per se, but I am not sure if this check > is needed, either. When "git fsck" or "git refs verify" is told to > check the loose refs, wouldn't it walk the refs directory and report > such an unusual filesystem entity that is not a regular file, > symbolic link, or a directory as "there is unusual cruft exist > here"? When setting up the infrastructure, actually we DO report filesystem entity that is not a regular file or symbolic link like the following: if (S_ISDIR(iter->st.st_mode)) { continue; } else if (S_ISREG(iter->st.st_mode) || S_ISLNK(iter->st.st_mode)) { ...; } else { // report file system error } We do not check the directory, because the directory will be always valid in the filesystem. we could not say that "refs/heads/a/" is a bad ref. So, this check mainly need to check whether the symref points to a directory. Actually, Patrick has also gave the review about this question in the previous version: > What exactly are we guarding against here? Don't we already verify that > files in `refs/` have the correct type? Or are we checking that it does > not point to a directory? However, we should remove this line, because "check_refname_format" will take care for us. git check-ref-format 'refs/heads/' It will generate an error. So, we could entirely remove this line and let "check_refname_format" do this. And we could also remove the if (lstat(pointee_path->buf, &st) < 0) goto out; The code will be much more clean. Thanks, Jialuo
On Wed, Aug 28, 2024 at 02:50:09PM +0200, Patrick Steinhardt wrote: > On Wed, Aug 28, 2024 at 12:08:07AM +0800, shejialuo wrote: > > We have already introduced the checks for regular refs. There is no need > > to check the consistency of the target which the symbolic ref points to. > > Instead, we just check the content of the symbolic ref itself. > > > > In order to check the content of the symbolic ref, create a function > > "files_fsck_symref_target". It will first check whether the "pointee" is > > under the "refs/" directory and then we will check the "pointee" itself. > > > > There is no specification about the content of the symbolic ref. > > Although we do write "ref: %s\n" to create a symbolic ref by using > > "git-symbolic-ref(1)" command. However, this is not mandatory. We still > > accept symbolic refs with null trailing garbage. Put it more specific, > > the following are correct: > > > > 1. "ref: refs/heads/master " > > 2. "ref: refs/heads/master \n \n" > > 3. "ref: refs/heads/master\n\n" > > Now that we're talking about tightening the rules for direct refs, I > wonder whether we'd also want to apply the same rules to symrefs. > Namely, when there is trailing whitespace we should generate an > INFO-level message about that, too. This is mostly for the sake of > consistency. > Yes, actually this patch does this. I think I need to mention we reuse the "FSCK_INFO" message id defined in the [PATCH v2 2/4]. > [snip] > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > > index fc074fc571..85fd058c81 100644 > > --- a/Documentation/fsck-msgids.txt > > +++ b/Documentation/fsck-msgids.txt > > @@ -28,6 +28,9 @@ > > `badRefName`:: > > (ERROR) A ref has an invalid format. > > > > +`badSymrefPointee`:: > > + (ERROR) The pointee of a symref is bad. > > I think we'd want to clarify what "bad" is supposed to mean. Like, is a > missing symref pointee bad? If this is about the format of the pointee's > name, we might want to call this "badSymrefPointeeName". > I agree, bad is too general here, we need to make it concrete. > Also, I think we don't typically call the value of a symbolic ref > "pointee", but "target". Searching for "pointee" in our codebase only > gives a single hit, and that one is not related to symbolic refs. > Thanks, I will fix this in the next version. > > +/* > > + * Check the symref "pointee_name" and "pointee_path". The caller should > > + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name" > > + * would be the content after "refs:". > > + */ > > +static int files_fsck_symref_target(struct fsck_options *o, > > + struct fsck_ref_report *report, > > + const char *refname, > > + struct strbuf *pointee_name, > > + struct strbuf *pointee_path) > > +{ > > + const char *newline_pos = NULL; > > + const char *p = NULL; > > + struct stat st; > > + int ret = 0; > > + > > + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { > > + > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_SYMREF_POINTEE, > > + "points to ref outside the refs directory"); > > + goto out; > > + } > > + > > + newline_pos = strrchr(p, '\n'); > > + if (!newline_pos || *(newline_pos + 1)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_REF_MISSING_NEWLINE, > > + "missing newline"); > > + } > > The second condition `*(newline_pos + 1)` checks whether there is any > data after the newline, doesn't it? That indicates a different kind of > error than "missing newline", namely that there is trailing garbage. I > guess we'd want to report a separate info-level message for this. > > Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're > only checking for trailing garbage after the _last_ newline, not after > the first one. > Yes, I totally made a mistake here. I will try to think about a new design. I have already replied to Junio like the following: > > And strrchr() to find the last LF is not sufficient for that > > purpose. We would never write "refs: refs/head/master \n", > > but the above code will find the LF, be satisified that the LF is > > followed by NUL, without realizing that SP there is not something we > > would have written! > I totally ignored this situation, and in current patch, we cannot check > this. I know why Patrick lets me use "strchr" but not "strrchr". I think > we should find the last '\n'. But instead we need to find the first > '\n'. However, in this example, we will still fail by using "strchr". > This part should be totally re-designed. > > + if (check_refname_format(pointee_name->buf, 0)) { > > + /* > > + * When containing null-garbage, "check_refname_format" will > > + * fail, we should trim the "pointee" to check again. > > + */ > > + strbuf_rtrim(pointee_name); > > + if (!check_refname_format(pointee_name->buf, 0)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_TRAILING_REF_CONTENT, > > + "trailing null-garbage"); > > + goto out; > > + } > > Ah, I didn't get at first that we're doing the check a second time here. > As mentioned above, I think we should check for trailing garbage further > up already and more explicitly. > Well, I guess the implementation about this is totally wrong, which will make the reviewers hard to understand. I will drop this way to explicitly check the garbage. Thanks, Jialuo
Patrick Steinhardt <ps@pks.im> writes: > Also, I think we don't typically call the value of a symbolic ref > "pointee", but "target". Searching for "pointee" in our codebase only > gives a single hit, and that one is not related to symbolic refs. Yesterday while I was studying for reviewing this series, I saw some existing code that call them "referent". There may also be "target". >> + if (!newline_pos || *(newline_pos + 1)) { >> + ret = fsck_report_ref(o, report, >> + FSCK_MSG_REF_MISSING_NEWLINE, >> + "missing newline"); >> + } > > The second condition `*(newline_pos + 1)` checks whether there is any > data after the newline, doesn't it? That indicates a different kind of > error than "missing newline", namely that there is trailing garbage. I > guess we'd want to report a separate info-level message for this. > > Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're > only checking for trailing garbage after the _last_ newline, not after > the first one. None of the above. It should strbuf_rtrim() and if we removed anything but just a single terminating LF, we are looking at something we wouldn't ahve written. The next check_refname_format() call would then find "trailing garbage". - "refs/heads/master \n " gets rtrimmed to "refs/heads/master", which is "valid but curious". - "refs/heads/main trash\n " becomes "refs/heads/main trash", which is outright bad.
On Wed, Aug 28, 2024 at 08:41:08AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > Also, I think we don't typically call the value of a symbolic ref > > "pointee", but "target". Searching for "pointee" in our codebase only > > gives a single hit, and that one is not related to symbolic refs. > > Yesterday while I was studying for reviewing this series, I saw some > existing code that call them "referent". There may also be "target". Ah, true, I totally forgot about "referent". I guess we use both, but it would of course be great if we only had a single term to refer them. Referent seems to be used more widely, at least in the refs subsystem. > >> + if (!newline_pos || *(newline_pos + 1)) { > >> + ret = fsck_report_ref(o, report, > >> + FSCK_MSG_REF_MISSING_NEWLINE, > >> + "missing newline"); > >> + } > > > > The second condition `*(newline_pos + 1)` checks whether there is any > > data after the newline, doesn't it? That indicates a different kind of > > error than "missing newline", namely that there is trailing garbage. I > > guess we'd want to report a separate info-level message for this. > > > > Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're > > only checking for trailing garbage after the _last_ newline, not after > > the first one. > > None of the above. It should strbuf_rtrim() and if we removed > anything but just a single terminating LF, we are looking at > something we wouldn't ahve written. The next check_refname_format() > call would then find "trailing garbage". Fair. > - "refs/heads/master \n " gets rtrimmed to "refs/heads/master", > which is "valid but curious". Okay. This _may_ be something to generate an info message for, mostly in the same spirit as we want to do it for direct refs. > - "refs/heads/main trash\n " becomes "refs/heads/main trash", > which is outright bad. Yeah, this one should be an error indeed. Patrick
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index fc074fc571..85fd058c81 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -28,6 +28,9 @@ `badRefName`:: (ERROR) A ref has an invalid format. +`badSymrefPointee`:: + (ERROR) The pointee of a symref is bad. + `badTagName`:: (INFO) A tag has an invalid format. diff --git a/fsck.h b/fsck.h index b85072df57..cbe837f84c 100644 --- a/fsck.h +++ b/fsck.h @@ -34,6 +34,7 @@ enum fsck_msg_type { FUNC(BAD_REF_CONTENT, ERROR) \ FUNC(BAD_REF_FILETYPE, ERROR) \ FUNC(BAD_REF_NAME, ERROR) \ + FUNC(BAD_SYMREF_POINTEE, ERROR) \ FUNC(BAD_TIMEZONE, ERROR) \ FUNC(BAD_TREE, ERROR) \ FUNC(BAD_TREE_SHA1, ERROR) \ diff --git a/refs/files-backend.c b/refs/files-backend.c index 69c00073eb..382c73fcf7 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3434,11 +3434,81 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, const char *refs_check_dir, struct dir_iterator *iter); +/* + * Check the symref "pointee_name" and "pointee_path". The caller should + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name" + * would be the content after "refs:". + */ +static int files_fsck_symref_target(struct fsck_options *o, + struct fsck_ref_report *report, + const char *refname, + struct strbuf *pointee_name, + struct strbuf *pointee_path) +{ + const char *newline_pos = NULL; + const char *p = NULL; + struct stat st; + int ret = 0; + + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { + + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_SYMREF_POINTEE, + "points to ref outside the refs directory"); + goto out; + } + + newline_pos = strrchr(p, '\n'); + if (!newline_pos || *(newline_pos + 1)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_REF_MISSING_NEWLINE, + "missing newline"); + } + + if (check_refname_format(pointee_name->buf, 0)) { + /* + * When containing null-garbage, "check_refname_format" will + * fail, we should trim the "pointee" to check again. + */ + strbuf_rtrim(pointee_name); + if (!check_refname_format(pointee_name->buf, 0)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_TRAILING_REF_CONTENT, + "trailing null-garbage"); + goto out; + } + + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_SYMREF_POINTEE, + "points to refname with invalid format"); + } + + /* + * Missing target should not be treated as any error worthy event and + * not even warn. It is a common case that a symbolic ref points to a + * ref that does not exist yet. If the target ref does not exist, just + * skip the check for the file type. + */ + if (lstat(pointee_path->buf, &st) < 0) + goto out; + + if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_SYMREF_POINTEE, + "points to an invalid file type"); + goto out; + } + +out: + return ret; +} + static int files_fsck_refs_content(struct ref_store *ref_store, struct fsck_options *o, const char *refs_check_dir, struct dir_iterator *iter) { + struct strbuf pointee_path = STRBUF_INIT; struct strbuf ref_content = STRBUF_INIT; struct strbuf referent = STRBUF_INIT; struct strbuf refname = STRBUF_INIT; @@ -3482,6 +3552,12 @@ static int files_fsck_refs_content(struct ref_store *ref_store, "trailing garbage in ref"); goto cleanup; } + } else { + strbuf_addf(&pointee_path, "%s/%s", + ref_store->gitdir, referent.buf); + ret = files_fsck_symref_target(o, &report, refname.buf, + &referent, + &pointee_path); } goto cleanup; } @@ -3490,6 +3566,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, strbuf_release(&refname); strbuf_release(&ref_content); strbuf_release(&referent); + strbuf_release(&pointee_path); return ret; } diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 7c1910d784..69280795ca 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -176,4 +176,58 @@ test_expect_success 'regular ref content should be checked' ' test_cmp expect err ' +test_expect_success 'symbolic ref content should be checked' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + git commit --allow-empty -m initial && + git checkout -b branch-1 && + git tag tag-1 && + git checkout -b a/b/branch-2 && + + printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline + EOF + rm $branch_dir_prefix/branch-1-no-newline && + test_cmp expect err && + + printf "ref: refs/heads/branch " > $branch_dir_prefix/a/b/branch-trailing && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline + warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage + EOF + rm $branch_dir_prefix/a/b/branch-trailing && + test_cmp expect err && + + printf "ref: refs/heads/branch\n\n" > $branch_dir_prefix/a/b/branch-trailing && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage + EOF + rm $branch_dir_prefix/a/b/branch-trailing && + test_cmp expect err && + + printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing: refMissingNewline: missing newline + warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage + EOF + rm $branch_dir_prefix/a/b/branch-trailing && + test_cmp expect err && + + printf "ref: refs/heads/.branch\n" > $branch_dir_prefix/branch-2-bad && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/branch-2-bad: badSymrefPointee: points to refname with invalid format + EOF + rm $branch_dir_prefix/branch-2-bad && + test_cmp expect err +' + test_done
We have already introduced the checks for regular refs. There is no need to check the consistency of the target which the symbolic ref points to. Instead, we just check the content of the symbolic ref itself. In order to check the content of the symbolic ref, create a function "files_fsck_symref_target". It will first check whether the "pointee" is under the "refs/" directory and then we will check the "pointee" itself. There is no specification about the content of the symbolic ref. Although we do write "ref: %s\n" to create a symbolic ref by using "git-symbolic-ref(1)" command. However, this is not mandatory. We still accept symbolic refs with null trailing garbage. Put it more specific, the following are correct: 1. "ref: refs/heads/master " 2. "ref: refs/heads/master \n \n" 3. "ref: refs/heads/master\n\n" But we do not allow any non-null trailing garbage. The following are bad symbolic contents which will be reported as fsck error by "git-fsck(1)". 1. "ref: refs/heads/master garbage\n" 2. "ref: refs/heads/master \n\n\n garbage " In order to provide above checks, we will use "strrchr" to check whether we have newline in the ref content. Then we will check the name of the "pointee" is correct by using "check_refname_format". If the function fails, we need to trim the "pointee" to see whether the null-garbage causes the function fails. If so, we need to report that there is null-garbage in the symref content. Otherwise, we should report the user the "pointee" is bad. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- Documentation/fsck-msgids.txt | 3 ++ fsck.h | 1 + refs/files-backend.c | 77 +++++++++++++++++++++++++++++++++++ t/t0602-reffiles-fsck.sh | 54 ++++++++++++++++++++++++ 4 files changed, 135 insertions(+)