Message ID | ZsIM4OZWfylcP5Ix@ArchLinux (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | add ref content check for files backend | expand |
On Sun, Aug 18, 2024 at 11:01:52PM +0800, shejialuo wrote: > We have already introduced the checks for regular refs. There is no need > to check the consistency of the target which the symbolic ref points to. > Instead, we just check the content of the symbolic ref itself. > > In order to check the content of the symbolic ref, create a function > "files_fsck_symref_target". It will first check whether the "pointee" is > under the "refs/" directory and then we will check the "pointee" itself. > > There is no specification about the content of the symbolic ref. > Although we do write "ref: %s\n" to create a symbolic ref by using > "git-symbolic-ref(1)" command. However, this is not mandatory. We still > accept symbolic refs with null trailing garbage. Put it more specific, > the following are correct: > > 1. "ref: refs/heads/master " > 2. "ref: refs/heads/master \n \n" > 3. "ref: refs/heads/master\n\n" > > But we do not allow any non-null trailing garbage. The following are bad > symbolic contents. > > 1. "ref: refs/heads/master garbage\n" > 2. "ref: refs/heads/master \n\n\n garbage " > > In order to provide above checks, we will traverse the "pointee" to > report the user whether this is null-garbage or no newline. And if > symbolic refs contain non-null garbage, we will report > "FSCK_MSG_BAD_REF_CONTENT" to the user. > > Then, we will check the name of the "pointee" is correct by using > "check_refname_format". And then if we can access the "pointee_path" in > the file system, we should ensure that the file type is correct. > > Mentored-by: Patrick Steinhardt <ps@pks.im> > Mentored-by: Karthik Nayak <karthik.188@gmail.com> > Signed-off-by: shejialuo <shejialuo@gmail.com> > --- > Documentation/fsck-msgids.txt | 3 ++ > fsck.h | 1 + > refs/files-backend.c | 87 +++++++++++++++++++++++++++++++++++ > t/t0602-reffiles-fsck.sh | 52 +++++++++++++++++++++ > 4 files changed, 143 insertions(+) > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index 1688c2f1fe..73587661dc 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -28,6 +28,9 @@ > `badRefName`:: > (ERROR) A ref has an invalid format. > > +`badSymrefPointee`:: > + (ERROR) The pointee of a symref is bad. > + > `badTagName`:: > (INFO) A tag has an invalid format. > > diff --git a/fsck.h b/fsck.h > index 975d9b9da9..985b674dd9 100644 > --- a/fsck.h > +++ b/fsck.h > @@ -34,6 +34,7 @@ enum fsck_msg_type { > FUNC(BAD_REF_CONTENT, ERROR) \ > FUNC(BAD_REF_FILETYPE, ERROR) \ > FUNC(BAD_REF_NAME, ERROR) \ > + FUNC(BAD_SYMREF_POINTEE, ERROR) \ > FUNC(BAD_TIMEZONE, ERROR) \ > FUNC(BAD_TREE, ERROR) \ > FUNC(BAD_TREE_SHA1, ERROR) \ > diff --git a/refs/files-backend.c b/refs/files-backend.c > index ae71692f36..bfb8d338d2 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3434,12 +3434,92 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, > const char *refs_check_dir, > struct dir_iterator *iter); > > +/* > + * Check the symref "pointee_name" and "pointee_path". The caller should > + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name" > + * would be the content after "refs:". > + */ > +static int files_fsck_symref_target(struct fsck_options *o, > + struct fsck_ref_report *report, > + const char *refname, > + struct strbuf *pointee_name, > + struct strbuf *pointee_path) > +{ > + unsigned int newline_num = 0; > + unsigned int space_num = 0; > + const char *p = NULL; > + struct stat st; > + int ret = 0; > + > + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { > + > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to ref outside the refs directory"); > + goto out; > + } > + > + while (*p != '\0') { We typically write this `while (*p)`. > + if ((space_num || newline_num) && !isspace(*p)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_REF_CONTENT, > + "contains non-null garbage"); > + goto out; > + } > + > + if (*p == '\n') { > + newline_num++; > + } else if (*p == ' ') { > + space_num++; > + } > + p++; > + } Can't we replace this with a single `strchr('\n')` call to check for the newline and then verify that the next character is a `\0`? The check for spaces would then be handled by `check_refname_format()`. > + /* > + * Missing target should not be treated as any error worthy event and > + * not even warn. It is a common case that a symbolic ref points to a > + * ref that does not exist yet. If the target ref does not exist, just > + * skip the check for the file type. > + */ > + if (lstat(pointee_path->buf, &st) < 0) > + goto out; > + > + if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to an invalid file type"); > + goto out; > + } What exactly are we guarding against here? Don't we already verify that files in `refs/` have the correct type? Or are we checking that it does not point to a directory? Patrick
On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote: > > + if ((space_num || newline_num) && !isspace(*p)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_REF_CONTENT, > > + "contains non-null garbage"); > > + goto out; > > + } > > + > > + if (*p == '\n') { > > + newline_num++; > > + } else if (*p == ' ') { > > + space_num++; > > + } > > + p++; > > + } > > Can't we replace this with a single `strchr('\n')` call to check for the > newline and then verify that the next character is a `\0`? The check for > spaces would then be handled by `check_refname_format()`. > We cannot. Think about this situation. "ref: refs/heads/master \n " We find that the next character of '\n' is not '\0'. Then we leave it to "check_refname_format". But "check_refname_format" will report an error here, but this is an allowed symref. But I think using `strchr` is a nice way. I will try to find an elegant way here to handle this logic here. > > + /* > > + * Missing target should not be treated as any error worthy event and > > + * not even warn. It is a common case that a symbolic ref points to a > > + * ref that does not exist yet. If the target ref does not exist, just > > + * skip the check for the file type. > > + */ > > + if (lstat(pointee_path->buf, &st) < 0) > > + goto out; > > + > > + if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_BAD_SYMREF_POINTEE, > > + "points to an invalid file type"); > > + goto out; > > + } > > What exactly are we guarding against here? Don't we already verify that > files in `refs/` have the correct type? Or are we checking that it does > not point to a directory? > When scanning the "refs" directory, we will check the file in the ref database, but we ignore the directory. So we are checking to know whether it does not point to a directory. If the ref points to a bad file type for example "ref/heads/bad-file" If it is a block type file. We will first report that "refs/heads/bad-file" is a bad file and then report ref points to bad file "refs/heads/bad-file". Actually, I think this is a little redundant here, but we can be tolerant about this because we need to guard against directory. We need to consider this situation. So we could let this be. > Patrick
On Thu, Aug 22, 2024 at 08:42:02PM +0800, shejialuo wrote: > On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote: > > > > > + if ((space_num || newline_num) && !isspace(*p)) { > > > + ret = fsck_report_ref(o, report, > > > + FSCK_MSG_BAD_REF_CONTENT, > > > + "contains non-null garbage"); > > > + goto out; > > > + } > > > + > > > + if (*p == '\n') { > > > + newline_num++; > > > + } else if (*p == ' ') { > > > + space_num++; > > > + } > > > + p++; > > > + } > > > > Can't we replace this with a single `strchr('\n')` call to check for the > > newline and then verify that the next character is a `\0`? The check for > > spaces would then be handled by `check_refname_format()`. > > > > We cannot. Think about this situation. > > "ref: refs/heads/master \n " > > We find that the next character of '\n' is not '\0'. Then we leave it to > "check_refname_format". But "check_refname_format" will report an error > here, but this is an allowed symref. Wouldn't it be correct to warn about this? To me the above very much looks like garbage after the refname, same like we'd also warn about such garbage for direct refs. Patrick
On Fri, Aug 23, 2024 at 07:36:10AM +0200, Patrick Steinhardt wrote: > On Thu, Aug 22, 2024 at 08:42:02PM +0800, shejialuo wrote: > > On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote: > > > > > > > > + if ((space_num || newline_num) && !isspace(*p)) { > > > > + ret = fsck_report_ref(o, report, > > > > + FSCK_MSG_BAD_REF_CONTENT, > > > > + "contains non-null garbage"); > > > > + goto out; > > > > + } > > > > + > > > > + if (*p == '\n') { > > > > + newline_num++; > > > > + } else if (*p == ' ') { > > > > + space_num++; > > > > + } > > > > + p++; > > > > + } > > > > > > Can't we replace this with a single `strchr('\n')` call to check for the > > > newline and then verify that the next character is a `\0`? The check for > > > spaces would then be handled by `check_refname_format()`. > > > > > > > We cannot. Think about this situation. > > > > "ref: refs/heads/master \n " > > > > We find that the next character of '\n' is not '\0'. Then we leave it to > > "check_refname_format". But "check_refname_format" will report an error > > here, but this is an allowed symref. > > Wouldn't it be correct to warn about this? To me the above very much > looks like garbage after the refname, same like we'd also warn about > such garbage for direct refs. > Yes, we should warn about this. But only null-garbage is allowed for symref. The following situation is bad: "ref: refs/heads/master \n garbage\n" We should report error here, from my perspective, it's a FATAL ERROR. However, let's decide how to do this when we know what fsck error level we should set. > Patrick
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index 1688c2f1fe..73587661dc 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -28,6 +28,9 @@ `badRefName`:: (ERROR) A ref has an invalid format. +`badSymrefPointee`:: + (ERROR) The pointee of a symref is bad. + `badTagName`:: (INFO) A tag has an invalid format. diff --git a/fsck.h b/fsck.h index 975d9b9da9..985b674dd9 100644 --- a/fsck.h +++ b/fsck.h @@ -34,6 +34,7 @@ enum fsck_msg_type { FUNC(BAD_REF_CONTENT, ERROR) \ FUNC(BAD_REF_FILETYPE, ERROR) \ FUNC(BAD_REF_NAME, ERROR) \ + FUNC(BAD_SYMREF_POINTEE, ERROR) \ FUNC(BAD_TIMEZONE, ERROR) \ FUNC(BAD_TREE, ERROR) \ FUNC(BAD_TREE_SHA1, ERROR) \ diff --git a/refs/files-backend.c b/refs/files-backend.c index ae71692f36..bfb8d338d2 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3434,12 +3434,92 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, const char *refs_check_dir, struct dir_iterator *iter); +/* + * Check the symref "pointee_name" and "pointee_path". The caller should + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name" + * would be the content after "refs:". + */ +static int files_fsck_symref_target(struct fsck_options *o, + struct fsck_ref_report *report, + const char *refname, + struct strbuf *pointee_name, + struct strbuf *pointee_path) +{ + unsigned int newline_num = 0; + unsigned int space_num = 0; + const char *p = NULL; + struct stat st; + int ret = 0; + + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { + + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_SYMREF_POINTEE, + "points to ref outside the refs directory"); + goto out; + } + + while (*p != '\0') { + if ((space_num || newline_num) && !isspace(*p)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_REF_CONTENT, + "contains non-null garbage"); + goto out; + } + + if (*p == '\n') { + newline_num++; + } else if (*p == ' ') { + space_num++; + } + p++; + } + + if (space_num || newline_num > 1) { + ret = fsck_report_ref(o, report, + FSCK_MSG_TRAILING_REF_CONTENT, + "trailing null-garbage"); + } else if (!newline_num) { + ret = fsck_report_ref(o, report, + FSCK_MSG_REF_MISSING_NEWLINE, + "missing newline"); + } + + strbuf_rtrim(pointee_name); + + if (check_refname_format(pointee_name->buf, 0)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_SYMREF_POINTEE, + "points to refname with invalid format"); + } + + /* + * Missing target should not be treated as any error worthy event and + * not even warn. It is a common case that a symbolic ref points to a + * ref that does not exist yet. If the target ref does not exist, just + * skip the check for the file type. + */ + if (lstat(pointee_path->buf, &st) < 0) + goto out; + + if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) { + ret = fsck_report_ref(o, report, + FSCK_MSG_BAD_SYMREF_POINTEE, + "points to an invalid file type"); + goto out; + } + +out: + return ret; +} + static int files_fsck_refs_content(struct ref_store *ref_store, struct fsck_options *o, const char *refs_check_dir, struct dir_iterator *iter) { struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT; + struct strbuf pointee_path = STRBUF_INIT; struct strbuf ref_content = STRBUF_INIT; struct strbuf referent = STRBUF_INIT; struct strbuf refname = STRBUF_INIT; @@ -3482,6 +3562,12 @@ static int files_fsck_refs_content(struct ref_store *ref_store, "trailing garbage in ref"); goto cleanup; } + } else { + strbuf_addf(&pointee_path, "%s/%s", + ref_store->gitdir, referent.buf); + ret = files_fsck_symref_target(o, &report, refname.buf, + &referent, + &pointee_path); } } @@ -3489,6 +3575,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, strbuf_release(&refname); strbuf_release(&ref_content); strbuf_release(&referent); + strbuf_release(&pointee_path); return ret; } diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 7c1910d784..e8fc2ef015 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -176,4 +176,56 @@ test_expect_success 'regular ref content should be checked' ' test_cmp expect err ' +test_expect_success 'symbolic ref content should be checked' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + git commit --allow-empty -m initial && + git checkout -b branch-1 && + git tag tag-1 && + git checkout -b a/b/branch-2 && + + printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline + EOF + rm $branch_dir_prefix/branch-1-no-newline && + test_cmp expect err && + + printf "ref: refs/heads/branch " > $branch_dir_prefix/a/b/branch-trailing && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage + EOF + rm $branch_dir_prefix/a/b/branch-trailing && + test_cmp expect err && + + printf "ref: refs/heads/branch\n\n" > $branch_dir_prefix/a/b/branch-trailing && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage + EOF + rm $branch_dir_prefix/a/b/branch-trailing && + test_cmp expect err && + + printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage + EOF + rm $branch_dir_prefix/a/b/branch-trailing && + test_cmp expect err && + + printf "ref: refs/heads/.branch\n" > $branch_dir_prefix/branch-2-bad && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/branch-2-bad: badSymrefPointee: points to refname with invalid format + EOF + rm $branch_dir_prefix/branch-2-bad && + test_cmp expect err +' + test_done
We have already introduced the checks for regular refs. There is no need to check the consistency of the target which the symbolic ref points to. Instead, we just check the content of the symbolic ref itself. In order to check the content of the symbolic ref, create a function "files_fsck_symref_target". It will first check whether the "pointee" is under the "refs/" directory and then we will check the "pointee" itself. There is no specification about the content of the symbolic ref. Although we do write "ref: %s\n" to create a symbolic ref by using "git-symbolic-ref(1)" command. However, this is not mandatory. We still accept symbolic refs with null trailing garbage. Put it more specific, the following are correct: 1. "ref: refs/heads/master " 2. "ref: refs/heads/master \n \n" 3. "ref: refs/heads/master\n\n" But we do not allow any non-null trailing garbage. The following are bad symbolic contents. 1. "ref: refs/heads/master garbage\n" 2. "ref: refs/heads/master \n\n\n garbage " In order to provide above checks, we will traverse the "pointee" to report the user whether this is null-garbage or no newline. And if symbolic refs contain non-null garbage, we will report "FSCK_MSG_BAD_REF_CONTENT" to the user. Then, we will check the name of the "pointee" is correct by using "check_refname_format". And then if we can access the "pointee_path" in the file system, we should ensure that the file type is correct. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- Documentation/fsck-msgids.txt | 3 ++ fsck.h | 1 + refs/files-backend.c | 87 +++++++++++++++++++++++++++++++++++ t/t0602-reffiles-fsck.sh | 52 +++++++++++++++++++++ 4 files changed, 143 insertions(+)