Message ID | Zvj-xaa_j26Auig7@ArchLinux (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | add ref content check for files backend | expand |
On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote: > Ideally, we want to the users use "git symbolic-ref" to create symrefs > instead of writing raw contents into the filesystem. However, "git > symbolic-ref" is strict with the refname but not strict with the > referent. For example, we can make the "referent" located at the > "$(gitdir)/logs/aaa" and manually write the content into this where we > can still successfully parse this symref by using "git rev-parse". > > $ git init repo && cd repo && git commit --allow-empty -mx > $ git symbolic-ref refs/heads/test logs/aaa > $ echo $(git rev-parse HEAD) > .git/logs/aaa > $ git rev-parse test Oh, curious. This should definitely be tightened in git-symbolic-ref(1) itself. The target should either be a root ref or something starting with "refs/". Anyway, that is of course outside of the scope of this patch series. > We may need to add some restrictions for "referent" parameter when using > "git symbolic-ref" to create symrefs because ideally all the > nonpeudo-refs should be located under the "refs" directory and we may > tighten this in the future. Agreed. > In order to tell the user we may tighten the "escape" situation, create > a new fsck message "escapeReferent" to notify the user that this may > become an error in the future. > > Mentored-by: Patrick Steinhardt <ps@pks.im> > Mentored-by: Karthik Nayak <karthik.188@gmail.com> > Signed-off-by: shejialuo <shejialuo@gmail.com> > --- > Documentation/fsck-msgids.txt | 8 ++++++++ > fsck.h | 1 + > refs/files-backend.c | 7 +++++++ > t/t0602-reffiles-fsck.sh | 18 ++++++++++++++++++ > 4 files changed, 34 insertions(+) > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index e0e4519334..223974057d 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -52,6 +52,14 @@ > `emptyName`:: > (WARN) A path contains an empty name. > > +`escapeReferent`:: > + (INFO) The referent of a symref is outside the "ref" directory. Proposal: 'The referent of a symbolic reference points neither to a root reference nor to a reference starting with "refs/".' I'd also rename this to e.g. "symrefTargetIsNotAReference" or something like that, because it's not really about whether or not the referent is "escaping". It's a bit of a mouthful, but I don't really have a better name. So feel free to pick something different that describes the error better. > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 57ac466b64..bd215c8d08 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o, > orig_last_byte = referent->buf[orig_len - 1]; > strbuf_rtrim(referent); > > + if (!starts_with(referent->buf, "refs/")) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_ESCAPE_REFERENT, > + "referent '%s' is outside of refs/", > + referent->buf); > + } > + > if (check_refname_format(referent->buf, 0)) { > ret = fsck_report_ref(o, report, > FSCK_MSG_BAD_REFERENT, This check is invalid, because referents can also point to root refs. So you should probably also add a call to `is_root_ref()` here. We also have `is_pseudo_ref()`, and one might be tempted to also allow that. But pseudo refs aren't proper refs, so I'd argue that a symref pointing to a pseudo ref is invalid, too. Patrick
On Mon, Oct 07, 2024 at 08:58:55AM +0200, Patrick Steinhardt wrote: > On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote: > > Ideally, we want to the users use "git symbolic-ref" to create symrefs > > instead of writing raw contents into the filesystem. However, "git > > symbolic-ref" is strict with the refname but not strict with the > > referent. For example, we can make the "referent" located at the > > "$(gitdir)/logs/aaa" and manually write the content into this where we > > can still successfully parse this symref by using "git rev-parse". > > > > $ git init repo && cd repo && git commit --allow-empty -mx > > $ git symbolic-ref refs/heads/test logs/aaa > > $ echo $(git rev-parse HEAD) > .git/logs/aaa > > $ git rev-parse test > > Oh, curious. This should definitely be tightened in git-symbolic-ref(1) > itself. The target should either be a root ref or something starting > with "refs/". Anyway, that is of course outside of the scope of this > patch series. > I am curious here too when I did experiments when writing the code. Because Junio have told me this could happen, so I dive into this. However, it's not reasonable. If we want to tighten the rule, we need to also let "git symbolic-ref" to align with the behavior. That's another question though. [snip] > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > > index e0e4519334..223974057d 100644 > > --- a/Documentation/fsck-msgids.txt > > +++ b/Documentation/fsck-msgids.txt > > @@ -52,6 +52,14 @@ > > `emptyName`:: > > (WARN) A path contains an empty name. > > > > +`escapeReferent`:: > > + (INFO) The referent of a symref is outside the "ref" directory. > > Proposal: 'The referent of a symbolic reference points neither to a root > reference nor to a reference starting with "refs/".' > That's much better. > I'd also rename this to e.g. "symrefTargetIsNotAReference" or something > like that, because it's not really about whether or not the referent is > "escaping". It's a bit of a mouthful, but I don't really have a better > name. So feel free to pick something different that describes the error > better. > I guess "symrefTargetIsNotAReference" is a little too long. If we decide to convert it to error later. Why not just put it into the "badReferent" fsck message? So, I do not think we need to rename. As I have talked about, we don't need to map error case to fsck message id one by one. > > diff --git a/refs/files-backend.c b/refs/files-backend.c > > index 57ac466b64..bd215c8d08 100644 > > --- a/refs/files-backend.c > > +++ b/refs/files-backend.c > > @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o, > > orig_last_byte = referent->buf[orig_len - 1]; > > strbuf_rtrim(referent); > > > > + if (!starts_with(referent->buf, "refs/")) { > > + ret = fsck_report_ref(o, report, > > + FSCK_MSG_ESCAPE_REFERENT, > > + "referent '%s' is outside of refs/", > > + referent->buf); > > + } > > + > > if (check_refname_format(referent->buf, 0)) { > > ret = fsck_report_ref(o, report, > > FSCK_MSG_BAD_REFERENT, > > This check is invalid, because referents can also point to root refs. So > you should probably also add a call to `is_root_ref()` here. > Thanks, I omit this situation here. > We also have `is_pseudo_ref()`, and one might be tempted to also allow > that. But pseudo refs aren't proper refs, so I'd argue that a symref > pointing to a pseudo ref is invalid, too. > I agree. > Patrick Thanks, Jialuo
On Mon, Oct 07, 2024 at 04:44:44PM +0800, shejialuo wrote: > On Mon, Oct 07, 2024 at 08:58:55AM +0200, Patrick Steinhardt wrote: > > On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote: > > I'd also rename this to e.g. "symrefTargetIsNotAReference" or something > > like that, because it's not really about whether or not the referent is > > "escaping". It's a bit of a mouthful, but I don't really have a better > > name. So feel free to pick something different that describes the error > > better. > > > > I guess "symrefTargetIsNotAReference" is a little too long. If we decide > to convert it to error later. Why not just put it into the "badReferent" > fsck message? > > So, I do not think we need to rename. As I have talked about, we don't > need to map error case to fsck message id one by one. Mostly because I disagree with this here. I think there should be a 1:1 mapping, and "badReferent" is too generic to provide that. Patrick
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index e0e4519334..223974057d 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -52,6 +52,14 @@ `emptyName`:: (WARN) A path contains an empty name. +`escapeReferent`:: + (INFO) The referent of a symref is outside the "ref" directory. + Although we allow create a symref pointing to the referent which + is outside the "ref" by using `git symbolic-ref`, we may tighten + the rule in the future. Report to the git@vger.kernel.org + mailing list if you see this error, as we need to know what tools + created such a file. + `extraHeaderEntry`:: (IGNORE) Extra headers found after `tagger`. diff --git a/fsck.h b/fsck.h index 979d75cb53..5ecee0fda5 100644 --- a/fsck.h +++ b/fsck.h @@ -80,6 +80,7 @@ enum fsck_msg_type { FUNC(LARGE_PATHNAME, WARN) \ /* infos (reported as warnings, but ignored by default) */ \ FUNC(BAD_FILEMODE, INFO) \ + FUNC(ESCAPE_REFERENT, INFO) \ FUNC(GITMODULES_PARSE, INFO) \ FUNC(GITIGNORE_SYMLINK, INFO) \ FUNC(GITATTRIBUTES_SYMLINK, INFO) \ diff --git a/refs/files-backend.c b/refs/files-backend.c index 57ac466b64..bd215c8d08 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o, orig_last_byte = referent->buf[orig_len - 1]; strbuf_rtrim(referent); + if (!starts_with(referent->buf, "refs/")) { + ret = fsck_report_ref(o, report, + FSCK_MSG_ESCAPE_REFERENT, + "referent '%s' is outside of refs/", + referent->buf); + } + if (check_refname_format(referent->buf, 0)) { ret = fsck_report_ref(o, report, FSCK_MSG_BAD_REFERENT, diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 718f6abb71..585f562245 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -370,4 +370,22 @@ test_expect_success 'textual symref content should be checked (aggregate)' ' test_cmp expect sorted_err ' +test_expect_success 'textual symref should be checked whether it is escaped' ' + test_when_finished "rm -rf repo" && + git init repo && + branch_dir_prefix=.git/refs/heads && + tag_dir_prefix=.git/refs/tags && + cd repo && + test_commit default && + mkdir -p "$branch_dir_prefix/a/b" && + + printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 && + git refs verify 2>err && + cat >expect <<-EOF && + warning: refs/heads/branch-bad-1: escapeReferent: referent '\''refs-back/heads/main'\'' is outside of refs/ + EOF + rm $branch_dir_prefix/branch-bad-1 && + test_cmp expect err +' + test_done
Ideally, we want to the users use "git symbolic-ref" to create symrefs instead of writing raw contents into the filesystem. However, "git symbolic-ref" is strict with the refname but not strict with the referent. For example, we can make the "referent" located at the "$(gitdir)/logs/aaa" and manually write the content into this where we can still successfully parse this symref by using "git rev-parse". $ git init repo && cd repo && git commit --allow-empty -mx $ git symbolic-ref refs/heads/test logs/aaa $ echo $(git rev-parse HEAD) > .git/logs/aaa $ git rev-parse test We may need to add some restrictions for "referent" parameter when using "git symbolic-ref" to create symrefs because ideally all the nonpeudo-refs should be located under the "refs" directory and we may tighten this in the future. In order to tell the user we may tighten the "escape" situation, create a new fsck message "escapeReferent" to notify the user that this may become an error in the future. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- Documentation/fsck-msgids.txt | 8 ++++++++ fsck.h | 1 + refs/files-backend.c | 7 +++++++ t/t0602-reffiles-fsck.sh | 18 ++++++++++++++++++ 4 files changed, 34 insertions(+)