Message ID | Z3qN30z1NCXa3AX-@ArchLinux (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | add more ref consistency checks | expand |
shejialuo <shejialuo@gmail.com> writes: > In "packed-backend.c", there are some functions such as "create_snapshot" > and "next_record" which would check the correctness of the content of > the "packed-ref" file. When anything is bad, the program will die. So you're saying, `create_snapshot()` and `next_record()` exit the program on any error. Okay that seems to be valid. > It may seem that we have nothing relevant to above feature, because we > are going to read and parse the raw "packed-ref" file without creating > the snapshot and using the ref iterator to check the consistency. > > However, when using "get_worktrees" in "builtin/refs", we will parse the > head information. If the referent of the "HEAD" is inside the > "packed-ref", we will call "create_snapshot" and "next_record" functions > to parse the "packed-ref" to get the head information. And if there are > something wrong, the program will die. > > Although this behavior has no harm for the program, it will > short-circuit the program. When the users execute "git refs verify" or > "git fsck", we don't want to simply die the program but rather show the > warnings or errors as many as possible to info the users. So, we should > avoiding reading the head info. > This is a bit tricky here. If the information for the `HEAD` ref is incorrect in the packed-refs, git would exit early. Which is what we're trying to avoid in this patch, by using the `get_worktrees_internal()` function. However, I would question if this is the right approach. Shouldn't `get_worktree()` failing indicate that the repository is invalid? In that case does it really make sense to allow the user to even run `git refs verify`? Isn't the prerequisite for running the `git-refs(1)` command a valid repository? Generally, I'd agree that we try to obtain all errors so that the user can get a full picture. But exposing internal worktree functions so we treat invalid repos as valid repos so we can do that, seems a bit of a stretch. > Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing > worktrees, 2023-12-29), we have introduced a function > "get_worktrees_internal" which allows us to get worktrees without > reading head info. > > Create a new exposed function "get_worktrees_without_reading_head", then > replace the "get_worktrees" in "builtin/refs" with the new created > function. > [snip]
On Tue, Jan 07, 2025 at 06:57:08AM -0800, Karthik Nayak wrote: > shejialuo <shejialuo@gmail.com> writes: > > > In "packed-backend.c", there are some functions such as "create_snapshot" > > and "next_record" which would check the correctness of the content of > > the "packed-ref" file. When anything is bad, the program will die. > > So you're saying, `create_snapshot()` and `next_record()` exit the > program on any error. Okay that seems to be valid. > > > It may seem that we have nothing relevant to above feature, because we > > are going to read and parse the raw "packed-ref" file without creating > > the snapshot and using the ref iterator to check the consistency. > > > > However, when using "get_worktrees" in "builtin/refs", we will parse the > > head information. If the referent of the "HEAD" is inside the > > "packed-ref", we will call "create_snapshot" and "next_record" functions > > to parse the "packed-ref" to get the head information. And if there are > > something wrong, the program will die. > > > > Although this behavior has no harm for the program, it will > > short-circuit the program. When the users execute "git refs verify" or > > "git fsck", we don't want to simply die the program but rather show the > > warnings or errors as many as possible to info the users. So, we should > > avoiding reading the head info. > > > > This is a bit tricky here. If the information for the `HEAD` ref is > incorrect in the packed-refs, git would exit early. Which is what we're > trying to avoid in this patch, by using the `get_worktrees_internal()` > function. > I think my commit message may confuse you here. The information of the "HEAD" ref will never be stored in the "packed-refs", but if we need to read the head information, we need to parse the "packed-refs" via "create_snapshot" method. Even though the corresponding referent is correct (and even if it is not correct, it won't let the program die), "create_snapshot" will call "verify_buffer_safe" to check whether there is a newline in the last line of the file. If not, it will die. However, this is a bad thing. For example, if the HEAD points to "refs/heads/main", now we need to use the code path from packed-backend, we have to call "create_snapshot", the program will die. And we cannot tell the user the other faults. ```packed-refs <good_oid> refs/heads/main\n <bad_oid> <bad_refname>\n <oid> refs/heads/a ``` So, the motivation here is that we should not read HEAD at all when we are doing consistency checking to make the code totally independent of the "create_snapshot" and "next_record". > However, I would question if this is the right approach. Shouldn't > `get_worktree()` failing indicate that the repository is invalid? In > that case does it really make sense to allow the user to even run `git > refs verify`? Isn't the prerequisite for running the `git-refs(1)` > command a valid repository? > As I have talked about above, even though the referent of "HEAD" is good, "get_worktree()" will still fail because of some fatal errors in "packed-refs" file. I don't think that the repository is invalid in this situation. Put it further more, in what situations, the users want to execute "git refs verify" or "git-fsck". From my intuitive thinking, the users will execute these check commands when something fails. They want to know why. So we should execute these commands when the repository is invalid to tell the user what may be wrong. And this is the value of these two commands. Thanks, Jialuo
shejialuo <shejialuo@gmail.com> writes: > On Tue, Jan 07, 2025 at 06:57:08AM -0800, Karthik Nayak wrote: >> shejialuo <shejialuo@gmail.com> writes: >> >> > In "packed-backend.c", there are some functions such as "create_snapshot" >> > and "next_record" which would check the correctness of the content of >> > the "packed-ref" file. When anything is bad, the program will die. >> >> So you're saying, `create_snapshot()` and `next_record()` exit the >> program on any error. Okay that seems to be valid. >> >> > It may seem that we have nothing relevant to above feature, because we >> > are going to read and parse the raw "packed-ref" file without creating >> > the snapshot and using the ref iterator to check the consistency. >> > >> > However, when using "get_worktrees" in "builtin/refs", we will parse the >> > head information. If the referent of the "HEAD" is inside the >> > "packed-ref", we will call "create_snapshot" and "next_record" functions >> > to parse the "packed-ref" to get the head information. And if there are >> > something wrong, the program will die. >> > >> > Although this behavior has no harm for the program, it will >> > short-circuit the program. When the users execute "git refs verify" or >> > "git fsck", we don't want to simply die the program but rather show the >> > warnings or errors as many as possible to info the users. So, we should >> > avoiding reading the head info. >> > >> >> This is a bit tricky here. If the information for the `HEAD` ref is >> incorrect in the packed-refs, git would exit early. Which is what we're >> trying to avoid in this patch, by using the `get_worktrees_internal()` >> function. >> > > I think my commit message may confuse you here. The information of the > "HEAD" ref will never be stored in the "packed-refs", but if we need to > read the head information, we need to parse the "packed-refs" via > "create_snapshot" method. Even though the corresponding referent is > correct (and even if it is not correct, it won't let the program die), > "create_snapshot" will call "verify_buffer_safe" to check whether there > is a newline in the last line of the file. If not, it will die. > > However, this is a bad thing. For example, if the HEAD points to > "refs/heads/main", now we need to use the code path from packed-backend, > we have to call "create_snapshot", the program will die. And we cannot > tell the user the other faults. > > ```packed-refs > <good_oid> refs/heads/main\n > <bad_oid> <bad_refname>\n > <oid> refs/heads/a > ``` > > So, the motivation here is that we should not read HEAD at all when we > are doing consistency checking to make the code totally independent of > the "create_snapshot" and "next_record". > Thanks for clarifying. I understand better the point now. >> However, I would question if this is the right approach. Shouldn't >> `get_worktree()` failing indicate that the repository is invalid? In >> that case does it really make sense to allow the user to even run `git >> refs verify`? Isn't the prerequisite for running the `git-refs(1)` >> command a valid repository? >> > > As I have talked about above, even though the referent of "HEAD" is > good, "get_worktree()" will still fail because of some fatal errors in > "packed-refs" file. I don't think that the repository is invalid in this > situation. > > Put it further more, in what situations, the users want to execute "git > refs verify" or "git-fsck". From my intuitive thinking, the users will > execute these check commands when something fails. They want to know > why. So we should execute these commands when the repository is invalid > to tell the user what may be wrong. And this is the value of these two > commands. > I agree with your inference here, we should try and figure out as much as we can and report it, so clients can make informed decisions on how to fix their refdb/repo. Thanks for explaining. > > Thanks, > Jialuo
diff --git a/builtin/refs.c b/builtin/refs.c index a29f195834..55ff5dae11 100644 --- a/builtin/refs.c +++ b/builtin/refs.c @@ -88,7 +88,7 @@ static int cmd_refs_verify(int argc, const char **argv, const char *prefix, git_config(git_fsck_config, &fsck_refs_options); prepare_repo_settings(the_repository); - worktrees = get_worktrees(); + worktrees = get_worktrees_without_reading_head(); for (size_t i = 0; worktrees[i]; i++) ret |= refs_fsck(get_worktree_ref_store(worktrees[i]), &fsck_refs_options, worktrees[i]); diff --git a/worktree.c b/worktree.c index af68b24f9d..74cb463e51 100644 --- a/worktree.c +++ b/worktree.c @@ -174,6 +174,11 @@ struct worktree **get_worktrees(void) return get_worktrees_internal(0); } +struct worktree **get_worktrees_without_reading_head(void) +{ + return get_worktrees_internal(1); +} + const char *get_worktree_git_dir(const struct worktree *wt) { if (!wt) diff --git a/worktree.h b/worktree.h index 38145df80f..1ba4a161a0 100644 --- a/worktree.h +++ b/worktree.h @@ -30,6 +30,12 @@ struct worktree { */ struct worktree **get_worktrees(void); +/* + * Like `get_worktrees`, but does not read HEAD. This is useful when checking + * the consistency, as reading HEAD may not be necessary. + */ +struct worktree **get_worktrees_without_reading_head(void); + /* * Returns 1 if linked worktrees exist, 0 otherwise. */
In "packed-backend.c", there are some functions such as "create_snapshot" and "next_record" which would check the correctness of the content of the "packed-ref" file. When anything is bad, the program will die. It may seem that we have nothing relevant to above feature, because we are going to read and parse the raw "packed-ref" file without creating the snapshot and using the ref iterator to check the consistency. However, when using "get_worktrees" in "builtin/refs", we will parse the head information. If the referent of the "HEAD" is inside the "packed-ref", we will call "create_snapshot" and "next_record" functions to parse the "packed-ref" to get the head information. And if there are something wrong, the program will die. Although this behavior has no harm for the program, it will short-circuit the program. When the users execute "git refs verify" or "git fsck", we don't want to simply die the program but rather show the warnings or errors as many as possible to info the users. So, we should avoiding reading the head info. Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing worktrees, 2023-12-29), we have introduced a function "get_worktrees_internal" which allows us to get worktrees without reading head info. Create a new exposed function "get_worktrees_without_reading_head", then replace the "get_worktrees" in "builtin/refs" with the new created function. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- builtin/refs.c | 2 +- worktree.c | 5 +++++ worktree.h | 6 ++++++ 3 files changed, 12 insertions(+), 1 deletion(-)