diff mbox series

[02/10] builtin/refs.h: get worktrees without reading head info

Message ID Z3qN30z1NCXa3AX-@ArchLinux (mailing list archive)
State New
Headers show
Series add more ref consistency checks | expand

Commit Message

shejialuo Jan. 5, 2025, 1:49 p.m. UTC
In "packed-backend.c", there are some functions such as "create_snapshot"
and "next_record" which would check the correctness of the content of
the "packed-ref" file. When anything is bad, the program will die.

It may seem that we have nothing relevant to above feature, because we
are going to read and parse the raw "packed-ref" file without creating
the snapshot and using the ref iterator to check the consistency.

However, when using "get_worktrees" in "builtin/refs", we will parse the
head information. If the referent of the "HEAD" is inside the
"packed-ref", we will call "create_snapshot" and "next_record" functions
to parse the "packed-ref" to get the head information. And if there are
something wrong, the program will die.

Although this behavior has no harm for the program, it will
short-circuit the program. When the users execute "git refs verify" or
"git fsck", we don't want to simply die the program but rather show the
warnings or errors as many as possible to info the users. So, we should
avoiding reading the head info.

Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
worktrees, 2023-12-29), we have introduced a function
"get_worktrees_internal" which allows us to get worktrees without
reading head info.

Create a new exposed function "get_worktrees_without_reading_head", then
replace the "get_worktrees" in "builtin/refs" with the new created
function.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/refs.c | 2 +-
 worktree.c     | 5 +++++
 worktree.h     | 6 ++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

Comments

Karthik Nayak Jan. 7, 2025, 2:57 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> In "packed-backend.c", there are some functions such as "create_snapshot"
> and "next_record" which would check the correctness of the content of
> the "packed-ref" file. When anything is bad, the program will die.

So you're saying, `create_snapshot()` and `next_record()` exit the
program on any error. Okay that seems to be valid.

> It may seem that we have nothing relevant to above feature, because we
> are going to read and parse the raw "packed-ref" file without creating
> the snapshot and using the ref iterator to check the consistency.
>
> However, when using "get_worktrees" in "builtin/refs", we will parse the
> head information. If the referent of the "HEAD" is inside the
> "packed-ref", we will call "create_snapshot" and "next_record" functions
> to parse the "packed-ref" to get the head information. And if there are
> something wrong, the program will die.
>
> Although this behavior has no harm for the program, it will
> short-circuit the program. When the users execute "git refs verify" or
> "git fsck", we don't want to simply die the program but rather show the
> warnings or errors as many as possible to info the users. So, we should
> avoiding reading the head info.
>

This is a bit tricky here. If the information for the `HEAD` ref is
incorrect in the packed-refs, git would exit early. Which is what we're
trying to avoid in this patch, by using the `get_worktrees_internal()`
function.

However, I would question if this is the right approach. Shouldn't
`get_worktree()` failing indicate that the repository is invalid? In
that case does it really make sense to allow the user to even run `git
refs verify`? Isn't the prerequisite for running the `git-refs(1)`
command a valid repository?

Generally, I'd agree that we try to obtain all errors so that the user
can get a full picture. But exposing internal worktree functions so we
treat invalid repos as valid repos so we can do that, seems a bit of a
stretch.

> Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
> worktrees, 2023-12-29), we have introduced a function
> "get_worktrees_internal" which allows us to get worktrees without
> reading head info.
>
> Create a new exposed function "get_worktrees_without_reading_head", then
> replace the "get_worktrees" in "builtin/refs" with the new created
> function.
>

[snip]
shejialuo Jan. 7, 2025, 4:34 p.m. UTC | #2
On Tue, Jan 07, 2025 at 06:57:08AM -0800, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > In "packed-backend.c", there are some functions such as "create_snapshot"
> > and "next_record" which would check the correctness of the content of
> > the "packed-ref" file. When anything is bad, the program will die.
> 
> So you're saying, `create_snapshot()` and `next_record()` exit the
> program on any error. Okay that seems to be valid.
> 
> > It may seem that we have nothing relevant to above feature, because we
> > are going to read and parse the raw "packed-ref" file without creating
> > the snapshot and using the ref iterator to check the consistency.
> >
> > However, when using "get_worktrees" in "builtin/refs", we will parse the
> > head information. If the referent of the "HEAD" is inside the
> > "packed-ref", we will call "create_snapshot" and "next_record" functions
> > to parse the "packed-ref" to get the head information. And if there are
> > something wrong, the program will die.
> >
> > Although this behavior has no harm for the program, it will
> > short-circuit the program. When the users execute "git refs verify" or
> > "git fsck", we don't want to simply die the program but rather show the
> > warnings or errors as many as possible to info the users. So, we should
> > avoiding reading the head info.
> >
> 
> This is a bit tricky here. If the information for the `HEAD` ref is
> incorrect in the packed-refs, git would exit early. Which is what we're
> trying to avoid in this patch, by using the `get_worktrees_internal()`
> function.
> 

I think my commit message may confuse you here. The information of the
"HEAD" ref will never be stored in the "packed-refs", but if we need to
read the head information, we need to parse the "packed-refs" via
"create_snapshot" method. Even though the corresponding referent is
correct (and even if it is not correct, it won't let the program die),
"create_snapshot" will call "verify_buffer_safe" to check whether there
is a newline in the last line of the file. If not, it will die.

However, this is a bad thing. For example, if the HEAD points to
"refs/heads/main", now we need to use the code path from packed-backend,
we have to call "create_snapshot", the program will die. And we cannot
tell the user the other faults.

```packed-refs
<good_oid> refs/heads/main\n
<bad_oid> <bad_refname>\n
<oid> refs/heads/a
```

So, the motivation here is that we should not read HEAD at all when we
are doing consistency checking to make the code totally independent of
the "create_snapshot" and "next_record".

> However, I would question if this is the right approach. Shouldn't
> `get_worktree()` failing indicate that the repository is invalid? In
> that case does it really make sense to allow the user to even run `git
> refs verify`? Isn't the prerequisite for running the `git-refs(1)`
> command a valid repository?
> 

As I have talked about above, even though the referent of "HEAD" is
good, "get_worktree()" will still fail because of some fatal errors in
"packed-refs" file. I don't think that the repository is invalid in this
situation.

Put it further more, in what situations, the users want to execute "git
refs verify" or "git-fsck". From my intuitive thinking, the users will
execute these check commands when something fails. They want to know
why. So we should execute these commands when the repository is invalid
to tell the user what may be wrong. And this is the value of these two
commands.

Thanks,
Jialuo
Karthik Nayak Jan. 8, 2025, 8:40 a.m. UTC | #3
shejialuo <shejialuo@gmail.com> writes:

> On Tue, Jan 07, 2025 at 06:57:08AM -0800, Karthik Nayak wrote:
>> shejialuo <shejialuo@gmail.com> writes:
>>
>> > In "packed-backend.c", there are some functions such as "create_snapshot"
>> > and "next_record" which would check the correctness of the content of
>> > the "packed-ref" file. When anything is bad, the program will die.
>>
>> So you're saying, `create_snapshot()` and `next_record()` exit the
>> program on any error. Okay that seems to be valid.
>>
>> > It may seem that we have nothing relevant to above feature, because we
>> > are going to read and parse the raw "packed-ref" file without creating
>> > the snapshot and using the ref iterator to check the consistency.
>> >
>> > However, when using "get_worktrees" in "builtin/refs", we will parse the
>> > head information. If the referent of the "HEAD" is inside the
>> > "packed-ref", we will call "create_snapshot" and "next_record" functions
>> > to parse the "packed-ref" to get the head information. And if there are
>> > something wrong, the program will die.
>> >
>> > Although this behavior has no harm for the program, it will
>> > short-circuit the program. When the users execute "git refs verify" or
>> > "git fsck", we don't want to simply die the program but rather show the
>> > warnings or errors as many as possible to info the users. So, we should
>> > avoiding reading the head info.
>> >
>>
>> This is a bit tricky here. If the information for the `HEAD` ref is
>> incorrect in the packed-refs, git would exit early. Which is what we're
>> trying to avoid in this patch, by using the `get_worktrees_internal()`
>> function.
>>
>
> I think my commit message may confuse you here. The information of the
> "HEAD" ref will never be stored in the "packed-refs", but if we need to
> read the head information, we need to parse the "packed-refs" via
> "create_snapshot" method. Even though the corresponding referent is
> correct (and even if it is not correct, it won't let the program die),
> "create_snapshot" will call "verify_buffer_safe" to check whether there
> is a newline in the last line of the file. If not, it will die.
>
> However, this is a bad thing. For example, if the HEAD points to
> "refs/heads/main", now we need to use the code path from packed-backend,
> we have to call "create_snapshot", the program will die. And we cannot
> tell the user the other faults.
>
> ```packed-refs
> <good_oid> refs/heads/main\n
> <bad_oid> <bad_refname>\n
> <oid> refs/heads/a
> ```
>
> So, the motivation here is that we should not read HEAD at all when we
> are doing consistency checking to make the code totally independent of
> the "create_snapshot" and "next_record".
>

Thanks for clarifying. I understand better the point now.

>> However, I would question if this is the right approach. Shouldn't
>> `get_worktree()` failing indicate that the repository is invalid? In
>> that case does it really make sense to allow the user to even run `git
>> refs verify`? Isn't the prerequisite for running the `git-refs(1)`
>> command a valid repository?
>>
>
> As I have talked about above, even though the referent of "HEAD" is
> good, "get_worktree()" will still fail because of some fatal errors in
> "packed-refs" file. I don't think that the repository is invalid in this
> situation.
>
> Put it further more, in what situations, the users want to execute "git
> refs verify" or "git-fsck". From my intuitive thinking, the users will
> execute these check commands when something fails. They want to know
> why. So we should execute these commands when the repository is invalid
> to tell the user what may be wrong. And this is the value of these two
> commands.
>

I agree with your inference here, we should try and figure out as much
as we can and report it, so clients can make informed decisions on how
to fix their refdb/repo. Thanks for explaining.

>
> Thanks,
> Jialuo
diff mbox series

Patch

diff --git a/builtin/refs.c b/builtin/refs.c
index a29f195834..55ff5dae11 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -88,7 +88,7 @@  static int cmd_refs_verify(int argc, const char **argv, const char *prefix,
 	git_config(git_fsck_config, &fsck_refs_options);
 	prepare_repo_settings(the_repository);
 
-	worktrees = get_worktrees();
+	worktrees = get_worktrees_without_reading_head();
 	for (size_t i = 0; worktrees[i]; i++)
 		ret |= refs_fsck(get_worktree_ref_store(worktrees[i]),
 				 &fsck_refs_options, worktrees[i]);
diff --git a/worktree.c b/worktree.c
index af68b24f9d..74cb463e51 100644
--- a/worktree.c
+++ b/worktree.c
@@ -174,6 +174,11 @@  struct worktree **get_worktrees(void)
 	return get_worktrees_internal(0);
 }
 
+struct worktree **get_worktrees_without_reading_head(void)
+{
+	return get_worktrees_internal(1);
+}
+
 const char *get_worktree_git_dir(const struct worktree *wt)
 {
 	if (!wt)
diff --git a/worktree.h b/worktree.h
index 38145df80f..1ba4a161a0 100644
--- a/worktree.h
+++ b/worktree.h
@@ -30,6 +30,12 @@  struct worktree {
  */
 struct worktree **get_worktrees(void);
 
+/*
+ * Like `get_worktrees`, but does not read HEAD. This is useful when checking
+ * the consistency, as reading HEAD may not be necessary.
+ */
+struct worktree **get_worktrees_without_reading_head(void);
+
 /*
  * Returns 1 if linked worktrees exist, 0 otherwise.
  */