diff mbox series

[v2,02/12] worktree: skip reading HEAD when repairing worktrees

Message ID ecf4f1ddee36643f0ff7e3d40b9aa7c7e6e6ce43.1703753910.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Introduce `refStorage` extension | expand

Commit Message

Patrick Steinhardt Dec. 28, 2023, 9:57 a.m. UTC
When calling `git init --separate-git-dir=<new-path>` on a preexisting
repository, we move the Git directory of that repository to the new path
specified by the user. If there are worktrees present in the repository,
we need to repair the worktrees so that their gitlinks point to the new
location of the repository.

This repair logic will load repositories via `get_worktrees()`, which
will enumerate up and initialize all worktrees. Part of initialization
is logic that we resolve their respective worktree HEADs, even though
that information may not actually be needed in the end by all callers.

In the context of git-init(1) this is about to become a problem, because
we do not have a repository that was set up via `setup_git_directory()`
or friends. Consequentially, it is not yet fully initialized at the time
of calling `repair_worktrees()`, and properly setting up all parts of
the repository in `init_db()` before we repair worktrees is not an easy
thing to do. While this is okay right now where we only have a single
reference backend in Git, once we gain a second one we would be trying
to look up the worktree HEADs before we have figured out the reference
format, which does not work.

We do not require the worktree HEADs at all to repair worktrees. So
let's fix this issue by skipping over the step that reads them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 worktree.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

Comments

Eric Sunshine Dec. 28, 2023, 6:08 p.m. UTC | #1
On Thu, Dec 28, 2023 at 4:57 AM Patrick Steinhardt <ps@pks.im> wrote:
> When calling `git init --separate-git-dir=<new-path>` on a preexisting
> repository, we move the Git directory of that repository to the new path
> specified by the user. If there are worktrees present in the repository,
> we need to repair the worktrees so that their gitlinks point to the new
> location of the repository.
>
> This repair logic will load repositories via `get_worktrees()`, which
> will enumerate up and initialize all worktrees. Part of initialization
> is logic that we resolve their respective worktree HEADs, even though
> that information may not actually be needed in the end by all callers.
>
> In the context of git-init(1) this is about to become a problem, because
> we do not have a repository that was set up via `setup_git_directory()`
> or friends. Consequentially, it is not yet fully initialized at the time
> of calling `repair_worktrees()`, and properly setting up all parts of
> the repository in `init_db()` before we repair worktrees is not an easy
> thing to do. While this is okay right now where we only have a single
> reference backend in Git, once we gain a second one we would be trying
> to look up the worktree HEADs before we have figured out the reference
> format, which does not work.

s/Consequentially/Consequently/

I found it difficult to digest this paragraph with its foreshadowing
phrase "about to become a problem" since it wasn't apparent until the
very final sentence in the paragraph what the actual problem would be.
Perhaps if you mention early on that the reftable backend will have
trouble with the current code, it would be easier to grasp. Maybe
something like this:

    Although not a problem presently with the file-based reference
    backend, it will become a problem with the upcoming reftable
    backend.  In the context of git-init(1) a fully-materialized
    repository set up via `setup_git_directory()` or friends is not
    yet present.  Consequently, it is not yet fully initialized at the
    time `repair_worktrees()` is called, and properly setting up all
    parts of the repository in `init_db()` before we repair worktrees
    is not an easy task.  With introduction of the reftable backend,
    it would try to look up the worktree HEADs before we have figured
    out the reference format, thus would not work.

> We do not require the worktree HEADs at all to repair worktrees. So
> let's fix this issue by skipping over the step that reads them.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> diff --git a/worktree.c b/worktree.c
> @@ -51,7 +51,7 @@ static void add_head_info(struct worktree *wt)
> -static struct worktree *get_main_worktree(void)
> +static struct worktree *get_main_worktree(int skip_reading_head)
>  {
> -       add_head_info(worktree);
> +       if (!skip_reading_head)
> +               add_head_info(worktree);

This is so special-case that it feels more than a little dirty.

> @@ -591,7 +599,7 @@ static void repair_noop(int iserr UNUSED,
>  void repair_worktrees(worktree_repair_fn fn, void *cb_data)
>  {
> -       struct worktree **worktrees = get_worktrees();
> +       struct worktree **worktrees = get_worktrees_internal(1);

In an ideal world, a repair function should not be calling
get_worktrees() at all since get_worktrees() is not tolerant of
corruption of the worktree administrative files. (Plus, as you note,
it does more work than necessary for the current set of repairs
performed by `git worktree repair`.)

Even as I was implementing the worktree repair code, I wavered back
and forth multiple times between calling get_worktrees() and writing a
custom corruption-tolerant function to retrieve worktree
administrative information. In the end, I opted for get_worktrees()
for the pragmatic reason that it allowed me to narrow the scope of the
patches to the types of repairs which were the current focus without
getting mired down in the involved details of writing a
corruption-tolerant function for retrieving worktree metadata.
However, that decision was made with the understanding that the
pragmatic choice of the moment would not rule out the possibility of
returning later and implementing the more correct approach of having a
corruption-tolerant function for retrieving worktree metadata.

The special-case ugliness of this patch suggests strongly in favor of
implementing the earlier-envisioned corruption-tolerant function for
retrieving worktree metadata rather than the band-aid approach taken
by this patch. The generic name get_worktrees_internal() isn't helpful
either; it doesn't do a good job of conveying any particular meaning
to the reader.

Having said all that, I'm not overly opposed to this patch, especially
since your main focus is on getting the reftable backend integrated,
and because the changes (and ugliness) introduced by this patch are
entirely self-contained and private to worktree.c, so are not a
show-stopper by any means. Rather, I wanted to get down to writing
what I think would be a better future approach if someone gets around
to tackling it. (There is no pressing need at the moment, and that
someone doesn't have to be you.)
Eric Sunshine Dec. 28, 2023, 6:13 p.m. UTC | #2
On Thu, Dec 28, 2023 at 1:08 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> Having said all that, I'm not overly opposed to this patch, especially
> since your main focus is on getting the reftable backend integrated,
> and because the changes (and ugliness) introduced by this patch are
> entirely self-contained and private to worktree.c, so are not a
> show-stopper by any means. Rather, I wanted to get down to writing
> what I think would be a better future approach if someone gets around
> to tackling it. (There is no pressing need at the moment, and that
> someone doesn't have to be you.)

I forgot to mention that, if you reroll for some reason, the
get_worktrees()/get_worktrees_internal() dance might deserve an
in-source NEEDSWORK comment explaining that get_worktrees_internal()
exists to work around the shortcoming that a corruption-tolerant
function for retrieving worktree metadata (for use by the "repair"
function) does not yet exist.
Patrick Steinhardt Dec. 28, 2023, 8:18 p.m. UTC | #3
On Thu, Dec 28, 2023 at 01:13:04PM -0500, Eric Sunshine wrote:
> On Thu, Dec 28, 2023 at 1:08 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Having said all that, I'm not overly opposed to this patch, especially
> > since your main focus is on getting the reftable backend integrated,
> > and because the changes (and ugliness) introduced by this patch are
> > entirely self-contained and private to worktree.c, so are not a
> > show-stopper by any means. Rather, I wanted to get down to writing
> > what I think would be a better future approach if someone gets around
> > to tackling it. (There is no pressing need at the moment, and that
> > someone doesn't have to be you.)
> 
> I forgot to mention that, if you reroll for some reason, the
> get_worktrees()/get_worktrees_internal() dance might deserve an
> in-source NEEDSWORK comment explaining that get_worktrees_internal()
> exists to work around the shortcoming that a corruption-tolerant
> function for retrieving worktree metadata (for use by the "repair"
> function) does not yet exist.

Thanks for sharing your thoughts, will do.

Patrick
diff mbox series

Patch

diff --git a/worktree.c b/worktree.c
index a56a6c2a3d..9702ed0308 100644
--- a/worktree.c
+++ b/worktree.c
@@ -51,7 +51,7 @@  static void add_head_info(struct worktree *wt)
 /**
  * get the main worktree
  */
-static struct worktree *get_main_worktree(void)
+static struct worktree *get_main_worktree(int skip_reading_head)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf worktree_path = STRBUF_INIT;
@@ -70,11 +70,13 @@  static struct worktree *get_main_worktree(void)
 	 */
 	worktree->is_bare = (is_bare_repository_cfg == 1) ||
 		is_bare_repository();
-	add_head_info(worktree);
+	if (!skip_reading_head)
+		add_head_info(worktree);
 	return worktree;
 }
 
-static struct worktree *get_linked_worktree(const char *id)
+static struct worktree *get_linked_worktree(const char *id,
+					    int skip_reading_head)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf path = STRBUF_INIT;
@@ -93,7 +95,8 @@  static struct worktree *get_linked_worktree(const char *id)
 	CALLOC_ARRAY(worktree, 1);
 	worktree->path = strbuf_detach(&worktree_path, NULL);
 	worktree->id = xstrdup(id);
-	add_head_info(worktree);
+	if (!skip_reading_head)
+		add_head_info(worktree);
 
 done:
 	strbuf_release(&path);
@@ -118,7 +121,7 @@  static void mark_current_worktree(struct worktree **worktrees)
 	free(git_dir);
 }
 
-struct worktree **get_worktrees(void)
+static struct worktree **get_worktrees_internal(int skip_reading_head)
 {
 	struct worktree **list = NULL;
 	struct strbuf path = STRBUF_INIT;
@@ -128,7 +131,7 @@  struct worktree **get_worktrees(void)
 
 	ALLOC_ARRAY(list, alloc);
 
-	list[counter++] = get_main_worktree();
+	list[counter++] = get_main_worktree(skip_reading_head);
 
 	strbuf_addf(&path, "%s/worktrees", get_git_common_dir());
 	dir = opendir(path.buf);
@@ -137,7 +140,7 @@  struct worktree **get_worktrees(void)
 		while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 			struct worktree *linked = NULL;
 
-			if ((linked = get_linked_worktree(d->d_name))) {
+			if ((linked = get_linked_worktree(d->d_name, skip_reading_head))) {
 				ALLOC_GROW(list, counter + 1, alloc);
 				list[counter++] = linked;
 			}
@@ -151,6 +154,11 @@  struct worktree **get_worktrees(void)
 	return list;
 }
 
+struct worktree **get_worktrees(void)
+{
+	return get_worktrees_internal(0);
+}
+
 const char *get_worktree_git_dir(const struct worktree *wt)
 {
 	if (!wt)
@@ -591,7 +599,7 @@  static void repair_noop(int iserr UNUSED,
 
 void repair_worktrees(worktree_repair_fn fn, void *cb_data)
 {
-	struct worktree **worktrees = get_worktrees();
+	struct worktree **worktrees = get_worktrees_internal(1);
 	struct worktree **wt = worktrees + 1; /* +1 skips main worktree */
 
 	if (!fn)