Message ID | ecf4f1ddee36643f0ff7e3d40b9aa7c7e6e6ce43.1703753910.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Introduce `refStorage` extension | expand |
On Thu, Dec 28, 2023 at 4:57 AM Patrick Steinhardt <ps@pks.im> wrote: > When calling `git init --separate-git-dir=<new-path>` on a preexisting > repository, we move the Git directory of that repository to the new path > specified by the user. If there are worktrees present in the repository, > we need to repair the worktrees so that their gitlinks point to the new > location of the repository. > > This repair logic will load repositories via `get_worktrees()`, which > will enumerate up and initialize all worktrees. Part of initialization > is logic that we resolve their respective worktree HEADs, even though > that information may not actually be needed in the end by all callers. > > In the context of git-init(1) this is about to become a problem, because > we do not have a repository that was set up via `setup_git_directory()` > or friends. Consequentially, it is not yet fully initialized at the time > of calling `repair_worktrees()`, and properly setting up all parts of > the repository in `init_db()` before we repair worktrees is not an easy > thing to do. While this is okay right now where we only have a single > reference backend in Git, once we gain a second one we would be trying > to look up the worktree HEADs before we have figured out the reference > format, which does not work. s/Consequentially/Consequently/ I found it difficult to digest this paragraph with its foreshadowing phrase "about to become a problem" since it wasn't apparent until the very final sentence in the paragraph what the actual problem would be. Perhaps if you mention early on that the reftable backend will have trouble with the current code, it would be easier to grasp. Maybe something like this: Although not a problem presently with the file-based reference backend, it will become a problem with the upcoming reftable backend. In the context of git-init(1) a fully-materialized repository set up via `setup_git_directory()` or friends is not yet present. Consequently, it is not yet fully initialized at the time `repair_worktrees()` is called, and properly setting up all parts of the repository in `init_db()` before we repair worktrees is not an easy task. With introduction of the reftable backend, it would try to look up the worktree HEADs before we have figured out the reference format, thus would not work. > We do not require the worktree HEADs at all to repair worktrees. So > let's fix this issue by skipping over the step that reads them. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > diff --git a/worktree.c b/worktree.c > @@ -51,7 +51,7 @@ static void add_head_info(struct worktree *wt) > -static struct worktree *get_main_worktree(void) > +static struct worktree *get_main_worktree(int skip_reading_head) > { > - add_head_info(worktree); > + if (!skip_reading_head) > + add_head_info(worktree); This is so special-case that it feels more than a little dirty. > @@ -591,7 +599,7 @@ static void repair_noop(int iserr UNUSED, > void repair_worktrees(worktree_repair_fn fn, void *cb_data) > { > - struct worktree **worktrees = get_worktrees(); > + struct worktree **worktrees = get_worktrees_internal(1); In an ideal world, a repair function should not be calling get_worktrees() at all since get_worktrees() is not tolerant of corruption of the worktree administrative files. (Plus, as you note, it does more work than necessary for the current set of repairs performed by `git worktree repair`.) Even as I was implementing the worktree repair code, I wavered back and forth multiple times between calling get_worktrees() and writing a custom corruption-tolerant function to retrieve worktree administrative information. In the end, I opted for get_worktrees() for the pragmatic reason that it allowed me to narrow the scope of the patches to the types of repairs which were the current focus without getting mired down in the involved details of writing a corruption-tolerant function for retrieving worktree metadata. However, that decision was made with the understanding that the pragmatic choice of the moment would not rule out the possibility of returning later and implementing the more correct approach of having a corruption-tolerant function for retrieving worktree metadata. The special-case ugliness of this patch suggests strongly in favor of implementing the earlier-envisioned corruption-tolerant function for retrieving worktree metadata rather than the band-aid approach taken by this patch. The generic name get_worktrees_internal() isn't helpful either; it doesn't do a good job of conveying any particular meaning to the reader. Having said all that, I'm not overly opposed to this patch, especially since your main focus is on getting the reftable backend integrated, and because the changes (and ugliness) introduced by this patch are entirely self-contained and private to worktree.c, so are not a show-stopper by any means. Rather, I wanted to get down to writing what I think would be a better future approach if someone gets around to tackling it. (There is no pressing need at the moment, and that someone doesn't have to be you.)
On Thu, Dec 28, 2023 at 1:08 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > Having said all that, I'm not overly opposed to this patch, especially > since your main focus is on getting the reftable backend integrated, > and because the changes (and ugliness) introduced by this patch are > entirely self-contained and private to worktree.c, so are not a > show-stopper by any means. Rather, I wanted to get down to writing > what I think would be a better future approach if someone gets around > to tackling it. (There is no pressing need at the moment, and that > someone doesn't have to be you.) I forgot to mention that, if you reroll for some reason, the get_worktrees()/get_worktrees_internal() dance might deserve an in-source NEEDSWORK comment explaining that get_worktrees_internal() exists to work around the shortcoming that a corruption-tolerant function for retrieving worktree metadata (for use by the "repair" function) does not yet exist.
On Thu, Dec 28, 2023 at 01:13:04PM -0500, Eric Sunshine wrote: > On Thu, Dec 28, 2023 at 1:08 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > > Having said all that, I'm not overly opposed to this patch, especially > > since your main focus is on getting the reftable backend integrated, > > and because the changes (and ugliness) introduced by this patch are > > entirely self-contained and private to worktree.c, so are not a > > show-stopper by any means. Rather, I wanted to get down to writing > > what I think would be a better future approach if someone gets around > > to tackling it. (There is no pressing need at the moment, and that > > someone doesn't have to be you.) > > I forgot to mention that, if you reroll for some reason, the > get_worktrees()/get_worktrees_internal() dance might deserve an > in-source NEEDSWORK comment explaining that get_worktrees_internal() > exists to work around the shortcoming that a corruption-tolerant > function for retrieving worktree metadata (for use by the "repair" > function) does not yet exist. Thanks for sharing your thoughts, will do. Patrick
diff --git a/worktree.c b/worktree.c index a56a6c2a3d..9702ed0308 100644 --- a/worktree.c +++ b/worktree.c @@ -51,7 +51,7 @@ static void add_head_info(struct worktree *wt) /** * get the main worktree */ -static struct worktree *get_main_worktree(void) +static struct worktree *get_main_worktree(int skip_reading_head) { struct worktree *worktree = NULL; struct strbuf worktree_path = STRBUF_INIT; @@ -70,11 +70,13 @@ static struct worktree *get_main_worktree(void) */ worktree->is_bare = (is_bare_repository_cfg == 1) || is_bare_repository(); - add_head_info(worktree); + if (!skip_reading_head) + add_head_info(worktree); return worktree; } -static struct worktree *get_linked_worktree(const char *id) +static struct worktree *get_linked_worktree(const char *id, + int skip_reading_head) { struct worktree *worktree = NULL; struct strbuf path = STRBUF_INIT; @@ -93,7 +95,8 @@ static struct worktree *get_linked_worktree(const char *id) CALLOC_ARRAY(worktree, 1); worktree->path = strbuf_detach(&worktree_path, NULL); worktree->id = xstrdup(id); - add_head_info(worktree); + if (!skip_reading_head) + add_head_info(worktree); done: strbuf_release(&path); @@ -118,7 +121,7 @@ static void mark_current_worktree(struct worktree **worktrees) free(git_dir); } -struct worktree **get_worktrees(void) +static struct worktree **get_worktrees_internal(int skip_reading_head) { struct worktree **list = NULL; struct strbuf path = STRBUF_INIT; @@ -128,7 +131,7 @@ struct worktree **get_worktrees(void) ALLOC_ARRAY(list, alloc); - list[counter++] = get_main_worktree(); + list[counter++] = get_main_worktree(skip_reading_head); strbuf_addf(&path, "%s/worktrees", get_git_common_dir()); dir = opendir(path.buf); @@ -137,7 +140,7 @@ struct worktree **get_worktrees(void) while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) { struct worktree *linked = NULL; - if ((linked = get_linked_worktree(d->d_name))) { + if ((linked = get_linked_worktree(d->d_name, skip_reading_head))) { ALLOC_GROW(list, counter + 1, alloc); list[counter++] = linked; } @@ -151,6 +154,11 @@ struct worktree **get_worktrees(void) return list; } +struct worktree **get_worktrees(void) +{ + return get_worktrees_internal(0); +} + const char *get_worktree_git_dir(const struct worktree *wt) { if (!wt) @@ -591,7 +599,7 @@ static void repair_noop(int iserr UNUSED, void repair_worktrees(worktree_repair_fn fn, void *cb_data) { - struct worktree **worktrees = get_worktrees(); + struct worktree **worktrees = get_worktrees_internal(1); struct worktree **wt = worktrees + 1; /* +1 skips main worktree */ if (!fn)
When calling `git init --separate-git-dir=<new-path>` on a preexisting repository, we move the Git directory of that repository to the new path specified by the user. If there are worktrees present in the repository, we need to repair the worktrees so that their gitlinks point to the new location of the repository. This repair logic will load repositories via `get_worktrees()`, which will enumerate up and initialize all worktrees. Part of initialization is logic that we resolve their respective worktree HEADs, even though that information may not actually be needed in the end by all callers. In the context of git-init(1) this is about to become a problem, because we do not have a repository that was set up via `setup_git_directory()` or friends. Consequentially, it is not yet fully initialized at the time of calling `repair_worktrees()`, and properly setting up all parts of the repository in `init_db()` before we repair worktrees is not an easy thing to do. While this is okay right now where we only have a single reference backend in Git, once we gain a second one we would be trying to look up the worktree HEADs before we have figured out the reference format, which does not work. We do not require the worktree HEADs at all to repair worktrees. So let's fix this issue by skipping over the step that reads them. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- worktree.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)