Message ID | 2bc647b6fcd6032a1e97e67739ced31117bfbfce.1656422759.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | rebase: update branches in multi-part topic | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <derrickstolee@github.com> > > The branch_checked_out() helper helps commands like 'git branch' and > 'git fetch' from overwriting refs that are currently checked out in > other worktrees. > > A future update to 'git rebase' will introduce a new '--update-refs' > option which will update the local refs that point to commits that are > being rebased. To avoid collisions as the rebase completes, we want to > make the future data store for these refs to be considered by > branch_checked_out(). > > The data store is a plaintext file inside the 'rebase-merge' directory > for that worktree. The file alternates refnames and OIDs. The OIDs will > be used to store the to-be-written values as the rebase progresses, but > can be ignored at the moment. > > Create a new sequencer_get_update_refs_state() method that parses this > file and populates a struct string_list with the ref-OID pairs. We can > then use this list to add to the current_checked_out_branches strmap > used by branch_checked_out(). > > To properly navigate to the rebase directory for a given worktree, > extract the static strbuf_worktree_gitdir() method to a public API > method. > > We can test that this works without having Git write this file by > artificially creating one in our test script. Hmph, I am not thrilled to see an ad-hoc text file for things like this. Do the objects that appear in this file otherwise already anchored by existing refs, or can some of them be "floating" without any refs pointing at them, i.e. making the "connected" and "fsck" machinery also being aware of them to protect them from collected as garbage and reported as dangling/unreachable?
On 6/28/22 4:48 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <derrickstolee@github.com> >> >> The branch_checked_out() helper helps commands like 'git branch' and >> 'git fetch' from overwriting refs that are currently checked out in >> other worktrees. >> >> A future update to 'git rebase' will introduce a new '--update-refs' >> option which will update the local refs that point to commits that are >> being rebased. To avoid collisions as the rebase completes, we want to >> make the future data store for these refs to be considered by >> branch_checked_out(). >> >> The data store is a plaintext file inside the 'rebase-merge' directory >> for that worktree. The file alternates refnames and OIDs. The OIDs will >> be used to store the to-be-written values as the rebase progresses, but >> can be ignored at the moment. >> >> Create a new sequencer_get_update_refs_state() method that parses this >> file and populates a struct string_list with the ref-OID pairs. We can >> then use this list to add to the current_checked_out_branches strmap >> used by branch_checked_out(). >> >> To properly navigate to the rebase directory for a given worktree, >> extract the static strbuf_worktree_gitdir() method to a public API >> method. >> >> We can test that this works without having Git write this file by >> artificially creating one in our test script. > > Hmph, I am not thrilled to see an ad-hoc text file for things like > this. Do the objects that appear in this file otherwise already > anchored by existing refs, or can some of them be "floating" without > any refs pointing at them, i.e. making the "connected" and "fsck" > machinery also being aware of them to protect them from collected as > garbage and reported as dangling/unreachable? You're right that we could have this sequence of events in a todo file: pick deadbeef Here is a commit that will become unreachable update-ref will-be-lost squash 1234567 make will-be-lost unreachable ... So if a GC runs after this 'update-ref' step but before the rebase completes, then that commit can be lost. The ref-update at the end of the rebase process would fail. I wonder how important such a situation is, though. But I'm willing to add the extra lookups in 'git gc' and 'git fsck' to make this a non-issue. I'll take a look to see where the other refs in rebase-<backend>/* are handled by these processes. Thanks, -Stolee
On 28/06/2022 14:25, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > The branch_checked_out() helper helps commands like 'git branch' and > 'git fetch' from overwriting refs that are currently checked out in > other worktrees. > > A future update to 'git rebase' will introduce a new '--update-refs' > option which will update the local refs that point to commits that are > being rebased. To avoid collisions as the rebase completes, we want to > make the future data store for these refs to be considered by > branch_checked_out(). > > The data store is a plaintext file inside the 'rebase-merge' directory > for that worktree. The file alternates refnames and OIDs. The OIDs will > be used to store the to-be-written values as the rebase progresses, but > can be ignored at the moment. > > Create a new sequencer_get_update_refs_state() method that parses this > file and populates a struct string_list with the ref-OID pairs. We can > then use this list to add to the current_checked_out_branches strmap > used by branch_checked_out(). > > To properly navigate to the rebase directory for a given worktree, > extract the static strbuf_worktree_gitdir() method to a public API > method. > > We can test that this works without having Git write this file by > artificially creating one in our test script. This looks good apart from a couple of small nits > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > branch.c | 13 ++++++++++++ > sequencer.c | 42 +++++++++++++++++++++++++++++++++++++++ > sequencer.h | 9 +++++++++ > t/t2407-worktree-heads.sh | 22 ++++++++++++++++---- > 4 files changed, 82 insertions(+), 4 deletions(-) > > diff --git a/branch.c b/branch.c > index 526e8237673..f252c4eef6c 100644 > --- a/branch.c > +++ b/branch.c > @@ -365,6 +365,7 @@ static void prepare_checked_out_branches(void) > char *old; > struct wt_status_state state = { 0 }; > struct worktree *wt = worktrees[i++]; > + struct string_list update_refs = STRING_LIST_INIT_DUP; > > if (wt->is_bare) > continue; > @@ -400,6 +401,18 @@ static void prepare_checked_out_branches(void) > strbuf_release(&ref); > } > wt_status_state_free_buffers(&state); > + > + if (!sequencer_get_update_refs_state(get_worktree_git_dir(wt), > + &update_refs)) { > + struct string_list_item *item; > + for_each_string_list_item(item, &update_refs) { > + old = strmap_put(¤t_checked_out_branches, > + item->string, > + xstrdup(wt->path)); > + free(old); > + } > + string_list_clear(&update_refs, 1); > + } > } > > free_worktrees(worktrees); > diff --git a/sequencer.c b/sequencer.c > index 8c3ed3532ac..1094e146b99 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -5901,3 +5901,45 @@ int sequencer_determine_whence(struct repository *r, enum commit_whence *whence) > > return 0; > } > + > +int sequencer_get_update_refs_state(const char *wt_dir, > + struct string_list *refs) > +{ > + int result = 0; > + struct stat st; > + FILE *fp = NULL; > + struct strbuf ref = STRBUF_INIT; > + struct strbuf hash = STRBUF_INIT; > + char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir); I think it would make sense to introduce rebase_path_update_refs() in this patch rather than later in the series > + > + if (stat(path, &st)) > + goto cleanup; What's the reason for stating the file first, rather than just letting fopen() fail if it does not exist? Best Wishes Phillip
On 29/06/2022 13:58, Derrick Stolee wrote: > On 6/28/22 4:48 PM, Junio C Hamano wrote: >> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> >>> From: Derrick Stolee <derrickstolee@github.com> >>> >>> The branch_checked_out() helper helps commands like 'git branch' and >>> 'git fetch' from overwriting refs that are currently checked out in >>> other worktrees. >>> >>> A future update to 'git rebase' will introduce a new '--update-refs' >>> option which will update the local refs that point to commits that are >>> being rebased. To avoid collisions as the rebase completes, we want to >>> make the future data store for these refs to be considered by >>> branch_checked_out(). >>> >>> The data store is a plaintext file inside the 'rebase-merge' directory >>> for that worktree. The file alternates refnames and OIDs. The OIDs will >>> be used to store the to-be-written values as the rebase progresses, but >>> can be ignored at the moment. >>> >>> Create a new sequencer_get_update_refs_state() method that parses this >>> file and populates a struct string_list with the ref-OID pairs. We can >>> then use this list to add to the current_checked_out_branches strmap >>> used by branch_checked_out(). >>> >>> To properly navigate to the rebase directory for a given worktree, >>> extract the static strbuf_worktree_gitdir() method to a public API >>> method. >>> >>> We can test that this works without having Git write this file by >>> artificially creating one in our test script. >> >> Hmph, I am not thrilled to see an ad-hoc text file for things like >> this. Do the objects that appear in this file otherwise already >> anchored by existing refs, or can some of them be "floating" without >> any refs pointing at them, i.e. making the "connected" and "fsck" >> machinery also being aware of them to protect them from collected as >> garbage and reported as dangling/unreachable? > > You're right that we could have this sequence of events in a todo > file: > > pick deadbeef Here is a commit that will become unreachable > update-ref will-be-lost > > squash 1234567 make will-be-lost unreachable > ... > > So if a GC runs after this 'update-ref' step but before the rebase > completes, then that commit can be lost. The ref-update at the end > of the rebase process would fail. The commit is protected by HEAD's reflog for a while because we update HEAD after each pick but it is not permanently safe. I've taken a look at the first few patches and they're looking good, I hope to look at the rest tomorrow Best Wishes Phillip > I wonder how important such a situation is, though. But I'm willing > to add the extra lookups in 'git gc' and 'git fsck' to make this a > non-issue. > > I'll take a look to see where the other refs in rebase-<backend>/* > are handled by these processes. > > Thanks, > -Stolee
On 6/30/2022 5:32 AM, Phillip Wood wrote: > On 28/06/2022 14:25, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <derrickstolee@github.com> >> +int sequencer_get_update_refs_state(const char *wt_dir, >> + struct string_list *refs) >> +{ >> + int result = 0; >> + struct stat st; >> + FILE *fp = NULL; >> + struct strbuf ref = STRBUF_INIT; >> + struct strbuf hash = STRBUF_INIT; >> + char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir); > > I think it would make sense to introduce rebase_path_update_refs() in this patch rather than later in the series The biggest difference is that rebase_path_update_refs() only gives the path for the current worktree, while this method needs to read the file for any worktree. There is likely some opportunity to create rebase_path_update_refs() in a different way that serves both purposes. >> + >> + if (stat(path, &st)) >> + goto cleanup; > > What's the reason for stating the file first, rather than just letting fopen() fail if it does not exist? Not sure what I was looking at that gave this pattern, but you're right that it isn't necessary. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > I wonder how important such a situation is, though. But I'm willing > to add the extra lookups in 'git gc' and 'git fsck' to make this a > non-issue. What I was hinting at was if some of this can be done by using a ref, so that we do not have to touch "gc" or "fsck" at all. As to the importance, I would say it is about the same importance as being able to prevent the branches involved in the rebase from touched in other worktrees. We expect it to take sufficiently large wallclock time from the beginning of a rebase and its finish to make such a protection necessary, so an auto-gc may even be started from other worktrees without our knowledge. Thanks.
Phillip Wood <phillip.wood123@gmail.com> writes: > The commit is protected by HEAD's reflog for a while because we update > HEAD after each pick but it is not permanently safe. Ah, I forgot about that one. Thanks for pointing it out. It does sound like HEAD's reflog makes it safe enough.
Hi Stolee On 30/06/2022 14:35, Derrick Stolee wrote: > On 6/30/2022 5:32 AM, Phillip Wood wrote: >> On 28/06/2022 14:25, Derrick Stolee via GitGitGadget wrote: >>> From: Derrick Stolee <derrickstolee@github.com> > >>> +int sequencer_get_update_refs_state(const char *wt_dir, >>> + struct string_list *refs) >>> +{ >>> + int result = 0; >>> + struct stat st; >>> + FILE *fp = NULL; >>> + struct strbuf ref = STRBUF_INIT; >>> + struct strbuf hash = STRBUF_INIT; >>> + char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir); >> >> I think it would make sense to introduce rebase_path_update_refs() in this patch rather than later in the series > > The biggest difference is that rebase_path_update_refs() only > gives the path for the current worktree, while this method needs > to read the file for any worktree. That's an important distinction that I'd failed to notice > There is likely some opportunity to create rebase_path_update_refs() > in a different way that serves both purposes. That would be nice, even just having a #define for "rebase-merge/update-refs" and using that here and in the other patch would mean we're not hard coding the filename in two different places. Best Wishes Phillip >>> + >>> + if (stat(path, &st)) >>> + goto cleanup; >> >> What's the reason for stating the file first, rather than just letting fopen() fail if it does not exist? > > Not sure what I was looking at that gave this pattern, but you're > right that it isn't necessary. > > Thanks, > -Stolee
diff --git a/branch.c b/branch.c index 526e8237673..f252c4eef6c 100644 --- a/branch.c +++ b/branch.c @@ -365,6 +365,7 @@ static void prepare_checked_out_branches(void) char *old; struct wt_status_state state = { 0 }; struct worktree *wt = worktrees[i++]; + struct string_list update_refs = STRING_LIST_INIT_DUP; if (wt->is_bare) continue; @@ -400,6 +401,18 @@ static void prepare_checked_out_branches(void) strbuf_release(&ref); } wt_status_state_free_buffers(&state); + + if (!sequencer_get_update_refs_state(get_worktree_git_dir(wt), + &update_refs)) { + struct string_list_item *item; + for_each_string_list_item(item, &update_refs) { + old = strmap_put(¤t_checked_out_branches, + item->string, + xstrdup(wt->path)); + free(old); + } + string_list_clear(&update_refs, 1); + } } free_worktrees(worktrees); diff --git a/sequencer.c b/sequencer.c index 8c3ed3532ac..1094e146b99 100644 --- a/sequencer.c +++ b/sequencer.c @@ -5901,3 +5901,45 @@ int sequencer_determine_whence(struct repository *r, enum commit_whence *whence) return 0; } + +int sequencer_get_update_refs_state(const char *wt_dir, + struct string_list *refs) +{ + int result = 0; + struct stat st; + FILE *fp = NULL; + struct strbuf ref = STRBUF_INIT; + struct strbuf hash = STRBUF_INIT; + char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir); + + if (stat(path, &st)) + goto cleanup; + + fp = fopen(path, "r"); + if (!fp) + goto cleanup; + + while (strbuf_getline(&ref, fp) != EOF) { + struct object_id oid; + struct string_list_item *item; + + if (strbuf_getline(&hash, fp) == EOF || + get_oid_hex(hash.buf, &oid)) { + warning(_("update-refs file at '%s' is invalid"), + path); + result = -1; + goto cleanup; + } + + item = string_list_append(refs, ref.buf); + item->util = oiddup(&oid); + } + +cleanup: + if (fp) + fclose(fp); + free(path); + strbuf_release(&ref); + strbuf_release(&hash); + return result; +} diff --git a/sequencer.h b/sequencer.h index da64473636b..3ae541bb145 100644 --- a/sequencer.h +++ b/sequencer.h @@ -232,4 +232,13 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose); int sequencer_get_last_command(struct repository* r, enum replay_action *action); int sequencer_determine_whence(struct repository *r, enum commit_whence *whence); + +/** + * Append the set of ref-OID pairs that are currently stored for the 'git + * rebase --update-refs' feature if such a rebase is currently happening. + * + * Localized to a worktree's git dir. + */ +int sequencer_get_update_refs_state(const char *wt_dir, struct string_list *refs); + #endif /* SEQUENCER_H */ diff --git a/t/t2407-worktree-heads.sh b/t/t2407-worktree-heads.sh index 4f59bc21303..f1e9e172a0c 100755 --- a/t/t2407-worktree-heads.sh +++ b/t/t2407-worktree-heads.sh @@ -7,8 +7,11 @@ TEST_PASSES_SANITIZE_LEAK=true test_expect_success 'setup' ' test_commit init && - git branch -f fake-1 && - git branch -f fake-2 && + + for i in 1 2 3 4 + do + git branch -f fake-$i || return 1 + done && for i in 1 2 3 4 do @@ -73,8 +76,19 @@ test_expect_success 'refuse to overwrite: worktree in rebase (merge)' ' echo refs/heads/fake-1 >.git/worktrees/wt-3/rebase-merge/head-name && echo refs/heads/fake-2 >.git/worktrees/wt-3/rebase-merge/onto && - test_must_fail git branch -f fake-1 HEAD 2>err && - grep "cannot force update the branch '\''fake-1'\'' checked out at.*wt-3" err + cat >.git/worktrees/wt-3/rebase-merge/update-refs <<-EOF && + refs/heads/fake-3 + $(git rev-parse HEAD~1) + refs/heads/fake-4 + $(git rev-parse HEAD) + EOF + + for i in 1 3 4 + do + test_must_fail git branch -f fake-$i HEAD 2>err && + grep "cannot force update the branch '\''fake-$i'\'' checked out at.*wt-3" err || + return 1 + done ' test_expect_success !SANITIZE_LEAK 'refuse to fetch over ref: checked out' '