diff mbox series

[v3,2/8] branch: consider refs under 'update-refs'

Message ID 2bc647b6fcd6032a1e97e67739ced31117bfbfce.1656422759.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series rebase: update branches in multi-part topic | expand

Commit Message

Derrick Stolee June 28, 2022, 1:25 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

The branch_checked_out() helper helps commands like 'git branch' and
'git fetch' from overwriting refs that are currently checked out in
other worktrees.

A future update to 'git rebase' will introduce a new '--update-refs'
option which will update the local refs that point to commits that are
being rebased. To avoid collisions as the rebase completes, we want to
make the future data store for these refs to be considered by
branch_checked_out().

The data store is a plaintext file inside the 'rebase-merge' directory
for that worktree. The file alternates refnames and OIDs. The OIDs will
be used to store the to-be-written values as the rebase progresses, but
can be ignored at the moment.

Create a new sequencer_get_update_refs_state() method that parses this
file and populates a struct string_list with the ref-OID pairs. We can
then use this list to add to the current_checked_out_branches strmap
used by branch_checked_out().

To properly navigate to the rebase directory for a given worktree,
extract the static strbuf_worktree_gitdir() method to a public API
method.

We can test that this works without having Git write this file by
artificially creating one in our test script.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 branch.c                  | 13 ++++++++++++
 sequencer.c               | 42 +++++++++++++++++++++++++++++++++++++++
 sequencer.h               |  9 +++++++++
 t/t2407-worktree-heads.sh | 22 ++++++++++++++++----
 4 files changed, 82 insertions(+), 4 deletions(-)

Comments

Junio C Hamano June 28, 2022, 8:48 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The branch_checked_out() helper helps commands like 'git branch' and
> 'git fetch' from overwriting refs that are currently checked out in
> other worktrees.
>
> A future update to 'git rebase' will introduce a new '--update-refs'
> option which will update the local refs that point to commits that are
> being rebased. To avoid collisions as the rebase completes, we want to
> make the future data store for these refs to be considered by
> branch_checked_out().
>
> The data store is a plaintext file inside the 'rebase-merge' directory
> for that worktree. The file alternates refnames and OIDs. The OIDs will
> be used to store the to-be-written values as the rebase progresses, but
> can be ignored at the moment.
>
> Create a new sequencer_get_update_refs_state() method that parses this
> file and populates a struct string_list with the ref-OID pairs. We can
> then use this list to add to the current_checked_out_branches strmap
> used by branch_checked_out().
>
> To properly navigate to the rebase directory for a given worktree,
> extract the static strbuf_worktree_gitdir() method to a public API
> method.
>
> We can test that this works without having Git write this file by
> artificially creating one in our test script.

Hmph, I am not thrilled to see an ad-hoc text file for things like
this.  Do the objects that appear in this file otherwise already
anchored by existing refs, or can some of them be "floating" without
any refs pointing at them, i.e. making the "connected" and "fsck"
machinery also being aware of them to protect them from collected as
garbage and reported as dangling/unreachable?
Derrick Stolee June 29, 2022, 12:58 p.m. UTC | #2
On 6/28/22 4:48 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The branch_checked_out() helper helps commands like 'git branch' and
>> 'git fetch' from overwriting refs that are currently checked out in
>> other worktrees.
>>
>> A future update to 'git rebase' will introduce a new '--update-refs'
>> option which will update the local refs that point to commits that are
>> being rebased. To avoid collisions as the rebase completes, we want to
>> make the future data store for these refs to be considered by
>> branch_checked_out().
>>
>> The data store is a plaintext file inside the 'rebase-merge' directory
>> for that worktree. The file alternates refnames and OIDs. The OIDs will
>> be used to store the to-be-written values as the rebase progresses, but
>> can be ignored at the moment.
>>
>> Create a new sequencer_get_update_refs_state() method that parses this
>> file and populates a struct string_list with the ref-OID pairs. We can
>> then use this list to add to the current_checked_out_branches strmap
>> used by branch_checked_out().
>>
>> To properly navigate to the rebase directory for a given worktree,
>> extract the static strbuf_worktree_gitdir() method to a public API
>> method.
>>
>> We can test that this works without having Git write this file by
>> artificially creating one in our test script.
> 
> Hmph, I am not thrilled to see an ad-hoc text file for things like
> this.  Do the objects that appear in this file otherwise already
> anchored by existing refs, or can some of them be "floating" without
> any refs pointing at them, i.e. making the "connected" and "fsck"
> machinery also being aware of them to protect them from collected as
> garbage and reported as dangling/unreachable?

You're right that we could have this sequence of events in a todo
file:

  pick deadbeef Here is a commit that will become unreachable
  update-ref will-be-lost

  squash 1234567 make will-be-lost unreachable
  ...

So if a GC runs after this 'update-ref' step but before the rebase
completes, then that commit can be lost. The ref-update at the end
of the rebase process would fail.

I wonder how important such a situation is, though. But I'm willing
to add the extra lookups in 'git gc' and 'git fsck' to make this a
non-issue.

I'll take a look to see where the other refs in rebase-<backend>/*
are handled by these processes.

Thanks,
-Stolee
Phillip Wood June 30, 2022, 9:32 a.m. UTC | #3
On 28/06/2022 14:25, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The branch_checked_out() helper helps commands like 'git branch' and
> 'git fetch' from overwriting refs that are currently checked out in
> other worktrees.
> 
> A future update to 'git rebase' will introduce a new '--update-refs'
> option which will update the local refs that point to commits that are
> being rebased. To avoid collisions as the rebase completes, we want to
> make the future data store for these refs to be considered by
> branch_checked_out().
> 
> The data store is a plaintext file inside the 'rebase-merge' directory
> for that worktree. The file alternates refnames and OIDs. The OIDs will
> be used to store the to-be-written values as the rebase progresses, but
> can be ignored at the moment.
> 
> Create a new sequencer_get_update_refs_state() method that parses this
> file and populates a struct string_list with the ref-OID pairs. We can
> then use this list to add to the current_checked_out_branches strmap
> used by branch_checked_out().
> 
> To properly navigate to the rebase directory for a given worktree,
> extract the static strbuf_worktree_gitdir() method to a public API
> method.
> 
> We can test that this works without having Git write this file by
> artificially creating one in our test script.

This looks good apart from a couple of small nits

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>   branch.c                  | 13 ++++++++++++
>   sequencer.c               | 42 +++++++++++++++++++++++++++++++++++++++
>   sequencer.h               |  9 +++++++++
>   t/t2407-worktree-heads.sh | 22 ++++++++++++++++----
>   4 files changed, 82 insertions(+), 4 deletions(-)
> 
> diff --git a/branch.c b/branch.c
> index 526e8237673..f252c4eef6c 100644
> --- a/branch.c
> +++ b/branch.c
> @@ -365,6 +365,7 @@ static void prepare_checked_out_branches(void)
>   		char *old;
>   		struct wt_status_state state = { 0 };
>   		struct worktree *wt = worktrees[i++];
> +		struct string_list update_refs = STRING_LIST_INIT_DUP;
>   
>   		if (wt->is_bare)
>   			continue;
> @@ -400,6 +401,18 @@ static void prepare_checked_out_branches(void)
>   			strbuf_release(&ref);
>   		}
>   		wt_status_state_free_buffers(&state);
> +
> +		if (!sequencer_get_update_refs_state(get_worktree_git_dir(wt),
> +						     &update_refs)) {
> +			struct string_list_item *item;
> +			for_each_string_list_item(item, &update_refs) {
> +				old = strmap_put(&current_checked_out_branches,
> +						 item->string,
> +						 xstrdup(wt->path));
> +				free(old);
> +			}
> +			string_list_clear(&update_refs, 1);
> +		}
>   	}
>   
>   	free_worktrees(worktrees);
> diff --git a/sequencer.c b/sequencer.c
> index 8c3ed3532ac..1094e146b99 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -5901,3 +5901,45 @@ int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
>   
>   	return 0;
>   }
> +
> +int sequencer_get_update_refs_state(const char *wt_dir,
> +				    struct string_list *refs)
> +{
> +	int result = 0;
> +	struct stat st;
> +	FILE *fp = NULL;
> +	struct strbuf ref = STRBUF_INIT;
> +	struct strbuf hash = STRBUF_INIT;
> +	char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir);

I think it would make sense to introduce rebase_path_update_refs() in 
this patch rather than later in the series

> +
> +	if (stat(path, &st))
> +		goto cleanup;

What's the reason for stating the file first, rather than just letting 
fopen() fail if it does not exist?

Best Wishes

Phillip
Phillip Wood June 30, 2022, 9:47 a.m. UTC | #4
On 29/06/2022 13:58, Derrick Stolee wrote:
> On 6/28/22 4:48 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <derrickstolee@github.com>
>>>
>>> The branch_checked_out() helper helps commands like 'git branch' and
>>> 'git fetch' from overwriting refs that are currently checked out in
>>> other worktrees.
>>>
>>> A future update to 'git rebase' will introduce a new '--update-refs'
>>> option which will update the local refs that point to commits that are
>>> being rebased. To avoid collisions as the rebase completes, we want to
>>> make the future data store for these refs to be considered by
>>> branch_checked_out().
>>>
>>> The data store is a plaintext file inside the 'rebase-merge' directory
>>> for that worktree. The file alternates refnames and OIDs. The OIDs will
>>> be used to store the to-be-written values as the rebase progresses, but
>>> can be ignored at the moment.
>>>
>>> Create a new sequencer_get_update_refs_state() method that parses this
>>> file and populates a struct string_list with the ref-OID pairs. We can
>>> then use this list to add to the current_checked_out_branches strmap
>>> used by branch_checked_out().
>>>
>>> To properly navigate to the rebase directory for a given worktree,
>>> extract the static strbuf_worktree_gitdir() method to a public API
>>> method.
>>>
>>> We can test that this works without having Git write this file by
>>> artificially creating one in our test script.
>>
>> Hmph, I am not thrilled to see an ad-hoc text file for things like
>> this.  Do the objects that appear in this file otherwise already
>> anchored by existing refs, or can some of them be "floating" without
>> any refs pointing at them, i.e. making the "connected" and "fsck"
>> machinery also being aware of them to protect them from collected as
>> garbage and reported as dangling/unreachable?
> 
> You're right that we could have this sequence of events in a todo
> file:
> 
>    pick deadbeef Here is a commit that will become unreachable
>    update-ref will-be-lost
> 
>    squash 1234567 make will-be-lost unreachable
>    ...
> 
> So if a GC runs after this 'update-ref' step but before the rebase
> completes, then that commit can be lost. The ref-update at the end
> of the rebase process would fail.

The commit is protected by HEAD's reflog for a while because we update 
HEAD after each pick but it is not permanently safe.

I've taken a look at the first few patches and they're looking good, I 
hope to look at the rest tomorrow

Best Wishes

Phillip

> I wonder how important such a situation is, though. But I'm willing
> to add the extra lookups in 'git gc' and 'git fsck' to make this a
> non-issue.
> 
> I'll take a look to see where the other refs in rebase-<backend>/*
> are handled by these processes.
> 
> Thanks,
> -Stolee
Derrick Stolee June 30, 2022, 1:35 p.m. UTC | #5
On 6/30/2022 5:32 AM, Phillip Wood wrote:
> On 28/06/2022 14:25, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> +int sequencer_get_update_refs_state(const char *wt_dir,
>> +                    struct string_list *refs)
>> +{
>> +    int result = 0;
>> +    struct stat st;
>> +    FILE *fp = NULL;
>> +    struct strbuf ref = STRBUF_INIT;
>> +    struct strbuf hash = STRBUF_INIT;
>> +    char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir);
> 
> I think it would make sense to introduce rebase_path_update_refs() in this patch rather than later in the series

The biggest difference is that rebase_path_update_refs() only
gives the path for the current worktree, while this method needs
to read the file for any worktree.

There is likely some opportunity to create rebase_path_update_refs()
in a different way that serves both purposes.

>> +
>> +    if (stat(path, &st))
>> +        goto cleanup;
> 
> What's the reason for stating the file first, rather than just letting fopen() fail if it does not exist?

Not sure what I was looking at that gave this pattern, but you're
right that it isn't necessary.

Thanks,
-Stolee
Junio C Hamano June 30, 2022, 4:49 p.m. UTC | #6
Derrick Stolee <derrickstolee@github.com> writes:

> I wonder how important such a situation is, though. But I'm willing
> to add the extra lookups in 'git gc' and 'git fsck' to make this a
> non-issue.

What I was hinting at was if some of this can be done by using a
ref, so that we do not have to touch "gc" or "fsck" at all.

As to the importance, I would say it is about the same importance as
being able to prevent the branches involved in the rebase from
touched in other worktrees.  We expect it to take sufficiently large
wallclock time from the beginning of a rebase and its finish to make
such a protection necessary, so an auto-gc may even be started from
other worktrees without our knowledge.

Thanks.
Junio C Hamano June 30, 2022, 4:50 p.m. UTC | #7
Phillip Wood <phillip.wood123@gmail.com> writes:

> The commit is protected by HEAD's reflog for a while because we update
> HEAD after each pick but it is not permanently safe.

Ah, I forgot about that one.  Thanks for pointing it out.  It does
sound like HEAD's reflog makes it safe enough.
Phillip Wood July 1, 2022, 1:40 p.m. UTC | #8
Hi Stolee

On 30/06/2022 14:35, Derrick Stolee wrote:
> On 6/30/2022 5:32 AM, Phillip Wood wrote:
>> On 28/06/2022 14:25, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <derrickstolee@github.com>
> 
>>> +int sequencer_get_update_refs_state(const char *wt_dir,
>>> +                    struct string_list *refs)
>>> +{
>>> +    int result = 0;
>>> +    struct stat st;
>>> +    FILE *fp = NULL;
>>> +    struct strbuf ref = STRBUF_INIT;
>>> +    struct strbuf hash = STRBUF_INIT;
>>> +    char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir);
>>
>> I think it would make sense to introduce rebase_path_update_refs() in this patch rather than later in the series
> 
> The biggest difference is that rebase_path_update_refs() only
> gives the path for the current worktree, while this method needs
> to read the file for any worktree.

That's an important distinction that I'd failed to notice

> There is likely some opportunity to create rebase_path_update_refs()
> in a different way that serves both purposes.

That would be nice, even just having a #define for 
"rebase-merge/update-refs" and using that here and in the other patch 
would mean we're not hard coding the filename in two different places.

Best Wishes

Phillip

>>> +
>>> +    if (stat(path, &st))
>>> +        goto cleanup;
>>
>> What's the reason for stating the file first, rather than just letting fopen() fail if it does not exist?
> 
> Not sure what I was looking at that gave this pattern, but you're
> right that it isn't necessary.
> 
> Thanks,
> -Stolee
diff mbox series

Patch

diff --git a/branch.c b/branch.c
index 526e8237673..f252c4eef6c 100644
--- a/branch.c
+++ b/branch.c
@@ -365,6 +365,7 @@  static void prepare_checked_out_branches(void)
 		char *old;
 		struct wt_status_state state = { 0 };
 		struct worktree *wt = worktrees[i++];
+		struct string_list update_refs = STRING_LIST_INIT_DUP;
 
 		if (wt->is_bare)
 			continue;
@@ -400,6 +401,18 @@  static void prepare_checked_out_branches(void)
 			strbuf_release(&ref);
 		}
 		wt_status_state_free_buffers(&state);
+
+		if (!sequencer_get_update_refs_state(get_worktree_git_dir(wt),
+						     &update_refs)) {
+			struct string_list_item *item;
+			for_each_string_list_item(item, &update_refs) {
+				old = strmap_put(&current_checked_out_branches,
+						 item->string,
+						 xstrdup(wt->path));
+				free(old);
+			}
+			string_list_clear(&update_refs, 1);
+		}
 	}
 
 	free_worktrees(worktrees);
diff --git a/sequencer.c b/sequencer.c
index 8c3ed3532ac..1094e146b99 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5901,3 +5901,45 @@  int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 
 	return 0;
 }
+
+int sequencer_get_update_refs_state(const char *wt_dir,
+				    struct string_list *refs)
+{
+	int result = 0;
+	struct stat st;
+	FILE *fp = NULL;
+	struct strbuf ref = STRBUF_INIT;
+	struct strbuf hash = STRBUF_INIT;
+	char *path = xstrfmt("%s/rebase-merge/update-refs", wt_dir);
+
+	if (stat(path, &st))
+		goto cleanup;
+
+	fp = fopen(path, "r");
+	if (!fp)
+		goto cleanup;
+
+	while (strbuf_getline(&ref, fp) != EOF) {
+		struct object_id oid;
+		struct string_list_item *item;
+
+		if (strbuf_getline(&hash, fp) == EOF ||
+		    get_oid_hex(hash.buf, &oid)) {
+			warning(_("update-refs file at '%s' is invalid"),
+				  path);
+			result = -1;
+			goto cleanup;
+		}
+
+		item = string_list_append(refs, ref.buf);
+		item->util = oiddup(&oid);
+	}
+
+cleanup:
+	if (fp)
+		fclose(fp);
+	free(path);
+	strbuf_release(&ref);
+	strbuf_release(&hash);
+	return result;
+}
diff --git a/sequencer.h b/sequencer.h
index da64473636b..3ae541bb145 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -232,4 +232,13 @@  void sequencer_post_commit_cleanup(struct repository *r, int verbose);
 int sequencer_get_last_command(struct repository* r,
 			       enum replay_action *action);
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence);
+
+/**
+ * Append the set of ref-OID pairs that are currently stored for the 'git
+ * rebase --update-refs' feature if such a rebase is currently happening.
+ *
+ * Localized to a worktree's git dir.
+ */
+int sequencer_get_update_refs_state(const char *wt_dir, struct string_list *refs);
+
 #endif /* SEQUENCER_H */
diff --git a/t/t2407-worktree-heads.sh b/t/t2407-worktree-heads.sh
index 4f59bc21303..f1e9e172a0c 100755
--- a/t/t2407-worktree-heads.sh
+++ b/t/t2407-worktree-heads.sh
@@ -7,8 +7,11 @@  TEST_PASSES_SANITIZE_LEAK=true
 
 test_expect_success 'setup' '
 	test_commit init &&
-	git branch -f fake-1 &&
-	git branch -f fake-2 &&
+
+	for i in 1 2 3 4
+	do
+		git branch -f fake-$i || return 1
+	done &&
 
 	for i in 1 2 3 4
 	do
@@ -73,8 +76,19 @@  test_expect_success 'refuse to overwrite: worktree in rebase (merge)' '
 	echo refs/heads/fake-1 >.git/worktrees/wt-3/rebase-merge/head-name &&
 	echo refs/heads/fake-2 >.git/worktrees/wt-3/rebase-merge/onto &&
 
-	test_must_fail git branch -f fake-1 HEAD 2>err &&
-	grep "cannot force update the branch '\''fake-1'\'' checked out at.*wt-3" err
+	cat >.git/worktrees/wt-3/rebase-merge/update-refs <<-EOF &&
+	refs/heads/fake-3
+	$(git rev-parse HEAD~1)
+	refs/heads/fake-4
+	$(git rev-parse HEAD)
+	EOF
+
+	for i in 1 3 4
+	do
+		test_must_fail git branch -f fake-$i HEAD 2>err &&
+		grep "cannot force update the branch '\''fake-$i'\'' checked out at.*wt-3" err ||
+			return 1
+	done
 '
 
 test_expect_success !SANITIZE_LEAK 'refuse to fetch over ref: checked out' '