[v2,0/3] Ensure unique worktree ids across repositories

Message ID	20241129-wt_unique_ids-v2-0-ff444e9e625a@pm.me (mailing list archive)
Headers	show Received: from mail-4316.protonmail.ch (mail-4316.protonmail.ch [185.70.43.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79DB92C18C for <git@vger.kernel.org>; Fri, 29 Nov 2024 22:37:45 +0000 (UTC) Date: Fri, 29 Nov 2024 22:37:34 +0000 To: git@vger.kernel.org From: Caleb White <cdwhite3@pm.me> Cc: shejialuo <shejialuo@gmail.com>, Junio C Hamano <gitster@pobox.com>, Caleb White <cdwhite3@pm.me> Subject: [PATCH v2 0/3] Ensure unique worktree ids across repositories Message-ID: <20241129-wt_unique_ids-v2-0-ff444e9e625a@pm.me> Feedback-ID: 31210263:user:proton Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable
Series	Ensure unique worktree ids across repositories \| expand [v2,0/3] Ensure unique worktree ids across repositories [v2,1/3] worktree: add worktree with unique suffix [v2,2/3] worktree: rename worktree id during worktree move [v2,3/3] worktree: add id to `worktree list` output

Caleb White Nov. 29, 2024, 10:37 p.m. UTC

The `es/worktree-repair-copied` topic added support for repairing a
worktree from a copy scenario. I noted[1,2] that the topic added the
ability for a repository to "take over" a worktree from another
repository if the worktree_id matched a worktree inside the current
repository which can happen if two repositories use the same worktree name.

This series teaches Git to create worktrees with a unique suffix so
that the worktree_id is unique across all repositories even if they have
the same name. For example creating a worktree `develop` would look like:

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── develop/
    bar/
    ├── .git/worktrees/develop-1549518426/
    └── develop/

The actual worktree directory name is still `develop`, but the
worktree_id is unique and prevents the "take over" scenario. The suffix
is given by the `git_rand()` function which should be sufficient to
ensure that the suffix is effectively unique (the likelihood of
a collision with the same name and suffix is extremely low).

During a `worktree move` the worktree directory is moved/renamed but the
repository under `worktrees/<id>` is not updated. For example, moving
`develop` to `master` results in

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── master/

This works because the linking files still point to the correct
repository, but this is a little weird. This series teaches Git to also
move/rename the repository / worktree id during a `move` so that the
structure now looks like:

    foo/
    ├── .git/worktrees/master-1565465986/
    └── master/

Note that a new unique suffix is assigned to reduce the complexity of
trying to parse and reuse the existing suffix.

Additionally, this series teaches `worktree list` to output the worktree
id in the verbose and porcelain modes, which allows users and scripts to
more easily obtain the id for a given worktree.

[1]: https://lore.kernel.org/git/20241008153035.71178-1-cdwhite3@pm.me/
[2]: https://lore.kernel.org/git/r4zmcET41Skr_FMop47AKd7cms9E8bKPSvHuAUpnYavzKEY6JybJta0_7GfuYB0q-gD-XNcvh5VDTfiT3qthGKjqhS1sbT4M2lUABynOz2Q=@pm.me/

Signed-off-by: Caleb White <cdwhite3@pm.me>
---
The base for this series is obtained by merging the `cw/worktree-extension`
topic (2024-11-26, 20241125-wt_relative_options-v5-0-356d122ff3db@pm.me)
onto 090d24e9af.

Changes in v2:
- Add the worktree id to `worktree list` output
- Updated cover letter
- Link to v1: https://lore.kernel.org/r/20241128-wt_unique_ids-v1-0-30345d010e43@pm.me

---
Caleb White (3):
      worktree: add worktree with unique suffix
      worktree: rename worktree id during worktree move
      worktree: add id to `worktree list` output

 Documentation/git-worktree.txt     | 17 +++++--
 builtin/worktree.c                 | 35 ++++++++++++++
 t/t0035-safe-bare-repository.sh    |  4 +-
 t/t0600-reffiles-backend.sh        | 10 ++--
 t/t0601-reffiles-pack-refs.sh      |  4 +-
 t/t0610-reftable-basics.sh         | 54 +++++++++++-----------
 t/t1407-worktree-ref-store.sh      |  4 +-
 t/t1410-reflog.sh                  | 10 ++--
 t/t1415-worktree-refs.sh           | 26 +++++------
 t/t1450-fsck.sh                    | 14 +++---
 t/t1500-rev-parse.sh               |  6 +--
 t/t2400-worktree-add.sh            | 51 +++++++++++----------
 t/t2401-worktree-prune.sh          | 20 ++++----
 t/t2402-worktree-list.sh           | 16 ++++---
 t/t2403-worktree-move.sh           | 38 ++++++++--------
 t/t2405-worktree-submodule.sh      | 10 ++--
 t/t2406-worktree-repair.sh         | 93 ++++++++++++++++++++++++--------------
 t/t2407-worktree-heads.sh          | 27 +++++------
 t/t3200-branch.sh                  | 10 ++--
 t/t5304-prune.sh                   |  2 +-
 t/t7412-submodule-absorbgitdirs.sh |  4 +-
 21 files changed, 265 insertions(+), 190 deletions(-)
---
base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
change-id: 20241127-wt_unique_ids-1ffd7ea0bb19
prerequisite-change-id: 20241025-wt_relative_options-afa41987bc32:v6
prerequisite-patch-id: 179410e257e8eedf100f4f9faa9467cbbba4d61b
prerequisite-patch-id: 56ffe0afeadd511c9eef5f548a371659b040acab
prerequisite-patch-id: 809c1314e5dfa966f4f3d73b52f286f8aa89370f
prerequisite-patch-id: cf5f9491c8f8e58d1e0e103a5f8c64c55f2896e3
prerequisite-patch-id: 9884b33822bf4c7c3b89a9a6b49d4ab44c2670e7
prerequisite-patch-id: 62a09496d98d78a6bd1f9150ba887ee72359c7ee
prerequisite-patch-id: 5527e4b745963dd4fa08029491fcbfe3d91d5104
prerequisite-patch-id: bf433443e90939a493fa586de30938f78cb77020

Best regards,

Randall S. Becker Nov. 29, 2024, 10:54 p.m. UTC | #1

On November 29, 2024 5:38 PM, Caleb White writes:
>The `es/worktree-repair-copied` topic added support for repairing a worktree from
>a copy scenario. I noted[1,2] that the topic added the ability for a repository to
>"take over" a worktree from another repository if the worktree_id matched a
>worktree inside the current repository which can happen if two repositories use the
>same worktree name.
>
>This series teaches Git to create worktrees with a unique suffix so that the
>worktree_id is unique across all repositories even if they have the same name. For
>example creating a worktree `develop` would look like:
>
>    foo/
>    ├── .git/worktrees/develop-5445874156/
>    └── develop/
>    bar/
>    ├── .git/worktrees/develop-1549518426/
>    └── develop/
>
>The actual worktree directory name is still `develop`, but the worktree_id is unique
>and prevents the "take over" scenario. The suffix is given by the `git_rand()` function
>which should be sufficient to ensure that the suffix is effectively unique (the
>likelihood of a collision with the same name and suffix is extremely low).
>
>During a `worktree move` the worktree directory is moved/renamed but the
>repository under `worktrees/<id>` is not updated. For example, moving `develop` to
>`master` results in
>
>    foo/
>    ├── .git/worktrees/develop-5445874156/
>    └── master/
>
>This works because the linking files still point to the correct repository, but this is a
>little weird. This series teaches Git to also move/rename the repository / worktree id
>during a `move` so that the structure now looks like:
>
>    foo/
>    ├── .git/worktrees/master-1565465986/
>    └── master/
>
>Note that a new unique suffix is assigned to reduce the complexity of trying to parse
>and reuse the existing suffix.
>
>Additionally, this series teaches `worktree list` to output the worktree id in the
>verbose and porcelain modes, which allows users and scripts to more easily obtain
>the id for a given worktree.
>
>[1]: https://lore.kernel.org/git/20241008153035.71178-1-cdwhite3@pm.me/
>[2]:
>https://lore.kernel.org/git/r4zmcET41Skr_FMop47AKd7cms9E8bKPSvHuAUpnYav
>zKEY6JybJta0_7GfuYB0q-gD-
>XNcvh5VDTfiT3qthGKjqhS1sbT4M2lUABynOz2Q=@pm.me/
>
>Signed-off-by: Caleb White <cdwhite3@pm.me>
>---
>The base for this series is obtained by merging the `cw/worktree-extension` topic
>(2024-11-26, 20241125-wt_relative_options-v5-0-356d122ff3db@pm.me)
>onto 090d24e9af.
>
>Changes in v2:
>- Add the worktree id to `worktree list` output
>- Updated cover letter
>- Link to v1: https://lore.kernel.org/r/20241128-wt_unique_ids-v1-0-
>30345d010e43@pm.me
>
>---
>Caleb White (3):
>      worktree: add worktree with unique suffix
>      worktree: rename worktree id during worktree move
>      worktree: add id to `worktree list` output
>
> Documentation/git-worktree.txt     | 17 +++++--
> builtin/worktree.c                 | 35 ++++++++++++++
> t/t0035-safe-bare-repository.sh    |  4 +-
> t/t0600-reffiles-backend.sh        | 10 ++--
> t/t0601-reffiles-pack-refs.sh      |  4 +-
> t/t0610-reftable-basics.sh         | 54 +++++++++++-----------
> t/t1407-worktree-ref-store.sh      |  4 +-
> t/t1410-reflog.sh                  | 10 ++--
> t/t1415-worktree-refs.sh           | 26 +++++------
> t/t1450-fsck.sh                    | 14 +++---
> t/t1500-rev-parse.sh               |  6 +--
> t/t2400-worktree-add.sh            | 51 +++++++++++----------
> t/t2401-worktree-prune.sh          | 20 ++++----
> t/t2402-worktree-list.sh           | 16 ++++---
> t/t2403-worktree-move.sh           | 38 ++++++++--------
> t/t2405-worktree-submodule.sh      | 10 ++--
> t/t2406-worktree-repair.sh         | 93 ++++++++++++++++++++++++--------------
> t/t2407-worktree-heads.sh          | 27 +++++------
> t/t3200-branch.sh                  | 10 ++--
> t/t5304-prune.sh                   |  2 +-
> t/t7412-submodule-absorbgitdirs.sh |  4 +-
> 21 files changed, 265 insertions(+), 190 deletions(-)
>---
>base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
>change-id: 20241127-wt_unique_ids-1ffd7ea0bb19
>prerequisite-change-id: 20241025-wt_relative_options-afa41987bc32:v6
>prerequisite-patch-id: 179410e257e8eedf100f4f9faa9467cbbba4d61b
>prerequisite-patch-id: 56ffe0afeadd511c9eef5f548a371659b040acab
>prerequisite-patch-id: 809c1314e5dfa966f4f3d73b52f286f8aa89370f
>prerequisite-patch-id: cf5f9491c8f8e58d1e0e103a5f8c64c55f2896e3
>prerequisite-patch-id: 9884b33822bf4c7c3b89a9a6b49d4ab44c2670e7
>prerequisite-patch-id: 62a09496d98d78a6bd1f9150ba887ee72359c7ee
>prerequisite-patch-id: 5527e4b745963dd4fa08029491fcbfe3d91d5104
>prerequisite-patch-id: bf433443e90939a493fa586de30938f78cb77020

General comment on this series: Is there a mechanism of preserving existing
functionality for those of us who have existing scripts that depend on the
existing branch and worktree naming?

Caleb White Nov. 29, 2024, 11:13 p.m. UTC | #2

On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
> General comment on this series: Is there a mechanism of preserving existing
> functionality for those of us who have existing scripts that depend on the
> existing branch and worktree naming?

Existing worktrees will continue to work as they do now. The only change
is the worktree id for new worktrees. However, there's not an option to
preserve the existing behavior for new worktrees (nor do I think there
should be).

As stated in the v1 threads, the worktree id is already not guaranteed
to be equal to the worktree/branch name (there's several ways that this
can occur), so it's buggy behavior for scripts to make this assumption.
Any script that needs the worktree id should be parsing it from the 
`.git` file, `git rev-parse --git-dir`, or (with the changes in this
series) `git worktree list`.

Best,

Caleb

Randall S. Becker Nov. 29, 2024, 11:17 p.m. UTC | #3

On November 29, 2024 6:14 PM, Caleb White writes:
>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>> General comment on this series: Is there a mechanism of preserving
>> existing functionality for those of us who have existing scripts that
>> depend on the existing branch and worktree naming?
>
>Existing worktrees will continue to work as they do now. The only change is the
>worktree id for new worktrees. However, there's not an option to preserve the
>existing behavior for new worktrees (nor do I think there should be).

I do not agree. Companies that have existing scripts should have some way to
preserve their investment. Just saying "No more worktrees for you" is not
really considerate.

>As stated in the v1 threads, the worktree id is already not guaranteed to be equal to
>the worktree/branch name (there's several ways that this can occur), so it's buggy
>behavior for scripts to make this assumption.
>Any script that needs the worktree id should be parsing it from the `.git` file, `git rev-
>parse --git-dir`, or (with the changes in this
>series) `git worktree list`.

I agree, but I think having some kind of notice beyond one release is important, rather
than pulling the rug out from under people.

Just my suggestion that there should be a migration period of this critical function.
--Randall

Caleb White Nov. 29, 2024, 11:29 p.m. UTC | #4

On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
> On November 29, 2024 6:14 PM, Caleb White writes:
>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>> General comment on this series: Is there a mechanism of preserving
>>> existing functionality for those of us who have existing scripts that
>>> depend on the existing branch and worktree naming?
>>
>>Existing worktrees will continue to work as they do now. The only change is the
>>worktree id for new worktrees. However, there's not an option to preserve the
>>existing behavior for new worktrees (nor do I think there should be).
>
> I do not agree. Companies that have existing scripts should have some way to
> preserve their investment. Just saying "No more worktrees for you" is not
> really considerate.

How exactly are your scripts depending on the worktree id? There are
very few reasons a script might need to know the worktree id, and
I suspect that there's some confusion here. The worktree name is still
used with the `git worktree` commands, so there no change on that front.

Best,

Caleb

Randall S. Becker Nov. 29, 2024, 11:44 p.m. UTC | #5

On November 29, 2024 6:29 PM, Caleb White wrote:
>On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
>> On November 29, 2024 6:14 PM, Caleb White writes:
>>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>>> General comment on this series: Is there a mechanism of preserving
>>>> existing functionality for those of us who have existing scripts
>>>> that depend on the existing branch and worktree naming?
>>>
>>>Existing worktrees will continue to work as they do now. The only
>>>change is the worktree id for new worktrees. However, there's not an
>>>option to preserve the existing behavior for new worktrees (nor do I think there
>should be).
>>
>> I do not agree. Companies that have existing scripts should have some
>> way to preserve their investment. Just saying "No more worktrees for
>> you" is not really considerate.
>
>How exactly are your scripts depending on the worktree id? There are very few
>reasons a script might need to know the worktree id, and I suspect that there's
>some confusion here. The worktree name is still used with the `git worktree`
>commands, so there no change on that front.

The graphic describing this showed the id in addition to the worktree name.
During cleanup detection, the directory of the worktree is significant. If that
Observation is wrong, I retract all this.

Caleb White Nov. 30, 2024, 12:08 a.m. UTC | #6

On Fri Nov 29, 2024 at 5:44 PM CST, rsbecker wrote:
> On November 29, 2024 6:29 PM, Caleb White wrote:
>>On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
>>> On November 29, 2024 6:14 PM, Caleb White writes:
>>>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>>>> General comment on this series: Is there a mechanism of preserving
>>>>> existing functionality for those of us who have existing scripts
>>>>> that depend on the existing branch and worktree naming?
>>>>
>>>>Existing worktrees will continue to work as they do now. The only
>>>>change is the worktree id for new worktrees. However, there's not an
>>>>option to preserve the existing behavior for new worktrees (nor do I think there
>>should be).
>>>
>>> I do not agree. Companies that have existing scripts should have some
>>> way to preserve their investment. Just saying "No more worktrees for
>>> you" is not really considerate.
>>
>>How exactly are your scripts depending on the worktree id? There are very few
>>reasons a script might need to know the worktree id, and I suspect that there's
>>some confusion here. The worktree name is still used with the `git worktree`
>>commands, so there no change on that front.
>
> The graphic describing this showed the id in addition to the worktree name.
> During cleanup detection, the directory of the worktree is significant. If that
> Observation is wrong, I retract all this.

So here's the graphic again:

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── develop/

Here, the `develop` directory is the worktree directory (this can be
located anywhere), and the `develop-5445874156` is the worktree id.
However, the worktree id can already be something like `develop1`
or something else entirely if the `develop` directory was renamed in the
past. Again, there are very few things a script should need to know the
worktree id for.

If the `develop` directory is deleted, cleanup detection is handled
by the `git worktree prune` command, which will remove worktrees under
`.git/worktrees/*` that are no longer valid. This happens automatically
after the expiry time or it can be executed manually. Of course,
executing `git worktree remove develop` will also remove the worktree
and its associated worktree id.

Best,

Caleb

Randall S. Becker Nov. 30, 2024, 12:38 a.m. UTC | #7

On November 29, 2024 7:09 PM, Caleb White wrote:
>To: rsbecker@nexbridge.com; git@vger.kernel.org
>Cc: 'shejialuo' <shejialuo@gmail.com>; 'Junio C Hamano' <gitster@pobox.com>
>Subject: Re: [PATCH v2 0/3] Ensure unique worktree ids across repositories
>
>On Fri Nov 29, 2024 at 5:44 PM CST, rsbecker wrote:
>> On November 29, 2024 6:29 PM, Caleb White wrote:
>>>On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
>>>> On November 29, 2024 6:14 PM, Caleb White writes:
>>>>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>>>>> General comment on this series: Is there a mechanism of preserving
>>>>>> existing functionality for those of us who have existing scripts
>>>>>> that depend on the existing branch and worktree naming?
>>>>>
>>>>>Existing worktrees will continue to work as they do now. The only
>>>>>change is the worktree id for new worktrees. However, there's not an
>>>>>option to preserve the existing behavior for new worktrees (nor do I
>>>>>think there
>>>should be).
>>>>
>>>> I do not agree. Companies that have existing scripts should have
>>>> some way to preserve their investment. Just saying "No more
>>>> worktrees for you" is not really considerate.
>>>
>>>How exactly are your scripts depending on the worktree id? There are
>>>very few reasons a script might need to know the worktree id, and I
>>>suspect that there's some confusion here. The worktree name is still
>>>used with the `git worktree` commands, so there no change on that front.
>>
>> The graphic describing this showed the id in addition to the worktree name.
>> During cleanup detection, the directory of the worktree is
>> significant. If that Observation is wrong, I retract all this.
>
>So here's the graphic again:
>
>    foo/
>    ├── .git/worktrees/develop-5445874156/
>    └── develop/
>
>Here, the `develop` directory is the worktree directory (this can be located
>anywhere), and the `develop-5445874156` is the worktree id.
>However, the worktree id can already be something like `develop1` or something
>else entirely if the `develop` directory was renamed in the past. Again, there are very
>few things a script should need to know the worktree id for.
>
>If the `develop` directory is deleted, cleanup detection is handled by the `git
>worktree prune` command, which will remove worktrees under `.git/worktrees/*`
>that are no longer valid. This happens automatically after the expiry time or it can be
>executed manually. Of course, executing `git worktree remove develop` will also
>remove the worktree and its associated worktree id.

This last bit is an assumption, and not necessarily valid. Scripts that use worktrees
may maintain lists or their own pointers. It is important to be able to emulate
cleanup functions - something I discovered early in the worktree functions
when released. I need to make sure that cleanup will continue to have enough
information - prior to git worktree cleanup - to function correctly. This will
need coordination with people who have such scripts in my community. It
probably will not impact you, but I would have appreciated more than one
release notice on this capability.

Caleb White Nov. 30, 2024, 4:08 p.m. UTC | #8

On Fri Nov 29, 2024 at 6:38 PM CST, rsbecker wrote:
> On November 29, 2024 7:09 PM, Caleb White wrote:
>>If the `develop` directory is deleted, cleanup detection is handled by the `git
>>worktree prune` command, which will remove worktrees under `.git/worktrees/*`
>>that are no longer valid. This happens automatically after the expiry time or it can be
>>executed manually. Of course, executing `git worktree remove develop` will also
>>remove the worktree and its associated worktree id.
>
> This last bit is an assumption, and not necessarily valid. Scripts that use worktrees
> may maintain lists or their own pointers. It is important to be able to emulate
> cleanup functions - something I discovered early in the worktree functions
> when released. I need to make sure that cleanup will continue to have enough
> information - prior to git worktree cleanup - to function correctly. This will
> need coordination with people who have such scripts in my community. It
> probably will not impact you, but I would have appreciated more than one
> release notice on this capability.

I'm not sure I understand the specific use-case you're talking about.
Could you provide an example?

However, I suppose I can add a config / env variable to be able to
disable this new functionality.

Best,

Caleb

Randall S. Becker Nov. 30, 2024, 5:16 p.m. UTC | #9

On November 30, 2024 11:08 AM, Caleb White wrote:
>On Fri Nov 29, 2024 at 6:38 PM CST, rsbecker wrote:
>> On November 29, 2024 7:09 PM, Caleb White wrote:
>>>If the `develop` directory is deleted, cleanup detection is handled by
>>>the `git worktree prune` command, which will remove worktrees under
>>>`.git/worktrees/*` that are no longer valid. This happens
>>>automatically after the expiry time or it can be executed manually. Of
>>>course, executing `git worktree remove develop` will also remove the worktree
>and its associated worktree id.
>>
>> This last bit is an assumption, and not necessarily valid. Scripts
>> that use worktrees may maintain lists or their own pointers. It is
>> important to be able to emulate cleanup functions - something I
>> discovered early in the worktree functions when released. I need to
>> make sure that cleanup will continue to have enough information -
>> prior to git worktree cleanup - to function correctly. This will need
>> coordination with people who have such scripts in my community. It
>> probably will not impact you, but I would have appreciated more than one release
>notice on this capability.
>
>I'm not sure I understand the specific use-case you're talking about.
>Could you provide an example?

Speaking as a professional product manager...

I'm not expressing "maintaining compatibility for 2 releases" or something like
that is a reasonable use case. There are customers who depend on things
working in a particular way. It is fine if you want to change it and improve it,
and I am supportive. However, when making a change that causes git to
behave differently without allowing people to plan for such a change is
impolite. People outside this list do not read each patch looking for
compatibility breaking changes - they only get told in release notes. A
statement like "this is going to change with 2.49" for a breaking
enhancement is what I would expect - unless it is a defect correction.

>However, I suppose I can add a config / env variable to be able to disable this new
>functionality.

That would be very helpful although an opt-in is generally better than an
opt-out.

I think we should have this as a general policy, not just for this series.

Thanks,
Randall

Junio C Hamano Dec. 2, 2024, 2 a.m. UTC | #10

Caleb White <cdwhite3@pm.me> writes:

> - Add the worktree id to `worktree list` output

I have always thought that we deliberately hid the "ID" from the end
user's view because it is a mere implementation detail, and used the
filesystem path of the worktree directory instead to identify each
instance of the worktree in the end-user interaction.  It is unclear
why this change is a good idea from that point of view.

Thanks.

shejialuo Dec. 2, 2024, 11:46 a.m. UTC | #11

On Mon, Dec 02, 2024 at 11:00:51AM +0900, Junio C Hamano wrote:
> Caleb White <cdwhite3@pm.me> writes:
> 
> > - Add the worktree id to `worktree list` output
> 
> I have always thought that we deliberately hid the "ID" from the end
> user's view because it is a mere implementation detail, and used the
> filesystem path of the worktree directory instead to identify each
> instance of the worktree in the end-user interaction.  It is unclear
> why this change is a good idea from that point of view.
> 

I have a discussion with Caleb in the first version. Because appending a
hash / random number will cause the worktree id has more digits, the
user cannot easily use the following commands to make a ref in the main
worktree point to linked worktree ref:

    ```sh
    git symbolic-ref refs/heads/foo \
        worktrees/<worktree id>/refs/worktree/foo
    ```

And I expressed my concern about above situation, the user types the
extra hash to do above. So, Caleb decides to list the worktree id.

Actually, this usage should not be common. But when implementing the
consistency check for files backend, you have told me that the above
situation could happen. And you have said we _deliberately_ hide the
"ID" from the end user's view.

But cross-ref operations between worktrees must explicitly specify the
worktree id, so I am wondering whether we should allow the user do
cross-ref operations in the first place:

  1. main worktree symref points to linked worktree ref.
  2. A linked worktree ref points to another linked worktree ref.

Thanks,
Jialuo

Junio C Hamano Dec. 3, 2024, 12:46 a.m. UTC | #12

shejialuo <shejialuo@gmail.com> writes:

> But cross-ref operations between worktrees must explicitly specify the
> worktree id, so I am wondering whether we should allow the user do
> cross-ref operations in the first place:
>
>   1. main worktree symref points to linked worktree ref.
>   2. A linked worktree ref points to another linked worktree ref.

What is a cross-ref operation?  A worktree is either the primary
working tree for a (non-bare) repository, or something added with
"git add worktree" (i.e. whose .git is not the repository but a
link file into the real repository).  Are you adding another mode
where a worktree points at another worktree and not the repository?

Eric Sunshine Dec. 3, 2024, 12:56 a.m. UTC | #13

On Mon, Dec 2, 2024 at 7:46 PM Junio C Hamano <gitster@pobox.com> wrote:
> shejialuo <shejialuo@gmail.com> writes:
> > But cross-ref operations between worktrees must explicitly specify the
> > worktree id, so I am wondering whether we should allow the user do
> > cross-ref operations in the first place:
> >
> >   1. main worktree symref points to linked worktree ref.
> >   2. A linked worktree ref points to another linked worktree ref.
>
> What is a cross-ref operation?  A worktree is either the primary
> working tree for a (non-bare) repository, or something added with
> "git add worktree" (i.e. whose .git is not the repository but a
> link file into the real repository).  Are you adding another mode
> where a worktree points at another worktree and not the repository?

Unfortunately, I haven't had time lately to follow all these
worktree-related changes or discussions, but perhaps shejialuo is
referring to the ability, from within one worktree, to mention a ref
from another worktree (which is a feature Duy added some time after
the initial worktree implementation). Specifically, from the
git-worktree man page:

    Refs that are per-worktree can still be accessed from another
    worktree via two special paths, main-worktree and worktrees. The
    former gives access to per-worktree refs of the main worktree,
    while the latter to all linked worktrees.

    For example, main-worktree/HEAD or main-worktree/refs/bisect/good
    resolve to the same value as the main worktree’s HEAD and
    refs/bisect/good respectively. Similarly, worktrees/foo/HEAD or
    worktrees/bar/refs/bisect/bad are the same as
    $GIT_COMMON_DIR/worktrees/foo/HEAD and
    $GIT_COMMON_DIR/worktrees/bar/refs/bisect/bad.

shejialuo Dec. 3, 2024, 1:24 a.m. UTC | #14

On Tue, Dec 03, 2024 at 09:46:12AM +0900, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > But cross-ref operations between worktrees must explicitly specify the
> > worktree id, so I am wondering whether we should allow the user do
> > cross-ref operations in the first place:
> >
> >   1. main worktree symref points to linked worktree ref.
> >   2. A linked worktree ref points to another linked worktree ref.
> 
> What is a cross-ref operation?  A worktree is either the primary
> working tree for a (non-bare) repository, or something added with
> "git add worktree" (i.e. whose .git is not the repository but a
> link file into the real repository).  Are you adding another mode
> where a worktree points at another worktree and not the repository?

I am sorry that my words may confuse you here. And the Eric has already
explained what I mean here. At current, we have the ability to mention
a ref another worktree within the current worktree. You also have tole
me that in [1], there is a possibility that the user could create a
symbolic link to some worktree-specific ref in another worktree.

So, if our intention is to deliberately hide the worktree id. Why we
allow such ability? Let me give an example.

If we are in the main-worktree, and we want to access the worktree
specified refs, we must specify the worktree id like the following
(also if we are in the linked-worktree, we want to access another
linked-worktree refs):

    worktrees/<worktree id>/refs/worktree/foo

We do not want to the user know the worktree id. However, we allow
above. This is something I feel really strange during the review
process. To the front-end user, the worktree path is the interface.
However, for above ability, we need the user to explicitly specify the
worktree id.

From my perspective, this is not a good design which is against our
design.

[1] https://lore.kernel.org/git/xmqq5xqn8w6r.fsf@gitster.g/

Junio C Hamano Dec. 3, 2024, 1:46 a.m. UTC | #15

Eric Sunshine <sunshine@sunshineco.com> writes:

> Unfortunately, I haven't had time lately to follow all these
> worktree-related changes or discussions, but perhaps shejialuo is
> referring to the ability, from within one worktree, to mention a ref
> from another worktree (which is a feature Duy added some time after
> the initial worktree implementation).

Ah, yes, that exposes (and has to expose) the worktree ID.  It still
does not have to be unique across repositories (only has to unique
among the worktrees that share the same single repository).

Thanks.

Randall S. Becker Dec. 3, 2024, 1:53 a.m. UTC | #16

On December 2, 2024 8:46 PM, Junio C Hamano wrote:
>Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> Unfortunately, I haven't had time lately to follow all these
>> worktree-related changes or discussions, but perhaps shejialuo is
>> referring to the ability, from within one worktree, to mention a ref
>> from another worktree (which is a feature Duy added some time after
>> the initial worktree implementation).
>
>Ah, yes, that exposes (and has to expose) the worktree ID.  It still does
not have to
>be unique across repositories (only has to unique among the worktrees that
share
>the same single repository).

I might be mistaken, but I think the intent of the worktree series being
discussed
deliberately wanted the worktree ID to be globally unique on a specific
machine.
I might be wrong (hoping I am). The original author should comment on this.

Junio C Hamano Dec. 3, 2024, 2:30 a.m. UTC | #17

<rsbecker@nexbridge.com> writes:

>>Ah, yes, that exposes (and has to expose) the worktree ID.  It still does
> not have to
>>be unique across repositories (only has to unique among the worktrees that
> share
>>the same single repository).
>
> I might be mistaken, but I think the intent of the worktree series being
> discussed
> deliberately wanted the worktree ID to be globally unique on a specific
> machine.

That is my understanding, but I do not understand why such a
uniqueness is needed.  Repositories are not even aware of other
repositories, in any sense to make it matter to know worktree IDs
other repositories are using.  Until there is an attempt to link a
worktree that used to belong to a repository to a different
repository, that is.  At that time, names must be made unique among
worktrees that belong to the adopting repository, of course, but the
names used in the original repository for its worktrees would not
matter at that point, I would think.

> I might be wrong (hoping I am). The original author should comment on this.

Sure.  Thanks for commenting.

Caleb White Dec. 3, 2024, 3:42 a.m. UTC | #18

On Mon Dec 2, 2024 at 8:30 PM CST, Junio C Hamano wrote:
> <rsbecker@nexbridge.com> writes:
>
>>>Ah, yes, that exposes (and has to expose) the worktree ID.  It still does
>> not have to
>>>be unique across repositories (only has to unique among the worktrees that
>> share
>>>the same single repository).
>>
>> I might be mistaken, but I think the intent of the worktree series being
>> discussed
>> deliberately wanted the worktree ID to be globally unique on a specific
>> machine.
>
> That is my understanding, but I do not understand why such a
> uniqueness is needed.  Repositories are not even aware of other
> repositories, in any sense to make it matter to know worktree IDs
> other repositories are using.  Until there is an attempt to link a
> worktree that used to belong to a repository to a different
> repository, that is.  At that time, names must be made unique among
> worktrees that belong to the adopting repository, of course, but the
> names used in the original repository for its worktrees would not
> matter at that point, I would think.

Perhaps I should've have come up with a better series name, I think
there's been a lot of hang-up with the term "unique". When I refer to
uniqueness in this context, I'm not advocating for strict, absolute
uniqueness in the sense of ensuring no collisions under any conceivable
circumstance, or requiring that repositories are now aware of other
repositories. Instead, I'm discussing uniqueness from a practical
perspective: the combination of a random 32-bit integer from a CSPRNG
with a worktree name should be "unique" for all intents and purposes.
The theoretical risk of a collision does exist, of course, but the
probability is astronomically lower than the current approach, rendering
it effectively "unique" in practice.

You're correct in that the worktree ids are only relevant within the
context of a single repository. However, I've already demonstrated that
it's possible for a repository to "repair" (i.e., take over) a worktree
belonging to another repository if the ids match (inferred backlink).
In my experience, there's some pretty common names for worktrees (e.g.,
"main", "master", "develop", "hotfix", etc.), and it's not uncommon for
multiple repositories to have worktrees with the same name. This can be
avoided entirely by introducing some randomness into the worktree id and
significantly reducing the probability of a collision (e.g., one
repository would have a `develop-54678976` id while another would have
a `develop-987465246` id), which is the primary motivation behind this
series.

As I've mentioned earlier, the concept of a suffix is not new and should
not be a breaking change. It's already possible to have worktrees with
a different id from the public worktree directory name, so users and
scripts should not just assume them to be the same (this is buggy
behavior), but instead should be querying the worktree id from the `.git`
file or `git rev-parse --git-dir` if they really need it (very rare).
As part of this series I did add the worktree id to the `worktree list`
output to make it easier for scripts to query this information if they
do need it.

Perhaps this "take-over" scenario is not that big of a concern in
practice, I just noted that this behavior was made possible in the
`es/worktree-repair-copied` topic and I thought it was worth addressing.
If it's decided that this is not that big of a concern, then I suppose
this series can be dropped (although I've made some other QoL
improvements that may be useful).

Best,

Caleb

Junio C Hamano Dec. 3, 2024, 4:37 a.m. UTC | #19

Caleb White <cdwhite3@pm.me> writes:

> You're correct in that the worktree ids are only relevant within the
> context of a single repository. However, I've already demonstrated that
> it's possible for a repository to "repair" (i.e., take over) a worktree
> belonging to another repository if the ids match (inferred backlink).

I know.  But isn't that a BUG in the code that "repair"s?  If a
worktree had a name 'develop' that was OK in the context of
repository X, and when you "repair" things so that it becomes one of
the worktrees of a different repository Y, the "repair" operation is
what MUST make sure that the worktree that used to be known as
'develop' to repository X does not interfere any existing worktrees
that is attached to the repository Y.  If the repository Y already
had a worktree called 'develop', the "repair" operation must make
sure that the newly adopted worktree would get a different name.

But then, the concern is exactly the same when you try to create a
new worktree (no "repair" involved) in repository Y and try to give
it a name 'develop', isn't it?  You have to make sure that there is
no worktree that is called 'develop' in the repository Y before
giving it the name.  Is it broken?  If not, what are we doing to
make sure we won't give the name 'develop' to the new worktree?
Certainly we do not use any hash or random number for that, so why
does this new series need to use a random number?

Caleb White Dec. 3, 2024, 5:31 a.m. UTC | #20

On Mon Dec 2, 2024 at 10:37 PM CST, Junio C Hamano wrote:
> Caleb White <cdwhite3@pm.me> writes:
>
>> You're correct in that the worktree ids are only relevant within the
>> context of a single repository. However, I've already demonstrated that
>> it's possible for a repository to "repair" (i.e., take over) a worktree
>> belonging to another repository if the ids match (inferred backlink).
>
> I know.  But isn't that a BUG in the code that "repair"s?  If a
> worktree had a name 'develop' that was OK in the context of
> repository X, and when you "repair" things so that it becomes one of
> the worktrees of a different repository Y, the "repair" operation is
> what MUST make sure that the worktree that used to be known as
> 'develop' to repository X does not interfere any existing worktrees
> that is attached to the repository Y.  If the repository Y already
> had a worktree called 'develop', the "repair" operation must make
> sure that the newly adopted worktree would get a different name.

No, this is incorrect---there should be no reason to "repair" a worktree
from another repository in the first place. That would be undefined
behavior and is indicative of a user mistake or unintentional repair
(e.g., it would make no sense for the php-src repository to repair
a worktree from this git repository). The repair operation is only
intended to be used with worktrees of the same repository.

> But then, the concern is exactly the same when you try to create a
> new worktree (no "repair" involved) in repository Y and try to give
> it a name 'develop', isn't it?  You have to make sure that there is
> no worktree that is called 'develop' in the repository Y before
> giving it the name.  Is it broken?  If not, what are we doing to
> make sure we won't give the name 'develop' to the new worktree?
> Certainly we do not use any hash or random number for that, so why
> does this new series need to use a random number?

We currently suffix an auto incrementing number in this use case, so 
you can have two `develop` worktrees (located in different directories
of course), and one will have an id of `develop` and the other will have
an id of `develop1`.

Best,

[v2,0/3] Ensure unique worktree ids across repositories

Message

Comments