mbox series

[v2,0/5] Sparse checkout: fix bug with worktree of bare repo

Message ID pull.1101.v2.git.1640114048.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse checkout: fix bug with worktree of bare repo | expand

Message

Philippe Blain via GitGitGadget Dec. 21, 2021, 7:14 p.m. UTC
This patch series includes a fix to the bug reported by Sean Allred [1] and
diagnosed by Eric Sunshine [2].

The root cause is that 'git sparse-checkout init' writes to the worktree
config without checking that core.bare might need to be set. This only
matters when the base repository is bare, since creating the config.worktree
file and enabling extensions.worktreeConfig will cause Git to treat the base
repo's core.bare=false as important for this worktree.

This series fixes this, but also puts in place some helpers to prevent this
from happening in the future. While here, some of the config paths are
modified to take a repository struct.

The critical bits are in Patches 3, 4, and 5 which introduce a helper for
upgrading to worktree config, a helper to write to worktree config, and then
consume that config helper in builtin/sparse-checkout.c and sparse-index.c.

[1]
https://lore.kernel.org/git/CABceR4bZmtC4rCwgxZ1BBYZP69VOUca1f_moJoP989vTUZWu9Q@mail.gmail.com/
[2]
https://lore.kernel.org/git/CAPig+cQ6U_yFw-X2OWrizB1rbCvc4bNxuSzKFzmoLNnm0GH8Eg@mail.gmail.com/


Update in v2
============

 * Eric correctly pointed out that I was writing core.bare incorrectly. It
   should move out of the core config and into the core repository's
   worktree config.
 * Patch 3 is new, separating the "upgrade" logic out of config.c, but it is
   still called by the config helper to make it painless to write worktree
   config.

Thanks, -Stolee

Derrick Stolee (5):
  setup: use a repository when upgrading format
  config: make some helpers repo-aware
  worktree: add upgrade_to_worktree_config()
  config: add repo_config_set_worktree_gently()
  sparse-checkout: use repo_config_set_worktree_gently()

 builtin/sparse-checkout.c          | 25 +++++-----------
 config.c                           | 39 +++++++++++++++++++++++--
 config.h                           | 14 +++++++++
 list-objects-filter-options.c      |  2 +-
 repository.h                       |  2 +-
 setup.c                            |  6 ++--
 sparse-index.c                     | 10 ++-----
 t/t1091-sparse-checkout-builtin.sh | 16 +++++++++-
 worktree.c                         | 47 ++++++++++++++++++++++++++++++
 worktree.h                         | 12 ++++++++
 10 files changed, 140 insertions(+), 33 deletions(-)


base-commit: 69a9c10c95e28df457e33b3c7400b16caf2e2962
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1101%2Fderrickstolee%2Fsparse-checkout%2Fbare-worktree-bug-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1101/derrickstolee/sparse-checkout/bare-worktree-bug-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1101

Range-diff vs v1:

 1:  28813703ff6 = 1:  889e69dc45d setup: use a repository when upgrading format
 2:  3b549770eb9 = 2:  3e01356815a config: make some helpers repo-aware
 -:  ----------- > 3:  ed8e2a7b19d worktree: add upgrade_to_worktree_config()
 3:  67993f6cff2 ! 4:  22896e9bb04 config: add repo_config_set_worktree_gently()
     @@ Metadata
       ## Commit message ##
          config: add repo_config_set_worktree_gently()
      
     -    When adding config values to the worktree config, we might enable the
     -    extensions.worktreeConfig setting and create the config.worktree file
     -    for the first time. When the base repository is bare, this creates a
     -    change of behavior for determining if the worktree is bare or not. A
     -    worktree off of a bare repository is assumed to be non-bare when
     -    extensions.worktreeConfig is disabled. When extensions.worktreeConfig is
     -    enabled but config.worktree is empty, the worktree is considered bare
     -    because the base repo's core.bare=true setting is used.
     +    The previous change added upgrade_to_worktree_config() to assist
     +    creating a worktree-specific config for the first time. However, this
     +    requires every config writer to care about that upgrade before writing
     +    to the worktree-specific config. In addition, callers need to know how
     +    to generate the name of the config.worktree file and pass it to the
     +    config API.
      
     -    To avoid issues like this, create a helper that initializes all the
     -    right settings in the correct order. A caller will be added in the next
     -    change.
     +    To assist, create a new repo_config_set_worktree_gently() method in the
     +    config API that handles the upgrade_to_worktree_config() method in
     +    addition to assigning the value in the worktree-specific config. This
     +    will be consumed by an upcoming change.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## config.c ##
     +@@
     + #include "dir.h"
     + #include "color.h"
     + #include "refs.h"
     ++#include "worktree.h"
     + 
     + struct config_source {
     + 	struct config_source *prev;
      @@ config.c: int git_config_set_gently(const char *key, const char *value)
       	return git_config_set_multivar_gently(key, value, NULL, 0);
       }
     @@ config.c: int git_config_set_gently(const char *key, const char *value)
      +int repo_config_set_worktree_gently(struct repository *r,
      +				    const char *key, const char *value)
      +{
     -+	int res;
     -+	const char *config_filename = repo_git_path(r, "config.worktree");
     -+
     -+	/*
     -+	 * Ensure that core.bare reflects the current worktree, since the
     -+	 * logic for is_bare_repository() changes if extensions.worktreeConfig
     -+	 * is disabled.
     -+	 */
     -+	if ((res = git_config_set_multivar_in_file_gently(config_filename, "core.bare",
     -+							  r->worktree ? "false" : "true",
     -+							  NULL, 0))) {
     -+		error(_("unable to set core.bare setting in worktree config"));
     -+		return res;
     -+	}
     -+	if (upgrade_repository_format(r, 1) < 0)
     -+		return error(_("unable to upgrade repository format to enable worktreeConfig"));
     -+	if ((res = git_config_set_gently("extensions.worktreeConfig", "true"))) {
     -+		error(_("failed to set extensions.worktreeConfig setting"));
     -+		return res;
     -+	}
     -+
     -+	return git_config_set_multivar_in_file_gently(config_filename, key, value, NULL, 0);
     ++	return upgrade_to_worktree_config(r) ||
     ++	       git_config_set_multivar_in_file_gently(
     ++			 repo_git_path(r, "config.worktree"),
     ++			 key, value, NULL, 0);
      +}
      +
       void git_config_set(const char *key, const char *value)
     @@ config.h: void git_config_set_in_file(const char *, const char *, const char *);
       
      +/**
      + * Write a config value into the config.worktree file for the current
     -+ * worktree. This will initialize extensions.worktreeConfig if necessary.
     ++ * worktree. This will initialize extensions.worktreeConfig if necessary,
     ++ * which might trigger some changes to the root repository's config file.
      + */
      +int repo_config_set_worktree_gently(struct repository *, const char *, const char *);
      +
 4:  6202f767f4a ! 5:  06457fafa78 sparse-checkout: use repo_config_set_worktree_gently()
     @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'git sparse-checkout ini
      +	(
      +		cd worktree &&
      +		git sparse-checkout init &&
     -+		test_cmp_config false core.bare &&
     ++		test_must_fail git config core.bare &&
      +		git sparse-checkout set /*
     -+	)
     ++	) &&
     ++	git -C bare config --list --show-origin >actual &&
     ++	grep "file:config.worktree	core.bare=true" actual
      +'
      +
       test_expect_success 'git sparse-checkout list after init' '

Comments

Eric Sunshine Dec. 22, 2021, 6:05 a.m. UTC | #1
On Tue, Dec 21, 2021 at 2:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> The root cause is that 'git sparse-checkout init' writes to the worktree
> config without checking that core.bare might need to be set. This only
> matters when the base repository is bare, since creating the config.worktree
> file and enabling extensions.worktreeConfig will cause Git to treat the base
> repo's core.bare=false as important for this worktree.

I'm having trouble understanding what this is trying to say. Did you
mean "true" rather than "false" in the final sentence? Anyhow, I think
this description is somewhat stale. A more succinct way to describe
the issue is that `git sparse-checkout init` wasn't correctly
upgrading the repo to per-worktree configuration, with the result that
the `core.bare=true` setting of a bare repo bled into
worktree-specific configuration, which caused a bit of havoc. This
patch series fixes `init` to upgrade the repo properly.

> The critical bits are in Patches 3, 4, and 5 which introduce a helper for
> upgrading to worktree config, a helper to write to worktree config, and then
> consume that config helper in builtin/sparse-checkout.c and sparse-index.c.
>
> Update in v2
> ============
>
>  * Eric correctly pointed out that I was writing core.bare incorrectly. It
>    should move out of the core config and into the core repository's
>    worktree config.
>  * Patch 3 is new, separating the "upgrade" logic out of config.c, but it is
>    still called by the config helper to make it painless to write worktree
>    config.

It's good to see the "upgrade to per-worktree config" split out to a
standalone single-purpose utility function. I left several review
comments in that patch, the most important of which is that the
implementation is incomplete (because it doesn't relocate
`core.worktree`), thus it can leave the repo in an inconsistent and
broken state. Several of the other review comments are actionable, as
well.

I'm still concerned about and uncomfortable with the implementation of
the new repo_config_set_worktree_gently(), but I've left ample
comments about that elsewhere in this discussion, so I needn't go into
it here.

Thanks for working on this.
Elijah Newren Dec. 22, 2021, 10:54 p.m. UTC | #2
On Wed, Dec 22, 2021 at 8:00 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This patch series includes a fix to the bug reported by Sean Allred [1] and
> diagnosed by Eric Sunshine [2].
>
> The root cause is that 'git sparse-checkout init' writes to the worktree
> config without checking that core.bare might need to be set. This only
> matters when the base repository is bare, since creating the config.worktree
> file and enabling extensions.worktreeConfig will cause Git to treat the base
> repo's core.bare=false as important for this worktree.
>
> This series fixes this, but also puts in place some helpers to prevent this
> from happening in the future. While here, some of the config paths are
> modified to take a repository struct.
>
> The critical bits are in Patches 3, 4, and 5 which introduce a helper for
> upgrading to worktree config, a helper to write to worktree config, and then
> consume that config helper in builtin/sparse-checkout.c and sparse-index.c.

Based on the description I went and fetched the patch series and tried it out.

This feels like a bandaid to me.  In addition to fixating on core.bare
(thus overlooking core.worktree), it also overlooks that people can
use worktrees without using sparse-checkout.  What if they do
something like:

  git clone --bare $URL myrepo
  cd myrepo
  git worktree add foo
  git worktree add bar
  git worktree add baz
  ... days/weeks later ...
  cd foo
  git config extensions.worktreeConfig true
  git config status.showUntrackedFiles no  # Or other config options
  ... hours/days later ..
  cd ../bar
  git status

At this point the user gets "fatal: this operation must be run in a
work tree".  And it's much, much worse if the original clone was not
bare, but had core.worktree pointing somewhere else (because then the
`git status` invocation will show the differences between the *other*
worktree with the HEAD of *this* worktree).  I think that "git
worktree add" should check if either core.bare is false or
core.worktree is set, and if so then set extensions.worktreeConfig and
migrate the relevant config.

While there may be some concern about non-git clients not
understanding extensions.worktreeConfig, I'd say that this type of
situation is one where we are just flat broken without it.  Without
it, we're stuck in a situation that will: (a) confuse users ("Why does
core.bare/core.worktree seem to get ignored or have incorrect
values?"), (b) confuse non-git clients (are they really going to have
the tricky logic to overrule core.bare/core.worktree when in another
worktree?), (c) confuse git itself after subsequent configuration
tweaks, and (d) (related to item c) lead to ever more complicated
logic in core git to attempt to know when and how to overrule
core.bare and core.worktree as additional concerns enter the picture
(e.g. will someone contribute code to override core.bare/core.worktree
when run from a worktree with extensions.worktreeConfig=true, just as
someone originally wrote code to override core.bare/core.worktree when
the extensions.worktreeConfig setting wasn't present?)


I also think `git worktree add` should handle additional configuration
items related to sparse checkouts (as we've discussed elsewhere in the
past), but that's going a bit outside the scope of this series; I only
mention it so that we're aware the functionality added to `git
worktree add` will be getting some additions in the future.
Eric Sunshine Dec. 27, 2021, 7:15 a.m. UTC | #3
On Wed, Dec 22, 2021 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> On Wed, Dec 22, 2021 at 8:00 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > This patch series includes a fix to the bug reported by Sean Allred [1] and
> > diagnosed by Eric Sunshine [2].
>
> This feels like a bandaid to me.  In addition to fixating on core.bare
> (thus overlooking core.worktree), it also overlooks that people can
> use worktrees without using sparse-checkout.  What if they do
> something like:
>
>   git clone --bare $URL myrepo
>   cd myrepo
>   git worktree add foo
>   git worktree add bar
>   git worktree add baz
>   ... days/weeks later ...
>   cd foo
>   git config extensions.worktreeConfig true
>   git config status.showUntrackedFiles no  # Or other config options
>   ... hours/days later ..
>   cd ../bar
>   git status
>
> At this point the user gets "fatal: this operation must be run in a
> work tree".

Your example indeed leads to a broken state because it doesn't follow
the instructions given by git-worktree.txt for enabling
`extensions.worktreeConfig`, which involves additional bookkeeping
operations beyond merely setting that config variable. It is exactly
this sort of situation which prompted me to suggest several
times[1,2,3] in the conversation following my diagnosis of the
problem, as well as in my reviews of this series, that we may want to
add a git-worktree subcommand which does all the necessary bookkeeping
to enable `extensions.worktreeConfig` rather than expecting users to
handle it all manually. In [1], I called this hypothetical command
`git worktree manage --enable-worktree-config ` and in [4], I called
it `git worktree config --enable-per-worktree` (not because I like
either name, but because I couldn't think of anything better).

> I think that "git
> worktree add" should check if either core.bare is false or
> core.worktree is set, and if so then set extensions.worktreeConfig and
> migrate the relevant config.

(I think you meant "...if either core.bare is _true_ or...".)

Similar to my response to Sean in [1] and to Stolee in [2], while this
may help the situation for worktrees created _after_
`extensions.worktreeConfig` is enabled, it does _not_ help existing
worktrees at all. For this reason, in my opinion, `git worktree add`
is simply not the correct place to be addressing this problem, and
it's why I suggested a separate command for enabling the feature and
doing all the necessary bookkeeping. It's also why I suggested[2] that
in the long run, we may want per-worktree config to be the default
(and only) behavior rather than the current (legacy) behavior of all
config being shared between worktrees.

Aside from that, I'm uncomfortable with the suggestion that `git
worktree add` should be responsible for making these sort of dramatic
changes (upgrading to version=1 and enabling
`extensions.worktreeConfig`) to the repository automatically. That
seems very much out of scope for what this command should be doing. On
the other hand, I would have no problem with `git worktree add`
protecting users by detecting whether `core.bare=true` or
`core.worktree` is set in the shared .git/config file and aborting
with an error if so, and giving a "HINT" telling the user to enable
per-worktree config via the (hypothetical) `git worktree config
--enable-per-worktree` command.

Regarding your feeling that this patch series is a "band-aid", while I
agree with you that we ultimately need a better approach, such as the
hypothetical `git worktree config --enable-per-worktree` (or
eventually making per-worktree config be the default), that better
solution does not need to be implemented today, and certainly
shouldn't derail _this_ patch series which is aimed at fixing a very
real bug which exists presently in `git sparse-checkout init`. This
patch series does need a good number of improvements and fixes before
it is ready -- as indicated by my v2 review comments[4,5,6], the most
obvious of which is its missing handling of `core.worktree` -- but I
do think this series is headed in the correct direction by focusing on
fixing the immediate problem with `git sparse-checkout init` (and
paving the way for an eventual more complete solution, such as `git
worktree config --enable-per-worktree `).

[1]: https://lore.kernel.org/git/CAPig+cQ6U_yFw-X2OWrizB1rbCvc4bNxuSzKFzmoLNnm0GH8Eg@mail.gmail.com/
[2]: https://lore.kernel.org/git/CAPig+cQPUe9REf+wgVNjyak_nk3V361h-48rTFgk6TGC7vJgOA@mail.gmail.com/
[3]: https://lore.kernel.org/git/CAPig+cRombN-8g0t7Hs9qQypJoY41gK3+kvypH4D0G6EB4JgbQ@mail.gmail.com/
[4]: https://lore.kernel.org/git/CAPig+cQrJ9yWjkc8VMu=uyx_qtrXdL3cNnxLVafoxOo6e-r9kw@mail.gmail.com/
[5]: https://lore.kernel.org/git/CAPig+cRi2SA6+poaemY8XR5ZoMweuztfiENpcRVOCnukg3Qa7w@mail.gmail.com/
[6]: https://lore.kernel.org/git/CAPig+cRuY40RNi4bC3CBfghLLqz74VUPRbaYJYEhmF78b0GfPQ@mail.gmail.com/#t

> I also think `git worktree add` should handle additional configuration
> items related to sparse checkouts (as we've discussed elsewhere in the
> past), but that's going a bit outside the scope of this series; I only
> mention it so that we're aware the functionality added to `git
> worktree add` will be getting some additions in the future.

I vaguely recall some mention of this not long ago on the list but
didn't follow the discussion at all. Do you have pointers or a
summary?
Eric Sunshine Dec. 27, 2021, 7:34 a.m. UTC | #4
On Mon, Dec 27, 2021 at 2:15 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Wed, Dec 22, 2021 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> > I think that "git
> > worktree add" should check if either core.bare is false or
> > core.worktree is set, and if so then set extensions.worktreeConfig and
> > migrate the relevant config.
>
> Similar to my response to Sean in [1] and to Stolee in [2], while this
> may help the situation for worktrees created _after_
> `extensions.worktreeConfig` is enabled, it does _not_ help existing
> worktrees at all. For this reason, in my opinion, `git worktree add`
> is simply not the correct place to be addressing this problem, and
> it's why I suggested a separate command for enabling the feature and
> doing all the necessary bookkeeping. It's also why I suggested[2] that
> in the long run, we may want per-worktree config to be the default
> (and only) behavior rather than the current (legacy) behavior of all
> config being shared between worktrees.

Thinking upon it further, I take back what I said about it not helping
existing worktrees.

Your proposal is _almost_ the same as my suggestion of eventually
making per-worktree config the default. The difference is that you're
only making it the default if `core.bare=true` or `core.worktree` is
set. But do we need that distinction? If people are comfortable with
that, then are they comfortable with simply flipping the switch and
making per-worktree config the default today regardless of `core.bare`
and `core.worktree`?

I'm not sure that I'm comfortable with it due to the potential of
breaking older versions of git which don't understand
`extensions.worktreeConfig`, as well as breaking third-party tools,
but maybe other people feel differently?
Elijah Newren Dec. 27, 2021, 7:35 p.m. UTC | #5
On Sun, Dec 26, 2021 at 11:15 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Wed, Dec 22, 2021 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> > On Wed, Dec 22, 2021 at 8:00 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > > This patch series includes a fix to the bug reported by Sean Allred [1] and
> > > diagnosed by Eric Sunshine [2].
> >
> > This feels like a bandaid to me.  In addition to fixating on core.bare
> > (thus overlooking core.worktree), it also overlooks that people can
> > use worktrees without using sparse-checkout.  What if they do
> > something like:
> >
> >   git clone --bare $URL myrepo
> >   cd myrepo
> >   git worktree add foo
> >   git worktree add bar
> >   git worktree add baz
> >   ... days/weeks later ...
> >   cd foo
> >   git config extensions.worktreeConfig true
> >   git config status.showUntrackedFiles no  # Or other config options
> >   ... hours/days later ..
> >   cd ../bar
> >   git status
> >
> > At this point the user gets "fatal: this operation must be run in a
> > work tree".
>
> Your example indeed leads to a broken state because it doesn't follow
> the instructions given by git-worktree.txt for enabling
> `extensions.worktreeConfig`, which involves additional bookkeeping
> operations beyond merely setting that config variable.

These are instructions which neither Stolee nor I was aware of prior
to your pointing it out.  Not only had we often flipped that variable,
we did so for many of our users (I know I did for some time prior to
Stolee introducing sparse-checkout).  With `git sparse-checkout` we
propagated that to many more users, and now have many repositories out
in the wild that have been set up for _years_ in violation of these
instructions.  So, even if Stolee and I are independently particularly
bad about noticing the relevant documentation, we now have a situation
where people can discover this misconfiguration just by looking around
in their config.  Once they notice it, they may well copy it
elsewhere.

I'd suspect that Stolee and I are actually _more_ likely to be aware
of relevant documentation than the average Git user, so if we missed
it, I suspect many of them will.  Especially now that we've amplified
their opportunities for discovering repositories set up in
contravention to that documentation.

So, I don't think relying on folks to read this particular piece of
documentation is a reliable course of action...at least not without
some changes to make it much more likely to be noticed.

> It is exactly
> this sort of situation which prompted me to suggest several
> times[1,2,3] in the conversation following my diagnosis of the
> problem, as well as in my reviews of this series, that we may want to
> add a git-worktree subcommand which does all the necessary bookkeeping
> to enable `extensions.worktreeConfig` rather than expecting users to
> handle it all manually. In [1], I called this hypothetical command
> `git worktree manage --enable-worktree-config ` and in [4], I called
> it `git worktree config --enable-per-worktree` (not because I like
> either name, but because I couldn't think of anything better).

How would users discover this new command and use it?  Is it any more
reliably discoverable than the above documentation?

Your suggestion sounds to me like "We know this command will break
things, so we'll provide another command they can use to avoid the
breakage, and hope they notice this new command and use it."  I'm sure
that's not your intent, and perhaps there's a way of making this
suggestion robust, but to me it just sounds like it leads to
inevitable breakage.  I'd rather just fix the command that can break
things.

> > I think that "git
> > worktree add" should check if either core.bare is false or
> > core.worktree is set, and if so then set extensions.worktreeConfig and
> > migrate the relevant config.
>
> (I think you meant "...if either core.bare is _true_ or...".)

Yes, indeed.

> Similar to my response to Sean in [1] and to Stolee in [2], while this
> may help the situation for worktrees created _after_
> `extensions.worktreeConfig` is enabled, it does _not_ help existing
> worktrees at all. For this reason, in my opinion, `git worktree add`
> is simply not the correct place to be addressing this problem, and
> it's why I suggested a separate command for enabling the feature and
> doing all the necessary bookkeeping. It's also why I suggested[2] that
> in the long run, we may want per-worktree config to be the default
> (and only) behavior rather than the current (legacy) behavior of all
> config being shared between worktrees.
>
> Aside from that, I'm uncomfortable with the suggestion that `git
> worktree add` should be responsible for making these sort of dramatic
> changes (upgrading to version=1 and enabling
> `extensions.worktreeConfig`) to the repository automatically. That
> seems very much out of scope for what this command should be doing. On
> the other hand, I would have no problem with `git worktree add`
> protecting users by detecting whether `core.bare=true` or
> `core.worktree` is set in the shared .git/config file and aborting
> with an error if so, and giving a "HINT" telling the user to enable
> per-worktree config via the (hypothetical) `git worktree config
> --enable-per-worktree` command.
>
> Regarding your feeling that this patch series is a "band-aid", while I
> agree with you that we ultimately need a better approach, such as the
> hypothetical `git worktree config --enable-per-worktree` (or
> eventually making per-worktree config be the default), that better
> solution does not need to be implemented today, and certainly
> shouldn't derail _this_ patch series which is aimed at fixing a very
> real bug which exists presently in `git sparse-checkout init`. This
> patch series does need a good number of improvements and fixes before
> it is ready -- as indicated by my v2 review comments[4,5,6], the most
> obvious of which is its missing handling of `core.worktree` -- but I
> do think this series is headed in the correct direction by focusing on
> fixing the immediate problem with `git sparse-checkout init` (and
> paving the way for an eventual more complete solution, such as `git
> worktree config --enable-per-worktree `).

Looks like you've changed your opinion a bit and it'd be better for me
to respond to these parts in your follow-up email.

> [1]: https://lore.kernel.org/git/CAPig+cQ6U_yFw-X2OWrizB1rbCvc4bNxuSzKFzmoLNnm0GH8Eg@mail.gmail.com/
> [2]: https://lore.kernel.org/git/CAPig+cQPUe9REf+wgVNjyak_nk3V361h-48rTFgk6TGC7vJgOA@mail.gmail.com/
> [3]: https://lore.kernel.org/git/CAPig+cRombN-8g0t7Hs9qQypJoY41gK3+kvypH4D0G6EB4JgbQ@mail.gmail.com/
> [4]: https://lore.kernel.org/git/CAPig+cQrJ9yWjkc8VMu=uyx_qtrXdL3cNnxLVafoxOo6e-r9kw@mail.gmail.com/
> [5]: https://lore.kernel.org/git/CAPig+cRi2SA6+poaemY8XR5ZoMweuztfiENpcRVOCnukg3Qa7w@mail.gmail.com/
> [6]: https://lore.kernel.org/git/CAPig+cRuY40RNi4bC3CBfghLLqz74VUPRbaYJYEhmF78b0GfPQ@mail.gmail.com/#t
>
> > I also think `git worktree add` should handle additional configuration
> > items related to sparse checkouts (as we've discussed elsewhere in the
> > past), but that's going a bit outside the scope of this series; I only
> > mention it so that we're aware the functionality added to `git
> > worktree add` will be getting some additions in the future.
>
> I vaguely recall some mention of this not long ago on the list but
> didn't follow the discussion at all. Do you have pointers or a
> summary?

For the microsoft repositories, sparse-checkouts are needed because a
full checkout is unmanageable (millions of files to check out
otherwise).  For other repositories, full checkouts might technically
be manageable, but are annoyingly slow and users may only want to work
with sparse checkouts (and for some of these, due to various
mono-repoization efforts, the repository is growing towards a size
where manageability of full checkouts is decreasing).

The fact that `git worktree add` does a full checkout is quite
painful...possibility to the point of making worktrees useless for
some users.  I think `git worktree add` should copy the sparsity of
the worktree from which it was invoked.

Addressing potential questions/objections to this proposal:
  * just requiring users to do a full checkout first is very
unfriendly: the checkout might not even fit in available disk space,
and even if it does fit, this has the performance penalty of inflating
and writing all files to disk only to delete a (vast?) majority of
them immediately after.  Users have shown a willingness to swallow a
lot of pain trying to figure out how to avoid that performance
penalty.
  * full-checkout: If users do want a full checkout of the new
worktree despite running from a sparse-checkout, it's a single command
away (`git sparse-checkout disable` or `<sparsity-wrapper-script>
--undo`).  And in that case, the invoked commands don't do huge
amounts of unnecessary work.
  * using --no-checkout as a proxy: This means no files checked out
and no index file.  The lack of an index file makes it appear that
everything was manually deleted (with the deletion staged).  Also, if
the project is using some kind of <sparsity-wrapper-script> (e.g. for
determining dependencies between directories so that appropriate
'modules' can be specified and transformed into a list of directories
passed to sparse-checkout), then the sparsity-wrapper-script isn't
available to them to invoke.  If users try to check out just the
wrapper file, then an index will be created and have just one entry
and we kind of cement the fact that all other files look like they
were intended to be deleted.  Also, even if the user runs `git
sparse-checkout init --cone`, you don't actually don't transform this
no-checkout into a sparse checkout because sparse-checkout doesn't
want to undo your staged deletions.  Despite the fact that I'm very
familiar with all the implementation internals, it was not obvious to
me all the necessary additional commands needed for users to get a
sparse checkout while making use of --no-checkout.  Users stand little
chance of figuring the necessary command invocations out without a
huge amount of effort (and they've given up and come to me before
asking for help, and my first response turned out to be incomplete in
various cases...).
Elijah Newren Dec. 27, 2021, 8:16 p.m. UTC | #6
On Sun, Dec 26, 2021 at 11:34 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Mon, Dec 27, 2021 at 2:15 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Wed, Dec 22, 2021 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> > > I think that "git
> > > worktree add" should check if either core.bare is false or
> > > core.worktree is set, and if so then set extensions.worktreeConfig and
> > > migrate the relevant config.
> >
> > Similar to my response to Sean in [1] and to Stolee in [2], while this
> > may help the situation for worktrees created _after_
> > `extensions.worktreeConfig` is enabled, it does _not_ help existing
> > worktrees at all. For this reason, in my opinion, `git worktree add`
> > is simply not the correct place to be addressing this problem, and
> > it's why I suggested a separate command for enabling the feature and
> > doing all the necessary bookkeeping. It's also why I suggested[2] that
> > in the long run, we may want per-worktree config to be the default
> > (and only) behavior rather than the current (legacy) behavior of all
> > config being shared between worktrees.
>
> Thinking upon it further, I take back what I said about it not helping
> existing worktrees.
>
> Your proposal is _almost_ the same as my suggestion of eventually
> making per-worktree config the default. The difference is that you're
> only making it the default if `core.bare=true` or `core.worktree` is
> set.

Indeed.  :-)

> But do we need that distinction? If people are comfortable with
> that, then are they comfortable with simply flipping the switch and
> making per-worktree config the default today regardless of `core.bare`
> and `core.worktree`?

This is tempting, at least if we leave core.repositoryFormatVersion as
0 (see 11664196ac ("Revert "check_repository_format_gently(): refuse
extensions for old repositories"", 2020-07-15)) when core.bare is
false and core.worktree was unset.  However, for that case:

* This is a case where operating on the primary worktree was not
previously problematic for older git versions or third party tools.
* Interestingly, git <= 2.6.2 can continue to operate on the primary
worktree (because it didn't know to error out on unknown extensions)
* git >= 2.19.0 could continue to operate on the primary worktree
(because it understands the extension)
* git versions between that range would suddenly break, erroring out
on the unknown extension (though those versions would start working
again if we migrated core.bare and core.worktree but just didn't set
extensions.worktreeConfig).

> I'm not sure that I'm comfortable with it due to the potential of
> breaking older versions of git which don't understand
> `extensions.worktreeConfig`, as well as breaking third-party tools,
> but maybe other people feel differently?

The distinction I made was particularly written with third party tools
and older versions of git in mind, to allow them to continue to safely
operate when possible.  But let's flesh it out a little:

* core.bare = false AND core.worktree is unset (i.e. a typical
non-bare clone), AND we try to add a worktree: we have _years_ of
in-the-wild usage showing that old git versions and third party tools
operate fine without migrating the config.  Leave it in place and do
not set extensions.worktreeConfig and do not upgrade
core.repositoryFormatVersion.

* (core.bare = true OR core.worktree is set) AND we try to add a
worktree: all third party tools and all git versions (old and new) are
broken anyway.  Flip the switch (upgrade core.repositoryFormatVersion
to 1, set extensions.worktreeConfig=true, and migrate
core.bare/core.worktree for main repo and newly created worktree),
which at least allows new git versions and new tools to work
correctly, and will hopefully cause old tools to error out since this
is a configuration they won't handle correctly.


Further:
  * Toggling extensions.worktreeConfig=true for the first time is
rather trivial for users to try; for years they have been able to do
so without making _any_ other changes and expect things to continue to
work (assuming new enough git versions and third-party tools).  They
have likely disseminated this information to other users.  I certainly
did.  If we are going to expect more of anyone toggling this option,
we need lots of documentation and perhaps code changes to help shore
up the path.  I'd rather just allow folks to continue to do such
toggling.
  * Toggling either core.bare or core.worktree in an existing clone
requires significant additional manual work to make things consistent.
Because it requires a lot more knowledge and work, I think the burden
should be on these users to know about the ramifications with existing
worktrees.  (I've never heard of a user other than myself attempt to
toggle these; I'm sure there are some, it just seems quite rare.)
Eric Sunshine Dec. 28, 2021, 7:33 a.m. UTC | #7
On Mon, Dec 27, 2021 at 2:35 PM Elijah Newren <newren@gmail.com> wrote:
> On Sun, Dec 26, 2021 at 11:15 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Your example indeed leads to a broken state because it doesn't follow
> > the instructions given by git-worktree.txt for enabling
> > `extensions.worktreeConfig`, which involves additional bookkeeping
> > operations beyond merely setting that config variable.
>
> These are instructions which neither Stolee nor I was aware of prior
> to your pointing it out.  [...]
> I'd suspect that Stolee and I are actually _more_ likely to be aware
> of relevant documentation than the average Git user, so if we missed
> it, I suspect many of them will. [...]

I wasn't originally aware of the bookkeeping instructions either. In
fact, for a good while, I wasn't even aware that Duy had implemented
per-worktree configuration or that there was such a thing. I either
must have been entirely off-list during the implementation or was not
in a position to follow changes to the project at the time. I only
became aware of per-worktree configuration at some point in the last
two or three years when someone asked some question on the list
related to the feature and I ended up diving into the documentation,
the source code, and the patches themselves in order to understand it
fully -- because I think I didn't understand it merely from reading
the documentation which is rather sparse (no pun intended). And I had
forgotten enough about it since then that I had to re-research it when
Sean raised the issue on the list a few days back in relation to
sparse-checkout.

> So, I don't think relying on folks to read this particular piece of
> documentation is a reliable course of action...at least not without
> some changes to make it much more likely to be noticed.

The sparsity of documentation about per-worktree configuration
certainly doesn't help, nor the fact that it's fairly near the end of
git-worktree.txt, at which point people may have given up reading. We
could make it a bit more prominent by mentioning it early in the
command description, but it still involves enough fiddly bookkeeping
that it likely will continue to be problematic.

Making per-worktree configuration the default does seem like the best
long-term solution. Doing so should make all these problems go away. I
don't know what Duy's plans were, nor whether he had some migration
strategy planned.

> > I vaguely recall some mention of this not long ago on the list but
> > didn't follow the discussion at all. Do you have pointers or a
> > summary?
>
> For the microsoft repositories, sparse-checkouts are needed because a
> full checkout is unmanageable (millions of files to check out
> otherwise).  For other repositories, full checkouts might technically
> be manageable, but are annoyingly slow and users may only want to work
> with sparse checkouts (and for some of these, due to various
> mono-repoization efforts, the repository is growing towards a size
> where manageability of full checkouts is decreasing).
>
> The fact that `git worktree add` does a full checkout is quite
> painful...possibility to the point of making worktrees useless for
> some users.  I think `git worktree add` should copy the sparsity of
> the worktree from which it was invoked.

Okay, I do recall reading a message in which you proposed this, though
I didn't understand the reasoning for the suggestion since I wasn't
following the discussion. The explanation you provide here makes the
proposal understandable.

>   * using --no-checkout as a proxy: This means no files checked out
> and no index file.  The lack of an index file makes it appear that
> everything was manually deleted (with the deletion staged).  Also, if
> the project is using some kind of <sparsity-wrapper-script> (e.g. for
> determining dependencies between directories so that appropriate
> 'modules' can be specified and transformed into a list of directories
> passed to sparse-checkout), then the sparsity-wrapper-script isn't
> available to them to invoke.  If users try to check out just the
> wrapper file, then an index will be created and have just one entry
> and we kind of cement the fact that all other files look like they
> were intended to be deleted.  Also, even if the user runs `git
> sparse-checkout init --cone`, you don't actually don't transform this
> no-checkout into a sparse checkout because sparse-checkout doesn't
> want to undo your staged deletions.  Despite the fact that I'm very
> familiar with all the implementation internals, it was not obvious to
> me all the necessary additional commands needed for users to get a
> sparse checkout while making use of --no-checkout.  Users stand little
> chance of figuring the necessary command invocations out without a
> huge amount of effort (and they've given up and come to me before
> asking for help, and my first response turned out to be incomplete in
> various cases...).

You've clearly put much more thought into this than I have (since I
only just read this), so I'm not likely to have any meaningful input,
but I'll write down a few thoughts/questions which popped into my head
while reading what you wrote. Perhaps they've already been discussed
elsewhere, so feel free to ignore (and they may not be worth
responding to anyhow).

When you say "copy the sparsity of the worktree from which it was
invoked", do you mean that literally, such that it special-cases it
and only copies sparse-checkout information? An alternative would be
to allow the user to specify -- via the shared configuration
(.git/config) -- exactly which config keys get inherited/copied over
by `git worktree add`. Such a solution would avoid special-casing
sparse-checkout and could be useful in the future for other commands
which might need such functionality, though this approach may be
overengineered.

A more general approach might be for the new worktree to copy all the
per-worktree configuration from the worktree in which the command was
invoked, thus sparsity would be inherited "for free" along with other
settings. This has the benefits of not requiring sparse-checkout
special-cases in the code and it's easy to document ("the new worktree
inherits/copies configuration settings from the worktree in which `git
worktree add` was invoked") and easy to understand.

I also wondered if adding some sort of `--sparse-checkout=...` option
to `git worktree add` would solve this particular dilemma, thus
allowing the user to configure custom sparse-checkout for the worktree
as it is being created. I also very briefly wondered if this should
instead be a feature of the `git sparse-checkout` command itself, such
as `git sparse-checkout add-worktree`, but I think that's probably a
dead-end in terms of user discoverability, whereas `git worktree add
--sparse-checkout=...` is more easily discoverable for people wanting
to work with worktrees.
Eric Sunshine Dec. 28, 2021, 9:11 a.m. UTC | #8
On Mon, Dec 27, 2021 at 3:16 PM Elijah Newren <newren@gmail.com> wrote:
> On Sun, Dec 26, 2021 at 11:34 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Your proposal is _almost_ the same as my suggestion of eventually
> > making per-worktree config the default. The difference is that you're
> > only making it the default if `core.bare=true` or `core.worktree` is
> > set.
>
> Indeed.  :-)
>
> > But do we need that distinction? If people are comfortable with
> > that, then are they comfortable with simply flipping the switch and
> > making per-worktree config the default today regardless of `core.bare`
> > and `core.worktree`?
>
> This is tempting, at least if we leave core.repositoryFormatVersion as
> 0 (see 11664196ac ("Revert "check_repository_format_gently(): refuse
> extensions for old repositories"", 2020-07-15)) when core.bare is
> false and core.worktree was unset.  However, for that case:

I'll try to respond to this email when I can find a quiet block of
time to really concentrate on what you're saying and reason through it
more thoroughly; it will probably require several read-throughs.
Elijah Newren Dec. 28, 2021, 6:16 p.m. UTC | #9
On Mon, Dec 27, 2021 at 11:33 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Mon, Dec 27, 2021 at 2:35 PM Elijah Newren <newren@gmail.com> wrote:
> > On Sun, Dec 26, 2021 at 11:15 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > Your example indeed leads to a broken state because it doesn't follow
> > > the instructions given by git-worktree.txt for enabling
> > > `extensions.worktreeConfig`, which involves additional bookkeeping
> > > operations beyond merely setting that config variable.
> >
> > These are instructions which neither Stolee nor I was aware of prior
> > to your pointing it out.  [...]
> > I'd suspect that Stolee and I are actually _more_ likely to be aware
> > of relevant documentation than the average Git user, so if we missed
> > it, I suspect many of them will. [...]
>
> I wasn't originally aware of the bookkeeping instructions either. In
> fact, for a good while, I wasn't even aware that Duy had implemented
> per-worktree configuration or that there was such a thing. I either
> must have been entirely off-list during the implementation or was not
> in a position to follow changes to the project at the time. I only
> became aware of per-worktree configuration at some point in the last
> two or three years when someone asked some question on the list
> related to the feature and I ended up diving into the documentation,
> the source code, and the patches themselves in order to understand it
> fully -- because I think I didn't understand it merely from reading
> the documentation which is rather sparse (no pun intended). And I had
> forgotten enough about it since then that I had to re-research it when
> Sean raised the issue on the list a few days back in relation to
> sparse-checkout.
>
> > So, I don't think relying on folks to read this particular piece of
> > documentation is a reliable course of action...at least not without
> > some changes to make it much more likely to be noticed.
>
> The sparsity of documentation about per-worktree configuration
> certainly doesn't help, nor the fact that it's fairly near the end of
> git-worktree.txt, at which point people may have given up reading. We
> could make it a bit more prominent by mentioning it early in the
> command description, but it still involves enough fiddly bookkeeping
> that it likely will continue to be problematic.

Further, it's not clear people even looked at git-worktree.txt at the
time they learned about extensions.worktreeConfig.  I believe I
discovered and started using extensions.worktreeConfig from `git
config --help`, which makes no mention of or even reference to the
need for any extra steps.  (I didn't see the original mailing list
discussions around that setting either.)  It never occurred to me in
the ~3 years since to even look in `git worktree --help` for
additional guidance around that config setting until this particular
mailing list thread.

> Making per-worktree configuration the default does seem like the best
> long-term solution. Doing so should make all these problems go away. I
> don't know what Duy's plans were, nor whether he had some migration
> strategy planned.
>
> > > I vaguely recall some mention of this not long ago on the list but
> > > didn't follow the discussion at all. Do you have pointers or a
> > > summary?
> >
> > For the microsoft repositories, sparse-checkouts are needed because a
> > full checkout is unmanageable (millions of files to check out
> > otherwise).  For other repositories, full checkouts might technically
> > be manageable, but are annoyingly slow and users may only want to work
> > with sparse checkouts (and for some of these, due to various
> > mono-repoization efforts, the repository is growing towards a size
> > where manageability of full checkouts is decreasing).
> >
> > The fact that `git worktree add` does a full checkout is quite
> > painful...possibility to the point of making worktrees useless for
> > some users.  I think `git worktree add` should copy the sparsity of
> > the worktree from which it was invoked.
>
> Okay, I do recall reading a message in which you proposed this, though
> I didn't understand the reasoning for the suggestion since I wasn't
> following the discussion. The explanation you provide here makes the
> proposal understandable.
>
> >   * using --no-checkout as a proxy: This means no files checked out
> > and no index file.  The lack of an index file makes it appear that
> > everything was manually deleted (with the deletion staged).  Also, if
> > the project is using some kind of <sparsity-wrapper-script> (e.g. for
> > determining dependencies between directories so that appropriate
> > 'modules' can be specified and transformed into a list of directories
> > passed to sparse-checkout), then the sparsity-wrapper-script isn't
> > available to them to invoke.  If users try to check out just the
> > wrapper file, then an index will be created and have just one entry
> > and we kind of cement the fact that all other files look like they
> > were intended to be deleted.  Also, even if the user runs `git
> > sparse-checkout init --cone`, you don't actually don't transform this
> > no-checkout into a sparse checkout because sparse-checkout doesn't
> > want to undo your staged deletions.  Despite the fact that I'm very
> > familiar with all the implementation internals, it was not obvious to
> > me all the necessary additional commands needed for users to get a
> > sparse checkout while making use of --no-checkout.  Users stand little
> > chance of figuring the necessary command invocations out without a
> > huge amount of effort (and they've given up and come to me before
> > asking for help, and my first response turned out to be incomplete in
> > various cases...).
>
> You've clearly put much more thought into this than I have (since I
> only just read this), so I'm not likely to have any meaningful input,
> but I'll write down a few thoughts/questions which popped into my head
> while reading what you wrote. Perhaps they've already been discussed
> elsewhere, so feel free to ignore (and they may not be worth
> responding to anyhow).
>
> When you say "copy the sparsity of the worktree from which it was
> invoked", do you mean that literally, such that it special-cases it
> and only copies sparse-checkout information? An alternative would be
> to allow the user to specify -- via the shared configuration
> (.git/config) -- exactly which config keys get inherited/copied over
> by `git worktree add`. Such a solution would avoid special-casing
> sparse-checkout and could be useful in the future for other commands
> which might need such functionality, though this approach may be
> overengineered.
>
> A more general approach might be for the new worktree to copy all the
> per-worktree configuration from the worktree in which the command was
> invoked, thus sparsity would be inherited "for free" along with other
> settings. This has the benefits of not requiring sparse-checkout
> special-cases in the code and it's easy to document ("the new worktree
> inherits/copies configuration settings from the worktree in which `git
> worktree add` was invoked") and easy to understand.

Ooh, this is a good point and I *really* like this simple solution.
Thanks for pointing it out.

Do note, though, that it's more than just config.worktree -- I also
want the ${GITDIR}/info/sparse-checkout file copied.

> I also wondered if adding some sort of `--sparse-checkout=...` option
> to `git worktree add` would solve this particular dilemma, thus
> allowing the user to configure custom sparse-checkout for the worktree
> as it is being created. I also very briefly wondered if this should
> instead be a feature of the `git sparse-checkout` command itself, such
> as `git sparse-checkout add-worktree`, but I think that's probably a
> dead-end in terms of user discoverability, whereas `git worktree add
> --sparse-checkout=...` is more easily discoverable for people wanting
> to work with worktrees.

This might be a useful extra capability (we'd probably want to keep
this flag in sync with git-clone's --sparse flag and whatever
capabilities grow there), but I don't see it as a solution to this
problem.  I think the default needs to be copying the existing
sparsity.  Making users specify cone/non-cone mode and
sparse-index/non-sparse-index and and several dozen directories by
hand just doesn't sound reasonable to me.  (We have a case with
several hundred directories/modules, with various dependencies between
them.  Users can use a wrapper, `./sparsify --modules $MODULE_A
$MODULE_B` which figures out the several dozen relevant directories
and calls sparse-checkout set with those, but of course that wrapper
won't yet be available in the new worktree until after the new
worktree has been added.)

An alternative that would be workable, though annoying, is giving the
user a super-sparse checkout with only files in the toplevel directory
present (i.e. what you'd get after `git sparse-checkout init --cone`
or `git clone --sparse ...`), and then making them use the normal
tools to manually specify the wanted sparsity (which probably requires
switching back to the previous worktree to run some info commands to
determine exactly what the sparsity was).

An increasingly unworkable alternative is the current behavior of
defaulting to a full checkout in all cases (and forcing users to
sparsify afterwards).  A full checkout is fine if the user came from
one (and probably preferable in such a case), but it's increasingly
problematic for us even with our repo being nowhere near the size of
the microsoft repos.
Eric Sunshine Dec. 30, 2021, 6:21 a.m. UTC | #10
On Mon, Dec 27, 2021 at 3:16 PM Elijah Newren <newren@gmail.com> wrote:
> On Sun, Dec 26, 2021 at 11:34 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Your proposal is _almost_ the same as my suggestion of eventually
> > making per-worktree config the default. The difference is that you're
> > only making it the default if `core.bare=true` or `core.worktree` is
> > set.
>
> Indeed.  :-)

I mentioned previously[1] that I needed to find a block of time to
really think through the topic before I'd be able to respond to this
email. So, today I spent some time trying to reason through the
various cases under discussion, and I came back and re-read this email
with the intention of trying to summarize my understanding of the
situation and my understanding of the points you were making. However,
you did such a good job of summarizing the various cases at the very
end of [2] that it probably makes more sense for me to respond to that
email instead.

[1]: https://lore.kernel.org/git/CAPig+cTFSDw-9Aq+=+r4sHSzTmG7s2T93Z0uqWTxHbKwGFaiYQ@mail.gmail.com/
[2]: https://lore.kernel.org/git/CABPp-BHuO3B366uJuODMQo-y449p8cAMVn0g2MTcO5di3Xa7Zg@mail.gmail.com/

> > But do we need that distinction? If people are comfortable with
> > that, then are they comfortable with simply flipping the switch and
> > making per-worktree config the default today regardless of `core.bare`
> > and `core.worktree`?
>
> This is tempting, at least if we leave core.repositoryFormatVersion as
> 0 (see 11664196ac ("Revert "check_repository_format_gently(): refuse
> extensions for old repositories"", 2020-07-15)) when core.bare is
> false and core.worktree was unset.  However, for that case:

I had seen 11664196ac when researching one of my earlier responses,
though it took more than one read to (hopefully) fully understand what
it is saying (i.e. due to an oversight, it's too late to enforce the
`core.repositoryFormatVersion=1` requirement when extensions are used,
as originally intended).

> * This is a case where operating on the primary worktree was not
> previously problematic for older git versions or third party tools.
> * Interestingly, git <= 2.6.2 can continue to operate on the primary
> worktree (because it didn't know to error out on unknown extensions)
> * git >= 2.19.0 could continue to operate on the primary worktree
> (because it understands the extension)
> * git versions between that range would suddenly break, erroring out
> on the unknown extension (though those versions would start working
> again if we migrated core.bare and core.worktree but just didn't set
> extensions.worktreeConfig).

The significance of versions 2.6.2 and 2.19.0 is unclear to me. What
context or criteria are you using to identify those versions as
meaningful here?
Eric Sunshine Dec. 30, 2021, 6:40 a.m. UTC | #11
On Tue, Dec 28, 2021 at 1:16 PM Elijah Newren <newren@gmail.com> wrote:
> On Mon, Dec 27, 2021 at 11:33 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > The sparsity of documentation about per-worktree configuration
> > certainly doesn't help, nor the fact that it's fairly near the end of
> > git-worktree.txt, at which point people may have given up reading. We
> > could make it a bit more prominent by mentioning it early in the
> > command description, but it still involves enough fiddly bookkeeping
> > that it likely will continue to be problematic.
>
> Further, it's not clear people even looked at git-worktree.txt at the
> time they learned about extensions.worktreeConfig.  I believe I
> discovered and started using extensions.worktreeConfig from `git
> config --help`, which makes no mention of or even reference to the
> need for any extra steps.  (I didn't see the original mailing list
> discussions around that setting either.)  It never occurred to me in
> the ~3 years since to even look in `git worktree --help` for
> additional guidance around that config setting until this particular
> mailing list thread.

That's an interesting datapoint. At the very least, we probably should
update Documentation/git-config.txt to mention the extra bookkeeping
required when setting `extensions.worktreeConfig=true`.

> > A more general approach might be for the new worktree to copy all the
> > per-worktree configuration from the worktree in which the command was
> > invoked, thus sparsity would be inherited "for free" along with other
> > settings. This has the benefits of not requiring sparse-checkout
> > special-cases in the code and it's easy to document ("the new worktree
> > inherits/copies configuration settings from the worktree in which `git
> > worktree add` was invoked") and easy to understand.
>
> Ooh, this is a good point and I *really* like this simple solution.
> Thanks for pointing it out.

I do wonder, though, if there are traps waiting for us with this
all-inclusive approach. I don't know what sort of worktree-specific
configuration people use, so I do worry a bit that this could be
casting a too-wide net, and that it might in fact be better to only
copy the sparse-checkout settings (as ugly as it is to special-case
that -- but we need to special-case `core.bare` and `core.worktree`
anyhow[1]).

[1]: https://lore.kernel.org/git/CAPig+cSUOknNC9GMyPvAqdBU0r1MVgvSpvgpSpXUmBm67HO7PQ@mail.gmail.com/

> Do note, though, that it's more than just config.worktree -- I also
> want the ${GITDIR}/info/sparse-checkout file copied.

Thanks for pointing that out. I'm reasonably (or completely) ignorant
of sparse-checkout since I've never used it nor read the
documentation, and I didn't follow the earlier discussions.

> > I also wondered if adding some sort of `--sparse-checkout=...` option
> > to `git worktree add` would solve this particular dilemma, thus
> > allowing the user to configure custom sparse-checkout for the worktree
> > as it is being created. I also very briefly wondered if this should
> > instead be a feature of the `git sparse-checkout` command itself, such
> > as `git sparse-checkout add-worktree`, but I think that's probably a
> > dead-end in terms of user discoverability, whereas `git worktree add
> > --sparse-checkout=...` is more easily discoverable for people wanting
> > to work with worktrees.
>
> This might be a useful extra capability (we'd probably want to keep
> this flag in sync with git-clone's --sparse flag and whatever
> capabilities grow there), but I don't see it as a solution to this
> problem.  I think the default needs to be copying the existing
> sparsity.  Making users specify cone/non-cone mode and
> sparse-index/non-sparse-index and and several dozen directories by
> hand just doesn't sound reasonable to me.  (We have a case with
> several hundred directories/modules, with various dependencies between
> them.  Users can use a wrapper, `./sparsify --modules $MODULE_A
> $MODULE_B` which figures out the several dozen relevant directories
> and calls sparse-checkout set with those, but of course that wrapper
> won't yet be available in the new worktree until after the new
> worktree has been added.)

Okay.

> An alternative that would be workable, though annoying, is giving the
> user a super-sparse checkout with only files in the toplevel directory
> present (i.e. what you'd get after `git sparse-checkout init --cone`
> or `git clone --sparse ...`), and then making them use the normal
> tools to manually specify the wanted sparsity (which probably requires
> switching back to the previous worktree to run some info commands to
> determine exactly what the sparsity was).

Sounds somewhat painful.

> An increasingly unworkable alternative is the current behavior of
> defaulting to a full checkout in all cases (and forcing users to
> sparsify afterwards).  A full checkout is fine if the user came from
> one (and probably preferable in such a case), but it's increasingly
> problematic for us even with our repo being nowhere near the size of
> the microsoft repos.

It feels unfortunate and a bit dirty to spread around this
special-case knowledge about sparse-checkout to various parts of the
system, but based upon the pain-points you describe, having a new
worktree inherit the sparsity from the originating worktree does sound
(given my limited knowledge of the topic) like it would ease the pain
for users.
Elijah Newren Dec. 30, 2021, 5:40 p.m. UTC | #12
On Wed, Dec 29, 2021 at 10:22 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Mon, Dec 27, 2021 at 3:16 PM Elijah Newren <newren@gmail.com> wrote:
> > On Sun, Dec 26, 2021 at 11:34 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > Your proposal is _almost_ the same as my suggestion of eventually
> > > making per-worktree config the default. The difference is that you're
> > > only making it the default if `core.bare=true` or `core.worktree` is
> > > set.
> >
> > Indeed.  :-)
>
> I mentioned previously[1] that I needed to find a block of time to
> really think through the topic before I'd be able to respond to this
> email. So, today I spent some time trying to reason through the
> various cases under discussion, and I came back and re-read this email
> with the intention of trying to summarize my understanding of the
> situation and my understanding of the points you were making. However,
> you did such a good job of summarizing the various cases at the very
> end of [2] that it probably makes more sense for me to respond to that
> email instead.
>
> [1]: https://lore.kernel.org/git/CAPig+cTFSDw-9Aq+=+r4sHSzTmG7s2T93Z0uqWTxHbKwGFaiYQ@mail.gmail.com/
> [2]: https://lore.kernel.org/git/CABPp-BHuO3B366uJuODMQo-y449p8cAMVn0g2MTcO5di3Xa7Zg@mail.gmail.com/
>
> > > But do we need that distinction? If people are comfortable with
> > > that, then are they comfortable with simply flipping the switch and
> > > making per-worktree config the default today regardless of `core.bare`
> > > and `core.worktree`?
> >
> > This is tempting, at least if we leave core.repositoryFormatVersion as
> > 0 (see 11664196ac ("Revert "check_repository_format_gently(): refuse
> > extensions for old repositories"", 2020-07-15)) when core.bare is
> > false and core.worktree was unset.  However, for that case:
>
> I had seen 11664196ac when researching one of my earlier responses,
> though it took more than one read to (hopefully) fully understand what
> it is saying (i.e. due to an oversight, it's too late to enforce the
> `core.repositoryFormatVersion=1` requirement when extensions are used,
> as originally intended).
>
> > * This is a case where operating on the primary worktree was not
> > previously problematic for older git versions or third party tools.
> > * Interestingly, git <= 2.6.2 can continue to operate on the primary
> > worktree (because it didn't know to error out on unknown extensions)
> > * git >= 2.19.0 could continue to operate on the primary worktree
> > (because it understands the extension)
> > * git versions between that range would suddenly break, erroring out
> > on the unknown extension (though those versions would start working
> > again if we migrated core.bare and core.worktree but just didn't set
> > extensions.worktreeConfig).
>
> The significance of versions 2.6.2 and 2.19.0 is unclear to me. What
> context or criteria are you using to identify those versions as
> meaningful here?

We had been discussing how certain config settings "might break
external tools OR old git versions", but hadn't brought up which tools
or which git versions.  I don't know which all external tools might be
in play (though I mentioned some in use at $DAYJOB in another thread)
but in this email I had just thought that I'd mention where the cutoff
point was in terms of git versions which understood
core.repositoryFormatVersion and extensions.worktreeConfig.  If other
folks also want to test how things behaved before or after these
patches with "old git versions", those were the switchover points that
are relevant and which I tested with.
Elijah Newren Dec. 30, 2021, 6:38 p.m. UTC | #13
On Wed, Dec 29, 2021 at 10:41 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Tue, Dec 28, 2021 at 1:16 PM Elijah Newren <newren@gmail.com> wrote:
> > On Mon, Dec 27, 2021 at 11:33 PM Eric Sunshine <sunshine@sunshineco.com> wrote:

> > > A more general approach might be for the new worktree to copy all the
> > > per-worktree configuration from the worktree in which the command was
> > > invoked, thus sparsity would be inherited "for free" along with other
> > > settings. This has the benefits of not requiring sparse-checkout
> > > special-cases in the code and it's easy to document ("the new worktree
> > > inherits/copies configuration settings from the worktree in which `git
> > > worktree add` was invoked") and easy to understand.
> >
> > Ooh, this is a good point and I *really* like this simple solution.
> > Thanks for pointing it out.
>
> I do wonder, though, if there are traps waiting for us with this
> all-inclusive approach. I don't know what sort of worktree-specific
> configuration people use, so I do worry a bit that this could be
> casting a too-wide net, and that it might in fact be better to only
> copy the sparse-checkout settings (as ugly as it is to special-case
> that -- but we need to special-case `core.bare` and `core.worktree`
> anyhow[1]).
>
> [1]: https://lore.kernel.org/git/CAPig+cSUOknNC9GMyPvAqdBU0r1MVgvSpvgpSpXUmBm67HO7PQ@mail.gmail.com/

I could probably be persuaded either way (do users want to copy
something and tweak it, or start with a clean slate?), and it might
even make sense to have a flag for users to specify.

My hunch, at least with the developers I work with, is that they're
more likely to think in terms of "I want another worktree like this
one, except that I'm going to change..."

Also, another reason to prefer copying all of core.worktree (minus the
always worktree-specific value of core.worktree, and core.bare), is
because it's easy to explain in the documentation, and I think we'd be
much less likely to obsolete user's knowledge over time.  (I think
additional sparse-checkout things, or new other features that also
want to be copied over, are much more likely than the addition of keys
that are always worktree-specific like core.worktree).

...
> > An increasingly unworkable alternative is the current behavior of
> > defaulting to a full checkout in all cases (and forcing users to
> > sparsify afterwards).  A full checkout is fine if the user came from
> > one (and probably preferable in such a case), but it's increasingly
> > problematic for us even with our repo being nowhere near the size of
> > the microsoft repos.
>
> It feels unfortunate and a bit dirty to spread around this
> special-case knowledge about sparse-checkout to various parts of the
> system,

This might makes things worse, but it's far too late to avoid that:

$ git grep -il -e sparse-checkout -e skip_worktree
.gitignore
Documentation/RelNotes/2.19.0.txt
Documentation/RelNotes/2.25.0.txt
Documentation/RelNotes/2.26.0.txt
Documentation/RelNotes/2.27.0.txt
Documentation/RelNotes/2.28.0.txt
Documentation/RelNotes/2.34.0.txt
Documentation/RelNotes/2.6.0.txt
Documentation/config/checkout.txt
Documentation/config/core.txt
Documentation/git-add.txt
Documentation/git-checkout.txt
Documentation/git-clone.txt
Documentation/git-read-tree.txt
Documentation/git-restore.txt
Documentation/git-rm.txt
Documentation/git-sparse-checkout.txt
Documentation/git-worktree.txt
Documentation/gitrepository-layout.txt
Documentation/rev-list-options.txt
Documentation/technical/index-format.txt
Documentation/technical/parallel-checkout.txt
Documentation/technical/sparse-index.txt
Makefile
advice.c
apply.c
attr.c
builtin/add.c
builtin/check-ignore.c
builtin/checkout.c
builtin/clone.c
builtin/commit.c
builtin/grep.c
builtin/ls-files.c
builtin/mv.c
builtin/read-tree.c
builtin/rm.c
builtin/sparse-checkout.c
builtin/stash.c
builtin/update-index.c
cache-tree.c
cache.h
command-list.txt
contrib/completion/git-prompt.sh
diff-lib.c
diff.c
dir.c
dir.h
entry.c
git.c
list-objects-filter.c
list-objects-filter.h
merge-ort.c
merge-recursive.c
path.c
pathspec.c
pathspec.h
po/bg.po
po/ca.po
po/de.po
po/el.po
po/es.po
po/fr.po
po/git.pot
po/id.po
po/it.po
po/ko.po
po/pl.po
po/pt_PT.po
po/ru.po
po/sv.po
po/tr.po
po/vi.po
po/zh_CN.po
po/zh_TW.po
preload-index.c
read-cache.c
sparse-index.c
sparse-index.h
t/perf/p0005-status.sh
t/perf/p0006-read-tree-checkout.sh
t/perf/p0007-write-cache.sh
t/perf/p2000-sparse-operations.sh
t/perf/repos/inflate-repo.sh
t/perf/repos/many-files.sh
t/t0060-path-utils.sh
t/t1011-read-tree-sparse-checkout.sh
t/t1090-sparse-checkout-scope.sh
t/t1091-sparse-checkout-builtin.sh
t/t1092-sparse-checkout-compatibility.sh
t/t2018-checkout-branch.sh
t/t3507-cherry-pick-conflict.sh
t/t3602-rm-sparse-checkout.sh
t/t3705-add-sparse-checkout.sh
t/t5317-pack-objects-filter-objects.sh
t/t6112-rev-list-filters-objects.sh
t/t6428-merge-conflicts-sparse.sh
t/t6435-merge-sparse.sh
t/t7002-mv-sparse-checkout.sh
t/t7012-skip-worktree-writing.sh
t/t7063-status-untracked-cache.sh
t/t7418-submodule-sparse-gitmodules.sh
t/t7519-status-fsmonitor.sh
t/t7817-grep-sparse-checkout.sh
unpack-trees.c
wt-status.c

>  but based upon the pain-points you describe, having a new
> worktree inherit the sparsity from the originating worktree does sound
> (given my limited knowledge of the topic) like it would ease the pain
> for users.

:-)
Eric Sunshine Jan. 3, 2022, 6:51 a.m. UTC | #14
On Thu, Dec 30, 2021 at 1:38 PM Elijah Newren <newren@gmail.com> wrote:
> On Wed, Dec 29, 2021 at 10:41 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Tue, Dec 28, 2021 at 1:16 PM Elijah Newren <newren@gmail.com> wrote:
> > > On Mon, Dec 27, 2021 at 11:33 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > > A more general approach might be for the new worktree to copy all the
> > > > per-worktree configuration from the worktree in which the command was
> > > > invoked, thus sparsity would be inherited "for free" along with other
> > > > settings. This has the benefits of not requiring sparse-checkout
> > > > special-cases in the code and it's easy to document ("the new worktree
> > > > inherits/copies configuration settings from the worktree in which `git
> > > > worktree add` was invoked") and easy to understand.
> > >
> > > Ooh, this is a good point and I *really* like this simple solution.
> > > Thanks for pointing it out.
> >
> > I do wonder, though, if there are traps waiting for us with this
> > all-inclusive approach. I don't know what sort of worktree-specific
> > configuration people use, so I do worry a bit that this could be
> > casting a too-wide net, and that it might in fact be better to only
> > copy the sparse-checkout settings (as ugly as it is to special-case
> > that -- but we need to special-case `core.bare` and `core.worktree`
> > anyhow[1]).
>
> I could probably be persuaded either way (do users want to copy
> something and tweak it, or start with a clean slate?), and it might
> even make sense to have a flag for users to specify.

I also could probably be persuaded either way, and yes a flag is a
possibility, though it would be nice if we could get along without it.

> My hunch, at least with the developers I work with, is that they're
> more likely to think in terms of "I want another worktree like this
> one, except that I'm going to change..."
>
> Also, another reason to prefer copying all of core.worktree (minus the
> always worktree-specific value of core.worktree, and core.bare), is
> because it's easy to explain in the documentation, and I think we'd be
> much less likely to obsolete user's knowledge over time.  (I think
> additional sparse-checkout things, or new other features that also
> want to be copied over, are much more likely than the addition of keys
> that are always worktree-specific like core.worktree).

Another possible point in favor of copying all worktree-specific
config to the new worktree is that if the user really does want to do
some configuration specific to the new worktree, then that is going to
require a certain amount of manual setup after creating the new
worktree regardless of whether we copy all worktree-specific config or
only a select subset (such as the sparse-checkout settings).