mbox series

[v6,0/5] teach submodules to know they're submodules

Message ID 20211117005701.371808-1-emilyshaffer@google.com (mailing list archive)
Headers show
Series teach submodules to know they're submodules | expand

Message

Emily Shaffer Nov. 17, 2021, 12:56 a.m. UTC
For the original cover letter, see
https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.

CI run: https://github.com/nasamuffin/git/actions/runs/1469463328

Since v5:

A couple things. Firstly, a semantics change *back* to the semantics of
v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
so that theoretically a submodule with multiple worktrees in multiple
superproject worktrees will be able to figure out which worktree of the
superproject it's in. (Realistically, that's not really possible right
now, but I'd like to change that soon.)

Secondly, a rewording of comments and commit messages to indicate that
this isn't a cache of some expensive operation, but rather intended to
be the source of truth for all submodules. I also added a fifth commit
rewriting `git rev-parse --show-superproject-working-tree` to
demonstrate what that means in practice - but from a practical
standpoint, I'm a little worried about that fifth patch. More details in
the patch 5 description.

I did discuss Ævar's idea of relying on in-process filesystem digging to
find the superproject's gitdir with the rest of the Google team, but in
the end decided that there are some worries about filesystem digging in
this way (namely, some ugly interactions with network drives that are
actually already an issue for Googler Linux machines). Plus, the allure
of being able to definitively know that we're a submodule is pretty
strong. ;) But overall, this is the direction I'd prefer to keep going
in, rather than trying to guess from the filesystem going forward.

Since v4:

The only real change here is a slight semantics change to map from
<submodule gitdir> to <superproject common git dir>. In every case
*except* for when the superproject has a worktree, this changes nothing.
For the case when the superproject has a worktree, this means that now
submodules will refer to the general superproject common dir (e.g. no
worktree-specific refs or configs or whatnot).

I *think* that because a submodule should exist in the context of the
common dir, not the worktree gitdir, that is ok. However, it does mean
it would be difficult to do something like sharing a config specific to
the worktree (the initial goal of this series).

$ROOT/.git
$ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub
$ROOT/.git/modules/sub <- points to $ROOT/.git
$ROOT/.git/worktrees/wt
$ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook

If the submodule only knows about the common dir, that is tough, because
the submodule would basically have to guess which worktree it's in from
its own path. There would be no way for '$WT/sub' to inherit
'$ROOT/.git/worktrees/wt/config.superproject'.

That said... right now, we don't support submodules in worktrees very
well at all. A submodule in a worktree will get a brand new gitdir in
$ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to
the super's common dir). So I think we can punt on this entire question
until we teach submodules and worktrees to play more gracefully together
(it's on my long list...), and at that time we can probably introduce a
pointer from $ROOT/.git/modules/sub/worktrees/wt/ to
$ROOT/.git/worktrees/wt/....

Or, to summarize the long ramble above: "this is still kind of weird
with worktrees, but let's fix it later when we fix worktrees more
thoroughly".

(More rambling about worktree weirdness here:
https://lore.kernel.org/git/YYRaII8YWVxlBqsF%40google.com )


Since v3, a pretty major change: the semantics of
submodule.superprojectGitDir has changed, to point from the submodule's
gitdir to the superproject's gitdir (in v3 and earlier, we kept a path
from the submodule's *worktree* to the superproject's gitdir instead).
This cleans up some of the confusions about the behavior when a
submodule worktree moves around in the superproject's tree, or in a
future when we support submodules having multiple worktrees.

I also tried to simplify the tests to use 'test-tool path-utils
relative_path' everywhere - I think that makes them much more clear for
a test reader, but if you're reviewing and it isn't obvious what we're
testing for, please speak up.

I think this is pretty mature and there was a lot of general agreement
that the gitdir->gitdir association was the way to go, so please be
brutal and look for nits, leaks, etc. this round ;)
[/v4 cover letter]

Emily Shaffer (5):
  t7400-submodule-basic: modernize inspect() helper
  introduce submodule.superprojectGitDir record
  submodule: record superproject gitdir during absorbgitdirs
  submodule: record superproject gitdir during 'update'
  submodule: use config to find superproject worktree

 Documentation/config/submodule.txt |  12 ++++
 builtin/submodule--helper.c        |  11 +++
 git-submodule.sh                   |  15 ++++
 submodule.c                        | 108 ++++++++++++++++++++++++++++-
 t/t1500-rev-parse.sh               |   9 +++
 t/t7400-submodule-basic.sh         |  54 ++++++++-------
 t/t7406-submodule-update.sh        |  27 ++++++++
 t/t7412-submodule-absorbgitdirs.sh |  82 +++++++++++++++++++++-
 8 files changed, 290 insertions(+), 28 deletions(-)

Range-diff against v5:
1:  6ff10beaf2 = 1:  f1b08a7057 t7400-submodule-basic: modernize inspect() helper
2:  d4f4627585 ! 2:  d46c8439ab introduce submodule.superprojectGitDir record
    @@ Commit message
     
         By using a relative path instead of an absolute path, we can move the
         superproject directory around on the filesystem without breaking the
    -    submodule's cache. And by using the path from gitdir to gitdir, we can
    +    submodule's pointer. And by using the path from gitdir to gitdir, we can
         move the submodule within the superproject's tree structure without
    -    breaking the submodule's cache, too. Finally, by pointing at
    -    "get_git_common_dir()" instead of "get_git_dir()", we ensure the link
    -    will refer to the parent repo, not to a specific worktree.
    +    breaking the submodule's pointer, too. Finally, by pointing at the
    +    superproject's worktree gitdir (if it exists), we ensure that we can
    +    tell which worktree contains our submodule.
     
         Since this hint value is only introduced during new submodule creation
         via `git submodule add`, though, there is more work to do to allow the
         record to be created at other times.
     
    -    If the new config is present, we can do some optional value-added
    -    behavior, like letting "git status" print additional info about the
    -    submodule's status in relation to its superproject, or like letting the
    -    superproject and submodule share an additional config file separate from
    -    either one's local config.
    +    Once this new config is reliably in place, we can use it to know
    +    definitively that we are working in a submodule, and to know which
    +    superproject we are a submodule of. This allows us to do some
    +    value-added behavior, like letting "git status" print additional info
    +    about the submodule's status in relation to its superproject, or like
    +    letting the superproject and submodule share an additional config file
    +    separate from either one's local config.
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
         Helped-by: Junio C Hamano <gitster@pobox.com>
    @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
     +
     +submodule.superprojectGitDir::
     +	The relative path from the submodule's gitdir to its superproject's
    -+	common dir. When Git is run in a repository, it usually makes no
    ++	gitdir. When Git is run in a repository, it usually makes no
     +	difference whether this repository is standalone or a submodule, but if
     +	this configuration variable is present, additional behavior may be
     +	possible, such as "git status" printing additional information about
    @@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data
      		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
      				       error_strategy);
      
    -+	git_config_set_in_file(p, "submodule.superprojectGitdir",
    -+			       relative_path(absolute_path(get_git_common_dir()),
    ++	/*
    ++	 * Set the path from submodule's new gitdir to superproject's gitdir.
    ++	 * The latter may be a worktree gitdir. However, it is not possible for
    ++	 * the submodule to have a worktree-specific gitdir or config at clone
    ++	 * time, because "extensions.worktreeConfig" is only valid when set in
    ++	 * the local gitconfig, which the brand new submodule does not have yet.
    ++	 */
    ++	git_config_set_in_file(p, "submodule.superprojectGitDir",
    ++			       relative_path(absolute_path(get_git_dir()),
     +					     sm_gitdir, &sb));
     +
      	free(sm_alternate);
    @@ t/t7400-submodule-basic.sh: submodurl=$(pwd -P)
     +	# Ensure that submodule.superprojectGitDir contains the path from the
     +	# submodule's gitdir to the superproject's gitdir.
     +
    -+	super_abs_gitdir=$(git -C "$super_dir" rev-parse --path-format=absolute --git-common-dir) &&
    -+	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --path-format=absolute --git-common-dir) &&
    ++	super_abs_gitdir=$(git -C "$super_dir" rev-parse --absolute-git-dir) &&
    ++	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --absolute-git-dir) &&
     +
     +	[ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \
     +	  "$(test-tool path-utils relative_path "$super_abs_gitdir" \
3:  2dae297943 ! 3:  63ddaf5608 submodule: record superproject gitdir during absorbgitdirs
    @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *p
      	char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL;
      	struct strbuf new_gitdir = STRBUF_INIT;
      	const struct submodule *sub;
    ++	struct config_set sub_cs;
     +	struct strbuf config_path = STRBUF_INIT, sb = STRBUF_INIT;
    ++	int tmp;
      
      	if (submodule_uses_worktrees(path))
      		die(_("relocate_gitdir for submodule '%s' with "
    @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *p
      
      	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
      
    -+	/* cache pointer to superproject's gitdir */
    ++	/*
    ++	 * Note location of superproject's gitdir. Because the submodule already
    ++	 * has a gitdir and local config, we can store this pointer from
    ++	 * worktree config to worktree config, if the submodule has
    ++	 * extensions.worktreeConfig set.
    ++	 */
     +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
    ++	git_configset_init(&sub_cs);
    ++	git_configset_add_file(&sub_cs, config_path.buf);
    ++	/* return 0 indicates config was found - we have a worktree config */
    ++	if (!git_configset_get_bool(&sub_cs, "extensions.worktreeConfig", &tmp))
    ++		strbuf_addstr(&config_path, ".worktree");
    ++
     +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
    -+			       relative_path(absolute_path(get_git_common_dir()),
    ++			       relative_path(absolute_path(get_git_dir()),
     +					     real_new_git_dir, &sb));
     +
    ++	git_configset_clear(&sub_cs);
     +	strbuf_release(&config_path);
     +	strbuf_release(&sb);
      	free(old_git_dir);
    @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorbing fails for a s
     +	# absorb the git dir
     +	git submodule absorbgitdirs sub4 &&
     +
    -+	# make sure the submodule cached the superproject gitdir correctly
    -+	submodule_gitdir="$(git -C sub4 rev-parse --path-format=absolute --git-common-dir)" &&
    -+	superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" &&
    ++	# make sure the submodule noted the superproject gitdir correctly
    ++	submodule_gitdir="$(git -C sub4 rev-parse --absolute-git-dir)" &&
    ++	superproject_gitdir="$(git rev-parse --absolute-git-dir)" &&
     +
     +	test-tool path-utils relative_path "$superproject_gitdir" \
     +		"$submodule_gitdir" >expect &&
    @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorbing fails for a s
     +	test_cmp expect actual
     +	)
     +'
    ++
    ++test_expect_success 'absorbgitdirs works with a submodule with worktree config' '
    ++	# reuse the worktree of the superproject
    ++	(
    ++	cd wt &&
    ++
    ++	# create a new unembedded git dir
    ++	git init sub5 &&
    ++	test_commit -C sub5 first &&
    ++	git submodule add ./sub5 &&
    ++	test_tick &&
    ++
    ++	# turn on worktree configs for submodule
    ++	git -C sub5 config extensions.worktreeConfig true &&
    ++
    ++	# absorb the git dir
    ++	git submodule absorbgitdirs sub5 &&
    ++
    ++	# make sure the submodule noted the superproject gitdir correctly
    ++	submodule_gitdir="$(git -C sub5 rev-parse --absolute-git-dir)" &&
    ++	superproject_gitdir="$(git rev-parse --absolute-git-dir)" &&
    ++
    ++	test-tool path-utils relative_path "$superproject_gitdir" \
    ++		"$submodule_gitdir" >expect &&
    ++	git -C sub5 config submodule.superprojectGitDir >actual &&
    ++
    ++	test_cmp expect actual &&
    ++
    ++	# make sure the config went into the submodule config.worktree
    ++	test_file_not_empty "$submodule_gitdir/config.worktree"
    ++	)
    ++'
     +
      test_done
4:  f0412d6d34 ! 4:  33a582ef13 submodule: record superproject gitdir during 'update'
    @@ Metadata
      ## Commit message ##
         submodule: record superproject gitdir during 'update'
     
    -    A recorded hint path to the superproject's gitdir might be added during
    +    A recorded path to the superproject's gitdir might be added during
         'git submodule add', but in some cases - like submodules which were
         created before 'git submodule add' learned to record that info - it might
         be useful to update the hint. Let's do it during 'git submodule
    @@ git-submodule.sh: cmd_update()
      			;;
      		esac
      
    -+		# Cache a pointer to the superproject's common dir. This may have
    -+		# changed, unless it's a fresh clone. Writes it to worktree
    -+		# if applicable, otherwise to local.
    ++		# Store a poitner to the superproject's gitdir. This may have
    ++		# changed, unless it's a fresh clone. Write to worktree if
    ++		# applicable, and point to superproject's worktree gitdir if
    ++		# applicable.
     +		if test -z "$just_cloned"
     +		then
     +			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
     +			relative_gitdir="$(git rev-parse --path-format=relative \
     +							 --prefix "${sm_gitdir}" \
    -+							 --git-common-dir)"
    ++							 --git-dir)"
     +
     +			git -C "$sm_path" config --worktree \
     +				submodule.superprojectgitdir "$relative_gitdir"
    @@ t/t7406-submodule-update.sh: test_expect_success 'submodule update --quiet passe
     +	 git -C submodule config --unset submodule.superprojectGitdir &&
     +	 git submodule update &&
     +	 test-tool path-utils relative_path \
    -+		"$(git rev-parse --path-format=absolute --git-common-dir)" \
    -+		"$(git -C submodule rev-parse --path-format=absolute --git-common-dir)" >expect &&
    ++		"$(git rev-parse --absolute-git-dir)" \
    ++		"$(git -C submodule rev-parse --absolute-git-dir)" >expect &&
     +	 git -C submodule config submodule.superprojectGitdir >actual &&
     +	 test_cmp expect actual
     +	)
     +'
    ++
    ++test_expect_success 'submodule update uses config.worktree if applicable' '
    ++	(cd super &&
    ++	 git -C submodule config --unset submodule.superprojectGitDir &&
    ++	 git -C submodule config extensions.worktreeConfig true &&
    ++	 git submodule update &&
    ++	 test-tool path-utils relative_path \
    ++		"$(git rev-parse --absolute-git-dir)" \
    ++		"$(git -C submodule rev-parse --absolute-git-dir)" >expect &&
    ++	 git -C submodule config submodule.superprojectGitdir >actual &&
    ++	 test_cmp expect actual &&
    ++
    ++	 test_file_not_empty "$(git -C submodule rev-parse --absolute-git-dir)/config.worktree"
    ++	)
    ++'
     +
      test_done
-:  ---------- > 5:  a8b5d40a77 submodule: use config to find superproject worktree

Comments

Jonathan Tan Nov. 17, 2021, 11:28 p.m. UTC | #1
Emily Shaffer <emilyshaffer@google.com> writes:
> For the original cover letter, see
> https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.

Also for reference, v4 and v5 (as a reply to v4) can be found here:
https://lore.kernel.org/git/20211014203416.2802639-1-emilyshaffer@google.com/

> Since v5:
> 
> A couple things. Firstly, a semantics change *back* to the semantics of
> v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
> so that theoretically a submodule with multiple worktrees in multiple
> superproject worktrees will be able to figure out which worktree of the
> superproject it's in. (Realistically, that's not really possible right
> now, but I'd like to change that soon.)

Makes sense. Also, thanks for the tests covering what happens when this
is run from worktrees.

> Secondly, a rewording of comments and commit messages to indicate that
> this isn't a cache of some expensive operation, but rather intended to
> be the source of truth for all submodules. I also added a fifth commit
> rewriting `git rev-parse --show-superproject-working-tree` to
> demonstrate what that means in practice - but from a practical
> standpoint, I'm a little worried about that fifth patch. More details in
> the patch 5 description.

OK - this is not the "this variable being missing is OK" idea that I had
[1], but we want to be able to depend on it to some extent. (And it is
not a cache either - we are not planning to perform an operation to
obtain the superproject gitdir if the cache is missing, but we are just
going to assume that there is no superproject.)

To that end, the 5th patch is misleading - it is behaving exactly like a
cache. I think it's better to drop it.

What would make sense to me (and seems to be in the spirit of this patch
set) is to describe this as something that Git commands can rely on to
determine if the current repo is a submodule, for performance reasons.
So maybe Git commands/parameters that directly reference the submodule
concept like "--show-superproject-working-tree" will work hard to find
the superproject (by searching the filesystem), but those that do not
(e.g. "git status") can make assumptions.

Making this variable a source of truth wouldn't work, I think, because
the source of truth is whether this repo appears in a .gitmodules file
(and that hasn't changed).

To this end, I'll comment on the changes I'd like to see on the
individual patches too.

[1] https://lore.kernel.org/git/20210727174650.2462099-1-jonathantanmy@google.com/
Emily Shaffer Nov. 23, 2021, 8:28 p.m. UTC | #2
On Wed, Nov 17, 2021 at 03:28:46PM -0800, Jonathan Tan wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> > For the original cover letter, see
> > https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.
> 
> Also for reference, v4 and v5 (as a reply to v4) can be found here:
> https://lore.kernel.org/git/20211014203416.2802639-1-emilyshaffer@google.com/
> 
> > Since v5:
> > 
> > A couple things. Firstly, a semantics change *back* to the semantics of
> > v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
> > so that theoretically a submodule with multiple worktrees in multiple
> > superproject worktrees will be able to figure out which worktree of the
> > superproject it's in. (Realistically, that's not really possible right
> > now, but I'd like to change that soon.)
> 
> Makes sense. Also, thanks for the tests covering what happens when this
> is run from worktrees.
> 
> > Secondly, a rewording of comments and commit messages to indicate that
> > this isn't a cache of some expensive operation, but rather intended to
> > be the source of truth for all submodules. I also added a fifth commit
> > rewriting `git rev-parse --show-superproject-working-tree` to
> > demonstrate what that means in practice - but from a practical
> > standpoint, I'm a little worried about that fifth patch. More details in
> > the patch 5 description.
> 
> OK - this is not the "this variable being missing is OK" idea that I had
> [1], but we want to be able to depend on it to some extent. (And it is
> not a cache either - we are not planning to perform an operation to
> obtain the superproject gitdir if the cache is missing, but we are just
> going to assume that there is no superproject.)
> 
> To that end, the 5th patch is misleading - it is behaving exactly like a
> cache. I think it's better to drop it.

Yeah, I think you are right.

> 
> What would make sense to me (and seems to be in the spirit of this patch
> set) is to describe this as something that Git commands can rely on to
> determine if the current repo is a submodule, for performance reasons.
> So maybe Git commands/parameters that directly reference the submodule
> concept like "--show-superproject-working-tree" will work hard to find
> the superproject (by searching the filesystem), but those that do not
> (e.g. "git status") can make assumptions.

Oh interesting, I like the idea of that distinction. Thanks for the
suggestion.

I wonder, though, how best to delineate. I'd almost rather say like I
suggested to Ævar
(https://lore.kernel.org/git/YZ1KLNwsxx7IR1%2B5%40google.com), that we
can treat "--show-superproject-working-tree" as a "legacy" option people
are already using, and treat anything added from now on as "if it
doesn't think it is a submodule, you should 'git submodule update' in
the superproject". That relies on us being able to reliably keep the
value of submodule.superprojectGitDir correct, but I think the
gitdir->gitdir linking helps with that (as opposed to earlier
iterations).
> 
> Making this variable a source of truth wouldn't work, I think, because
> the source of truth is whether this repo appears in a .gitmodules file
> (and that hasn't changed).

Interestingly, '--show-superproject-working-tree' doesn't check the
.gitmodules file at all. It checks whether the project found in the
filesystem above it knows about an object named path/to/superproject/
which is a gitlink - that is, it asks the index, not .gitmodules, as far
as I understand it. So if the only existing alternative to
submodule.superprojectGitDir isn't treating .gitmodules as source of
truth, then what does that mean?

> 
> To this end, I'll comment on the changes I'd like to see on the
> individual patches too.
> 
> [1] https://lore.kernel.org/git/20210727174650.2462099-1-jonathantanmy@google.com/
Emily Shaffer Feb. 3, 2022, 9:59 p.m. UTC | #3
For the original cover letter, see
https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.

CI run: https://github.com/nasamuffin/git/actions/runs/1780282431

Since v6:

I've dropped the fifth commit to use this new config for `git rev-parse
--show-superproject-working-tree`. I think it did more harm than good -
that tool uses an odd way to determine whether the superproject is
actually the superproject, anyways.

I poked a little bit at trying to find some benchmark to demonstrate
that "submodule.superprojectGitDir" is actually faster - but I didn't
end up able to write one without writing a ton of new code to traverse
the filesystem. To be honest, I'm not all that interested in performance
- I want the config added for correctness, instead.

So, the only real changes between v6 and v7 are some documentation
changes suggested by Jonathan Tan
(https://lore.kernel.org/git/20211117234300.2598132-1-jonathantanmy%40google.com).

Since v5:

A couple things. Firstly, a semantics change *back* to the semantics of
v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
so that theoretically a submodule with multiple worktrees in multiple
superproject worktrees will be able to figure out which worktree of the
superproject it's in. (Realistically, that's not really possible right
now, but I'd like to change that soon.)

Secondly, a rewording of comments and commit messages to indicate that
this isn't a cache of some expensive operation, but rather intended to
be the source of truth for all submodules. I also added a fifth commit
rewriting `git rev-parse --show-superproject-working-tree` to
demonstrate what that means in practice - but from a practical
standpoint, I'm a little worried about that fifth patch. More details in
the patch 5 description.

I did discuss Ævar's idea of relying on in-process filesystem digging to
find the superproject's gitdir with the rest of the Google team, but in
the end decided that there are some worries about filesystem digging in
this way (namely, some ugly interactions with network drives that are
actually already an issue for Googler Linux machines). Plus, the allure
of being able to definitively know that we're a submodule is pretty
strong. ;) But overall, this is the direction I'd prefer to keep going
in, rather than trying to guess from the filesystem going forward.

Since v4:

The only real change here is a slight semantics change to map from
<submodule gitdir> to <superproject common git dir>. In every case
*except* for when the superproject has a worktree, this changes nothing.
For the case when the superproject has a worktree, this means that now
submodules will refer to the general superproject common dir (e.g. no
worktree-specific refs or configs or whatnot).

I *think* that because a submodule should exist in the context of the
common dir, not the worktree gitdir, that is ok. However, it does mean
it would be difficult to do something like sharing a config specific to
the worktree (the initial goal of this series).

$ROOT/.git
$ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub
$ROOT/.git/modules/sub <- points to $ROOT/.git
$ROOT/.git/worktrees/wt
$ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook

If the submodule only knows about the common dir, that is tough, because
the submodule would basically have to guess which worktree it's in from
its own path. There would be no way for '$WT/sub' to inherit
'$ROOT/.git/worktrees/wt/config.superproject'.

That said... right now, we don't support submodules in worktrees very
well at all. A submodule in a worktree will get a brand new gitdir in
$ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to
the super's common dir). So I think we can punt on this entire question
until we teach submodules and worktrees to play more gracefully together
(it's on my long list...), and at that time we can probably introduce a
pointer from $ROOT/.git/modules/sub/worktrees/wt/ to
$ROOT/.git/worktrees/wt/....

Or, to summarize the long ramble above: "this is still kind of weird
with worktrees, but let's fix it later when we fix worktrees more
thoroughly".

(More rambling about worktree weirdness here:
https://lore.kernel.org/git/YYRaII8YWVxlBqsF%40google.com )


Since v3, a pretty major change: the semantics of
submodule.superprojectGitDir has changed, to point from the submodule's
gitdir to the superproject's gitdir (in v3 and earlier, we kept a path
from the submodule's *worktree* to the superproject's gitdir instead).
This cleans up some of the confusions about the behavior when a
submodule worktree moves around in the superproject's tree, or in a
future when we support submodules having multiple worktrees.

I also tried to simplify the tests to use 'test-tool path-utils
relative_path' everywhere - I think that makes them much more clear for
a test reader, but if you're reviewing and it isn't obvious what we're
testing for, please speak up.

I think this is pretty mature and there was a lot of general agreement
that the gitdir->gitdir association was the way to go, so please be
brutal and look for nits, leaks, etc. this round ;)
[/v4 cover letter]


Emily Shaffer (4):
  t7400-submodule-basic: modernize inspect() helper
  introduce submodule.superprojectGitDir record
  submodule: record superproject gitdir during absorbgitdirs
  submodule: record superproject gitdir during 'update'

 Documentation/config/submodule.txt |  9 ++++
 builtin/submodule--helper.c        | 11 ++++
 git-submodule.sh                   | 15 ++++++
 submodule.c                        | 23 +++++++++
 t/t7400-submodule-basic.sh         | 54 +++++++++++---------
 t/t7406-submodule-update.sh        | 27 ++++++++++
 t/t7412-submodule-absorbgitdirs.sh | 82 +++++++++++++++++++++++++++++-
 7 files changed, 194 insertions(+), 27 deletions(-)

Range-diff against v6:
1:  f1b08a7057 = 1:  251510c687 t7400-submodule-basic: modernize inspect() helper
2:  d46c8439ab ! 2:  1a85deb1c5 introduce submodule.superprojectGitDir record
    @@ Metadata
      ## Commit message ##
         introduce submodule.superprojectGitDir record
     
    -    Teach submodules a reference to their superproject's gitdir. This allows
    -    us to A) know that we're running from a submodule, and B) have a
    -    shortcut to the superproject's vitals, for example, configs.
    +    Teach submodules a config variable referencing to their superproject's
    +    gitdir. Git commands can rely on this reference to determine whether the
    +    current repo is a submodule to another repo. If this reference is
    +    absent, Git may assume the current repo is not a submodule.
    +
    +    In practice, some commands which specifically reference the submodule
    +    relationship, like 'git rev-parse --show-superproject-working-tree',
    +    will still search the parent directory. Other newly added or implicit
    +    behavior, such as "git status" showing the submodule's status in
    +    relation to the superproject, or a config which is shared between the
    +    superproject and the submodule, should not search the parent directory
    +    and instead rely on this "submodule.superprojectGitDir" config.
     
         By using a relative path instead of an absolute path, we can move the
         superproject directory around on the filesystem without breaking the
    @@ Commit message
         superproject's worktree gitdir (if it exists), we ensure that we can
         tell which worktree contains our submodule.
     
    -    Since this hint value is only introduced during new submodule creation
    -    via `git submodule add`, though, there is more work to do to allow the
    -    record to be created at other times.
    -
    -    Once this new config is reliably in place, we can use it to know
    -    definitively that we are working in a submodule, and to know which
    -    superproject we are a submodule of. This allows us to do some
    -    value-added behavior, like letting "git status" print additional info
    -    about the submodule's status in relation to its superproject, or like
    -    letting the superproject and submodule share an additional config file
    -    separate from either one's local config.
    +    This commit teaches "git submodule add" to add the aformentioned config
    +    variable. Subsequent commits will teach other commands to do so.
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
         Helped-by: Junio C Hamano <gitster@pobox.com>
    @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
      	clone proceeds as if no alternate was specified.
     +
     +submodule.superprojectGitDir::
    -+	The relative path from the submodule's gitdir to its superproject's
    -+	gitdir. When Git is run in a repository, it usually makes no
    -+	difference whether this repository is standalone or a submodule, but if
    -+	this configuration variable is present, additional behavior may be
    -+	possible, such as "git status" printing additional information about
    -+	this submodule's status with respect to its superproject. This config
    -+	should only be present in projects which are submodules, but is not
    -+	guaranteed to be present in every submodule, so only optional
    -+	value-added behavior should be linked to it. It is set automatically
    -+	during submodule creation.
    ++	If this repository is a submodule, the relative path from this repo's
    ++	gitdir to its superproject's gitdir. Git commands may rely on this
    ++	reference to determine whether the current repo is a submodule to
    ++	another repo; if this reference is absent, Git will treat the current
    ++	repo as though it is not a submodule (this does not make a difference to
    ++	most Git commands). It is set automatically during submodule creation.
     
      ## builtin/submodule--helper.c ##
     @@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data *clone_data)
3:  63ddaf5608 ! 3:  7a44b0edf9 submodule: record superproject gitdir during absorbgitdirs
    @@ Commit message
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
     
    + ## Documentation/config/submodule.txt ##
    +@@ Documentation/config/submodule.txt: submodule.superprojectGitDir::
    + 	reference to determine whether the current repo is a submodule to
    + 	another repo; if this reference is absent, Git will treat the current
    + 	repo as though it is not a submodule (this does not make a difference to
    +-	most Git commands). It is set automatically during submodule creation.
    ++	most Git commands). It is set automatically during submodule creation
    ++	and 'git submodule absorbgitdir'.
    +
      ## submodule.c ##
     @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *path)
      	char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL;
4:  33a582ef13 ! 4:  63e736e69d submodule: record superproject gitdir during 'update'
    @@ Commit message
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
     
    + ## Documentation/config/submodule.txt ##
    +@@ Documentation/config/submodule.txt: submodule.superprojectGitDir::
    + 	reference to determine whether the current repo is a submodule to
    + 	another repo; if this reference is absent, Git will treat the current
    + 	repo as though it is not a submodule (this does not make a difference to
    +-	most Git commands). It is set automatically during submodule creation
    +-	and 'git submodule absorbgitdir'.
    ++	most Git commands). It is set automatically during submodule creation,
    ++	update, and 'git submodule absorbgitdir'.
    +
      ## git-submodule.sh ##
     @@ git-submodule.sh: cmd_update()
      			;;
5:  a8b5d40a77 < -:  ---------- submodule: use config to find superproject worktree
Junio C Hamano Feb. 3, 2022, 10:39 p.m. UTC | #4
Emily Shaffer <emilyshaffer@google.com> writes:

> A couple things. Firstly, a semantics change *back* to the semantics of
> v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
> so that theoretically a submodule with multiple worktrees in multiple
> superproject worktrees will be able to figure out which worktree of the
> superproject it's in. (Realistically, that's not really possible right
> now, but I'd like to change that soon.)

Sounds sensible.

> Secondly, a rewording of comments and commit messages to indicate that
> this isn't a cache of some expensive operation, but rather intended to
> be the source of truth for all submodules.

I'd expect that there is a way (e.g. "git fsck") that helps the
users notice when the actual filesystem layout contradicts with what
the gitdir-to-gitdir link says, and repair the repositories when
they go out of sync if possible.

It would be similar to "git worktree", where a link between the
".git" file that records "gitdir:" in a secondary worktree and the
repository's $GIT_DIR/worktrees/*/gitdir, and the "repair" command
can be used to bring them back in sync after moving the real
repository without telling the secondary worktree about the move.

> I did discuss Ævar's idea of relying on in-process filesystem digging to
> find the superproject's gitdir with the rest of the Google team, but in
> the end decided that there are some worries about filesystem digging in
> this way (namely, some ugly interactions with network drives that are
> actually already an issue for Googler Linux machines). Plus, the allure
> of being able to definitively know that we're a submodule is pretty
> strong.

The other side of the coin is that, even when a configuration
variable says that you are a submodule of the superproject at
location X, if such a submodule gets moved out of the superproject
(perhaps because the end-user wanted to concentrate on that
submodule project alone as an independent project) and the
superproject that used to be at location X got archived away,
trusting and relying on what the configuration variable says would
not help us access the now-gone superproject.  And that would not
change no matter how strongly we declare that it is the source of
truth.

Unless we have a very good way to detect inconsistency and stop
spreading the damage (e.g. the setting thought our superproject sits
at directory X, but that location is now occupied by a different
repository that is not related), I am still skeptical about the
"setting is the sole truth" design.
Ævar Arnfjörð Bjarmason Feb. 4, 2022, 1:15 a.m. UTC | #5
On Thu, Feb 03 2022, Emily Shaffer wrote:

> Since v6:

Thanks for the re-roll!

> I've dropped the fifth commit to use this new config for `git rev-parse
> --show-superproject-working-tree`. I think it did more harm than good -
> that tool uses an odd way to determine whether the superproject is
> actually the superproject, anyways.
>
> I poked a little bit at trying to find some benchmark to demonstrate
> that "submodule.superprojectGitDir" is actually faster - but I didn't
> end up able to write one without writing a ton of new code to traverse
> the filesystem.

I'm assuming that's tested against some variant of the submodule-in-C[1]
conversion. I.e. at least when I tested it [2][3] it seemed easy to come
up with (probably overly artificial) benchmarks where it would matter
for the shell-out in this series.

The question performance-wise was rather whether we'd just be
introducing the config mechanism as a transitory performance workaround,
and the need for it would evaporate once that C migration happened (re
original CL quoted in [3]).

> To be honest, I'm not all that interested in performance
> - I want the config added for correctness, instead.

And I'm honestly still at the point of not even being against this whole
thing, although it probably sounds like that. I'm really not.

I just genuinely don't get where this is headed. I.e. for the last
iteration I did a demo patch on top that showed that there was no case
added by the series where the on-the-fly discovery wasn't equivalent to
the set-in-config value[4].

That change showed that after this series in a state where the config
*is* redundant to on the fly discovery (or maybe not, and we're just
missing test coverage).

But since you're citing correctness do you have some repo->sub
relationship in mind that would be ambiguous in a way where the
configuration would resolve the ambiguity?

I can imagine how such a thing might work, e.g. if we gave submodules
some git-worktree-like method of being completely detached from the
parent. I.e. being able to place a /usr/me/repo.git whose submodule
entry for a "test" dir lives in /opt/tests or something. So when you "cd
/opt/tests" you wouldn't be able to detect you're within a submodule.

(I'm assuming the case where the submodule has its own "in-tree" .git,
is that even supported anymore...?)

But I can't think of one where such an ambiguity would arise within our
current featureset.

What I really can't see is how if the need for such "config [...] for
correctness" would arise how that doesn't also invalidate the
assumptions you're making in 3/4 and 4/4.

I.e. surely if we need the config for correctness it's also true that we
can't after-the-fact add the config on the fly to (such) existing
submodules without user intervention. Or maybe the ambiguity would only
arise from the POV of the submodule, but not for commands executed
within the parent?

1. https://lore.kernel.org/git/cover-v5-0.9-00000000000-20220128T125206Z-avarab@gmail.com/
2. https://lore.kernel.org/git/RFC-cover-0.2-00000000000-20211117T113134Z-avarab@gmail.com/
3. https://lore.kernel.org/git/211124.86a6hue2wk.gmgdl@evledraar.gmail.com/
4. https://lore.kernel.org/git/RFC-patch-2.2-b49d4c8db7d-20211117T113134Z-avarab@gmail.com/
Junio C Hamano Feb. 4, 2022, 4:20 p.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I just genuinely don't get where this is headed. I.e. for the last
> iteration I did a demo patch on top that showed that there was no case
> added by the series where the on-the-fly discovery wasn't equivalent to
> the set-in-config value[4].
>
> That change showed that after this series in a state where the config
> *is* redundant to on the fly discovery (or maybe not, and we're just
> missing test coverage).
>
> But since you're citing correctness do you have some repo->sub
> relationship in mind that would be ambiguous in a way where the
> configuration would resolve the ambiguity?

This is an excellent question, which I wish I could have raised in
my earlier response.  A clear explanation why this setting is not
a redundant copy but adds real information on top of what we should
be able to learn from the filesystem structure would really help in
justifying the new thing.

Thanks.
Jonathan Nieder Feb. 7, 2022, 7:56 p.m. UTC | #7
Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Thu, Feb 03 2022, Emily Shaffer wrote:

>> To be honest, I'm not all that interested in performance
>> - I want the config added for correctness, instead.
>
> And I'm honestly still at the point of not even being against this whole
> thing, although it probably sounds like that. I'm really not.
>
> I just genuinely don't get where this is headed. I.e. for the last
> iteration I did a demo patch on top that showed that there was no case
> added by the series where the on-the-fly discovery wasn't equivalent to
> the set-in-config value[4].

Here's a few examples:

1. Suppose I track my $HOME directory as a git repository.  Within my
   home directory, I have a src/git/ subdirectory with a clone of
   git.git, but I never intended to treat this as a submodule.

   If I run "git rev-parse --show-superproject-working-tree", then it
   will discover my home directory repository, run ls-files in there
   to see if it has GITLINK entries, and either see one for src/git if
   I had "git add"ed it by mistake or not see one.  In either case,
   it would it would view my src/git/ directory as being a submodule
   of my home directory even though I hadn't intended it to be so.

2. Suppose I have a copy of a repository such as
   https://gerrit.googlesource.com/gerrit/, with all its submodules.
   I am in the plugins/replication/ directory.

   If I run "git rev-parse --show-superproject-working-tree", then it
   will discover my gerrit repository, run ls-files in there to see if
   it has GITLINK entries, and use the result to decide whether the
   cwd is a submodule.  So for example, if I had run "git rm --cached
   plugins/replication" to _prepare to_ remove the plugins/replication
   submodule, then "git rev-parse --show-superproject-working-tree"
   will produce the wrong result.

3. Suppose I am not using submodules at all.  I have a clone of
   mawk.git and I am working there.

   If I run "git rev-parse --show-superproject-working-tree", then I'm
   presumably interested in doing something submodule-specific;
   nothing wrong with that.  But the series we're responding to is
   meant to support a wider variety of operations --- for example,
   suppose I am running a plain "git status" operation.

   If "git status" runs "git rev-parse
   --show-superproject-working-tree", then git would walk up the
   filesystem above my mawk/ directory, looking for another .git dir.
   We can reach an NFS automounter directory and just hang.  Even
   without an NFS automounter, we'd expect this to take a while
   because, unlike normal repository discovery, we have no reason to
   believe that the walk is going to quickly discover a .git directory
   and terminate.  So this would violate user expectations.

Thanks and hope that helps,
Jonathan

> 4. https://lore.kernel.org/git/RFC-patch-2.2-b49d4c8db7d-20211117T113134Z-avarab@gmail.com/
Junio C Hamano Feb. 7, 2022, 11:21 p.m. UTC | #8
Jonathan Nieder <jrnieder@gmail.com> writes:

> Here's a few examples:
>
> 1. Suppose I track my $HOME directory as a git repository.  Within my
>    home directory, I have a src/git/ subdirectory with a clone of
>    git.git, but I never intended to treat this as a submodule.
>
>    If I run "git rev-parse --show-superproject-working-tree", then it
>    will discover my home directory repository, run ls-files in there
>    to see if it has GITLINK entries, and either see one for src/git if
>    I had "git add"ed it by mistake or not see one.  In either case,
>    it would it would view my src/git/ directory as being a submodule
>    of my home directory even though I hadn't intended it to be so.

I am not sure about this one.  If you added an unrelated one with
"git add" by mistake, you'd want to know about the mistake sooner
rather than later, no?

> 2. Suppose I have a copy of a repository such as
>    https://gerrit.googlesource.com/gerrit/, with all its submodules.
>    I am in the plugins/replication/ directory.
>
>    If I run "git rev-parse --show-superproject-working-tree", then it
>    will discover my gerrit repository, run ls-files in there to see if
>    it has GITLINK entries, and use the result to decide whether the
>    cwd is a submodule.  So for example, if I had run "git rm --cached
>    plugins/replication" to _prepare to_ remove the plugins/replication
>    submodule, then "git rev-parse --show-superproject-working-tree"
>    will produce the wrong result.

Yes, looking only at the index of the superproject will have that
problem, but don't other things in the superproject point at the
submodule, too, e.g. submodule.<name>.* configuration variables?

And then, after removing them to truly dissociate the submodule from
the superproject, "git rev-parse --show-superproject-working-tree"
may stop saying that it is a submodule, but this series wants to
make it irrelevant what the command says.  Until you unset the
configuration variable in the submodule, it will stay to be a
submodule of the superproject, but the superproject no longer thinks
it is responsible for the submodule.  You'll have to deal with an
inconsistent state during the transition either way, so I am not
sure it is the best solution to introduce an extra setting that can
easily go out of sync.

> 3. Suppose I am not using submodules at all.  I have a clone of
>    mawk.git and I am working there.
>
>    If I run "git rev-parse --show-superproject-working-tree", then I'm
>    presumably interested in doing something submodule-specific;
>    nothing wrong with that.  But the series we're responding to is
>    meant to support a wider variety of operations --- for example,
>    suppose I am running a plain "git status" operation.
>
>    If "git status" runs "git rev-parse
>    --show-superproject-working-tree", then git would walk up the
>    filesystem above my mawk/ directory, looking for another .git dir.
>    We can reach an NFS automounter directory and just hang.  Even
>    without an NFS automounter, we'd expect this to take a while
>    because, unlike normal repository discovery, we have no reason to
>    believe that the walk is going to quickly discover a .git directory
>    and terminate.  So this would violate user expectations.

It would be a problem, but I do not know if "this is a submodule of
that superproject" link is the only solution, let alone the most
effective one.  It seems to me that you are looking more for
something like GIT_CEILING_DIRECTORIES.
Jonathan Nieder Feb. 8, 2022, 1:18 a.m. UTC | #9
Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> Here's a few examples:
>>
>> 1. Suppose I track my $HOME directory as a git repository.  Within my
>>    home directory, I have a src/git/ subdirectory with a clone of
>>    git.git, but I never intended to treat this as a submodule.
>>
>>    If I run "git rev-parse --show-superproject-working-tree", then it
>>    will discover my home directory repository, run ls-files in there
>>    to see if it has GITLINK entries, and either see one for src/git if
>>    I had "git add"ed it by mistake or not see one.  In either case,
>>    it would it would view my src/git/ directory as being a submodule
>>    of my home directory even though I hadn't intended it to be so.
>
> I am not sure about this one.  If you added an unrelated one with
> "git add" by mistake, you'd want to know about the mistake sooner
> rather than later, no?

My point with this example is that it's useful to have a definition of
what is a submodule repository, to make it unambiguous whether this
repository is a submodule or whether it's just a repository that
happens to have been cloned inside of a git-managed worktree.

For the specific example of having run "git add", I don't have any
very strong opinions.

[...]
>> 2. Suppose I have a copy of a repository such as
>>    https://gerrit.googlesource.com/gerrit/, with all its submodules.
>>    I am in the plugins/replication/ directory.
[...]
>>                         So for example, if I had run "git rm --cached
>>    plugins/replication" to _prepare to_ remove the plugins/replication
>>    submodule, then "git rev-parse --show-superproject-working-tree"
>>    will produce the wrong result.
>
> Yes, looking only at the index of the superproject will have that
> problem, but don't other things in the superproject point at the
> submodule, too, e.g. submodule.<name>.* configuration variables?

What all of those suggested alternatives have in common is that they
are pointers from another repository to the submodule.

This would be the first time in git history that we are saying a
property of a repository depends on having to examine files outside of
it.  I guess the main question I'd have is, why _wouldn't_ I want a
submodule to be able to point to the superproject containing it?  I
can think of many advantages to having that linkage, and the main
disadvantage I can think of is that it is a change.

I don't think that submodule.<name>.* is an adequate substitute for
having this setting, because it requires
- finding the superproject
- mapping the <name> to a path, using .gitmodules
- comparing the path to the submodule location

which would be complex, slow, and error-prone.

The one thing that I think could approach being an adequate substitute
is examining the path to the current repository and stripping off path
components until we find modules/; then the parent is the containing
superproject.  That would only work for absorbed submodules, though,
and it would be less explicit than having a config item.

> And then, after removing them to truly dissociate the submodule from
> the superproject, "git rev-parse --show-superproject-working-tree"
> may stop saying that it is a submodule, but this series wants to
> make it irrelevant what the command says.  Until you unset the
> configuration variable in the submodule, it will stay to be a
> submodule of the superproject, but the superproject no longer thinks
> it is responsible for the submodule.  You'll have to deal with an
> inconsistent state during the transition either way, so I am not
> sure it is the best solution to introduce an extra setting that can
> easily go out of sync.

This hints at a reason why one wouldn't want the linkage back ---
dealing with the ambiguity of inconsistencies (what if a submodule
declares a superproject but the superproject does not declare the
submodule?).

I would not expect that ambiguity to be much of a problem,
because the typical way to use superproject linkage would be to
print output from commands like "git status": for example,

	This is a submodule of ../../gerrit; you can run

		git -C ../../gerrit status

	to get the status of the superproject.

An inconsistency could occur due to the user using "mv" (instead of
"git mv") to move a submodule to a path a different number of path
components from its superproject.  One way to handle that would be to
make submodules record a boolean setting reflecting whether they are a
submodule, instead of the path to the superproject.  (This would be
similar to settings like core.bare.)  Alternatively, if the path to
the superproject is recorded and if "git fsck" is able to notice such
an inconsistency, then the user should be able to have an okay
experience repairing it.

[...]
>>    If "git status" runs "git rev-parse
>>    --show-superproject-working-tree", then git would walk up the
>>    filesystem above my mawk/ directory, looking for another .git dir.
>>    We can reach an NFS automounter directory and just hang.  Even
>>    without an NFS automounter, we'd expect this to take a while
>>    because, unlike normal repository discovery, we have no reason to
>>    believe that the walk is going to quickly discover a .git directory
>>    and terminate.  So this would violate user expectations.
>
> It would be a problem, but I do not know if "this is a submodule of
> that superproject" link is the only solution, let alone the most
> effective one.  It seems to me that you are looking more for
> something like GIT_CEILING_DIRECTORIES.

Who is the "you" addressed here?  The end user can use
GIT_CEILING_DIRECTORIES if they are expecting to run git commands
within an NFS automounter directory and outside of any git repository,
but they'd be right to be surprised if that suddenly became required
when inside git repositories.  I don't think we should assume that
running an extra .git discovery walk is cost-free to users who are not
using submodules and an acceptable burden to impose on them for the
sake of submodule users.

Thanks and hope that helps,
Jonathan
Junio C Hamano Feb. 8, 2022, 6:24 p.m. UTC | #10
Jonathan Nieder <jrnieder@gmail.com> writes:

> My point with this example is that it's useful to have a definition of
> what is a submodule repository, to make it unambiguous whether this
> repository is a submodule or whether it's just a repository that
> happens to have been cloned inside of a git-managed worktree.

OK, together with the other "no need to let NFS automounter worry
about parent directories", it makes a very successful argument for a
single bit (i.e. this is a free-standing repository and is not a
submodule, so no need to auto-discover if it is one).  I think the
"Alternatively" you later mention to solve ambiguity with just a
single bit, without "this is a submodule of that superproject"
linkage, is essentially the same?

But I do not think it argues for the need to say "a config, not
filesystem layout, must be the single source of truth to say which
superproject this repository belongs as its submodule".

> This would be the first time in git history that we are saying a
> property of a repository depends on having to examine files outside of
> it.

Well, path-based configuration inclusion, with configuration driven
hooks, I do not think the distinction matters much anymore in these
days.

> I guess the main question I'd have is, why _wouldn't_ I want a
> submodule to be able to point to the superproject containing it?

Because with (the absense of) a single "this is freestanding" bit, 
by default the filesystem layout can already "point" at it?
Emily Shaffer Feb. 10, 2022, 10:12 p.m. UTC | #11
On Tue, Feb 08, 2022 at 10:24:49AM -0800, Junio C Hamano wrote:
> 
> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
> > My point with this example is that it's useful to have a definition of
> > what is a submodule repository, to make it unambiguous whether this
> > repository is a submodule or whether it's just a repository that
> > happens to have been cloned inside of a git-managed worktree.
> 
> OK, together with the other "no need to let NFS automounter worry
> about parent directories", it makes a very successful argument for a
> single bit (i.e. this is a free-standing repository and is not a
> submodule, so no need to auto-discover if it is one).  I think the
> "Alternatively" you later mention to solve ambiguity with just a
> single bit, without "this is a submodule of that superproject"
> linkage, is essentially the same?

That resolution - "teach submodules to know they're submodules, but not
whose submodule they are" - would still count as a success to me. The
reason I proposed a path instead of a boolean here was simply because
storing a path is a boolean (by whether it's present or not) *and*
additional information (the path to the superproject), and it seemed
silly to me to opt for less information. Or, to put it another way - "am
I a submodule?" seems pretty vital, and "yes, and I belong to xyz" is an
optimization on top of that. So I don't terribly mind sending this as
just a boolean, if we feel that the effort to keep it up outweighs the
benefit of saving us a filesystem walk.

I'm not completely convinced that it does, though - would the addition
of a 'git fsck' check for this config be satisfactory? In other words,
is the problem that the execution of this series wasn't thorough enough
and it should be refined, or that the concept itself is beyond saving?

 - Emily

> 
> But I do not think it argues for the need to say "a config, not
> filesystem layout, must be the single source of truth to say which
> superproject this repository belongs as its submodule".
> 
> > This would be the first time in git history that we are saying a
> > property of a repository depends on having to examine files outside of
> > it.
> 
> Well, path-based configuration inclusion, with configuration driven
> hooks, I do not think the distinction matters much anymore in these
> days.
> 
> > I guess the main question I'd have is, why _wouldn't_ I want a
> > submodule to be able to point to the superproject containing it?
> 
> Because with (the absense of) a single "this is freestanding" bit, 
> by default the filesystem layout can already "point" at it?
Jonathan Nieder Feb. 10, 2022, 10:53 p.m. UTC | #12
Emily Shaffer wrote:
> On Tue, Feb 08, 2022 at 10:24:49AM -0800, Junio C Hamano wrote:
>> Jonathan Nieder <jrnieder@gmail.com> writes:

>>> My point with this example is that it's useful to have a definition of
>>> what is a submodule repository, to make it unambiguous whether this
>>> repository is a submodule or whether it's just a repository that
>>> happens to have been cloned inside of a git-managed worktree.
>>
>> OK, together with the other "no need to let NFS automounter worry
>> about parent directories", it makes a very successful argument for a
>> single bit (i.e. this is a free-standing repository and is not a
>> submodule, so no need to auto-discover if it is one).  I think the
>> "Alternatively" you later mention to solve ambiguity with just a
>> single bit, without "this is a submodule of that superproject"
>> linkage, is essentially the same?
>
> That resolution - "teach submodules to know they're submodules, but not
> whose submodule they are" - would still count as a success to me.

Thanks, both.  Sounds like a good path forward.

[...]
>                              So I don't terribly mind sending this as
> just a boolean, if we feel that the effort to keep it up outweighs the
> benefit of saving us a filesystem walk.
>
> I'm not completely convinced that it does, though

Personally, I'm convinced --- e.g., life gets painful enough when
core.worktree ends up pointing to the wrong path, so being able to
avoid that complexity seems like a nice outcome.

>                                                  - would the addition
> of a 'git fsck' check for this config be satisfactory? In other words,
> is the problem that the execution of this series wasn't thorough enough
> and it should be refined, or that the concept itself is beyond saving?

For the absorbed case, the path to the superproject should be pretty
stable, and in that case it's probably possible to make this robust
enough.  (That said, the path to the superproject gitdir would
typically just be "../..", at least as long as we have the other patch
to escape slashes in the submodule name.)  In the non-absorbed case,
it seems likely to get messy fast because a user can "mv" the
submodule around.

Another kind of case that gets interesting is when there are multiple
superproject worktrees and multiple submodule worktrees.  Does the
relative path become a per-worktree variable?  Using a boolean saves
us from having to think through it.

Thanks for the clear explanations,
Jonathan
Ævar Arnfjörð Bjarmason Feb. 12, 2022, 8:35 p.m. UTC | #13
On Mon, Feb 07 2022, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Feb 03 2022, Emily Shaffer wrote:
>
>>> To be honest, I'm not all that interested in performance
>>> - I want the config added for correctness, instead.
>>
>> And I'm honestly still at the point of not even being against this whole
>> thing, although it probably sounds like that. I'm really not.
>>
>> I just genuinely don't get where this is headed. I.e. for the last
>> iteration I did a demo patch on top that showed that there was no case
>> added by the series where the on-the-fly discovery wasn't equivalent to
>> the set-in-config value[4].
>
> Here's a few examples:

I've read the downthread, but it's probably best to reply to this...

> 1. Suppose I track my $HOME directory as a git repository.  Within my
>    home directory, I have a src/git/ subdirectory with a clone of
>    git.git, but I never intended to treat this as a submodule.
>
>    If I run "git rev-parse --show-superproject-working-tree", then it
>    will discover my home directory repository, run ls-files in there
>    to see if it has GITLINK entries, and either see one for src/git if
>    I had "git add"ed it by mistake or not see one.  In either case,
>    it would it would view my src/git/ directory as being a submodule
>    of my home directory even though I hadn't intended it to be so.
>
> 2. Suppose I have a copy of a repository such as
>    https://gerrit.googlesource.com/gerrit/, with all its submodules.
>    I am in the plugins/replication/ directory.
>
>    If I run "git rev-parse --show-superproject-working-tree", then it
>    will discover my gerrit repository, run ls-files in there to see if
>    it has GITLINK entries, and use the result to decide whether the
>    cwd is a submodule.  So for example, if I had run "git rm --cached
>    plugins/replication" to _prepare to_ remove the plugins/replication
>    submodule, then "git rev-parse --show-superproject-working-tree"
>    will produce the wrong result.

These both seem like valid edge cases, but they're still going to be the
same edge case on the "parent" side even with a proposed cache (whether
it's a boolean or a path).

I.e. the question here is really not one of caching, but of what it
means for Y to be a submodule of X.

I assumed that we'd prefer a 1=1 relationship between the parent
reporting that Y is a submodule of it, and Y reporting that it is a
submodule (of the parent at some <path>).

If that's the case we can walk up and ask parent .git's whether they
think the <path> is their submodule.

If it's not the case perhaps a config is needed, but then that surely
has wider implications. I.e. won't it be the case that we can't add the
config after-the-fact as this series proposes in those some ambiguous
cases?xo

> 3. Suppose I am not using submodules at all.  I have a clone of
>    mawk.git and I am working there.
>
>    If I run "git rev-parse --show-superproject-working-tree", then I'm
>    presumably interested in doing something submodule-specific;
>    nothing wrong with that.  But the series we're responding to is
>    meant to support a wider variety of operations --- for example,
>    suppose I am running a plain "git status" operation.
>
>    If "git status" runs "git rev-parse
>    --show-superproject-working-tree", then git would walk up the
>    filesystem above my mawk/ directory, looking for another .git dir.
>    We can reach an NFS automounter directory and just hang.  Even
>    without an NFS automounter, we'd expect this to take a while
>    because, unlike normal repository discovery, we have no reason to
>    believe that the walk is going to quickly discover a .git directory
>    and terminate.  So this would violate user expectations.

We have a /a/b/c/d.git mounted, but not a parent /a/b/, and walking
upwards causes it to be mounted?
Junio C Hamano Feb. 13, 2022, 6:25 a.m. UTC | #14
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We have a /a/b/c/d.git mounted, but not a parent /a/b/, and walking
> upwards causes it to be mounted?

;-)