diff mbox series

[v2,2/4] introduce submodule.superprojectGitDir cache

Message ID 20210616004508.87186-3-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series cache parent project's gitdir in submodules | expand

Commit Message

Emily Shaffer June 16, 2021, 12:45 a.m. UTC
Teach submodules a reference to their superproject's gitdir. This allows
us to A) know that we're running from a submodule, and B) have a
shortcut to the superproject's vitals, for example, configs.

By using a relative path instead of an absolute path, we can move the
superproject directory around on the filesystem without breaking the
submodule's cache.

Since this cached value is only introduced during new submodule creation
via `git submodule add`, though, there is more work to do to allow the
cache to be created at other times.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Documentation/config/submodule.txt | 12 +++++++++
 builtin/submodule--helper.c        |  4 +++
 t/t7400-submodule-basic.sh         | 40 ++++++++++++++++--------------
 3 files changed, 38 insertions(+), 18 deletions(-)

Comments

Junio C Hamano June 16, 2021, 4:40 a.m. UTC | #1
Emily Shaffer <emilyshaffer@google.com> writes:

> Teach submodules a reference to their superproject's gitdir. This allows
> us to A) know that we're running from a submodule, and B) have a
> shortcut to the superproject's vitals, for example, configs.
>
> By using a relative path instead of an absolute path, we can move the
> superproject directory around on the filesystem without breaking the
> submodule's cache.
>
> Since this cached value is only introduced during new submodule creation
> via `git submodule add`, though, there is more work to do to allow the
> cache to be created at other times.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  Documentation/config/submodule.txt | 12 +++++++++
>  builtin/submodule--helper.c        |  4 +++
>  t/t7400-submodule-basic.sh         | 40 ++++++++++++++++--------------
>  3 files changed, 38 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
> index d7a63c8c12..7c459cc19e 100644
> --- a/Documentation/config/submodule.txt
> +++ b/Documentation/config/submodule.txt
> @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy::
>  	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
>  	or `info`, and if there is an error with the computed alternate, the
>  	clone proceeds as if no alternate was specified.
> +
> +submodule.superprojectGitDir::
> +	The relative path from the submodule's worktree  to the superproject's
> +	gitdir. This config should only be present in projects which are
> +	submodules, but is not guaranteed to be present in every submodule. It
> +	is set automatically during submodule creation.
> ++
> +	In situations where more than one superproject references the same
> +	submodule worktree, the value of this config and the behavior of
> +	operations which use it are undefined. To reference a single project
> +	from multiple superprojects, it is better to create a worktree of the
> +	submodule for each superproject.

You'd need to dedent the second paragraph that follows a lone '+'
sign to typeset this correctly.

The new paragraph suggests separate worktrees for the same submodule
repository, but for that to work correctly,

 - "git clone [--recurse-submodules]" that clones the second
   superproject that shares the same submodule with a superproject
   that we already locally has to support a way for users to tell
   where to grab that existing submodule from and arrange a new
   worktree, instead of creating another instance of the submodule
   repository by cloning it afresh.

 - The "submodule.superprojectGitDir" needs to be set to
   per-worktree half of the repo-local configuration file.

Because I usually do not pay much attention to the submodule part of
the toolset, I may well be mistaken, but I suspect that the former
does not exist yet.  If I recall correctly, the latter was a NEEDSWORK
item in the previous round of this patchset?

As I said, I think it is OK for now to stop at declaring that you
cannot simply do it without hinting something that may not fully
work.

Thanks.
Junio C Hamano June 16, 2021, 4:43 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> As I said, I think it is OK for now to stop at declaring that you
> cannot simply do it without hinting something that may not fully
> work.

I'll add the following to the tip of the topic for now.


 Documentation/config/submodule.txt | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git c/Documentation/config/submodule.txt w/Documentation/config/submodule.txt
index 7c459cc19e..58ba63a75e 100644
--- c/Documentation/config/submodule.txt
+++ w/Documentation/config/submodule.txt
@@ -97,8 +97,5 @@ submodule.superprojectGitDir::
 	submodules, but is not guaranteed to be present in every submodule. It
 	is set automatically during submodule creation.
 +
-	In situations where more than one superproject references the same
-	submodule worktree, the value of this config and the behavior of
-	operations which use it are undefined. To reference a single project
-	from multiple superprojects, it is better to create a worktree of the
-	submodule for each superproject.
+Because of this configuration variable, it is forbidden to use the
+same submodule worktree shared by multiple superprojects.
Emily Shaffer June 18, 2021, midnight UTC | #3
On Wed, Jun 16, 2021 at 01:40:36PM +0900, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> > +submodule.superprojectGitDir::
> > +	The relative path from the submodule's worktree  to the superproject's
> > +	gitdir. This config should only be present in projects which are
> > +	submodules, but is not guaranteed to be present in every submodule. It
> > +	is set automatically during submodule creation.
> > ++
> > +	In situations where more than one superproject references the same
> > +	submodule worktree, the value of this config and the behavior of
> > +	operations which use it are undefined. To reference a single project
> > +	from multiple superprojects, it is better to create a worktree of the
> > +	submodule for each superproject.
> 
> You'd need to dedent the second paragraph that follows a lone '+'
> sign to typeset this correctly.

Ok.

> 
> The new paragraph suggests separate worktrees for the same submodule
> repository, but for that to work correctly,
> 
>  - "git clone [--recurse-submodules]" that clones the second
>    superproject that shares the same submodule with a superproject
>    that we already locally has to support a way for users to tell
>    where to grab that existing submodule from and arrange a new
>    worktree, instead of creating another instance of the submodule
>    repository by cloning it afresh.
> 
>  - The "submodule.superprojectGitDir" needs to be set to
>    per-worktree half of the repo-local configuration file.
> 
> Because I usually do not pay much attention to the submodule part of
> the toolset, I may well be mistaken, but I suspect that the former
> does not exist yet.  If I recall correctly, the latter was a NEEDSWORK
> item in the previous round of this patchset?
> 
> As I said, I think it is OK for now to stop at declaring that you
> cannot simply do it without hinting something that may not fully
> work.

Yeah, that is all correct. Ok, I will drop the broken suggestion.

 - Emily
Emily Shaffer June 18, 2021, 12:03 a.m. UTC | #4
On Wed, Jun 16, 2021 at 01:43:36PM +0900, Junio C Hamano wrote:
> 
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > As I said, I think it is OK for now to stop at declaring that you
> > cannot simply do it without hinting something that may not fully
> > work.
> 
> I'll add the following to the tip of the topic for now.

Just saw this - yes, it looks fine to me. I'll squash that locally in
case anybody has more comments and wants a reroll.

> 
> 
>  Documentation/config/submodule.txt | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git c/Documentation/config/submodule.txt w/Documentation/config/submodule.txt
> index 7c459cc19e..58ba63a75e 100644
> --- c/Documentation/config/submodule.txt
> +++ w/Documentation/config/submodule.txt
> @@ -97,8 +97,5 @@ submodule.superprojectGitDir::
>  	submodules, but is not guaranteed to be present in every submodule. It
>  	is set automatically during submodule creation.
>  +
> -	In situations where more than one superproject references the same
> -	submodule worktree, the value of this config and the behavior of
> -	operations which use it are undefined. To reference a single project
> -	from multiple superprojects, it is better to create a worktree of the
> -	submodule for each superproject.
> +Because of this configuration variable, it is forbidden to use the
> +same submodule worktree shared by multiple superprojects.
Jonathan Tan July 27, 2021, 5:46 p.m. UTC | #5
> Teach submodules a reference to their superproject's gitdir. This allows
> us to A) know that we're running from a submodule, and B) have a
> shortcut to the superproject's vitals, for example, configs.

The first sentence is probably better "Introduce a new config variable
storing a submodule's reference to its superproject's gitdir, and teach
'git submodule add' to write it".

Also, I think there should be more detail about the planned use both
here in the commit message and in the config documentation. Is the plan
just to use it for best-effort explanatory messages? (Using it as a true
cache is probably too performance-intensive, I would think, since in its
absence, we have no idea whether the repo is a submodule and would
always have to search to the root of the filesystem.) If it is just for
best-effort explanatory messages, maybe write:

  If present, commands like "git status" in this submodule may print
  additional information about this submodule's status with respect to
  its superproject.

This would further reinforce that this variable being missing is OK.

> diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
> index d7a63c8c12..7c459cc19e 100644
> --- a/Documentation/config/submodule.txt
> +++ b/Documentation/config/submodule.txt
> @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy::
>  	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
>  	or `info`, and if there is an error with the computed alternate, the
>  	clone proceeds as if no alternate was specified.
> +
> +submodule.superprojectGitDir::
> +	The relative path from the submodule's worktree  to the superproject's
> +	gitdir. This config should only be present in projects which are
> +	submodules, but is not guaranteed to be present in every submodule. It
> +	is set automatically during submodule creation.
> ++
> +	In situations where more than one superproject references the same
> +	submodule worktree, the value of this config and the behavior of
> +	operations which use it are undefined. To reference a single project
> +	from multiple superprojects, it is better to create a worktree of the
> +	submodule for each superproject.

So my suggestion would be:

  The relative path from this repository's worktree to its
  superproject's gitdir. When Git is run in a repository, it usually
  makes no difference whether this repository is standalone or a
  submodule, but if this configuration variable is present, commands
  like "git status" in this submodule may print additional information
  about this submodule's status with respect to its superproject.

(I believe Junio has commented on the second paragraph - I don't have
additional comments on that.)
Emily Shaffer Aug. 19, 2021, 5:53 p.m. UTC | #6
On Tue, Jul 27, 2021 at 10:46:50AM -0700, Jonathan Tan wrote:
> 
> > Teach submodules a reference to their superproject's gitdir. This allows
> > us to A) know that we're running from a submodule, and B) have a
> > shortcut to the superproject's vitals, for example, configs.
> 
> The first sentence is probably better "Introduce a new config variable
> storing a submodule's reference to its superproject's gitdir, and teach
> 'git submodule add' to write it".
> 
> Also, I think there should be more detail about the planned use both
> here in the commit message and in the config documentation. Is the plan
> just to use it for best-effort explanatory messages? (Using it as a true
> cache is probably too performance-intensive, I would think, since in its
> absence, we have no idea whether the repo is a submodule and would
> always have to search to the root of the filesystem.) If it is just for
> best-effort explanatory messages, maybe write:
> 
>   If present, commands like "git status" in this submodule may print
>   additional information about this submodule's status with respect to
>   its superproject.
> 
> This would further reinforce that this variable being missing is OK.

Ok, I'll expand the commit message. The first use case for this extra
pointer is for a shared config between superproject and submodule, which
I've sent a series for already; I'll mention that in the commit message
too.

> 
> > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
> > index d7a63c8c12..7c459cc19e 100644
> > --- a/Documentation/config/submodule.txt
> > +++ b/Documentation/config/submodule.txt
> > @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy::
> >  	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
> >  	or `info`, and if there is an error with the computed alternate, the
> >  	clone proceeds as if no alternate was specified.
> > +
> > +submodule.superprojectGitDir::
> > +	The relative path from the submodule's worktree  to the superproject's
> > +	gitdir. This config should only be present in projects which are
> > +	submodules, but is not guaranteed to be present in every submodule. It
> > +	is set automatically during submodule creation.
> > ++
> > +	In situations where more than one superproject references the same
> > +	submodule worktree, the value of this config and the behavior of
> > +	operations which use it are undefined. To reference a single project
> > +	from multiple superprojects, it is better to create a worktree of the
> > +	submodule for each superproject.
> 
> So my suggestion would be:
> 
>   The relative path from this repository's worktree to its
>   superproject's gitdir. When Git is run in a repository, it usually
>   makes no difference whether this repository is standalone or a
>   submodule, but if this configuration variable is present, commands
>   like "git status" in this submodule may print additional information
>   about this submodule's status with respect to its superproject.
> 
> (I believe Junio has commented on the second paragraph - I don't have
> additional comments on that.)

The spirit of this suggestion seems to be "describe a possible side
effect of this config", so I'll do that, although the wording may not be
exactly the same. Thanks.

 - Emily
Ævar Arnfjörð Bjarmason Oct. 14, 2021, 7:25 p.m. UTC | #7
On Tue, Jun 15 2021, Emily Shaffer wrote:

> Teach submodules a reference to their superproject's gitdir. This allows
> us to A) know that we're running from a submodule, and B) have a
> shortcut to the superproject's vitals, for example, configs.
>
> By using a relative path instead of an absolute path, we can move the
> superproject directory around on the filesystem without breaking the
> submodule's cache.
>
> Since this cached value is only introduced during new submodule creation
> via `git submodule add`, though, there is more work to do to allow the
> cache to be created at other times.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  Documentation/config/submodule.txt | 12 +++++++++
>  builtin/submodule--helper.c        |  4 +++
>  t/t7400-submodule-basic.sh         | 40 ++++++++++++++++--------------
>  3 files changed, 38 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
> index d7a63c8c12..7c459cc19e 100644
> --- a/Documentation/config/submodule.txt
> +++ b/Documentation/config/submodule.txt
> @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy::
>  	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
>  	or `info`, and if there is an error with the computed alternate, the
>  	clone proceeds as if no alternate was specified.
> +
> +submodule.superprojectGitDir::
> +	The relative path from the submodule's worktree  to the superproject's
> +	gitdir. This config should only be present in projects which are
> +	submodules, but is not guaranteed to be present in every submodule. It
> +	is set automatically during submodule creation.
> ++
> +	In situations where more than one superproject references the same
> +	submodule worktree, the value of this config and the behavior of
> +	operations which use it are undefined. To reference a single project
> +	from multiple superprojects, it is better to create a worktree of the
> +	submodule for each superproject.
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index d55f6262e9..d60fcd2c7d 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1910,6 +1910,10 @@ static int module_clone(int argc, const char **argv, const char *prefix)
>  		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
>  					   error_strategy);
>  
> +	git_config_set_in_file(p, "submodule.superprojectGitdir",
> +			       relative_path(absolute_path(get_git_dir()),
> +					     path, &sb));
> +
>  	free(sm_alternate);
>  	free(error_strategy);
>  

Am I correct that what this series really does is avoid needing to:

1. Run the equivalent of $(git rev-parse --absolute-git-dir), let's call
   the result of that $X.
2. Feed that to the equvialent of $(git -C $X/../ rev-parse
   --absolute-git-dir)
3. Check if the relationship between the two is really that of a
   submodule, i.e. running "git submodule status", check if $X contains
   ".git/modules/" etc.

I see an earlier iteration of this series had such a shell-out, and that
this is the "cache":
https://lore.kernel.org/git/20210423001539.4059524-5-emilyshaffer@google.com/;
and your v1 cover letter seems to support the above summary:
https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer@google.com/

I think it's fine to fine to add such a cache in principle if it's needed.

But I'm a bit iffy on a series that's structured in a way as to not
start by asserting that we should have given semantics without the
cache, and then adds the cache later as an optional optimization.

Particularly as the whole submodule business is moving to C, so isn't
this whole caching business something we can avoid doing, and instead
just call a C function? The optimization part of this was not calling
"git rev-parse" on every submodule invocation wasn't it, not avoiding a
few syscalls that deal with the FS.

Your initial RFC had modifications to git-submodule.sh, in the interim
it seems that's been moved sufficiently to C that we're modifying just
the submodule.c here.

I'm not very familiar with setup_git_directory_gently_1(),
discover_git_directory() etc, but wherever you are in a git worktree
we'll chdir() to the top-level when running built-ins.

So that gives us step #1 of the above. And #2 would be adding "/../" to
the end of that path and running those functions again? Perhaps with a
#3 for "is there a submodule relationship?".

Even if we still have some bits in shellscript etc, couldn't we then
setenv() that in some GIT_* variable, e.g. GIT_SUPERPROJECT_DIR?

Or is the problem really that this isn't a cache, because we can't say
with absolute certainty that there is such a gitdir/submodule
relationship, except at the time of running "git submodule" in the
former for some reason?
diff mbox series

Patch

diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
index d7a63c8c12..7c459cc19e 100644
--- a/Documentation/config/submodule.txt
+++ b/Documentation/config/submodule.txt
@@ -90,3 +90,15 @@  submodule.alternateErrorStrategy::
 	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
 	or `info`, and if there is an error with the computed alternate, the
 	clone proceeds as if no alternate was specified.
+
+submodule.superprojectGitDir::
+	The relative path from the submodule's worktree  to the superproject's
+	gitdir. This config should only be present in projects which are
+	submodules, but is not guaranteed to be present in every submodule. It
+	is set automatically during submodule creation.
++
+	In situations where more than one superproject references the same
+	submodule worktree, the value of this config and the behavior of
+	operations which use it are undefined. To reference a single project
+	from multiple superprojects, it is better to create a worktree of the
+	submodule for each superproject.
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d55f6262e9..d60fcd2c7d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1910,6 +1910,10 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
 					   error_strategy);
 
+	git_config_set_in_file(p, "submodule.superprojectGitdir",
+			       relative_path(absolute_path(get_git_dir()),
+					     path, &sb));
+
 	free(sm_alternate);
 	free(error_strategy);
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index f5dc051a6e..e45f42588f 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -108,14 +108,18 @@  test_expect_success 'setup - repository to add submodules to' '
 submodurl=$(pwd -P)
 
 inspect() {
-	dir=$1 &&
-
-	git -C "$dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
-	{ git -C "$dir" symbolic-ref HEAD || :; } >head &&
-	git -C "$dir" rev-parse HEAD >head-sha1 &&
-	git -C "$dir" update-index --refresh &&
-	git -C "$dir" diff-files --exit-code &&
-	git -C "$dir" clean -n -d -x >untracked
+	sub_dir=$1 &&
+	super_dir=$2 &&
+
+	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
+	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
+	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
+	git -C "$sub_dir" update-index --refresh &&
+	git -C "$sub_dir" diff-files --exit-code &&
+	cached_super_dir="$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" &&
+	[ "$(git -C "$super_dir" rev-parse --absolute-git-dir)" \
+		-ef "$sub_dir/$cached_super_dir" ] &&
+	git -C "$sub_dir" clean -n -d -x >untracked
 }
 
 
@@ -139,7 +143,7 @@  test_expect_success 'submodule add' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod &&
+	inspect addtest/submod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -230,7 +234,7 @@  test_expect_success 'submodule add --branch' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod-branch &&
+	inspect addtest/submod-branch addtest &&
 	test_cmp expect-heads heads &&
 	test_cmp expect-head head &&
 	test_must_be_empty untracked
@@ -246,7 +250,7 @@  test_expect_success 'submodule add with ./ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotsubmod/frotz &&
+	inspect addtest/dotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -262,7 +266,7 @@  test_expect_success 'submodule add with /././ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotslashdotsubmod/frotz &&
+	inspect addtest/dotslashdotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -278,7 +282,7 @@  test_expect_success 'submodule add with // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/slashslashsubmod/frotz &&
+	inspect addtest/slashslashsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -294,7 +298,7 @@  test_expect_success 'submodule add with /.. in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod &&
+	inspect addtest/realsubmod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -310,7 +314,7 @@  test_expect_success 'submodule add with ./, /.. and // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod2 &&
+	inspect addtest/realsubmod2 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -341,7 +345,7 @@  test_expect_success 'submodule add in subdirectory' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod3 &&
+	inspect addtest/realsubmod3 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -482,7 +486,7 @@  test_expect_success 'update should work when path is an empty dir' '
 	git submodule update -q >update.out &&
 	test_must_be_empty update.out &&
 
-	inspect init &&
+	inspect init . &&
 	test_cmp expect head-sha1
 '
 
@@ -541,7 +545,7 @@  test_expect_success 'update should checkout rev1' '
 	echo "$rev1" >expect &&
 
 	git submodule update init &&
-	inspect init &&
+	inspect init . &&
 
 	test_cmp expect head-sha1
 '