diff mbox series

[RFC,6/6] add: reject nested repositories

Message ID 20230213182134.2173280-7-calvinwan@google.com (mailing list archive)
State Superseded
Headers show
Series add: block invalid submodules | expand

Commit Message

Calvin Wan Feb. 13, 2023, 6:21 p.m. UTC
From: Josh Steadmon <steadmon@google.com>

As noted in 532139940c (add: warn when adding an embedded repository,
2017-06-14), adding embedded repositories results in subpar experience
compared to submodules, due to the lack of a corresponding .gitmodules
entry, which means later clones of the top-level repository cannot
locate the embedded repo. We expect that this situation is usually
unintentional, which is why 532139940c added a warning message and
advice when users attempt to add an embedded repo.

At $dayjob, we have found that even this advice is insufficient to stop
users from committing unclonable embedded repos in shared projects.
This causes toil for the owners of the top-level project repository as
they must clean up the resulting gitlinks. Additionally, these mistakes
are often made by partners outside of $dayjob, which means that a simple
organization-wide change to the default Git config would be insufficient
to prevent these mistakes.

Due to this experience, we believe that Git's default behavior should be
changed to disallow adding embedded repositories. This commit changes
the existing warning into a fatal error while preserving the
`--no-warn-embedded-repo` flag as a way to bypass this check.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 Documentation/git-add.txt          |  7 ++++---
 builtin/add.c                      | 28 ++++++++++++++++++++--------
 t/t7400-submodule-basic.sh         |  4 ++--
 t/t7412-submodule-absorbgitdirs.sh |  2 +-
 t/t7414-submodule-mistakes.sh      | 21 ++++++++++-----------
 t/t7450-bad-git-dotfiles.sh        |  2 +-
 6 files changed, 38 insertions(+), 26 deletions(-)

Comments

Jeff King Feb. 13, 2023, 8:42 p.m. UTC | #1
On Mon, Feb 13, 2023 at 06:21:34PM +0000, Calvin Wan wrote:

> As noted in 532139940c (add: warn when adding an embedded repository,
> 2017-06-14), adding embedded repositories results in subpar experience
> compared to submodules, due to the lack of a corresponding .gitmodules
> entry, which means later clones of the top-level repository cannot
> locate the embedded repo. We expect that this situation is usually
> unintentional, which is why 532139940c added a warning message and
> advice when users attempt to add an embedded repo.

As the author of 532139940c, this escalation to an error seems like a
reasonable step to me.

The patch looks pretty reasonable to me from a cursory read, but here a
few small comments:

> diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
> index a030d33c6e..b7fb95b061 100644
> --- a/Documentation/git-add.txt
> +++ b/Documentation/git-add.txt
> @@ -177,10 +177,11 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
>  	tree or not.
>  
>  --no-warn-embedded-repo::
> -	By default, `git add` will warn when adding an embedded
> +	By default, `git add` will error out when adding an embedded

The option name here is rather unfortunate, since it's no longer a
warning. But keeping it as-is for historical compatibility may be the
best option. In retrospect, I wish I'd called it --allow-embedded-repos
or something.

> diff --git a/builtin/add.c b/builtin/add.c
> index 76277df326..795d9251b9 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -421,36 +421,45 @@ static const char embedded_advice[] = N_(
>  "\n"
>  "	git rm --cached %s\n"
>  "\n"
> -"See \"git help submodule\" for more information."
> +"See \"git help submodule\" for more information.\n"
> +"\n"
> +"If you cannot use submodules, you may bypass this check with:\n"
> +"\n"
> +"	git add --no-warn-embedded-repo %s\n"
>  );

I was a little surprised by this hunk, but I guess if we are going to
block the user's operation from completing, we might want to tell them
how to get around it. But it seems odd to me that the instructions to
"git rm --cached" the submodule remain. If this situation is now an
error and not a warning, there is nothing to roll back from the index,
since we will have bailed before writing it.

If we are going to start recommending --no-warn-embedded-repo here,
would we want to promote it from being OPT_HIDDEN_BOOL()? We do document
it in the manpage, but just omit it from the "-h" output, since it
should be rarely used. Maybe it is OK to stay that way; you don't need
it until you run into this situation, at which point the advice
hopefully has guided you in the right direction.

> -static void check_embedded_repo(const char *path)
> +static int check_embedded_repo(const char *path)
>  {
> +	int ret = 0;
>  	struct strbuf name = STRBUF_INIT;
>  	static int adviced_on_embedded_repo = 0;
>  
>  	if (!warn_on_embedded_repo)
> -		return;
> +		goto cleanup;
>  	if (!ends_with(path, "/"))
> -		return;
> +		goto cleanup;
> +
> +	ret = 1;

I wondered about these "goto cleanup" calls here, since there is nothing
to clean up yet. But you are just piggy-backing on the "return ret",
rather than a separate "return 0" here.

And I was surprised by returning at all, since the point is to make this
a hard error. But it looks like the intent is to report an error for
every such case, and then a final die, like:

  $ git add .
  error: cannot add embedded git repository: foo
  advice: ...
  error: cannot add embedded git repository: bar
  fatal: refusing to add embedded git repositories

OK. I doubt anybody cares much either way (it is more convenient if you
have a lot of cases, at the expense of making the error more verbose
when there is only one case), so this is fine.

>  	/* Drop trailing slash for aesthetics */
>  	strbuf_addstr(&name, path);
>  	strbuf_strip_suffix(&name, "/");
>  
> -	warning(_("adding embedded git repository: %s"), name.buf);
> +	error(_("cannot add embedded git repository: %s"), name.buf);
>  	if (!adviced_on_embedded_repo &&
>  	    advice_enabled(ADVICE_ADD_EMBEDDED_REPO)) {
> -		advise(embedded_advice, name.buf, name.buf);
> +		advise(embedded_advice, name.buf, name.buf, name.buf);

This triple name.buf that must match the earlier string is horrible, of
course, but nothing new in your patch. If you drop the "rm --cached"
part from the advice, it can remain as a horrible double-mention of
name.buf. :)

> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index eae6a46ef3..e0bcecba6e 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -118,7 +118,7 @@ test_expect_success 'setup - repository in init subdirectory' '
>  test_expect_success 'setup - commit with gitlink' '
>  	echo a >a &&
>  	echo z >z &&
> -	git add a init z &&
> +	git add --no-warn-embedded-repo a init z &&
>  	git commit -m "super commit 1"
>  '
>  
> @@ -771,7 +771,7 @@ test_expect_success 'set up for relative path tests' '
>  			git init &&
>  			test_commit foo
>  		) &&
> -		git add sub &&
> +		git add --no-warn-embedded-repo sub &&
>  		git config -f .gitmodules submodule.sub.path sub &&
>  		git config -f .gitmodules submodule.sub.url ../subrepo &&
>  		cp .git/config pristine-.git-config &&

OK, these are cases that are warning now (because they are trying to do
something clever with the setup), but which will be blocked. Arguably
they should have --no-warn-embedded-repo already, but nobody cared
so far that their stderr was a little noisy.

I do wonder if there are any users who might do clever things like this
themselves, but they'd already have been nagged by the warning (and
hopefully discovered --no-warn-embedded-repo on their own).

-Peff
Junio C Hamano Feb. 14, 2023, 2:17 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

>>  "	git rm --cached %s\n"
>>  "\n"
>> -"See \"git help submodule\" for more information."
>> +"See \"git help submodule\" for more information.\n"
>> +"\n"
>> +"If you cannot use submodules, you may bypass this check with:\n"
>> +"\n"
>> +"	git add --no-warn-embedded-repo %s\n"
>>  );
>
> I was a little surprised by this hunk, but I guess if we are going to
> block the user's operation from completing, we might want to tell them
> how to get around it. But it seems odd to me that the instructions to
> "git rm --cached" the submodule remain. If this situation is now an
> error and not a warning, there is nothing to roll back from the index,
> since we will have bailed before writing it.
>
> If we are going to start recommending --no-warn-embedded-repo here,
> would we want to promote it from being OPT_HIDDEN_BOOL()? We do document
> it in the manpage, but just omit it from the "-h" output, since it
> should be rarely used. Maybe it is OK to stay that way; you don't need
> it until you run into this situation, at which point the advice
> hopefully has guided you in the right direction.

If we are keeping the escape hatch, it would make sense to actually
use that escape hatch to protect existing "git add" with that,
instead of turning them into "git submodule add" and then adjust the
tests for the consequences (i.e. "submodule add" does more than what
"git add [--no-warn-embedded-repo]" would), at least for these tests
in [3,4,5/6].  

Also I do not think it is too late for a more natural UI, e.g.
"--allow-embedded-repo=[yes/no/warn]", to deprecate the
"--[no-]warn-*" option.
Jeff King Feb. 14, 2023, 4:07 p.m. UTC | #3
On Mon, Feb 13, 2023 at 06:17:53PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >>  "	git rm --cached %s\n"
> >>  "\n"
> >> -"See \"git help submodule\" for more information."
> >> +"See \"git help submodule\" for more information.\n"
> >> +"\n"
> >> +"If you cannot use submodules, you may bypass this check with:\n"
> >> +"\n"
> >> +"	git add --no-warn-embedded-repo %s\n"
> >>  );
> >
> > I was a little surprised by this hunk, but I guess if we are going to
> > block the user's operation from completing, we might want to tell them
> > how to get around it. But it seems odd to me that the instructions to
> > "git rm --cached" the submodule remain. If this situation is now an
> > error and not a warning, there is nothing to roll back from the index,
> > since we will have bailed before writing it.
> >
> > If we are going to start recommending --no-warn-embedded-repo here,
> > would we want to promote it from being OPT_HIDDEN_BOOL()? We do document
> > it in the manpage, but just omit it from the "-h" output, since it
> > should be rarely used. Maybe it is OK to stay that way; you don't need
> > it until you run into this situation, at which point the advice
> > hopefully has guided you in the right direction.
> 
> If we are keeping the escape hatch, it would make sense to actually
> use that escape hatch to protect existing "git add" with that,
> instead of turning them into "git submodule add" and then adjust the
> tests for the consequences (i.e. "submodule add" does more than what
> "git add [--no-warn-embedded-repo]" would), at least for these tests
> in [3,4,5/6].

Good point. I did not really look at the test modifications, but
anywhere that is triggering the current warning is arguably a good spot
to be using --no-warn-embedded-repo already. It is simply that the test
did not bother to look at their noisy stderr. And such a modification is
obviously correct, as there are no further implications for the test.

> Also I do not think it is too late for a more natural UI, e.g.
> "--allow-embedded-repo=[yes/no/warn]", to deprecate the
> "--[no-]warn-*" option.

True. We have to keep the existing form for backwards compatibility, but
we can certainly add a new one.

I kind of doubt that --allow-embedded-repo=warn is useful, though. If a
caller knows what it is doing is OK, then it would say "yes". And
otherwise, you'd want "no". There is no situation where a caller is
unsure.

-Peff
Junio C Hamano Feb. 14, 2023, 4:32 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

>> If we are keeping the escape hatch, it would make sense to actually
>> use that escape hatch to protect existing "git add" with that,
>> instead of turning them into "git submodule add" and then adjust the
>> tests for the consequences (i.e. "submodule add" does more than what
>> "git add [--no-warn-embedded-repo]" would), at least for these tests
>> in [3,4,5/6].
>
> Good point. I did not really look at the test modifications, but
> anywhere that is triggering the current warning is arguably a good spot
> to be using --no-warn-embedded-repo already. It is simply that the test
> did not bother to look at their noisy stderr. And such a modification is
> obviously correct, as there are no further implications for the test.

I did not mean that no "git add" that create a gitlink in existing
tests should be made into "git submodule add".  The ones that
clearly wanted to set up tests to see what happens in a top-level
with a subproject may become more realistic tests by switching to
"git submodule add" and updating the expected "git diff HEAD" output
to include a newly created .gitmodules file.  But some of the tests
are merely to see what happens with an index with a gitlink in it,
and "add --no-warn" would be more appropriate for them.

>> Also I do not think it is too late for a more natural UI, e.g.
>> "--allow-embedded-repo=[yes/no/warn]", to deprecate the
>> "--[no-]warn-*" option.
>
> True. We have to keep the existing form for backwards compatibility, but
> we can certainly add a new one.
>
> I kind of doubt that --allow-embedded-repo=warn is useful, though. If a
> caller knows what it is doing is OK, then it would say "yes". And
> otherwise, you'd want "no". There is no situation where a caller is
> unsure.

Yeah, if the default becomes "no", then there isn't much point,
other than just for completeness, to have "warn" as a choice.

Thanks.
Calvin Wan Feb. 14, 2023, 9:45 p.m. UTC | #5
On Tue, Feb 14, 2023 at 8:32 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> >> If we are keeping the escape hatch, it would make sense to actually
> >> use that escape hatch to protect existing "git add" with that,
> >> instead of turning them into "git submodule add" and then adjust the
> >> tests for the consequences (i.e. "submodule add" does more than what
> >> "git add [--no-warn-embedded-repo]" would), at least for these tests
> >> in [3,4,5/6].
> >
> > Good point. I did not really look at the test modifications, but
> > anywhere that is triggering the current warning is arguably a good spot
> > to be using --no-warn-embedded-repo already. It is simply that the test
> > did not bother to look at their noisy stderr. And such a modification is
> > obviously correct, as there are no further implications for the test.
>
> I did not mean that no "git add" that create a gitlink in existing
> tests should be made into "git submodule add".  The ones that
> clearly wanted to set up tests to see what happens in a top-level
> with a subproject may become more realistic tests by switching to
> "git submodule add" and updating the expected "git diff HEAD" output
> to include a newly created .gitmodules file.  But some of the tests
> are merely to see what happens with an index with a gitlink in it,
> and "add --no-warn" would be more appropriate for them.

I'll take another pass into the modified tests from previous patches
and pick out ones that are not specifically submodule related tests.

> >> Also I do not think it is too late for a more natural UI, e.g.
> >> "--allow-embedded-repo=[yes/no/warn]", to deprecate the
> >> "--[no-]warn-*" option.
> >
> > True. We have to keep the existing form for backwards compatibility, but
> > we can certainly add a new one.
> >
> > I kind of doubt that --allow-embedded-repo=warn is useful, though. If a
> > caller knows what it is doing is OK, then it would say "yes". And
> > otherwise, you'd want "no". There is no situation where a caller is
> > unsure.
>
> Yeah, if the default becomes "no", then there isn't much point,
> other than just for completeness, to have "warn" as a choice.

I don't see a point for "warn" as well. The default "no" case should
carry over part of the deprecated warning from before.
diff mbox series

Patch

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index a030d33c6e..b7fb95b061 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -177,10 +177,11 @@  for "git add --no-all <pathspec>...", i.e. ignored removed files.
 	tree or not.
 
 --no-warn-embedded-repo::
-	By default, `git add` will warn when adding an embedded
+	By default, `git add` will error out when adding an embedded
 	repository to the index without using `git submodule add` to
-	create an entry in `.gitmodules`. This option will suppress the
-	warning (e.g., if you are manually performing operations on
+	create an entry in `.gitmodules`. This option will allow the
+	embedded repository to be added and suppress the error.
+	(e.g., if you are manually performing operations on
 	submodules).
 
 --renormalize::
diff --git a/builtin/add.c b/builtin/add.c
index 76277df326..795d9251b9 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -421,36 +421,45 @@  static const char embedded_advice[] = N_(
 "\n"
 "	git rm --cached %s\n"
 "\n"
-"See \"git help submodule\" for more information."
+"See \"git help submodule\" for more information.\n"
+"\n"
+"If you cannot use submodules, you may bypass this check with:\n"
+"\n"
+"	git add --no-warn-embedded-repo %s\n"
 );
 
-static void check_embedded_repo(const char *path)
+static int check_embedded_repo(const char *path)
 {
+	int ret = 0;
 	struct strbuf name = STRBUF_INIT;
 	static int adviced_on_embedded_repo = 0;
 
 	if (!warn_on_embedded_repo)
-		return;
+		goto cleanup;
 	if (!ends_with(path, "/"))
-		return;
+		goto cleanup;
+
+	ret = 1;
 
 	/* Drop trailing slash for aesthetics */
 	strbuf_addstr(&name, path);
 	strbuf_strip_suffix(&name, "/");
 
-	warning(_("adding embedded git repository: %s"), name.buf);
+	error(_("cannot add embedded git repository: %s"), name.buf);
 	if (!adviced_on_embedded_repo &&
 	    advice_enabled(ADVICE_ADD_EMBEDDED_REPO)) {
-		advise(embedded_advice, name.buf, name.buf);
+		advise(embedded_advice, name.buf, name.buf, name.buf);
 		adviced_on_embedded_repo = 1;
 	}
 
+cleanup:
 	strbuf_release(&name);
+	return ret;
 }
 
 static int add_files(struct dir_struct *dir, int flags)
 {
-	int i, exit_status = 0;
+	int i, exit_status = 0, embedded_repo = 0;
 	struct string_list matched_sparse_paths = STRING_LIST_INIT_NODUP;
 
 	if (dir->ignored_nr) {
@@ -476,10 +485,13 @@  static int add_files(struct dir_struct *dir, int flags)
 				die(_("adding files failed"));
 			exit_status = 1;
 		} else {
-			check_embedded_repo(dir->entries[i]->name);
+			embedded_repo |= check_embedded_repo(dir->entries[i]->name);
 		}
 	}
 
+	if (embedded_repo)
+		die(_("refusing to add embedded git repositories"));
+
 	if (matched_sparse_paths.nr) {
 		advise_on_updating_sparse_paths(&matched_sparse_paths);
 		exit_status = 1;
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index eae6a46ef3..e0bcecba6e 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -118,7 +118,7 @@  test_expect_success 'setup - repository in init subdirectory' '
 test_expect_success 'setup - commit with gitlink' '
 	echo a >a &&
 	echo z >z &&
-	git add a init z &&
+	git add --no-warn-embedded-repo a init z &&
 	git commit -m "super commit 1"
 '
 
@@ -771,7 +771,7 @@  test_expect_success 'set up for relative path tests' '
 			git init &&
 			test_commit foo
 		) &&
-		git add sub &&
+		git add --no-warn-embedded-repo sub &&
 		git config -f .gitmodules submodule.sub.path sub &&
 		git config -f .gitmodules submodule.sub.url ../subrepo &&
 		cp .git/config pristine-.git-config &&
diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
index 2859695c6d..365953b558 100755
--- a/t/t7412-submodule-absorbgitdirs.sh
+++ b/t/t7412-submodule-absorbgitdirs.sh
@@ -100,7 +100,7 @@  test_expect_success 'absorb the git dir in a nested submodule' '
 test_expect_success 'setup a gitlink with missing .gitmodules entry' '
 	git init sub2 &&
 	test_commit -C sub2 first &&
-	git add sub2 &&
+	git add --no-warn-embedded-repo sub2 &&
 	git commit -m superproject
 '
 
diff --git a/t/t7414-submodule-mistakes.sh b/t/t7414-submodule-mistakes.sh
index 101afff30f..dbdcbe909d 100755
--- a/t/t7414-submodule-mistakes.sh
+++ b/t/t7414-submodule-mistakes.sh
@@ -10,31 +10,30 @@  test_expect_success 'create embedded repository' '
 	test_commit -C embed one
 '
 
-test_expect_success 'git-add on embedded repository warns' '
-	test_when_finished "git rm --cached -f embed" &&
-	git add embed 2>stderr &&
-	test_i18ngrep warning stderr
+test_expect_success 'git-add on embedded repository dies' '
+	test_must_fail git add embed 2>stderr &&
+	test_i18ngrep fatal stderr
 '
 
-test_expect_success '--no-warn-embedded-repo suppresses warning' '
+test_expect_success '--no-warn-embedded-repo suppresses error message' '
 	test_when_finished "git rm --cached -f embed" &&
 	git add --no-warn-embedded-repo embed 2>stderr &&
-	test_i18ngrep ! warning stderr
+	test_i18ngrep ! fatal stderr
 '
 
-test_expect_success 'no warning when updating entry' '
+test_expect_success 'no error message when updating entry' '
 	test_when_finished "git rm --cached -f embed" &&
-	git add embed &&
+	git add --no-warn-embedded-repo embed &&
 	git -C embed commit --allow-empty -m two &&
 	git add embed 2>stderr &&
-	test_i18ngrep ! warning stderr
+	test_i18ngrep ! fatal stderr
 '
 
-test_expect_success 'submodule add does not warn' '
+test_expect_success 'submodule add does not issue error message' '
 	test_when_finished "git rm -rf submodule .gitmodules" &&
 	git -c protocol.file.allow=always \
 		submodule add ./embed submodule 2>stderr &&
-	test_i18ngrep ! warning stderr
+	test_i18ngrep ! fatal stderr
 '
 
 test_done
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index ba1f569bcb..6fbcf36ae9 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -307,7 +307,7 @@  test_expect_success 'git dirs of sibling submodules must not be nested' '
 		EOF
 		git clone . thing1 &&
 		git clone . thing2 &&
-		git add .gitmodules thing1 thing2 &&
+		git add --no-warn-embedded-repo .gitmodules thing1 thing2 &&
 		test_tick &&
 		git commit -m nested
 	) &&