diff mbox series

[v5,9/9] submodule: support reading .gitmodules when it's not in the working tree

Message ID 20180917140940.3839-10-ao2@ao2.it (mailing list archive)
State Superseded
Headers show
Series Make submodules work if .gitmodules is not checked out | expand

Commit Message

Antonio Ospite Sept. 17, 2018, 2:09 p.m. UTC
When the .gitmodules file is not available in the working tree, try
using the content from the index and from the current branch. This
covers the case when the file is part of the repository but for some
reason it is not checked out, for example because of a sparse checkout.

This makes it possible to use at least the 'git submodule' commands
which *read* the gitmodules configuration file without fully populating
the working tree.

Writing to .gitmodules will still require that the file is checked out,
so check for that before calling config_set_in_gitmodules_file_gently.

Add a similar check also in git-submodule.sh::cmd_add() to anticipate
the eventual failure of the "git submodule add" command when .gitmodules
is not safely writeable; this prevents the command from leaving the
repository in a spurious state (e.g. the submodule repository was cloned
but .gitmodules was not updated because
config_set_in_gitmodules_file_gently failed).

Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
from .gitmodules succeeds and that writing to it fails when the file is
not checked out.

Signed-off-by: Antonio Ospite <ao2@ao2.it>
---
 builtin/submodule--helper.c            |  6 +-
 git-submodule.sh                       |  5 ++
 submodule-config.c                     | 18 +++++-
 t/t7411-submodule-config.sh            | 26 ++++++++-
 t/t7416-submodule-sparse-gitmodules.sh | 78 ++++++++++++++++++++++++++
 5 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100755 t/t7416-submodule-sparse-gitmodules.sh

Comments

SZEDER Gábor Sept. 18, 2018, 5:12 p.m. UTC | #1
Hi Antonio,

it appears that this patch (and its previous versions as well) is
responsible for triggering occasional test failures in
't7814-grep-recurse-submodules.sh', more frequently, about once in
every ten runs, on macOS on Travis CI, less frequently, about once in
a couple of hundred runs on Linux on my machine.

The reason for the failure is memory corruption manifesting in various
ways: segfault, malloc() or use after free() errors from libc, corrupt
loose object, invalid ref, bogus output, etc.

Applying the following patch makes t7814 fail almost every time,
though sometimes that loop has to iterate over 1000 times until that
'git grep' finally fails...  so good luck with debugging ;)

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 7184113b9b..93ae2e8e7c 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -337,6 +337,10 @@ test_expect_success 'grep --recurse-submodules should pass the pattern type alon
 	test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" &&
 
 	# Basic
+	for i in $(seq 0 2000)
+	do
+		git grep --recurse-submodules 1 >/dev/null || return 1
+	done &&
 	git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
 	cat >expect <<-\EOF &&
 	a:(1|2)d(3|4)


On first look I didn't notice anything that is obviously wrong in this
patch and could be responsible for the memory corruption, but there is
one thing I found strange, though:


On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
> When the .gitmodules file is not available in the working tree, try
> using the content from the index and from the current branch.

"from the index and from the current branch" of which repository?

> This
> covers the case when the file is part of the repository but for some
> reason it is not checked out, for example because of a sparse checkout.
> 
> This makes it possible to use at least the 'git submodule' commands
> which *read* the gitmodules configuration file without fully populating
> the working tree.
> 
> Writing to .gitmodules will still require that the file is checked out,
> so check for that before calling config_set_in_gitmodules_file_gently.
> 
> Add a similar check also in git-submodule.sh::cmd_add() to anticipate
> the eventual failure of the "git submodule add" command when .gitmodules
> is not safely writeable; this prevents the command from leaving the
> repository in a spurious state (e.g. the submodule repository was cloned
> but .gitmodules was not updated because
> config_set_in_gitmodules_file_gently failed).
> 
> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
> from .gitmodules succeeds and that writing to it fails when the file is
> not checked out.
> 
> Signed-off-by: Antonio Ospite <ao2@ao2.it>
> ---

> diff --git a/submodule-config.c b/submodule-config.c
> index 61a555e920..bdb1d0e2c9 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c

> @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct repository *repo)
>  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
>  {
>  	if (repo->worktree) {
> -		char *file = repo_worktree_path(repo, GITMODULES_FILE);
> -		git_config_from_file(fn, file, data);
> +		struct git_config_source config_source = { 0 };
> +		const struct config_options opts = { 0 };
> +		struct object_id oid;
> +		char *file;
> +
> +		file = repo_worktree_path(repo, GITMODULES_FILE);
> +		if (file_exists(file))
> +			config_source.file = file;
> +		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
> +			config_source.blob = GITMODULES_INDEX;

The repository used in t7814 contains nested submodules, which means
that config_from_gitmodules() is invoked three times.

Now, the first two of those calls look at the superproject and at
'submodule', and find the existing files '.../trash
directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
respectively.  So far so good.

The third call, however, looks at the nested submodule at
'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
function goes on with the second condition and calls
get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
in the _superproject's_ index.

I'm no expert on submodules, but my gut feeling says that this can't
be right.  But if it _is_ right, then I would say that the commit
message should explain in detail, why it is right.

Anyway, even if it is indeed wrong, I'm not sure whether this is the
root cause of the memory corruption.


> +		else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> +			config_source.blob = GITMODULES_HEAD;
> +
> +		config_with_options(fn, data, &config_source, &opts);
> +
>  		free(file);
>  	}
>  }
Junio C Hamano Sept. 19, 2018, 7:24 p.m. UTC | #2
SZEDER Gábor <szeder.dev@gmail.com> writes:

> it appears that this patch (and its previous versions as well) is
> responsible for triggering occasional test failures in
> 't7814-grep-recurse-submodules.sh', more frequently, about once in
> every ten runs, on macOS on Travis CI, less frequently, about once in
> a couple of hundred runs on Linux on my machine.

I see that among Cc'ed are people who are more familiar with the
submodule code and where it wants to go.  Thanks for a report and
analysis.

> The reason for the failure is memory corruption manifesting in various
> ways: segfault, malloc() or use after free() errors from libc, corrupt
> loose object, invalid ref, bogus output, etc.
> 
> Applying the following patch makes t7814 fail almost every time,
> though sometimes that loop has to iterate over 1000 times until that
> 'git grep' finally fails...  so good luck with debugging ;)
>
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 7184113b9b..93ae2e8e7c 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -337,6 +337,10 @@ test_expect_success 'grep --recurse-submodules should pass the pattern type alon
>  	test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" &&
>  
>  	# Basic
> +	for i in $(seq 0 2000)
> +	do
> +		git grep --recurse-submodules 1 >/dev/null || return 1
> +	done &&
>  	git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
>  	cat >expect <<-\EOF &&
>  	a:(1|2)d(3|4)
>
> On first look I didn't notice anything that is obviously wrong in this
> patch and could be responsible for the memory corruption, but there is
> one thing I found strange, though:
>
>
> On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
>> When the .gitmodules file is not available in the working tree, try
>> using the content from the index and from the current branch.
>
> "from the index and from the current branch" of which repository?
>
>> This
>> covers the case when the file is part of the repository but for some
>> reason it is not checked out, for example because of a sparse checkout.
>> 
>> This makes it possible to use at least the 'git submodule' commands
>> which *read* the gitmodules configuration file without fully populating
>> the working tree.
>> 
>> Writing to .gitmodules will still require that the file is checked out,
>> so check for that before calling config_set_in_gitmodules_file_gently.
>> 
>> Add a similar check also in git-submodule.sh::cmd_add() to anticipate
>> the eventual failure of the "git submodule add" command when .gitmodules
>> is not safely writeable; this prevents the command from leaving the
>> repository in a spurious state (e.g. the submodule repository was cloned
>> but .gitmodules was not updated because
>> config_set_in_gitmodules_file_gently failed).
>> 
>> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
>> from .gitmodules succeeds and that writing to it fails when the file is
>> not checked out.
>> 
>> Signed-off-by: Antonio Ospite <ao2@ao2.it>
>> ---
>
>> diff --git a/submodule-config.c b/submodule-config.c
>> index 61a555e920..bdb1d0e2c9 100644
>> --- a/submodule-config.c
>> +++ b/submodule-config.c
>
>> @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct repository *repo)
>>  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
>>  {
>>  	if (repo->worktree) {
>> -		char *file = repo_worktree_path(repo, GITMODULES_FILE);
>> -		git_config_from_file(fn, file, data);
>> +		struct git_config_source config_source = { 0 };
>> +		const struct config_options opts = { 0 };
>> +		struct object_id oid;
>> +		char *file;
>> +
>> +		file = repo_worktree_path(repo, GITMODULES_FILE);
>> +		if (file_exists(file))
>> +			config_source.file = file;
>> +		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
>> +			config_source.blob = GITMODULES_INDEX;
>
> The repository used in t7814 contains nested submodules, which means
> that config_from_gitmodules() is invoked three times.
>
> Now, the first two of those calls look at the superproject and at
> 'submodule', and find the existing files '.../trash
> directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
> directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
> respectively.  So far so good.
>
> The third call, however, looks at the nested submodule at
> 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> function goes on with the second condition and calls
> get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> in the _superproject's_ index.
>
> I'm no expert on submodules, but my gut feeling says that this can't
> be right.  But if it _is_ right, then I would say that the commit
> message should explain in detail, why it is right.
>
> Anyway, even if it is indeed wrong, I'm not sure whether this is the
> root cause of the memory corruption.
>
>
>> +		else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
>> +			config_source.blob = GITMODULES_HEAD;
>> +
>> +		config_with_options(fn, data, &config_source, &opts);
>> +
>>  		free(file);
>>  	}
>>  }
Antonio Ospite Sept. 20, 2018, 3:35 p.m. UTC | #3
On Tue, 18 Sep 2018 19:12:57 +0200
SZEDER Gábor <szeder.dev@gmail.com> wrote:

> Hi Antonio,
> 
> it appears that this patch (and its previous versions as well) is
> responsible for triggering occasional test failures in
> 't7814-grep-recurse-submodules.sh', more frequently, about once in
> every ten runs, on macOS on Travis CI, less frequently, about once in
> a couple of hundred runs on Linux on my machine.
>

Thanks a lot for testing Gábor, it's really appreciated.

> The reason for the failure is memory corruption manifesting in various
> ways: segfault, malloc() or use after free() errors from libc, corrupt
> loose object, invalid ref, bogus output, etc.
> 
> Applying the following patch makes t7814 fail almost every time,
> though sometimes that loop has to iterate over 1000 times until that
> 'git grep' finally fails...  so good luck with debugging ;)
[...]

I managed to capture some traces of the segfaults using this variation:

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 7184113b9b..56e87c3f8a 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -337,6 +337,10 @@ test_expect_success 'grep --recurse-submodules should pass the pattern type alon
        test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" &&

        # Basic
+       for i in $(test_seq 0 2000)
+       do
+               debug --debugger="gdb --silent -ex run -ex quit --return-child-result --args" git grep --recurse-submodules 1 >/dev/null || return 1
+       done &&
        git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
        cat >expect <<-\EOF &&
        a:(1|2)d(3|4)


Running t7814 with --run="1,6,22" is enough to observe the issue.

FWICS these corruptions are caused by concurrent accesses to the object
store.

The issue is caused by these facts:
  1. git grep uses threads;
  2. git grep reads submodules config with repo_read_gitmodules;
  3. repo_read_gitmodules calls config_from_gitmodules
  4. the changes in patch 9 in this series make config_from_gitmodules
     use the object store, which apparently is not mt-safe, while the
     previous use of git_config_from_file() was.

> On first look I didn't notice anything that is obviously wrong in this
> patch and could be responsible for the memory corruption, but there is
> one thing I found strange, though:
> 
> 
> On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
> > When the .gitmodules file is not available in the working tree, try
> > using the content from the index and from the current branch.
> 
> "from the index and from the current branch" of which repository?
> 
[...]

> > diff --git a/submodule-config.c b/submodule-config.c
> > index 61a555e920..bdb1d0e2c9 100644
> > --- a/submodule-config.c
> > +++ b/submodule-config.c
> 
> > @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct repository *repo)
> >  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
> >  {
> >  	if (repo->worktree) {
> > -		char *file = repo_worktree_path(repo, GITMODULES_FILE);
> > -		git_config_from_file(fn, file, data);
> > +		struct git_config_source config_source = { 0 };
> > +		const struct config_options opts = { 0 };
> > +		struct object_id oid;
> > +		char *file;
> > +
> > +		file = repo_worktree_path(repo, GITMODULES_FILE);
> > +		if (file_exists(file))
> > +			config_source.file = file;
> > +		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
> > +			config_source.blob = GITMODULES_INDEX;
> 
> The repository used in t7814 contains nested submodules, which means
> that config_from_gitmodules() is invoked three times.
> 
> Now, the first two of those calls look at the superproject and at
> 'submodule', and find the existing files '.../trash
> directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
> directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
> respectively.  So far so good.
> 
> The third call, however, looks at the nested submodule at
> 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> function goes on with the second condition and calls
> get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> in the _superproject's_ index.
> 
> I'm no expert on submodules, but my gut feeling says that this can't
> be right.  But if it _is_ right, then I would say that the commit
> message should explain in detail, why it is right.
>

I'll think about that too.

> Anyway, even if it is indeed wrong, I'm not sure whether this is the
> root cause of the memory corruption.
> 

I think the immediate cause of the corruptions is multi-threading in
grep, I can prevent the issue from happening by using "git grep
--threads 1 ...".

Protecting the problematic submodules function could work for now, but
I'd like to have more comments, my proposal is:

diff --git a/builtin/grep.c b/builtin/grep.c
index 601f801158..52b45de749 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -427,6 +427,11 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
        if (repo_submodule_init(&submodule, superproject, path))
                return 0;

+       grep_read_lock();
+       /*
+        * NEEDSWORK: repo_read_gitmodules accesses the object store which is
+        * global, thus it needs to be protected.
+        */
        repo_read_gitmodules(&submodule);

        /*
@@ -439,7 +444,6 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
         * store is no longer global and instead is a member of the repository
         * object.
         */
-       grep_read_lock();
        add_to_alternates_memory(submodule.objects->objectdir);
        grep_read_unlock();


The pre-existing NEEDSWORK comment there also suggests that these
problems with the object store are known. I was not aware of them.

With the change from above I could not reproduce the problem anymore,
this should be the only location where
repo_read_gitmodules/config_from_gitmodules is called in a thread.

Thanks you,
   Antonio
Junio C Hamano Sept. 21, 2018, 4:19 p.m. UTC | #4
Antonio Ospite <ao2@ao2.it> writes:

> Protecting the problematic submodules function could work for now, but
> I'd like to have more comments, my proposal is:
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 601f801158..52b45de749 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -427,6 +427,11 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
>         if (repo_submodule_init(&submodule, superproject, path))
>                 return 0;
>
> +       grep_read_lock();
> +       /*
> +        * NEEDSWORK: repo_read_gitmodules accesses the object store which is
> +        * global, thus it needs to be protected.
> +        */
>         repo_read_gitmodules(&submodule);
>
>         /*
> @@ -439,7 +444,6 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
>          * store is no longer global and instead is a member of the repository
>          * object.
>          */
> -       grep_read_lock();
>         add_to_alternates_memory(submodule.objects->objectdir);
>         grep_read_unlock();

I think this is in line with how the grep codepath protects itself
when doing anything that accesses the object store.

Thanks.
Antonio Ospite Sept. 24, 2018, 10:20 a.m. UTC | #5
On Tue, 18 Sep 2018 19:12:57 +0200
SZEDER Gábor <szeder.dev@gmail.com> wrote:

[...]
> On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
> > When the .gitmodules file is not available in the working tree, try
> > using the content from the index and from the current branch.
> 
> "from the index and from the current branch" of which repository?
>

I took a look, some comments below.

> > diff --git a/submodule-config.c b/submodule-config.c
> > index 61a555e920..bdb1d0e2c9 100644
> > --- a/submodule-config.c
> > +++ b/submodule-config.c
> 
> > @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct repository *repo)
> >  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
> >  {
[...]
> > +		file = repo_worktree_path(repo, GITMODULES_FILE);
> > +		if (file_exists(file))
> > +			config_source.file = file;
> > +		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
> > +			config_source.blob = GITMODULES_INDEX;
> > +		else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> > +			config_source.blob = GITMODULES_HEAD;
> > +
> 
> The repository used in t7814 contains nested submodules, which means
> that config_from_gitmodules() is invoked three times.
> 
> Now, the first two of those calls look at the superproject and at
> 'submodule', and find the existing files '.../trash
> directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
> directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
> respectively.  So far so good.
> 
> The third call, however, looks at the nested submodule at
> 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> function goes on with the second condition and calls
> get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> in the _superproject's_ index.
>

You are correct.

This is a limitation of the object store in git, there is no equivalent
of get_oid() to get the oid from a specific repository and this affects
config_with_options too when the config source is a blob.

This does not affect commands called via "git -C submodule_dir cmd"
because in that case the chdir happens before the_repository is set up,
for instance "git-submodule $SOMETHING --recursive" commands seem to
change the working directory before the recursion.

> > +		config_with_options(fn, data, &config_source, &opts);
> > +
> >  		free(file);
> >  	}
> >  }
> I'm no expert on submodules, but my gut feeling says that this can't
> be right.  But if it _is_ right, then I would say that the commit
> message should explain in detail, why it is right.
> 

I agree it isn't right, I didn't consider the case of nested
submodules, but even if I did the current git design does not
allow to correctly solve the problem: "read config from blob of an
arbitrary repository".

So what to do for the time being?

The issue is there but in a "normal" scenario it is not causing any real
harm because:

  1. currently it is exposed "only" by git grep and nested submodules.

  2. the new mechanism works fine when reading the submodule config for
     the root repository.

  3. the new mechanism does not "usually" impact non-leaf nested
     submodules, because the .gitmodules file is "normally" there.

  4. git grep never *uses* the GITMODULES_INDEX erroneously read
     from the root project when scanning the _leaf_ nested submodule
     because there are no further submodules down the line, the
     following check fails in builtin/grep.c:

       if (recurse_submodules && S_ISGITLINK(entry.mode) ...
       ...

In fact, because of 4. the test suite passes even if the gitmodule
config is not correct for the leaf submodule.

Actually 4. makes me think that the repo_read_gitmodules() call in
builtin/grep.c might not be strictly necessary, its purpose seems to be
to *anticipate* reading the config of *child* submodules while still
processing the *current* submodule, the config for the *current*
submodule was already read from the superproject by the immediately
preceding repo_submodule_init(), via:

  repo_submodule_init()
    submodule_from_path()
      gitmodules_read_check()
        repo_read_gitmodules()

And this would happen anyway also for child submodules down the
recursion path if we removed repo_read_gitmodules() in builtin/grep.c,
the operation would be not protected by grep_read_lock() tho.

The test suite passes even after removing repo_read_gitmodules()
entirely from builtin/grep.c, but I am still not confident that I get
all the implication of why that call was originally added in commit
f9ee2fcdfa (grep: recurse in-process using 'struct repository',
2017-08-02).

Anyways, even if we removed the call we would prevent the problem from
happening in the test suite, but not in the real world, in case non-leaf
submodules without .gitmodules in their working tree.

To recap:
  - patch 9/9 exposes a problem with the object store but for now it's
    only a potential problem in the future case that someone wanted to
    use *nested* submodules without .gitmodules in the working tree.
  - the new code should not affect current users which
    assume .gitmodules to be in the working tree for nested submodules;
  - even if we removed the call to repo_read_gitmodules() call in
    builtin/grep.c we would not avoid the problem entirely, just avoid
    it for the the _leaf_ submodules in case of nested submodules.

Selfishly, I'd propose to still merge the changes (I can send a v6 with
the locking fix in) and I'll write a test_expect_failure snippet to
document the problem Gábor spotted so we remember about it and fix it
when the object store can be accessed per-repository.

I am afraid I cannot look into the core issue about the object store in
my free time, however if someone wanted to sponsor some time I might
consider taking a stab at it.

Ciao,
   Antonio
Stefan Beller Sept. 24, 2018, 9 p.m. UTC | #6
On Mon, Sep 24, 2018 at 3:20 AM Antonio Ospite <ao2@ao2.it> wrote:

> > The third call, however, looks at the nested submodule at
> > 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> > function goes on with the second condition and calls
> > get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> > in the _superproject's_ index.
> >
>
> You are correct.
>
> This is a limitation of the object store in git, there is no equivalent
> of get_oid() to get the oid from a specific repository and this affects
> config_with_options too when the config source is a blob.

Not yet, as there is a big push to pass-through an object-store object
or similar recently and rely less on global variables.
I am not sure I get to this code, though.

> This does not affect commands called via "git -C submodule_dir cmd"
> because in that case the chdir happens before the_repository is set up,
> for instance "git-submodule $SOMETHING --recursive" commands seem to
> change the working directory before the recursion.

For this it may be worth looking into the option
       --super-prefix=<path>
  Currently for internal use only. Set a prefix which gives a
  path from above a repository down to its root. One use is
  to give submodules context about the superproject that
  invoked it.

the whole motion of moving to in-process deprecates this clunky
API to pass around strings to subprocesses.

> The test suite passes even after removing repo_read_gitmodules()
> entirely from builtin/grep.c, but I am still not confident that I get
> all the implication of why that call was originally added in commit
> f9ee2fcdfa (grep: recurse in-process using 'struct repository',
> 2017-08-02).

If you checkout that commit and remove the call to repo_read_gitmodules
and then call git-grep in a superproject with nested submodules, you
get a segfault.

On master (and deleting out that line) you do not get the segfault,
I think praise goes to ff6f1f564c4 (submodule-config: lazy-load a
repository's .gitmodules file, 2017-08-03) which happened shortly
after f9ee2fcdfa.

It showcased that it worked by converting ls-files, but left out grep.

So I think based on ff6f1f564c4 it is safe to remove all calls to
repo_read_gitmodules.

> Anyways, even if we removed the call we would prevent the problem from
> happening in the test suite, but not in the real world, in case non-leaf
> submodules without .gitmodules in their working tree.

Quite frankly I think grep was just overlooked in review of
https://public-inbox.org/git/20170803182000.179328-14-bmwill@google.com/

Stefan
Antonio Ospite Sept. 27, 2018, 2:44 p.m. UTC | #7
Hi Stefan,

On Mon, 24 Sep 2018 14:00:50 -0700
Stefan Beller <sbeller@google.com> wrote:

> On Mon, Sep 24, 2018 at 3:20 AM Antonio Ospite <ao2@ao2.it> wrote:
> 
[...]
> > This is a limitation of the object store in git, there is no equivalent
> > of get_oid() to get the oid from a specific repository and this affects
> > config_with_options too when the config source is a blob.
> 
> Not yet, as there is a big push to pass-through an object-store object
> or similar recently and rely less on global variables.
> I am not sure I get to this code, though.
>

If you end up touching get_oid() please CC me.

> > This does not affect commands called via "git -C submodule_dir cmd"
> > because in that case the chdir happens before the_repository is set up,
> > for instance "git-submodule $SOMETHING --recursive" commands seem to
> > change the working directory before the recursion.
> 
> For this it may be worth looking into the option
>        --super-prefix=<path>
>   Currently for internal use only. Set a prefix which gives a
>   path from above a repository down to its root. One use is
>   to give submodules context about the superproject that
>   invoked it.
>

My comment wanted to highlight that there are NO problems in the
mentioned cases:

  - git -C submodule_dir cmd
  - git submodule cmd --recursive

Are you suggesting to look into super-prefix for any reason in
particular?

[...]
> > The test suite passes even after removing repo_read_gitmodules()
> > entirely from builtin/grep.c, but I am still not confident that I get
> > all the implication of why that call was originally added in commit
> > f9ee2fcdfa (grep: recurse in-process using 'struct repository',
> > 2017-08-02).
> 
> If you checkout that commit and remove the call to repo_read_gitmodules
> and then call git-grep in a superproject with nested submodules, you
> get a segfault.
> 
> On master (and deleting out that line) you do not get the segfault,
> I think praise goes to ff6f1f564c4 (submodule-config: lazy-load a
> repository's .gitmodules file, 2017-08-03) which happened shortly
> after f9ee2fcdfa.
> 
> It showcased that it worked by converting ls-files, but left out grep.
> 
> So I think based on ff6f1f564c4 it is safe to remove all calls to
> repo_read_gitmodules.
>

Thanks for confirming.

> > Anyways, even if we removed the call we would prevent the problem from
> > happening in the test suite, but not in the real world, in case non-leaf
> > submodules without .gitmodules in their working tree.
> 
> Quite frankly I think grep was just overlooked in review of
> https://public-inbox.org/git/20170803182000.179328-14-bmwill@google.com/
> 

OK, so the plan for v6 is:

  - avoid the corruption issues spotted by Gábor by removing the call
    to repo_read_gitmodules in builtin/grep.c (this still does not fix
    the potential problem with nested submodules).

  - add a new test-tool which better exercises the new
    config_from_gitmodules code,

  - add also a test_expect_failure test to document the use case that
    cannot be supported yet: nested submodules without .gitmodules in
    their working tree.

Thanks,
   Antonio
Antonio Ospite Sept. 27, 2018, 2:49 p.m. UTC | #8
On Fri, 21 Sep 2018 09:19:45 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> Antonio Ospite <ao2@ao2.it> writes:
> 
> > Protecting the problematic submodules function could work for now, but
> > I'd like to have more comments, my proposal is:
> >
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 601f801158..52b45de749 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -427,6 +427,11 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
> >         if (repo_submodule_init(&submodule, superproject, path))
> >                 return 0;
> >
> > +       grep_read_lock();
> > +       /*
> > +        * NEEDSWORK: repo_read_gitmodules accesses the object store which is
> > +        * global, thus it needs to be protected.
> > +        */
> >         repo_read_gitmodules(&submodule);
> >
> >         /*
> > @@ -439,7 +444,6 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
> >          * store is no longer global and instead is a member of the repository
> >          * object.
> >          */
> > -       grep_read_lock();
> >         add_to_alternates_memory(submodule.objects->objectdir);
> >         grep_read_unlock();
> 
> I think this is in line with how the grep codepath protects itself
> when doing anything that accesses the object store.
> 

Thanks for the comment.

However, after confirming with Stefan Beller, I think we are going to
solve the corruption issue by removing this call to
repo_read_gitmodules(), which is not strictly necessary:

https://public-inbox.org/git/CAGZ79kZaomuE3p1puznM1x+hu-w4O+ZqeGUODBDj=-R3Z1hDzg@mail.gmail.com/

Thanks,
   Antonio
Stefan Beller Sept. 27, 2018, 6 p.m. UTC | #9
On Thu, Sep 27, 2018 at 7:44 AM Antonio Ospite <ao2@ao2.it> wrote:
>
> If you end up touching get_oid() please CC me.

noted. I am not sure I'll touch it anytime soon, though.

>
> Are you suggesting to look into super-prefix for any reason in
> particular?

No, I misread the intent of that part of your message

> >
> > So I think based on ff6f1f564c4 it is safe to remove all calls to
> > repo_read_gitmodules.
> >
>
> Thanks for confirming.
>

> OK, so the plan for v6 is:
>
>   - avoid the corruption issues spotted by Gábor by removing the call
>     to repo_read_gitmodules in builtin/grep.c (this still does not fix
>     the potential problem with nested submodules).
>
>   - add a new test-tool which better exercises the new
>     config_from_gitmodules code,

Sounds good.

>
>   - add also a test_expect_failure test to document the use case that
>     cannot be supported yet: nested submodules without .gitmodules in
>     their working tree.

Personally I would want to live in a world where we don't *have* to nor
*want* to support submodules without .gitmodules in the respective
superproject.

We did support some use cases historically that I would make sure to
continue to support, but I am not sure how much effort we want to spend
on supporting further use cases of incomplete submodules.

Feel free to do so, as such tests help to document the boundaries.

Stefan
Antonio Ospite Oct. 1, 2018, 3:45 p.m. UTC | #10
On Thu, 27 Sep 2018 11:00:52 -0700
Stefan Beller <sbeller@google.com> wrote:

> On Thu, Sep 27, 2018 at 7:44 AM Antonio Ospite <ao2@ao2.it> wrote:
[...]
> > OK, so the plan for v6 is:
> >
> >   - avoid the corruption issues spotted by Gábor by removing the call
> >     to repo_read_gitmodules in builtin/grep.c (this still does not fix
> >     the potential problem with nested submodules).
> >

Actually that is not enough to fix the inconsistent access to the
object store: the functions is_submodule_active() and
repo_submodule_init() too end up calling config_from_gitmodules() and
need protecting as well, so I am going to put them under the git read
lock and leave repo_read_gitmodules() in place for now.

Removing unneeded code can go in a possible stand-alone patch.

> >   - add a new test-tool which better exercises the new
> >     config_from_gitmodules code,
> 
> Sounds good.
> 
> >
> >   - add also a test_expect_failure test to document the use case that
> >     cannot be supported yet: nested submodules without .gitmodules in
> >     their working tree.
> 
> Personally I would want to live in a world where we don't *have* to nor
> *want* to support submodules without .gitmodules in the respective
> superproject.
>

Just to double check: are you referring to *nested* submodules in the
sentence above?

I am asking because the whole point of this patchset is to *enable* the
use of submodules without .gitmodules in the working tree of the
superproject. :)

It's just that current limitations in git do not allow to support this
for *nested* submodules yet.

> We did support some use cases historically that I would make sure to
> continue to support, but I am not sure how much effort we want to spend
> on supporting further use cases of incomplete submodules.
>
> Feel free to do so, as such tests help to document the boundaries.
> 

Let's see how v6 turns out.

Thanks,
   Antonio
Stefan Beller Oct. 1, 2018, 7:42 p.m. UTC | #11
On Mon, Oct 1, 2018 at 8:45 AM Antonio Ospite <ao2@ao2.it> wrote:
>
> On Thu, 27 Sep 2018 11:00:52 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
> > On Thu, Sep 27, 2018 at 7:44 AM Antonio Ospite <ao2@ao2.it> wrote:
> [...]
> > > OK, so the plan for v6 is:
> > >
> > >   - avoid the corruption issues spotted by Gábor by removing the call
> > >     to repo_read_gitmodules in builtin/grep.c (this still does not fix
> > >     the potential problem with nested submodules).
> > >
>
> Actually that is not enough to fix the inconsistent access to the
> object store: the functions is_submodule_active() and
> repo_submodule_init() too end up calling config_from_gitmodules() and
> need protecting as well, so I am going to put them under the git read
> lock and leave repo_read_gitmodules() in place for now.
>

>
> I am asking because the whole point of this patchset is to *enable* the
> use of submodules without .gitmodules in the working tree of the
> superproject. :)

I was imprecise and meant to
s/.gitmodules/mechanism to configure the name <-> path mapping/

In this series, the .gitmodules may not be present in the working tree,
but it is still there in the repo. Later we may want to rename that file
or put it into a magic branch, and I'd still find it a good idea.
What I find a bad idea is to have only a gitlink and a repo put
into that path and expect that it magically works as then it is not
a submodule, but some "halfway there thing". We need to have
an explicit "yes this is a submodule" statement, (which currently
comes from the .gitmodules file in the working tree), and I am not
attached to where it comes from, but that it exists.
diff mbox series

Patch

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index bd14f57d00..f72f6ee58a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2032,8 +2032,12 @@  static int module_config(int argc, const char **argv, const char *prefix)
 		return print_config_from_gitmodules(argv[1]);
 
 	/* Equivalent to ACTION_SET in builtin/config.c */
-	if (argc == 3)
+	if (argc == 3) {
+		if (!is_writing_gitmodules_ok())
+			die(_("please make sure that the .gitmodules file is in the working tree"));
+
 		return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+	}
 
 	usage_with_options(git_submodule_helper_usage, module_config_options);
 }
diff --git a/git-submodule.sh b/git-submodule.sh
index 25b9bc58cb..bff855f54a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -159,6 +159,11 @@  cmd_add()
 		shift
 	done
 
+	if ! git submodule--helper config --check-writeable >/dev/null 2>&1
+	then
+		 die "$(eval_gettext "please make sure that the .gitmodules file is in the working tree")"
+	fi
+
 	if test -n "$reference_path"
 	then
 		is_absolute_path "$reference_path" ||
diff --git a/submodule-config.c b/submodule-config.c
index 61a555e920..bdb1d0e2c9 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -1,4 +1,5 @@ 
 #include "cache.h"
+#include "dir.h"
 #include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
@@ -603,8 +604,21 @@  static void submodule_cache_check_init(struct repository *repo)
 static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
 {
 	if (repo->worktree) {
-		char *file = repo_worktree_path(repo, GITMODULES_FILE);
-		git_config_from_file(fn, file, data);
+		struct git_config_source config_source = { 0 };
+		const struct config_options opts = { 0 };
+		struct object_id oid;
+		char *file;
+
+		file = repo_worktree_path(repo, GITMODULES_FILE);
+		if (file_exists(file))
+			config_source.file = file;
+		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
+			config_source.blob = GITMODULES_INDEX;
+		else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
+			config_source.blob = GITMODULES_HEAD;
+
+		config_with_options(fn, data, &config_source, &opts);
+
 		free(file);
 	}
 }
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 45953f9300..2cfabb18bc 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -134,7 +134,7 @@  test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
 	)
 '
 
-test_expect_success 'reading submodules config with "submodule--helper config"' '
+test_expect_success 'reading submodules config from the working tree with "submodule--helper config"' '
 	(cd super &&
 		echo "../submodule" >expect &&
 		git submodule--helper config submodule.submodule.url >actual &&
@@ -192,4 +192,28 @@  test_expect_success 'non-writeable .gitmodules when it is in the current branch
 	)
 '
 
+test_expect_success 'reading submodules config from the index when .gitmodules is not in the working tree' '
+	ORIG=$(git -C super rev-parse HEAD) &&
+	test_when_finished "git -C super reset --hard $ORIG" &&
+	(cd super &&
+		git submodule--helper config submodule.submodule.url "staged_url" &&
+		git add .gitmodules &&
+		rm -f .gitmodules &&
+		echo "staged_url" >expect &&
+		git submodule--helper config submodule.submodule.url >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reading submodules config from the current branch when .gitmodules is not in the index' '
+	ORIG=$(git -C super rev-parse HEAD) &&
+	test_when_finished "git -C super reset --hard $ORIG" &&
+	(cd super &&
+		git rm .gitmodules &&
+		echo "../submodule" >expect &&
+		git submodule--helper config submodule.submodule.url >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
diff --git a/t/t7416-submodule-sparse-gitmodules.sh b/t/t7416-submodule-sparse-gitmodules.sh
new file mode 100755
index 0000000000..908a4e6958
--- /dev/null
+++ b/t/t7416-submodule-sparse-gitmodules.sh
@@ -0,0 +1,78 @@ 
+#!/bin/sh
+#
+# Copyright (C) 2018  Antonio Ospite <ao2@ao2.it>
+#
+
+test_description='Test reading/writing .gitmodules when not in the working tree
+
+This test verifies that, when .gitmodules is in the current branch but is not
+in the working tree reading from it still works but writing to it does not.
+
+The test setup uses a sparse checkout, however the same scenario can be set up
+also by committing .gitmodules and then just removing it from the filesystem.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'sparse checkout setup which hides .gitmodules' '
+	echo file >file &&
+	git add file &&
+	test_tick &&
+	git commit -m upstream &&
+	git clone . super &&
+	git clone super submodule &&
+	git clone super new_submodule &&
+	(cd super &&
+		git submodule add ../submodule &&
+		test_tick &&
+		git commit -m submodule &&
+		cat >.git/info/sparse-checkout <<-\EOF &&
+		/*
+		!/.gitmodules
+		EOF
+		git config core.sparsecheckout true &&
+		git read-tree -m -u HEAD &&
+		test_path_is_missing .gitmodules
+	)
+'
+
+test_expect_success 'reading gitmodules config file when it is not checked out' '
+	(cd super &&
+		echo "../submodule" >expect &&
+		git submodule--helper config submodule.submodule.url >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'not writing gitmodules config file when it is not checked out' '
+	 test_must_fail git -C super submodule--helper config submodule.submodule.url newurl
+'
+
+test_expect_success 'initialising submodule when the gitmodules config is not checked out' '
+	git -C super submodule init
+'
+
+test_expect_success 'showing submodule summary when the gitmodules config is not checked out' '
+	git -C super submodule summary
+'
+
+test_expect_success 'updating submodule when the gitmodules config is not checked out' '
+	(cd submodule &&
+		echo file2 >file2 &&
+		git add file2 &&
+		git commit -m "add file2 to submodule"
+	) &&
+	git -C super submodule update
+'
+
+test_expect_success 'not adding submodules when the gitmodules config is not checked out' '
+	test_must_fail git -C super submodule add ../new_submodule
+'
+
+# This test checks that the previous "git submodule add" did not leave the
+# repository in a spurious state when it failed.
+test_expect_success 'init submodule still works even after the previous add failed' '
+	git -C super submodule init
+'
+
+test_done