diff mbox series

[1/2] rm: changes in the '.gitmodules' are staged after using '--cached'

Message ID 20210218184931.83613-2-periperidip@gmail.com (mailing list archive)
State New, archived
Headers show
Series rm: changes in the '.gitmodules' are staged after using '--cached' | expand

Commit Message

Shourya Shukla Feb. 18, 2021, 6:49 p.m. UTC
Earlier, on doing a 'git rm --cached <submodule>' did not modify the
'.gitmodules' entry of the submodule in question hence the file was not
staged. Change this behaviour to remove the entry of the submodule from
the '.gitmodules', something which might be more expected of the
command.

Reported-by: Javier Mora <javier.moradesambricio@rtx.com>
Signed-off-by: Shourya Shukla <periperidip@gmail.com>
---
 builtin/rm.c | 48 +++++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 21 deletions(-)

Comments

Philippe Blain Feb. 18, 2021, 8:14 p.m. UTC | #1
Hello Shourya,

Le 2021-02-18 à 13:49, Shourya Shukla a écrit :
> Earlier, on doing a 'git rm --cached <submodule>' did not modify the
> '.gitmodules' entry of the submodule in question hence the file was not
> staged. Change this behaviour to remove the entry of the submodule from
> the '.gitmodules', something which might be more expected of the
> command.

We prefer using the imperative mood for the commit message title,
the present tense for describing the actual state of the code,
and finally the imperative mood again to give order to the code base
to change its behaviour [1]. So something like the following would fit more
into the project's conventions:


     rm: stage submodule removal from '.gitmodules' when using '--cached'

     Currently, using 'git rm --cached <submodule>' removes submodule <submodule> from the index
     and leaves the submodule working tree intact in the superproject working tree,
     but does not stage any changes to the '.gitmodules' file, in contrast to
     'git rm <submodule>', which removes both the submodule and its configuration
     in '.gitmodules' from the worktree and index.
     
     Fix this inconsistency by also staging the removal of the configuration of the
     submodule from the '.gitmodules' file, leaving the worktree copy intact, a behaviour
     which is more in line with what might be expected when using '--cached'.


However, this is *not* what you patch does; it also removes the relevant
section from the '.gitmodules' file *in the worktree*, which is not acceptable
because it is exactly contrary to what '--cached' means.

This was verified by running Javier's demonstration script that I included in the
Gitgitgadget issue [2], which I copy here:


~~~
rm -rf some_submodule top_repo

mkdir some_submodule
cd some_submodule
git init
echo hello > hello.txt
git add hello.txt
git commit -m 'First commit of submodule'
cd ..
mkdir top_repo
cd top_repo
git init
echo world > world.txt
git add world.txt
git commit -m 'First commit of top repo'
git submodule add ../some_submodule
git status  # both some_submodule and .gitmodules staged
git commit -m 'Added submodule'
git rm --cached some_submodule
git status  # only some_submodule staged
~~~

With your changes, at the end '.gitmodules' is modified in both the
worktree and the index, whereas we would want it to be modified
*only* in the index.

And we would want it to be staged for deletion (and only deleting the config
entry and keeping an empty ".gitmodules' file in the index)
if the user is removing the only submodule in the superproject.


> ---
>   builtin/rm.c | 48 +++++++++++++++++++++++++++---------------------
>   1 file changed, 27 insertions(+), 21 deletions(-)
> 

Once implemeted correctly (leaving the worktree version of '.gitmodules'
intact), that patch should also change the documentation to stay up-to-date,
since the "Submodules" section of Documentation/git-rm.txt states [3]:

     If it exists the submodule.<name> section in the gitmodules[5] file will
     also be removed and that file will be staged (unless --cached or -n are used).


Cheers,
Philippe.

[1] https://git-scm.com/docs/SubmittingPatches#describe-changes
[2] https://github.com/gitgitgadget/git/issues/750
[3] https://git-scm.com/docs/git-rm#_submodules
Philippe Blain Feb. 18, 2021, 8:39 p.m. UTC | #2
Le 2021-02-18 à 15:14, Philippe Blain a écrit :
> 
> And we would want it to be staged for deletion (and only deleting the config
> entry and keeping an empty ".gitmodules' file in the index)
> if the user is removing the only submodule in the superproject.

Sorry for the typo, a "not" is missing:

And we would want it to be staged for deletion (and *not* only deleting the config
entry and keeping an empty ".gitmodules' file in the index)
if the user is removing the only submodule in the superproject.

P.S. I CC'ed "cousteaulecommandant" who was CC'ed on the original bug report
since Javier's @rtx.com address now bounces.
Junio C Hamano Feb. 18, 2021, 10:03 p.m. UTC | #3
Shourya Shukla <periperidip@gmail.com> writes:

> +		if (list.entry[i].is_submodule) {
> +			/*
> +			 * Then, unless we used "--cached", remove the filenames from
> +			 * the workspace. If we fail to remove the first one, we
> +			 * abort the "git rm" (but once we've successfully removed
> +			 * any file at all, we'll go ahead and commit to it all:
> +			 * by then we've already committed ourselves and can't fail
> +			 * in the middle)
> +			 */
> +			if (!index_only) {
> +				struct strbuf buf = STRBUF_INIT;
>  				strbuf_reset(&buf);
>  				strbuf_addstr(&buf, path);
>  				if (remove_dir_recursively(&buf, 0))
>  					die(_("could not remove '%s'"), path);
>  
>  				removed = 1;
> -				if (!remove_path_from_gitmodules(path))
> -					gitmodules_modified = 1;
> -				continue;
> +				strbuf_release(&buf);

Since we won't come to this block when doing index_only, we are
allowed to touch the working tree contents and files.  We indeed do
"rm -rf" of the submodule working tree and touch .gitmodules file
that is in the working tree.

>  			}
> +			if (!remove_path_from_gitmodules(path))
> +				gitmodules_modified = 1;
> +			continue;

But this looks wrong.  It might be OK to remove from the .gitmodules
stored in the index, but I fail to see why it is justified to touch
the working tree file when "--cached" is given.

> +		}
> +		if (!index_only) {
>  			if (!remove_path(path)) {
>  				removed = 1;
>  				continue;
> @@ -396,11 +398,15 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>  			if (!removed)
>  				die_errno("git rm: '%s'", path);
>  		}
> -		strbuf_release(&buf);
> -		if (gitmodules_modified)
> -			stage_updated_gitmodules(&the_index);

Assuming that it is somehow justifiable that removing the entry from
the .gitmodules in the index (again, I do not think it is
justifiable to remove from the working tree file), we no longer can
use stage_updated_gitmodules() helper as-is.

I think you'd need to

 - Add a variant of remove_path_from_gitmodules() that can remove
   the given submodule from the .gitmodules in the index entry
   without touching the working tree.  The change could be to update
   the function to take an extra "index_only" parameter and a
   pointer to an index_state instance, and

   (1) if !index_only, then edit the .gitmodules file in the working
       tree to remove the entry for path;

   (2) in both !index_only and index_only cases, read .gitmodules
       file from the index, edit to remove the entry for path, and
       add the result to the index.

   and return 0 for success (e.g. if path is not a submoudle or no
   entry for it is found in .gitmodules, it may return -1).

 - Since the previous point will maintain the correct contents in
   the index in all cases, get rid of gitmodules_modified and calls
   to stage_updated_gitmodules().  The call to write_locked_index()
   at the end will take care of the actual writing out of the index.

if we want to teach "rm --cached" to update only the index, and "rm"
to update both the index and the working tree, of ".gitmodules".

Having said that, I still do not think it is a good direction to go
to teach low level "rm", "mv" etc. to know about ".gitmodules" (yes,
yes, I know that some changes that I consider to be mistakes have
already happened---that does not mean we cannot correct our course
and it does not mean it is OK to make things even worse).  Such a
"how does a submodule get managed" policy decision belongs to the
"git submodule" subcommand, I would think.

Thanks.

> +	/*
> +	 * Remove the entry of the submodule from the ".gitmodules" irrespective
> +	 * whether "--cached" was passed or not.
> +	 */
> +	if (gitmodules_modified)
> +		stage_updated_gitmodules(&the_index);
> +
>  	if (write_locked_index(&the_index, &lock_file,
>  			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
>  		die(_("Unable to write new index file"));
Shourya Shukla Feb. 19, 2021, 3:19 p.m. UTC | #4
On 18/02 03:14, Philippe Blain wrote:
> Hello Shourya,
> 
> Le 2021-02-18 à 13:49, Shourya Shukla a écrit :
> > Earlier, on doing a 'git rm --cached <submodule>' did not modify the
> > '.gitmodules' entry of the submodule in question hence the file was not
> > staged. Change this behaviour to remove the entry of the submodule from
> > the '.gitmodules', something which might be more expected of the
> > command.
> 
> We prefer using the imperative mood for the commit message title,
> the present tense for describing the actual state of the code,
> and finally the imperative mood again to give order to the code base
> to change its behaviour [1]. So something like the following would fit more
> into the project's conventions:

I have no idea how I missed that one. Apologies, will make the change.

>     rm: stage submodule removal from '.gitmodules' when using '--cached'
> 
>     Currently, using 'git rm --cached <submodule>' removes submodule <submodule> from the index
>     and leaves the submodule working tree intact in the superproject working tree,
>     but does not stage any changes to the '.gitmodules' file, in contrast to
>     'git rm <submodule>', which removes both the submodule and its configuration
>     in '.gitmodules' from the worktree and index.
>     Fix this inconsistency by also staging the removal of the configuration of the
>     submodule from the '.gitmodules' file, leaving the worktree copy intact, a behaviour
>     which is more in line with what might be expected when using '--cached'.
> 

Okay. I will use the above message.

> However, this is *not* what you patch does; it also removes the relevant
> section from the '.gitmodules' file *in the worktree*, which is not acceptable
> because it is exactly contrary to what '--cached' means.
> 
> This was verified by running Javier's demonstration script that I included in the
> Gitgitgadget issue [2], which I copy here:
> 
> 
> ~~~
> rm -rf some_submodule top_repo
> 
> mkdir some_submodule
> cd some_submodule
> git init
> echo hello > hello.txt
> git add hello.txt
> git commit -m 'First commit of submodule'
> cd ..
> mkdir top_repo
> cd top_repo
> git init
> echo world > world.txt
> git add world.txt
> git commit -m 'First commit of top repo'
> git submodule add ../some_submodule
> git status  # both some_submodule and .gitmodules staged
> git commit -m 'Added submodule'
> git rm --cached some_submodule
> git status  # only some_submodule staged
> ~~~
> 
> With your changes, at the end '.gitmodules' is modified in both the
> worktree and the index, whereas we would want it to be modified
> *only* in the index.
> 
> And we would want it to be staged for deletion (and only deleting the config
> entry and keeping an empty ".gitmodules' file in the index)
> if the user is removing the only submodule in the superproject.

Correct.

> >   builtin/rm.c | 48 +++++++++++++++++++++++++++---------------------
> >   1 file changed, 27 insertions(+), 21 deletions(-)
> > 
> 
> Once implemeted correctly (leaving the worktree version of '.gitmodules'
> intact), that patch should also change the documentation to stay up-to-date,
> since the "Submodules" section of Documentation/git-rm.txt states [3]:
> 
>     If it exists the submodule.<name> section in the gitmodules[5] file will
>     also be removed and that file will be staged (unless --cached or -n are used).

Understood. I have to let the working tree '.gitmodules' be left as-is
and only change the copy in the index.
Shourya Shukla Feb. 19, 2021, 3:24 p.m. UTC | #5
On 18/02 02:03, Junio C Hamano wrote:
> Shourya Shukla <periperidip@gmail.com> writes:
> 
> > +		if (list.entry[i].is_submodule) {
> > +			/*
> > +			 * Then, unless we used "--cached", remove the filenames from
> > +			 * the workspace. If we fail to remove the first one, we
> > +			 * abort the "git rm" (but once we've successfully removed
> > +			 * any file at all, we'll go ahead and commit to it all:
> > +			 * by then we've already committed ourselves and can't fail
> > +			 * in the middle)
> > +			 */
> > +			if (!index_only) {
> > +				struct strbuf buf = STRBUF_INIT;
> >  				strbuf_reset(&buf);
> >  				strbuf_addstr(&buf, path);
> >  				if (remove_dir_recursively(&buf, 0))
> >  					die(_("could not remove '%s'"), path);
> >  
> >  				removed = 1;
> > -				if (!remove_path_from_gitmodules(path))
> > -					gitmodules_modified = 1;
> > -				continue;
> > +				strbuf_release(&buf);
> 
> Since we won't come to this block when doing index_only, we are
> allowed to touch the working tree contents and files.  We indeed do
> "rm -rf" of the submodule working tree and touch .gitmodules file
> that is in the working tree.
> 
> >  			}
> > +			if (!remove_path_from_gitmodules(path))
> > +				gitmodules_modified = 1;
> > +			continue;
> 
> But this looks wrong.  It might be OK to remove from the .gitmodules
> stored in the index, but I fail to see why it is justified to touch
> the working tree file when "--cached" is given.

No no, you are correct. Phillipe pointed out the same thing. I don't
know how I made this mistake.

> > +		}
> > +		if (!index_only) {
> >  			if (!remove_path(path)) {
> >  				removed = 1;
> >  				continue;
> > @@ -396,11 +398,15 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
> >  			if (!removed)
> >  				die_errno("git rm: '%s'", path);
> >  		}
> > -		strbuf_release(&buf);
> > -		if (gitmodules_modified)
> > -			stage_updated_gitmodules(&the_index);
> 
> Assuming that it is somehow justifiable that removing the entry from
> the .gitmodules in the index (again, I do not think it is
> justifiable to remove from the working tree file), we no longer can
> use stage_updated_gitmodules() helper as-is.
> 
> I think you'd need to
> 
>  - Add a variant of remove_path_from_gitmodules() that can remove
>    the given submodule from the .gitmodules in the index entry
>    without touching the working tree.  The change could be to update
>    the function to take an extra "index_only" parameter and a
>    pointer to an index_state instance, and
> 
>    (1) if !index_only, then edit the .gitmodules file in the working
>        tree to remove the entry for path;
> 
>    (2) in both !index_only and index_only cases, read .gitmodules
>        file from the index, edit to remove the entry for path, and
>        add the result to the index.
> 
>    and return 0 for success (e.g. if path is not a submoudle or no
>    entry for it is found in .gitmodules, it may return -1).
> 
>  - Since the previous point will maintain the correct contents in
>    the index in all cases, get rid of gitmodules_modified and calls
>    to stage_updated_gitmodules().  The call to write_locked_index()
>    at the end will take care of the actual writing out of the index.
> 
> if we want to teach "rm --cached" to update only the index, and "rm"
> to update both the index and the working tree, of ".gitmodules".

Yeah, this approach seems perfect. I will do it this way.

> Having said that, I still do not think it is a good direction to go
> to teach low level "rm", "mv" etc. to know about ".gitmodules" (yes,
> yes, I know that some changes that I consider to be mistakes have
> already happened---that does not mean we cannot correct our course
> and it does not mean it is OK to make things even worse).  Such a
> "how does a submodule get managed" policy decision belongs to the
> "git submodule" subcommand, I would think.


Let's do it this way. I will deliver a v2 of this patch, if we get
comments from anyone stating that this should not go forward, then we
will drop this patch or do what is suggested. Else, queue this patch for
now (given that this does not break anything, obviously) and maybe put
up a RFC for the method you suggested. I am saying this because we have
not received any conclusive evidence of whether this patch should carry
on or not (not trying to disregard your take).

What do you say?
Junio C Hamano Feb. 20, 2021, 3:31 a.m. UTC | #6
Shourya Shukla <periperidip@gmail.com> writes:

>> Since we won't come to this block when doing index_only, we are
>> allowed to touch the working tree contents and files.  We indeed do
>> "rm -rf" of the submodule working tree and touch .gitmodules file
>> that is in the working tree.
>> 
>> >  			}
>> > +			if (!remove_path_from_gitmodules(path))
>> > +				gitmodules_modified = 1;
>> > +			continue;
>> 
>> But this looks wrong.  It might be OK to remove from the .gitmodules
>> stored in the index, but I fail to see why it is justified to touch
>> the working tree file when "--cached" is given.
>
> No no, you are correct. Phillipe pointed out the same thing. I don't
> know how I made this mistake.
> ...
>> I think you'd need to
>> ...
>
> Yeah, this approach seems perfect. I will do it this way.

OK, then let's go that way.

Thanks.
diff mbox series

Patch

diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f..0b74f50bfe 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -254,7 +254,7 @@  static struct option builtin_rm_options[] = {
 int cmd_rm(int argc, const char **argv, const char *prefix)
 {
 	struct lock_file lock_file = LOCK_INIT;
-	int i;
+	int i, removed = 0, gitmodules_modified = 0;
 	struct pathspec pathspec;
 	char *seen;
 
@@ -365,30 +365,32 @@  int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (show_only)
 		return 0;
 
-	/*
-	 * Then, unless we used "--cached", remove the filenames from
-	 * the workspace. If we fail to remove the first one, we
-	 * abort the "git rm" (but once we've successfully removed
-	 * any file at all, we'll go ahead and commit to it all:
-	 * by then we've already committed ourselves and can't fail
-	 * in the middle)
-	 */
-	if (!index_only) {
-		int removed = 0, gitmodules_modified = 0;
-		struct strbuf buf = STRBUF_INIT;
-		for (i = 0; i < list.nr; i++) {
-			const char *path = list.entry[i].name;
-			if (list.entry[i].is_submodule) {
+	for (i = 0; i < list.nr; i++) {
+		const char *path = list.entry[i].name;
+		if (list.entry[i].is_submodule) {
+			/*
+			 * Then, unless we used "--cached", remove the filenames from
+			 * the workspace. If we fail to remove the first one, we
+			 * abort the "git rm" (but once we've successfully removed
+			 * any file at all, we'll go ahead and commit to it all:
+			 * by then we've already committed ourselves and can't fail
+			 * in the middle)
+			 */
+			if (!index_only) {
+				struct strbuf buf = STRBUF_INIT;
 				strbuf_reset(&buf);
 				strbuf_addstr(&buf, path);
 				if (remove_dir_recursively(&buf, 0))
 					die(_("could not remove '%s'"), path);
 
 				removed = 1;
-				if (!remove_path_from_gitmodules(path))
-					gitmodules_modified = 1;
-				continue;
+				strbuf_release(&buf);
 			}
+			if (!remove_path_from_gitmodules(path))
+				gitmodules_modified = 1;
+			continue;
+		}
+		if (!index_only) {
 			if (!remove_path(path)) {
 				removed = 1;
 				continue;
@@ -396,11 +398,15 @@  int cmd_rm(int argc, const char **argv, const char *prefix)
 			if (!removed)
 				die_errno("git rm: '%s'", path);
 		}
-		strbuf_release(&buf);
-		if (gitmodules_modified)
-			stage_updated_gitmodules(&the_index);
 	}
 
+	/*
+	 * Remove the entry of the submodule from the ".gitmodules" irrespective
+	 * whether "--cached" was passed or not.
+	 */
+	if (gitmodules_modified)
+		stage_updated_gitmodules(&the_index);
+
 	if (write_locked_index(&the_index, &lock_file,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 		die(_("Unable to write new index file"));