diff mbox series

partial commit of file-to-directory change, was Re: Bugreport

Message ID 20240120004628.GA117170@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series partial commit of file-to-directory change, was Re: Bugreport | expand

Commit Message

Jeff King Jan. 20, 2024, 12:46 a.m. UTC
On Fri, Jan 19, 2024 at 11:14:47PM +0000, brian m. carlson wrote:

> > $ git commit . -m +
> > error: 'd' does not have a commit checked out
> > fatal: updating files failed
> 
> I can confirm this behaviour[0], and it's definitely wrong that it
> thinks `d` is a submodule.  It does, however, work if you do `git commit
> -m +` (that is, without the .), which makes sense since the relevant
> change is already staged in the index.
> 
> I'm not going to get to a patch or more thorough investigation, but
> perhaps someone else will.

I peeked at this a bit earlier; I didn't come up with a solution, but
here are a few more details.

As you noted, the problem is only with giving a pathspec to "git
commit". The bug is in the custom code git-commit uses to set up the
index for a partial commit. It generates a list of entries for the
partial commit via list_path(), and then tries to add them to the index.
But of course "d", being a directory, does not make any sense as an
index entry itself (unless as a submodule).

I'd have thought that we should just be using the same code that "git
add" does here (which obviously works). But all of this comes from
2888605c64 (builtin-commit: fix partial-commit support, 2007-11-18),
which explicitly says "you can't just run add_files_to_cache()", which
is what git-add does under the hood.

I don't know if we could revisit those assumptions and reuse some of the
existing code, or if the custom partial-commit code could be fixed.  I
guess the root of the issue is that in the call to list_paths(), we call
overlay_tree_on_index(). And that's how we end up thinking that "d" is a
useful path to talk about, even though it has already been removed from
the index (we have "d/b.txt" instead).

So I'm not sure if we need a solution here-ish:


but I'm not quite sure what we should be doing instead of that
warning(). Should we updating and including everything in "d/"? What
about if there were a "d/c.txt" that was not a tracked file at all?

It might also be that the bug is earlier in list_paths(), where we
should notice that "d" is gone and now we have entries under "d/".

I dunno. I probably won't dig much further on this myself, but it's
possible Junio (cc'd) might have an idea right away.

-Peff

Comments

Junio C Hamano Jan. 20, 2024, 12:55 a.m. UTC | #1
Jeff King <peff@peff.net> writes:

> I dunno. I probably won't dig much further on this myself, but it's
> possible Junio (cc'd) might have an idea right away.

Sorry, but I do not have any idea "right away".  I even needed to
see the tree of 2888605c64 and check if we had submodules back then
(it turns out that we did).
Junio C Hamano Jan. 20, 2024, 1:22 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

> It might also be that the bug is earlier in list_paths(), where we
> should notice that "d" is gone and now we have entries under "d/".

We had a related discussion on "commit -i/-o tests" review thread,
where we noticed that "git commit -i foobar", when "foobar" does not
match any path in the index, silently ignore that unmatching
pathspec (but "commit -o foobar" does error out, saying "did not
match any file(s) known to git").  The important design decision we
made long time ago when we introduced partial commit ("commit -o
pathspec") is that we do not _add_ paths (that match the pathspec)
that are not tracked.  We require explicit "git add" for them before
you run "git commit".

So, I suspect that list_paths() that gives "d" as a thing to add is
broken.  "commit -m + ." at that point is saying "we should get the
latest contents from the working tree for all the paths in the real
index, add[*] them to the temporary index freshly read from HEAD,
and the resulting temporary index should be written out as a tree"
to create a commit, and then "the same set of contents for the paths
then should be added to the real index, so that the differences
between the tree we wrote out of the temporary index to update the
HEAD and the real index would still be the changes already in the
index before this partial commit".

    Side note: here, "add" would include removing (if the working
    tree file for the path is gone) or killing 'd' while adding
    'd/b.txt'.

So I would understand if 'd/b.txt' is listed (because in the real
index there already is d/b.txt known), but if the directory 'd'
itself is listed, that does sound like a bug.
Junio C Hamano Jan. 25, 2024, 6:54 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Fri, Jan 19, 2024 at 11:14:47PM +0000, brian m. carlson wrote:
>
>> > $ git commit . -m +
>> > error: 'd' does not have a commit checked out
>> > fatal: updating files failed
>> 
>> I can confirm this behaviour[0], and it's definitely wrong that it
>> thinks `d` is a submodule.  It does, however, work if you do `git commit
>> -m +` (that is, without the .), which makes sense since the relevant
>> change is already staged in the index.
>> 
>> I'm not going to get to a patch or more thorough investigation, but
>> perhaps someone else will.
>
> I peeked at this a bit earlier; I didn't come up with a solution, but
> here are a few more details.

I didn't either X-<, and the thing is somewhat nasty, as there are two
states (HEAD and the index) involved.

 * For paths that do not match the pathspec, we want to take the
   version from the HEAD.  For paths that do match the pathspec, we
   want to take the version from the working tree.  And after making
   the partial commit with such changes, we want to make the same
   change to the index to prepare for the next commit.

 * With the way the code is currently structured, we find paths that
   appear either in the index or in the HEAD to match against the
   pathspec.  This is so that we can honor an earlier "git rm" since
   we read HEAD.

 * Then we check each of these paths that are either in the index or
   in HEAD that matched the pathspec.  If it is missing from the
   working tree, we would remove it from both the temporary ndex
   used for the partial commit, and the real index that we'll
   continue to use after the partial commit.  If it exists in the
   working tree, we would take the contents of it to update.

   For the purpose of this operation, a path D that was a blob in
   the index should be _removed_ when it is a directory in the
   working tree (i.e. made room to create D/F).  And we should not
   add D/F when running "git commit D" or "git commit D/F", because
   the partial commit does not "git add" (it only does "git add -u")

In the example in the discussion, we had D that was a blob and got
replaced with a directory.  If we did "git add -u D", we would just
have removed D from the index, so the partial commit should do the
same.  Which means we need to know the type of the thing we expect.
But list_paths() only returns a list of path names, and does not
give us the type of the thing we expect.  We use the same list
twice, once to update the temporary index (which we create by
reading HEAD) and update the paths listed there from the working
tree.  And the same list is used again to update the real index
(which is what the user had before starting the command) and update
the paths listed there from the working tree.  The tricky part is
that the type of (or even existence of) D may have changed between
the HEAD and the index, but we want to perform the same operation to
the real index with respect to the paths we touched to create the
partial commit.

Naively, add_remove_files() looks like an obvious place to see if
the path is in the working tree (which we already do) *and* compare
it with "the type of the thing we expect", but the thing is that
this function is called twice with different index (once with the
temporary index that started from HEAD, i.e. oblivious to what the
user did to the index since HEAD; then again with the real index
with the changes the user made since HEAD), so we cannot just say
"let's see what this path D is in the index".

I _think_ we would need to update list_paths() so that it records a
bit more than just pathnames that match the pathspec, namely, what
should be done to the path, by doing lstat() on the working tree
paths.
diff mbox series

Patch

diff --git a/builtin/commit.c b/builtin/commit.c
index 65196a2827..25e65e986d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -302,7 +302,9 @@  static void add_remove_files(struct string_list *list)
 			continue;
 
 		if (!lstat(p->string, &st)) {
-			if (add_to_index(&the_index, p->string, &st, 0))
+			if (S_ISDIR(st.st_mode))
+				warning("woah, %s is a dir", p->string);
+			else if (add_to_index(&the_index, p->string, &st, 0))
 				die(_("updating files failed"));
 		} else
 			remove_file_from_index(&the_index, p->string);