[v2] apply: allow "new file" patches on i-t-a entries

Message ID	20200804223155.7727-1-ray@ameretat.dev (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=W4Po=BO=vger.kernel.org=git-owner@kernel.org> From: "Raymond E. Pasco" <ray@ameretat.dev> To: git@vger.kernel.org Cc: "Raymond E. Pasco" <ray@ameretat.dev>, Junio C Hamano <gitster@pobox.com> Subject: [PATCH v2] apply: allow "new file" patches on i-t-a entries Date: Tue, 4 Aug 2020 18:31:55 -0400 Message-Id: <20200804223155.7727-1-ray@ameretat.dev> In-Reply-To: <xmqqlfiuryym.fsf@gitster.c.googlers.com> References: <xmqqlfiuryym.fsf@gitster.c.googlers.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	[v2] apply: allow "new file" patches on i-t-a entries \| expand [v2] apply: allow "new file" patches on i-t-a entries

Message ID

20200804223155.7727-1-ray@ameretat.dev (mailing list archive)

State

Superseded

Headers

From: "Raymond E. Pasco" <ray@ameretat.dev>
To: git@vger.kernel.org
Cc: "Raymond E. Pasco" <ray@ameretat.dev>,
        Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2] apply: allow "new file" patches on i-t-a entries
Date: Tue,  4 Aug 2020 18:31:55 -0400
Message-Id: <20200804223155.7727-1-ray@ameretat.dev>
In-Reply-To: <xmqqlfiuryym.fsf@gitster.c.googlers.com>
References: <xmqqlfiuryym.fsf@gitster.c.googlers.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

[v2] apply: allow "new file" patches on i-t-a entries | expand

Commit Message

Raymond E. Pasco Aug. 4, 2020, 10:31 p.m. UTC

diff-files recently changed to treat "intent to add" entries as new file
diffs rather than diffs from the empty blob. However, apply refuses to
apply new file diffs on top of existing index entries, except in the
case of renames. This causes "git add -p", which uses apply, to fail
when attempting to stage hunks from a file when intent to add has been
recorded.

This changes the logic in check_to_create() which checks if an entry
already exists in an index in two ways: first, we only search for an
index entry at all if ok_if_exists is false; second, we check for the
CE_INTENT_TO_ADD flag on any index entries we find and allow the apply
to proceed if it is set.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
---
 apply.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Aug. 4, 2020, 11:49 p.m. UTC | #1

"Raymond E. Pasco" <ray@ameretat.dev> writes:

> diff-files recently changed to treat "intent to add" entries as new file
> diffs rather than diffs from the empty blob. However, apply refuses to
> apply new file diffs on top of existing index entries, except in the
> case of renames. This causes "git add -p", which uses apply, to fail
> when attempting to stage hunks from a file when intent to add has been
> recorded.

As this is supposed to be a "bugfix", there shouldn't be any need to
update documentation (otherwise, we are either fixing documentation
in addition to the bug, or we are changing the documented behaviour
in the name of bugfix---which we need to be very careful to see if
we are not breaking existing users).  But we do need to document
what behaviour we want with tests, which will also serve as a way to
protect the current behaviour from future bugs.

So I started writing the attached, but I have strong doubts about
the updated behaviour.

 - The first one (setup).  We create a sample file and keep it as
   the master copy, and express our intention to add test-file with
   its contents sometime in the future.  And then we take a patch
   out of the index.  We make sure that the produced patch is a
   creation patch.

   This should be straight-forward and uncontroversial.

 - The second one.  We make sure that we have i-t-a and not real
   entry for test-file in the index.  We try to apply the patch we
   took in the first test to (and only to) the index.  This must
   succeed, thanks to your fix---the i-t-a entry in the index should
   not prevent "new file mode 100644" from created at test-file.  We
   make sure what is in the index matches the master copy.

   This should be straight-forward and uncontroversial.

 - The third one.  We do the same set-up as the previous one, but in
   addition, we remove the working tree copy before attempting to
   apply the patch both to the index and to the working tree.  That
   way, "when creating a new file, it must not exist yet" rule on
   the working tree side would not trigger.

   This I find troublesome.  The real use case you had trouble with
   (i.e. "git add -p") would not remove any working tree files
   before attempting to apply any patch, I would imagine.  Are we
   expecting the right behaviour with this test?  I cannot tell.  

   It feels like it is even pointless to allow i-t-a entry to exist
   in the index for the path, if we MUST remove the path from the
   working tree anyway, to be able to apply.

 - The fourth one.  If we have test-file on the working tree, "when
   creating a new file, it must not exist yet" rule on the working
   tree side should trigger and prevent the operation.  I think this
   is a reasonable expectation.

What I am wondering primarily is if we should actually FAIL the
third one.  The patch tries to create a path, for which there is an
i-t-a entry in the index.  But a path with i-t-a entry in the index
means the user at least must have had a file on the working tree to
register that intent-to-add the path.  Removed working tree file
would then mean that the path _has_ a local modification, so "git
apply --index" should *not* succeed for the usual reasons of having
differences between the index and the working tree.

And without your "fix" to apply.c, "git apply" in the the third test
fails, so we may need a bit more work to make sure it keeps failing.

I dunno.  It almost feels that this approach to fix "git add -p"
might be barking up a wrong tree.  After all, the user, by having an
i-t-a entry for the path in the index, expressed the desire to add
real contents later to the path, so being able to use "git apply"
with either "--cached" or "--index" options to clobber the path with
a creation patch feels wrong _unless_ the user then rescinded the
previous intent to add to the path (with "git rm --cached" or an
equivalent).

How exactly does "git add -p" fail for such a patch?  What operation
does it exactly want to do ("apply --cached"???) and is it "apply"
that is wrong, or is it "git add -p" that fails to remove the i-t-a
entry from the index before running "git apply" that is at fault?

Thanks.

 apply.c              | 11 +++++++----
 t/t4140-apply-ita.sh | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/apply.c b/apply.c
index 8bff604dbe..4cba4ce71a 100644
--- a/apply.c
+++ b/apply.c
@@ -3747,10 +3747,13 @@ static int check_to_create(struct apply_state *state,
 {
 	struct stat nst;

-	if (state->check_index &&
-	    index_name_pos(state->repo->index, new_name, strlen(new_name)) >= 0 &&
-	    !ok_if_exists)
-		return EXISTS_IN_INDEX;
+	if (state->check_index && !ok_if_exists) {
+		int pos = index_name_pos(state->repo->index, new_name, strlen(new_name));
+		if (pos >= 0 &&
+		    !(state->repo->index->cache[pos]->ce_flags & CE_INTENT_TO_ADD))
+			return EXISTS_IN_INDEX;
+	}
+
 	if (state->cached)
 		return 0;

diff --git a/t/t4140-apply-ita.sh b/t/t4140-apply-ita.sh
new file mode 100755
index 0000000000..e9f3749e65
--- /dev/null
+++ b/t/t4140-apply-ita.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+#
+# Copyright 2020 Google LLC
+#
+
+test_description='git apply of i-t-a file'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	test_write_lines 1 2 3 4 5 >blueprint &&
+
+	cat blueprint >test-file &&
+	git add -N test-file &&
+	git diff >creation-patch &&
+	grep "new file mode 100644" creation-patch
+'
+
+test_expect_success 'apply creation patch to ita path (--cached)' '
+	git rm -f test-file &&
+	cat blueprint >test-file &&
+	git add -N test-file &&
+
+	git apply --cached creation-patch &&
+	git cat-file blob :test-file >actual &&
+	test_cmp blueprint actual
+'
+
+test_expect_success 'apply creation patch to ita path (--index)' '
+	git rm -f test-file &&
+	cat blueprint >test-file &&
+	git add -N test-file &&
+	rm -f test-file &&
+
+	# NEEDSWORK: this should fail as test-file does not
+	# agree between index and the working tree, no?????
+	git apply --index creation-patch &&
+	git cat-file blob :test-file >actual &&
+	test_cmp blueprint actual &&
+	test_cmp blueprint test-file
+'
+
+test_expect_success 'apply creation patch to ita path (--index)' '
+	git rm -f test-file &&
+	cat blueprint >test-file &&
+	git add -N test-file &&
+
+	test_must_fail git apply --index creation-patch
+'
+
+test_done

Raymond E. Pasco Aug. 5, 2020, 12:32 a.m. UTC | #2

On Tue Aug 4, 2020 at 7:49 PM EDT, Junio C Hamano wrote:
> How exactly does "git add -p" fail for such a patch? What operation
> does it exactly want to do ("apply --cached"???) and is it "apply"
> that is wrong, or is it "git add -p" that fails to remove the i-t-a
> entry from the index before running "git apply" that is at fault?

Yes, "add -p" uses "apply --cached". I do believe this belongs in apply,
both because all "add -p" really does is assemble things to be fed to
apply and also for the more detailed reasons below.

The index and the filesystem are both able to represent "no file" and "a
file exists" states, but the index has an additional state (i-t-a) with
no direct representation in the worktree. By (correctly) emitting "new
file" patches when comparing a file to an i-t-a index entry, we are
setting down the rule that a "new file" patch is not merely the diff
between "no file" and "a file exists", but also the diff between i-t-a
and "a file exists".

Similarly, "deleted file" patches are the diff between "a file exists"
and "no file exists", but they are also the diff between i-t-a and "no
file exists" - if you add -N a file and then delete it from the
worktree, "deleted file" is what git diff (correctly) shows. As a
consequence of these rules, "new file" and "deleted file" diffs are now
the only diffs that validly apply to an i-t-a entry. So apply needs to
handle them (in "--cached" mode, anyway).

But the worktree lives in the filesystem, where there are no i-t-a
entries. So the question seems to me to be whether "no file" in the
worktree matches an i-t-a entry in the index for the purposes of "add
--index". I count a couple options here:

- Nothing on the filesystem can accurately match an i-t-a entry in the
  index, so all attempts at "apply --index" when there is an i-t-a in
  the index fail with "file: does not match index". "apply --cached",
  which "add -p" uses, applies only to the index and continues to work.
  I think I prefer this one; additionally, the comment in read-cache.c
  indicate that this is supposed to be the case already, so I just need
  to make sure this check is not skipped on "new file" patches.

- The current (as of this patch) behavior: a "new file" patch applies
  both to an i-t-a in the index, and to the lack of a file in the
  worktree. This may seem strange, but it may also seem strange that an
  identical new file patch, which can be applied either to just the
  worktree or just the index successfully, fails when applied to both at
  the same time with "apply --index". However, this is precisely what is
  done anyway by "apply --index" when there are no i-t-a entries
  involved, so I lean towards i-t-a entries never matching the worktree.

Patch for the first option in progress.

diff --git a/apply.c b/apply.c
index 8bff604dbe..4cba4ce71a 100644
--- a/apply.c
+++ b/apply.c
@@ -3747,10 +3747,13 @@  static int check_to_create(struct apply_state *state,
 {
 	struct stat nst;
 
-	if (state->check_index &&
-	    index_name_pos(state->repo->index, new_name, strlen(new_name)) >= 0 &&
-	    !ok_if_exists)
-		return EXISTS_IN_INDEX;
+	if (state->check_index && !ok_if_exists) {
+		int pos = index_name_pos(state->repo->index, new_name, strlen(new_name));
+		if (pos >= 0 &&
+		    !(state->repo->index->cache[pos]->ce_flags & CE_INTENT_TO_ADD))
+			return EXISTS_IN_INDEX;
+	}
+
 	if (state->cached)
 		return 0;