diff mbox series

diff-files: treat "i-t-a" files as "not-in-index"

Message ID 20200611161640.52156-1-shrinidhi.kaushik@gmail.com (mailing list archive)
State New, archived
Headers show
Series diff-files: treat "i-t-a" files as "not-in-index" | expand

Commit Message

Srinidhi Kaushik June 11, 2020, 4:16 p.m. UTC
The `diff-files' command and related commands which call `cmd_diff_files()',
consider the "intent-to-add" files as a part of the index when comparing the
work-tree against it. This was previously addressed in [1] and [2] by turning
the option `--ita-invisible-in-index' (introduced in [3]) on by default.

For `diff-files' (and `add -p' as a consequence) to show the i-t-a files as
as new, `ita_invisible_in_index' will be enabled by default here as well.

[1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default, 2018-05-26)
[2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist in
                index", 2016-10-24)
[3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
---

Hello! This is my first patch in this project.
This issue was mentioned in #leftoverbits on GitHub: [1], and this
patch implements the change proposed in [2].

[1] https://github.com/gitgitgadget/git/issues/647
[2] https://lore.kernel.org/git/20200527230357.GB546534@coredump.intra.peff.net


 builtin/diff-files.c  |  7 +++++++
 t/t2203-add-intent.sh | 25 ++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

Comments

Junio C Hamano June 11, 2020, 8:27 p.m. UTC | #1
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

> The `diff-files' command and related commands which call `cmd_diff_files()',
> consider the "intent-to-add" files as a part of the index when comparing the
> work-tree against it. This was previously addressed in [1] and [2] by turning
> the option `--ita-invisible-in-index' (introduced in [3]) on by default.
>
> For `diff-files' (and `add -p' as a consequence) to show the i-t-a files as
> as new, `ita_invisible_in_index' will be enabled by default here as well.
>
> [1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default, 2018-05-26)
> [2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist in
>                 index", 2016-10-24)
> [3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)

Is there any place where we still run the internal diff machinery to
compare the index and the working tree without setting the
ita_invisible_in_index bit on with this patch applied, and if so,
why?  Does the justification why that other place needs to leave
the bit off apply to this codepath as well?

What I am trying to get at is if this is helping only one usecase
for "diff-files" while breaking other usecases.

On the other hand, if there is no longer anybody who wants
ita_invisible_in_index off, perhaps we can get rid of the bit and
lose many conditionals.
Srinidhi Kaushik June 11, 2020, 11:28 p.m. UTC | #2
Thanks for replying!

On Thu, Jun 11, 2020 at 01:27:22PM -0700, Junio C Hamano wrote:
> Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:
> 
> > The `diff-files' command and related commands which call `cmd_diff_files()',
> > consider the "intent-to-add" files as a part of the index when comparing the
> > work-tree against it. This was previously addressed in [1] and [2] by turning
> > the option `--ita-invisible-in-index' (introduced in [3]) on by default.
> >
> > For `diff-files' (and `add -p' as a consequence) to show the i-t-a files as
> > as new, `ita_invisible_in_index' will be enabled by default here as well.
> >
> > [1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default, 2018-05-26)
> > [2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist in
> >                 index", 2016-10-24)
> > [3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)
> 
> Is there any place where we still run the internal diff machinery to
> compare the index and the working tree without setting the
> ita_invisible_in_index bit on with this patch applied, and if so,
> why?  Does the justification why that other place needs to leave
> the bit off apply to this codepath as well?

Yes, I believe that there exist some use cases for `ita_invisible_in_index'
to be unset. For instance, `index_differs_from' which is used in a quite a
few places -- like "commit", "revert", and "rebase" -- which require a
"no change" to be returned.

This commit: [1] addressed the issue where the cache-tree was producing
the same tree as the current commit when it involved "intent-to-add"
entries, instead of aborting.

[1] 018ec3c820 (commit: fix empty commit creation when there's no changes
                but ita entries, 2016-10-24) 

> What I am trying to get at is if this is helping only one usecase
> for "diff-files" while breaking other usecases.

Currently, `run_add_p' (for "add"; which this patch addresses
the fix), and `push_to_deploy' (in "receive-pack"; where this
is the intended behavior), call "diff-files" as a subprocess,
in which case the `ita_invisible_in_index' bit is explicitly
set. For all other cases, calls are made directly
to `run_diff_files' and will be unaffected by this change.
 
> On the other hand, if there is no longer anybody who wants
> ita_invisible_in_index off, perhaps we can get rid of the bit and
> lose many conditionals.
Srinidhi Kaushik June 18, 2020, 5:58 p.m. UTC | #3
Hello,
Is there any update on this patch?
Please let me know if I missed anything.

Thanks!

On 06/12/2020 04:58, Srinidhi Kaushik wrote:
> Thanks for replying!
> 
> On Thu, Jun 11, 2020 at 01:27:22PM -0700, Junio C Hamano wrote:
> > Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:
> > 
> > > The `diff-files' command and related commands which call `cmd_diff_files()',
> > > consider the "intent-to-add" files as a part of the index when comparing the
> > > work-tree against it. This was previously addressed in [1] and [2] by turning
> > > the option `--ita-invisible-in-index' (introduced in [3]) on by default.
> > >
> > > For `diff-files' (and `add -p' as a consequence) to show the i-t-a files as
> > > as new, `ita_invisible_in_index' will be enabled by default here as well.
> > >
> > > [1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default, 2018-05-26)
> > > [2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist in
> > >                 index", 2016-10-24)
> > > [3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)
> > 
> > Is there any place where we still run the internal diff machinery to
> > compare the index and the working tree without setting the
> > ita_invisible_in_index bit on with this patch applied, and if so,
> > why?  Does the justification why that other place needs to leave
> > the bit off apply to this codepath as well?
> 
> Yes, I believe that there exist some use cases for `ita_invisible_in_index'
> to be unset. For instance, `index_differs_from' which is used in a quite a
> few places -- like "commit", "revert", and "rebase" -- which require a
> "no change" to be returned.
> 
> This commit: [1] addressed the issue where the cache-tree was producing
> the same tree as the current commit when it involved "intent-to-add"
> entries, instead of aborting.
> 
> [1] 018ec3c820 (commit: fix empty commit creation when there's no changes
>                 but ita entries, 2016-10-24) 
> 
> > What I am trying to get at is if this is helping only one usecase
> > for "diff-files" while breaking other usecases.
> 
> Currently, `run_add_p' (for "add"; which this patch addresses
> the fix), and `push_to_deploy' (in "receive-pack"; where this
> is the intended behavior), call "diff-files" as a subprocess,
> in which case the `ita_invisible_in_index' bit is explicitly
> set. For all other cases, calls are made directly
> to `run_diff_files' and will be unaffected by this change.
>  
> > On the other hand, if there is no longer anybody who wants
> > ita_invisible_in_index off, perhaps we can get rid of the bit and
> > lose many conditionals.
Junio C Hamano June 18, 2020, 10:33 p.m. UTC | #4
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

> -test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
> +test_expect_success 'diff/diff-cached shows ita as new/not-new files' '
>  	git reset --hard &&
>  	echo new >new-ita &&
>  	git add -N new-ita &&

Interesting.  

I thought that the test originally tested "diff-files" and
"diff-index --cached" and a modernization patch forgot to update the
title when the test body was changed to use "diff" and "diff
--cached", but that is not the case here.  When 0231ae71 (diff: turn
--ita-invisible-in-index on by default, 2018-05-26) added this test,
it gave a wrong title from the beginning.

Nice catch.

> @@ -243,6 +243,29 @@ test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
>  	test_must_be_empty actual2
>  '
>  
> +test_expect_success 'diff-files shows i-t-a files as new files' '
> +	git reset --hard &&
> +	touch empty &&

Use of "touch" gives a wrong impression that you care about the file
timestamp; use something like ": >empty &&" instead when you care
about the presence of the file and do not care about its timestamp.

> +	content="foo" &&
> +	echo $content >not-empty &&

The quoting decision is backwards in these two lines.  It is OK not
to quote when the right hand side literal is clearly a single word
without $IFS.  On the other hand, it is a good practice to always
quote when using what is in a "$variable".

> +	git add -N empty not-empty &&
> +	git diff-files -p >actual &&
> +	hash_e=$(git hash-object empty) &&
> +	hash_n=$(git hash-object not-empty) &&
> +	cat >expect <<-EOF &&
> +	diff --git a/empty b/empty
> +	new file mode 100644
> +	index 0000000..$(git rev-parse --short $hash_e)
> +	diff --git a/not-empty b/not-empty
> +	new file mode 100644
> +	index 0000000..$(git rev-parse --short $hash_n)
> +	--- /dev/null
> +	+++ b/not-empty
> +	@@ -0,0 +1 @@
> +	+$content
> +	EOF
> +	test_cmp expect actual
> +'

OK.  Do we want to show what happens when "diff" and "diff --cached"
are run with these two "added but not quite added yet" paths to
contrast with this new case?

Thanks.
Junio C Hamano June 18, 2020, 10:33 p.m. UTC | #5
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

> Hello,
> Is there any update on this patch?
> Please let me know if I missed anything.

Sorry, the patch totally fell through the cracks.

I just sent out a brief review on the part I didn't review during
the first round.

Thanks.
Junio C Hamano June 18, 2020, 10:40 p.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

>> +	touch empty &&
>
> Use of "touch" gives a wrong impression that you care about the file
> timestamp; use something like ": >empty &&" instead when you care
> about the presence of the file and do not care about its timestamp.

I just realized that this is even more important in this case not to
use "touch".

The test that uses this file cares not just the presence, but it
deeply cares that its contents is empty.  The thing it least cares
about is its timestamp.

The purpose of using "touch" is to update the timestamp, to keep the
current contents if it exists, and to ensure it exists (as a side
effect), in the decreasing order of importance.  Use of the command
here misleads the readers.

Thanks.
Srinidhi Kaushik June 19, 2020, 9:31 a.m. UTC | #7
Thank you for reviewing this; I appreciate it!

> > +     content="foo" &&
> > +     echo $content >not-empty &&
>
> The quoting decision is backwards in these two lines.  It is OK not
> to quote when the right hand side literal is clearly a single word
> without $IFS.  On the other hand, it is a good practice to always
> quote when using what is in a "$variable".

Yes, that doesn't look right, I will make changes in v2.

[...]
> > > +  touch empty &&
> >
> > Use of "touch" gives a wrong impression that you care about the file
> > timestamp; use something like ": >empty &&" instead when you care
> > about the presence of the file and do not care about its timestamp.
>
> I just realized that this is even more important in this case not to
> use "touch".
>
> The test that uses this file cares not just the presence, but it
> deeply cares that its contents is empty.  The thing it least cares
> about is its timestamp.
>
> The purpose of using "touch" is to update the timestamp, to keep the
> current contents if it exists, and to ensure it exists (as a side
> effect), in the decreasing order of importance.  Use of the command
> here misleads the readers.

Oops, you are right. That makes sense. Will update to ": >empty".

[...]
> > +     git add -N empty not-empty &&
> > +     git diff-files -p >actual &&
> > +     hash_e=$(git hash-object empty) &&
> > +     hash_n=$(git hash-object not-empty) &&
> > +     cat >expect <<-EOF &&
> > +     diff --git a/empty b/empty
> > +     new file mode 100644
> > +     index 0000000..$(git rev-parse --short $hash_e)
> > +     diff --git a/not-empty b/not-empty
> > +     new file mode 100644
> > +     index 0000000..$(git rev-parse --short $hash_n)
> > +     --- /dev/null
> > +     +++ b/not-empty
> > +     @@ -0,0 +1 @@
> > +     +$content
> > +     EOF
> > +     test_cmp expect actual
> > +'
>
> OK.  Do we want to show what happens when "diff" and "diff --cached"
> are run with these two "added but not quite added yet" paths to
> contrast with this new case?

I'm not sure if we want to repeat an older test. The test (which was
renamed in this patch) in t2203-add-intent.sh: "diff/diff-cached shows
ita as new/not-new files" is already doing that. Should  the "diff" and
"diff --cached" steps be appended here again?

Thanks.
Junio C Hamano June 19, 2020, 9:43 p.m. UTC | #8
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

>> > +     git add -N empty not-empty &&
>> > +     git diff-files -p >actual &&
>> > +     hash_e=$(git hash-object empty) &&
>> > +     hash_n=$(git hash-object not-empty) &&
>> > +     cat >expect <<-EOF &&
>> > +     diff --git a/empty b/empty
>> > +     new file mode 100644
>> > +     index 0000000..$(git rev-parse --short $hash_e)
>> > +     diff --git a/not-empty b/not-empty
>> > +     new file mode 100644
>> > +     index 0000000..$(git rev-parse --short $hash_n)
>> > +     --- /dev/null
>> > +     +++ b/not-empty
>> > +     @@ -0,0 +1 @@
>> > +     +$content
>> > +     EOF
>> > +     test_cmp expect actual
>> > +'
>>
>> OK.  Do we want to show what happens when "diff" and "diff --cached"
>> are run with these two "added but not quite added yet" paths to
>> contrast with this new case?
>
> I'm not sure if we want to repeat an older test. The test (which was
> renamed in this patch) in t2203-add-intent.sh: "diff/diff-cached shows
> ita as new/not-new files" is already doing that. Should  the "diff" and
> "diff --cached" steps be appended here again?

No, there is no need to repeat essentially the same test that exists
elsewhere.  I wonder if it reduces duplication even further if we
extend that existing test that checks "diff" and "diff --cached" so
that it also checks "diff-files" as well?, instead of adding this
new one?  The existing one checks diff and diff-cached only with a
new non-empty path, and it can use tests with a new empty path at
the same time, with a unified "set up" code that is in the early
part of the test, e.g.

diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 5bbe8dcce4..cfde790ac7 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -235,7 +235,8 @@ test_expect_success 'double rename detection in status' '
 test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
 	git reset --hard &&
 	echo new >new-ita &&
-	git add -N new-ita &&
+	: >new-ita-empty &&
+	git add -N new-ita new-ita-empty &&
 	git diff --summary >actual &&
 	...

Then the existing tests can be updated to see not just --summary but
also for the contents like you did in the new test---and another test
that examines what "git diff-files" sees (which is what you added)
can happen in the same test (that way, the same set-up can be reused
for three tests).

Thanks.
diff mbox series

Patch

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 86ae474fbf..1e352dd8f7 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -28,6 +28,13 @@  int cmd_diff_files(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.abbrev = 0;
+
+	/*
+	 * Consider "intent-to-add" files as new by default, unless
+	 * explicitly specified in the command line or anywhere else.
+	 */
+	rev.diffopt.ita_invisible_in_index = 1;
+
 	precompose_argv(argc, argv);
 
 	argc = setup_revisions(argc, argv, &rev, NULL);
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 5bbe8dcce4..742f27a935 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -232,7 +232,7 @@  test_expect_success 'double rename detection in status' '
 	)
 '
 
-test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
+test_expect_success 'diff/diff-cached shows ita as new/not-new files' '
 	git reset --hard &&
 	echo new >new-ita &&
 	git add -N new-ita &&
@@ -243,6 +243,29 @@  test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
 	test_must_be_empty actual2
 '
 
+test_expect_success 'diff-files shows i-t-a files as new files' '
+	git reset --hard &&
+	touch empty &&
+	content="foo" &&
+	echo $content >not-empty &&
+	git add -N empty not-empty &&
+	git diff-files -p >actual &&
+	hash_e=$(git hash-object empty) &&
+	hash_n=$(git hash-object not-empty) &&
+	cat >expect <<-EOF &&
+	diff --git a/empty b/empty
+	new file mode 100644
+	index 0000000..$(git rev-parse --short $hash_e)
+	diff --git a/not-empty b/not-empty
+	new file mode 100644
+	index 0000000..$(git rev-parse --short $hash_n)
+	--- /dev/null
+	+++ b/not-empty
+	@@ -0,0 +1 @@
+	+$content
+	EOF
+	test_cmp expect actual
+'
 
 test_expect_success '"diff HEAD" includes ita as new files' '
 	git reset --hard &&