diff mbox series

[2/2] diff-filter: be more careful when looking for negative bits

Message ID e8006493a9ed4da9b9125865e004ba7ace20e7a4.1643149759.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Fix two --diff-filter bugs | expand

Commit Message

Johannes Schindelin Jan. 25, 2022, 10:29 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `--diff-filter=<bits>` option allows to filter the diff by certain
criteria, for example `R` to only show renamed files. It also supports
negating a filter via a down-cased letter, i.e. `r` to show _everything
but_ renamed files.

However, the code is a bit overzealous when trying to figure out whether
`git diff` should start with all diff-filters turned on because the user
provided a lower-case letter: if the `--diff-filter` argument starts
with an upper-case letter, we must not start with all bits turned on.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 diff.c         | 8 +++-----
 t/t4202-log.sh | 8 ++++++++
 2 files changed, 11 insertions(+), 5 deletions(-)

Comments

Taylor Blau Jan. 25, 2022, 11:21 p.m. UTC | #1
On Tue, Jan 25, 2022 at 10:29:19PM +0000, Johannes Schindelin via GitGitGadget wrote:
> diff --git a/diff.c b/diff.c
> index c862771a589..fc1151b9c73 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4821,17 +4821,15 @@ static int diff_opt_diff_filter(const struct option *option,
>  	prepare_filter_bits();
>
>  	/*
> -	 * If there is a negation e.g. 'd' in the input, and we haven't
> +	 * If the input starts with a negation e.g. 'd', and we haven't
>  	 * initialized the filter field with another --diff-filter, start
>  	 * from full set of bits, except for AON.
>  	 */
>  	if (!opt->filter) {
> -		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
> -			if (optch < 'a' || 'z' < optch)
> -				continue;
> +		optch = optarg[0];
> +		if (optch >= 'a' && 'z' >= optch) {
>  			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
>  			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
> -			break;
>  		}
>  	}

Thinking through how this would have worked before with
`--diff-filter=Dr`, I think it goes something like:

  1. We set all bits (except the all-or-none bit) on via the first loop.
  2. Then we OR in the bit for deletions, which does not change the
     overall filter (since it was already set by the previous step).
  3. Then we unset the bit corresponding to renames.

That leaves us with all bits on except two: DIFF_STATUS_RENAMED and
DIFF_STATUS_FILTER_AON.

As far as I can understand, the AON "filter" shows all files as long as
at least one of them matches the filter, otherwise it shows nothing at
all.

But that doesn't save us, since we have many more bits on than we should
have, meaning that `--diff-filter=Dr` doesn't work at all (assuming you
expected it to show just deletions, like `--diff-filter=D` does).

It's possible that I don't understand what the all-or-nothing bit is
supposed to be doing, though.

Thanks,
Taylor
Junio C Hamano Jan. 26, 2022, 7:23 a.m. UTC | #2
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `--diff-filter=<bits>` option allows to filter the diff by certain
> criteria, for example `R` to only show renamed files. It also supports
> negating a filter via a down-cased letter, i.e. `r` to show _everything
> but_ renamed files.

[jc: Squashable fix-up attached at the end]

>
> However, the code is a bit overzealous when trying to figure out whether
> `git diff` should start with all diff-filters turned on because the user
> provided a lower-case letter: if the `--diff-filter` argument starts
> with an upper-case letter, we must not start with all bits turned on.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  diff.c         | 8 +++-----
>  t/t4202-log.sh | 8 ++++++++
>  2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c862771a589..fc1151b9c73 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4821,17 +4821,15 @@ static int diff_opt_diff_filter(const struct option *option,
>  	prepare_filter_bits();
>  
>  	/*
> -	 * If there is a negation e.g. 'd' in the input, and we haven't
> +	 * If the input starts with a negation e.g. 'd', and we haven't
>  	 * initialized the filter field with another --diff-filter, start
>  	 * from full set of bits, except for AON.
>  	 */
>  	if (!opt->filter) {
> -		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
> -			if (optch < 'a' || 'z' < optch)
> -				continue;
> +		optch = optarg[0];
> +		if (optch >= 'a' && 'z' >= optch) {

Style.  When both sides of && must be satisfied, list these three
things in the order that should appear.  For example,

		if ('z' >= optch && optch >= 'a')

is OK because it makes it clear that optch sits between 'z' and 'a'
for the expression to be true.  The existing one is also OK for the
same reason.  The condition holds when either optch is below
(i.e. comes before) 'a', or it is above (i.e. comes after) 'z', so

		if (optch < 'a' || 'z' < optch)

orders them naturally.  Also

		if ('a' <= optch && optch <= 'z')

is good for the same reason as the first example, but probably is
better because the three things are ordered from smaller to larger,
which is usually how people count things.

I'd usually let this pass if it were new code, but because the patch
breaks the ordering the existing code has, it is a different story.

But more importantly, I do not know if the updated code is correct.

>  			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
>  			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
> -			break;
>  		}
>  	}

While the finding in the cover letter (i.e. "--diff-filter=Dr does
not work as expected") is certainly good, I do not know about this
implementation.  "--diff-filter=rD" and "--diff-filter=Dr" ought to
behave the same way, but if we base our logic on optarg[0], then the
first letter and only the first letter is made special, which does
not smell right.

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 50495598619..28f727937dd 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -142,6 +142,14 @@ test_expect_success 'diff-filter=R' '
>  
>  '
>  
> +test_expect_success 'diff-filter=Ra' '
> +
> +	git log -M --pretty="format:%s" --diff-filter=R HEAD >expect &&
> +	git log -M --pretty="format:%s" --diff-filter=Ra HEAD >actual &&
> +	test_cmp expect actual
> +

In other words, this should work the same way, no?

> +	git log -M --pretty="format:%s" --diff-filter=aR HEAD >actual &&
	
> +'
> +
>  test_expect_success 'diff-filter=C' '
>  
>  	git log -C -C --pretty="format:%s" --diff-filter=C HEAD >actual &&



------------------------------------------------------------

 diff.c         | 31 ++++++++++++++++++++++++++++---
 t/t4202-log.sh |  4 +++-
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index fc1151b9c7..a57e458f63 100644
--- a/diff.c
+++ b/diff.c
@@ -4821,13 +4821,38 @@ static int diff_opt_diff_filter(const struct option *option,
 	prepare_filter_bits();
 
 	/*
-	 * If the input starts with a negation e.g. 'd', and we haven't
+	 * If there is a negation e.g. 'd' in the input, and we haven't
 	 * initialized the filter field with another --diff-filter, start
 	 * from full set of bits, except for AON.
+	 * However, the user can try to limit to selected positive bits,
+	 * in which case we do not have to.
+	 *
+	 * NEEDSWORK: the "we haven't initialied" above is meant to
+	 * address cases where multiple options, e.g. --diff-filter=d
+	 * --diff-filter=a are given.  But this implementation is
+	 * insufficient when we refrain from starting from full set
+	 * when any positive bit is given.  Consider "--diff-filter=D
+	 * --diff-filter=r", which ought to behave the same way as
+	 * "--diff-filter=Dr" and "--diff-filter=rD".  The right fix
+	 * would probably involve two "opt->filter[NP]" fields,
+	 * records positive and negative bits separately in them while
+	 * parsing, and then after processing all options, compute
+	 * opt->filter by subtracting opt->filterN from opt->filterP
+	 * (and when we do so, fill opt->filterP to full bits if it is
+	 * absolutely empty).
 	 */
 	if (!opt->filter) {
-		optch = optarg[0];
-		if (optch >= 'a' && 'z' >= optch) {
+		int has_positive = 0;
+		int has_negative = 0;
+
+		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
+			if (optch < 'a' || 'z' < optch)
+				has_positive++;
+			else
+				has_negative++;
+		}
+
+		if (!has_positive && has_negative) {
 			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
 			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
 		}
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 28f727937d..128183e66f 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -142,10 +142,12 @@ test_expect_success 'diff-filter=R' '
 
 '
 
-test_expect_success 'diff-filter=Ra' '
+test_expect_success 'diff-filter=Ra and aR' '
 
 	git log -M --pretty="format:%s" --diff-filter=R HEAD >expect &&
 	git log -M --pretty="format:%s" --diff-filter=Ra HEAD >actual &&
+	test_cmp expect actual &&
+	git log -M --pretty="format:%s" --diff-filter=aR HEAD >actual &&
 	test_cmp expect actual
 
 '
Ævar Arnfjörð Bjarmason Jan. 26, 2022, 5:57 p.m. UTC | #3
On Tue, Jan 25 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `--diff-filter=<bits>` option allows to filter the diff by certain
> criteria, for example `R` to only show renamed files. It also supports
> negating a filter via a down-cased letter, i.e. `r` to show _everything
> but_ renamed files.
>
> However, the code is a bit overzealous when trying to figure out whether
> `git diff` should start with all diff-filters turned on because the user
> provided a lower-case letter: if the `--diff-filter` argument starts
> with an upper-case letter, we must not start with all bits turned on.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  diff.c         | 8 +++-----
>  t/t4202-log.sh | 8 ++++++++
>  2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c862771a589..fc1151b9c73 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4821,17 +4821,15 @@ static int diff_opt_diff_filter(const struct option *option,
>  	prepare_filter_bits();
>  
>  	/*
> -	 * If there is a negation e.g. 'd' in the input, and we haven't
> +	 * If the input starts with a negation e.g. 'd', and we haven't
>  	 * initialized the filter field with another --diff-filter, start
>  	 * from full set of bits, except for AON.
>  	 */
>  	if (!opt->filter) {
> -		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
> -			if (optch < 'a' || 'z' < optch)
> -				continue;
> +		optch = optarg[0];
> +		if (optch >= 'a' && 'z' >= optch) {

We'll probably never have to deal with non-ASCII, so maybe this is being
overzelous, but perhaps changing this to islower(optch) is worth it?

This relies on non-standard C both in the pre- and post-image, but in
reality it works everywhere, until someone attempts to port git to an
EBCDIC system...

>  			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
>  			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
> -			break;
>  		}
>  	}
>  
> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 50495598619..28f727937dd 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -142,6 +142,14 @@ test_expect_success 'diff-filter=R' '
>  
>  '
>  
> +test_expect_success 'diff-filter=Ra' '
> +

nit: extra \n

> +	git log -M --pretty="format:%s" --diff-filter=R HEAD >expect &&
> +	git log -M --pretty="format:%s" --diff-filter=Ra HEAD >actual &&
> +	test_cmp expect actual
> +
> +'
> +
>  test_expect_success 'diff-filter=C' '
>  
>  	git log -C -C --pretty="format:%s" --diff-filter=C HEAD >actual &&
Taylor Blau Jan. 26, 2022, 10:55 p.m. UTC | #4
On Tue, Jan 25, 2022 at 11:23:28PM -0800, Junio C Hamano wrote:
>  diff.c         | 31 ++++++++++++++++++++++++++++---
>  t/t4202-log.sh |  4 +++-
>  2 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index fc1151b9c7..a57e458f63 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4821,13 +4821,38 @@ static int diff_opt_diff_filter(const struct option *option,
>  	prepare_filter_bits();
>
>  	/*
> -	 * If the input starts with a negation e.g. 'd', and we haven't
> +	 * If there is a negation e.g. 'd' in the input, and we haven't
>  	 * initialized the filter field with another --diff-filter, start
>  	 * from full set of bits, except for AON.
> +	 * However, the user can try to limit to selected positive bits,
> +	 * in which case we do not have to.
> +	 *
> +	 * NEEDSWORK: the "we haven't initialied" above is meant to
> +	 * address cases where multiple options, e.g. --diff-filter=d
> +	 * --diff-filter=a are given.  But this implementation is
> +	 * insufficient when we refrain from starting from full set
> +	 * when any positive bit is given.  Consider "--diff-filter=D
> +	 * --diff-filter=r", which ought to behave the same way as
> +	 * "--diff-filter=Dr" and "--diff-filter=rD".  The right fix
> +	 * would probably involve two "opt->filter[NP]" fields,
> +	 * records positive and negative bits separately in them while
> +	 * parsing, and then after processing all options, compute
> +	 * opt->filter by subtracting opt->filterN from opt->filterP
> +	 * (and when we do so, fill opt->filterP to full bits if it is
> +	 * absolutely empty).
>  	 */
>  	if (!opt->filter) {
> -		optch = optarg[0];
> -		if (optch >= 'a' && 'z' >= optch) {
> +		int has_positive = 0;
> +		int has_negative = 0;
> +
> +		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
> +			if (optch < 'a' || 'z' < optch)
> +				has_positive++;
> +			else
> +				has_negative++;
> +		}
> +
> +		if (!has_positive && has_negative) {
>  			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
>  			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
>  		}

Ahh. I feel much better about this implementation. Something was nagging
me about treating optarg[0] specially, and you put very succinctly what
it was that was bothering me.

(One small nit that I absolutely do not care about is using a variable
that starts with 'has_'--which I would expect to be a boolean--to count
the number of positive/negative filters. Perhaps calling these
positive_nr, and negative_nr, respectively, would be clearer.)

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 28f727937d..128183e66f 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -142,10 +142,12 @@ test_expect_success 'diff-filter=R' '
>
>  '
>
> -test_expect_success 'diff-filter=Ra' '
> +test_expect_success 'diff-filter=Ra and aR' '
>
>  	git log -M --pretty="format:%s" --diff-filter=R HEAD >expect &&
>  	git log -M --pretty="format:%s" --diff-filter=Ra HEAD >actual &&
> +	test_cmp expect actual &&
> +	git log -M --pretty="format:%s" --diff-filter=aR HEAD >actual &&
>  	test_cmp expect actual

Perfect.

Thanks,
Taylor
Junio C Hamano Jan. 26, 2022, 11:36 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

>> +		int has_positive = 0;
>> +		int has_negative = 0;
>> +
>> +		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
>> +			if (optch < 'a' || 'z' < optch)
>> +				has_positive++;
>> +			else
>> +				has_negative++;
>> +		}
>> +
>> +		if (!has_positive && has_negative) {
>
> (One small nit that I absolutely do not care about is using a variable
> that starts with 'has_'--which I would expect to be a boolean--to count
> the number of positive/negative filters. Perhaps calling these

These variables are indeed used in the boolean sense (see how they
are used in the condition in the "if" statement).

If we wanted to short-circuit, we could do this:

	for (i = 0;
	     !has_positive && !has_negative &&
	     (optch = optarg[i]) != '\0';
	     i++) {
		if (isupper(optch))
			has_negative++;
		else
			has_positive++;
	}

and thanks to their names, nobody would be confused to find it is a
bug that these do not count when the loop exits early.  We shouldn't
name these positive_nr or negative_nr because we are not interested
in counting them.

It is just that "bool++" is more idiomatic than repeated assignment
"bool = 1" when setter and getter both know it is merely used as a
Boolean, and that is why they named as has_X, which clearly is a
name for a Boolean, not a counter.

Having said that, I do not mind if an assignment is used instead of
post-increment in new code.  I just think it is waste of time to go
find increments that toggle a Boolean to true and changing them to
assignments, and it is even more waste having to review such a code
churn, so let's not go there ;-)

But as I wrote in the big NEEDSWORK comment, this loop should
disappear when the code is properly rewritten to correct the
interaction between two or more "--diff-filter=" options, so it
would not matter too much.

Thanks.
Johannes Schindelin Jan. 28, 2022, 11:52 a.m. UTC | #6
Hi Taylor,

On Tue, 25 Jan 2022, Taylor Blau wrote:

> On Tue, Jan 25, 2022 at 10:29:19PM +0000, Johannes Schindelin via GitGitGadget wrote:
> > diff --git a/diff.c b/diff.c
> > index c862771a589..fc1151b9c73 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -4821,17 +4821,15 @@ static int diff_opt_diff_filter(const struct option *option,
> >  	prepare_filter_bits();
> >
> >  	/*
> > -	 * If there is a negation e.g. 'd' in the input, and we haven't
> > +	 * If the input starts with a negation e.g. 'd', and we haven't
> >  	 * initialized the filter field with another --diff-filter, start
> >  	 * from full set of bits, except for AON.
> >  	 */
> >  	if (!opt->filter) {
> > -		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
> > -			if (optch < 'a' || 'z' < optch)
> > -				continue;
> > +		optch = optarg[0];
> > +		if (optch >= 'a' && 'z' >= optch) {
> >  			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
> >  			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
> > -			break;
> >  		}
> >  	}
>
> Thinking through how this would have worked before with
> `--diff-filter=Dr`, I think it goes something like:
>
>   1. We set all bits (except the all-or-none bit) on via the first loop.
>   2. Then we OR in the bit for deletions, which does not change the
>      overall filter (since it was already set by the previous step).
>   3. Then we unset the bit corresponding to renames.
>
> That leaves us with all bits on except two: DIFF_STATUS_RENAMED and
> DIFF_STATUS_FILTER_AON.

Correct. And since we asked only for "Deleted", we get way more than we
bargained for.

> As far as I can understand, the AON "filter" shows all files as long as
> at least one of them matches the filter, otherwise it shows nothing at
> all.

Right, so on its own, it is quite useless. It needs to be combined with
another diff filter to make sense.

> But that doesn't save us, since we have many more bits on than we should
> have, meaning that `--diff-filter=Dr` doesn't work at all (assuming you
> expected it to show just deletions, like `--diff-filter=D` does).

Correct.

Ciao,
Dscho
Johannes Schindelin Jan. 28, 2022, 11:54 a.m. UTC | #7
Hi Junio,

On Wed, 26 Jan 2022, Junio C Hamano wrote:

> [...] this loop should disappear when the code is properly rewritten to
> correct the interaction between two or more "--diff-filter=" options, so
> it would not matter too much.

And so it did. Thank you for the suggestion. At first I thought that it
would require too extensive a code change, but was pleasantly surprised
when I figured out that the final step can be done as part of
`diff_setup_done()`, with a quite pleasant-to-read patch, if I may say so.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index c862771a589..fc1151b9c73 100644
--- a/diff.c
+++ b/diff.c
@@ -4821,17 +4821,15 @@  static int diff_opt_diff_filter(const struct option *option,
 	prepare_filter_bits();
 
 	/*
-	 * If there is a negation e.g. 'd' in the input, and we haven't
+	 * If the input starts with a negation e.g. 'd', and we haven't
 	 * initialized the filter field with another --diff-filter, start
 	 * from full set of bits, except for AON.
 	 */
 	if (!opt->filter) {
-		for (i = 0; (optch = optarg[i]) != '\0'; i++) {
-			if (optch < 'a' || 'z' < optch)
-				continue;
+		optch = optarg[0];
+		if (optch >= 'a' && 'z' >= optch) {
 			opt->filter = (1 << (ARRAY_SIZE(diff_status_letters) - 1)) - 1;
 			opt->filter &= ~filter_bit[DIFF_STATUS_FILTER_AON];
-			break;
 		}
 	}
 
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 50495598619..28f727937dd 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -142,6 +142,14 @@  test_expect_success 'diff-filter=R' '
 
 '
 
+test_expect_success 'diff-filter=Ra' '
+
+	git log -M --pretty="format:%s" --diff-filter=R HEAD >expect &&
+	git log -M --pretty="format:%s" --diff-filter=Ra HEAD >actual &&
+	test_cmp expect actual
+
+'
+
 test_expect_success 'diff-filter=C' '
 
 	git log -C -C --pretty="format:%s" --diff-filter=C HEAD >actual &&