[2/5] apply: honor `ignore_ws_none` with `correct_ws_error`

Message ID	1eb33969-1739-4a27-a77b-3f4268f5519d@gmail.com (mailing list archive)
State	New
Headers	show Received: from mail-wr1-f48.google.com (mail-wr1-f48.google.com [209.85.221.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2123179A3 for <git@vger.kernel.org>; Sun, 25 Aug 2024 10:18:30 +0000 (UTC) Message-ID: <1eb33969-1739-4a27-a77b-3f4268f5519d@gmail.com> Date: Sun, 25 Aug 2024 12:18:28 +0200 Precedence: bulk MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: [PATCH 2/5] apply: honor `ignore_ws_none` with `correct_ws_error` From: =?utf-8?q?Rub=C3=A9n_Justo?= <rjusto@gmail.com> To: Git List <git@vger.kernel.org> References: <6dd964c2-9dee-4257-8f1a-5bc31a73722e@gmail.com> Content-Language: en-US In-Reply-To: <6dd964c2-9dee-4257-8f1a-5bc31a73722e@gmail.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	`--whitespace=fix` with `--no-ignore-whitespace` \| expand [0/5] `--whitespace=fix` with `--no-ignore-whitespace` [1/5] apply: introduce `ignore_ws_default` [2/5] apply: honor `ignore_ws_none` with `correct_ws_error` [3/5] apply: whitespace errors in context lines if we have [4/5] apply: error message in `record_ws_error()` [5/5] t4124: move test preparation into the test context

Rubén Justo Aug. 25, 2024, 10:18 a.m. UTC

Ensure strict matching of context lines when applying with
`--whitespace=fix` combined with `--no-ignore-whitespace`.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
---
 apply.c                  |  3 ++-
 t/t4124-apply-ws-rule.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

Junio C Hamano Aug. 27, 2024, 12:35 a.m. UTC | #1

Rubén Justo <rjusto@gmail.com> writes:

> Ensure strict matching of context lines when applying with
> `--whitespace=fix` combined with `--no-ignore-whitespace`.
>
> Signed-off-by: Rubén Justo <rjusto@gmail.com>
> ---
>  apply.c                  |  3 ++-
>  t/t4124-apply-ws-rule.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/apply.c b/apply.c
> index 63e58086f1..0cb9d38e5a 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -2596,7 +2596,8 @@ static int match_fragment(struct apply_state *state,
>  		goto out;
>  	}
>  
> -	if (state->ws_error_action != correct_ws_error) {
> +	if (state->ws_error_action != correct_ws_error ||
> +	    state->ws_ignore_action == ignore_ws_none) {
>  		ret = 0;
>  		goto out;
>  	}

Hmph, if we are correcting for whitespace violations, even if
whitespace fuzz is not allowed externally, wouldn't the issue that
c1beba5b (git-apply --whitespace=fix: fix whitespace fuzz introduced
by previous run, 2008-01-30) corrected still apply?  IOW, isn't this
change introducing a regression when an input touches a file with a
change with broken whitespaces, and then touches the same file to
replace the broken whitespace lines with something else?

Rubén Justo Aug. 29, 2024, 5:07 a.m. UTC | #2

On Mon, Aug 26, 2024 at 05:35:14PM -0700, Junio C Hamano wrote:
> Rubén Justo <rjusto@gmail.com> writes:
> 
> > Ensure strict matching of context lines when applying with
> > `--whitespace=fix` combined with `--no-ignore-whitespace`.
> >
> > Signed-off-by: Rubén Justo <rjusto@gmail.com>
> > ---
> >  apply.c                  |  3 ++-
> >  t/t4124-apply-ws-rule.sh | 27 +++++++++++++++++++++++++++
> >  2 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/apply.c b/apply.c
> > index 63e58086f1..0cb9d38e5a 100644
> > --- a/apply.c
> > +++ b/apply.c
> > @@ -2596,7 +2596,8 @@ static int match_fragment(struct apply_state *state,
> >  		goto out;
> >  	}
> >  
> > -	if (state->ws_error_action != correct_ws_error) {
> > +	if (state->ws_error_action != correct_ws_error ||
> > +	    state->ws_ignore_action == ignore_ws_none) {
> >  		ret = 0;
> >  		goto out;
> >  	}
> 
> Hmph, if we are correcting for whitespace violations, even if
> whitespace fuzz is not allowed externally, wouldn't the issue that
> c1beba5b (git-apply --whitespace=fix: fix whitespace fuzz introduced
> by previous run, 2008-01-30) corrected still apply?  IOW, isn't this
> change introducing a regression when an input touches a file with a
> change with broken whitespaces, and then touches the same file to
> replace the broken whitespace lines with something else?

Yes, that is the center of the blast this series is producing;
affecting to what `--whitespace=fix` does (just a reminder for other
readers):

  - stop fixing whitespace errors in context lines and

  - no longer warning about them.

Clearly, as you point out, the change poses a problem when applying
changes with whitespace errors in lines involved multiple times,
either during the same apply session or across multiple sessions
executed in sequence.

The new `ignore_ws_default` enum option is intended to mitigate the
blast.  It would be unexpected IMHO for someone who wants the behavior
described in c1beba5b to be indicating `--no-ignore-whitespace`.  And
with it we allow the possibility for someone who finds that behavior
undesirable to avoid it.

I'm not very happy with the new enum, but I haven't come up with a
better idea.  There are other alternatives:

  --whitespace=fix-strict

  --do-not-fix-ws-errors-in-context-lines

None of them are better, I think.

Junio C Hamano Aug. 29, 2024, 11:13 p.m. UTC | #3

Rubén Justo <rjusto@gmail.com> writes:

> I'm not very happy with the new enum, but I haven't come up with a
> better idea.
> ...
> None of them are better, I think.

Not adding a new enum is probably much better.  See the "additional
thought" in my review on [3/5], for example.

Thanks.

Rubén Justo Sept. 3, 2024, 10:06 p.m. UTC | #4

On Thu, Aug 29, 2024 at 04:13:14PM -0700, Junio C Hamano wrote:
> Rubén Justo <rjusto@gmail.com> writes:
> 
> > I'm not very happy with the new enum, but I haven't come up with a
> > better idea.
> > ...
> > None of them are better, I think.
> 
> Not adding a new enum is probably much better.  See the "additional
> thought" in my review on [3/5], for example.

If I understand correctly the example you mentioned, using
`in_fn_table()` cannot help us in `parse_fragment()`.  But I could be
completely wrong and misunderstanding your intention.

I still don't see a better option than introducing a new value
`default`.  Perhaps described like this:

diff --git a/Documentation/config/apply.txt b/Documentation/config/apply.txt
index f9908e210a..7b642d2f3a 100644
--- a/Documentation/config/apply.txt
+++ b/Documentation/config/apply.txt
@@ -4,6 +4,10 @@ apply.ignoreWhitespace::
        option.
        When set to one of: no, none, never, false, it tells 'git apply' to
        respect all whitespace differences.
+       When not set or set to `default`, it tells `git apply` to
+       behave like the previous setting: `no`.  However, when
+       combined with 'whitespace=fix', some whitespace errors
+       will still be ignored because they are being fixed.
        See linkgit:git-apply[1].

 apply.whitespace:

Junio C Hamano Sept. 4, 2024, 4:41 a.m. UTC | #5

Rubén Justo <rjusto@gmail.com> writes:

> If I understand correctly the example you mentioned, using
> `in_fn_table()` cannot help us in `parse_fragment()`.  But I could be
> completely wrong and misunderstanding your intention.

Hmph.  It's unfortunate that the control flow goes like this:

 apply_all_patches()
 -> apply_patch()
    -> parse_chunk()
       -> parse_single_patch()
          -> parse_fragment()
    -> use_patch()
    -> check_patch_list()
       -> check_patch()
          -> apply_data()
             -> add_to_fn_table()

So deciding if a context line (or a new line for that matter)
contains a whitespace error is done too early in the current code,
and if you want to do this correctly, you'd need to move the check
down so that it happens in apply_one_fragment() that is called in
apply_data().  The rest of the whitespace checks are done there, and
regardless of the "patch that touches the same path twice" issue,
it feels like the right thing to do anyway.

Such a "right" fix might be involved.  If we want to punt, I think
you can still inspect the *patch inside parse_single_patch(), and
figure out if the target path of the current fragment you are
looking at has already been touched in the current session (we parse
everything into the patch struct whose fragments member has a
chained list of fragments).  Normally that should not be the case.
If we know we are not being fed such a patch with duplicated paths,
we do not have to inspect whitespace issues while parsing a context
' ' line in parse_fragments().

> I still don't see a better option than introducing a new value
> `default`.

As long as we can tell that our input is not a patch that touches
the same path twice, we shouldn't need a new knob or a command line
option.  When we are dealing with Git generated patch that does not
mention the same path twice, we can unconditionally say "unless
--ignore-space-change and other options that allow loose matching of
context lines are given, it is an error if context lines do not
exactly match, even when we are correcting for whitespace rule
violations".  Otherwise, we may need to keep the current workaround
logic in case a later change has a line as a ' ' context for a file
that an earlier change modified (and fixed with --whitespace=fix).

It is a different story if all you want to change is to add to
--whitespace family of options a new "silent-fix" that makes
corrections in the same way as "fix" (or "strip"), but wihtout
giving any warnings.  I think it makes sense to have such a mode,
but that is largely orthogonal to the discussion we are having, I
think, even though it might result in a similar effect visible to
the end-users.

Thanks.

Rubén Justo Sept. 4, 2024, 6:20 p.m. UTC | #6

On Tue, Sep 03, 2024 at 09:41:31PM -0700, Junio C Hamano wrote:

> ... a new "silent-fix" that makes
> corrections in the same way as "fix" (or "strip"), but wihtout
> giving any warnings.

My main intention is not to make "fix" quieter, although it will
probably be a pleasant consequence.

Let's consider a sequence of patches where some whitespace errors in
one patch are carried over to the next:

    $ # underscore (_) is whitespace for readability
    $ cat >file <<END
    a
    END
    $ cat >patch1 <<END # this adds a whitespace error
    --- v/file
    +++ m/file
    @@ -1 +1,2 @@
     a
    +b_
    END
    $ cat >patch2 <<END # this adds another one
    --- v/file
    +++ m/file
    @@ -1,2 +1,3 @@
     a
     b_
    +c_
    END

When applying "patch1" with `--whitespace=fix`, we'll fix the
whitespace error in "b ".  Consequently, when applying "patch2" we'll
need to fix that line in the patch before applying it, so that it
matches the context line, now with "b".  Makes sense.

This applies not only to:

    $ git apply --whitespace=fix patch1 && \
      git apply --whitespace=fix patch2

But to:

    $ git apply --whitespace=fix patch1 patch2

Even to:

    $ git apply --whitespace=fix <(cat patch1 patch2)

So far, so good.

However, a legit question is: Why "a " is being modified here?:

    $ cat >foo <<END
    a_
    b_
    END
    $ cat >patch <<END
    --- v/foo
    +++ m/foo
    @@ -1,2 +1,2 @@
     a
    -b_
    +b
    END
    $ git apply -v --whitespace=fix patch
    Checking patch foo...
    Applied patch foo cleanly.
    $ xxd foo
    00000000: 610a                                     a.

Is this a bug?  I don't think so.  But the result, IMHO, is
questionable, and "git apply" rejecting the patch could also be an
expected outcome.

We are assuming an implicit, and perhaps unwanted, step:

    --- v/foo
    +++ m/foo
    @@ -1,2 +1,2 @@
    -a_
    +a
     b_
    END

We cannot change the default behaviour (I won't dig into this), but I
think we can give a knob to allow changing it.  This is the main
intention of the, ugly, new "_default" enum value.

And... as a nice side effect, we'll know when to stop showing warnings
when the user touches lines next to other lines with whitespace errors
that they don't want, or can, change ;-)

[2/5] apply: honor `ignore_ws_none` with `correct_ws_error`

Commit Message

Comments

Patch