diff mbox series

[RFC,3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

Message ID 20180924100604.32208-4-phillip.wood@talktalk.net (mailing list archive)
State New, archived
Headers show
Series [RFC,1/3] xdiff-interface: make xdl_blankline() available | expand

Commit Message

Phillip Wood Sept. 24, 2018, 10:06 a.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

This adds another mode for highlighting lines that have moved with an
indentation change. Unlike the existing
--color-moved-ws=allow-indentation-change setting this mode uses the
visible change in the indentation to group lines, rather than the
indentation string. This means it works with files that use a mix of
tabs and spaces for indentation and can cope with whitespace errors
where there is a space before a tab (it's the job of
--ws-error-highlight to deal with those errors, it should affect the
move detection). It will also group the lines either
side of a blank line if their indentation change matches so short
lines followed by a blank line followed by more lines with the same
indentation change will be correctly highlighted.

This is a RFC as there are a number of questions about how to proceed
from here:
 1) Do we need a second option or should this implementation replace
    --color-moved-ws=allow-indentation-change. I'm unclear if that mode
    has any advantages for some people. There seems to have been an
    intention [1] to get it working with mixes of tabs and spaces but
    nothing ever came of it.
 2) If we keep two options what should this option be called, the name
    is long and ambiguous at the moment - mixed could refer to mixed
    indentation length rather than a mix of tabs and spaces.
 3) Should we support whitespace flags with this mode?
    --ignore-space-at-eol and --ignore-cr-at eol would be fairly simple
    to support and I can see a use for them, --ignore-all-space and
    --ignore-space-change would need some changes to xdiff to allow them
    to apply only after the indentation. I think --ignore-blank-lines
    would need a bit of work to get it working as well. (Note the
    existing mode does not support any of these flags either)

[1] https://public-inbox.org/git/CAGZ79kasAqE+=7ciVrdjoRdu0UFjVBr8Ma502nw+3hZL=ebXYQ@mail.gmail.com/

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 122 +++++++++++++++++++++++++++++++++----
 diff.h                     |   1 +
 t/t4015-diff-whitespace.sh |  89 +++++++++++++++++++++++++++
 3 files changed, 199 insertions(+), 13 deletions(-)

Comments

Stefan Beller Sept. 25, 2018, 1:07 a.m. UTC | #1
On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This adds another mode for highlighting lines that have moved with an
> indentation change. Unlike the existing
> --color-moved-ws=allow-indentation-change setting this mode uses the
> visible change in the indentation to group lines, rather than the
> indentation string.

Wow! Thanks for putting this RFC out.
My original vision was to be useful to python users as well,
which counts 1 tab as 8 spaces IIUC.

The "visual" indentation you mention here sounds like
a tab is counted as "up to the next position of (n-1) % 8",
i.e. stop at positions 8, 16, 24... which would not be pythonic,
but useful in e.g. our code base.

> This means it works with files that use a mix of
> tabs and spaces for indentation and can cope with whitespace errors
> where there is a space before a tab

Cool!

> (it's the job of
> --ws-error-highlight to deal with those errors, it should affect the
> move detection).

Not sure I understand this side note. So --ws-error-highlight can
highlight them, but the move detection should *not*(?) be affected
by the highlighted parts, or it should do things differently on
whether  --ws-error-highlight is given?

> It will also group the lines either
> side of a blank line if their indentation change matches so short
> lines followed by a blank line followed by more lines with the same
> indentation change will be correctly highlighted.

That sounds very useful (at least for my editor, that strips
blank lines to be empty lines), but I would think this feature is
worth its own commit/patch.

I wonder how much this feature is orthogonal to the existing
problem of detecting the moved indented blocks (existing
allow-indentation-change vs the new feature discussed first
above)

>
> This is a RFC as there are a number of questions about how to proceed
> from here:
>  1) Do we need a second option or should this implementation replace
>     --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>     has any advantages for some people. There seems to have been an
>     intention [1] to get it working with mixes of tabs and spaces but
>     nothing ever came of it.

Oh, yeah, I was working on that, but dropped the ball.

I am not sure what the best end goal is, or if there are many different
modes that are useful to different target audiences.
My own itch at the time was (de-/)in-dented code from refactoring
patches for git.git and JGit (so Java, C, shell); and I think not hurting
python would also be good.

ignoring the mixture of ws seems like it would also cater free text or
other more exotic languages.

What is your use case, what kind of content do you process that
this patch would help you?

I am not overly attached to the current implementation of
 --color-moved-ws=allow-indentation-change,
and I think Junio has expressed the fear of "too many options"
already in this problem space, so if possible I would extend/replace
the current option.

>  2) If we keep two options what should this option be called, the name
>     is long and ambiguous at the moment - mixed could refer to mixed
>     indentation length rather than a mix of tabs and spaces.

Let's first read the code to have an opinion, or re-state the question
from above ("What is this used for?") as I could imagine one of the
modes could be "ws-pythonic" and allow for whitespace indentation
that would have the whole block count as an indented by the same
amount, (e.g. if you wrap a couple functions in python by a class).

> +++ b/diff.c
> @@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
>                 else if (!strcmp(sb.buf, "ignore-all-space"))
>                         ret |= XDF_IGNORE_WHITESPACE;
>                 else if (!strcmp(sb.buf, "allow-indentation-change"))
> -                       ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
> +                       ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +                        (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);

So this RFC lets "allow-indentation-change" override
"allow-mixed-indentation-change" and vice versa. That
also solves the issue of configuring one and giving the other
as a command line option. Nice.

>         if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
>             (ret & XDF_WHITESPACE_FLAGS))
>                 die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
> +       else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
> +                (ret & XDF_WHITESPACE_FLAGS))
> +               die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));

Do we want to open a bit mask for all indentation change options? e.g.
#define COLOR_MOVED_WS_INDENTATION_MASK \
    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE | \
     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE)

> @@ -763,11 +770,65 @@ struct moved_entry {
>   * comparision is longer than the second.
>   */
>  struct ws_delta {
> -       char *string;
> +       union {
> +               int delta;
> +               char *string;
> +       };
>         unsigned int current_longer : 1;
> +       unsigned int have_string : 1;
>  };
>  #define WS_DELTA_INIT { NULL, 0 }
>
> +static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
> +                                 const struct emitted_diff_symbol *b,
> +                                 int *delta)
> +{
> +       unsigned int i = 0, j = 0;
> +       int la, lb;
> +       int ta = a->flags & WS_TAB_WIDTH_MASK;
> +       int tb = b->flags & WS_TAB_WIDTH_MASK;
> +       const char *sa = a->line;
> +       const char *sb = b->line;
> +
> +       if (xdiff_is_blankline(sa, a->len, 0) &&
> +           xdiff_is_blankline(sb, b->len, 0)) {
> +               *delta = INT_MIN;
> +               return 1;
> +       }
> +
> +       /* skip any \v \f \r at start of indentation */
> +       while (sa[i] == '\f' || sa[i] == '\v' ||
> +              (sa[i] == '\r' && i < a->len - 1))

I do not understand the use of parens here.
I would have expected all comparisons in one
block which is then &&'d to the length requirement.
But this seems to tread \r special if not at EOL.

> +               i++;
> +       while (sb[j] == '\f' || sb[j] == '\v' ||
> +              (sb[j] == '\r' && j < b->len - 1))
> +               j++;
> +
> +       for (la = 0; ; i++) {
> +               if (sa[i] == ' ')
> +                       la++;
> +               else if (sa[i] == '\t')
> +                       la = ((la + ta) / ta) * ta;

multiplication/division may be expensive,
would something like

  la = la - (la % ta) + ta;

work instead? (the modulo is a hidden division,
but at least we do not have another multiplication)

Further I'd find it slightly easier to understand as it
"fills up to the next multiple of <ta>" whereas the
divide and re-multiply trick relies on integer logic, but
that might be just me.  Maybe just add a comment.

> +               else
> +                       break;
> +       }
> +       for (lb = 0; ; j++) {
> +               if (sb[j] == ' ')
> +                       lb++;
> +               else if (sb[j] == '\t')
> +                       lb = ((lb + tb) / tb) * tb;
> +               else
> +                       break;
> +       }
> +       if (a->s == DIFF_SYMBOL_PLUS)
> +               *delta = la - lb;
> +       else
> +               *delta = lb - la;

When writing the original feature I had reasons
not to rely on the symbol, as you could have
moved things from + to - (or the other way round)
and added or removed indentation. That is what the
`current_longer` is used for. But given that you only
count here, we can have negative numbers, so it
would work either way for adding or removing indentation.

But then, why do we need to have a different sign
depending on the sign of the line?

> +
> +       return (a->len - i == b->len - j) &&
> +               !memcmp(sa + i, sb + j, a->len - i);
> +}
> +
>  static int compute_ws_delta(const struct emitted_diff_symbol *a,
>                              const struct emitted_diff_symbol *b,
>                              struct ws_delta *out)
> @@ -778,6 +839,7 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
>
>         out->string = xmemdupz(longer->line, d);
>         out->current_longer = (a == longer);
> +       out->have_string = 1;
>
>         return !strncmp(longer->line + d, shorter->line, shorter->len);
>  }
> @@ -820,15 +882,34 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
>          * To do so we need to compare 'l' to 'cur', adjusting the
>          * one of them for the white spaces, depending which was longer.
>          */

The comment above would only apply to the original mode?

> +       if (o->color_moved_ws_handling &
> +           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
> +               wslen = strlen(pmb->wsd->string);
> +               if (pmb->wsd->current_longer)
> +                       c += wslen;
> +               else
> +                       a += wslen;
>
> -       wslen = strlen(pmb->wsd->string);
> -       if (pmb->wsd->current_longer)
> -               c += wslen;
> -       else
> -               a += wslen;
> +               if (strcmp(a, c))
> +                       return 1;

This could be "return strcmp" instead of falling
through to the last line in the function in case of 0. But this
is just indenting code that is already there.

>
> -       if (strcmp(a, c))
> -               return 1;
> +               return 0;
> +       } else if (o->color_moved_ws_handling &
> +                  COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
> +               int delta;
> +
> +               if (!compute_mixed_ws_delta(cur->es, l, &delta))
> +                   return 1;
> +
> +               if (pmb->wsd->delta == INT_MIN) {
> +                       pmb->wsd->delta = delta;
> +                       return 0;
> +               }
> +
> +               return !(delta == pmb->wsd->delta || delta == INT_MIN);

Most of the code here deals with jumping over empty lines, and the new
mode is just comparing the two numbers.


> +       } else {
> +               BUG("no color_moved_ws_allow_indentation_change set");

Instead of the BUG here could we have a switch/case (or if/else)
covering the complete space of delta->have_string instead?
Then we would not leave a lingering bug in the code base.

> +       }
>
>         return 0;
>  }
> @@ -845,7 +926,8 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>                          & XDF_WHITESPACE_FLAGS;
>
>         if (diffopt->color_moved_ws_handling &
> -           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>                 /*
>                  * As there is not specific white space config given,
>                  * we'd need to check for a new block, so ignore all
> @@ -953,7 +1035,8 @@ static void pmb_advance_or_null_multi_match(struct diff_options *o,
>                         pmb[i] = pmb[i]->next_line;
>                 } else {
>                         if (pmb[i]->wsd) {
> -                               free(pmb[i]->wsd->string);
> +                               if (pmb[i]->wsd->have_string)
> +                                       free(pmb[i]->wsd->string);
>                                 FREE_AND_NULL(pmb[i]->wsd);
>                         }
>                         pmb[i] = NULL;
> @@ -1066,7 +1149,8 @@ static void mark_color_as_moved(struct diff_options *o,
>                         continue;
>
>                 if (o->color_moved_ws_handling &
> -                   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +                   (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +                    COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>                         pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
>                 else
>                         pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
> @@ -1088,6 +1172,17 @@ static void mark_color_as_moved(struct diff_options *o,
>                                                 pmb[pmb_nr++] = match;
>                                         } else
>                                                 free(wsd);
> +                               } else if (o->color_moved_ws_handling &
> +                                          COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
> +                                       int delta;
> +
> +                                       if (compute_mixed_ws_delta(l, match->es, &delta)) {
> +                                               struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
> +                                               wsd->delta = delta;
> +                                               wsd->have_string = 0;
> +                                               match->wsd = wsd;
> +                                               pmb[pmb_nr++] = match;

I would want to keep mark_color_as_moved and friends smaller, and instead
move the complexity to compute_ws_delta  which would check for the mode
in `o` instead of repeating the modes in all these function.
Just like cmp_in_block_with_wsd takes both modes into account


> +                                       }
>                                 } else {
>                                         pmb[pmb_nr++] = match;
>                                 }
> @@ -5740,7 +5835,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>                         struct hashmap add_lines, del_lines;
>
>                         if (o->color_moved_ws_handling &
> -                           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +                           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +                            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>                                 o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
>
>                         hashmap_init(&del_lines, moved_entry_cmp, o, 0);
> diff --git a/diff.h b/diff.h
> index 5e6bcf0926..03628cda45 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -217,6 +217,7 @@ struct diff_options {
>
>         /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
>         #define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
> +       #define COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE (1<<6)
>         int color_moved_ws_handling;
>  };
>
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 41facf7abf..737dbd4a42 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1902,4 +1902,93 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
>         test_i18ngrep allow-indentation-change err
>  '
>
> +NUL=''
> +test_expect_success 'compare mixed whitespace delta across moved blocks' '
> +
> +       git reset --hard &&
> +       tr Q_ "\t " <<-EOF >text.txt &&

So this is the extended version of q_to_tab, as it also
translates _ to blank.

> +       ${NUL}

is an empty line? So maybe s/NUL/EMPTY/ ?

I think the following test cases may be useful:

    (3x_) too short without
    $EMPTY
    (4x_)  being grouped across blank line

that will be indented to

    (3x_+n) too short without
    $EMPTY
    (4x_+n)  being grouped across blank line

i.e. the current test of grouping across an empty line
always has the same indentation before and after, but we
only care about the change in indentation, such that
we should be able to have different indentation levels
before and after an empty line in the code, and
still count it as a block when they are indented the
same amount.


Is it possible for a block to start with an empty line?
How do we handle multiple adjacent empty lines?

Do we need tests for such special cases?

-

I hope this helps, as I gave the feedback above
mostly unstructured.

I'm excited about the skip blank lines mode, but
I am not quite sure about the "just count" mode,
as that is what I had originally IIRC but Jonathan
seemed to not be fond of it. Maybe he remembers
why.

Thanks,
Stefan
Phillip Wood Oct. 9, 2018, 9:50 a.m. UTC | #2
Hi Stefan

Thanks for all your comments on this, they've been really helpful.

On 25/09/2018 02:07, Stefan Beller wrote:
> On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> This adds another mode for highlighting lines that have moved with an
>> indentation change. Unlike the existing
>> --color-moved-ws=allow-indentation-change setting this mode uses the
>> visible change in the indentation to group lines, rather than the
>> indentation string.
> 
> Wow! Thanks for putting this RFC out.
> My original vision was to be useful to python users as well,
> which counts 1 tab as 8 spaces IIUC.
> 
> The "visual" indentation you mention here sounds like
> a tab is counted as "up to the next position of (n-1) % 8",
> i.e. stop at positions 8, 16, 24... which would not be pythonic,
> but useful in e.g. our code base.

The docs for python2 state[1]

  Leading whitespace (spaces and tabs) at the beginning of a logical
  line is used to compute the indentation level of the line, which in
  turn is used to determine the grouping of statements.

  First, tabs are replaced (from left to right) by one to eight spaces
  such that the total number of characters up to and including the
  replacement is a multiple of eight (this is intended to be the same
  rule as used by Unix). The total number of spaces preceding the
  first non-blank character then determines the line’s
  indentation. Indentation cannot be split over multiple physical
  lines using backslashes; the whitespace up to the first backslash
  determines the indentation.

As I understand it that fits with the "visual" indentation implemented
by this patch.

For python3 adds a third paragraph[2]

  Indentation is rejected as inconsistent if a source file mixes tabs
  and spaces in a way that makes the meaning dependent on the worth of
  a tab in spaces; a TabError is raised in that case.

My impression is that people generally avoid mixing tabs and spaces in
python3 code, in which case I wonder if the "visual" indentation
combined with a suitable setting for core.whitespace to highlight
erroneous tabs/spaces would be enough. (I'm not a python programmer so I
could be completely wrong on that)

In any case the more I think about it the more convinced I am that
having a move detection mode for "pythonic" indentation is a mistake. If
a line is added with dodgy indentation then it is a problem whether or
not it has been moved so I think this should be handled by the
whitespace error highlighting. This would allow a single mode for move
detection with an indentation change.

[1] https://docs.python.org/2.7/reference/lexical_analysis.html#indentation
[2] https://docs.python.org/3.7/reference/lexical_analysis.html#indentation

>> This means it works with files that use a mix of
>> tabs and spaces for indentation and can cope with whitespace errors
>> where there is a space before a tab
> 
> Cool!
> 
>> (it's the job of
>> --ws-error-highlight to deal with those errors, it should affect the
>> move detection).
> 
> Not sure I understand this side note. So --ws-error-highlight can
> highlight them, but the move detection should *not*(?) be affected
> by the highlighted parts, or it should do things differently on
> whether  --ws-error-highlight is given?

I just meant that the move detection should pretend the whitespace
errors do not exist.

>> It will also group the lines either
>> side of a blank line if their indentation change matches so short
>> lines followed by a blank line followed by more lines with the same
>> indentation change will be correctly highlighted.
> 
> That sounds very useful (at least for my editor, that strips
> blank lines to be empty lines), but I would think this feature is
> worth its own commit/patch.
> 
> I wonder how much this feature is orthogonal to the existing
> problem of detecting the moved indented blocks (existing
> allow-indentation-change vs the new feature discussed first
> above)

It only works if the blank lines get moved with the non-blank lines
around it, then it matches the normal moved behavior I think. I'd like
to have it include blank context lines where the lines either side have
the same indentation change but that is trickier to implement.

>>
>> This is a RFC as there are a number of questions about how to proceed
>> from here:
>>  1) Do we need a second option or should this implementation replace
>>     --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>>     has any advantages for some people. There seems to have been an
>>     intention [1] to get it working with mixes of tabs and spaces but
>>     nothing ever came of it.
> 
> Oh, yeah, I was working on that, but dropped the ball.
> 
> I am not sure what the best end goal is, or if there are many different
> modes that are useful to different target audiences.
> My own itch at the time was (de-/)in-dented code from refactoring
> patches for git.git and JGit (so Java, C, shell); and I think not hurting
> python would also be good.

As I said above I've more or less come to the view that the correctness
of pythonic indentation is orthogonal to move detection as it affects
all additions, not just those that correspond to moved lines.

> ignoring the mixture of ws seems like it would also cater free text or
> other more exotic languages.
> 
> What is your use case, what kind of content do you process that
> this patch would help you?

I wrote this because I was re-factoring some shell code than was using a
indentation step of four spaces but with tabs in the leading indentation
which the current mode does not handle.

> I am not overly attached to the current implementation of
>  --color-moved-ws=allow-indentation-change,
> and I think Junio has expressed the fear of "too many options"
> already in this problem space, so if possible I would extend/replace
> the current option.
> 
>>  2) If we keep two options what should this option be called, the name
>>     is long and ambiguous at the moment - mixed could refer to mixed
>>     indentation length rather than a mix of tabs and spaces.
> 
> Let's first read the code to have an opinion, or re-state the question
> from above ("What is this used for?") as I could imagine one of the
> modes could be "ws-pythonic" and allow for whitespace indentation
> that would have the whole block count as an indented by the same
> amount, (e.g. if you wrap a couple functions in python by a class).
> 
>> +++ b/diff.c
>> @@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
>>                 else if (!strcmp(sb.buf, "ignore-all-space"))
>>                         ret |= XDF_IGNORE_WHITESPACE;
>>                 else if (!strcmp(sb.buf, "allow-indentation-change"))
>> -                       ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
>> +                       ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +                        (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
> 
> So this RFC lets "allow-indentation-change" override
> "allow-mixed-indentation-change" and vice versa. That
> also solves the issue of configuring one and giving the other
> as a command line option. Nice.
> 
>>         if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
>>             (ret & XDF_WHITESPACE_FLAGS))
>>                 die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
>> +       else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
>> +                (ret & XDF_WHITESPACE_FLAGS))
>> +               die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));
> 
> Do we want to open a bit mask for all indentation change options? e.g.
> #define COLOR_MOVED_WS_INDENTATION_MASK \
>     (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE | \
>      COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE)

That's a good idea if we retain two separate modes

> 
>> @@ -763,11 +770,65 @@ struct moved_entry {
>>   * comparision is longer than the second.
>>   */
>>  struct ws_delta {
>> -       char *string;
>> +       union {
>> +               int delta;
>> +               char *string;
>> +       };
>>         unsigned int current_longer : 1;
>> +       unsigned int have_string : 1;
>>  };
>>  #define WS_DELTA_INIT { NULL, 0 }
>>
>> +static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
>> +                                 const struct emitted_diff_symbol *b,
>> +                                 int *delta)
>> +{
>> +       unsigned int i = 0, j = 0;
>> +       int la, lb;
>> +       int ta = a->flags & WS_TAB_WIDTH_MASK;
>> +       int tb = b->flags & WS_TAB_WIDTH_MASK;
>> +       const char *sa = a->line;
>> +       const char *sb = b->line;
>> +
>> +       if (xdiff_is_blankline(sa, a->len, 0) &&
>> +           xdiff_is_blankline(sb, b->len, 0)) {
>> +               *delta = INT_MIN;
>> +               return 1;
>> +       }
>> +
>> +       /* skip any \v \f \r at start of indentation */
>> +       while (sa[i] == '\f' || sa[i] == '\v' ||
>> +              (sa[i] == '\r' && i < a->len - 1))
> 
> I do not understand the use of parens here.
> I would have expected all comparisons in one
> block which is then &&'d to the length requirement.
> But this seems to tread \r special if not at EOL.

I only want to skip '\r' if it isn't part of "\r\n" at the end of the
line. (similar to way --ignore-cr-at-eol does not ignore a trailing '\r'
on an incomplete line)

>> +               i++;
>> +       while (sb[j] == '\f' || sb[j] == '\v' ||
>> +              (sb[j] == '\r' && j < b->len - 1))
>> +               j++;
>> +
>> +       for (la = 0; ; i++) {
>> +               if (sa[i] == ' ')
>> +                       la++;
>> +               else if (sa[i] == '\t')
>> +                       la = ((la + ta) / ta) * ta;
> 
> multiplication/division may be expensive,
> would something like
> 
>   la = la - (la % ta) + ta;
> 
> work instead? (the modulo is a hidden division,
> but at least we do not have another multiplication)
> 
> Further I'd find it slightly easier to understand as it
> "fills up to the next multiple of <ta>" whereas the
> divide and re-multiply trick relies on integer logic, but
> that might be just me.  Maybe just add a comment.

I agree your version is clearer and it is marginally (~1%) faster

>> +               else
>> +                       break;
>> +       }
>> +       for (lb = 0; ; j++) {
>> +               if (sb[j] == ' ')
>> +                       lb++;
>> +               else if (sb[j] == '\t')
>> +                       lb = ((lb + tb) / tb) * tb;
>> +               else
>> +                       break;
>> +       }
>> +       if (a->s == DIFF_SYMBOL_PLUS)
>> +               *delta = la - lb;
>> +       else
>> +               *delta = lb - la;
> 
> When writing the original feature I had reasons
> not to rely on the symbol, as you could have
> moved things from + to - (or the other way round)
> and added or removed indentation. That is what the
> `current_longer` is used for. But given that you only
> count here, we can have negative numbers, so it
> would work either way for adding or removing indentation.
> 
> But then, why do we need to have a different sign
> depending on the sign of the line?

The check means that we get the same delta whichever way round the lines
are compared. I think I added this because without it the highlighting
gets broken if there is increase in indentation followed by an identical
decrease on the next line.

>> +
>> +       return (a->len - i == b->len - j) &&
>> +               !memcmp(sa + i, sb + j, a->len - i);
>> +}
>> +
>>  static int compute_ws_delta(const struct emitted_diff_symbol *a,
>>                              const struct emitted_diff_symbol *b,
>>                              struct ws_delta *out)
>> @@ -778,6 +839,7 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
>>
>>         out->string = xmemdupz(longer->line, d);
>>         out->current_longer = (a == longer);
>> +       out->have_string = 1;
>>
>>         return !strncmp(longer->line + d, shorter->line, shorter->len);
>>  }
>> @@ -820,15 +882,34 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
>>          * To do so we need to compare 'l' to 'cur', adjusting the
>>          * one of them for the white spaces, depending which was longer.
>>          */
> 
> The comment above would only apply to the original mode?

Yes that should be changed/moved

>> +       if (o->color_moved_ws_handling &
>> +           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
>> +               wslen = strlen(pmb->wsd->string);
>> +               if (pmb->wsd->current_longer)
>> +                       c += wslen;
>> +               else
>> +                       a += wslen;
>>
>> -       wslen = strlen(pmb->wsd->string);
>> -       if (pmb->wsd->current_longer)
>> -               c += wslen;
>> -       else
>> -               a += wslen;
>> +               if (strcmp(a, c))
>> +                       return 1;
> 
> This could be "return strcmp" instead of falling
> through to the last line in the function in case of 0. But this
> is just indenting code that is already there.

As you say it's keeping the code the same, also while it does not matter
to the caller at the moment I was wary of potentially changing the sign
of the return value.

>> -       if (strcmp(a, c))
>> -               return 1;
>> +               return 0;
>> +       } else if (o->color_moved_ws_handling &
>> +                  COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
>> +               int delta;
>> +
>> +               if (!compute_mixed_ws_delta(cur->es, l, &delta))
>> +                   return 1;
>> +
>> +               if (pmb->wsd->delta == INT_MIN) {
>> +                       pmb->wsd->delta = delta;
>> +                       return 0;
>> +               }
>> +
>> +               return !(delta == pmb->wsd->delta || delta == INT_MIN);
> 
> Most of the code here deals with jumping over empty lines, and the new
> mode is just comparing the two numbers.
> 
>> +       } else {
>> +               BUG("no color_moved_ws_allow_indentation_change set");
> 
> Instead of the BUG here could we have a switch/case (or if/else)
> covering the complete space of delta->have_string instead?
> Then we would not leave a lingering bug in the code base.

I'm not sure what you mean, we cover all the existing
color_moved_ws_handling values, I added the BUG() call to pick up future
omissions if another mode is added. (If we go for a single mode none of
this matters)

>> +       }
>>
>>         return 0;
>>  }
>> @@ -845,7 +926,8 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>>                          & XDF_WHITESPACE_FLAGS;
>>
>>         if (diffopt->color_moved_ws_handling &
>> -           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>>                 /*
>>                  * As there is not specific white space config given,
>>                  * we'd need to check for a new block, so ignore all
>> @@ -953,7 +1035,8 @@ static void pmb_advance_or_null_multi_match(struct diff_options *o,
>>                         pmb[i] = pmb[i]->next_line;
>>                 } else {
>>                         if (pmb[i]->wsd) {
>> -                               free(pmb[i]->wsd->string);
>> +                               if (pmb[i]->wsd->have_string)
>> +                                       free(pmb[i]->wsd->string);
>>                                 FREE_AND_NULL(pmb[i]->wsd);
>>                         }
>>                         pmb[i] = NULL;
>> @@ -1066,7 +1149,8 @@ static void mark_color_as_moved(struct diff_options *o,
>>                         continue;
>>
>>                 if (o->color_moved_ws_handling &
>> -                   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +                   (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +                    COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>>                         pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
>>                 else
>>                         pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
>> @@ -1088,6 +1172,17 @@ static void mark_color_as_moved(struct diff_options *o,
>>                                                 pmb[pmb_nr++] = match;
>>                                         } else
>>                                                 free(wsd);
>> +                               } else if (o->color_moved_ws_handling &
>> +                                          COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
>> +                                       int delta;
>> +
>> +                                       if (compute_mixed_ws_delta(l, match->es, &delta)) {
>> +                                               struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
>> +                                               wsd->delta = delta;
>> +                                               wsd->have_string = 0;
>> +                                               match->wsd = wsd;
>> +                                               pmb[pmb_nr++] = match;
> 
> I would want to keep mark_color_as_moved and friends smaller, and instead
> move the complexity to compute_ws_delta  which would check for the mode
> in `o` instead of repeating the modes in all these function.
> Just like cmp_in_block_with_wsd takes both modes into account

That makes sense, I'll fix it.

>> +                                       }
>>                                 } else {
>>                                         pmb[pmb_nr++] = match;
>>                                 }
>> @@ -5740,7 +5835,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>>                         struct hashmap add_lines, del_lines;
>>
>>                         if (o->color_moved_ws_handling &
>> -                           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +                           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +                            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>>                                 o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
>>
>>                         hashmap_init(&del_lines, moved_entry_cmp, o, 0);
>> diff --git a/diff.h b/diff.h
>> index 5e6bcf0926..03628cda45 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -217,6 +217,7 @@ struct diff_options {
>>
>>         /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
>>         #define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
>> +       #define COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE (1<<6)
>>         int color_moved_ws_handling;
>>  };
>>
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 41facf7abf..737dbd4a42 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -1902,4 +1902,93 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
>>         test_i18ngrep allow-indentation-change err
>>  '
>>
>> +NUL=''
>> +test_expect_success 'compare mixed whitespace delta across moved blocks' '
>> +
>> +       git reset --hard &&
>> +       tr Q_ "\t " <<-EOF >text.txt &&
> 
> So this is the extended version of q_to_tab, as it also
> translates _ to blank.

Exactly

>> +       ${NUL}
> 
> is an empty line? So maybe s/NUL/EMPTY/ ?

That might be clearer

> I think the following test cases may be useful:
> 
>     (3x_) too short without
>     $EMPTY
>     (4x_)  being grouped across blank line
> 
> that will be indented to
> 
>     (3x_+n) too short without
>     $EMPTY
>     (4x_+n)  being grouped across blank line
> 
> i.e. the current test of grouping across an empty line
> always has the same indentation before and after, but we
> only care about the change in indentation, such that
> we should be able to have different indentation levels
> before and after an empty line in the code, and
> still count it as a block when they are indented the
> same amount.

That's a good idea, thanks

> Is it possible for a block to start with an empty line?

Yes, the block in the test starts with an empty line

> How do we handle multiple adjacent empty lines?

We group them all with the moved lines. This is slightly different to
--ignore-blank-lines which has a threshold on how may blank lines it
will ignore.

> Do we need tests for such special cases?

I would probably be best, picking up changes to the behavior of unusual
corner cases is best done with a test.

> I hope this helps, as I gave the feedback above
> mostly unstructured.

It's been really useful, thank for taking the time to look through the
patch so carefully.

Best Wishes

Phillip

> I'm excited about the skip blank lines mode, but
> I am not quite sure about the "just count" mode,
> as that is what I had originally IIRC but Jonathan
> seemed to not be fond of it. Maybe he remembers
> why.
> 
> Thanks,
> Stefan
Stefan Beller Oct. 9, 2018, 9:10 p.m. UTC | #3
> As I said above I've more or less come to the view that the correctness
> of pythonic indentation is orthogonal to move detection as it affects
> all additions, not just those that correspond to moved lines.

Makes sense.

> > What is your use case, what kind of content do you process that
> > this patch would help you?
>
> I wrote this because I was re-factoring some shell code than was using a
> indentation step of four spaces but with tabs in the leading indentation
> which the current mode does not handle.

Ah that is good to know.

I was thinking whether we want to generalize the move detection into a more
generic "detect and fade out uninteresting things" and not just focus on white
spaces (but these are most often the uninteresting things).

Over the last year we had quite a couple of large refactorings, that
would have helped by that:
* For example the hash transition plan had a lot of patches that
  were basically s/char *sha1/struct object oid/ or some variation thereof.
* Introducing struct repository

I used the word diff to look at those patches, which helped a lot, but
maybe a mode that would allow me to mark this specific replacement
uninteresting would be even better.
Maybe this can be done as a piggyback on top of the move detection as
a "move in place, but with uninteresting pattern". The problem of this
is that the pattern needs to be accounted for when hashing the entries
into the hashmaps, which is easy when doing white spaces only.


> >> +       if (a->s == DIFF_SYMBOL_PLUS)
> >> +               *delta = la - lb;
> >> +       else
> >> +               *delta = lb - la;
> >
> > When writing the original feature I had reasons
> > not to rely on the symbol, as you could have
> > moved things from + to - (or the other way round)
> > and added or removed indentation. That is what the
> > `current_longer` is used for. But given that you only
> > count here, we can have negative numbers, so it
> > would work either way for adding or removing indentation.
> >
> > But then, why do we need to have a different sign
> > depending on the sign of the line?
>
> The check means that we get the same delta whichever way round the lines
> are compared. I think I added this because without it the highlighting
> gets broken if there is increase in indentation followed by an identical
> decrease on the next line.

But wouldn't we want to get that highlighted?
I do not quite understand the scenario, yet. Are both indented
and dedented part of the same block?


> >
> >> +       } else {
> >> +               BUG("no color_moved_ws_allow_indentation_change set");
> >
> > Instead of the BUG here could we have a switch/case (or if/else)
> > covering the complete space of delta->have_string instead?
> > Then we would not leave a lingering bug in the code base.
>
> I'm not sure what you mean, we cover all the existing
> color_moved_ws_handling values, I added the BUG() call to pick up future
> omissions if another mode is added. (If we go for a single mode none of
> this matters)

Ah, makes sense!
Phillip Wood Oct. 10, 2018, 3:26 p.m. UTC | #4
On 09/10/2018 22:10, Stefan Beller wrote:
>> As I said above I've more or less come to the view that the correctness
>> of pythonic indentation is orthogonal to move detection as it affects
>> all additions, not just those that correspond to moved lines.
> 
> Makes sense.

Right so are you happy for we to re-roll with a single 
allow-indentation-change mode based on my RFC?

> 
>>> What is your use case, what kind of content do you process that
>>> this patch would help you?
>>
>> I wrote this because I was re-factoring some shell code than was using a
>> indentation step of four spaces but with tabs in the leading indentation
>> which the current mode does not handle.
> 
> Ah that is good to know.
> 
> I was thinking whether we want to generalize the move detection into a more
> generic "detect and fade out uninteresting things" and not just focus on white
> spaces (but these are most often the uninteresting things).
> 
> Over the last year we had quite a couple of large refactorings, that
> would have helped by that:
> * For example the hash transition plan had a lot of patches that
>    were basically s/char *sha1/struct object oid/ or some variation thereof.
> * Introducing struct repository
> 
> I used the word diff to look at those patches, which helped a lot, but
> maybe a mode that would allow me to mark this specific replacement
> uninteresting would be even better.
> Maybe this can be done as a piggyback on top of the move detection as
> a "move in place, but with uninteresting pattern". The problem of this
> is that the pattern needs to be accounted for when hashing the entries
> into the hashmaps, which is easy when doing white spaces only.

Yes the I like the idea. Yesterday I was looking at Alban's patches to 
refactor the todo list handling for rebase -i and there are a lot of '.' 
to '->' changes which weren't particularly interesting though at least 
diff-highlight made it clear if that was the only change on a line. 
Incidentally --color-moved was very useful for looking at that series.

>>>> +       if (a->s == DIFF_SYMBOL_PLUS)
>>>> +               *delta = la - lb;
>>>> +       else
>>>> +               *delta = lb - la;
>>>
>>> When writing the original feature I had reasons
>>> not to rely on the symbol, as you could have
>>> moved things from + to - (or the other way round)
>>> and added or removed indentation. That is what the
>>> `current_longer` is used for. But given that you only
>>> count here, we can have negative numbers, so it
>>> would work either way for adding or removing indentation.
>>>
>>> But then, why do we need to have a different sign
>>> depending on the sign of the line?
>>
>> The check means that we get the same delta whichever way round the lines
>> are compared. I think I added this because without it the highlighting
>> gets broken if there is increase in indentation followed by an identical
>> decrease on the next line.
> 
> But wouldn't we want to get that highlighted?
> I do not quite understand the scenario, yet. Are both indented
> and dedented part of the same block?

With --color-moved=zebra the indented lines and the de-indented lines 
should be different colors, without the test they both ended up in the 
same block.

Best Wishes

Phillip
>>>
>>>> +       } else {
>>>> +               BUG("no color_moved_ws_allow_indentation_change set");
>>>
>>> Instead of the BUG here could we have a switch/case (or if/else)
>>> covering the complete space of delta->have_string instead?
>>> Then we would not leave a lingering bug in the code base.
>>
>> I'm not sure what you mean, we cover all the existing
>> color_moved_ws_handling values, I added the BUG() call to pick up future
>> omissions if another mode is added. (If we go for a single mode none of
>> this matters)
> 
> Ah, makes sense!
>
Stefan Beller Oct. 10, 2018, 6:05 p.m. UTC | #5
On Wed, Oct 10, 2018 at 8:26 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> On 09/10/2018 22:10, Stefan Beller wrote:
> >> As I said above I've more or less come to the view that the correctness
> >> of pythonic indentation is orthogonal to move detection as it affects
> >> all additions, not just those that correspond to moved lines.
> >
> > Makes sense.
>
> Right so are you happy for we to re-roll with a single
> allow-indentation-change mode based on my RFC?

I am happy with that.
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index 0a652e28d4..45f33daa60 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,11 @@  static int parse_color_moved_ws(const char *arg)
 		else if (!strcmp(sb.buf, "ignore-all-space"))
 			ret |= XDF_IGNORE_WHITESPACE;
 		else if (!strcmp(sb.buf, "allow-indentation-change"))
-			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
+			ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+			 (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
+		else if (!strcmp(sb.buf, "allow-mixed-indentation-change"))
+			ret = COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE |
+			 (ret & ~COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE);
 		else
 			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
 
@@ -314,6 +318,9 @@  static int parse_color_moved_ws(const char *arg)
 	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
 	    (ret & XDF_WHITESPACE_FLAGS))
 		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+	else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
+		 (ret & XDF_WHITESPACE_FLAGS))
+		die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));
 
 	string_list_clear(&l, 0);
 
@@ -763,11 +770,65 @@  struct moved_entry {
  * comparision is longer than the second.
  */
 struct ws_delta {
-	char *string;
+	union {
+		int delta;
+		char *string;
+	};
 	unsigned int current_longer : 1;
+	unsigned int have_string : 1;
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
+				  const struct emitted_diff_symbol *b,
+				  int *delta)
+{
+	unsigned int i = 0, j = 0;
+	int la, lb;
+	int ta = a->flags & WS_TAB_WIDTH_MASK;
+	int tb = b->flags & WS_TAB_WIDTH_MASK;
+	const char *sa = a->line;
+	const char *sb = b->line;
+
+	if (xdiff_is_blankline(sa, a->len, 0) &&
+	    xdiff_is_blankline(sb, b->len, 0)) {
+		*delta = INT_MIN;
+		return 1;
+	}
+
+	/* skip any \v \f \r at start of indentation */
+	while (sa[i] == '\f' || sa[i] == '\v' ||
+	       (sa[i] == '\r' && i < a->len - 1))
+		i++;
+	while (sb[j] == '\f' || sb[j] == '\v' ||
+	       (sb[j] == '\r' && j < b->len - 1))
+		j++;
+
+	for (la = 0; ; i++) {
+		if (sa[i] == ' ')
+			la++;
+		else if (sa[i] == '\t')
+			la = ((la + ta) / ta) * ta;
+		else
+			break;
+	}
+	for (lb = 0; ; j++) {
+		if (sb[j] == ' ')
+			lb++;
+		else if (sb[j] == '\t')
+			lb = ((lb + tb) / tb) * tb;
+		else
+			break;
+	}
+	if (a->s == DIFF_SYMBOL_PLUS)
+		*delta = la - lb;
+	else
+		*delta = lb - la;
+
+	return (a->len - i == b->len - j) &&
+		!memcmp(sa + i, sb + j, a->len - i);
+}
+
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
 			     const struct emitted_diff_symbol *b,
 			     struct ws_delta *out)
@@ -778,6 +839,7 @@  static int compute_ws_delta(const struct emitted_diff_symbol *a,
 
 	out->string = xmemdupz(longer->line, d);
 	out->current_longer = (a == longer);
+	out->have_string = 1;
 
 	return !strncmp(longer->line + d, shorter->line, shorter->len);
 }
@@ -820,15 +882,34 @@  static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * To do so we need to compare 'l' to 'cur', adjusting the
 	 * one of them for the white spaces, depending which was longer.
 	 */
+	if (o->color_moved_ws_handling &
+	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+		wslen = strlen(pmb->wsd->string);
+		if (pmb->wsd->current_longer)
+			c += wslen;
+		else
+			a += wslen;
 
-	wslen = strlen(pmb->wsd->string);
-	if (pmb->wsd->current_longer)
-		c += wslen;
-	else
-		a += wslen;
+		if (strcmp(a, c))
+			return 1;
 
-	if (strcmp(a, c))
-		return 1;
+		return 0;
+	} else if (o->color_moved_ws_handling &
+		   COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
+		int delta;
+
+		if (!compute_mixed_ws_delta(cur->es, l, &delta))
+		    return 1;
+
+		if (pmb->wsd->delta == INT_MIN) {
+			pmb->wsd->delta = delta;
+			return 0;
+		}
+
+		return !(delta == pmb->wsd->delta || delta == INT_MIN);
+	} else {
+		BUG("no color_moved_ws_allow_indentation_change set");
+	}
 
 	return 0;
 }
@@ -845,7 +926,8 @@  static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			 & XDF_WHITESPACE_FLAGS;
 
 	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+	    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+	     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
 		/*
 		 * As there is not specific white space config given,
 		 * we'd need to check for a new block, so ignore all
@@ -953,7 +1035,8 @@  static void pmb_advance_or_null_multi_match(struct diff_options *o,
 			pmb[i] = pmb[i]->next_line;
 		} else {
 			if (pmb[i]->wsd) {
-				free(pmb[i]->wsd->string);
+				if (pmb[i]->wsd->have_string)
+					free(pmb[i]->wsd->string);
 				FREE_AND_NULL(pmb[i]->wsd);
 			}
 			pmb[i] = NULL;
@@ -1066,7 +1149,8 @@  static void mark_color_as_moved(struct diff_options *o,
 			continue;
 
 		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+		    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+		     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
 			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
@@ -1088,6 +1172,17 @@  static void mark_color_as_moved(struct diff_options *o,
 						pmb[pmb_nr++] = match;
 					} else
 						free(wsd);
+				} else if (o->color_moved_ws_handling &
+					   COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
+					int delta;
+
+					if (compute_mixed_ws_delta(l, match->es, &delta)) {
+						struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
+						wsd->delta = delta;
+						wsd->have_string = 0;
+						match->wsd = wsd;
+						pmb[pmb_nr++] = match;
+					}
 				} else {
 					pmb[pmb_nr++] = match;
 				}
@@ -5740,7 +5835,8 @@  static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			struct hashmap add_lines, del_lines;
 
 			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+			     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
 				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
 
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
diff --git a/diff.h b/diff.h
index 5e6bcf0926..03628cda45 100644
--- a/diff.h
+++ b/diff.h
@@ -217,6 +217,7 @@  struct diff_options {
 
 	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
 	#define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
+	#define COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE (1<<6)
 	int color_moved_ws_handling;
 };
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 41facf7abf..737dbd4a42 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1902,4 +1902,93 @@  test_expect_success 'compare whitespace delta incompatible with other space opti
 	test_i18ngrep allow-indentation-change err
 '
 
+NUL=''
+test_expect_success 'compare mixed whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	tr Q_ "\t " <<-EOF >text.txt &&
+	${NUL}
+	____too short without
+	${NUL}
+	____being grouped across blank line
+	${NUL}
+	context
+	lines
+	to
+	anchor
+	____Indented text to
+	_Q____be further indented by four spaces across
+	____Qseveral lines
+	QQ____These two lines have had their
+	____indentation reduced by four spaces
+	Qdifferent indentation change
+	____too short
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	tr Q_ "\t " <<-EOF >text.txt &&
+	context
+	lines
+	to
+	anchor
+	QIndented text to
+	QQbe further indented by four spaces across
+	Q____several lines
+	${NUL}
+	QQtoo short without
+	${NUL}
+	QQbeing grouped across blank line
+	${NUL}
+	Q_QThese two lines have had their
+	indentation reduced by four spaces
+	QQdifferent indentation change
+	__Qtoo short
+	EOF
+
+	git -c color.diff.whitespace="normal red" \
+		-c core.whitespace=space-before-tab \
+		diff --color --color-moved --ws-error-highlight=all \
+		--color-moved-ws=allow-mixed-indentation-change >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    being grouped across blank line<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	 <RESET>context<RESET>
+	 <RESET>lines<RESET>
+	 <RESET>to<RESET>
+	 <RESET>anchor<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    Indented text to<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED> <RESET>	<BOLD;MAGENTA>    be further indented by four spaces across<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED>    <RESET>	<BOLD;MAGENTA>several lines<RESET>
+	<BOLD;BLUE>-<RESET>		<BOLD;BLUE>    These two lines have had their<RESET>
+	<BOLD;BLUE>-<RESET><BOLD;BLUE>    indentation reduced by four spaces<RESET>
+	<BOLD;MAGENTA>-<RESET>	<BOLD;MAGENTA>different indentation change<RESET>
+	<RED>-<RESET><RED>    too short<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>Indented text to<RESET>
+	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>be further indented by four spaces across<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>    several lines<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>different indentation change<RESET>
+	<GREEN>+<RESET><BRED>  <RESET>	<GREEN>too short<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done