diff mbox series

[v3,2/2] diff: teach diff to read algorithm from diff driver

Message ID b330222ce83bdf03c20085ff10fcff8a090474d5.1676665285.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Teach diff to honor diff algorithms set through git attributes | expand

Commit Message

John Cai Feb. 17, 2023, 8:21 p.m. UTC
From: John Cai <johncai86@gmail.com>

It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.

The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.

Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes & config combination, then
finally the diff.algorithm config.

To enforce precedence order, use a new `ignore_driver_algorithm` member
during options pasing to indicate the diff algorithm was set via command
line args.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/gitattributes.txt | 37 ++++++++++++++++++++++++++++++++
 diff.c                          | 33 ++++++++++++++++++++--------
 diff.h                          |  1 +
 t/lib-diff-alternative.sh       | 38 ++++++++++++++++++++++++++++++++-
 userdiff.c                      |  4 +++-
 userdiff.h                      |  1 +
 6 files changed, 103 insertions(+), 11 deletions(-)

Comments

Junio C Hamano Feb. 17, 2023, 9:50 p.m. UTC | #1
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

Looking good.  Some comments below.  Many of them minor.

> +Setting the internal diff algorithm
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The diff algorithm can be set through the `diff.algorithm` config key, but
> +sometimes it may be helpful to set the diff algorithm by path. For example, one

I would have expected "per path" instead of "by path".

> +might wish to set a diff algorithm automatically for all `.json` files such that
> +the user would not need to pass in a separate command line `--diff-algorithm`
> +flag each time.

While this is not incorrect per-se, I think the first paragraph of
the proposed commit log message was a lot more convincing.  Your
changes may not be limited to a single kind of files, and a command
line option is simply not enough.  You may want one algorithm for
".json" while using another for ".c", which was really an excellent
example you gave.

> +This diff algorithm applies to user facing diff output like git-diff(1),
> +git-show(1) and is used for the `--stat` output as well. The merge machinery
> +will not use the diff algorithm set through this method.

Is "format-patch" considered "user-facing"?

> +NOTE: If the `command` key also exists, then Git will treat this as an external
> +diff and attempt to use the value set for `command` as an external program. For
> +instance, the following config, combined with the above `.gitattributes` file,
> +will result in `command` favored over `algorithm`.
> +
> +----------------------------------------------------------------
> +[diff "<name>"]
> +  command = j-c-diff
> +  algorithm = histogram
> +----------------------------------------------------------------

Isn't this a bit too verbose, given that the reader has just seen
the external diff driver section.  I wonder something like this is
sufficient, without any sample configuration?

    NOTE: If `diff.<name>.command` is defined for path with the
    `diff=<name>` attribute, it is executed as an external diff driver
    (see above), and adding `diff.<name>.algorithm` has no effect (the
    algorithm is not passed to the external diff driver).


> diff --git a/diff.c b/diff.c
> index 5efc22ca06b..04469da6d34 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4456,15 +4456,13 @@ static void run_diff_cmd(const char *pgm,
>  	const char *xfrm_msg = NULL;
>  	int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
>  	int must_show_header = 0;
> +	struct userdiff_driver *drv = NULL;
>  
> -
> -	if (o->flags.allow_external) {
> -		struct userdiff_driver *drv;
> -
> +	if (o->flags.allow_external || !o->ignore_driver_algorithm)
>  		drv = userdiff_find_by_path(o->repo->index, attr_path);
> -		if (drv && drv->external)
> -			pgm = drv->external;
> -	}
> +
> +	if (o->flags.allow_external && drv && drv->external)
> +		pgm = drv->external;

OK.  There is no explicit "pgm = NULL" initialization in this
function, but that is done by the caller passing NULL to the
function as a parameter, so it all makes sense.

> @@ -4481,12 +4479,16 @@ static void run_diff_cmd(const char *pgm,
>  		run_external_diff(pgm, name, other, one, two, xfrm_msg, o);
>  		return;
>  	}
> -	if (one && two)
> +	if (one && two) {
> +		if (drv && !o->ignore_driver_algorithm && drv->algorithm)
> +			set_diff_algorithm(o, drv->algorithm);

For symmetry with the above choice of pgm we just saw, the order of
the condition might be easier to follow if written like so:

	if (!o->ignore_driver_algorithm && drv && drv->algorithm)

It would not make any measurable difference performance-wise either way.

> @@ -4583,6 +4585,14 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
>  	const char *name;
>  	const char *other;
>  
> +	if (!o->ignore_driver_algorithm) {
> +		struct userdiff_driver *drv = userdiff_find_by_path(o->repo->index, p->one->path);

That's an overlong line.

> +
> +		if (drv && drv->algorithm) {
> +			set_diff_algorithm(o, drv->algorithm);
> +		}

No need to have {} around a single statement block.

> +	}
> +
>  	if (DIFF_PAIR_UNMERGED(p)) {
>  		/* unmerged */
>  		builtin_diffstat(p->one->path, NULL, NULL, NULL,
> @@ -5130,6 +5140,8 @@ static int diff_opt_diff_algorithm(const struct option *opt,
>  		return error(_("option diff-algorithm accepts \"myers\", "
>  			       "\"minimal\", \"patience\" and \"histogram\""));
>  
> +	options->ignore_driver_algorithm = 1;
> +
>  	return 0;
>  }
>  
> @@ -5145,6 +5157,8 @@ static int diff_opt_diff_algorithm_no_arg(const struct option *opt,
>  		BUG("available diff algorithms include \"myers\", "
>  			       "\"minimal\", \"patience\" and \"histogram\"");
>  
> +	options->ignore_driver_algorithm = 1;
> +
>  	return 0;
>  }
> @@ -5285,6 +5299,7 @@ static int diff_opt_patience(const struct option *opt,
>  	for (i = 0; i < options->anchors_nr; i++)
>  		free(options->anchors[i]);
>  	options->anchors_nr = 0;
> +	options->ignore_driver_algorithm = 1;
>  
>  	return set_diff_algorithm(options, "patience");
>  }


I was hoping that set_diff_algorithm() can be the shared common one
that signals we were told to use a specific algorithm, but it also
is called from internal codepaths so it cannot be it.

It is probably not worth introducing an extra helper that only calls
set_diff_algorithm() and sets ignore_driver_algorithm bit only for
that to reduce three-times repetition.

OK.

> diff --git a/userdiff.c b/userdiff.c
> index d71b82feb74..ff25cfc4b4c 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -293,7 +293,7 @@ PATTERNS("scheme",
>  	 "|([^][)(}{[ \t])+"),
>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
>  	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
> -{ "default", NULL, -1, { NULL, 0 } },
> +{ "default", NULL, NULL, -1, { NULL, 0 } },
>  };

I was surprised that there is so little damage to the built-in
userdiff driver definitions, but this is thanks to the PATTERNS()
and IPATTERN() macro that use designated initializers.  Very nice.

Nicely done.
Elijah Newren Feb. 18, 2023, 2:56 a.m. UTC | #2
On Fri, Feb 17, 2023 at 12:21 PM John Cai via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: John Cai <johncai86@gmail.com>
>
> It can be useful to specify diff algorithms per file type. For example,
> one may want to use the minimal diff algorithm for .json files, another
> for .c files, etc.
>
> The diff machinery already checks attributes for a diff driver. Teach
> the diff driver parser a new type "algorithm" to look for in the
> config, which will be used if a driver has been specified through the
> attributes.
>
> Enforce precedence of the diff algorithm by favoring the command line
> option, then looking at the driver attributes & config combination, then
> finally the diff.algorithm config.
>
> To enforce precedence order, use a new `ignore_driver_algorithm` member
> during options pasing to indicate the diff algorithm was set via command
> line args.

s/pasing/parsing/

> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  Documentation/gitattributes.txt | 37 ++++++++++++++++++++++++++++++++
>  diff.c                          | 33 ++++++++++++++++++++--------
>  diff.h                          |  1 +
>  t/lib-diff-alternative.sh       | 38 ++++++++++++++++++++++++++++++++-
>  userdiff.c                      |  4 +++-
>  userdiff.h                      |  1 +
>  6 files changed, 103 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index c19e64ea0ef..f212079a131 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -758,6 +758,43 @@ with the above configuration, i.e. `j-c-diff`, with 7
>  parameters, just like `GIT_EXTERNAL_DIFF` program is called.
>  See linkgit:git[1] for details.
>
> +Setting the internal diff algorithm
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The diff algorithm can be set through the `diff.algorithm` config key, but
> +sometimes it may be helpful to set the diff algorithm by path. For example, one
> +might wish to set a diff algorithm automatically for all `.json` files such that
> +the user would not need to pass in a separate command line `--diff-algorithm`
> +flag each time.
> +
> +First, in `.gitattributes`, assign the `diff` attribute for paths.
> +
> +------------------------
> +*.json diff=<name>
> +------------------------
> +
> +Then, define a "diff.<name>.algorithm" configuration to specify the diff

Should there be a link to `git-config` right after "configuration"?
Otherwise, users may think that they are being told to specify
additional configuration within the .gitattributes file.

> +algorithm, choosing from `meyers`, `patience`, `minimal`, or `histogram`.

s/meyers/myers/

> +
> +----------------------------------------------------------------
> +[diff "<name>"]
> +  algorithm = histogram
> +----------------------------------------------------------------

It's pretty easy to assume the above is meant to be part of the
.gitattributes file instead of the .git/config file.  Don't most users
run `git config` commands directly rather than edit the .git/config
file?  Should we provide a sample command rather than showing what the
config file will contain?

> +
> +This diff algorithm applies to user facing diff output like git-diff(1),
> +git-show(1) and is used for the `--stat` output as well. The merge machinery
> +will not use the diff algorithm set through this method.

Yaay, thanks for including this!


I'm still curious if this should this also include warnings/caveats, such as:
  * The diff attribute specified in .gitattributes will be ignored in
a bare clone
  * The diff attribute specified in .gitattributes will be ignored if
it is only specified in another branch (e.g. on a branch "special-file
diff=patience" recorded in .gitattributes, then checkout master but
run `git log -1 -p $branch`)
  * When a file is renamed, the diff attribute for the pre-image name
is the only one the system pays attention to (thus adding "-R" can
flip which diff algorithm is run for the renamed file).

Also, since I tested the three items above to verify they are valid
warnings, I'm a bit confused.  I thought your intent was to use this
server-side[1], so isn't the bare clone aspect a deal-breaker for your
intended usecase?

[1] https://lore.kernel.org/git/7852AC7B-7A4E-4DD0-ADEA-CFFD5D16C595@gmail.com/

> +
> +NOTE: If the `command` key also exists, then Git will treat this as an external
> +diff and attempt to use the value set for `command` as an external program. For
> +instance, the following config, combined with the above `.gitattributes` file,
> +will result in `command` favored over `algorithm`.
> +
> +----------------------------------------------------------------
> +[diff "<name>"]
> +  command = j-c-diff
> +  algorithm = histogram
> +----------------------------------------------------------------
>
>  Defining a custom hunk-header
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> diff --git a/diff.c b/diff.c
> index 5efc22ca06b..04469da6d34 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4456,15 +4456,13 @@ static void run_diff_cmd(const char *pgm,
>         const char *xfrm_msg = NULL;
>         int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
>         int must_show_header = 0;
> +       struct userdiff_driver *drv = NULL;
>
> -
> -       if (o->flags.allow_external) {
> -               struct userdiff_driver *drv;
> -
> +       if (o->flags.allow_external || !o->ignore_driver_algorithm)
>                 drv = userdiff_find_by_path(o->repo->index, attr_path);
> -               if (drv && drv->external)
> -                       pgm = drv->external;
> -       }
> +
> +       if (o->flags.allow_external && drv && drv->external)
> +               pgm = drv->external;
>
>         if (msg) {
>                 /*
> @@ -4481,12 +4479,16 @@ static void run_diff_cmd(const char *pgm,
>                 run_external_diff(pgm, name, other, one, two, xfrm_msg, o);
>                 return;
>         }
> -       if (one && two)
> +       if (one && two) {
> +               if (drv && !o->ignore_driver_algorithm && drv->algorithm)
> +                       set_diff_algorithm(o, drv->algorithm);
> +
>                 builtin_diff(name, other ? other : name,
>                              one, two, xfrm_msg, must_show_header,
>                              o, complete_rewrite);
> -       else
> +       } else {
>                 fprintf(o->file, "* Unmerged path %s\n", name);
> +       }
>  }
>
>  static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate)
> @@ -4583,6 +4585,14 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
>         const char *name;
>         const char *other;
>
> +       if (!o->ignore_driver_algorithm) {
> +               struct userdiff_driver *drv = userdiff_find_by_path(o->repo->index, p->one->path);
> +
> +               if (drv && drv->algorithm) {
> +                       set_diff_algorithm(o, drv->algorithm);
> +               }
> +       }
> +
>         if (DIFF_PAIR_UNMERGED(p)) {
>                 /* unmerged */
>                 builtin_diffstat(p->one->path, NULL, NULL, NULL,
> @@ -5130,6 +5140,8 @@ static int diff_opt_diff_algorithm(const struct option *opt,
>                 return error(_("option diff-algorithm accepts \"myers\", "
>                                "\"minimal\", \"patience\" and \"histogram\""));
>
> +       options->ignore_driver_algorithm = 1;
> +
>         return 0;
>  }
>
> @@ -5145,6 +5157,8 @@ static int diff_opt_diff_algorithm_no_arg(const struct option *opt,
>                 BUG("available diff algorithms include \"myers\", "
>                                "\"minimal\", \"patience\" and \"histogram\"");
>
> +       options->ignore_driver_algorithm = 1;
> +
>         return 0;
>  }
>
> @@ -5285,6 +5299,7 @@ static int diff_opt_patience(const struct option *opt,
>         for (i = 0; i < options->anchors_nr; i++)
>                 free(options->anchors[i]);
>         options->anchors_nr = 0;
> +       options->ignore_driver_algorithm = 1;
>
>         return set_diff_algorithm(options, "patience");
>  }
> diff --git a/diff.h b/diff.h
> index 41eb2c3d428..8d770b1d579 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -333,6 +333,7 @@ struct diff_options {
>         int prefix_length;
>         const char *stat_sep;
>         int xdl_opts;
> +       int ignore_driver_algorithm;
>
>         /* see Documentation/diff-options.txt */
>         char **anchors;
> diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh
> index 8d1e408bb58..2dc02bca873 100644
> --- a/t/lib-diff-alternative.sh
> +++ b/t/lib-diff-alternative.sh
> @@ -105,10 +105,46 @@ index $file1..$file2 100644
>   }
>  EOF
>
> +       cat >expect_diffstat <<EOF
> + file1 => file2 | 21 ++++++++++-----------
> + 1 file changed, 10 insertions(+), 11 deletions(-)
> +EOF
> +
>         STRATEGY=$1
>
> +       test_expect_success "$STRATEGY diff from attributes" '
> +               echo "file* diff=driver" >.gitattributes &&
> +               git config diff.driver.algorithm "$STRATEGY" &&
> +               test_must_fail git diff --no-index file1 file2 > output &&
> +               cat expect &&
> +               cat output &&
> +               test_cmp expect output
> +       '
> +
> +       test_expect_success "$STRATEGY diff from attributes has valid diffstat" '
> +               echo "file* diff=driver" >.gitattributes &&
> +               git config diff.driver.algorithm "$STRATEGY" &&
> +               test_must_fail git diff --stat --no-index file1 file2 > output &&
> +               test_cmp expect_diffstat output
> +       '
> +
>         test_expect_success "$STRATEGY diff" '
> -               test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output &&
> +               test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
> +               test_cmp expect output
> +       '
> +
> +       test_expect_success "$STRATEGY diff command line precedence before attributes" '
> +               echo "file* diff=driver" >.gitattributes &&
> +               git config diff.driver.algorithm meyers &&

Is this misspelling of myers intentional?  I think with the typo, the
code falls back to the default algorithm, which happens to be myers,
so I think the test works either way, but were you intending to test
fallback behavior in case of a typo here, or was that accidental?



> +               test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
> +               test_cmp expect output
> +       '
> +
> +       test_expect_success "$STRATEGY diff attributes precedence before config" '
> +               git config diff.algorithm default &&
> +               echo "file* diff=driver" >.gitattributes &&
> +               git config diff.driver.algorithm "$STRATEGY" &&
> +               test_must_fail git diff --no-index file1 file2 > output &&
>                 test_cmp expect output
>         '
>
> diff --git a/userdiff.c b/userdiff.c
> index d71b82feb74..ff25cfc4b4c 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -293,7 +293,7 @@ PATTERNS("scheme",
>          "|([^][)(}{[ \t])+"),
>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
>          "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
> -{ "default", NULL, -1, { NULL, 0 } },
> +{ "default", NULL, NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS
>  #undef IPATTERN
> @@ -394,6 +394,8 @@ int userdiff_config(const char *k, const char *v)
>                 return parse_bool(&drv->textconv_want_cache, k, v);
>         if (!strcmp(type, "wordregex"))
>                 return git_config_string(&drv->word_regex, k, v);
> +       if (!strcmp(type, "algorithm"))
> +               return git_config_string(&drv->algorithm, k, v);
>
>         return 0;
>  }
> diff --git a/userdiff.h b/userdiff.h
> index aee91bc77e6..24419db6973 100644
> --- a/userdiff.h
> +++ b/userdiff.h
> @@ -14,6 +14,7 @@ struct userdiff_funcname {
>  struct userdiff_driver {
>         const char *name;
>         const char *external;
> +       const char *algorithm;
>         int binary;
>         struct userdiff_funcname funcname;
>         const char *word_regex;
> --
> gitgitgadget
John Cai Feb. 20, 2023, 3:32 p.m. UTC | #3
Hi Elijah,

On 17 Feb 2023, at 21:56, Elijah Newren wrote:

> On Fri, Feb 17, 2023 at 12:21 PM John Cai via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: John Cai <johncai86@gmail.com>
>>
>> It can be useful to specify diff algorithms per file type. For example,
>> one may want to use the minimal diff algorithm for .json files, another
>> for .c files, etc.
>>
>> The diff machinery already checks attributes for a diff driver. Teach
>> the diff driver parser a new type "algorithm" to look for in the
>> config, which will be used if a driver has been specified through the
>> attributes.
>>
>> Enforce precedence of the diff algorithm by favoring the command line
>> option, then looking at the driver attributes & config combination, then
>> finally the diff.algorithm config.
>>
>> To enforce precedence order, use a new `ignore_driver_algorithm` member
>> during options pasing to indicate the diff algorithm was set via command
>> line args.
>
> s/pasing/parsing/

thanks for noticing this!

>
>> Signed-off-by: John Cai <johncai86@gmail.com>
>> ---
>>  Documentation/gitattributes.txt | 37 ++++++++++++++++++++++++++++++++
>>  diff.c                          | 33 ++++++++++++++++++++--------
>>  diff.h                          |  1 +
>>  t/lib-diff-alternative.sh       | 38 ++++++++++++++++++++++++++++++++-
>>  userdiff.c                      |  4 +++-
>>  userdiff.h                      |  1 +
>>  6 files changed, 103 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>> index c19e64ea0ef..f212079a131 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -758,6 +758,43 @@ with the above configuration, i.e. `j-c-diff`, with 7
>>  parameters, just like `GIT_EXTERNAL_DIFF` program is called.
>>  See linkgit:git[1] for details.
>>
>> +Setting the internal diff algorithm
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The diff algorithm can be set through the `diff.algorithm` config key, but
>> +sometimes it may be helpful to set the diff algorithm by path. For example, one
>> +might wish to set a diff algorithm automatically for all `.json` files such that
>> +the user would not need to pass in a separate command line `--diff-algorithm`
>> +flag each time.
>> +
>> +First, in `.gitattributes`, assign the `diff` attribute for paths.
>> +
>> +------------------------
>> +*.json diff=<name>
>> +------------------------
>> +
>> +Then, define a "diff.<name>.algorithm" configuration to specify the diff
>
> Should there be a link to `git-config` right after "configuration"?
> Otherwise, users may think that they are being told to specify
> additional configuration within the .gitattributes file.
>
>> +algorithm, choosing from `meyers`, `patience`, `minimal`, or `histogram`.
>
> s/meyers/myers/

likewise.

>
>> +
>> +----------------------------------------------------------------
>> +[diff "<name>"]
>> +  algorithm = histogram
>> +----------------------------------------------------------------
>
> It's pretty easy to assume the above is meant to be part of the
> .gitattributes file instead of the .git/config file.  Don't most users
> run `git config` commands directly rather than edit the .git/config
> file?  Should we provide a sample command rather than showing what the
> config file will contain?

Not sure--I think I was was just following what I saw in the existing
documentation for driver configuration. But would be interested to see what
other folks think.

>
>> +
>> +This diff algorithm applies to user facing diff output like git-diff(1),
>> +git-show(1) and is used for the `--stat` output as well. The merge machinery
>> +will not use the diff algorithm set through this method.
>
> Yaay, thanks for including this!
>
>
> I'm still curious if this should this also include warnings/caveats, such as:
>   * The diff attribute specified in .gitattributes will be ignored in
> a bare clone
>   * The diff attribute specified in .gitattributes will be ignored if
> it is only specified in another branch (e.g. on a branch "special-file
> diff=patience" recorded in .gitattributes, then checkout master but
> run `git log -1 -p $branch`)
>   * When a file is renamed, the diff attribute for the pre-image name
> is the only one the system pays attention to (thus adding "-R" can
> flip which diff algorithm is run for the renamed file).

I would be fine with adding that--though originally I was thinking that these
can be inferred from the way that gitattributes are documented in [1]. Calling
these out would make it more clear though, so I could go either way.

>
> Also, since I tested the three items above to verify they are valid
> warnings, I'm a bit confused.  I thought your intent was to use this
> server-side[1], so isn't the bare clone aspect a deal-breaker for your
> intended usecase?
>
> [1] https://lore.kernel.org/git/7852AC7B-7A4E-4DD0-ADEA-CFFD5D16C595@gmail.com/

yes, indeed. I was planning on adding bare repository support in a separate
patch series, since the additions in [2] allows .gitattributes to be read from a
bare repository.

1. https://git-scm.com/docs/gitattributes
2. https://lore.kernel.org/git/0ca8b2458921fc40269b0c43b5ec86eba77d6b54.1673684790.git.karthik.188@gmail.com/

thanks!
John

>
>> +
>> +NOTE: If the `command` key also exists, then Git will treat this as an external
>> +diff and attempt to use the value set for `command` as an external program. For
>> +instance, the following config, combined with the above `.gitattributes` file,
>> +will result in `command` favored over `algorithm`.
>> +
>> +----------------------------------------------------------------
>> +[diff "<name>"]
>> +  command = j-c-diff
>> +  algorithm = histogram
>> +----------------------------------------------------------------
>>
>>  Defining a custom hunk-header
>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> diff --git a/diff.c b/diff.c
>> index 5efc22ca06b..04469da6d34 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4456,15 +4456,13 @@ static void run_diff_cmd(const char *pgm,
>>         const char *xfrm_msg = NULL;
>>         int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
>>         int must_show_header = 0;
>> +       struct userdiff_driver *drv = NULL;
>>
>> -
>> -       if (o->flags.allow_external) {
>> -               struct userdiff_driver *drv;
>> -
>> +       if (o->flags.allow_external || !o->ignore_driver_algorithm)
>>                 drv = userdiff_find_by_path(o->repo->index, attr_path);
>> -               if (drv && drv->external)
>> -                       pgm = drv->external;
>> -       }
>> +
>> +       if (o->flags.allow_external && drv && drv->external)
>> +               pgm = drv->external;
>>
>>         if (msg) {
>>                 /*
>> @@ -4481,12 +4479,16 @@ static void run_diff_cmd(const char *pgm,
>>                 run_external_diff(pgm, name, other, one, two, xfrm_msg, o);
>>                 return;
>>         }
>> -       if (one && two)
>> +       if (one && two) {
>> +               if (drv && !o->ignore_driver_algorithm && drv->algorithm)
>> +                       set_diff_algorithm(o, drv->algorithm);
>> +
>>                 builtin_diff(name, other ? other : name,
>>                              one, two, xfrm_msg, must_show_header,
>>                              o, complete_rewrite);
>> -       else
>> +       } else {
>>                 fprintf(o->file, "* Unmerged path %s\n", name);
>> +       }
>>  }
>>
>>  static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate)
>> @@ -4583,6 +4585,14 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
>>         const char *name;
>>         const char *other;
>>
>> +       if (!o->ignore_driver_algorithm) {
>> +               struct userdiff_driver *drv = userdiff_find_by_path(o->repo->index, p->one->path);
>> +
>> +               if (drv && drv->algorithm) {
>> +                       set_diff_algorithm(o, drv->algorithm);
>> +               }
>> +       }
>> +
>>         if (DIFF_PAIR_UNMERGED(p)) {
>>                 /* unmerged */
>>                 builtin_diffstat(p->one->path, NULL, NULL, NULL,
>> @@ -5130,6 +5140,8 @@ static int diff_opt_diff_algorithm(const struct option *opt,
>>                 return error(_("option diff-algorithm accepts \"myers\", "
>>                                "\"minimal\", \"patience\" and \"histogram\""));
>>
>> +       options->ignore_driver_algorithm = 1;
>> +
>>         return 0;
>>  }
>>
>> @@ -5145,6 +5157,8 @@ static int diff_opt_diff_algorithm_no_arg(const struct option *opt,
>>                 BUG("available diff algorithms include \"myers\", "
>>                                "\"minimal\", \"patience\" and \"histogram\"");
>>
>> +       options->ignore_driver_algorithm = 1;
>> +
>>         return 0;
>>  }
>>
>> @@ -5285,6 +5299,7 @@ static int diff_opt_patience(const struct option *opt,
>>         for (i = 0; i < options->anchors_nr; i++)
>>                 free(options->anchors[i]);
>>         options->anchors_nr = 0;
>> +       options->ignore_driver_algorithm = 1;
>>
>>         return set_diff_algorithm(options, "patience");
>>  }
>> diff --git a/diff.h b/diff.h
>> index 41eb2c3d428..8d770b1d579 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -333,6 +333,7 @@ struct diff_options {
>>         int prefix_length;
>>         const char *stat_sep;
>>         int xdl_opts;
>> +       int ignore_driver_algorithm;
>>
>>         /* see Documentation/diff-options.txt */
>>         char **anchors;
>> diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh
>> index 8d1e408bb58..2dc02bca873 100644
>> --- a/t/lib-diff-alternative.sh
>> +++ b/t/lib-diff-alternative.sh
>> @@ -105,10 +105,46 @@ index $file1..$file2 100644
>>   }
>>  EOF
>>
>> +       cat >expect_diffstat <<EOF
>> + file1 => file2 | 21 ++++++++++-----------
>> + 1 file changed, 10 insertions(+), 11 deletions(-)
>> +EOF
>> +
>>         STRATEGY=$1
>>
>> +       test_expect_success "$STRATEGY diff from attributes" '
>> +               echo "file* diff=driver" >.gitattributes &&
>> +               git config diff.driver.algorithm "$STRATEGY" &&
>> +               test_must_fail git diff --no-index file1 file2 > output &&
>> +               cat expect &&
>> +               cat output &&
>> +               test_cmp expect output
>> +       '
>> +
>> +       test_expect_success "$STRATEGY diff from attributes has valid diffstat" '
>> +               echo "file* diff=driver" >.gitattributes &&
>> +               git config diff.driver.algorithm "$STRATEGY" &&
>> +               test_must_fail git diff --stat --no-index file1 file2 > output &&
>> +               test_cmp expect_diffstat output
>> +       '
>> +
>>         test_expect_success "$STRATEGY diff" '
>> -               test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output &&
>> +               test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
>> +               test_cmp expect output
>> +       '
>> +
>> +       test_expect_success "$STRATEGY diff command line precedence before attributes" '
>> +               echo "file* diff=driver" >.gitattributes &&
>> +               git config diff.driver.algorithm meyers &&
>
> Is this misspelling of myers intentional?  I think with the typo, the
> code falls back to the default algorithm, which happens to be myers,
> so I think the test works either way, but were you intending to test
> fallback behavior in case of a typo here, or was that accidental?
>
>
>
>> +               test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
>> +               test_cmp expect output
>> +       '
>> +
>> +       test_expect_success "$STRATEGY diff attributes precedence before config" '
>> +               git config diff.algorithm default &&
>> +               echo "file* diff=driver" >.gitattributes &&
>> +               git config diff.driver.algorithm "$STRATEGY" &&
>> +               test_must_fail git diff --no-index file1 file2 > output &&
>>                 test_cmp expect output
>>         '
>>
>> diff --git a/userdiff.c b/userdiff.c
>> index d71b82feb74..ff25cfc4b4c 100644
>> --- a/userdiff.c
>> +++ b/userdiff.c
>> @@ -293,7 +293,7 @@ PATTERNS("scheme",
>>          "|([^][)(}{[ \t])+"),
>>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
>>          "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
>> -{ "default", NULL, -1, { NULL, 0 } },
>> +{ "default", NULL, NULL, -1, { NULL, 0 } },
>>  };
>>  #undef PATTERNS
>>  #undef IPATTERN
>> @@ -394,6 +394,8 @@ int userdiff_config(const char *k, const char *v)
>>                 return parse_bool(&drv->textconv_want_cache, k, v);
>>         if (!strcmp(type, "wordregex"))
>>                 return git_config_string(&drv->word_regex, k, v);
>> +       if (!strcmp(type, "algorithm"))
>> +               return git_config_string(&drv->algorithm, k, v);
>>
>>         return 0;
>>  }
>> diff --git a/userdiff.h b/userdiff.h
>> index aee91bc77e6..24419db6973 100644
>> --- a/userdiff.h
>> +++ b/userdiff.h
>> @@ -14,6 +14,7 @@ struct userdiff_funcname {
>>  struct userdiff_driver {
>>         const char *name;
>>         const char *external;
>> +       const char *algorithm;
>>         int binary;
>>         struct userdiff_funcname funcname;
>>         const char *word_regex;
>> --
>> gitgitgadget
Elijah Newren Feb. 20, 2023, 4:21 p.m. UTC | #4
On Mon, Feb 20, 2023 at 7:32 AM John Cai <johncai86@gmail.com> wrote:
[...]
> > I'm still curious if this should this also include warnings/caveats, such as:
> >   * The diff attribute specified in .gitattributes will be ignored in
> > a bare clone
> >   * The diff attribute specified in .gitattributes will be ignored if
> > it is only specified in another branch (e.g. on a branch "special-file
> > diff=patience" recorded in .gitattributes, then checkout master but
> > run `git log -1 -p $branch`)
> >   * When a file is renamed, the diff attribute for the pre-image name
> > is the only one the system pays attention to (thus adding "-R" can
> > flip which diff algorithm is run for the renamed file).
>
> I would be fine with adding that--though originally I was thinking that these
> can be inferred from the way that gitattributes are documented in [1]. Calling
> these out would make it more clear though, so I could go either way.
>
> >
> > Also, since I tested the three items above to verify they are valid
> > warnings, I'm a bit confused.  I thought your intent was to use this
> > server-side[1], so isn't the bare clone aspect a deal-breaker for your
> > intended usecase?
> >
> > [1] https://lore.kernel.org/git/7852AC7B-7A4E-4DD0-ADEA-CFFD5D16C595@gmail.com/
>
> yes, indeed. I was planning on adding bare repository support in a separate
> patch series, since the additions in [2] allows .gitattributes to be read from a
> bare repository.
>
> 1. https://git-scm.com/docs/gitattributes
> 2. https://lore.kernel.org/git/0ca8b2458921fc40269b0c43b5ec86eba77d6b54.1673684790.git.karthik.188@gmail.com/
>
> thanks!
> John

Oh, interesting, I didn't know about [2].  So, is the plan to take the
--source option from that series and add it to diff (perhaps with a
different name, since log tends to consume diff options and --source
is already taken)?

And do you expect to get the tree-ish from the two the users are
already specifying to diff?  If so, which one do you use (the two
commits being diffed might have differing .gitattributes files)?  If
not, what does that mean for users of e.g. the GitLab UI who have to
specify a third tree when diffing?
John Cai Feb. 20, 2023, 4:49 p.m. UTC | #5
Hi Elijah,

On 20 Feb 2023, at 11:21, Elijah Newren wrote:

> On Mon, Feb 20, 2023 at 7:32 AM John Cai <johncai86@gmail.com> wrote:
> [...]
>>> I'm still curious if this should this also include warnings/caveats, such as:
>>>   * The diff attribute specified in .gitattributes will be ignored in
>>> a bare clone
>>>   * The diff attribute specified in .gitattributes will be ignored if
>>> it is only specified in another branch (e.g. on a branch "special-file
>>> diff=patience" recorded in .gitattributes, then checkout master but
>>> run `git log -1 -p $branch`)
>>>   * When a file is renamed, the diff attribute for the pre-image name
>>> is the only one the system pays attention to (thus adding "-R" can
>>> flip which diff algorithm is run for the renamed file).
>>
>> I would be fine with adding that--though originally I was thinking that these
>> can be inferred from the way that gitattributes are documented in [1]. Calling
>> these out would make it more clear though, so I could go either way.
>>
>>>
>>> Also, since I tested the three items above to verify they are valid
>>> warnings, I'm a bit confused.  I thought your intent was to use this
>>> server-side[1], so isn't the bare clone aspect a deal-breaker for your
>>> intended usecase?
>>>
>>> [1] https://lore.kernel.org/git/7852AC7B-7A4E-4DD0-ADEA-CFFD5D16C595@gmail.com/
>>
>> yes, indeed. I was planning on adding bare repository support in a separate
>> patch series, since the additions in [2] allows .gitattributes to be read from a
>> bare repository.
>>
>> 1. https://git-scm.com/docs/gitattributes
>> 2. https://lore.kernel.org/git/0ca8b2458921fc40269b0c43b5ec86eba77d6b54.1673684790.git.karthik.188@gmail.com/
>>
>> thanks!
>> John
>
> Oh, interesting, I didn't know about [2].  So, is the plan to take the
> --source option from that series and add it to diff (perhaps with a
> different name, since log tends to consume diff options and --source
> is already taken)?

Yep, that would be the general idea

>
> And do you expect to get the tree-ish from the two the users are
> already specifying to diff?  If so, which one do you use (the two
> commits being diffed might have differing .gitattributes files)?  If
> not, what does that mean for users of e.g. the GitLab UI who have to
> specify a third tree when diffing?

Good question! Since it seems that when `git-diff(1)` considers diff.<driver>,
it goes with the path of the first one. (might need some confirmation here)

in diff.c:


static void run_diff(struct diff_filepair *p, struct diff_options *o)
{
	const char *pgm = external_diff();
	struct strbuf msg;
	struct diff_filespec *one = p->one;
	struct diff_filespec *two = p->two;
	const char *name;
	const char *other;
	const char *attr_path;

	name  = one->path;
	other = (strcmp(name, two->path) ? two->path : NULL);
	attr_path = name;
	if (o->prefix_length)

I was thinking we would just use the tree-ish of the first one

thanks
John
Elijah Newren Feb. 20, 2023, 5:32 p.m. UTC | #6
On Mon, Feb 20, 2023 at 8:49 AM John Cai <johncai86@gmail.com> wrote:
>
> Hi Elijah,
>
> On 20 Feb 2023, at 11:21, Elijah Newren wrote:
>
> > On Mon, Feb 20, 2023 at 7:32 AM John Cai <johncai86@gmail.com> wrote:
> > [...]
> >>> I'm still curious if this should this also include warnings/caveats, such as:
> >>>   * The diff attribute specified in .gitattributes will be ignored in
> >>> a bare clone
> >>>   * The diff attribute specified in .gitattributes will be ignored if
> >>> it is only specified in another branch (e.g. on a branch "special-file
> >>> diff=patience" recorded in .gitattributes, then checkout master but
> >>> run `git log -1 -p $branch`)
> >>>   * When a file is renamed, the diff attribute for the pre-image name
> >>> is the only one the system pays attention to (thus adding "-R" can
> >>> flip which diff algorithm is run for the renamed file).
> >>
> >> I would be fine with adding that--though originally I was thinking that these
> >> can be inferred from the way that gitattributes are documented in [1]. Calling
> >> these out would make it more clear though, so I could go either way.
> >>
> >>>
> >>> Also, since I tested the three items above to verify they are valid
> >>> warnings, I'm a bit confused.  I thought your intent was to use this
> >>> server-side[1], so isn't the bare clone aspect a deal-breaker for your
> >>> intended usecase?
> >>>
> >>> [1] https://lore.kernel.org/git/7852AC7B-7A4E-4DD0-ADEA-CFFD5D16C595@gmail.com/
> >>
> >> yes, indeed. I was planning on adding bare repository support in a separate
> >> patch series, since the additions in [2] allows .gitattributes to be read from a
> >> bare repository.
> >>
> >> 1. https://git-scm.com/docs/gitattributes
> >> 2. https://lore.kernel.org/git/0ca8b2458921fc40269b0c43b5ec86eba77d6b54.1673684790.git.karthik.188@gmail.com/
> >>
> >> thanks!
> >> John
> >
> > Oh, interesting, I didn't know about [2].  So, is the plan to take the
> > --source option from that series and add it to diff (perhaps with a
> > different name, since log tends to consume diff options and --source
> > is already taken)?
>
> Yep, that would be the general idea
>
> >
> > And do you expect to get the tree-ish from the two the users are
> > already specifying to diff?  If so, which one do you use (the two
> > commits being diffed might have differing .gitattributes files)?  If
> > not, what does that mean for users of e.g. the GitLab UI who have to
> > specify a third tree when diffing?
>
> Good question! Since it seems that when `git-diff(1)` considers diff.<driver>,
> it goes with the path of the first one. (might need some confirmation here)
>
> in diff.c:
>
>
> static void run_diff(struct diff_filepair *p, struct diff_options *o)
> {
>         const char *pgm = external_diff();
>         struct strbuf msg;
>         struct diff_filespec *one = p->one;
>         struct diff_filespec *two = p->two;
>         const char *name;
>         const char *other;
>         const char *attr_path;
>
>         name  = one->path;
>         other = (strcmp(name, two->path) ? two->path : NULL);
>         attr_path = name;
>         if (o->prefix_length)
>
> I was thinking we would just use the tree-ish of the first one

That would certainly simplify, but it'd be pretty important to
document.  (Incidentally, this kind of decision was my reason for
asking about all those special cases earlier, i.e. how to handle diff
between different commits, how to handle renames, how to handle bare
repositories, etc.)

This kind of decision probably also means you'd need a variety of
testcases where .gitattributes is different in every commit & the
index & the working tree, and then you start testing several of the
possible pairings to make sure the right .gitattributes file is used
(e.g. (commit, commit), (commit, index), (index, commit), (worktree,
index), etc.)

However, I'm curious again.  You brought this up because you want to
use it in GitLab, yet configuration of using this option as it appears
in this series requires changing _both_ .gitattributes and
.git/config.  How will users of the GitLab UI do the configuration
necessary for the git-config side to take effect?
John Cai Feb. 20, 2023, 8:53 p.m. UTC | #7
Hi Elijah,

On 20 Feb 2023, at 12:32, Elijah Newren wrote:

> On Mon, Feb 20, 2023 at 8:49 AM John Cai <johncai86@gmail.com> wrote:
>>
>> Hi Elijah,
>>
>> On 20 Feb 2023, at 11:21, Elijah Newren wrote:
>>
>>> On Mon, Feb 20, 2023 at 7:32 AM John Cai <johncai86@gmail.com> wrote:
>>> [...]
>>>>> I'm still curious if this should this also include warnings/caveats, such as:
>>>>>   * The diff attribute specified in .gitattributes will be ignored in
>>>>> a bare clone
>>>>>   * The diff attribute specified in .gitattributes will be ignored if
>>>>> it is only specified in another branch (e.g. on a branch "special-file
>>>>> diff=patience" recorded in .gitattributes, then checkout master but
>>>>> run `git log -1 -p $branch`)
>>>>>   * When a file is renamed, the diff attribute for the pre-image name
>>>>> is the only one the system pays attention to (thus adding "-R" can
>>>>> flip which diff algorithm is run for the renamed file).
>>>>
>>>> I would be fine with adding that--though originally I was thinking that these
>>>> can be inferred from the way that gitattributes are documented in [1]. Calling
>>>> these out would make it more clear though, so I could go either way.
>>>>
>>>>>
>>>>> Also, since I tested the three items above to verify they are valid
>>>>> warnings, I'm a bit confused.  I thought your intent was to use this
>>>>> server-side[1], so isn't the bare clone aspect a deal-breaker for your
>>>>> intended usecase?
>>>>>
>>>>> [1] https://lore.kernel.org/git/7852AC7B-7A4E-4DD0-ADEA-CFFD5D16C595@gmail.com/
>>>>
>>>> yes, indeed. I was planning on adding bare repository support in a separate
>>>> patch series, since the additions in [2] allows .gitattributes to be read from a
>>>> bare repository.
>>>>
>>>> 1. https://git-scm.com/docs/gitattributes
>>>> 2. https://lore.kernel.org/git/0ca8b2458921fc40269b0c43b5ec86eba77d6b54.1673684790.git.karthik.188@gmail.com/
>>>>
>>>> thanks!
>>>> John
>>>
>>> Oh, interesting, I didn't know about [2].  So, is the plan to take the
>>> --source option from that series and add it to diff (perhaps with a
>>> different name, since log tends to consume diff options and --source
>>> is already taken)?
>>
>> Yep, that would be the general idea
>>
>>>
>>> And do you expect to get the tree-ish from the two the users are
>>> already specifying to diff?  If so, which one do you use (the two
>>> commits being diffed might have differing .gitattributes files)?  If
>>> not, what does that mean for users of e.g. the GitLab UI who have to
>>> specify a third tree when diffing?
>>
>> Good question! Since it seems that when `git-diff(1)` considers diff.<driver>,
>> it goes with the path of the first one. (might need some confirmation here)
>>
>> in diff.c:
>>
>>
>> static void run_diff(struct diff_filepair *p, struct diff_options *o)
>> {
>>         const char *pgm = external_diff();
>>         struct strbuf msg;
>>         struct diff_filespec *one = p->one;
>>         struct diff_filespec *two = p->two;
>>         const char *name;
>>         const char *other;
>>         const char *attr_path;
>>
>>         name  = one->path;
>>         other = (strcmp(name, two->path) ? two->path : NULL);
>>         attr_path = name;
>>         if (o->prefix_length)
>>
>> I was thinking we would just use the tree-ish of the first one
>
> That would certainly simplify, but it'd be pretty important to
> document.  (Incidentally, this kind of decision was my reason for
> asking about all those special cases earlier, i.e. how to handle diff
> between different commits, how to handle renames, how to handle bare
> repositories, etc.)

Good point--that would be good to document.

>
> This kind of decision probably also means you'd need a variety of
> testcases where .gitattributes is different in every commit & the
> index & the working tree, and then you start testing several of the
> possible pairings to make sure the right .gitattributes file is used
> (e.g. (commit, commit), (commit, index), (index, commit), (worktree,
> index), etc.)
>
> However, I'm curious again.  You brought this up because you want to
> use it in GitLab, yet configuration of using this option as it appears
> in this series requires changing _both_ .gitattributes and
> .git/config.  How will users of the GitLab UI do the configuration
> necessary for the git-config side to take effect?

Good question--we would likely need to add some pre-baked configuration server
side with a driver config that users could then tap into with their
gitattributes files.
Jeff King Feb. 22, 2023, 7:47 p.m. UTC | #8
On Mon, Feb 20, 2023 at 09:32:52AM -0800, Elijah Newren wrote:

> > I was thinking we would just use the tree-ish of the first one
> 
> That would certainly simplify, but it'd be pretty important to
> document.  (Incidentally, this kind of decision was my reason for
> asking about all those special cases earlier, i.e. how to handle diff
> between different commits, how to handle renames, how to handle bare
> repositories, etc.)
> 
> This kind of decision probably also means you'd need a variety of
> testcases where .gitattributes is different in every commit & the
> index & the working tree, and then you start testing several of the
> possible pairings to make sure the right .gitattributes file is used
> (e.g. (commit, commit), (commit, index), (index, commit), (worktree,
> index), etc.)

There may be some prior art here in how we handle mailmaps in a bare
repository. In that case, we pull them from HEAD (or really any commit
of your choosing, but the default is HEAD). That may seem a bit weird,
but it matches how non-bare repositories work, which read the mailmap
from the working tree.

So likewise, even looking at an old commit like "git show HEAD~1000", in
a non-bare repository we will read .gitattributes from the working tree,
which means it is (roughly) coming from HEAD. In some ways that is good
(there may be improvements to the attributes) and in some ways it is
weird and confusing (the meaning of the attributes may have been
different back then, or the two histories may even be somewhat
unrelated!). So I think you can make an argument either way on what is
useful, but harmonizing the non-bare and bare cases seems like the best
place to start.

And then that machinery would probably be enough to let people ask for
specific things on top (like "git show --attributes-from=HEAD~1000
HEAD~1000" if they really wanted).

-Peff
John Cai Feb. 24, 2023, 5:44 p.m. UTC | #9
Hey Peff,

On 22 Feb 2023, at 14:47, Jeff King wrote:

> On Mon, Feb 20, 2023 at 09:32:52AM -0800, Elijah Newren wrote:
>
>>> I was thinking we would just use the tree-ish of the first one
>>
>> That would certainly simplify, but it'd be pretty important to
>> document.  (Incidentally, this kind of decision was my reason for
>> asking about all those special cases earlier, i.e. how to handle diff
>> between different commits, how to handle renames, how to handle bare
>> repositories, etc.)
>>
>> This kind of decision probably also means you'd need a variety of
>> testcases where .gitattributes is different in every commit & the
>> index & the working tree, and then you start testing several of the
>> possible pairings to make sure the right .gitattributes file is used
>> (e.g. (commit, commit), (commit, index), (index, commit), (worktree,
>> index), etc.)
>
> There may be some prior art here in how we handle mailmaps in a bare
> repository. In that case, we pull them from HEAD (or really any commit
> of your choosing, but the default is HEAD). That may seem a bit weird,
> but it matches how non-bare repositories work, which read the mailmap
> from the working tree.
>
> So likewise, even looking at an old commit like "git show HEAD~1000", in
> a non-bare repository we will read .gitattributes from the working tree,
> which means it is (roughly) coming from HEAD. In some ways that is good
> (there may be improvements to the attributes) and in some ways it is
> weird and confusing (the meaning of the attributes may have been
> different back then, or the two histories may even be somewhat
> unrelated!). So I think you can make an argument either way on what is
> useful, but harmonizing the non-bare and bare cases seems like the best
> place to start.

Thanks for the historical context and suggestion here
>
> And then that machinery would probably be enough to let people ask for
> specific things on top (like "git show --attributes-from=HEAD~1000
> HEAD~1000" if they really wanted).

Makes sense to me

thanks
John

>
> -Peff
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index c19e64ea0ef..f212079a131 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -758,6 +758,43 @@  with the above configuration, i.e. `j-c-diff`, with 7
 parameters, just like `GIT_EXTERNAL_DIFF` program is called.
 See linkgit:git[1] for details.
 
+Setting the internal diff algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The diff algorithm can be set through the `diff.algorithm` config key, but
+sometimes it may be helpful to set the diff algorithm by path. For example, one
+might wish to set a diff algorithm automatically for all `.json` files such that
+the user would not need to pass in a separate command line `--diff-algorithm`
+flag each time.
+
+First, in `.gitattributes`, assign the `diff` attribute for paths.
+
+------------------------
+*.json diff=<name>
+------------------------
+
+Then, define a "diff.<name>.algorithm" configuration to specify the diff
+algorithm, choosing from `meyers`, `patience`, `minimal`, or `histogram`.
+
+----------------------------------------------------------------
+[diff "<name>"]
+  algorithm = histogram
+----------------------------------------------------------------
+
+This diff algorithm applies to user facing diff output like git-diff(1),
+git-show(1) and is used for the `--stat` output as well. The merge machinery
+will not use the diff algorithm set through this method.
+
+NOTE: If the `command` key also exists, then Git will treat this as an external
+diff and attempt to use the value set for `command` as an external program. For
+instance, the following config, combined with the above `.gitattributes` file,
+will result in `command` favored over `algorithm`.
+
+----------------------------------------------------------------
+[diff "<name>"]
+  command = j-c-diff
+  algorithm = histogram
+----------------------------------------------------------------
 
 Defining a custom hunk-header
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/diff.c b/diff.c
index 5efc22ca06b..04469da6d34 100644
--- a/diff.c
+++ b/diff.c
@@ -4456,15 +4456,13 @@  static void run_diff_cmd(const char *pgm,
 	const char *xfrm_msg = NULL;
 	int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
 	int must_show_header = 0;
+	struct userdiff_driver *drv = NULL;
 
-
-	if (o->flags.allow_external) {
-		struct userdiff_driver *drv;
-
+	if (o->flags.allow_external || !o->ignore_driver_algorithm)
 		drv = userdiff_find_by_path(o->repo->index, attr_path);
-		if (drv && drv->external)
-			pgm = drv->external;
-	}
+
+	if (o->flags.allow_external && drv && drv->external)
+		pgm = drv->external;
 
 	if (msg) {
 		/*
@@ -4481,12 +4479,16 @@  static void run_diff_cmd(const char *pgm,
 		run_external_diff(pgm, name, other, one, two, xfrm_msg, o);
 		return;
 	}
-	if (one && two)
+	if (one && two) {
+		if (drv && !o->ignore_driver_algorithm && drv->algorithm)
+			set_diff_algorithm(o, drv->algorithm);
+
 		builtin_diff(name, other ? other : name,
 			     one, two, xfrm_msg, must_show_header,
 			     o, complete_rewrite);
-	else
+	} else {
 		fprintf(o->file, "* Unmerged path %s\n", name);
+	}
 }
 
 static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate)
@@ -4583,6 +4585,14 @@  static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
 	const char *name;
 	const char *other;
 
+	if (!o->ignore_driver_algorithm) {
+		struct userdiff_driver *drv = userdiff_find_by_path(o->repo->index, p->one->path);
+
+		if (drv && drv->algorithm) {
+			set_diff_algorithm(o, drv->algorithm);
+		}
+	}
+
 	if (DIFF_PAIR_UNMERGED(p)) {
 		/* unmerged */
 		builtin_diffstat(p->one->path, NULL, NULL, NULL,
@@ -5130,6 +5140,8 @@  static int diff_opt_diff_algorithm(const struct option *opt,
 		return error(_("option diff-algorithm accepts \"myers\", "
 			       "\"minimal\", \"patience\" and \"histogram\""));
 
+	options->ignore_driver_algorithm = 1;
+
 	return 0;
 }
 
@@ -5145,6 +5157,8 @@  static int diff_opt_diff_algorithm_no_arg(const struct option *opt,
 		BUG("available diff algorithms include \"myers\", "
 			       "\"minimal\", \"patience\" and \"histogram\"");
 
+	options->ignore_driver_algorithm = 1;
+
 	return 0;
 }
 
@@ -5285,6 +5299,7 @@  static int diff_opt_patience(const struct option *opt,
 	for (i = 0; i < options->anchors_nr; i++)
 		free(options->anchors[i]);
 	options->anchors_nr = 0;
+	options->ignore_driver_algorithm = 1;
 
 	return set_diff_algorithm(options, "patience");
 }
diff --git a/diff.h b/diff.h
index 41eb2c3d428..8d770b1d579 100644
--- a/diff.h
+++ b/diff.h
@@ -333,6 +333,7 @@  struct diff_options {
 	int prefix_length;
 	const char *stat_sep;
 	int xdl_opts;
+	int ignore_driver_algorithm;
 
 	/* see Documentation/diff-options.txt */
 	char **anchors;
diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh
index 8d1e408bb58..2dc02bca873 100644
--- a/t/lib-diff-alternative.sh
+++ b/t/lib-diff-alternative.sh
@@ -105,10 +105,46 @@  index $file1..$file2 100644
  }
 EOF
 
+	cat >expect_diffstat <<EOF
+ file1 => file2 | 21 ++++++++++-----------
+ 1 file changed, 10 insertions(+), 11 deletions(-)
+EOF
+
 	STRATEGY=$1
 
+	test_expect_success "$STRATEGY diff from attributes" '
+		echo "file* diff=driver" >.gitattributes &&
+		git config diff.driver.algorithm "$STRATEGY" &&
+		test_must_fail git diff --no-index file1 file2 > output &&
+		cat expect &&
+		cat output &&
+		test_cmp expect output
+	'
+
+	test_expect_success "$STRATEGY diff from attributes has valid diffstat" '
+		echo "file* diff=driver" >.gitattributes &&
+		git config diff.driver.algorithm "$STRATEGY" &&
+		test_must_fail git diff --stat --no-index file1 file2 > output &&
+		test_cmp expect_diffstat output
+	'
+
 	test_expect_success "$STRATEGY diff" '
-		test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output &&
+		test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
+		test_cmp expect output
+	'
+
+	test_expect_success "$STRATEGY diff command line precedence before attributes" '
+		echo "file* diff=driver" >.gitattributes &&
+		git config diff.driver.algorithm meyers &&
+		test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
+		test_cmp expect output
+	'
+
+	test_expect_success "$STRATEGY diff attributes precedence before config" '
+		git config diff.algorithm default &&
+		echo "file* diff=driver" >.gitattributes &&
+		git config diff.driver.algorithm "$STRATEGY" &&
+		test_must_fail git diff --no-index file1 file2 > output &&
 		test_cmp expect output
 	'
 
diff --git a/userdiff.c b/userdiff.c
index d71b82feb74..ff25cfc4b4c 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -293,7 +293,7 @@  PATTERNS("scheme",
 	 "|([^][)(}{[ \t])+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
-{ "default", NULL, -1, { NULL, 0 } },
+{ "default", NULL, NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
 #undef IPATTERN
@@ -394,6 +394,8 @@  int userdiff_config(const char *k, const char *v)
 		return parse_bool(&drv->textconv_want_cache, k, v);
 	if (!strcmp(type, "wordregex"))
 		return git_config_string(&drv->word_regex, k, v);
+	if (!strcmp(type, "algorithm"))
+		return git_config_string(&drv->algorithm, k, v);
 
 	return 0;
 }
diff --git a/userdiff.h b/userdiff.h
index aee91bc77e6..24419db6973 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -14,6 +14,7 @@  struct userdiff_funcname {
 struct userdiff_driver {
 	const char *name;
 	const char *external;
+	const char *algorithm;
 	int binary;
 	struct userdiff_funcname funcname;
 	const char *word_regex;