mbox series

[v7,0/2] Add mailmap mechanism in cat-file options

Message ID 20221220060113.51010-1-siddharthasthana31@gmail.com (mailing list archive)
Headers show
Series Add mailmap mechanism in cat-file options | expand

Message

Siddharth Asthana Dec. 20, 2022, 6:01 a.m. UTC
Thanks a ton Christian and Ævar for the review :) I have made the
suggested changes in v7.

= Description

At present, `git-cat-file` command with `--batch-check` and `-s` options
does not complain when `--use-mailmap` option is given. The latter
option is just ignored. Instead, for commit/tag objects, the command
should compute the size of the object after replacing the idents and
report it. So, this patch series makes `-s` and `--batch-check` options
of `git-cat-file` honor mailmap when used with `--use-mailmap` option.

In this patch series we didn't want to change that '%(objectsize)'
always shows the size of the original object even when `--use-mailmap`
is set because first we have the long term plan to unify how the formats
for `git cat-file` and other commands works. And second existing formats
like the "pretty formats" used by `git log` have different options for
fields respecting mailmap or not respecting it (%an is for author name
while %aN for author name respecting mailmap).

I would like to thank my mentors, Christian Couder and John Cai, for all
of their help!
Looking forward to the reviews!

= Patch Organization

- The first patch makes `-s` option to return updated size of the
  <commit/tag> object, when combined with `--use-mailmap` option, after
  replacing the idents using the mailmap mechanism.
- The second patch makes `--batch-check` option to return updated size of
  the <commit/tag> object, when combined with `--use-mailmap` option,
  after replacing the idents using the mailmap mechanism.

= Changes in v5

- The patch series which improves the documentation for `-s`, `--batch`,
  `--batch-check` and `--batch-command` is again part of this patch
  series with patch 3/3 squashed into patches 1/3 and 2/3 as suggested
  by Junio, Taylor and Christian. The doc patch series was perviously
  sent independently for improving the documentation of git cat-file
  options:
  https://lore.kernel.org/git/20221029092513.73982-1-siddharthasthana31@gmail.com/

- Improve the tests according to run under SHA-256 mode.

= Changes in v6

- Improve the tests so it doesn't break under macOS.

= Changes in v7

- Used `repo_read_object_file` instead of `read_object_file` because we
  have been trying to migrate away from these, and new code should use
  the non-macro variants.
- Improve the documentation of patch series.

Siddharth Asthana (2):
  cat-file: add mailmap support to -s option
  cat-file: add mailmap support to --batch-check option

 Documentation/git-cat-file.txt | 53 +++++++++++++++++++--------
 builtin/cat-file.c             | 28 +++++++++++++++
 t/t4203-mailmap.sh             | 65 ++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 14 deletions(-)

Range-diff against v6:
1:  7f20f45183 = 1:  2097544b2d cat-file: add mailmap support to -s option
2:  971f38e064 ! 2:  8305148718 cat-file: add mailmap support to --batch-check option
    @@ Commit message
     
         2. Make one call to `oid_object_info_extended()` to get the type of the
            object. Then, if the object type is either of commit or tag, make a
    -       call to `read_object_file()` to read the contents of the object.
    +       call to `repo_read_object_file()` to read the contents of the object.
     
         I benchmarked the following command with both the above approaches and
         compared against the current implementation where `--use-mailmap`
    @@ Documentation/git-cat-file.txt: OPTIONS
     -	also need to specify the path, separated by whitespace.  See the
     -	section `BATCH OUTPUT` below for details.
     +	on stdin. May not be combined with any other options or arguments
    -+	except --textconv, --filters, or --use-mailmap.
    ++	except `--textconv`, `--filters`, or `--use-mailmap`.
     +	+
     +	* When used with `--textconv` or `--filters`, the input lines
     +	  must specify the path, separated by whitespace. See the section
    @@ Documentation/git-cat-file.txt: OPTIONS
     -	need to specify the path, separated by whitespace.  See the
     -	section `BATCH OUTPUT` below for details.
     +	Print object information for each object provided on stdin. May not be
    -+	combined with any other options or arguments except --textconv, --filters
    -+	or --use-mailmap.
    ++	combined with any other options or arguments except `--textconv`, `--filters`
    ++	or `--use-mailmap`.
     +	+
     +	* When used with `--textconv` or `--filters`, the input lines must
     +	 specify the path, separated by whitespace. See the section
    @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
     +			size_t s = data->size;
     +			char *buf = NULL;
     +
    -+			buf = read_object_file(&data->oid, &data->type, &data->size);
    ++			buf = repo_read_object_file(the_repository, &data->oid, &data->type,
    ++						    &data->size);
     +			buf = replace_idents_using_mailmap(buf, &s);
     +			data->size = cast_size_t_to_ulong(s);
     +

Comments

Ævar Arnfjörð Bjarmason Dec. 20, 2022, 1:02 p.m. UTC | #1
On Tue, Dec 20 2022, Siddharth Asthana wrote:

>          2. Make one call to `oid_object_info_extended()` to get the type of the
>             object. Then, if the object type is either of commit or tag, make a
>     -       call to `read_object_file()` to read the contents of the object.
>     +       call to `repo_read_object_file()` to read the contents of the object.
>      
>          I benchmarked the following command with both the above approaches and
>          compared against the current implementation where `--use-mailmap`
>     @@ Documentation/git-cat-file.txt: OPTIONS
>      -	also need to specify the path, separated by whitespace.  See the
>      -	section `BATCH OUTPUT` below for details.
>      +	on stdin. May not be combined with any other options or arguments
>     -+	except --textconv, --filters, or --use-mailmap.
>     ++	except `--textconv`, `--filters`, or `--use-mailmap`.
>      +	+
>      +	* When used with `--textconv` or `--filters`, the input lines
>      +	  must specify the path, separated by whitespace. See the section
>     @@ Documentation/git-cat-file.txt: OPTIONS
>      -	need to specify the path, separated by whitespace.  See the
>      -	section `BATCH OUTPUT` below for details.
>      +	Print object information for each object provided on stdin. May not be
>     -+	combined with any other options or arguments except --textconv, --filters
>     -+	or --use-mailmap.
>     ++	combined with any other options or arguments except `--textconv`, `--filters`
>     ++	or `--use-mailmap`.
>      +	+
>      +	* When used with `--textconv` or `--filters`, the input lines must
>      +	 specify the path, separated by whitespace. See the section
>     @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
>      +			size_t s = data->size;
>      +			char *buf = NULL;
>      +
>     -+			buf = read_object_file(&data->oid, &data->type, &data->size);
>     ++			buf = repo_read_object_file(the_repository, &data->oid, &data->type,
>     ++						    &data->size);
>      +			buf = replace_idents_using_mailmap(buf, &s);
>      +			data->size = cast_size_t_to_ulong(s);
>      +


This series looks good to me at this point, thanks in particular for the
repo_*() change (will make something I plan to submit soon easier).

These[1][2] are nits I came up with while reviewing this, not worth a
re-roll, but tweaked some things I found a bit odd, namely:

 * Let's not cast to void **, instead we can just declare a variable for
   the 's' case
 * If we don't need the "buf" return value we can skip the assignment
   (although I guess technically this breaks the encapsulation, so maybe
   we shouldn't skip it)
 * We can skip the NULL assignment in 2/2, and instead just assign to
   the variable as we declare it, and also do the replace/free on one
   line (to make it clear that we immediately don't care about it, and
   only want the size).

I don't think any of it's worth a re-roll, just notes to show you I've
looked at it carefully.

The only unresolved question I had while reading this is if we're sure
that a repo_read_object_file() following the a successful
oid_object_info_extended() is guaranteed to succeed? If not the code in
2/2 has a segfault we might trigger (as buf will be NULL), but maybe
we're guaranteed to always get the already-retrieved object from the
object cache.

1:  fe3cc3715b2 ! 1:  31701b3e55d cat-file: add mailmap support to -s option
    @@ Documentation/git-cat-file.txt: OPTIONS
     
      ## builtin/cat-file.c ##
     @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    + 		break;
      
      	case 's':
    ++	{
    ++		void *obuf = NULL;
    ++
      		oi.sizep = &size;
     +
     +		if (use_mailmap) {
     +			oi.typep = &type;
    -+			oi.contentp = (void**)&buf;
    ++			oi.contentp = &obuf;
     +		}
     +
      		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
     +
     +		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
     +			size_t s = size;
    -+			buf = replace_idents_using_mailmap(buf, &s);
    ++
    ++			replace_idents_using_mailmap(obuf, &s);
     +			size = cast_size_t_to_ulong(s);
     +		}
    ++		free(obuf);
     +
      		printf("%"PRIuMAX"\n", (uintmax_t)size);
      		ret = 0;
      		goto cleanup;
    +-
    ++	}
    + 	case 'e':
    + 		return !has_object_file(&oid);
    + 
     
      ## t/t4203-mailmap.sh ##
     @@ t/t4203-mailmap.sh: test_expect_success '--mailmap enables mailmap in cat-file for annotated tag obj
2:  d6c621820d2 ! 2:  14d95db69e9 cat-file: add mailmap support to --batch-check option
    @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
     +
     +		if (use_mailmap && (data->type == OBJ_COMMIT || data->type == OBJ_TAG)) {
     +			size_t s = data->size;
    -+			char *buf = NULL;
    ++			char *buf = repo_read_object_file(the_repository,
    ++							  &data->oid,
    ++							  &data->type,
    ++							  &data->size);
     +
    -+			buf = repo_read_object_file(the_repository, &data->oid, &data->type,
    -+						    &data->size);
    -+			buf = replace_idents_using_mailmap(buf, &s);
    ++			free(replace_idents_using_mailmap(buf, &s));
     +			data->size = cast_size_t_to_ulong(s);
    -+
    -+			free(buf);
     +		}
      	}