diff mbox series

[v10,9/9] ls-tree.c: introduce "--format" option

Message ID db058bf670c5668fc5b95baf83667cc282cb739b.1641978175.git.dyroneteng@gmail.com (mailing list archive)
State Superseded
Headers show
Series ls-tree: "--object-only" and "--format" opts | expand

Commit Message

Teng Long Jan. 13, 2022, 3:42 a.m. UTC
Add a --format option to ls-tree. It has an existing default output,
and then --long and --name-only options to emit the default output
along with the objectsize and, or to only emit object paths.

Rather than add --type-only, --object-only etc. we can just support a
--format using a strbuf_expand() similar to "for-each-ref
--format". We might still add such options in the future for
convenience.

The --format implementation is slower than the existing code, but this
change does not cause any performance regressions. We'll leave the
existing show_tree() unchanged, and only run show_tree_fmt() in if
a --format different than the hardcoded built-in ones corresponding to
the existing modes is provided.

I.e. something like the "--long" output would be much slower with
this, mainly due to how we need to allocate various things to do with
quote.c instead of spewing the output directly to stdout.

The new option of '--format' comes from Ævar Arnfjörð Bjarmasonn's
idea and suggestion, this commit makes modifications in terms of the
original discussion on community [1].

Here is the statistics about performance tests:

1. Default format (hitten the builtin formats):

    "git ls-tree <tree-ish>" vs "--format='%(mode) %(type) %(object)%x09%(file)'"

    $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r HEAD"
    Benchmark 1: /opt/git/master/bin/git ls-tree -r HEAD
    Time (mean ± σ):     105.2 ms ±   3.3 ms    [User: 84.3 ms, System: 20.8 ms]
    Range (min … max):    99.2 ms … 113.2 ms    28 runs

    $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object)%x09%(file)'  HEAD"
    Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object)%x09%(file)'  HEAD
    Time (mean ± σ):     106.4 ms ±   2.7 ms    [User: 86.1 ms, System: 20.2 ms]
    Range (min … max):   100.2 ms … 110.5 ms    29 runs

2. Default format includes object size (hitten the builtin formats):

    "git ls-tree -l <tree-ish>" vs "--format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'"

    $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r -l HEAD"
    Benchmark 1: /opt/git/master/bin/git ls-tree -r -l HEAD
    Time (mean ± σ):     335.1 ms ±   6.5 ms    [User: 304.6 ms, System: 30.4 ms]
    Range (min … max):   327.5 ms … 348.4 ms    10 runs

    $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'  HEAD"
    Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'  HEAD
    Time (mean ± σ):     337.2 ms ±   8.2 ms    [User: 309.2 ms, System: 27.9 ms]
    Range (min … max):   328.8 ms … 349.4 ms    10 runs

Links:
	[1] https://public-inbox.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@gmail.com/

Signed-off-by: Teng Long <dyroneteng@gmail.com>
---
 Documentation/git-ls-tree.txt |  51 +++++++++++++-
 builtin/ls-tree.c             | 129 +++++++++++++++++++++++++++++++++-
 t/t3105-ls-tree-format.sh     |  55 +++++++++++++++
 3 files changed, 230 insertions(+), 5 deletions(-)
 create mode 100755 t/t3105-ls-tree-format.sh

Comments

Ævar Arnfjörð Bjarmason Jan. 13, 2022, 7:16 a.m. UTC | #1
On Thu, Jan 13 2022, Teng Long wrote:

> Signed-off-by: Teng Long <dyroneteng@gmail.com>
> ---
>  Documentation/git-ls-tree.txt |  51 +++++++++++++-
>  builtin/ls-tree.c             | 129 +++++++++++++++++++++++++++++++++-
>  t/t3105-ls-tree-format.sh     |  55 +++++++++++++++
>  3 files changed, 230 insertions(+), 5 deletions(-)
>  create mode 100755 t/t3105-ls-tree-format.sh
>
> diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
> index 729370f235..ebdde6eae3 100644
> --- a/Documentation/git-ls-tree.txt
> +++ b/Documentation/git-ls-tree.txt
> @@ -10,9 +10,9 @@ SYNOPSIS
>  --------
>  [verse]
>  'git ls-tree' [-d] [-r] [-t] [-l] [-z]
> -	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
> -	    <tree-ish> [<path>...]
> -
> +	    [--name-only] [--name-status] [--object-only]
> +	    [--full-name] [--full-tree] [--abbrev[=<n>]] 

Let's split up this re-flow only change into its own commit? I.e. the
only non-whitespace change here is beginning with [--format].

If it was the right thing to do to re-flow this then we didn't need
[--format=<format>] to exist to do so...

> +	    [--format=<format>] <tree-ish> [<path>...]
>  DESCRIPTION

Removing this \n breaks the formatting in the file. See "make man && man
./Documentation/git-ls-tree.1". The ./Documentation/doc-diff utility is
also handy for sanity checking the documentation formatting.

>  -----------
>  Lists the contents of a given tree object, like what "/bin/ls -a" does
> @@ -79,6 +79,16 @@ OPTIONS
>  	Do not limit the listing to the current working directory.
>  	Implies --full-name.
>  
> +--format=<format>::
> +	A string that interpolates `%(fieldname)` from the result
> +	being shown. It also interpolates `%%` to `%`, and
> +	`%xx` where `xx` are hex digits interpolates to character
> +	with hex code `xx`; for example `%00` interpolates to
> +	`\0` (NUL), `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
> +	When specified, `--format` cannot be combined with other
> +	format-altering options, including `--long`, `--name-only`
> +	and `--object-only`.
> +

These new docs make sense & seem to cover all the basis, thanks!

> +
> +Default format:
> +
>          <mode> SP <type> SP <object> TAB <file>

Here because we've added --format discussing the previous pseudo-format
as a "default" format becomes confusing. Let's instead say:

        The output format of `ls-tree` is determined by either the `--format` option,
        or other format-altering options such as `--name-long` etc. (see `--format` above).

        The use of certain `--format` directives is equivalent to using those options,
        but invoking the full formatting machinery can be slower than using an appropriate
        formatting option.

        In cases where the `--format` would exactly map to an existing option `ls-tree` will
        use the appropriate faster path. Thus the default format is equivalent to:
        ---
        %(mode) %(type) %(object)%x09%(file)
        ---

Or something like that. We could then discuss e.g. --name-long being
`%(mode) %(type) %(object) %(size:padded)%x09%(file)` when we discuss
that option.

>  This output format is compatible with what `--index-info --stdin` of
> @@ -105,6 +118,38 @@ quoted as explained for the configuration variable `core.quotePath`
>  (see linkgit:git-config[1]).  Using `-z` the filename is output
>  verbatim and the line is terminated by a NUL byte.
>  
> +Customized format:
> +
> +It is possible to print in a custom format by using the `--format` option,
> +which is able to interpolate different fields using a `%(fieldname)` notation.
> +For example, if you want to only print the <object> and <file> fields with a
> +JSON style, executing with a specific "--format" like
> +
> +        git ls-tree --format='{"object":"%(object)", "file":"%(file)"}' <tree-ish>
> +
> +The output format changes to:
> +
> +        {"object":"<object>", "file":"<file>"}

This one-liner is guaranteed to result in invalid JSON on some
repositories, both because JSON is inherently a bad fit for git's data
model (JSON needs to be in one Unicode encoding, Git's tree data might
me in a mixture of encodings), and because it'll break if the file
includes a '"'.

I think it's better to just replace this with some example involving -z,
or at least prominently note that this is broken in the general case,
but can be used ad-hoc to quickly check things with "jq" or whatever.

> +FIELD NAMES
> +-----------
> +
> +Various values from structured fields can be used to interpolate
> +into the resulting output. For each outputing line, the following
> +names can be used:
> +
> +mode::
> +	The mode of the object.
> +type::
> +	The type of the object (`blob` or `tree`).
> +object::
> +	The name of the object.
> +size[:padded]::
> +	The size of the object ("-" if it's a tree).
> +	It also supports a padded format of size with "%(size:padded)".
> +file::
> +	The filename of the object.

In
https://lore.kernel.org/git/cover.1641043500.git.dyroneteng@gmail.com/
you noted that you changed the field names of e.g. "objectname" to
"object" etc. You're right that I picked these as-is from the
git-for-each-ref formatting.

1/3 of your reasoning for doing so was to make it consistent with the
documentation examples of e.g.:

     <mode> SP <type> SP <object> TAB <file>

I think in any case (as noted above) we should change those to use the
--format), so that leaves just:

 - "I prefer to make the name more simple to memorize and type"
 - "I think the names with "object" prefix are [from git-for-each-ref
   and the object* prefixes aren't redundant there, but would be here]".

I think both of those still apply, but I think having these consistent
with git-for-each-ref outweighs the slight benefit of shorter names.

Right now only a handful of things support these sort of --format
directives, but we've already got RFC/WIP patches to add that to
git-cat-file, and are likely to add more in the future.

I'd also like us to eventually be able to combine what are now separate
built-ins with their own --format to expose more deeply some internal
APIs via IPC. E.g. now you can do this:

    git for-each-ref --format='%(refname) %(tree)'

But to list each of those trees you'd need to pipe that output into this
new 'git ls-tree --format. But imagine being able to do something like:

    git for-each-ref --format='%(refname) %(git-ls-tree --format %%(objectname) %(tree))'

Where we'd just invoke git-ls-tree for you without running a full
sub-process. I think both for that hypothetical and working with the two
--formats now having to use %(type) in some places but %(objecttype)
etc. in others is just needlessly confusing. Let's just consistently use
the same format names everywhere.

Specifically for your s/path/file/ name change, that's just inaccurate, consider:

    $ ./git ls-tree --format="%(mode) %(type) %(file)" -t HEAD -- t/README
    040000 tree t
    100644 blob t/README

And:

    $ $ (cd t && ../git ls-tree --format="%(mode) %(type) %(file)" -t -r HEAD -- README)
    040000 tree ./
    100644 blob README

I.e. we talk about <path> in the existing SYNOPSIS for a reason. That we
had a "<file>" in the existing format demo was a bug/shorthand that we
shouldn't be propagating further.

> [...]
> +static const char *format;
> +static const char *default_format = "%(mode) %(type) %(object)%x09%(file)";
> +static const char *long_format = "%(mode) %(type) %(object) %(size:padded)%x09%(file)";
> +static const char *name_only_format = "%(file)";
> +static const char *object_only_format = "%(object)";
> +

One advantage of keeping the variable names I picked in
https://lore.kernel.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@gmail.com/
is that they align, so you can instantly see that the first two are
equivalent until the "%x09".

It also makes it easier to review to avoid such churn, to see what you
really changed I'm looking at a local version of a range-diff where I
renamed these, the struct you renamed etc. back just to see what you
/really/ changed. I.e. what are functional v.s. renaming changes.

>  static int parse_shown_fields(void)
>  {
>  	if (cmdmode == MODE_NAME_ONLY) {
> @@ -76,6 +82,72 @@ static int parse_shown_fields(void)
>  	return 1;
>  }
>  
> +static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
> +			      const enum object_type type, unsigned int padded)
> +{
> +	if (type == OBJ_BLOB) {
> +		unsigned long size;
> +		if (oid_object_info(the_repository, oid, &size) < 0)
> +			die(_("could not get object info about '%s'"),
> +			    oid_to_hex(oid));
> +		if (padded)
> +			strbuf_addf(line, "%7" PRIuMAX, (uintmax_t)size);
> +		else
> +			strbuf_addf(line, "%" PRIuMAX, (uintmax_t)size);

Here you changed my '"%"PRIuMAX' to '"%" PRIuMAX'. The former is the
prevailing style in this codebase, and avoiding the formatting churn
makes the inter-diff easier to read.

> +	} else if (padded) {
> +		strbuf_addf(line, "%7s", "-");
> +	} else {
> +		strbuf_addstr(line, "-");
> +	}
> +}

Ditto some harder to review interdiff due to renaming
churn. I.e. s/line/sb/ in both this and expand_show_tree(). I really
wouldn't care at all except because of all the manual work in reviewing
the inter-diff between my original version & this derived version.

In the case of "line" that's not even an improvement. With a --format
we're not building a "line", the user is free to insert any arbitrary
directives including \n's, so we might be working on multiple lines.

> +test_expect_success 'ls-tree --format usage' '
> +	test_expect_code 129 git ls-tree --format=fmt -l HEAD &&
> +	test_expect_code 129 git ls-tree --format=fmt --name-only HEAD &&
> +	test_expect_code 129 git ls-tree --format=fmt --name-status HEAD &&
> +	test_expect_code 129 git ls-tree --format=fmt --object-only HEAD
> +'

This & several other changes v.s. my version are good, e.g. here I seem
to have repeated the logic error I noted for your version (i.e omitting
"HEAD"), oops!

> +test_expect_success 'setup' '
> +	mkdir dir &&
> +	test_commit dir/sub-file &&
> +	test_commit top-file
> +'
> +
> +test_ls_tree_format () {
> +	format=$1 &&
> +	opts=$2 &&
> +	shift 2 &&
> +	git ls-tree $opts -r HEAD >expect.raw &&
> +	sed "s/^/> /" >expect <expect.raw &&
> +	git ls-tree --format="> $format" -r HEAD >actual &&
> +	test_cmp expect actual
> +}
> +
> +test_expect_success 'ls-tree --format=<default-like>' '
> +	test_ls_tree_format \
> +		"%(mode) %(type) %(object)%x09%(file)" \
> +		""
> +'
> +
> +test_expect_success 'ls-tree --format=<long-like>' '
> +	test_ls_tree_format \
> +		"%(mode) %(type) %(object) %(size:padded)%x09%(file)" \
> +		"--long"
> +'
> +
> +test_expect_success 'ls-tree --format=<name-only-like>' '
> +	test_ls_tree_format \
> +		"%(file)" \
> +		"--name-only"
> +'
> +
> +test_expect_success 'ls-tree --format=<object-only-like>' '
> +	test_ls_tree_format \
> +		"%(object)" \
> +		"--object-only"
> +'
> +
> +test_done

As I noted in my RFC CL (https://lore.kernel.org/git/RFC-cover-0.7-00000000000-20211217T131635Z-avarab@gmail.com/):

	"the tests for ls-tree are really
	lacking. E.g. I seem to have a rather obvious bug in how -t and the
	--format interact here, but no test catches it."

So first, in my version of adding --format I was careful to make
--name-only etc. imply a given --format, and then only at the last
minute would we take the "fast path":
https://lore.kernel.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@gmail.com/

You rewrote that in
https://lore.kernel.org/git/e0add802fbbabde7e7b3743127b2d4047f1ce760.1641043500.git.dyroneteng@gmail.com/
and qremoved the limited "GIT_TEST_LS_TREE_FORMAT_BACKEND" testing I
added, so now the internal --format machinery can't be run through the
existing tests we do have.

Even with that re-added I really wouldn't trust that this code is doing
the right thing (and as noted, I don't trust my own RFC version
either). I think e.g. our "coverage" Makefile targets would be a good
start as a first approximation, i.e. running the /ls-tree/ tests and
seeing if we have full coverage.

> Signed-off-by: Teng Long <dyroneteng@gmail.com>

As I noted in 7/9 I think this patch is 9/9 still mostly something I
wrote, so that the "author" and Signed-off-by should be preserved. The
below is a range-diff of an amended version I've been looking at in
trying to review this. It undoes several (but not all) of your
formatting/renaming-only changes, just so that I could see what the
non-formatting changes were:

1:  6c96dff15c5 ! 1:  917bb168d45 ls-tree: add a --format=<fmt> option
    @@
      ## Metadata ##
    -Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    +Author: Teng Long <dyroneteng@gmail.com>
     
      ## Commit message ##
    -    ls-tree: add a --format=<fmt> option
    +    ls-tree.c: introduce "--format" option
     
         Add a --format option to ls-tree. It has an existing default output,
         and then --long and --name-only options to emit the default output
    @@ Commit message
     
         The --format implementation is slower than the existing code, but this
         change does not cause any performance regressions. We'll leave the
    -    existing show_tree() unchanged, and only run show_tree_format() in if
    +    existing show_tree() unchanged, and only run show_tree_fmt() in if
         a --format different than the hardcoded built-in ones corresponding to
         the existing modes is provided.
     
    -    "Slower" here can bee seen via the the following "hyperfine"
    -    command. This uses GIT_TEST_LS_TREE_FORMAT_BACKEND=<bool> to force the
    -    use of the new backend:
    -
    -        $ hyperfine -L env false,true -L f "-r,-r -l,-r --name-only,-r --format='%(objectname)'" 'GIT_TEST_LS_TREE_FORMAT_BACKEND={env} ./git -C ~/g/linux ls-tree {f} HEAD' -r 10
    -        Benchmark 1: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r HEAD
    -          Time (mean ± σ):      86.1 ms ±   0.6 ms    [User: 65.2 ms, System: 20.9 ms]
    -          Range (min … max):    85.2 ms …  87.5 ms    10 runs
    +    I.e. something like the "--long" output would be much slower with
    +    this, mainly due to how we need to allocate various things to do with
    +    quote.c instead of spewing the output directly to stdout.
     
    -        Benchmark 2: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r HEAD
    -          Time (mean ± σ):     122.5 ms ±   0.6 ms    [User: 101.3 ms, System: 21.1 ms]
    -          Range (min … max):   121.8 ms … 123.4 ms    10 runs
    +    The new option of '--format' comes from Ævar Arnfjörð Bjarmasonn's
    +    idea and suggestion, this commit makes modifications in terms of the
    +    original discussion on community [1].
     
    -        Benchmark 3: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r -l HEAD
    -          Time (mean ± σ):     277.7 ms ±   1.3 ms    [User: 234.6 ms, System: 43.0 ms]
    -          Range (min … max):   275.9 ms … 279.7 ms    10 runs
    +    Here is the statistics about performance tests:
     
    -        Benchmark 4: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r -l HEAD
    -          Time (mean ± σ):     332.8 ms ±   2.6 ms    [User: 282.0 ms, System: 50.7 ms]
    -          Range (min … max):   329.6 ms … 338.2 ms    10 runs
    +    1. Default format (hitten the builtin formats):
     
    -        Benchmark 5: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --name-only HEAD
    -          Time (mean ± σ):      71.8 ms ±   0.4 ms    [User: 54.1 ms, System: 17.6 ms]
    -          Range (min … max):    71.2 ms …  72.5 ms    10 runs
    +        "git ls-tree <tree-ish>" vs "--format='%(mode) %(type) %(object)%x09%(file)'"
     
    -        Benchmark 6: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --name-only HEAD
    -          Time (mean ± σ):      86.6 ms ±   0.5 ms    [User: 65.7 ms, System: 20.7 ms]
    -          Range (min … max):    85.9 ms …  87.4 ms    10 runs
    +        $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r HEAD"
    +        Benchmark 1: /opt/git/master/bin/git ls-tree -r HEAD
    +        Time (mean ± σ):     105.2 ms ±   3.3 ms    [User: 84.3 ms, System: 20.8 ms]
    +        Range (min … max):    99.2 ms … 113.2 ms    28 runs
     
    -        Benchmark 7: GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD
    -          Time (mean ± σ):      85.8 ms ±   0.6 ms    [User: 66.2 ms, System: 19.5 ms]
    -          Range (min … max):    85.0 ms …  86.9 ms    10 runs
    +        $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object)%x09%(file)'  HEAD"
    +        Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object)%x09%(file)'  HEAD
    +        Time (mean ± σ):     106.4 ms ±   2.7 ms    [User: 86.1 ms, System: 20.2 ms]
    +        Range (min … max):   100.2 ms … 110.5 ms    29 runs
     
    -        Benchmark 8: GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD
    -          Time (mean ± σ):      85.3 ms ±   0.2 ms    [User: 66.6 ms, System: 18.7 ms]
    -          Range (min … max):    85.0 ms …  85.7 ms    10 runs
    +    2. Default format includes object size (hitten the builtin formats):
     
    -        Summary
    -          'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --name-only HEAD' ran
    -            1.19 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD'
    -            1.19 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r --format='%(objectname)' HEAD'
    -            1.20 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r HEAD'
    -            1.21 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r --name-only HEAD'
    -            1.71 ± 0.01 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r HEAD'
    -            3.87 ± 0.03 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=false ./git -C ~/g/linux ls-tree -r -l HEAD'
    -            4.64 ± 0.05 times faster than 'GIT_TEST_LS_TREE_FORMAT_BACKEND=true ./git -C ~/g/linux ls-tree -r -l HEAD'
    +        "git ls-tree -l <tree-ish>" vs "--format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'"
     
    -    I.e. something like the "--long" output would be much slower with
    -    this, mainly due to how we need to allocate various things to do with
    -    quote.c instead of spewing the output directly to stdout.
    +        $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r -l HEAD"
    +        Benchmark 1: /opt/git/master/bin/git ls-tree -r -l HEAD
    +        Time (mean ± σ):     335.1 ms ±   6.5 ms    [User: 304.6 ms, System: 30.4 ms]
    +        Range (min … max):   327.5 ms … 348.4 ms    10 runs
     
    -    But even a --format='%(objectname)' is fast with the new backend, so
    -    this is viable as a replacement for adding new formats, and we'll pay
    -    for this added complexity as a one-off, and not again every time a new
    -    format needs to be added. See [1] for an example of what it would
    -    otherwise take to add an --object-name flag.
    +        $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'  HEAD"
    +        Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r --format='%(mode) %(type) %(object) %(size:padded)%x09%(file)'  HEAD
    +        Time (mean ± σ):     337.2 ms ±   8.2 ms    [User: 309.2 ms, System: 27.9 ms]
    +        Range (min … max):   328.8 ms … 349.4 ms    10 runs
     
    -    1. https://lore.kernel.org/git/2e449d1c792ff81da5f22c8bf65ed33c393d62f8.1639721750.git.dyroneteng@gmail.com/
    +    Links:
    +            [1] https://public-inbox.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@gmail.com/
     
    -    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    +    Signed-off-by: Teng Long <dyroneteng@gmail.com>
     
    - ## builtin/ls-tree.c ##
    -@@ builtin/ls-tree.c: static struct pathspec pathspec;
    - static int chomp_prefix;
    - static const char *ls_tree_prefix;
    + ## Documentation/git-ls-tree.txt ##
    +@@ Documentation/git-ls-tree.txt: SYNOPSIS
    + --------
    + [verse]
    + 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
    +-	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
    +-	    <tree-ish> [<path>...]
    ++	    [--name-only] [--name-status] [--object-only]
    ++	    [--full-name] [--full-tree] [--abbrev[=<n>]]
    ++	    [--format=<format>] <tree-ish> [<path>...]
      
    -+/*
    -+ * The format equivalents that show_tree() is prepared to handle.
    -+ */
    -+static const char *ls_tree_format_d = "%(objectmode) %(objecttype) %(objectname)%x09%(path)";
    -+static const char *ls_tree_format_l = "%(objectmode) %(objecttype) %(objectname) %(objectsize:padded)%x09%(path)";
    -+static const char *ls_tree_format_n = "%(path)";
    + DESCRIPTION
    + -----------
    +@@ Documentation/git-ls-tree.txt: OPTIONS
    + 	Do not limit the listing to the current working directory.
    + 	Implies --full-name.
    + 
    ++--format=<format>::
    ++	A string that interpolates `%(fieldname)` from the result
    ++	being shown. It also interpolates `%%` to `%`, and
    ++	`%xx` where `xx` are hex digits interpolates to character
    ++	with hex code `xx`; for example `%00` interpolates to
    ++	`\0` (NUL), `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
    ++	When specified, `--format` cannot be combined with other
    ++	format-altering options, including `--long`, `--name-only`
    ++	and `--object-only`.
     +
    - static const  char * const ls_tree_usage[] = {
    - 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
    - 	NULL
    - };
    + [<path>...]::
    + 	When paths are given, show them (note that this isn't really raw
    + 	pathnames, but rather a list of patterns to match).  Otherwise
    +@@ Documentation/git-ls-tree.txt: OPTIONS
      
    -+struct read_tree_ls_tree_data {
    -+	const char *format;
    -+	struct strbuf sb_scratch;
    -+	struct strbuf sb_tmp;
    -+};
    + Output Format
    + -------------
    ++
    ++Default format:
    ++
    +         <mode> SP <type> SP <object> TAB <file>
    + 
    + This output format is compatible with what `--index-info --stdin` of
    +@@ Documentation/git-ls-tree.txt: quoted as explained for the configuration variable `core.quotePath`
    + (see linkgit:git-config[1]).  Using `-z` the filename is output
    + verbatim and the line is terminated by a NUL byte.
    + 
    ++Customized format:
     +
    ++It is possible to print in a custom format by using the `--format` option,
    ++which is able to interpolate different fields using a `%(fieldname)` notation.
    ++For example, if you want to only print the <object> and <file> fields with a
    ++JSON style, executing with a specific "--format" like
    ++
    ++        git ls-tree --format='{"object":"%(object)", "file":"%(file)"}' <tree-ish>
    ++
    ++The output format changes to:
    ++
    ++        {"object":"<object>", "file":"<file>"}
    ++
    ++FIELD NAMES
    ++-----------
    ++
    ++Various values from structured fields can be used to interpolate
    ++into the resulting output. For each outputing line, the following
    ++names can be used:
    ++
    ++mode::
    ++	The mode of the object.
    ++type::
    ++	The type of the object (`blob` or `tree`).
    ++object::
    ++	The name of the object.
    ++size[:padded]::
    ++	The size of the object ("-" if it's a tree).
    ++	It also supports a padded format of size with "%(size:padded)".
    ++file::
    ++	The filename of the object.
    ++
    + GIT
    + ---
    + Part of the linkgit:git[1] suite
    +
    + ## builtin/ls-tree.c ##
    +@@ builtin/ls-tree.c: static unsigned int shown_fields;
    + #define FIELD_DEFAULT 29 /* 11101 size is not shown to output by default */
    + #define FIELD_LONG_DEFAULT  (FIELD_DEFAULT | FIELD_SIZE)
    + 
     +struct expand_ls_tree_data {
     +	unsigned mode;
     +	enum object_type type;
     +	const struct object_id *oid;
     +	const char *pathname;
    -+	const char *basebuf;
    -+	struct strbuf *sb_scratch;
    -+	struct strbuf *sb_tmp;
    ++	struct strbuf *base;
     +};
     +
    - static int show_recursive(const char *base, size_t baselen, const char *pathname)
    + static const  char * const ls_tree_usage[] = {
    + 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
    + 	NULL
    +@@ builtin/ls-tree.c: enum {
    + 
    + static int cmdmode = MODE_UNSPECIFIED;
    + 
    ++static const char *format;
    ++static const char *ls_tree_format_d = "%(objectmode) %(objecttype) %(objectname)%x09%(path)";
    ++static const char *ls_tree_format_l = "%(objectmode) %(objecttype) %(objectname) %(objectsize:padded)%x09%(path)";
    ++static const char *ls_tree_format_n = "%(path)";
    ++static const char *ls_tree_format_o = "%(objectname)";
    ++
    + static int parse_shown_fields(void)
      {
    - 	int i;
    -@@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen, const char *pathname
    - 	return 0;
    + 	if (cmdmode == MODE_NAME_ONLY) {
    +@@ builtin/ls-tree.c: static int parse_shown_fields(void)
    + 	return 1;
      }
      
    -+static void expand_objectsize(struct strbuf *sb,
    -+			      const struct object_id *oid,
    -+			      const enum object_type type,
    -+			      unsigned int padded)
    ++static void expand_objectsize(struct strbuf *sb, const struct object_id *oid,
    ++			      const enum object_type type, unsigned int padded)
     +{
     +	if (type == OBJ_BLOB) {
     +		unsigned long size;
     +		if (oid_object_info(the_repository, oid, &size) < 0)
    -+			die(_("could not get object info about '%s'"), oid_to_hex(oid));
    ++			die(_("could not get object info about '%s'"),
    ++			    oid_to_hex(oid));
     +		if (padded)
     +			strbuf_addf(sb, "%7"PRIuMAX, (uintmax_t)size);
     +		else
    @@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen, c
     +	}
     +}
     +
    -+static size_t expand_show_tree(struct strbuf *sb,
    -+			       const char *start,
    ++static size_t expand_show_tree(struct strbuf *sb, const char *start,
     +			       void *context)
     +{
     +	struct expand_ls_tree_data *data = context;
     +	const char *end;
     +	const char *p;
    -+	size_t len;
    ++	unsigned int errlen;
    ++	size_t len = strbuf_expand_literal_cb(sb, start, NULL);
     +
    -+	len = strbuf_expand_literal_cb(sb, start, NULL);
     +	if (len)
     +		return len;
    -+
     +	if (*start != '(')
    -+		die(_("bad format as of '%s'"), start);
    ++		die(_("bad ls-tree format: as '%s'"), start);
    ++
     +	end = strchr(start + 1, ')');
     +	if (!end)
    -+		die(_("ls-tree format element '%s' does not end in ')'"),
    -+		    start);
    -+	len = end - start + 1;
    ++		die(_("bad ls-tree format: element '%s' does not end in ')'"), start);
     +
    ++	len = end - start + 1;
     +	if (skip_prefix(start, "(objectmode)", &p)) {
     +		strbuf_addf(sb, "%06o", data->mode);
     +	} else if (skip_prefix(start, "(objecttype)", &p)) {
    @@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen, c
     +	} else if (skip_prefix(start, "(objectsize)", &p)) {
     +		expand_objectsize(sb, data->oid, data->type, 0);
     +	} else if (skip_prefix(start, "(objectname)", &p)) {
    -+		strbuf_addstr(sb, find_unique_abbrev(data->oid, abbrev));
    ++		strbuf_add_unique_abbrev(sb, data->oid, abbrev);
     +	} else if (skip_prefix(start, "(path)", &p)) {
    -+		const char *name = data->basebuf;
    ++		const char *name = data->base->buf;
     +		const char *prefix = chomp_prefix ? ls_tree_prefix : NULL;
    -+
    -+		if (prefix)
    -+			name = relative_path(name, prefix, data->sb_scratch);
    -+		quote_c_style(name, data->sb_tmp, NULL, 0);
    -+		strbuf_add(sb, data->sb_tmp->buf, data->sb_tmp->len);
    -+
    -+		strbuf_reset(data->sb_tmp);
    -+		/* The relative_path() function resets "scratch" */
    ++		struct strbuf quoted = STRBUF_INIT;
    ++		struct strbuf s = STRBUF_INIT;
    ++		strbuf_addstr(data->base, data->pathname);
    ++		name = relative_path(data->base->buf, prefix, &s);
    ++		quote_c_style(name, &quoted, NULL, 0);
    ++		strbuf_addbuf(sb, &quoted);
    ++		strbuf_release(&s);
    ++		strbuf_release(&quoted);
     +	} else {
    -+		unsigned int errlen = (unsigned long)len;
    -+		die(_("bad ls-tree format specifiec %%%.*s"), errlen, start);
    ++		errlen = (unsigned long)len;
    ++		die(_("bad ls-tree format: %%%.*s"), errlen, start);
     +	}
    -+
     +	return len;
     +}
     +
    - static int show_tree_init(enum object_type *type, struct strbuf *base,
    - 			  const char *pathname, unsigned mode, int *retval)
    + static int show_recursive(const char *base, size_t baselen,
    + 			  const char *pathname)
      {
    -@@ builtin/ls-tree.c: static int show_tree_init(enum object_type *type, struct strbuf *base,
    +@@ builtin/ls-tree.c: static int show_recursive(const char *base, size_t baselen,
      	return 0;
      }
      
    +-static int show_default(const struct object_id *oid, enum object_type type,
    +-			const char *pathname, unsigned mode,
    +-			struct strbuf *base)
     +static int show_tree_fmt(const struct object_id *oid, struct strbuf *base,
     +			 const char *pathname, unsigned mode, void *context)
    -+{
    -+	struct read_tree_ls_tree_data *data = context;
    -+	struct expand_ls_tree_data my_data = {
    + {
    +-	size_t baselen = base->len;
    ++	size_t baselen;
    ++	int recurse = 0;
    ++	struct strbuf line = STRBUF_INIT;
    ++	enum object_type type = object_type(mode);
    ++
    ++	struct expand_ls_tree_data data = {
     +		.mode = mode,
    -+		.type = OBJ_BLOB,
    ++		.type = type,
     +		.oid = oid,
     +		.pathname = pathname,
    -+		.sb_scratch = &data->sb_scratch,
    -+		.sb_tmp = &data->sb_tmp,
    ++		.base = base,
     +	};
    -+	struct strbuf sb = STRBUF_INIT;
    -+	int retval = 0;
    -+	size_t baselen;
     +
    -+	if (show_tree_init(&my_data.type, base, pathname, mode, &retval))
    -+		return retval;
    ++	if (type == OBJ_TREE && show_recursive(base->buf, base->len, pathname))
    ++		recurse = READ_TREE_RECURSIVE;
    ++	if (type == OBJ_TREE && recurse && !(ls_options & LS_SHOW_TREES))
    ++		return recurse;
    ++	if (type == OBJ_BLOB && (ls_options & LS_TREE_ONLY))
    ++		return 0;
     +
     +	baselen = base->len;
    -+	strbuf_addstr(base, pathname);
    -+	strbuf_reset(&sb);
    -+	my_data.basebuf = base->buf;
    -+
    -+	strbuf_expand(&sb, data->format, expand_show_tree, &my_data);
    -+	strbuf_addch(&sb, line_termination);
    -+	fwrite(sb.buf, sb.len, 1, stdout);
    ++	strbuf_expand(&line, format, expand_show_tree, &data);
    ++	strbuf_addch(&line, line_termination);
    ++	fwrite(line.buf, line.len, 1, stdout);
    ++	strbuf_release(&line);
     +	strbuf_setlen(base, baselen);
    -+
    -+	return retval;
    ++	return recurse;
     +}
     +
    - static int show_tree(const struct object_id *oid, struct strbuf *base,
    - 		const char *pathname, unsigned mode, void *context)
    - {
    ++static int show_default(struct expand_ls_tree_data *data)
    ++{
    ++	size_t baselen = data->base->len;
    + 
    + 	if (shown_fields & FIELD_SIZE) {
    + 		char size_text[24];
    +-		if (type == OBJ_BLOB) {
    ++		if (data->type == OBJ_BLOB) {
    + 			unsigned long size;
    +-			if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
    ++			if (oid_object_info(the_repository, data->oid, &size) == OBJ_BAD)
    + 				xsnprintf(size_text, sizeof(size_text), "BAD");
    + 			else
    + 				xsnprintf(size_text, sizeof(size_text),
    +@@ builtin/ls-tree.c: static int show_default(const struct object_id *oid, enum object_type type,
    + 		} else {
    + 			xsnprintf(size_text, sizeof(size_text), "-");
    + 		}
    +-		printf("%06o %s %s %7s\t", mode, type_name(type),
    +-		find_unique_abbrev(oid, abbrev), size_text);
    ++		printf("%06o %s %s %7s\t", data->mode, type_name(data->type),
    ++		find_unique_abbrev(data->oid, abbrev), size_text);
    + 	} else {
    +-		printf("%06o %s %s\t", mode, type_name(type),
    +-		find_unique_abbrev(oid, abbrev));
    ++		printf("%06o %s %s\t", data->mode, type_name(data->type),
    ++		find_unique_abbrev(data->oid, abbrev));
    + 	}
    +-	baselen = base->len;
    +-	strbuf_addstr(base, pathname);
    +-	write_name_quoted_relative(base->buf,
    ++	baselen = data->base->len;
    ++	strbuf_addstr(data->base, data->pathname);
    ++	write_name_quoted_relative(data->base->buf,
    + 				   chomp_prefix ? ls_tree_prefix : NULL, stdout,
    + 				   line_termination);
    +-	strbuf_setlen(base, baselen);
    ++	strbuf_setlen(data->base, baselen);
    + 	return 1;
    + }
    + 
    +@@ builtin/ls-tree.c: static int show_tree(const struct object_id *oid, struct strbuf *base,
    + 	size_t baselen;
    + 	enum object_type type = object_type(mode);
    + 
    ++	struct expand_ls_tree_data data = {
    ++		.mode = mode,
    ++		.type = type,
    ++		.oid = oid,
    ++		.pathname = pathname,
    ++		.base = base,
    ++	};
    ++
    + 	if (type == OBJ_TREE && show_recursive(base->buf, base->len, pathname))
    + 		recurse = READ_TREE_RECURSIVE;
    + 	if (type == OBJ_TREE && recurse && !(ls_options & LS_SHOW_TREES))
    +@@ builtin/ls-tree.c: static int show_tree(const struct object_id *oid, struct strbuf *base,
    + 	}
    + 
    + 	if (shown_fields >= FIELD_DEFAULT)
    +-		show_default(oid, type, pathname, mode, base);
    ++		show_default(&data);
    + 
    + 	return recurse;
    + }
     @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *prefix)
      	struct object_id oid;
      	struct tree *tree;
      	int i, full_tree = 0;
    -+	const char *implicit_format = NULL;
    -+	const char *format = NULL;
    -+	struct read_tree_ls_tree_data read_tree_cb_data = {
    -+		.sb_scratch = STRBUF_INIT,
    -+		.sb_tmp = STRBUF_INIT,
    -+	};
    ++	read_tree_fn_t fn = show_tree;
      	const struct option ls_tree_options[] = {
      		OPT_BIT('d', NULL, &ls_options, N_("only show trees"),
      			LS_TREE_ONLY),
    @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *pref
      		OPT_BOOL(0, "full-tree", &full_tree,
      			 N_("list entire tree; not just current directory "
      			    "(implies --full-name)")),
    -+		OPT_STRING_F(0 , "format", &format, N_("format"),
    -+			     N_("format to use for the output"), PARSE_OPT_NONEG),
    ++		OPT_STRING_F(0, "format", &format, N_("format"),
    ++			     N_("format to use for the output"),
    ++			     PARSE_OPT_NONEG),
      		OPT__ABBREV(&abbrev),
      		OPT_END()
      	};
    -+	read_tree_fn_t fn = show_tree;
    - 
    - 	git_config(git_default_config, NULL);
    - 	ls_tree_prefix = prefix;
     @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *prefix)
    - 	if ( (LS_TREE_ONLY|LS_RECURSIVE) ==
      	    ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options))
      		ls_options |= LS_SHOW_TREES;
    -+	if (ls_options & LS_NAME_ONLY)
    -+		implicit_format = ls_tree_format_n;
    -+	if (ls_options & LS_SHOW_SIZE)
    -+		implicit_format = ls_tree_format_l;
    -+
    -+	if (format && implicit_format)
    -+		usage_msg_opt(_("providing --format cannot be combined with other format-altering options"),
    -+			      ls_tree_usage, ls_tree_options);
    -+	if (implicit_format)
    -+		format = implicit_format;
    -+	if (!format)
    -+		format = ls_tree_format_d;
      
    ++	if (format && cmdmode)
    ++		usage_msg_opt(
    ++			_("--format can't be combined with other format-altering options"),
    ++			ls_tree_usage, ls_tree_options);
      	if (argc < 1)
      		usage_with_options(ls_tree_usage, ls_tree_options);
    + 	if (get_oid(argv[0], &oid))
     @@ builtin/ls-tree.c: int cmd_ls_tree(int argc, const char **argv, const char *prefix)
      	tree = parse_tree_indirect(&oid);
      	if (!tree)
      		die("not a tree object");
    +-	return !!read_tree(the_repository, tree,
    +-			   &pathspec, show_tree, NULL);
     +
     +	/*
     +	 * The generic show_tree_fmt() is slower than show_tree(), so
     +	 * take the fast path if possible.
     +	 */
    -+	if (format && (!strcmp(format, ls_tree_format_d) ||
    -+		       !strcmp(format, ls_tree_format_l) ||
    -+		       !strcmp(format, ls_tree_format_n)))
    ++	if (format &&
    ++	    (!strcmp(format, ls_tree_format_d) ||
    ++	     !strcmp(format, ls_tree_format_l) ||
    ++	     !strcmp(format, ls_tree_format_n) ||
    ++	     !strcmp(format, ls_tree_format_o)))
     +		fn = show_tree;
     +	else if (format)
     +		fn = show_tree_fmt;
    -+	/*
    -+	 * Allow forcing the show_tree_fmt(), to test that it can
    -+	 * handle the test suite.
    -+	 */
    -+	if (git_env_bool("GIT_TEST_LS_TREE_FORMAT_BACKEND", 0))
    -+		fn = show_tree_fmt;
     +
    -+	read_tree_cb_data.format = format;
    - 	return !!read_tree(the_repository, tree,
    --			   &pathspec, show_tree, NULL);
    -+			   &pathspec, fn, &read_tree_cb_data);
    ++	return !!read_tree(the_repository, tree, &pathspec, fn, NULL);
      }
     
      ## t/t3105-ls-tree-format.sh (new) ##
    @@ t/t3105-ls-tree-format.sh (new)
     +. ./test-lib.sh
     +
     +test_expect_success 'ls-tree --format usage' '
    -+	test_expect_code 129 git ls-tree --format=fmt -l &&
    -+	test_expect_code 129 git ls-tree --format=fmt --name-only &&
    -+	test_expect_code 129 git ls-tree --format=fmt --name-status
    ++	test_expect_code 129 git ls-tree --format=fmt -l HEAD &&
    ++	test_expect_code 129 git ls-tree --format=fmt --name-only HEAD &&
    ++	test_expect_code 129 git ls-tree --format=fmt --name-status HEAD &&
    ++	test_expect_code 129 git ls-tree --format=fmt --object-only HEAD
     +'
     +
     +test_expect_success 'setup' '
    @@ t/t3105-ls-tree-format.sh (new)
     +	test_ls_tree_format \
     +		"%(path)" \
     +		"--name-only"
    ++'
     +
    ++test_expect_success 'ls-tree --format=<object-only-like>' '
    ++	test_ls_tree_format \
    ++		"%(objectname)" \
    ++		"--object-only"
     +'
     +
     +test_done
Teng Long Jan. 18, 2022, 12:59 p.m. UTC | #2
On Thu, Jan 13, 2022 at 5:04 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> > diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
> > index 729370f235..ebdde6eae3 100644
> > --- a/Documentation/git-ls-tree.txt
> > +++ b/Documentation/git-ls-tree.txt
> > @@ -10,9 +10,9 @@ SYNOPSIS
> >  --------
> >  [verse]
> >  'git ls-tree' [-d] [-r] [-t] [-l] [-z]
> > -         [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
> > -         <tree-ish> [<path>...]
> > -
> > +         [--name-only] [--name-status] [--object-only]
> > +         [--full-name] [--full-tree] [--abbrev[=<n>]]
>
> Let's split up this re-flow only change into its own commit? I.e. the
> only non-whitespace change here is beginning with [--format].
>
> If it was the right thing to do to re-flow this then we didn't need
> [--format=<format>] to exist to do so...

Agree, especially if "--format" comes earlier as you mentioned in
another reply.

The doc change should only include the new "--format" here, so
we just:

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index db02d6d79a..b02f028aca 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -10,5 +10,5 @@ SYNOPSIS
 --------
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
-           [--name-only] [--name-status] [--full-name] [--full-tree] [--abbrev[=<n>]]
+           [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]] [--format=<format>]
            <tree-ish> [<path>...]

is OK.

> > +         [--format=<format>] <tree-ish> [<path>...]
> >  DESCRIPTION
>
> Removing this \n breaks the formatting in the file. See "make man && man
> ./Documentation/git-ls-tree.1". The ./Documentation/doc-diff utility is
> also handy for sanity checking the documentation formatting.

This is my mistake and will be corrected in the next patch.

> >  -----------
> >  Lists the contents of a given tree object, like what "/bin/ls -a" does
> > @@ -79,6 +79,16 @@ OPTION
> >       Do not limit the listing to the current working directory.
> >       Implies --full-name.
> >
> > +--format=<format>::
> > +     A string that interpolates `%(fieldname)` from the result
> > +     being shown. It also interpolates `%%` to `%`, and
> > +     `%xx` where `xx` are hex digits interpolates to character
> > +     with hex code `xx`; for example `%00` interpolates to
> > +     `\0` (NUL), `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
> > +     When specified, `--format` cannot be combined with other
> > +     format-altering options, including `--long`, `--name-only`
> > +     and `--object-only`.
> > +
>
> These new docs make sense & seem to cover all the basis, thanks!

Actually, the content is not from me, I borrowed it from other
documents, but I'm glad if it's described and placed here correctly.

> Here because we've added --format discussing the previous pseudo-format
> as a "default" format becomes confusing. Let's instead say:
>
>         The output format of `ls-tree` is determined by either the `--format` option,
>         or other format-altering options such as `--name-long` etc. (see `--format` above).
>
>         The use of certain `--format` directives is equivalent to using those options,
>         but invoking the full formatting machinery can be slower than using an appropriate
>         formatting option.
>
>         In cases where the `--format` would exactly map to an existing option `ls-tree` will
>         use the appropriate faster path. Thus the default format is equivalent to:
>         ---
>         %(mode) %(type) %(object)%x09%(file)
>         ---

Make sense.

I will use this paragraph instead in next patch except a tiny
nit (s/--name-long/--name-only/).

> > +The output format changes to:
> > +
> > +        {"object":"<object>", "file":"<file>"}
>
> This one-liner is guaranteed to result in invalid JSON on some
> repositories, both because JSON is inherently a bad fit for git's data
> model (JSON needs to be in one Unicode encoding, Git's tree data might
> me in a mixture of encodings), and because it'll break if the file
> includes a '"'.

Correct and especially we use a "quote_c" style behind.

> I think it's better to just replace this with some example involving -z,
> or at least prominently note that this is broken in the general case,
> but can be used ad-hoc to quickly check things with "jq" or whatever.

Your suggestion is great, but personally I don't want to introduce more
complexity and other tools in here, and try to describe it in a simple way.
I think the below maybe is enough:

@@ -117,14 +127,10 @@ Customized format:
 
 It is possible to print in a custom format by using the `--format` option,
 which is able to interpolate different fields using a `%(fieldname)` notation.
-For example, if you want to only print the <object> and <file> fields with a
-JSON style, executing with a specific "--format" like
-
-        git ls-tree --format='{"object":"%(object)", "file":"%(file)"}' <tree-ish>
-
-The output format changes to:
+For example, if you only care about the <object> and <file> fields, you can
+execute with a specific "--format" like
 
-        {"object":"<object>", "file":"<file>"}
+        git ls-tree --format="%(object) %(file)" <tree-ish>
 
 FIELD NAMES
 -----------

> > +FIELD NAMES
> > +-----------
> > +
> > +Various values from structured fields can be used to interpolate
> > +into the resulting output. For each outputing line, the following
> > +names can be used:
> > +
> > +mode::
> > +     The mode of the object.
> > +type::
> > +     The type of the object (`blob` or `tree`).
> > +object::
> > +     The name of the object.
> > +size[:padded]::
> > +     The size of the object ("-" if it's a tree).
> > +     It also supports a padded format of size with "%(size:padded)".
> > +file::
> > +     The filename of the object.
>
> In
> https://lore.kernel.org/git/cover.1641043500.git.dyroneteng@gmail.com/
> you noted that you changed the field names of e.g. "objectname" to
> "object" etc. You're right that I picked these as-is from the
> git-for-each-ref formatting.
>
> 1/3 of your reasoning for doing so was to make it consistent with the
> documentation examples of e.g.:
>
>      <mode> SP <type> SP <object> TAB <file>
>
> I think in any case (as noted above) we should change those to use the
> --format), so that leaves just:
>
>  - "I prefer to make the name more simple to memorize and type"
>  - "I think the names with "object" prefix are [from git-for-each-ref
>    and the object* prefixes aren't redundant there, but would be here]".
>
> I think both of those still apply, but I think having these consistent
> with git-for-each-ref outweighs the slight benefit of shorter names.
>
> Right now only a handful of things support these sort of --format
> directives, but we've already got RFC/WIP patches to add that to
> git-cat-file, and are likely to add more in the future.

New and important input for me on this.

> I'd also like us to eventually be able to combine what are now separate
> built-ins with their own --format to expose more deeply some internal
> APIs via IPC. E.g. now you can do this:
>
>     git for-each-ref --format='%(refname) %(tree)'
>
> But to list each of those trees you'd need to pipe that output into this
> new 'git ls-tree --format. But imagine being able to do something like:
>
>     git for-each-ref --format='%(refname) %(git-ls-tree --format %%(objectname) %(tree))'

Make sense.

> Where we'd just invoke git-ls-tree for you without running a full
> sub-process. I think both for that hypothetical and working with the two
> --formats now having to use %(type) in some places but %(objecttype)
> etc. in others is just needlessly confusing. Let's just consistently use
> the same format names everywhere.
>
> Specifically for your s/path/file/ name change, that's just inaccurate, consider:
>
>     $ ./git ls-tree --format="%(mode) %(type) %(file)" -t HEAD -- t/README
>     040000 tree t
>     100644 blob t/README
>
> And:
>
>     $ $ (cd t && ../git ls-tree --format="%(mode) %(type) %(file)" -t -r HEAD -- README)
>     040000 tree ./
>     100644 blob README
>
> I.e. we talk about <path> in the existing SYNOPSIS for a reason. That we
> had a "<file>" in the existing format demo was a bug/shorthand that we
> shouldn't be propagating further.

Should use "path" instead of "file" in here.

Make sense.

> > [...]
> > +static const char *format;
> > +static const char *default_format = "%(mode) %(type) %(object)%x09%(file)";
> > +static const char *long_format = "%(mode) %(type) %(object) %(size:padded)%x09%(file)";
> > +static const char *name_only_format = "%(file)";
> > +static const char *object_only_format = "%(object)";
> > +
>
> One advantage of keeping the variable names I picked in
> https://lore.kernel.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@gmail.com/
> is that they align, so you can instantly see that the first two are
> equivalent until the "%x09".

Ha. Thanks.I think they not align in here is because my variables' names not align :)  

Actually, I was hesitating to use like "object" or follow the "objectname" like
rules. I would like git to have unified naming style on this, but I didn't have
that much input (other usage already on the way) at the time, so I chose to use
a shorter and probably more memorable name based on the document.

But now, I agree with you, to use the same naming conventions because on the whole,
especially multiple commands have the same appeal on format naming, uniformity is
more important than memorability of a single command, and I also think maybe we
might need to describe and maintain these rules of <fieldname> in a document
somewhere.

> It also makes it easier to review to avoid such churn, to see what you
> really changed I'm looking at a local version of a range-diff where I
> renamed these, the struct you renamed etc. back just to see what you
> /really/ changed. I.e. what are functional v.s. renaming changes.
>
> Here you changed my '"%"PRIuMAX' to '"%" PRIuMAX'. The former is the
> prevailing style in this codebase, and avoiding the formatting churn
> makes the inter-diff easier to read.

Will fix it in next patch.

> Ditto some harder to review interdiff due to renaming
> churn. I.e. s/line/sb/ in both this and expand_show_tree(). I really
> wouldn't care at all except because of all the manual work in reviewing
> the inter-diff between my original version & this derived version.
>
> In the case of "line" that's not even an improvement. With a --format
> we're not building a "line", the user is free to insert any arbitrary
> directives including \n's, so we might be working on multiple lines.

Make sense.

Will optimize in next patch.

> As I noted in my RFC CL (https://lore.kernel.org/git/RFC-cover-0.7-00000000000-20211217T131635Z-avarab@gmail.com/):
>
>         "the tests for ls-tree are really
>         lacking. E.g. I seem to have a rather obvious bug in how -t and the
>         --format interact here, but no test catches it."
>
> So first, in my version of adding --format I was careful to make
> --name-only etc. imply a given --format, and then only at the last
> minute would we take the "fast path":
> https://lore.kernel.org/git/RFC-patch-6.7-eac299f06ff-20211217T131635Z-avarab@gmail.com/

I'm not sure I understand all the meanings above. I think I have forgot about the content here.
and I have to add some tests for other options like "-t" combined with "--format".

Please correct me if I misunderstood.

> You rewrote that in
> https://lore.kernel.org/git/e0add802fbbabde7e7b3743127b2d4047f1ce760.1641043500.git.dyroneteng@gmail.com/
> and qremoved the limited "GIT_TEST_LS_TREE_FORMAT_BACKEND" testing I
> added, so now the internal --format machinery can't be run through the
> existing tests we do have.

I thought the "GIT_TEST_LS_TREE_FORMAT_BACKEND" is only used in your RFC for testing conveniently,
and should be removed in the end (non-RFC). So I removed it...

So should we add it back?

> Even with that re-added I really wouldn't trust that this code is doing
> the right thing (and as noted, I don't trust my own RFC version
> either). I think e.g. our "coverage" Makefile targets would be a good
> start as a first approximation, i.e. running the /ls-tree/ tests and
> seeing if we have full coverage.

Yeah. I haven't tried it yet, but I will try to run coverage detection when
the next patch is completed

> As I noted in 7/9 I think this patch is 9/9 still mostly something I
> wrote, so that the "author" and Signed-off-by should be preserved. The
> below is a range-diff of an amended version I've been looking at in
> trying to review this. It undoes several (but not all) of your
> formatting/renaming-only changes, just so that I could see what the
> non-formatting changes were:

Sorry for that. I misunderstand something here.
I will fix in the next patch.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 729370f235..ebdde6eae3 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -10,9 +10,9 @@  SYNOPSIS
 --------
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
-	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
-	    <tree-ish> [<path>...]
-
+	    [--name-only] [--name-status] [--object-only]
+	    [--full-name] [--full-tree] [--abbrev[=<n>]]
+	    [--format=<format>] <tree-ish> [<path>...]
 DESCRIPTION
 -----------
 Lists the contents of a given tree object, like what "/bin/ls -a" does
@@ -79,6 +79,16 @@  OPTIONS
 	Do not limit the listing to the current working directory.
 	Implies --full-name.
 
+--format=<format>::
+	A string that interpolates `%(fieldname)` from the result
+	being shown. It also interpolates `%%` to `%`, and
+	`%xx` where `xx` are hex digits interpolates to character
+	with hex code `xx`; for example `%00` interpolates to
+	`\0` (NUL), `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
+	When specified, `--format` cannot be combined with other
+	format-altering options, including `--long`, `--name-only`
+	and `--object-only`.
+
 [<path>...]::
 	When paths are given, show them (note that this isn't really raw
 	pathnames, but rather a list of patterns to match).  Otherwise
@@ -87,6 +97,9 @@  OPTIONS
 
 Output Format
 -------------
+
+Default format:
+
         <mode> SP <type> SP <object> TAB <file>
 
 This output format is compatible with what `--index-info --stdin` of
@@ -105,6 +118,38 @@  quoted as explained for the configuration variable `core.quotePath`
 (see linkgit:git-config[1]).  Using `-z` the filename is output
 verbatim and the line is terminated by a NUL byte.
 
+Customized format:
+
+It is possible to print in a custom format by using the `--format` option,
+which is able to interpolate different fields using a `%(fieldname)` notation.
+For example, if you want to only print the <object> and <file> fields with a
+JSON style, executing with a specific "--format" like
+
+        git ls-tree --format='{"object":"%(object)", "file":"%(file)"}' <tree-ish>
+
+The output format changes to:
+
+        {"object":"<object>", "file":"<file>"}
+
+FIELD NAMES
+-----------
+
+Various values from structured fields can be used to interpolate
+into the resulting output. For each outputing line, the following
+names can be used:
+
+mode::
+	The mode of the object.
+type::
+	The type of the object (`blob` or `tree`).
+object::
+	The name of the object.
+size[:padded]::
+	The size of the object ("-" if it's a tree).
+	It also supports a padded format of size with "%(size:padded)".
+file::
+	The filename of the object.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 56cc166adb..e048a68ee0 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -57,6 +57,12 @@  enum {
 
 static int cmdmode = MODE_UNSPECIFIED;
 
+static const char *format;
+static const char *default_format = "%(mode) %(type) %(object)%x09%(file)";
+static const char *long_format = "%(mode) %(type) %(object) %(size:padded)%x09%(file)";
+static const char *name_only_format = "%(file)";
+static const char *object_only_format = "%(object)";
+
 static int parse_shown_fields(void)
 {
 	if (cmdmode == MODE_NAME_ONLY) {
@@ -76,6 +82,72 @@  static int parse_shown_fields(void)
 	return 1;
 }
 
+static void expand_objectsize(struct strbuf *line, const struct object_id *oid,
+			      const enum object_type type, unsigned int padded)
+{
+	if (type == OBJ_BLOB) {
+		unsigned long size;
+		if (oid_object_info(the_repository, oid, &size) < 0)
+			die(_("could not get object info about '%s'"),
+			    oid_to_hex(oid));
+		if (padded)
+			strbuf_addf(line, "%7" PRIuMAX, (uintmax_t)size);
+		else
+			strbuf_addf(line, "%" PRIuMAX, (uintmax_t)size);
+	} else if (padded) {
+		strbuf_addf(line, "%7s", "-");
+	} else {
+		strbuf_addstr(line, "-");
+	}
+}
+
+static size_t expand_show_tree(struct strbuf *line, const char *start,
+			       void *context)
+{
+	struct show_tree_data *data = context;
+	const char *end;
+	const char *p;
+	unsigned int errlen;
+	size_t len = strbuf_expand_literal_cb(line, start, NULL);
+
+	if (len)
+		return len;
+	if (*start != '(')
+		die(_("bad ls-tree format: as '%s'"), start);
+
+	end = strchr(start + 1, ')');
+	if (!end)
+		die(_("bad ls-tree format: element '%s' does not end in ')'"), start);
+
+	len = end - start + 1;
+	if (skip_prefix(start, "(mode)", &p)) {
+		strbuf_addf(line, "%06o", data->mode);
+	} else if (skip_prefix(start, "(type)", &p)) {
+		strbuf_addstr(line, type_name(data->type));
+	} else if (skip_prefix(start, "(size:padded)", &p)) {
+		expand_objectsize(line, data->oid, data->type, 1);
+	} else if (skip_prefix(start, "(size)", &p)) {
+		expand_objectsize(line, data->oid, data->type, 0);
+	} else if (skip_prefix(start, "(object)", &p)) {
+		strbuf_add_unique_abbrev(line, data->oid, abbrev);
+	} else if (skip_prefix(start, "(file)", &p)) {
+		const char *name = data->base->buf;
+		const char *prefix = chomp_prefix ? ls_tree_prefix : NULL;
+		struct strbuf quoted = STRBUF_INIT;
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(data->base, data->pathname);
+		name = relative_path(data->base->buf, prefix, &sb);
+		quote_c_style(name, &quoted, NULL, 0);
+		strbuf_addbuf(line, &quoted);
+		strbuf_release(&sb);
+		strbuf_release(&quoted);
+	} else {
+		errlen = (unsigned long)len;
+		die(_("bad ls-tree format: %%%.*s"), errlen, start);
+	}
+	return len;
+}
+
 static int show_recursive(const char *base, size_t baselen,
 			  const char *pathname)
 {
@@ -116,6 +188,38 @@  static enum object_type get_type(unsigned int mode)
 	        : OBJ_BLOB);
 }
 
+static int show_tree_fmt(const struct object_id *oid, struct strbuf *base,
+			 const char *pathname, unsigned mode, void *context)
+{
+	size_t baselen;
+	int recurse = 0;
+	struct strbuf line = STRBUF_INIT;
+	enum object_type type = get_type(mode);
+
+	struct show_tree_data data = {
+		.mode = mode,
+		.type = type,
+		.oid = oid,
+		.pathname = pathname,
+		.base = base,
+	};
+
+	if (type == OBJ_TREE && show_recursive(base->buf, base->len, pathname))
+		recurse = READ_TREE_RECURSIVE;
+	if (type == OBJ_TREE && recurse && !(ls_options & LS_SHOW_TREES))
+		return recurse;
+	if (type == OBJ_BLOB && (ls_options & LS_TREE_ONLY))
+		return 0;
+
+	baselen = base->len;
+	strbuf_expand(&line, format, expand_show_tree, &data);
+	strbuf_addch(&line, line_termination);
+	fwrite(line.buf, line.len, 1, stdout);
+	strbuf_release(&line);
+	strbuf_setlen(base, baselen);
+	return recurse;
+}
+
 static int show_default(struct show_tree_data *data)
 {
 	size_t baselen = data->base->len;
@@ -195,6 +299,7 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	struct object_id oid;
 	struct tree *tree;
 	int i, full_tree = 0;
+	read_tree_fn_t fn = show_tree;
 	const struct option ls_tree_options[] = {
 		OPT_BIT('d', NULL, &ls_options, N_("only show trees"),
 			LS_TREE_ONLY),
@@ -217,6 +322,9 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "full-tree", &full_tree,
 			 N_("list entire tree; not just current directory "
 			    "(implies --full-name)")),
+		OPT_STRING_F(0, "format", &format, N_("format"),
+			     N_("format to use for the output"),
+			     PARSE_OPT_NONEG),
 		OPT__ABBREV(&abbrev),
 		OPT_END()
 	};
@@ -237,6 +345,10 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	    ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options))
 		ls_options |= LS_SHOW_TREES;
 
+	if (format && cmdmode)
+		usage_msg_opt(
+			_("--format can't be combined with other format-altering options"),
+			ls_tree_usage, ls_tree_options);
 	if (argc < 1)
 		usage_with_options(ls_tree_usage, ls_tree_options);
 	if (get_oid(argv[0], &oid))
@@ -260,6 +372,19 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	tree = parse_tree_indirect(&oid);
 	if (!tree)
 		die("not a tree object");
-	return !!read_tree(the_repository, tree,
-			   &pathspec, show_tree, NULL);
+
+	/*
+	 * The generic show_tree_fmt() is slower than show_tree(), so
+	 * take the fast path if possible.
+	 */
+	if (format &&
+	    (!strcmp(format, default_format) ||
+	     !strcmp(format, long_format) ||
+	     !strcmp(format, name_only_format) ||
+	     !strcmp(format, object_only_format)))
+		fn = show_tree;
+	else if (format)
+		fn = show_tree_fmt;
+
+	return !!read_tree(the_repository, tree, &pathspec, fn, NULL);
 }
diff --git a/t/t3105-ls-tree-format.sh b/t/t3105-ls-tree-format.sh
new file mode 100755
index 0000000000..ea0f51d866
--- /dev/null
+++ b/t/t3105-ls-tree-format.sh
@@ -0,0 +1,55 @@ 
+#!/bin/sh
+
+test_description='ls-tree --format'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'ls-tree --format usage' '
+	test_expect_code 129 git ls-tree --format=fmt -l HEAD &&
+	test_expect_code 129 git ls-tree --format=fmt --name-only HEAD &&
+	test_expect_code 129 git ls-tree --format=fmt --name-status HEAD &&
+	test_expect_code 129 git ls-tree --format=fmt --object-only HEAD
+'
+
+test_expect_success 'setup' '
+	mkdir dir &&
+	test_commit dir/sub-file &&
+	test_commit top-file
+'
+
+test_ls_tree_format () {
+	format=$1 &&
+	opts=$2 &&
+	shift 2 &&
+	git ls-tree $opts -r HEAD >expect.raw &&
+	sed "s/^/> /" >expect <expect.raw &&
+	git ls-tree --format="> $format" -r HEAD >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'ls-tree --format=<default-like>' '
+	test_ls_tree_format \
+		"%(mode) %(type) %(object)%x09%(file)" \
+		""
+'
+
+test_expect_success 'ls-tree --format=<long-like>' '
+	test_ls_tree_format \
+		"%(mode) %(type) %(object) %(size:padded)%x09%(file)" \
+		"--long"
+'
+
+test_expect_success 'ls-tree --format=<name-only-like>' '
+	test_ls_tree_format \
+		"%(file)" \
+		"--name-only"
+'
+
+test_expect_success 'ls-tree --format=<object-only-like>' '
+	test_ls_tree_format \
+		"%(object)" \
+		"--object-only"
+'
+
+test_done