
[v4,0/2] rev-list: print additional missing object information

Message ID 20250205004147.887106-1-jltobler@gmail.com
Series rev-list: print additional missing object information

Message

Justin Tobler Feb. 5, 2025, 12:41 a.m. UTC
Greetings,

It is possible to configure git-rev-list(1) to print the OID of missing
objects by setting the `--missing=print` option. While it is useful to
know about these objects, it would be nice to have even more context
about the objects that are missing. Luckily, additional information
about a missing object can be inferred from an object that references
it. For example, if the tree containing a missing blob still exists,
the tree entry for the missing object contains its path and type
information.
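To make the tree-entry inference concrete, here is a minimal C sketch. It uses a hypothetical, simplified `struct tree_entry_model` (not git's internal tree-walking API) and a hypothetical helper `type_from_mode()` to show that an entry's mode alone tells us what kind of object the missing OID was supposed to be: regular files and symlinks are blobs, subdirectories are trees, and gitlinks (submodules) are commits.

```c
/*
 * Hypothetical, simplified stand-in for a tree entry; not git's own
 * struct, just enough to show the inference.
 */
struct tree_entry_model {
	const char *path;   /* entry name recorded in the parent tree */
	unsigned int mode;  /* octal mode recorded in the parent tree */
};

/*
 * Infer the type of the object an entry points to from its mode:
 * regular files (100644, 100755) and symlinks (120000) are blobs,
 * directories (040000) are trees, gitlinks (160000) are commits.
 */
static const char *type_from_mode(unsigned int mode)
{
	switch (mode & 0170000) {  /* keep only the file-type bits */
	case 0040000:
		return "tree";
	case 0160000:
		return "commit";
	case 0100000:
	case 0120000:
		return "blob";
	default:
		return "unknown";  /* not a mode git writes into trees */
	}
}
```

So even when the blob behind `dir/file.txt` is gone, its parent tree still records the name `dir/file.txt` and mode `100644`, which is enough to report `path=dir/file.txt type=blob`.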

This series aims to provide git-rev-list(1) with a new `print-info`
missing action for the `--missing` option that, when set, behaves like
the existing `print` action but also prints other potentially
interesting information about the missing object.

Missing object info is printed in the form `?<oid> [<token>=<value>]...`
where any number of `<token>=<value>` pairs may follow, each separated
from the previous field by a SP. Values that contain SP or LF
characters are expected to be encoded so that these problematic bytes
cannot break the format. For missing object path information, this is
done by quoting the path in the C style if it contains SP or other
special characters.
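The output format can be illustrated with a small C sketch that builds one `?<oid> path=<path> type=<type>` line. The quoting rule is a simplified stand-in for git's C-style quoting, not a copy of `quote_path()`: it quotes only when the path contains SP, a double quote, a backslash or a control byte; escapes `"`, `\`, TAB and LF by name and other control bytes in octal; and passes bytes >= 0x80 through unchanged (roughly the `core.quotePath=false` behavior). The names `quote_path_sketch()` and `format_missing_info()` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified rule: does this byte force the value to be quoted? */
static int needs_quote(unsigned char c)
{
	return c == ' ' || c == '"' || c == '\\' || c < 0x20 || c == 0x7f;
}

/*
 * Write `path` into `out`, wrapped in double quotes with C-style
 * escapes if any byte needs quoting, verbatim otherwise. `out` must
 * hold at least 4 * strlen(path) + 3 bytes (worst case: every byte
 * becomes a 4-byte octal escape, plus two quotes and a NUL).
 */
static void quote_path_sketch(const char *path, char *out)
{
	const unsigned char *p;
	int quote = 0;

	for (p = (const unsigned char *)path; *p; p++)
		if (needs_quote(*p))
			quote = 1;
	if (!quote) {
		strcpy(out, path);
		return;
	}

	*out++ = '"';
	for (p = (const unsigned char *)path; *p; p++) {
		switch (*p) {
		case '"':  *out++ = '\\'; *out++ = '"';  break;
		case '\\': *out++ = '\\'; *out++ = '\\'; break;
		case '\t': *out++ = '\\'; *out++ = 't';  break;
		case '\n': *out++ = '\\'; *out++ = 'n';  break;
		default:
			if (*p < 0x20 || *p == 0x7f) {
				/* remaining control bytes as 3-digit octal, e.g. \001 */
				sprintf(out, "\\%03o", (unsigned int)*p);
				out += 4;
			} else {
				*out++ = (char)*p;  /* SP and other bytes pass through */
			}
		}
	}
	*out++ = '"';
	*out = '\0';
}

/*
 * Format one missing-object line as "?<oid> path=<path> type=<type>".
 * Returns the snprintf() result, or -1 on allocation failure.
 */
static int format_missing_info(char *line, size_t cap, const char *oid,
			       const char *path, const char *type)
{
	char *quoted = malloc(4 * strlen(path) + 3);
	int ret;

	if (!quoted)
		return -1;
	quote_path_sketch(path, quoted);
	ret = snprintf(line, cap, "?%s path=%s type=%s", oid, quoted, type);
	free(quoted);
	return ret;
}
```

With this sketch, `dir/file.txt` is printed as-is, while a path like `foo "bar` becomes `path="foo \"bar"`. Ordinary paths stay readable, and only paths that would otherwise break the SP-separated layout get quoted.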

One concern I currently have with this quoting approach is that it is a
bit harder to parse by machine than, say, using a NUL byte to delimit
the missing info values. One option is to introduce, in a follow-up
series, a git-for-each-ref(1) style format syntax. Maybe something
like: `--missing=print-info:%(path)%00%(type)`. I'm curious whether
anyone has thoughts on this. My goal is to ensure that there is an
easy-to-use, machine-parsable interface to get this information. I can
see something like `?<oid> path="foo \"bar" type=blob` being a bit
complex to parse.
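To illustrate why the quoted form is harder for tools to consume, here is a hedged C sketch of a parser for the SP-separated format. Because a quoted value may itself contain SP, a reader cannot simply split on spaces: it has to recognize the opening quote and decode escapes to find where the value ends. The struct layout, size limits and function names are assumptions for illustration, and the escape handling covers only `\"`, `\\`, `\t`, `\n` and three-digit octal escapes.

```c
#include <string.h>

#define MAX_PAIRS 8

/* Hypothetical parsed form of one "?<oid> [<token>=<value>]..." line. */
struct missing_info {
	char oid[65];                 /* room for a SHA-256 hex OID */
	int nr;                       /* number of token=value pairs */
	char key[MAX_PAIRS][32];
	char value[MAX_PAIRS][256];
};

/*
 * Decode a C-quoted value; `s` points at the opening '"'. Writes the
 * decoded bytes to `out` and returns a pointer just past the closing
 * '"', or NULL on malformed input or overflow.
 */
static const char *unquote_c(const char *s, char *out, size_t cap)
{
	size_t n = 0;

	s++; /* skip opening quote */
	while (*s && *s != '"') {
		char c = *s++;

		if (c == '\\') {
			c = *s++;
			switch (c) {
			case 'n': c = '\n'; break;
			case 't': c = '\t'; break;
			case '"': case '\\': break;
			default:
				/* three-digit octal escape such as \001 */
				if (c >= '0' && c <= '3' &&
				    s[0] >= '0' && s[0] <= '7' &&
				    s[1] >= '0' && s[1] <= '7') {
					c = (char)(((c - '0') << 6) |
						   ((s[0] - '0') << 3) |
						   (s[1] - '0'));
					s += 2;
				} else {
					return NULL;
				}
			}
		}
		if (n + 1 >= cap)
			return NULL;
		out[n++] = c;
	}
	if (*s != '"')
		return NULL; /* unterminated quote */
	out[n] = '\0';
	return s + 1;
}

/* Parse one line (without trailing LF). Returns 0 on success, -1 otherwise. */
static int parse_missing_info(const char *line, struct missing_info *info)
{
	const char *p = line;
	size_t len;

	if (*p++ != '?')
		return -1;
	len = strcspn(p, " ");
	if (len == 0 || len >= sizeof(info->oid))
		return -1;
	memcpy(info->oid, p, len);
	info->oid[len] = '\0';
	p += len;

	info->nr = 0;
	while (*p == ' ') {
		p++;
		len = strcspn(p, "= ");
		if (p[len] != '=' || len == 0 ||
		    len >= sizeof(info->key[0]) || info->nr >= MAX_PAIRS)
			return -1;
		memcpy(info->key[info->nr], p, len);
		info->key[info->nr][len] = '\0';
		p += len + 1;

		if (*p == '"') {
			/* Quoted value: may contain SP, so decode to find its end. */
			p = unquote_c(p, info->value[info->nr],
				      sizeof(info->value[0]));
			if (!p)
				return -1;
		} else {
			/* Unquoted value: ends at the next SP. */
			len = strcspn(p, " ");
			if (len >= sizeof(info->value[0]))
				return -1;
			memcpy(info->value[info->nr], p, len);
			info->value[info->nr][len] = '\0';
			p += len;
		}
		info->nr++;
	}
	return *p == '\0' ? 0 : -1;
}
```

By contrast, with a NUL-delimited layout a reader could split on NUL directly, with no unquoting step at all.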

The series is set up as follows:

- Patch 1 introduces the `print-info` missing action and supports
  printing missing object path information.

- Patch 2 extends the `print-info` missing action to also print object
  type information about the missing object.

Changes in V4:

- The core.quotePath behavior is no longer force-enabled for the missing
  info values. Consequently, the first two patches from the previous
  version are dropped.

Thanks,
-Justin

Justin Tobler (2):
  rev-list: add print-info action to print missing object path
  rev-list: extend print-info to print missing object type

 Documentation/rev-list-options.txt |  19 ++++++
 builtin/rev-list.c                 | 106 ++++++++++++++++++++++++-----
 t/t6022-rev-list-missing.sh        |  53 +++++++++++++++
 3 files changed, 161 insertions(+), 17 deletions(-)

Range-diff against v3:
1:  f628728300 < -:  ---------- quote: add c quote flag to ignore core.quotePath
2:  53a3811d8f < -:  ---------- quote: add quote_path() flag to ignore config
3:  fe7a3da8de ! 1:  e3d5295b4d rev-list: add print-info action to print missing object path
    @@ builtin/rev-list.c: static off_t get_object_disk_usage(struct object *obj)
     +		struct strbuf path = STRBUF_INIT;
     +
     +		strbuf_addstr(&sb, " path=");
    -+		quote_path(entry->path, NULL, &path,
    -+			   QUOTE_PATH_QUOTE_SP | QUOTE_PATH_IGNORE_CONFIG);
    ++		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     +		strbuf_addbuf(&sb, &path);
     +
     +		strbuf_release(&path);
4:  788b497d00 = 2:  6aa71444d3 rev-list: extend print-info to print missing object type

base-commit: b74ff38af58464688b211140b90ec90598d340c6

Comments

Christian Couder Feb. 5, 2025, 10:35 a.m. UTC | #1
On Wed, Feb 5, 2025 at 1:45 AM Justin Tobler <jltobler@gmail.com> wrote:

> Changes in V4:
>
> - The core.quotePath behavior is no longer force-enabled for the missing
>   info values. Consequently, the first two patches from the previous
>   version are dropped.

This v4 looks good to me. Ack!

Junio C Hamano Feb. 5, 2025, 1:18 p.m. UTC | #2
Justin Tobler <jltobler@gmail.com> writes:

> One concern I currently have with this quoting approach is that it is a
> bit harder to parse by machine than, say, using a NUL byte to delimit
> the missing info values. One option is to introduce, in a follow-up
> series, a git-for-each-ref(1) style format syntax. Maybe something
> like: `--missing=print-info:%(path)%00%(type)`. I'm curious whether
> anyone has thoughts on this.

Would it be so bad if we said that in -z mode with --info option,
each record is terminated with two NUL bytes, and elements on a list
of var=value pairs have a single NUL in between, or something silly
like that?  The point is to get away with just a fixed format,
without any customization.
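A minimal C sketch of the fixed NUL-delimited layout described above might look like the following. It assumes, hypothetically, that the first element of each record is `?<oid>` and the rest are `var=value` pairs. Each element is followed by a single NUL, and one more NUL ends the record, so the last element is followed by two NULs. Since a tree entry name cannot contain NUL, the path is written raw with no quoting. `write_record()` and `read_record()` are illustrative names, not anything in git.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append one element followed by its NUL separator: "<key>=<value>\0",
 * or just "<value>\0" when key is NULL. Returns the new length, or 0
 * if it would not fit in `cap` bytes.
 */
static size_t put_elem(char *buf, size_t len, size_t cap,
		       const char *key, const char *value)
{
	size_t klen = key ? strlen(key) + 1 : 0; /* key plus '=' */
	size_t vlen = strlen(value) + 1;         /* value plus NUL */

	if (len + klen + vlen > cap)
		return 0;
	if (key) {
		memcpy(buf + len, key, klen - 1);
		buf[len + klen - 1] = '=';
	}
	memcpy(buf + len + klen, value, vlen);
	return len + klen + vlen;
}

/*
 * Write "?<oid>\0path=<path>\0type=<type>\0\0". Returns the record
 * length, or 0 if `cap` is too small.
 */
static size_t write_record(char *buf, size_t cap, const char *oid,
			   const char *path, const char *type)
{
	size_t len = 0;

	if (cap < 1)
		return 0;
	buf[len++] = '?';
	if (!(len = put_elem(buf, len, cap, NULL, oid)) ||
	    !(len = put_elem(buf, len, cap, "path", path)) ||
	    !(len = put_elem(buf, len, cap, "type", type)) ||
	    len + 1 > cap)
		return 0;
	buf[len++] = '\0'; /* second NUL terminates the record */
	return len;
}

/*
 * Call `cb` for each element of the record at the start of `buf`.
 * Returns the bytes consumed, including the terminating NUL pair, or
 * 0 if the record is incomplete. No element is ever empty, so an empty
 * element can only be the record terminator.
 */
static size_t read_record(const char *buf, size_t len,
			  void (*cb)(const char *elem, void *data), void *data)
{
	size_t pos = 0;

	while (pos < len) {
		const char *end = memchr(buf + pos, '\0', len - pos);

		if (!end)
			return 0;
		if (end == buf + pos)
			return pos + 1;
		cb(buf + pos, data);
		pos = (size_t)(end - buf) + 1;
	}
	return 0;
}
```

The one constraint this layout imposes is that no element may be empty, because an empty element would be indistinguishable from the record terminator. That holds here, since every element is either `?<oid>` or a non-empty `var=value` pair.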