mbox series

[0/3] support `--oid-only` in `ls-tree`

Message ID 20211115115153.48307-1-dyroneteng@gmail.com (mailing list archive)
Headers show
Series support `--oid-only` in `ls-tree` | expand

Message

Teng Long Nov. 15, 2021, 11:51 a.m. UTC
Sometimes, we only want to get the objects from output of `ls-tree`
and commands like `sed` or `cut` is usually used to intercept the
origin output to achieve this purpose in practical.

The patch contains three commits

    1. Implementation of the option.
    2. Add new tests in "t3104".
    3. Documentation modifications.

I'm appreciate if someone help to review the patch.

Thanks.

Teng Long (3):
  ls-tree.c: support `--oid-only` option for "git-ls-tree"
  t3104: add related tests for `--oid-only` option
  git-ls-tree.txt: description of the 'oid-only' option

 Documentation/git-ls-tree.txt |  8 +++--
 builtin/ls-tree.c             | 11 +++++++
 t/t3104-ls-tree-oid.sh        | 55 +++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100755 t/t3104-ls-tree-oid.sh

Comments

Ævar Arnfjörð Bjarmason Nov. 15, 2021, 3:13 p.m. UTC | #1
On Mon, Nov 15 2021, Teng Long wrote:

> Sometimes, we only want to get the objects from output of `ls-tree`
> and commands like `sed` or `cut` is usually used to intercept the
> origin output to achieve this purpose in practical.
>
> The patch contains three commits
>
>     1. Implementation of the option.
>     2. Add new tests in "t3104".
>     3. Documentation modifications.
>
> I'm appreciate if someone help to review the patch.

I've looked it over, they look correct mostly, the test code in 2/3
looks a bit too complex (using find?).

But I'd much rather see this be done with adding strbuf_expand() to
ls-tree. I.e. its docs say that it can emit:

    <mode> SP <type> SP <object> TAB <file>

Or, with -l:

    <mode> SP <type> SP <object> SP <object size> TAB <file>

If you use strbuf_expand() you can just define a default format of:

    %(objectmode) SP %(objecttype) SP %(objectname) TAB %(path)

Then make the existing -l option a shorthand for tweaking that to:

    %(objectmode) SP %(objecttype) SP %(objectsize) SP %(objectname) TAB %(path)

Then you can get what you want out of this with a simple:

    git ls-tree --format="%(objectname)"

See e.g. git-cat-file for an existing use of strbuf_expand().
Jeff King Nov. 15, 2021, 7:09 p.m. UTC | #2
On Mon, Nov 15, 2021 at 04:13:24PM +0100, Ævar Arnfjörð Bjarmason wrote:

> But I'd much rather see this be done with adding strbuf_expand() to
> ls-tree. I.e. its docs say that it can emit:

I had a similar thought, but that's a much bigger task. I think it would
be reasonable to add --oid-only to match the existing --name-only, etc.
If we later add a custom --format option, then it can easily be folded
in and explained as "this is an alias for --format=%(objectname)", just
like --name-only would become "this is an alias for --format=%(path)".

-Peff
Jeff King Nov. 15, 2021, 7:23 p.m. UTC | #3
On Mon, Nov 15, 2021 at 07:51:50PM +0800, Teng Long wrote:

> Sometimes, we only want to get the objects from output of `ls-tree`
> and commands like `sed` or `cut` is usually used to intercept the
> origin output to achieve this purpose in practical.
> 
> The patch contains three commits
> 
>     1. Implementation of the option.
>     2. Add new tests in "t3104".
>     3. Documentation modifications.
> 
> I'm appreciate if someone help to review the patch.

This seems like a good feature to have. I think it would make sense to
squash the three patches into a single one. The documentation and test
patches do not stand on their own, which is why there was nothing useful
to say in their commit messages.

The implementation looks generally sensible (modulo the comments already
given). I was surprised that there was not an existing ls-tree script
that these would fit into. But there really isn't; t3101 covers
--name-only and other output, but is really focused on the pathnames
(though I think it would be OK to refactor it to cover output more
generally).

-Peff
Ævar Arnfjörð Bjarmason Nov. 15, 2021, 9:50 p.m. UTC | #4
On Mon, Nov 15 2021, Jeff King wrote:

> On Mon, Nov 15, 2021 at 04:13:24PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> But I'd much rather see this be done with adding strbuf_expand() to
>> ls-tree. I.e. its docs say that it can emit:
>
> I had a similar thought, but that's a much bigger task. I think it would
> be reasonable to add --oid-only to match the existing --name-only, etc.
> If we later add a custom --format option, then it can easily be folded
> in and explained as "this is an alias for --format=%(objectname)", just
> like --name-only would become "this is an alias for --format=%(path)".

A quick patch to do it below, seems to work, passes all tests, but I
don't know how much I'd trust it. It's also quite an add use of
strbuf_expa(). We print to stdout directly since
write_name_quoted_relative() really wants to write to stdout, and not
give you a buffer. But I guess it makes sense in a way.

The hardcoded %7s for %(objectsize) is a bit nasty, but I don't know if
we've got anything existing that handles format specifiers with
strbuf_expand() that we could steal.

I really wouldn't trust this code much, I found it when writing it that
our tests for ls-tree are really lacking, e.g. we may not have a single
test for "-l" anywhere (or maybe I didn't look enough, I was just
running t/*ls*tree* while hacking it.

I do thin that we should consider just going with --format in either
case if we agree that this is a good direction. I.e. could just support
3-4 hardcoded formats now and die if anything else is specified.

Then we'd be future-proof with the same interface expanding later, and
wouldn't need to support options that we're only carrying because we
didn't implement the more generic format support.

(Assume my Signed-off-by, if there's any interest...)

diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 3a442631c71..e89daad4229 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -31,6 +31,20 @@ static const  char * const ls_tree_usage[] = {
 	NULL
 };
 
+static const char *ls_tree_format_d = "%(objectmode) %(objecttype) %(objectname)	%(path)";
+static const char *ls_tree_format_l = "%(objectmode) %(objecttype) %(objectname) %(objectsize)	%(path)";
+static const char *ls_tree_format_n = "%(path)";
+
+struct expand_ls_tree_data {
+	const char *format;
+	unsigned mode;
+	const char *type;
+	const struct object_id *oid;
+	int abbrev;
+	const char *pathname;
+	const char *basebuf;
+};
+
 static int show_recursive(const char *base, int baselen, const char *pathname)
 {
 	int i;
@@ -61,9 +75,69 @@ static int show_recursive(const char *base, int baselen, const char *pathname)
 	return 0;
 }
 
+static size_t expand_show_tree(struct strbuf *sb,
+			       const char *start,
+			       void *context)
+{
+	struct expand_ls_tree_data *data = context;
+	const char *end;
+	const char *p;
+	size_t len;
+	const char *type = blob_type;
+
+	if (sb->len) {
+		fputs(sb->buf, stdout);
+		strbuf_reset(sb);
+	}
+
+	if (*start != '(')
+		die(_("bad format as of '%s'"), start);
+	end = strchr(start + 1, ')');
+	if (!end)
+		die(_("ls-tree format element '%s' does not end in ')'"),
+		    start);
+	len = end - start + 1;
+
+	if (skip_prefix(start, "(objectmode)", &p)) {
+		printf("%06o", data->mode);
+	} else if (skip_prefix(start, "(objecttype)", &p)) {
+		fputs(data->type, stdout);
+	} else if (skip_prefix(start, "(objectsize)", &p)) {
+		char size_text[24];
+		const struct object_id *oid = data->oid;
+
+		if (!strcmp(type, blob_type)) {
+			unsigned long size;
+			if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
+				xsnprintf(size_text, sizeof(size_text),
+					  "BAD");
+			else
+				xsnprintf(size_text, sizeof(size_text),
+					  "%"PRIuMAX, (uintmax_t)size);
+		} else {
+			xsnprintf(size_text, sizeof(size_text), "-");
+		}
+		printf("%7s", size_text);
+	} else if (skip_prefix(start, "(objectname)", &p)) {
+		fputs(find_unique_abbrev(data->oid, data->abbrev), stdout);
+	} else if (skip_prefix(start, "(path)", &p)) {
+		write_name_quoted_relative(data->basebuf,
+					   chomp_prefix ? ls_tree_prefix : NULL,
+					   stdout, line_termination);
+
+	} else {
+		unsigned int errlen = (unsigned long)len;
+		die(_("bad ls-tree format specifiec %%%.*s"), errlen, start);	
+	}
+
+	return len;
+}
+
 static int show_tree(const struct object_id *oid, struct strbuf *base,
 		const char *pathname, unsigned mode, void *context)
 {
+	struct expand_ls_tree_data *data = context;
+	struct strbuf sb = STRBUF_INIT;
 	int retval = 0;
 	int baselen;
 	const char *type = blob_type;
@@ -90,31 +164,18 @@ static int show_tree(const struct object_id *oid, struct strbuf *base,
 	else if (ls_options & LS_TREE_ONLY)
 		return 0;
 
-	if (!(ls_options & LS_NAME_ONLY)) {
-		if (ls_options & LS_SHOW_SIZE) {
-			char size_text[24];
-			if (!strcmp(type, blob_type)) {
-				unsigned long size;
-				if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
-					xsnprintf(size_text, sizeof(size_text),
-						  "BAD");
-				else
-					xsnprintf(size_text, sizeof(size_text),
-						  "%"PRIuMAX, (uintmax_t)size);
-			} else
-				xsnprintf(size_text, sizeof(size_text), "-");
-			printf("%06o %s %s %7s\t", mode, type,
-			       find_unique_abbrev(oid, abbrev),
-			       size_text);
-		} else
-			printf("%06o %s %s\t", mode, type,
-			       find_unique_abbrev(oid, abbrev));
-	}
 	baselen = base->len;
 	strbuf_addstr(base, pathname);
-	write_name_quoted_relative(base->buf,
-				   chomp_prefix ? ls_tree_prefix : NULL,
-				   stdout, line_termination);
+
+	strbuf_reset(&sb);
+	data->mode = mode;
+	data->type = type;
+	data->oid = oid;
+	data->abbrev = abbrev;
+	data->pathname = pathname;
+	data->basebuf = base->buf;
+	strbuf_expand(&sb, data->format, expand_show_tree, data);
+
 	strbuf_setlen(base, baselen);
 	return retval;
 }
@@ -147,6 +208,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		OPT__ABBREV(&abbrev),
 		OPT_END()
 	};
+	struct expand_ls_tree_data ls_tree_cb_data = {
+		.format = ls_tree_format_d,
+	};
 
 	git_config(git_default_config, NULL);
 	ls_tree_prefix = prefix;
@@ -161,8 +225,14 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	}
 	/* -d -r should imply -t, but -d by itself should not have to. */
 	if ( (LS_TREE_ONLY|LS_RECURSIVE) ==
-	    ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options))
+	    ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options)) {
 		ls_options |= LS_SHOW_TREES;
+	}
+	if (ls_options & LS_NAME_ONLY)
+		ls_tree_cb_data.format = ls_tree_format_n;
+
+	if (ls_options & LS_SHOW_SIZE)
+		ls_tree_cb_data.format = ls_tree_format_l;
 
 	if (argc < 1)
 		usage_with_options(ls_tree_usage, ls_tree_options);
@@ -185,6 +255,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	tree = parse_tree_indirect(&oid);
 	if (!tree)
 		die("not a tree object");
+
 	return !!read_tree(the_repository, tree,
-			   &pathspec, show_tree, NULL);
+			   &pathspec, show_tree, &ls_tree_cb_data);
 }
Teng Long Nov. 19, 2021, 2:57 a.m. UTC | #5
> A quick patch to do it below, seems to work, passes all tests, but I
> don't know how much I'd trust it. It's also quite an add use of
> strbuf_expa(). We print to stdout directly since
> write_name_quoted_relative() really wants to write to stdout, and not
> give you a buffer. But I guess it makes sense in a way.

Thanks for the patch and the inputs about "strbuf_expa()".

> Then we'd be future-proof with the same interface expanding later, and
> wouldn't need to support options that we're only carrying because we
> didn't implement the more generic format support.

I agree but like Peff said it maybe another bigger task. I think I will
firstly solve the existing problems in next patch.

I will consider about the generic format support but not sure whether
it will continue to iterate in this patchset.

> (Assume my Signed-off-by, if there's any interest...)

Of course I will.

Thank you very much for your advice and guidance again.