diff mbox series

[v3,2/3] t0021: implementation the rot13-filter.pl script in C

Message ID 86e6baba460f4d0fce353d1fb6a0e18b57ecadaa.1659291025.git.matheus.bernardino@usp.br (mailing list archive)
State Superseded
Headers show
Series t0021: convert perl script to C test-tool helper | expand

Commit Message

Matheus Tavares July 31, 2022, 6:19 p.m. UTC
This script is currently used by three test files: t0021-conversion.sh,
t2080-parallel-checkout-basics.sh, and
t2082-parallel-checkout-attributes.sh. To avoid the need for the PERL
dependency at these tests, let's convert the script to a C test-tool
command. The following commit will take care of actually modifying the
said tests to use the new C helper and removing the Perl script.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile                     |   1 +
 pkt-line.c                   |   5 +-
 pkt-line.h                   |   8 +-
 t/helper/test-rot13-filter.c | 379 +++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c         |   1 +
 t/helper/test-tool.h         |   1 +
 6 files changed, 393 insertions(+), 2 deletions(-)
 create mode 100644 t/helper/test-rot13-filter.c

Comments

Ævar Arnfjörð Bjarmason Aug. 1, 2022, 11:33 a.m. UTC | #1
On Sun, Jul 31 2022, Matheus Tavares wrote:


> +static char *rot13_usage = "test-tool rot13-filter [--always-delay] <log path> <capabilities>";
> +
> +int cmd__rot13_filter(int argc, const char **argv)
> +{
> +	const char **caps;
> +	int cap_count, i = 1;
> +	struct strset remote_caps = STRSET_INIT;
> +
> +	if (argc > 1 && !strcmp(argv[1], "--always-delay")) {
> +		always_delay = 1;
> +		i++;
> +	}
> +	if (argc - i < 2)
> +		usage(rot13_usage);
> +
> +	logfile = fopen(argv[i++], "a");
> +	if (!logfile)
> +		die_errno("failed to open log file");
> +
> +	caps = argv + i;
> +	cap_count = argc - i;

Since you need to change every single caller consider just starting out
with parse_options() here instead of rolling your own parsing. You could
use it for --always-delay in any case, but you could also just add a
--log-path and --capability (an OPT_STRING_LIST), so:

	test-tool rot13-filter [--always-delay] --log-path=<path> [--capability <capbility]...

> +
> +	for (i = 0; i < cap_count; i++) {
> +		if (!strcmp(caps[i], "clean"))
> +			has_clean_cap = 1;
> +		else if (!strcmp(caps[i], "smudge"))
> +			has_smudge_cap = 1;

In any case, maybe BUG() in an "else" here with "unknown capability"?

> +	fclose(logfile);

Perhaps check the return value & die_errno() if we fail to fclose()
(happens e.g. if the disk fills up).
Ævar Arnfjörð Bjarmason Aug. 1, 2022, 11:39 a.m. UTC | #2
On Sun, Jul 31 2022, Matheus Tavares wrote:

> +static void reply_list_available_blobs_cmd(void)
> +{
> +	struct hashmap_iter iter;
> +	struct strmap_entry *ent;
> +	struct string_list_item *str_item;
> +	struct string_list paths = STRING_LIST_INIT_NODUP;
> +
> +	/* flush */
> +	if (packet_read_line(0, NULL))
> +		die("bad list_available_blobs end");

Shouldn't anything that's not an OS error (e.g. write error) be a BUG()
instead in this code? I.e. it would be a bug in our own testcode if we
feed the wrong data here, or if pkt-line doesn't work as we expect...

> +
> +	strmap_for_each_entry(&delay, &iter, ent) {
> +		struct delay_entry *delay_entry = ent->value;
> +		if (!delay_entry->requested)
> +			continue;
> +		delay_entry->count--;
> +		if (!strcmp(ent->key, "invalid-delay.a")) {
> +			/* Send Git a pathname that was not delayed earlier */
> +			packet_write_fmt(1, "pathname=unfiltered");
> +		}
> +		if (!strcmp(ent->key, "missing-delay.a")) {
> +			/* Do not signal Git that this file is available */
> +		} else if (!delay_entry->count) {
> +			string_list_append(&paths, ent->key);
> +			packet_write_fmt(1, "pathname=%s", ent->key);
> +		}
> +	}
> +
> +	/* Print paths in sorted order. */
> +	string_list_sort(&paths);
> +	for_each_string_list_item(str_item, &paths)
> +		fprintf(logfile, " %s", str_item->string);
> +	string_list_clear(&paths, 0);
> +
> +	packet_flush(1);
> +
> +	fprintf(logfile, " [OK]\n");

I think it should be called out in the commit message that this is not
what the Perl version is doing, i.e. it does things like:

	print $debug " [OK]\n";
	$debug->flush();

After having previously printed the equivalent of your
for_each_string_list_item() to the log file.

In Perl anything that uses PerlIO is subject to internal buffering,
which doesn't have the same semantics as stdio buffering.

I think in this case it won't matter, since you're not expecting to have
concurrent writers. You could even use fputc() here.

But a faithful reproduction of the Perl version would be something like
appending the output here to a "struct strbuf", and then "flushing" it
at the end when the perl version does a "$debug->flush()".

I don't think that's worth the effort here, and we should just say that
it doesn't matter. I just think we should note it. Thanks!
Junio C Hamano Aug. 1, 2022, 9:18 p.m. UTC | #3
Matheus Tavares <matheus.bernardino@usp.br> writes:

> +static char *get_value(char *buf, size_t size, const char *key)
> +{
> +	const char *orig_buf = buf;
> +	int orig_size = (int)size;
> +
> +	if (!skip_prefix_mem((const char *)buf, size, key, (const char **)&buf, &size) ||
> +	    !skip_prefix_mem((const char *)buf, size, "=", (const char **)&buf, &size) ||
> +	    !size)

So, skip_prefix_mem(), when successfully parses the prefix out,
advances buf[] to skip the prefix and shortens size by the same
amount, so buf[size] is pointing at the same byte.  The code wants
to make sure buf[] begins with the "<key>=", skip that part, so
presumably buf[] after the above part moves to the beginning of
<value> in the "<key>=<value>" string?  It also wants to reject
"<key>=", i.e. an empty string as the <value>?

> +		die("expected key '%s', got '%.*s'",
> +		    key, orig_size, orig_buf);
> +
> +	buf[size] = '\0';

I find this assignment somewhat strange, but primarily because it
uses the updated buf[size] that ought to be pointing at the same
byte as the original buf[size].  Is this necessary because buf[size]
upon the entry to this function does not necessarily have NUL there?

Reading ahead,

 * packet_key_val_read() feeds the buffer taken from
   packet_read_line_gently(), so buf[size] should be NUL terminated
   already.

 * read_capabilities() feeds the buffer taken from
   packet_read_line(), so buf[size] should be NUL terminated
   already.

> +	return buf;
> +}

And the caller gets the byte position that begins the <value> part.

> +static char *packet_key_val_read(const char *key)
> +{
> +	int size;
> +	char *buf;
> +	if (packet_read_line_gently(0, &size, &buf) < 0)
> +		return NULL;
> +	return xstrdup(get_value(buf, size, key));
> +}

The returned value from get_value() is pointing into
pkt-line.c::packet_buffer[], so we return a copy to the caller,
which takes the ownership.  OK.

> +static inline void assert_remote_capability(struct strset *caps, const char *cap)
> +{
> +	if (!strset_contains(caps, cap))
> +		die("required '%s' capability not available from remote", cap);
> +}
> +
> +static void read_capabilities(struct strset *remote_caps)
> +{
> +	for (;;) {
> +		int size;
> +		char *buf = packet_read_line(0, &size);
> +		if (!buf)
> +			break;
> +		strset_add(remote_caps, get_value(buf, size, "capability"));
> +	}

strset_add() creates a copy of what get_value() borrowed from
pkt-line.c::packet_buffer[] here, which is good.

> +	assert_remote_capability(remote_caps, "clean");
> +	assert_remote_capability(remote_caps, "smudge");
> +	assert_remote_capability(remote_caps, "delay");
> +}

> +static void command_loop(void)
> +{
> +	for (;;) {
> +		char *buf, *output;
> +		int size;
> +		char *pathname;
> +		struct delay_entry *entry;
> +		struct strbuf input = STRBUF_INIT;
> +		char *command = packet_key_val_read("command");
> +
> +		if (!command) {
> +			fprintf(logfile, "STOP\n");
> +			break;
> +		}
> +		fprintf(logfile, "IN: %s", command);
> +
> +		if (!strcmp(command, "list_available_blobs")) {
> +			reply_list_available_blobs_cmd();
> +			free(command);
> +			continue;
> +		}

OK.

> +		pathname = packet_key_val_read("pathname");
> +		if (!pathname)
> +			die("unexpected EOF while expecting pathname");
> +		fprintf(logfile, " %s", pathname);
> +
> +		/* Read until flush */
> +		while ((buf = packet_read_line(0, &size))) {
> +			if (!strcmp(buf, "can-delay=1")) {
> +				entry = strmap_get(&delay, pathname);
> +				if (entry && !entry->requested) {
> +					entry->requested = 1;
> +				} else if (!entry && always_delay) {
> +					add_delay_entry(pathname, 1, 1);
> +				}

These are unnecessary {} around single statement blocks, but let's
let it pass in a test helper.

> +			} else if (starts_with(buf, "ref=") ||
> +				   starts_with(buf, "treeish=") ||
> +				   starts_with(buf, "blob=")) {
> +				fprintf(logfile, " %s", buf);
> +			} else {
> +				/*
> +				 * In general, filters need to be graceful about
> +				 * new metadata, since it's documented that we
> +				 * can pass any key-value pairs, but for tests,
> +				 * let's be a little stricter.
> +				 */
> +				die("Unknown message '%s'", buf);
> +			}
> +		}
> +
> +
> +		read_packetized_to_strbuf(0, &input, 0);

I do not see a need for double blank lines above.

> +		fprintf(logfile, " %"PRIuMAX" [OK] -- ", (uintmax_t)input.len);
> +
> +		entry = strmap_get(&delay, pathname);
> +		if (entry && entry->output) {
> +			output = entry->output;
> +		} else if (!strcmp(pathname, "error.r") || !strcmp(pathname, "abort.r")) {
> +			output = "";
> +		} else if (!strcmp(command, "clean") && has_clean_cap) {
> +			output = rot13(input.buf);
> +		} else if (!strcmp(command, "smudge") && has_smudge_cap) {
> +			output = rot13(input.buf);
> +		} else {
> +			die("bad command '%s'", command);
> +		}

Good.  At this point, output all points into something and itself
does not own the memory it is pointing at.

> +		if (!strcmp(pathname, "error.r")) {
> +			fprintf(logfile, "[ERROR]\n");
> +			packet_write_fmt(1, "status=error");
> +			packet_flush(1);
> +		} else if (!strcmp(pathname, "abort.r")) {
> +			fprintf(logfile, "[ABORT]\n");
> +			packet_write_fmt(1, "status=abort");
> +			packet_flush(1);
> +		} else if (!strcmp(command, "smudge") &&
> +			   (entry = strmap_get(&delay, pathname)) &&
> +			   entry->requested == 1) {
> +			fprintf(logfile, "[DELAYED]\n");
> +			packet_write_fmt(1, "status=delayed");
> +			packet_flush(1);
> +			entry->requested = 2;
> +			if (entry->output != output) {
> +				free(entry->output);
> +				entry->output = xstrdup(output);
> +			}
> +		} else {
> +			int i, nr_packets = 0;
> +			size_t output_len;
> +			const char *p;
> +			packet_write_fmt(1, "status=success");
> +			packet_flush(1);
> +
> +			if (skip_prefix(pathname, command, &p) &&
> +			    !strcmp(p, "-write-fail.r")) {
> +				fprintf(logfile, "[WRITE FAIL]\n");
> +				die("%s write error", command);
> +			}
> +
> +			output_len = strlen(output);
> +			fprintf(logfile, "OUT: %"PRIuMAX" ", (uintmax_t)output_len);
> +
> +			if (write_packetized_from_buf_no_flush_count(output,
> +				output_len, 1, &nr_packets))
> +				die("failed to write buffer to stdout");
> +			packet_flush(1);
> +
> +			for (i = 0; i < nr_packets; i++)
> +				fprintf(logfile, ".");
> +			fprintf(logfile, " [OK]\n");
> +
> +			packet_flush(1);
> +		}
> +		free(pathname);
> +		strbuf_release(&input);
> +		free(command);
> +	}
> +}

OK, at this point we are done with pathname and command so we can
free them for the next round.  input was used as a scratch buffer
and we are done with it, too.

Looking good.

Thanks.
Matheus Tavares Aug. 2, 2022, 12:13 a.m. UTC | #4
On Mon, Aug 1, 2022 at 6:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > +             die("expected key '%s', got '%.*s'",
> > +                 key, orig_size, orig_buf);
> > +
> > +     buf[size] = '\0';
>
> I find this assignment somewhat strange, but primarily because it
> uses the updated buf[size] that ought to be pointing at the same
> byte as the original buf[size].  Is this necessary because buf[size]
> upon the entry to this function does not necessarily have NUL there?
>
> Reading ahead,
>
>  * packet_key_val_read() feeds the buffer taken from
>    packet_read_line_gently(), so buf[size] should be NUL terminated
>    already.
>
>  * read_capabilities() feeds the buffer taken from
>    packet_read_line(), so buf[size] should be NUL terminated
>    already.
>
> > +     return buf;
> > +}
>
> And the caller gets the byte position that begins the <value> part.

Good point. I'll remove the buf[size] = '\0' assignment.

> > +                             if (entry && !entry->requested) {
> > +                                     entry->requested = 1;
> > +                             } else if (!entry && always_delay) {
> > +                                     add_delay_entry(pathname, 1, 1);
> > +                             }
>
> These are unnecessary {} around single statement blocks, but let's
> let it pass in a test helper.
> > [...]
> > +                             die("Unknown message '%s'", buf);
> > +                     }
> > +             }
> > +
> > +
> > +             read_packetized_to_strbuf(0, &input, 0);
>
> I do not see a need for double blank lines above.

Oops, I will fix these too. Thanks.
Matheus Tavares Aug. 2, 2022, 12:16 a.m. UTC | #5
On Mon, Aug 1, 2022 at 8:37 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Sun, Jul 31 2022, Matheus Tavares wrote:
> >
> > +
> > +     caps = argv + i;
> > +     cap_count = argc - i;
>
> Since you need to change every single caller consider just starting out
> with parse_options() here instead of rolling your own parsing. You could
> use it for --always-delay in any case, but you could also just add a
> --log-path and --capability (an OPT_STRING_LIST), so:
>
>         test-tool rot13-filter [--always-delay] --log-path=<path> [--capability <capbility]...

Ah, makes sense. Thanks

> > +
> > +     for (i = 0; i < cap_count; i++) {
> > +             if (!strcmp(caps[i], "clean"))
> > +                     has_clean_cap = 1;
> > +             else if (!strcmp(caps[i], "smudge"))
> > +                     has_smudge_cap = 1;
>
> In any case, maybe BUG() in an "else" here with "unknown capability"?

Yup, will do.

> > +     fclose(logfile);
>
> Perhaps check the return value & die_errno() if we fail to fclose()
> (happens e.g. if the disk fills up).

Sure. Thanks.
Johannes Schindelin Aug. 9, 2022, 9:45 a.m. UTC | #6
Hi Matheus,

On Mon, 1 Aug 2022, Matheus Tavares wrote:

> On Mon, Aug 1, 2022 at 8:37 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >
> > On Sun, Jul 31 2022, Matheus Tavares wrote:
> > >
> > > +
> > > +     for (i = 0; i < cap_count; i++) {
> > > +             if (!strcmp(caps[i], "clean"))
> > > +                     has_clean_cap = 1;
> > > +             else if (!strcmp(caps[i], "smudge"))
> > > +                     has_smudge_cap = 1;
> >
> > In any case, maybe BUG() in an "else" here with "unknown capability"?
>
> Yup, will do.

Please don't, the suggestion is unsound.

The idea here is to find out whether the command-line listed the "clean"
and/or the "smudge" capabilities, ignoring all others for the moment.

To error out here with a BUG() would most likely break the invocation
in t0021 where we also pass the `delay` capability.

Ciao,
Dscho
Johannes Schindelin Aug. 9, 2022, 10 a.m. UTC | #7
Hi Junio,

On Mon, 1 Aug 2022, Junio C Hamano wrote:

> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > +		/* Read until flush */
> > +		while ((buf = packet_read_line(0, &size))) {
> > +			if (!strcmp(buf, "can-delay=1")) {
> > +				entry = strmap_get(&delay, pathname);
> > +				if (entry && !entry->requested) {
> > +					entry->requested = 1;
> > +				} else if (!entry && always_delay) {
> > +					add_delay_entry(pathname, 1, 1);
> > +				}
>
> These are unnecessary {} around single statement blocks, but let's
> let it pass in a test helper.

I would like to encourage you to think of ways how this project could
avoid the cost (mental space, reviewer time, back and forth between
contributor and reviewer) of such trivial code formatting issues.

My favored solution would be to adjust the code formatting rules in Git to
such an extent that it can be completely automated, whether via a
`clang-format-diff` rule [*1*] or via an adapted `checkpatch` [*2*] or via
something that is modeled after cURL's `checksrc` script [*3*].

It costs us too much time, and is too annoying all around, having to spend
so many brain cycles on code style (which people like me find much less
interesting than the actual, functional changes).

I'd much rather focus on the implementation of the rot13 filter and
potentially how this patch could give rise to even broader enhancements to
Git's source code that eventually have a user-visible, positive impact.

Ciao,
Dscho

Footnote *1*: https://lore.kernel.org/git/YstJl+5BPyR5RWnR@tapette.crustytoothpaste.net/
Footnote *2*: https://lore.kernel.org/git/xmqqbktvl0s4.fsf@gitster.g/
Footnote *3*: https://github.com/curl/curl/blob/master/scripts/checksrc.pl
Johannes Schindelin Aug. 9, 2022, 10:37 a.m. UTC | #8
Hi Junio,

On Mon, 1 Aug 2022, Junio C Hamano wrote:

>  * read_capabilities() feeds the buffer taken from
>    packet_read_line(), so buf[size] should be NUL terminated
>    already.

Could you help me agree?

In `packet_read_line()`, we call `packet_read()` with the
`PACKET_READ_CHOMP_NEWLINE` option, but we do not NUL-terminate the
buffer.

See https://github.com/git/git/blob/v2.37.1/pkt-line.c#L488-L494

In `packet_read()`, we call `packet_read_with_status()`, but do not
NUL-terminate the buffer.

See https://github.com/git/git/blob/v2.37.1/pkt-line.c#L478-L486

In `packet_read_with_status()`, I see that we call `get_packet_data()`
which does not NUL-terminate the buffer. Then we parse the length via
`packet_length()` which does not NUL-terminate the buffer.

Then, crucially, if the packet length is smaller than 3, we set the length
that is returned to 0 and return early indicating the conditions
`PACKET_READ_FLUSH`, `PACKET_READ_DELIM`, or `PACKET_READ_RESPONSE_END`,
which are ignored by `packet_read()`.

In this instance, the buffer is not NUL-terminated, I think. But if you
see that I missed something, I would like to know.

See https://github.com/git/git/blob/v2.37.1/pkt-line.c#L399-L476

And yes, in the case that there is a regular payload,
https://github.com/git/git/blob/v2.37.1/pkt-line.c#L456 NUL-terminates the
buffer.

And the proposed `get_value()` function would avoid returning a not
NUL-terminated buffer by virtue of using the `skip_prefix_mem()` function
with a non-empty prefix but a zero length buffer.

Therefore it is _still_ safe to skip the `buf[size] = '\0';` assignment
despite what I wrote above, even if it adds yet another piece of code to
Git's source code which is harder than necessary to reason about.

After all, it took me half an hour to research and write up this mail,
when reading `buf[size] = '\0';` would have taken all of two seconds to
verify that the code is safe.

Ciao,
Dscho
Johannes Schindelin Aug. 9, 2022, 10:47 a.m. UTC | #9
Hi Matheus,

On Sun, 31 Jul 2022, Matheus Tavares wrote:

> diff --git a/pkt-line.c b/pkt-line.c
> index 8e43c2def4..ce4e73b683 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -309,7 +309,8 @@ int write_packetized_from_fd_no_flush(int fd_in, int fd_out)
>  	return err;
>  }
>
> -int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out)
> +int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
> +					     int fd_out, int *packet_counter)
>  {
>  	int err = 0;
>  	size_t bytes_written = 0;
> @@ -324,6 +325,8 @@ int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_ou
>  			break;
>  		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
>  		bytes_written += bytes_to_write;
> +		if (packet_counter)
> +			(*packet_counter)++;

The only reason why we do this here is to try to imitate the Perl script
that prints out a dot for every packet written, right?

But the Perl script wrote out those dots immediately and individually, not
in one go after writing all the packets.

Unless the tests rely on the dots in the output, I would therefore
recommend to simply scrap this functionality (and to write about it in the
commit message, with the rationale that it does not fit into the current C
code's paradigms and would require intrusive changes of questionable
benefit) and avoid touching `pkt-line.[ch]` altogether.

> [...]
> diff --git a/pkt-line.h b/pkt-line.h
> [...]
> +static void packet_initialize(void)
> +{
> +	int size;
> +	char *pkt_buf = packet_read_line(0, &size);
> +
> +	if (!pkt_buf || strncmp(pkt_buf, "git-filter-client", size))
> +		die("bad initialize: '%s'", xstrndup(pkt_buf, size));
> +
> +	pkt_buf = packet_read_line(0, &size);
> +	if (!pkt_buf || strncmp(pkt_buf, "version=2", size))
> +		die("bad version: '%.*s'", (int)size, pkt_buf);

This would mistake a packet `v` for being valid.

Junio pointed out in his review that `packet_read_line()` already
NUL-terminates the buffer (except when it returns `NULL`), therefore we
can write this instead:

	if (!pkt_buf || strcmp(pkt_buf, "version=2"))

Likewise with `"git-filter-client"`.

> +
> +	pkt_buf = packet_read_line(0, &size);
> +	if (pkt_buf)
> +		die("bad version end: '%.*s'", (int)size, pkt_buf);
> +
> +	packet_write_fmt(1, "git-filter-server");
> +	packet_write_fmt(1, "version=2");
> +	packet_flush(1);
> +}
> +
> +static char *rot13_usage = "test-tool rot13-filter [--always-delay] <log path> <capabilities>";
> +
> +int cmd__rot13_filter(int argc, const char **argv)
> +{
> +	const char **caps;
> +	int cap_count, i = 1;
> +	struct strset remote_caps = STRSET_INIT;
> +
> +	if (argc > 1 && !strcmp(argv[1], "--always-delay")) {
> +		always_delay = 1;
> +		i++;
> +	}

This is so much simpler to read than if it used `parse_options()`,
therefore I think that this is good as-is.

It is probably obvious that I did not spend as much time on reviewing this
round as I did the previous time (after all, if one spends three hours
here and three hours there, pretty soon one ends up having missed lunch
before knowing it). However, it is equally obvious that you did a great
job addressing my review of the previous round.

Thank you,
Dscho
Junio C Hamano Aug. 10, 2022, 6:37 p.m. UTC | #10
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I would like to encourage you to think of ways how this project could
> avoid the cost (mental space, reviewer time, back and forth between
> contributor and reviewer) of such trivial code formatting issues.

I do not need your encouragement.  I am sure the submitter could
have run clang-format or checkpatch.pl or whatever and noticed the
issue.  Small style diversions in submitted patches are distracting
enough to prevent me from concentrating on and noticing problems in
the more important aspects like correctness and leakiness.  That is
why people get formatting issues pointed out and CodingGuidelines
talks about styles.

Checkpatch is OK, but IIRC, you cannot ask to check "only the code I
changed in this patch" to clang-format, which may be the show
stopper.  Otherwise, I would quite welcome an automated "pre-flight"
automation, like "make" target, that submitters can use and GGG can
help them use.

Thanks.
Junio C Hamano Aug. 10, 2022, 7:58 p.m. UTC | #11
Junio C Hamano <gitster@pobox.com> writes:

> Checkpatch is OK, but IIRC, you cannot ask to check "only the code I
> changed in this patch" to clang-format, which may be the show
> stopper.  Otherwise, I would quite welcome an automated "pre-flight"
> automation, like "make" target, that submitters can use and GGG can
> help them use.

Let me step a bit back.  I do not think any automated tool would be
free of false positives, so it is OK to configure the tool loose and
have "judgement case" still be dealt by human reviewer, but if the
automation is overly strict, that would probably waste submitters'
time too much.

You would need to accept that the new contributors are human and are
capable of learning and configuring editors on their end, and after
they get reminded of the style rules once or twice and they get used
to the process, they would also help coaching yet even newer
contributors.

I personally feel that the level of style issues that need to be
pointed out among the recent list traffic is not overly excessive.

Thanks.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 04d0fd1fe6..7cfcf3a911 100644
--- a/Makefile
+++ b/Makefile
@@ -764,6 +764,7 @@  TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
+TEST_BUILTINS_OBJS += test-rot13-filter.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
 TEST_BUILTINS_OBJS += test-run-command.o
diff --git a/pkt-line.c b/pkt-line.c
index 8e43c2def4..ce4e73b683 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -309,7 +309,8 @@  int write_packetized_from_fd_no_flush(int fd_in, int fd_out)
 	return err;
 }
 
-int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out)
+int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
+					     int fd_out, int *packet_counter)
 {
 	int err = 0;
 	size_t bytes_written = 0;
@@ -324,6 +325,8 @@  int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_ou
 			break;
 		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
 		bytes_written += bytes_to_write;
+		if (packet_counter)
+			(*packet_counter)++;
 	}
 	return err;
 }
diff --git a/pkt-line.h b/pkt-line.h
index 6d2a63db23..804fe687fb 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -32,7 +32,13 @@  void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((f
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int write_packetized_from_fd_no_flush(int fd_in, int fd_out);
-int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out);
+int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
+					     int fd_out, int *packet_counter);
+static inline int write_packetized_from_buf_no_flush(const char *src_in,
+						     size_t len, int fd_out)
+{
+	return write_packetized_from_buf_no_flush_count(src_in, len, fd_out, NULL);
+}
 
 /*
  * Stdio versions of packet_write functions. When mixing these with fd
diff --git a/t/helper/test-rot13-filter.c b/t/helper/test-rot13-filter.c
new file mode 100644
index 0000000000..d584511f8e
--- /dev/null
+++ b/t/helper/test-rot13-filter.c
@@ -0,0 +1,379 @@ 
+/*
+ * Example implementation for the Git filter protocol version 2
+ * See Documentation/gitattributes.txt, section "Filter Protocol"
+ *
+ * Usage: test-tool rot13-filter [--always-delay] <log path> <capabilities>
+ *
+ * Log path defines a debug log file that the script writes to. The
+ * subsequent arguments define a list of supported protocol capabilities
+ * ("clean", "smudge", etc).
+ *
+ * When --always-delay is given all pathnames with the "can-delay" flag
+ * that don't appear on the list bellow are delayed with a count of 1
+ * (see more below).
+ *
+ * This implementation supports special test cases:
+ * (1) If data with the pathname "clean-write-fail.r" is processed with
+ *     a "clean" operation then the write operation will die.
+ * (2) If data with the pathname "smudge-write-fail.r" is processed with
+ *     a "smudge" operation then the write operation will die.
+ * (3) If data with the pathname "error.r" is processed with any
+ *     operation then the filter signals that it cannot or does not want
+ *     to process the file.
+ * (4) If data with the pathname "abort.r" is processed with any
+ *     operation then the filter signals that it cannot or does not want
+ *     to process the file and any file after that is processed with the
+ *     same command.
+ * (5) If data with a pathname that is a key in the delay hash is
+ *     requested (e.g. "test-delay10.a") then the filter responds with
+ *     a "delay" status and sets the "requested" field in the delay hash.
+ *     The filter will signal the availability of this object after
+ *     "count" (field in delay hash) "list_available_blobs" commands.
+ * (6) If data with the pathname "missing-delay.a" is processed that the
+ *     filter will drop the path from the "list_available_blobs" response.
+ * (7) If data with the pathname "invalid-delay.a" is processed that the
+ *     filter will add the path "unfiltered" which was not delayed before
+ *     to the "list_available_blobs" response.
+ */
+
+#include "test-tool.h"
+#include "pkt-line.h"
+#include "string-list.h"
+#include "strmap.h"
+
+static FILE *logfile;
+static int always_delay, has_clean_cap, has_smudge_cap;
+static struct strmap delay = STRMAP_INIT;
+
+static char *rot13(char *str)
+{
+	char *c;
+	for (c = str; *c; c++)
+		if (isalpha(*c))
+			*c += tolower(*c) < 'n' ? 13 : -13;
+	return str;
+}
+
+static char *get_value(char *buf, size_t size, const char *key)
+{
+	const char *orig_buf = buf;
+	int orig_size = (int)size;
+
+	if (!skip_prefix_mem((const char *)buf, size, key, (const char **)&buf, &size) ||
+	    !skip_prefix_mem((const char *)buf, size, "=", (const char **)&buf, &size) ||
+	    !size)
+		die("expected key '%s', got '%.*s'",
+		    key, orig_size, orig_buf);
+
+	buf[size] = '\0';
+	return buf;
+}
+
+/*
+ * Read a text packet, expecting that it is in the form "key=value" for
+ * the given key. An EOF does not trigger any error and is reported
+ * back to the caller with NULL. Die if the "key" part of "key=value" does
+ * not match the given key, or the value part is empty.
+ */
+static char *packet_key_val_read(const char *key)
+{
+	int size;
+	char *buf;
+	if (packet_read_line_gently(0, &size, &buf) < 0)
+		return NULL;
+	return xstrdup(get_value(buf, size, key));
+}
+
+static inline void assert_remote_capability(struct strset *caps, const char *cap)
+{
+	if (!strset_contains(caps, cap))
+		die("required '%s' capability not available from remote", cap);
+}
+
+static void read_capabilities(struct strset *remote_caps)
+{
+	for (;;) {
+		int size;
+		char *buf = packet_read_line(0, &size);
+		if (!buf)
+			break;
+		strset_add(remote_caps, get_value(buf, size, "capability"));
+	}
+
+	assert_remote_capability(remote_caps, "clean");
+	assert_remote_capability(remote_caps, "smudge");
+	assert_remote_capability(remote_caps, "delay");
+}
+
+static void check_and_write_capabilities(struct strset *remote_caps,
+					 const char **caps, int caps_count)
+{
+	int i;
+	for (i = 0; i < caps_count; i++) {
+		if (!strset_contains(remote_caps, caps[i]))
+			die("our capability '%s' is not available from remote",
+			    caps[i]);
+		packet_write_fmt(1, "capability=%s\n", caps[i]);
+	}
+	packet_flush(1);
+}
+
+struct delay_entry {
+	int requested, count;
+	char *output;
+};
+
+static void free_delay_entries(void)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *ent;
+
+	strmap_for_each_entry(&delay, &iter, ent) {
+		struct delay_entry *delay_entry = ent->value;
+		free(delay_entry->output);
+		free(delay_entry);
+	}
+	strmap_clear(&delay, 0);
+}
+
+static void add_delay_entry(char *pathname, int count, int requested)
+{
+	struct delay_entry *entry = xcalloc(1, sizeof(*entry));
+	entry->count = count;
+	entry->requested = requested;
+	if (strmap_put(&delay, pathname, entry))
+		BUG("adding the same path twice to delay hash?");
+}
+
+static void reply_list_available_blobs_cmd(void)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *ent;
+	struct string_list_item *str_item;
+	struct string_list paths = STRING_LIST_INIT_NODUP;
+
+	/* flush */
+	if (packet_read_line(0, NULL))
+		die("bad list_available_blobs end");
+
+	strmap_for_each_entry(&delay, &iter, ent) {
+		struct delay_entry *delay_entry = ent->value;
+		if (!delay_entry->requested)
+			continue;
+		delay_entry->count--;
+		if (!strcmp(ent->key, "invalid-delay.a")) {
+			/* Send Git a pathname that was not delayed earlier */
+			packet_write_fmt(1, "pathname=unfiltered");
+		}
+		if (!strcmp(ent->key, "missing-delay.a")) {
+			/* Do not signal Git that this file is available */
+		} else if (!delay_entry->count) {
+			string_list_append(&paths, ent->key);
+			packet_write_fmt(1, "pathname=%s", ent->key);
+		}
+	}
+
+	/* Print paths in sorted order. */
+	string_list_sort(&paths);
+	for_each_string_list_item(str_item, &paths)
+		fprintf(logfile, " %s", str_item->string);
+	string_list_clear(&paths, 0);
+
+	packet_flush(1);
+
+	fprintf(logfile, " [OK]\n");
+	packet_write_fmt(1, "status=success");
+	packet_flush(1);
+}
+
+static void command_loop(void)
+{
+	for (;;) {
+		char *buf, *output;
+		int size;
+		char *pathname;
+		struct delay_entry *entry;
+		struct strbuf input = STRBUF_INIT;
+		char *command = packet_key_val_read("command");
+
+		if (!command) {
+			fprintf(logfile, "STOP\n");
+			break;
+		}
+		fprintf(logfile, "IN: %s", command);
+
+		if (!strcmp(command, "list_available_blobs")) {
+			reply_list_available_blobs_cmd();
+			free(command);
+			continue;
+		}
+
+		pathname = packet_key_val_read("pathname");
+		if (!pathname)
+			die("unexpected EOF while expecting pathname");
+		fprintf(logfile, " %s", pathname);
+
+		/* Read until flush */
+		while ((buf = packet_read_line(0, &size))) {
+			if (!strcmp(buf, "can-delay=1")) {
+				entry = strmap_get(&delay, pathname);
+				if (entry && !entry->requested) {
+					entry->requested = 1;
+				} else if (!entry && always_delay) {
+					add_delay_entry(pathname, 1, 1);
+				}
+			} else if (starts_with(buf, "ref=") ||
+				   starts_with(buf, "treeish=") ||
+				   starts_with(buf, "blob=")) {
+				fprintf(logfile, " %s", buf);
+			} else {
+				/*
+				 * In general, filters need to be graceful about
+				 * new metadata, since it's documented that we
+				 * can pass any key-value pairs, but for tests,
+				 * let's be a little stricter.
+				 */
+				die("Unknown message '%s'", buf);
+			}
+		}
+
+
+		read_packetized_to_strbuf(0, &input, 0);
+		fprintf(logfile, " %"PRIuMAX" [OK] -- ", (uintmax_t)input.len);
+
+		entry = strmap_get(&delay, pathname);
+		if (entry && entry->output) {
+			output = entry->output;
+		} else if (!strcmp(pathname, "error.r") || !strcmp(pathname, "abort.r")) {
+			output = "";
+		} else if (!strcmp(command, "clean") && has_clean_cap) {
+			output = rot13(input.buf);
+		} else if (!strcmp(command, "smudge") && has_smudge_cap) {
+			output = rot13(input.buf);
+		} else {
+			die("bad command '%s'", command);
+		}
+
+		if (!strcmp(pathname, "error.r")) {
+			fprintf(logfile, "[ERROR]\n");
+			packet_write_fmt(1, "status=error");
+			packet_flush(1);
+		} else if (!strcmp(pathname, "abort.r")) {
+			fprintf(logfile, "[ABORT]\n");
+			packet_write_fmt(1, "status=abort");
+			packet_flush(1);
+		} else if (!strcmp(command, "smudge") &&
+			   (entry = strmap_get(&delay, pathname)) &&
+			   entry->requested == 1) {
+			fprintf(logfile, "[DELAYED]\n");
+			packet_write_fmt(1, "status=delayed");
+			packet_flush(1);
+			entry->requested = 2;
+			if (entry->output != output) {
+				free(entry->output);
+				entry->output = xstrdup(output);
+			}
+		} else {
+			int i, nr_packets = 0;
+			size_t output_len;
+			const char *p;
+			packet_write_fmt(1, "status=success");
+			packet_flush(1);
+
+			if (skip_prefix(pathname, command, &p) &&
+			    !strcmp(p, "-write-fail.r")) {
+				fprintf(logfile, "[WRITE FAIL]\n");
+				die("%s write error", command);
+			}
+
+			output_len = strlen(output);
+			fprintf(logfile, "OUT: %"PRIuMAX" ", (uintmax_t)output_len);
+
+			if (write_packetized_from_buf_no_flush_count(output,
+				output_len, 1, &nr_packets))
+				die("failed to write buffer to stdout");
+			packet_flush(1);
+
+			for (i = 0; i < nr_packets; i++)
+				fprintf(logfile, ".");
+			fprintf(logfile, " [OK]\n");
+
+			packet_flush(1);
+		}
+		free(pathname);
+		strbuf_release(&input);
+		free(command);
+	}
+}
+
+static void packet_initialize(void)
+{
+	int size;
+	char *pkt_buf = packet_read_line(0, &size);
+
+	if (!pkt_buf || strncmp(pkt_buf, "git-filter-client", size))
+		die("bad initialize: '%s'", xstrndup(pkt_buf, size));
+
+	pkt_buf = packet_read_line(0, &size);
+	if (!pkt_buf || strncmp(pkt_buf, "version=2", size))
+		die("bad version: '%.*s'", (int)size, pkt_buf);
+
+	pkt_buf = packet_read_line(0, &size);
+	if (pkt_buf)
+		die("bad version end: '%.*s'", (int)size, pkt_buf);
+
+	packet_write_fmt(1, "git-filter-server");
+	packet_write_fmt(1, "version=2");
+	packet_flush(1);
+}
+
+static char *rot13_usage = "test-tool rot13-filter [--always-delay] <log path> <capabilities>";
+
+int cmd__rot13_filter(int argc, const char **argv)
+{
+	const char **caps;
+	int cap_count, i = 1;
+	struct strset remote_caps = STRSET_INIT;
+
+	if (argc > 1 && !strcmp(argv[1], "--always-delay")) {
+		always_delay = 1;
+		i++;
+	}
+	if (argc - i < 2)
+		usage(rot13_usage);
+
+	logfile = fopen(argv[i++], "a");
+	if (!logfile)
+		die_errno("failed to open log file");
+
+	caps = argv + i;
+	cap_count = argc - i;
+
+	for (i = 0; i < cap_count; i++) {
+		if (!strcmp(caps[i], "clean"))
+			has_clean_cap = 1;
+		else if (!strcmp(caps[i], "smudge"))
+			has_smudge_cap = 1;
+	}
+
+	add_delay_entry("test-delay10.a", 1, 0);
+	add_delay_entry("test-delay11.a", 1, 0);
+	add_delay_entry("test-delay20.a", 2, 0);
+	add_delay_entry("test-delay10.b", 1, 0);
+	add_delay_entry("missing-delay.a", 1, 0);
+	add_delay_entry("invalid-delay.a", 1, 0);
+
+	fprintf(logfile, "START\n");
+	packet_initialize();
+
+	read_capabilities(&remote_caps);
+	check_and_write_capabilities(&remote_caps, caps, cap_count);
+	fprintf(logfile, "init handshake complete\n");
+	strset_clear(&remote_caps);
+
+	command_loop();
+
+	fclose(logfile);
+	free_delay_entries();
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c..d6a560f832 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -65,6 +65,7 @@  static struct test_cmd cmds[] = {
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
 	{ "reftable", cmd__reftable },
+	{ "rot13-filter", cmd__rot13_filter },
 	{ "dump-reftable", cmd__dump_reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb79927163..21a91b1019 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -54,6 +54,7 @@  int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__rot13_filter(int argc, const char **argv);
 int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);