diff mbox series

[v2] t/t0021: convert the rot13-filter.pl script to C

Message ID f38f722de7c3323207eda5ea632b5acd3765c285.1658675222.git.matheus.bernardino@usp.br (mailing list archive)
State New, archived
Headers show
Series [v2] t/t0021: convert the rot13-filter.pl script to C | expand

Commit Message

Matheus Tavares July 24, 2022, 3:09 p.m. UTC
This script is currently used by three test files: t0021-conversion.sh,
t2080-parallel-checkout-basics.sh, and
t2082-parallel-checkout-attributes.sh. To avoid the need for the PERL
dependency at these tests, let's convert the script to a C test-tool
command.

Note that there is a small adjustment needed at test t0021-conversion.sh
because it depended on a specific error message given by perl's die
routine.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

Changes since v1:
- Squashed the two patches together.
- Declared `remote_caps` at cmd__rot13_filter()'s stack and passed it
  down the call stack instead of dynamic allocation.

 Makefile                                |   1 +
 pkt-line.c                              |  13 +-
 pkt-line.h                              |   2 +
 t/helper/test-rot13-filter.c            | 393 ++++++++++++++++++++++++
 t/helper/test-tool.c                    |   1 +
 t/helper/test-tool.h                    |   1 +
 t/t0021-conversion.sh                   |  71 ++---
 t/t0021/rot13-filter.pl                 | 247 ---------------
 t/t2080-parallel-checkout-basics.sh     |   7 +-
 t/t2082-parallel-checkout-attributes.sh |   7 +-
 10 files changed, 447 insertions(+), 296 deletions(-)
 create mode 100644 t/helper/test-rot13-filter.c
 delete mode 100644 t/t0021/rot13-filter.pl

Comments

Johannes Schindelin July 28, 2022, 4:58 p.m. UTC | #1
Hi Matheus,

On Sun, 24 Jul 2022, Matheus Tavares wrote:

> This script is currently used by three test files: t0021-conversion.sh,
> t2080-parallel-checkout-basics.sh, and
> t2082-parallel-checkout-attributes.sh. To avoid the need for the PERL
> dependency at these tests, let's convert the script to a C test-tool
> command.

Great!

>  - Squashed the two patches together.

I see why this might have been suggested, but it definitely made it more
challenging for me to review. You see, it is easy to just fly over a patch
that simply removes the `PERL` prereq, but it is much harder to jump back
and forth over all of these removals when the `.c` version of the filter
is added before them and the `.pl` version is removed after them. So I
find that it was bad advice, but I do not fault you for following it (we
all want reviews to just be over already and therefore sometimes pander to
the reviewers, no matter how much or little sense their feedback makes).

It just would have been easier for me to review if the chaff was separated
from the wheat, so to say.

To illustrate my point: it was a bit of a challenge to find the adjustment
of the "smudge write error at" needle in all of that cruft. It would have
made my life as a reviewer substantially easier had the patch
series been organized this way (which I assume you had before the feedback
you received demanded to squash everything in one hot pile):

	1/3 adjust the needle for the error message
	2/3 implement the rot13-filter in C
	3/3 use the test-tool in the tests and remove the PERL prereq, and
	    remove rot13-filter.pl

> [...]
> diff --git a/pkt-line.c b/pkt-line.c
> index 8e43c2def4..4425bdae36 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -309,9 +309,10 @@ int write_packetized_from_fd_no_flush(int fd_in, int fd_out)
>  	return err;
>  }
>
> -int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out)
> +int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
> +					     int fd_out, int *count_ptr)
>  {
> -	int err = 0;
> +	int err = 0, count = 0;
>  	size_t bytes_written = 0;
>  	size_t bytes_to_write;
>
> @@ -324,10 +325,18 @@ int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_ou
>  			break;
>  		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
>  		bytes_written += bytes_to_write;
> +		count++;
>  	}
> +	if (count_ptr)
> +		*count_ptr = count;

This is not just a counter, but a packet counter, right? In any case, it
would probably make more sense to increment the value directly:

		if (count_ptr)
			(*count_ptr)++;

More on that below, where you use it.

>  	return err;
>  }
>
> +int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out)
> +{
> +	return write_packetized_from_buf_no_flush_count(src_in, len, fd_out, NULL);
> +}

Have you considered making this a `static inline` in `pkt-line.h`?

> [...]
> diff --git a/t/helper/test-rot13-filter.c b/t/helper/test-rot13-filter.c
> new file mode 100644
> index 0000000000..536111f272
> --- /dev/null
> +++ b/t/helper/test-rot13-filter.c
> @@ -0,0 +1,393 @@
> +/*
> + * Example implementation for the Git filter protocol version 2
> + * See Documentation/gitattributes.txt, section "Filter Protocol"
> + *
> + * Usage: test-tool rot13-filter [--always-delay] <log path> <capabilities>
> + *
> + * Log path defines a debug log file that the script writes to. The
> + * subsequent arguments define a list of supported protocol capabilities
> + * ("clean", "smudge", etc).
> + *
> + * When --always-delay is given all pathnames with the "can-delay" flag
> + * that don't appear on the list bellow are delayed with a count of 1
> + * (see more below).
> + *
> + * This implementation supports special test cases:
> + * (1) If data with the pathname "clean-write-fail.r" is processed with
> + *     a "clean" operation then the write operation will die.
> + * (2) If data with the pathname "smudge-write-fail.r" is processed with
> + *     a "smudge" operation then the write operation will die.
> + * (3) If data with the pathname "error.r" is processed with any
> + *     operation then the filter signals that it cannot or does not want
> + *     to process the file.
> + * (4) If data with the pathname "abort.r" is processed with any
> + *     operation then the filter signals that it cannot or does not want
> + *     to process the file and any file after that is processed with the
> + *     same command.
> + * (5) If data with a pathname that is a key in the delay hash is
> + *     requested (e.g. "test-delay10.a") then the filter responds with
> + *     a "delay" status and sets the "requested" field in the delay hash.
> + *     The filter will signal the availability of this object after
> + *     "count" (field in delay hash) "list_available_blobs" commands.
> + * (6) If data with the pathname "missing-delay.a" is processed that the
> + *     filter will drop the path from the "list_available_blobs" response.
> + * (7) If data with the pathname "invalid-delay.a" is processed that the
> + *     filter will add the path "unfiltered" which was not delayed before
> + *     to the "list_available_blobs" response.
> + */
> +
> +#include "test-tool.h"
> +#include "pkt-line.h"
> +#include "string-list.h"
> +#include "strmap.h"
> +
> +static FILE *logfile;
> +static int always_delay;
> +static struct strmap delay = STRMAP_INIT;
> +static struct string_list requested_caps = STRING_LIST_INIT_NODUP;
> +
> +static int has_capability(const char *cap)
> +{
> +	return unsorted_string_list_has_string(&requested_caps, cap);
> +}
> +
> +static char *rot13(char *str)
> +{
> +	char *c;
> +	for (c = str; *c; c++) {
> +		if (*c >= 'a' && *c <= 'z')
> +			*c = 'a' + (*c - 'a' + 13) % 26;
> +		else if (*c >= 'A' && *c <= 'Z')
> +			*c = 'A' + (*c - 'A' + 13) % 26;

That's quite verbose, but it _is_ correct (if a bit harder than necessary
to validate, I admit that I had to look up whether `%`'s precedence is higher
than `+` in https://en.cppreference.com/w/c/language/operator_precedence).

A conciser way (also easier to reason about):

	for (c = str; *c; c++)
		if (isalpha(*c))
			*c += tolower(*c) < 'n' ? 13 : -13;

For fun, you could also look at
https://hea-www.harvard.edu/~fine/Tech/rot13.html whether you want to use
yet another approach.

> +	}
> +	return str;
> +}
> +
> +static char *skip_key_dup(const char *buf, size_t size, const char *key)
> +{
> +	struct strbuf keybuf = STRBUF_INIT;
> +	strbuf_addf(&keybuf, "%s=", key);
> +	if (!skip_prefix_mem(buf, size, keybuf.buf, &buf, &size) || !size)
> +		die("bad %s: '%s'", key, xstrndup(buf, size));
> +	strbuf_release(&keybuf);
> +	return xstrndup(buf, size);

This does what we want it to do, but it looks as if it was code translated
from a language that does not care one bit about allocations to a language
that cares a lot.

For example, instead of allocating a `strbuf` just to append `=` to the
key, in idiomatic C this code would read like this:

static char *get_value(char *buf, size_t size, const char *key)
{
	const char *orig_buf = buf;
	int orig_size = (int)size;

	if (!skip_prefix_mem(buf, size, key, &buf, &size) ||
	    !skip_prefix_mem(buf, size, "=", &buf, &size) ||
	    !size)
		die("expected key '%s', got '%.*s'",
		    key, orig_size, orig_buf);

	return xstrndup(buf, size);
}

I was tempted, even, to suggest returning a `const char *` after
NUL-terminating the line (via `buf[size] = '\0';`) instead of
`xstrndup()`ing it, but `packet_read_line()` reads into the singleton
`packet_buffer` and we use e.g. the `command` that is returned from this
function after reading the next packet, so the command would most likely
be overwritten.

> +}
> +
> +/*
> + * Read a text packet, expecting that it is in the form "key=value" for
> + * the given key. An EOF does not trigger any error and is reported
> + * back to the caller with NULL. Die if the "key" part of "key=value" does
> + * not match the given key, or the value part is empty.
> + */
> +static char *packet_key_val_read(const char *key)
> +{
> +	int size;
> +	char *buf;
> +	if (packet_read_line_gently(0, &size, &buf) < 0)
> +		return NULL;
> +	return skip_key_dup(buf, size, key);
> +}
> +
> +static void packet_read_capabilities(struct string_list *caps)
> +{
> +	while (1) {

In Git's source code, I think we prefer `for (;;)`. But not by much:

$ git grep 'while (1)' \*.c | wc
    128     508    3745

$ git grep 'for (;;)' \*.c | wc
    156     614    4389

> +		int size;
> +		char *buf = packet_read_line(0, &size);
> +		if (!buf)
> +			break;
> +		string_list_append_nodup(caps,
> +					 skip_key_dup(buf, size, "capability"));

It is tempting to use unsorted string lists for everything because Perl
makes that relatively easy.

However, in this instance I would strongly recommend using something more
akin to Perl's "hash" data structure, in this instance a `strset`.

> +	}
> +}
> +
> +/* Read remote capabilities and check them against capabilities we require */
> +static void packet_read_and_check_capabilities(struct string_list *remote_caps,
> +					       struct string_list *required_caps)
> +{
> +	struct string_list_item *item;
> +	packet_read_capabilities(remote_caps);
> +	for_each_string_list_item(item, required_caps) {
> +		if (!unsorted_string_list_has_string(remote_caps, item->string)) {
> +			die("required '%s' capability not available from remote",
> +			    item->string);
> +		}
> +	}
> +}

This is a pretty literal translation from Perl to C, and a couple of years
ago, I would have done the same.

However, these days I would recommend against it. In this instance, we are
really only interested in three capabilities: clean, smudge and delay. It
is much, much simpler to read in the capabilities and then manually verify
that the three required ones were included:

static void read_capabilities(struct strset *remote_caps)
{
	char *cap
	while ((cap = packet_key_val_read("capability")))
		strset_add(remote_caps, cap);

	if (!strset_contains(remote_caps, "clean"))
		die("required 'clean' capability not available from remote");
	if (!strset_contains(remote_caps, "smudge"))
		die("required 'smudge' capability not available from remote");
	if (!strset_contains(remote_caps, "delay"))
		die("required 'delay' capability not available from remote");
}

> +
> +/*
> + * Check our capabilities we want to advertise against the remote ones
> + * and then advertise our capabilities
> + */
> +static void packet_check_and_write_capabilities(struct string_list *remote_caps,
> +						struct string_list *our_caps)

The list of "our caps" comes from the command-line. In C, this means we
get a `const char **argv` and an `int argc`. So:

static void check_and_write_capabilities(struct strset *remote_caps,
					 const char **caps, int caps_count)
{
	int i;

	for (i = 0; i < caps_count; i++) {
		if (!strset_contains(remote_caps, caps[i]))
			die("our capability '%s' is not available from remote",
			    caps[i]);

		packet_write_fmt(1, "capability=%s\n", caps[i]);
	}
	packet_flush(1);
}

And then we would call it via

	check_and_write_capabilities(remote_caps, argv + 1, argc - 1);

> +
> +struct delay_entry {
> +	int requested, count;
> +	char *output;
> +};

Since you declare this here, it makes most sense to define
`free_delay_hash()` (which should really be named `free_delay_entries()`)
and `add_delay_entry()` here.

> +
> +static void command_loop(void)
> +{
> +	while (1) {
> +		char *command = packet_key_val_read("command");
> +		if (!command) {
> +			fprintf(logfile, "STOP\n");
> +			break;
> +		}
> +		fprintf(logfile, "IN: %s", command);

We will also need to `fflush(logfile)` here, to imitate the Perl script's
behavior more precisely.

> +
> +		if (!strcmp(command, "list_available_blobs")) {
> +			struct hashmap_iter iter;
> +			struct strmap_entry *ent;
> +			struct string_list_item *str_item;
> +			struct string_list paths = STRING_LIST_INIT_NODUP;
> +
> +			/* flush */
> +			if (packet_read_line(0, NULL))
> +				die("bad list_available_blobs end");
> +
> +			strmap_for_each_entry(&delay, &iter, ent) {
> +				struct delay_entry *delay_entry = ent->value;
> +				if (!delay_entry->requested)
> +					continue;
> +				delay_entry->count--;
> +				if (!strcmp(ent->key, "invalid-delay.a")) {
> +					/* Send Git a pathname that was not delayed earlier */
> +					packet_write_fmt(1, "pathname=unfiltered");
> +				}
> +				if (!strcmp(ent->key, "missing-delay.a")) {
> +					/* Do not signal Git that this file is available */
> +				} else if (!delay_entry->count) {
> +					string_list_insert(&paths, ent->key);
> +					packet_write_fmt(1, "pathname=%s", ent->key);
> +				}
> +			}
> +
> +			/* Print paths in sorted order. */

The Perl script does not order them specifically. Do we really have to do
that here?

In any case, it is more performant to append the paths in an unsorted way
and then sort them once in the end (that's O(N log(N)) instead of O(N^2)).

> +			for_each_string_list_item(str_item, &paths)
> +				fprintf(logfile, " %s", str_item->string);
> +			string_list_clear(&paths, 0);
> +
> +			packet_flush(1);
> +
> +			fprintf(logfile, " [OK]\n");
> +			packet_write_fmt(1, "status=success");
> +			packet_flush(1);

I know the Perl script uses an else here, but I'd much rather insert a
`continue` at the end of the `list_available_blobs` clause and de-indent
the remainder of the loop body.

> +		} else {
> +			char *buf, *output;
> +			int size;
> +			char *pathname;
> +			struct delay_entry *entry;
> +			struct strbuf input = STRBUF_INIT;
> +
> +			pathname = packet_key_val_read("pathname");
> +			if (!pathname)
> +				die("unexpected EOF while expecting pathname");
> +			fprintf(logfile, " %s", pathname);

Again, let's `fflush(logfile)` here.

> +
> +			/* Read until flush */
> +			buf = packet_read_line(0, &size);
> +			while (buf) {

Let's write this in more idiomatic C:

			while ((buf = packet_read_line(0, &size))) {

> +				if (!strcmp(buf, "can-delay=1")) {
> +					entry = strmap_get(&delay, pathname);
> +					if (entry && !entry->requested) {
> +						entry->requested = 1;
> +					} else if (!entry && always_delay) {
> +						entry = xcalloc(1, sizeof(*entry));
> +						entry->requested = 1;
> +						entry->count = 1;
> +						strmap_put(&delay, pathname, entry);

I guess here is our chance to extend the signature of `add_delay_entry()`
to accept a `requested` parameter, and to call that here.

> +					}
> +				} else if (starts_with(buf, "ref=") ||
> +					   starts_with(buf, "treeish=") ||
> +					   starts_with(buf, "blob=")) {
> +					fprintf(logfile, " %s", buf);
> +				} else {
> +					/*
> +					 * In general, filters need to be graceful about
> +					 * new metadata, since it's documented that we
> +					 * can pass any key-value pairs, but for tests,
> +					 * let's be a little stricter.
> +					 */
> +					die("Unknown message '%s'", buf);
> +				}
> +				buf = packet_read_line(0, &size);
> +			}
> +
> +
> +			read_packetized_to_strbuf(0, &input, 0);
> +			fprintf(logfile, " %"PRIuMAX" [OK] -- ", (uintmax_t)input.len);

This reads _so much nicer_ than the Perl version!

> +
> +			entry = strmap_get(&delay, pathname);
> +			if (entry && entry->output) {
> +				output = entry->output;
> +			} else if (!strcmp(pathname, "error.r") || !strcmp(pathname, "abort.r")) {
> +				output = "";
> +			} else if (!strcmp(command, "clean") && has_capability("clean")) {
> +				output = rot13(input.buf);
> +			} else if (!strcmp(command, "smudge") && has_capability("smudge")) {
> +				output = rot13(input.buf);
> +			} else {
> +				die("bad command '%s'", command);
> +			}
> +
> +			if (!strcmp(pathname, "error.r")) {
> +				fprintf(logfile, "[ERROR]\n");
> +				packet_write_fmt(1, "status=error");
> +				packet_flush(1);
> +			} else if (!strcmp(pathname, "abort.r")) {
> +				fprintf(logfile, "[ABORT]\n");
> +				packet_write_fmt(1, "status=abort");
> +				packet_flush(1);
> +			} else if (!strcmp(command, "smudge") &&
> +				   (entry = strmap_get(&delay, pathname)) &&
> +				   entry->requested == 1) {
> +				fprintf(logfile, "[DELAYED]\n");
> +				packet_write_fmt(1, "status=delayed");
> +				packet_flush(1);
> +				entry->requested = 2;
> +				entry->output = xstrdup(output);

We need to call `free(entry->output)` before that lest we leak memory, but
only if `output` is not identical anyway:

				if (entry->output != output) {
					free(entry->output);
					entry->output = xstrdup(output);
				}


> +			} else {
> +				int i, nr_packets;
> +				size_t output_len;
> +				struct strbuf sb = STRBUF_INIT;
> +				packet_write_fmt(1, "status=success");
> +				packet_flush(1);
> +
> +				strbuf_addf(&sb, "%s-write-fail.r", command);
> +				if (!strcmp(pathname, sb.buf)) {

We can easily avoid allocating the string just for comparing it:

				const char *p;

				if (skip_prefix(pathname, command, &p) &&
				    !strcmp(p, "-write-fail.r")) {

> +					fprintf(logfile, "[WRITE FAIL]\n");

					fflush(logfile) ;-)

> +					die("%s write error", command);
> +				}
> +
> +				output_len = strlen(output);
> +				fprintf(logfile, "OUT: %"PRIuMAX" ", (uintmax_t)output_len);
> +
> +				if (write_packetized_from_buf_no_flush_count(output,
> +					output_len, 1, &nr_packets))
> +					die("failed to write buffer to stdout");
> +				packet_flush(1);
> +
> +				for (i = 0; i < nr_packets; i++)
> +					fprintf(logfile, ".");

That's not quite the same as the Perl script does: it prints a '.'
(without flushing, though) _every_ time it wrote a packet.

If you want to emulate that, you will have to copy/edit that loop (and in
that case, the insanely long-named function
`write_packetized_from_buf_no_flush_count()` is unnecessary, too).

> +				fprintf(logfile, " [OK]\n");
> +
> +				packet_flush(1);
> +				strbuf_release(&sb);
> +			}
> +			free(pathname);
> +			strbuf_release(&input);
> +		}
> +		free(command);
> +	}
> +}
> +
> +static void free_delay_hash(void)
> +{
> +	struct hashmap_iter iter;
> +	struct strmap_entry *ent;
> +
> +	strmap_for_each_entry(&delay, &iter, ent) {
> +		struct delay_entry *delay_entry = ent->value;
> +		free(delay_entry->output);
> +		free(delay_entry);
> +	}
> +	strmap_clear(&delay, 0);
> +}
> +
> +static void add_delay_entry(char *pathname, int count)
> +{
> +	struct delay_entry *entry = xcalloc(1, sizeof(*entry));
> +	entry->count = count;
> +	if (strmap_put(&delay, pathname, entry))
> +		BUG("adding the same path twice to delay hash?");
> +}
> +
> +static void packet_initialize(const char *name, int version)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	int size;
> +	char *pkt_buf = packet_read_line(0, &size);
> +
> +	strbuf_addf(&sb, "%s-client", name);
> +	if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))

We do not need the flexibility of the Perl package, where `name` is a
parameter. We can hard-code `git-filter-client` here. I.e. something like
this:

	if (!pkt_buf || size != 17 ||
	    strncmp(pkt_buf, "git-filter-client", 17))

> +		die("bad initialize: '%s'", xstrndup(pkt_buf, size));
> +
> +	strbuf_reset(&sb);
> +	strbuf_addf(&sb, "version=%d", version);

Same here. We do not need to allocate a string just to compare it to the
packet's payload.

> +	pkt_buf = packet_read_line(0, &size);
> +	if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))
> +		die("bad version: '%s'", xstrndup(pkt_buf, size));
> +
> +	pkt_buf = packet_read_line(0, &size);
> +	if (pkt_buf)
> +		die("bad version end: '%s'", xstrndup(pkt_buf, size));
> +
> +	packet_write_fmt(1, "%s-server", name);
> +	packet_write_fmt(1, "version=%d", version);
> +	packet_flush(1);
> +	strbuf_release(&sb);
> +}
> +
> +static char *rot13_usage = "test-tool rot13-filter [--always-delay] <log path> <capabilities>";
> +
> +int cmd__rot13_filter(int argc, const char **argv)
> +{
> +	int i = 1;
> +	struct string_list remote_caps = STRING_LIST_INIT_DUP,
> +			   supported_caps = STRING_LIST_INIT_NODUP;
> +
> +	string_list_append(&supported_caps, "clean");
> +	string_list_append(&supported_caps, "smudge");
> +	string_list_append(&supported_caps, "delay");
> +
> +	if (argc > 1 && !strcmp(argv[i], "--always-delay")) {
> +		always_delay = 1;
> +		i++;
> +	}
> +	if (argc - i < 2)
> +		usage(rot13_usage);
> +
> +	logfile = fopen(argv[i++], "a");
> +	if (!logfile)
> +		die_errno("failed to open log file");
> +
> +	for ( ; i < argc; i++)
> +		string_list_append(&requested_caps, argv[i]);
> +
> +	add_delay_entry("test-delay10.a", 1);
> +	add_delay_entry("test-delay11.a", 1);
> +	add_delay_entry("test-delay20.a", 2);
> +	add_delay_entry("test-delay10.b", 1);
> +	add_delay_entry("missing-delay.a", 1);
> +	add_delay_entry("invalid-delay.a", 1);
> +
> +	fprintf(logfile, "START\n");
> +
> +	packet_initialize("git-filter", 2);
> +
> +	packet_read_and_check_capabilities(&remote_caps, &supported_caps);
> +	packet_check_and_write_capabilities(&remote_caps, &requested_caps);
> +	fprintf(logfile, "init handshake complete\n");
> +
> +	string_list_clear(&supported_caps, 0);
> +	string_list_clear(&remote_caps, 0);
> +
> +	command_loop();
> +
> +	fclose(logfile);
> +	string_list_clear(&requested_caps, 0);
> +	free_delay_hash();
> +	return 0;
> +}

Other than that, this looks great!

Thank you,
Dscho
Junio C Hamano July 28, 2022, 5:54 p.m. UTC | #2
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>>  - Squashed the two patches together.
>
> I see why this might have been suggested, but it definitely made it more
> challenging for me to review. You see, it is easy to just fly over a patch
> that simply removes the `PERL` prereq, but it is much harder to jump back
> and forth over all of these removals when the `.c` version of the filter
> is added before them and the `.pl` version is removed after them.

Yeah, I tend to agree.

> ...
> Other than that, this looks great!

Yup, thanks for an excellent review.
Ævar Arnfjörð Bjarmason July 28, 2022, 7:50 p.m. UTC | #3
On Thu, Jul 28 2022, Johannes Schindelin wrote:

> [...]
> I see why this might have been suggested, but it definitely made it more
> challenging for me to review. You see, it is easy to just fly over a patch
> that simply removes the `PERL` prereq, but it is much harder to jump back
> and forth over all of these removals when the `.c` version of the filter
> is added before them and the `.pl` version is removed after them. So I
> find that it was bad advice, but I do not fault you for following it (we
> all want reviews to just be over already and therefore sometimes pander to
> the reviewers, no matter how much or little sense their feedback makes).
> [...]
> To illustrate my point: it was a bit of a challenge to find the adjustment
> of the "smudge write error at" needle in all of that cruft. It would have
> made my life as a reviewer substantially easier had the patch
> series been organized this way (which I assume you had before the feedback
> you received demanded to squash everything in one hot pile):

If you don't think a suggestion of mine makes sense, I'd appreciate it
if you just replied me directly, instead of sending this sort of comment
to someone else. I find your wording here to be somewhere between snarky
and mean-spirited. I didn't demand anything.

If this was the first time this sort of thing has occurred I wouldn't
say anything about it, but this is far from being the first time.

In any case, if you read more than a few words into
https://lore.kernel.org/git/220723.86pmhwquie.gmgdl@evledraar.gmail.com/
you'll see that I suggested splitting the removal of the PERL prereq
into its own change, which I think would address what you're bringing up
here.

What I was mainly commenting on was that this series could avoid
introducing code in-between the v1 1/2 and 2/2 which is only needed
because of that split-up. I.e. the "exec", and needing to quote those
arguments.

Which I stand by, I think it's much easier to just do a "git show
--word-diff" on this than reason about how that "chain-loading" is
working, and whether the inter-series state is buggy. But again, the
concern you about the associated verbosity is easy to mitigate.

On the point of pandering to reviewers I find it really nitpicky to ask
for changes to change some working O(N^2)) code in a test-tool to O(N
log(N)), or to avoid a few allocations here & there.

If it was a new git built-in, then sure, but I think our collective time
is much better spend by just letting that sort of thing slide when it
comes to test-tools, which are almost always going to be operating on
the relatively tiny set of test data we expose them too.

Unless Matheus is keenly interested on optimizing this code, that is.

> [...]
> This does what we want it to do, but it looks as if it was code translated
> from a language that does not care one bit about allocations to a language
> that cares a lot.

FWIW Perl cares a lot about allocations, the sort of code you're
commenting on here doesn't involve allocations in Perl in the general
case, since it "allocates ahead", similar to how we use alloc_nr() and
strbuf_reset() patterns.

What it doesn't care about is free()-ing memory, which is an orthagonal
thing. But that's just an optimization, the general assumptios in that
if your program ever needs X MB of memory it's likely to need at least
that much again, or it'll exit() and have the OS clean it up.
Matheus Tavares July 31, 2022, 2:52 a.m. UTC | #4
Hi, Dscho

On Thu, Jul 28, 2022 at 1:58 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> > On Sun, 24 Jul 2022, Matheus Tavares wrote:
> >
> > diff --git a/t/helper/test-rot13-filter.c b/t/helper/test-rot13-filter.c
> > +static char *rot13(char *str)
> > +{
> > +     char *c;
> > +     for (c = str; *c; c++) {
> > +             if (*c >= 'a' && *c <= 'z')
> > +                     *c = 'a' + (*c - 'a' + 13) % 26;
> > +             else if (*c >= 'A' && *c <= 'Z')
> > +                     *c = 'A' + (*c - 'A' + 13) % 26;
>
> That's quite verbose, but it _is_ correct (if a bit harder than necessary
> to validate, I admit that I had to look up whether `%`'s precedence is higher
> than `+` in https://en.cppreference.com/w/c/language/operator_precedence).
>
> A conciser way (also easier to reason about):
>
>         for (c = str; *c; c++)
>                 if (isalpha(*c))
>                         *c += tolower(*c) < 'n' ? 13 : -13;

Nice :) Thanks.

> > [...]
> > +static void packet_read_capabilities(struct string_list *caps)
> > +{
> > +     while (1) {
> > +             int size;
> > +             char *buf = packet_read_line(0, &size);
> > +             if (!buf)
> > +                     break;
> > +             string_list_append_nodup(caps,
> > +                                      skip_key_dup(buf, size, "capability"));
>
> It is tempting to use unsorted string lists for everything because Perl
> makes that relatively easy.
>
> However, in this instance I would strongly recommend using something more
> akin to Perl's "hash" data structure, in this instance a `strset`.

Ok, will do.

> > +
> > +/*
> > + * Check our capabilities we want to advertise against the remote ones
> > + * and then advertise our capabilities
> > + */
> > +static void packet_check_and_write_capabilities(struct string_list *remote_caps,
> > +                                             struct string_list *our_caps)
>
> The list of "our caps" comes from the command-line. In C, this means we
> get a `const char **argv` and an `int argc`. So:
>
> static void check_and_write_capabilities(struct strset *remote_caps,
>                                          const char **caps, int caps_count)
> {
>         int i;
>
>         for (i = 0; i < caps_count; i++) {
>                 if (!strset_contains(remote_caps, caps[i]))
>                         die("our capability '%s' is not available from remote",
>                             caps[i]);
>
>                 packet_write_fmt(1, "capability=%s\n", caps[i]);
>         }
>         packet_flush(1);
> }

Makes sense. We also use the list elsewhere (has_capability()), but we
can easily replace that with two global flags to indicate if we have
the "clean" and "smudge" caps.

> And then we would call it via
>
>         check_and_write_capabilities(remote_caps, argv + 1, argc - 1);
>
> [...]
> > +static void command_loop(void)
> > +{
> > +     while (1) {
> > +             char *command = packet_key_val_read("command");
> > +             if (!command) {
> > +                     fprintf(logfile, "STOP\n");
> > +                     break;
> > +             }
> > +             fprintf(logfile, "IN: %s", command);
>
> We will also need to `fflush(logfile)` here, to imitate the Perl script's
> behavior more precisely.

I was somewhat intrigued as to why the flushes were needed in the Perl
script. But reading [1] and [2], now, it seems to have been an
oversight.

That is, Eric suggested splictily flushing stdout because it is a
pipe, but the author ended up erroneously disabling autoflush for
stdout too, so that's why we needed the flushes there. They later
acknowledged that and said that they would re-enabled it (see [2]),
but it seems to have been forgotten. So I think we can safely drop the
flush calls.

[1]: http://public-inbox.org/git/20160723072721.GA20875%40starla/
[2]: https://lore.kernel.org/git/7F1F1A0E-8FC3-4FBD-81AA-37786DE0EF50@gmail.com/

> > +
> > +             if (!strcmp(command, "list_available_blobs")) {
> > +                     struct hashmap_iter iter;
> > +                     struct strmap_entry *ent;
> > +                     struct string_list_item *str_item;
> > +                     struct string_list paths = STRING_LIST_INIT_NODUP;
> > +
> > +                     /* flush */
> > +                     if (packet_read_line(0, NULL))
> > +                             die("bad list_available_blobs end");
> > +
> > +                     strmap_for_each_entry(&delay, &iter, ent) {
> > +                             struct delay_entry *delay_entry = ent->value;
> > +                             if (!delay_entry->requested)
> > +                                     continue;
> > +                             delay_entry->count--;
> > +                             if (!strcmp(ent->key, "invalid-delay.a")) {
> > +                                     /* Send Git a pathname that was not delayed earlier */
> > +                                     packet_write_fmt(1, "pathname=unfiltered");
> > +                             }
> > +                             if (!strcmp(ent->key, "missing-delay.a")) {
> > +                                     /* Do not signal Git that this file is available */
> > +                             } else if (!delay_entry->count) {
> > +                                     string_list_insert(&paths, ent->key);
> > +                                     packet_write_fmt(1, "pathname=%s", ent->key);
> > +                             }
> > +                     }
> > +
> > +                     /* Print paths in sorted order. */
>
> The Perl script does not order them specifically. Do we really have to do
> that here?

It actually prints them in sorted order:

        foreach my $pathname ( sort keys %DELAY )

That is required because some test cases will compare the output using
this order.

> In any case, it is more performant to append the paths in an unsorted way
> and then sort them once in the end (that's O(N log(N)) instead of O(N^2)).

OK, will do.

> > +                     for_each_string_list_item(str_item, &paths)
> > +                             fprintf(logfile, " %s", str_item->string);
> > +                     string_list_clear(&paths, 0);
> > +
> > +                     packet_flush(1);
> > +
> > +                     fprintf(logfile, " [OK]\n");
> > +                     packet_write_fmt(1, "status=success");
> > +                     packet_flush(1);
>
> I know the Perl script uses an else here, but I'd much rather insert a
> `continue` at the end of the `list_available_blobs` clause and de-indent
> the remainder of the loop body.

Sure! I think we can take a step further and extract the if logic to a
separate function.

> > +             } else {
> > +                     char *buf, *output;
> > +                     int size;
> > +                     char *pathname;
> > +                     struct delay_entry *entry;
> > +                     struct strbuf input = STRBUF_INIT;
> > +
> > +                     pathname = packet_key_val_read("pathname");
> > +                     if (!pathname)
> > +                             die("unexpected EOF while expecting pathname");
> > +                     fprintf(logfile, " %s", pathname);
>
> Again, let's `fflush(logfile)` here.
>
> > +
> > +                     /* Read until flush */
> > +                     buf = packet_read_line(0, &size);
> > +                     while (buf) {
>
> Let's write this in more idiomatic C:
>
>                         while ((buf = packet_read_line(0, &size))) {
>
> > +                             if (!strcmp(buf, "can-delay=1")) {
> > +                                     entry = strmap_get(&delay, pathname);
> > +                                     if (entry && !entry->requested) {
> > +                                             entry->requested = 1;
> > +                                     } else if (!entry && always_delay) {
> > +                                             entry = xcalloc(1, sizeof(*entry));
> > +                                             entry->requested = 1;
> > +                                             entry->count = 1;
> > +                                             strmap_put(&delay, pathname, entry);
>
> I guess here is our chance to extend the signature of `add_delay_entry()`
> to accept a `requested` parameter, and to call that here.
>
> > +                                     }
> > +                             } else if (starts_with(buf, "ref=") ||
> > +                                        starts_with(buf, "treeish=") ||
> > +                                        starts_with(buf, "blob=")) {
> > +                                     fprintf(logfile, " %s", buf);
> > +                             } else {
> > +                                     /*
> > +                                      * In general, filters need to be graceful about
> > +                                      * new metadata, since it's documented that we
> > +                                      * can pass any key-value pairs, but for tests,
> > +                                      * let's be a little stricter.
> > +                                      */
> > +                                     die("Unknown message '%s'", buf);
> > +                             }
> > +                             buf = packet_read_line(0, &size);
> > +                     }
> > +
> > +
> > +                     read_packetized_to_strbuf(0, &input, 0);
> > +                     fprintf(logfile, " %"PRIuMAX" [OK] -- ", (uintmax_t)input.len);
>
> This reads _so much nicer_ than the Perl version!
>
> > +
> > +                     entry = strmap_get(&delay, pathname);
> > +                     if (entry && entry->output) {
> > +                             output = entry->output;
> > +                     } else if (!strcmp(pathname, "error.r") || !strcmp(pathname, "abort.r")) {
> > +                             output = "";
> > +                     } else if (!strcmp(command, "clean") && has_capability("clean")) {
> > +                             output = rot13(input.buf);
> > +                     } else if (!strcmp(command, "smudge") && has_capability("smudge")) {
> > +                             output = rot13(input.buf);
> > +                     } else {
> > +                             die("bad command '%s'", command);
> > +                     }
> > +
> > +                     if (!strcmp(pathname, "error.r")) {
> > +                             fprintf(logfile, "[ERROR]\n");
> > +                             packet_write_fmt(1, "status=error");
> > +                             packet_flush(1);
> > +                     } else if (!strcmp(pathname, "abort.r")) {
> > +                             fprintf(logfile, "[ABORT]\n");
> > +                             packet_write_fmt(1, "status=abort");
> > +                             packet_flush(1);
> > +                     } else if (!strcmp(command, "smudge") &&
> > +                                (entry = strmap_get(&delay, pathname)) &&
> > +                                entry->requested == 1) {
> > +                             fprintf(logfile, "[DELAYED]\n");
> > +                             packet_write_fmt(1, "status=delayed");
> > +                             packet_flush(1);
> > +                             entry->requested = 2;
> > +                             entry->output = xstrdup(output);
>
> We need to call `free(entry->output)` before that lest we leak memory, but
> only if `output` is not identical anyway:
>
>                                 if (entry->output != output) {
>                                         free(entry->output);
>                                         entry->output = xstrdup(output);
>                                 }

I think, entry->output will always be NULL here, since we only get
inside this if block after entry->requested has been set to 1 at the
top of the function; and, at that point, we haven't run ro13 yet.
Nevertheless, it doesn't hurt to add the free call anyway :)

>
> > +                     } else {
> > +                             int i, nr_packets;
> > +                             size_t output_len;
> > +                             struct strbuf sb = STRBUF_INIT;
> > +                             packet_write_fmt(1, "status=success");
> > +                             packet_flush(1);
> > +
> > +                             strbuf_addf(&sb, "%s-write-fail.r", command);
> > +                             if (!strcmp(pathname, sb.buf)) {
>
> We can easily avoid allocating the string just for comparing it:
>
>                                 const char *p;
>
>                                 if (skip_prefix(pathname, command, &p) &&
>                                     !strcmp(p, "-write-fail.r")) {
>
> > +                                     fprintf(logfile, "[WRITE FAIL]\n");
>
>                                         fflush(logfile) ;-)
>
> > +                                     die("%s write error", command);
> > +                             }
> > +
> > +                             output_len = strlen(output);
> > +                             fprintf(logfile, "OUT: %"PRIuMAX" ", (uintmax_t)output_len);
> > +
> > +                             if (write_packetized_from_buf_no_flush_count(output,
> > +                                     output_len, 1, &nr_packets))
> > +                                     die("failed to write buffer to stdout");
> > +                             packet_flush(1);
> > +
> > +                             for (i = 0; i < nr_packets; i++)
> > +                                     fprintf(logfile, ".");
>
> That's not quite the same as the Perl script does: it prints a '.'
> (without flushing, though) _every_ time it wrote a packet.
>
> If you want to emulate that, you will have to copy/edit that loop (and in
> that case, the insanely long-named function
> `write_packetized_from_buf_no_flush_count()` is unnecessary, too).

Hmm, I'm not sure we need to emulate that. I do dislike the huge
function name as well, but I also don't quite like to repeat code
copying that loop here...

> > +                             fprintf(logfile, " [OK]\n");
> > +
> > +                             packet_flush(1);
> > +                             strbuf_release(&sb);
> > +                     }
> > +                     free(pathname);
> > +                     strbuf_release(&input);
> > +             }
> > +             free(command);
> > +     }
> > +}
> > [...]
> > +static void packet_initialize(const char *name, int version)
> > +{
> > +     struct strbuf sb = STRBUF_INIT;
> > +     int size;
> > +     char *pkt_buf = packet_read_line(0, &size);
> > +
> > +     strbuf_addf(&sb, "%s-client", name);
> > +     if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))
>
> We do not need the flexibility of the Perl package, where `name` is a
> parameter. We can hard-code `git-filter-client` here. I.e. something like
> this:
>
>         if (!pkt_buf || size != 17 ||
>             strncmp(pkt_buf, "git-filter-client", 17))

Good idea! Thanks. Perhaps, can't we do:

        if (!pkt_buf || strncmp(pkt_buf, "git-filter-client", size))

to avoid the hard-coded and possibly error-prone 17?

> > +             die("bad initialize: '%s'", xstrndup(pkt_buf, size));
> > +
> > +     strbuf_reset(&sb);
> > +     strbuf_addf(&sb, "version=%d", version);

Thanks for a very detailed review and great suggestions!
Johannes Schindelin Aug. 9, 2022, 9:36 a.m. UTC | #5
Hi Matheus,

On Sat, 30 Jul 2022, Matheus Tavares wrote:

> On Thu, Jul 28, 2022 at 1:58 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > > On Sun, 24 Jul 2022, Matheus Tavares wrote:
> > >
> > > +static void command_loop(void)
> > > +{
> > > +     while (1) {
> > > +             char *command = packet_key_val_read("command");
> > > +             if (!command) {
> > > +                     fprintf(logfile, "STOP\n");
> > > +                     break;
> > > +             }
> > > +             fprintf(logfile, "IN: %s", command);
> >
> > We will also need to `fflush(logfile)` here, to imitate the Perl script's
> > behavior more precisely.
>
> I was somewhat intrigued as to why the flushes were needed in the Perl
> script. But reading [1] and [2], now, it seems to have been an
> oversight.
>
> That is, Eric suggested splictily flushing stdout because it is a
> pipe, but the author ended up erroneously disabling autoflush for
> stdout too, so that's why we needed the flushes there. They later
> acknowledged that and said that they would re-enabled it (see [2]),
> but it seems to have been forgotten. So I think we can safely drop the
> flush calls.
>
> [1]: http://public-inbox.org/git/20160723072721.GA20875%40starla/
> [2]: https://lore.kernel.org/git/7F1F1A0E-8FC3-4FBD-81AA-37786DE0EF50@gmail.com/

I am somewhat weary of introducing a change of behavior while
reimplementing a Perl script in C at the same time, but in this instance I
think that the benefit of _not_ touching the `pkt-line.c` code is a
convincing reason to do so.

> > > +
> > > +             if (!strcmp(command, "list_available_blobs")) {
> > > +                     struct hashmap_iter iter;
> > > +                     struct strmap_entry *ent;
> > > +                     struct string_list_item *str_item;
> > > +                     struct string_list paths = STRING_LIST_INIT_NODUP;
> > > +
> > > +                     /* flush */
> > > +                     if (packet_read_line(0, NULL))
> > > +                             die("bad list_available_blobs end");
> > > +
> > > +                     strmap_for_each_entry(&delay, &iter, ent) {
> > > +                             struct delay_entry *delay_entry = ent->value;
> > > +                             if (!delay_entry->requested)
> > > +                                     continue;
> > > +                             delay_entry->count--;
> > > +                             if (!strcmp(ent->key, "invalid-delay.a")) {
> > > +                                     /* Send Git a pathname that was not delayed earlier */
> > > +                                     packet_write_fmt(1, "pathname=unfiltered");
> > > +                             }
> > > +                             if (!strcmp(ent->key, "missing-delay.a")) {
> > > +                                     /* Do not signal Git that this file is available */
> > > +                             } else if (!delay_entry->count) {
> > > +                                     string_list_insert(&paths, ent->key);
> > > +                                     packet_write_fmt(1, "pathname=%s", ent->key);
> > > +                             }
> > > +                     }
> > > +
> > > +                     /* Print paths in sorted order. */
> >
> > The Perl script does not order them specifically. Do we really have to do
> > that here?
>
> It actually prints them in sorted order:
>
>         foreach my $pathname ( sort keys %DELAY )

Whoops, sorry for missing that!

> > > +                             fprintf(logfile, " [OK]\n");
> > > +
> > > +                             packet_flush(1);
> > > +                             strbuf_release(&sb);
> > > +                     }
> > > +                     free(pathname);
> > > +                     strbuf_release(&input);
> > > +             }
> > > +             free(command);
> > > +     }
> > > +}
> > > [...]
> > > +static void packet_initialize(const char *name, int version)
> > > +{
> > > +     struct strbuf sb = STRBUF_INIT;
> > > +     int size;
> > > +     char *pkt_buf = packet_read_line(0, &size);
> > > +
> > > +     strbuf_addf(&sb, "%s-client", name);
> > > +     if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))
> >
> > We do not need the flexibility of the Perl package, where `name` is a
> > parameter. We can hard-code `git-filter-client` here. I.e. something like
> > this:
> >
> >         if (!pkt_buf || size != 17 ||
> >             strncmp(pkt_buf, "git-filter-client", 17))
>
> Good idea! Thanks. Perhaps, can't we do:
>
>         if (!pkt_buf || strncmp(pkt_buf, "git-filter-client", size))
>
> to avoid the hard-coded and possibly error-prone 17?

I am afraid that this is not idempotent. If `pkt_buf` is "git" and `size`
is 3, then the suggested `strncmp()` would return 0, but we would want it
to be non-zero.

The best way to avoid the hard-coded 17 would be to introduce a local
constant and use `strlen()` on it (which modern compilers would evaluate
already at compile time).

> > > +             die("bad initialize: '%s'", xstrndup(pkt_buf, size));
> > > +
> > > +     strbuf_reset(&sb);
> > > +     strbuf_addf(&sb, "version=%d", version);
>
> Thanks for a very detailed review and great suggestions!

Thank you for your contribution that is very much relevant to my
interests!

Ciao,
Dscho
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 04d0fd1fe6..7cfcf3a911 100644
--- a/Makefile
+++ b/Makefile
@@ -764,6 +764,7 @@  TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
+TEST_BUILTINS_OBJS += test-rot13-filter.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
 TEST_BUILTINS_OBJS += test-run-command.o
diff --git a/pkt-line.c b/pkt-line.c
index 8e43c2def4..4425bdae36 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -309,9 +309,10 @@  int write_packetized_from_fd_no_flush(int fd_in, int fd_out)
 	return err;
 }
 
-int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out)
+int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
+					     int fd_out, int *count_ptr)
 {
-	int err = 0;
+	int err = 0, count = 0;
 	size_t bytes_written = 0;
 	size_t bytes_to_write;
 
@@ -324,10 +325,18 @@  int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_ou
 			break;
 		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
 		bytes_written += bytes_to_write;
+		count++;
 	}
+	if (count_ptr)
+		*count_ptr = count;
 	return err;
 }
 
+int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out)
+{
+	return write_packetized_from_buf_no_flush_count(src_in, len, fd_out, NULL);
+}
+
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 			   void *dst, unsigned size, int options)
 {
diff --git a/pkt-line.h b/pkt-line.h
index 6d2a63db23..43986c525c 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -33,6 +33,8 @@  int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int write_packetized_from_fd_no_flush(int fd_in, int fd_out);
 int write_packetized_from_buf_no_flush(const char *src_in, size_t len, int fd_out);
+int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
+					     int fd_out, int *count_ptr);
 
 /*
  * Stdio versions of packet_write functions. When mixing these with fd
diff --git a/t/helper/test-rot13-filter.c b/t/helper/test-rot13-filter.c
new file mode 100644
index 0000000000..536111f272
--- /dev/null
+++ b/t/helper/test-rot13-filter.c
@@ -0,0 +1,393 @@ 
+/*
+ * Example implementation for the Git filter protocol version 2
+ * See Documentation/gitattributes.txt, section "Filter Protocol"
+ *
+ * Usage: test-tool rot13-filter [--always-delay] <log path> <capabilities>
+ *
+ * Log path defines a debug log file that the script writes to. The
+ * subsequent arguments define a list of supported protocol capabilities
+ * ("clean", "smudge", etc).
+ *
+ * When --always-delay is given all pathnames with the "can-delay" flag
+ * that don't appear on the list bellow are delayed with a count of 1
+ * (see more below).
+ *
+ * This implementation supports special test cases:
+ * (1) If data with the pathname "clean-write-fail.r" is processed with
+ *     a "clean" operation then the write operation will die.
+ * (2) If data with the pathname "smudge-write-fail.r" is processed with
+ *     a "smudge" operation then the write operation will die.
+ * (3) If data with the pathname "error.r" is processed with any
+ *     operation then the filter signals that it cannot or does not want
+ *     to process the file.
+ * (4) If data with the pathname "abort.r" is processed with any
+ *     operation then the filter signals that it cannot or does not want
+ *     to process the file and any file after that is processed with the
+ *     same command.
+ * (5) If data with a pathname that is a key in the delay hash is
+ *     requested (e.g. "test-delay10.a") then the filter responds with
+ *     a "delay" status and sets the "requested" field in the delay hash.
+ *     The filter will signal the availability of this object after
+ *     "count" (field in delay hash) "list_available_blobs" commands.
+ * (6) If data with the pathname "missing-delay.a" is processed that the
+ *     filter will drop the path from the "list_available_blobs" response.
+ * (7) If data with the pathname "invalid-delay.a" is processed that the
+ *     filter will add the path "unfiltered" which was not delayed before
+ *     to the "list_available_blobs" response.
+ */
+
+#include "test-tool.h"
+#include "pkt-line.h"
+#include "string-list.h"
+#include "strmap.h"
+
+static FILE *logfile;
+static int always_delay;
+static struct strmap delay = STRMAP_INIT;
+static struct string_list requested_caps = STRING_LIST_INIT_NODUP;
+
+static int has_capability(const char *cap)
+{
+	return unsorted_string_list_has_string(&requested_caps, cap);
+}
+
+static char *rot13(char *str)
+{
+	char *c;
+	for (c = str; *c; c++) {
+		if (*c >= 'a' && *c <= 'z')
+			*c = 'a' + (*c - 'a' + 13) % 26;
+		else if (*c >= 'A' && *c <= 'Z')
+			*c = 'A' + (*c - 'A' + 13) % 26;
+	}
+	return str;
+}
+
+static char *skip_key_dup(const char *buf, size_t size, const char *key)
+{
+	struct strbuf keybuf = STRBUF_INIT;
+	strbuf_addf(&keybuf, "%s=", key);
+	if (!skip_prefix_mem(buf, size, keybuf.buf, &buf, &size) || !size)
+		die("bad %s: '%s'", key, xstrndup(buf, size));
+	strbuf_release(&keybuf);
+	return xstrndup(buf, size);
+}
+
+/*
+ * Read a text packet, expecting that it is in the form "key=value" for
+ * the given key. An EOF does not trigger any error and is reported
+ * back to the caller with NULL. Die if the "key" part of "key=value" does
+ * not match the given key, or the value part is empty.
+ */
+static char *packet_key_val_read(const char *key)
+{
+	int size;
+	char *buf;
+	if (packet_read_line_gently(0, &size, &buf) < 0)
+		return NULL;
+	return skip_key_dup(buf, size, key);
+}
+
+static void packet_read_capabilities(struct string_list *caps)
+{
+	while (1) {
+		int size;
+		char *buf = packet_read_line(0, &size);
+		if (!buf)
+			break;
+		string_list_append_nodup(caps,
+					 skip_key_dup(buf, size, "capability"));
+	}
+}
+
+/* Read remote capabilities and check them against capabilities we require */
+static void packet_read_and_check_capabilities(struct string_list *remote_caps,
+					       struct string_list *required_caps)
+{
+	struct string_list_item *item;
+	packet_read_capabilities(remote_caps);
+	for_each_string_list_item(item, required_caps) {
+		if (!unsorted_string_list_has_string(remote_caps, item->string)) {
+			die("required '%s' capability not available from remote",
+			    item->string);
+		}
+	}
+}
+
+/*
+ * Check our capabilities we want to advertise against the remote ones
+ * and then advertise our capabilities
+ */
+static void packet_check_and_write_capabilities(struct string_list *remote_caps,
+						struct string_list *our_caps)
+{
+	struct string_list_item *item;
+	for_each_string_list_item(item, our_caps) {
+		if (!unsorted_string_list_has_string(remote_caps, item->string)) {
+			die("our capability '%s' is not available from remote",
+			    item->string);
+		}
+		packet_write_fmt(1, "capability=%s\n", item->string);
+	}
+	packet_flush(1);
+}
+
+struct delay_entry {
+	int requested, count;
+	char *output;
+};
+
+static void command_loop(void)
+{
+	while (1) {
+		char *command = packet_key_val_read("command");
+		if (!command) {
+			fprintf(logfile, "STOP\n");
+			break;
+		}
+		fprintf(logfile, "IN: %s", command);
+
+		if (!strcmp(command, "list_available_blobs")) {
+			struct hashmap_iter iter;
+			struct strmap_entry *ent;
+			struct string_list_item *str_item;
+			struct string_list paths = STRING_LIST_INIT_NODUP;
+
+			/* flush */
+			if (packet_read_line(0, NULL))
+				die("bad list_available_blobs end");
+
+			strmap_for_each_entry(&delay, &iter, ent) {
+				struct delay_entry *delay_entry = ent->value;
+				if (!delay_entry->requested)
+					continue;
+				delay_entry->count--;
+				if (!strcmp(ent->key, "invalid-delay.a")) {
+					/* Send Git a pathname that was not delayed earlier */
+					packet_write_fmt(1, "pathname=unfiltered");
+				}
+				if (!strcmp(ent->key, "missing-delay.a")) {
+					/* Do not signal Git that this file is available */
+				} else if (!delay_entry->count) {
+					string_list_insert(&paths, ent->key);
+					packet_write_fmt(1, "pathname=%s", ent->key);
+				}
+			}
+
+			/* Print paths in sorted order. */
+			for_each_string_list_item(str_item, &paths)
+				fprintf(logfile, " %s", str_item->string);
+			string_list_clear(&paths, 0);
+
+			packet_flush(1);
+
+			fprintf(logfile, " [OK]\n");
+			packet_write_fmt(1, "status=success");
+			packet_flush(1);
+		} else {
+			char *buf, *output;
+			int size;
+			char *pathname;
+			struct delay_entry *entry;
+			struct strbuf input = STRBUF_INIT;
+
+			pathname = packet_key_val_read("pathname");
+			if (!pathname)
+				die("unexpected EOF while expecting pathname");
+			fprintf(logfile, " %s", pathname);
+
+			/* Read until flush */
+			buf = packet_read_line(0, &size);
+			while (buf) {
+				if (!strcmp(buf, "can-delay=1")) {
+					entry = strmap_get(&delay, pathname);
+					if (entry && !entry->requested) {
+						entry->requested = 1;
+					} else if (!entry && always_delay) {
+						entry = xcalloc(1, sizeof(*entry));
+						entry->requested = 1;
+						entry->count = 1;
+						strmap_put(&delay, pathname, entry);
+					}
+				} else if (starts_with(buf, "ref=") ||
+					   starts_with(buf, "treeish=") ||
+					   starts_with(buf, "blob=")) {
+					fprintf(logfile, " %s", buf);
+				} else {
+					/*
+					 * In general, filters need to be graceful about
+					 * new metadata, since it's documented that we
+					 * can pass any key-value pairs, but for tests,
+					 * let's be a little stricter.
+					 */
+					die("Unknown message '%s'", buf);
+				}
+				buf = packet_read_line(0, &size);
+			}
+
+
+			read_packetized_to_strbuf(0, &input, 0);
+			fprintf(logfile, " %"PRIuMAX" [OK] -- ", (uintmax_t)input.len);
+
+			entry = strmap_get(&delay, pathname);
+			if (entry && entry->output) {
+				output = entry->output;
+			} else if (!strcmp(pathname, "error.r") || !strcmp(pathname, "abort.r")) {
+				output = "";
+			} else if (!strcmp(command, "clean") && has_capability("clean")) {
+				output = rot13(input.buf);
+			} else if (!strcmp(command, "smudge") && has_capability("smudge")) {
+				output = rot13(input.buf);
+			} else {
+				die("bad command '%s'", command);
+			}
+
+			if (!strcmp(pathname, "error.r")) {
+				fprintf(logfile, "[ERROR]\n");
+				packet_write_fmt(1, "status=error");
+				packet_flush(1);
+			} else if (!strcmp(pathname, "abort.r")) {
+				fprintf(logfile, "[ABORT]\n");
+				packet_write_fmt(1, "status=abort");
+				packet_flush(1);
+			} else if (!strcmp(command, "smudge") &&
+				   (entry = strmap_get(&delay, pathname)) &&
+				   entry->requested == 1) {
+				fprintf(logfile, "[DELAYED]\n");
+				packet_write_fmt(1, "status=delayed");
+				packet_flush(1);
+				entry->requested = 2;
+				entry->output = xstrdup(output);
+			} else {
+				int i, nr_packets;
+				size_t output_len;
+				struct strbuf sb = STRBUF_INIT;
+				packet_write_fmt(1, "status=success");
+				packet_flush(1);
+
+				strbuf_addf(&sb, "%s-write-fail.r", command);
+				if (!strcmp(pathname, sb.buf)) {
+					fprintf(logfile, "[WRITE FAIL]\n");
+					die("%s write error", command);
+				}
+
+				output_len = strlen(output);
+				fprintf(logfile, "OUT: %"PRIuMAX" ", (uintmax_t)output_len);
+
+				if (write_packetized_from_buf_no_flush_count(output,
+					output_len, 1, &nr_packets))
+					die("failed to write buffer to stdout");
+				packet_flush(1);
+
+				for (i = 0; i < nr_packets; i++)
+					fprintf(logfile, ".");
+				fprintf(logfile, " [OK]\n");
+
+				packet_flush(1);
+				strbuf_release(&sb);
+			}
+			free(pathname);
+			strbuf_release(&input);
+		}
+		free(command);
+	}
+}
+
+static void free_delay_hash(void)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *ent;
+
+	strmap_for_each_entry(&delay, &iter, ent) {
+		struct delay_entry *delay_entry = ent->value;
+		free(delay_entry->output);
+		free(delay_entry);
+	}
+	strmap_clear(&delay, 0);
+}
+
+static void add_delay_entry(char *pathname, int count)
+{
+	struct delay_entry *entry = xcalloc(1, sizeof(*entry));
+	entry->count = count;
+	if (strmap_put(&delay, pathname, entry))
+		BUG("adding the same path twice to delay hash?");
+}
+
+static void packet_initialize(const char *name, int version)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int size;
+	char *pkt_buf = packet_read_line(0, &size);
+
+	strbuf_addf(&sb, "%s-client", name);
+	if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))
+		die("bad initialize: '%s'", xstrndup(pkt_buf, size));
+
+	strbuf_reset(&sb);
+	strbuf_addf(&sb, "version=%d", version);
+	pkt_buf = packet_read_line(0, &size);
+	if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))
+		die("bad version: '%s'", xstrndup(pkt_buf, size));
+
+	pkt_buf = packet_read_line(0, &size);
+	if (pkt_buf)
+		die("bad version end: '%s'", xstrndup(pkt_buf, size));
+
+	packet_write_fmt(1, "%s-server", name);
+	packet_write_fmt(1, "version=%d", version);
+	packet_flush(1);
+	strbuf_release(&sb);
+}
+
+static char *rot13_usage = "test-tool rot13-filter [--always-delay] <log path> <capabilities>";
+
+int cmd__rot13_filter(int argc, const char **argv)
+{
+	int i = 1;
+	struct string_list remote_caps = STRING_LIST_INIT_DUP,
+			   supported_caps = STRING_LIST_INIT_NODUP;
+
+	string_list_append(&supported_caps, "clean");
+	string_list_append(&supported_caps, "smudge");
+	string_list_append(&supported_caps, "delay");
+
+	if (argc > 1 && !strcmp(argv[i], "--always-delay")) {
+		always_delay = 1;
+		i++;
+	}
+	if (argc - i < 2)
+		usage(rot13_usage);
+
+	logfile = fopen(argv[i++], "a");
+	if (!logfile)
+		die_errno("failed to open log file");
+
+	for ( ; i < argc; i++)
+		string_list_append(&requested_caps, argv[i]);
+
+	add_delay_entry("test-delay10.a", 1);
+	add_delay_entry("test-delay11.a", 1);
+	add_delay_entry("test-delay20.a", 2);
+	add_delay_entry("test-delay10.b", 1);
+	add_delay_entry("missing-delay.a", 1);
+	add_delay_entry("invalid-delay.a", 1);
+
+	fprintf(logfile, "START\n");
+
+	packet_initialize("git-filter", 2);
+
+	packet_read_and_check_capabilities(&remote_caps, &supported_caps);
+	packet_check_and_write_capabilities(&remote_caps, &requested_caps);
+	fprintf(logfile, "init handshake complete\n");
+
+	string_list_clear(&supported_caps, 0);
+	string_list_clear(&remote_caps, 0);
+
+	command_loop();
+
+	fclose(logfile);
+	string_list_clear(&requested_caps, 0);
+	free_delay_hash();
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c..d6a560f832 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -65,6 +65,7 @@  static struct test_cmd cmds[] = {
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
 	{ "reftable", cmd__reftable },
+	{ "rot13-filter", cmd__rot13_filter },
 	{ "dump-reftable", cmd__dump_reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb79927163..21a91b1019 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -54,6 +54,7 @@  int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__rot13_filter(int argc, const char **argv);
 int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 1c840348bd..aeaa8e02ed 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -17,9 +17,6 @@  tr \
   'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
 EOF
 
-write_script rot13-filter.pl "$PERL_PATH" \
-	<"$TEST_DIRECTORY"/t0021/rot13-filter.pl
-
 generate_random_characters () {
 	LEN=$1
 	NAME=$2
@@ -365,8 +362,8 @@  test_expect_success 'diff does not reuse worktree files that need cleaning' '
 	test_line_count = 0 count
 '
 
-test_expect_success PERL 'required process filter should filter data' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'required process filter should filter data' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -450,8 +447,8 @@  test_expect_success PERL 'required process filter should filter data' '
 	)
 '
 
-test_expect_success PERL 'required process filter should filter data for various subcommands' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'required process filter should filter data for various subcommands' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	(
 		cd repo &&
@@ -561,9 +558,9 @@  test_expect_success PERL 'required process filter should filter data for various
 	)
 '
 
-test_expect_success PERL 'required process filter takes precedence' '
+test_expect_success 'required process filter takes precedence' '
 	test_config_global filter.protocol.clean false &&
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -587,8 +584,8 @@  test_expect_success PERL 'required process filter takes precedence' '
 	)
 '
 
-test_expect_success PERL 'required process filter should be used only for "clean" operation only' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
+test_expect_success 'required process filter should be used only for "clean" operation only' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -622,8 +619,8 @@  test_expect_success PERL 'required process filter should be used only for "clean
 	)
 '
 
-test_expect_success PERL 'required process filter should process multiple packets' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'required process filter should process multiple packets' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 
 	rm -rf repo &&
@@ -687,8 +684,8 @@  test_expect_success PERL 'required process filter should process multiple packet
 	)
 '
 
-test_expect_success PERL 'required process filter with clean error should fail' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'required process filter with clean error should fail' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -706,8 +703,8 @@  test_expect_success PERL 'required process filter with clean error should fail'
 	)
 '
 
-test_expect_success PERL 'process filter should restart after unexpected write failure' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'process filter should restart after unexpected write failure' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -735,7 +732,7 @@  test_expect_success PERL 'process filter should restart after unexpected write f
 		rm -f debug.log &&
 		git checkout --quiet --no-progress . 2>git-stderr.log &&
 
-		grep "smudge write error at" git-stderr.log &&
+		grep "smudge write error" git-stderr.log &&
 		test_i18ngrep "error: external filter" git-stderr.log &&
 
 		cat >expected.log <<-EOF &&
@@ -761,8 +758,8 @@  test_expect_success PERL 'process filter should restart after unexpected write f
 	)
 '
 
-test_expect_success PERL 'process filter should not be restarted if it signals an error' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'process filter should not be restarted if it signals an error' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -804,8 +801,8 @@  test_expect_success PERL 'process filter should not be restarted if it signals a
 	)
 '
 
-test_expect_success PERL 'process filter abort stops processing of all further files' '
-	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
+test_expect_success 'process filter abort stops processing of all further files' '
+	test_config_global filter.protocol.process "test-tool rot13-filter debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -861,10 +858,10 @@  test_expect_success PERL 'invalid process filter must fail (and not hang!)' '
 	)
 '
 
-test_expect_success PERL 'delayed checkout in process filter' '
-	test_config_global filter.a.process "rot13-filter.pl a.log clean smudge delay" &&
+test_expect_success 'delayed checkout in process filter' '
+	test_config_global filter.a.process "test-tool rot13-filter a.log clean smudge delay" &&
 	test_config_global filter.a.required true &&
-	test_config_global filter.b.process "rot13-filter.pl b.log clean smudge delay" &&
+	test_config_global filter.b.process "test-tool rot13-filter b.log clean smudge delay" &&
 	test_config_global filter.b.required true &&
 
 	rm -rf repo &&
@@ -940,8 +937,8 @@  test_expect_success PERL 'delayed checkout in process filter' '
 	)
 '
 
-test_expect_success PERL 'missing file in delayed checkout' '
-	test_config_global filter.bug.process "rot13-filter.pl bug.log clean smudge delay" &&
+test_expect_success 'missing file in delayed checkout' '
+	test_config_global filter.bug.process "test-tool rot13-filter bug.log clean smudge delay" &&
 	test_config_global filter.bug.required true &&
 
 	rm -rf repo &&
@@ -960,8 +957,8 @@  test_expect_success PERL 'missing file in delayed checkout' '
 	grep "error: .missing-delay\.a. was not filtered properly" git-stderr.log
 '
 
-test_expect_success PERL 'invalid file in delayed checkout' '
-	test_config_global filter.bug.process "rot13-filter.pl bug.log clean smudge delay" &&
+test_expect_success 'invalid file in delayed checkout' '
+	test_config_global filter.bug.process "test-tool rot13-filter bug.log clean smudge delay" &&
 	test_config_global filter.bug.required true &&
 
 	rm -rf repo &&
@@ -990,10 +987,10 @@  do
 		mode_prereq='UTF8_NFD_TO_NFC' ;;
 	esac
 
-	test_expect_success PERL,SYMLINKS,$mode_prereq \
+	test_expect_success SYMLINKS,$mode_prereq \
 	"delayed checkout with $mode-collision don't write to the wrong place" '
 		test_config_global filter.delay.process \
-			"\"$TEST_ROOT/rot13-filter.pl\" --always-delay delayed.log clean smudge delay" &&
+			"test-tool rot13-filter --always-delay delayed.log clean smudge delay" &&
 		test_config_global filter.delay.required true &&
 
 		git init $mode-collision &&
@@ -1026,12 +1023,12 @@  do
 	'
 done
 
-test_expect_success PERL,SYMLINKS,CASE_INSENSITIVE_FS \
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS \
 "delayed checkout with submodule collision don't write to the wrong place" '
 	git init collision-with-submodule &&
 	(
 		cd collision-with-submodule &&
-		git config filter.delay.process "\"$TEST_ROOT/rot13-filter.pl\" --always-delay delayed.log clean smudge delay" &&
+		git config filter.delay.process "test-tool rot13-filter --always-delay delayed.log clean smudge delay" &&
 		git config filter.delay.required true &&
 
 		# We need Git to treat the submodule "a" and the
@@ -1062,11 +1059,11 @@  test_expect_success PERL,SYMLINKS,CASE_INSENSITIVE_FS \
 	)
 '
 
-test_expect_success PERL 'setup for progress tests' '
+test_expect_success 'setup for progress tests' '
 	git init progress &&
 	(
 		cd progress &&
-		git config filter.delay.process "rot13-filter.pl delay-progress.log clean smudge delay" &&
+		git config filter.delay.process "test-tool rot13-filter delay-progress.log clean smudge delay" &&
 		git config filter.delay.required true &&
 
 		echo "*.a filter=delay" >.gitattributes &&
@@ -1132,12 +1129,12 @@  do
 	'
 done
 
-test_expect_success PERL 'delayed checkout correctly reports the number of updated entries' '
+test_expect_success 'delayed checkout correctly reports the number of updated entries' '
 	rm -rf repo &&
 	git init repo &&
 	(
 		cd repo &&
-		git config filter.delay.process "../rot13-filter.pl delayed.log clean smudge delay" &&
+		git config filter.delay.process "test-tool rot13-filter delayed.log clean smudge delay" &&
 		git config filter.delay.required true &&
 
 		echo "*.a filter=delay" >.gitattributes &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
deleted file mode 100644
index 7bb93768f3..0000000000
--- a/t/t0021/rot13-filter.pl
+++ /dev/null
@@ -1,247 +0,0 @@ 
-#
-# Example implementation for the Git filter protocol version 2
-# See Documentation/gitattributes.txt, section "Filter Protocol"
-#
-# Usage: rot13-filter.pl [--always-delay] <log path> <capabilities>
-#
-# Log path defines a debug log file that the script writes to. The
-# subsequent arguments define a list of supported protocol capabilities
-# ("clean", "smudge", etc).
-#
-# When --always-delay is given all pathnames with the "can-delay" flag
-# that don't appear on the list bellow are delayed with a count of 1
-# (see more below).
-#
-# This implementation supports special test cases:
-# (1) If data with the pathname "clean-write-fail.r" is processed with
-#     a "clean" operation then the write operation will die.
-# (2) If data with the pathname "smudge-write-fail.r" is processed with
-#     a "smudge" operation then the write operation will die.
-# (3) If data with the pathname "error.r" is processed with any
-#     operation then the filter signals that it cannot or does not want
-#     to process the file.
-# (4) If data with the pathname "abort.r" is processed with any
-#     operation then the filter signals that it cannot or does not want
-#     to process the file and any file after that is processed with the
-#     same command.
-# (5) If data with a pathname that is a key in the DELAY hash is
-#     requested (e.g. "test-delay10.a") then the filter responds with
-#     a "delay" status and sets the "requested" field in the DELAY hash.
-#     The filter will signal the availability of this object after
-#     "count" (field in DELAY hash) "list_available_blobs" commands.
-# (6) If data with the pathname "missing-delay.a" is processed that the
-#     filter will drop the path from the "list_available_blobs" response.
-# (7) If data with the pathname "invalid-delay.a" is processed that the
-#     filter will add the path "unfiltered" which was not delayed before
-#     to the "list_available_blobs" response.
-#
-
-use 5.008;
-sub gitperllib {
-	# Git assumes that all path lists are Unix-y colon-separated ones. But
-	# when the Git for Windows executes the test suite, its MSYS2 Bash
-	# calls git.exe, and colon-separated path lists are converted into
-	# Windows-y semicolon-separated lists of *Windows* paths (which
-	# naturally contain a colon after the drive letter, so splitting by
-	# colons simply does not cut it).
-	#
-	# Detect semicolon-separated path list and handle them appropriately.
-
-	if ($ENV{GITPERLLIB} =~ /;/) {
-		return split(/;/, $ENV{GITPERLLIB});
-	}
-	return split(/:/, $ENV{GITPERLLIB});
-}
-use lib (gitperllib());
-use strict;
-use warnings;
-use IO::File;
-use Git::Packet;
-
-my $MAX_PACKET_CONTENT_SIZE = 65516;
-
-my $always_delay = 0;
-if ( $ARGV[0] eq '--always-delay' ) {
-	$always_delay = 1;
-	shift @ARGV;
-}
-
-my $log_file                = shift @ARGV;
-my @capabilities            = @ARGV;
-
-open my $debug, ">>", $log_file or die "cannot open log file: $!";
-
-my %DELAY = (
-	'test-delay10.a' => { "requested" => 0, "count" => 1 },
-	'test-delay11.a' => { "requested" => 0, "count" => 1 },
-	'test-delay20.a' => { "requested" => 0, "count" => 2 },
-	'test-delay10.b' => { "requested" => 0, "count" => 1 },
-	'missing-delay.a' => { "requested" => 0, "count" => 1 },
-	'invalid-delay.a' => { "requested" => 0, "count" => 1 },
-);
-
-sub rot13 {
-	my $str = shift;
-	$str =~ y/A-Za-z/N-ZA-Mn-za-m/;
-	return $str;
-}
-
-print $debug "START\n";
-$debug->flush();
-
-packet_initialize("git-filter", 2);
-
-my %remote_caps = packet_read_and_check_capabilities("clean", "smudge", "delay");
-packet_check_and_write_capabilities(\%remote_caps, @capabilities);
-
-print $debug "init handshake complete\n";
-$debug->flush();
-
-while (1) {
-	my ( $res, $command ) = packet_key_val_read("command");
-	if ( $res == -1 ) {
-		print $debug "STOP\n";
-		exit();
-	}
-	print $debug "IN: $command";
-	$debug->flush();
-
-	if ( $command eq "list_available_blobs" ) {
-		# Flush
-		packet_compare_lists([1, ""], packet_bin_read()) ||
-			die "bad list_available_blobs end";
-
-		foreach my $pathname ( sort keys %DELAY ) {
-			if ( $DELAY{$pathname}{"requested"} >= 1 ) {
-				$DELAY{$pathname}{"count"} = $DELAY{$pathname}{"count"} - 1;
-				if ( $pathname eq "invalid-delay.a" ) {
-					# Send Git a pathname that was not delayed earlier
-					packet_txt_write("pathname=unfiltered");
-				}
-				if ( $pathname eq "missing-delay.a" ) {
-					# Do not signal Git that this file is available
-				} elsif ( $DELAY{$pathname}{"count"} == 0 ) {
-					print $debug " $pathname";
-					packet_txt_write("pathname=$pathname");
-				}
-			}
-		}
-
-		packet_flush();
-
-		print $debug " [OK]\n";
-		$debug->flush();
-		packet_txt_write("status=success");
-		packet_flush();
-	} else {
-		my ( $res, $pathname ) = packet_key_val_read("pathname");
-		if ( $res == -1 ) {
-			die "unexpected EOF while expecting pathname";
-		}
-		print $debug " $pathname";
-		$debug->flush();
-
-		# Read until flush
-		my ( $done, $buffer ) = packet_txt_read();
-		while ( $buffer ne '' ) {
-			if ( $buffer eq "can-delay=1" ) {
-				if ( exists $DELAY{$pathname} and $DELAY{$pathname}{"requested"} == 0 ) {
-					$DELAY{$pathname}{"requested"} = 1;
-				} elsif ( !exists $DELAY{$pathname} and $always_delay ) {
-					$DELAY{$pathname} = { "requested" => 1, "count" => 1 };
-				}
-			} elsif ($buffer =~ /^(ref|treeish|blob)=/) {
-				print $debug " $buffer";
-			} else {
-				# In general, filters need to be graceful about
-				# new metadata, since it's documented that we
-				# can pass any key-value pairs, but for tests,
-				# let's be a little stricter.
-				die "Unknown message '$buffer'";
-			}
-
-			( $done, $buffer ) = packet_txt_read();
-		}
-		if ( $done == -1 ) {
-			die "unexpected EOF after pathname '$pathname'";
-		}
-
-		my $input = "";
-		{
-			binmode(STDIN);
-			my $buffer;
-			my $done = 0;
-			while ( !$done ) {
-				( $done, $buffer ) = packet_bin_read();
-				$input .= $buffer;
-			}
-			if ( $done == -1 ) {
-				die "unexpected EOF while reading input for '$pathname'";
-			}			
-			print $debug " " . length($input) . " [OK] -- ";
-			$debug->flush();
-		}
-
-		my $output;
-		if ( exists $DELAY{$pathname} and exists $DELAY{$pathname}{"output"} ) {
-			$output = $DELAY{$pathname}{"output"}
-		} elsif ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
-			$output = "";
-		} elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
-			$output = rot13($input);
-		} elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
-			$output = rot13($input);
-		} else {
-			die "bad command '$command'";
-		}
-
-		if ( $pathname eq "error.r" ) {
-			print $debug "[ERROR]\n";
-			$debug->flush();
-			packet_txt_write("status=error");
-			packet_flush();
-		} elsif ( $pathname eq "abort.r" ) {
-			print $debug "[ABORT]\n";
-			$debug->flush();
-			packet_txt_write("status=abort");
-			packet_flush();
-		} elsif ( $command eq "smudge" and
-			exists $DELAY{$pathname} and
-			$DELAY{$pathname}{"requested"} == 1 ) {
-			print $debug "[DELAYED]\n";
-			$debug->flush();
-			packet_txt_write("status=delayed");
-			packet_flush();
-			$DELAY{$pathname}{"requested"} = 2;
-			$DELAY{$pathname}{"output"} = $output;
-		} else {
-			packet_txt_write("status=success");
-			packet_flush();
-
-			if ( $pathname eq "${command}-write-fail.r" ) {
-				print $debug "[WRITE FAIL]\n";
-				$debug->flush();
-				die "${command} write error";
-			}
-
-			print $debug "OUT: " . length($output) . " ";
-			$debug->flush();
-
-			while ( length($output) > 0 ) {
-				my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
-				packet_bin_write($packet);
-				# dots represent the number of packets
-				print $debug ".";
-				if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
-					$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
-				} else {
-					$output = "";
-				}
-			}
-			packet_flush();
-			print $debug " [OK]\n";
-			$debug->flush();
-			packet_flush();
-		}
-	}
-}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
index c683e60007..7d956625ca 100755
--- a/t/t2080-parallel-checkout-basics.sh
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -230,12 +230,9 @@  test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading d
 # check the final report including sequential, parallel, and delayed entries
 # all at the same time. So we must have finer control of the parallel checkout
 # variables.
-test_expect_success PERL '"git checkout ." report should not include failed entries' '
-	write_script rot13-filter.pl "$PERL_PATH" \
-		<"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
-
+test_expect_success '"git checkout ." report should not include failed entries' '
 	test_config_global filter.delay.process \
-		"\"$(pwd)/rot13-filter.pl\" --always-delay delayed.log clean smudge delay" &&
+		"test-tool rot13-filter --always-delay delayed.log clean smudge delay" &&
 	test_config_global filter.delay.required true &&
 	test_config_global filter.cat.clean cat  &&
 	test_config_global filter.cat.smudge cat  &&
diff --git a/t/t2082-parallel-checkout-attributes.sh b/t/t2082-parallel-checkout-attributes.sh
index 2525457961..2df55b9405 100755
--- a/t/t2082-parallel-checkout-attributes.sh
+++ b/t/t2082-parallel-checkout-attributes.sh
@@ -138,12 +138,9 @@  test_expect_success 'parallel-checkout and external filter' '
 # The delayed queue is independent from the parallel queue, and they should be
 # able to work together in the same checkout process.
 #
-test_expect_success PERL 'parallel-checkout and delayed checkout' '
-	write_script rot13-filter.pl "$PERL_PATH" \
-		<"$TEST_DIRECTORY"/t0021/rot13-filter.pl &&
-
+test_expect_success 'parallel-checkout and delayed checkout' '
 	test_config_global filter.delay.process \
-		"\"$(pwd)/rot13-filter.pl\" --always-delay \"$(pwd)/delayed.log\" clean smudge delay" &&
+		"test-tool rot13-filter --always-delay \"$(pwd)/delayed.log\" clean smudge delay" &&
 	test_config_global filter.delay.required true &&
 
 	echo "abcd" >original &&