diff mbox series

[v2,3/3] builtin/diff-pairs: allow explicit diff queue flush

Message ID 20250212041825.2455031-4-jltobler@gmail.com (mailing list archive)
State New
Series batch blob diff generation

Commit Message

Justin Tobler Feb. 12, 2025, 4:18 a.m. UTC
The diffs queued from git-diff-pairs(1) stdin are not flushed until EOF
is reached. To enable greater flexibility, allow control over when the
diff queue is flushed by writing a single NUL byte on stdin between input
file pairs. Diff output between flushes is separated by a single line
terminator.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-diff-pairs.adoc |  4 ++++
 builtin/diff-pairs.c              | 11 +++++++++++
 t/t4070-diff-pairs.sh             | 22 ++++++++++++++++++++++
 3 files changed, 37 insertions(+)

Comments

Patrick Steinhardt Feb. 12, 2025, 9:23 a.m. UTC | #1
On Tue, Feb 11, 2025 at 10:18:25PM -0600, Justin Tobler wrote:
> The diffs queued from git-diff-pairs(1) stdin are not flushed EOF is

I think you meant to say "are flushed when stdin is closed" or something
like that.

> reached. To enable greater flexibility, allow control over when the diff
> queue is flushed by writing a single nul byte on stdin between input

s/nul/NUL/

> file pairs. Diff output between flushes is separated by a single line
> terminator.
> 
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  Documentation/git-diff-pairs.adoc |  4 ++++
>  builtin/diff-pairs.c              | 11 +++++++++++
>  t/t4070-diff-pairs.sh             | 22 ++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/Documentation/git-diff-pairs.adoc b/Documentation/git-diff-pairs.adoc
> index e9ef4a6615..33c0d702f0 100644
> --- a/Documentation/git-diff-pairs.adoc
> +++ b/Documentation/git-diff-pairs.adoc
> @@ -32,6 +32,10 @@ compute diffs progressively over the course of multiple invocations of
>  Each blob pair is fed to the diff machinery individually queued and the output
>  is flushed on stdin EOF.
>  
> +To explicitly flush the diff queue, a single nul byte can be written to stdin
> +between filepairs. Diff output between flushes is separated by a single line
> +terminator.

The same comment as for the previous patch applies here: I think we
should refrain from using jargon like "flushing", "diff queue" or
"filepairs". These are internal implementation details that the user
shouldn't need to worry about. Instead, we should be talking about the
user-visible effects.

> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> index 08f3ee81e5..2436ce3013 100644
> --- a/builtin/diff-pairs.c
> +++ b/builtin/diff-pairs.c
> @@ -99,6 +99,17 @@ int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
>  			break;
>  
>  		p = meta.buf;
> +		if (!*p) {
> +			flush_diff_queue(&revs.diffopt);
> +			/*
> +			 * When the diff queue is explicitly flushed, append an
> +			 * additional terminator to separate batches of diffs.
> +			 */
> +			fprintf(revs.diffopt.file, "%c",
> +				revs.diffopt.line_termination);

You can use `fputc(revs.diffopt.line_termination, revs.diffopt.file)`
instead.
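
A minimal standalone sketch of the two forms, to show that they write the
same single byte, including when the terminator is NUL. The helper names
are hypothetical and not taken from git; `term` stands in for
revs.diffopt.line_termination.

```c
#include <stdio.h>

/*
 * Hypothetical helpers, not taken from git. `term` stands in for
 * revs.diffopt.line_termination ('\n' by default, '\0' under -z).
 */

/* The form used in the patch: format a single byte with "%c". */
static int terminator_with_fprintf(FILE *out, int term)
{
	return fprintf(out, "%c", term) == 1 ? 0 : -1;
}

/* The suggested form: write the byte directly, no format parsing. */
static int terminator_with_fputc(FILE *out, int term)
{
	return fputc(term, out) == EOF ? -1 : 0;
}
```

Both return 0 on success and -1 on a write error, so either could be
dropped in; fputc() simply says what is meant.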

> diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
> index e0a8e6f0a0..aca228a8fa 100755
> --- a/t/t4070-diff-pairs.sh
> +++ b/t/t4070-diff-pairs.sh
> @@ -77,4 +77,26 @@ test_expect_success 'split input across multiple diff-pairs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'diff-pairs explicit queue flush' '
> +	git diff-tree -r -M -C -C -z base new >input &&
> +	printf "\0" >>input &&
> +	git diff-tree -r -M -C -C -z base new >>input &&
> +
> +	git diff-tree -r -M -C -C base new >expect &&
> +	printf "\n" >>expect &&
> +	git diff-tree -r -M -C -C base new >>expect &&
> +
> +	git diff-pairs <input >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'diff-pairs explicit queue flush null terminated' '

s/null/NUL/

> +	git diff-tree -r -M -C -C -z base new >expect &&
> +	printf "\0" >>expect &&
> +	git diff-tree -r -M -C -C -z base new >>expect &&
> +
> +	git diff-pairs -z <expect >actual &&
> +	test_cmp expect actual
> +'
> +

Patrick
Phillip Wood Feb. 17, 2025, 2:38 p.m. UTC | #2
Hi Justin

On 12/02/2025 04:18, Justin Tobler wrote:
> The diffs queued from git-diff-pairs(1) stdin are not flushed EOF is
> reached. To enable greater flexibility, allow control over when the diff
> queue is flushed by writing a single nul byte on stdin between input
> file pairs. Diff output between flushes is separated by a single line
> terminator.

I agree with the comments others have made about the documentation. I 
also have some comments on the implementation below.

> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> index 08f3ee81e5..2436ce3013 100644
> --- a/builtin/diff-pairs.c
> +++ b/builtin/diff-pairs.c
> @@ -99,6 +99,17 @@ int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
>   			break;
>   
>   		p = meta.buf;
> +		if (!*p) {
> +			flush_diff_queue(&revs.diffopt);
> +			/*
> +			 * When the diff queue is explicitly flushed, append an
> +			 * additional terminator to separate batches of diffs.
> +			 */
> +			fprintf(revs.diffopt.file, "%c",
> +				revs.diffopt.line_termination);

As the user has requested an explicit flush we should call 
fflush(stdout) here to avoid deadlocking a caller that is waiting to 
read the terminator before writing the next batch of input. Ideally the 
tests would check that the output is flushed but I think that is quite 
hard to do with our test framework.
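
Roughly what I have in mind, as a standalone sketch rather than a patch
against builtin/diff-pairs.c. The helper name end_batch() is made up for
illustration; `out` stands in for revs.diffopt.file and `term` for the
terminator byte.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of git: finish a batch by writing the
 * terminator byte and then flushing the stdio buffer, so a reader on
 * the other end of a pipe sees the terminator without waiting for the
 * buffer to fill or for the process to exit.
 */
static int end_batch(FILE *out, int term)
{
	if (fputc(term, out) == EOF)
		return -1;
	/* Without this, the terminator may sit in the buffer indefinitely. */
	return fflush(out) == EOF ? -1 : 0;
}
```

When stdout is a pipe it is normally fully buffered, which is why the
explicit fflush() matters for a caller that alternates between writing a
batch and reading the result.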

I think it would be easier for callers to parse the output if we always 
printed NUL here. Programming languages generally have a function that 
allows you to read all the input until a specific byte is seen. If 
flushing always used a NUL terminator the caller could use their 
equivalent of read_until(b'\0') to hoover up the output (using '-z' to 
do this would change the output of --numstat and embed a NUL between any 
stat data and the patch). Using a newline as the terminator here means 
the caller needs to look for "\n\n". That string occurs in the output 
between the stat data and the patch and can also occur in the patch 
hunks if diff.suppressBlankEmpty is set.
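
To illustrate the caller side, here is a minimal sketch in plain C,
assuming the terminator were always NUL. read_until() is a hypothetical
helper written for this example, not an existing function in git or libc.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical caller-side helper: read bytes from `in` up to, but not
 * including, the next `delim` byte. Returns a malloc'd, NUL-terminated
 * buffer and stores its length in *len. Returns NULL if EOF is reached
 * before a delimiter is seen (incomplete batch) or if allocation fails.
 */
static char *read_until(FILE *in, int delim, size_t *len)
{
	size_t cap = 256, n = 0;
	char *buf = malloc(cap);
	int c;

	if (!buf)
		return NULL;
	while ((c = fgetc(in)) != EOF && c != delim) {
		if (n + 1 == cap) {
			/* Grow, always keeping room for the trailing '\0'. */
			char *tmp = realloc(buf, cap * 2);
			if (!tmp) {
				free(buf);
				return NULL;
			}
			buf = tmp;
			cap *= 2;
		}
		buf[n++] = (char)c;
	}
	if (c == EOF) {
		free(buf);
		return NULL;
	}
	buf[n] = '\0';
	*len = n;
	return buf;
}
```

With a NUL delimiter a batch can contain "\n\n" (for example between the
stat data and the patch) without confusing the caller, because the batch
boundary is a byte that never appears in the non-'-z' diff output.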

Now that we are calling diff_flush() in a loop we need to set .no_free 
in our diff options and call diff_free() at the end of the program (see 
the comment in diff.h).

Best Wishes

Phillip


> +			continue;
> +		}
> +
>   		if (*p != ':')
>   			die("invalid raw diff input");
>   		p++;
> diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
> index e0a8e6f0a0..aca228a8fa 100755
> --- a/t/t4070-diff-pairs.sh
> +++ b/t/t4070-diff-pairs.sh
> @@ -77,4 +77,26 @@ test_expect_success 'split input across multiple diff-pairs' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'diff-pairs explicit queue flush' '
> +	git diff-tree -r -M -C -C -z base new >input &&
> +	printf "\0" >>input &&
> +	git diff-tree -r -M -C -C -z base new >>input &&
> +
> +	git diff-tree -r -M -C -C base new >expect &&
> +	printf "\n" >>expect &&
> +	git diff-tree -r -M -C -C base new >>expect &&
> +
> +	git diff-pairs <input >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'diff-pairs explicit queue flush null terminated' '
> +	git diff-tree -r -M -C -C -z base new >expect &&
> +	printf "\0" >>expect &&
> +	git diff-tree -r -M -C -C -z base new >>expect &&
> +
> +	git diff-pairs -z <expect >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_done

Patch

diff --git a/Documentation/git-diff-pairs.adoc b/Documentation/git-diff-pairs.adoc
index e9ef4a6615..33c0d702f0 100644
--- a/Documentation/git-diff-pairs.adoc
+++ b/Documentation/git-diff-pairs.adoc
@@ -32,6 +32,10 @@  compute diffs progressively over the course of multiple invocations of
 Each blob pair is fed to the diff machinery individually queued and the output
 is flushed on stdin EOF.
 
+To explicitly flush the diff queue, a single nul byte can be written to stdin
+between filepairs. Diff output between flushes is separated by a single line
+terminator.
+
 OPTIONS
 -------
 
diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
index 08f3ee81e5..2436ce3013 100644
--- a/builtin/diff-pairs.c
+++ b/builtin/diff-pairs.c
@@ -99,6 +99,17 @@  int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
 			break;
 
 		p = meta.buf;
+		if (!*p) {
+			flush_diff_queue(&revs.diffopt);
+			/*
+			 * When the diff queue is explicitly flushed, append an
+			 * additional terminator to separate batches of diffs.
+			 */
+			fprintf(revs.diffopt.file, "%c",
+				revs.diffopt.line_termination);
+			continue;
+		}
+
 		if (*p != ':')
 			die("invalid raw diff input");
 		p++;
diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
index e0a8e6f0a0..aca228a8fa 100755
--- a/t/t4070-diff-pairs.sh
+++ b/t/t4070-diff-pairs.sh
@@ -77,4 +77,26 @@  test_expect_success 'split input across multiple diff-pairs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-pairs explicit queue flush' '
+	git diff-tree -r -M -C -C -z base new >input &&
+	printf "\0" >>input &&
+	git diff-tree -r -M -C -C -z base new >>input &&
+
+	git diff-tree -r -M -C -C base new >expect &&
+	printf "\n" >>expect &&
+	git diff-tree -r -M -C -C base new >>expect &&
+
+	git diff-pairs <input >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff-pairs explicit queue flush null terminated' '
+	git diff-tree -r -M -C -C -z base new >expect &&
+	printf "\0" >>expect &&
+	git diff-tree -r -M -C -C -z base new >>expect &&
+
+	git diff-pairs -z <expect >actual &&
+	test_cmp expect actual
+'
+
 test_done