Message ID | 20250212041825.2455031-4-jltobler@gmail.com (mailing list archive) |
---|---|
State | New |
Series | batch blob diff generation |
On Tue, Feb 11, 2025 at 10:18:25PM -0600, Justin Tobler wrote:
> The diffs queued from git-diff-pairs(1) stdin are not flushed EOF is

I think you meant to say "are flushed when stdin is closed" or something
like that.

> reached. To enable greater flexibility, allow control over when the diff
> queue is flushed by writing a single nul byte on stdin between input

s/nul/NUL/

> file pairs. Diff output between flushes is separated by a single line
> terminator.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  Documentation/git-diff-pairs.adoc |  4 ++++
>  builtin/diff-pairs.c              | 11 +++++++++++
>  t/t4070-diff-pairs.sh             | 22 ++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
>
> diff --git a/Documentation/git-diff-pairs.adoc b/Documentation/git-diff-pairs.adoc
> index e9ef4a6615..33c0d702f0 100644
> --- a/Documentation/git-diff-pairs.adoc
> +++ b/Documentation/git-diff-pairs.adoc
> @@ -32,6 +32,10 @@ compute diffs progressively over the course of multiple invocations of
>  Each blob pair is fed to the diff machinery individually queued and the output
>  is flushed on stdin EOF.
>
> +To explicitly flush the diff queue, a single nul byte can be written to stdin
> +between filepairs. Diff output between flushes is separated by a single line
> +terminator.

The same comment as for the previous patch applies here: I think we should
refrain from using jargon like "flushing", "diff queue" or "filepairs".
These are internal implementation details that the user shouldn't need to
worry about. Instead, we should be talking about the user-visible effects.
> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> index 08f3ee81e5..2436ce3013 100644
> --- a/builtin/diff-pairs.c
> +++ b/builtin/diff-pairs.c
> @@ -99,6 +99,17 @@ int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
>  			break;
>
>  		p = meta.buf;
> +		if (!*p) {
> +			flush_diff_queue(&revs.diffopt);
> +			/*
> +			 * When the diff queue is explicitly flushed, append an
> +			 * additional terminator to separate batches of diffs.
> +			 */
> +			fprintf(revs.diffopt.file, "%c",
> +				revs.diffopt.line_termination);

You can use `fputc(revs.diffopt.line_termination, revs.diffopt.file)`
instead.

> diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
> index e0a8e6f0a0..aca228a8fa 100755
> --- a/t/t4070-diff-pairs.sh
> +++ b/t/t4070-diff-pairs.sh
> @@ -77,4 +77,26 @@ test_expect_success 'split input across multiple diff-pairs' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'diff-pairs explicit queue flush' '
> +	git diff-tree -r -M -C -C -z base new >input &&
> +	printf "\0" >>input &&
> +	git diff-tree -r -M -C -C -z base new >>input &&
> +
> +	git diff-tree -r -M -C -C base new >expect &&
> +	printf "\n" >>expect &&
> +	git diff-tree -r -M -C -C base new >>expect &&
> +
> +	git diff-pairs <input >actual &&
> +	test_cmp expect actual
> +'
> +j
> +test_expect_success 'diff-pairs explicit queue flush null terminated' '

s/null/NUL/

> +	git diff-tree -r -M -C -C -z base new >expect &&
> +	printf "\0" >>expect &&
> +	git diff-tree -r -M -C -C -z base new >>expect &&
> +
> +	git diff-pairs -z <expect >actual &&
> +	test_cmp expect actual
> +'
> +

Patrick
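Patrick's `fputc()` suggestion is a like-for-like swap: `fprintf(fp, "%c", c)` and `fputc(c, fp)` both emit exactly one byte, but `fputc()` avoids parsing a format string. The standalone sketch below checks that both produce the same byte for a newline and for NUL (the terminator used under `-z`). The helper names `write_term_printf()`, `write_term_putc()` and `capture()` are made up for illustration, and the sketch assumes a POSIX system for `open_memstream()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write the terminator the way the patch does. */
static void write_term_printf(FILE *out, int term)
{
	fprintf(out, "%c", term);
}

/* Hypothetical helper: write it as suggested in review, one byte, no format. */
static void write_term_putc(FILE *out, int term)
{
	fputc(term, out);
}

/*
 * Run a writer against an in-memory stream and return how many bytes it
 * produced. On return *buf holds those bytes; the caller must free() it.
 */
static size_t capture(void (*writer)(FILE *, int), int term, char **buf)
{
	size_t len = 0;
	FILE *out = open_memstream(buf, &len);

	if (!out)
		return 0;
	writer(out, term);
	fclose(out); /* fclose() publishes the final *buf and len */
	return len;
}
```

Because both calls write the same single byte, the change only affects readability, not the output format.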
Hi Justin

On 12/02/2025 04:18, Justin Tobler wrote:
> The diffs queued from git-diff-pairs(1) stdin are not flushed EOF is
> reached. To enable greater flexibility, allow control over when the diff
> queue is flushed by writing a single nul byte on stdin between input
> file pairs. Diff output between flushes is separated by a single line
> terminator.

I agree with the comments others have made about the documentation. I also
have some comments on the implementation below.

> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> index 08f3ee81e5..2436ce3013 100644
> --- a/builtin/diff-pairs.c
> +++ b/builtin/diff-pairs.c
> @@ -99,6 +99,17 @@ int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
>  			break;
>
>  		p = meta.buf;
> +		if (!*p) {
> +			flush_diff_queue(&revs.diffopt);
> +			/*
> +			 * When the diff queue is explicitly flushed, append an
> +			 * additional terminator to separate batches of diffs.
> +			 */
> +			fprintf(revs.diffopt.file, "%c",
> +				revs.diffopt.line_termination);

As the user has requested an explicit flush we should call fflush(stdout)
here to avoid deadlocking a caller that is waiting to read the terminator
before writing the next batch of input. Ideally the tests would check that
the output is flushed but I think that is quite hard to do with our test
framework.

I think it would be easier for callers to parse the output if we always
printed NUL here. Programming languages generally have a function that
allows you to read all the input until a specific byte is seen. If flushing
always used a NUL terminator the caller could use their equivalent of
read_until(b'\0') to hoover up the output (using '-z' to do this would
change the output of --numstat and embed a NUL between any stat data and
the patch). Using a newline as the terminator here means the caller needs
to look for "\n\n". That string occurs in the output between the stat data
and the patch and can also occur in the patch hunks if
diff.suppressBlankEmpty is set.
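To make the parsing argument concrete, here is a minimal caller-side sketch. It assumes the behaviour proposed above (every explicitly flushed batch ends in a NUL byte), which is not what the patch as posted does. POSIX `getdelim()` plays the role of `read_until(b'\0')`; the `read_batch()` helper is hypothetical. The sample batch deliberately contains `"\n\n"` between stat-like and patch-like text, which a splitter looking for `"\n\n"` would cut in the wrong place.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read one batch of output, i.e. everything up to
 * (but not including) the next NUL byte. Returns the batch length, or -1
 * at EOF or on error. *buf is grown as needed; the caller must free() it.
 */
static ssize_t read_batch(FILE *in, char **buf, size_t *cap)
{
	ssize_t n = getdelim(buf, cap, '\0', in);

	if (n > 0 && (*buf)[n - 1] == '\0')
		n--; /* drop the batch terminator itself */
	return n;
}
```

Embedded `"\n\n"` sequences inside a batch are simply returned as data; only the NUL ends a batch.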
Now that we are calling diff_flush() in a loop we need to set .no_free in
our diff options and call diff_free() at the end of the program (see the
comment in diff.h).

Best Wishes

Phillip

> +			continue;
> +		}
> +
>  		if (*p != ':')
>  			die("invalid raw diff input");
>  		p++;
> diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
> index e0a8e6f0a0..aca228a8fa 100755
> --- a/t/t4070-diff-pairs.sh
> +++ b/t/t4070-diff-pairs.sh
> @@ -77,4 +77,26 @@ test_expect_success 'split input across multiple diff-pairs' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'diff-pairs explicit queue flush' '
> +	git diff-tree -r -M -C -C -z base new >input &&
> +	printf "\0" >>input &&
> +	git diff-tree -r -M -C -C -z base new >>input &&
> +
> +	git diff-tree -r -M -C -C base new >expect &&
> +	printf "\n" >>expect &&
> +	git diff-tree -r -M -C -C base new >>expect &&
> +
> +	git diff-pairs <input >actual &&
> +	test_cmp expect actual
> +'
> +j
> +test_expect_success 'diff-pairs explicit queue flush null terminated' '
> +	git diff-tree -r -M -C -C -z base new >expect &&
> +	printf "\0" >>expect &&
> +	git diff-tree -r -M -C -C -z base new >>expect &&
> +
> +	git diff-pairs -z <expect >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
diff --git a/Documentation/git-diff-pairs.adoc b/Documentation/git-diff-pairs.adoc
index e9ef4a6615..33c0d702f0 100644
--- a/Documentation/git-diff-pairs.adoc
+++ b/Documentation/git-diff-pairs.adoc
@@ -32,6 +32,10 @@ compute diffs progressively over the course of multiple invocations of
 Each blob pair is fed to the diff machinery individually queued and the output
 is flushed on stdin EOF.
 
+To explicitly flush the diff queue, a single nul byte can be written to stdin
+between filepairs. Diff output between flushes is separated by a single line
+terminator.
+
 OPTIONS
 -------
 
diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
index 08f3ee81e5..2436ce3013 100644
--- a/builtin/diff-pairs.c
+++ b/builtin/diff-pairs.c
@@ -99,6 +99,17 @@ int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
 			break;
 
 		p = meta.buf;
+		if (!*p) {
+			flush_diff_queue(&revs.diffopt);
+			/*
+			 * When the diff queue is explicitly flushed, append an
+			 * additional terminator to separate batches of diffs.
+			 */
+			fprintf(revs.diffopt.file, "%c",
+				revs.diffopt.line_termination);
+			continue;
+		}
+
 		if (*p != ':')
 			die("invalid raw diff input");
 		p++;
diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
index e0a8e6f0a0..aca228a8fa 100755
--- a/t/t4070-diff-pairs.sh
+++ b/t/t4070-diff-pairs.sh
@@ -77,4 +77,26 @@ test_expect_success 'split input across multiple diff-pairs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-pairs explicit queue flush' '
+	git diff-tree -r -M -C -C -z base new >input &&
+	printf "\0" >>input &&
+	git diff-tree -r -M -C -C -z base new >>input &&
+
+	git diff-tree -r -M -C -C base new >expect &&
+	printf "\n" >>expect &&
+	git diff-tree -r -M -C -C base new >>expect &&
+
+	git diff-pairs <input >actual &&
+	test_cmp expect actual
+'
+j
+test_expect_success 'diff-pairs explicit queue flush null terminated' '
+	git diff-tree -r -M -C -C -z base new >expect &&
+	printf "\0" >>expect &&
+	git diff-tree -r -M -C -C -z base new >>expect &&
+
+	git diff-pairs -z <expect >actual &&
+	test_cmp expect actual
+'
+
 test_done
The diffs queued from git-diff-pairs(1) stdin are not flushed EOF is
reached. To enable greater flexibility, allow control over when the diff
queue is flushed by writing a single nul byte on stdin between input
file pairs. Diff output between flushes is separated by a single line
terminator.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-diff-pairs.adoc |  4 ++++
 builtin/diff-pairs.c              | 11 +++++++++++
 t/t4070-diff-pairs.sh             | 22 ++++++++++++++++++++++
 3 files changed, 37 insertions(+)
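The control flow this patch adds can be modelled outside of Git. The sketch below is a simplified mock, not the real `cmd_diff_pairs()`: each NUL-terminated token is treated as one queued item (real raw input spreads a filepair over several NUL-separated fields), `struct mock_queue` and `mock_flush()` are hypothetical stand-ins for the diff queue and `flush_diff_queue()`, and POSIX `getdelim()` replaces Git's own line readers. An empty record triggers a flush followed by one terminator byte, written with `fputc()` and followed by `fflush()` as suggested in review, so a caller blocked on that byte is not left waiting on stdio buffering.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the diff queue: only counts queued items. */
struct mock_queue {
	size_t nr;
};

/* Hypothetical stand-in for flush_diff_queue(): summarize a non-empty queue. */
static void mock_flush(struct mock_queue *q, FILE *out)
{
	if (q->nr)
		fprintf(out, "batch of %zu\n", q->nr);
	q->nr = 0;
}

/*
 * Read NUL-terminated records from 'in'. An empty record (a lone NUL
 * between records) requests an explicit flush: emit the queued output,
 * then one terminator byte, then fflush() the stream.
 */
static void process(FILE *in, FILE *out, int term)
{
	struct mock_queue q = { 0 };
	char *rec = NULL;
	size_t cap = 0;
	ssize_t n;

	while ((n = getdelim(&rec, &cap, '\0', in)) != -1) {
		if (n == 1 && rec[0] == '\0') {
			mock_flush(&q, out);
			fputc(term, out);
			fflush(out);
			continue;
		}
		q.nr++;
	}
	mock_flush(&q, out); /* implicit flush at EOF, no extra terminator */
	free(rec);
}
```

At EOF the remaining items are flushed without an extra terminator, mirroring the existing "flushed on stdin EOF" behaviour; only explicit flushes add a separator.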