[v2] upload-pack: use buffered I/O to talk to rev-list
diff mbox series

Message ID cf395005-af63-f698-fe19-6c4b6f1a8a4b@web.de
State New
Headers show
Series
  • [v2] upload-pack: use buffered I/O to talk to rev-list
Related show

Commit Message

René Scharfe Aug. 12, 2020, 4:52 p.m. UTC
Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.

Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.

Helped-by: Chris Torek <chris.torek@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
---
Changes since v1:
- Handle fprintf() errors immediately.
- Call ferror() and fflush() explicitly before calling fclose().
- Ignore fclose() errors, similar to the original code.
- Initialize the stream pointer and use a NULL check in the error
  handler to avoid a bogus compiler warning about it being used
  without initialization.

 upload-pack.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

--
2.28.0

Comments

Christian Couder Aug. 13, 2020, 5:17 a.m. UTC | #1
On Wed, Aug 12, 2020 at 6:54 PM René Scharfe <l.s.r@web.de> wrote:

> -       close(cmd->in);
> +       if (ferror(cmd_in) || fflush(cmd_in))
> +               goto error;
> +       fclose(cmd_in);
>         cmd->in = -1;

I wonder if setting cmd->in to -1 is still useful...

>         sigchain_pop(SIGPIPE);
>
> @@ -660,8 +658,8 @@ static int do_reachable_revlist(struct child_process *cmd,
>  error:
>         sigchain_pop(SIGPIPE);
>
> -       if (cmd->in >= 0)
> -               close(cmd->in);
> +       if (cmd_in)
> +               fclose(cmd_in);

...as we don't check cmd->in anymore at the end of the function, but
we now check cmd_in instead. So should cmd_in have been set to -1
instead of cmd->in?
René Scharfe Aug. 13, 2020, 5:57 a.m. UTC | #2
Am 13.08.20 um 07:17 schrieb Christian Couder:
> On Wed, Aug 12, 2020 at 6:54 PM René Scharfe <l.s.r@web.de> wrote:
>
>> -       close(cmd->in);
>> +       if (ferror(cmd_in) || fflush(cmd_in))
>> +               goto error;
>> +       fclose(cmd_in);
>>         cmd->in = -1;
>
> I wonder if setting cmd->in to -1 is still useful...

The patch doesn't change its usefulness.  It probably was not necessary
to begin with.

>
>>         sigchain_pop(SIGPIPE);
>>

The next line right here (not shown in the diff) returns 0.

>> @@ -660,8 +658,8 @@ static int do_reachable_revlist(struct child_process *cmd,
>>  error:
>>         sigchain_pop(SIGPIPE);
>>
>> -       if (cmd->in >= 0)
>> -               close(cmd->in);
>> +       if (cmd_in)
>> +               fclose(cmd_in);
>
> ...as we don't check cmd->in anymore at the end of the function, but
> we now check cmd_in instead. So should cmd_in have been set to -1
> instead of cmd->in?

This error handler is not reached from the place that sets cmd->in back
to -1.  It can be reached from a place where cmd_in is still NULL, so
this check is necessary and setting cmd_in to NULL above is not needed.

René
Jeff King Aug. 13, 2020, 9:01 a.m. UTC | #3
On Thu, Aug 13, 2020 at 07:57:20AM +0200, René Scharfe wrote:

> Am 13.08.20 um 07:17 schrieb Christian Couder:
> > On Wed, Aug 12, 2020 at 6:54 PM René Scharfe <l.s.r@web.de> wrote:
> >
> >> -       close(cmd->in);
> >> +       if (ferror(cmd_in) || fflush(cmd_in))
> >> +               goto error;
> >> +       fclose(cmd_in);
> >>         cmd->in = -1;
> >
> > I wonder if setting cmd->in to -1 is still useful...
> 
> The patch doesn't change its usefulness.  It probably was not necessary
> to begin with.

Right. We exit immediately after setting it, but it's not clearly a dead
load: the "cmd" struct is passed in from the caller. Prior to 2997178ee6
(upload-pack: split check_unreachable() in two, prep for
get_reachable_list(), 2016-06-12), it was part of a longer function
which did have more error handling.

But after the split, has_unreachable() knows that it is always closed
after do_reachable_revlist() returns, so it never looks at cmd.in again.
It might be worth removing just to avoid confusion.

I thought this also implied that the conditional close() in the error
block was not necessary, but it is. In the existing code we could "goto
error" from start_command() failing, before we ever assign to cmd->in.
The run-command API clears the struct in that case (so we'd see cmd->in
== 0, and avoid closing).

> >> -       if (cmd->in >= 0)
> >> -               close(cmd->in);
> >> +       if (cmd_in)
> >> +               fclose(cmd_in);
> >
> > ...as we don't check cmd->in anymore at the end of the function, but
> > we now check cmd_in instead. So should cmd_in have been set to -1
> > instead of cmd->in?
> 
> This error handler is not reached from the place that sets cmd->in back
> to -1.  It can be reached from a place where cmd_in is still NULL, so
> this check is necessary and setting cmd_in to NULL above is not needed.

Right, that makes sense. Your NULL cmd_in is replacing the zero'd
cmd->in from start_command().

So I think your patch is correct. It might be worth removing the "-1"
assignment on top.

-Peff

Patch
diff mbox series

diff --git a/upload-pack.c b/upload-pack.c
index 80ad9a38d81..811fdd47b58 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -603,9 +603,8 @@  static int do_reachable_revlist(struct child_process *cmd,
 		"rev-list", "--stdin", NULL,
 	};
 	struct object *o;
-	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
+	FILE *cmd_in = NULL;
 	int i;
-	const unsigned hexsz = the_hash_algo->hexsz;

 	cmd->argv = argv;
 	cmd->git_cmd = 1;
@@ -623,8 +622,8 @@  static int do_reachable_revlist(struct child_process *cmd,
 	if (start_command(cmd))
 		goto error;

-	namebuf[0] = '^';
-	namebuf[hexsz + 1] = '\n';
+	cmd_in = xfdopen(cmd->in, "w");
+
 	for (i = get_max_object_index(); 0 < i; ) {
 		o = get_indexed_object(--i);
 		if (!o)
@@ -633,11 +632,9 @@  static int do_reachable_revlist(struct child_process *cmd,
 			o->flags &= ~TMP_MARK;
 		if (!is_our_ref(o, allow_uor))
 			continue;
-		memcpy(namebuf + 1, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 2) < 0)
+		if (fprintf(cmd_in, "^%s\n", oid_to_hex(&o->oid)) < 0)
 			goto error;
 	}
-	namebuf[hexsz] = '\n';
 	for (i = 0; i < src->nr; i++) {
 		o = src->objects[i].item;
 		if (is_our_ref(o, allow_uor)) {
@@ -647,11 +644,12 @@  static int do_reachable_revlist(struct child_process *cmd,
 		}
 		if (reachable && o->type == OBJ_COMMIT)
 			o->flags |= TMP_MARK;
-		memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
+		if (fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid)) < 0)
 			goto error;
 	}
-	close(cmd->in);
+	if (ferror(cmd_in) || fflush(cmd_in))
+		goto error;
+	fclose(cmd_in);
 	cmd->in = -1;
 	sigchain_pop(SIGPIPE);

@@ -660,8 +658,8 @@  static int do_reachable_revlist(struct child_process *cmd,
 error:
 	sigchain_pop(SIGPIPE);

-	if (cmd->in >= 0)
-		close(cmd->in);
+	if (cmd_in)
+		fclose(cmd_in);
 	if (cmd->out >= 0)
 		close(cmd->out);
 	return -1;