diff mbox series

upload-pack: use buffered I/O to talk to rev-list

Message ID 6722ade6-971e-7ecc-e8f0-7f595ca0b0ff@web.de (mailing list archive)
State Superseded
Headers show
Series upload-pack: use buffered I/O to talk to rev-list | expand

Commit Message

René Scharfe. Aug. 2, 2020, 2:38 p.m. UTC
Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering and handling errors after the loops.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 upload-pack.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

--
2.28.0

Comments

Chris Torek Aug. 2, 2020, 4:03 p.m. UTC | #1
One suggestion here:

On Sun, Aug 2, 2020 at 7:41 AM René Scharfe <l.s.r@web.de> wrote:
> Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
> 2016-06-08), significantly reduce the number of system calls and
> simplify the code for sending object IDs to rev-list by using stdio's
> buffering and handling errors after the loops.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  upload-pack.c | 21 +++++++--------------
>  1 file changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 86737410709..9f616c2c6a6 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c

[snip]

> @@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
>                 }
>                 if (reachable && o->type == OBJ_COMMIT)
>                         o->flags |= TMP_MARK;
> -               memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
> -               if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
> -                       goto error;
> +               fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));

The fprintf() call here *can* return an error, e.g., if the
connection has died.  If it does, it should set things up so that
a later ferror(cmd_in) returns true.

>         }
> -       close(cmd->in);
>         cmd->in = -1;
> +       if (fclose(cmd_in))
> +               goto error;

The fclose() call doesn't necessarily check ferror().  (The
FreeBSD stdio in particular definitely does not.)  It might
be better to use:

    failure = ferror(cmd_in);
    failure |= fclose(cmd_in);
    if (failure) ...

here, or similar.  (The temporary variable is not needed,
but someone might assume `if (ferror(fp) | fclose(fp))` is
a typo for `if (ferror(fp) || fclose(fp))`.)

(Note: my sample isn't properly indented as gmail does not
let me insert tabs easily)

Chris
René Scharfe. Aug. 3, 2020, 2 p.m. UTC | #2
Am 02.08.20 um 18:03 schrieb Chris Torek:
> One suggestion here:
>
> On Sun, Aug 2, 2020 at 7:41 AM René Scharfe <l.s.r@web.de> wrote:
>> Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
>> 2016-06-08), significantly reduce the number of system calls and
>> simplify the code for sending object IDs to rev-list by using stdio's
>> buffering and handling errors after the loops.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  upload-pack.c | 21 +++++++--------------
>>  1 file changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/upload-pack.c b/upload-pack.c
>> index 86737410709..9f616c2c6a6 100644
>> --- a/upload-pack.c
>> +++ b/upload-pack.c
>
> [snip]
>
>> @@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
>>                 }
>>                 if (reachable && o->type == OBJ_COMMIT)
>>                         o->flags |= TMP_MARK;
>> -               memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
>> -               if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
>> -                       goto error;
>> +               fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));
>
> The fprintf() call here *can* return an error, e.g., if the
> connection has died.  If it does, it should set things up so that
> a later ferror(cmd_in) returns true.
>
>>         }
>> -       close(cmd->in);
>>         cmd->in = -1;
>> +       if (fclose(cmd_in))
>> +               goto error;
>
> The fclose() call doesn't necessarily check ferror().  (The
> FreeBSD stdio in particular definitely does not.)  It might
> be better to use:
>
>     failure = ferror(cmd_in);
>     failure |= fclose(cmd_in);
>     if (failure) ...
>
> here, or similar.  (The temporary variable is not needed,
> but someone might assume `if (ferror(fp) | fclose(fp))` is
> a typo for `if (ferror(fp) || fclose(fp))`.)

Thanks, didn't know that.  That's awful.  So the sentence "The fclose()
and fdclose() functions may also fail and set errno for any of the
errors specified for fflush(3)." from the FreeBSD manpage for fclose(3)
actually means that while it will flush, it is free to ignore any
flush errors?

Or do you mean that fflush() can succeed on a stream that has its error
indicator set?

In any case, we'd then better add a function that flushes the buffer,
closes the stream and reports errors in its return code and errno --
i.e. a sane fclose().

René
Chris Torek Aug. 3, 2020, 3:54 p.m. UTC | #3
On Mon, Aug 3, 2020 at 7:00 AM René Scharfe <l.s.r@web.de> wrote:
>
> Am 02.08.20 um 18:03 schrieb Chris Torek:
> > The fclose() call doesn't necessarily check ferror().  (The
> > FreeBSD stdio in particular definitely does not.)
>
> Thanks, didn't know that.  That's awful.  So the sentence "The fclose()
> and fdclose() functions may also fail and set errno for any of the
> errors specified for fflush(3)." from the FreeBSD manpage for fclose(3)
> actually means that while it will flush, it is free to ignore any
> flush errors?
>
> Or do you mean that fflush() can succeed on a stream that has its error
> indicator set?

The latter.  You have to get particularly (un?)lucky here: say the
buffer size is n, and the *last* write operation (fprintf, whatever)
filled the buffer mod n and called fflush internally and that failed.
Then at the time you call fclose() the buffer is empty.  There is
nothing to flush, so we just get the result of the system's close()
call.

> In any case, we'd then better add a function that flushes the buffer,
> closes the stream and reports errors in its return code and errno --
> i.e. a sane fclose().

Unfortunately the global-variable nature of errno means it may be
clobbered by the time you inspect it here, again with the same sort
of issue above.  It might be nice if C's error conventions were more
like Go's, and stdio retained an earlier error, but this stuff all dates
back to 16-bit "int" and 512-byte buffers and no room for nice stuff. :-)

Chris
Johannes Sixt Aug. 3, 2020, 6:15 p.m. UTC | #4
Am 02.08.20 um 18:03 schrieb Chris Torek:
> One suggestion here:
> 
> On Sun, Aug 2, 2020 at 7:41 AM René Scharfe <l.s.r@web.de> wrote:
>> Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
>> 2016-06-08), significantly reduce the number of system calls and
>> simplify the code for sending object IDs to rev-list by using stdio's
>> buffering and handling errors after the loops.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  upload-pack.c | 21 +++++++--------------
>>  1 file changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/upload-pack.c b/upload-pack.c
>> index 86737410709..9f616c2c6a6 100644
>> --- a/upload-pack.c
>> +++ b/upload-pack.c
> 
> [snip]
> 
>> @@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
>>                 }
>>                 if (reachable && o->type == OBJ_COMMIT)
>>                         o->flags |= TMP_MARK;
>> -               memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
>> -               if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
>> -                       goto error;
>> +               fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));
> 
> The fprintf() call here *can* return an error, e.g., if the
> connection has died.  If it does, it should set things up so that
> a later ferror(cmd_in) returns true.

True. We need an explicit test after each fprintf anyway because SIGPIPE
may be ignored, and then writing fails with EPIPE. On Windows, this is
doubly important because we do not have SIGPIPE at all (and always see
EPIPE), but we see EPIPE only on the first failed write; subsequent
writes produce EINVAL.

-- Hannes
diff mbox series

Patch

diff --git a/upload-pack.c b/upload-pack.c
index 86737410709..9f616c2c6a6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -595,10 +595,9 @@  static int do_reachable_revlist(struct child_process *cmd,
 	static const char *argv[] = {
 		"rev-list", "--stdin", NULL,
 	};
+	FILE *cmd_in;
 	struct object *o;
-	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
 	int i;
-	const unsigned hexsz = the_hash_algo->hexsz;

 	cmd->argv = argv;
 	cmd->git_cmd = 1;
@@ -616,8 +615,8 @@  static int do_reachable_revlist(struct child_process *cmd,
 	if (start_command(cmd))
 		goto error;

-	namebuf[0] = '^';
-	namebuf[hexsz + 1] = '\n';
+	cmd_in = xfdopen(cmd->in, "w");
+
 	for (i = get_max_object_index(); 0 < i; ) {
 		o = get_indexed_object(--i);
 		if (!o)
@@ -626,11 +625,8 @@  static int do_reachable_revlist(struct child_process *cmd,
 			o->flags &= ~TMP_MARK;
 		if (!is_our_ref(o, allow_uor))
 			continue;
-		memcpy(namebuf + 1, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 2) < 0)
-			goto error;
+		fprintf(cmd_in, "^%s\n", oid_to_hex(&o->oid));
 	}
-	namebuf[hexsz] = '\n';
 	for (i = 0; i < src->nr; i++) {
 		o = src->objects[i].item;
 		if (is_our_ref(o, allow_uor)) {
@@ -640,12 +636,11 @@  static int do_reachable_revlist(struct child_process *cmd,
 		}
 		if (reachable && o->type == OBJ_COMMIT)
 			o->flags |= TMP_MARK;
-		memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
-			goto error;
+		fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));
 	}
-	close(cmd->in);
 	cmd->in = -1;
+	if (fclose(cmd_in))
+		goto error;
 	sigchain_pop(SIGPIPE);

 	return 0;
@@ -653,8 +648,6 @@  static int do_reachable_revlist(struct child_process *cmd,
 error:
 	sigchain_pop(SIGPIPE);

-	if (cmd->in >= 0)
-		close(cmd->in);
 	if (cmd->out >= 0)
 		close(cmd->out);
 	return -1;