[v2,10/10] cat-file: use writev(2) if available

Message ID

20240823224630.1180772-11-e@80x24.org (mailing list archive)

State

New

Headers

From: Eric Wong <e@80x24.org>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
	Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 10/10] cat-file: use writev(2) if available
Date: Fri, 23 Aug 2024 22:46:30 +0000
Message-ID: <20240823224630.1180772-11-e@80x24.org>
In-Reply-To: <20240823224630.1180772-1-e@80x24.org>
References: <20240823224630.1180772-1-e@80x24.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

cat-file speedups
[v2,00/10] cat-file speedups
[v2,01/10] packfile: move sizep computation
[v2,02/10] packfile: allow content-limit for cat-file
[v2,03/10] packfile: fix off-by-one in content_limit comparison
[v2,04/10] packfile: inline cache_or_unpack_entry
[v2,05/10] cat-file: use delta_base_cache entries directly
[v2,06/10] packfile: packed_object_info avoids packed_to_object_type
[v2,07/10] object_info: content_limit only applies to blobs
[v2,08/10] cat-file: batch-command uses content_limit
[v2,09/10] cat-file: batch_write: use size_t for length
[v2,10/10] cat-file: use writev(2) if available

Commit Message

Eric Wong Aug. 23, 2024, 10:46 p.m. UTC

Using writev here is 20-40% faster than three write syscalls in
succession for smaller (1-10k) objects in the delta base cache.
This advantage decreases as object sizes approach pipe size (64k
on Linux).

writev reduces wakeups and syscalls on the read side as well:
each write(2) syscall may trigger one or more corresponding
read(2) syscalls in the reader.  Attempting atomicity in the
writer via writev also reduces the likelihood of non-blocking
readers failing with EAGAIN and having to call poll(2) or select(2)
before attempting to read again.
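
To make the syscall argument concrete, here is a minimal standalone sketch (not code from the patch) that gathers a header, an object body and the trailing delimiter into one writev(2) call, so a reader on the other end of a pipe can pick up the whole record with a single read(2). The helper name `write_record()` is hypothetical; the sketch assumes a POSIX system providing <sys/uio.h> and, unlike the patch's writev_or_die(), does not retry short writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: emit "<header><body><delim>" with one writev(2)
 * instead of three write(2) calls.  Returns writev()'s result; a short
 * write is possible and is NOT retried here.
 */
static ssize_t write_record(int fd, const char *hdr, const void *body,
			    size_t len, char delim)
{
	struct iovec iov[3];

	iov[0].iov_base = (void *)hdr;   /* writev() only reads through it */
	iov[0].iov_len = strlen(hdr);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = len;
	iov[2].iov_base = &delim;        /* single byte, like output_delim */
	iov[2].iov_len = 1;

	return writev(fd, iov, 3);
}
```

For a record no larger than PIPE_BUF, POSIX guarantees the pipe write is atomic, so a reader that wakes up finds the whole record available; with three separate write(2) calls it may instead be woken once per call.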

Unfortunately, this turns into a small (1-3%) slowdown for
gigantic objects of a megabyte or more, even after
increasing pipe size to 1MB via the F_SETPIPE_SZ fcntl(2) op.
This slowdown is acceptable to me since the vast majority of
objects are 64K or less for projects I've looked at.
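
For anyone reproducing the large-object measurement, a pipe can be grown on Linux with the F_SETPIPE_SZ fcntl(2) command mentioned above. The helper `grow_pipe()` below is a hypothetical sketch; the kernel may round the requested size up, and unprivileged processes are capped by /proc/sys/fs/pipe-max-size (1 MiB by default). On platforms without F_SETPIPE_SZ it simply reports failure.

```c
#define _GNU_SOURCE  /* F_SETPIPE_SZ / F_GETPIPE_SZ are Linux-only */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to grow the pipe behind fd to at
 * least `want` bytes.  Returns the resulting capacity, or -1 if the
 * request fails or the platform lacks F_SETPIPE_SZ.
 */
static int grow_pipe(int fd, int want)
{
#ifdef F_SETPIPE_SZ
	if (fcntl(fd, F_SETPIPE_SZ, want) < 0)
		return -1;
	return fcntl(fd, F_GETPIPE_SZ);
#else
	(void)fd;
	(void)want;
	return -1;
#endif
}
```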

Relying on stdio buffering and fflush(3) after each response was
considered for users without --buffer, but historically cat-file
defaults to being compatible with non-blocking stdout and able
to poll(2) after hitting EAGAIN on write(2).  Using stdio on
files with the O_NONBLOCK flag is (AFAIK) unspecified and likely
subject to portability problems and thus avoided.
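
The poll(2)-after-EAGAIN pattern referred to above looks roughly like this for a single buffer. In the patch, xwritev() gets this behaviour from git's handle_nonblock(); the standalone `write_all_nonblock()` below is a hypothetical sketch of the same idea, not git's code. Because it works on the raw descriptor, it sidesteps the unspecified stdio-plus-O_NONBLOCK behaviour described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all of buf to fd, which may have O_NONBLOCK set.  EINTR is
 * retried immediately; EAGAIN/EWOULDBLOCK waits in poll(2) until the
 * descriptor is writable again.  Returns 0 on success, -1 on error.
 */
static int write_all_nonblock(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK) {
				struct pollfd pfd = { .fd = fd, .events = POLLOUT };

				if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
					return -1;
				continue;
			}
			return -1;
		}
		if (n == 0)  /* no progress; treat like writev_or_die's ENOSPC case */
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```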

Signed-off-by: Eric Wong <e@80x24.org>
---
 Makefile           |  3 +++
 builtin/cat-file.c | 62 ++++++++++++++++++++++++++++++-------------
 config.mak.uname   |  5 ++++
 git-compat-util.h  | 10 +++++++
 wrapper.c          | 18 +++++++++++++
 wrapper.h          |  1 +
 write-or-die.c     | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 write-or-die.h     |  2 ++
 8 files changed, 149 insertions(+), 18 deletions(-)

Comments

Junio C Hamano Aug. 27, 2024, 5:41 a.m. UTC | #1

Eric Wong <e@80x24.org> writes:

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index bf81054662..016b7d26a7 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -280,7 +280,7 @@ struct expand_data {
>  	off_t disk_size;
>  	const char *rest;
>  	struct object_id delta_base_oid;
> -	void *content;
> +	struct git_iovec iov[3];

The earlier content pointer hinted that the caller that obtained
data into this structure from the object layer can use it for any
purpose that suits it, but using a git_iovec structure screams that
"we are going to write this thing out!".  As "expand_data" is about
what we are going to write out from cat-file anyway, it is probably OK.

Having said that ...

> -static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
> +static void batch_writev(struct batch_options *opt, struct expand_data *data,
> +			const struct strbuf *hdr, size_t size)
> +{
> +	data->iov[0].iov_base = hdr->buf;
> +	data->iov[0].iov_len = hdr->len;
> +	data->iov[1].iov_len = size;
> +
> +	/*
> +	 * Copying a (8|16)-byte iovec for a single byte is gross, but my
> +	 * attempt to stuff output_delim into the trailing NUL byte of
> +	 * iov[1].iov_base (and restoring it after writev(2) for the
> +	 * OI_DBCACHED case) to drop iovcnt from 3->2 wasn't faster.
> +	 */
> +	data->iov[2].iov_base = &opt->output_delim;
> +	data->iov[2].iov_len = 1;
> +	if (opt->buffer_output)
> +		fwritev_or_die(stdout, data->iov, 3);
> +	else
> +		writev_or_die(1, data->iov, 3);
> +
> +	/* writev_or_die may move iov[1].iov_base, so it's invalid */
> +	data->iov[1].iov_base = NULL;
> +}

... the above made me read it twice, wondering "where does
iov[1].iov_base come from???"  The location of the git_iovec
structure in the expand_data forces this rather unnatural calling
convention where the iovec is passed by address (as part of the
expand_data structure), with only one of six slots filled, and the
other five slots are filled by this function from the parameters
passed to it.

I wonder if we can rework the data structure to

 - Not embed git_iovec iov[] in expand_data;

 - Keep "void *content" instead there;

 - Define an on-stack "struct git_iovec iov[3]" local to this function;

 - Pass "void *content" from the caller to this function;

 - Populate iov[] fully from hdr->{buf,len}, content, size, and
   opt->output_delim and consume it in this function by either
   calling fwritev_or_die() or writev_or_die().

That way, the caller does not have to use data->iov[1].iov_base in
place of data->content, which is the source of "Huh?  Why is the 2nd
element of the 3-element array so special?" puzzlement readers would
feel while reading the caller---after all, the fact that we are
using writev with three chunks is an implementation detail that the
caller does not have to know to correctly use this helper function.

Or am I missing something?
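
A rough standalone sketch of the shape suggested above, with the iovec array on the stack and the caller passing only the header, the content pointer and its size. `struct sketch_opts` and `sketch_batch_writev()` are hypothetical stand-ins for cat-file's batch_options and helper, and the sketch does not loop on short writes the way writev_or_die() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant parts of batch_options. */
struct sketch_opts {
	int buffer_output;  /* nonzero: write through the stdio stream */
	char output_delim;  /* record delimiter, e.g. '\n' */
	FILE *fp;           /* used when buffer_output is set */
	int fd;             /* used otherwise */
};

/*
 * The iovec array is private to this function: the caller never sees
 * that the output is "header, content, delimiter".
 * Returns 0 on success, -1 on error (no short-write retry).
 */
static int sketch_batch_writev(const struct sketch_opts *opt,
			       const char *hdr, size_t hdr_len,
			       const void *content, size_t size)
{
	struct iovec iov[3] = {
		{ .iov_base = (void *)hdr, .iov_len = hdr_len },
		{ .iov_base = (void *)content, .iov_len = size },
		{ .iov_base = (void *)&opt->output_delim, .iov_len = 1 },
	};
	int i;

	if (opt->buffer_output) {
		/* like fwritev_or_die(): one fwrite() per chunk */
		for (i = 0; i < 3; i++)
			if (fwrite(iov[i].iov_base, 1, iov[i].iov_len,
				   opt->fp) != iov[i].iov_len)
				return -1;
		return 0;
	}
	return writev(opt->fd, iov, 3) == (ssize_t)(hdr_len + size + 1) ? 0 : -1;
}
```

A caller would then pass its own pointer, e.g. `sketch_batch_writev(&opts, hdr->buf, hdr->len, content, size)`, and expand_data could keep its plain `void *content` field.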

> +static void print_object_or_die(struct batch_options *opt,
> +				struct expand_data *data, struct strbuf *hdr)
>  {
>  	const struct object_id *oid = &data->oid;
>  
>  	assert(data->info.typep);
>  
> -	if (data->content) {
> -		void *content = data->content;
> +	if (data->iov[1].iov_base) {
> +		void *content = data->iov[1].iov_base;
>  		unsigned long size = data->size;
>  
> -		data->content = NULL;
>  		if (use_mailmap && (data->type == OBJ_COMMIT ||
>  					data->type == OBJ_TAG)) {
>  			size_t s = size;
> @@ -399,10 +424,10 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  			}
>  
>  			content = replace_idents_using_mailmap(content, &s);
> +			data->iov[1].iov_base = content;
>  			size = cast_size_t_to_ulong(s);
>  		}
> -
> -		batch_write(opt, content, size);
> +		batch_writev(opt, data, hdr, size);
>  		switch (data->info.whence) {
>  		case OI_CACHED:
>  			/*

And with the "let's make iov[3] a local implementation detail of
batch_writev()" approach, the above two hunks would shrink and
essentially we'd replace batch_write() with batch_writev() (with
adjusted parameters).

> @@ -419,8 +444,6 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  		}
>  	} else {
>  		assert(data->type == OBJ_BLOB);
> -		if (opt->buffer_output)
> -			fflush(stdout);

We used to fflush whatever we have written before entering this
"else" clause.  We no longer do so.

>  		if (opt->transform_mode) {
>  			char *contents;
>  			unsigned long size;
> @@ -447,10 +470,15 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  					    oid_to_hex(oid), data->rest);
>  			} else
>  				BUG("invalid transform_mode: %c", opt->transform_mode);
> -			batch_write(opt, contents, size);
> +			data->iov[1].iov_base = contents;
> +			batch_writev(opt, data, hdr, size);

And in the buffer_output mode, batch_writev() ends up calling
fwritev_or_die(), which is merely a series of fwrite() calls.  And
removing the earlier fflush() is perfectly fine, as it was there
solely to make sure we flushed before going down to direct file
descriptor access with write(2), and we are now still going
through the stdio layer.

>  			free(contents);
>  		} else {
> +			batch_write(opt, hdr->buf, hdr->len);
> +			if (opt->buffer_output)
> +				fflush(stdout);

The bigger else clause is entered with potentially unflushed bytes
in the stdio buffer, which is why we used to fflush() first.  Then we do
batch_write() here, which uses fwrite() in the buffer_output mode,
without having to fflush().  But before doing stream_blob() below,
we do need to fflush().  Makes sense.
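
Why the fflush() before stream_blob() matters can be shown in isolation: bytes written through stdio sit in the FILE buffer, while a raw write(2) on the same descriptor goes out immediately. The helper `header_then_raw()` below is a hypothetical sketch of that ordering, not cat-file code; stream_blob() is assumed here only in the sense that it writes to the descriptor without going through stdio.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write hdr through stdio, then body directly to
 * the stream's file descriptor.  The fflush() is what keeps hdr ahead
 * of body on the wire; without it, hdr could sit in the stdio buffer
 * until some later flush and arrive after body.
 */
static int header_then_raw(FILE *fp, const char *hdr, const char *body)
{
	size_t len = strlen(body);

	if (fputs(hdr, fp) == EOF || fflush(fp) == EOF)
		return -1;
	return write(fileno(fp), body, len) == (ssize_t)len ? 0 : -1;
}
```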

>  static void batch_one_object(const char *obj_name,
> @@ -666,7 +692,7 @@ static void parse_cmd_contents(struct batch_options *opt,
>  			     struct expand_data *data)
>  {
>  	opt->batch_mode = BATCH_MODE_CONTENTS;
> -	data->info.contentp = &data->content;
> +	data->info.contentp = &data->iov[1].iov_base;
>  	batch_one_object(line, output, opt, data);
>  }
>  
> @@ -823,7 +849,7 @@ static int batch_objects(struct batch_options *opt)
>  		data.info.typep = &data.type;
>  		if (!opt->transform_mode) {
>  			data.info.sizep = &data.size;
> -			data.info.contentp = &data.content;
> +			data.info.contentp = &data.iov[1].iov_base;
>  			data.info.content_limit = big_file_threshold;
>  			data.info.direct_cache = 1;
>  		}

If we do the "let's not leak the iov[3] implementation detail from
batch_writev()" update, the above two hunks can be eliminated.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index ca7678a379..afde8abc99 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -388,6 +388,16 @@ static inline int git_setitimer(int which UNUSED,
>  #define setitimer(which,value,ovalue) git_setitimer(which,value,ovalue)
>  #endif
>  
> +#ifdef HAVE_WRITEV
> +#include <sys/uio.h>
> +#define git_iovec iovec
> +#else /* !HAVE_WRITEV */
> +struct git_iovec {
> +	void *iov_base;
> +	size_t iov_len;
> +};
> +#endif /* !HAVE_WRITEV */

OK.

> diff --git a/write-or-die.c b/write-or-die.c
> index 01a9a51fa2..227b051165 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -107,3 +107,69 @@ void fflush_or_die(FILE *f)
>  	if (fflush(f))
>  		die_errno("fflush error");
>  }
> +
> +void fwritev_or_die(FILE *fp, const struct git_iovec *iov, int iovcnt)
> +{
> +	int i;
> +
> +	for (i = 0; i < iovcnt; i++) {
> +		size_t n = iov[i].iov_len;
> +
> +		if (fwrite(iov[i].iov_base, 1, n, fp) != n)
> +			die_errno("unable to write to FD=%d", fileno(fp));
> +	}
> +}

OK.
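
The quoted fwritev_or_die() can simply loop over fwrite(), because stdio absorbs short writes. The raw-descriptor counterpart in the patch, writev_or_die(), must instead advance its iovec array after a partial writev(2). A pure-function rendition of that bookkeeping (the name `advance_iov()` is hypothetical) makes it easy to check by hand:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Skip the first n bytes of an iovec array after a short write: fully
 * written entries are dropped, and the first partially written one is
 * trimmed in place.  Returns the new start and updates *cnt, mirroring
 * the loop at the bottom of writev_or_die() in the patch.
 */
static struct iovec *advance_iov(struct iovec *iov, int *cnt, size_t n)
{
	int i;

	for (i = 0; i < *cnt; i++) {
		if (n >= iov[i].iov_len) {
			n -= iov[i].iov_len;
		} else {
			iov[i].iov_base = (char *)iov[i].iov_base + n;
			iov[i].iov_len -= n;
			break;
		}
	}
	*cnt -= i;
	return iov + i;
}
```

For example, after 6 of 9 bytes of "hdr\n" + "body" + "\n" are written, the first entry is dropped and the second is trimmed to "dy"; this in-place trimming is why the patch's writev_or_die() takes a non-const iov.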

Junio C Hamano Aug. 27, 2024, 3:43 p.m. UTC | #2

Junio C Hamano <gitster@pobox.com> writes:

>> +static void batch_writev(struct batch_options *opt, struct expand_data *data,
>> +			const struct strbuf *hdr, size_t size)
>> +{
>> ...
>> +}
>
> ... the above made me read it twice, wondering "where does
> iov[1].iov_base come from???"  The location of the git_iovec
> structure in the expand_data forces this rather unnatural calling
> convention where the iovec is passed by address (as part of the
> expand_data structure), with only one of six slots filled, and the
> other five slots are filled by this function from the parameters
> passed to it.
>
> I wonder if we can rework the data structure to
>
>  - Not embed git_iovec iov[] in expand_data;
>
>  - Keep "void *content" instead there;
>
>  - Define an on-stack "struct git_iovec iov[3]" local to this function;
>
>  - Pass "void *content" from the caller to this function;
>
>  - Populate iov[] fully from hdr->{buf,len}, content, size, and
>    opt->output_delim and consume it in this function by either
>    calling fwritev_or_die() or writev_or_die().
>
> That way, the caller does not have to use data->iov[1].iov_base in
> place of data->content, which is the source of "Huh?  Why is the 2nd
> element of the 3-element array so special?" puzzlement readers would
> feel while reading the caller---after all, the fact that we are
> using writev with three chunks is an implementation detail that the
> caller does not have to know to correctly use this helper function.
>
> Or am I missing something?

Additional thought.  Perhaps we can introduce

    static void batch_write(struct batch_options *opt,
			    const void *data, ...);

that is a vararg function that takes <data, len> pairs repeated at
the end, with data==NULL as sentinel.  It may technically need to be
called batch_writel(), but that is a backward compatible interface
for existing batch_write() callers.

Then the use of writev() can be encapsulated inside the updated
batch_write() function.  If you get only a single <data, len> pair,
you would do a single write_or_die() or fwrite_or_die().  Otherwise
you'd do the writev() thing, and the function can stay oblivious to
the meaning of what it is writing out.  There is no need for the
function to know that the payload is "header followed by body
followed by delimiter byte", as that is what the callers express at
the call sites of the function.

Hmm?
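
A minimal sketch of the vararg idea, under these assumptions: the name `batch_writel()`, the fixed cap of 8 pairs, and writing to a raw descriptor instead of taking batch_options are all hypothetical simplifications, and short writes are not retried. Lengths must be passed as `size_t` and the sentinel as `(const void *)NULL`, because variadic arguments are not converted to the parameter types the function expects.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_PAIRS 8

/*
 * Hypothetical batch_writel(): takes (data, size_t len) pairs ending
 * with a NULL data pointer.  Data pointers should be void or char
 * pointers so that reading them back as const void * is well defined.
 * One pair uses plain write(2); several pairs use writev(2).
 * Returns the byte count written, or -1.
 */
static ssize_t batch_writel(int fd, const void *data, ...)
{
	struct iovec iov[SKETCH_MAX_PAIRS];
	int cnt = 0;
	va_list ap;

	va_start(ap, data);
	while (data && cnt < SKETCH_MAX_PAIRS) {
		iov[cnt].iov_base = (void *)data;
		iov[cnt].iov_len = va_arg(ap, size_t);
		cnt++;
		data = va_arg(ap, const void *);
	}
	va_end(ap);

	if (data)       /* more pairs than this sketch supports */
		return -1;
	if (cnt == 1)   /* single pair: no need for writev(2) */
		return write(fd, iov[0].iov_base, iov[0].iov_len);
	return writev(fd, iov, cnt);
}
```

The helper stays oblivious to what it writes; a caller would spell out "header, body, delimiter" itself, e.g. `batch_writel(1, hdr, hdr_len, body, size, &delim, (size_t)1, (const void *)NULL)`.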

Patch

diff --git a/Makefile b/Makefile
index 3eab701b10..c7a062de00 100644
--- a/Makefile
+++ b/Makefile
@@ -1844,6 +1844,9 @@  ifdef NO_PREAD
 	COMPAT_CFLAGS += -DNO_PREAD
 	COMPAT_OBJS += compat/pread.o
 endif
+ifdef HAVE_WRITEV
+	COMPAT_CFLAGS += -DHAVE_WRITEV
+endif
 ifdef NO_FAST_WORKING_DIRECTORY
 	BASIC_CFLAGS += -DNO_FAST_WORKING_DIRECTORY
 endif
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index bf81054662..016b7d26a7 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -280,7 +280,7 @@  struct expand_data {
 	off_t disk_size;
 	const char *rest;
 	struct object_id delta_base_oid;
-	void *content;
+	struct git_iovec iov[3];
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
@@ -378,17 +378,42 @@  static void batch_write(struct batch_options *opt, const void *data, size_t len)
 		write_or_die(1, data, len);
 }
 
-static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
+static void batch_writev(struct batch_options *opt, struct expand_data *data,
+			const struct strbuf *hdr, size_t size)
+{
+	data->iov[0].iov_base = hdr->buf;
+	data->iov[0].iov_len = hdr->len;
+	data->iov[1].iov_len = size;
+
+	/*
+	 * Copying a (8|16)-byte iovec for a single byte is gross, but my
+	 * attempt to stuff output_delim into the trailing NUL byte of
+	 * iov[1].iov_base (and restoring it after writev(2) for the
+	 * OI_DBCACHED case) to drop iovcnt from 3->2 wasn't faster.
+	 */
+	data->iov[2].iov_base = &opt->output_delim;
+	data->iov[2].iov_len = 1;
+
+	if (opt->buffer_output)
+		fwritev_or_die(stdout, data->iov, 3);
+	else
+		writev_or_die(1, data->iov, 3);
+
+	/* writev_or_die may move iov[1].iov_base, so it's invalid */
+	data->iov[1].iov_base = NULL;
+}
+
+static void print_object_or_die(struct batch_options *opt,
+				struct expand_data *data, struct strbuf *hdr)
 {
 	const struct object_id *oid = &data->oid;
 
 	assert(data->info.typep);
 
-	if (data->content) {
-		void *content = data->content;
+	if (data->iov[1].iov_base) {
+		void *content = data->iov[1].iov_base;
 		unsigned long size = data->size;
 
-		data->content = NULL;
 		if (use_mailmap && (data->type == OBJ_COMMIT ||
 					data->type == OBJ_TAG)) {
 			size_t s = size;
@@ -399,10 +424,10 @@  static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 			}
 
 			content = replace_idents_using_mailmap(content, &s);
+			data->iov[1].iov_base = content;
 			size = cast_size_t_to_ulong(s);
 		}
-
-		batch_write(opt, content, size);
+		batch_writev(opt, data, hdr, size);
 		switch (data->info.whence) {
 		case OI_CACHED:
 			/*
@@ -419,8 +444,6 @@  static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		}
 	} else {
 		assert(data->type == OBJ_BLOB);
-		if (opt->buffer_output)
-			fflush(stdout);
 		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
@@ -447,10 +470,15 @@  static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					    oid_to_hex(oid), data->rest);
 			} else
 				BUG("invalid transform_mode: %c", opt->transform_mode);
-			batch_write(opt, contents, size);
+			data->iov[1].iov_base = contents;
+			batch_writev(opt, data, hdr, size);
 			free(contents);
 		} else {
+			batch_write(opt, hdr->buf, hdr->len);
+			if (opt->buffer_output)
+				fflush(stdout);
 			stream_blob(oid);
+			batch_write(opt, &opt->output_delim, 1);
 		}
 	}
 }
@@ -519,12 +547,10 @@  static void batch_object_write(const char *obj_name,
 		strbuf_addch(scratch, opt->output_delim);
 	}
 
-	batch_write(opt, scratch->buf, scratch->len);
-
-	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
-		print_object_or_die(opt, data);
-		batch_write(opt, &opt->output_delim, 1);
-	}
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
+		print_object_or_die(opt, data, scratch);
+	else
+		batch_write(opt, scratch->buf, scratch->len);
 }
 
 static void batch_one_object(const char *obj_name,
@@ -666,7 +692,7 @@  static void parse_cmd_contents(struct batch_options *opt,
 			     struct expand_data *data)
 {
 	opt->batch_mode = BATCH_MODE_CONTENTS;
-	data->info.contentp = &data->content;
+	data->info.contentp = &data->iov[1].iov_base;
 	batch_one_object(line, output, opt, data);
 }
 
@@ -823,7 +849,7 @@  static int batch_objects(struct batch_options *opt)
 		data.info.typep = &data.type;
 		if (!opt->transform_mode) {
 			data.info.sizep = &data.size;
-			data.info.contentp = &data.content;
+			data.info.contentp = &data.iov[1].iov_base;
 			data.info.content_limit = big_file_threshold;
 			data.info.direct_cache = 1;
 		}
diff --git a/config.mak.uname b/config.mak.uname
index 85d63821ec..8ce8776657 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -69,6 +69,7 @@  ifeq ($(uname_S),Linux)
 		BASIC_CFLAGS += -std=c99
         endif
 	LINK_FUZZ_PROGRAMS = YesPlease
+	HAVE_WRITEV = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	HAVE_ALLOCA_H = YesPlease
@@ -77,6 +78,7 @@  ifeq ($(uname_S),GNU/kFreeBSD)
 	DIR_HAS_BSD_GROUP_SEMANTICS = YesPlease
 	LIBC_CONTAINS_LIBINTL = YesPlease
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
+	HAVE_WRITEV = YesPlease
 endif
 ifeq ($(uname_S),UnixWare)
 	CC = cc
@@ -292,6 +294,7 @@  ifeq ($(uname_S),FreeBSD)
 	PAGER_ENV = LESS=FRX LV=-c MORE=FRX
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	FILENO_IS_A_MACRO = UnfortunatelyYes
+	HAVE_WRITEV = YesPlease
 endif
 ifeq ($(uname_S),OpenBSD)
 	NO_STRCASESTR = YesPlease
@@ -307,6 +310,7 @@  ifeq ($(uname_S),OpenBSD)
 	PROCFS_EXECUTABLE_PATH = /proc/curproc/file
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	FILENO_IS_A_MACRO = UnfortunatelyYes
+	HAVE_WRITEV = YesPlease
 endif
 ifeq ($(uname_S),MirBSD)
 	NO_STRCASESTR = YesPlease
@@ -329,6 +333,7 @@  ifeq ($(uname_S),NetBSD)
 	HAVE_BSD_KERN_PROC_SYSCTL = YesPlease
 	CSPRNG_METHOD = arc4random
 	PROCFS_EXECUTABLE_PATH = /proc/curproc/exe
+	HAVE_WRITEV = YesPlease
 endif
 ifeq ($(uname_S),AIX)
 	DEFAULT_PAGER = more
diff --git a/git-compat-util.h b/git-compat-util.h
index ca7678a379..afde8abc99 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -388,6 +388,16 @@  static inline int git_setitimer(int which UNUSED,
 #define setitimer(which,value,ovalue) git_setitimer(which,value,ovalue)
 #endif
 
+#ifdef HAVE_WRITEV
+#include <sys/uio.h>
+#define git_iovec iovec
+#else /* !HAVE_WRITEV */
+struct git_iovec {
+	void *iov_base;
+	size_t iov_len;
+};
+#endif /* !HAVE_WRITEV */
+
 #ifndef NO_LIBGEN_H
 #include <libgen.h>
 #else
diff --git a/wrapper.c b/wrapper.c
index f87d90bf57..066c772145 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -262,6 +262,24 @@  ssize_t xwrite(int fd, const void *buf, size_t len)
 	}
 }
 
+#ifdef HAVE_WRITEV
+ssize_t xwritev(int fd, const struct iovec *iov, int iovcnt)
+{
+	while (1) {
+		ssize_t nr = writev(fd, iov, iovcnt);
+
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			if (handle_nonblock(fd, POLLOUT, errno))
+				continue;
+		}
+
+		return nr;
+	}
+}
+#endif /* HAVE_WRITEV */
+
 /*
  * xpread() is the same as pread(), but it automatically restarts pread()
  * operations with a recoverable error (EAGAIN and EINTR). xpread() DOES
diff --git a/wrapper.h b/wrapper.h
index 1b2b047ea0..3d33c63d4f 100644
--- a/wrapper.h
+++ b/wrapper.h
@@ -16,6 +16,7 @@  void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_
 int xopen(const char *path, int flags, ...);
 ssize_t xread(int fd, void *buf, size_t len);
 ssize_t xwrite(int fd, const void *buf, size_t len);
+ssize_t xwritev(int fd, const struct git_iovec *, int iovcnt);
 ssize_t xpread(int fd, void *buf, size_t len, off_t offset);
 int xdup(int fd);
 FILE *xfopen(const char *path, const char *mode);
diff --git a/write-or-die.c b/write-or-die.c
index 01a9a51fa2..227b051165 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -107,3 +107,69 @@  void fflush_or_die(FILE *f)
 	if (fflush(f))
 		die_errno("fflush error");
 }
+
+void fwritev_or_die(FILE *fp, const struct git_iovec *iov, int iovcnt)
+{
+	int i;
+
+	for (i = 0; i < iovcnt; i++) {
+		size_t n = iov[i].iov_len;
+
+		if (fwrite(iov[i].iov_base, 1, n, fp) != n)
+			die_errno("unable to write to FD=%d", fileno(fp));
+	}
+}
+
+/*
+ * note: we don't care about atomicity from writev(2) right now.
+ * The goal is to avoid allocations+copies in the writer and
+ * reduce wakeups+syscalls in the reader.
+ * n.b. @iov is not const since we modify it to avoid allocating
+ * on partial write.
+ */
+#ifdef HAVE_WRITEV
+void writev_or_die(int fd, struct git_iovec *iov, int iovcnt)
+{
+	int i;
+
+	while (iovcnt > 0) {
+		ssize_t n = xwritev(fd, iov, iovcnt);
+
+		/* EINVAL happens when sum of iov_len exceeds SSIZE_MAX */
+		if (n < 0 && errno == EINVAL)
+			n = xwrite(fd, iov[0].iov_base, iov[0].iov_len);
+		if (n < 0) {
+			check_pipe(errno);
+			die_errno("writev error");
+		} else if (!n) {
+			errno = ENOSPC;
+			die_errno("writev error");
+		}
+		/* skip fully written iovs, retry from the first partial iov */
+		for (i = 0; i < iovcnt; i++) {
+			if (n >= iov[i].iov_len) {
+				n -= iov[i].iov_len;
+			} else {
+				iov[i].iov_len -= n;
+				iov[i].iov_base = (char *)iov[i].iov_base + n;
+				break;
+			}
+		}
+		iovcnt -= i;
+		iov += i;
+	}
+}
+#else /* !HAVE_WRITEV */
+
+/*
+ * n.b. don't use stdio fwrite here even if it's faster, @fd may be
+ * non-blocking and stdio isn't equipped for EAGAIN
+ */
+void writev_or_die(int fd, struct git_iovec *iov, int iovcnt)
+{
+	int i;
+
+	for (i = 0; i < iovcnt; i++)
+		write_or_die(fd, iov[i].iov_base, iov[i].iov_len);
+}
+#endif /* !HAVE_WRITEV */
diff --git a/write-or-die.h b/write-or-die.h
index 65a5c42a47..20abec211c 100644
--- a/write-or-die.h
+++ b/write-or-die.h
@@ -7,6 +7,8 @@  void fprintf_or_die(FILE *, const char *fmt, ...);
 void fwrite_or_die(FILE *f, const void *buf, size_t count);
 void fflush_or_die(FILE *f);
 void write_or_die(int fd, const void *buf, size_t count);
+void writev_or_die(int fd, struct git_iovec *, int iovcnt);
+void fwritev_or_die(FILE *, const struct git_iovec *, int iovcnt);
 
 /*
  * These values are used to help identify parts of a repository to fsync.
