[1/1] upload-pack.c: make output buffer size configurable

Message ID: 20211213132345.26310-2-jacob@gitlab.com
State: New, archived
Series: Make upload-pack pack write size configurable

Commit Message

Jacob Vosmaer Dec. 13, 2021, 1:23 p.m. UTC
Add a new compile time constant UPLOAD_PACK_BUFFER_SIZE that makes the
size of the static buffer in output_state configurable.

The current size of the buffer is 8192+1. The '+1' is a technical
detail orthogonal to this change. On GitLab.com we use GitLab's
pack-objects cache which does writes of 65515 bytes. Because of the
default 8KB buffer size, propagating these cache writes requires 8
pipe reads and 8 pipe writes from git-upload-pack, and 8 pipe reads
from Gitaly (our Git RPC service). If we increase the size of the
buffer to the maximum Git packet size, we need only 1 pipe read and 1
pipe write in git-upload-pack, and 1 pipe read in Gitaly to transfer
the same amount of data. In benchmarks with a pure fetch and 100%
cache hit rate workload we are seeing CPU utilization reductions of
over 30%.

Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
---
 upload-pack.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
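
As a quick check of the arithmetic in the commit message (a sketch, not part of the patch): a 65515-byte cache write relayed through an 8192-byte buffer takes eight flushes, while a buffer of at least 65515 bytes takes one.

  /* Sketch: flushes needed to relay one 65515-byte cache write. */
  #include <stdio.h>

  int main(void)
  {
          const unsigned write_size = 65515; /* cache write size from the commit message */
          const unsigned small_buf = 8192;   /* current buffer size */
          const unsigned large_buf = 65515;  /* max sideband payload: 65520 - 5 */

          printf("8KB buffer: %u flushes\n", (write_size + small_buf - 1) / small_buf); /* 8 */
          printf("max buffer: %u flushes\n", (write_size + large_buf - 1) / large_buf); /* 1 */
          return 0;
  }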

Comments

Ævar Arnfjörð Bjarmason Dec. 14, 2021, 12:08 p.m. UTC | #1
On Mon, Dec 13 2021, Jacob Vosmaer wrote:

> Add a new compile time constant UPLOAD_PACK_BUFFER_SIZE that makes the
> size of the static buffer in output_state configurable.
>
> The current size of the buffer is 8192+1. The '+1' is a technical
> detail orthogonal to this change. On GitLab.com we use GitLab's
> pack-objects cache which does writes of 65515 bytes. Because of the
> default 8KB buffer size, propagating these cache writes requires 8
> pipe reads and 8 pipe writes from git-upload-pack, and 8 pipe reads
> from Gitaly (our Git RPC service). If we increase the size of the
> buffer to the maximum Git packet size, we need only 1 pipe read and 1
> pipe write in git-upload-pack, and 1 pipe read in Gitaly to transfer
> the same amount of data. In benchmarks with a pure fetch and 100%
> cache hit rate workload we are seeing CPU utilization reductions of
> over 30%.
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  upload-pack.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index c78d55bc67..b799fbe628 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -42,6 +42,10 @@
>  #define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
>  		NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)
>  
> +#ifndef UPLOAD_PACK_BUFFER_SIZE
> +#define UPLOAD_PACK_BUFFER_SIZE 8192
> +#endif
> +
>  /* Enum for allowed unadvertised object request (UOR) */
>  enum allow_uor {
>  	/* Allow specifying sha1 if it is a ref tip. */
> @@ -194,7 +198,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
>  }
>  
>  struct output_state {
> -	char buffer[8193];
> +	char buffer[UPLOAD_PACK_BUFFER_SIZE+1];
>  	int used;
>  	unsigned packfile_uris_started : 1;
>  	unsigned packfile_started : 1;

Making this configurable obviously has a big impact in some cases, but I'm
a bit iffy on the facility to do so + it not being documented.

I don't think that the "static buffer" part here is important to anyone,
but the write() size is clearly important.

So doesn't it make more sense to have an uploadPack.bufferSize=8k
variable we can tweak, just make this "buffer" a "struct strbuf" instead
(i.e. it'll be dynamically grown), and then just flush it whenever we
hit the configured buffer size?
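
A minimal sketch of the suggested approach, assuming a hypothetical uploadPack.bufferSize key and git's strbuf API (illustration only, not part of the series); the real output path also wraps the data in sideband packets, which this sketch ignores:

  /*
   * Illustration only: accumulate output in a strbuf and flush it once
   * it reaches a configured size. "uploadpack.buffersize" is hypothetical.
   */
  static size_t buffer_size = 8192;

  static int buffer_size_config(const char *var, const char *value, void *cb)
  {
          if (!strcmp(var, "uploadpack.buffersize"))
                  buffer_size = git_config_ulong(var, value);
          return 0;
  }

  static void buffer_and_maybe_flush(int fd, struct strbuf *out,
                                     const char *data, size_t len)
  {
          strbuf_add(out, data, len);
          if (out->len >= buffer_size) {
                  write_or_die(fd, out->buf, out->len);
                  strbuf_reset(out);
          }
  }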
Jeff King Dec. 14, 2021, 3:08 p.m. UTC | #2
On Tue, Dec 14, 2021 at 01:08:55PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > +#ifndef UPLOAD_PACK_BUFFER_SIZE
> > +#define UPLOAD_PACK_BUFFER_SIZE 8192
> > +#endif
> > +
> >  /* Enum for allowed unadvertised object request (UOR) */
> >  enum allow_uor {
> >  	/* Allow specifying sha1 if it is a ref tip. */
> > @@ -194,7 +198,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
> >  }
> >  
> >  struct output_state {
> > -	char buffer[8193];
> > +	char buffer[UPLOAD_PACK_BUFFER_SIZE+1];
> >  	int used;
> >  	unsigned packfile_uris_started : 1;
> >  	unsigned packfile_started : 1;
> 
> Making this configurable obviously has a big impact in some cases, but I'm
> a bit iffy on the facility to do so + it not being documented.
> 
> I don't think that the "static buffer" part here is important to anyone,
> but the write() size is clearly important.
> 
> So doesn't it make more sense to have an uploadPack.bufferSize=8k
> variable we can tweak, just make this "buffer" a "struct strbuf" instead
> (i.e. it'll be dynamically grown), and then just flush it whenever we
> hit the configured buffer size?

I don't think we want to grow dynamically, because we don't want to hold
arbitrary amounts of data in memory. We're just passing it through. But
if there were a run-time config option, we could easily heap-allocate
the buffer up front.
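
A sketch of that variant, with a hypothetical uploadPack.bufferSize key feeding a fixed, heap-allocated buffer (illustration only):

  /*
   * Illustration only: allocate the relay buffer once, at a size read
   * from a hypothetical uploadPack.bufferSize key, instead of growing
   * it dynamically.
   */
  static size_t buffer_size = LARGE_PACKET_DATA_MAX;

  struct output_state {
          char *buffer;   /* xmalloc(buffer_size + 1), allocated up front */
          int used;
          unsigned packfile_uris_started : 1;
          unsigned packfile_started : 1;
  };

  static struct output_state *new_output_state(void)
  {
          struct output_state *os = xcalloc(1, sizeof(*os));
          os->buffer = xmalloc(buffer_size + 1);
          return os;
  }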

That may be overkill, though I do agree this is kind of weirdly
undocumented. There are two other subtleties I notice:

  - This output_state struct does go on the stack, so something big like
    64k is questionable there (though on Linux, without recursion, it's
    usually OK).

  - we're relaying the data into pkt-lines. Do we run into problems when
    the buffer is larger than a packet? I think in send_sideband() we'll
    break it up as appropriate. But before we hit the PACK header, we
    send out packfile-uris directly with packet_write_fmt(). Those
    aren't likely to be long, but if they are, we'd die() in
    format_packet(). So something around LARGE_PACKET_DATA_MAX is
    probably the highest you'd want to set it anyway (and the most
    performant, since otherwise you have to write out extra partial
    packets).
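
For reference, the limits referred to above are defined in pkt-line.h roughly as follows (the 65515-byte writes in the commit message correspond to LARGE_PACKET_MAX minus the 4-byte length header and the 1-byte sideband marker):

  #define LARGE_PACKET_MAX 65520
  #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)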

So I kind of wonder if there is any real _harm_ in just always using
bigger packets, even if it does not always help. Given the subtle rules
about packet-max above, then we could just use that optimal value and
not worry about configurability.

-Peff
Jeff King Dec. 14, 2021, 3:37 p.m. UTC | #3
On Mon, Dec 13, 2021 at 02:23:45PM +0100, Jacob Vosmaer wrote:

> The current size of the buffer is 8192+1. The '+1' is a technical
> detail orthogonal to this change. On GitLab.com we use GitLab's
> pack-objects cache which does writes of 65515 bytes. Because of the
> default 8KB buffer size, propagating these cache writes requires 8
> pipe reads and 8 pipe writes from git-upload-pack, and 8 pipe reads
> from Gitaly (our Git RPC service). If we increase the size of the
> buffer to the maximum Git packet size, we need only 1 pipe read and 1
> pipe write in git-upload-pack, and 1 pipe read in Gitaly to transfer
> the same amount of data. In benchmarks with a pure fetch and 100%
> cache hit rate workload we are seeing CPU utilization reductions of
> over 30%.

I was curious to reproduce this locally, so I came up with:

  (
    printf "003fwant %s side-band-64k" $(git rev-parse HEAD)
    printf 0000
    echo '0009done'
  ) |
  git -c uploadpack.packobjectsHook='cat objects/pack/pack-*.pack; :' upload-pack . |
  pv >/dev/null

which hackily simulates the server side of your "cached" case. :) I ran
it on a fully-packed clone of linux.git.

It gets about 2.3GB/s with the tip of 'master' and 3.2GB/s with the
equivalent of your patch (using LARGE_PACKET_DATA_MAX). So definitely an
improvement.

Without the cached case (so actually running pack-objects, albeit a
pretty quick one because of bitmaps and pack-reuse), the timings are
about the same (171MB/s versus 174MB/s, but really it's just pegging a
CPU running pack-objects). So it would be fine to just do this
unconditionally, I think.

Looking at strace, the other thing I notice is that we write() the
packet header separately in send_sideband(), which doubles the number of
syscalls. I hackily re-wrote this to use writev() instead (patch below),
but it doesn't seem to actually help much (maybe a curiosity to explore
further, but definitely not something to hold up your patch).

-Peff

---
diff --git a/sideband.c b/sideband.c
index 85bddfdcd4..d0945507a3 100644
--- a/sideband.c
+++ b/sideband.c
@@ -5,6 +5,11 @@
 #include "help.h"
 #include "pkt-line.h"
 
+/* hack; should go in git-compat-util, and should provide compat
+ * wrapper around regular write()
+ */
+#include <sys/uio.h>
+
 struct keyword_entry {
 	/*
 	 * We use keyword as config key so it should be a single alphanumeric word.
@@ -257,6 +262,7 @@ void send_sideband(int fd, int band, const char *data, ssize_t sz, int packet_ma
 
 	while (sz) {
 		unsigned n;
+		struct iovec iov[2];
 		char hdr[5];
 
 		n = sz;
@@ -265,12 +271,17 @@ void send_sideband(int fd, int band, const char *data, ssize_t sz, int packet_ma
 		if (0 <= band) {
 			xsnprintf(hdr, sizeof(hdr), "%04x", n + 5);
 			hdr[4] = band;
-			write_or_die(fd, hdr, 5);
+			iov[0].iov_base = hdr;
+			iov[0].iov_len = 5;
 		} else {
 			xsnprintf(hdr, sizeof(hdr), "%04x", n + 4);
-			write_or_die(fd, hdr, 4);
+			iov[0].iov_base = hdr;
+			iov[0].iov_len = 4;
 		}
-		write_or_die(fd, p, n);
+		iov[1].iov_base = (void *)p;
+		iov[1].iov_len = n;
+		/* should check for errors, but also short writes and EINTR, etc */
+		writev(fd, iov, 2);
 		p += n;
 		sz -= n;
 	}
diff --git a/upload-pack.c b/upload-pack.c
index c78d55bc67..111de8c60c 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -194,7 +194,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 }
 
 struct output_state {
-	char buffer[8193];
+	char buffer[LARGE_PACKET_DATA_MAX];
 	int used;
 	unsigned packfile_uris_started : 1;
 	unsigned packfile_started : 1;
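
As the comment in the hunk above notes, a real version would need error handling; a sketch of a write_or_die()-style writev() wrapper that retries on EINTR and handles short writes (illustration only):

  /*
   * Illustration only: needs <sys/uio.h>; die_errno() comes from
   * git-compat-util.h.
   */
  static void writev_or_die(int fd, struct iovec *iov, int iovcnt)
  {
          while (iovcnt) {
                  ssize_t written = writev(fd, iov, iovcnt);

                  if (written < 0) {
                          if (errno == EINTR)
                                  continue;
                          die_errno("writev error");
                  }
                  /* drop iovecs that were written out completely */
                  while (iovcnt && (size_t)written >= iov[0].iov_len) {
                          written -= iov[0].iov_len;
                          iov++;
                          iovcnt--;
                  }
                  /* partial write of the current iovec: adjust and retry */
                  if (iovcnt) {
                          iov[0].iov_base = (char *)iov[0].iov_base + written;
                          iov[0].iov_len -= written;
                  }
          }
  }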
Jacob Vosmaer Dec. 14, 2021, 8:04 p.m. UTC | #4
On Tue, Dec 14, 2021 at 4:37 PM Jeff King <peff@peff.net> wrote:
>
> It gets about 2.3GB/s with the tip of 'master' and 3.2GB/s with the
> equivalent of your patch (using LARGE_PACKET_DATA_MAX). So definitely an
> improvement.
>
> Without the cached case (so actually running pack-objects, albeit a
> pretty quick one because of bitmaps and pack-reuse), the timings are
> about the same (171MB/s versus 174MB/s, but really it's just pegging a
> CPU running pack-objects). So it would be fine to just do this
> unconditionally, I think.

Thank you for validating this!

> Looking at strace, the other thing I notice is that we write() the
> packet header separately in send_sideband(), which doubles the number of
> syscalls. I hackily re-wrote this to use writev() instead (patch below),
> but it doesn't seem to actually help much (maybe a curiosity to explore
> further, but definitely not something to hold up your patch).

Those double writes bug me too. Getting rid of them was one of the
things I liked about using stdio here. I'm curious now whether writev would
make an impact in my benchmarks. As your experiments indicate, the double
writes may not be a problem in real life. Either way, I also feel it
is not something to rope into this patch set.

Patch

diff --git a/upload-pack.c b/upload-pack.c
index c78d55bc67..b799fbe628 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -42,6 +42,10 @@ 
 #define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
 		NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)
 
+#ifndef UPLOAD_PACK_BUFFER_SIZE
+#define UPLOAD_PACK_BUFFER_SIZE 8192
+#endif
+
 /* Enum for allowed unadvertised object request (UOR) */
 enum allow_uor {
 	/* Allow specifying sha1 if it is a ref tip. */
@@ -194,7 +198,7 @@  static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 }
 
 struct output_state {
-	char buffer[8193];
+	char buffer[UPLOAD_PACK_BUFFER_SIZE+1];
 	int used;
 	unsigned packfile_uris_started : 1;
 	unsigned packfile_started : 1;