Message ID: 20211213132345.26310-2-jacob@gitlab.com
State: New, archived
Series: Make upload-pack pack write size configurable
On Mon, Dec 13 2021, Jacob Vosmaer wrote:

> Add a new compile time constant UPLOAD_PACK_BUFFER_SIZE that makes the
> size of the static buffer in output_state configurable.
>
> The current size of the buffer is 8192+1. The '+1' is a technical
> detail orthogonal to this change. On GitLab.com we use GitLab's
> pack-objects cache which does writes of 65515 bytes. Because of the
> default 8KB buffer size, propagating these cache writes requires 8
> pipe reads and 8 pipe writes from git-upload-pack, and 8 pipe reads
> from Gitaly (our Git RPC service). If we increase the size of the
> buffer to the maximum Git packet size, we need only 1 pipe read and 1
> pipe write in git-upload-pack, and 1 pipe read in Gitaly to transfer
> the same amount of data. In benchmarks with a pure fetch and 100%
> cache hit rate workload we are seeing CPU utilization reductions of
> over 30%.
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  upload-pack.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index c78d55bc67..b799fbe628 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -42,6 +42,10 @@
>  #define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
>  		   NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)
>
> +#ifndef UPLOAD_PACK_BUFFER_SIZE
> +#define UPLOAD_PACK_BUFFER_SIZE 8192
> +#endif
> +
>  /* Enum for allowed unadvertised object request (UOR) */
>  enum allow_uor {
>  	/* Allow specifying sha1 if it is a ref tip. */
> @@ -194,7 +198,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
>  }
>
>  struct output_state {
> -	char buffer[8193];
> +	char buffer[UPLOAD_PACK_BUFFER_SIZE+1];
>  	int used;
>  	unsigned packfile_uris_started : 1;
>  	unsigned packfile_started : 1;

Making this configurable obviously has a big impact in some cases, but
I'm a bit iffy about the facility for doing so, and about it not being
documented.

I don't think the "static buffer" part here is important to anyone,
but the write() size is clearly important.

So doesn't it make more sense to have an uploadPack.bufferSize=8k
variable we can tweak, make this "buffer" a "struct strbuf" instead
(i.e. it'll be dynamically grown), and then just flush it whenever we
hit the configured buffer size?
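As a rough illustration of the strbuf-based direction suggested above
(not code from this thread: the flush threshold field and the helper
buffer_or_flush() are hypothetical, while strbuf_add(), strbuf_reset()
and write_or_die() are existing Git helpers assumed to be available):

#include "cache.h"	/* write_or_die() */
#include "strbuf.h"

struct output_state {
	struct strbuf buffer;	/* would start as STRBUF_INIT */
	size_t flush_at;	/* hypothetically from uploadPack.bufferSize */
	unsigned packfile_uris_started : 1;
	unsigned packfile_started : 1;
};

/* Hypothetical helper: append, and flush once the threshold is hit. */
static void buffer_or_flush(struct output_state *os, int fd,
			    const char *data, size_t len)
{
	strbuf_add(&os->buffer, data, len);
	if (os->buffer.len >= os->flush_at) {
		write_or_die(fd, os->buffer.buf, os->buffer.len);
		strbuf_reset(&os->buffer);
	}
}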
On Tue, Dec 14, 2021 at 01:08:55PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > +#ifndef UPLOAD_PACK_BUFFER_SIZE
> > +#define UPLOAD_PACK_BUFFER_SIZE 8192
> > +#endif
> > +
> >  /* Enum for allowed unadvertised object request (UOR) */
> >  enum allow_uor {
> >  	/* Allow specifying sha1 if it is a ref tip. */
> > @@ -194,7 +198,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
> >  }
> >
> >  struct output_state {
> > -	char buffer[8193];
> > +	char buffer[UPLOAD_PACK_BUFFER_SIZE+1];
> >  	int used;
> >  	unsigned packfile_uris_started : 1;
> >  	unsigned packfile_started : 1;
>
> Making this configurable obviously has a big impact in some cases, but
> I'm a bit iffy about the facility for doing so, and about it not being
> documented.
>
> I don't think the "static buffer" part here is important to anyone,
> but the write() size is clearly important.
>
> So doesn't it make more sense to have an uploadPack.bufferSize=8k
> variable we can tweak, make this "buffer" a "struct strbuf" instead
> (i.e. it'll be dynamically grown), and then just flush it whenever we
> hit the configured buffer size?

I don't think we want to grow dynamically, because we don't want to
hold arbitrary amounts of data in memory. We're just passing it
through. But if there were a run-time config option, we could easily
heap-allocate the buffer up front.

That may be overkill, though I do agree this is kind of weirdly
undocumented. There are two other subtleties I notice:

  - This output_state struct does go on the stack, so something big
    like 64k is questionable there (though on Linux, without
    recursion, it's usually OK).

  - We're relaying the data into pkt-lines. Do we run into problems
    when the buffer is larger than a packet? I think in
    send_sideband() we'll break it up as appropriate. But before we
    hit the PACK header, we send out packfile-uris directly with
    packet_write_fmt(). Those aren't likely to be long, but if they
    are, we'd die() in format_packet().

    So something around LARGE_PACKET_DATA_MAX is probably the highest
    you'd want to set it anyway (and the most performant, since
    otherwise you have to write out extra partial packets).

So I kind of wonder if there is any real _harm_ in just always using
bigger packets, even if it does not always help. Given the subtle
rules about packet-max above, we could just use that optimal value and
not worry about configurability.

-Peff
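To make the run-time alternative concrete, here is a sketch of reading
a config knob once and clamping it to the sideband payload limit. The
config key and the function output_buffer_size() are illustrative, not
an actual upstream API; git_config_get_int(), xmalloc() and
LARGE_PACKET_DATA_MAX are the existing Git facilities assumed:

#include "cache.h"	/* xmalloc() */
#include "config.h"	/* git_config_get_int() */
#include "pkt-line.h"	/* LARGE_PACKET_DATA_MAX */

/* Hypothetical: pick a buffer size once, clamped to one sideband packet. */
static size_t output_buffer_size(void)
{
	int size = 8192;	/* today's default */

	/* leaves 'size' untouched if the key is unset */
	git_config_get_int("uploadpack.buffersize", &size);

	if (size <= 0)
		size = 8192;
	if (size > LARGE_PACKET_DATA_MAX)
		size = LARGE_PACKET_DATA_MAX;	/* bigger only yields extra partial packets */
	return size;
}

/*
 * The buffer would then be heap-allocated up front instead of grown:
 *
 *	os->buffer_size = output_buffer_size();
 *	os->buffer = xmalloc(os->buffer_size + 1);
 */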
On Mon, Dec 13, 2021 at 02:23:45PM +0100, Jacob Vosmaer wrote:

> The current size of the buffer is 8192+1. The '+1' is a technical
> detail orthogonal to this change. On GitLab.com we use GitLab's
> pack-objects cache which does writes of 65515 bytes. Because of the
> default 8KB buffer size, propagating these cache writes requires 8
> pipe reads and 8 pipe writes from git-upload-pack, and 8 pipe reads
> from Gitaly (our Git RPC service). If we increase the size of the
> buffer to the maximum Git packet size, we need only 1 pipe read and 1
> pipe write in git-upload-pack, and 1 pipe read in Gitaly to transfer
> the same amount of data. In benchmarks with a pure fetch and 100%
> cache hit rate workload we are seeing CPU utilization reductions of
> over 30%.

I was curious to reproduce this locally, so I came up with:

  (
    printf "003fwant %s side-band-64k" $(git rev-parse HEAD)
    printf 0000
    echo '0009done'
  ) |
  git -c uploadpack.packobjectsHook='cat objects/pack/pack-*.pack; :' upload-pack . |
  pv >/dev/null

which hackily simulates the server side of your "cached" case. :)

I ran it on a fully-packed clone of linux.git. It gets about 2.3GB/s
with the tip of 'master' and 3.2GB/s with the equivalent of your patch
(using LARGE_PACKET_DATA_MAX). So definitely an improvement.

Without the cached case (so actually running pack-objects, albeit a
pretty quick one because of bitmaps and pack-reuse), the timings are
about the same (171MB/s versus 174MB/s, but really it's just pegging a
CPU running pack-objects). So it would be fine to just do this
unconditionally, I think.

Looking at strace, the other thing I notice is that we write() the
packet header separately in send_sideband(), which doubles the number
of syscalls. I hackily re-wrote this to use writev() instead (patch
below), but it doesn't seem to actually help much (maybe a curiosity
to explore further, but definitely not something to hold up your
patch).

-Peff

---
diff --git a/sideband.c b/sideband.c
index 85bddfdcd4..d0945507a3 100644
--- a/sideband.c
+++ b/sideband.c
@@ -5,6 +5,11 @@
 #include "help.h"
 #include "pkt-line.h"
 
+/* hack; should go in git-compat-util, and should provide compat
+ * wrapper around regular write()
+ */
+#include <sys/uio.h>
+
 struct keyword_entry {
 	/*
 	 * We use keyword as config key so it should be a single alphanumeric word.
@@ -257,6 +262,7 @@ void send_sideband(int fd, int band, const char *data, ssize_t sz, int packet_ma
 
 	while (sz) {
 		unsigned n;
+		struct iovec iov[2];
 		char hdr[5];
 
 		n = sz;
@@ -265,12 +271,17 @@ void send_sideband(int fd, int band, const char *data, ssize_t sz, int packet_ma
 		if (0 <= band) {
 			xsnprintf(hdr, sizeof(hdr), "%04x", n + 5);
 			hdr[4] = band;
-			write_or_die(fd, hdr, 5);
+			iov[0].iov_base = hdr;
+			iov[0].iov_len = 5;
 		} else {
 			xsnprintf(hdr, sizeof(hdr), "%04x", n + 4);
-			write_or_die(fd, hdr, 4);
+			iov[0].iov_base = hdr;
+			iov[0].iov_len = 4;
 		}
-		write_or_die(fd, p, n);
+		iov[1].iov_base = (void *)p;
+		iov[1].iov_len = n;
+		/* should check for errors, but also short writes and EINTR, etc */
+		writev(fd, iov, 2);
 		p += n;
 		sz -= n;
 	}
diff --git a/upload-pack.c b/upload-pack.c
index c78d55bc67..111de8c60c 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -194,7 +194,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 }
 
 struct output_state {
-	char buffer[8193];
+	char buffer[LARGE_PACKET_DATA_MAX];
 	int used;
 	unsigned packfile_uris_started : 1;
 	unsigned packfile_started : 1;
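For reference, a sketch of the error handling the hack above skips: a
loop that retries writev() on EINTR and advances the iovec array after
a short write. The helper name write_iov_or_die is made up for
illustration; only POSIX writev() and errno are assumed:

#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Illustrative only: keep calling writev() until every iovec is flushed. */
static void write_iov_or_die(int fd, struct iovec *iov, int iovcnt)
{
	while (iovcnt > 0) {
		ssize_t n = writev(fd, iov, iovcnt);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			exit(128);	/* real code would call die_errno() */
		}
		while (n > 0 && iovcnt > 0) {
			if ((size_t)n >= iov[0].iov_len) {
				/* this chunk fully written; move to the next */
				n -= iov[0].iov_len;
				iov++;
				iovcnt--;
			} else {
				/* partial write inside this chunk */
				iov[0].iov_base = (char *)iov[0].iov_base + n;
				iov[0].iov_len -= n;
				n = 0;
			}
		}
	}
}

The iov[2] built in send_sideband() above could then be handed to such
a helper in place of the bare writev() call.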
On Tue, Dec 14, 2021 at 4:37 PM Jeff King <peff@peff.net> wrote:
>
> It gets about 2.3GB/s with the tip of 'master' and 3.2GB/s with the
> equivalent of your patch (using LARGE_PACKET_DATA_MAX). So definitely
> an improvement.
>
> Without the cached case (so actually running pack-objects, albeit a
> pretty quick one because of bitmaps and pack-reuse), the timings are
> about the same (171MB/s versus 174MB/s, but really it's just pegging a
> CPU running pack-objects). So it would be fine to just do this
> unconditionally, I think.

Thank you for validating this!

> Looking at strace, the other thing I notice is that we write() the
> packet header separately in send_sideband(), which doubles the number
> of syscalls. I hackily re-wrote this to use writev() instead (patch
> below), but it doesn't seem to actually help much (maybe a curiosity
> to explore further, but definitely not something to hold up your
> patch).

Those double writes bug me too. Getting rid of them was one of the
things I liked about using stdio here. I'm curious now whether writev
will make an impact in my benchmarks. As your experiments indicate, the
double writes may not be a problem in real life. Either way, I also
feel it is not something to rope into this patch set.
diff --git a/upload-pack.c b/upload-pack.c
index c78d55bc67..b799fbe628 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -42,6 +42,10 @@
 #define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
 		   NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)
 
+#ifndef UPLOAD_PACK_BUFFER_SIZE
+#define UPLOAD_PACK_BUFFER_SIZE 8192
+#endif
+
 /* Enum for allowed unadvertised object request (UOR) */
 enum allow_uor {
 	/* Allow specifying sha1 if it is a ref tip. */
@@ -194,7 +198,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 }
 
 struct output_state {
-	char buffer[8193];
+	char buffer[UPLOAD_PACK_BUFFER_SIZE+1];
 	int used;
 	unsigned packfile_uris_started : 1;
 	unsigned packfile_started : 1;
Add a new compile time constant UPLOAD_PACK_BUFFER_SIZE that makes the
size of the static buffer in output_state configurable.

The current size of the buffer is 8192+1. The '+1' is a technical
detail orthogonal to this change. On GitLab.com we use GitLab's
pack-objects cache which does writes of 65515 bytes. Because of the
default 8KB buffer size, propagating these cache writes requires 8
pipe reads and 8 pipe writes from git-upload-pack, and 8 pipe reads
from Gitaly (our Git RPC service). If we increase the size of the
buffer to the maximum Git packet size, we need only 1 pipe read and 1
pipe write in git-upload-pack, and 1 pipe read in Gitaly to transfer
the same amount of data. In benchmarks with a pure fetch and 100%
cache hit rate workload we are seeing CPU utilization reductions of
over 30%.

Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
---
 upload-pack.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)