Message ID | 20211213132345.26310-1-jacob@gitlab.com (mailing list archive) |
---|---|
Headers | show |
Series | Make upload-pack pack write size configurable | expand |
On Mon, Dec 13, 2021 at 02:23:44PM +0100, Jacob Vosmaer wrote: > When transfering packfile data, upload-pack.c uses an 8KB buffer. > This is a reasonable size but when you transfer a lot of packfile > data, like we do on GitLab.com, we find it is beneficial to use a > larger buffer size. > > Below you will find a commit where we make the size of this 8KB > buffer configurable at compile time. It appears pack-objects always > does 8KB writes so I don't think we should change the default. But > for GitLab, where we have a cache for the output of pack-objects, > it is beneficial to use a larger IO size because the cache does > 64KB writes. I suspect the big reason that it matters for the cache is that for pack-objects, we're typically CPU bound computing the sha1 of the outgoing pack (for its trailer). I do suspect we could just always move to something closer to LARGE_PACKET_DATA_MAX (probably minus 1 for the sideband marker). Even if pack-objects only writes in 8k chunks, we may be able to pull several in one read(), especially if the network is a bottleneck (because we'd block writing to the client, and pack-objects would fill up the pipe buffer). I.e., it doesn't seem like there's any real downside to doing so. > I have also considered converting the packfile copying code to use > stdio when writing to stdout, but that would be a bigger change > because we have to be careful not to interleave stdio and stdlib > writes. And we would have to make the stdout output buffer size > configurable, because the default stdio buffer size is 4KB which > is no better than the status quo. A final argument against the stdio > approach is that it only reduces the number of writes from upload-pack, > while a larger buffer size reduces both the number of reads and > writes. Yeah, I don't think that would help all that much. We really want to size this based on pkt-line limits. That reduces syscalls, but also shrinks the overall size a little (since larger packets minimizes the framing overhead of the pkt-line headers). -Peff