Message ID | patch-2.3-a920a9971e8-20210707T103712Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192 | expand |
On Wed, Jul 7, 2021 at 6:38 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > Change a couple of users of strbuf_init() that pass a hint of 8192 to > pass STRBUF_HINT_SIZE instead. > > Both of these hardcoded occurrences pre-date the use of the strbuf > API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and > af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff, > 2007-09-06). > > In both cases the exact choice of 8192 is rather arbitrary, e.g. for > commit buffers I think 1024 or 2048 would probably be a better > default (this commit message is getting this commit close to the > former, but I daresay it's already way above the average for git > commits). > > In any case, if we ever tweak STRBUF_HINT_SIZE we'll probably do so on > the basis of some codease-wide performance tests, so replacing the Did you mean? s/codease/code/ > hardcoded value with STRBUF_HINT_SIZE should be safe, they're the same > now, and if we change STRBUF_HINT_SIZE in the future this is one of > the main codepaths we'll be testing. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Change a couple of users of strbuf_init() that pass a hint of 8192 to > pass STRBUF_HINT_SIZE instead. > > Both of these hardcoded occurrences pre-date the use of the strbuf > API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and > af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff, > 2007-09-06). > > In both cases the exact choice of 8192 is rather arbitrary, e.g. for > commit buffers I think 1024 or 2048 would probably be a better > default (this commit message is getting this commit close to the > former, but I daresay it's already way above the average for git > commits). Yes, they are arbitrary within the context of these callers. I do not think using STRBUF_HINT_SIZE macro in them is the right thing to do at all, as there is no reason to think that the best value for the write chunk sizes in these codepath has any linkage to the best value for the read chunk sizes used by strbuf_read() at all. When benchmarking reveals that the best default size for strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how do you tell that it also happens to be the best write buffer size for the cache-tree writeout codepath (answer: you don't)?
Junio C Hamano <gitster@pobox.com> writes: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Change a couple of users of strbuf_init() that pass a hint of 8192 to >> pass STRBUF_HINT_SIZE instead. >> >> Both of these hardcoded occurrences pre-date the use of the strbuf >> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and >> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff, >> 2007-09-06). >> >> In both cases the exact choice of 8192 is rather arbitrary, e.g. for >> commit buffers I think 1024 or 2048 would probably be a better >> default (this commit message is getting this commit close to the >> former, but I daresay it's already way above the average for git >> commits). > > Yes, they are arbitrary within the context of these callers. > > I do not think using STRBUF_HINT_SIZE macro in them is the right > thing to do at all, as there is no reason to think that the best > value for the write chunk sizes in these codepath has any linkage to > the best value for the read chunk sizes used by strbuf_read() at > all. When benchmarking reveals that the best default size for > strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how > do you tell that it also happens to be the best write buffer size > for the cache-tree writeout codepath (answer: you don't)? Having said all that, I wouldn't be so opposed to an approach that - declares that we need only one "default I/O buffer size"; - declares that the appropriate size for it is 8192; - #define DEFAULT_IO_SIZE 8192; - does something like your [PATCH 1/3], but not limited to strbuf API, and - covers also the writeout codepath of cache-tree, etc. that uses hardcoded I/O buffer size. The biggest trouble I had with the posted patches, especially the [PATCH 2/3], was that I am quite sure that you wouldn't have used STRBUF_HINT_SIZE in the cache-tree writeout codepath or commit-tree codepath if they didn't use strbuf as a convenient way to get an elastic buffer. The more relevant commonality across codepaths that use 8192 is that the constant is used for sizing the I/O buffer, and I got an impression that the 3-patch series posted did an incomplete job of touching some that happen to use strbuf. An approach that concentrated on the "right" commonality, i.e. we have hardcoded magic constants for I/O buffer sizing, would have covered copy.c, diff-delta.c, http-backend.c etc. that do not use strbuf API where they have hardcoded 8k constants. Thanks.
On July 7, 2021 6:38 PM, Junio C Hamano wrote: >To: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >Cc: git@vger.kernel.org; Jeff King <peff@peff.net> >Subject: Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE > >Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Change a couple of users of strbuf_init() that pass a hint of 8192 to >> pass STRBUF_HINT_SIZE instead. >> >> Both of these hardcoded occurrences pre-date the use of the strbuf >> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and >> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff, >> 2007-09-06). >> >> In both cases the exact choice of 8192 is rather arbitrary, e.g. for >> commit buffers I think 1024 or 2048 would probably be a better default >> (this commit message is getting this commit close to the former, but I >> daresay it's already way above the average for git commits). > >Yes, they are arbitrary within the context of these callers. > >I do not think using STRBUF_HINT_SIZE macro in them is the right thing to do at all, as there is no reason to think that the best value for >the write chunk sizes in these codepath has any linkage to the best value for the read chunk sizes used by strbuf_read() at all. When >benchmarking reveals that the best default size for >strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how do you tell that it also happens to be the best write buffer size for >the cache-tree writeout codepath (answer: you don't)? And benchmark results are going to be highly platform dependent, as we have seen with our exotic platform. -Randall
On Wed, Jul 07, 2021 at 03:37:56PM -0700, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > > Change a couple of users of strbuf_init() that pass a hint of 8192 to > > pass STRBUF_HINT_SIZE instead. > > > > Both of these hardcoded occurrences pre-date the use of the strbuf > > API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and > > af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff, > > 2007-09-06). > > > > In both cases the exact choice of 8192 is rather arbitrary, e.g. for > > commit buffers I think 1024 or 2048 would probably be a better > > default (this commit message is getting this commit close to the > > former, but I daresay it's already way above the average for git > > commits). > > Yes, they are arbitrary within the context of these callers. > > I do not think using STRBUF_HINT_SIZE macro in them is the right > thing to do at all, as there is no reason to think that the best > value for the write chunk sizes in these codepath has any linkage to > the best value for the read chunk sizes used by strbuf_read() at > all. When benchmarking reveals that the best default size for > strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how > do you tell that it also happens to be the best write buffer size > for the cache-tree writeout codepath (answer: you don't)? Being cc'd on this series, I feel compelled to respond with some review. But I'm in such agreement with what you said here (and downthread, and also in your response to patch 1) that I can only add a lame "me too". :) -Peff
diff --git a/cache-tree.c b/cache-tree.c index 45e58666afc..d69f6d1c66f 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -335,7 +335,7 @@ static int update_one(struct cache_tree *it, /* * Then write out the tree object for this level. */ - strbuf_init(&buffer, 8192); + strbuf_init(&buffer, STRBUF_HINT_SIZE); i = 0; while (i < entries) { diff --git a/commit.c b/commit.c index 8ea55a447fa..b3aab46072a 100644 --- a/commit.c +++ b/commit.c @@ -1521,7 +1521,7 @@ int commit_tree_extended(const char *msg, size_t msg_len, /* Not having i18n.commitencoding is the same as having utf-8 */ encoding_is_utf8 = is_encoding_utf8(git_commit_encoding); - strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */ + strbuf_init(&buffer, STRBUF_HINT_SIZE); /* should avoid reallocs for the headers */ strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree)); /*
Change a couple of users of strbuf_init() that pass a hint of 8192 to pass STRBUF_HINT_SIZE instead. Both of these hardcoded occurrences pre-date the use of the strbuf API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff, 2007-09-06). In both cases the exact choice of 8192 is rather arbitrary, e.g. for commit buffers I think 1024 or 2048 would probably be a better default (this commit message is getting this commit close to the former, but I daresay it's already way above the average for git commits). In any case, if we ever tweak STRBUF_HINT_SIZE we'll probably do so on the basis of some codease-wide performance tests, so replacing the hardcoded value with STRBUF_HINT_SIZE should be safe, they're the same now, and if we change STRBUF_HINT_SIZE in the future this is one of the main codepaths we'll be testing. Aside: It's unfortunate that we don't take a "hint" of "0" in strbuf_init() to mean "default" and e.g. "-1" to mean the strbuf_slopbuf (a in STRBUF_INIT). I considered adding that, or splitting them up so you'd do strbuf_init(&buf) for the strbuf_init(&buf, 0) case, or strbuf_init_alloc() for the strbuf_init(&buf, N) case, where N > 0. But that would be a big change across the codebase, so let's punt on that for now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- cache-tree.c | 2 +- commit.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)