diff mbox series

[2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE

Message ID patch-2.3-a920a9971e8-20210707T103712Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192 | expand

Commit Message

Ævar Arnfjörð Bjarmason July 7, 2021, 10:38 a.m. UTC
Change a couple of users of strbuf_init() that pass a hint of 8192 to
pass STRBUF_HINT_SIZE instead.

Both of these hardcoded occurrences pre-date the use of the strbuf
API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
2007-09-06).

In both cases the exact choice of 8192 is rather arbitrary, e.g. for
commit buffers I think 1024 or 2048 would probably be a better
default (this commit message is getting this commit close to the
former, but I daresay it's already way above the average for git
commits).

In any case, if we ever tweak STRBUF_HINT_SIZE we'll probably do so on
the basis of some codease-wide performance tests, so replacing the
hardcoded value with STRBUF_HINT_SIZE should be safe, they're the same
now, and if we change STRBUF_HINT_SIZE in the future this is one of
the main codepaths we'll be testing.

Aside: It's unfortunate that we don't take a "hint" of "0" in
strbuf_init() to mean "default" and e.g. "-1" to mean the
strbuf_slopbuf (a in STRBUF_INIT). I considered adding that, or
splitting them up so you'd do strbuf_init(&buf) for the
strbuf_init(&buf, 0) case, or strbuf_init_alloc() for the
strbuf_init(&buf, N) case, where N > 0. But that would be a big change
across the codebase, so let's punt on that for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache-tree.c | 2 +-
 commit.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Eric Sunshine July 7, 2021, 8:10 p.m. UTC | #1
On Wed, Jul 7, 2021 at 6:38 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> Change a couple of users of strbuf_init() that pass a hint of 8192 to
> pass STRBUF_HINT_SIZE instead.
>
> Both of these hardcoded occurrences pre-date the use of the strbuf
> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
> 2007-09-06).
>
> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
> commit buffers I think 1024 or 2048 would probably be a better
> default (this commit message is getting this commit close to the
> former, but I daresay it's already way above the average for git
> commits).
>
> In any case, if we ever tweak STRBUF_HINT_SIZE we'll probably do so on
> the basis of some codease-wide performance tests, so replacing the

Did you mean? s/codease/code/

> hardcoded value with STRBUF_HINT_SIZE should be safe, they're the same
> now, and if we change STRBUF_HINT_SIZE in the future this is one of
> the main codepaths we'll be testing.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Junio C Hamano July 7, 2021, 10:37 p.m. UTC | #2
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change a couple of users of strbuf_init() that pass a hint of 8192 to
> pass STRBUF_HINT_SIZE instead.
>
> Both of these hardcoded occurrences pre-date the use of the strbuf
> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
> 2007-09-06).
>
> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
> commit buffers I think 1024 or 2048 would probably be a better
> default (this commit message is getting this commit close to the
> former, but I daresay it's already way above the average for git
> commits).

Yes, they are arbitrary within the context of these callers.

I do not think using STRBUF_HINT_SIZE macro in them is the right
thing to do at all, as there is no reason to think that the best
value for the write chunk sizes in these codepath has any linkage to
the best value for the read chunk sizes used by strbuf_read() at
all.  When benchmarking reveals that the best default size for
strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how
do you tell that it also happens to be the best write buffer size
for the cache-tree writeout codepath (answer: you don't)?
Junio C Hamano July 7, 2021, 11:09 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change a couple of users of strbuf_init() that pass a hint of 8192 to
>> pass STRBUF_HINT_SIZE instead.
>>
>> Both of these hardcoded occurrences pre-date the use of the strbuf
>> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
>> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
>> 2007-09-06).
>>
>> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
>> commit buffers I think 1024 or 2048 would probably be a better
>> default (this commit message is getting this commit close to the
>> former, but I daresay it's already way above the average for git
>> commits).
>
> Yes, they are arbitrary within the context of these callers.
>
> I do not think using STRBUF_HINT_SIZE macro in them is the right
> thing to do at all, as there is no reason to think that the best
> value for the write chunk sizes in these codepath has any linkage to
> the best value for the read chunk sizes used by strbuf_read() at
> all.  When benchmarking reveals that the best default size for
> strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how
> do you tell that it also happens to be the best write buffer size
> for the cache-tree writeout codepath (answer: you don't)?

Having said all that, I wouldn't be so opposed to an approach that

 - declares that we need only one "default I/O buffer size";

 - declares that the appropriate size for it is 8192;

 - #define DEFAULT_IO_SIZE 8192;

 - does something like your [PATCH 1/3], but not limited to strbuf
   API, and

 - covers also the writeout codepath of cache-tree, etc. that uses
   hardcoded I/O buffer size.

The biggest trouble I had with the posted patches, especially the
[PATCH 2/3], was that I am quite sure that you wouldn't have used
STRBUF_HINT_SIZE in the cache-tree writeout codepath or commit-tree
codepath if they didn't use strbuf as a convenient way to get an
elastic buffer.  The more relevant commonality across codepaths that
use 8192 is that the constant is used for sizing the I/O buffer, and
I got an impression that the 3-patch series posted did an incomplete
job of touching some that happen to use strbuf.

An approach that concentrated on the "right" commonality, i.e. we
have hardcoded magic constants for I/O buffer sizing, would have
covered copy.c, diff-delta.c, http-backend.c etc. that do not use
strbuf API where they have hardcoded 8k constants.

Thanks.
Randall S. Becker July 7, 2021, 11:22 p.m. UTC | #4
On July 7, 2021 6:38 PM, Junio C Hamano wrote:
>To: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>Cc: git@vger.kernel.org; Jeff King <peff@peff.net>
>Subject: Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
>
>Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change a couple of users of strbuf_init() that pass a hint of 8192 to
>> pass STRBUF_HINT_SIZE instead.
>>
>> Both of these hardcoded occurrences pre-date the use of the strbuf
>> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
>> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
>> 2007-09-06).
>>
>> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
>> commit buffers I think 1024 or 2048 would probably be a better default
>> (this commit message is getting this commit close to the former, but I
>> daresay it's already way above the average for git commits).
>
>Yes, they are arbitrary within the context of these callers.
>
>I do not think using STRBUF_HINT_SIZE macro in them is the right thing to do at all, as there is no reason to think that the best value for
>the write chunk sizes in these codepath has any linkage to the best value for the read chunk sizes used by strbuf_read() at all.  When
>benchmarking reveals that the best default size for
>strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how do you tell that it also happens to be the best write buffer size for
>the cache-tree writeout codepath (answer: you don't)?

And benchmark results are going to be highly platform dependent, as we have seen with our exotic platform.
-Randall
Jeff King July 12, 2021, 5:58 p.m. UTC | #5
On Wed, Jul 07, 2021 at 03:37:56PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Change a couple of users of strbuf_init() that pass a hint of 8192 to
> > pass STRBUF_HINT_SIZE instead.
> >
> > Both of these hardcoded occurrences pre-date the use of the strbuf
> > API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
> > af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
> > 2007-09-06).
> >
> > In both cases the exact choice of 8192 is rather arbitrary, e.g. for
> > commit buffers I think 1024 or 2048 would probably be a better
> > default (this commit message is getting this commit close to the
> > former, but I daresay it's already way above the average for git
> > commits).
> 
> Yes, they are arbitrary within the context of these callers.
> 
> I do not think using STRBUF_HINT_SIZE macro in them is the right
> thing to do at all, as there is no reason to think that the best
> value for the write chunk sizes in these codepath has any linkage to
> the best value for the read chunk sizes used by strbuf_read() at
> all.  When benchmarking reveals that the best default size for
> strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how
> do you tell that it also happens to be the best write buffer size
> for the cache-tree writeout codepath (answer: you don't)?

Being cc'd on this series, I feel compelled to respond with some review.
But I'm in such agreement with what you said here (and downthread, and
also in your response to patch 1) that I can only add a lame "me too". :)

-Peff
diff mbox series

Patch

diff --git a/cache-tree.c b/cache-tree.c
index 45e58666afc..d69f6d1c66f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -335,7 +335,7 @@  static int update_one(struct cache_tree *it,
 	/*
 	 * Then write out the tree object for this level.
 	 */
-	strbuf_init(&buffer, 8192);
+	strbuf_init(&buffer, STRBUF_HINT_SIZE);
 
 	i = 0;
 	while (i < entries) {
diff --git a/commit.c b/commit.c
index 8ea55a447fa..b3aab46072a 100644
--- a/commit.c
+++ b/commit.c
@@ -1521,7 +1521,7 @@  int commit_tree_extended(const char *msg, size_t msg_len,
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
 
-	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
+	strbuf_init(&buffer, STRBUF_HINT_SIZE); /* should avoid reallocs for the headers */
 	strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree));
 
 	/*