Message ID | 20240226143350.3596-1-ach.lumap@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [Outreachy,1/2] strbuf: introduce strbuf_addstrings() to repeatedly add a string | expand |
Achu Luma <ach.lumap@gmail.com> writes: > In a following commit we are going to port code from > "t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to > a new "t/unit-tests/t-hash.c" file using the recently added unit test > framework. > > To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" > we are going to need a new strbuf_addstrings() function that repeatedly > adds the same string a number of times to a buffer. We do not need to call such a function "addstrings", though. The name on the subject line made me expect a varargs function: (bad) strbuf_addstrings(&sb, "foo", "bar", "baz", NULL); It would have been clearer if the name hinted what it does, clearer than just a single "s" that says it is talking about plural. What would be a good name that hints "n times add a single same string"? I dunno. I also would have expected that the order of parameters are repeat-count followed by what gets repeated. Having said all of the above, we already have "addchars" that is equally strange, so let's let it pass ;-). > diff --git a/strbuf.c b/strbuf.c > index 7827178d8e..eb2b3299ce 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -302,6 +302,17 @@ void strbuf_add(struct strbuf *sb, const void *data, size_t len) > strbuf_setlen(sb, sb->len + len); > } > > +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n) > +{ > + size_t len = strlen(s); Let's have a blank line here to separate decls from the first statement. > + if (unsigned_mult_overflows(len, n)) > + die("you want to use way too much memory"); > + strbuf_grow(sb, len * n); The error message given by strbuf_grow(sb, st_mult(len, n)); would be equally informative and takes only a single line. > + for (size_t i = 0; i < n; i++) > + memcpy(sb->buf + sb->len + len * i, s, len); Wouldn't it be sufficient to run strbuf_add() n times at this point, as we have already called strbuf_grow() to avoid repeated reallocation? Repeated manual memcpy() that involves manual offset computation makes me nervous. > + strbuf_setlen(sb, sb->len + len * n); > +} > + > void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2) > { > strbuf_grow(sb, sb2->len); > diff --git a/strbuf.h b/strbuf.h > index e959caca87..0fb1b5e81e 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -310,6 +310,11 @@ static inline void strbuf_addstr(struct strbuf *sb, const char *s) > strbuf_add(sb, s, strlen(s)); > } > > +/** > + * Add a NUL-terminated string the specified number of times to the buffer. > + */ > +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n); > + > /** > * Copy the contents of another buffer at the end of the current one. > */ > -- > 2.43.0.windows.1
On Mon, Feb 26, 2024 at 5:39 PM Junio C Hamano <gitster@pobox.com> wrote: > > Achu Luma <ach.lumap@gmail.com> writes: > > > In a following commit we are going to port code from > > "t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to > > a new "t/unit-tests/t-hash.c" file using the recently added unit test > > framework. > > > > To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" > > we are going to need a new strbuf_addstrings() function that repeatedly > > adds the same string a number of times to a buffer. > > We do not need to call such a function "addstrings", though. The > name on the subject line made me expect a varargs function: > > (bad) strbuf_addstrings(&sb, "foo", "bar", "baz", NULL); > > It would have been clearer if the name hinted what it does, clearer > than just a single "s" that says it is talking about plural. What > would be a good name that hints "n times add a single same string"? > I dunno. > > I also would have expected that the order of parameters are > repeat-count followed by what gets repeated. > > Having said all of the above, we already have "addchars" that is > equally strange, so let's let it pass ;-). Yeah, we thought about naming it strbuf_repeatstr() first, but then seeing addchars() I thought it would be better to imitate it. > > diff --git a/strbuf.c b/strbuf.c > > index 7827178d8e..eb2b3299ce 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -302,6 +302,17 @@ void strbuf_add(struct strbuf *sb, const void *data, size_t len) > > strbuf_setlen(sb, sb->len + len); > > } > > > > +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n) > > +{ > > + size_t len = strlen(s); > > Let's have a blank line here to separate decls from the first > statement. Yeah, right. > > + if (unsigned_mult_overflows(len, n)) > > + die("you want to use way too much memory"); > > + strbuf_grow(sb, len * n); > > The error message given by > > strbuf_grow(sb, st_mult(len, n)); > > would be equally informative and takes only a single line. It seems that the pattern in strbuf.c, for example in strbuf_splice() and strbuf_vinsertf(), is to do a check first using an *_overflows() function, die if the check fails, and only then call strbuf_grow(). So we did the same. I am fine with using your suggestion though. > > + for (size_t i = 0; i < n; i++) > > + memcpy(sb->buf + sb->len + len * i, s, len); > > Wouldn't it be sufficient to run strbuf_add() n times at this point, > as we have already called strbuf_grow() to avoid repeated > reallocation? Unfortunately strbuf_add() calls strbuf_grow() itself which is not needed as we have already called strbuf_grow() to avoid repeated reallocation. > Repeated manual memcpy() that involves manual offset > computation makes me nervous. I would have prefered to avoid it too, but didn't find a good way to do it. > > + strbuf_setlen(sb, sb->len + len * n); > > +}
Christian Couder <christian.couder@gmail.com> writes: > Unfortunately strbuf_add() calls strbuf_grow() itself which is not > needed as we have already called strbuf_grow() to avoid repeated > reallocation. Why is it unfortunate? If the current allocation is sufficient, it is a cheap no-op, isn't it (if not, we should make it so)?
On Mon, Feb 26, 2024 at 7:10 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > Unfortunately strbuf_add() calls strbuf_grow() itself which is not > > needed as we have already called strbuf_grow() to avoid repeated > > reallocation. > > Why is it unfortunate? If the current allocation is sufficient, it > is a cheap no-op, isn't it (if not, we should make it so)? Yeah, I agree that we don't need to be extra efficient and calling strbuf_grow() doesn't add much overhead. So let's just use strbuf_add() repeatedly after calling it once.
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index fda50f2af1..bed08af410 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -257,11 +257,9 @@ static void module_list_active(struct module_list *list) static char *get_up_path(const char *path) { - int i; struct strbuf sb = STRBUF_INIT; - for (i = count_slashes(path); i; i--) - strbuf_addstr(&sb, "../"); + strbuf_addstrings(&sb, "../", count_slashes(path)); /* * Check if 'path' ends with slash or not diff --git a/json-writer.c b/json-writer.c index 005c820aa4..25b9201f9c 100644 --- a/json-writer.c +++ b/json-writer.c @@ -46,10 +46,7 @@ static void append_quoted_string(struct strbuf *out, const char *in) static void indent_pretty(struct json_writer *jw) { - int k; - - for (k = 0; k < jw->open_stack.len; k++) - strbuf_addstr(&jw->json, " "); + strbuf_addstrings(&jw->json, " ", jw->open_stack.len); } /* diff --git a/strbuf.c b/strbuf.c index 7827178d8e..eb2b3299ce 100644 --- a/strbuf.c +++ b/strbuf.c @@ -302,6 +302,17 @@ void strbuf_add(struct strbuf *sb, const void *data, size_t len) strbuf_setlen(sb, sb->len + len); } +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n) +{ + size_t len = strlen(s); + if (unsigned_mult_overflows(len, n)) + die("you want to use way too much memory"); + strbuf_grow(sb, len * n); + for (size_t i = 0; i < n; i++) + memcpy(sb->buf + sb->len + len * i, s, len); + strbuf_setlen(sb, sb->len + len * n); +} + void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2) { strbuf_grow(sb, sb2->len); diff --git a/strbuf.h b/strbuf.h index e959caca87..0fb1b5e81e 100644 --- a/strbuf.h +++ b/strbuf.h @@ -310,6 +310,11 @@ static inline void strbuf_addstr(struct strbuf *sb, const char *s) strbuf_add(sb, s, strlen(s)); } +/** + * Add a NUL-terminated string the specified number of times to the buffer. + */ +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n); + /** * Copy the contents of another buffer at the end of the current one. */
In a following commit we are going to port code from "t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to a new "t/unit-tests/t-hash.c" file using the recently added unit test framework. To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" we are going to need a new strbuf_addstrings() function that repeatedly adds the same string a number of times to a buffer. Such a strbuf_addstrings() function would already be useful in "json-writer.c" and "builtin/submodule-helper.c" as both of these files already have code that repeatedly adds the same string. So let's introduce such a strbuf_addstrings() function in "strbuf.{c,h}" and use it in both "json-writer.c" and "builtin/submodule-helper.c". We use the "strbuf_addstrings" name as this way strbuf_addstr() and strbuf_addstrings() would be similar for strings as strbuf_addch() and strbuf_addchars() for characters. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Achu Luma <ach.lumap@gmail.com> --- builtin/submodule--helper.c | 4 +--- json-writer.c | 5 +---- strbuf.c | 11 +++++++++++ strbuf.h | 5 +++++ 4 files changed, 18 insertions(+), 7 deletions(-) -- 2.43.0.windows.1