diff mbox series

[Outreachy,1/2] strbuf: introduce strbuf_addstrings() to repeatedly add a string

Message ID 20240226143350.3596-1-ach.lumap@gmail.com (mailing list archive)
State Superseded
Headers show
Series [Outreachy,1/2] strbuf: introduce strbuf_addstrings() to repeatedly add a string | expand

Commit Message

Achu Luma Feb. 26, 2024, 2:33 p.m. UTC
In a following commit we are going to port code from
"t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to
a new "t/unit-tests/t-hash.c" file using the recently added unit test
framework.

To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;"
we are going to need a new strbuf_addstrings() function that repeatedly
adds the same string a number of times to a buffer.

Such a strbuf_addstrings() function would already be useful in
"json-writer.c" and "builtin/submodule-helper.c" as both of these files
already have code that repeatedly adds the same string. So let's
introduce such a strbuf_addstrings() function in "strbuf.{c,h}" and use
it in both "json-writer.c" and "builtin/submodule-helper.c".

We use the "strbuf_addstrings" name as this way strbuf_addstr() and
strbuf_addstrings() would be similar for strings as strbuf_addch() and
strbuf_addchars() for characters.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
---
 builtin/submodule--helper.c |  4 +---
 json-writer.c               |  5 +----
 strbuf.c                    | 11 +++++++++++
 strbuf.h                    |  5 +++++
 4 files changed, 18 insertions(+), 7 deletions(-)

--
2.43.0.windows.1

Comments

Junio C Hamano Feb. 26, 2024, 4:39 p.m. UTC | #1
Achu Luma <ach.lumap@gmail.com> writes:

> In a following commit we are going to port code from
> "t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to
> a new "t/unit-tests/t-hash.c" file using the recently added unit test
> framework.
>
> To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;"
> we are going to need a new strbuf_addstrings() function that repeatedly
> adds the same string a number of times to a buffer.

We do not need to call such a function "addstrings", though.  The
name on the subject line made me expect a varargs function:

 (bad)	strbuf_addstrings(&sb, "foo", "bar", "baz", NULL);

It would have been clearer if the name hinted what it does, clearer
than just a single "s" that says it is talking about plural.  What
would be a good name that hints "n times add a single same string"?
I dunno.

I also would have expected that the order of parameters are
repeat-count followed by what gets repeated.

Having said all of the above, we already have "addchars" that is
equally strange, so let's let it pass ;-).

> diff --git a/strbuf.c b/strbuf.c
> index 7827178d8e..eb2b3299ce 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -302,6 +302,17 @@ void strbuf_add(struct strbuf *sb, const void *data, size_t len)
>  	strbuf_setlen(sb, sb->len + len);
>  }
>
> +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n)
> +{
> +	size_t len = strlen(s);

Let's have a blank line here to separate decls from the first
statement.

> +	if (unsigned_mult_overflows(len, n))
> +		die("you want to use way too much memory");
> +	strbuf_grow(sb, len * n);

The error message given by

	strbuf_grow(sb, st_mult(len, n));

would be equally informative and takes only a single line.

> +	for (size_t i = 0; i < n; i++)
> +		memcpy(sb->buf + sb->len + len * i, s, len);

Wouldn't it be sufficient to run strbuf_add() n times at this point,
as we have already called strbuf_grow() to avoid repeated
reallocation?  Repeated manual memcpy() that involves manual offset
computation makes me nervous.

> +	strbuf_setlen(sb, sb->len + len * n);
> +}
> +
>  void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2)
>  {
>  	strbuf_grow(sb, sb2->len);
> diff --git a/strbuf.h b/strbuf.h
> index e959caca87..0fb1b5e81e 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -310,6 +310,11 @@ static inline void strbuf_addstr(struct strbuf *sb, const char *s)
>  	strbuf_add(sb, s, strlen(s));
>  }
>
> +/**
> + * Add a NUL-terminated string the specified number of times to the buffer.
> + */
> +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n);
> +
>  /**
>   * Copy the contents of another buffer at the end of the current one.
>   */
> --
> 2.43.0.windows.1
Christian Couder Feb. 26, 2024, 5:15 p.m. UTC | #2
On Mon, Feb 26, 2024 at 5:39 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Achu Luma <ach.lumap@gmail.com> writes:
>
> > In a following commit we are going to port code from
> > "t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to
> > a new "t/unit-tests/t-hash.c" file using the recently added unit test
> > framework.
> >
> > To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;"
> > we are going to need a new strbuf_addstrings() function that repeatedly
> > adds the same string a number of times to a buffer.
>
> We do not need to call such a function "addstrings", though.  The
> name on the subject line made me expect a varargs function:
>
>  (bad)  strbuf_addstrings(&sb, "foo", "bar", "baz", NULL);
>
> It would have been clearer if the name hinted what it does, clearer
> than just a single "s" that says it is talking about plural.  What
> would be a good name that hints "n times add a single same string"?
> I dunno.
>
> I also would have expected that the order of parameters are
> repeat-count followed by what gets repeated.
>
> Having said all of the above, we already have "addchars" that is
> equally strange, so let's let it pass ;-).

Yeah, we thought about naming it strbuf_repeatstr() first, but then
seeing addchars() I thought it would be better to imitate it.

> > diff --git a/strbuf.c b/strbuf.c
> > index 7827178d8e..eb2b3299ce 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -302,6 +302,17 @@ void strbuf_add(struct strbuf *sb, const void *data, size_t len)
> >       strbuf_setlen(sb, sb->len + len);
> >  }
> >
> > +void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n)
> > +{
> > +     size_t len = strlen(s);
>
> Let's have a blank line here to separate decls from the first
> statement.

Yeah, right.

> > +     if (unsigned_mult_overflows(len, n))
> > +             die("you want to use way too much memory");
> > +     strbuf_grow(sb, len * n);
>
> The error message given by
>
>         strbuf_grow(sb, st_mult(len, n));
>
> would be equally informative and takes only a single line.

It seems that the pattern in strbuf.c, for example in strbuf_splice()
and strbuf_vinsertf(), is to do a check first using an *_overflows()
function, die if the check fails, and only then call strbuf_grow(). So
we did the same. I am fine with using your suggestion though.

> > +     for (size_t i = 0; i < n; i++)
> > +             memcpy(sb->buf + sb->len + len * i, s, len);
>
> Wouldn't it be sufficient to run strbuf_add() n times at this point,
> as we have already called strbuf_grow() to avoid repeated
> reallocation?

Unfortunately strbuf_add() calls strbuf_grow() itself which is not
needed as we have already called strbuf_grow() to avoid repeated
reallocation.

>  Repeated manual memcpy() that involves manual offset
> computation makes me nervous.

I would have prefered to avoid it too, but didn't find a good way to do it.

> > +     strbuf_setlen(sb, sb->len + len * n);
> > +}
Junio C Hamano Feb. 26, 2024, 6:10 p.m. UTC | #3
Christian Couder <christian.couder@gmail.com> writes:

> Unfortunately strbuf_add() calls strbuf_grow() itself which is not
> needed as we have already called strbuf_grow() to avoid repeated
> reallocation.

Why is it unfortunate?  If the current allocation is sufficient, it
is a cheap no-op, isn't it (if not, we should make it so)?
Christian Couder Feb. 27, 2024, 10:07 a.m. UTC | #4
On Mon, Feb 26, 2024 at 7:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > Unfortunately strbuf_add() calls strbuf_grow() itself which is not
> > needed as we have already called strbuf_grow() to avoid repeated
> > reallocation.
>
> Why is it unfortunate?  If the current allocation is sufficient, it
> is a cheap no-op, isn't it (if not, we should make it so)?

Yeah, I agree that we don't need to be extra efficient and calling
strbuf_grow() doesn't add much overhead. So let's just use
strbuf_add() repeatedly after calling it once.
diff mbox series

Patch

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fda50f2af1..bed08af410 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -257,11 +257,9 @@  static void module_list_active(struct module_list *list)

 static char *get_up_path(const char *path)
 {
-	int i;
 	struct strbuf sb = STRBUF_INIT;

-	for (i = count_slashes(path); i; i--)
-		strbuf_addstr(&sb, "../");
+	strbuf_addstrings(&sb, "../", count_slashes(path));

 	/*
 	 * Check if 'path' ends with slash or not
diff --git a/json-writer.c b/json-writer.c
index 005c820aa4..25b9201f9c 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -46,10 +46,7 @@  static void append_quoted_string(struct strbuf *out, const char *in)

 static void indent_pretty(struct json_writer *jw)
 {
-	int k;
-
-	for (k = 0; k < jw->open_stack.len; k++)
-		strbuf_addstr(&jw->json, "  ");
+	strbuf_addstrings(&jw->json, "  ", jw->open_stack.len);
 }

 /*
diff --git a/strbuf.c b/strbuf.c
index 7827178d8e..eb2b3299ce 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -302,6 +302,17 @@  void strbuf_add(struct strbuf *sb, const void *data, size_t len)
 	strbuf_setlen(sb, sb->len + len);
 }

+void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n)
+{
+	size_t len = strlen(s);
+	if (unsigned_mult_overflows(len, n))
+		die("you want to use way too much memory");
+	strbuf_grow(sb, len * n);
+	for (size_t i = 0; i < n; i++)
+		memcpy(sb->buf + sb->len + len * i, s, len);
+	strbuf_setlen(sb, sb->len + len * n);
+}
+
 void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2)
 {
 	strbuf_grow(sb, sb2->len);
diff --git a/strbuf.h b/strbuf.h
index e959caca87..0fb1b5e81e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -310,6 +310,11 @@  static inline void strbuf_addstr(struct strbuf *sb, const char *s)
 	strbuf_add(sb, s, strlen(s));
 }

+/**
+ * Add a NUL-terminated string the specified number of times to the buffer.
+ */
+void strbuf_addstrings(struct strbuf *sb, const char *s, size_t n);
+
 /**
  * Copy the contents of another buffer at the end of the current one.
  */