diff mbox series

[3/4] strbuf: introduce `strbuf_attachstr()`

Message ID c3012f1da361af354a904f821b83d61f2534ccb2.1587297254.git.martin.agren@gmail.com (mailing list archive)
State New, archived
Headers show
Series strbuf: fix doc for `strbuf_attach()` and avoid it | expand

Commit Message

Martin Ågren April 19, 2020, 12:32 p.m. UTC
Similar to the previous commit, introduce `strbuf_attachstr()` where we
don't even have to pass in the length of the string that we want to
attach. Convert existing callers of `strbuf_attachstr()` that use
`strlen()`.

Note how only one caller passes in `mem == len + 1` and that the others
have been using `strbuf_attach()` in direct contradiction to how it was
(incorrectly) documented up until a few commits ago.

Now that the documentation has been fixed, you might say these are all
fine. But the calling convention of `strbuf_attach()` seems sufficiently
hard to get right that it's probably a good idea to introduce this
helper.

This could help reduce reallocations and memory waste. When we
pessimistically pass in `strlen(foo)` for `mem`, the strbuf will have
`alloc == len` and will do a reallocation, not just to get one more byte
for the NUL (which would have been a no-op), but because we're using
`ALLOC_GROW` under the hood, we will ask for 16 more bytes and another
50% on top of that.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.h             | 11 +++++++++++
 path.c               |  3 +--
 pretty.c             |  2 +-
 refs/files-backend.c |  3 +--
 trailer.c            |  2 +-
 5 files changed, 15 insertions(+), 6 deletions(-)

Comments

Junio C Hamano April 20, 2020, 7:39 p.m. UTC | #1
Martin Ågren <martin.agren@gmail.com> writes:

> +/**
> + * Attach a string to a buffer similar to `strbuf_attachstr_len()`.
> + * Useful if you do not know the length of the string.
> + */
> +static inline void strbuf_attachstr(struct strbuf *sb, char *str)
> +{
> +	size_t len = strlen(str);
> +
> +	strbuf_attach(sb, str, len, len + 1);
> +}

This is somewhat worrysome in that the interface is _so_ simple that
people may fail to see that str must be allocated piece of memory,
and it is preferrable if string fully fills the allocation.

We should repeat that (instead of just trusting "similar to ..."
would tell them enough) in the doc, perhaps?

> diff --git a/trailer.c b/trailer.c
> index 0c414f2fed..56c4027943 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -1095,7 +1095,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
>  	for (ptr = trailer_lines; *ptr; ptr++) {
>  		if (last && isspace((*ptr)->buf[0])) {
>  			struct strbuf sb = STRBUF_INIT;
> -			strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
> +			strbuf_attachstr(&sb, *last);
>  			strbuf_addbuf(&sb, *ptr);
>  			*last = strbuf_detach(&sb, NULL);
>  			continue;

This is not wrong per-se, but it is unclear if use of strbuf_attach*
family to avoid an explicit malloc/copy/free is buying much at this
callsite.  Simplifying the code here of course is not within the
scope of this series.
Martin Ågren April 21, 2020, 6:47 p.m. UTC | #2
On Mon, 20 Apr 2020 at 21:39, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > +/**
> > + * Attach a string to a buffer similar to `strbuf_attachstr_len()`.
> > + * Useful if you do not know the length of the string.
> > + */
> > +static inline void strbuf_attachstr(struct strbuf *sb, char *str)
> > +{
> > +     size_t len = strlen(str);
> > +
> > +     strbuf_attach(sb, str, len, len + 1);
> > +}
>
> This is somewhat worrysome in that the interface is _so_ simple that
> people may fail to see that str must be allocated piece of memory,
> and it is preferrable if string fully fills the allocation.
>
> We should repeat that (instead of just trusting "similar to ..."
> would tell them enough) in the doc, perhaps?

Yeah, that's a good point. I'll expand on this to try to better get
through that there are things to consider here.

> > @@ -1095,7 +1095,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
> >       for (ptr = trailer_lines; *ptr; ptr++) {
> >               if (last && isspace((*ptr)->buf[0])) {
> >                       struct strbuf sb = STRBUF_INIT;
> > -                     strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
> > +                     strbuf_attachstr(&sb, *last);
> >                       strbuf_addbuf(&sb, *ptr);
> >                       *last = strbuf_detach(&sb, NULL);
> >                       continue;
>
> This is not wrong per-se, but it is unclear if use of strbuf_attach*
> family to avoid an explicit malloc/copy/free is buying much at this
> callsite.  Simplifying the code here of course is not within the
> scope of this series.

For the other patches in this series, I spent some time and effort
investigating where strings came from, "do I really feel certain that
they're NUL-terminated?". But for this patch, I more or less went "we've
been using strlen on this all this time, surely if it wasn't guaranteed
to be NUL-terminated we'd have messed up already". And I don't think I'm
making anything worse. But yeah, I didn't really step back to look at
what these sites are really doing, and how, as much as I did for the
others.


Martin
diff mbox series

Patch

diff --git a/strbuf.h b/strbuf.h
index 7d0aeda434..32cc15de0c 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -140,6 +140,17 @@  static inline void strbuf_attachstr_len(struct strbuf *sb,
 	strbuf_attach(sb, str, len, len + 1);
 }
 
+/**
+ * Attach a string to a buffer similar to `strbuf_attachstr_len()`.
+ * Useful if you do not know the length of the string.
+ */
+static inline void strbuf_attachstr(struct strbuf *sb, char *str)
+{
+	size_t len = strlen(str);
+
+	strbuf_attach(sb, str, len, len + 1);
+}
+
 /**
  * Swap the contents of two string buffers.
  */
diff --git a/path.c b/path.c
index 9bd717c307..3cd8fd56b4 100644
--- a/path.c
+++ b/path.c
@@ -815,8 +815,7 @@  const char *enter_repo(const char *path, int strict)
 			char *newpath = expand_user_path(used_path.buf, 0);
 			if (!newpath)
 				return NULL;
-			strbuf_attach(&used_path, newpath, strlen(newpath),
-				      strlen(newpath));
+			strbuf_attachstr(&used_path, newpath);
 		}
 		for (i = 0; suffix[i]; i++) {
 			struct stat st;
diff --git a/pretty.c b/pretty.c
index e171830389..5ecdf0cbb2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -590,7 +590,7 @@  static char *replace_encoding_header(char *buf, const char *encoding)
 		return buf; /* should not happen but be defensive */
 	len = cp + 1 - (buf + start);
 
-	strbuf_attach(&tmp, buf, strlen(buf), strlen(buf) + 1);
+	strbuf_attachstr(&tmp, buf);
 	if (is_encoding_utf8(encoding)) {
 		/* we have re-coded to UTF-8; drop the header */
 		strbuf_remove(&tmp, start, len);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a..eb058d85b6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1511,10 +1511,9 @@  static int commit_ref(struct ref_lock *lock)
 		 * the lockfile to. Hopefully it is empty; try to
 		 * delete it.
 		 */
-		size_t len = strlen(path);
 		struct strbuf sb_path = STRBUF_INIT;
 
-		strbuf_attach(&sb_path, path, len, len);
+		strbuf_attachstr(&sb_path, path);
 
 		/*
 		 * If this fails, commit_lock_file() will also fail
diff --git a/trailer.c b/trailer.c
index 0c414f2fed..56c4027943 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1095,7 +1095,7 @@  void trailer_info_get(struct trailer_info *info, const char *str,
 	for (ptr = trailer_lines; *ptr; ptr++) {
 		if (last && isspace((*ptr)->buf[0])) {
 			struct strbuf sb = STRBUF_INIT;
-			strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
+			strbuf_attachstr(&sb, *last);
 			strbuf_addbuf(&sb, *ptr);
 			*last = strbuf_detach(&sb, NULL);
 			continue;