diff mbox series

[1/4] strbuf: fix doc for `strbuf_attach()`

Message ID 54e10bb06a39d23591ea4d02665083d705682468.1587297254.git.martin.agren@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/4] strbuf: fix doc for `strbuf_attach()` | expand

Commit Message

Martin Ågren April 19, 2020, 12:32 p.m. UTC
Commit 917c9a7133 ("New strbuf APIs: splice and attach.", 2007-09-15)
added `strbuf_attach()`. In the commit message, it is explicitly
discussed how using `mem == len` is ok, although possibly costly. When
this function was documented in dd613e6b87 ("Strbuf documentation:
document most functions", 2008-06-04), this distinction was missed and
the documentation simply forbids this case.

The function handles reallocation and truncation, yet the docs say that
the "amount [of allocated memory] must be larger than the string length,
because the string you pass is supposed to be a NUL-terminated string".

Several callers pass in `mem == len` meaning they are compatible with
the implementation and the original commit message, but not with the
documented behavior. In fact, compared to the documentation, they look
outright dangerous.

Later commits will convert most of those call sites to use new, simpler
interfaces, but in at least one instance we really do want to use this
possibility of passing the same value twice: In rerere.c, it is not safe
to pass in "len, len + 1". Doing so makes `strbuf_attach()` write a
trailing NUL at `result.ptr[len]` and running our test suite with
AddressSanitizer will spot an out-of-bounds write due to this.

That is, rerere.c really has use for the flexibility that `mem == len`
gives us. Let's update the documentation to clarify that this is ok.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Junio C Hamano April 20, 2020, 5:30 p.m. UTC | #1
Martin Ågren <martin.agren@gmail.com> writes:

> Commit 917c9a7133 ("New strbuf APIs: splice and attach.", 2007-09-15)
> added `strbuf_attach()`. In the commit message, it is explicitly
> discussed how using `mem == len` is ok, although possibly costly. When
> this function was documented in dd613e6b87 ("Strbuf documentation:
> document most functions", 2008-06-04), this distinction was missed and
> the documentation simply forbids this case.

In other words, mem==len implies that the original lacks the
terminating '\0', so attach needs to reallocate from the get go, so
discouraging such use may make sense, but it is too strong to forbid
it, as the strbuf invariant is held safely?  If so, the description
makes sense to me.

> The function handles reallocation and truncation, yet the docs say that
> the "amount [of allocated memory] must be larger than the string length,
> because the string you pass is supposed to be a NUL-terminated string".

IOW, _attach() does not mind if the original lacks '\0' at the end.

> diff --git a/strbuf.h b/strbuf.h
> index ce8e49c0b2..2a462f70cc 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz);
>  /**
>   * Attach a string to a buffer. You should specify the string to attach,
>   * the current length of the string and the amount of allocated memory.
> + * The amount must be at least as large as the string length. If the two
> + * lengths are equal, reallocation will be handled as appropriate and in
> + * any case, the string will be NUL-truncated as implied by `len`.

NUL-truncated?  Ah, if mem and len are the same, the string is reallocated
to fit an extra byte to NUL-terminate, to make sure strlen(sb->buf)==len
holds.  Makes sense.

> + *
> + * This string _must_ be malloc()ed, and after attaching, the pointer
> + * cannot be relied upon anymore, and neither be free()d directly.

That's a good thing to explain.

>   */
>  void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);
Martin Ågren April 21, 2020, 6:44 p.m. UTC | #2
On Mon, 20 Apr 2020 at 19:30, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > The function handles reallocation and truncation, yet the docs say that
> > the "amount [of allocated memory] must be larger than the string length,
> > because the string you pass is supposed to be a NUL-terminated string".
>
> IOW, _attach() does not mind if the original lacks '\0' at the end.

Right, nor if it lacks the space for it.

> > diff --git a/strbuf.h b/strbuf.h
> > index ce8e49c0b2..2a462f70cc 100644
> > --- a/strbuf.h
> > +++ b/strbuf.h
> > @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz);
> >  /**
> >   * Attach a string to a buffer. You should specify the string to attach,
> >   * the current length of the string and the amount of allocated memory.
> > + * The amount must be at least as large as the string length. If the two
> > + * lengths are equal, reallocation will be handled as appropriate and in
> > + * any case, the string will be NUL-truncated as implied by `len`.
>
> NUL-truncated?  Ah, if mem and len are the same, the string is reallocated
> to fit an extra byte to NUL-terminate, to make sure strlen(sb->buf)==len
> holds.  Makes sense.

Exactly. NUL-terminated would be better. I think I'll split that last
sentence and replace it with something like the following:

  If the two lengths are equal, reallocation will be handled as
  needed. And regardless, the string will be NUL-terminated at `len`.

(One might still have strlen(sb->buf) < len though. This just guarantees
"<=")


Martin
diff mbox series

Patch

diff --git a/strbuf.h b/strbuf.h
index ce8e49c0b2..2a462f70cc 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -112,10 +112,12 @@  char *strbuf_detach(struct strbuf *sb, size_t *sz);
 /**
  * Attach a string to a buffer. You should specify the string to attach,
  * the current length of the string and the amount of allocated memory.
- * The amount must be larger than the string length, because the string you
- * pass is supposed to be a NUL-terminated string.  This string _must_ be
- * malloc()ed, and after attaching, the pointer cannot be relied upon
- * anymore, and neither be free()d directly.
+ * The amount must be at least as large as the string length. If the two
+ * lengths are equal, reallocation will be handled as appropriate and in
+ * any case, the string will be NUL-truncated as implied by `len`.
+ *
+ * This string _must_ be malloc()ed, and after attaching, the pointer
+ * cannot be relied upon anymore, and neither be free()d directly.
  */
 void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);