diff mbox series

[v3,2/6] string-list: introduce `string_list_setlen()`

Message ID ae8d0ce1f25f26da09f2e3f5bc68f85cc162ce64.1682374789.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit 492ba81346cc45322d0c26bc927b01a34becf304
Headers show
Series banned: mark `strok()`, `strtok_r()` as banned | expand

Commit Message

Taylor Blau April 24, 2023, 10:20 p.m. UTC
It is sometimes useful to reduce the size of a `string_list`'s list of
items without having to re-allocate them. For example, doing the
following:

    struct strbuf buf = STRBUF_INIT;
    struct string_list parts = STRING_LIST_INIT_NO_DUP;
    while (strbuf_getline(&buf, stdin) != EOF) {
      parts.nr = 0;
      string_list_split_in_place(&parts, buf.buf, ":", -1);
      /* ... */
    }
    string_list_clear(&parts, 0);

is preferable over calling `string_list_clear()` on every iteration of
the loop. This is because `string_list_clear()` causes us free our
existing `items` array. This means that every time we call
`string_list_split_in_place()`, the string-list internals re-allocate
the same size array.

Since in the above example we do not care about the individual parts
after processing each line, it is much more efficient to pretend that
there aren't any elements in the `string_list` by setting `list->nr` to
0 while leaving the list of elements allocated as-is.

This allows `string_list_split_in_place()` to overwrite any existing
entries without needing to free and re-allocate them.

However, setting `list->nr` manually is not safe in all instances. There
are a couple of cases worth worrying about:

  - If the `string_list` is initialized with `strdup_strings`,
    truncating the list can lead to overwriting strings which are
    allocated elsewhere. If there aren't any other pointers to those
    strings other than the ones inside of the `items` array, they will
    become unreachable and leak.

    (We could ourselves free the truncated items between
    string_list->items[nr] and `list->nr`, but no present or future
    callers would benefit from this additional complexity).

  - If the given `nr` is larger than the current value of `list->nr`,
    we'll trick the `string_list` into a state where it thinks there are
    more items allocated than there actually are, which can lead to
    undefined behavior if we try to read or write those entries.

Guard against both of these by introducing a helper function which
guards assignment of `list->nr` against each of the above conditions.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 string-list.c |  9 +++++++++
 string-list.h | 10 ++++++++++
 2 files changed, 19 insertions(+)

Comments

Jeff King April 25, 2023, 6:21 a.m. UTC | #1
On Mon, Apr 24, 2023 at 06:20:14PM -0400, Taylor Blau wrote:

> However, setting `list->nr` manually is not safe in all instances. There
> are a couple of cases worth worrying about:
> 
>   - If the `string_list` is initialized with `strdup_strings`,
>     truncating the list can lead to overwriting strings which are
>     allocated elsewhere. If there aren't any other pointers to those
>     strings other than the ones inside of the `items` array, they will
>     become unreachable and leak.
> 
>     (We could ourselves free the truncated items between
>     string_list->items[nr] and `list->nr`, but no present or future
>     callers would benefit from this additional complexity).

I wondered how bad it would be to just free those truncated entries when
strdup_strings is set. But that led me to another interesting point: the
util fields. The regular string_list_clear() will optionally free the
util entries, too. We'd potentially need to deal with those, too.

We don't do anything with them here. So code like:

  struct string_list foo = STRING_LIST_INIT_NODUP;

  string_list_append(&foo, "bar")->util = xstrdup("something else");
  string_list_setlen(&foo, 0);

would leak that util field. To be clear, to me this definitely falls
under "if it hurts, don't do it", and I think code like above is pretty
unlikely. But since the point of our function is to prevent mistakes, I
thought it was worth mentioning.

I think we _could_ do something like:

  for (i = nr; i < list->nr; i++) {
	if (list->items[i].util)
		BUG("truncated string list item has non-NULL util field");
  }

though that is technically tighter than we need to be (it could be an
unowned util field, after all; we don't know what it means here). So I'm
inclined to leave your patch as-is.

This would all be easier if the string_list had a field for "we own the
util fields, too" just like it has strdup_strings. Or even a free-ing
function. But instead we have ad-hoc solutions like "free_util" and
string_list_clear_func(). But that's really outside the scope of your
series. </rant> :)

-Peff
Taylor Blau April 25, 2023, 9 p.m. UTC | #2
On Tue, Apr 25, 2023 at 02:21:07AM -0400, Jeff King wrote:
> I think we _could_ do something like:
>
>   for (i = nr; i < list->nr; i++) {
> 	if (list->items[i].util)
> 		BUG("truncated string list item has non-NULL util field");
>   }
>
> though that is technically tighter than we need to be (it could be an
> unowned util field, after all; we don't know what it means here). So I'm
> inclined to leave your patch as-is.

I think there are two ways to do it, either:

  - something like what you wrote above, perhaps with an additional
    `free_util` bit on the string_list itself (which might make it
    convenient to drop all of the `free_util` parameters that permeate
    its API)

  - have string_list_setlen() take a `free_util` argument itself, in
    which case your code would change to:

    if (free_util) {
      for (i = nr; i < list->nr; i++)
        free(list->items[i].util)
    }

> This would all be easier if the string_list had a field for "we own the
> util fields, too" just like it has strdup_strings. Or even a free-ing
> function. But instead we have ad-hoc solutions like "free_util" and
> string_list_clear_func(). But that's really outside the scope of your
> series. </rant> :)

;-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/string-list.c b/string-list.c
index 5f5b60fe1c..0f8ac117fd 100644
--- a/string-list.c
+++ b/string-list.c
@@ -203,6 +203,15 @@  void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 	list->nr = list->alloc = 0;
 }
 
+void string_list_setlen(struct string_list *list, size_t nr)
+{
+	if (list->strdup_strings)
+		BUG("cannot setlen a string_list which owns its entries");
+	if (nr > list->nr)
+		BUG("cannot grow a string_list with setlen");
+	list->nr = nr;
+}
+
 struct string_list_item *string_list_append_nodup(struct string_list *list,
 						  char *string)
 {
diff --git a/string-list.h b/string-list.h
index 77854840f7..122b318641 100644
--- a/string-list.h
+++ b/string-list.h
@@ -134,6 +134,16 @@  typedef void (*string_list_clear_func_t)(void *p, const char *str);
 /** Call a custom clear function on each util pointer */
 void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
 
+/*
+ * Set the length of a string_list to `nr`, provided that (a) `list`
+ * does not own its own storage, and (b) that `nr` is no larger than
+ * `list->nr`.
+ *
+ * Useful when "shrinking" `list` to write over existing entries that
+ * are no longer used without reallocating.
+ */
+void string_list_setlen(struct string_list *list, size_t nr);
+
 /**
  * Apply `func` to each item. If `func` returns nonzero, the
  * iteration aborts and the return value is propagated.