[GSoC,RFC,1/2] STRBUF_INIT_CONST: a new way to initialize strbuf

Message ID 20200218041805.10939-2-robear.selwans@outlook.com (mailing list archive)
State New, archived

Commit Message

Robear Selwans Feb. 18, 2020, 4:18 a.m. UTC
A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
quick initialization of strbuf.

Details:
Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
`0`, respectively.

Use Case:
This is meant to be used to initialize strbufs with constant values and
thus, only allocating memory when needed.

Usage Example:
```
strbuf env_var = STRBUF_INIT_CONST("dev");
```

This was added according to the issue opened at [https://github.com/gitgitgadget/git/issues/398]

Signed-off-by: Robear Selwans <robear.selwans@outlook.com>
---
 strbuf.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Jeff King Feb. 18, 2020, 6:21 a.m. UTC | #1
On Tue, Feb 18, 2020 at 04:18:04AM +0000, Robear Selwans wrote:

> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
> quick initialization of strbuf.
> 
> Details:
> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
> `0`, respectively.
> 
> Use Case:
> This is meant to be used to initialize strbufs with constant values and
> thus, only allocating memory when needed.
> 
> Usage Example:
> ```
> strbuf env_var = STRBUF_INIT_CONST("dev");
> ```

This seems a bit dangerous to me, as we're initializing a non-const
pointer with a string literal. In fact, I'm a little surprised that the
compiler doesn't complain, but I think it's mostly due to historical
C-isms (the type of string literals is array-of-char). Using gcc's
-Wwrite-strings does complain, but there are several other cases already
in Git (looking at a few, I think there are some opportunities for
cleanup).

Your second patch catches cases where the strbuf functions want to write
to the buffer. But we've always been pretty open about the fact that
strbuf.buf is a writeable C-style string. So something like this:

  struct strbuf x = STRBUF_INIT_CONST("foo");
  size_t i;

  for (i = 0; i < x.len; i++)
	x.buf[i] = toupper(x.buf[i]);

would generate no compile-time warnings, but would invoke undefined
behavior (on my system it segfaults when run, but it could have even
more confusing outcomes). Even though this is called out specifically in
the strbuf docs:

   However, it is totally safe to modify anything in the string pointed
   by the `buf` member, between the indices `0` and `len-1` (inclusive).

Of course it would be easy to fix that by adding a strbuf_make_var()
call. But my concern is cases where the const-ness and the use of the
strbuf are far apart. The point of a strbuf is that you can just use it
without worrying, but now it's carrying this extra hidden state.

If we want to pursue this direction, I think we'd do better to give each
strbuf a matching array. Something like:

  #define STRBUF_INIT_FROM(arr) { .alloc = 0, .buf = (arr), .len = ARRAY_SIZE(arr)-1 }
  ...
  char foo_buf[] = "this is the constant value";
  struct strbuf foo = STRBUF_INIT_FROM(foo_buf);

That gives you a true writeable buffer with the const data in it. _And_
it opens up the option of strbufs using stack buffers with an empty
initial value for efficiency (i.e., avoiding the heap at all for short
common cases, but being able to grow when needed). One trouble is that
you can't do it all in a single variable. You'd need something like:

  #define DECLARE_STACK_STRBUF(name, contents) \
	char name##_buf[] = contents; \
	struct strbuf name = STRBUF_INIT_FROM(name##_buf)
  ...
  DECLARE_STACK_STRBUF(foo, "this is the constant value");

But that gets weirdly un-C-like (your macro expands to multiple
statements, which is usually a macro pitfall; but we can't use the usual
"do { } while(0)" trick here, because the variables would go out of
scope at the end of the fake block).
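
As a rough, self-contained sketch of the array-backed idea: the struct
below is a cut-down stand-in for Git's `struct strbuf`, and
`strbuf_upcase()` and `stack_strbuf_demo()` are made-up names used only
for illustration. The macro parameter is called `arr` here so that it
cannot collide with the `.buf` designator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for Git's struct strbuf: only the three fields discussed here. */
struct strbuf {
	size_t alloc;
	size_t len;
	char *buf;
};

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Only valid for a real char array; sizeof must see the array, not a pointer. */
#define STRBUF_INIT_FROM(arr) \
	{ .alloc = 0, .len = ARRAY_SIZE(arr) - 1, .buf = (arr) }

/* Expands to two declarations, so it only works where declarations are allowed. */
#define DECLARE_STACK_STRBUF(name, contents) \
	char name##_buf[] = contents; \
	struct strbuf name = STRBUF_INIT_FROM(name##_buf)

/* Uppercase in place; fine here because buf points at a writable array. */
static void strbuf_upcase(struct strbuf *sb)
{
	size_t i;

	for (i = 0; i < sb->len; i++)
		sb->buf[i] = (char)toupper((unsigned char)sb->buf[i]);
}

/* Hypothetical demo: returns 1 if the in-place write behaved as expected. */
static int stack_strbuf_demo(void)
{
	DECLARE_STACK_STRBUF(foo, "dev");

	strbuf_upcase(&foo);
	assert(foo.alloc == 0); /* still backed by the stack array */
	return foo.len == 3 && !strcmp(foo.buf, "DEV");
}
```

Anything that later grows such a strbuf would still need to notice
`alloc == 0` and copy the contents to the heap instead of calling
realloc() on the stack array; that part is not sketched here.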

So I think there are interesting directions here, but there's a lot of
stuff to figure out.

I notice you put GSoC in your subject line. If you're looking at this as
a microproject, IMHO this is _way_ more complicated and subtle than a
microproject should be. The goal there is to give something so easy that
you get to focus on getting your patches in and interacting with the
community. The scope I'd expect is more along the lines of compiling
with -Wwrite-strings and cleaning up some of the locations that
complain.

-Peff
Robear Selwans Feb. 18, 2020, 2:19 p.m. UTC | #2
On Tue, Feb 18, 2020 at 8:21 AM Jeff King <peff@peff.net> wrote:
> Your second patch catches cases where the strbuf functions want to write
> to the buffer. But we've always been pretty open about the fact that
> strbuf.buf is a writeable C-style string. So something like this:
> ...
> would generate no compile-time warnings, but would invoke undefined
> behavior (on my system it segfaults when run, but it could have even
> more confusing outcomes).
Oh right, I didn't think about that. Ignorant of me to expect everyone to just
call the functions and not edit the buf directly.

> If we want to pursue this direction, I think we'd do better to give each
> strbuf a matching array. Something like:
> ...
> So I think there are interesting directions here, but there's a lot of
> stuff to figure out.
I think that got me a bit fired up now.

> I notice you put GSoC in your subject line. If you're looking at this as
> a microproject, IMHO this is _way_ more complicated and subtle than a
> microproject should be. The goal there is to give something so easy that
> you get to focus on getting your patches in and interacting with the
> community. The scope I'd expect is more along the lines of compiling
> with -Wwrite-strings and cleaning up some of the locations that
> complain.
I'm actually planning to keep on contributing to git, so I kind of
didn't want to do something trivial. Despite the fact that I'm planning
to apply to git for GSoC, I'm mostly putting the [GSoC] so that
reviewers would go easy on me :D. That said, I might actually do the
-Wwrite-strings clean-up after this one is finished.

Thanks for the help, I guess I'll start editing it ASAP, then.

- mo7sener
Jeff King Feb. 18, 2020, 8:33 p.m. UTC | #3
On Tue, Feb 18, 2020 at 04:19:38PM +0200, Robear Selwans wrote:

> > I notice you put GSoC in your subject line. If you're looking at this as
> > a microproject, IMHO this is _way_ more complicated and subtle than a
> > microproject should be. The goal there is to give something so easy that
> > you get to focus on getting your patches in and interacting with the
> > community. The scope I'd expect is more along the lines of compiling
> > with -Wwrite-strings and cleaning up some of the locations that
> > complain.
> I'm actually planning to keep on contributing to git, so I kind of
> didn't want to do something trivial. Despite the fact that I'm planning
> to apply to git for GSoC, I'm mostly putting the [GSoC] so that
> reviewers would go easy on me :D. That said, I might actually do the
> -Wwrite-strings clean-up after this one is finished.

OK. If you want to go further, I certainly won't stop you. :)

-Peff
Johannes Sixt Feb. 19, 2020, 8:13 a.m. UTC | #4
Am 18.02.20 um 05:18 schrieb Robear Selwans:
> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
> quick initialization of strbuf.
> 
> Details:
> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
> `0`, respectively.
> 
> Use Case:
> This is meant to be used to initialize strbufs with constant values and
> thus, only allocating memory when needed.
> 
> Usage Example:
> ```
> strbuf env_var = STRBUF_INIT_CONST("dev");
> ```
> 
> This was added according to the issue opened at [https://github.com/gitgitgadget/git/issues/398]

I am not a friend of this change at all. Why do so many functions and
strbuf instances have to pay a price (check for immutable string) for a
feature that they are not using?

As the macro is just intended for convenience, I suggest to implement it
using strbuf_addstr() under the hood. That is much less code churn, and
the price is paid only by the strbufs that actually use the feature.
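
A minimal sketch of that suggestion, assuming a helper function rather
than a brace initializer (something that calls strbuf_addstr() cannot be
used as a static initializer). `strbuf_from_str()` is a hypothetical
name, and the struct, `slopbuf`, `strbuf_addstr()` and `strbuf_release()`
below are simplified stand-ins, not Git's implementations:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Git's struct strbuf. */
struct strbuf {
	size_t alloc;
	size_t len;
	char *buf;
};

static char slopbuf[1]; /* stand-in for strbuf_slopbuf */
#define STRBUF_INIT { .alloc = 0, .len = 0, .buf = slopbuf }

/* Simplified append: grows the heap buffer to exactly what is needed. */
static void strbuf_addstr(struct strbuf *sb, const char *s)
{
	size_t n = strlen(s);
	size_t need = sb->len + n + 1;

	if (need > sb->alloc) {
		/* Never hand slopbuf to realloc(); start fresh when alloc == 0. */
		char *p = realloc(sb->alloc ? sb->buf : NULL, need);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = need;
	}
	memcpy(sb->buf + sb->len, s, n + 1); /* copies the trailing NUL too */
	sb->len += n;
}

static void strbuf_release(struct strbuf *sb)
{
	if (sb->alloc)
		free(sb->buf);
	sb->alloc = 0;
	sb->len = 0;
	sb->buf = slopbuf;
}

/* Hypothetical convenience wrapper: always yields a heap-backed, writable copy. */
static struct strbuf strbuf_from_str(const char *s)
{
	struct strbuf sb = STRBUF_INIT;

	assert(s != NULL);
	strbuf_addstr(&sb, s);
	return sb;
}
```

With this shape the existing strbuf functions need no extra "is this
buffer immutable?" check; only callers of the wrapper pay for the copy,
and they release the result with strbuf_release() as usual.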

-- Hannes
René Scharfe Feb. 20, 2020, 6:49 p.m. UTC | #5
Am 19.02.20 um 09:13 schrieb Johannes Sixt:
> Am 18.02.20 um 05:18 schrieb Robear Selwans:
>> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
>> quick initialization of strbuf.
>>
>> Details:
>> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
>> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
>> `0`, respectively.
>>
>> Use Case:
>> This is meant to be used to initialize strbufs with constant values and
>> thus, only allocating memory when needed.
>>
>> Usage Example:
>> ```
>> strbuf env_var = STRBUF_INIT_CONST("dev");
>> ```
>>
>> This was added according to the issue opened at [https://github.com/gitgitgadget/git/issues/398]
>
> I am not a friend of this change at all. Why do so many functions and
> strbuf instances have to pay a price (check for immutable string) for a
> feature that they are not using?
>
> As the macro is just intended for convenience, I suggest to implement it
> using strbuf_addstr() under the hood. That is much less code churn, and
> the price is paid only by the strbufs that actually use the feature.

I was also wondering what the benefits of this change might be.  Saving
one line and thus increasing convenience slightly doesn't justify all
this added complexity.  Saving an allocation in the following sequence
might be worthwhile:

	struct strbuf sb = STRBUF_INIT;
	strbuf_addstr(&sb, "foo");
	/* Use sb without modifying it. */
	strbuf_release(&sb); /* or leak it */

I found two examples of this pattern in the code, one in range-diff.c in
the function show_range_diff(), and below is a patch for getting rid of
the second one.  Are there other reasons why we'd want that feature?
Could you perhaps include a patch that makes use of it in this series,
to highlight its benefits?

-- >8 --
Subject: [PATCH] commit-graph: use progress title directly

merge_commit_graphs() copies the (translated) progress message into a
strbuf and passes the copy to start_delayed_progress() at each loop
iteration.  The latter function takes a string pointer, so let's avoid
the detour and hand the string to it directly.  That's shorter, simpler
and slightly more efficient.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 commit-graph.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 656dd647d5..f013a84e29 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1657,19 +1657,15 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx)
 {
 	struct commit_graph *g = ctx->r->objects->commit_graph;
 	uint32_t current_graph_number = ctx->num_commit_graphs_before;
-	struct strbuf progress_title = STRBUF_INIT;

 	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
 		current_graph_number--;

-		if (ctx->report_progress) {
-			strbuf_addstr(&progress_title, _("Merging commit-graph"));
-			ctx->progress = start_delayed_progress(progress_title.buf, 0);
-		}
+		if (ctx->report_progress)
+			ctx->progress = start_delayed_progress(_("Merging commit-graph"), 0);

 		merge_commit_graph(ctx, g);
 		stop_progress(&ctx->progress);
-		strbuf_release(&progress_title);

 		g = g->base_graph;
 	}
--
2.25.1
Robear Selwans Feb. 21, 2020, 5:21 a.m. UTC | #6
On Thu, Feb 20, 2020 at 8:49 PM René Scharfe <l.s.r@web.de> wrote:
>
> Could you perhaps include a patch that makes use of it in this series,
> to highlight its benefits?

Well to begin with, I'm actually doing this in response to this issue
[https://github.com/gitgitgadget/git/issues/398].
The issue was created because of the following mail thread, though.
[https://public-inbox.org/git/20180601200146.114919-1-sbeller@google.com/]
To be honest, I'm not entirely sure about how making these changes
would help, as my experience is still quite limited. But from what
I've read, I think the main use-case would be using const `strbuf`s to
avoid memory leaks when dealing with config strings.
Derrick Stolee Feb. 27, 2020, 6:50 a.m. UTC | #7
On 2/20/2020 1:49 PM, René Scharfe wrote:
> Am 19.02.20 um 09:13 schrieb Johannes Sixt:
>> As the macro is just intended for convenience, I suggest to implement it
>> using strbuf_addstr() under the hood. That is much less code churn, and
>> the price is paid only by the strbufs that actually use the feature.
> 
> I was also wondering what the benefits of this change might be.  Saving
> one line and thus increasing convenience slightly doesn't justify all
> this added complexity.  Saving an allocation in the following sequence
> might be worthwhile:
> 
> 	struct strbuf sb = STRBUF_INIT;
> 	strbuf_addstr(&sb, "foo");
> 	/* Use sb without modifying it. */
> 	strbuf_release(&sb); /* or leak it */
> 
> I found two examples of this pattern in the code, one in range-diff.c in
> the function show_range_diff(), and below is a patch for getting rid of
> the second one.  Are there other reasons why we'd want that feature?
> Could you perhaps include a patch that makes use of it in this series,
> to highlight its benefits?
> 
> -- >8 --
> Subject: [PATCH] commit-graph: use progress title directly
> 
> merge_commit_graphs() copies the (translated) progress message into a
> strbuf and passes the copy to start_delayed_progress() at each loop
> iteration.  The latter function takes a string pointer, so let's avoid
> the detour and hand the string to it directly.  That's shorter, simpler
> and slightly more efficient.
> 
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  commit-graph.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 656dd647d5..f013a84e29 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1657,19 +1657,15 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx)
>  {
>  	struct commit_graph *g = ctx->r->objects->commit_graph;
>  	uint32_t current_graph_number = ctx->num_commit_graphs_before;
> -	struct strbuf progress_title = STRBUF_INIT;
> 
>  	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
>  		current_graph_number--;
> 
> -		if (ctx->report_progress) {
> -			strbuf_addstr(&progress_title, _("Merging commit-graph"));
> -			ctx->progress = start_delayed_progress(progress_title.buf, 0);
> -		}
> +		if (ctx->report_progress)
> +			ctx->progress = start_delayed_progress(_("Merging commit-graph"), 0);
> 
>  		merge_commit_graph(ctx, g);
>  		stop_progress(&ctx->progress);
> -		strbuf_release(&progress_title);
> 
>  		g = g->base_graph;
>  	}
> --

Not only is this a good change, it no longer leaks memory.
Thanks!

-Stolee
René Scharfe Feb. 27, 2020, 3:55 p.m. UTC | #8
Am 27.02.20 um 07:50 schrieb Derrick Stolee:
> On 2/20/2020 1:49 PM, René Scharfe wrote:
>> Subject: [PATCH] commit-graph: use progress title directly
>>
>> merge_commit_graphs() copies the (translated) progress message into a
>> strbuf and passes the copy to start_delayed_progress() at each loop
>> iteration.  The latter function takes a string pointer, so let's avoid
>> the detour and hand the string to it directly.  That's shorter, simpler
>> and slightly more efficient.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  commit-graph.c | 8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 656dd647d5..f013a84e29 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -1657,19 +1657,15 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx)
>>  {
>>  	struct commit_graph *g = ctx->r->objects->commit_graph;
>>  	uint32_t current_graph_number = ctx->num_commit_graphs_before;
>> -	struct strbuf progress_title = STRBUF_INIT;
>>
>>  	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
>>  		current_graph_number--;
>>
>> -		if (ctx->report_progress) {
>> -			strbuf_addstr(&progress_title, _("Merging commit-graph"));
>> -			ctx->progress = start_delayed_progress(progress_title.buf, 0);
>> -		}
>> +		if (ctx->report_progress)
>> +			ctx->progress = start_delayed_progress(_("Merging commit-graph"), 0);
>>
>>  		merge_commit_graph(ctx, g);
>>  		stop_progress(&ctx->progress);
>> -		strbuf_release(&progress_title);
>>
>>  		g = g->base_graph;
>>  	}
>> --
>
> Not only is this a good change, it no longer leaks memory.
> Thanks!

strbuf_release() frees the allocated memory, so I don't think the code
was leaking before.  (It would have with strbuf_reset()).
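
To illustrate the difference with a small, self-contained sketch: the
struct and functions below are simplified stand-ins for the strbuf API,
not Git's code, and `owned_after_reset()` and `owned_after_release()`
are made-up helpers that report how many bytes the strbuf still owns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Git's struct strbuf. */
struct strbuf {
	size_t alloc;
	size_t len;
	char *buf;
};

static char slopbuf[1]; /* stand-in for strbuf_slopbuf */
#define STRBUF_INIT { .alloc = 0, .len = 0, .buf = slopbuf }

static void strbuf_addstr(struct strbuf *sb, const char *s)
{
	size_t n = strlen(s);
	size_t need = sb->len + n + 1;

	if (need > sb->alloc) {
		char *p = realloc(sb->alloc ? sb->buf : NULL, need);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = need;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/* Empties the string but keeps the allocation for reuse. */
static void strbuf_reset(struct strbuf *sb)
{
	sb->len = 0;
	if (sb->alloc)
		sb->buf[0] = '\0';
	else
		assert(!slopbuf[0]); /* the shared empty buffer must stay empty */
}

/* Frees the allocation and returns to the initial state. */
static void strbuf_release(struct strbuf *sb)
{
	if (sb->alloc)
		free(sb->buf);
	sb->alloc = 0;
	sb->len = 0;
	sb->buf = slopbuf;
}

/* Bytes still owned after add + reset: nonzero, so dropping sb here would leak. */
static size_t owned_after_reset(void)
{
	struct strbuf sb = STRBUF_INIT;
	size_t owned;

	strbuf_addstr(&sb, "Merging commit-graph");
	strbuf_reset(&sb);
	owned = sb.alloc;
	strbuf_release(&sb); /* clean up so the demo itself does not leak */
	return owned;
}

/* Bytes still owned after add + release: zero, nothing left to free. */
static size_t owned_after_release(void)
{
	struct strbuf sb = STRBUF_INIT;

	strbuf_addstr(&sb, "Merging commit-graph");
	strbuf_release(&sb);
	return sb.alloc;
}
```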

René
Patch

diff --git a/strbuf.h b/strbuf.h
index bfa66569a4..1a1753424c 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -71,6 +71,7 @@  struct strbuf {
 
 extern char strbuf_slopbuf[];
 #define STRBUF_INIT  { .alloc = 0, .len = 0, .buf = strbuf_slopbuf }
+#define STRBUF_INIT_CONST(const_str)  { .alloc = 0, .len = strlen(const_str), .buf = const_str }
 
 /*
  * Predeclare this here, since cache.h includes this file before it defines the