Message ID | 20200218041805.10939-2-robear.selwans@outlook.com |
---|---|
State | New, archived |
Series | STRBUF_INIT_CONST |
On Tue, Feb 18, 2020 at 04:18:04AM +0000, Robear Selwans wrote:

> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
> quick initialization of strbuf.
>
> Details:
> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
> `0`, respectively.
>
> Use Case:
> This is meant to be used to initialize strbufs with constant values and
> thus, only allocating memory when needed.
>
> Usage Example:
> ```
> strbuf env_var = STRBUF_INIT_CONST("dev");
> ```

This seems a bit dangerous to me, as we're initializing a non-const
pointer with a string literal. In fact, I'm a little surprised that the
compiler doesn't complain, but I think it's mostly due to historical
C-isms (the type of string literals is array-of-char). Using gcc's
-Wwrite-strings does complain, but there are several other cases already
in Git (looking at a few, I think there are some opportunities for
cleanup).

Your second patch catches cases where the strbuf functions want to write
to the buffer. But we've always been pretty open about the fact that
strbuf.buf is a writeable C-style string. So something like this:

  struct strbuf x = STRBUF_INIT_CONST("foo");
  size_t i;
  for (i = 0; i < x.len; i++)
          x.buf[i] = toupper(x.buf[i]);

would generate no compile-time warnings, but would invoke undefined
behavior (on my system it segfaults when run, but it could have even
more confusing outcomes). Even though this is called out specifically in
the strbuf docs:

  However, it is totally safe to modify anything in the string pointed by
  the `buf` member, between the indices `0` and `len-1` (inclusive).

Of course it would be easy to fix that by adding a strbuf_make_var()
call. But my concern is cases where the const-ness and the use of the
strbuf are far apart. The point of a strbuf is that you can just use it
without worrying, but now it's carrying this extra hidden state.
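To make the literal-versus-array distinction concrete, here is a minimal, self-contained C sketch. It is not Git code: `upcase_in_place()` and `upcase_array_copy_is_safe()` are hypothetical helpers. It writes only through a `char` array initialized from a literal, which is a writable copy; the comment marks the pointer-to-literal variant that would be undefined behavior, as in Peff's `toupper()` loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Uppercase the first len bytes of s in place (hypothetical helper). */
static void upcase_in_place(char *s, size_t len)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < len; i++)
		s[i] = (char)toupper((unsigned char)s[i]);
}

/*
 * Hypothetical demo: a char array initialized from a literal is a writable
 * copy, so modifying it is fine. Returns 1 if the copy now reads "FOO".
 */
static int upcase_array_copy_is_safe(void)
{
	char buf[] = "foo";

	/*
	 * By contrast, after `char *p = "foo";` (accepted without
	 * -Wwrite-strings), upcase_in_place(p, 3) would be undefined
	 * behavior: the literal may live in read-only memory.
	 */
	upcase_in_place(buf, strlen(buf));
	return strcmp(buf, "FOO") == 0;
}
```

With gcc, `-Wwrite-strings` gives string literals the type `const char[N]` in C, which is why a `char *p = "foo";` line then draws a warning.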
If we want to pursue this direction, I think we'd do better to give each
strbuf a matching array. Something like:

  #define STRBUF_INIT_FROM(arr) { .alloc = 0, .buf = (arr), .len = ARRAY_SIZE(arr) - 1 }
  ...
  char foo_buf[] = "this is the constant value";
  struct strbuf foo = STRBUF_INIT_FROM(foo_buf);

That gives you a true writeable buffer with the const data in it. _And_
it opens up the option of strbufs using stack buffers with an empty
initial value for efficiency (i.e., avoiding the heap at all for short
common cases, but being able to grow when needed).

One trouble is that you can't do it all in a single variable. You'd need
something like:

  #define DECLARE_STACK_STRBUF(name, contents) \
          char name##_buf[] = contents; \
          struct strbuf name = STRBUF_INIT_FROM(name##_buf)
  ...
  DECLARE_STACK_STRBUF(foo, "this is the constant value");

But that gets weirdly un-C-like (the macro expands to multiple
statements, which is usually a macro pitfall; but we can't use the usual
"do { } while(0)" trick here, because the variables would go out of
scope at the end of the fake block).

So I think there are interesting directions here, but there's a lot of
stuff to figure out.

I notice you put GSoC in your subject line. If you're looking at this as
a microproject, IMHO this is _way_ more complicated and subtle than a
microproject should be. The goal there is to give something so easy that
you get to focus on getting your patches in and interacting with the
community. The scope I'd expect is more along the lines of compiling
with -Wwrite-strings and cleaning up some of the locations that
complain.

-Peff
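The stack-buffer idea can be sketched outside Git with a hypothetical `struct mini_strbuf` that mirrors the `alloc`/`len`/`buf` layout of `struct strbuf`. This is an illustration under stated assumptions, not Git's implementation: `alloc == 0` is taken to mean "buffer not owned", so the first growth copies the contents to the heap, and release frees only owned memory. The macro parameter is named `arr` rather than `buf`, because a parameter named `buf` would also replace the `.buf` member designator during expansion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical mirror of struct strbuf; alloc == 0 means "buffer not owned". */
struct mini_strbuf {
	size_t alloc;
	size_t len;
	char *buf;
};

/* Borrow a writable char array; param is "arr" so ".buf" is left untouched. */
#define MINI_STRBUF_INIT_FROM(arr) \
	{ .alloc = 0, .len = ARRAY_SIZE(arr) - 1, .buf = (arr) }

/* Ensure room for extra bytes plus NUL; a borrowed buffer moves to the heap. */
static void mini_strbuf_grow(struct mini_strbuf *sb, size_t extra)
{
	size_t need = sb->len + extra + 1;
	char *p;

	if (need <= sb->alloc)
		return;
	if (sb->alloc == 0) {
		/* Capacity of the borrowed array is unknown: copy out, NUL included. */
		p = malloc(need);
		if (!p)
			abort();
		memcpy(p, sb->buf, sb->len + 1);
	} else {
		p = realloc(sb->buf, need);
		if (!p)
			abort();
	}
	sb->buf = p;
	sb->alloc = need; /* exact fit; no amortized growth policy in this sketch */
}

static void mini_strbuf_addstr(struct mini_strbuf *sb, const char *s)
{
	size_t n;

	assert(s != NULL);
	n = strlen(s);
	mini_strbuf_grow(sb, n);
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/* Free only heap memory we own; the borrowed array is never freed. */
static void mini_strbuf_release(struct mini_strbuf *sb)
{
	if (sb->alloc)
		free(sb->buf);
	sb->alloc = 0;
	sb->len = 0;
	sb->buf = NULL; /* Git would point back at strbuf_slopbuf instead */
}
```

The borrowed array stays writable from the start, and after the first append the strbuf owns a heap copy while the original array keeps its old contents.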
On Tue, Feb 18, 2020 at 8:21 AM Jeff King <peff@peff.net> wrote:
> Your second patch catches cases where the strbuf functions want to write
> to the buffer. But we've always been pretty open about the fact that
> strbuf.buf is a writeable C-style string. So something like this:
> ...
> would generate no compile-time warnings, but would invoke undefined
> behavior (on my system it segfaults when run, but it could have even
> more confusing outcomes).

Oh right, I didn't think about that. Ignorant of me to expect everyone to
just call the functions and not edit the buf directly.

> If we want to pursue this direction, I think we'd do better to give each
> strbuf a matching array. Something like:
> ...
> So I think there are interesting directions here, but there's a lot of
> stuff to figure out.

I think that got me a bit fired up now.

> I notice you put GSoC in your subject line. If you're looking at this as
> a microproject, IMHO this is _way_ more complicated and subtle than a
> microproject should be. The goal there is to give something so easy that
> you get to focus on getting your patches in and interacting with the
> community. The scope I'd expect is more along the lines of compiling
> with -Wwrite-strings and cleaning up some of the locations that
> complain.

I'm actually planning to keep on contributing to git, so I kind of didn't
want to do something trivial. Despite the fact that I'm planning to apply
to git for GSoC, I'm mostly putting the [GSoC] so that reviewers would go
easy on me :D. That said, I might actually do the -Wwrite-strings
clean-up after this one is finished.

Thanks for the help, I guess I'll start editing it ASAP, then.

- mo7sener
On Tue, Feb 18, 2020 at 04:19:38PM +0200, Robear Selwans wrote:

> > I notice you put GSoC in your subject line. If you're looking at this as
> > a microproject, IMHO this is _way_ more complicated and subtle than a
> > microproject should be. The goal there is to give something so easy that
> > you get to focus on getting your patches in and interacting with the
> > community. The scope I'd expect is more along the lines of compiling
> > with -Wwrite-strings and cleaning up some of the locations that
> > complain.
>
> I'm actually planning to keep on contributing to git, so I kind of
> didn't want to do something trivial. Despite the fact that I'm planning
> to apply to git for GSoC, I'm mostly putting the [GSoC] so that
> reviewers would go easy on me :D. That said, I might actually do the
> -Wwrite-strings clean-up after this one is finished.

OK. If you want to go further, I certainly won't stop you. :)

-Peff
On 18.02.20 at 05:18, Robear Selwans wrote:
> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
> quick initialization of strbuf.
>
> Details:
> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
> `0`, respectively.
>
> Use Case:
> This is meant to be used to initialize strbufs with constant values and
> thus, only allocating memory when needed.
>
> Usage Example:
> ```
> strbuf env_var = STRBUF_INIT_CONST("dev");
> ```
>
> This was added according to the issue opened at [https://github.com/gitgitgadget/git/issues/398]

I am not a friend of this change at all. Why do so many functions and
strbuf instances have to pay a price (check for immutable string) for a
feature that they are not using?

As the macro is just intended for convenience, I suggest to implement it
using strbuf_addstr() under the hood. That is much less code churn, and
the price is paid only by the strbufs that actually use the feature.

-- 
Hannes
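Hannes's alternative can be sketched the same way: a convenience initializer that simply copies the string with an addstr-style call, so the buffer is always owned and writable and no other function needs an "is this immutable?" check. The names `struct sbuf`, `SBUF_INIT`, `sbuf_addstr()`, `sbuf_init_str()` and `sbuf_release()` are hypothetical stand-ins, not Git's API; a helper function is used here because a brace initializer cannot call `strbuf_addstr()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct strbuf; alloc == 0 means nothing owned. */
struct sbuf {
	size_t alloc;
	size_t len;
	char *buf;
};

static char sbuf_slop[1]; /* shared empty string, like strbuf_slopbuf */
#define SBUF_INIT { .alloc = 0, .len = 0, .buf = sbuf_slop }

/* Append s, always copying into heap memory the sbuf owns. */
static void sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n, need;

	assert(s != NULL);
	n = strlen(s);
	need = sb->len + n + 1;
	if (need > sb->alloc) {
		char *p = sb->alloc ? realloc(sb->buf, need) : malloc(need);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = need;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/*
 * Convenience initializer: start empty, then copy s. Only callers of this
 * helper pay for the copy; nothing else needs to check for read-only data.
 */
static void sbuf_init_str(struct sbuf *sb, const char *s)
{
	struct sbuf blank = SBUF_INIT;

	*sb = blank;
	sbuf_addstr(sb, s);
}

/* Free owned memory and return to the SBUF_INIT state. */
static void sbuf_release(struct sbuf *sb)
{
	if (sb->alloc)
		free(sb->buf);
	sb->alloc = 0;
	sb->len = 0;
	sb->buf = sbuf_slop;
}
```

The trade-off is one allocation per use of the convenience initializer, in exchange for keeping every other strbuf operation unchanged.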
On 19.02.20 at 09:13, Johannes Sixt wrote:
> On 18.02.20 at 05:18, Robear Selwans wrote:
>> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
>> quick initialization of strbuf.
>>
>> Details:
>> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
>> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
>> `0`, respectively.
>>
>> Use Case:
>> This is meant to be used to initialize strbufs with constant values and
>> thus, only allocating memory when needed.
>>
>> Usage Example:
>> ```
>> strbuf env_var = STRBUF_INIT_CONST("dev");
>> ```
>>
>> This was added according to the issue opened at [https://github.com/gitgitgadget/git/issues/398]
>
> I am not a friend of this change at all. Why do so many functions and
> strbuf instances have to pay a price (check for immutable string) for a
> feature that they are not using?
>
> As the macro is just intended for convenience, I suggest to implement it
> using strbuf_addstr() under the hood. That is much less code churn, and
> the price is paid only by the strbufs that actually use the feature.

I was also wondering what the benefits of this change might be. Saving
one line and thus increasing convenience slightly doesn't justify all
this added complexity. Saving an allocation in the following sequence
might be worthwhile:

	struct strbuf sb = STRBUF_INIT;
	strbuf_addstr(&sb, "foo");
	/* Use sb without modifying it. */
	strbuf_release(&sb); /* or leak it */

I found two examples of this pattern in the code, one in range-diff.c in
the function show_range_diff(), and below is a patch for getting rid of
the second one. Are there other reasons why we'd want that feature?
Could you perhaps include a patch that makes use of it in this series,
to highlight its benefits?
-- >8 --
Subject: [PATCH] commit-graph: use progress title directly

merge_commit_graphs() copies the (translated) progress message into a
strbuf and passes the copy to start_delayed_progress() at each loop
iteration. The latter function takes a string pointer, so let's avoid
the detour and hand the string to it directly. That's shorter, simpler
and slightly more efficient.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 commit-graph.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 656dd647d5..f013a84e29 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1657,19 +1657,15 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx)
 {
 	struct commit_graph *g = ctx->r->objects->commit_graph;
 	uint32_t current_graph_number = ctx->num_commit_graphs_before;
-	struct strbuf progress_title = STRBUF_INIT;
 
 	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
 		current_graph_number--;
 
-		if (ctx->report_progress) {
-			strbuf_addstr(&progress_title, _("Merging commit-graph"));
-			ctx->progress = start_delayed_progress(progress_title.buf, 0);
-		}
+		if (ctx->report_progress)
+			ctx->progress = start_delayed_progress(_("Merging commit-graph"), 0);
 
 		merge_commit_graph(ctx, g);
 		stop_progress(&ctx->progress);
-		strbuf_release(&progress_title);
 
 		g = g->base_graph;
 	}
-- 
2.25.1
On Thu, Feb 20, 2020 at 8:49 PM René Scharfe <l.s.r@web.de> wrote:
>
> Could you perhaps include a patch that makes use of it in this series,
> to highlight its benefits?

Well, to begin with, I'm actually doing this in response to this issue
[https://github.com/gitgitgadget/git/issues/398]. The issue was created
because of the following mail thread, though:
[https://public-inbox.org/git/20180601200146.114919-1-sbeller@google.com/]

To be honest, I'm not entirely sure how making these changes would help,
as my experience is still quite limited. But from what I've read, I think
the main use case would be using const `strbuf`s to avoid memory leaks
when dealing with config strings.
On 2/20/2020 1:49 PM, René Scharfe wrote:
> On 19.02.20 at 09:13, Johannes Sixt wrote:
>> As the macro is just intended for convenience, I suggest to implement it
>> using strbuf_addstr() under the hood. That is much less code churn, and
>> the price is paid only by the strbufs that actually use the feature.
>
> I was also wondering what the benefits of this change might be. Saving
> one line and thus increasing convenience slightly doesn't justify all
> this added complexity. Saving an allocation in the following sequence
> might be worthwhile:
>
> 	struct strbuf sb = STRBUF_INIT;
> 	strbuf_addstr(&sb, "foo");
> 	/* Use sb without modifying it. */
> 	strbuf_release(&sb); /* or leak it */
>
> I found two examples of this pattern in the code, one in range-diff.c in
> the function show_range_diff(), and below is a patch for getting rid of
> the second one. Are there other reasons why we'd want that feature?
> Could you perhaps include a patch that makes use of it in this series,
> to highlight its benefits?
>
> -- >8 --
> Subject: [PATCH] commit-graph: use progress title directly
>
> merge_commit_graphs() copies the (translated) progress message into a
> strbuf and passes the copy to start_delayed_progress() at each loop
> iteration. The latter function takes a string pointer, so let's avoid
> the detour and hand the string to it directly. That's shorter, simpler
> and slightly more efficient.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  commit-graph.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 656dd647d5..f013a84e29 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1657,19 +1657,15 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx)
>  {
>  	struct commit_graph *g = ctx->r->objects->commit_graph;
>  	uint32_t current_graph_number = ctx->num_commit_graphs_before;
> -	struct strbuf progress_title = STRBUF_INIT;
>  
>  	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
>  		current_graph_number--;
>  
> -		if (ctx->report_progress) {
> -			strbuf_addstr(&progress_title, _("Merging commit-graph"));
> -			ctx->progress = start_delayed_progress(progress_title.buf, 0);
> -		}
> +		if (ctx->report_progress)
> +			ctx->progress = start_delayed_progress(_("Merging commit-graph"), 0);
>  
>  		merge_commit_graph(ctx, g);
>  		stop_progress(&ctx->progress);
> -		strbuf_release(&progress_title);
>  
>  		g = g->base_graph;
>  	}
> --

Not only is this a good change, it no longer leaks memory.

Thanks!
-Stolee
On 27.02.20 at 07:50, Derrick Stolee wrote:
> On 2/20/2020 1:49 PM, René Scharfe wrote:
>> Subject: [PATCH] commit-graph: use progress title directly
>>
>> merge_commit_graphs() copies the (translated) progress message into a
>> strbuf and passes the copy to start_delayed_progress() at each loop
>> iteration. The latter function takes a string pointer, so let's avoid
>> the detour and hand the string to it directly. That's shorter, simpler
>> and slightly more efficient.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  commit-graph.c | 8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 656dd647d5..f013a84e29 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -1657,19 +1657,15 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx)
>>  {
>>  	struct commit_graph *g = ctx->r->objects->commit_graph;
>>  	uint32_t current_graph_number = ctx->num_commit_graphs_before;
>> -	struct strbuf progress_title = STRBUF_INIT;
>>  
>>  	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
>>  		current_graph_number--;
>>  
>> -		if (ctx->report_progress) {
>> -			strbuf_addstr(&progress_title, _("Merging commit-graph"));
>> -			ctx->progress = start_delayed_progress(progress_title.buf, 0);
>> -		}
>> +		if (ctx->report_progress)
>> +			ctx->progress = start_delayed_progress(_("Merging commit-graph"), 0);
>>  
>>  		merge_commit_graph(ctx, g);
>>  		stop_progress(&ctx->progress);
>> -		strbuf_release(&progress_title);
>>  
>>  		g = g->base_graph;
>>  	}
>> --
>
> Not only is this a good change, it no longer leaks memory.
> Thanks!

strbuf_release() frees the allocated memory, so I don't think the code
was leaking before. (It would have with strbuf_reset().)

René
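René's distinction between the two calls can be checked with a small model. The helpers below (`struct sbuf`, `sbuf_addstr()`, `sbuf_reset()`, `sbuf_release()`) are hypothetical stand-ins, not Git's API: a release-style call frees the buffer and returns to the initial state, while a reset-style call only truncates the string and keeps the allocation, which would leak if the variable went out of scope without a later release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct strbuf; alloc == 0 means nothing owned. */
struct sbuf {
	size_t alloc;
	size_t len;
	char *buf;
};

static char sbuf_slop[1]; /* shared empty string, like strbuf_slopbuf */
#define SBUF_INIT { .alloc = 0, .len = 0, .buf = sbuf_slop }

/* Append s, growing the owned heap buffer as needed. */
static void sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n, need;

	assert(s != NULL);
	n = strlen(s);
	need = sb->len + n + 1;
	if (need > sb->alloc) {
		char *p = sb->alloc ? realloc(sb->buf, need) : malloc(need);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = need;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/* Like strbuf_reset(): truncate to "" but keep the allocation for reuse. */
static void sbuf_reset(struct sbuf *sb)
{
	sb->len = 0;
	sb->buf[0] = '\0';
}

/* Like strbuf_release(): free owned memory and return to SBUF_INIT state. */
static void sbuf_release(struct sbuf *sb)
{
	if (sb->alloc)
		free(sb->buf);
	sb->alloc = 0;
	sb->len = 0;
	sb->buf = sbuf_slop;
}
```

In the pre-patch loop shape (append, use, release on every iteration), nothing stays allocated between iterations; with a reset in place of the release, the buffer would be reused across iterations but still held when the function returns.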
diff --git a/strbuf.h b/strbuf.h
index bfa66569a4..1a1753424c 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -71,6 +71,7 @@ struct strbuf {
 
 extern char strbuf_slopbuf[];
 #define STRBUF_INIT { .alloc = 0, .len = 0, .buf = strbuf_slopbuf }
+#define STRBUF_INIT_CONST(const_str) { .alloc = 0, .len = strlen(const_str), .buf = const_str }
 
 /*
  * Predeclare this here, since cache.h includes this file before it defines the
A new macro `STRBUF_INIT_CONST(const_str)` was added to allow for a quick
initialization of a strbuf.

Details:
Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
`0`, respectively.

Use Case:
This is meant to be used to initialize strbufs with constant values,
allocating memory only when needed.

Usage Example:
```
struct strbuf env_var = STRBUF_INIT_CONST("dev");
```

This was added in response to the issue opened at
[https://github.com/gitgitgadget/git/issues/398]

Signed-off-by: Robear Selwans <robear.selwans@outlook.com>
---
 strbuf.h | 1 +
 1 file changed, 1 insertion(+)