mbox series

[0/6] strbuf: simplify `strbuf_attach()` usage

Message ID cover.1587240635.git.martin.agren@gmail.com (mailing list archive)
Headers show
Series strbuf: simplify `strbuf_attach()` usage | expand

Message

Martin Ågren April 18, 2020, 8:18 p.m. UTC
On Sat, 18 Apr 2020 at 21:56, Martin Ågren <martin.agren@gmail.com> wrote:
> > -       strbuf_attach(line, out, strlen(out), strlen(out));
> > +       strbuf_attach(line, out, out_len, out_len);
>
> This conversion is ok as such. I wondered why we pass in the same
> value twice (before and after this patch). Turns out this usage is wrong
> (as per the documentation in strbuf.h) but safe (as per my understanding
> of the implementation in strbuf.c). I'll follow up with a series that
> fell out of that investigation.

Here's that series. It could go in parallel to Danh's, or one could go
on top of the other. Any of those would be ok with me.

I think `strbuf_attach()` is sufficiently hard to use that we might as
well simplify it for the majority of the users that don't care about the
distinction between the string's length and the size of the allocated
memory, and avoid it for the few remaining users that are just as well
off using `strbuf_add()`.

The summary is that this function takes `len` and `alloc`, where the
latter must be greater than the former, yet several callers use the same
value for both. I first thought this could cause quite hairy problems
such as writing outside of allocated memory, but it doesn't look that
way. See the patches for more information.

An alternative to the approach taken here would be to introduce
`strbuf_attachstr()` and convert most existing users, then convert the
few remaining ones to use the new function or to move in another
direction. But the new name is a downer -- what else would you attach to
a strbuf if not a string?

Another alternative would be to first convert certain users away from
the function, then simplify it for the remainder. I preferred to first
spend a few patches converting a few existing callers to make it clear
that those are ok.

Martin

Martin Ågren (6):
  am: use `strbuf_attach()` correctly
  strbuf_attach: correctly pass in `strlen() + 1` for `alloc`
  strbuf: use `strbuf_attach()` correctly
  fast-import: avoid awkward use of `strbuf_attach()`
  rerere: avoid awkward use of `strbuf_attach()`
  strbuf: simplify `strbuf_attach()` usage

 strbuf.h             | 16 ++++++++--------
 apply.c              |  2 +-
 archive.c            |  2 +-
 blame.c              |  2 +-
 builtin/am.c         |  2 +-
 convert.c            |  4 ++--
 fast-import.c        |  7 ++++---
 imap-send.c          |  2 +-
 mailinfo.c           |  2 +-
 merge-recursive.c    |  2 +-
 path.c               |  3 +--
 pretty.c             |  4 ++--
 refs/files-backend.c |  2 +-
 rerere.c             |  3 ++-
 strbuf.c             | 11 +++++++----
 trailer.c            |  2 +-
 16 files changed, 35 insertions(+), 31 deletions(-)

Comments

Martin Ågren April 19, 2020, 4:44 a.m. UTC | #1
On Sat, 18 Apr 2020 at 22:18, Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Sat, 18 Apr 2020 at 21:56, Martin Ågren <martin.agren@gmail.com> wrote:
> > > -       strbuf_attach(line, out, strlen(out), strlen(out));
> > > +       strbuf_attach(line, out, out_len, out_len);
> >
> > This conversion is ok as such. I wondered why we pass in the same
> > value twice (before and after this patch). Turns out this usage is wrong
> > (as per the documentation in strbuf.h) but safe (as per my understanding
> > of the implementation in strbuf.c). I'll follow up with a series that
> > fell out of that investigation.
>
> Here's that series. It could go in parallel to Danh's, or one could go
> on top of the other. Any of those would be ok with me.

I realize now that some of the reasoning in this series is incorrect.
`strbuf_grow()` will use `ALLOC_GROW` meaning we might reallocate
cheaply rather than doing an entire allocate+copy+free cycle.

Still, there is some buggyness in the usage because we might reallocate
(with a 50% overhead) even when we shouldn't really need to.

I'll resubmit this after tackling it from a different angle: I'll
introduce `strbuf_attachstr()`, simplifying and robustifyng most users.
Then I'll switch a few incorrect users of `strbuf_attach()` to
`strbuf_attachstr()`. Then a small number of users will continue using
`strbuf_attach()` with `alloc == len` for $reasons.

> The summary is that this function takes `len` and `alloc`, where the
> latter must be greater than the former, yet several callers use the same
> value for both. I first thought this could cause quite hairy problems
> such as writing outside of allocated memory, but it doesn't look that
> way. See the patches for more information.
>
> An alternative to the approach taken here would be to introduce
> `strbuf_attachstr()` and convert most existing users, then convert the
> few remaining ones to use the new function or to move in another
> direction. But the new name is a downer -- what else would you attach to
> a strbuf if not a string?

So this is what I'll do instead. The reasoning is, you can attach a
string (a NUL-terminated buffer) or a non-string (non-NUL-terminated
buffer).

Martin