diff mbox series

[v2,04/21] strbuf: fix leak when `appendwholeline()` fails with EOF

Message ID 9dd8709d1b3b350008218133986befdb2ae74bae.1716541556.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Various memory leak fixes | expand

Commit Message

Patrick Steinhardt May 24, 2024, 10:03 a.m. UTC
In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a
temporary buffer. In case the call returns an error we indicate this by
returning EOF, but never release the temporary buffer. This can lead to
a memory leak when the line has been partially filled. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 strbuf.c              | 4 +++-
 t/t1400-update-ref.sh | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Jeff King May 25, 2024, 4:46 a.m. UTC | #1
On Fri, May 24, 2024 at 12:03:29PM +0200, Patrick Steinhardt wrote:

> In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a
> temporary buffer. In case the call returns an error we indicate this by
> returning EOF, but never release the temporary buffer. This can lead to
> a memory leak when the line has been partially filled. Fix this.

Hmm, doesn't this indicate a bug in getwholeline()? Most strbuf
functions on error try to leave the allocation as-is.

At the end of the getdelim() version (which is probably what you're
running), when we see an error we do:

        if (!sb->buf)
                strbuf_init(sb, 0);
        else
                strbuf_reset(sb);
        return EOF;

So if getdelim() returned error and left us with a buffer (but still
returned -1 for error!), I think this code is assuming that the buffer
it left us with was the same one that existed beforehand.

But your commit message implies that it might allocate, hit an error,
and then return that error along with an allocated buffer? Looks like
that matches the getdelim() manpage, which says:

  If *lineptr was set to NULL before the call, then the buffer should be
  freed by the user program even on failure.

So should we do something like:

diff --git a/strbuf.c b/strbuf.c
index e1076c9891..e37165812b 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -659,7 +659,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	if (!sb->buf)
 		strbuf_init(sb, 0);
 	else
-		strbuf_reset(sb);
+		strbuf_release(sb);
 	return EOF;
 }
 #else

That assumes sb->alloc is valid after a failed call, since
strbuf_release() checks it. But that seems reasonable. If not, we'd need
to free() and re-initialize it ourselves, and the code is more like:

diff --git a/strbuf.c b/strbuf.c
index e1076c9891..aed699c6bf 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	 * we can just re-init, but otherwise we should make sure that our
 	 * length is empty, and that the result is NUL-terminated.
 	 */
-	if (!sb->buf)
-		strbuf_init(sb, 0);
-	else
-		strbuf_reset(sb);
+	FREE_AND_NULL(sb->buf);
+	strbuf_init(sb, 0);
 	return EOF;
 }
 #else

But I think either of those would solve your leak, _and_ would help with
similar leaks of strbuf_getwholeline() and friends.

-Peff
Patrick Steinhardt May 27, 2024, 6:44 a.m. UTC | #2
On Sat, May 25, 2024 at 12:46:35AM -0400, Jeff King wrote:
> On Fri, May 24, 2024 at 12:03:29PM +0200, Patrick Steinhardt wrote:
> 
> > In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a
> > temporary buffer. In case the call returns an error we indicate this by
> > returning EOF, but never release the temporary buffer. This can lead to
> > a memory leak when the line has been partially filled. Fix this.
> 
> Hmm, doesn't this indicate a bug in getwholeline()? Most strbuf
> functions on error try to leave the allocation as-is.
> 
> At the end of the getdelim() version (which is probably what you're
> running), when we see an error we do:
> 
>         if (!sb->buf)
>                 strbuf_init(sb, 0);
>         else
>                 strbuf_reset(sb);
>         return EOF;
> 
> So if getdelim() returned error and left us with a buffer (but still
> returned -1 for error!), I think this code is assuming that the buffer
> it left us with was the same one that existed beforehand.
> 
> But your commit message implies that it might allocate, hit an error,
> and then return that error along with an allocated buffer? Looks like
> that matches the getdelim() manpage, which says:
> 
>   If *lineptr was set to NULL before the call, then the buffer should be
>   freed by the user program even on failure.
> 
> So should we do something like:
> 
> diff --git a/strbuf.c b/strbuf.c
> index e1076c9891..e37165812b 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -659,7 +659,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
>  	if (!sb->buf)
>  		strbuf_init(sb, 0);
>  	else
> -		strbuf_reset(sb);
> +		strbuf_release(sb);
>  	return EOF;
>  }
>  #else
> 
> That assumes sb->alloc is valid after a failed call, since
> strbuf_release() checks it. But that seems reasonable. If not, we'd need
> to free() and re-initialize it ourselves, and the code is more like:
> 
> diff --git a/strbuf.c b/strbuf.c
> index e1076c9891..aed699c6bf 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
>  	 * we can just re-init, but otherwise we should make sure that our
>  	 * length is empty, and that the result is NUL-terminated.
>  	 */
> -	if (!sb->buf)
> -		strbuf_init(sb, 0);
> -	else
> -		strbuf_reset(sb);
> +	FREE_AND_NULL(sb->buf);
> +	strbuf_init(sb, 0);
>  	return EOF;
>  }
>  #else
> 
> But I think either of those would solve your leak, _and_ would help with
> similar leaks of strbuf_getwholeline() and friends.

I'm not quite convinced that `strbuf_getwholeline()` should deallocate
the buffer for the caller, I think that makes for quite a confusing
calling convention. The caller may want to reuse the buffer for other
operations, and it feels hostile to release the buffer under their feet.

The only edge case where I think it would make sense to free allocated
data is when being passed a not-yet-allocated strbuf. But I wonder
whether the added complexity would be worth it.

I've been going through all callsites and couldn't spot any that doesn't
free the buffer on EOF. So I'd propose to leave this as-is and revisit
if we eventually see that this is causing more memory leaks.

Patrick
Jeff King May 29, 2024, 9:16 a.m. UTC | #3
On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote:

> > diff --git a/strbuf.c b/strbuf.c
> > diff --git a/strbuf.c b/strbuf.c
> > index e1076c9891..aed699c6bf 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
> >  	 * we can just re-init, but otherwise we should make sure that our
> >  	 * length is empty, and that the result is NUL-terminated.
> >  	 */
> > -	if (!sb->buf)
> > -		strbuf_init(sb, 0);
> > -	else
> > -		strbuf_reset(sb);
> > +	FREE_AND_NULL(sb->buf);
> > +	strbuf_init(sb, 0);
> >  	return EOF;
> >  }
> >  #else
> > 
> > But I think either of those would solve your leak, _and_ would help with
> > similar leaks of strbuf_getwholeline() and friends.
> 
> I'm not quite convinced that `strbuf_getwholeline()` should deallocate
> the buffer for the caller, I think that makes for quite a confusing
> calling convention. The caller may want to reuse the buffer for other
> operations, and it feels hostile to release the buffer under their feet.
> 
> The only edge case where I think it would make sense to free allocated
> data is when being passed a not-yet-allocated strbuf. But I wonder
> whether the added complexity would be worth it.

I'm not sure what they'd reuse it for. We necessarily have to reset it
before reading, so the contents are now garbage. The allocated buffer
could be reused, but since everybody has to call strbuf_grow() before
assuming they can write, it's not a correctness issue, but only an
optimization. But that optimization is pretty unlikely to matter. Since
we hit this code only on EOF or error, it's generally going to happen
once in a program, and not in a tight loop.

If we really cared, though, I think you could check sb->alloc before the
call to getdelim(), and then we'd know whether the original held an
allocation or not (and we could restore its state). That's what other
syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd()
do.

That said, I agree that leaks here are not going to be common. Most
callers are going to call it in a loop and unconditionally release at
the end, whether they get multiple lines or not. The "append" function
is the odd man out by reading a single line into a new buffer[1].

Looking through the results of:

  git grep -P '(?<!while) \(!?strbuf_get(whole)?line'

I saw only one questionable case. builtin/difftool.c does:

  if (strbuf_getline_nul(&lpath, fp))
	break;

without freeing lpath. But then...it does not free it in the case that
we got a value, either! So I think it is leaking either way, and the
solution, to strbuf_release(&lpath) outside of the loop, would fix both
cases.

> I've been going through all callsites and couldn't spot any that doesn't
> free the buffer on EOF. So I'd propose to leave this as-is and revisit
> if we eventually see that this is causing more memory leaks.

OK. I don't feel too strongly about it, but mostly thought it seemed
inconsistent with the philosophy of those other strbuf functions.

-Peff

[1] I was surprised at first that strbuf_appendwholeline() uses that
    extra copy, as the obvious implementation is just to skip the
    strbuf_reset() call in getwholeline(), and then implement "get" as a
    wrapper around "append" to add back in the reset call.

    But that only works for the slower getc() implementation. For the
    getdelim()-based one, there is no notion of append (the interface
    would have to take an offset into the buffer).  We could probably
    optimize this at the cost of repeating ourselves, but given that
    there's only one probably-not-very-hot call to appendwholeline, it's
    likely not worth it.
Patrick Steinhardt May 29, 2024, 11:25 a.m. UTC | #4
On Wed, May 29, 2024 at 05:16:33AM -0400, Jeff King wrote:
> On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote:
> > > diff --git a/strbuf.c b/strbuf.c
> > > diff --git a/strbuf.c b/strbuf.c
> > > index e1076c9891..aed699c6bf 100644
> > > --- a/strbuf.c
> > > +++ b/strbuf.c
> > > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
> > >  	 * we can just re-init, but otherwise we should make sure that our
> > >  	 * length is empty, and that the result is NUL-terminated.
> > >  	 */
> > > -	if (!sb->buf)
> > > -		strbuf_init(sb, 0);
> > > -	else
> > > -		strbuf_reset(sb);
> > > +	FREE_AND_NULL(sb->buf);
> > > +	strbuf_init(sb, 0);
> > >  	return EOF;
> > >  }
> > >  #else
> > > 
> > > But I think either of those would solve your leak, _and_ would help with
> > > similar leaks of strbuf_getwholeline() and friends.
> > 
> > I'm not quite convinced that `strbuf_getwholeline()` should deallocate
> > the buffer for the caller, I think that makes for quite a confusing
> > calling convention. The caller may want to reuse the buffer for other
> > operations, and it feels hostile to release the buffer under their feet.
> > 
> > The only edge case where I think it would make sense to free allocated
> > data is when being passed a not-yet-allocated strbuf. But I wonder
> > whether the added complexity would be worth it.
> 
> I'm not sure what they'd reuse it for. We necessarily have to reset it
> before reading, so the contents are now garbage. The allocated buffer
> could be reused, but since everybody has to call strbuf_grow() before
> assuming they can write, it's not a correctness issue, but only an
> optimization. But that optimization is pretty unlikely to matter. Since
> we hit this code only on EOF or error, it's generally going to happen
> once in a program, and not in a tight loop.
> 
> If we really cared, though, I think you could check sb->alloc before the
> call to getdelim(), and then we'd know whether the original held an
> allocation or not (and we could restore its state). That's what other
> syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd()
> do.

Ah, I didn't know that we did similar things in other strbuf functions.
With that precedence I think it's less ugly to do this dance.

> That said, I agree that leaks here are not going to be common. Most
> callers are going to call it in a loop and unconditionally release at
> the end, whether they get multiple lines or not. The "append" function
> is the odd man out by reading a single line into a new buffer[1].
> 
> Looking through the results of:
> 
>   git grep -P '(?<!while) \(!?strbuf_get(whole)?line'
> 
> I saw only one questionable case. builtin/difftool.c does:
> 
>   if (strbuf_getline_nul(&lpath, fp))
> 	break;
> 
> without freeing lpath. But then...it does not free it in the case that
> we got a value, either! So I think it is leaking either way, and the
> solution, to strbuf_release(&lpath) outside of the loop, would fix both
> cases.

Indeed. We also didn't free `rpath` and `info`. I do have a follow up to
this series already, so let me add those leak fixes to it.

> > I've been going through all callsites and couldn't spot any that doesn't
> > free the buffer on EOF. So I'd propose to leave this as-is and revisit
> > if we eventually see that this is causing more memory leaks.
> 
> OK. I don't feel too strongly about it, but mostly thought it seemed
> inconsistent with the philosophy of those other strbuf functions.

I get where you're coming from now with the additional info that other
syscall-ish functions do a similar dance. I'll refrain from rerolling
this series just to fix this in a different way, also because neither of
us did spot any additional leaks caused by this.

Patrick
Jeff King May 30, 2024, 7:16 a.m. UTC | #5
On Wed, May 29, 2024 at 01:25:02PM +0200, Patrick Steinhardt wrote:

> > If we really cared, though, I think you could check sb->alloc before the
> > call to getdelim(), and then we'd know whether the original held an
> > allocation or not (and we could restore its state). That's what other
> > syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd()
> > do.
> 
> Ah, I didn't know that we did similar things in other strbuf functions.
> With that precedence I think it's less ugly to do this dance.

Yeah, I probably should have explained that better in my earlier email. ;)

> > I saw only one questionable case. builtin/difftool.c does:
> > 
> >   if (strbuf_getline_nul(&lpath, fp))
> > 	break;
> > 
> > without freeing lpath. But then...it does not free it in the case that
> > we got a value, either! So I think it is leaking either way, and the
> > solution, to strbuf_release(&lpath) outside of the loop, would fix both
> > cases.
> 
> Indeed. We also didn't free `rpath` and `info`. I do have a follow up to
> this series already, so let me add those leak fixes to it.

Great! Thanks (as usual) for being receptive to me piling more things on
your todo list. :)

-Peff
diff mbox series

Patch

diff --git a/strbuf.c b/strbuf.c
index 0d929e4e19..e1076c9891 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -691,8 +691,10 @@  int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 int strbuf_appendwholeline(struct strbuf *sb, FILE *fp, int term)
 {
 	struct strbuf line = STRBUF_INIT;
-	if (strbuf_getwholeline(&line, fp, term))
+	if (strbuf_getwholeline(&line, fp, term)) {
+		strbuf_release(&line);
 		return EOF;
+	}
 	strbuf_addbuf(sb, &line);
 	strbuf_release(&line);
 	return 0;
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index ec3443cc87..bbee2783ab 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -4,6 +4,8 @@ 
 #
 
 test_description='Test git update-ref and basic ref logging'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 Z=$ZERO_OID