diff mbox series

[v4,15/19] object-file.c: release the "tag" in check_tag()

Message ID patch-v4-15.19-66c24afb893-20230117T151202Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series leak fixes: various simple leak fixes | expand

Commit Message

Ævar Arnfjörð Bjarmason Jan. 17, 2023, 5:11 p.m. UTC
Fix a memory leak that's been with us ever since c879daa2372 (Make
hash-object more robust against malformed objects, 2011-02-05). With
"HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
tags into a throwaway variable on the stack, but weren't freeing the
"item->tag" we might malloc() when doing so.

The clearing that release_tag_memory() does for us is redundant here,
but let's use it as-is anyway. It only has one other existing caller,
which does need the tag to be cleared.

Mark the tests that now pass in their entirety as passing under
"SANITIZE=leak", which means we'll test them as part of the
"linux-leaks" CI job.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c         | 1 +
 t/t3800-mktag.sh      | 1 +
 t/t5302-pack-index.sh | 2 ++
 3 files changed, 4 insertions(+)

Comments

René Scharfe Jan. 17, 2023, 7:58 p.m. UTC | #1
Am 17.01.23 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
> Fix a memory leak that's been with us ever since c879daa2372 (Make
> hash-object more robust against malformed objects, 2011-02-05). With
> "HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
> tags into a throwaway variable on the stack, but weren't freeing the
> "item->tag" we might malloc() when doing so.
>
> The clearing that release_tag_memory() does for us is redundant here,
> but let's use it as-is anyway. It only has one other existing caller,
> which does need the tag to be cleared.

Calling it is better than getting our hands dirty with tag internals
here.

There's similar leak in check_commit() in the same file, but plugging it
would require exporting unparse_commit().  Or perhaps using the heavy
hammer that is release_commit_memory()?  Anyway, it doesn't seem simple
to me, so that would be a patch for a separate series.

>
> Mark the tests that now pass in their entirety as passing under
> "SANITIZE=leak", which means we'll test them as part of the
> "linux-leaks" CI job.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  object-file.c         | 1 +
>  t/t3800-mktag.sh      | 1 +
>  t/t5302-pack-index.sh | 2 ++
>  3 files changed, 4 insertions(+)
>
> diff --git a/object-file.c b/object-file.c
> index 80a0cd3b351..b554266aff4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2324,6 +2324,7 @@ static void check_tag(const void *buf, size_t size)
>  	memset(&t, 0, sizeof(t));
>  	if (parse_tag_buffer(the_repository, &t, buf, size))
>  		die(_("corrupt tag"));
> +	release_tag_memory(&t);
>  }
>
>  static int index_mem(struct index_state *istate,
> diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
> index e3cf0ffbe59..d3e428ff46e 100755
> --- a/t/t3800-mktag.sh
> +++ b/t/t3800-mktag.sh
> @@ -4,6 +4,7 @@
>
>  test_description='git mktag: tag object verify test'
>
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  ###########################################################
> diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
> index b0095ab41d3..54b11f81c63 100755
> --- a/t/t5302-pack-index.sh
> +++ b/t/t5302-pack-index.sh
> @@ -4,6 +4,8 @@
>  #
>
>  test_description='pack index with 64-bit offsets and object CRC'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  test_expect_success 'setup' '
Jeff King Jan. 18, 2023, 5:19 p.m. UTC | #2
On Tue, Jan 17, 2023 at 08:58:24PM +0100, René Scharfe wrote:

> Am 17.01.23 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
> > Fix a memory leak that's been with us ever since c879daa2372 (Make
> > hash-object more robust against malformed objects, 2011-02-05). With
> > "HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
> > tags into a throwaway variable on the stack, but weren't freeing the
> > "item->tag" we might malloc() when doing so.
> >
> > The clearing that release_tag_memory() does for us is redundant here,
> > but let's use it as-is anyway. It only has one other existing caller,
> > which does need the tag to be cleared.
> 
> Calling it is better than getting our hands dirty with tag internals
> here.
> 
> There's similar leak in check_commit() in the same file, but plugging it
> would require exporting unparse_commit().  Or perhaps using the heavy
> hammer that is release_commit_memory()?  Anyway, it doesn't seem simple
> to me, so that would be a patch for a separate series.

I think both of these cases are a bit sketchy, because they create an
object struct on the stack, and not via alloc_*_node(), which may
violate assumptions elsewhere. In the case of the tag, it's probably OK.
For the commit, though, the "index" field is going to be 0, which may
confuse release_commit_memory(). It calls free_commit_buffer(), which is
going to use that index to try to free from the buffer slab.

So I'd be wary of calling that. I'm also slightly skeptical of the
existing code that calls parse_commit_buffer(), but I think it works. We
do not attach the buffer to the commit object there (good), and we pass
"0" for check_graph, so the graph code (which IIRC also uses the index
for some slab lookups) isn't run.

I think this code would be better off either:

  1. Just allocating an object struct in the usual way. I understand the
     desire not to spend extra memory, but parse_commit_buffer() is
     going to allocate structs under the hood for the tree and parent
     commits anyway.

  2. The point of this code is to find malformed input to hash-object.
     We're probably better off feeding the buffer to fsck_commit(), etc.
     It does more thorough checks, and these days it does not need an
     object struct at all.

Either of which would naturally fix the leak for tags. I'm not sure
there actually is a leak for commits, as commit structs don't store any
strings themselves.

I don't think any of that needs to hold up Ævar's current series,
though. Fixing the leak this way in the meantime is OK, and then if we
switch to one of the other techniques, the problem just goes away.

-Peff
René Scharfe Jan. 18, 2023, 6:05 p.m. UTC | #3
Am 18.01.23 um 18:19 schrieb Jeff King:
> On Tue, Jan 17, 2023 at 08:58:24PM +0100, René Scharfe wrote:
>
>> Am 17.01.23 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
>>> Fix a memory leak that's been with us ever since c879daa2372 (Make
>>> hash-object more robust against malformed objects, 2011-02-05). With
>>> "HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
>>> tags into a throwaway variable on the stack, but weren't freeing the
>>> "item->tag" we might malloc() when doing so.
>>>
>>> The clearing that release_tag_memory() does for us is redundant here,
>>> but let's use it as-is anyway. It only has one other existing caller,
>>> which does need the tag to be cleared.
>>
>> Calling it is better than getting our hands dirty with tag internals
>> here.
>>
>> There's similar leak in check_commit() in the same file, but plugging it
>> would require exporting unparse_commit().  Or perhaps using the heavy
>> hammer that is release_commit_memory()?  Anyway, it doesn't seem simple
>> to me, so that would be a patch for a separate series.
>
> I think both of these cases are a bit sketchy, because they create an
> object struct on the stack, and not via alloc_*_node(), which may
> violate assumptions elsewhere. In the case of the tag, it's probably OK.
> For the commit, though, the "index" field is going to be 0, which may
> confuse release_commit_memory(). It calls free_commit_buffer(), which is
> going to use that index to try to free from the buffer slab.
>
> So I'd be wary of calling that. I'm also slightly skeptical of the
> existing code that calls parse_commit_buffer(), but I think it works. We
> do not attach the buffer to the commit object there (good), and we pass
> "0" for check_graph, so the graph code (which IIRC also uses the index
> for some slab lookups) isn't run.
>
> I think this code would be better off either:
>
>   1. Just allocating an object struct in the usual way. I understand the
>      desire not to spend extra memory, but parse_commit_buffer() is
>      going to allocate structs under the hood for the tree and parent
>      commits anyway.
>
>   2. The point of this code is to find malformed input to hash-object.
>      We're probably better off feeding the buffer to fsck_commit(), etc.
>      It does more thorough checks, and these days it does not need an
>      object struct at all.

I like the second one, as long as it won't check too much.  c879daa237
(Make hash-object more robust against malformed objects, 2011-02-05) added
the checks that are now in object-file.c and intended to only validate the
internal structure of objects, not relations between.  It gave the example
to allow adding a commit before its tree, which should be allowed.  And
IIUC fsck_object() fits that bill.

> Either of which would naturally fix the leak for tags. I'm not sure
> there actually is a leak for commits, as commit structs don't store any
> strings themselves.

parse_commit_buffer() allocates the list of parents.

Hmm, and it looks them up.  Doesn't this violate the goal to allow
dangling references?

> I don't think any of that needs to hold up Ævar's current series,
> though. Fixing the leak this way in the meantime is OK, and then if we
> switch to one of the other techniques, the problem just goes away.

Right.

René
Jeff King Jan. 18, 2023, 6:39 p.m. UTC | #4
On Wed, Jan 18, 2023 at 07:05:32PM +0100, René Scharfe wrote:

> >   2. The point of this code is to find malformed input to hash-object.
> >      We're probably better off feeding the buffer to fsck_commit(), etc.
> >      It does more thorough checks, and these days it does not need an
> >      object struct at all.
> 
> I like the second one, as long as it won't check too much.  c879daa237
> (Make hash-object more robust against malformed objects, 2011-02-05) added
> the checks that are now in object-file.c and intended to only validate the
> internal structure of objects, not relations between.  It gave the example
> to allow adding a commit before its tree, which should be allowed.  And
> IIUC fsck_object() fits that bill.

Yes, I think it will do what the right thing here. And having just
written up a quick series, the only tests which needed changes were ones
with syntactic problems. :)

I'll send it out in a few minutes.

> > Either of which would naturally fix the leak for tags. I'm not sure
> > there actually is a leak for commits, as commit structs don't store any
> > strings themselves.
> 
> parse_commit_buffer() allocates the list of parents.

Yes, but it does so with lookup_commit(), so the resulting commit
objects are themselves reachable from the usual obj_hash, and thus not
leaked.

> Hmm, and it looks them up.  Doesn't this violate the goal to allow
> dangling references?

No, because lookup_commit() is just about creating an in-process struct.
It doesn't look at the object database at all (though it would complain
if we had seen the same oid in-process as another type).

-Peff
René Scharfe Jan. 18, 2023, 6:54 p.m. UTC | #5
Am 18.01.23 um 19:39 schrieb Jeff King:
> On Wed, Jan 18, 2023 at 07:05:32PM +0100, René Scharfe wrote:
>
>>>   2. The point of this code is to find malformed input to hash-object.
>>>      We're probably better off feeding the buffer to fsck_commit(), etc.
>>>      It does more thorough checks, and these days it does not need an
>>>      object struct at all.
>>
>> I like the second one, as long as it won't check too much.  c879daa237
>> (Make hash-object more robust against malformed objects, 2011-02-05) added
>> the checks that are now in object-file.c and intended to only validate the
>> internal structure of objects, not relations between.  It gave the example
>> to allow adding a commit before its tree, which should be allowed.  And
>> IIUC fsck_object() fits that bill.
>
> Yes, I think it will do what the right thing here. And having just
> written up a quick series, the only tests which needed changes were ones
> with syntactic problems. :)
>
> I'll send it out in a few minutes.

Great! :)

>>> Either of which would naturally fix the leak for tags. I'm not sure
>>> there actually is a leak for commits, as commit structs don't store any
>>> strings themselves.
>>
>> parse_commit_buffer() allocates the list of parents.
>
> Yes, but it does so with lookup_commit(), so the resulting commit
> objects are themselves reachable from the usual obj_hash, and thus not
> leaked.

The commit_list structures are leaked, no?

>
>> Hmm, and it looks them up.  Doesn't this violate the goal to allow
>> dangling references?
>
> No, because lookup_commit() is just about creating an in-process struct.
> It doesn't look at the object database at all (though it would complain
> if we had seen the same oid in-process as another type).

Ah, good.

René
Jeff King Jan. 18, 2023, 7:17 p.m. UTC | #6
On Wed, Jan 18, 2023 at 07:54:41PM +0100, René Scharfe wrote:

> > Yes, but it does so with lookup_commit(), so the resulting commit
> > objects are themselves reachable from the usual obj_hash, and thus not
> > leaked.
> 
> The commit_list structures are leaked, no?

Ah, yeah, you're right.

-Peff
diff mbox series

Patch

diff --git a/object-file.c b/object-file.c
index 80a0cd3b351..b554266aff4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2324,6 +2324,7 @@  static void check_tag(const void *buf, size_t size)
 	memset(&t, 0, sizeof(t));
 	if (parse_tag_buffer(the_repository, &t, buf, size))
 		die(_("corrupt tag"));
+	release_tag_memory(&t);
 }
 
 static int index_mem(struct index_state *istate,
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index e3cf0ffbe59..d3e428ff46e 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -4,6 +4,7 @@ 
 
 test_description='git mktag: tag object verify test'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 ###########################################################
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index b0095ab41d3..54b11f81c63 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -4,6 +4,8 @@ 
 #
 
 test_description='pack index with 64-bit offsets and object CRC'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '