[07/28] commit: avoid leaking already-saved buffer

Message ID	20240924215434.GG1143820@coredump.intra.peff.net (mailing list archive)
State	Accepted
Commit	753f6708d0a49cf56957f13d94d0d4919edd74b5
Headers	show Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 794DE146A6B for <git@vger.kernel.org>; Tue, 24 Sep 2024 21:54:36 +0000 (UTC) Date: Tue, 24 Sep 2024 17:54:34 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Cc: Patrick Steinhardt <ps@pks.im> Subject: [PATCH 07/28] commit: avoid leaking already-saved buffer Message-ID: <20240924215434.GG1143820@coredump.intra.peff.net> References: <20240924214930.GA1143523@coredump.intra.peff.net> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20240924214930.GA1143523@coredump.intra.peff.net>
Series	leak fixes for http fetch/push \| expand [0/28] leak fixes for http fetch/push [01/28] http-fetch: clear leaking git-index-pack(1) arguments [02/28] shallow: fix leak when unregistering last shallow root [03/28] fetch-pack: fix leaking sought refs [04/28] connect: clear child process before freeing in diagnostic mode [05/28] fetch-pack: free object filter before exiting [06/28] fetch-pack, send-pack: clean up shallow oid array [07/28] commit: avoid leaking already-saved buffer [08/28] send-pack: free cas options before exit [09/28] transport-helper: fix strbuf leak in push_refs_with_push() [10/28] fetch: free "raw" string when shrinking refspec [11/28] fetch-pack: clear pack lockfiles list [12/28] transport-helper: fix leak of dummy refs_list [13/28] http: fix leak when redacting cookies from curl trace [14/28] http: fix leak of http_object_request struct [15/28] http: call git_inflate_end() when releasing http_object_request [16/28] http: stop leaking buffer in http_get_info_packs() [17/28] remote-curl: free HEAD ref with free_one_ref() [18/28] http-walker: free fake packed_git list [19/28] http-push: clear refspecs before exiting [20/28] http-push: free repo->url string [21/28] http-push: free curl header lists [22/28] http-push: free transfer_request dest field [23/28] http-push: free transfer_request strbuf [24/28] http-push: free remote_ls_ctx.dentry_name [25/28] http-push: free xml_ctx.cdata after use [26/28] http-push: clean up objects list [27/28] http-push: clean up loose request when falling back to packed [28/28] http-push: clean up local_refs at exit

Message ID

20240924215434.GG1143820@coredump.intra.peff.net (mailing list archive)

State

Accepted

Commit

753f6708d0a49cf56957f13d94d0d4919edd74b5

Headers

Date: Tue, 24 Sep 2024 17:54:34 -0400
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Subject: [PATCH 07/28] commit: avoid leaking already-saved buffer
Message-ID: <20240924215434.GG1143820@coredump.intra.peff.net>
References: <20240924214930.GA1143523@coredump.intra.peff.net>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20240924214930.GA1143523@coredump.intra.peff.net>

Series

leak fixes for http fetch/push | expand

Commit Message

Jeff King Sept. 24, 2024, 9:54 p.m. UTC

When we parse a commit via repo_parse_commit_internal(), if
save_commit_buffer is set we'll stuff the buffer of the object contents
into a cache, overwriting any previous value.

This can result in a leak of that previously cached value, though it's
rare in practice. If we have a value in the cache it would have come
from a previous parse, and during that parse we'd set the object.parsed
flag, causing any subsequent parse attempts to exit without doing any
work.

But it's possible to "unparse" a commit, which we do when registering a
commit graft. And since shallow fetches are implemented using grafts,
the leak is triggered in practice by t5539.

There are a number of possible ways to address this:

  1. the unparsing function could clear the cached commit buffer, too. I
     think this would work for the case I found, but I'm not sure if
     there are other ways to end up in the same state (an unparsed
     commit with an entry in the commit buffer cache).

  2. when we parse, we could check the buffer cache and prefer it to
     reading the contents from the object database. In theory the
     contents of a particular sha1 are immutable, but the code in
     question is violating the immutability with grafts. So this
     approach makes me a bit nervous, although I think it would work in
     practice (the grafts are applied to what we parse, but we still
     retain the original contents).

  3. We could realize the cache is already populated and discard its
     contents before overwriting. It's possible some other code could be
     holding on to a pointer to the old cache entry (and we'd introduce
     a use-after-free), but I think the risk of that is relatively low.

  4. The reverse of (3): when the cache is populated, don't bother
     saving our new copy. This is perhaps a little weird, since we'll
     have just populated the commit struct based on a different buffer.
     But the two buffers should be the same, even in the presence of
     grafts (as in (2) above).

I went with option 4. It addresses the leak directly and doesn't carry
any risk of breaking other assumptions. And it's the same technique used
by parse_object_buffer() for this situation, though I'm not sure when it
would even come up there. The extra safety has been there since
bd1e17e245 (Make "parse_object()" also fill in commit message buffer
data., 2005-05-25).

This lets us mark t5539 as leak-free.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c                      | 3 ++-
 t/t5539-fetch-http-shallow.sh | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt Sept. 26, 2024, 1:50 p.m. UTC | #1

On Tue, Sep 24, 2024 at 05:54:34PM -0400, Jeff King wrote:
> When we parse a commit via repo_parse_commit_internal(), if
> save_commit_buffer is set we'll stuff the buffer of the object contents
> into a cache, overwriting any previous value.

Interesting. I saw some cases that I think could be caused by this, but
couldn't make much sense of them.

> This can result in a leak of that previously cached value, though it's
> rare in practice. If we have a value in the cache it would have come
> from a previous parse, and during that parse we'd set the object.parsed
> flag, causing any subsequent parse attempts to exit without doing any
> work.
> 
> But it's possible to "unparse" a commit, which we do when registering a
> commit graft. And since shallow fetches are implemented using grafts,
> the leak is triggered in practice by t5539.
> 
> There are a number of possible ways to address this:
> 
>   1. the unparsing function could clear the cached commit buffer, too. I

s/the/The/

>      think this would work for the case I found, but I'm not sure if
>      there are other ways to end up in the same state (an unparsed
>      commit with an entry in the commit buffer cache).
> 
>   2. when we parse, we could check the buffer cache and prefer it to

s/when/When/

>      reading the contents from the object database. In theory the
>      contents of a particular sha1 are immutable, but the code in
>      question is violating the immutability with grafts. So this
>      approach makes me a bit nervous, although I think it would work in
>      practice (the grafts are applied to what we parse, but we still
>      retain the original contents).
> 
>   3. We could realize the cache is already populated and discard its
>      contents before overwriting. It's possible some other code could be
>      holding on to a pointer to the old cache entry (and we'd introduce
>      a use-after-free), but I think the risk of that is relatively low.
> 
>   4. The reverse of (3): when the cache is populated, don't bother
>      saving our new copy. This is perhaps a little weird, since we'll
>      have just populated the commit struct based on a different buffer.
>      But the two buffers should be the same, even in the presence of
>      grafts (as in (2) above).
> 
> I went with option 4. It addresses the leak directly and doesn't carry
> any risk of breaking other assumptions. And it's the same technique used
> by parse_object_buffer() for this situation, though I'm not sure when it
> would even come up there. The extra safety has been there since
> bd1e17e245 (Make "parse_object()" also fill in commit message buffer
> data., 2005-05-25).

Okay. This feels a bit weird indeed, but the fact that we already use
the same strategy in other places makes me feel way safer.

> This lets us mark t5539 as leak-free.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit.c                      | 3 ++-
>  t/t5539-fetch-http-shallow.sh | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/commit.c b/commit.c
> index 3a54e4db0d..cc03a93036 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -595,7 +595,8 @@ int repo_parse_commit_internal(struct repository *r,
>  	}
>  
>  	ret = parse_commit_buffer(r, item, buffer, size, 0);
> -	if (save_commit_buffer && !ret) {
> +	if (save_commit_buffer && !ret &&
> +	    !get_cached_commit_buffer(r, item, NULL)) {
>  		set_commit_buffer(r, item, buffer, size);
>  		return 0;
>  	}

And the fix is trivial.

Patrick

diff --git a/commit.c b/commit.c
index 3a54e4db0d..cc03a93036 100644
--- a/commit.c
+++ b/commit.c
@@ -595,7 +595,8 @@  int repo_parse_commit_internal(struct repository *r,
 	}
 
 	ret = parse_commit_buffer(r, item, buffer, size, 0);
-	if (save_commit_buffer && !ret) {
+	if (save_commit_buffer && !ret &&
+	    !get_cached_commit_buffer(r, item, NULL)) {
 		set_commit_buffer(r, item, buffer, size);
 		return 0;
 	}
diff --git a/t/t5539-fetch-http-shallow.sh b/t/t5539-fetch-http-shallow.sh
index 3ea75d34ca..82fe09d0a9 100755
--- a/t/t5539-fetch-http-shallow.sh
+++ b/t/t5539-fetch-http-shallow.sh
@@ -5,6 +5,7 @@  test_description='fetch/clone from a shallow clone over http'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd

[07/28] commit: avoid leaking already-saved buffer

Commit Message

Comments

Patch