diff mbox series

bundle: avoid closing file descriptor twice

Message ID pull.1857.git.1737849456338.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 9a84794ad8ad1bc8ec6b2c4e1592a1f63765e753
Headers show
Series bundle: avoid closing file descriptor twice | expand

Commit Message

Johannes Schindelin Jan. 25, 2025, 11:57 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, the file
descriptor (`data->fd`) is passed to `unbundle()`, which would use it as
the `stdin` of the `index-pack` process, which as a consequence would
close it via `start_command()`. However, `data->fd` would still hold the
numerical value of the file descriptor, and `close_bundle()` would see
that and happily close it again.

This seems not to have caused too many problems in almost two decades,
but I encountered a situation today where it _does_ cause problems: In
i686 variants of Git for Windows, it seems that file descriptors are
reused quickly after they have been closed.

In the particular scenario I faced, `git fetch <bundle> <ref>` gets the
same file descriptor value when opening the bundle file and importing
its embedded packfile (which implicitly closes the file descriptor) and
then when opening a pack file in `fetch_and_consume_refs()` while
looking up an object's header.

Later on, after the bundle has been imported (and the `close_bundle()`
function erroneously closes the file descriptor that has _already_ been
closed when using it as `stdin` for `git index-pack`), the same file
descriptor value has now been reused via `use_pack()`. Now, when either
the recursive fetch (which defaults to "on", unfortunately) or a
commit-graph update needs to `mmap()` the packfile, it fails due to a
now-invalid file descriptor that _should_ point to the pack file but
doesn't anymore.

To fix that, let's invalidate `data->fd` after calling `unbundle()`.
That way, `close_bundle()` does not close a file descriptor that may
have been reused for something different. While at it, document that
`unbundle()` closes the file descriptor, and ensure that it also does
that when failing to verify the bundle.

Luckily, this bug does not affect the bundle URI feature, it only
affects the `git fetch <bundle>` code path.

Note that this patch does not _completely_ clarifies who is responsible
to close that file descriptor, as `run_command()` may fail _without_
closing `cmd->in`. Addressing this issue thoroughly, however, would
require a rather thorough re-design of the `start_command()` and
`finish_command()` functionality to make it a lot less murky who is
responsible for what file descriptors.

At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    bundle: avoid closing file descriptor twice
    
    This is a really old bug. Deviating from Git's practice, I did not apply
    this on top of the bugged commit. In fact, I did not even apply this on
    top of an older maint-* branch because it would cause conflicts even
    cherry-picking it on top of maint-2.47.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1857%2Fdscho%2Ffix-fetch-bundle-double-close-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1857/dscho/fix-fetch-bundle-double-close-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1857

 bundle.c    | 4 +++-
 bundle.h    | 2 ++
 transport.c | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)


base-commit: f93ff170b93a1782659637824b25923245ac9dd1

Comments

Junio C Hamano Jan. 26, 2025, 2:49 a.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> This seems not to have caused too many problems in almost two decades,
> but I encountered a situation today where it _does_ cause problems: In
> i686 variants of Git for Windows, it seems that file descriptors are
> reused quickly after they have been closed.

Nice finding.  It pays not to be in too uniform a monoculture.

>     This is a really old bug. Deviating from Git's practice, I did not apply
>     this on top of the bugged commit. In fact, I did not even apply this on
>     top of an older maint-* branch because it would cause conflicts even
>     cherry-picking it on top of maint-2.47.

Just for fun, attached is my backport to maint-2.44 track.  It
merges, with conflict which is rather trivial to resolve, back to
v2.48.1, on which your original patch is based, and gives the
merge result identical to the result of direct application of your
patch to v2.48.1.

I am not sure what to feel about the FD leak mentioned in the log
message, though.

Thanks, will queue.


--- >8 ---
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Sat, 25 Jan 2025 23:57:36 +0000
Subject: [PATCH] bundle: avoid closing file descriptor twice

Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, ...

... At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 bundle.c    | 4 +++-
 bundle.h    | 2 ++
 transport.c | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

Obviously this is hard to test ;-)

diff --git a/bundle.c b/bundle.c
index a9744da255..7b3f2da40f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -616,8 +616,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
 
-	if (verify_bundle(r, header, flags))
+	if (verify_bundle(r, header, flags)) {
+		close(bundle_fd);
 		return -1;
+	}
 
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
diff --git a/bundle.h b/bundle.h
index 021adbdcbb..8b187be7cf 100644
--- a/bundle.h
+++ b/bundle.h
@@ -50,6 +50,8 @@ int verify_bundle(struct repository *r, struct bundle_header *header,
  *
  * Before unbundling, this method will call verify_bundle() with the
  * given 'flags'.
+ *
+ * Note that the `bundle_fd` will be closed as part of the operation.
  */
 int unbundle(struct repository *r, struct bundle_header *header,
 	     int bundle_fd, struct strvec *extra_index_pack_args,
diff --git a/transport.c b/transport.c
index df518ead70..25e2233da0 100644
--- a/transport.c
+++ b/transport.c
@@ -184,6 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
 		       &extra_index_pack_args, 0);
+	data->fd = -1; /* `unbundle()` closes the file descriptor */
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
diff mbox series

Patch

diff --git a/bundle.c b/bundle.c
index f18f98fec9c..d7ad6908433 100644
--- a/bundle.c
+++ b/bundle.c
@@ -607,8 +607,10 @@  int unbundle(struct repository *r, struct bundle_header *header,
 	if (!opts)
 		opts = &opts_fallback;
 
-	if (verify_bundle(r, header, opts->flags))
+	if (verify_bundle(r, header, opts->flags)) {
+		close(bundle_fd);
 		return -1;
+	}
 
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
diff --git a/bundle.h b/bundle.h
index a80aa8ad9bf..d664b2f2d61 100644
--- a/bundle.h
+++ b/bundle.h
@@ -62,6 +62,8 @@  struct unbundle_opts {
  *
  * Before unbundling, this method will call verify_bundle() with 'flags'
  * provided in 'opts'.
+ *
+ * Note that the `bundle_fd` will be closed as part of the operation.
  */
 int unbundle(struct repository *r, struct bundle_header *header,
 	     int bundle_fd, struct strvec *extra_index_pack_args,
diff --git a/transport.c b/transport.c
index 10d820c3335..eb808f1e4b7 100644
--- a/transport.c
+++ b/transport.c
@@ -207,6 +207,7 @@  static int fetch_refs_from_bundle(struct transport *transport,
 
 	ret = unbundle(the_repository, &data->header, data->fd,
 		       &extra_index_pack_args, &opts);
+	data->fd = -1; /* `unbundle()` closes the file descriptor */
 	transport->hash_algo = data->header.hash_algo;
 
 	strvec_clear(&extra_index_pack_args);