[1/2] pack-bitmap: check preferred pack validity when opening MIDX bitmap

Message ID 06eca1fba9d2597906ec342c51ba2bb5c4fde0e4.1652458395.git.me@ttaylorr.com (mailing list archive)
State Superseded
Series pack-objects: fix a pair of MIDX bitmap-related races

Commit Message

Taylor Blau May 13, 2022, 4:23 p.m. UTC
When pack-objects adds an entry to its packing list, it marks the
packfile and offset containing the object, which we may later use during
verbatim reuse (cf. `write_reused_pack_verbatim()`).

If the packfile in question is deleted in the background (e.g., due to a
concurrent `git repack`), we'll die() as a result of calling use_pack().
4c08018204 (pack-objects: protect against disappearing packs,
2011-10-14) worked around this by opening the pack ahead of time before
recording it as a valid source for reuse.

4c08018204's treatment meant that we could tolerate disappearing packs,
since it ensures we always have an open file descriptor on any pack that
we mark as a valid source for reuse. This narrows the race so that it can
only happen when we need to close an open pack's file descriptor (cf. the
caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted,
in which case we'll complain that a pack could not be accessed and die().
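
The reason an already-open descriptor protects against a disappearing
pack can be sketched in isolation. The following is a minimal,
hypothetical demo and not git code: the function name
`read_survives_unlink()` and the temporary path are made up for
illustration, and it assumes a POSIX system with a writable /tmp. It
unlinks a file while holding a descriptor on it, confirms that a fresh
open() by name fails, and then reads the contents back through the
descriptor it already holds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical demo (not git code): returns 0 if data written to a file
 * can still be read through an already-open descriptor after the file
 * has been unlinked, and -1 on any failure.
 */
static int read_survives_unlink(void)
{
	char path[] = "/tmp/pack-demo-XXXXXX";
	const char payload[] = "PACK";
	const size_t len = sizeof(payload) - 1;
	char buf[sizeof(payload)] = { 0 };
	int reopened, ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	assert(sizeof(buf) > len); /* buf must be able to hold the payload */

	if (write(fd, payload, len) != (ssize_t)len) {
		unlink(path);
		goto out;
	}

	/* Simulate a concurrent repack deleting the pack. */
	if (unlink(path) < 0)
		goto out;

	/* The name is gone, so opening it again by path must fail... */
	reopened = open(path, O_RDONLY);
	if (reopened >= 0) {
		close(reopened);
		goto out;
	}

	/* ...but the descriptor we already hold still reaches the data. */
	if (pread(fd, buf, len, 0) != (ssize_t)len)
		goto out;
	ret = memcmp(buf, payload, len) ? -1 : 0;
out:
	close(fd);
	return ret;
}
```

Calling `read_survives_unlink()` from a small main() and checking for a
zero return is enough to see the behavior. The remaining race described
above corresponds to the case where the descriptor itself has to be
given up, at which point only the (now missing) path is left.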

The pack bitmap code does this, too, since prior to bab919bd44
(pack-bitmap: check pack validity when opening bitmap, 2015-03-26) it
was vulnerable to the same race.

The MIDX bitmap code does not do this, and is vulnerable to the same
race. Apply the same treatment as bab919bd44 to the routine responsible
for opening multi-pack bitmaps to close this race.

Similar to bab919bd44, we could technically just add this check in
reuse_partial_packfile_from_bitmap(), since it's technically possible to
use a MIDX .bitmap without needing to open any of its packs. But it's
simpler to do the check as early as possible, covering all direct uses
of the preferred pack. Note that doing this check early requires us to
call prepare_midx_pack() early, too, so move the relevant part of that
loop from load_reverse_index() into open_midx_bitmap_1().

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Junio C Hamano May 13, 2022, 6:19 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> The pack bitmap code does this, too, since prior to bab919bd44
> (pack-bitmap: check pack validity when opening bitmap, 2015-03-26) it
> was vulnerable to the same race.

That might be a GitHub internal reference to some other commit?
dc1daacd (pack-bitmap: check pack validity when opening bitmap,
2021-07-23) is what I found.

> The MIDX bitmap code does not do this, and is vulnerable to the same
> race. Apply the same treatment as bab919bd44 to the routine responsible
> for opening multi-pack bitmaps to close this race.

Same reference here and ...

> Similar to bab919bd44, we could technically just add this check in

... here.  But the solution in dc1daacd is quite different from what
we see here in the posted patch, so perhaps you are referring to
something different.  I dunno.

The call graph around the functions involved is

  prepare_midx_bitmap_git()
   -> open_midx_bitmap_1()
      * opens, mmaps and closes bitmap file
      -> load_midx_revindex()
   -> load_bitmap()
      -> load_reverse_index()
         -> prepare_midx_pack()
         -> load_pack_revindex()

And prepare_midx_pack() for these packs is moved from
load_reverse_index() to open_midx_bitmap_1() in this patch.

In addition, after doing so, we call is_pack_valid() on the single
preferred pack and return failure.

Because load_bitmap() or load_reverse_index() cannot be done before
you do open_midx_bitmap_1(), calling prepare_midx_pack() early will
end up calling add_packed_git() on underlying packs, allowing us to
access them even when somebody else removed them from the disk?  Is
that the idea?

> reuse_partial_packfile_from_bitmap(), since it's technically possible to
> use a MIDX .bitmap without needing to open any of its packs. But it's
> simpler to do the check as early as possible, covering all direct uses
> of the preferred pack. Note that doing this check early requires us to
> call prepare_midx_pack() early, too, so move the relevant part of that
> loop from load_reverse_index() into open_midx_bitmap_1().

OK.  That matches my observation above, I guess.  I do not quite get
why it is sufficient to check only the preferred one, though.

> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  pack-bitmap.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 97909d48da..6b1a43d99c 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -315,6 +315,8 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
>  	struct stat st;
>  	char *idx_name = midx_bitmap_filename(midx);
>  	int fd = git_open(idx_name);
> +	uint32_t i;
> +	struct packed_git *preferred;
>  
>  	free(idx_name);
>  
> @@ -353,6 +355,21 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
>  		warning(_("multi-pack bitmap is missing required reverse index"));
>  		goto cleanup;
>  	}
> +
> +	for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> +		if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> +			die(_("could not open pack %s"),
> +			    bitmap_git->midx->pack_names[i]);
> +	}
> +
> +	preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
> +	if (!is_pack_valid(preferred)) {
> +		close(fd);

This close() does not look correct.  After calling xmmap() to map
the bitmap file to bitmap_git->map, we do not need the underlying
file descriptor in order to use the contents of the file.  We have
closed it already at this point.
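
The point about the mapping outliving its descriptor can be checked
outside of git. What follows is a minimal sketch under stated
assumptions, not the code in pack-bitmap.c: it uses plain mmap() rather
than git's xmmap() wrapper, the function name
`mapping_survives_close()` is hypothetical, and it assumes a POSIX
system with a writable /tmp. It maps a file, closes the descriptor,
reads through the mapping, and also shows that closing the same
descriptor a second time fails with EBADF.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo (not git code): returns 0 if a read-only mapping
 * remains readable after its file descriptor has been closed and a
 * second close() on that descriptor fails with EBADF; -1 otherwise.
 */
static int mapping_survives_close(void)
{
	char path[] = "/tmp/bitmap-demo-XXXXXX";
	const char payload[] = "BITM";
	const size_t len = sizeof(payload) - 1;
	void *map;
	int ret;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path); /* temp-file cleanup; the descriptor keeps the data alive */

	if (write(fd, payload, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}

	assert(len > 0); /* mmap() rejects zero-length mappings */
	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping does not depend on the descriptor */
	if (map == MAP_FAILED)
		return -1;

	/* A second close() on the same number is an error, not a no-op. */
	errno = 0;
	if (close(fd) != -1 || errno != EBADF) {
		munmap(map, len);
		return -1;
	}

	ret = memcmp(map, payload, len) ? -1 : 0;
	munmap(map, len);
	return ret;
}
```

In a single-threaded demo the second close() merely fails. In a larger
program the descriptor number may have been reused by then, in which
case the stray close() would shut an unrelated file.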

> +		warning(_("preferred pack (%s) is invalid"),
> +			preferred->pack_name);
> +		goto cleanup;
> +	}
> +
>  	return 0;
>  
>  cleanup:
> @@ -429,8 +446,6 @@ static int load_reverse_index(struct bitmap_index *bitmap_git)
>  		 * since we will need to make use of them in pack-objects.
>  		 */
>  		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> -			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> -				die(_("load_reverse_index: could not open pack"));
>  			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
>  			if (ret)
>  				return ret;
Taylor Blau May 13, 2022, 7:55 p.m. UTC | #2
On Fri, May 13, 2022 at 11:19:05AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > The pack bitmap code does this, too, since prior to bab919bd44
> > (pack-bitmap: check pack validity when opening bitmap, 2015-03-26) it
> > was vulnerable to the same race.
>
> That might be a GitHub internal reference to some other commit?
> dc1daacd (pack-bitmap: check pack validity when opening bitmap,
> 2021-07-23) is what I found.

Oops. dc1daacdcc is the right reference (it's the version of our
bab919bd44 that got submitted upstream).

> > Similar to bab919bd44, we could technically just add this check in
>
> ... here.  But the solution in dc1daacd is quite different from what
> we see here in the posted patch, so perhaps you are referring to
> something different.  I dunno.

They are similar. Both dc1daacdcc and this patch ensure that the pack
we're going to do verbatim reuse from (i.e., the one that gets passed to
`reuse_partial_packfile_from_bitmap()`) has an open handle.

In the case of a pack bitmap, there is only one pack to choose from (the
pack corresponding to the bitmap itself). In the case of a multi-pack
bitmap, the preferred pack is the one we choose, since it is the only
pack among those in the MIDX that we attempt verbatim reuse out of.
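
The "validate only the pack we reuse from" idea can be modeled with a
few lines of standalone C. This is a simplified sketch and not the
patch itself: `struct demo_pack`, `demo_pack_valid()` and
`demo_open_midx_bitmap()` are hypothetical stand-ins, the validity check
is only loosely modeled on is_pack_valid() (an already-open pack counts
as valid, otherwise an open is attempted), and opening is simulated with
a flag instead of touching the filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for git's struct packed_git. */
struct demo_pack {
	const char *name;
	int fd;      /* -1 until the pack has been opened */
	int on_disk; /* 0 simulates a pack removed by a concurrent repack */
};

/*
 * Loosely modeled on is_pack_valid(): a pack that is already open is
 * valid; otherwise try to open it now and report whether that worked.
 */
static int demo_pack_valid(struct demo_pack *p)
{
	if (p->fd >= 0)
		return 1;
	if (!p->on_disk)
		return 0;
	p->fd = 3; /* pretend open() succeeded and we now hold a handle */
	return 1;
}

/*
 * Refuse to use the bitmap unless the preferred pack, the only one used
 * for verbatim reuse, can be opened up front. Other packs are not checked.
 */
static int demo_open_midx_bitmap(struct demo_pack *packs, size_t nr,
				 size_t preferred)
{
	assert(preferred < nr);
	if (!demo_pack_valid(&packs[preferred])) {
		fprintf(stderr, "warning: preferred pack (%s) is invalid\n",
			packs[preferred].name);
		return -1;
	}
	return 0;
}
```

With two packs where only the first is still on disk, choosing index 0
as preferred succeeds and leaves that pack open, so a later simulated
deletion (setting `on_disk` to 0) no longer affects
`demo_pack_valid()`. Choosing index 1 as preferred fails early instead
of failing later during reuse.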

> The call graph around the functions involved is
>
>   prepare_midx_bitmap_git()
>    -> open_midx_bitmap_1()
>       * opens, mmaps and closes bitmap file
>       -> load_midx_revindex()
>    -> load_bitmap()
>       -> load_reverse_index()
>          -> prepare_midx_pack()
>          -> load_pack_revindex()
>
> And prepare_midx_pack() for these packs is moved from
> load_reverse_index() to open_midx_bitmap_1() in this patch.
>
> In addition, after doing so, we call is_pack_valid() on the single
> preferred pack and return failure.
>
> Because load_bitmap() or load_reverse_index() cannot be done before
> you do open_midx_bitmap_1(), calling prepare_midx_pack() early will
> end up calling add_packed_git() on underlying packs, allowing us to
> access them even when somebody else removed them from the disk?  Is
> that the idea?

Yes, exactly. It's similar to dc1daacdcc in that we need to have an open
handle on the packfile itself in order to reuse chunks of it verbatim.
Having the bitmap open signals pack-objects to say "it is OK to call
reuse_partial_packfile_from_bitmap()", but if that function tries to
open the packfile because it wasn't already opened like above, and in
the meantime it went away, we'll end up in the "cannot be accessed"
scenario.

> > reuse_partial_packfile_from_bitmap(), since it's technically possible to
> > use a MIDX .bitmap without needing to open any of its packs. But it's
> > simpler to do the check as early as possible, covering all direct uses
> > of the preferred pack. Note that doing this check early requires us to
> > call prepare_midx_pack() early, too, so move the relevant part of that
> > loop from load_reverse_index() into open_midx_bitmap_1().
>
> OK.  That matches my observation above, I guess.  I do not quite get
> why it is sufficient to check only the preferred one, though.

Only the preferred pack is the subject of verbatim reuse. We could open
all packs, but see my note in the patch message for why I don't think
that's a great idea.

The subsequent patch more aggressively opens packs handed to us via
traverse_bitmap_commit_list().

> > @@ -353,6 +355,21 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> >  		warning(_("multi-pack bitmap is missing required reverse index"));
> >  		goto cleanup;
> >  	}
> > +
> > +	for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> > +		if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> > +			die(_("could not open pack %s"),
> > +			    bitmap_git->midx->pack_names[i]);
> > +	}
> > +
> > +	preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
> > +	if (!is_pack_valid(preferred)) {
> > +		close(fd);
>
> This close() does not look correct.  After calling xmmap() to map
> the bitmap file to bitmap_git->map, we do not need the underlying
> file descriptor in order to use the contents of the file.  We have
> closed it already at this point.

Definitely a mistake, thanks for catching. Will remove it and send
another version after there is some review on the second patch.

Thanks in the meantime for giving it a look over!

Thanks,
Taylor

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 97909d48da..6b1a43d99c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -315,6 +315,8 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	struct stat st;
 	char *idx_name = midx_bitmap_filename(midx);
 	int fd = git_open(idx_name);
+	uint32_t i;
+	struct packed_git *preferred;
 
 	free(idx_name);
 
@@ -353,6 +355,21 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 		warning(_("multi-pack bitmap is missing required reverse index"));
 		goto cleanup;
 	}
+
+	for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+		if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+			die(_("could not open pack %s"),
+			    bitmap_git->midx->pack_names[i]);
+	}
+
+	preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
+	if (!is_pack_valid(preferred)) {
+		close(fd);
+		warning(_("preferred pack (%s) is invalid"),
+			preferred->pack_name);
+		goto cleanup;
+	}
+
 	return 0;
 
 cleanup:
@@ -429,8 +446,6 @@ static int load_reverse_index(struct bitmap_index *bitmap_git)
 		 * since we will need to make use of them in pack-objects.
 		 */
 		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
-			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
-				die(_("load_reverse_index: could not open pack"));
 			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
 			if (ret)
 				return ret;