mbox series

[v2,0/4] pack-objects: fix a pair of MIDX bitmap-related races

Message ID cover.1653418457.git.me@ttaylorr.com (mailing list archive)
Headers show
Series pack-objects: fix a pair of MIDX bitmap-related races | expand

Message

Taylor Blau May 24, 2022, 6:54 p.m. UTC
This is a small-ish reroll of mine and Victoria's series to fix a couple of
races related to using multi-pack bitmaps in pack-objects.

The crux of both is that we call `is_pack_valid()` far too late, leaving us in a
situation where `pack-objects` committed to using objects from a specific pack
in the MIDX bitmap, without having actually opened those packs. So if those
packs are removed in the background (e.g., due to a simultaneous repack),
any ongoing clones or fetches will see this error message:

    remote: Enumerating objects: 1498802, done.
    remote: fatal: packfile ./objects/pack/pack-$HASH.pack cannot be accessed
    remote: aborting due to possible repository corruption on the remote side.

The first patch is mostly unchanged (except for removing an accidental
double-close()), but the second patch has itself turned into three new patches
in order to resolve the issue described in [1].

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/Yn+v8mEHm2sfo4sn@nand.local/

Taylor Blau (4):
  pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
  builtin/pack-objects.c: avoid redundant NULL check
  builtin/pack-objects.c: ensure included `--stdin-packs` exist
  builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects

 builtin/pack-objects.c | 43 +++++++++++++++++++++++++-----------------
 pack-bitmap.c          | 18 ++++++++++++++++--
 2 files changed, 42 insertions(+), 19 deletions(-)

Range-diff against v1:
1:  83e2ad3962 ! 1:  618e8a6166 pack-bitmap: check preferred pack validity when opening MIDX bitmap
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    pack-bitmap: check preferred pack validity when opening MIDX bitmap
    +    pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
     
         When pack-objects adds an entry to its packing list, it marks the
         packfile and offset containing the object, which we may later use during
         verbatim reuse (c.f., `write_reused_pack_verbatim()`).
     
         If the packfile in question is deleted in the background (e.g., due to a
    -    concurrent `git repack`), we'll die() as a result of calling use_pack().
    -    4c08018204 (pack-objects: protect against disappearing packs,
    -    2011-10-14) worked around this by opening the pack ahead of time before
    -    recording it as a valid source for reuse.
    +    concurrent `git repack`), we'll die() as a result of calling use_pack(),
    +    unless we have an open file descriptor on the pack itself. 4c08018204
    +    (pack-objects: protect against disappearing packs, 2011-10-14) worked
    +    around this by opening the pack ahead of time before recording it as a
    +    valid source for reuse.
     
         4c08018204's treatment meant that we could tolerate disappearing packs,
    -    since it ensures we always have an open file descriptor any pack that we
    -    mark as a valid source for reuse. This tightens the race to only happen
    -    when we need to close an open pack's file descriptor (c.f., the caller
    -    of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, in
    -    which case we'll complain that a pack could not be accessed and die().
    +    since it ensures we always have an open file descriptor on any pack that
    +    we mark as a valid source for reuse. This tightens the race to only
    +    happen when we need to close an open pack's file descriptor (c.f., the
    +    caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted,
    +    in which case we'll complain that a pack could not be accessed and
    +    die().
     
    -    The pack bitmap code does this, too, since prior to bab919bd44
    -    (pack-bitmap: check pack validity when opening bitmap, 2015-03-26) it
    +    The pack bitmap code does this, too, since prior to dc1daacdcc
    +    (pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it
         was vulnerable to the same race.
     
         The MIDX bitmap code does not do this, and is vulnerable to the same
    -    race. Apply the same treatment as bab919bd44 to the routine responsible
    -    for opening multi-pack bitmaps to close this race.
    +    race. Apply the same treatment as dc1daacdcc to the routine responsible
    +    for opening the multi-pack bitmap's preferred pack to close this race.
     
    -    Similar to bab919bd44, we could technically just add this check in
    -    reuse_partial_packfile_from_bitmap(), since it's technically possible to
    -    use a MIDX .bitmap without needing to open any of its packs. But it's
    -    simpler to do the check as early as possible, covering all direct uses
    -    of the preferred pack. Note that doing this check early requires us to
    -    call prepare_midx_pack() early, too, so move the relevant part of that
    -    loop from load_reverse_index() into open_midx_bitmap_1().
    +    This patch handles the "preferred" pack (c.f., the section
    +    "multi-pack-index reverse indexes" in
    +    Documentation/technical/pack-format.txt) specially, since pack-objects
    +    depends on reusing exact chunks of that pack verbatim in
    +    reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded,
    +    the utility of a bitmap is significantly diminished.
    +
    +    Similar to dc1daacdcc, we could technically just add this check in
    +    reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX
    +    .bitmap without needing to open any of its packs. But it's simpler to do
    +    the check as early as possible, covering all direct uses of the
    +    preferred pack. Note that doing this check early requires us to call
    +    prepare_midx_pack() early, too, so move the relevant part of that loop
    +    from load_reverse_index() into open_midx_bitmap_1().
    +
    +    Subsequent patches handle the non-preferred packs in a slightly
    +    different fashion.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
     +
     +	preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
     +	if (!is_pack_valid(preferred)) {
    -+		close(fd);
     +		warning(_("preferred pack (%s) is invalid"),
     +			preferred->pack_name);
     +		goto cleanup;
2:  9adf6e1341 < -:  ---------- builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
-:  ---------- > 2:  2719d33f32 builtin/pack-objects.c: avoid redundant NULL check
-:  ---------- > 3:  cdc3265ec2 builtin/pack-objects.c: ensure included `--stdin-packs` exist
-:  ---------- > 4:  3fc3a95517 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects

Comments

Junio C Hamano May 24, 2022, 9:38 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

>          verbatim reuse (c.f., `write_reused_pack_verbatim()`).

Unlike "e.g." and "i.e.", I think these should all be "cf." (there
are many others).

>     +    This patch handles the "preferred" pack (c.f., the section
>     +    "multi-pack-index reverse indexes" in
>     +    Documentation/technical/pack-format.txt) specially, since pack-objects
>     +    depends on reusing exact chunks of that pack verbatim in
>     +    reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded,
>     +    the utility of a bitmap is significantly diminished.

It explains why we care about the "preferred" pack, which is a nice
clarification.  It hints that the other packs do not matter as much,
and it is clearly stated that how they are handled is ...

>     +    Similar to dc1daacdcc, we could technically just add this check in
>     +    reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX
>     +    .bitmap without needing to open any of its packs. But it's simpler to do
>     +    the check as early as possible, covering all direct uses of the
>     +    preferred pack. Note that doing this check early requires us to call
>     +    prepare_midx_pack() early, too, so move the relevant part of that loop
>     +    from load_reverse_index() into open_midx_bitmap_1().
>     +
>     +    Subsequent patches handle the non-preferred packs in a slightly
>     +    different fashion.

... left for later steps.

Excellent write-up.

>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>     @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
>      +
>      +	preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
>      +	if (!is_pack_valid(preferred)) {
>     -+		close(fd);
>      +		warning(_("preferred pack (%s) is invalid"),
>      +			preferred->pack_name);
>      +		goto cleanup;
> 2:  9adf6e1341 < -:  ---------- builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
> -:  ---------- > 2:  2719d33f32 builtin/pack-objects.c: avoid redundant NULL check
> -:  ---------- > 3:  cdc3265ec2 builtin/pack-objects.c: ensure included `--stdin-packs` exist
> -:  ---------- > 4:  3fc3a95517 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
Taylor Blau May 25, 2022, 12:16 a.m. UTC | #2
On Tue, May 24, 2022 at 02:38:39PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >          verbatim reuse (c.f., `write_reused_pack_verbatim()`).
>
> Unlike "e.g." and "i.e.", I think these should all be "cf." (there
> are many others).

Oops. You learn something new everyday! ;-)

> >     +    This patch handles the "preferred" pack (c.f., the section
> >     +    "multi-pack-index reverse indexes" in
> >     +    Documentation/technical/pack-format.txt) specially, since pack-objects
> >     +    depends on reusing exact chunks of that pack verbatim in
> >     +    reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded,
> >     +    the utility of a bitmap is significantly diminished.
>
> It explains why we care about the "preferred" pack, which is a nice
> clarification.  It hints that the other packs do not matter as much,
> and it is clearly stated that how they are handled is ...
>
> >     +    Similar to dc1daacdcc, we could technically just add this check in
> >     +    reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX
> >     +    .bitmap without needing to open any of its packs. But it's simpler to do
> >     +    the check as early as possible, covering all direct uses of the
> >     +    preferred pack. Note that doing this check early requires us to call
> >     +    prepare_midx_pack() early, too, so move the relevant part of that loop
> >     +    from load_reverse_index() into open_midx_bitmap_1().
> >     +
> >     +    Subsequent patches handle the non-preferred packs in a slightly
> >     +    different fashion.
>
> ... left for later steps.
>
> Excellent write-up.

Thanks very much!

Taylor