diff mbox series

[v2,09/24] midx: infer preferred pack when not given one

Message ID 9495f6869d792264c4366c9914fcf93d544caa6a.1624314293.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series multi-pack reachability bitmaps | expand

Commit Message

Taylor Blau June 21, 2021, 10:25 p.m. UTC
In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
multi-pack index code learned how to select a pack which all duplicate
objects are selected from. That is, if an object appears in multiple
packs, select the copy in the preferred pack before breaking ties
according to the other rules like pack mtime and readdir() order.

Not specifying a preferred pack can cause serious problems with
multi-pack reachability bitmaps, because these bitmaps rely on having at
least one pack from which all duplicates are selected. Not having such a
pack causes problems with the pack reuse code (e.g., like assuming that
a base object was sent from that pack via reuse when in fact the base
was selected from a different pack).

So why does not marking a pack preferred cause problems here? The reason
is roughly as follows:

  - Ties are broken (when handling duplicate objects) by sorting
    according to midx_oid_compare(), which sorts objects by OID,
    preferred-ness, pack mtime, and finally pack ID (more on that
    later).

  - The psuedo pack-order (described in
    Documentation/technical/bitmap-format.txt) is computed by
    midx_pack_order(), and sorts by pack ID and pack offset, with
    preferred packs sorting first.

  - But! Pack IDs come from incrementing the pack count in
    add_pack_to_midx(), which is a callback to
    for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
    readdir() order.

When specifying a preferred pack, all of that works fine, because
duplicate objects are correctly resolved in favor of the copy in the
preferred pack, and the preferred pack sorts first in the object order.

"Sorting first" is critical, because the bitmap code relies on finding
out which pack holds the first object in the MIDX's pseudo pack-order to
determine which pack is preferred.

But if we didn't specify a preferred pack, and the pack which comes
first in readdir() order does not also have the lowest timestamp, then
it's possible that that pack (the one that sorts first in pseudo-pack
order, which the bitmap code will treat as the preferred one) did *not*
have all duplicate objects resolved in its favor, resulting in breakage.

The fix is simple: pick a (semi-arbitrary) preferred pack when none was
specified. This forces that pack to have duplicates resolved in its
favor, and (critically) to sort first in pseudo-pack order.
Unfortunately, testing this behavior portably isn't possible, since it
depends on readdir() order which isn't guaranteed by POSIX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

Comments

Jeff King July 21, 2021, 10:34 a.m. UTC | #1
On Mon, Jun 21, 2021 at 06:25:21PM -0400, Taylor Blau wrote:

> In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
> multi-pack index code learned how to select a pack which all duplicate
> objects are selected from. That is, if an object appears in multiple
> packs, select the copy in the preferred pack before breaking ties
> according to the other rules like pack mtime and readdir() order.
> 
> Not specifying a preferred pack can cause serious problems with
> multi-pack reachability bitmaps, because these bitmaps rely on having at
> least one pack from which all duplicates are selected. Not having such a
> pack causes problems with the pack reuse code (e.g., like assuming that
> a base object was sent from that pack via reuse when in fact the base
> was selected from a different pack).

It might be helpful to use a more descriptive name for "pack reuse code"
here, since it's kind of vague for people who have not been actively
working on bitmaps.

I don't have a short name for that chunk of code, but maybe:

  ...causes problems with the code in pack-objects to reuse packs
  verbatim (e.g., that code assumes that a delta object in a chunk of
  pack sent verbatim will have its base object sent from the same pack).

> So why does not marking a pack preferred cause problems here? The reason
> is roughly as follows:
> 
>   - Ties are broken (when handling duplicate objects) by sorting
>     according to midx_oid_compare(), which sorts objects by OID,
>     preferred-ness, pack mtime, and finally pack ID (more on that
>     later).
> 
>   - The psuedo pack-order (described in
>     Documentation/technical/bitmap-format.txt) is computed by
>     midx_pack_order(), and sorts by pack ID and pack offset, with
>     preferred packs sorting first.

I think the .rev description in pack-format.txt may be a better
reference here.

>   - But! Pack IDs come from incrementing the pack count in
>     add_pack_to_midx(), which is a callback to
>     for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
>     readdir() order.
> 
> When specifying a preferred pack, all of that works fine, because
> duplicate objects are correctly resolved in favor of the copy in the
> preferred pack, and the preferred pack sorts first in the object order.
> 
> "Sorting first" is critical, because the bitmap code relies on finding
> out which pack holds the first object in the MIDX's pseudo pack-order to
> determine which pack is preferred.
> 
> But if we didn't specify a preferred pack, and the pack which comes
> first in readdir() order does not also have the lowest timestamp, then
> it's possible that that pack (the one that sorts first in pseudo-pack
> order, which the bitmap code will treat as the preferred one) did *not*
> have all duplicate objects resolved in its favor, resulting in breakage.
> 
> The fix is simple: pick a (semi-arbitrary) preferred pack when none was
> specified. This forces that pack to have duplicates resolved in its
> favor, and (critically) to sort first in pseudo-pack order.
> Unfortunately, testing this behavior portably isn't possible, since it
> depends on readdir() order which isn't guaranteed by POSIX.

This explanation is rather confusing, but I'm not sure if we can do much
better. I followed all of it, because I was there when we found the bug
that this is fixing. And of course that happened _after_ we implemented
midx bitmaps and in particular adapted the verbatim reuse stuff in
pack-objects to make use of it.

I see why you'd want to float the fix up before then, so we don't ever
have the broken state. But it's hard to understand what bug this is
fixing, because the bug does not even exist yet at this point in
the series!

I dunno. Like I said, I was able to follow it, so maybe it is
sufficient. I'm just not sure others would be able to.

> +
> +		if (!found)
> +			warning(_("unknown preferred pack: '%s'"),
> +				preferred_pack_name);
> +	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
> +		time_t oldest = ctx.info[0].p->mtime;
> +		ctx.preferred_pack_idx = 0;
> +
> +		if (packs_to_drop && packs_to_drop->nr)
> +			BUG("cannot write a MIDX bitmap during expiration");

Likewise, this BUG() feels somewhat out-of-place. At this point in the
series, we don't have bitmaps yet. :)

I can live with that, though. And I don't want to make a lot of work by
trying to re-order this patch within the series. Mostly I want to make
sure that if somebody stumbles on this commit via git-log or in a
bisection, that they can make some sense of what it's trying to do.

-Peff
Taylor Blau July 21, 2021, 8:16 p.m. UTC | #2
On Wed, Jul 21, 2021 at 06:34:25AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:21PM -0400, Taylor Blau wrote:
>
> > In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
> > multi-pack index code learned how to select a pack which all duplicate
> > objects are selected from. That is, if an object appears in multiple
> > packs, select the copy in the preferred pack before breaking ties
> > according to the other rules like pack mtime and readdir() order.
> >
> > Not specifying a preferred pack can cause serious problems with
> > multi-pack reachability bitmaps, because these bitmaps rely on having at
> > least one pack from which all duplicates are selected. Not having such a
> > pack causes problems with the pack reuse code (e.g., like assuming that
> > a base object was sent from that pack via reuse when in fact the base
> > was selected from a different pack).
>
> It might be helpful to use a more descriptive name for "pack reuse code"
> here, since it's kind of vague for people who have not been actively
> working on bitmaps.
>
> I don't have a short name for that chunk of code, but maybe:
>
>   ...causes problems with the code in pack-objects to reuse packs
>   verbatim (e.g., that code assumes that a delta object in a chunk of
>   pack sent verbatim will have its base object sent from the same pack).

Thanks; I like what you wrote here.

> >   - The psuedo pack-order (described in
> >     Documentation/technical/bitmap-format.txt) is computed by
> >     midx_pack_order(), and sorts by pack ID and pack offset, with
> >     preferred packs sorting first.
>
> I think the .rev description in pack-format.txt may be a better
> reference here.

Ditto, I changed that, too.

> >   - But! Pack IDs come from incrementing the pack count in
> >     add_pack_to_midx(), which is a callback to
> >     for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
> >     readdir() order.
> >
> > [ ... ]
>
> This explanation is rather confusing, but I'm not sure if we can do much
> better. I followed all of it, because I was there when we found the bug
> that this is fixing. And of course that happened _after_ we implemented
> midx bitmaps and in particular adapted the verbatim reuse stuff in
> pack-objects to make use of it.
>
> I see why you'd want to float the fix up before then, so we don't ever
> have the broken state. But it's hard to understand what bug this is
> fixing, because the bug does not even exist yet at this point in
> the series!
>
> I dunno. Like I said, I was able to follow it, so maybe it is
> sufficient. I'm just not sure others would be able to.

I think that others will follow it, too. But I agree that it is
confusing, since we're fixing a bug that doesn't yet exist. In reality,
I wrote this patch after sending v1, and then reordered its position to
come before the implementation of MIDX bitmaps for that reason.

So in one sense, I prefer it this way because we don't ever introduce
the bug.  But in another sense, it is very jarring to read about an
interaction that has no basis in the code (yet).

I think that the best thing we could do without adding any significant
reordering would be to just call out the situation we're in. I added
this onto the end of the commit message which I think makes things a
little clearer:

    (Note that multi-pack reachability bitmaps have yet to be
    implemented; so in that sense this patch is fixing a bug which does
    not yet exist.  But by having this patch beforehand, we can prevent
    the bug from ever materializing.)

Thanks,
Taylor
Jeff King July 23, 2021, 8:50 a.m. UTC | #3
On Wed, Jul 21, 2021 at 04:16:07PM -0400, Taylor Blau wrote:

> > I dunno. Like I said, I was able to follow it, so maybe it is
> > sufficient. I'm just not sure others would be able to.
> 
> I think that others will follow it, too. But I agree that it is
> confusing, since we're fixing a bug that doesn't yet exist. In reality,
> I wrote this patch after sending v1, and then reordered its position to
> come before the implementation of MIDX bitmaps for that reason.
> 
> So in one sense, I prefer it this way because we don't ever introduce
> the bug.  But in another sense, it is very jarring to read about an
> interaction that has no basis in the code (yet).
> 
> I think that the best thing we could do without adding any significant
> reordering would be to just call out the situation we're in. I added
> this onto the end of the commit message which I think makes things a
> little clearer:
> 
>     (Note that multi-pack reachability bitmaps have yet to be
>     implemented; so in that sense this patch is fixing a bug which does
>     not yet exist.  But by having this patch beforehand, we can prevent
>     the bug from ever materializing.)

I do like fixing it up front. Here's my attempt at rewriting the commit
message. I tried to omit details about pack order, and instead refer to
the revindex code, and instead add more explanation of how this relates
to the pack-reuse code.

Something like:

  In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30),
  the multi-pack index code learned how to select a pack which all
  duplicate objects are selected from. That is, if an object appears in
  multiple packs, select the copy in the preferred pack before using one
  from any other pack.

  Later in that same series, 38ff7cabb6 (pack-revindex: write multi-pack
  reverse indexes, 2021-03-30) learned to put the preferred pack at the
  start of the pack order when generating a midx ".rev" file. So far,
  that .rev ordering has not mattered. But it will be very important
  once we start using the .rev ordering for midx bitmaps.

  There is code in pack-objects to reuse pack bytes verbatim when
  bitmaps tell us a significant portion of the beginning of the code
  should be in the output. This code relies on the pack mentioned by the
  0th bit also being the pack that is preferred for duplicates (because
  we'd want to make sure both bases and deltas come from the same pack).
  For a pack .bitmap, this is trivially correct. For a midx bitmap, it
  is only true when some pack gets both duplicate-priority and is placed
  at the front of the .rev file. I.e., there must be _some_ preferred
  pack.

  So if the user did not specify a preferred pack, we pick one
  arbitrarily.

  There's no test here for a few reasons:

    - the midx bitmap feature does not yet exist; this is preemptively
      fixing a problem before introducing buggy code

    - whether things go wrong with the current rules depends on things
      like readdir() order, since that is used for some midx pack
      ordering. So any test might happen to succeed or fail based on
      factors outside of our control.

Thoughts?

-Peff
Taylor Blau July 26, 2021, 7:44 p.m. UTC | #4
On Fri, Jul 23, 2021 at 04:50:31AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 04:16:07PM -0400, Taylor Blau wrote:
>
> > > I dunno. Like I said, I was able to follow it, so maybe it is
> > > sufficient. I'm just not sure others would be able to.
> >
> > I think that others will follow it, too. But I agree that it is
> > confusing, since we're fixing a bug that doesn't yet exist. In reality,
> > I wrote this patch after sending v1, and then reordered its position to
> > come before the implementation of MIDX bitmaps for that reason.
> >
> > So in one sense, I prefer it this way because we don't ever introduce
> > the bug.  But in another sense, it is very jarring to read about an
> > interaction that has no basis in the code (yet).
> >
> > I think that the best thing we could do without adding any significant
> > reordering would be to just call out the situation we're in. I added
> > this onto the end of the commit message which I think makes things a
> > little clearer:
> >
> >     (Note that multi-pack reachability bitmaps have yet to be
> >     implemented; so in that sense this patch is fixing a bug which does
> >     not yet exist.  But by having this patch beforehand, we can prevent
> >     the bug from ever materializing.)
>
> I do like fixing it up front. Here's my attempt at rewriting the commit
> message. I tried to omit details about pack order, and instead refer to
> the revindex code, and instead add more explanation of how this relates
> to the pack-reuse code.
>
> Something like:
>
> [...]
>
> Thoughts?

I like it, although reading it fresh I found the sentence beginning with
"So if the user did not specify a preferred pack" to be a little
confusing. To connect it back to the previous paragraph, I added:

  ... in order to avoid a situation where no pack is marked as preferred
  (breaking our assumption about the pack representing the object at the
  0th bit).

and that read out much clearer (to me at least).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index 759007d5a8..752d36c57f 100644
--- a/midx.c
+++ b/midx.c
@@ -950,15 +950,46 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
+		int found = 0;
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
+				found = 1;
 				break;
 			}
 		}
+
+		if (!found)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+		time_t oldest = ctx.info[0].p->mtime;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			time_t mtime = ctx.info[i].p->mtime;
+			if (mtime < oldest) {
+				oldest = mtime;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
 	}
 
 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
@@ -1029,11 +1060,7 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
-
-		if (!preferred)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-		else {
+		if (preferred) {
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),