diff mbox series

[v3,5/9] pack-bitmap: introduce bitmap_walk_contains()

Message ID 20191115141541.11149-6-chriscool@tuxfamily.org (mailing list archive)
State New, archived
Headers show
Series Rewrite packfile reuse code | expand

Commit Message

Christian Couder Nov. 15, 2019, 2:15 p.m. UTC
From: Jeff King <peff@peff.net>

We will use this helper function in a following commit to
tell us if an object is packed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 pack-bitmap.c | 12 ++++++++++++
 pack-bitmap.h |  3 +++
 2 files changed, 15 insertions(+)

Comments

Jeff King Dec. 9, 2019, 7:06 a.m. UTC | #1
On Fri, Nov 15, 2019 at 03:15:37PM +0100, Christian Couder wrote:

> We will use this helper function in a following commit to
> tell us if an object is packed.

Yeah, makes sense. This is eventually used in have_duplicate_entry() in
pack-objects, to check whether an object is already mentioned in
reuse_packfile_bitmap. And that's the part that would fix the test
failures from the previous commit.

But of course we don't yet have reuse_packfile_bitmap; that comes later.

> +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> +			 struct bitmap *bitmap, const struct object_id *oid)
> +{
> +	int idx;
> +
> +	if (!bitmap)
> +		return 0;
> +
> +	idx = bitmap_position(bitmap_git, oid);
> +	return idx >= 0 && bitmap_get(bitmap, idx);
> +}

This is really a factoring out of code in
bitmap_has_oid_in_uninteresting(). So I think you could simplify that
like:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index cbfc544411..f5749d0ab3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1194,16 +1194,6 @@ void free_bitmap_index(struct bitmap_index *b)
 int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git,
 				    const struct object_id *oid)
 {
-	int pos;
-
-	if (!bitmap_git)
-		return 0; /* no bitmap loaded */
-	if (!bitmap_git->haves)
-		return 0; /* walk had no "haves" */
-
-	pos = bitmap_position_packfile(bitmap_git, oid);
-	if (pos < 0)
-		return 0;
-
-	return bitmap_get(bitmap_git->haves, pos);
+	return bitmap_git &&
+	       bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid);
 }

One curiosity is that bitmap_has_oid_in_uninteresting() only uses
bitmap_position_packfile(), not bitmap_position(). So it wouldn't find
objects which weren't in the bitmapped packfile (i.e., ones where we
extended the bitmap to handle loose objects, or objects in other packs).

That seems like a bug in the current code to me. I suspect nobody
noticed because the only effect would be that sometimes we fail to
notice that we could reuse a delta against such an object (which isn't
incorrect, just suboptimal). I don't think p5311 would show this,
though, because it simulates a server that is fully packed.

I think it's probably still worth doing this as a preparatory patch,
though:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e07c798879..6df22e7291 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1125,7 +1125,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git,
 	if (!bitmap_git->haves)
 		return 0; /* walk had no "haves" */
 
-	pos = bitmap_position_packfile(bitmap_git, oid);
+	pos = bitmap_position(bitmap_git, oid);
 	if (pos < 0)
 		return 0;
 

-Peff
Christian Couder Dec. 13, 2019, 1:27 p.m. UTC | #2
On Mon, Dec 9, 2019 at 8:06 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Nov 15, 2019 at 03:15:37PM +0100, Christian Couder wrote:

> > +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> > +                      struct bitmap *bitmap, const struct object_id *oid)
> > +{
> > +     int idx;
> > +
> > +     if (!bitmap)
> > +             return 0;
> > +
> > +     idx = bitmap_position(bitmap_git, oid);
> > +     return idx >= 0 && bitmap_get(bitmap, idx);
> > +}
>
> This is really a factoring out of code in
> bitmap_has_oid_in_uninteresting(). So I think you could simplify that
> like:
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index cbfc544411..f5749d0ab3 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1194,16 +1194,6 @@ void free_bitmap_index(struct bitmap_index *b)
>  int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git,
>                                     const struct object_id *oid)
>  {
> -       int pos;
> -
> -       if (!bitmap_git)
> -               return 0; /* no bitmap loaded */
> -       if (!bitmap_git->haves)
> -               return 0; /* walk had no "haves" */
> -
> -       pos = bitmap_position_packfile(bitmap_git, oid);
> -       if (pos < 0)
> -               return 0;
> -
> -       return bitmap_get(bitmap_git->haves, pos);
> +       return bitmap_git &&
> +              bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid);
>  }

Yeah, nice simplification. I added a patch doing that.

> One curiosity is that bitmap_has_oid_in_uninteresting() only uses
> bitmap_position_packfile(), not bitmap_position(). So it wouldn't find
> objects which weren't in the bitmapped packfile (i.e., ones where we
> extended the bitmap to handle loose objects, or objects in other packs).
>
> That seems like a bug in the current code to me. I suspect nobody
> noticed because the only effect would be that sometimes we fail to
> notice that we could reuse a delta against such an object (which isn't
> incorrect, just suboptimal). I don't think p5311 would show this,
> though, because it simulates a server that is fully packed.
>
> I think it's probably still worth doing this as a preparatory patch,
> though:
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index e07c798879..6df22e7291 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1125,7 +1125,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git,
>         if (!bitmap_git->haves)
>                 return 0; /* walk had no "haves" */
>
> -       pos = bitmap_position_packfile(bitmap_git, oid);
> +       pos = bitmap_position(bitmap_git, oid);
>         if (pos < 0)
>                 return 0;

Yeah, I agree that it's a good idea to do it in a preparatory patch,
so I added a patch doing that before the one doing the simplification
you suggest above.
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 016d0319fc..8a51302a1a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -826,6 +826,18 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+int bitmap_walk_contains(struct bitmap_index *bitmap_git,
+			 struct bitmap *bitmap, const struct object_id *oid)
+{
+	int idx;
+
+	if (!bitmap)
+		return 0;
+
+	idx = bitmap_position(bitmap_git, oid);
+	return idx >= 0 && bitmap_get(bitmap, idx);
+}
+
 void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 				 show_reachable_fn show_reachable)
 {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 466c5afa09..6ab6033dbe 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -3,6 +3,7 @@ 
 
 #include "ewah/ewok.h"
 #include "khash.h"
+#include "pack.h"
 #include "pack-objects.h"
 
 struct commit;
@@ -53,6 +54,8 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
+int bitmap_walk_contains(struct bitmap_index *,
+			 struct bitmap *bitmap, const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be