[v2,5/9] pack-bitmap: introduce bitmap_walk_contains()

Message ID 20191019103531.23274-6-chriscool@tuxfamily.org
State New, archived
Series Rewrite packfile reuse code

Commit Message

Christian Couder Oct. 19, 2019, 10:35 a.m. UTC
From: Jeff King <peff@peff.net>

We will use this helper function in a following commit to
tell us if an object is packed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 pack-bitmap.c | 12 ++++++++++++
 pack-bitmap.h |  3 +++
 2 files changed, 15 insertions(+)

Comments

Philip Oakley Oct. 19, 2019, 3:25 p.m. UTC | #1
Hi Christian,
can I check one thing?

On 19/10/2019 11:35, Christian Couder wrote:
> From: Jeff King <peff@peff.net>
>
> We will use this helper function in a following commit to
> tell us if an object is packed.
>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>   pack-bitmap.c | 12 ++++++++++++
>   pack-bitmap.h |  3 +++
>   2 files changed, 15 insertions(+)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 016d0319fc..8a51302a1a 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -826,6 +826,18 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>   	return 0;
>   }
>   
> +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> +			 struct bitmap *bitmap, const struct object_id *oid)
> +{
> +	int idx;
Excuse my ignorance here...

For the case on Windows (int/long 32 bit), is this return value 
guaranteed to be less than 2GiB, i.e. not a memory offset?

I'm just thinking ahead to the resolution of the 4GiB file limit issue 
on Git-for-Windows (https://github.com/git-for-windows/git/pull/2179)

> +
> +	if (!bitmap)
> +		return 0;
> +
> +	idx = bitmap_position(bitmap_git, oid);
> +	return idx >= 0 && bitmap_get(bitmap, idx);
> +}
> +
>   void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
>   				 show_reachable_fn show_reachable)
>   {
> diff --git a/pack-bitmap.h b/pack-bitmap.h
> index 466c5afa09..6ab6033dbe 100644
> --- a/pack-bitmap.h
> +++ b/pack-bitmap.h
> @@ -3,6 +3,7 @@
>   
>   #include "ewah/ewok.h"
>   #include "khash.h"
> +#include "pack.h"
>   #include "pack-objects.h"
>   
>   struct commit;
> @@ -53,6 +54,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
>   int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
>   			     kh_oid_map_t *reused_bitmaps, int show_progress);
>   void free_bitmap_index(struct bitmap_index *);
> +int bitmap_walk_contains(struct bitmap_index *,
> +			 struct bitmap *bitmap, const struct object_id *oid);
>   
>   /*
>    * After a traversal has been performed by prepare_bitmap_walk(), this can be
Christian Couder Oct. 19, 2019, 6:55 p.m. UTC | #2
Hi Philip,

On Sat, Oct 19, 2019 at 5:25 PM Philip Oakley <philipoakley@iee.email> wrote:
>
> Hi Christian,
> can I check one thing?

Yeah, sure! Thanks for taking a look at my patches!

> On 19/10/2019 11:35, Christian Couder wrote:

> > +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> > +                      struct bitmap *bitmap, const struct object_id *oid)
> > +{
> > +     int idx;
> Excuse my ignorance here...
>
> For the case on Windows (int/long 32 bit), is this return value
> guaranteed to be less than 2GiB, i.e. not a memory offset?
>
> I'm just thinking ahead to the resolution of the 4GiB file limit issue
> on Git-for-Windows (https://github.com/git-for-windows/git/pull/2179)

I understand your concern; unfortunately, below we have:

idx = bitmap_position(bitmap_git, oid);

and bitmap_position() returns an int at least since 3ae5fa0768
(pack-bitmap: remove bitmap_git global variable, 2018-06-07)

So I think the fix would be much more involved than just changing the
type of the idx variable. It would likely involve modifying
bitmap_position(), and thus would probably best be addressed in a
separate patch series.

> > +
> > +     if (!bitmap)
> > +             return 0;
> > +
> > +     idx = bitmap_position(bitmap_git, oid);
> > +     return idx >= 0 && bitmap_get(bitmap, idx);
> > +}
Philip Oakley Oct. 19, 2019, 8:15 p.m. UTC | #3
Hi Christian,

On 19/10/2019 19:55, Christian Couder wrote:
> Hi Philip,
>
> On Sat, Oct 19, 2019 at 5:25 PM Philip Oakley <philipoakley@iee.email> wrote:
>> Hi Christian,
>> can I check one thing?
> Yeah, sure! Thanks for taking a look at my patches!
>
>> On 19/10/2019 11:35, Christian Couder wrote:
>>> +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
>>> +                      struct bitmap *bitmap, const struct object_id *oid)
>>> +{
>>> +     int idx;
>> Excuse my ignorance here...
>>
>> For the case on Windows (int/long 32 bit), is this return value
>> guaranteed to be less than 2GiB, i.e. not a memory offset?
>>
>> I'm just thinking ahead to the resolution of the 4GiB file limit issue
>> on Git-for-Windows (https://github.com/git-for-windows/git/pull/2179)
> I understand your concern; unfortunately, below we have:
>
> idx = bitmap_position(bitmap_git, oid);
>
> and bitmap_position() returns an int at least since 3ae5fa0768
> (pack-bitmap: remove bitmap_git global variable, 2018-06-07)
>
> So I think the fix would be much more involved than just changing the
> type of the idx variable. It would likely involve modifying
> bitmap_position(), and thus would probably best be addressed in a
> separate patch series.

So, IIUC it is mem-sized, so I should at least note it and pay attention 
to it for my >4G series, which like you say is "much more involved than 
just"...

The patch to flip over all the affected locations is a bit humongous 
(big), plus it's a bit of a moving target...
>>> +
>>> +     if (!bitmap)
>>> +             return 0;
>>> +
>>> +     idx = bitmap_position(bitmap_git, oid);
>>> +     return idx >= 0 && bitmap_get(bitmap, idx);
>>> +}
Philip
Jeff King Oct. 19, 2019, 11:18 p.m. UTC | #4
On Sat, Oct 19, 2019 at 04:25:19PM +0100, Philip Oakley wrote:

> > +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> > +			 struct bitmap *bitmap, const struct object_id *oid)
> > +{
> > +	int idx;
> Excuse my ignorance here...
> 
> For the case on Windows (int/long 32 bit), is this return value guaranteed
> to be less than 2GiB, i.e. not a memory offset?
> 
> I'm just thinking ahead to the resolution of the 4GiB file limit issue on
> Git-for-Windows (https://github.com/git-for-windows/git/pull/2179)

Yes, it's not a memory offset.

This "idx" here (and the return value of bitmap_position) represents a
position within an array of objects. This isn't strictly limited to the
objects in a single pack (because a traversal might extend to objects
outside the bitmapped pack), but we can use that as a general ballpark.
And it's limited to a 4-byte object count already.

So the "best" type here would be a uint32_t (which is used elsewhere
in the pack code), but we use signedness to indicate that the object
wasn't found.

That's probably OK. The biggest repos I've seen have on the order of
10-100M objects. That still gives us a factor of 20 before we hit 2^31.
If we imagine those repos took 10 years or so to accrue that many
objects, then we probably still have 200 years of growth left. Of course
growth accelerates over time, but I suspect repos with 2B objects will
run into other scaling problems first. So I don't think it's worth
worrying about too much for now.

-Peff

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 016d0319fc..8a51302a1a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -826,6 +826,18 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+int bitmap_walk_contains(struct bitmap_index *bitmap_git,
+			 struct bitmap *bitmap, const struct object_id *oid)
+{
+	int idx;
+
+	if (!bitmap)
+		return 0;
+
+	idx = bitmap_position(bitmap_git, oid);
+	return idx >= 0 && bitmap_get(bitmap, idx);
+}
+
 void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 				 show_reachable_fn show_reachable)
 {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 466c5afa09..6ab6033dbe 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -3,6 +3,7 @@ 
 
 #include "ewah/ewok.h"
 #include "khash.h"
+#include "pack.h"
 #include "pack-objects.h"
 
 struct commit;
@@ -53,6 +54,8 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
+int bitmap_walk_contains(struct bitmap_index *,
+			 struct bitmap *bitmap, const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be