diff mbox series

[v2,3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests

Message ID 7786dc879f006c8316c33dd70e98888ceb50a014.1656249017.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series bitmap: integrate a lookup table extension to the bitmap format | expand

Commit Message

Abhradeep Chakraborty June 26, 2022, 1:10 p.m. UTC
From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach git to provide a way for users to enable/disable bitmap lookup
table extension by providing a config option named 'writeBitmapLookupTable'.
Default is true.

Also add test to verify writting of lookup table.

Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
 Documentation/config/pack.txt |  7 +++++++
 builtin/multi-pack-index.c    |  8 ++++++++
 builtin/pack-objects.c        | 10 +++++++++-
 midx.c                        |  3 +++
 midx.h                        |  1 +
 pack-bitmap-write.c           |  2 ++
 t/t5310-pack-bitmaps.sh       |  3 ++-
 t/t5326-multi-pack-bitmaps.sh | 13 +++++++++++++
 8 files changed, 45 insertions(+), 2 deletions(-)

Comments

Derrick Stolee June 27, 2022, 2:43 p.m. UTC | #1
On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> Teach git to provide a way for users to enable/disable bitmap lookup
> table extension by providing a config option named 'writeBitmapLookupTable'.
> Default is true.

I wonder if it makes sense to have it default to 'false' for now, but to
change that default after the feature has been shipped and running in
production for a while.

> Also add test to verify writting of lookup table.

s/writting/writing/

> +pack.writeBitmapLookupTable::
> +	When true, git will include a "lookup table" section in the

I think you should either use "Git" when talking about the software
generally, OR use "`git repack --write-bitmap-index` will include..."

> +	bitmap index (if one is written). This table is used to defer
> +	loading individual bitmaps as late as possible. This can be
> +	beneficial in repositories which have relatively large bitmap

s/which/that/

(I'm pretty sure that "that" is better. We're trying to restrict the set
of repositories we are talking about, not implying that all repositories
have this property.)

> +	indexes. Defaults to true.
> +

> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -713,6 +713,7 @@ static void write_lookup_table(struct hashfile *f,
>  	for (i = 0; i < writer.selected_nr; i++)
>  		table_inv[table[i]] = i;
>  
> +	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
>  	for (i = 0; i < writer.selected_nr; i++) {
>  		struct bitmapped_commit *selected = &writer.selected[table[i]];
>  		uint32_t xor_offset = selected->xor_offset;
> @@ -725,6 +726,7 @@ static void write_lookup_table(struct hashfile *f,
>  
>  	free(table);
>  	free(table_inv);
> +	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
>  }

These lines seem misplaced. Maybe they were meant for the previous
patch?

Thanks,
-Stolee
Abhradeep Chakraborty June 27, 2022, 5:42 p.m. UTC | #2
Derrick Stolee <derrickstolee@github.com> wrote:

> I wonder if it makes sense to have it default to 'false' for now, but to
> change that default after the feature has been shipped and running in
> production for a while.

I do not have any opinion. If most reviewers agree on it, I will surely
Set it to false.

> I think you should either use "Git" when talking about the software
> generally, OR use "`git repack --write-bitmap-index` will include..."

Ohh, yeah! Thanks for pointing out.

> s/which/that/
>
> (I'm pretty sure that "that" is better. We're trying to restrict the set
> of repositories we are talking about, not implying that all repositories
> have this property.)

Ok.

> These lines seem misplaced. Maybe they were meant for the previous
> patch?

I mainly used it for testing purpose. That's why I included it in
This patch. But I got your point and will move it to the previous
patch.

Thanks :)
Taylor Blau June 27, 2022, 5:47 p.m. UTC | #3
On Sun, Jun 26, 2022 at 01:10:14PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Teach git to provide a way for users to enable/disable bitmap lookup
> table extension by providing a config option named 'writeBitmapLookupTable'.
> Default is true.
>
> Also add test to verify writting of lookup table.
>
> Co-Authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>

I think that this was covered earlier in the review of this round, but
in general your Signed-off-by (often abbreviated as "S-o-b") should come
last. The order should be chronological, so I'd probably suggest
something like:

    Mentored-by: Taylor Blau <me@ttaylorr.com>
    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 5edbb7fe86e..3757616f09c 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
>  			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>  	}
>
> +	if (!strcmp(var, "pack.writebitmaplookuptable")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> +	}
> +
>  	/*
>  	 * We should never make a fall-back call to 'git_default_config', since
>  	 * this was already called in 'cmd_multi_pack_index()'.
> @@ -123,6 +130,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  	};
>
>  	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;

I wonder if this should respect pack.writeBitmapLookupTable, too.
Probably both of them should take into account their separate
configuration values, but cleaning up the hashcache one can be done
separately outside of this series.

Everything else looks good.

> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index 899a4a941e1..79be0cf80e6 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -713,6 +713,7 @@ static void write_lookup_table(struct hashfile *f,
>  	for (i = 0; i < writer.selected_nr; i++)
>  		table_inv[table[i]] = i;
>
> +	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
>  	for (i = 0; i < writer.selected_nr; i++) {
>  		struct bitmapped_commit *selected = &writer.selected[table[i]];
>  		uint32_t xor_offset = selected->xor_offset;
> @@ -725,6 +726,7 @@ static void write_lookup_table(struct hashfile *f,
>
>  	free(table);
>  	free(table_inv);
> +	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);

This region may make more sense to include in the previous commit,
though I don't have a strong feeling about it.

Thanks,
Taylor
Taylor Blau June 27, 2022, 5:49 p.m. UTC | #4
On Mon, Jun 27, 2022 at 11:12:30PM +0530, Abhradeep Chakraborty wrote:
> Derrick Stolee <derrickstolee@github.com> wrote:
>
> > I wonder if it makes sense to have it default to 'false' for now, but to
> > change that default after the feature has been shipped and running in
> > production for a while.
>
> I do not have any opinion. If most reviewers agree on it, I will surely
> Set it to false.

I think it's definitely a safe approach. I don't have a huge concern
about enabling it earlier, but I don't think we're in a huge rush to add
a new feature here, either.

So I'd be fine to ship this with the default being disabled (IOW, *not*
writing the lookup table). That should give us a window where we can
shake out whatever bugs there are, as is often the case when working
with the bitmap code ;).

Thanks,
Taylor
Abhradeep Chakraborty June 27, 2022, 6:39 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> wrote:

> Probably both of them should take into account their separate
> configuration values, but cleaning up the hashcache one can be done
> separately outside of this series.

Actually, it does respect the `pack.writebitmaplookuptable` config.
As pack.writebitmaplookuptable is by default true (for this patch
Series), this line enables it by default. If `pack.writebitmaplookuptable`
Set to false, the proposed change in the `git_multi_pack_index_write_config`
function disables this flag.

> This region may make more sense to include in the previous commit,
> though I don't have a strong feeling about it.

Ok. Will move it to the previous patch.

Thanks :)
Taylor Blau June 29, 2022, 8:11 p.m. UTC | #6
On Tue, Jun 28, 2022 at 12:09:23AM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
>
> > Probably both of them should take into account their separate
> > configuration values, but cleaning up the hashcache one can be done
> > separately outside of this series.
>
> Actually, it does respect the `pack.writebitmaplookuptable` config.
> As pack.writebitmaplookuptable is by default true (for this patch
> Series), this line enables it by default. If `pack.writebitmaplookuptable`
> Set to false, the proposed change in the `git_multi_pack_index_write_config`
> function disables this flag.

Aha, you're absolutely right. I missed the earlier hunk. Thanks for
pointing it out, this part looks fine to me.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..6e1f454c4d6 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@  When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories which have relatively large bitmap
+	indexes. Defaults to true.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5edbb7fe86e..3757616f09c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -87,6 +87,13 @@  static int git_multi_pack_index_write_config(const char *var, const char *value,
 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
 	}
 
+	if (!strcmp(var, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+	}
+
 	/*
 	 * We should never make a fall-back call to 'git_default_config', since
 	 * this was already called in 'cmd_multi_pack_index()'.
@@ -123,6 +130,7 @@  static int cmd_multi_pack_index_write(int argc, const char **argv)
 	};
 
 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
 
 	git_config(git_multi_pack_index_write_config, NULL);
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39e28cfcafc..d6a33fd486c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -228,7 +228,7 @@  static enum {
 	WRITE_BITMAP_QUIET,
 	WRITE_BITMAP_TRUE,
 } write_bitmap_index;
-static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
 
 static int exclude_promisor_objects;
 
@@ -3148,6 +3148,14 @@  static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
diff --git a/midx.c b/midx.c
index 5f0dd386b02..9c26d04bfde 100644
--- a/midx.c
+++ b/midx.c
@@ -1072,6 +1072,9 @@  static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
 		options |= BITMAP_OPT_HASH_CACHE;
 
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
diff --git a/midx.h b/midx.h
index 22e8e53288e..5578cd7b835 100644
--- a/midx.h
+++ b/midx.h
@@ -47,6 +47,7 @@  struct multi_pack_index {
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
+#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 899a4a941e1..79be0cf80e6 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -713,6 +713,7 @@  static void write_lookup_table(struct hashfile *f,
 	for (i = 0; i < writer.selected_nr; i++)
 		table_inv[table[i]] = i;
 
+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
 	for (i = 0; i < writer.selected_nr; i++) {
 		struct bitmapped_commit *selected = &writer.selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
@@ -725,6 +726,7 @@  static void write_lookup_table(struct hashfile *f,
 
 	free(table);
 	free(table_inv);
+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
 }
 
 static void write_hash_cache(struct hashfile *f,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..c669ed959e9 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -38,7 +38,8 @@  test_expect_success 'full repack creates bitmaps' '
 	ls .git/objects/pack/ | grep bitmap >output &&
 	test_line_count = 1 output &&
 	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
+	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace &&
+	grep "\"label\":\"writing_lookup_table\"" trace
 '
 
 basic_bitmap_tests
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..43be49617b8 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -307,4 +307,17 @@  test_expect_success 'graceful fallback when missing reverse index' '
 	)
 '
 
+test_expect_success 'multi-pack-index write writes lookup table if enabled' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack -ad &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git multi-pack-index write --bitmap &&
+		grep "\"label\":\"writing_lookup_table\"" trace
+	)
+'
 test_done