diff mbox series

[v2,15/15] pack-revindex: write multi-pack reverse indexes

Message ID 01bd6a35c6c441a30a22a4c2d17e9cf53de6b148.1614193703.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx: implement a multi-pack reverse index | expand

Commit Message

Taylor Blau Feb. 24, 2021, 7:10 p.m. UTC
Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format describe a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.

Unfortunately, a very similar comparison function as the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.

Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this field.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.h |   1 +
 2 files changed, 112 insertions(+)

Comments

Jonathan Tan March 2, 2021, 6:40 p.m. UTC | #1
> @@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  
>  	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
>  	free_chunkfile(cf);
> +
> +	if (flags & MIDX_WRITE_REV_INDEX)
> +		ctx.pack_order = midx_pack_order(&ctx);
> +
> +	if (flags & MIDX_WRITE_REV_INDEX)
> +		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> +
>  	commit_lock_file(&lk);
>  
>  cleanup:

Any reason why we're using 2 separate "if" statements?

Other than that, this patch and patch 14 look good. Besides all my minor
comments, I think the overall patch set is in good shape and ready to be
merged. It's great that we could reuse some of the individual-pack reverse
index concepts and code too.
Taylor Blau March 3, 2021, 3:30 p.m. UTC | #2
On Tue, Mar 02, 2021 at 10:40:33AM -0800, Jonathan Tan wrote:
> > @@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >
> >  	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
> >  	free_chunkfile(cf);
> > +
> > +	if (flags & MIDX_WRITE_REV_INDEX)
> > +		ctx.pack_order = midx_pack_order(&ctx);
> > +
> > +	if (flags & MIDX_WRITE_REV_INDEX)
> > +		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> > +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> > +
> >  	commit_lock_file(&lk);
> >
> >  cleanup:
>
> Any reason why we're using 2 separate "if" statements?

Yeah. This first if statement will turn into:

  if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))

so that the pack order is computed in either case (since both the
existing write_midx_reverse_index() and the eventual write_midx_bitmap()
will be able to use the pack order).

Arguably there is never a practical reason to write one without the
other (and writing a MIDX bitmap without a reverse index is a bug), so
perhaps these options should be consolidated.

But that's cleanup that I'd rather do after all of this has settled
(since it'd be weird to say: "here's the option to write bitmaps, except
we can't write multi-pack bitmaps yet, but setting it actually writes
this other thing").

> Other than that, this patch and patch 14 look good. Besides all my minor
> comments, I think the overall patch set is in good shape and ready to be
> merged. It's great that we could reuse some of the individual-pack reverse
> index concepts and code too.

Thanks, I am really glad that you had a chance to take a look at it. I
always find your review quite helpful.

Thanks,
Taylor
Jonathan Tan March 4, 2021, 2:04 a.m. UTC | #3
> On Tue, Mar 02, 2021 at 10:40:33AM -0800, Jonathan Tan wrote:
> > > @@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >
> > >  	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
> > >  	free_chunkfile(cf);
> > > +
> > > +	if (flags & MIDX_WRITE_REV_INDEX)
> > > +		ctx.pack_order = midx_pack_order(&ctx);
> > > +
> > > +	if (flags & MIDX_WRITE_REV_INDEX)
> > > +		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> > > +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> > > +
> > >  	commit_lock_file(&lk);
> > >
> > >  cleanup:
> >
> > Any reason why we're using 2 separate "if" statements?
> 
> Yeah. This first if statement will turn into:
> 
>   if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
> 
> so that the pack order is computed in either case (since both the
> existing write_midx_reverse_index() and the eventual write_midx_bitmap()
> will be able to use the pack order).

Ah, OK. That's what I was thinking of, but nice to have confirmation.
Maybe write in the commit message that these are separated because in
the future, one of the conditions will change.
Taylor Blau March 4, 2021, 3:06 a.m. UTC | #4
On Wed, Mar 03, 2021 at 06:04:44PM -0800, Jonathan Tan wrote:
> > > Any reason why we're using 2 separate "if" statements?
> >
> > Yeah. This first if statement will turn into:
> >
> >   if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
> >
> > so that the pack order is computed in either case (since both the
> > existing write_midx_reverse_index() and the eventual write_midx_bitmap()
> > will be able to use the pack order).
>
> Ah, OK. That's what I was thinking of, but nice to have confirmation.
> Maybe write in the commit message that these are separated because in
> the future, one of the conditions will change.

Thanks; that's a great idea.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index 8d7a8927b8..820276cc45 100644
--- a/midx.c
+++ b/midx.c
@@ -12,6 +12,7 @@ 
 #include "run-command.h"
 #include "repository.h"
 #include "chunk-format.h"
+#include "pack.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -472,6 +473,7 @@  struct write_midx_context {
 	uint32_t entries_nr;
 
 	uint32_t *pack_perm;
+	uint32_t *pack_order;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
 
@@ -826,6 +828,66 @@  static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+{
+	struct write_midx_context *ctx = _ctx;
+
+	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
+	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
+
+	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
+	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
+
+	/* Sort objects in the preferred pack ahead of any others. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	/* Then, order objects by which packs they appear in. */
+	if (perm_a < perm_b)
+		return -1;
+	if (perm_a > perm_b)
+		return 1;
+
+	/* Then, disambiguate by their offset within each pack. */
+	if (a->offset < b->offset)
+		return -1;
+	if (a->offset > b->offset)
+		return 1;
+
+	return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     struct write_midx_context *ctx)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	write_rev_file_order(buf.buf, ctx->pack_order, ctx->entries_nr,
+			     midx_hash, WRITE_REV);
+
+	strbuf_release(&buf);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash);
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -1018,6 +1080,14 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		ctx.pack_order = midx_pack_order(&ctx);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 	commit_lock_file(&lk);
 
 cleanup:
@@ -1032,6 +1102,7 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.info);
 	free(ctx.entries);
 	free(ctx.pack_perm);
+	free(ctx.pack_order);
 	free(midx_name);
 	return result;
 }
@@ -1044,6 +1115,44 @@  int write_midx_file(const char *object_dir,
 				   flags);
 }
 
+struct clear_midx_data {
+	char *keep;
+	const char *ext;
+};
+
+static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
+				const char *file_name, void *_data)
+{
+	struct clear_midx_data *data = _data;
+
+	if (!(starts_with(file_name, "multi-pack-index-") &&
+	      ends_with(file_name, data->ext)))
+		return;
+	if (data->keep && !strcmp(data->keep, file_name))
+		return;
+
+	if (unlink(full_path))
+		die_errno(_("failed to remove %s"), full_path);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash)
+{
+	struct clear_midx_data data;
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	if (keep_hash)
+		data.keep = xstrfmt("multi-pack-index-%s%s",
+				    hash_to_hex(keep_hash), ext);
+	data.ext = ext;
+
+	for_each_file_in_pack_dir(r->objects->odb->path,
+				  clear_midx_file_ext,
+				  &data);
+
+	free(data.keep);
+}
+
 void clear_midx_file(struct repository *r)
 {
 	char *midx = get_midx_filename(r->objects->odb->path);
@@ -1056,6 +1165,8 @@  void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".rev", NULL);
+
 	free(midx);
 }
 
diff --git a/midx.h b/midx.h
index 0a8294d2ee..8684cf0fef 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,7 @@  struct multi_pack_index {
 };
 
 #define MIDX_PROGRESS     (1 << 0)
+#define MIDX_WRITE_REV_INDEX (1 << 1)
 
 char *get_midx_rev_filename(struct multi_pack_index *m);