mbox series

[v5,00/27] multi-pack reachability bitmaps

Message ID cover.1630443072.git.me@ttaylorr.com (mailing list archive)
Headers show
Series multi-pack reachability bitmaps | expand

Message

Taylor Blau Aug. 31, 2021, 8:51 p.m. UTC
Here is another version of the multi-pack reachability bitmaps series. It is
virtually unchanged since last time.

The changes that did occur is that I integrated Johannes' patch from [1] to fix
cleaning up MIDX .rev and .bitmap files when using `--object-dir`. That inspired
a lengthy discussion [2] about `--object-dir`, alternates, object-format and
running the MIDX builtin outside of a Git repository.

This series resolves that discussion by leaving everything as-is, and only
changing the following:

  - `git multi-pack-index` will not run when outside of a Git
    repository.

  - The `--object-dir` argument will only recognize object directories
    belonging to an alternate of the current repository.

  - Using `--object-dir` to point to a repository which uses a
    different hash than the repository in the current working directory
    will continue to not work (as was the case before this series).

And because this incorporates [1], we will also not accidentally clean `.rev`
files from the wrong object directory.

I think that this version is ready-to-go, and that we can turn our attention to
squashing some of these cross-alternate buglets, and integrating MIDX bitmaps
with `git repack`.

[1]: https://lore.kernel.org/git/20210823171011.80588-1-johannes@sipsolutions.net/
[2]: https://lore.kernel.org/git/YSVsHo2wLhnraBnv@nand.local/

Jeff King (2):
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (25):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: describe MIDX-based bitmaps
  midx: disallow running outside of a repository
  midx: fix `*.rev` cleanups with `--object-dir`
  midx: clear auxiliary .rev after replacing the MIDX
  midx: reject empty `--preferred-pack`'s
  midx: infer preferred pack when not given one
  midx: close linked MIDXs, avoid leaking memory
  midx: avoid opening multiple MIDXs when writing
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/git-multi-pack-index.txt       |  20 +-
 Documentation/technical/bitmap-format.txt    |  71 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |   8 +-
 builtin/repack.c                             |  12 +-
 ci/run-build-and-tests.sh                    |   1 +
 git.c                                        |   2 +-
 midx.c                                       | 328 ++++++++++--
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 ++-
 pack-bitmap.c                                | 499 ++++++++++++++++---
 pack-bitmap.h                                |   9 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 240 +++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t0410-partial-clone.sh                     |  12 +-
 t/t5310-pack-bitmaps.sh                      | 231 +--------
 t/t5319-multi-pack-index.sh                  |  53 +-
 t/t5326-multi-pack-bitmaps.sh                | 286 +++++++++++
 t/t7700-repack.sh                            |  18 +-
 25 files changed, 1644 insertions(+), 441 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

Range-diff against v4:
 1:  92dc0bbc0d =  1:  7815d9929d pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
 2:  979276bc74 =  2:  629171115a pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
 3:  8f00493955 =  3:  d469c1d8f6 pack-bitmap-write.c: free existing bitmaps
 4:  bc7db926d8 =  4:  158ff797c4 Documentation: describe MIDX-based bitmaps
 -:  ---------- >  5:  5f24be8985 midx: disallow running outside of a repository
 -:  ---------- >  6:  0aacaa9283 midx: fix `*.rev` cleanups with `--object-dir`
 5:  771741844b !  7:  d30e6fe9a5 midx: clear auxiliary .rev after replacing the MIDX
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      
      	if (flags & MIDX_WRITE_REV_INDEX)
      		write_midx_reverse_index(midx_name, midx_hash, &ctx);
    --	clear_midx_files_ext(the_repository, ".rev", midx_hash);
    +-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
      
      	commit_lock_file(&lk);
      
    -+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
    ++	clear_midx_files_ext(object_dir, ".rev", midx_hash);
     +
      cleanup:
      	for (i = 0; i < ctx.nr; i++) {
 6:  dab5dbf228 =  8:  db2a24a8ae midx: reject empty `--preferred-pack`'s
 7:  31f4517de0 =  9:  059c583e34 midx: infer preferred pack when not given one
 8:  aa3bd96d9b = 10:  6f5ca446f3 midx: close linked MIDXs, avoid leaking memory
 9:  c9fea31fa8 ! 11:  4656608f73 midx: avoid opening multiple MIDXs when writing
    @@ Commit message
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## Documentation/git-multi-pack-index.txt ##
    +@@ Documentation/git-multi-pack-index.txt: OPTIONS
    + 	Use given directory for the location of Git objects. We check
    + 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
    + 	`<dir>/packs` for the pack-files to index.
    +++
    ++`<dir>` must be an alternate of the current repository.
    + 
    + --[no-]progress::
    + 	Turn progress on/off explicitly. If neither is specified, progress is
    +
      ## midx.c ##
     @@ midx.c: static int midx_checksum_valid(struct multi_pack_index *m)
      	return hashfile_checksum_valid(m->data, m->data_len);
10:  ee72fb7e38 = 12:  4c793df9d1 pack-bitmap.c: introduce 'bitmap_num_objects()'
11:  ede0bf1ce1 = 13:  9f165037ce pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
12:  df6844def0 = 14:  ba5fd71fb3 pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
13:  4e06f051a7 = 15:  06db8dbbc1 pack-bitmap.c: avoid redundant calls to try_partial_reuse
14:  a0d73eb3d3 = 16:  61798853b6 pack-bitmap: read multi-pack bitmaps
15:  9d83ad77ab ! 17:  4968229663 pack-bitmap: write multi-pack bitmaps
    @@ midx.c: static int write_midx_internal(const char *object_dir,
     +			 * corresponding bitmap (or one wasn't requested).
     +			 */
     +			if (!want_bitmap)
    -+				clear_midx_files_ext(the_repository, ".bitmap",
    ++				clear_midx_files_ext(object_dir, ".bitmap",
     +						     NULL);
     +			goto cleanup;
     +		}
    @@ midx.c: static int write_midx_internal(const char *object_dir,
     +		}
     +	}
     +
    -+	close_object_store(the_repository->objects);
    ++	if (ctx.m)
    ++		close_object_store(the_repository->objects);
      
      	commit_lock_file(&lk);
      
    -+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
    - 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
    ++	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
    + 	clear_midx_files_ext(object_dir, ".rev", midx_hash);
      
      cleanup:
     @@ midx.c: static int write_midx_internal(const char *object_dir,
    @@ midx.c: void clear_midx_file(struct repository *r)
      	if (remove_path(midx))
      		die(_("failed to clear multi-pack-index at %s"), midx);
      
    -+	clear_midx_files_ext(r, ".bitmap", NULL);
    - 	clear_midx_files_ext(r, ".rev", NULL);
    ++	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
    + 	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
      
      	free(midx);
     
16:  a92af89884 = 18:  5d60b07e2e t5310: move some tests to lib-bitmap.sh
17:  d47aa4a919 = 19:  1a9c3538db t/helper/test-read-midx.c: add --checksum mode
18:  9d9d9f28a6 = 20:  8895114ace t5326: test multi-pack bitmap behavior
19:  3e0da7e5ed = 21:  94b1317e0c t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
20:  4e0d49a2dd = 22:  a4f4d90bba t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
21:  47eba8ecf9 = 23:  92a6370e77 t5319: don't write MIDX bitmaps in t5319
22:  3d78afa2ad = 24:  c49dc46fb2 t7700: update to work with MIDX bitmap test knob
23:  c2f94e033d = 25:  44a4800756 midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
24:  6b03016c99 = 26:  bf0981b606 p5310: extract full and partial bitmap tests
25:  d98faa4c2c = 27:  6888fe01aa p5326: perf tests for MIDX bitmaps

Comments

Junio C Hamano Sept. 1, 2021, 6:07 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> Here is another version of the multi-pack reachability bitmaps series. It is
> virtually unchanged since last time.
>
> The changes that did occur is that I integrated Johannes' patch from [1] to fix
> cleaning up MIDX .rev and .bitmap files when using `--object-dir`. That inspired
> a lengthy discussion [2] about `--object-dir`, alternates, object-format and
> running the MIDX builtin outside of a Git repository.
>
> This series resolves that discussion by leaving everything as-is, and only
> changing the following:
>
>   - `git multi-pack-index` will not run when outside of a Git
>     repository.
>
>   - The `--object-dir` argument will only recognize object directories
>     belonging to an alternate of the current repository.
>
>   - Using `--object-dir` to point to a repository which uses a
>     different hash than the repository in the current working directory
>     will continue to not work (as was the case before this series).
>
> And because this incorporates [1], we will also not accidentally clean `.rev`
> files from the wrong object directory.
>
> I think that this version is ready-to-go, and that we can turn our attention to
> squashing some of these cross-alternate buglets, and integrating MIDX bitmaps
> with `git repack`.

Thanks.

>     +@@ Documentation/git-multi-pack-index.txt: OPTIONS
>     + 	Use given directory for the location of Git objects. We check
>     + 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
>     + 	`<dir>/packs` for the pack-files to index.
>     +++
>     ++`<dir>` must be an alternate of the current repository.

After replacing the previous round with this round and running "git
diff @{1}" on the branch, I noticed this documentation update, but
did't find any new code that tries to ensure that the requirement is
met.  It's a bit curious omission.

I think it is OK to allow running this command on <dir> and then add
it as a new alternate (iow, the <dir> being an alternate is not a
strict requirement for correct computation and writing of the midx,
even though it may be a requirement for correct use of the resulting
midx), so perhaps that is where the lack of validation comes from?

THanks.
Taylor Blau Sept. 1, 2021, 7:08 p.m. UTC | #2
On Wed, Sep 01, 2021 at 11:07:59AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> >     +@@ Documentation/git-multi-pack-index.txt: OPTIONS
> >     + 	Use given directory for the location of Git objects. We check
> >     + 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
> >     + 	`<dir>/packs` for the pack-files to index.
> >     +++
> >     ++`<dir>` must be an alternate of the current repository.
>
> After replacing the previous round with this round and running "git
> diff @{1}" on the branch, I noticed this documentation update, but
> did't find any new code that tries to ensure that the requirement is
> met.  It's a bit curious omission.
>
> I think it is OK to allow running this command on <dir> and then add
> it as a new alternate (iow, the <dir> being an alternate is not a
> strict requirement for correct computation and writing of the midx,
> even though it may be a requirement for correct use of the resulting
> midx), so perhaps that is where the lack of validation comes from?

I wasn't sure whether to include it or not, since we technically will
still write a MIDX in that object directory (alternate or not), but we
won't load up an existing MIDX that is already there to reference. So
we'll get the same result, just slower.

I'm comfortable with saying what's written in the documentation, since
even though it happens to work today, we should leave ourselves open to
not supporting directories which aren't alternates.

But I'm equally OK if you would rather drop this hunk from the
documentation when staging.

Thanks,
Taylor
Junio C Hamano Sept. 1, 2021, 7:23 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> I'm comfortable with saying what's written in the documentation, since
> even though it happens to work today, we should leave ourselves open to
> not supporting directories which aren't alternates.
>
> But I'm equally OK if you would rather drop this hunk from the
> documentation when staging.

Oh, no, don't get me wrong.  I am comfortable with the documented
limitation, as that is what the area experts have agreed that is
reasonable given the expected use case.

I however am much less comfortable with a documented limitation that
we make no attempt to enforce, and that is why the first thing I
looked for after seeing the documentation update was new code to
make sure we reject a random directory that is not our alternate
object store.

Thanks.
Taylor Blau Sept. 1, 2021, 8:34 p.m. UTC | #4
On Wed, Sep 01, 2021 at 12:23:26PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > I'm comfortable with saying what's written in the documentation, since
> > even though it happens to work today, we should leave ourselves open to
> > not supporting directories which aren't alternates.
> >
> > But I'm equally OK if you would rather drop this hunk from the
> > documentation when staging.
>
> Oh, no, don't get me wrong.  I am comfortable with the documented
> limitation, as that is what the area experts have agreed that is
> reasonable given the expected use case.
>
> I however am much less comfortable with a documented limitation that
> we make no attempt to enforce, and that is why the first thing I
> looked for after seeing the documentation update was new code to
> make sure we reject a random directory that is not our alternate
> object store.

Sure, I don't mind getting more strict here in this series. If you want,
the below could be queued instead of the original 11/27:

--- 8< ---

Subject: [PATCH] midx: avoid opening multiple MIDXs when writing

Opening multiple instance of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, for e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).

As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  2 ++
 builtin/commit-graph.c                 | 22 -------------------
 midx.c                                 | 29 ++++++++++++++++----------
 object-file.c                          | 21 +++++++++++++++++++
 object-store.h                         |  1 +
 t/t5319-multi-pack-index.sh            | 10 ++++++++-
 6 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index c9b063d31e..0af6beb2dd 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -23,6 +23,8 @@ OPTIONS
 	Use given directory for the location of Git objects. We check
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
 	`<dir>/packs` for the pack-files to index.
++
+`<dir>` must be an alternate of the current repository.

 --[no-]progress::
 	Turn progress on/off explicitly. If neither is specified, progress is
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index cd86315221..003eaaac5c 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -43,28 +43,6 @@ static struct opts_commit_graph {
 	int enable_changed_paths;
 } opts;

-static struct object_directory *find_odb(struct repository *r,
-					 const char *obj_dir)
-{
-	struct object_directory *odb;
-	char *obj_dir_real = real_pathdup(obj_dir, 1);
-	struct strbuf odb_path_real = STRBUF_INIT;
-
-	prepare_alt_odb(r);
-	for (odb = r->objects->odb; odb; odb = odb->next) {
-		strbuf_realpath(&odb_path_real, odb->path, 1);
-		if (!strcmp(obj_dir_real, odb_path_real.buf))
-			break;
-	}
-
-	free(obj_dir_real);
-	strbuf_release(&odb_path_real);
-
-	if (!odb)
-		die(_("could not find object directory matching %s"), obj_dir);
-	return odb;
-}
-
 static int graph_verify(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
diff --git a/midx.c b/midx.c
index e83f22b5ee..25906044ff 100644
--- a/midx.c
+++ b/midx.c
@@ -893,7 +893,7 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }

-static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
+static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       unsigned flags)
@@ -904,20 +904,26 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
+	struct multi_pack_index *cur;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
 	struct chunkfile *cf;

+	/* Ensure the given object_dir is local, or a known alternate. */
+	find_odb(the_repository, object_dir);
+
 	midx_name = get_midx_filename(object_dir);
 	if (safe_create_leading_directories(midx_name))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name);

-	if (m)
-		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
+		if (!strcmp(object_dir, cur->object_dir)) {
+			ctx.m = cur;
+			break;
+		}
+	}

 	if (ctx.m && !midx_checksum_valid(ctx.m)) {
 		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
@@ -1119,7 +1125,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));

 	if (ctx.m)
-		close_midx(ctx.m);
+		close_object_store(the_repository->objects);

 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1182,8 +1188,7 @@ int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
 		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   flags);
+	return write_midx_internal(object_dir, NULL, preferred_pack_name, flags);
 }

 struct clear_midx_data {
@@ -1461,8 +1466,10 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla

 	free(count);

-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
+	if (packs_to_drop.nr) {
+		result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags);
+		m = NULL;
+	}

 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1651,7 +1658,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}

-	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
+	result = write_midx_internal(object_dir, NULL, NULL, flags);
 	m = NULL;

 cleanup:
diff --git a/object-file.c b/object-file.c
index a8be899481..a4d720b4f5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -820,6 +820,27 @@ char *compute_alternate_path(const char *path, struct strbuf *err)
 	return ref_git;
 }

+struct object_directory *find_odb(struct repository *r, const char *obj_dir)
+{
+	struct object_directory *odb;
+	char *obj_dir_real = real_pathdup(obj_dir, 1);
+	struct strbuf odb_path_real = STRBUF_INIT;
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		strbuf_realpath(&odb_path_real, odb->path, 1);
+		if (!strcmp(obj_dir_real, odb_path_real.buf))
+			break;
+	}
+
+	free(obj_dir_real);
+	strbuf_release(&odb_path_real);
+
+	if (!odb)
+		die(_("could not find object directory matching %s"), obj_dir);
+	return odb;
+}
+
 static void fill_alternate_refs_command(struct child_process *cmd,
 					const char *repo_path)
 {
diff --git a/object-store.h b/object-store.h
index d24915ced1..250aa5f33c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -38,6 +38,7 @@ KHASH_INIT(odb_path_map, const char * /* key: odb_path */,

 void prepare_alt_odb(struct repository *r);
 char *compute_alternate_path(const char *path, struct strbuf *err);
+struct object_directory *find_odb(struct repository *r, const char *obj_dir);
 typedef int alt_odb_fn(struct object_directory *, void *);
 int foreach_alt_odb(alt_odb_fn, void*);
 typedef void alternate_ref_fn(const struct object_id *oid, void *);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d7e4988f2b..bd09c3194b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -582,7 +582,15 @@ test_expect_success 'force some 64-bit offsets with pack-objects' '
 	idx64=objects64/pack/test-64-$pack64.idx &&
 	chmod u+w $idx64 &&
 	corrupt_data $idx64 $(test_oid idxoff) "\02" &&
-	midx64=$(git multi-pack-index --object-dir=objects64 write) &&
+	# objects64 is not a real repository, but can serve as an alternate
+	# anyway so we can write a MIDX into it
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		( cd ../objects64 && pwd ) >.git/objects/info/alternates &&
+		midx64=$(git multi-pack-index --object-dir=../objects64 write)
+	) &&
 	midx_read_expect 1 63 5 objects64 " large-offsets"
 '

--
2.33.0.96.g73915697e6
Junio C Hamano Sept. 1, 2021, 8:49 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> Sure, I don't mind getting more strict here in this series. If you want,
> the below could be queued instead of the original 11/27:

That may make the documentation and the code more consistent.

> As far as I can tell, supporting arbitrary directories with
> `--object-dir` was a historical accident, since even the documentation
> says `<alt>` when referring to the value passed to this option.

The synopsis has [--object-dir=<dir>], which wants to be cleaned up
for consistency (or <alt> updated to <dir>, but I tend to agree with
you that unifying to <alt> may make our intention more clear).

It is unfortunate that "git multi-pack-index -h" says <file>, which
is probably doubly wrong.  It seems this is the only instance that
abuses OPT_FILENAME() for a non-file, so perhaps it is not too bad
to fix it using the lower-level OPTION_FILENAME (instead of adding
a one-off OPT_DIRECTORY_NAME() helper).

Neither is something that would block this step, of course.
Taylor Blau Sept. 1, 2021, 8:54 p.m. UTC | #6
On Wed, Sep 01, 2021 at 01:49:47PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Sure, I don't mind getting more strict here in this series. If you want,
> > the below could be queued instead of the original 11/27:
>
> That may make the documentation and the code more consistent.
>
> > As far as I can tell, supporting arbitrary directories with
> > `--object-dir` was a historical accident, since even the documentation
> > says `<alt>` when referring to the value passed to this option.
>
> The synopsis has [--object-dir=<dir>], which wants to be cleaned up
> for consistency (or <alt> updated to <dir>, but I tend to agree with
> you that unifying to <alt> may make our intention more clear).
>
> It is unfortunate that "git multi-pack-index -h" says <file>, which
> is probably doubly wrong.  It seems this is the only instance that
> abuses OPT_FILENAME() for a non-file, so perhaps it is not too bad
> to fix it using the lower-level OPTION_FILENAME (instead of adding
> a one-off OPT_DIRECTORY_NAME() helper).
>
> Neither is something that would block this step, of course.

I think there is definitely plenty of opportunity to clean all of this
up even more. But I don't think this already-long series is the place to
do it necessarily, since we don't want to let these last-minute (mostly)
cosmetic issues get in the way of this series as a whole.

Hopefully this v5 is at a point where we could start merging it down to
'next' and then address things like the helptext, `s/dir/alt` and so on.

Thanks,
Taylor
Jeff King Sept. 2, 2021, 9:38 a.m. UTC | #7
On Wed, Sep 01, 2021 at 04:34:01PM -0400, Taylor Blau wrote:

> > Oh, no, don't get me wrong.  I am comfortable with the documented
> > limitation, as that is what the area experts have agreed that is
> > reasonable given the expected use case.
> >
> > I however am much less comfortable with a documented limitation that
> > we make no attempt to enforce, and that is why the first thing I
> > looked for after seeing the documentation update was new code to
> > make sure we reject a random directory that is not our alternate
> > object store.
> 
> Sure, I don't mind getting more strict here in this series. If you want,
> the below could be queued instead of the original 11/27:
> 
> --- 8< ---
> 
> Subject: [PATCH] midx: avoid opening multiple MIDXs when writing

I think this is worth doing here, as part of this series.

Two observations (neither of which would lead to changing the patch):

> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index c9b063d31e..0af6beb2dd 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -23,6 +23,8 @@ OPTIONS
>  	Use given directory for the location of Git objects. We check
>  	`<dir>/packs/multi-pack-index` for the current MIDX file, and
>  	`<dir>/packs` for the pack-files to index.
> ++
> +`<dir>` must be an alternate of the current repository.

I wondered if this needed to say "must be the main object directory of
or an alternate of the current repository". But if you are intending to
operate in the main object directory, you would simply omit --object-dir
entirely. It is good that it will still work if you specified it
explicitly, but I don't think we need to clutter the documentation with
it.

> index cd86315221..003eaaac5c 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -43,28 +43,6 @@ static struct opts_commit_graph {
>  	int enable_changed_paths;
>  } opts;
> 
> -static struct object_directory *find_odb(struct repository *r,
> -					 const char *obj_dir)
> -{
> -	struct object_directory *odb;
> -	char *obj_dir_real = real_pathdup(obj_dir, 1);
> -	struct strbuf odb_path_real = STRBUF_INIT;
> -
> -	prepare_alt_odb(r);
> -	for (odb = r->objects->odb; odb; odb = odb->next) {
> -		strbuf_realpath(&odb_path_real, odb->path, 1);
> -		if (!strcmp(obj_dir_real, odb_path_real.buf))
> -			break;
> -	}
> -
> -	free(obj_dir_real);
> -	strbuf_release(&odb_path_real);
> -
> -	if (!odb)
> -		die(_("could not find object directory matching %s"), obj_dir);
> -	return odb;
> -}

Ah, right, commit-graph faces this same conundrum we've been discussing.
And it behaves in the way that we concluded:

  $ git init one
  $ git commit-graph write --object-dir $PWD/one/.git/objects
  fatal: not a git repository (or any of the parent directories): .git

  $ git init two
  $ git -C two commit-graph write --object-dir $PWD/one/.git/objects
  fatal: could not find object directory matching /home/peff/tmp/one/.git/objects

That gives me more confidence in the direction we decided on.

(Apologies if this was obvious to others, but I didn't see any mention
of commit-graph's similar option in the recent discussion).

-Peff
Jeff King Sept. 2, 2021, 9:40 a.m. UTC | #8
On Wed, Sep 01, 2021 at 01:49:47PM -0700, Junio C Hamano wrote:

> > As far as I can tell, supporting arbitrary directories with
> > `--object-dir` was a historical accident, since even the documentation
> > says `<alt>` when referring to the value passed to this option.
> 
> The synopsis has [--object-dir=<dir>], which wants to be cleaned up
> for consistency (or <alt> updated to <dir>, but I tend to agree with
> you that unifying to <alt> may make our intention more clear).
> 
> It is unfortunate that "git multi-pack-index -h" says <file>, which
> is probably doubly wrong.  It seems this is the only instance that
> abuses OPT_FILENAME() for a non-file, so perhaps it is not too bad
> to fix it using the lower-level OPTION_FILENAME (instead of adding
> a one-off OPT_DIRECTORY_NAME() helper).

That made me wonder what "git commit-graph -h" says. It says "<dir>"
(even though it already must be an alternate), because it uses
OPT_STRING().

I think using OPTION_FILENAME or similar is better there, too, though,
because it reinterprets the name after the repo-setup chdir() step. But
now you'd have to for an OPT_DIRNAME() helper. :)

> Neither is something that would block this step, of course.

Yeah, very much agreed that this can come later on top.

-Peff
Jeff King Sept. 2, 2021, 9:45 a.m. UTC | #9
On Tue, Aug 31, 2021 at 04:51:33PM -0400, Taylor Blau wrote:

> This series resolves that discussion by leaving everything as-is, and only
> changing the following:
> 
>   - `git multi-pack-index` will not run when outside of a Git
>     repository.
> 
>   - The `--object-dir` argument will only recognize object directories
>     belonging to an alternate of the current repository.
> 
>   - Using `--object-dir` to point to a repository which uses a
>     different hash than the repository in the current working directory
>     will continue to not work (as was the case before this series).
> 
> And because this incorporates [1], we will also not accidentally clean `.rev`
> files from the wrong object directory.

Thanks. I read over the new patches, and all looks good to me (using the
revised patch 11 you already sent, and which I commented on separately).

-Peff