diff mbox series

fixup! midx: implement writing incremental MIDX bitmaps

Message ID xmqqseuozg53.fsf_-_@gitster.g (mailing list archive)
State New
Headers show
Series fixup! midx: implement writing incremental MIDX bitmaps | expand

Commit Message

Junio C Hamano Aug. 28, 2024, 5:55 p.m. UTC
With -Wunused, the compiler notices that the midx_name parameter is
unused.  In this case, it is truly unused, the function signature is
not constrained externally, so we can simply drop the parameter from
the definition of the function and its sole caller.

This comes from 01a2cbab (midx: implement writing incremental MIDX
bitmaps, 2024-08-15), so I'll squash the following to that commit.

 midx-write.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Jeff King Aug. 28, 2024, 6:33 p.m. UTC | #1
On Wed, Aug 28, 2024 at 10:55:20AM -0700, Junio C Hamano wrote:

> With -Wunused, the compiler notices that the midx_name parameter is
> unused.  In this case, it is truly unused, the function signature is
> not constrained externally, so we can simply drop the parameter from
> the definition of the function and its sole caller.
> 
> This comes from 01a2cbab (midx: implement writing incremental MIDX
> bitmaps, 2024-08-15), so I'll squash the following to that commit.

Well, that didn't take long for this to come up again. :) I've been
fixing them progressively as they hit 'next' (since I don't usually
build 'seen'), but this one isn't there yet.

I'm always curious in these cases why we have the parameter at all if
it's unnecessary (i.e., is it a bug or leftover cruft). In this case, it
was present before that commit, but refactoring meant that we no longer
write to $name-$hash.bitmap, but instead use get_midx_filename_ext(), or
get_split_midx_filename_ext() in incremental mode.

Is that right, though? It looks like the caller might pass in a
tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
if we're in incremental mode. But we'll write directly to
"multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
what degree it matters, since that's the name we want in the long run.
But would we possibly overwrite an active-in-use file rather than doing
the atomic rename-into-place if we happened to generate the same midx?

It feels like we should still respect the name the caller is using for
tempfiles, and then rename it into the correct spot at the end.

-Peff
Taylor Blau Aug. 29, 2024, 6:57 p.m. UTC | #2
On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote:
> Is that right, though? It looks like the caller might pass in a
> tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
> if we're in incremental mode. But we'll write directly to
> "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
> what degree it matters, since that's the name we want in the long run.
> But would we possibly overwrite an active-in-use file rather than doing
> the atomic rename-into-place if we happened to generate the same midx?
>
> It feels like we should still respect the name the caller is using for
> tempfiles, and then rename it into the correct spot at the end.

In either case, we're going to write to a temporary file initialized by
the pack-bitmap machinery and then rename() it into place at the end of
bitmap_writer_finish().

On the caller side, in the non-incremental mode, we'll pass
$GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location,
write its contents into a temporary file, and then rename() it there.

But in the incremental mode this series introduces, I think it would be
a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move
it from tmp_midx_XXXXX-$HASH.bitmap into its final location.

So I think what's written here with the fixup! patch is right (and
should be squashed into 13/13 in the next round), but let me know if I'm
missing something.

Thanks,
Taylor
Jeff King Aug. 29, 2024, 7:27 p.m. UTC | #3
On Thu, Aug 29, 2024 at 02:57:08PM -0400, Taylor Blau wrote:

> On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote:
> > Is that right, though? It looks like the caller might pass in a
> > tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
> > if we're in incremental mode. But we'll write directly to
> > "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
> > what degree it matters, since that's the name we want in the long run.
> > But would we possibly overwrite an active-in-use file rather than doing
> > the atomic rename-into-place if we happened to generate the same midx?
> >
> > It feels like we should still respect the name the caller is using for
> > tempfiles, and then rename it into the correct spot at the end.
> 
> In either case, we're going to write to a temporary file initialized by
> the pack-bitmap machinery and then rename() it into place at the end of
> bitmap_writer_finish().

OK, that addresses my worry, if we're always writing to a tempfile (and
I verified with some recent stracing that this is the case). So renaming
that into tmp_midx_XXXXXX.bitmap would just be a pointless extra layer
of renames.

I do wonder if it's possible for us to generate a new different revindex
and bitmap pair for the same midx hash, and for a reader to see a
mismatched set for a moment. But that's an atomicity problem, and an
extra layer of renames is not going to solve that.

> On the caller side, in the non-incremental mode, we'll pass
> $GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location,
> write its contents into a temporary file, and then rename() it there.
> 
> But in the incremental mode this series introduces, I think it would be
> a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move
> it from tmp_midx_XXXXX-$HASH.bitmap into its final location.
> 
> So I think what's written here with the fixup! patch is right (and
> should be squashed into 13/13 in the next round), but let me know if I'm
> missing something.

What confused me is that write_midx_reverse_index() _does_ still take
midx_name, and respects it. But I think that is a bug!

We do not usually even call that function, since modern midx's have a
RIDX chunk inside them instead of a separate file. But if you do this:

  # generate an extra pack
  git commit --allow-empty -m foo
  git repack -d

  # make an incremental midx with a .rev file; usually this ends up
  # as a RIDX chunk, so we have to force it.
  GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --incremental --bitmap

then you'll end up with a tmp_midx_XXXXXX-*.rev file leftover in
multi-pack-index.d (since, as you note, nobody is moving those into
place).

So probably write_midx_reverse_index() needs the same treatment to
derive its own filenames for the incremental case, and to drop the
midx_name parameter.

Or I wonder if we could simply drop the code to write a separate .rev
file entirely? I don't think there's a reason anybody would want it.

-Peff
diff mbox series

Patch

diff --git c/midx-write.c w/midx-write.c
index bac3b0589a..0ad9139fdb 100644
--- c/midx-write.c
+++ w/midx-write.c
@@ -827,7 +827,7 @@  static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
 }
 
 static int write_midx_bitmap(struct write_midx_context *ctx,
-			     const char *object_dir, const char *midx_name,
+			     const char *object_dir,
 			     const unsigned char *midx_hash,
 			     struct packing_data *pdata,
 			     struct commit **commits,
@@ -1415,7 +1415,7 @@  static int write_midx_internal(const char *object_dir,
 		FREE_AND_NULL(ctx.entries);
 		ctx.entries_nr = 0;
 
-		if (write_midx_bitmap(&ctx, object_dir, midx_name.buf,
+		if (write_midx_bitmap(&ctx, object_dir,
 				      midx_hash, &pdata, commits, commits_nr,
 				      flags) < 0) {
 			error(_("could not write multi-pack bitmap"));