Message ID | xmqqseuozg53.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fixup! midx: implement writing incremental MIDX bitmaps | expand |
On Wed, Aug 28, 2024 at 10:55:20AM -0700, Junio C Hamano wrote: > With -Wunused, the compiler notices that the midx_name parameter is > unused. In this case, it is truly unused, the function signature is > not constrained externally, so we can simply drop the parameter from > the definition of the function and its sole caller. > > This comes from 01a2cbab (midx: implement writing incremental MIDX > bitmaps, 2024-08-15), so I'll squash the following to that commit. Well, that didn't take long for this to come up again. :) I've been fixing them progressively as they hit 'next' (since I don't usually build 'seen'), but this one isn't there yet. I'm always curious in these cases why we have the parameter at all if it's unnecessary (i.e., is it a bug or leftover cruft). In this case, it was present before that commit, but refactoring meant that we no longer write to $name-$hash.bitmap, but instead use get_midx_filename_ext(), or get_split_midx_filename_ext() in incremental mode. Is that right, though? It looks like the caller might pass in a tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX, if we're in incremental mode. But we'll write directly to "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to what degree it matters, since that's the name we want in the long run. But would we possibly overwrite an active-in-use file rather than doing the atomic rename-into-place if we happened to generate the same midx? It feels like we should still respect the name the caller is using for tempfiles, and then rename it into the correct spot at the end. -Peff
On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote: > Is that right, though? It looks like the caller might pass in a > tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX, > if we're in incremental mode. But we'll write directly to > "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to > what degree it matters, since that's the name we want in the long run. > But would we possibly overwrite an active-in-use file rather than doing > the atomic rename-into-place if we happened to generate the same midx? > > It feels like we should still respect the name the caller is using for > tempfiles, and then rename it into the correct spot at the end. In either case, we're going to write to a temporary file initialized by the pack-bitmap machinery and then rename() it into place at the end of bitmap_writer_finish(). On the caller side, in the non-incremental mode, we'll pass $GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location, write its contents into a temporary file, and then rename() it there. But in the incremental mode this series introduces, I think it would be a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move it from tmp_midx_XXXXX-$HASH.bitmap into its final location. So I think what's written here with the fixup! patch is right (and should be squashed into 13/13 in the next round), but let me know if I'm missing something. Thanks, Taylor
On Thu, Aug 29, 2024 at 02:57:08PM -0400, Taylor Blau wrote: > On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote: > > Is that right, though? It looks like the caller might pass in a > > tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX, > > if we're in incremental mode. But we'll write directly to > > "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to > > what degree it matters, since that's the name we want in the long run. > > But would we possibly overwrite an active-in-use file rather than doing > > the atomic rename-into-place if we happened to generate the same midx? > > > > It feels like we should still respect the name the caller is using for > > tempfiles, and then rename it into the correct spot at the end. > > In either case, we're going to write to a temporary file initialized by > the pack-bitmap machinery and then rename() it into place at the end of > bitmap_writer_finish(). OK, that addresses my worry, if we're always writing to a tempfile (and I verified with some recent stracing that this is the case). So renaming that into tmp_midx_XXXXXX.bitmap would just be a pointless extra layer of renames. I do wonder if it's possible for us to generate a new different revindex and bitmap pair for the same midx hash, and for a reader to see a mismatched set for a moment. But that's an atomicity problem, and an extra layer of renames is not going to solve that. > On the caller side, in the non-incremental mode, we'll pass > $GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location, > write its contents into a temporary file, and then rename() it there. > > But in the incremental mode this series introduces, I think it would be > a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move > it from tmp_midx_XXXXX-$HASH.bitmap into its final location. > > So I think what's written here with the fixup! patch is right (and > should be squashed into 13/13 in the next round), but let me know if I'm > missing something. What confused me is that write_midx_reverse_index() _does_ still take midx_name, and respects it. But I think that is a bug! We do not usually even call that function, since modern midx's have a RIDX chunk inside them instead of a separate file. But if you do this: # generate an extra pack git commit --allow-empty -m foo git repack -d # make an incremental midx with a .rev file; usually this ends up # as a RIDX chunk, so we have to force it. GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --incremental --bitmap then you'll end up with a tmp_midx_XXXXXX-*.rev file leftover in multi-pack-index.d (since, as you note, nobody is moving those into place). So probably write_midx_reverse_index() needs the same treatment to derive its own filenames for the incremental case, and to drop the midx_name parameter. Or I wonder if we could simply drop the code to write a separate .rev file entirely? I don't think there's a reason anybody would want it. -Peff
diff --git c/midx-write.c w/midx-write.c index bac3b0589a..0ad9139fdb 100644 --- c/midx-write.c +++ w/midx-write.c @@ -827,7 +827,7 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr } static int write_midx_bitmap(struct write_midx_context *ctx, - const char *object_dir, const char *midx_name, + const char *object_dir, const unsigned char *midx_hash, struct packing_data *pdata, struct commit **commits, @@ -1415,7 +1415,7 @@ static int write_midx_internal(const char *object_dir, FREE_AND_NULL(ctx.entries); ctx.entries_nr = 0; - if (write_midx_bitmap(&ctx, object_dir, midx_name.buf, + if (write_midx_bitmap(&ctx, object_dir, midx_hash, &pdata, commits, commits_nr, flags) < 0) { error(_("could not write multi-pack bitmap"));