Message ID | 7d20c13f8b48d2aef45c2c8c40efb6ecdb865aa8.1641320129.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Series | midx: prevent bitmap corruption when permuting pack order |
Taylor Blau <me@ttaylorr.com> writes:

> ... It's likely we were using
> finalize_object_file() instead of a pure rename() because the former
> also adjusts shared permissions.

I thought the primary reason why we use finalize was because we
ignore EEXIST (and the assumption is that the files with the same
contents get the same name computed from their contents).

> 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
> 					midx_hash, WRITE_REV);
>
> -	if (finalize_object_file(tmp_file, buf.buf))
> +	if (rename(tmp_file, buf.buf))
> 		die(_("cannot store reverse index file"));

Doesn't your new code die with it if buf.buf names an existing file?
Junio C Hamano <gitster@pobox.com> writes:

> Taylor Blau <me@ttaylorr.com> writes:
>
>> ... It's likely we were using
>> finalize_object_file() instead of a pure rename() because the former
>> also adjusts shared permissions.
>
> I thought the primary reason why we use finalize was because we
> ignore EEXIST (and the assumption is that the files with the same
> contents get the same name computed from their contents).
>
>> 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
>> 					midx_hash, WRITE_REV);
>>
>> -	if (finalize_object_file(tmp_file, buf.buf))
>> +	if (rename(tmp_file, buf.buf))
>> 		die(_("cannot store reverse index file"));
>
> Doesn't your new code die with it if buf.buf names an existing file?

Ah, scratch that. rename() discards the old one atomically, so as
long as tmp_file and buf.buf are in the same directory (which I
think it is in this case), we wouldn't be affected by the bug that
is worked around with "Coda hack" in finalize_object_file(), either.
On Fri, Jan 14, 2022 at 01:43:55PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> ... It's likely we were using
> >> finalize_object_file() instead of a pure rename() because the former
> >> also adjusts shared permissions.
> >
> > I thought the primary reason why we use finalize was because we
> > ignore EEXIST (and the assumption is that the files with the same
> > contents get the same name computed from their contents).
> >
> >> 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
> >> 					midx_hash, WRITE_REV);
> >>
> >> -	if (finalize_object_file(tmp_file, buf.buf))
> >> +	if (rename(tmp_file, buf.buf))
> >> 		die(_("cannot store reverse index file"));
> >
> > Doesn't your new code die with it if buf.buf names an existing file?
>
> Ah, scratch that. rename() discards the old one atomically, so as
> long as tmp_file and buf.buf are in the same directory (which I
> think it is in this case), we wouldn't be affected by the bug that
> is worked around with "Coda hack" in finalize_object_file(), either.

Exactly. In this case, we really did want to overwrite an existing .rev
file with the same name. That's because prior to this patch, we didn't
store the object order in the MIDX itself. That made it possible for us
to change the object order, but leave the MIDX checksum alone. Then we'd
keep the old .rev (with the old order, but the same name) in place, if
we used link(2) in order to try (and fail) to overwrite the existing
.rev with the new one.

Indeed, the temporary file created by write_rev_file_order() is written
in the pack directory, which is where the .rev file and MIDX will
ultimately live.

This code is going to be hidden behind a test knob in a few patches
anyway, but verifying all of this is good nonetheless (especially if you
just want to apply these first two patches into the v2.35 tree).

Thanks,
Taylor
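The difference between link(2) and rename(2) that this exchange turns on can be reproduced outside of git. The following is a minimal sketch, not git code: the helper names (`write_file`, `read_file`, `install_with_link`, `install_with_rename`) are hypothetical, and only the system calls themselves are real. It assumes a POSIX system and that both paths live in the same directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path`, truncating any old file. */
static int write_file(const char *path, const char *text)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	size_t len = strlen(text);
	int ok;

	if (fd < 0)
		return -1;
	ok = write(fd, text, len) == (ssize_t)len;
	if (close(fd) < 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: read up to size-1 bytes and NUL-terminate. */
static int read_file(const char *path, char *buf, size_t size)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = read(fd, buf, size - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/*
 * Install `tmp` as `dest` with link(2), then drop the temporary name.
 * Returns 0 on success or the errno reported by link(2). An existing
 * `dest` is never replaced: link(2) fails with EEXIST instead.
 */
static int install_with_link(const char *tmp, const char *dest)
{
	int err = link(tmp, dest) ? errno : 0;

	unlink(tmp);
	return err;
}

/*
 * Install `tmp` as `dest` with rename(2). Returns 0 on success or the
 * errno reported by rename(2). An existing `dest` is replaced atomically.
 */
static int install_with_rename(const char *tmp, const char *dest)
{
	return rename(tmp, dest) ? errno : 0;
}
```

If a file holding the old order already exists at the destination, `install_with_link()` should report `EEXIST` and leave the old contents readable through `read_file()`, which is the stale .rev scenario described above. `install_with_rename()` on the same pair of paths should leave the new contents in place.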
Taylor Blau <me@ttaylorr.com> writes:

> On Fri, Jan 14, 2022 at 01:43:55PM -0800, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>> > Taylor Blau <me@ttaylorr.com> writes:
>> >
>> >> ... It's likely we were using
>> >> finalize_object_file() instead of a pure rename() because the former
>> >> also adjusts shared permissions.
>> >
>> > I thought the primary reason why we use finalize was because we
>> > ignore EEXIST (and the assumption is that the files with the same
>> > contents get the same name computed from their contents).
>> >
>> >> 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
>> >> 					midx_hash, WRITE_REV);
>> >>
>> >> -	if (finalize_object_file(tmp_file, buf.buf))
>> >> +	if (rename(tmp_file, buf.buf))
>> >> 		die(_("cannot store reverse index file"));
>> >
>> > Doesn't your new code die with it if buf.buf names an existing file?
>>
>> Ah, scratch that. rename() discards the old one atomically, so as
>> long as tmp_file and buf.buf are in the same directory (which I
>> think it is in this case), we wouldn't be affected by the bug that
>> is worked around with "Coda hack" in finalize_object_file(), either.
>
> Exactly. In this case, we really did want to overwrite an existing .rev
> file with the same name. That's because prior to this patch, we didn't
> store the object order in the MIDX itself. That made it possible for us
> to change the object order, but leave the MIDX checksum alone.

The other change in this step is addition of a new chunk type that
records the revindex data. With that, you are guaranteeing that a
new file with the same checksum as an old one must have the
byte-for-byte identical contents, so at that point, it is OK to use
finalize_object_file() and not use rename() to keep the old file,
no?

If that is the case, we may rather want to use f-o-f consistently
for stuff inside .git/ directory whose filename is tied with its
contents.

I do not think we want to chase a bug that comes from difference
between link-then-unlink vs rename, which affects loose object files
in one way and midx file in another way.
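The argument above is that ignoring EEXIST is harmless whenever a file's name is derived from its contents, because an existing file of the same name must already hold the same bytes. A simplified sketch of that link-then-unlink pattern follows. It is loosely modeled on the behavior the thread attributes to finalize_object_file(), but it is not git's implementation: the function name `finalize_content_named_file` is hypothetical, and permission handling and error reporting are omitted.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: move `tmp` to `dest` without ever replacing an
 * existing `dest`. Returns 0 on success (including the case where
 * `dest` already exists) and -1 on any other failure.
 */
static int finalize_content_named_file(const char *tmp, const char *dest)
{
	int err = 0;

	if (link(tmp, dest))
		err = errno;

	/*
	 * Some filesystems refuse hard links; fall back to rename(2) for
	 * any failure other than "destination already exists".
	 */
	if (err && err != EEXIST) {
		if (!rename(tmp, dest))
			return 0;
		err = errno;
	}

	/* The temporary name is no longer needed either way. */
	unlink(tmp);

	/*
	 * EEXIST is treated as success: this is only safe when equal
	 * names imply equal contents.
	 */
	return (err && err != EEXIST) ? -1 : 0;
}
```

Under this pattern a second call with the same destination reports success while leaving the first file untouched. That is the desired outcome for content-named files, and it is the behavior that went wrong for the .rev file while its name did not yet depend on the object order.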
Taylor Blau <me@ttaylorr.com> writes:

> The previous patch demonstrates a bug where a MIDX's auxiliary object
> order can become out of sync with a MIDX bitmap.
>
> This is because of two confounding factors:
>
>   - First, the object order is stored in a file which is named according
>     to the multi-pack index's checksum, and the MIDX does not store the
>     object order. This means that the object order can change without
>     altering the checksum.
>
>   - But the .rev file is moved into place with finalize_object_file(),
>     which link(2)'s the file into place instead of renaming it. For us,
>     that means that a modified .rev file will not be moved into place if
>     MIDX's checksum was unchanged.
>
> The fix here is two-fold. First, we need to stop linking the file into
> place and instead rename it. It's likely we were using
> finalize_object_file() instead of a pure rename() because the former
> also adjusts shared permissions. But that is unnecessary, because we
> already do so in write_rev_file_order(), so rename alone is safe.
>
> But we also need to make the MIDX's checksum change in some way when the
> preferred pack changes without altering the set of packs stored in a
> MIDX to prevent a race where the new .rev file is moved into place
> before the MIDX is updated. Here, you'd get the opposite effect: reading
> old bitmaps with the new object order.

I think the main issue is the first confounding factor you listed above:
even if we didn't have the other confounding factor, that issue alone is
enough to motivate the entire patch set. Likewise, as Junio said [1], I
don't think we need to switch to rename() if we make the checksum
different, so the fix is one-fold, not two-fold. For what it's worth, I
switched back to finalize_object_file() and ran the tests, and they all
pass.

So I would simplify the commit message to just talk about the checksum
issue. (This is definitely not a blocker for merging, though - others
might find the additional context helpful.)

The code up to this patch (apart from the rename()) looks good.

[1] https://lore.kernel.org/git/xmqqtue54iop.fsf@gitster.g/
On Thu, Jan 20, 2022 at 10:08:43AM -0800, Jonathan Tan wrote:
> I think the main issue is the first confounding factor you listed above:
> even if we didn't have the other confounding factor, that issue alone is
> enough to motivate the entire patch set. Likewise, as Junio said [1], I
> don't think we need to switch to rename() if we make the checksum
> different, so the fix is one-fold, not two-fold. For what it's worth, I
> switched back to finalize_object_file() and ran the tests, and they all
> pass.

Yeah, I agree with both of you and would rather use
finalize_object_file() here and be consistent with other callers that
modify $GIT_DIR/objects.

It is safe to do, since the .rev file's name will necessarily be
different if the MIDX's object order changes.

> [1] https://lore.kernel.org/git/xmqqtue54iop.fsf@gitster.g/

Thanks,
Taylor
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index b39c69da8c..f2221d2b44 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -24,6 +24,7 @@ and their offsets into multiple packfiles. It contains:
   ** An offset within the jth packfile for the object.
 * If large offsets are required, we use another list of large
   offsets similar to version 2 pack-indexes.
+- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
 
 Thus, we can provide O(log N) lookup time for any number
 of packfiles.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8d2f42f29e..6d3efb7d16 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -376,6 +376,11 @@ CHUNK DATA:
 	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
 	    8-byte offsets into large packfiles.
 
+	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
+	    A list of MIDX positions (one per object in the MIDX, num_objects in
+	    total, each a 4-byte unsigned integer in network byte order), sorted
+	    according to their relative bitmap/pseudo-pack positions.
+
 TRAILER:
 
 	Index checksum of the above contents.
@@ -456,9 +461,5 @@ In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
 objects in packs stored by the MIDX, laid out in pack order, and the packs
 arranged in MIDX order (with the preferred pack coming first).
 
-Finally, note that the MIDX's reverse index is not stored as a chunk in
-the multi-pack-index itself. This is done because the reverse index
-includes the checksum of the pack or MIDX to which it belongs, which
-makes it impossible to write in the MIDX. To avoid races when rewriting
-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
-filename (e.g., `multi-pack-index-xyz.rev`).
+The MIDX's reverse index is stored in the optional 'RIDX' chunk within
+the MIDX itself.
diff --git a/midx.c b/midx.c
index 837b46b2af..d3179e9c02 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,7 @@
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@@ -833,6 +834,18 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int write_midx_revindex(struct hashfile *f,
+			       void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
 struct midx_pack_order_data {
 	uint32_t nr;
 	uint32_t pack;
@@ -891,7 +904,7 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
 					midx_hash, WRITE_REV);
 
-	if (finalize_object_file(tmp_file, buf.buf))
+	if (rename(tmp_file, buf.buf))
 		die(_("cannot store reverse index file"));
 
 	strbuf_release(&buf);
@@ -1403,15 +1416,19 @@ static int write_midx_internal(const char *object_dir,
 			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
 			write_midx_large_offsets);
 
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
+			  ctx.entries_nr * sizeof(uint32_t),
+			  write_midx_revindex);
+	}
+
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
-		ctx.pack_order = midx_pack_order(&ctx);
-
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 0ca2868b0b..353282310d 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,7 +395,7 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
-test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
The previous patch demonstrates a bug where a MIDX's auxiliary object
order can become out of sync with a MIDX bitmap.

This is because of two confounding factors:

  - First, the object order is stored in a file which is named according
    to the multi-pack index's checksum, and the MIDX does not store the
    object order. This means that the object order can change without
    altering the checksum.

  - But the .rev file is moved into place with finalize_object_file(),
    which link(2)'s the file into place instead of renaming it. For us,
    that means that a modified .rev file will not be moved into place if
    MIDX's checksum was unchanged.

The fix here is two-fold. First, we need to stop linking the file into
place and instead rename it. It's likely we were using
finalize_object_file() instead of a pure rename() because the former
also adjusts shared permissions. But that is unnecessary, because we
already do so in write_rev_file_order(), so rename alone is safe.

But we also need to make the MIDX's checksum change in some way when the
preferred pack changes without altering the set of packs stored in a
MIDX to prevent a race where the new .rev file is moved into place
before the MIDX is updated. Here, you'd get the opposite effect: reading
old bitmaps with the new object order.

But this race bites us even here: suppose that we didn't change the MIDX
checksum, but only renamed the auxiliary object order into place instead
of hardlinking it. Then when we go to generate the new bitmap, we'll
load the old MIDX bitmap, along with the MIDX that it references. That's
fine, since the new MIDX isn't moved into place until after the new
bitmap is generated. But the new object order *has* been moved into
place. So we'll read the old bitmaps in the new order when generating
the new bitmap file, meaning that without this secondary change, bitmap
generation itself would become a victim of the race described here.

This can all be prevented by forcing the MIDX's checksum to change when
the object order changes. We could include the entire object order in
the MIDX, but doing so is somewhat awkward. (For example, the code that
writes a .rev file expects to know the checksum of the associated pack
or MIDX, but writing that data into the MIDX itself makes that a
circular dependency).

Instead, make the object order used during bitmap generation part of the
MIDX itself. That means that the new test in t5326 will cause the MIDX's
checksum to update, preventing the stale read problem.

In theory, it is possible to store a "fingerprint" of the full object
order here, so long as that fingerprint changes at least as often as the
full object order does. Some possibilities here include storing the
identity of the preferred pack, along with the mtimes of the
non-preferred packs in a consistent order. But storing a limited part of
the information makes it difficult to reason about whether or not there
are gaps between the two that would cause us to get bitten by this bug
again.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt |  1 +
 Documentation/technical/pack-format.txt      | 13 +++++-----
 midx.c                                       | 25 ++++++++++++++++----
 t/t5326-multi-pack-bitmaps.sh                |  2 +-
 4 files changed, 30 insertions(+), 11 deletions(-)
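To illustrate why embedding the object order forces the checksum to change, here is a small standalone sketch of the chunk layout described in the pack-format.txt hunk: one 4-byte unsigned integer per object, in network byte order. Everything in it is an assumption made for illustration. The names `put_be32`, `get_be32`, `encode_order_chunk`, and `toy_checksum` are hypothetical, and the FNV-1a hash is only a stand-in for the MIDX trailer checksum, which git computes with its own hashfile machinery.

```c
#include <stddef.h>
#include <stdint.h>

/* Store `v` at out[0..3] in network (big-endian) byte order. */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value from in[0..3]. */
static uint32_t get_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/*
 * Serialize `nr` MIDX positions, one 4-byte big-endian integer each.
 * `out` must have room for nr * 4 bytes. Returns the bytes written.
 */
static size_t encode_order_chunk(unsigned char *out,
				 const uint32_t *order, uint32_t nr)
{
	uint32_t i;

	for (i = 0; i < nr; i++)
		put_be32(out + 4 * (size_t)i, order[i]);
	return 4 * (size_t)nr;
}

/*
 * 64-bit FNV-1a over `len` bytes. A toy stand-in for the trailer
 * checksum, used only to show that different bytes hash differently.
 */
static uint64_t toy_checksum(const unsigned char *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}
```

Encoding the same two positions as `{0, 1}` and as `{1, 0}` produces different byte sequences, so a checksum computed over data that includes the chunk differs between the two orders. That is the property the patch relies on: once the order is part of the hashed contents, a changed order cannot hide behind an unchanged checksum.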