Message ID | 20190404232546.GD21839@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | a rabbit hole of update-server-info fixes | expand |
Am 05.04.2019 um 01:25 schrieb Jeff King: > When we have a .midx that covers many packfiles, we try to avoid opening > the .idx for those packfiles. However, there are a few problems with the > filename comparison we use: > > - we ask midx_contains_pack() about the .pack name, not the .idx name. > But it compares to the latter. > > - we compute the basename of the pack using strrchr() to find the > final slash. But that leaves an extra "/" at the start of our > string; we need to advance past it. > > That also raises the question of what to do when the name does not > have a slash at all. This should generally not happen (we always > find files in "pack/"), but it doesn't hurt to be defensive here. > > The tests don't notice because there's nothing about opening those .idx > files that would cause us to give incorrect output. It's just a little > slower. The new test checks this case by corrupting the covered .idx, > and then making sure we don't complain about it. > > Signed-off-by: Jeff King <peff@peff.net> > --- > packfile.c | 17 ++++++++++++++--- > t/t5319-multi-pack-index.sh | 14 ++++++++++++++ > 2 files changed, 28 insertions(+), 3 deletions(-) > > diff --git a/packfile.c b/packfile.c > index 054269ae5d..e7ca135ed5 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -486,15 +496,16 @@ static int open_packed_git_1(struct packed_git *p) > ssize_t read_result; > const unsigned hashsz = the_hash_algo->rawsz; > > - if (!p->index_data) { > + if (!p->index_data && the_repository->objects->multi_pack_index) { So if there is no multi_pack_index, we skip this block now... > struct multi_pack_index *m; > - const char *pack_name = strrchr(p->pack_name, '/'); > + char *idx_name = pack_name_to_idx(pack_basename(p)); > > for (m = the_repository->objects->multi_pack_index; > m; m = m->next) { > - if (midx_contains_pack(m, pack_name)) > + if (midx_contains_pack(m, idx_name)) > break; > } > + free(idx_name); > > if (!m && open_pack_index(p)) > return error("packfile %s index unavailable", p->pack_name); ... which also means this open_pack_index() call isn't done anymore if there's no .midx file at all. You don't mention this change in the commit message; is it intended? And I wonder if it would be easier overall to let midx_contains_pack() accept .pack names in addition to .idx names. Perhaps with something like this? int cmp_idx_or_pack_name(const char *idx_or_pack_name, const char *idx_name) { while (*idx_name && *idx_name == *idx_or_pack_name) { idx_name++; idx_or_pack_name++; } if (!strcmp(idx_name, ".idx") && !strcmp(idx_or_pack_name, ".pack")) return 0; return strcmp(idx_or_pack_name, idx_name); } René
On Thu, Apr 04, 2019 at 07:25:46PM -0400, Jeff King wrote: > When we have a .midx that covers many packfiles, we try to avoid opening > the .idx for those packfiles. However, there are a few problems with the > filename comparison we use: > > - we ask midx_contains_pack() about the .pack name, not the .idx name. > But it compares to the latter. > > - we compute the basename of the pack using strrchr() to find the > final slash. But that leaves an extra "/" at the start of our > string; we need to advance past it. > > That also raises the question of what to do when the name does not > have a slash at all. This should generally not happen (we always > find files in "pack/"), but it doesn't hurt to be defensive here. > > The tests don't notice because there's nothing about opening those .idx > files that would cause us to give incorrect output. It's just a little > slower. The new test checks this case by corrupting the covered .idx, > and then making sure we don't complain about it. When the test suite is run with GIT_TEST_MULTI_PACK_INDEX=1, the following tests fail with this patch: t1006-cat-file.sh (Wstat: 256 Tests: 111 Failed: 1) Failed test: 87 Non-zero exit status: 1 t5320-delta-islands.sh (Wstat: 256 Tests: 15 Failed: 10) Failed tests: 2-4, 6-10, 12-13 Non-zero exit status: 1 t5310-pack-bitmaps.sh (Wstat: 256 Tests: 46 Failed: 1) Failed test: 44 Non-zero exit status: 1 t5570-git-daemon.sh (Wstat: 256 Tests: 21 Failed: 1) Failed test: 10 Non-zero exit status: 1 > Signed-off-by: Jeff King <peff@peff.net> > --- > packfile.c | 17 ++++++++++++++--- > t/t5319-multi-pack-index.sh | 14 ++++++++++++++ > 2 files changed, 28 insertions(+), 3 deletions(-) > > diff --git a/packfile.c b/packfile.c > index 054269ae5d..e7ca135ed5 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -472,6 +472,16 @@ static unsigned int get_max_fd_limit(void) > #endif > } > > +static const char *pack_basename(struct packed_git *p) > +{ > + const char *ret = strrchr(p->pack_name, '/'); > + if (ret) > + ret = ret + 1; /* skip past slash */ > + else > + ret = p->pack_name; /* we only have a base */ > + return ret; > +} > + > /* > * Do not call this directly as this leaks p->pack_fd on error return; > * call open_packed_git() instead. > @@ -486,15 +496,16 @@ static int open_packed_git_1(struct packed_git *p) > ssize_t read_result; > const unsigned hashsz = the_hash_algo->rawsz; > > - if (!p->index_data) { > + if (!p->index_data && the_repository->objects->multi_pack_index) { > struct multi_pack_index *m; > - const char *pack_name = strrchr(p->pack_name, '/'); > + char *idx_name = pack_name_to_idx(pack_basename(p)); > > for (m = the_repository->objects->multi_pack_index; > m; m = m->next) { > - if (midx_contains_pack(m, pack_name)) > + if (midx_contains_pack(m, idx_name)) > break; > } > + free(idx_name); > > if (!m && open_pack_index(p)) > return error("packfile %s index unavailable", p->pack_name); > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh > index 8c4d2bd849..1ebf19ec3c 100755 > --- a/t/t5319-multi-pack-index.sh > +++ b/t/t5319-multi-pack-index.sh > @@ -117,6 +117,20 @@ test_expect_success 'write midx with one v2 pack' ' > > compare_results_with_midx "one v2 pack" > > +test_expect_success 'corrupt idx not opened' ' > + idx=$(test-tool read-midx $objdir | grep "\.idx\$") && > + mv $objdir/pack/$idx backup-$idx && > + test_when_finished "mv backup-\$idx \$objdir/pack/\$idx" && > + > + # This is the minimum size for a sha-1 based .idx; this lets > + # us pass perfunctory tests, but anything that actually opens and reads > + # the idx file will complain. > + test_copy_bytes 1064 <backup-$idx >$objdir/pack/$idx && > + > + git -c core.multiPackIndex=true rev-list --objects --all 2>err && > + test_must_be_empty err > +' > + > test_expect_success 'add more objects' ' > for i in $(test_seq 6 10) > do > -- > 2.21.0.714.gd1be1d035b >
On Fri, Apr 05, 2019 at 10:05:29AM +0200, René Scharfe wrote: > > @@ -486,15 +496,16 @@ static int open_packed_git_1(struct packed_git *p) > > ssize_t read_result; > > const unsigned hashsz = the_hash_algo->rawsz; > > > > - if (!p->index_data) { > > + if (!p->index_data && the_repository->objects->multi_pack_index) { > > So if there is no multi_pack_index, we skip this block now... > > > struct multi_pack_index *m; > > - const char *pack_name = strrchr(p->pack_name, '/'); > > + char *idx_name = pack_name_to_idx(pack_basename(p)); > > > > for (m = the_repository->objects->multi_pack_index; > > m; m = m->next) { > > - if (midx_contains_pack(m, pack_name)) > > + if (midx_contains_pack(m, idx_name)) > > break; > > } > > + free(idx_name); > > > > if (!m && open_pack_index(p)) > > return error("packfile %s index unavailable", p->pack_name); > > ... which also means this open_pack_index() call isn't done anymore if > there's no .midx file at all. You don't mention this change in the > commit message; is it intended? Doh, thank you for catching that. I made that switch at the last minute because I didn't want to pay the malloc/free cost when we had no list to compare to. I'm surprised it works at all. :-/ I guess it doesn't, from the other message in the thread. > And I wonder if it would be easier overall to let midx_contains_pack() > accept .pack names in addition to .idx names. Perhaps with something > like this? > > int cmp_idx_or_pack_name(const char *idx_or_pack_name, const char *idx_name) > { > while (*idx_name && *idx_name == *idx_or_pack_name) { > idx_name++; > idx_or_pack_name++; > } > if (!strcmp(idx_name, ".idx") && !strcmp(idx_or_pack_name, ".pack")) > return 0; > return strcmp(idx_or_pack_name, idx_name); > } Hmm, maybe. It does a binary search, so I'd have to scratch my head for a minute of whether this loose comparison is correct. I think it is because of that final strcmp. -Peff
On Fri, Apr 05, 2019 at 09:21:25AM -0400, Jeff King wrote: > > ... which also means this open_pack_index() call isn't done anymore if > > there's no .midx file at all. You don't mention this change in the > > commit message; is it intended? > > Doh, thank you for catching that. I made that switch at the last minute > because I didn't want to pay the malloc/free cost when we had no list to > compare to. I'm surprised it works at all. :-/ > > I guess it doesn't, from the other message in the thread. Hmm, no, that's a different problem. I think we'd generally already have the pack open unless there _is_ a midx, so this conditional probably never kicks in unless we have a midx anyway. Meaning my malloc-avoidance was over-thought. So we could just get rid of it. Or do it like this: diff --git a/packfile.c b/packfile.c index e7ca135ed5..ef0f959311 100644 --- a/packfile.c +++ b/packfile.c @@ -496,17 +496,22 @@ static int open_packed_git_1(struct packed_git *p) ssize_t read_result; const unsigned hashsz = the_hash_algo->rawsz; - if (!p->index_data && the_repository->objects->multi_pack_index) { - struct multi_pack_index *m; - char *idx_name = pack_name_to_idx(pack_basename(p)); + if (!p->index_data) { + struct multi_pack_index *m = + the_repository->objects->multi_pack_index; - for (m = the_repository->objects->multi_pack_index; - m; m = m->next) { - if (midx_contains_pack(m, idx_name)) - break; + if (m) { + char *idx_name = pack_name_to_idx(pack_basename(p)); + + for (; m; m = m->next) { + if (midx_contains_pack(m, idx_name)) + break; + } + free(idx_name); } - free(idx_name); + if (!m) + warning("opening anyway"); if (!m && open_pack_index(p)) return error("packfile %s index unavailable", p->pack_name); } I'll have to re-roll to address the other problem, though, so I'll give it some thought. -Peff
On Fri, Apr 05, 2019 at 02:01:20PM +0200, SZEDER Gábor wrote: > > The tests don't notice because there's nothing about opening those .idx > > files that would cause us to give incorrect output. It's just a little > > slower. The new test checks this case by corrupting the covered .idx, > > and then making sure we don't complain about it. > > When the test suite is run with GIT_TEST_MULTI_PACK_INDEX=1, the > following tests fail with this patch: > > t1006-cat-file.sh (Wstat: 256 Tests: 111 Failed: 1) > Failed test: 87 > Non-zero exit status: 1 > t5320-delta-islands.sh (Wstat: 256 Tests: 15 Failed: 10) > Failed tests: 2-4, 6-10, 12-13 > Non-zero exit status: 1 > t5310-pack-bitmaps.sh (Wstat: 256 Tests: 46 Failed: 1) > Failed test: 44 > Non-zero exit status: 1 > t5570-git-daemon.sh (Wstat: 256 Tests: 21 Failed: 1) > Failed test: 10 > Non-zero exit status: 1 Thanks. It looks like this is not a bug in my patch per se, but rather that the feature of the midx code it is un-breaking is not actually a good idea. The problem is with generating the pack revindex. It needs the actual index for each pack to be loaded, and now open_packed_git() will sometimes quietly not do that. Probably we _could_ be generating a big revindex off of the midx, but in the meantime, the revindex code needs to make sure that the index is loaded before looking at it. This patch clears up those test failures (I'll re-roll the whole series with it in the proper spot). --- diff --git a/pack-bitmap.c b/pack-bitmap.c index 4695aaf6b4..3960ad94c8 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -308,7 +308,8 @@ static int load_pack_bitmap(struct bitmap_index *bitmap_git) bitmap_git->bitmaps = kh_init_sha1(); bitmap_git->ext_index.positions = kh_init_sha1_pos(); - load_pack_revindex(bitmap_git->pack); + if (load_pack_revindex(bitmap_git->pack)) + goto failed; if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) || !(bitmap_git->trees = read_bitmap_1(bitmap_git)) || diff --git a/pack-revindex.c b/pack-revindex.c index 50891f77a2..d28a7e43d0 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -1,6 +1,7 @@ #include "cache.h" #include "pack-revindex.h" #include "object-store.h" +#include "packfile.h" /* * Pack index for existing packs give us easy access to the offsets into @@ -158,10 +159,14 @@ static void create_pack_revindex(struct packed_git *p) sort_revindex(p->revindex, num_ent, p->pack_size); } -void load_pack_revindex(struct packed_git *p) +int load_pack_revindex(struct packed_git *p) { - if (!p->revindex) + if (!p->revindex) { + if (open_pack_index(p)) + return -1; create_pack_revindex(p); + } + return 0; } int find_revindex_position(struct packed_git *p, off_t ofs) @@ -188,7 +193,9 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) { int pos; - load_pack_revindex(p); + if (load_pack_revindex(p)) + return NULL; + pos = find_revindex_position(p, ofs); if (pos < 0) diff --git a/pack-revindex.h b/pack-revindex.h index e262f3efe8..848331d5d6 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -8,7 +8,7 @@ struct revindex_entry { unsigned int nr; }; -void load_pack_revindex(struct packed_git *p); +int load_pack_revindex(struct packed_git *p); int find_revindex_position(struct packed_git *p, off_t ofs); struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs); diff --git a/packfile.c b/packfile.c index e7ca135ed5..ca73a5b12a 100644 --- a/packfile.c +++ b/packfile.c @@ -2040,8 +2040,10 @@ int for_each_object_in_pack(struct packed_git *p, uint32_t i; int r = 0; - if (flags & FOR_EACH_OBJECT_PACK_ORDER) - load_pack_revindex(p); + if (flags & FOR_EACH_OBJECT_PACK_ORDER) { + if (load_pack_revindex(p)) + return -1; + } for (i = 0; i < p->num_objects; i++) { uint32_t pos;
diff --git a/packfile.c b/packfile.c index 054269ae5d..e7ca135ed5 100644 --- a/packfile.c +++ b/packfile.c @@ -472,6 +472,16 @@ static unsigned int get_max_fd_limit(void) #endif } +static const char *pack_basename(struct packed_git *p) +{ + const char *ret = strrchr(p->pack_name, '/'); + if (ret) + ret = ret + 1; /* skip past slash */ + else + ret = p->pack_name; /* we only have a base */ + return ret; +} + /* * Do not call this directly as this leaks p->pack_fd on error return; * call open_packed_git() instead. @@ -486,15 +496,16 @@ static int open_packed_git_1(struct packed_git *p) ssize_t read_result; const unsigned hashsz = the_hash_algo->rawsz; - if (!p->index_data) { + if (!p->index_data && the_repository->objects->multi_pack_index) { struct multi_pack_index *m; - const char *pack_name = strrchr(p->pack_name, '/'); + char *idx_name = pack_name_to_idx(pack_basename(p)); for (m = the_repository->objects->multi_pack_index; m; m = m->next) { - if (midx_contains_pack(m, pack_name)) + if (midx_contains_pack(m, idx_name)) break; } + free(idx_name); if (!m && open_pack_index(p)) return error("packfile %s index unavailable", p->pack_name); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 8c4d2bd849..1ebf19ec3c 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -117,6 +117,20 @@ test_expect_success 'write midx with one v2 pack' ' compare_results_with_midx "one v2 pack" +test_expect_success 'corrupt idx not opened' ' + idx=$(test-tool read-midx $objdir | grep "\.idx\$") && + mv $objdir/pack/$idx backup-$idx && + test_when_finished "mv backup-\$idx \$objdir/pack/\$idx" && + + # This is the minimum size for a sha-1 based .idx; this lets + # us pass perfunctory tests, but anything that actually opens and reads + # the idx file will complain. + test_copy_bytes 1064 <backup-$idx >$objdir/pack/$idx && + + git -c core.multiPackIndex=true rev-list --objects --all 2>err && + test_must_be_empty err +' + test_expect_success 'add more objects' ' for i in $(test_seq 6 10) do
When we have a .midx that covers many packfiles, we try to avoid opening the .idx for those packfiles. However, there are a few problems with the filename comparison we use: - we ask midx_contains_pack() about the .pack name, not the .idx name. But it compares to the latter. - we compute the basename of the pack using strrchr() to find the final slash. But that leaves an extra "/" at the start of our string; we need to advance past it. That also raises the question of what to do when the name does not have a slash at all. This should generally not happen (we always find files in "pack/"), but it doesn't hurt to be defensive here. The tests don't notice because there's nothing about opening those .idx files that would cause us to give incorrect output. It's just a little slower. The new test checks this case by corrupting the covered .idx, and then making sure we don't complain about it. Signed-off-by: Jeff King <peff@peff.net> --- packfile.c | 17 ++++++++++++++--- t/t5319-multi-pack-index.sh | 14 ++++++++++++++ 2 files changed, 28 insertions(+), 3 deletions(-)