Message ID | aaeb021538cdfeb83dc6004fe7b3ac35a23aef49.1668063122.git.dyroneteng@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 2aa84d5f3ea1966a81477ad31bee0136e39d3917 |
Headers | show |
Series | [v3,1/2] pack-bitmap.c: remove unnecessary "open_pack_index()" calls | expand |
On Thu, Nov 10, 2022 at 03:10:11PM +0800, Teng Long wrote: > It's likely that we'll end up relying on objects in that pack later > in the process (in which case we're doing the work of opening the > pack index optimistically), but not guaranteed. > > For instance, consider a repository with a large number of small > packs, and one large pack with a bitmap. If we see that bitmap pack > last in our loop which calls open_pack_bitmap_1(), the current code > will have opened *all* pack index files in the repository. If the > request can be served out of the bitmapped pack alone, then the time > spent opening these idx files was wasted.S By the way, I wondered if it was possible to measure a slowdown in this case. It is, but you have to try pretty hard. Something like this: # one bitmapped pack git repack -adb # and then a bunch of other packs git rev-list HEAD | head -10000 | while read commit; do echo $commit | git pack-objects .git/objects/pack/pack done # make the bitmapped one newest, since otherwise our non-bitmap lookup # of the initial traversal commit causes us to open all the other # packs first! bitmap=$(echo .git/objects/pack/pack-*.bitmap) touch ${bitmap%.bitmap}.* hyperfine -L v old,new './git.{v} rev-list --count --use-bitmap-index HEAD' where "new" and "old" are builds with and without this patch. I get: Benchmark 1: ./git.old rev-list --count --use-bitmap-index HEAD Time (mean ± σ): 117.9 ms ± 1.8 ms [User: 26.9 ms, System: 90.0 ms] Range (min … max): 113.4 ms … 120.3 ms 25 runs Benchmark 2: ./git.new rev-list --count --use-bitmap-index HEAD Time (mean ± σ): 71.8 ms ± 2.6 ms [User: 21.2 ms, System: 50.5 ms] Range (min … max): 67.0 ms … 75.1 ms 41 runs Summary './git.new rev-list --count --use-bitmap-index HEAD' ran 1.64 ± 0.06 times faster than './git.old rev-list --count --use-bitmap-index HEAD' which implies to me two things: - this probably isn't helping anybody much in the real world, as evidenced by the contortions I had to go through to set up the situation (and which would be made much better by repacking, which would also speed up non-bitmap operations). - it's worth doing anyway. Even if it only shaves off microseconds, the existing call is just pointless. -Peff
On Mon, Nov 14, 2022 at 05:03:56PM -0500, Jeff King wrote: > On Thu, Nov 10, 2022 at 03:10:11PM +0800, Teng Long wrote: > > > It's likely that we'll end up relying on objects in that pack later > > in the process (in which case we're doing the work of opening the > > pack index optimistically), but not guaranteed. > > > > For instance, consider a repository with a large number of small > > packs, and one large pack with a bitmap. If we see that bitmap pack > > last in our loop which calls open_pack_bitmap_1(), the current code > > will have opened *all* pack index files in the repository. If the > > request can be served out of the bitmapped pack alone, then the time > > spent opening these idx files was wasted.S > > By the way, I wondered if it was possible to measure a slowdown in this > case. It is, but you have to try pretty hard. Something like this: > > # one bitmapped pack > git repack -adb > > # and then a bunch of other packs > git rev-list HEAD | > head -10000 | > while read commit; do > echo $commit | git pack-objects .git/objects/pack/pack > done OK, so with 10K packs, we see about a 1.6-fold improvement, which is definitely substantial. On a fresh clone of git.git, repeating your experiment with only 1K packs (which is definitely a number of packs that GitHub sees in under-maintained repositories), the runtime goes from 25.3ms -> 20.9ms on my machine, or about a 1.2-fold improvement. So definitely smaller, but even at 1/10th the number of packs from your experiment, still noticeable. > - this probably isn't helping anybody much in the real world, as > evidenced by the contortions I had to go through to set up the > situation (and which would be made much better by repacking, which > would also speed up non-bitmap operations). Per above, I'm not sure I totally agree ;-). 1K packs is definitely an extreme amount of packs, but not out-of-this-world. It probably would show up in carefully-picked graphs, but not in "overall git rev-list time" or something as broad/noisy as that. > - it's worth doing anyway. Even if it only shaves off microseconds, > the existing call is just pointless. Yes, definitely. Thanks, Taylor
On Mon, Nov 14, 2022 at 05:14:37PM -0500, Taylor Blau wrote: > OK, so with 10K packs, we see about a 1.6-fold improvement, which is > definitely substantial. > > On a fresh clone of git.git, repeating your experiment with only 1K > packs (which is definitely a number of packs that GitHub sees in > under-maintained repositories), the runtime goes from 25.3ms -> 20.9ms > on my machine, or about a 1.2-fold improvement. > > So definitely smaller, but even at 1/10th the number of packs from your > experiment, still noticeable. Interesting. I had tried it initially with 1000 and the improvements were much smaller. I just did it again, though, and got the same 20% speedup. I'm not sure what I screwed up earlier (I may have been confused by the timestamp/sorting issue; I only realized it was important midway through looking into this). > > - this probably isn't helping anybody much in the real world, as > > evidenced by the contortions I had to go through to set up the > > situation (and which would be made much better by repacking, which > > would also speed up non-bitmap operations). > > Per above, I'm not sure I totally agree ;-). 1K packs is definitely an > extreme amount of packs, but not out-of-this-world. It probably would > show up in carefully-picked graphs, but not in "overall git rev-list > time" or something as broad/noisy as that. Yeah, I agree that 1k is a lot more compelling. The big impractical thing I think is that if the bitmapped pack is older (and it usually is), then we'd often open all the other packs anyway: - if the start of the traversal is in the bitmapped pack, then we fruitlessly open each of the others looking for the object (since the bitmapped one will come last in the reverse-chronological sorting) - if it isn't in the bitmapped pack, then we'll end up opening all those other packs anyway to fill out the bitmap (since by definition it can't be included in the on-disk bitmaps) So I'd be surprised if it ever mattered in the real world. Though again, I think the new code is less surprising in general, and could matter if we changed other things (e.g., if we prioritized lookups in a pack with a .bitmap). -Peff
On Mon, Nov 14, 2022 at 05:31:18PM -0500, Jeff King wrote: > Yeah, I agree that 1k is a lot more compelling. The big impractical > thing I think is that if the bitmapped pack is older (and it usually > is), then we'd often open all the other packs anyway: > > - if the start of the traversal is in the bitmapped pack, then we > fruitlessly open each of the others looking for the object (since > the bitmapped one will come last in the reverse-chronological > sorting) > > - if it isn't in the bitmapped pack, then we'll end up opening all > those other packs anyway to fill out the bitmap (since by definition > it can't be included in the on-disk bitmaps) > > So I'd be surprised if it ever mattered in the real world. Though again, > I think the new code is less surprising in general, and could matter if > we changed other things (e.g., if we prioritized lookups in a pack with > a .bitmap). I completely agree. It's definitely worth doing purely based on the principle of least-surprise. But the potential performance improvements are just gravy on top ;-). Thanks, Taylor
diff --git a/pack-bitmap.c b/pack-bitmap.c index 440407f1be..982e286bac 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -411,9 +411,6 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git struct stat st; char *bitmap_name; - if (open_pack_index(packfile)) - return -1; - bitmap_name = pack_bitmap_filename(packfile); fd = git_open(bitmap_name);