Message ID: ED25E182-C296-4D08-8170-340567D8964A@amazon.com (mailing list archive)
State: New, archived
Series: [RFC] index-pack: improve performance on NFS
"Jansen, Geert" <gerardu@amazon.com> writes: > The index-pack command determines if a sha1 collision test is needed by > checking the existence of a loose object with the given hash. In my tests, I > can improve performance of “git clone” on Amazon EFS by 8x when used with a > non-default mount option (lookupcache=pos) that's required for a Gitlab HA > setup. My assumption is that this check is unnecessary when cloning into a new > repository because the repository will be empty. My knee-jerk reaction is that your insight that we can skip the "dup check" when starting from emptiness is probably correct, but your use of .cloning flag as an approximation of "are we starting from emptiness?" is probably wrong, at least for two reasons. - "git clone --reference=..." does not strictly start from emptiness, and would still have to make sure that incoming pack does not try to inject an object with different contents but with the same name. - "git init && git fetch ..." starts from emptiness and would want to benefit from the same optimization as you are implementing here. As to the implementation, I think the patch adjusts the right "if()" condition to skip the collision test here. > - if (startup_info->have_repository) { > + if (startup_info->have_repository && !cloning) { > read_lock(); > collision_test_needed = > has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK); I just do not think !cloning is quite correct. Thanks.
On Fri, Oct 26 2018, Junio C Hamano wrote:

> "Jansen, Geert" <gerardu@amazon.com> writes:
>
>> The index-pack command determines if a sha1 collision test is needed by
>> checking the existence of a loose object with the given hash. In my tests, I
>> can improve performance of “git clone” on Amazon EFS by 8x when used with a
>> non-default mount option (lookupcache=pos) that's required for a Gitlab HA
>> setup. My assumption is that this check is unnecessary when cloning into a new
>> repository because the repository will be empty.
>
> My knee-jerk reaction is that your insight that we can skip the "dup
> check" when starting from emptiness is probably correct, but your
> use of the .cloning flag as an approximation of "are we starting from
> emptiness?" is probably wrong, at least for two reasons.
>
>  - "git clone --reference=..." does not strictly start from
>    emptiness, and would still have to make sure that the incoming pack
>    does not try to inject an object with different contents but with
>    the same name.
>
>  - "git init && git fetch ..." starts from emptiness and would want
>    to benefit from the same optimization as you are implementing
>    here.
>
> As to the implementation, I think the patch adjusts the right "if()"
> condition to skip the collision test here.
>
>> -	if (startup_info->have_repository) {
>> +	if (startup_info->have_repository && !cloning) {
>> 		read_lock();
>> 		collision_test_needed =
>> 			has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK);
>
> I just do not think !cloning is quite correct.

Geert: Thanks for working on this. A GitLab instance I'm involved in
managing at work has suffered from this issue, e.g. with "fork" being a
"clone" under the hood on GitLab, and taking ages on the NetApp NFS
filer due to this issue, so I'm very interested in this moving forward.

But as Junio notes the devil's in the details; another one I thought of
is:

    GIT_OBJECT_DIRECTORY=/some/other/repository git clone ...

It seems to me that it's better to stick this into
setup_git_directory_gently() in setup.c and check the various edge cases
there, i.e. make this a new "have_objects_already" member of the
startup_info struct.

That would be set depending on whether we find objects/packs in the
objects dir, and would know about GIT_OBJECT_DIRECTORY (either just
punting, or looking at those too). It would also need to know about
read_info_alternates(); depending on when that's checked it would handle
"git clone --reference" too, since that just sets things up via
add_to_alternates_file().

Then you'd change sha1_object() to just check
startup_info->have_objects_already instead of
startup_info->have_repository, and since we'd checked "did we have
objects already?" it would work for the init && fetch case Junio
mentioned.

It would also be very nice to have a test for this, even if it's
something OS-specific that only works on Linux after we probe for
strace(1).

Testing (without your patch, because git-am barfs on it; it seems to
dislike the base64 encoding):

    rm -rf /tmp/df; strace -f git clone --bare git@github.com:git/git.git /tmp/df 2>&1 | grep '".*ENOENT' 2>&1|perl -pe 's/^.*?"([^"]+)".*/$1/'|sort|uniq -c|sort -nr|less

I see we also check whether packed-refs exists ~2800 times, and check
for each ref we find on the remote. Those are obviously less of a
performance issue when cloning in the common case, but perhaps another
place where we can insert a "don't check, we don't have anything
already" condition.
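[For illustration, a minimal standalone sketch of the "do we have
objects already?" test such a startup_info member could be built on. It
uses only POSIX calls; the helper names are hypothetical, and the real
change would live in setup.c and also need to consult
GIT_OBJECT_DIRECTORY and read_info_alternates() as described above.]

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* return 1 if "path" contains any entry besides "." and ".." */
static int dir_has_entries(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *e;
	int found = 0;

	if (!d)
		return 0;
	while ((e = readdir(d))) {
		if (strcmp(e->d_name, ".") && strcmp(e->d_name, "..")) {
			found = 1;
			break;
		}
	}
	closedir(d);
	return found;
}

/* hypothetical: would set startup_info->have_objects_already */
static int have_objects_already(const char *objects_dir)
{
	char buf[4096];
	int i;

	/* any pack at all? */
	snprintf(buf, sizeof(buf), "%s/pack", objects_dir);
	if (dir_has_entries(buf))
		return 1;

	/* any loose object in one of the 256 fan-out directories? */
	for (i = 0; i < 256; i++) {
		snprintf(buf, sizeof(buf), "%s/%02x", objects_dir, i);
		if (dir_has_entries(buf))
			return 1;
	}
	return 0;
}

int main(int argc, char **argv)
{
	const char *dir = argc > 1 ? argv[1] : ".git/objects";
	printf("have_objects_already(%s) = %d\n", dir,
	       have_objects_already(dir));
	return 0;
}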
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But as Junio notes the devil's in the details; another one I thought of
> is:
>
>     GIT_OBJECT_DIRECTORY=/some/other/repository git clone ...
>
> It seems to me that ...

Actually I take all of that back ;-)

For the purpose of this patch, use of the existing .cloning field in the
transport is fine, as the sole existing user of the field wants the
field to mean "Are we starting with an empty object store?", and not
"Are we running the command whose name is 'git clone'?".

Now, the logic used to set the field to true may have room to be
improved. We should do that as a separate and orthogonal effort so that
the two cases I mentioned plus the new one(s) you brought up would also
be taken into account, so that we can set the .cloning field more
accurately to suit the two callers' needs---one caller is the age-old
one added by beea4152 ("clone: support remote shallow repository",
2013-12-05), and the other one is what Geert is adding in this thread.

We _may_ want to rename that transport.cloning field to better reflect
what it truly means (it is not "are we cloning?" but "are there any
objects in the repo to worry about?") as a preparatory step before
Geert's patch, but I do not think we should make it a requirement to
take the "let's skip collision check" patch to improve the logic used
to set that .cloning field.
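[For concreteness, a sketch of what that rename might look like; the
new field name and comment are hypothetical, only the existing
"cloning" bit-field in struct transport is from the source.]

struct transport {
	/* ... */
	/*
	 * Formerly ".cloning": true when the local object store starts
	 * out empty, so the receiving side may skip checks that only
	 * matter when objects could already be present.
	 */
	unsigned object_store_empty : 1;
	/* ... */
};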
On Sat, Oct 27, 2018 at 04:26:50PM +0900, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> > But as Junio notes the devil's in the details, another one I thought of
> > is:
> >
> >     GIT_OBJECT_DIRECTORY=/some/other/repository git clone ...
> >
> > It seems to me that ...
>
> Actually I take all of that back ;-)
>
> For the purpose of this patch, use of existing .cloning field in the
> transport is fine, as the sole existing user of the field wants the
> field to mean "Are we starting with an empty object store?", and not
> "Are we running the command whose name is 'git clone'?".

Taking one step back, the root problem in this thread is that stat() on
non-existing files is slow (which makes has_sha1_file slow).

One solution there is to cache the results of looking in .git/objects
(or any alternate object store) for loose files. And indeed, this whole
scheme is just a specialized form of that: it's a flag to say "hey, we
do not have any objects yet, so do not bother looking".

Could we implement that in a more direct and central way? And could we
implement it in a way that catches more cases? E.g., if I have _one_
object, that defeats this specialized optimization, but it is probably
still beneficial to cache that knowledge (and the reasonable cutoff is
probably not 1, but some value of N loose objects).

Of course any cache raises questions of cache invalidation, but I think
we've already dealt with that for this case. When we use
OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
accuracy/speed tradeoff (which does a similar caching thing with
packfiles).

So putting that all together, could we have something like:

diff --git a/object-store.h b/object-store.h
index 63b7605a3e..28cde568a0 100644
--- a/object-store.h
+++ b/object-store.h
@@ -135,6 +135,18 @@ struct raw_object_store {
 	 */
 	struct packed_git *all_packs;
 
+	/*
+	 * A set of all loose objects we have. This probably ought to be split
+	 * into a set of 256 caches so that we can fault in one directory at a
+	 * time.
+	 */
+	struct oid_array loose_cache;
+	enum {
+		LOOSE_CACHE_UNFILLED = 0,
+		LOOSE_CACHE_INVALID,
+		LOOSE_CACHE_VALID
+	} loose_cache_status;
+
 	/*
 	 * A fast, rough count of the number of objects in the repository.
 	 * These two fields are not meant for direct access. Use
diff --git a/packfile.c b/packfile.c
index 86074a76e9..68ca4fff0e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -990,6 +990,8 @@ void reprepare_packed_git(struct repository *r)
 	r->objects->approximate_object_count_valid = 0;
 	r->objects->packed_git_initialized = 0;
 	prepare_packed_git(r);
+	oid_array_clear(&r->objects->loose_cache);
+	r->objects->loose_cache_status = LOOSE_CACHE_UNFILLED;
 }
 
 struct packed_git *get_packed_git(struct repository *r)
diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..edbe037eaa 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1172,6 +1172,40 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
 	return parse_sha1_header_extended(hdr, &oi, 0);
 }
 
+/* probably should be configurable? */
+#define LOOSE_OBJECT_CACHE_MAX 65536
+
+static int fill_loose_cache(const struct object_id *oid,
+			    const char *path,
+			    void *data)
+{
+	struct oid_array *cache = data;
+
+	if (cache->nr == LOOSE_OBJECT_CACHE_MAX)
+		return -1;
+
+	oid_array_append(data, oid);
+	return 0;
+}
+
+static int quick_has_loose(struct raw_object_store *r,
+			   struct object_id *oid)
+{
+	struct oid_array *cache = &r->loose_cache;
+
+	if (r->loose_cache_status == LOOSE_CACHE_UNFILLED) {
+		if (for_each_loose_object(fill_loose_cache, cache, 0) < 0)
+			r->loose_cache_status = LOOSE_CACHE_INVALID;
+		else
+			r->loose_cache_status = LOOSE_CACHE_VALID;
+	}
+
+	if (r->loose_cache_status == LOOSE_CACHE_INVALID)
+		return -1;
+
+	return oid_array_lookup(cache, oid) >= 0;
+}
+
 static int sha1_loose_object_info(struct repository *r,
 				  const unsigned char *sha1,
 				  struct object_info *oi, int flags)
@@ -1198,6 +1232,19 @@ static int sha1_loose_object_info(struct repository *r,
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
 		const char *path;
 		struct stat st;
+		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK)) {
+			struct object_id oid;
+			hashcpy(oid.hash, sha1);
+			switch (quick_has_loose(r->objects, &oid)) {
+			case 0:
+				return -1; /* missing: error */
+			case 1:
+				return 0; /* have: 0 == success */
+			default:
+				/* unknown; fall back to stat */
+				break;
+			}
+		}
 		if (stat_sha1_file(r, sha1, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)

That's mostly untested, but it might be enough to run some timing tests
with. I think if we want to pursue this, we'd want to address the bits I
mentioned in the comments, and look at unifying this with the loose
cache from cc817ca3ef (which if I had remembered we added, probably
would have saved some time writing the above ;) ).

-Peff
On Sat, Oct 27 2018, Jeff King wrote:

> On Sat, Oct 27, 2018 at 04:26:50PM +0900, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>> > But as Junio notes the devil's in the details, another one I thought of
>> > is:
>> >
>> >     GIT_OBJECT_DIRECTORY=/some/other/repository git clone ...
>> >
>> > It seems to me that ...
>>
>> Actually I take all of that back ;-)
>>
>> For the purpose of this patch, use of existing .cloning field in the
>> transport is fine, as the sole existing user of the field wants the
>> field to mean "Are we starting with an empty object store?", and not
>> "Are we running the command whose name is 'git clone'?".
>
> Taking one step back, the root problem in this thread is that stat() on
> non-existing files is slow (which makes has_sha1_file slow).
>
> One solution there is to cache the results of looking in .git/objects
> (or any alternate object store) for loose files. And indeed, this whole
> scheme is just a specialized form of that: it's a flag to say "hey, we
> do not have any objects yet, so do not bother looking".
>
> Could we implement that in a more direct and central way? And could we
> implement it in a way that catches more cases? E.g., if I have _one_
> object, that defeats this specialized optimization, but it is probably
> still beneficial to cache that knowledge (and the reasonable cutoff is
> probably not 1, but some value of N loose objects).
>
> [... loose-object-cache patch snipped ...]

The assumption with making it exactly 0 objects and not any value of >0
is that we can safely assume that a "clone" or initial "fetch"[1] is
special in ways that a clone isn't. I.e. we're starting out with nothing
and doing the initial population; that's probably not as true in an
existing repo that's getting concurrent fetches, commits, rebases etc.

But in the spirit of taking a step back, maybe we should take two steps
back and consider why we're doing this at all.

Three of our tests fail if we compile git like this, and cloning is much
faster (especially on NFS):

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2004e25da2..0c2d008ee0 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 
-	if (startup_info->have_repository) {
+	if (0) {
 		read_lock();

Even on a local disk I'm doing 262759 lstat() calls cloning git.git and
spending 5% of my time on that.

But why do we have this in the first place? It's because of 8685da4256
("don't ever allow SHA1 collisions to exist by fetching a pack",
2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in
collision check", 2017-04-01).

I.e. we are worried about (and those tests check for):

 a) A malicious user trying to give us a repository where they have
    created an object with the same SHA-1 that's different, as in the
    SHAttered attack.

    I remember (but have not dug up) an old E-Mail from Linus saying
    that this was an important security aspect of git, i.e. even if
    SHA-1 was broken you couldn't easily propagate bad objects.

 b) Cases where we've ended up with different content for a SHA-1 due to
    e.g. a local FS corruption. Which is the subject of your commit in
    2017.

 c) Are there cases where fetch.fsckObjects is off and we just flip a
    bit on the wire and don't notice? I think not, because we always
    check the pack checksum (don't we?), but I'm not 100% sure.

I'm inclined to think that we should at the very least make this
configurable. Running into a) is the least of my worries when operating
some git server on NFS.

The b) case is also a concern, but in that case we'd actually be
improving things by writing out the duplicate object: if that was
followed up with something like the "null" negotiator[2] and gc/repack
being able to look at the two objects, check their SHA-1/content and
throw away the bad one, we'd have the ability to heal a corrupt
repository where we now just produce a hard error.

Even if someone wants to make the argument that this is behavior that we
absolutely *MUST* keep and not make configurable, there's still much
smarter ways to do it. We could e.g. just unconditionally write out the
packfile into a quarantine environment (see 720dae5a19 ("config doc:
elaborate on fetch.fsckObjects security", 2018-07-27)), *then* loop over
the loose objects and packs we have and see if any of those exist in the
new pack; if they do, do the current assertion, and if not (and
fetch.fsckObjects passes) move it out of the quarantine.

I'm most inclined to say we should just have a config option to disable
this in lieu of fancier solutions. I think a) is entirely implausible
(and I'm not worrying about state-level actors attacking my git repos),
and b) would be no worse than it is today.

1. Although less so for initial fetch; think a) set up a bunch of
   remotes, b) parallel 'git fetch {}' ::: $(git remote)

2. https://public-inbox.org/git/87o9ciisg6.fsf@evledraar.gmail.com/
On Sat, Oct 27, 2018 at 11:34 AM Jeff King <peff@peff.net> wrote:
> Taking one step back, the root problem in this thread is that stat() on
> non-existing files is slow (which makes has_sha1_file slow).
>
> One solution there is to cache the results of looking in .git/objects
> (or any alternate object store) for loose files. And indeed, this whole
> scheme is just a specialized form of that: it's a flag to say "hey, we
> do not have any objects yet, so do not bother looking".
>
> Could we implement that in a more direct and central way? And could we
> implement it in a way that catches more cases? E.g., if I have _one_
> object, that defeats this specialized optimization, but it is probably
> still beneficial to cache that knowledge (and the reasonable cutoff is
> probably not 1, but some value of N loose objects).

And we do hit this on normal git-fetch. The larger the received pack is
(e.g. if you don't fetch often), the more stat() we do, so your approach
is definitely better.

> Of course any cache raises questions of cache invalidation, but I think
> we've already dealt with that for this case. When we use
> OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
> accuracy/speed tradeoff (which does a similar caching thing with
> packfiles).

We don't care about a separate process adding more loose objects while
index-pack is running, do we? I'm guessing we don't, but just to double
check...

> So putting that all together, could we have something like:
>
> [... diff to object-store.h and packfile.c snipped ...]
>
> +/* probably should be configurable? */

Yes, perhaps with the gc.auto config value (multiplied by 256) as the
cut point. If it's too big maybe just go with a bloom filter. For this
particular case we expect like 99% of calls to miss.

> [... rest of the patch snipped ...]
>
> -Peff
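[To make the bloom-filter alternative mentioned above concrete, a
minimal standalone sketch. It keys the filter directly on the object id
bytes, which are already uniformly distributed, so slices of the oid can
serve as the hash probes; the filter size and probe count here are
illustrative assumptions, not tuned values.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOOM_BITS (1u << 20)	/* 1 Mbit = 128kB */
#define BLOOM_PROBES 4

static uint8_t bloom[BLOOM_BITS / 8];

/* slice independent 32-bit values out of the 20-byte oid */
static uint32_t probe(const unsigned char *oid, int i)
{
	uint32_t v;
	memcpy(&v, oid + 4 * i, sizeof(v));
	return v % BLOOM_BITS;
}

static void bloom_add(const unsigned char *oid)
{
	int i;
	for (i = 0; i < BLOOM_PROBES; i++) {
		uint32_t b = probe(oid, i);
		bloom[b / 8] |= 1 << (b % 8);
	}
}

/* 0 = definitely absent; 1 = maybe present, fall back to stat() */
static int bloom_maybe_has(const unsigned char *oid)
{
	int i;
	for (i = 0; i < BLOOM_PROBES; i++) {
		uint32_t b = probe(oid, i);
		if (!(bloom[b / 8] & (1 << (b % 8))))
			return 0;
	}
	return 1;
}

int main(void)
{
	unsigned char a[20] = { 1, 2, 3 }, b[20] = { 9, 9, 9 };
	bloom_add(a);
	/* prints "a: 1, b: 0" (b almost surely misses) */
	printf("a: %d, b: %d\n", bloom_maybe_has(a), bloom_maybe_has(b));
	return 0;
}

[Since ~99% of lookups are expected to miss, the common case answers
"definitely absent" from memory without touching the filesystem; only
"maybe present" falls back to the existing stat()/collision path.]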
Jeff King <peff@peff.net> writes:

> Of course any cache raises questions of cache invalidation, but I think
> we've already dealt with that for this case. When we use
> OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
> accuracy/speed tradeoff (which does a similar caching thing with
> packfiles).
>
> So putting that all together, could we have something like:

I think this conceptually is a vast improvement relative to ".cloning"
optimization. Obviously this does not have the huge downside of the
other approach that turns the collision detection completely off.

A real question is how much performance gain, relative to ".cloning"
thing, this approach gives us. If it gives us 80% or more of the gain
compared to doing no checking, I'd say we have a clear winner.

> That's mostly untested, but it might be enough to run some timing tests
> with. I think if we want to pursue this, we'd want to address the bits I
> mentioned in the comments, and look at unifying this with the loose
> cache from cc817ca3ef (which if I had remembered we added, probably
> would have saved some time writing the above ;) ).

Yup.
On Sat, Oct 27, 2018 at 01:22:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Taking one step back, the root problem in this thread is that stat() on
> > non-existing files is slow (which makes has_sha1_file slow).
> >
> > One solution there is to cache the results of looking in .git/objects
> > (or any alternate object store) for loose files. And indeed, this whole
> > scheme is just a specialized form of that: it's a flag to say "hey, we
> > do not have any objects yet, so do not bother looking".
> >
> > Could we implement that in a more direct and central way? And could we
> > implement it in a way that catches more cases? E.g., if I have _one_
> > object, that defeats this specialized optimization, but it is probably
> > still beneficial to cache that knowledge (and the reasonable cutoff is
> > probably not 1, but some value of N loose objects).
> [...]
>
> The assumption with making it exactly 0 objects and not any value of >0
> is that we can safely assume that a "clone" or initial "fetch"[1] is
> special in ways that a clone isn't. I.e. we're starting out with nothing
> and doing the initial population; that's probably not as true in an
> existing repo that's getting concurrent fetches, commits, rebases etc.

I assume you mean s/that a clone isn't/that a fetch isn't/.

I agree there are cases where you might be able to go further if you
assume a full "0". But my point is that "clone" is an ambiguous concept,
and it doesn't map completely to what's actually slow here. So if you
only look at "are we cloning", then:

  - you have a bunch of cases which are "clones", but aren't actually
    starting from scratch

  - you get zero benefit in the non-clone cases, when we could be
    scaling the benefit smoothly

> But in the spirit of taking a step back, maybe we should take two steps
> back and consider why we're doing this at all.

OK, I think it's worth discussing, and I'll do that below. But first I
want to say...

> Three of our tests fail if we compile git like this, and cloning is much
> faster (especially on NFS):
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 2004e25da2..0c2d008ee0 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>
> -	if (startup_info->have_repository) {
> +	if (0) {
> 		read_lock();
>
> Even on a local disk I'm doing 262759 lstat() calls cloning git.git and
> spending 5% of my time on that.

With the caching patch I posted earlier, I see roughly the same speedup
on an index-pack of git.git as I do with disabling the collision check
entirely (I did see about a 1% difference in favor of what you wrote
above, which was within the noise, but may well be valid due to slightly
reduced lock contention).

TBH I'm not sure if any of this is actually worth caring about on a
normal Linux system, though. There stat() is fast. It might be much more
interesting on macOS or Windows, or on a Linux system on NFS.

> But why do we have this in the first place? It's because of 8685da4256
> ("don't ever allow SHA1 collisions to exist by fetching a pack",
> 2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in
> collision check", 2017-04-01).
>
> I.e. we are worried about (and those tests check for):
>
>  a) A malicious user trying to give us a repository where they have
>     created an object with the same SHA-1 that's different, as in the
>     SHAttered attack.
>
>     I remember (but have not dug up) an old E-Mail from Linus saying
>     that this was an important security aspect of git, i.e. even if
>     SHA-1 was broken you couldn't easily propagate bad objects.

Yeah, especially given recent advances in SHA-1 attacks, I'm not super
comfortable with the idea of disabling the duplicate-object check at
this point.

>  b) Cases where we've ended up with different content for a SHA-1 due to
>     e.g. a local FS corruption. Which is the subject of your commit in
>     2017.

Sort of. We actually detected it before my patch, but we just gave a
really crappy error message. ;)

>  c) Are there cases where fetch.fsckObjects is off and we just flip a
>     bit on the wire and don't notice? I think not, because we always
>     check the pack checksum (don't we?), but I'm not 100% sure.

We'd detect bit-flips on the wire due to the pack checksum. But what's
more interesting are bit-flips on the disk of the sender, which would
then put the bogus data into the pack checksum they generate on the fly.

However, we do detect such a bit-flip, even without fsckObjects, because
the sender does not tell us the expected sha-1 of each object. It gives
us a stream of objects, and the receiver computes the sha-1's
themselves. So a bit flip manifests in the connectivity-check when we
say "hey, the other side should have sent us object X but did not" (we
do not say "gee, what is this object Y they sent?" because after not
seeing X, we do not know which objects would have been reachable, so we
have a whole bunch of such Y's).

fetch.fsckObjects is purely about doing semantic object-quality checks.
They're not even that expensive to do. The main reason they're disabled
is that there are many historical objects floating around that fail them
(I think it would be a useful exercise to sort the existing checks by
priority, downgrading many of them to warnings, and then setting the
default for fetch.fsckObjects to "reject anything above warning").

> Even if someone wants to make the argument that this is behavior that we
> absolutely *MUST* keep and not make configurable, there's still much
> smarter ways to do it.

I don't have any real objection to a configuration like this, if people
want to experiment with it. But in contrast, the patch I showed earlier:

  - is safe enough to just turn on all the time, without the user having
    to configure anything nor make a safety tradeoff

  - speeds up all the other spots that use OBJECT_INFO_QUICK (like
    fetch's tag-following, or what appears to be the exact same
    optimization done manually inside mark_complete_and_common_ref()).

> We could e.g. just unconditionally write out the packfile into a
> quarantine environment (see 720dae5a19 ("config doc: elaborate on
> fetch.fsckObjects security", 2018-07-27)), *then* loop over the loose
> objects and packs we have and see if any of those exist in the new pack;
> if they do, do the current assertion, and if not (and fetch.fsckObjects
> passes) move it out of the quarantine.

Yes, I agree that would work, though it's a bigger architecture change.

-Peff
On Mon, Oct 29, 2018 at 11:04:53AM -0400, Jeff King wrote:

> > Even if someone wants to make the argument that this is behavior that we
> > absolutely *MUST* keep and not make configurable, there's still much
> > smarter ways to do it.
>
> I don't have any real objection to a configuration like this, if people
> want to experiment with it. But in contrast, the patch I showed earlier:
>
>   - is safe enough to just turn on all the time, without the user having
>     to configure anything nor make a safety tradeoff
>
>   - speeds up all the other spots that use OBJECT_INFO_QUICK (like
>     fetch's tag-following, or what appears to be the exact same
>     optimization done manually inside mark_complete_and_common_ref()).

One thing I forgot to add. We're focusing here on the case where the
objects _aren't_ present, and we're primarily trying to get rid of the
stat call. But when we actually do see a duplicate, we open up the
object and actually compare the bytes. Eliminating the collision check
entirely would save that work, which is obviously something that can't
be improved by just caching the existence of loose objects.

I'm not sure how often that case happens in a normal repository. We see
it a fair amount on GitHub servers because of the way we use alternates
(i.e., we often already have the object you pushed up because it's
present in another fork and available via objects/info/alternates).

-Peff
On Sat, Oct 27, 2018 at 04:04:32PM +0200, Duy Nguyen wrote:

> > Of course any cache raises questions of cache invalidation, but I think
> > we've already dealt with that for this case. When we use
> > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
> > accuracy/speed tradeoff (which does a similar caching thing with
> > packfiles).
>
> We don't care about a separate process adding more loose objects while
> index-pack is running, do we? I'm guessing we don't, but just to double
> check...

Right. That's basically what QUICK means: don't bother re-examining the
repository to handle simultaneous writes, even if it means saying an
object is not there when it has recently appeared. So far it has only
applied to packs, but this is really just the same concept (just as we
would not notice a new pack arriving, we will not notice a new loose
object arriving).

> > +/* probably should be configurable? */
> > +#define LOOSE_OBJECT_CACHE_MAX 65536
>
> Yes, perhaps with the gc.auto config value (multiplied by 256) as the
> cut point. If it's too big maybe just go with a bloom filter. For this
> particular case we expect like 99% of calls to miss.

I wonder, though, if we should have a maximum at all. The existing
examples I've found of this technique are:

  - mark_complete_and_common_ref(), which is trying to cover this exact
    case. It looks like it avoids adding more objects than there are
    refs, so I guess it actually has a pretty small cutoff.

  - find_short_object_filename(), which does the same thing with no
    limits. And there if we _do_ have a lot of objects, we'd still
    prefer to keep the cache.

And really, this list is pretty much equivalent to looking at a pack
.idx. The only difference is that one is mmap'd, but here we'd use the
heap. So it's not shared between processes, but otherwise the working
set size is similar.

-Peff
On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:

> > Of course any cache raises questions of cache invalidation, but I think
> > we've already dealt with that for this case. When we use
> > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
> > accuracy/speed tradeoff (which does a similar caching thing with
> > packfiles).
> >
> > So putting that all together, could we have something like:
>
> I think this conceptually is a vast improvement relative to
> ".cloning" optimization. Obviously this does not have the huge
> downside of the other approach that turns the collision detection
> completely off.
>
> A real question is how much performance gain, relative to ".cloning"
> thing, this approach gives us. If it gives us 80% or more of the
> gain compared to doing no checking, I'd say we have a clear winner.

My test runs showed it improving index-pack by about 3%, versus 4% for
no collision checking at all. But there was easily 1% of noise. And much
more importantly, that was on a Linux system on ext4, where stat is
fast. I'd be much more curious to hear timing results from people on
macOS or Windows, or from Geert's original NFS case.

-Peff
On Mon, Oct 29 2018, Jeff King wrote:

> On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:
>
>> I think this conceptually is a vast improvement relative to
>> ".cloning" optimization. Obviously this does not have the huge
>> downside of the other approach that turns the collision detection
>> completely off.
>>
>> A real question is how much performance gain, relative to ".cloning"
>> thing, this approach gives us. If it gives us 80% or more of the
>> gain compared to doing no checking, I'd say we have a clear winner.
>
> My test runs showed it improving index-pack by about 3%, versus 4% for
> no collision checking at all. But there was easily 1% of noise. And much
> more importantly, that was on a Linux system on ext4, where stat is
> fast. I'd be much more curious to hear timing results from people on
> macOS or Windows, or from Geert's original NFS case.

At work we make copious use of NetApp filers over NFS. I'd say this is
probably typical for enterprise environments. Raw I/O performance over
the wire (writing a large file) is really good, but metadata
(e.g. stat) performance tends to be atrocious.

We host the in-house Git server (GitLab) on such a filer (for HA etc.),
as well as many types of clients. As noted by Geert upthread you need to
mount the git directories with lookupcache=positive (see e.g. [1]).

Cloning git.git as --bare onto such a partition with my patch:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.98    1.802091          19     93896     19813 futex
 14.64    0.432782           7     61415        16 read
  9.40    0.277804           1    199576           pread64
  4.88    0.144172           3     49355        11 write
  3.10    0.091498          31      2919      2880 stat
  2.53    0.074812          31      2431       737 lstat
  1.96    0.057934           3     17257      1276 recvfrom
  0.91    0.026815           3      8543           select
  0.62    0.018425           2      8543           poll
[...]

real    0m32.053s
user    0m21.451s
sys     0m7.806s

Without:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.01   31.653787          50    628265     21608 futex
 24.14   10.761950          41    260658    258964 lstat
  2.22    0.988001           5    199576           pread64
  1.32    0.587844          10     59662         3 read
  0.79    0.350625           7     50376        11 write
  0.22    0.096019          33      2919      2880 stat
  0.13    0.057950           4     15821        12 recvfrom
  0.05    0.022385           3      7949           select
  0.04    0.015988           2      7949           poll
  0.03    0.013622        3406         4           wait4
[...]

real    4m38.670s
user    0m29.015s
sys     0m33.894s

So a reduction in clone time by ~90%. Performance would be basically the
same with your patch. But let's discuss that elsewhere in this thread;
I just wanted to post the performance numbers here.

1. https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/109#note_12528896
On Mon, Oct 29 2018, Jeff King wrote:

> On Sat, Oct 27, 2018 at 01:22:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Taking one step back, the root problem in this thread is that stat() on
>> > non-existing files is slow (which makes has_sha1_file slow).
>> >
>> > One solution there is to cache the results of looking in .git/objects
>> > (or any alternate object store) for loose files. And indeed, this whole
>> > scheme is just a specialized form of that: it's a flag to say "hey, we
>> > do not have any objects yet, so do not bother looking".
>> >
>> > Could we implement that in a more direct and central way? And could we
>> > implement it in a way that catches more cases? E.g., if I have _one_
>> > object, that defeats this specialized optimization, but it is probably
>> > still beneficial to cache that knowledge (and the reasonable cutoff is
>> > probably not 1, but some value of N loose objects).
>> [...]
>>
>> The assumption with making it exactly 0 objects and not any value of >0
>> is that we can safely assume that a "clone" or initial "fetch"[1] is
>> special in ways that a clone isn't. I.e. we're starting out with nothing
>> and doing the initial population; that's probably not as true in an
>> existing repo that's getting concurrent fetches, commits, rebases etc.
>
> I assume you mean s/that a clone isn't/that a fetch isn't/.

Yes, sorry.

> I agree there are cases where you might be able to go further if you
> assume a full "0". But my point is that "clone" is an ambiguous concept,
> and it doesn't map completely to what's actually slow here. So if you
> only look at "are we cloning", then:
>
>   - you have a bunch of cases which are "clones", but aren't actually
>     starting from scratch
>
>   - you get zero benefit in the non-clone cases, when we could be
>     scaling the benefit smoothly

Indeed. It's not special in principle, but I think in practice the
biggest wins are in the clone case, and unlike "fetch" we can safely
assume we're free of race conditions. More on that below.

>> But in the spirit of taking a step back, maybe we should take two steps
>> back and consider why we're doing this at all.
>
> OK, I think it's worth discussing, and I'll do that below. But first I
> want to say...
>
>> Three of our tests fail if we compile git like this, and cloning is much
>> faster (especially on NFS):
>>
>> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
>> index 2004e25da2..0c2d008ee0 100644
>> --- a/builtin/index-pack.c
>> +++ b/builtin/index-pack.c
>> @@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>>
>> -	if (startup_info->have_repository) {
>> +	if (0) {
>> 		read_lock();
>>
>> Even on a local disk I'm doing 262759 lstat() calls cloning git.git and
>> spending 5% of my time on that.
>
> With the caching patch I posted earlier, I see roughly the same speedup
> on an index-pack of git.git as I do with disabling the collision check
> entirely (I did see about a 1% difference in favor of what you wrote
> above, which was within the noise, but may well be valid due to slightly
> reduced lock contention).
>
> TBH I'm not sure if any of this is actually worth caring about on a
> normal Linux system, though. There stat() is fast. It might be much more
> interesting on macOS or Windows, or on a Linux system on NFS.

It matters a *lot* on NFS, as my performance numbers in
https://public-inbox.org/git/87d0rslhkl.fsf@evledraar.gmail.com/ show.

>> But why do we have this in the first place? It's because of 8685da4256
>> ("don't ever allow SHA1 collisions to exist by fetching a pack",
>> 2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in
>> collision check", 2017-04-01).
>>
>> I.e. we are worried about (and those tests check for):
>>
>>  a) A malicious user trying to give us a repository where they have
>>     created an object with the same SHA-1 that's different, as in the
>>     SHAttered attack.
>>
>>     I remember (but have not dug up) an old E-Mail from Linus saying
>>     that this was an important security aspect of git, i.e. even if
>>     SHA-1 was broken you couldn't easily propagate bad objects.
>
> Yeah, especially given recent advances in SHA-1 attacks, I'm not super
> comfortable with the idea of disabling the duplicate-object check at
> this point.

I'd be comfortable with it in my setup, since it's been limited to
collision attacks that are computationally prohibitive, and there's been
no sign of preimage attacks, which is the case we really need to worry
about.

>>  b) Cases where we've ended up with different content for a SHA-1 due to
>>     e.g. a local FS corruption. Which is the subject of your commit in
>>     2017.
>
> Sort of. We actually detected it before my patch, but we just gave a
> really crappy error message. ;)
>
>>  c) Are there cases where fetch.fsckObjects is off and we just flip a
>>     bit on the wire and don't notice? I think not, because we always
>>     check the pack checksum (don't we?), but I'm not 100% sure.
>
> We'd detect bit-flips on the wire due to the pack checksum. But what's
> more interesting are bit-flips on the disk of the sender, which would
> then put the bogus data into the pack checksum they generate on the fly.
>
> However, we do detect such a bit-flip, even without fsckObjects, because
> the sender does not tell us the expected sha-1 of each object. It gives
> us a stream of objects, and the receiver computes the sha-1's
> themselves. So a bit flip manifests in the connectivity-check when we
> say "hey, the other side should have sent us object X but did not" (we
> do not say "gee, what is this object Y they sent?" because after not
> seeing X, we do not know which objects would have been reachable, so we
> have a whole bunch of such Y's).
>
> fetch.fsckObjects is purely about doing semantic object-quality checks.
> They're not even that expensive to do. The main reason they're disabled
> is that there are many historical objects floating around that fail them
> (I think it would be a useful exercise to sort the existing checks by
> priority, downgrading many of them to warnings, and then setting the
> default for fetch.fsckObjects to "reject anything above warning").

Thanks, that's really informative & useful.

>> Even if someone wants to make the argument that this is behavior that we
>> absolutely *MUST* keep and not make configurable, there's still much
>> smarter ways to do it.
>
> I don't have any real objection to a configuration like this, if people
> want to experiment with it. But in contrast, the patch I showed earlier:
>
>   - is safe enough to just turn on all the time, without the user having
>     to configure anything nor make a safety tradeoff

I think it's a useful patch to carry forward, and agree that it should
be turned on by default.

It does introduce a race condition where you can introduce a colliding
object to the repository by doing two concurrent pushes, but as you note
in
https://public-inbox.org/git/20181029151842.GJ17668@sigill.intra.peff.net/
this already applies to packs, so you can trigger that with the right
sized push (depending on transfer.unpackLimit), and we also have this in
existing forms for other stuff.

I do think it's amazingly paranoid to be worried about SHA-1 collisions
in the first place, and a bit odd to leave the door open on these race
conditions. I.e. it's hard to imagine a state-level[1] actor with
sufficient motivation to exploit this who wouldn't find some way to make
the race condition work as an escape hatch.

I admit that even leaving that race condition open does close a lot of
doors entirely. I.e. you could sometimes trigger a collision but
wouldn't have the right conditions to exploit the race condition.

>   - speeds up all the other spots that use OBJECT_INFO_QUICK (like
>     fetch's tag-following, or what appears to be the exact same
>     optimization done manually inside mark_complete_and_common_ref()).

We also pay a constant cost of doing an opendir() / readdir() on however
many loose objects we have for every push on the server-side. While it's
not as bad as stat() in a loop, that's also quite slow on NFS.

In a busy repo that gets a lot of branches / branch deletions (so not
quite as extreme as [2], but close) and the default expiry policy you
can easily have 20-100K loose objects (something near the lower bound of
that is the current live state of one server I'm looking at).

A recursive opendir()/readdir() on that on local disk is really fast if
it's in cache, but can easily be 1-5 seconds on NFS. So for a push we'd
now pay up to 5s just populating a cache we'll barely use to accept some
tiny push with just a few objects.

I also found when writing "index-pack: add ability to disable SHA-1
collision check" that it's really handy to recover from some forms of
repo corruption, so I've documented that. So aside from the performance
case it's a useful knob to have.

So what I'll do is:

 * Re-roll my 4 patch series to include the patch you have in
   <20181027093300.GA23974@sigill.intra.peff.net>

 * Turn that behavior on by default, but have some knob to toggle it
   off, because as noted above in some performance-sensitive NFS cases
   I'd really like to not have the cache *AND* not have the collision
   check; performance will suffer with the cache.

>> We could e.g. just unconditionally write out the packfile into a
>> quarantine environment (see 720dae5a19 ("config doc: elaborate on
>> fetch.fsckObjects security", 2018-07-27)), *then* loop over the loose
>> objects and packs we have and see if any of those exist in the new pack;
>> if they do, do the current assertion, and if not (and fetch.fsckObjects
>> passes) move it out of the quarantine.
>
> Yes, I agree that would work, though it's a bigger architecture change.

1. "state-level" because even though Google's collision cost ~$100k
   we're talking about a *much* harder problem in practice of doing
   something useful. E.g. replacing git.c with an actual exploit, not
   just two cute specially crafted PDFs.

2. https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/
On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:

> A real question is how much performance gain, relative to ".cloning"
> thing, this approach gives us. If it gives us 80% or more of the
> gain compared to doing no checking, I'd say we have a clear winner.

I've tested Jeff's loose-object-cache patch and the performance is
within error bounds of my .cloning patch. A git clone of the same repo
as in my initial tests:

    .cloning           -> 10m04
    loose-object-cache ->  9m59

Jeff's patch does a little more work (256 readdir() calls, which in case
of an empty repo translate into 256 LOOKUP calls that return
NFS4ERR_NOENT) but that appears to be insignificant.

I agree that the loose-object-cache approach is preferred as it applies
to more git commands and also benefits performance when there are loose
objects already in the repository. As pointed out in the thread, maybe
the default cache size should be some integer times gc.auto.

I believe the loose-object-cache approach would have a performance
regression when you're receiving a small pack file and there's many
loose objects in the repo. Basically you're trading off

    MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency

against

    num_loose_objects * stat_latency

On Amazon EFS (and I expect on other NFS server implementations too) it
is more efficient to do readdir() on a large directory than to stat()
each of the individual files in the same directory. I don't have exact
numbers but based on a very rough calculation the difference is roughly
10x for large directories under normal circumstances.

As an example, this means that when you're receiving a pack file with 1K
objects in a repository with 10K loose objects, the loose-object-cache
patch has roughly the same performance as the current git. I'm not sure
if this is something to worry about as I'm not sure people run repos
with this many loose files. If it is a concern, there could be a flag to
turn the loose object cache on/off.
On Mon, Oct 29, 2018 at 09:34:53PM +0000, Geert Jansen wrote:

> On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:
>
> > A real question is how much performance gain, relative to ".cloning"
> > thing, this approach gives us. If it gives us 80% or more of the
> > gain compared to doing no checking, I'd say we have a clear winner.
>
> I've tested Jeff's loose-object-cache patch and the performance is
> within error bounds of my .cloning patch. A git clone of the same repo
> as in my initial tests:
>
>     .cloning           -> 10m04
>     loose-object-cache ->  9m59
>
> Jeff's patch does a little more work (256 readdir() calls, which in case
> of an empty repo translate into 256 LOOKUP calls that return
> NFS4ERR_NOENT) but that appears to be insignificant.

Yep, that makes sense. Thanks for timing it.

> I believe the loose-object-cache approach would have a performance
> regression when you're receiving a small pack file and there's many
> loose objects in the repo. Basically you're trading off
>
>     MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency
>
> against
>
>     num_loose_objects * stat_latency

Should num_loose_objects and num_objects_in_pack be swapped here? Just
making sure I understand what you're saying.

The patch I showed just blindly reads each of the 256 object
subdirectories. I think if we pursue this (and it seems like everybody
is on board), we should cache each of those individually. So a single
object would incur at most one opendir/readdir (and subsequent objects
may, too, or they may hit that cache if they share the first byte). So
the 256 in your MIN() is potentially much smaller.

We still have to deal with the fact that if you have a large number of
loose objects, they may be split across multiple readdir (or getdents)
calls. The "cache maximum" we discussed does bound that, but in some
ways that's worse: you waste time doing the bounded amount of readdir
and then don't even get the benefit of the cache. ;)

> On Amazon EFS (and I expect on other NFS server implementations too) it
> is more efficient to do readdir() on a large directory than to stat()
> each of the individual files in the same directory. I don't have exact
> numbers but based on a very rough calculation the difference is roughly
> 10x for large directories under normal circumstances.

I'd expect readdir() to be much faster than stat() in general (e.g.,
"ls -f" is faster than "ls -l" even on a warm cache; there's more
formatting going on in the latter, but I think a lot of it is the effort
to stat).

-Peff
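[For illustration, a rough sketch of the per-directory caching described
above, written in terms of git's own oid_array API. The structure and
the fill_fanout_cache() helper are hypothetical, not the actual patch;
the helper is assumed to read the single objects/XX directory, e.g. via
for_each_loose_file_in_objdir(), appending each oid it finds.]

struct loose_fanout_cache {
	struct oid_array oids;
	unsigned filled : 1;
};
static struct loose_fanout_cache fanout[256];

static int quick_has_loose(const struct object_id *oid)
{
	struct loose_fanout_cache *c = &fanout[oid->hash[0]];

	if (!c->filled) {
		/* fault in just objects/XX for this oid's first byte */
		fill_fanout_cache(oid->hash[0], &c->oids);
		c->filled = 1;
	}
	/* oid_array_lookup() sorts lazily, then binary-searches */
	return oid_array_lookup(&c->oids, oid) >= 0;
}

[A miss then costs at most one opendir()/readdir() of a single
objects/XX directory, and objects sharing a first byte hit the same
small cache.]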
On Mon, Oct 29, 2018 at 05:50:50PM -0400, Jeff King wrote:

> > I believe the loose-object-cache approach would have a performance
> > regression when you're receiving a small pack file and there's many
> > loose objects in the repo. Basically you're trading off
> >
> >     MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency
> >
> > against
> >
> >     num_loose_objects * stat_latency
>
> Should num_loose_objects and num_objects_in_pack be swapped here? Just
> making sure I understand what you're saying.

Whoops, yes, thanks for spotting that!

> So the 256 in your MIN() is potentially much smaller. We still have to
> deal with the fact that if you have a large number of loose objects,
> they may be split across multiple readdir (or getdents) calls. The
> "cache maximum" we discussed does bound that, but in some ways that's
> worse: you waste time doing the bounded amount of readdir and then
> don't even get the benefit of the cache. ;)

Yup. To get the performance benefit you'd like the cache to hold all
loose objects except in clearly degenerate cases with far too many loose
objects.

> > On Amazon EFS (and I expect on other NFS server implementations too) it
> > is more efficient to do readdir() on a large directory than to stat()
> > each of the individual files in the same directory. I don't have exact
> > numbers but based on a very rough calculation the difference is roughly
> > 10x for large directories under normal circumstances.
>
> I'd expect readdir() to be much faster than stat() in general (e.g.,
> "ls -f" is faster than "ls -l" even on a warm cache; there's more
> formatting going on in the latter, but I think a lot of it is the effort
> to stat).

In the case of NFS, the client usually requests that the READDIR
response also contain some of the stat flags (like st_mode). But even in
this case it's still more efficient to return multiple entries in one
batch through READDIR rather than as individual responses to GETATTR
(which is what stat() maps to).
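[To make the corrected tradeoff concrete, a back-of-the-envelope sketch
reproducing Geert's break-even example from earlier in the thread. The
latency unit and the ~10x readdir-vs-stat advantage are illustrative
assumptions based on his rough EFS estimate, not measurements.]

#include <stdio.h>

int main(void)
{
	double stat_latency = 1.0;                    /* one GETATTR, arbitrary unit */
	double readdir_per_entry = stat_latency / 10; /* readdir ~10x cheaper per entry */
	long num_loose = 10000;    /* loose objects already in the repo */
	long num_in_pack = 1000;   /* objects in the incoming pack */

	/* cache approach: enumerate every loose object once up front */
	double cache_cost = num_loose * readdir_per_entry;
	/* status quo: stat() once per object in the incoming pack */
	double stat_cost = num_in_pack * stat_latency;

	printf("fill loose cache: %.0f units\n", cache_cost); /* 1000 */
	printf("stat per object:  %.0f units\n", stat_cost);  /* 1000 */
	return 0;
}

[Both paths come out at 1000 units, matching the claim that a 1K-object
pack against 10K loose objects is roughly the break-even point.]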
On Mon, Oct 29, 2018 at 09:34:53PM +0000, Geert Jansen wrote:

> As an example, this means that when you're receiving a pack file with 1K
> objects in a repository with 10K loose objects, the loose-object-cache
> patch has roughly the same performance as the current git. I'm not sure
> if this is something to worry about as I'm not sure people run repos
> with this many loose files. If it is a concern, there could be a flag to
> turn the loose object cache on/off.

So yeah, that's the other thing I'm thinking about regarding having a
maximum loose cache size. 10k objects is only 200KB in memory. That's
basically nothing.

At some point you run into pathological cases, like having a million
objects (but that's still only 20MB, much less than we devote to other
caches, though of course they do add up). If you have a million loose
objects, I strongly suspect you're going to run into other problems
(like space, since you're not getting any deltas).

The one thing that gives me pause is that if you have a bunch of unused
and unreachable loose objects on disk, most operations won't actually
look at them at all. The majority of operations are only looking for
objects we expect to be present (e.g., resolving a ref, walking a tree)
and are fulfilled by checking the pack indices first.

So it's possible that Git is _tolerable_ for most operations with a
million loose objects, and we could make it slightly worse by loading
the cache. But I find it hard to get too worked up about spending an
extra 20MB (and the time to readdir() it in) in that case. It seems like
about 400ms on my machine, and the correct next step is almost always
going to be "pack" or "prune" anyway.

-Peff
On Mon, Oct 29, 2018 at 3:27 PM Jeff King <peff@peff.net> wrote:
> So yeah, that's the other thing I'm thinking about regarding having a
> maximum loose cache size.

Tangent: this preloading/caching could be used for a more precise
approach to decide when to gc, instead of using some statistical
sampling of objects/17, eventually.
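[A sketch of that tangent; the function name is hypothetical. Today gc
estimates the total loose-object count by sampling the objects/17
directory; with a fully-populated loose-object cache (as in the patch
upthread), it could compare an exact count against the gc.auto threshold
instead.]

static int too_many_loose_objects_exact(struct repository *r,
					int gc_auto_threshold)
{
	/* assumes the loose cache has been faulted in for all 256 dirs */
	return r->objects->loose_cache.nr >= gc_auto_threshold;
}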
On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, especially given recent advances in SHA-1 attacks, I'm not super
> > comfortable with the idea of disabling the duplicate-object check at
> > this point.
>
> I'd be comfortable with it in my setup, since the known attacks are limited
> to collision attacks that are computationally prohibitive, and there's been
> no sign of preimage attacks, which is the case we really need to worry about.

I agree, and I'm not actually that worried about the current state. But what makes me more nervous is the life-cycle around Git. In 5 years, people are still going to be running what we ship today, and will grumble about upgrading to deal with SHA-1. I suppose it's not the end of the world as long as they can un-flip a config switch to get back the more-paranoid behavior (which is all that you're really proposing).

> It does introduce a race condition where you can introduce a colliding
> object to the repository by doing two concurrent pushes, but as you note in
> https://public-inbox.org/git/20181029151842.GJ17668@sigill.intra.peff.net/
> this already applies to packs, so you can trigger that with the right
> sized push (depending on transfer.unpackLimit), and we also have this in
> existing forms for other stuff.

Right. It can also trigger currently if somebody runs "git repack" simultaneously (the loose object becomes packed, but we don't re-scan the pack directory).

> I do think it's amazingly paranoid to be worried about SHA-1 collisions
> in the first place, and a bit odd to leave the door open on these race
> conditions. I.e. it's hard to imagine a state-level[1] actor with
> sufficient motivation to exploit this who wouldn't find some way to make
> the race condition work as an escape hatch.

Yeah, I agree there's an element of that. I think the "push twice quickly to race" thing is actually not all that interesting, though. In that case, you're providing both the objects already, so why not just push the one you want? What's more interesting is racing with the victim of your collision (I feed Junio the good half of the collision, and then try to race his push and get my evil half in at the same time). Or racing a repack. But timing the race there seems a lot trickier.

I suspect you could open up the window substantially by feeding your pack really slowly. So I start to push at 1pm, but trickle in a byte at a time of my 1GB pack, taking several hours. Meanwhile Junio pushes, and then as soon as I see that, I send the rest of my pack. My index-pack doesn't see Junio's push because it started before. And ditto with repack, if the server runs it predictably in response to load. So maybe not so tricky after all.

I think the other thing that helps here is that _everybody_ runs the collision check. So yeah, you can race pushing your evil stuff to my server. But it only takes one person fetching into their quiescent laptop repository to notice the collision and sound the alarm.

I'll admit that there's a whole lot of hand-waving there, for a security claim. I'll be glad to simply move off of SHA-1.

> In a busy repo that gets a lot of branches / branch deletions (so not
> quite as extreme as [2], but close) and the default expiry policy, you
> can easily have 20-100K loose objects (something near the lower bound of
> that is the current live state of one server I'm looking at).
>
> A recursive opendir()/readdir() on that on local disk is really fast if
> it's in cache, but can easily be 1-5 seconds on NFS.
> So for a push we'd now pay up to 5s just populating a cache we'll barely
> use to accept some tiny push with just a few objects.

That 1-5 seconds is a little scary. Locally for a million objects I was looking at 400ms. But obviously NFS is going to be much worse.

I do agree with your sentiment below that even if this should be on by default, it should have a config knob. After all, "please flip this switch and see if things improve" is a good escape hatch to have.

> * Re-roll my 4 patch series to include the patch you have in
>   <20181027093300.GA23974@sigill.intra.peff.net>

I don't think it's quite ready for inclusion as-is. I hope to brush it up a bit, but I have quite a backlog of stuff to review, as well.

-Peff
On Mon, Oct 29, 2018 at 03:35:58PM -0700, Stefan Beller wrote:

> On Mon, Oct 29, 2018 at 3:27 PM Jeff King <peff@peff.net> wrote:
>
> > So yeah, that's the other thing I'm thinking about regarding having a
> > maximum loose cache size.
>
> tangent:
> This preloading/caching could be used for a more precise approach
> to decide when to gc instead of using some statistical sampling
> of objects/17, eventually.

Isn't it exactly the same thing? Ideally we'd break down the cache to the directory level, so we could fill the cache list for "17" and ask it "how full are you". Or we could just readdir objects/17 ourselves. But either way, it's the same amount of work.

-Peff
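For reference, the objects/17 sampling that "git gc --auto" does boils down to counting one fan-out directory and multiplying by 256. A simplified sketch (this approximates too_many_loose_objects() in builtin/gc.c; the real threshold handling and error checking are omitted):

/* Estimate the number of loose objects by sampling objects/17,
 * the way "git gc --auto" does. Simplified sketch, not git's code. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static int estimate_loose(const char *objdir)
{
	char path[4096];
	DIR *dir;
	struct dirent *ent;
	int count = 0;

	snprintf(path, sizeof(path), "%s/17", objdir);
	dir = opendir(path);
	if (!dir)
		return 0;
	while ((ent = readdir(dir)) != NULL) {
		/* loose objects are named with the remaining 38 hex chars */
		if (strlen(ent->d_name) == 38 &&
		    strspn(ent->d_name, "0123456789abcdef") == 38)
			count++;
	}
	closedir(dir);
	return count * 256; /* extrapolate from the one sampled bucket */
}

int main(void)
{
	printf("estimated loose objects: %d\n", estimate_loose(".git/objects"));
	return 0;
}

A directory-level cache fill for bucket 0x17 would perform exactly the same readdir() pass, just keeping the names instead of only counting them.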
On Mon, Oct 29, 2018 at 07:27:39PM -0400, Jeff King wrote:

> On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > * Re-roll my 4 patch series to include the patch you have in
> >   <20181027093300.GA23974@sigill.intra.peff.net>
>
> I don't think it's quite ready for inclusion as-is. I hope to brush it
> up a bit, but I have quite a backlog of stuff to review, as well.

We're still quite keen to get this patch included. Is there anything I can do to help?

Also, I just re-read your comments on maximum cache size. I think you were arguing both sides of the equation and I wasn't sure where you'd ended up. :) A larger cache potentially takes more time to fill up, especially on NFS, while a smaller cache would obviously be less effective. That said, a small cache is still effective for the "clone" case where the repo is empty.

It also occurred to me that as a performance optimization your patch could read the loose object directories in parallel using a thread pool. At least on Amazon EFS this should result in an almost linear performance increase. I'm not sure how much this would help for local file systems. In any case this may be best done as a follow-up patch (that I'd be happy to volunteer for).
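To illustrate the shape of that follow-up idea (not a patch, just a hedged sketch with made-up structure and names): a small pool of threads claims the 256 fan-out directories one at a time, so the per-directory READDIR round-trips overlap instead of running back to back.

/* Sketch: scan .git/objects/00 .. .git/objects/ff from a thread pool.
 * Illustrative only; a real version would record the object names
 * into the loose-object cache instead of just counting them.
 * Build with: cc -pthread scan.c */
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NR_THREADS 8

static const char *objdir = ".git/objects";
static int next_bucket;          /* next fan-out directory to claim */
static int counts[256];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *scan_worker(void *arg)
{
	(void)arg;
	for (;;) {
		char path[4096];
		DIR *dir;
		struct dirent *ent;
		int bucket, n = 0;

		pthread_mutex_lock(&lock);
		bucket = next_bucket++;
		pthread_mutex_unlock(&lock);
		if (bucket >= 256)
			return NULL;

		snprintf(path, sizeof(path), "%s/%02x", objdir, bucket);
		dir = opendir(path);
		if (!dir)
			continue;
		while ((ent = readdir(dir)) != NULL)
			if (strlen(ent->d_name) == 38) /* skip "." and ".." */
				n++;
		closedir(dir);
		counts[bucket] = n;
	}
}

int main(void)
{
	pthread_t threads[NR_THREADS];
	int i, total = 0;

	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, scan_worker, NULL);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);
	for (i = 0; i < 256; i++)
		total += counts[i];
	printf("loose objects found: %d\n", total);
	return 0;
}

On a high-latency file system the win comes from overlapping round-trips, so the speedup is bounded by the thread count and by how evenly the buckets are sized; on local disk the scan is usually I/O-cache bound and threads may not help.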
On Wed, Nov 07, 2018 at 10:55:24PM +0000, Geert Jansen wrote:

> On Mon, Oct 29, 2018 at 07:27:39PM -0400, Jeff King wrote:
>
> > On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > > * Re-roll my 4 patch series to include the patch you have in
> > >   <20181027093300.GA23974@sigill.intra.peff.net>
> >
> > I don't think it's quite ready for inclusion as-is. I hope to brush it
> > up a bit, but I have quite a backlog of stuff to review, as well.
>
> We're still quite keen to get this patch included. Is there anything I can do
> to help?

Yes, testing and review. :)

I won't send the series out just yet, as I suspect it could use another read-through on my part. But if you want to peek at it or try some timings, it's available at:

  https://github.com/peff/git jk/loose-cache

It's quite a bit bigger than the original patch, as some refactoring was necessary to reuse the existing cache in alternate_object_directories. I'm rather pleased with how it turned out; it unifies the handling of alternates and the main object directory, which is a cleanup I've been wanting to do for some time.

> Also, I just re-read your comments on maximum cache size. I think you were
> arguing both sides of the equation and I wasn't sure where you'd ended up. :)
> A larger cache potentially takes more time to fill up, especially on NFS,
> while a smaller cache would obviously be less effective. That said, a small
> cache is still effective for the "clone" case where the repo is empty.

I ended up thinking that a large cache is going to be fine. So I didn't even bother implementing a limit in my series, which makes things a bit simpler (it's one less state to deal with).

Since it reuses the existing cache code, it's better in a few ways than my earlier patch:

  1. If a program uses OBJECT_INFO_QUICK and prints abbreviated sha1s, we
     only have to load the cache once (I think fetch does this, but I
     didn't test it).

  2. The cache is filled one directory at a time, which avoids
     unnecessary work when there are only a few lookups.

  3. The cache is per-object-directory. So if a request can be filled
     without looking at an alternate, we avoid looking at the alternate.
     I doubt this matters much in practice (the case we care about is
     when we _don't_ have the object, and there you have to look
     everywhere).

The one thing I didn't implement is a config option to disable this. That would be pretty easy to add. I don't think it's necessary, but it would make testing before/after behavior easier if somebody thinks it's slowing down their particular case.

> It also occurred to me that as a performance optimization your patch could
> read the loose object directories in parallel using a thread pool. At least
> on Amazon EFS this should result in an almost linear performance increase.
> I'm not sure how much this would help for local file systems. In any case
> this may be best done as a follow-up patch (that I'd be happy to volunteer
> for).

Yeah, I suspect it could make things faster in some cases. But it also implies filling all of the cache directories at once up front. The code I have now tries to avoid unnecessary cache fills. But it would be pretty easy to kick off a full fill.

I agree it would make more sense as a follow-up patch (and probably controlled by a config option, since it likely only makes sense when you have a really high-latency readdir).

-Peff
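The lazy, per-directory fill in point 2 works out to something like the sketch below. This is not the actual jk/loose-cache code, and quick_has_loose() here is only a stand-in for the function of that name in the series; error handling is omitted:

/* Sketch of a lazily-filled loose-object cache: each of the 256
 * fan-out directories is read at most once, on first lookup into it. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct bucket {
	int filled;
	char (*names)[39];	/* 38 hex chars + NUL per entry */
	size_t nr, alloc;
};

static struct bucket cache[256];
static const char *objdir = ".git/objects";

static void fill_bucket(int b)
{
	char path[4096];
	DIR *dir;
	struct dirent *ent;
	struct bucket *bk = &cache[b];

	bk->filled = 1;
	snprintf(path, sizeof(path), "%s/%02x", objdir, b);
	dir = opendir(path);
	if (!dir)
		return; /* empty or missing bucket: cache stays empty */
	while ((ent = readdir(dir)) != NULL) {
		if (strlen(ent->d_name) != 38)
			continue;
		if (bk->nr == bk->alloc) {
			bk->alloc = bk->alloc ? 2 * bk->alloc : 64;
			bk->names = realloc(bk->names,
					    bk->alloc * sizeof(*bk->names));
		}
		memcpy(bk->names[bk->nr++], ent->d_name, 39);
	}
	closedir(dir);
}

/* hex: 40-char sha1; returns 1 if a loose object with that name exists */
static int quick_has_loose(const char *hex)
{
	char fanout[3] = { hex[0], hex[1], '\0' };
	int b = (int)strtol(fanout, NULL, 16);
	size_t i;

	if (!cache[b].filled)
		fill_bucket(b); /* one readdir pass instead of a stat() */
	for (i = 0; i < cache[b].nr; i++)
		if (!memcmp(cache[b].names[i], hex + 2, 38))
			return 1;
	return 0;
}

int main(int argc, char **argv)
{
	if (argc > 1)
		printf("%s: %s\n", argv[1],
		       quick_has_loose(argv[1]) ? "yes" : "no");
	return 0;
}

The point of the laziness: a push that only touches a handful of buckets pays for a handful of readdir() calls, not all 256.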
On Thu, Nov 08, 2018 at 07:02:57AM -0500, Jeff King wrote:

> Yes, testing and review. :)
>
> I won't send the series out just yet, as I suspect it could use another
> read-through on my part. But if you want to peek at it or try some
> timings, it's available at:
>
>   https://github.com/peff/git jk/loose-cache

I gave this branch a go. There's a performance regression: I'm getting a clone speed of about 100 KiB/s, while with the previous patch I got around 20 MiB/s. The culprit appears to be a very large number of stat() calls on ".git/objects/info/alternates". The call stack is:

  -> quick_has_loose()
     -> prepare_alt_odb()
        -> read_info_alternates()
On Thu, Nov 08, 2018 at 08:58:19PM +0000, Geert Jansen wrote:

> On Thu, Nov 08, 2018 at 07:02:57AM -0500, Jeff King wrote:
>
> > Yes, testing and review. :)
> >
> > I won't send the series out just yet, as I suspect it could use another
> > read-through on my part. But if you want to peek at it or try some
> > timings, it's available at:
> >
> >   https://github.com/peff/git jk/loose-cache
>
> I gave this branch a go. There's a performance regression: I'm getting a
> clone speed of about 100 KiB/s, while with the previous patch I got around
> 20 MiB/s. The culprit appears to be a very large number of stat() calls on
> ".git/objects/info/alternates". The call stack is:
>
>   -> quick_has_loose()
>      -> prepare_alt_odb()
>         -> read_info_alternates()

Heh, indeed. Try this on top:

diff --git a/sha1-file.c b/sha1-file.c
index bc35b28e17..9ff27f92ed 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -692,6 +692,7 @@ void prepare_alt_odb(struct repository *r)
 	link_alt_odb_entries(r, r->objects->alternate_db, PATH_SEP, NULL, 0);
 
 	read_info_alternates(r, r->objects->odb->path, 0);
+	r->objects->loaded_alternates = 1;
 }
 
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */

-Peff
On Thu, Nov 08, 2018 at 04:18:24PM -0500, Jeff King wrote:

> Heh, indeed. Try this on top:
>
> diff --git a/sha1-file.c b/sha1-file.c
> index bc35b28e17..9ff27f92ed 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -692,6 +692,7 @@ void prepare_alt_odb(struct repository *r)
>  	link_alt_odb_entries(r, r->objects->alternate_db, PATH_SEP, NULL, 0);
>  
>  	read_info_alternates(r, r->objects->odb->path, 0);
> +	r->objects->loaded_alternates = 1;
>  }
>  
>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */

Thanks, this did it. Performance is now back at the level of the previous patch.
On Thu, Nov 08 2018, Jeff King wrote:

> On Wed, Nov 07, 2018 at 10:55:24PM +0000, Geert Jansen wrote:
>
>> On Mon, Oct 29, 2018 at 07:27:39PM -0400, Jeff King wrote:
>>
>> > On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> > > * Re-roll my 4 patch series to include the patch you have in
>> > >   <20181027093300.GA23974@sigill.intra.peff.net>
>> >
>> > I don't think it's quite ready for inclusion as-is. I hope to brush it
>> > up a bit, but I have quite a backlog of stuff to review, as well.
>>
>> We're still quite keen to get this patch included. Is there anything I can
>> do to help?
>
> Yes, testing and review. :)
>
> I won't send the series out just yet, as I suspect it could use another
> read-through on my part. But if you want to peek at it or try some
> timings, it's available at:
>
>   https://github.com/peff/git jk/loose-cache

Just a comment on this from the series:

    Note that it is possible for this to actually be _slower_. We'll do a
    full readdir() to fill the cache, so if you have a very large number of
    loose objects and a very small number of lookups, that readdir() may end
    up more expensive.

    In practice, though, having a large number of loose objects is already a
    performance problem, which should be fixed by repacking or pruning via
    git-gc. So on balance, this should be a good tradeoff.

Our biggest repo has a very large number of loose objects at any given time, but the vast majority of these exist because gc *is* happening very frequently and the default expiry policy of 2wks is in effect.

Having a large number of loose objects is not per se a performance problem. It's a problem if you end up "faulting" from packs to the loose object directory a lot because those objects are still reachable, but if they're not reachable that number can grow very large if your ref churn is large (so lots of expired loose object production).

Anyway, the series per se looks good to me. It's particularly nice to have some of the ODB cleanup + cleanup in fetch-pack.c.

Just wanted to note that in our default (reasonable) config we do produce scenarios where this change can still be somewhat pathological, so I'm still interested in disabling it entirely given the implausibility of what it's guarding against.
On Thu, Nov 08 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Nov 08 2018, Jeff King wrote:
>
>> On Wed, Nov 07, 2018 at 10:55:24PM +0000, Geert Jansen wrote:
>>
>>> We're still quite keen to get this patch included. Is there anything I can
>>> do to help?
>>
>> Yes, testing and review. :)
>>
>> I won't send the series out just yet, as I suspect it could use another
>> read-through on my part. But if you want to peek at it or try some
>> timings, it's available at:
>>
>>   https://github.com/peff/git jk/loose-cache
>
> Just a comment on this from the series:
>
>     Note that it is possible for this to actually be _slower_. We'll do a
>     full readdir() to fill the cache, so if you have a very large number of
>     loose objects and a very small number of lookups, that readdir() may end
>     up more expensive.
>
>     In practice, though, having a large number of loose objects is already a
>     performance problem, which should be fixed by repacking or pruning via
>     git-gc. So on balance, this should be a good tradeoff.
>
> Our biggest repo has a very large number of loose objects at any given
> time, but the vast majority of these exist because gc *is* happening very
> frequently and the default expiry policy of 2wks is in effect.
>
> Having a large number of loose objects is not per se a performance
> problem. It's a problem if you end up "faulting" from packs to the loose
> object directory a lot because those objects are still reachable, but if
> they're not reachable that number can grow very large if your ref churn
> is large (so lots of expired loose object production).
>
> Anyway, the series per se looks good to me. It's particularly nice to
> have some of the ODB cleanup + cleanup in fetch-pack.c.
>
> Just wanted to note that in our default (reasonable) config we do
> produce scenarios where this change can still be somewhat pathological,
> so I'm still interested in disabling it entirely given the
> implausibility of what it's guarding against.

Some actual numbers for this, for a fairly small repo on NFS, "cold" cache (hadn't run it in a bit):

    $ time (find objects/?? -type f|wc -l)
    862
    real    0m1.927s

Warm cache:

    $ time (find objects/?? -type f|wc -l)
    872
    real    0m0.151s

Cold cache on a bigger monorepo:

    $ time (find objects/?? -type f|wc -l)
    real    0m4.336s

Warm cache on a bigger monorepo (more ref churn):

    $ time (find objects/?? -type f|wc -l)
    49746
    real    0m1.082s

This is on a server where bulk sustained writes of large files are really fast (up to 1GB/s). It's just these metadata ops that are slow.

I also get cold cache times of up to 6 seconds on:

    time (find $(ls -d objects/??|sort -R) -type f | wc -l)

as opposed to a max of ~4s without -R, so I suspect there may be some client/server optimization where things are iterated over in recursive glob order (pre-fetched?), whereas the cache will try to fill buckets as it encounters loose objects, so it iterates over objects/{00..ff} randomly.

I'm not really leading up to any point here I haven't made already. I was just curious to try to find some upper bound of overhead if, say, a pack with 512 objects is pushed.
In that case it's very likely that we need to fill at least 200/256 buckets.
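For what it's worth, the expectation matches that guess: if 512 object names hash uniformly over 256 fan-out directories, the expected number of directories touched is 256 * (1 - (255/256)^512), which is about 222. A quick check of the arithmetic (illustrative only):

/* Expected number of fan-out buckets touched by N uniformly
 * distributed object names. Build with: cc calc.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
	double buckets = 256, objects = 512;
	double hit = buckets * (1 - pow((buckets - 1) / buckets, objects));

	printf("expected buckets touched: %.1f of %.0f\n", hit, buckets);
	return 0;
}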
On Wed, Nov 07 2018, Geert Jansen wrote:

> On Mon, Oct 29, 2018 at 07:27:39PM -0400, Jeff King wrote:
>
>> On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> > * Re-roll my 4 patch series to include the patch you have in
>> >   <20181027093300.GA23974@sigill.intra.peff.net>
>>
>> I don't think it's quite ready for inclusion as-is. I hope to brush it
>> up a bit, but I have quite a backlog of stuff to review, as well.
>
> We're still quite keen to get this patch included. Is there anything I can do
> to help?
>
> Also, I just re-read your comments on maximum cache size. I think you were
> arguing both sides of the equation and I wasn't sure where you'd ended up. :)
> A larger cache potentially takes more time to fill up, especially on NFS,
> while a smaller cache would obviously be less effective. That said, a small
> cache is still effective for the "clone" case where the repo is empty.
>
> It also occurred to me that as a performance optimization your patch could
> read the loose object directories in parallel using a thread pool. At least
> on Amazon EFS this should result in an almost linear performance increase.
> I'm not sure how much this would help for local file systems. In any case
> this may be best done as a follow-up patch (that I'd be happy to volunteer
> for).

I'm planning to re-submit mine with some minor changes after the great Documentation/config* move lands.

As noted in https://public-inbox.org/git/87bm7clf4o.fsf@evledraar.gmail.com/ and https://public-inbox.org/git/87h8gq5zmc.fsf@evledraar.gmail.com/ I think it's worth it regardless of where Jeff's optimization ends up. O(nothing) is always faster than O(something), particularly (as explained in that E-Mail) on NFS.

You didn't answer my question in https://public-inbox.org/git/20181030024925.GC8325@amazon.com/ about whether for your purposes you're interested in this for something where it needs to work out of the box on some random Amazon customer's "git", or if it's something in-house and you just don't want to turn off collision checking. That would be useful to know.

I've turned on core.checkCollisions=false in production at our site. Cloning for some large repositories went from ~200 minutes to ~5 minutes, and some pushes from ~5 minutes to ~10 seconds. Those numbers will be very similar with Jeff's (but slightly higher, maybe 1-5 seconds higher in the latter case, depending on the push).
On Fri, Nov 9, 2018 at 2:46 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> I'm planning to re-submit mine with some minor changes after the great
> Documentation/config* move lands.
>
> As noted in
> https://public-inbox.org/git/87bm7clf4o.fsf@evledraar.gmail.com/ and
> https://public-inbox.org/git/87h8gq5zmc.fsf@evledraar.gmail.com/ I think
> it's worth it regardless of where Jeff's optimization ends up. O(nothing)
> is always faster than O(something), particularly (as explained in that
> E-Mail) on NFS.

Is it really worth adding more code to maintain just to shave a couple of seconds (or a few percent of clone time)?
On Fri, Nov 09 2018, Duy Nguyen wrote:

> On Fri, Nov 9, 2018 at 2:46 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> I'm planning to re-submit mine with some minor changes after the great
>> Documentation/config* move lands.
>>
>> As noted in
>> https://public-inbox.org/git/87bm7clf4o.fsf@evledraar.gmail.com/ and
>> https://public-inbox.org/git/87h8gq5zmc.fsf@evledraar.gmail.com/ I think
>> it's worth it regardless of where Jeff's optimization ends up. O(nothing)
>> is always faster than O(something), particularly (as explained in that
>> E-Mail) on NFS.
>
> Is it really worth adding more code to maintain just to shave a couple of
> seconds (or a few percent of clone time)?

Yeah, I think so, because (in rough priority order):

a) The maintenance burden of carrying core.checkCollisions is trivial, and it's hard to imagine a scenario where it'll be difficult to selectively turn off some does_this_collide() function.

b) I think I need to worry more about a meteorite colliding with the datacenter than about the threat this check is trying to guard against.

c) I think we should just turn it off by default for SHA-1, but I don't expect that argument will carry the day. But I expect even those who think we still need it will have a hard time making that argument in the case of SHA-256. So having the codepath to disable it is helpful.

d) As shown in the linked E-Mails of mine, with Jeff's version of this optimization (and not with mine) you sometimes pay a 2-3 second *fixed* cost even for a very small (think ~100-200 objects) push/fetch that would otherwise take milliseconds. This can be a slowdown of hundreds or thousands of percent.

Is that a big deal in itself in terms of absolute time spent? No. But I'm also thinking about this from the perspective of getting noise out of performance metrics. Some of this slowdown is also "user waiting for the terminal to be usable again", not just some machine somewhere wasting its own time.

e) As shown in my patch, this direction has a very beneficial side effect: it makes it much easier to repair corrupt repositories, something I'm hoping to pursue even further. I've had cases where core.checkCollisions=false plus stuff on top would have made repairing a broken repo much easier.

Anyway, I'm in no rush to send my patch. I'm happily using it in production, but will wait for Jeff's to be ready and to land before picking it up again. Just wanted to do a braindump of the benefits.
On Thu, Nov 08, 2018 at 11:20:47PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Just a comment on this from the series:
>
>     Note that it is possible for this to actually be _slower_. We'll do a
>     full readdir() to fill the cache, so if you have a very large number of
>     loose objects and a very small number of lookups, that readdir() may end
>     up more expensive.
>
>     In practice, though, having a large number of loose objects is already a
>     performance problem, which should be fixed by repacking or pruning via
>     git-gc. So on balance, this should be a good tradeoff.
>
> Our biggest repo has a very large number of loose objects at any given
> time, but the vast majority of these exist because gc *is* happening very
> frequently and the default expiry policy of 2wks is in effect.
>
> Having a large number of loose objects is not per se a performance
> problem.

Yes, you're right. I was trying not to get into the rabbit hole of discussing theoretical tradeoffs, but it is worth addressing. I've updated that commit message in the patches I'll send out momentarily.

-Peff
On Sat, Nov 10, 2018 at 03:04:35PM +0100, Ævar Arnfjörð Bjarmason wrote:

> d) As shown in the linked E-Mails of mine, with Jeff's version of this
> optimization (and not with mine) you sometimes pay a 2-3 second *fixed*
> cost even for a very small (think ~100-200 objects) push/fetch that
> would otherwise take milliseconds. This can be a slowdown of hundreds
> or thousands of percent.
>
> Is that a big deal in itself in terms of absolute time spent? No. But
> I'm also thinking about this from the perspective of getting noise
> out of performance metrics. Some of this slowdown is also "user
> waiting for the terminal to be usable again", not just some machine
> somewhere wasting its own time.

IMHO the ultimate end-game in this direction is still "don't have a bunch of loose objects". Right now this can legitimately happen due to unreachable-but-recent objects being exploded out (or never packed in the first place). But I hope in the long run that we'll actually put these into packs. That will make this case faster _and_ avoid extra work during gc _and_ fix the "whoops, we just ran gc but you still have a lot of objects" problem.

Which doesn't invalidate your other four points, of course. ;)

-Peff
On Fri, Nov 09, 2018 at 02:43:52PM +0100, Ævar Arnfjörð Bjarmason wrote:

> As noted in
> https://public-inbox.org/git/87bm7clf4o.fsf@evledraar.gmail.com/ and
> https://public-inbox.org/git/87h8gq5zmc.fsf@evledraar.gmail.com/ I think
> it's worth it regardless of where Jeff's optimization ends up. O(nothing)
> is always faster than O(something), particularly (as explained in that
> E-Mail) on NFS.
>
> You didn't answer my question in
> https://public-inbox.org/git/20181030024925.GC8325@amazon.com/ about
> whether for your purposes you're interested in this for something where
> it needs to work out of the box on some random Amazon customer's "git",
> or if it's something in-house and you just don't want to turn off
> collision checking. That would be useful to know.

The reason I started this thread is to optimize performance for AWS customers that run git on EFS. Therefore, my preference is that git would be fast out of the box on NFS/EFS without having to disable collision checking unconditionally (disabling it for empty repos is fine, as that's a no-op anyway).
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2004e25da..22b3d40fb 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -84,6 +84,7 @@ static int verbose;
 static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
+static int cloning;
 
 static struct progress *progress;
 
@@ -794,7 +795,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 
 	assert(data || obj_entry);
 
-	if (startup_info->have_repository) {
+	if (startup_info->have_repository && !cloning) {
 		read_lock();
 		collision_test_needed =
 			has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK);
@@ -1705,6 +1706,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 			check_self_contained_and_connected = 1;
 		} else if (!strcmp(arg, "--fsck-objects")) {
 			do_fsck_object = 1;
+		} else if (!strcmp(arg, "--cloning")) {
+			cloning = 1;
 		} else if (!strcmp(arg, "--verify")) {
 			verify = 1;
 		} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/fetch-pack.c b/fetch-pack.c
index b3ed7121b..c75bfb8aa 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -843,6 +843,8 @@ static int get_pack(struct fetch_pack_args *args,
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
 		if (args->from_promisor)
 			argv_array_push(&cmd.args, "--promisor");
+		if (args->cloning)
+			argv_array_pushf(&cmd.args, "--cloning");
 	} else {
 		cmd_name = "unpack-objects";
The index-pack command determines if a sha1 collision test is needed by checking the existence of a loose object with the given hash. In my tests, I can improve performance of "git clone" on Amazon EFS by 8x when used with a non-default mount option (lookupcache=pos) that's required for a Gitlab HA setup. My assumption is that this check is unnecessary when cloning into a new repository because the repository will be empty.

By default, the Linux NFS client will cache directory entries as well as the non-existence of directory entries. The latter means that when client c1 does stat() on a file that does not exist, the non-existence will be cached, and any subsequent stat() operation on the file will return -ENOENT until the cache expires or is invalidated, even if the file was created on client c2 in the meantime. This leads to errors in a Gitlab HA setup, which distributes jobs over multiple worker nodes on the assumption that each worker node has the same view of the shared file system.

The workaround recommended by Gitlab is to use the "lookupcache=pos" NFS mount option, which disables the negative lookup cache. This option has a high performance impact. Cloning the gitlab-ce repository (https://gitlab.com/gitlab-org/gitlab-ce.git) into an NFS-mounted directory gives the following results:

  lookupcache=all (default):  624 seconds
  lookupcache=pos:           4957 seconds

The reason for the poor performance is that index-pack will issue a stat() call for every object in the repo when checking if a collision test is needed. These stat() calls result in the following NFS operations:

  LOOKUP dirfh=".git/objects", name="01" -> NFS4ERR_ENOENT

With lookupcache=all, the non-existence of the .git/objects/XX directories is cached, so there will be at most 256 LOOKUP calls. With lookupcache=pos, there will be one LOOKUP operation for every object in the repository, which in the case of the gitlab-ce repo means about 1.3 million LOOKUPs.

The attached patch removes the collision check when cloning into a new repository. The performance of git clone with this patch is:

  lookupcache=pos (with patch): 577 seconds

I'd welcome feedback on the attached patch and on whether my assumption that the sha1 collision check can be safely omitted when cloning into a new repository is correct.

Signed-off-by: Geert Jansen <gerardu@amazon.com>
---
 builtin/index-pack.c | 5 ++++-
 fetch-pack.c         | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)