Message ID | 20190903194247.217964-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cache-tree: do not lazy-fetch merge tree | expand |
On 9/3/2019 3:42 PM, Jonathan Tan wrote: > When cherry-picking (for example), new trees may be constructed. During > this process, Git checks whether these trees exist. However, in a > partial clone, this causes a lazy fetch to occur, which is both > unnecessary (because Git has already constructed this tree as part of > the cherry-picking process) and likely to fail (because the remote > probably doesn't have this tree). If we have constructed the object already, then why do we not see it and avoid fetching it? This must be a slightly strange timing issue with objects being flushed to disk or added to the object cache. One approach is to find all of these has_object_file() calls that should really be one with OBJECT_INFO_SKIP_FETCH_OBJECT. Another would be to find out why has_object_file() isn't seeing the object we constructed. > Do not lazy fetch in this situation. I agree that the patch has this effect. > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > Another partial clone bug. > > This raises the issue that failed fetches are currently fatal - if they > weren't fatal, this cherry-pick would have worked (except with some > delay as the fetch is attempted, and with a warning message about the > fetch failing). My personal inclination right now is to leave things as > it is (fatal failed fetches), but I'm open to other opinions. > --- > cache-tree.c | 2 +- > t/t0410-partial-clone.sh | 14 ++++++++++++++ > 2 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/cache-tree.c b/cache-tree.c > index c22161f987..9e596893bc 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -407,7 +407,7 @@ static int update_one(struct cache_tree *it, > if (repair) { > struct object_id oid; > hash_object_file(buffer.buf, buffer.len, tree_type, &oid); > - if (has_object_file(&oid)) > + if (has_object_file_with_flags(&oid, OBJECT_INFO_SKIP_FETCH_OBJECT)) > oidcpy(&it->oid, &oid); > else > to_invalidate = 1; > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh > index 6415063980..3e434b6a81 100755 > --- a/t/t0410-partial-clone.sh > +++ b/t/t0410-partial-clone.sh > @@ -492,6 +492,20 @@ test_expect_success 'gc stops traversal when a missing but promised object is re > ! grep "$TREE_HASH" out > ' > > +test_expect_success 'do not fetch when checking existence of tree we construct ourselves' ' > + rm -rf repo && > + test_create_repo repo && > + test_commit -C repo base && > + test_commit -C repo side1 && > + git -C repo checkout base && > + test_commit -C repo side2 && > + > + git -C repo config core.repositoryformatversion 1 && > + git -C repo config extensions.partialclone "arbitrary string" && > + > + git -C repo cherry-pick side1 > +' > + I appreciate this test! Thanks, -Stolee
> On 9/3/2019 3:42 PM, Jonathan Tan wrote: > > When cherry-picking (for example), new trees may be constructed. During > > this process, Git checks whether these trees exist. However, in a > > partial clone, this causes a lazy fetch to occur, which is both > > unnecessary (because Git has already constructed this tree as part of > > the cherry-picking process) and likely to fail (because the remote > > probably doesn't have this tree). > > If we have constructed the object already, then why do we not see it > and avoid fetching it? This must be a slightly strange timing issue > with objects being flushed to disk or added to the object cache. Thanks for taking a look at this! The answer is that I wasn't precise when I said "already constructed" - I meant that it is in a struct strbuf. It hasn't been written to disk yet, so has_object_file() does not see it. > One approach is to find all of these has_object_file() calls that should > really be one with OBJECT_INFO_SKIP_FETCH_OBJECT. Another would be to > find out why has_object_file() isn't seeing the object we constructed. By the former, do you mean that we should look at the other has_object_file() calls? I looked at the others in cache-tree.c and I think the one in this patch is the only one that is called on an OID generated from hash_object_file(). (And I answered the latter in the above paragraph.) To avoid confusion, maybe this commit message is better: When cherry-picking (for example), new trees may be constructed. During this process, Git constructs the new tree in a struct strbuf, computes the OID of the new tree, and checks if the new OID already exists on disk. However, in a partial clone, the disk check causes a lazy fetch to occur, which is both unnecessary (because we have the tree in the struct strbuf) and likely to fail (because the remote probably doesn't have this tree).
Jonathan Tan <jonathantanmy@google.com> writes: > When cherry-picking (for example), new trees may be constructed. During > this process, Git constructs the new tree in a struct strbuf, computes > the OID of the new tree, and checks if the new OID already exists on > disk. However, in a partial clone, the disk check causes a lazy fetch to > occur, which is both unnecessary (because we have the tree in the struct > strbuf) and likely to fail (because the remote probably doesn't have > this tree). FWIW, this logic dates back to aecf567c ("cache-tree: create/update cache-tree on checkout", 2014-07-05), when "checkout" learned to perform opportunistic revalidation of cache-tree data structure, without writing into the object store. If we were lazily checked out, and created a blob locally that happens to match the original we did not fetch in a directory this piece of code is hashing, the resulting hash _may_ name a tree that the other side has that we did not fetch, so taking the "to_invalidate = 1" side would make the resulting cache-tree less optimal, but because the design choice being made here is to take that hit in order to avoid network cost, as long as that is documented properly (iow, "probably doesn't have" is not an issue; even if they have it, you do not want to fetch and make the cache-tree entry valid), it is OK.
diff --git a/cache-tree.c b/cache-tree.c index c22161f987..9e596893bc 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -407,7 +407,7 @@ static int update_one(struct cache_tree *it, if (repair) { struct object_id oid; hash_object_file(buffer.buf, buffer.len, tree_type, &oid); - if (has_object_file(&oid)) + if (has_object_file_with_flags(&oid, OBJECT_INFO_SKIP_FETCH_OBJECT)) oidcpy(&it->oid, &oid); else to_invalidate = 1; diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 6415063980..3e434b6a81 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -492,6 +492,20 @@ test_expect_success 'gc stops traversal when a missing but promised object is re ! grep "$TREE_HASH" out ' +test_expect_success 'do not fetch when checking existence of tree we construct ourselves' ' + rm -rf repo && + test_create_repo repo && + test_commit -C repo base && + test_commit -C repo side1 && + git -C repo checkout base && + test_commit -C repo side2 && + + git -C repo config core.repositoryformatversion 1 && + git -C repo config extensions.partialclone "arbitrary string" && + + git -C repo cherry-pick side1 +' + . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd
When cherry-picking (for example), new trees may be constructed. During this process, Git checks whether these trees exist. However, in a partial clone, this causes a lazy fetch to occur, which is both unnecessary (because Git has already constructed this tree as part of the cherry-picking process) and likely to fail (because the remote probably doesn't have this tree). Do not lazy fetch in this situation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Another partial clone bug. This raises the issue that failed fetches are currently fatal - if they weren't fatal, this cherry-pick would have worked (except with some delay as the fetch is attempted, and with a warning message about the fetch failing). My personal inclination right now is to leave things as it is (fatal failed fetches), but I'm open to other opinions. --- cache-tree.c | 2 +- t/t0410-partial-clone.sh | 14 ++++++++++++++ 2 files changed, 15 insertions(+), 1 deletion(-)