Message ID | 20200721225020.1352772-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | sha1-file: make pretend_object_file() not prefetch | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > To make progress towards solving this, I'm thinking of making a new > function has_object() that takes a repository, an object ID, and flags, > and only supports one flag: HAS_OBJECT_RECHECK_PACKED (which has the > opposite meaning of OBJECT_INFO_QUICK). Checks that should not fetch > should use has_object(), and checks that should fetch (if they exist - I > think that most would want additional information about the object, so > they would use oid_object_info() or similar already) should use > oid_object_info() or a similar function. That way we can see how many > uses of has_object_file() we have checked, and at the same time make > behavior we usually want into the default behavior. Any opinions? Yes, a new function (or a pair of functions) makes it easy to see how much progress we made and also gives us easier greppability.
Jonathan Tan <jonathantanmy@google.com> writes: > When pretend_object_file() is invoked with an object that does not exist > (as is the typical case), there is no need to fetch anything from the > promisor remote, because the caller already knows what the object is > supposed to contain. Therefore, suppress the fetch. (The > OBJECT_INFO_QUICK flag is added for the same reason.) Yes, "pretend" is also a way to lie about the contents IIRC, so even if the object is available elsewhere, we should *not* fetch from the promisor. Makes sense to me.
On Tue, Jul 21, 2020 at 04:27:15PM -0700, Junio C Hamano wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > > > When pretend_object_file() is invoked with an object that does not exist > > (as is the typical case), there is no need to fetch anything from the > > promisor remote, because the caller already knows what the object is > > supposed to contain. Therefore, suppress the fetch. (The > > OBJECT_INFO_QUICK flag is added for the same reason.) > > Yes, "pretend" is also a way to lie about the contents IIRC, so even > if the object is available elsewhere, we should *not* fetch from the > promisor. Makes sense to me. I agree this patch is fine, but I wonder if it could go even further. If we are pretending some particular contents, shouldn't we override anything that might be in the object database? I.e., could we eliminate this has_object_file() entirely? That should be OK for the same reason that it's OK to use QUICK. There's only one caller of this function (git-blame), which I think would be happy with such a change. -Peff
Jeff King <peff@peff.net> writes: > I agree this patch is fine, but I wonder if it could go even further. If > we are pretending some particular contents, shouldn't we override > anything that might be in the object database? I.e., could we eliminate > this has_object_file() entirely? > > That should be OK for the same reason that it's OK to use QUICK. > > There's only one caller of this function (git-blame), which I think > would be happy with such a change. I actually have to take that "we could even lie to say that content that does not hash to X is object X" back---that was never the intention of this mechanism. It was to ensure that operations that are supposedly read-only can avoid writing into the repository---"blame" uses it to pretend as if the working tree file already has a corresponding object in the object store, IIRC. The only reason why hash_object_file() is used here is to allow us discarding the memory held for that working tree copy if it happens to match what is stored in the object database. The saving would be within a few hundred kilobytes to a single digit megabyte range at most, hopefully, so we could drop it (oh, saying "a single digit megabyte" reminds me that my first Linux computer was 486dx with 4MB ram---on that box, it may have mattered).
diff --git a/sha1-file.c b/sha1-file.c index ccd34dd9e8..45fdb8415c 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1600,7 +1600,8 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type, struct cached_object *co; hash_object_file(the_hash_algo, buf, len, type_name(type), oid); - if (has_object_file(oid) || find_cached_object(oid)) + if (has_object_file_with_flags(oid, OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT) || + find_cached_object(oid)) return 0; ALLOC_GROW(cached_objects, cached_object_nr + 1, cached_object_alloc); co = &cached_objects[cached_object_nr++]; diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh index eea048e52c..2ed6aaae35 100755 --- a/t/t8002-blame.sh +++ b/t/t8002-blame.sh @@ -122,4 +122,15 @@ test_expect_success '--exclude-promisor-objects does not BUG-crash' ' test_must_fail git blame --exclude-promisor-objects one ' +test_expect_success 'blame with uncommitted edits in partial clone does not crash' ' + git init server && + echo foo >server/file.txt && + git -C server add file.txt && + git -C server commit -m file && + + git clone --filter=blob:none "file://$(pwd)/server" client && + echo bar >>client/file.txt && + git -C client blame file.txt +' + test_done