Message ID | patch-05.10-719fcfbe13c-20220201T144803Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | object-file API: pass object enums, tidy up streaming interface | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Provide a new hash_object_file_oideq() for those callers of > check_object_signature() that don't care about its streaming > interface. I.e. at the start of that function we do: > > if (map) { > hash_object_file(r->hash_algo, map, size, type, real_oid); > return !oideq(oid, real_oid) ? -1 : 0; > } > > These callers always provide a "map" (or "buf[fer]"). Let's have them > call this simpler hash_object_file_oideq() function instead. > > None of them use a non-NULL "real_oid" argument, but let's provide it > like check_object_signature() did. This'll make it easy to have these > emit better error messages in the future as was done in > 96e41f58fe1 (fsck: report invalid object type-path combinations, > 2021-10-01), i.e. the die() calls here can emit not only the OID we > expected, but also what we got. This has a potential to moving us in the wrong direction when made to code paths that currently fully slurp an object in-core but may want to shrink the memory footprint by using streaming API more. Having said that, I think all of these want to also use the in-core version of hte object data (mostly to write them out), and they need more work to move them in the other direction, so I'd say it is OK to introduce the new helper to simplify these callsites.
On Tue, Feb 01 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Provide a new hash_object_file_oideq() for those callers of >> check_object_signature() that don't care about its streaming >> interface. I.e. at the start of that function we do: >> >> if (map) { >> hash_object_file(r->hash_algo, map, size, type, real_oid); >> return !oideq(oid, real_oid) ? -1 : 0; >> } >> >> These callers always provide a "map" (or "buf[fer]"). Let's have them >> call this simpler hash_object_file_oideq() function instead. >> >> None of them use a non-NULL "real_oid" argument, but let's provide it >> like check_object_signature() did. This'll make it easy to have these >> emit better error messages in the future as was done in >> 96e41f58fe1 (fsck: report invalid object type-path combinations, >> 2021-10-01), i.e. the die() calls here can emit not only the OID we >> expected, but also what we got. > > This has a potential to moving us in the wrong direction when made > to code paths that currently fully slurp an object in-core but may > want to shrink the memory footprint by using streaming API more. > > Having said that, I think all of these want to also use the in-core > version of hte object data (mostly to write them out), and they need > more work to move them in the other direction, so I'd say it is OK > to introduce the new helper to simplify these callsites. Yes, I'm avoiding any value-judgement on whether the callers that use these APIs should or shouldn't be using something else. But I think as the end-state shows splitting apart the two very different API users that do and don't provide the "map" parameter makes sense. We're effectively calling two different functions in all but name before this change.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 9f1c730e587..f084da198c7 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -299,8 +299,8 @@ static void export_blob(const struct object_id *oid) buf = read_object_file(oid, &type, &size); if (!buf) die("could not read blob %s", oid_to_hex(oid)); - if (check_object_signature(the_repository, oid, buf, size, - type_name(type), NULL) < 0) + if (!hash_object_file_oideq(the_repository->hash_algo, buf, + size, type, oid, NULL)) die("oid mismatch in blob %s", oid_to_hex(oid)); object = parse_object_buffer(the_repository, oid, type, size, buf, &eaten); diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 01574378ce2..5531a6d8bae 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1412,9 +1412,8 @@ static void fix_unresolved_deltas(struct hashfile *f) if (!data) continue; - if (check_object_signature(the_repository, &d->oid, - data, size, - type_name(type), NULL)) + if (!hash_object_file_oideq(the_repository->hash_algo, data, + size, type, &d->oid, NULL)) die(_("local object %s is corrupt"), oid_to_hex(&d->oid)); /* diff --git a/builtin/mktag.c b/builtin/mktag.c index 96a3686af53..d8c7cf836b7 100644 --- a/builtin/mktag.c +++ b/builtin/mktag.c @@ -61,9 +61,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type) type_name(*tagged_type), type_name(type)); repl = lookup_replace_object(the_repository, tagged_oid); - ret = check_object_signature(the_repository, repl, - buffer, size, type_name(*tagged_type), - NULL); + ret = !hash_object_file_oideq(the_repository->hash_algo, buffer, size, + *tagged_type, repl, NULL); free(buffer); return ret; diff --git a/object-file.c b/object-file.c index 59eb793e0ac..27d10112960 100644 --- a/object-file.c +++ b/object-file.c @@ -1862,6 +1862,19 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf, write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen); } +int hash_object_file_oideq(const struct git_hash_algo *algo, const void *buf, + unsigned long len, enum object_type type, + const struct object_id *oid, + struct object_id *real_oidp) +{ + struct object_id tmp; + struct object_id *real_oid = real_oidp ? real_oidp : &tmp; + + hash_object_file(algo, buf, len, type_name(type), real_oid); + + return oideq(oid, real_oid); +} + /* Finalize a file on disk, and close it. */ static void close_loose_object(int fd) { diff --git a/object-store.h b/object-store.h index eab1e2a967e..95907062682 100644 --- a/object-store.h +++ b/object-store.h @@ -249,6 +249,20 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf, unsigned long len, const char *type, struct object_id *oid); +/** + * hash_object_file_oideq() is like hash_object_file() except that + * asserts that "real_oid" is equivalent to an input "oid", and the + * return value is that of oideq(oid, real_oid). + * + * The "real_oid" can be NULL, when non-NULL the caller provides a + * "struct object_id *" that can be used to print what the real OID + * was. + */ +int hash_object_file_oideq(const struct git_hash_algo *algo, const void *buf, + unsigned long len, enum object_type type, + const struct object_id *oid, + struct object_id *real_oidp); + int write_object_file_flags(const void *buf, unsigned long len, enum object_type type, struct object_id *oid, unsigned flags); diff --git a/object.c b/object.c index c37501fc120..d7673332582 100644 --- a/object.c +++ b/object.c @@ -289,8 +289,8 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) buffer = repo_read_object_file(r, oid, &type, &size); if (buffer) { - if (check_object_signature(r, repl, buffer, size, - type_name(type), NULL) < 0) { + if (!hash_object_file_oideq(r->hash_algo, buffer, size, + type, repl, NULL)) { free(buffer); error(_("hash mismatch %s"), oid_to_hex(repl)); return NULL;
Provide a new hash_object_file_oideq() for those callers of check_object_signature() that don't care about its streaming interface. I.e. at the start of that function we do: if (map) { hash_object_file(r->hash_algo, map, size, type, real_oid); return !oideq(oid, real_oid) ? -1 : 0; } These callers always provide a "map" (or "buf[fer]"). Let's have them call this simpler hash_object_file_oideq() function instead. None of them use a non-NULL "real_oid" argument, but let's provide it like check_object_signature() did. This'll make it easy to have these emit better error messages in the future as was done in 96e41f58fe1 (fsck: report invalid object type-path combinations, 2021-10-01), i.e. the die() calls here can emit not only the OID we expected, but also what we got. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/fast-export.c | 4 ++-- builtin/index-pack.c | 5 ++--- builtin/mktag.c | 5 ++--- object-file.c | 13 +++++++++++++ object-store.h | 14 ++++++++++++++ object.c | 4 ++-- 6 files changed, 35 insertions(+), 10 deletions(-)