diff mbox series

[05/10] object-file API: provide a hash_object_file_oideq()

Message ID patch-05.10-719fcfbe13c-20220201T144803Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series object-file API: pass object enums, tidy up streaming interface | expand

Commit Message

Ævar Arnfjörð Bjarmason Feb. 1, 2022, 2:53 p.m. UTC
Provide a new hash_object_file_oideq() for those callers of
check_object_signature() that don't care about its streaming
interface. I.e. at the start of that function we do:

	if (map) {
		hash_object_file(r->hash_algo, map, size, type, real_oid);
		return !oideq(oid, real_oid) ? -1 : 0;
	}

These callers always provide a "map" (or "buf[fer]"). Let's have them
call this simpler hash_object_file_oideq() function instead.

None of them use a non-NULL "real_oid" argument, but let's provide it
like check_object_signature() did. This'll make it easy to have these
emit better error messages in the future as was done in
96e41f58fe1 (fsck: report invalid object type-path combinations,
2021-10-01), i.e. the die() calls here can emit not only the OID we
expected, but also what we got.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  4 ++--
 builtin/index-pack.c  |  5 ++---
 builtin/mktag.c       |  5 ++---
 object-file.c         | 13 +++++++++++++
 object-store.h        | 14 ++++++++++++++
 object.c              |  4 ++--
 6 files changed, 35 insertions(+), 10 deletions(-)

Comments

Junio C Hamano Feb. 1, 2022, 7:08 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Provide a new hash_object_file_oideq() for those callers of
> check_object_signature() that don't care about its streaming
> interface. I.e. at the start of that function we do:
>
> 	if (map) {
> 		hash_object_file(r->hash_algo, map, size, type, real_oid);
> 		return !oideq(oid, real_oid) ? -1 : 0;
> 	}
>
> These callers always provide a "map" (or "buf[fer]"). Let's have them
> call this simpler hash_object_file_oideq() function instead.
>
> None of them use a non-NULL "real_oid" argument, but let's provide it
> like check_object_signature() did. This'll make it easy to have these
> emit better error messages in the future as was done in
> 96e41f58fe1 (fsck: report invalid object type-path combinations,
> 2021-10-01), i.e. the die() calls here can emit not only the OID we
> expected, but also what we got.

This has a potential to moving us in the wrong direction when made
to code paths that currently fully slurp an object in-core but may
want to shrink the memory footprint by using streaming API more.

Having said that, I think all of these want to also use the in-core
version of hte object data (mostly to write them out), and they need
more work to move them in the other direction, so I'd say it is OK
to introduce the new helper to simplify these callsites.
Ævar Arnfjörð Bjarmason Feb. 1, 2022, 8:56 p.m. UTC | #2
On Tue, Feb 01 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Provide a new hash_object_file_oideq() for those callers of
>> check_object_signature() that don't care about its streaming
>> interface. I.e. at the start of that function we do:
>>
>> 	if (map) {
>> 		hash_object_file(r->hash_algo, map, size, type, real_oid);
>> 		return !oideq(oid, real_oid) ? -1 : 0;
>> 	}
>>
>> These callers always provide a "map" (or "buf[fer]"). Let's have them
>> call this simpler hash_object_file_oideq() function instead.
>>
>> None of them use a non-NULL "real_oid" argument, but let's provide it
>> like check_object_signature() did. This'll make it easy to have these
>> emit better error messages in the future as was done in
>> 96e41f58fe1 (fsck: report invalid object type-path combinations,
>> 2021-10-01), i.e. the die() calls here can emit not only the OID we
>> expected, but also what we got.
>
> This has a potential to moving us in the wrong direction when made
> to code paths that currently fully slurp an object in-core but may
> want to shrink the memory footprint by using streaming API more.
>
> Having said that, I think all of these want to also use the in-core
> version of hte object data (mostly to write them out), and they need
> more work to move them in the other direction, so I'd say it is OK
> to introduce the new helper to simplify these callsites.

Yes, I'm avoiding any value-judgement on whether the callers that use
these APIs should or shouldn't be using something else.

But I think as the end-state shows splitting apart the two very
different API users that do and don't provide the "map" parameter makes
sense. We're effectively calling two different functions in all but name
before this change.
diff mbox series

Patch

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9f1c730e587..f084da198c7 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -299,8 +299,8 @@  static void export_blob(const struct object_id *oid)
 		buf = read_object_file(oid, &type, &size);
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
-		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type), NULL) < 0)
+		if (!hash_object_file_oideq(the_repository->hash_algo, buf,
+					    size, type, oid, NULL))
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 01574378ce2..5531a6d8bae 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1412,9 +1412,8 @@  static void fix_unresolved_deltas(struct hashfile *f)
 		if (!data)
 			continue;
 
-		if (check_object_signature(the_repository, &d->oid,
-					   data, size,
-					   type_name(type), NULL))
+		if (!hash_object_file_oideq(the_repository->hash_algo, data,
+					    size, type, &d->oid, NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 96a3686af53..d8c7cf836b7 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -61,9 +61,8 @@  static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 		    type_name(*tagged_type), type_name(type));
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
-	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type),
-				     NULL);
+	ret = !hash_object_file_oideq(the_repository->hash_algo, buffer, size,
+				      *tagged_type, repl, NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index 59eb793e0ac..27d10112960 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1862,6 +1862,19 @@  void hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 }
 
+int hash_object_file_oideq(const struct git_hash_algo *algo, const void *buf,
+			   unsigned long len, enum object_type type,
+			   const struct object_id *oid,
+			   struct object_id *real_oidp)
+{
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
+
+	hash_object_file(algo, buf, len, type_name(type), real_oid);
+
+	return oideq(oid, real_oid);
+}
+
 /* Finalize a file on disk, and close it. */
 static void close_loose_object(int fd)
 {
diff --git a/object-store.h b/object-store.h
index eab1e2a967e..95907062682 100644
--- a/object-store.h
+++ b/object-store.h
@@ -249,6 +249,20 @@  void hash_object_file(const struct git_hash_algo *algo, const void *buf,
 		      unsigned long len, const char *type,
 		      struct object_id *oid);
 
+/**
+ * hash_object_file_oideq() is like hash_object_file() except that
+ * asserts that "real_oid" is equivalent to an input "oid", and the
+ * return value is that of oideq(oid, real_oid).
+ *
+ * The "real_oid" can be NULL, when non-NULL the caller provides a
+ * "struct object_id *" that can be used to print what the real OID
+ * was.
+ */
+int hash_object_file_oideq(const struct git_hash_algo *algo, const void *buf,
+			   unsigned long len, enum object_type type,
+			   const struct object_id *oid,
+			   struct object_id *real_oidp);
+
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
 			    unsigned flags);
diff --git a/object.c b/object.c
index c37501fc120..d7673332582 100644
--- a/object.c
+++ b/object.c
@@ -289,8 +289,8 @@  struct object *parse_object(struct repository *r, const struct object_id *oid)
 
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
-		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type), NULL) < 0) {
+		if (!hash_object_file_oideq(r->hash_algo, buffer, size,
+					    type, repl, NULL)) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;