diff mbox series

mktree: do not check type of remote objects

Message ID 748f39a9-65aa-2110-cf92-7ddf81b5f507@roku.com (mailing list archive)
State New, archived
Headers show
Series mktree: do not check type of remote objects | expand

Commit Message

Richard Oliver June 21, 2022, 1:59 p.m. UTC
On 16/06/2022 18:44, Junio C Hamano wrote:
> This patch would be a good first cut as a starting point, but we
> probably can do better by doing oid_object_info_extended() call with
> OBJECT_INFO_SKIP_FETCH_OBJECT bit (and probably QUICK bit, too) set,
> with the current code structure.
> 
> And when we do so, the title would not match the purpose of the
> change.  The verification was disabled with "--missing" all along
> and that is not what we are changing.  What we will be fixing is the
> wasteful implementation.
> 
>     mktree: do not check types of remote objects
> 
>     With 31c8221a (mktree: validate entry type in input, 2009-05-14), we
>     called the sha1_object_info() API to obtain the type information,
>     but allowed the call to silently fail when the object was missing
>     locally, so that we can sanity-check the types opportunistically
>     when the object did exist.
> 
>     The implementation is understandable because back then there was no
>     lazy/on-demand downloading of individual objects from the promisor
>     remotes that causes a long delay and materializes the object, hence
>     defeating the point of using "--missing".  The design is hurting us
>     now.
> 
>     We could bypass the opportunistic type/mode consistency check
>     altogether when "--missing" is given, but instead, use the
>     oid_object_info_extended() API and tell it that we are only
>     interested in objects that locally exist and are immediately
>     available by passing OBJECT_INFO_SKIP_FETCH_OBJECT bit to it.  That
>     way, we will still retain the cheap and opportunistic sanity check
>     for local objects.

I've prepared a patch below as per your suggestion.

As a side note, do you think we need to re-work some uses of the word
'missing' in the documentation? Some uses of the word, such as in
mktree, predate the concept of promisor remotes. The partial-clone.txt
documentation differentiates between missing "due to a partial clone
or fetch" and missing "due to repository corruption".  Would making
such a distinction elsewhere be useful?

Cheers,
Richard

-- >8 --
Subject: [PATCH] mktree: do not check type of remote objects

With 31c8221a (mktree: validate entry type in input, 2009-05-14), we
called the sha1_object_info() API to obtain the type information, but
allowed the call to silently fail when the object was missing locally,
so that we can sanity-check the types opportunistically when the
object did exist.

The implementation is understandable because back then there was no
lazy/on-demand downloading of individual objects from the promisor
remotes that causes a long delay and materializes the object, hence
defeating the point of using "--missing".  The design is hurting us
now.

We could bypass the opportunistic type/mode consistency check
altogether when "--missing" is given, but instead, use the
oid_object_info_extended() API and tell it that we are only interested
in objects that locally exist and are immediately available by passing
OBJECT_INFO_SKIP_FETCH_OBJECT bit to it.  That way, we will still
retain the cheap and opportunistic sanity check for local objects.

Signed-off-by: Richard Oliver <roliver@roku.com>
---
 builtin/mktree.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 21, 2022, 4:51 p.m. UTC | #1
Richard Oliver <roliver@roku.com> writes:

> As a side note, do you think we need to re-work some uses of the word
> 'missing' in the documentation? Some uses of the word, such as in
> mktree, predate the concept of promisor remotes. The partial-clone.txt
> documentation differentiates between missing "due to a partial clone
> or fetch" and missing "due to repository corruption".  Would making
> such a distinction elsewhere be useful?

I do agree with that direction.  Back when the world was simpler,
there were just "we have the object" and "we ought to have the
object but we do not see it", the latter of which was a clear
repository corruption.  There was no third choice.

With the lazy/partial clone stuff, it is not a corrupted repository
anymore when certain objects that are reachable during traversal is
missing (this is slightly different from a shallow clone in that a
shallow history makes the traversal stop early, so at least objects
in a shallow clone that are reachable during traversal must exist).

So, yes, I think the distinction between "missing but shouldn't be"
and "missing by design" is a good thing to keep in mind.

Thanks.
Junio C Hamano June 21, 2022, 5:48 p.m. UTC | #2
Richard Oliver <roliver@roku.com> writes:

> +	/* Check the type of object identified by oid without fetching objects */
> +	struct object_info oi = OBJECT_INFO_INIT;

This is -Wdecl-after-statement.

> +	oi.typep = &obj_type;
> +	if (oid_object_info_extended(the_repository, &oid, &oi,
> +				     OBJECT_INFO_LOOKUP_REPLACE |
> +				     OBJECT_INFO_QUICK |
> +				     OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
> +		obj_type = -1;
> +
>  	if (obj_type < 0) {
>  		if (allow_missing) {
>  			; /* no problem - missing objects are presumed to be of the right type */


I've done an obvious and trivial fix-up locally while queueing.

Thanks.


diff --git a/builtin/mktree.c b/builtin/mktree.c
index cfadb52670..06d81400f5 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -74,6 +74,7 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	unsigned mode;
 	enum object_type mode_type; /* object type derived from mode */
 	enum object_type obj_type; /* object type derived from sha */
+	struct object_info oi = OBJECT_INFO_INIT;
 	char *path, *to_free = NULL;
 	struct object_id oid;
 
@@ -117,7 +118,6 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	}
 
 	/* Check the type of object identified by oid without fetching objects */
-	struct object_info oi = OBJECT_INFO_INIT;
 	oi.typep = &obj_type;
 	if (oid_object_info_extended(the_repository, &oid, &oi,
 				     OBJECT_INFO_LOOKUP_REPLACE |
diff mbox series

Patch

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 902edba6d2..cfadb52670 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -116,8 +116,15 @@  static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 			path, ptr, type_name(mode_type));
 	}
 
-	/* Check the type of object identified by sha1 */
-	obj_type = oid_object_info(the_repository, &oid, NULL);
+	/* Check the type of object identified by oid without fetching objects */
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = &obj_type;
+	if (oid_object_info_extended(the_repository, &oid, &oi,
+				     OBJECT_INFO_LOOKUP_REPLACE |
+				     OBJECT_INFO_QUICK |
+				     OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
+		obj_type = -1;
+
 	if (obj_type < 0) {
 		if (allow_missing) {
 			; /* no problem - missing objects are presumed to be of the right type */