[v2,1/2] sha1-file: support OBJECT_INFO_FOR_PREFETCH

Message ID 068861632b85179d2a5a5ceb966e951a78b27141.1553895166.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Series Batch fetching of missing blobs in diff and show

Commit Message

Jonathan Tan March 29, 2019, 9:39 p.m. UTC
Teach oid_object_info_extended() to support a new flag that inhibits
fetching of missing objects. This is equivalent to setting
fetch_if_missing to 0, calling oid_object_info_extended(), then setting
fetch_if_missing to whatever it was before. Update unpack-trees.c to use
this new flag instead of repeatedly setting fetch_if_missing.

This new flag complicates things slightly in that there are now 2 ways
to do the same thing. But this eliminates the need to repeatedly set a
global variable, and more importantly, allows prefetching to be done in
parallel (in the future); hence, this patch.
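
To make the claimed equivalence concrete, here is a minimal sketch in plain C. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in invented for this illustration and is not git's real code; only the name fetch_if_missing is taken from the description above. It models a lookup that may trigger an on-demand fetch, then probes a missing object in the old style (save, clear, and restore the global) and in the new style (per-call flag).

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not git's real definitions. */
#define TOY_INFO_NO_FETCH 1u

static int fetch_if_missing = 1; /* models the global of the same name */
static int fetches;              /* counts simulated on-demand fetches */

/* Returns 0 if the object is present, -1 if it is missing. */
static int toy_object_info(int present, unsigned flags)
{
	if (present)
		return 0;
	if (fetch_if_missing && !(flags & TOY_INFO_NO_FETCH))
		fetches++; /* real code would fetch the object and retry */
	return -1;
}

/* Old style: save the global, clear it, query, then restore it. */
static int probe_with_global(int present)
{
	int saved = fetch_if_missing;
	int ret;

	fetch_if_missing = 0;
	ret = toy_object_info(present, 0);
	fetch_if_missing = saved;
	return ret;
}

/* New style: the caller states its intent through a per-call flag. */
static int probe_with_flag(int present)
{
	return toy_object_info(present, TOY_INFO_NO_FETCH);
}

/* Both probes report a missing object without triggering a fetch. */
void demo_probe(void)
{
	assert(probe_with_global(0) == -1);
	assert(probe_with_flag(0) == -1);
	assert(fetches == 0);
	assert(fetch_if_missing == 1); /* the global is left as it was */
}
```

In this model the flag variant never writes to shared state, which is the property that would matter if probes were ever issued from more than one thread.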

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-store.h |  6 ++++++
 sha1-file.c    |  3 ++-
 unpack-trees.c | 17 +++++++++--------
 3 files changed, 17 insertions(+), 9 deletions(-)

Comments

Johannes Schindelin April 5, 2019, 2:13 p.m. UTC | #1
Hi Jonathan,

On Fri, 29 Mar 2019, Jonathan Tan wrote:

> Teach oid_object_info_extended() to support a new flag that inhibits
> fetching of missing objects. This is equivalent to setting
> fetch_if_missing to 0, calling oid_object_info_extended(), then setting
> fetch_if_missing to whatever it was before. Update unpack-trees.c to use
> this new flag instead of repeatedly setting fetch_if_missing.
>
> This new flag complicates things slightly in that there are now 2 ways
> to do the same thing.

Just a note that I disagree with the latter part of the sentence: those
are not 2 ways of doing the same thing, but they are two switches that
essentially both have to be flipped to "on". They're just multiple gates.

I do not ask you to rephrase it, merely registering a different opinion.
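
One way to picture the "multiple gates" reading is the sketch below. It is a toy model, not git's code: the struct and would_fetch() are hypothetical names, and the fields simply mirror the conditions tested in the sha1-file.c hunk of this patch. A fetch happens only when every gate is open, so closing either the global or the per-call flag is enough to inhibit it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy snapshot of the conditions tested in the patched sha1-file.c hunk. */
struct gates {
	bool fetch_if_missing;  /* global switch */
	bool partial_clone;     /* repository_format_partial_clone is set */
	bool already_retried;   /* a fetch was already attempted */
	bool is_the_repository; /* r == the_repository */
	bool for_prefetch;      /* caller passed the new flag */
};

/* Hypothetical helper: true only if every gate is open. */
static bool would_fetch(const struct gates *g)
{
	return g->fetch_if_missing && g->partial_clone &&
	       !g->already_retried && g->is_the_repository &&
	       !g->for_prefetch;
}

void demo_gates(void)
{
	struct gates g = { true, true, false, true, false };

	assert(would_fetch(&g));    /* every gate open */

	g.for_prefetch = true;      /* per-call flag closes its gate */
	assert(!would_fetch(&g));

	g.for_prefetch = false;
	g.fetch_if_missing = false; /* global closes its gate */
	assert(!would_fetch(&g));
}
```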

The patch looks good, I especially like the post-image of
`check_updates()`, which looks much nicer (from my perspective, of
course).

Thanks,
Dscho
Jeff King April 5, 2019, 10 p.m. UTC | #2
On Fri, Mar 29, 2019 at 02:39:27PM -0700, Jonathan Tan wrote:

> Teach oid_object_info_extended() to support a new flag that inhibits
> fetching of missing objects. This is equivalent to setting
> fetch_if_missing to 0, calling oid_object_info_extended(), then setting
> fetch_if_missing to whatever it was before. Update unpack-trees.c to use
> this new flag instead of repeatedly setting fetch_if_missing.
> 
> This new flag complicates things slightly in that there are now 2 ways
> to do the same thing. But this eliminates the need to repeatedly set a
> global variable, and more importantly, allows prefetching to be done in
> parallel (in the future); hence, this patch.

Sorry I'm a little late to review this. I don't have any critical
comments, so if this gets ignored, I'll live with it.

> +/*
> + * Do not attempt to fetch the object if missing (even if fetch_if_missing is
> + * nonzero). This is meant for bulk prefetching of missing blobs in a partial
> + * clone. Implies OBJECT_INFO_QUICK.
> + */
> +#define OBJECT_INFO_FOR_PREFETCH (32 + OBJECT_INFO_QUICK)

Mostly I found the name and semantics of this flag to be a little
confusing. Really what we want is to tell oid_object_info() not to do any
on-demand fetching for us. That seems like a thing that we might
eventually want for other purposes (e.g., a diff operation that could
produce a real blob diff but would be happy outputting a less-detailed
tree diff).

If it were just OBJECT_INFO_NO_FETCH or similar, that tells more clearly
what it does, and would make sense in more contexts.

I suspect that QUICK would be the norm when used with it, though I
probably would have kept the two orthogonal for the sake of simplicity
and clarity.
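
A small sketch of why orthogonal bits may be easier to reason about. The two OBJECT_INFO_* values are copied from the patch; TOY_OBJECT_INFO_NO_FETCH and both helper functions are hypothetical names used only for this illustration. Because the composite value is 32 + 8 = 40, a test of the form `flags & OBJECT_INFO_FOR_PREFETCH` is nonzero whenever either bit is set, so it also matches a caller that passed only OBJECT_INFO_QUICK. A dedicated bit does not have that overlap.

```c
#include <assert.h>

#define OBJECT_INFO_QUICK 8
#define OBJECT_INFO_FOR_PREFETCH (32 + OBJECT_INFO_QUICK) /* as in the patch */

/* Hypothetical alternative: a dedicated bit, independent of QUICK. */
#define TOY_OBJECT_INFO_NO_FETCH 32

/* Same shape as the test in the patch: true if any bit of the mask is set. */
static int inhibits_fetch_any_bit(unsigned flags)
{
	return (flags & OBJECT_INFO_FOR_PREFETCH) != 0;
}

/* Test against the dedicated bit only. */
static int inhibits_fetch_own_bit(unsigned flags)
{
	return (flags & TOY_OBJECT_INFO_NO_FETCH) != 0;
}

void demo_flags(void)
{
	assert(OBJECT_INFO_FOR_PREFETCH == 40);
	assert(inhibits_fetch_any_bit(OBJECT_INFO_FOR_PREFETCH));
	assert(inhibits_fetch_any_bit(OBJECT_INFO_QUICK)); /* QUICK alone matches too */
	assert(inhibits_fetch_own_bit(OBJECT_INFO_FOR_PREFETCH)); /* 40 & 32 */
	assert(!inhibits_fetch_own_bit(OBJECT_INFO_QUICK));
}
```

With a separate bit, a caller that wants both behaviors can still OR the two flags together, and each test checks exactly one property.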

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 22c41a3ba8..381b0cd65e 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -404,20 +404,21 @@ static int check_updates(struct unpack_trees_options *o)
>  		 * below.
>  		 */
>  		struct oid_array to_fetch = OID_ARRAY_INIT;
> -		int fetch_if_missing_store = fetch_if_missing;
> -		fetch_if_missing = 0;
>  		for (i = 0; i < index->cache_nr; i++) {
>  			struct cache_entry *ce = index->cache[i];
> -			if ((ce->ce_flags & CE_UPDATE) &&
> -			    !S_ISGITLINK(ce->ce_mode)) {
> -				if (!has_object_file(&ce->oid))
> -					oid_array_append(&to_fetch, &ce->oid);
> -			}
> +
> +			if (!(ce->ce_flags & CE_UPDATE) ||
> +			    S_ISGITLINK(ce->ce_mode))
> +				continue;
> +			if (!oid_object_info_extended(the_repository, &ce->oid,
> +						      NULL,
> +						      OBJECT_INFO_FOR_PREFETCH))
> +				continue;
> +			oid_array_append(&to_fetch, &ce->oid);

Here we get rid of the global set/restore dance, which is nice. But
there's also a behavior change, as we've picked up QUICK. I think that's
probably the right thing to do, but I was a bit surprised not to see any
discussion in the commit message.

-Peff

Patch

diff --git a/object-store.h b/object-store.h
index 14fc935bd1..dd3f9b75f0 100644
--- a/object-store.h
+++ b/object-store.h
@@ -280,6 +280,12 @@  struct object_info {
 #define OBJECT_INFO_QUICK 8
 /* Do not check loose object */
 #define OBJECT_INFO_IGNORE_LOOSE 16
+/*
+ * Do not attempt to fetch the object if missing (even if fetch_if_missing is
+ * nonzero). This is meant for bulk prefetching of missing blobs in a partial
+ * clone. Implies OBJECT_INFO_QUICK.
+ */
+#define OBJECT_INFO_FOR_PREFETCH (32 + OBJECT_INFO_QUICK)
 
 int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
diff --git a/sha1-file.c b/sha1-file.c
index 494606f771..ad02649124 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1370,7 +1370,8 @@  int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 
 		/* Check if it is a missing object */
 		if (fetch_if_missing && repository_format_partial_clone &&
-		    !already_retried && r == the_repository) {
+		    !already_retried && r == the_repository &&
+		    !(flags & OBJECT_INFO_FOR_PREFETCH)) {
 			/*
 			 * TODO Investigate having fetch_object() return
 			 * TODO error/success and stopping the music here.
diff --git a/unpack-trees.c b/unpack-trees.c
index 22c41a3ba8..381b0cd65e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -404,20 +404,21 @@  static int check_updates(struct unpack_trees_options *o)
 		 * below.
 		 */
 		struct oid_array to_fetch = OID_ARRAY_INIT;
-		int fetch_if_missing_store = fetch_if_missing;
-		fetch_if_missing = 0;
 		for (i = 0; i < index->cache_nr; i++) {
 			struct cache_entry *ce = index->cache[i];
-			if ((ce->ce_flags & CE_UPDATE) &&
-			    !S_ISGITLINK(ce->ce_mode)) {
-				if (!has_object_file(&ce->oid))
-					oid_array_append(&to_fetch, &ce->oid);
-			}
+
+			if (!(ce->ce_flags & CE_UPDATE) ||
+			    S_ISGITLINK(ce->ce_mode))
+				continue;
+			if (!oid_object_info_extended(the_repository, &ce->oid,
+						      NULL,
+						      OBJECT_INFO_FOR_PREFETCH))
+				continue;
+			oid_array_append(&to_fetch, &ce->oid);
 		}
 		if (to_fetch.nr)
 			fetch_objects(repository_format_partial_clone,
 				      to_fetch.oid, to_fetch.nr);
-		fetch_if_missing = fetch_if_missing_store;
 		oid_array_clear(&to_fetch);
 	}
 	for (i = 0; i < index->cache_nr; i++) {