diff mbox series

diff: restrict when prefetching occurs

Message ID 20200331020418.55640-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series diff: restrict when prefetching occurs | expand

Commit Message

Jonathan Tan March 31, 2020, 2:04 a.m. UTC
Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08)
optimized "diff" by prefetching blobs in a partial clone, but there are
some cases wherein blobs do not need to be prefetched. In particular, if
(1) no blob data is included in the output of the diff, (2)
break-rewrite detection is not requested, and (3) no inexact rename
detection is needed, then no blobs are read at all.

Therefore, in such a case, do not prefetch. Change diffcore_std() to
only prefetch if (1) and/or (2) is not true (currently, it always
prefetches); change diffcore_rename() to prefetch if (3) is not true and
no prefetch has yet occurred.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
I decided to revisit [1] because at $DAYJOB there was a new case that
required this. I used Peff's code from [2] for the rebase part, and also
automatically prefetch if blob data will be included in the output of
the diff and/or break-rewrite detection is requested.

[1] https://lore.kernel.org/git/20200128213508.31661-1-jonathantanmy@google.com/
[2] https://lore.kernel.org/git/20200130055136.GA2184413@coredump.intra.peff.net/
---
 diff.c                        | 26 +++++++++++++++----
 diffcore-rename.c             | 40 +++++++++++++++++++++++++++-
 diffcore.h                    |  2 +-
 t/t4067-diff-partial-clone.sh | 49 +++++++++++++++++++++++++++++++++++
 4 files changed, 110 insertions(+), 7 deletions(-)

Comments

Derrick Stolee March 31, 2020, 12:14 p.m. UTC | #1
On 3/30/2020 10:04 PM, Jonathan Tan wrote:
> Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08)
> optimized "diff" by prefetching blobs in a partial clone, but there are
> some cases wherein blobs do not need to be prefetched. In particular, if
> (1) no blob data is included in the output of the diff, (2)
> break-rewrite detection is not requested, and (3) no inexact rename
> detection is needed, then no blobs are read at all.
> 
> Therefore, in such a case, do not prefetch. Change diffcore_std() to
> only prefetch if (1) and/or (2) is not true (currently, it always
> prefetches); change diffcore_rename() to prefetch if (3) is not true and
> no prefetch has yet occurred.

This conflicts with [3], so please keep that in mind.

Maybe [3] should be adjusted to assume this patch, because that change
is mostly about disabling the batch download when no renames are required.
As Peff said [2] the full rename detection trigger is "overly broad".

However, the changed-path Bloom filters are an excellent test for this
patch, as computing them in a partial clone will trigger downloading all
blobs without [3].

[3] https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/T/#u

> [1] https://lore.kernel.org/git/20200128213508.31661-1-jonathantanmy@google.com/
> [2] https://lore.kernel.org/git/20200130055136.GA2184413@coredump.intra.peff.net/
> ---
>  diff.c                        | 26 +++++++++++++++----
>  diffcore-rename.c             | 40 +++++++++++++++++++++++++++-
>  diffcore.h                    |  2 +-
>  t/t4067-diff-partial-clone.sh | 49 +++++++++++++++++++++++++++++++++++
>  4 files changed, 110 insertions(+), 7 deletions(-)
> 
> diff --git a/diff.c b/diff.c
> index 1010d806f5..19c5d638d6 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6507,10 +6507,24 @@ static void add_if_missing(struct repository *r,
>  
>  void diffcore_std(struct diff_options *options)
>  {
> -	if (options->repo == the_repository && has_promisor_remote()) {
> -		/*
> -		 * Prefetch the diff pairs that are about to be flushed.
> -		 */
> +	int prefetched = 0;
> +	int output_formats_to_prefetch = DIFF_FORMAT_DIFFSTAT |
> +		DIFF_FORMAT_NUMSTAT |
> +		DIFF_FORMAT_PATCH |
> +		DIFF_FORMAT_SHORTSTAT |
> +		DIFF_FORMAT_DIRSTAT;
> +
> +	/*
> +	 * Check if the user requested a blob-data-requiring diff output and/or
> +	 * break-rewrite detection (which requires blob data). If yes, prefetch
> +	 * the diff pairs.
> +	 *
> +	 * If no prefetching occurs, diffcore_rename() will prefetch if it
> +	 * decides that it needs inexact rename detection.
> +	 */
> +	if (options->repo == the_repository && has_promisor_remote() &&
> +	    (options->output_format & output_formats_to_prefetch ||
> +	     (!options->found_follow && options->break_opt != -1))) {
>  		int i;
>  		struct diff_queue_struct *q = &diff_queued_diff;
>  		struct oid_array to_fetch = OID_ARRAY_INIT;
> @@ -6520,6 +6534,8 @@ void diffcore_std(struct diff_options *options)
>  			add_if_missing(options->repo, &to_fetch, p->one);
>  			add_if_missing(options->repo, &to_fetch, p->two);
>  		}
> +		prefetched = 1;
> +
>  		if (to_fetch.nr)

It was difficult to see from the context, but the next line is
"do the prefetch", so "prefetched = 1" makes sense here, even
if to_fetch.nr is zero. We've already done the work to see that
a batch download is not needed.

>  			/*
>  			 * NEEDSWORK: Consider deduplicating the OIDs sent.
> @@ -6538,7 +6554,7 @@ void diffcore_std(struct diff_options *options)
>  			diffcore_break(options->repo,
>  				       options->break_opt);
>  		if (options->detect_rename)
> -			diffcore_rename(options);
> +			diffcore_rename(options, prefetched);
>  		if (options->break_opt != -1)
>  			diffcore_merge_broken();
>  	}
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index e189f407af..962565f066 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -7,6 +7,7 @@
>  #include "object-store.h"
>  #include "hashmap.h"
>  #include "progress.h"
> +#include "promisor-remote.h"
>  
>  /* Table of rename/copy destinations */
>  
> @@ -448,7 +449,18 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
>  	return count;
>  }
>  
> -void diffcore_rename(struct diff_options *options)
> +static void add_if_missing(struct repository *r,
> +			   struct oid_array *to_fetch,
> +			   const struct diff_filespec *filespec)
> +{
> +	if (filespec && filespec->oid_valid &&
> +	    !S_ISGITLINK(filespec->mode) &&
> +	    oid_object_info_extended(r, &filespec->oid, NULL,
> +				     OBJECT_INFO_FOR_PREFETCH))
> +		oid_array_append(to_fetch, &filespec->oid);
> +}
> +
> +void diffcore_rename(struct diff_options *options, int prefetched)
>  {
>  	int detect_rename = options->detect_rename;
>  	int minimum_score = options->rename_score;
> @@ -538,6 +550,32 @@ void diffcore_rename(struct diff_options *options)
>  		break;
>  	}
>  
> +	if (!prefetched) {
> +		/*
> +		 * At this point we know there's actual work to do: we have rename
> +		 * destinations that didn't find an exact match, and we have potential
> +		 * sources. So we'll have to do inexact rename detection, which
> +		 * requires looking at the blobs.
> +		 *
> +		 * If we haven't already prefetched, it's worth pre-fetching
> +		 * them as a group now.
> +		 */
> +		int i;
> +		struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> +		for (i = 0; i < rename_dst_nr; i++) {
> +			if (rename_dst[i].pair)
> +				continue; /* already found exact match */
> +			add_if_missing(options->repo, &to_fetch, rename_dst[i].two);

Could this be reversed instead to avoid the "continue"?

	if (!rename_dst[i].pair)
		add_if_missing(options->repo, &to_fetch, rename_dst[i].two);

> +		}
> +		for (i = 0; i < rename_src_nr; i++)
> +			add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);

Does this not have the equivalent "rename_src[i].pair" logic for exact
matches?

> +		if (to_fetch.nr)
> +			promisor_remote_get_direct(options->repo,
> +						   to_fetch.oid, to_fetch.nr);

Perhaps promisor_remote_get_direct() could have the check for
nr == 0 to exit early instead of putting that upon all the
callers?

> +		oid_array_clear(&to_fetch);
> +	}
> +
>  	if (options->show_rename_progress) {
>  		progress = start_delayed_progress(
>  				_("Performing inexact rename detection"),
> diff --git a/diffcore.h b/diffcore.h
> index 7c07347e42..9f69506574 100644
> --- a/diffcore.h
> +++ b/diffcore.h
> @@ -144,7 +144,7 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *,
>  void diff_q(struct diff_queue_struct *, struct diff_filepair *);
>  
>  void diffcore_break(struct repository *, int);
> -void diffcore_rename(struct diff_options *);
> +void diffcore_rename(struct diff_options *, int prefetched);
>  void diffcore_merge_broken(void);
>  void diffcore_pickaxe(struct diff_options *);
>  void diffcore_order(const char *orderfile);
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> index 4831ad35e6..7acb64727d 100755
> --- a/t/t4067-diff-partial-clone.sh
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -131,4 +131,53 @@ test_expect_success 'diff with rename detection batches blobs' '
>  	test_line_count = 1 done_lines
>  '
>  
> +test_expect_success 'diff does not fetch anything if inexact rename detection is not needed' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	printf "b\nb\nb\nb\nb\n" >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&

> +	rm server/b &&
> +	printf "b\nb\nb\nb\nb\n" >server/c &&

Would "mv server/b server/c" make it more clear that
this is an exact rename?

> +	git -C server add c &&
> +	git -C server commit -a -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure no fetches.
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --raw -M HEAD^ HEAD &&
> +	! test_path_exists trace
> +'
> +
> +test_expect_success 'diff --break-rewrites fetches only if necessary, and batches blobs if it does' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	printf "b\nb\nb\nb\nb\n" >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	printf "c\nc\nc\nc\nc\n" >server/b &&
> +	git -C server commit -a -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure no fetches.
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --raw -M HEAD^ HEAD &&
> +	! test_path_exists trace &&
> +
> +	# But with --break-rewrites, ensure that there is exactly 1 negotiation
> +	# by checking that there is only 1 "done" line sent. ("done" marks the
> +	# end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --break-rewrites --raw -M HEAD^ HEAD &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines
> +'
> +
>  test_done
> 

Thanks,
-Stolee
Jonathan Tan March 31, 2020, 4:50 p.m. UTC | #2
> This conflicts with [3], so please keep that in mind.
> 
> Maybe [3] should be adjusted to assume this patch, because that change
> is mostly about disabling the batch download when no renames are required.
> As Peff said [2] the full rename detection trigger is "overly broad".
> 
> However, the changed-path Bloom filters are an excellent test for this
> patch, as computing them in a partial clone will trigger downloading all
> blobs without [3].
> 
> [3] https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/T/#u
> 
> > [1] https://lore.kernel.org/git/20200128213508.31661-1-jonathantanmy@google.com/
> > [2] https://lore.kernel.org/git/20200130055136.GA2184413@coredump.intra.peff.net/

Thanks for the pointer. Yes, I think that [3] should be adjusted to
assume this patch.

> > +		for (i = 0; i < rename_dst_nr; i++) {
> > +			if (rename_dst[i].pair)
> > +				continue; /* already found exact match */
> > +			add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
> 
> Could this be reversed instead to avoid the "continue"?

Hmm...I prefer the "return early" approach, but can change it if others
prefer to avoid the "continue" here.

> 	if (!rename_dst[i].pair)
> 		add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
> 
> > +		}
> > +		for (i = 0; i < rename_src_nr; i++)
> > +			add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);
> 
> Does this not have the equivalent "rename_src[i].pair" logic for exact
> matches?

Thanks for the catch. There's no "pair" in rename_src[i], but the
equivalent is "if (skip_unmodified &&
diff_unmodified_pair(rename_src[i].p))", which you can see in the "for"
loop later in the function. I've added this.

> > +		if (to_fetch.nr)
> > +			promisor_remote_get_direct(options->repo,
> > +						   to_fetch.oid, to_fetch.nr);
> 
> Perhaps promisor_remote_get_direct() could have the check for
> nr == 0 to exit early instead of putting that upon all the
> callers?

The 2nd param is a pointer to an array, and I think it would be strange
to pass a pointer to a 0-size region of memory anywhere, so I'll leave
it as it is.

> > +test_expect_success 'diff does not fetch anything if inexact rename detection is not needed' '
> > +	test_when_finished "rm -rf server client trace" &&
> > +
> > +	test_create_repo server &&
> > +	echo a >server/a &&
> > +	printf "b\nb\nb\nb\nb\n" >server/b &&
> > +	git -C server add a b &&
> > +	git -C server commit -m x &&
> 
> > +	rm server/b &&
> > +	printf "b\nb\nb\nb\nb\n" >server/c &&
> 
> Would "mv server/b server/c" make it more clear that
> this is an exact rename?

True. Will do.

Thanks for the review.
Derrick Stolee March 31, 2020, 5:48 p.m. UTC | #3
On 3/31/2020 12:50 PM, Jonathan Tan wrote:
>> This conflicts with [3], so please keep that in mind.
>>
>> Maybe [3] should be adjusted to assume this patch, because that change
>> is mostly about disabling the batch download when no renames are required.
>> As Peff said [2] the full rename detection trigger is "overly broad".
>>
>> However, the changed-path Bloom filters are an excellent test for this
>> patch, as computing them in a partial clone will trigger downloading all
>> blobs without [3].
>>
>> [3] https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/T/#u
>>
>>> [1] https://lore.kernel.org/git/20200128213508.31661-1-jonathantanmy@google.com/
>>> [2] https://lore.kernel.org/git/20200130055136.GA2184413@coredump.intra.peff.net/
> 
> Thanks for the pointer. Yes, I think that [3] should be adjusted to
> assume this patch.
> 
>>> +		for (i = 0; i < rename_dst_nr; i++) {
>>> +			if (rename_dst[i].pair)
>>> +				continue; /* already found exact match */
>>> +			add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
>>
>> Could this be reversed instead to avoid the "continue"?
> 
> Hmm...I prefer the "return early" approach, but can change it if others
> prefer to avoid the "continue" here.

The "return early" approach is great and makes sense unless there is
only one line of code happening in the other case. Not sure if there
is any potential that the non-continue case grows in size or not.

Doesn't hurt that much to have the "return early" approach, as you
wrote it.

>> 	if (!rename_dst[i].pair)
>> 		add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
>>
>>> +		}
>>> +		for (i = 0; i < rename_src_nr; i++)
>>> +			add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);
>>
>> Does this not have the equivalent "rename_src[i].pair" logic for exact
>> matches?
> 
> Thanks for the catch. There's no "pair" in rename_src[i], but the
> equivalent is "if (skip_unmodified &&
> diff_unmodified_pair(rename_src[i].p))", which you can see in the "for"
> loop later in the function. I've added this.
> 
>>> +		if (to_fetch.nr)
>>> +			promisor_remote_get_direct(options->repo,
>>> +						   to_fetch.oid, to_fetch.nr);
>>
>> Perhaps promisor_remote_get_direct() could have the check for
>> nr == 0 to exit early instead of putting that upon all the
>> callers?
> 
> The 2nd param is a pointer to an array, and I think it would be strange
> to pass a pointer to a 0-size region of memory anywhere, so I'll leave
> it as it is.

Well, I would assume that to_fetch.oid is either NULL or is alloc'd
larger than to_fetch.nr when there are no added objects.

This is now the fourth location where we if (to_fetch.nr) promisor_remote_get_direct()
so we have already violated the rule of three.

My preference would be to insert a patch before this that halts the
promisor_remote_get_direct() call on an nr of 0 and deletes the "if (nr)"
conditions from the three existing callers. Then this patch could use
the logic without ever adding the "if (nr)".

Thanks,
-Stolee
Junio C Hamano March 31, 2020, 6:15 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index e189f407af..962565f066 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> ...
> @@ -448,7 +449,18 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
>  	return count;
>  }
>  
> +static void add_if_missing(struct repository *r,
> +			   struct oid_array *to_fetch,
> +			   const struct diff_filespec *filespec)
> +{
> +	if (filespec && filespec->oid_valid &&
> +	    !S_ISGITLINK(filespec->mode) &&
> +	    oid_object_info_extended(r, &filespec->oid, NULL,
> +				     OBJECT_INFO_FOR_PREFETCH))
> +		oid_array_append(to_fetch, &filespec->oid);
> +}

Do not copy&paste the exact code from elsewhere.  It is a sure way
to guarantee that they will drift apart over time.  Rename the one
in diff.c to something a bit more appropriate to be a global name
(e.g. diff_prepare_prefetch() or somesuch), make it extern in
<diffcore.h> and use it here.

Thanks.
Junio C Hamano March 31, 2020, 6:21 p.m. UTC | #5
Derrick Stolee <stolee@gmail.com> writes:

>>>> +		for (i = 0; i < rename_dst_nr; i++) {
>>>> +			if (rename_dst[i].pair)
>>>> +				continue; /* already found exact match */
>>>> +			add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
>>>
>>> Could this be reversed instead to avoid the "continue"?
>> 
>> Hmm...I prefer the "return early" approach, but can change it if others
>> prefer to avoid the "continue" here.
>
> The "return early" approach is great and makes sense unless there is
> only one line of code happening in the other case. Not sure if there
> is any potential that the non-continue case grows in size or not.
>
> Doesn't hurt that much to have the "return early" approach, as you
> wrote it.

Even with just one statement after the continue, in this particular
case, the logic seems to flow a bit more naturally.  "Let's see each
item in this list.  ah, this has already been processed so let's
move on.  otherwise, we may need to do something a bit more."  It
also saves one indentation level for the logic that matters ;-)

>>>> +		for (i = 0; i < rename_src_nr; i++)
>>>> +			add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);
>>>
>>> Does this not have the equivalent "rename_src[i].pair" logic for exact
>>> matches?

One source blob can be copied to multiple destination path, with and
without modification, but we currently do not detect the case where
a destination blob is a concatenation of two source blobs.  So we
can optimize the destination side ("we are done with it, no need to
look---we won't find anything better anyway as we've found the exact
copy source") but we cannot do the same optimization on the source
side ("yes, this one was copied to path A, but path B may have a
copy with slight modification), I would think.
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index 1010d806f5..19c5d638d6 100644
--- a/diff.c
+++ b/diff.c
@@ -6507,10 +6507,24 @@  static void add_if_missing(struct repository *r,
 
 void diffcore_std(struct diff_options *options)
 {
-	if (options->repo == the_repository && has_promisor_remote()) {
-		/*
-		 * Prefetch the diff pairs that are about to be flushed.
-		 */
+	int prefetched = 0;
+	int output_formats_to_prefetch = DIFF_FORMAT_DIFFSTAT |
+		DIFF_FORMAT_NUMSTAT |
+		DIFF_FORMAT_PATCH |
+		DIFF_FORMAT_SHORTSTAT |
+		DIFF_FORMAT_DIRSTAT;
+
+	/*
+	 * Check if the user requested a blob-data-requiring diff output and/or
+	 * break-rewrite detection (which requires blob data). If yes, prefetch
+	 * the diff pairs.
+	 *
+	 * If no prefetching occurs, diffcore_rename() will prefetch if it
+	 * decides that it needs inexact rename detection.
+	 */
+	if (options->repo == the_repository && has_promisor_remote() &&
+	    (options->output_format & output_formats_to_prefetch ||
+	     (!options->found_follow && options->break_opt != -1))) {
 		int i;
 		struct diff_queue_struct *q = &diff_queued_diff;
 		struct oid_array to_fetch = OID_ARRAY_INIT;
@@ -6520,6 +6534,8 @@  void diffcore_std(struct diff_options *options)
 			add_if_missing(options->repo, &to_fetch, p->one);
 			add_if_missing(options->repo, &to_fetch, p->two);
 		}
+		prefetched = 1;
+
 		if (to_fetch.nr)
 			/*
 			 * NEEDSWORK: Consider deduplicating the OIDs sent.
@@ -6538,7 +6554,7 @@  void diffcore_std(struct diff_options *options)
 			diffcore_break(options->repo,
 				       options->break_opt);
 		if (options->detect_rename)
-			diffcore_rename(options);
+			diffcore_rename(options, prefetched);
 		if (options->break_opt != -1)
 			diffcore_merge_broken();
 	}
diff --git a/diffcore-rename.c b/diffcore-rename.c
index e189f407af..962565f066 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -7,6 +7,7 @@ 
 #include "object-store.h"
 #include "hashmap.h"
 #include "progress.h"
+#include "promisor-remote.h"
 
 /* Table of rename/copy destinations */
 
@@ -448,7 +449,18 @@  static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
 	return count;
 }
 
-void diffcore_rename(struct diff_options *options)
+static void add_if_missing(struct repository *r,
+			   struct oid_array *to_fetch,
+			   const struct diff_filespec *filespec)
+{
+	if (filespec && filespec->oid_valid &&
+	    !S_ISGITLINK(filespec->mode) &&
+	    oid_object_info_extended(r, &filespec->oid, NULL,
+				     OBJECT_INFO_FOR_PREFETCH))
+		oid_array_append(to_fetch, &filespec->oid);
+}
+
+void diffcore_rename(struct diff_options *options, int prefetched)
 {
 	int detect_rename = options->detect_rename;
 	int minimum_score = options->rename_score;
@@ -538,6 +550,32 @@  void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	if (!prefetched) {
+		/*
+		 * At this point we know there's actual work to do: we have rename
+		 * destinations that didn't find an exact match, and we have potential
+		 * sources. So we'll have to do inexact rename detection, which
+		 * requires looking at the blobs.
+		 *
+		 * If we haven't already prefetched, it's worth pre-fetching
+		 * them as a group now.
+		 */
+		int i;
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+
+		for (i = 0; i < rename_dst_nr; i++) {
+			if (rename_dst[i].pair)
+				continue; /* already found exact match */
+			add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
+		}
+		for (i = 0; i < rename_src_nr; i++)
+			add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);
+		if (to_fetch.nr)
+			promisor_remote_get_direct(options->repo,
+						   to_fetch.oid, to_fetch.nr);
+		oid_array_clear(&to_fetch);
+	}
+
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
diff --git a/diffcore.h b/diffcore.h
index 7c07347e42..9f69506574 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -144,7 +144,7 @@  struct diff_filepair *diff_queue(struct diff_queue_struct *,
 void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
 void diffcore_break(struct repository *, int);
-void diffcore_rename(struct diff_options *);
+void diffcore_rename(struct diff_options *, int prefetched);
 void diffcore_merge_broken(void);
 void diffcore_pickaxe(struct diff_options *);
 void diffcore_order(const char *orderfile);
diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
index 4831ad35e6..7acb64727d 100755
--- a/t/t4067-diff-partial-clone.sh
+++ b/t/t4067-diff-partial-clone.sh
@@ -131,4 +131,53 @@  test_expect_success 'diff with rename detection batches blobs' '
 	test_line_count = 1 done_lines
 '
 
+test_expect_success 'diff does not fetch anything if inexact rename detection is not needed' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	printf "b\nb\nb\nb\nb\n" >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	rm server/b &&
+	printf "b\nb\nb\nb\nb\n" >server/c &&
+	git -C server add c &&
+	git -C server commit -a -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure no fetches.
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --raw -M HEAD^ HEAD &&
+	! test_path_exists trace
+'
+
+test_expect_success 'diff --break-rewrites fetches only if necessary, and batches blobs if it does' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	printf "b\nb\nb\nb\nb\n" >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	printf "c\nc\nc\nc\nc\n" >server/b &&
+	git -C server commit -a -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure no fetches.
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --raw -M HEAD^ HEAD &&
+	! test_path_exists trace &&
+
+	# But with --break-rewrites, ensure that there is exactly 1 negotiation
+	# by checking that there is only 1 "done" line sent. ("done" marks the
+	# end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --break-rewrites --raw -M HEAD^ HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
 test_done