diff mbox series

[v6,3/3] unbundle: support object verification for fetches

Message ID 53395e8c08a8487f3e53dca15766307854a24b3b.1718109943.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series object checking related additions and fixes for bundles in fetches | expand

Commit Message

Xing Xin June 11, 2024, 12:45 p.m. UTC
From: Xing Xin <xingxin.xx@bytedance.com>

This commit extends object verification support for fetches in
`bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
option to `verify_bundle_flags`. When this option is enabled,
`bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
determine whether to append the "--fsck-objects" flag to
"git-index-pack".

`VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
fetching process, including:

- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.

This addition ensures a consistent logic for object verification during
fetch operations. Tests have been added to confirm functionality in the
scenarios mentioned above.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
 bundle-uri.c                |  2 +-
 bundle.c                    |  5 +++++
 bundle.h                    |  1 +
 t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
 t/t5607-clone-bundle.sh     | 33 +++++++++++++++++++++++++++++++++
 transport.c                 |  2 +-
 6 files changed, 75 insertions(+), 3 deletions(-)

Comments

Junio C Hamano June 11, 2024, 8:05 p.m. UTC | #1
"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Xing Xin <xingxin.xx@bytedance.com>
>
> This commit extends object verification support for fetches in
> `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
> option to `verify_bundle_flags`. When this option is enabled,
> `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
> determine whether to append the "--fsck-objects" flag to
> "git-index-pack".

Please start your proposed log message by stating what the perceived
problem without this patch in the current world is.  Without it, you
cannot easily answer the most basic question: "why are we doing this?"

The name VERIFY_BUNDLE_FSCK_FOLLOW_FETCH does not read very well.
VERIFY_BUNDLE part is common across various flags, so what is
specific to the feature is "FSCK_FOLLOW_FETCH", and it is good that
we convey the fact that we do a bit more than the normal
VERIFY_BUNDLE (which is, to read the prerequisite headers and make
sure we have them in the sense that they are reachable from our
refs) with the word FSCK.

But is it necessary (or even a good idea) to limit its usability
with "FOLLOW_FETCH" (which does not look even grammatical)?  Aren't
we closing the door to other folks who may want to do a more
thorough fsck-like checks in other codepaths by saying "if you are
not doing this immediately after you fetch, you are unwelcome to use
this flag"?

> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
> fetching process, including:
>
> - `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
> - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
>
> This addition ensures a consistent logic for object verification during
> fetch operations. Tests have been added to confirm functionality in the
> scenarios mentioned above.
>
> Reviewed-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>  bundle-uri.c                |  2 +-
>  bundle.c                    |  5 +++++
>  bundle.h                    |  1 +
>  t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
>  t/t5607-clone-bundle.sh     | 33 +++++++++++++++++++++++++++++++++
>  transport.c                 |  2 +-
>  6 files changed, 75 insertions(+), 3 deletions(-)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 65666a11d9c..e7ebac6ce57 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  	 * the prerequisite commits.
>  	 */
>  	if ((result = unbundle(r, &header, bundle_fd, NULL,
> -			       VERIFY_BUNDLE_QUIET)))
> +			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
>  		return 1;

OK (modulo the flag name).

> +	if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
> +		if (fetch_pack_fsck_objects())
> +			strvec_push(&ip.args, "--fsck-objects");
> +

OK, that's quite straight-forward.  We are running "index-pack
--fix-thin --stdin" and feeding the bundle data to it.  We just tell
it to also work in the "--fsck-objects" mode.

> diff --git a/transport.c b/transport.c
> index 0ad04b77fd2..6cd5683bb45 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
>  	if (!data->get_refs_from_bundle_called)
>  		get_refs_from_bundle_inner(transport);
>  	ret = unbundle(the_repository, &data->header, data->fd,
> -		       &extra_index_pack_args, 0);
> +		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);

OK.

I wonder if something like this is a potential follow-up topic
somebody may be interested in after the dust settles.  That is
exactly why the name of this bit matters.



 builtin/bundle.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git c/builtin/bundle.c w/builtin/bundle.c
index d5d41a8f67..eeb5963dcb 100644
--- c/builtin/bundle.c
+++ w/builtin/bundle.c
@@ -194,10 +194,13 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 	int bundle_fd = -1;
 	int ret;
 	int progress = isatty(2);
+	int fsck_objects = 0;
 
 	struct option options[] = {
 		OPT_BOOL(0, "progress", &progress,
 			 N_("show progress meter")),
+		OPT_BOOL(0, "fsck-objects", &fsck_objects,
+			 N_("check the objects in the bundle")),
 		OPT_END()
 	};
 	char *bundle_file;
@@ -217,7 +220,8 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
 			     _("Unbundling objects"), NULL);
 	ret = !!unbundle(the_repository, &header, bundle_fd,
-			 &extra_index_pack_args, 0) ||
+			 &extra_index_pack_args,
+			 fsck_objects ? VERIFY_BUNDLE_FSCK_FOLLOW_FETCH : 0) ||
 		list_bundle_refs(&header, argc, argv);
 	bundle_header_release(&header);
 cleanup:
Xing Xin June 12, 2024, 6:33 p.m. UTC | #2
At 2024-06-12 04:05:35, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Xing Xin <xingxin.xx@bytedance.com>
>>
>> This commit extends object verification support for fetches in
>> `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
>> option to `verify_bundle_flags`. When this option is enabled,
>> `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
>> determine whether to append the "--fsck-objects" flag to
>> "git-index-pack".
>
>Please start your proposed log message by stating what the perceived
>problem without this patch in the current world is.  Without it, you
>cannot easily answer the most basic question: "why are we doing this?"

Got it.

>The name VERIFY_BUNDLE_FSCK_FOLLOW_FETCH does not read very well.
>VERIFY_BUNDLE part is common across various flags, so what is
>specific to the feature is "FSCK_FOLLOW_FETCH", and it is good that
>we convey the fact that we do a bit more than the normal
>VERIFY_BUNDLE (which is, to read the prerequisite headers and make
>sure we have them in the sense that they are reachable from our
>refs) with the word FSCK.
>
>But is it necessary (or even a good idea) to limit its usability
>with "FOLLOW_FETCH" (which does not look even grammatical)?  Aren't
>we closing the door to other folks who may want to do a more
>thorough fsck-like checks in other codepaths by saying "if you are
>not doing this immediately after you fetch, you are unwelcome to use
>this flag"?

I initially considered adding another option VERIFY_BUNDLE_FSCK_ALWAYS
for other scenarios, which would take a higher priority than
VERIFY_BUNDLE_FSCK_FOLLOW_FETCH. However, that approach is also
confusing, as we would have two flags both controlling the fsck
behavior.

How about extending VERIFY_BUNDLE_FSCK and letting the callers decide
whether to add the flag for fscking?

In bundle.c, we can make a small change like:

@@ -625,6 +626,9 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	if (header->filter.choice)
 		strvec_push(&ip.args, "--promisor=from-bundle");

+	if (flags & VERIFY_BUNDLE_FSCK)
+		strvec_push(&ip.args, "--fsck-objects");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);

For example, in `bundle-uri.c:unbundle_from_file`, which is one of the
callers of unbundle, we can use `fetch_pack_fsck_objects` to decide
whether to add that option:

@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * the prerequisite commits.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
 		return 1;

This approach should streamline the code while maintaining flexibility.
The follow-up patch you mentioned below should just work then, it is not
for now because we are touching the unwanted `fetch_pack_fsck_objects`
within `unbundle`.

Xing Xin

>I wonder if something like this is a potential follow-up topic
>somebody may be interested in after the dust settles.  That is
>exactly why the name of this bit matters.
>
>
>
> builtin/bundle.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git c/builtin/bundle.c w/builtin/bundle.c
>index d5d41a8f67..eeb5963dcb 100644
>--- c/builtin/bundle.c
>+++ w/builtin/bundle.c
>@@ -194,10 +194,13 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
> 	int bundle_fd = -1;
> 	int ret;
> 	int progress = isatty(2);
>+	int fsck_objects = 0;
> 
> 	struct option options[] = {
> 		OPT_BOOL(0, "progress", &progress,
> 			 N_("show progress meter")),
>+		OPT_BOOL(0, "fsck-objects", &fsck_objects,
>+			 N_("check the objects in the bundle")),
> 		OPT_END()
> 	};
> 	char *bundle_file;
>@@ -217,7 +220,8 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
> 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
> 			     _("Unbundling objects"), NULL);
> 	ret = !!unbundle(the_repository, &header, bundle_fd,
>-			 &extra_index_pack_args, 0) ||
>+			 &extra_index_pack_args,
>+			 fsck_objects ? VERIFY_BUNDLE_FSCK_FOLLOW_FETCH : 0) ||
> 		list_bundle_refs(&header, argc, argv);
> 	bundle_header_release(&header);
> cleanup:
diff mbox series

Patch

diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..e7ebac6ce57 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@  static int unbundle_from_file(struct repository *r, const char *file)
 	 * the prerequisite commits.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..20bbfffbb43 100644
--- a/bundle.c
+++ b/bundle.c
@@ -17,6 +17,7 @@ 
 #include "list-objects-filter-options.h"
 #include "connected.h"
 #include "write-or-die.h"
+#include "fetch-pack.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -625,6 +626,10 @@  int unbundle(struct repository *r, struct bundle_header *header,
 	if (header->filter.choice)
 		strvec_push(&ip.args, "--promisor=from-bundle");
 
+	if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
+		if (fetch_pack_fsck_objects())
+			strvec_push(&ip.args, "--fsck-objects");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..545df6e9d40 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@  int create_bundle(struct repository *r, const char *path,
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
 	VERIFY_BUNDLE_QUIET = (1 << 1),
+	VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 2),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 8f4f802e4f1..48be1b18802 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -30,7 +30,21 @@  test_expect_success 'create bundle' '
 		git bundle create B.bundle topic &&
 
 		# Create a bundle with reference pointing to non-existent object.
-		sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+		sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
+
+		cat >data <<-EOF &&
+		tree $(git rev-parse HEAD^{tree})
+		parent $(git rev-parse HEAD)
+		author A U Thor
+		committer A U Thor
+
+		commit: this is a commit with bad emails
+
+		EOF
+		git hash-object --literally -t commit -w --stdin <data >commit &&
+		git branch bad $(cat commit) &&
+		git bundle create bad-object.bundle bad &&
+		git update-ref -d refs/heads/bad
 	)
 '
 
@@ -52,6 +66,25 @@  test_expect_success 'clone with bundle that has bad header' '
 	! grep "refs/bundles/" refs
 '
 
+test_expect_success 'clone with bundle that has bad object' '
+	# Unbundle succeeds if no fsckObjects confugured.
+	git clone --bundle-uri="clone-from/bad-object.bundle" \
+		clone-from clone-bad-object-no-fsck &&
+	git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/bad
+	EOF
+	test_cmp expect actual &&
+
+	# Unbundle fails with fsckObjects set true, but clone can still proceed.
+	git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
+		clone-from clone-bad-object-fsck 2>err &&
+	test_grep "missingEmail" err &&
+	git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+	! grep "refs/bundles/" refs
+'
+
 test_expect_success 'clone with path bundle and non-default hash' '
 	test_when_finished "rm -rf clone-path-non-default-hash" &&
 	GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..5182efc0b45 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,39 @@  test_expect_success 'fetch SHA-1 from bundle' '
 	git fetch --no-tags foo/tip.bundle "$(cat hash)"
 '
 
+test_expect_success 'clone bundle with different fsckObjects configurations' '
+	test_create_repo bundle-fsck &&
+	(
+		cd bundle-fsck &&
+		test_commit first &&
+		cat >data <<-EOF &&
+		tree $(git rev-parse HEAD^{tree})
+		parent $(git rev-parse HEAD)
+		author A U Thor
+		committer A U Thor
+
+		commit: this is a commit with bad emails
+
+		EOF
+		git hash-object --literally -t commit -w --stdin <data >commit &&
+		git branch bad $(cat commit) &&
+		git bundle create bad.bundle bad
+	) &&
+
+	git clone bundle-fsck/bad.bundle bundle-no-fsck &&
+
+	git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
+		clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
+
+	test_must_fail git -c fetch.fsckObjects=true \
+		clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
+	test_grep "missingEmail" err &&
+
+	test_must_fail git -c transfer.fsckObjects=true \
+		clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+	test_grep "missingEmail" err
+'
+
 test_expect_success 'git bundle uses expected default format' '
 	git bundle create bundle HEAD^.. &&
 	cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..6cd5683bb45 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@  static int fetch_refs_from_bundle(struct transport *transport,
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
-		       &extra_index_pack_args, 0);
+		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }