diff mbox series

[v3,08/12] bundle: parse filter capability

Message ID 025f38290f5a705c80854a42e1abcaba9a9f336d.1646750359.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Partial bundles | expand

Commit Message

Derrick Stolee March 8, 2022, 2:39 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

The v3 bundle format has capabilities, allowing newer versions of Git to
create bundles with newer features. Older versions that do not
understand these new capabilities will fail with a helpful warning.

Create a new capability allowing Git to understand that the contained
pack-file is filtered according to some object filter. Typically, this
filter will be "blob:none" for a blobless partial clone.

This change teaches Git to parse this capability, place its value in the
bundle header, and demonstrate this understanding by adding a message to
'git bundle verify'.

Since we will use gently_parse_list_objects_filter() outside of
list-objects-filter-options.c, make it an external method and move its
API documentation to before its declaration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-format.txt | 11 ++++++++---
 bundle.c                                  | 15 ++++++++++++++-
 bundle.h                                  |  2 ++
 list-objects-filter-options.c             | 17 +----------------
 list-objects-filter-options.h             | 20 ++++++++++++++++++++
 5 files changed, 45 insertions(+), 20 deletions(-)

Comments

Junio C Hamano March 8, 2022, 5:29 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/bundle.h b/bundle.h
> index 06009fe6b1f..7fef2108f43 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -4,12 +4,14 @@
>  #include "strvec.h"
>  #include "cache.h"
>  #include "string-list.h"
> +#include "list-objects-filter-options.h"
>  
>  struct bundle_header {
>  	unsigned version;
>  	struct string_list prerequisites;
>  	struct string_list references;
>  	const struct git_hash_algo *hash_algo;
> +	struct list_objects_filter_options filter;
>  };

This used to be a pointer to the struct, with "NULL means do not
filter" semantics, with .nr==0 as BUG().  Which was the same
justification used when an earlier step added a pointer to the
filter struct to rev_info.

Should the same logic applies there to make it into an embedded
struct in rev_info as well?
Ævar Arnfjörð Bjarmason March 9, 2022, 1:30 p.m. UTC | #2
On Tue, Mar 08 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
>  static const char v2_bundle_signature[] = "# v2 git bundle\n";
>  static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -33,6 +33,7 @@ void bundle_header_release(struct bundle_header *header)
>  {
>  	string_list_clear(&header->prerequisites, 1);
>  	string_list_clear(&header->references, 1);
> +	list_objects_filter_release(&header->filter);
>  }
>  
>  static int parse_capability(struct bundle_header *header, const char *capability)
> @@ -45,6 +46,10 @@ static int parse_capability(struct bundle_header *header, const char *capability
>  		header->hash_algo = &hash_algos[algo];
>  		return 0;
>  	}
> +	if (skip_prefix(capability, "filter=", &arg)) {
> +		parse_list_objects_filter(&header->filter, arg);
> +		return 0;
> +	}
>  	return error(_("unknown capability '%s'"), capability);
>  }

I haven't tested, but I did wonder if our purely "check reachability"
(or equivalent) "verify" would be slowed down by doing whatever filter
magic we're doing here, but then remembered/saw that we only parse the
header, so it can't be that bad :)

I.e. this is only checking the syntax of the filter, surely, and then
spits it back at us. That makes sense.

I think that this hunk from the subsequent 10/12 is in the wrong place
though, and should be here when we change "verify" (not "create" later):
	
	diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
	index 72ab8139052..831c4788a94 100644
	--- a/Documentation/git-bundle.txt
	+++ b/Documentation/git-bundle.txt
	@@ -75,8 +75,8 @@ verify <file>::
	 	cleanly to the current repository.  This includes checks on the
	 	bundle format itself as well as checking that the prerequisite
	 	commits exist and are fully linked in the current repository.
	-	'git bundle' prints a list of missing commits, if any, and exits
	-	with a non-zero status.
	+	'git bundle' prints the bundle's object filter and its list of
	+	missing commits, if any, and exits with a non-zero status.
	 
	 list-heads <file>::
	 	Lists the references defined in the bundle.  If followed by a

I think instead of starting to list every header we might add in the
future there it would make sense to just add after the pre-image
full-stop something like:

    We'll also print out information about any known capabilities, such
    as "object filter". See "Capabilities" in technical/...

Which future proofs it a bit...

> [...]

I think it would also be great to have tests for intentionally
corrupting the header and seeing what the "verify" output is, i.e. do we
die right away, proceed to validate the rest. So just (somewhat
pseudocode):

    git bundle create [...] my.bdl &&
    sed 's/@filter: .*/@filter: bad/' <my.bdl >bad.bdl &&
    test_must_fail git bundle verify bad.bdl 2>actual &&
    [...]
    test_cmp expect actual

Overall this series looks really good at this point, and I'm down to
minor shades of colors on the bikeshedding :)
Derrick Stolee March 9, 2022, 2:35 p.m. UTC | #3
On 3/8/2022 12:29 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/bundle.h b/bundle.h
>> index 06009fe6b1f..7fef2108f43 100644
>> --- a/bundle.h
>> +++ b/bundle.h
>> @@ -4,12 +4,14 @@
>>  #include "strvec.h"
>>  #include "cache.h"
>>  #include "string-list.h"
>> +#include "list-objects-filter-options.h"
>>  
>>  struct bundle_header {
>>  	unsigned version;
>>  	struct string_list prerequisites;
>>  	struct string_list references;
>>  	const struct git_hash_algo *hash_algo;
>> +	struct list_objects_filter_options filter;
>>  };
> 
> This used to be a pointer to the struct, with "NULL means do not
> filter" semantics, with .nr==0 as BUG().  Which was the same
> justification used when an earlier step added a pointer to the
> filter struct to rev_info.
> 
> Should the same logic applies there to make it into an embedded
> struct in rev_info as well?

You're absolutely right. Making the change will make the
range-diff look absolutely horrid, but the change isn't
terribly hard for the most part.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt
index bac558d049a..b9be8644cf5 100644
--- a/Documentation/technical/bundle-format.txt
+++ b/Documentation/technical/bundle-format.txt
@@ -71,6 +71,11 @@  and the Git bundle v2 format cannot represent a shallow clone repository.
 == Capabilities
 
 Because there is no opportunity for negotiation, unknown capabilities cause 'git
-bundle' to abort.  The only known capability is `object-format`, which specifies
-the hash algorithm in use, and can take the same values as the
-`extensions.objectFormat` configuration value.
+bundle' to abort.
+
+* `object-format` specifies the hash algorithm in use, and can take the same
+  values as the `extensions.objectFormat` configuration value.
+
+* `filter` specifies an object filter as in the `--filter` option in
+  linkgit:git-rev-list[1]. The resulting pack-file must be marked as a
+  `.promisor` pack-file after it is unbundled.
diff --git a/bundle.c b/bundle.c
index 7ba60a573d7..41922565627 100644
--- a/bundle.c
+++ b/bundle.c
@@ -11,7 +11,7 @@ 
 #include "run-command.h"
 #include "refs.h"
 #include "strvec.h"
-
+#include "list-objects-filter-options.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -33,6 +33,7 @@  void bundle_header_release(struct bundle_header *header)
 {
 	string_list_clear(&header->prerequisites, 1);
 	string_list_clear(&header->references, 1);
+	list_objects_filter_release(&header->filter);
 }
 
 static int parse_capability(struct bundle_header *header, const char *capability)
@@ -45,6 +46,10 @@  static int parse_capability(struct bundle_header *header, const char *capability
 		header->hash_algo = &hash_algos[algo];
 		return 0;
 	}
+	if (skip_prefix(capability, "filter=", &arg)) {
+		parse_list_objects_filter(&header->filter, arg);
+		return 0;
+	}
 	return error(_("unknown capability '%s'"), capability);
 }
 
@@ -220,6 +225,8 @@  int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
+	revs.filter = &header->filter;
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
@@ -259,6 +266,12 @@  int verify_bundle(struct repository *r,
 			     r->nr),
 			  r->nr);
 		list_refs(r, 0, NULL);
+
+		if (header->filter.choice != LOFC_DISABLED) {
+			printf_ln("The bundle uses this filter: %s",
+				  list_objects_filter_spec(&header->filter));
+		}
+
 		r = &header->prerequisites;
 		if (!r->nr) {
 			printf_ln(_("The bundle records a complete history."));
diff --git a/bundle.h b/bundle.h
index 06009fe6b1f..7fef2108f43 100644
--- a/bundle.h
+++ b/bundle.h
@@ -4,12 +4,14 @@ 
 #include "strvec.h"
 #include "cache.h"
 #include "string-list.h"
+#include "list-objects-filter-options.h"
 
 struct bundle_header {
 	unsigned version;
 	struct string_list prerequisites;
 	struct string_list references;
 	const struct git_hash_algo *hash_algo;
+	struct list_objects_filter_options filter;
 };
 
 #define BUNDLE_HEADER_INIT \
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index fd8d59f653a..d8597cdee36 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -40,22 +40,7 @@  const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 	BUG("list_object_filter_config_name: invalid argument '%d'", c);
 }
 
-/*
- * Parse value of the argument to the "filter" keyword.
- * On the command line this looks like:
- *       --filter=<arg>
- * and in the pack protocol as:
- *       "filter" SP <arg>
- *
- * The filter keyword will be used by many commands.
- * See Documentation/rev-list-options.txt for allowed values for <arg>.
- *
- * Capture the given arg as the "filter_spec".  This can be forwarded to
- * subordinate commands when necessary (although it's better to pass it through
- * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
- * convenience of the current command.
- */
-static int gently_parse_list_objects_filter(
+int gently_parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index da5b6737e27..f6fe6a3d2ca 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -72,6 +72,26 @@  struct list_objects_filter_options {
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
+/*
+ * Parse value of the argument to the "filter" keyword.
+ * On the command line this looks like:
+ *       --filter=<arg>
+ * and in the pack protocol as:
+ *       "filter" SP <arg>
+ *
+ * The filter keyword will be used by many commands.
+ * See Documentation/rev-list-options.txt for allowed values for <arg>.
+ *
+ * Capture the given arg as the "filter_spec".  This can be forwarded to
+ * subordinate commands when necessary (although it's better to pass it through
+ * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
+ * convenience of the current command.
+ */
+int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);