[RFC] list-objects-filter: introduce new filter sparse:buffer=<spec>

Message ID pull.1316.git.1659958159193.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Series [RFC] list-objects-filter: introduce new filter sparse:buffer=<spec>

Commit Message

ZheNing Hu Aug. 8, 2022, 11:29 a.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

We already have `--filter=sparse:oid=<oid>`, which can be used
to clone a repository with only the objects that meet the filter
rules in the file corresponding to <oid> on the git server.
However, it can only read filter rules that have previously been
recorded on the git server.

Sometimes users may want to pass their own filter rules without
uploading them to the git server first. So introduce a new filter,
`sparse:buffer=<spec>`, which lets users pass the filterspec they
need directly.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [RFC] list-objects-filter: introduce new filter sparse:buffer=
    
    This patch mainly does a small optimization for partial clone:
    
    Introduce a new filter, sparse:buffer=, so users can pass their
    required filterspec from the command line.
    
    e.g. git clone --filter="sparse:buffer=zip.c" --no-checkout
    git@github.com:git/git.git, which filters out all blobs except zip.c.
    
    git clone --filter="sparse:buffer=block-sha" --no-checkout
    git@github.com:git/git.git, which filters out all blobs outside of
    the directory block-sha/.
    
    Or a user can write a filterspec file:
    
    cat >filterspec <<-EOF &&
    block-sha1
    !block-sha1/sha1.c
    EOF
    git clone --filter="sparse:buffer=`cat filterspec`" --no-checkout \
        git@github.com:git/git.git
    
    which filters out all blobs outside of block-sha1/, as well as
    block-sha1/sha1.c itself.
    
    Why can't we introduce a new option like
    --filter="sparse:specfile=<file>" instead? The git server and client
    currently use the same filter-parsing function, so if we passed the
    specfile data as the filter content to the server, the server would
    parse it again as a specfile... So maybe we need to differentiate the
    filter-parsing functions on the client and the server.
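
    A minimal sketch of that client-side differentiation (assumption:
    "sparse:specfile=" and this helper are hypothetical, not part of this
    patch): read the local specfile and forward its contents as a
    sparse:buffer spec, so the server only ever sees the sparse:buffer
    form:

        #include "git-compat-util.h"
        #include "strbuf.h"
        #include "list-objects-filter-options.h"

        static int parse_sparse_specfile(const char *path,
                                         struct list_objects_filter_options *filter_options)
        {
                struct strbuf contents = STRBUF_INIT;

                /* read the local specfile and pass it on as sparse:buffer */
                if (strbuf_read_file(&contents, path, 0) < 0)
                        return error(_("unable to read sparse filter file '%s'"),
                                     path);
                filter_options->spec_buffer = strbuf_detach(&contents, NULL);
                filter_options->choice = LOFC_SPARSE_BUFFER;
                return 0;
        }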

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1316%2Fadlternative%2Fzh%2Fparital-clone-sparse-buffer-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1316/adlternative/zh/parital-clone-sparse-buffer-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1316

 Documentation/rev-list-options.txt |  4 ++
 dir.c                              |  6 +--
 dir.h                              |  3 ++
 list-objects-filter-options.c      |  8 +++
 list-objects-filter-options.h      |  3 ++
 list-objects-filter.c              | 27 ++++++++++
 t/t5616-partial-clone.sh           | 83 ++++++++++++++++++++++++++++++
 7 files changed, 129 insertions(+), 5 deletions(-)


base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9

Comments

Junio C Hamano Aug. 8, 2022, 4:15 p.m. UTC | #1
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> We already have `--filter=sparse:oid=<oid>`, which can be used
> to clone a repository with only the objects that meet the filter
> rules in the file corresponding to <oid> on the git server.
> However, it can only read filter rules that have previously been
> recorded on the git server.

Was the reason why we have the "we limit to an object we already
have" restriction that we didn't want to blindly use a piece of
uncontrolled arbitrary end-user data here?  Just wondering.
ZheNing Hu Aug. 9, 2022, 6:13 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> 于2022年8月9日周二 00:15写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > We already have `--filter=sparse:oid=<oid>`, which can be used
> > to clone a repository with only the objects that meet the filter
> > rules in the file corresponding to <oid> on the git server.
> > However, it can only read filter rules that have previously been
> > recorded on the git server.
>
> Was the reason why we have the "we limit to an object we already
> have" restriction that we didn't want to blindly use a piece of
> uncontrolled arbitrary end-user data here?  Just wondering.
>

* An end-user may not even have write access to the repository,
so they cannot configure a filterspec file before git clone. What
should they do then?

* If thousands of different developers use the same git repo, and
they use "--filter=sparse:oid" to do different partial clones, how
many filterspec files would the repo managers have to configure first?

* Why not carefully check the "uncontrolled arbitrary end-user data"
here, e.g. add a config like "partialclone.sparsebufferlimit" to limit
the transported data size (see the sketch after this list), or check
whether the filterspec is legal? Or, if the git server doesn't trust
its users, we can use a config to ban this filter. And within some
companies, users can basically be trusted.

* I'm sure it would be beneficial to let the filtering rules be
configured by the user, because many people now have this need:
downloading only a few files or directories of the repository.

* sparse-checkout + partial-clone is a good reference: we have
".git/info/sparse-checkout" to record what we actually want to check
out to the work-tree, and git will fetch the missing objects recorded
in ".git/info/sparse-checkout" from the git server. I know it uses
<oid> to fetch objects one by one instead of "path"... But in
hindsight, its performance is extraordinarily bad as a result...
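
A minimal sketch of that size check on the server side (assumption:
"partialclone.sparsebufferlimit" is a hypothetical config key and this
helper is not part of the patch):

        #include "git-compat-util.h"
        #include "config.h"

        static void check_spec_buffer_limit(const char *spec_buffer)
        {
                unsigned long limit = 0;

                /* reject an over-long spec before we bother parsing it */
                if (!git_config_get_ulong("partialclone.sparsebufferlimit",
                                          &limit) &&
                    limit && strlen(spec_buffer) > limit)
                        die(_("sparse:buffer filter spec exceeds %lu bytes"),
                            limit);
        }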

Anyway, this patch represents some of my complaints about the current
partial-clone feature and I hope the community will move forward with it.

Thanks.

ZheNing Hu
Derrick Stolee Aug. 9, 2022, 1:37 p.m. UTC | #3
On 8/8/2022 12:15 PM, Junio C Hamano wrote:
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: ZheNing Hu <adlternative@gmail.com>
>>
>> We already have `--filter=sparse:oid=<oid>`, which can be used
>> to clone a repository with only the objects that meet the filter
>> rules in the file corresponding to <oid> on the git server.
>> However, it can only read filter rules that have previously been
>> recorded on the git server.
> 
> Was the reason why we have the "we limit to an object we already
> have" restriction that we didn't want to blindly use a piece of
> uncontrolled arbitrary end-user data here?  Just wondering.

One of the ideas here was to limit the opportunity of sending an
arbitrary set of data over the Git protocol and avoid exactly the
scenario you mention.

Another was that it is incredibly expensive to compute the set of
reachable objects within an arbitrary sparse-checkout definition,
since it requires walking trees (bitmaps do not help here). This
is why (to my knowledge) no Git hosting service currently supports
this mechanism at scale. At minimum, using the stored OID would
allow the host to keep track of these pre-defined sets and do some
precomputing of reachable data using bitmaps to keep clones and
fetches reasonable at all.

The other side of the issue is that we do not have a good solution
for resolving how to change this filter in the future, in case the
user wants to expand their sparse-checkout definition and update
their partial clone filter.

There used to be a significant issue where a 'git checkout'
would fault in a lot of missing trees because the index needed to
reference the files outside of the sparse-checkout definition. Now
that the sparse index exists, this is less of an impediment, but
it can still cause some pain.

At this moment, I think path-scoped filters have a lot of problems
that need solving before they can be used effectively in the wild.
I would prefer that we solve those problems before making the
feature more complicated. That's a tall ask, since these problems
do not have simple solutions.

Thanks,
-Stolee
Jeff King Aug. 10, 2022, 9:15 p.m. UTC | #4
On Tue, Aug 09, 2022 at 09:37:09AM -0400, Derrick Stolee wrote:

> > Was the reason why we have the "we limit to an object we already
> > have" restriction that we didn't want to blindly use a piece of
> > uncontrolled arbitrary end-user data here?  Just wondering.
> 
> One of the ideas here was to limit the opportunity of sending an
> arbitrary set of data over the Git protocol and avoid exactly the
> scenario you mention.

One other implication here is that the filter spec is sent inside of a
pkt-line.  So the implementation here is limiting us to 64kb. That may
sound like a lot for simple specs, but I imagine in big repos they can
possibly get pretty complex.

That would be fixable with a protocol extension to take the data over
multiple pkt-lines.
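
For reference, a rough sketch of where that ~64kb cap comes from (the
two constants match pkt-line.h in git.git; the framing helper itself is
only illustrative, not the real implementation):

        #include <stdio.h>

        #define LARGE_PACKET_MAX 65520
        #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)

        static int write_one_pktline(FILE *out, const char *data, size_t len)
        {
                /*
                 * The 4 hex digits of the length prefix count the prefix
                 * itself, so a single pkt-line carries at most ~64kb.
                 */
                if (len > LARGE_PACKET_DATA_MAX)
                        return -1; /* would need a multi-pkt-line extension */
                fprintf(out, "%04x", (unsigned int)(len + 4));
                fwrite(data, 1, len, out);
                return 0;
        }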

That said...

> At this moment, I think path-scoped filters have a lot of problems
> that need solving before they can be used effectively in the wild.
> I would prefer that we solve those problems before making the
> feature more complicated. That's a tall ask, since these problems
> do not have simple solutions.

...I agree with this. It is nice to put more power in the hands of the
clients, but we have to balance that with other issues like server
resource use. The approach so far has been to implement the simplest and
most efficient operations at the client-server level, and then have the
client build local features on top of that. So in this case, probably
requesting that _no_ trees are sent in the initial clone, and then
faulting them in as the client explores the tree using its own local
sparse definition. And I think that mostly works now.

Though I admit I do not keep a close watch on the status of
partial-checkout features. I mostly always cared about it from the
server provider angle. ;)

-Peff
ZheNing Hu Aug. 12, 2022, 3:40 p.m. UTC | #5
Derrick Stolee <derrickstolee@github.com> 于2022年8月9日周二 21:37写道:
>
> On 8/8/2022 12:15 PM, Junio C Hamano wrote:
> > "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> From: ZheNing Hu <adlternative@gmail.com>
> >>
> >> We already have `--filter=sparse:oid=<oid>`, which can be used
> >> to clone a repository with only the objects that meet the filter
> >> rules in the file corresponding to <oid> on the git server.
> >> However, it can only read filter rules that have previously been
> >> recorded on the git server.
> >
> > Was the reason why we have the "we limit to an object we already
> > have" restriction that we didn't want to blindly use a piece of
> > uncontrolled arbitrary end-user data here?  Just wondering.
>
> One of the ideas here was to limit the opportunity of sending an
> arbitrary set of data over the Git protocol and avoid exactly the
> scenario you mention.
>
> Another was that it is incredibly expensive to compute the set of
> reachable objects within an arbitrary sparse-checkout definition,
> since it requires walking trees (bitmaps do not help here). This
> is why (to my knowledge) no Git hosting service currently supports
> this mechanism at scale. At minimum, using the stored OID would
> allow the host to keep track of these pre-defined sets and do some
> precomputing of reachable data using bitmaps to keep clones and
> fetches reasonable at all.
>

How about allowing only some simpler filter rules?

e.g. https://github.com/derrickstolee/sparse-checkout-example

User A can use --filter="sparse:buffer=client" to download the client/
directory, and user B can use --filter="sparse:buffer=service/list" to
download only service/list.

cat >filterspec <<-EOF &&
web
service
EOF
User C can use --filter="sparse:buffer=`cat filterspec`" to download
web/ and service/.

cat >filterspec <<-EOF &&
service
!service/list
EOF
But user D cannot use --filter="sparse:buffer=`cat filterspec`" to
download service/ without service/list.

I guess many users can benefit from this...

> The other side of the issue is that we do not have a good solution
> for resolving how to change this filter in the future, in case the
> user wants to expand their sparse-checkout definition and update
> their partial clone filter.
>

I guess we don't really need to maintain this "partial clone filter": we
could even reuse the sparse-checkout rules after the first partial clone.
Maybe we should write the first partial-clone filter rules to
.git/info/sparse-checkout (only when --sparse is used in git clone?).

> There used to be a significant issue where a 'git checkout'
> would fault in a lot of missing trees because the index needed to
> reference the files outside of the sparse-checkout definition. Now
> that the sparse index exists, this is less of an impediment, but
> it can still cause some pain.
>

Agree.

> At this moment, I think path-scoped filters have a lot of problems
> that need solving before they can be used effectively in the wild.
> I would prefer that we solve those problems before making the
> feature more complicated. That's a tall ask, since these problems
> do not have simple solutions.
>

Could you tell me where the problems are? I can start to deal with them :)

> Thanks,
> -Stolee

Thanks.
ZheNing Hu
ZheNing Hu Aug. 12, 2022, 3:49 p.m. UTC | #6
Jeff King <peff@peff.net> 于2022年8月11日周四 05:15写道:
>
> On Tue, Aug 09, 2022 at 09:37:09AM -0400, Derrick Stolee wrote:
>
> > > Was the reason why we have the "we limit to an object we already
> > > have" restriction that we didn't want to blindly use a piece of
> > > uncontrolled arbitrary end-user data here?  Just wondering.
> >
> > One of the ideas here was to limit the opportunity of sending an
> > arbitrary set of data over the Git protocol and avoid exactly the
> > scenario you mention.
>
> One other implication here is that the filter spec is sent inside of a
> pkt-line.  So the implementation here is limiting us to 64kb. That may
> sound like a lot for simple specs, but I imagine in big repos they can
> possibly get pretty complex.
>
> That would be fixable with a protocol extension to take the data over
> multiple pkt-lines.
>

This sounds very scary: a filter rules file of (more than) 64KB...
If the filter is really that big, I think the server will be really
slow to parse it.

> That said...
>
> > At this moment, I think path-scoped filters have a lot of problems
> > that need solving before they can be used effectively in the wild.
> > I would prefer that we solve those problems before making the
> > feature more complicated. That's a tall ask, since these problems
> > do not have simple solutions.
>
> ...I agree with this. It is nice to put more power in the hands of the
> clients, but we have to balance that with other issues like server
> resource use. The approach so far has been to implement the simplest and
> most efficient operations at the client-server level, and then have the
> client build local features on top of that. So in this case, probably
> requesting that _no_ trees are sent in the initial clone, and then
> faulting them in as the client explores the tree using its own local
> sparse definition. And I think that mostly works now.
>

Agree. But we have to fetch these blobs one by one after the partial
clone, so why not reduce some of that extra network overhead if we can
get the blobs that are *most* needed in the first partial clone, right?

> Though I admit I do not keep a close watch on the status of
> partial-checkout features. I mostly always cared about it from the
> server provider angle. ;)
>
> -Peff

Thanks.

ZheNing Hu
Jeff King Aug. 14, 2022, 6:54 a.m. UTC | #7
On Fri, Aug 12, 2022 at 11:49:06PM +0800, ZheNing Hu wrote:

> > ...I agree with this. It is nice to put more power in the hands of the
> > clients, but we have to balance that with other issues like server
> > resource use. The approach so far has been to implement the simplest and
> > most efficient operations at the client-server level, and then have the
> > client build local features on top of that. So in this case, probably
> > requesting that _no_ trees are sent in the initial clone, and then
> > faulting them in as the client explores the tree using its own local
> > sparse definition. And I think that mostly works now.
> 
> Agree. But we have to fetch these blobs one by one after partial clone,
> why not reduce some extra network overhead If we can get those blobs
> that are *most* needed in the first partial clone, right?

Ideally we wouldn't be getting them one-by-one, but rather batching
them. I'm not sure of the current state of the code. But we should at
least be able to batch by tree depth.
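
A sketch of that batching idea (promisor_remote_get_direct() does exist
in git.git for bulk-fetching missing objects; collecting one tree level
at a time here is only an illustration, not the current code):

        #include "git-compat-util.h"
        #include "oid-array.h"
        #include "promisor-remote.h"

        static void fetch_missing_level(struct repository *r,
                                        const struct oid_array *level)
        {
                /*
                 * Ask the promisor remote for all missing blobs at this
                 * tree depth in one round trip, instead of one per blob.
                 */
                if (level->nr)
                        promisor_remote_get_direct(r, level->oid,
                                                   (int)level->nr);
        }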

You're right that we'll never be able to be as efficient as a server to
whom we can say "I care about these paths", though.

-Peff
ZheNing Hu Aug. 26, 2022, 5:10 a.m. UTC | #8
Derrick Stolee <derrickstolee@github.com> 于2022年8月9日周二 21:37写道:
>
> On 8/8/2022 12:15 PM, Junio C Hamano wrote:
> > "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> From: ZheNing Hu <adlternative@gmail.com>
> >>
> >> We already have `--filter=sparse:oid=<oid>`, which can be used
> >> to clone a repository with only the objects that meet the filter
> >> rules in the file corresponding to <oid> on the git server.
> >> However, it can only read filter rules that have previously been
> >> recorded on the git server.
> >
> > Was the reason why we have the "we limit to an object we already
> > have" restriction that we didn't want to blindly use a piece of
> > uncontrolled arbitrary end-user data here?  Just wondering.
>
> One of the ideas here was to limit the opportunity of sending an
> arbitrary set of data over the Git protocol and avoid exactly the
> scenario you mention.
>

I find that sparse-checkout uses a "cone" mode to restrict the set of
allowed patterns, which achieves a performance improvement. I don't know
if we can use this mode here? From a brief look, it seems that "cone"
mode ensures that every filter rule we add is a directory and does not
contain the special pattern characters '!', '?', '*', '[', ']'. But if
we transport the filter rule to the git server, the git server cannot
check whether the filter rule is a directory, because it involves paths
in multiple commits. e.g. in 9e6f67, "test" can be a directory, but in
e5e154e, "test" can be a file... I don't know how to solve this
problem...

> Another was that it is incredibly expensive to compute the set of
> reachable objects within an arbitrary sparse-checkout definition,
> since it requires walking trees (bitmaps do not help here). This
> is why (to my knowledge) no Git hosting service currently supports
> this mechanism at scale. At minimum, using the stored OID would
> allow the host to keep track of these pre-defined sets and do some
> precomputing of reachable data using bitmaps to keep clones and
> fetches reasonable at all.
>
> The other side of the issue is that we do not have a good solution
> for resolving how to change this filter in the future, in case the
> user wants to expand their sparse-checkout definition and update
> their partial clone filter.
>
> There used to be a significant issue where a 'git checkout'
> would fault in a lot of missing trees because the index needed to
> reference the files outside of the sparse-checkout definition. Now
> that the sparse index exists, this is less of an impediment, but
> it can still cause some pain.
>
> At this moment, I think path-scoped filters have a lot of problems
> that need solving before they can be used effectively in the wild.
> I would prefer that we solve those problems before making the
> feature more complicated. That's a tall ask, since these problems
> do not have simple solutions.
>

I have an idea: if we can make such path-scoped filters work, we can
combine sparse-checkout with them... Maybe one day, users can use:

git clone --sparse --filter="sparse:buffer=dir" xxx.git

to get a repo that already has the sparse-checkout result...
Needless to say, this is very tempting.

> Thanks,
> -Stolee

Thanks,
ZheNing Hu

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 195e74eec63..76410cdafd6 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -913,6 +913,10 @@  specification contained in the blob (or blob-expression) '<blob-ish>'
 to omit blobs that would not be required for a sparse checkout on
 the requested refs.
 +
+The form '--filter=sparse:buffer=<spec>' uses a sparse-checkout
+specification contained in the buffer to omit blobs that would not
+be required for a sparse checkout on the requested refs.
++
 The form '--filter=tree:<depth>' omits all blobs and trees whose depth
 from the root tree is >= <depth> (minimum depth if an object is located
 at multiple depths in the commits traversed). <depth>=0 will not include
diff --git a/dir.c b/dir.c
index d7cfb08e441..5c438d2e730 100644
--- a/dir.c
+++ b/dir.c
@@ -1037,10 +1037,6 @@  static void invalidate_directory(struct untracked_cache *uc,
 		dir->dirs[i]->recurse = 0;
 }
 
-static int add_patterns_from_buffer(char *buf, size_t size,
-				    const char *base, int baselen,
-				    struct pattern_list *pl);
-
 /* Flags for add_patterns() */
 #define PATTERN_NOFOLLOW (1<<0)
 
@@ -1123,7 +1119,7 @@  static int add_patterns(const char *fname, const char *base, int baselen,
 	return 0;
 }
 
-static int add_patterns_from_buffer(char *buf, size_t size,
+int add_patterns_from_buffer(char *buf, size_t size,
 				    const char *base, int baselen,
 				    struct pattern_list *pl)
 {
diff --git a/dir.h b/dir.h
index 7bc862030cf..7dfc49cf83a 100644
--- a/dir.h
+++ b/dir.h
@@ -440,6 +440,9 @@  void add_patterns_from_file(struct dir_struct *, const char *fname);
 int add_patterns_from_blob_to_list(struct object_id *oid,
 				   const char *base, int baselen,
 				   struct pattern_list *pl);
+int add_patterns_from_buffer(char *buf, size_t size,
+				    const char *base, int baselen,
+				    struct pattern_list *pl);
 void parse_path_pattern(const char **string, int *patternlen, unsigned *flags, int *nowildcardlen);
 void add_pattern(const char *string, const char *base,
 		 int baselen, struct pattern_list *pl, int srcpos);
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 4b25287886d..0afb066e795 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -29,6 +29,8 @@  const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 		return "tree";
 	case LOFC_SPARSE_OID:
 		return "sparse:oid";
+	case LOFC_SPARSE_BUFFER:
+		return "sparse:buffer";
 	case LOFC_OBJECT_TYPE:
 		return "object:type";
 	case LOFC_COMBINE:
@@ -84,6 +86,11 @@  int gently_parse_list_objects_filter(
 		}
 		return 1;
 
+	} else if (skip_prefix(arg, "sparse:buffer=", &v0)) {
+		filter_options->spec_buffer = xstrdup(v0);
+		filter_options->choice = LOFC_SPARSE_BUFFER;
+		return 0;
+
 	} else if (skip_prefix(arg, "object:type=", &v0)) {
 		int type = type_from_string_gently(v0, strlen(v0), 1);
 		if (type < 0) {
@@ -338,6 +345,7 @@  void list_objects_filter_release(
 		return;
 	string_list_clear(&filter_options->filter_spec, /*free_util=*/0);
 	free(filter_options->sparse_oid_name);
+	free(filter_options->spec_buffer);
 	for (sub = 0; sub < filter_options->sub_nr; sub++)
 		list_objects_filter_release(&filter_options->sub[sub]);
 	free(filter_options->sub);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index ffc02d77e76..21d9ed4ef5d 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -14,6 +14,7 @@  enum list_objects_filter_choice {
 	LOFC_BLOB_LIMIT,
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
+	LOFC_SPARSE_BUFFER,
 	LOFC_OBJECT_TYPE,
 	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
@@ -58,6 +59,8 @@  struct list_objects_filter_options {
 	unsigned long tree_exclude_depth;
 	enum object_type object_type;
 
+	char *spec_buffer;
+
 	/* LOFC_COMBINE values */
 
 	/* This array contains all the subfilters which this filter combines. */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 1c1ee3d1bb1..17127c167d4 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -545,6 +545,32 @@  static void filter_sparse_oid__init(
 	filter->free_fn = filter_sparse_free;
 }
 
+static void filter_sparse_file_path__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter *filter)
+{
+	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
+	struct strbuf strbuf = STRBUF_INIT;
+	char *buf;
+	size_t size;
+
+	strbuf_addf(&strbuf, "%s\n", filter_options->spec_buffer);
+	buf = strbuf_detach(&strbuf, &size);
+	if (add_patterns_from_buffer(buf, size, "", 0, &d->pl) < 0)
+		die(_("unable to parse sparse filter data: %s"),
+		    filter_options->spec_buffer);
+	strbuf_release(&strbuf);
+
+	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
+	d->array_frame[d->nr].default_match = 0; /* default to include */
+	d->array_frame[d->nr].child_prov_omit = 0;
+	d->nr++;
+
+	filter->filter_data = d;
+	filter->filter_object_fn = filter_sparse;
+	filter->free_fn = filter_sparse_free;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -766,6 +792,7 @@  static filter_init_fn s_filters[] = {
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
+	filter_sparse_file_path__init,
 	filter_object_type__init,
 	filter_combine__init,
 };
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 4a3778d04a8..14c435f2f4b 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -421,6 +421,9 @@  test_expect_success 'setup src repo for sparse filter' '
 	git -C sparse-src config --local uploadpack.allowanysha1inwant 1 &&
 	test_commit -C sparse-src one &&
 	test_commit -C sparse-src two &&
+	mkdir sparse-src/three &&
+	test_commit -C sparse-src/three four &&
+	test_commit -C sparse-src/three five &&
 	echo /one.t >sparse-src/only-one &&
 	git -C sparse-src add . &&
 	git -C sparse-src commit -m "add sparse checkout files"
@@ -451,6 +454,86 @@  test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
 	test_i18ngrep "unable to parse sparse filter data in" err
 '
 
+test_expect_success 'partial clone with sparse:buffer filter with single file succeeds' '
+	rm -rf dst.git &&
+	git clone --no-local --bare \
+		  --filter=sparse:buffer=one.t \
+		  sparse-src dst.git &&
+	(
+		cd dst.git &&
+		git rev-list --objects --missing=print HEAD >out &&
+		grep "^$(git rev-parse HEAD:one.t)" out &&
+		grep "^?$(git rev-parse HEAD:two.t)" out &&
+		grep "^?$(git rev-parse HEAD:three/four.t)" out &&
+		grep "^?$(git rev-parse HEAD:three/five.t)" out
+	)
+'
+
+test_expect_success 'partial clone with sparse:buffer filter with single dir succeeds' '
+	rm -rf dst.git &&
+	git clone --no-local --bare \
+		  --filter=sparse:buffer=three \
+		  sparse-src dst.git &&
+	(
+		cd dst.git &&
+		git rev-list --objects --missing=print HEAD >out &&
+		grep "^?$(git rev-parse HEAD:one.t)" out &&
+		grep "^?$(git rev-parse HEAD:two.t)" out &&
+		grep "^$(git rev-parse HEAD:three/four.t)" out &&
+		grep "^$(git rev-parse HEAD:three/five.t)" out
+	)
+'
+
+test_expect_success 'partial clone with sparse:buffer filter with filterspec file succeeds' '
+	test_write_lines one.t three > filterspec &&
+	rm -rf dst.git &&
+	git clone --no-local --bare \
+		  --filter="sparse:buffer=`cat filterspec`" \
+		  sparse-src dst.git &&
+	(
+		cd dst.git &&
+		git rev-list --objects --missing=print HEAD >out &&
+		grep "^$(git rev-parse HEAD:one.t)" out &&
+		grep "^?$(git rev-parse HEAD:two.t)" out &&
+		grep "^$(git rev-parse HEAD:three/four.t)" out &&
+		grep "^$(git rev-parse HEAD:three/five.t)" out
+	)
+'
+
+test_expect_success 'partial clone with sparse:buffer filter with filterspec file and special character' '
+	cat >filterspec <<-EOF &&
+	three
+	!three/four.t
+	EOF
+	rm -rf dst.git &&
+	git clone --no-local --bare \
+		  --filter="sparse:buffer=`cat filterspec`" \
+		  sparse-src dst.git &&
+	(
+		cd dst.git &&
+		git rev-list --objects --missing=print HEAD >out &&
+		grep "^?$(git rev-parse HEAD:one.t)" out &&
+		grep "^?$(git rev-parse HEAD:two.t)" out &&
+		grep "^?$(git rev-parse HEAD:three/four.t)" out &&
+		grep "^$(git rev-parse HEAD:three/five.t)" out
+	)
+'
+
+test_expect_success 'partial clone with sparse:buffer filter with unknown filterspec' '
+	rm -rf dst.git &&
+	git clone --no-local --bare \
+		  --filter=sparse:buffer=unknown \
+		  sparse-src dst.git &&
+	(
+		cd dst.git &&
+		git rev-list --objects --missing=print HEAD >out &&
+		grep "^?$(git rev-parse HEAD:one.t)" out &&
+		grep "^?$(git rev-parse HEAD:two.t)" out &&
+		grep "^?$(git rev-parse HEAD:three/four.t)" out &&
+		grep "^?$(git rev-parse HEAD:three/five.t)" out
+	)
+'
+
 setup_triangle () {
 	rm -rf big-blob.txt server client promisor-remote &&