mbox series

[0/8] Harden the sparse-checkout builtin

Message ID pull.513.git.1579029962.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Harden the sparse-checkout builtin | expand

Message

Linus Arver via GitGitGadget Jan. 14, 2020, 7:25 p.m. UTC
This series is based on ds/sparse-list-in-cone-mode.

This series attempts to clean up some rough edges in the sparse-checkout
feature, especially around the cone mode.

Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
clone --sparse" option when using a URL instead of a local path. This is
fixed and properly tested here.

Also, let's improve Git's response to these more complicated scenarios:

 1. Running "git sparse-checkout init" in a worktree would complain because
    the "info" dir doesn't exist.
 2. Tracked paths that include "*" and "" in their filenames.
 3. If a user edits the sparse-checkout file to have non-cone pattern, such
    as "*" anywhere or "" in the wrong place, then we should respond
    appropriately. That is: warn that the patterns are not cone-mode, then
    revert to the old logic.

Thanks, -Stolee

Derrick Stolee (8):
  t1091: use check_files to reduce boilerplate
  sparse-checkout: create leading directories
  clone: fix --sparse option with URLs
  sparse-checkout: cone mode does not recognize "**"
  sparse-checkout: detect short patterns
  sparse-checkout: warn on incorrect '*' in patterns
  sparse-checkout: properly match escaped characters
  sparse-checkout: write escaped patterns in cone mode

 builtin/clone.c                    |   2 +-
 builtin/sparse-checkout.c          |  52 ++++-
 dir.c                              |  69 ++++++-
 dir.h                              |   1 +
 t/t1091-sparse-checkout-builtin.sh | 320 ++++++++++++++++-------------
 5 files changed, 296 insertions(+), 148 deletions(-)


base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/513

Comments

Taylor Blau Jan. 14, 2020, 7:34 p.m. UTC | #1
Hi Stolee,

On Tue, Jan 14, 2020 at 07:25:54PM +0000, Derrick Stolee via GitGitGadget wrote:
> This series is based on ds/sparse-list-in-cone-mode.
>
> This series attempts to clean up some rough edges in the sparse-checkout
> feature, especially around the cone mode.
>
> Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
> clone --sparse" option when using a URL instead of a local path. This is
> fixed and properly tested here.

I haven't had a chance to look at the other patches (besides the one
that I have already reviewed on- and off-list), so take my comments here
with a grain of salt.

It's too bad that 'git clone --sparse' isn't working with URLs in
v2.25.0, but it happens, and I don't think that this is a grave-enough
issue to warrant a new release. At least, since '--sparse' is new in
v2.25.0, we're not breaking existing workflows that have already relied
on it.

And, since sparse-checkout is still relatively niche (at, least for now
;-)), I think that this not handling cloning from URLs is fine until
v2.26.0.

Of course, if there's ever another need for v2.25.1, I don't think that
this would *hurt* to release then, which is to say that we definitely
should have these patches in a release, soon, but I don't think that
there's a terrible sense of urgency in the meantime.

> Also, let's improve Git's response to these more complicated scenarios:
>
>  1. Running "git sparse-checkout init" in a worktree would complain because
>     the "info" dir doesn't exist.
>  2. Tracked paths that include "*" and "" in their filenames.
>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>     as "*" anywhere or "" in the wrong place, then we should respond
>     appropriately. That is: warn that the patterns are not cone-mode, then
>     revert to the old logic.
>
> Thanks, -Stolee
>
> Derrick Stolee (8):
>   t1091: use check_files to reduce boilerplate
>   sparse-checkout: create leading directories
>   clone: fix --sparse option with URLs
>   sparse-checkout: cone mode does not recognize "**"
>   sparse-checkout: detect short patterns
>   sparse-checkout: warn on incorrect '*' in patterns
>   sparse-checkout: properly match escaped characters
>   sparse-checkout: write escaped patterns in cone mode
>
>  builtin/clone.c                    |   2 +-
>  builtin/sparse-checkout.c          |  52 ++++-
>  dir.c                              |  69 ++++++-
>  dir.h                              |   1 +
>  t/t1091-sparse-checkout-builtin.sh | 320 ++++++++++++++++-------------
>  5 files changed, 296 insertions(+), 148 deletions(-)
>
>
> base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/513
> --
> gitgitgadget
Thanks,
Taylor
Derrick Stolee Jan. 14, 2020, 7:44 p.m. UTC | #2
On 1/14/2020 2:34 PM, Taylor Blau wrote:
> Hi Stolee,
> 
> On Tue, Jan 14, 2020 at 07:25:54PM +0000, Derrick Stolee via GitGitGadget wrote:
>> This series is based on ds/sparse-list-in-cone-mode.
>>
>> This series attempts to clean up some rough edges in the sparse-checkout
>> feature, especially around the cone mode.
>>
>> Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
>> clone --sparse" option when using a URL instead of a local path. This is
>> fixed and properly tested here.
> 
> I haven't had a chance to look at the other patches (besides the one
> that I have already reviewed on- and off-list), so take my comments here
> with a grain of salt.
> 
> It's too bad that 'git clone --sparse' isn't working with URLs in
> v2.25.0, but it happens, and I don't think that this is a grave-enough
> issue to warrant a new release. At least, since '--sparse' is new in
> v2.25.0, we're not breaking existing workflows that have already relied
> on it.
> 
> And, since sparse-checkout is still relatively niche (at, least for now
> ;-)), I think that this not handling cloning from URLs is fine until
> v2.26.0.

Since we've already worked out the workaround to be:

	git clone --no-checkout <url> <dir>
	cd <dir>
	git sparse-checkout init --cone

there is no rush to fix this. Users _may_ discover the --sparse option
from the clone docs and complain, but we can point them to the above
directions for now.

> Of course, if there's ever another need for v2.25.1, I don't think that
> this would *hurt* to release then, which is to say that we definitely
> should have these patches in a release, soon, but I don't think that
> there's a terrible sense of urgency in the meantime.

I wouldn't complain to have patches 1-3 in an otherwise-warranted .1 release.

Thanks,
-Stolee
Jeff King Jan. 14, 2020, 9:31 p.m. UTC | #3
On Tue, Jan 14, 2020 at 02:44:50PM -0500, Derrick Stolee wrote:

> Since we've already worked out the workaround to be:
> 
> 	git clone --no-checkout <url> <dir>
> 	cd <dir>
> 	git sparse-checkout init --cone
> 
> there is no rush to fix this. Users _may_ discover the --sparse option
> from the clone docs and complain, but we can point them to the above
> directions for now.

Yes, but part of the beauty of the new system is just having to say
"--sparse" to make something useful happen. :)

> > Of course, if there's ever another need for v2.25.1, I don't think that
> > this would *hurt* to release then, which is to say that we definitely
> > should have these patches in a release, soon, but I don't think that
> > there's a terrible sense of urgency in the meantime.
> 
> I wouldn't complain to have patches 1-3 in an otherwise-warranted .1 release.

Agreed. 1-3 look obviously correct to me. The quoting bits I'm a little
more fuzzy on, just because I haven't really looked hard into cone mode.
Ditto for the "disable cone mode" checks.

My gut instinct is that you should be able to deduce whether the pattern
hashmap can be used purely from the patterns you see, and
core.sparseCheckoutCone would not be needed (and so if you violate the
rules by writing something manual, then it just gets slower; or maybe
we're even able to apply the literal cone-mode rules quickly and handle
the other separately). But it's much more likely I'm showing off my lack
of knowledge of the details of the problem space. You can feel free to
educate me, and/or roll your eyes and ignore me if this was already
discussed earlier.

By the way, I did notice this while poking about, which could go on top
(or hopefully be lumped in with the 1-3 as "obviously correct"):

-- >8 --

Subject: [PATCH] sparse-checkout: fix documentation typo for core.sparseCheckoutCone

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-sparse-checkout.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 974ade2238..285893b069 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -106,7 +106,7 @@ The full pattern set allows for arbitrary pattern matches and complicated
 inclusion/exclusion rules. These can result in O(N*M) pattern matches when
 updating the index, where N is the number of patterns and M is the number
 of paths in the index. To combat this performance issue, a more restricted
-pattern set is allowed when `core.spareCheckoutCone` is enabled.
+pattern set is allowed when `core.sparseCheckoutCone` is enabled.
 
 The accepted patterns in the cone pattern set are:
Junio C Hamano Jan. 15, 2020, 7:16 p.m. UTC | #4
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Also, let's improve Git's response to these more complicated scenarios:
>
>  1. Running "git sparse-checkout init" in a worktree would complain because
>     the "info" dir doesn't exist.
>  2. Tracked paths that include "*" and "" in their filenames.
>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>     as "*" anywhere or "" in the wrong place, then we should respond
>     appropriately. That is: warn that the patterns are not cone-mode, then
>     revert to the old logic.

It seems somebody ate a letter to make "<some letter>" into "" an
empty string, so I cannot quite grok the above list---two out of
three bullet points are not quite readable.
Derrick Stolee Jan. 15, 2020, 8:32 p.m. UTC | #5
On 1/15/2020 2:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Also, let's improve Git's response to these more complicated scenarios:
>>
>>  1. Running "git sparse-checkout init" in a worktree would complain because
>>     the "info" dir doesn't exist.
>>  2. Tracked paths that include "*" and "" in their filenames.
>>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>>     as "*" anywhere or "" in the wrong place, then we should respond
>>     appropriately. That is: warn that the patterns are not cone-mode, then
>>     revert to the old logic.
> 
> It seems somebody ate a letter to make "<some letter>" into "" an
> empty string, so I cannot quite grok the above list---two out of
> three bullet points are not quite readable.

In 2., the "" should be "\".

In 3., the "*" and "" should be "**" and "*" respectively.

These are things that are being collapsed by GitHub's markdown
processing. Sorry that this affects GitGitGadget's cover letter.

By escaping appropriately, these show up correctly and hopefully
will be fixed in v2.

-Stolee