mbox series

[v4,00/15] Harden the sparse-checkout builtin

Message ID pull.513.v4.git.1580501775.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Harden the sparse-checkout builtin | expand

Message

John Passaro via GitGitGadget Jan. 31, 2020, 8:16 p.m. UTC
This series is based on ds/sparse-list-in-cone-mode.

This series attempts to clean up some rough edges in the sparse-checkout
feature, especially around the cone mode.

Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
clone --sparse" option when using a URL instead of a local path. This is
fixed and properly tested here.

Also, let's improve Git's response to these more complicated scenarios:

 1. Running "git sparse-checkout init" in a worktree would complain because
    the "info" dir doesn't exist.
 2. Tracked paths that include "*" and "\" in their filenames.
 3. If a user edits the sparse-checkout file to have non-cone pattern, such
    as "**" anywhere or "*" in the wrong place, then we should respond
    appropriately. That is: warn that the patterns are not cone-mode, then
    revert to the old logic.

Updates in V2:

 * Added C-style quoting to the output of "git sparse-checkout list" in cone
   mode.
 * Improved documentation.
 * Responded to most style feedback. Hopefully I didn't miss anything.
 * I was lingering on this a little to see if I could also fix the issue
   raised in [1], but I have not figured that one out, yet.

Update in V3:

 * Input now uses Peff's recommended pattern: unquote C-style strings over
   stdin and otherwise do not un-escape input.

[1] 
https://lore.kernel.org/git/062301d5d0bc$c3e17760$4ba46620$@Frontier.com/

Thanks, -Stolee

Derrick Stolee (14):
  t1091: use check_files to reduce boilerplate
  t1091: improve here-docs
  sparse-checkout: create leading directories
  clone: fix --sparse option with URLs
  sparse-checkout: cone mode does not recognize "**"
  sparse-checkout: detect short patterns
  sparse-checkout: warn on globs in cone patterns
  sparse-checkout: properly match escaped characters
  sparse-checkout: write escaped patterns in cone mode
  sparse-checkout: unquote C-style strings over --stdin
  sparse-checkout: use C-style quotes in 'list' subcommand
  sparse-checkout: escape all glob characters on write
  sparse-checkout: improve docs around 'set' in cone mode
  sparse-checkout: fix cone mode behavior mismatch

Jeff King (1):
  sparse-checkout: fix documentation typo for core.sparseCheckoutCone

 Documentation/git-sparse-checkout.txt |  19 +-
 builtin/clone.c                       |   2 +-
 builtin/sparse-checkout.c             |  48 +++-
 dir.c                                 |  79 +++++-
 t/t1091-sparse-checkout-builtin.sh    | 352 +++++++++++++++-----------
 unpack-trees.c                        |   2 +-
 6 files changed, 346 insertions(+), 156 deletions(-)


base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/513

Range-diff vs v3:

  1:  1cc825412f =  1:  1cc825412f t1091: use check_files to reduce boilerplate
  2:  b7a6ad145a =  2:  b7a6ad145a t1091: improve here-docs
  3:  5497ad8778 =  3:  5497ad8778 sparse-checkout: create leading directories
  4:  4991a51f6d =  4:  4991a51f6d clone: fix --sparse option with URLs
  5:  ae78c3069b =  5:  ae78c3069b sparse-checkout: fix documentation typo for core.sparseCheckoutCone
  6:  2ad4d3e467 =  6:  2ad4d3e467 sparse-checkout: cone mode does not recognize "**"
  7:  aace064510 =  7:  aace064510 sparse-checkout: detect short patterns
  8:  d2a510a3bb !  8:  66caabef5f sparse-checkout: warn on incorrect '*' in patterns
     @@ -1,6 +1,6 @@
      Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    sparse-checkout: warn on incorrect '*' in patterns
     +    sparse-checkout: warn on globs in cone patterns
      
          In cone mode, the sparse-checkout commmand will write patterns that
          allow faster pattern matching. This matching only works if the patterns
     @@ -9,10 +9,10 @@
          cone mode matching to fail.
      
          The cone mode patterns may end in "/*" but otherwise an un-escaped
     -    asterisk is invalid. Add checks to disable cone mode when seeing these
     -    values.
     +    asterisk or other glob character is invalid. Add checks to disable
     +    cone mode when seeing these values.
      
     -    A later change will properly handle escaped asterisks.
     +    A later change will properly handle escaped globs.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -43,16 +43,23 @@
      +	next = given->pattern + 2;
      +
      +	while (*cur) {
     -+		/* We care about *cur == '*' */
     -+		if (*cur != '*')
     ++		/* Watch for glob characters '*', '\', '[', '?' */
     ++		if (!is_glob_special(*cur))
      +			goto increment;
      +
      +		/* But only if *prev != '\\' */
      +		if (*prev == '\\')
      +			goto increment;
      +
     ++		/* But allow the initial '\' */
     ++		if (*cur == '\\' &&
     ++		    is_glob_special(*next))
     ++			goto increment;
     ++
      +		/* But a trailing '/' then '*' is fine */
     -+		if (*prev == '/' && *next == 0)
     ++		if (*prev == '/' &&
     ++		    *cur == '*' &&
     ++		    *next == 0)
      +			goto increment;
      +
      +		/* Not a cone pattern. */
     @@ -94,6 +101,18 @@
      +	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
      +'
      +
     ++test_expect_success 'pattern-checks: contained glob characters' '
     ++	for c in "[a]" "\\" "?" "*"
     ++	do
     ++		cat >repo/.git/info/sparse-checkout <<-EOF &&
     ++		/*
     ++		!/*/
     ++		something$c-else/
     ++		EOF
     ++		check_read_tree_errors repo "a" "disabling cone pattern matching"
     ++	done
     ++'
     ++
      +test_expect_success 'pattern-checks: escaped "*"' '
      +	cat >repo/.git/info/sparse-checkout <<-\EOF &&
      +	/*
  9:  9ea69e9069 !  9:  4c86d01f0e sparse-checkout: properly match escaped characters
     @@ -23,6 +23,7 @@
      +static char *dup_and_filter_pattern(const char *pattern)
      +{
      +	char *set, *read;
     ++	size_t count  = 0;
      +	char *result = xstrdup(pattern);
      +
      +	set = result;
     @@ -37,11 +38,14 @@
      +
      +		set++;
      +		read++;
     ++		count++;
      +	}
      +	*set = 0;
      +
     -+	if (*(read - 2) == '/' && *(read - 1) == '*')
     -+		*(read - 2) = 0;
     ++	if (count > 2 &&
     ++	    *(set - 1) == '*' &&
     ++	    *(set - 2) == '/')
     ++		*(set - 2) = 0;
      +
      +	return result;
      +}
     @@ -73,7 +77,7 @@
       --- a/t/t1091-sparse-checkout-builtin.sh
       +++ b/t/t1091-sparse-checkout-builtin.sh
      @@
     - 	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
     + 	done
       '
       
      -test_expect_success 'pattern-checks: escaped "*"' '
     @@ -96,6 +100,7 @@
       	!/*/
      -	/does\*not\*exist/
      +	/zbad\\dir/
     ++	!/zbad\\dir/*/
      +	/zdoes\*not\*exist/
      +	/zdoes\*exist/
       	EOF
 10:  e2f9afc70c ! 10:  0b9346f67b sparse-checkout: write escaped patterns in cone mode
     @@ -9,26 +9,12 @@
          However, in cone mode we expect a list of paths to directories and then
          we convert those into patterns.
      
     -    Even more specifically, the goal is to always allow the following from
     -    the root of a repo:
     -
     -      git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
     -
     -    The ls-tree command provides directory names with an unescaped asterisk.
     -    It also quotes the directories that contain an escaped backslash. We
     -    must remove these quotes, then keep the escaped backslashes.
     -
          However, there is some care needed for the timing of these escapes. The
          in-memory pattern list is used to update the working directory before
          writing the patterns to disk. Thus, we need the command to have the
          unescaped names in the hashsets for the cone comparisons, then escape
          the patterns later.
      
     -    Use unquote_c_style() when parsing lines from stdin. Command-line
     -    arguments will be parsed as-is, assuming the user can do the correct
     -    level of escaping from their environment to match the exact directory
     -    names.
     -
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
     @@ -89,29 +75,6 @@
       	}
       }
       
     -@@
     - 		pl.use_cone_patterns = 1;
     - 
     - 		if (set_opts.use_stdin) {
     --			while (!strbuf_getline(&line, stdin))
     -+			struct strbuf unquoted = STRBUF_INIT;
     -+			while (!strbuf_getline(&line, stdin)) {
     -+				if (line.buf[0] == '"') {
     -+					strbuf_setlen(&unquoted, 0);
     -+					if (unquote_c_style(&unquoted, line.buf, NULL))
     -+						die(_("unable to unquote C-style string '%s'"),
     -+						line.buf);
     -+
     -+					strbuf_swap(&unquoted, &line);
     -+				}
     -+
     - 				strbuf_to_cone_pattern(&line, &pl);
     -+			}
     -+
     -+			strbuf_release(&unquoted);
     - 		} else {
     - 			for (i = 0; i < argc; i++) {
     - 				strbuf_setlen(&line, 0);
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
     @@ -131,29 +94,18 @@
       	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
       	git -C escaped sparse-checkout init --cone &&
      -	cat >escaped/.git/info/sparse-checkout <<-\EOF &&
     -+	git -C escaped sparse-checkout set zbad\\dir "zdoes*not*exist" "zdoes*exist" &&
     ++	git -C escaped sparse-checkout set zbad\\dir/bogus "zdoes*not*exist" "zdoes*exist" &&
      +	cat >expect <<-\EOF &&
       	/*
       	!/*/
       	/zbad\\dir/
     -+	/zdoes\*exist/
     - 	/zdoes\*not\*exist/
     -+	EOF
     -+	test_cmp expect escaped/.git/info/sparse-checkout &&
     -+	check_read_tree_errors escaped "a zbad\\dir zdoes*exist" &&
     -+	git -C escaped ls-tree -d --name-only HEAD | git -C escaped sparse-checkout set --stdin &&
     -+	cat >expect <<-\EOF &&
     -+	/*
     -+	!/*/
     -+	/deep/
     -+	/folder1/
     -+	/folder2/
     -+	/zbad\\dir/
     + 	!/zbad\\dir/*/
     +-	/zdoes\*not\*exist/
     ++	/zbad\\dir/bogus/
       	/zdoes\*exist/
     ++	/zdoes\*not\*exist/
       	EOF
     --	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
      +	test_cmp expect escaped/.git/info/sparse-checkout &&
     -+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist"
     + 	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
       '
       
     - test_done
  -:  ---------- > 11:  9f682e6076 sparse-checkout: unquote C-style strings over --stdin
 11:  ec714a4cf0 = 12:  e2c6f85617 sparse-checkout: use C-style quotes in 'list' subcommand
  -:  ---------- > 13:  54be8e89eb sparse-checkout: escape all glob characters on write
 12:  1867746d97 = 14:  3dd8f97b3a sparse-checkout: improve docs around 'set' in cone mode
  -:  ---------- > 15:  5e9fcce75f sparse-checkout: fix cone mode behavior mismatch

Comments

Elijah Newren Jan. 31, 2020, 8:36 p.m. UTC | #1
On Fri, Jan 31, 2020 at 12:16 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series is based on ds/sparse-list-in-cone-mode.
>
> This series attempts to clean up some rough edges in the sparse-checkout
> feature, especially around the cone mode.
>
> Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
> clone --sparse" option when using a URL instead of a local path. This is
> fixed and properly tested here.
>
> Also, let's improve Git's response to these more complicated scenarios:
>
>  1. Running "git sparse-checkout init" in a worktree would complain because
>     the "info" dir doesn't exist.
>  2. Tracked paths that include "*" and "\" in their filenames.
>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>     as "**" anywhere or "*" in the wrong place, then we should respond
>     appropriately. That is: warn that the patterns are not cone-mode, then
>     revert to the old logic.
>
> Updates in V2:
>
>  * Added C-style quoting to the output of "git sparse-checkout list" in cone
>    mode.
>  * Improved documentation.
>  * Responded to most style feedback. Hopefully I didn't miss anything.
>  * I was lingering on this a little to see if I could also fix the issue
>    raised in [1], but I have not figured that one out, yet.
>
> Update in V3:
>
>  * Input now uses Peff's recommended pattern: unquote C-style strings over
>    stdin and otherwise do not un-escape input.

...and updates in V4 are?  (I looked over the range-diff which
definitely helps, but a summary would still be nice.)
Derrick Stolee Feb. 3, 2020, 2:09 p.m. UTC | #2
On 1/31/2020 3:36 PM, Elijah Newren wrote:
> On Fri, Jan 31, 2020 at 12:16 PM Derrick Stolee via GitGitGadget
>> Update in V3:
>>
>>  * Input now uses Peff's recommended pattern: unquote C-style strings over
>>    stdin and otherwise do not un-escape input.
> 
> ...and updates in V4 are?  (I looked over the range-diff which
> definitely helps, but a summary would still be nice.)

Sorry! I definitely should have double-checked before sending.

Updates in V4:

* Special-character checking now considers all glob characters
  ( '[', '*', '\\', '?' ) See Patches 8 and 13.

* Patch 10 is is now split into two (Patches 10 and 11), to properly
  escape patterns and to unquote C-style strings. 

* The file/directory path bug reported in [1] is fixed in Patch 15.

Thanks,
-Stolee

[1] https://lore.kernel.org/git/CADSBhNbbO=aq-Oo2MpzDMN2VAX4m6f9Jb-eCtVVX1NfWKE9zJw@mail.gmail.com/
Taylor Blau Feb. 8, 2020, 11:32 p.m. UTC | #3
Hi Derrick,

On Mon, Feb 03, 2020 at 09:09:54AM -0500, Derrick Stolee wrote:
> On 1/31/2020 3:36 PM, Elijah Newren wrote:
> > On Fri, Jan 31, 2020 at 12:16 PM Derrick Stolee via GitGitGadget
> >> Update in V3:
> >>
> >>  * Input now uses Peff's recommended pattern: unquote C-style strings over
> >>    stdin and otherwise do not un-escape input.
> >
> > ...and updates in V4 are?  (I looked over the range-diff which
> > definitely helps, but a summary would still be nice.)
>
> Sorry! I definitely should have double-checked before sending.
>
> Updates in V4:
>
> * Special-character checking now considers all glob characters
>   ( '[', '*', '\\', '?' ) See Patches 8 and 13.
>
> * Patch 10 is is now split into two (Patches 10 and 11), to properly
>   escape patterns and to unquote C-style strings.
>
> * The file/directory path bug reported in [1] is fixed in Patch 15.

Thanks for including these. I haven't been super active in the earlier
rounds of review on this series, but I gave a thorough look to what you
have in v4, and it all looks good to me.

Please consider this:

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

> Thanks,
> -Stolee
>
> [1] https://lore.kernel.org/git/CADSBhNbbO=aq-Oo2MpzDMN2VAX4m6f9Jb-eCtVVX1NfWKE9zJw@mail.gmail.com/

Thanks,
Taylor
Junio C Hamano Feb. 9, 2020, 5:27 p.m. UTC | #4
Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Feb 03, 2020 at 09:09:54AM -0500, Derrick Stolee wrote:
> ...
> Thanks for including these. I haven't been super active in the earlier
> rounds of review on this series, but I gave a thorough look to what you
> have in v4, and it all looks good to me.
>
> Please consider this:
>
>   Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks, all.