diff mbox series

unpack-trees: fix sparse directory recursion check

Message ID pull.1344.git.1662066153644.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 037f8ea6d961a52deaa8df5150049850a53d61ac
Headers show
Series unpack-trees: fix sparse directory recursion check | expand

Commit Message

Victoria Dye Sept. 1, 2022, 9:02 p.m. UTC
From: Victoria Dye <vdye@github.com>

Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
exists in the list of tree(s) being unpacked in 'unpack_callback()'.

Currently, 'is_sparse_directory_entry()' is called with the first
'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
this entry may be empty even when other elements of 'names' are not (such as
when switching from an orphan branch back to a "normal" branch). As a
result, 'is_sparse_directory_entry()' could incorrectly indicate that a
sparse directory is *not* actually sparse because the name of the index
entry does not match the (empty) 'name_entry' path.

Fix the issue by using the existing 'name_entry *p' value in
'unpack_callback()', which points to the first non-empty entry in 'names'.
Because 'p' is 'const', also update 'is_sparse_directory_entry()'s
'name_entry *' argument to be 'const'.

Finally, add a regression test case.

Reported-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
    unpack-trees: fix sparse directory recursion check
    
    This issue was found when the updates from v2.37.3 introduced a test
    failure in a downstream test suite.
    
    The issue stems from the fact that, before v2.37.3, 'unpack_callback()'
    could previously "assume" that 'names[0]' was non-empty if a cache entry
    was unpacked as a sparse index. When b15207b8cf (unpack-trees: unpack
    new trees as sparse directories, 2022-08-08)) was introduced, it
    invalidated that assumption by allowing sparse directories to be
    unpacked based on the contents of other 'names' entries, rather than
    unnecessarily recursing into them and unpacking files individually. As a
    result, certain scenarios could cause a sparse directory to be unpacked
    then also recursively unpacked via 'traverse_trees_recursive()',
    creating duplicate index entries.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1344%2Fvdye%2Fbugfix%2Fsparse-index-orphan-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1344/vdye/bugfix/sparse-index-orphan-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1344

 t/t1092-sparse-checkout-compatibility.sh | 9 +++++++++
 unpack-trees.c                           | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)


base-commit: d42b38dfb5edf1a7fddd9542d722f91038407819

Comments

Johannes Schindelin Sept. 2, 2022, 9:03 a.m. UTC | #1
Hi Victoria,

On Thu, 1 Sep 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
> exists in the list of tree(s) being unpacked in 'unpack_callback()'.
>
> Currently, 'is_sparse_directory_entry()' is called with the first
> 'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
> this entry may be empty even when other elements of 'names' are not (such as
> when switching from an orphan branch back to a "normal" branch). As a
> result, 'is_sparse_directory_entry()' could incorrectly indicate that a
> sparse directory is *not* actually sparse because the name of the index
> entry does not match the (empty) 'name_entry' path.
>
> Fix the issue by using the existing 'name_entry *p' value in
> 'unpack_callback()', which points to the first non-empty entry in 'names'.
> Because 'p' is 'const', also update 'is_sparse_directory_entry()'s
> 'name_entry *' argument to be 'const'.
>
> Finally, add a regression test case.
>
> Reported-by: Derrick Stolee <derrickstolee@github.com>
> Signed-off-by: Victoria Dye <vdye@github.com>

A delightful commit message. Thank you!

The explanation makes perfect sense to me, and with that explanation the
diff becomes obvious, too.

Thanks,
Dscho

> ---
>     unpack-trees: fix sparse directory recursion check
>
>     This issue was found when the updates from v2.37.3 introduced a test
>     failure in a downstream test suite.
>
>     The issue stems from the fact that, before v2.37.3, 'unpack_callback()'
>     could previously "assume" that 'names[0]' was non-empty if a cache entry
>     was unpacked as a sparse index. When b15207b8cf (unpack-trees: unpack
>     new trees as sparse directories, 2022-08-08)) was introduced, it
>     invalidated that assumption by allowing sparse directories to be
>     unpacked based on the contents of other 'names' entries, rather than
>     unnecessarily recursing into them and unpacking files individually. As a
>     result, certain scenarios could cause a sparse directory to be unpacked
>     then also recursively unpacked via 'traverse_trees_recursive()',
>     creating duplicate index entries.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1344%2Fvdye%2Fbugfix%2Fsparse-index-orphan-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1344/vdye/bugfix/sparse-index-orphan-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1344
>
>  t/t1092-sparse-checkout-compatibility.sh | 9 +++++++++
>  unpack-trees.c                           | 4 ++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 0302e36fd66..b9350c075c2 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -380,6 +380,15 @@ test_expect_success 'checkout with modified sparse directory' '
>  	test_all_match git checkout base
>  '
>
> +test_expect_success 'checkout orphan then non-orphan' '
> +	init_repos &&
> +
> +	test_all_match git checkout --orphan test-orphan &&
> +	test_all_match git status --porcelain=v2 &&
> +	test_all_match git checkout base &&
> +	test_all_match git status --porcelain=v2
> +'
> +
>  test_expect_success 'add outside sparse cone' '
>  	init_repos &&
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 90b92114be8..bae812156c4 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1423,7 +1423,7 @@ static void debug_unpack_callback(int n,
>   * from the tree walk at the given traverse_info.
>   */
>  static int is_sparse_directory_entry(struct cache_entry *ce,
> -				     struct name_entry *name,
> +				     const struct name_entry *name,
>  				     struct traverse_info *info)
>  {
>  	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
> @@ -1562,7 +1562,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>  			}
>  		}
>
> -		if (!is_sparse_directory_entry(src[0], names, info) &&
> +		if (!is_sparse_directory_entry(src[0], p, info) &&
>  		    !is_new_sparse_dir &&
>  		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>  						    names, info) < 0) {
>
> base-commit: d42b38dfb5edf1a7fddd9542d722f91038407819
> --
> gitgitgadget
>
Derrick Stolee Sept. 2, 2022, 1:32 p.m. UTC | #2
On 9/1/2022 5:02 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
> exists in the list of tree(s) being unpacked in 'unpack_callback()'.
> 
> Currently, 'is_sparse_directory_entry()' is called with the first
> 'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
> this entry may be empty even when other elements of 'names' are not (such as
> when switching from an orphan branch back to a "normal" branch). As a
> result, 'is_sparse_directory_entry()' could incorrectly indicate that a
> sparse directory is *not* actually sparse because the name of the index
> entry does not match the (empty) 'name_entry' path.

Thank you for finding the root cause here. It was very non-obvious!

> +test_expect_success 'checkout orphan then non-orphan' '
> +	init_repos &&
> +
> +	test_all_match git checkout --orphan test-orphan &&
> +	test_all_match git status --porcelain=v2 &&
> +	test_all_match git checkout base &&
> +	test_all_match git status --porcelain=v2
> +'
> +

This test demonstrates how specific the case needs to be for this to
actually happen.

Could it also happen if we are going from a commit without that
sparse directory and then to a commit with that sparse directory? I
think that would be a more common case, but I was unable to
manipulate the test repo in t1092 to trigger this bug in the existing
test cases.

This makes me think that this bug _is_ extremely rare, so we don't
need to rush this into a 2.37.4 or anything. It would be good to
bring it into 2.38.0-rc0, though.

Thanks!
-Stolee
Junio C Hamano Sept. 2, 2022, 4:57 p.m. UTC | #3
Derrick Stolee <derrickstolee@github.com> writes:

> Could it also happen if we are going from a commit without that
> sparse directory and then to a commit with that sparse directory? I
> think that would be a more common case, but I was unable to
> manipulate the test repo in t1092 to trigger this bug in the existing
> test cases.
>
> This makes me think that this bug _is_ extremely rare, so we don't
> need to rush this into a 2.37.4 or anything. It would be good to
> bring it into 2.38.0-rc0, though.

When the "test" part of the patch is applied directly on top of the
vd/sparse-reset-checkout-fixes topic without its "fix" part, the
newly added test fails, and with the "fix" part applied, everything
goes peachy.  I think applying it on top of the topic as a fix to it
would be the most sensible.  Any motivated distro packager can grab
the topic and merge to their long-term maintenance track that way
more easily.

Thanks all.
Shaoxuan Yuan Sept. 2, 2022, 7:03 p.m. UTC | #4
On 9/1/2022 2:02 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
> exists in the list of tree(s) being unpacked in 'unpack_callback()'.
> 
> Currently, 'is_sparse_directory_entry()' is called with the first
> 'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
> this entry may be empty even when other elements of 'names' are not (such as
> when switching from an orphan branch back to a "normal" branch). As a
> result, 'is_sparse_directory_entry()' could incorrectly indicate that a
> sparse directory is *not* actually sparse because the name of the index
> entry does not match the (empty) 'name_entry' path.
> 
> Fix the issue by using the existing 'name_entry *p' value in
> 'unpack_callback()', which points to the first non-empty entry in 'names'.
> Because 'p' is 'const', also update 'is_sparse_directory_entry()'s
> 'name_entry *' argument to be 'const'.
> 
> Finally, add a regression test case.
> 
> Reported-by: Derrick Stolee <derrickstolee@github.com>
> Signed-off-by: Victoria Dye <vdye@github.com>

The issue is well explained: even I don't have prior knowledge but I
still get the gist out of it! The added tests in t1092 reflect the
changes brought by the diff!

Thanks,
Shaoxuan

> ---
>     unpack-trees: fix sparse directory recursion check
>     
>     This issue was found when the updates from v2.37.3 introduced a test
>     failure in a downstream test suite.
>     
>     The issue stems from the fact that, before v2.37.3, 'unpack_callback()'
>     could previously "assume" that 'names[0]' was non-empty if a cache entry
>     was unpacked as a sparse index. When b15207b8cf (unpack-trees: unpack
>     new trees as sparse directories, 2022-08-08)) was introduced, it
>     invalidated that assumption by allowing sparse directories to be
>     unpacked based on the contents of other 'names' entries, rather than
>     unnecessarily recursing into them and unpacking files individually. As a
>     result, certain scenarios could cause a sparse directory to be unpacked
>     then also recursively unpacked via 'traverse_trees_recursive()',
>     creating duplicate index entries.
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1344%2Fvdye%2Fbugfix%2Fsparse-index-orphan-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1344/vdye/bugfix/sparse-index-orphan-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1344
> 
>  t/t1092-sparse-checkout-compatibility.sh | 9 +++++++++
>  unpack-trees.c                           | 4 ++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 0302e36fd66..b9350c075c2 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -380,6 +380,15 @@ test_expect_success 'checkout with modified sparse directory' '
>  	test_all_match git checkout base
>  '
>  
> +test_expect_success 'checkout orphan then non-orphan' '
> +	init_repos &&
> +
> +	test_all_match git checkout --orphan test-orphan &&
> +	test_all_match git status --porcelain=v2 &&
> +	test_all_match git checkout base &&
> +	test_all_match git status --porcelain=v2
> +'
> +
>  test_expect_success 'add outside sparse cone' '
>  	init_repos &&
>  
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 90b92114be8..bae812156c4 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1423,7 +1423,7 @@ static void debug_unpack_callback(int n,
>   * from the tree walk at the given traverse_info.
>   */
>  static int is_sparse_directory_entry(struct cache_entry *ce,
> -				     struct name_entry *name,
> +				     const struct name_entry *name,
>  				     struct traverse_info *info)
>  {
>  	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
> @@ -1562,7 +1562,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>  			}
>  		}
>  
> -		if (!is_sparse_directory_entry(src[0], names, info) &&
> +		if (!is_sparse_directory_entry(src[0], p, info) &&
>  		    !is_new_sparse_dir &&
>  		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>  						    names, info) < 0) {
> 
> base-commit: d42b38dfb5edf1a7fddd9542d722f91038407819
diff mbox series

Patch

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0302e36fd66..b9350c075c2 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -380,6 +380,15 @@  test_expect_success 'checkout with modified sparse directory' '
 	test_all_match git checkout base
 '
 
+test_expect_success 'checkout orphan then non-orphan' '
+	init_repos &&
+
+	test_all_match git checkout --orphan test-orphan &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git checkout base &&
+	test_all_match git status --porcelain=v2
+'
+
 test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 90b92114be8..bae812156c4 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1423,7 +1423,7 @@  static void debug_unpack_callback(int n,
  * from the tree walk at the given traverse_info.
  */
 static int is_sparse_directory_entry(struct cache_entry *ce,
-				     struct name_entry *name,
+				     const struct name_entry *name,
 				     struct traverse_info *info)
 {
 	if (!ce || !name || !S_ISSPARSEDIR(ce->ce_mode))
@@ -1562,7 +1562,7 @@  static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			}
 		}
 
-		if (!is_sparse_directory_entry(src[0], names, info) &&
+		if (!is_sparse_directory_entry(src[0], p, info) &&
 		    !is_new_sparse_dir &&
 		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 						    names, info) < 0) {