diff mbox series

[v1,2/7] mv: add documentation for check_dir_in_index()

Message ID 20220719132809.409247-3-shaoxuan.yuan02@gmail.com (mailing list archive)
State Superseded
Headers show
Series mv: from in-cone to out-of-cone | expand

Commit Message

Shaoxuan Yuan July 19, 2022, 1:28 p.m. UTC
Using check_dir_in_index without checking if the directory is on-disk
could get a false positive for partially sparsified directory.

Add a note in the documentation to warn potential user.

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/mv.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Derrick Stolee July 19, 2022, 5:43 p.m. UTC | #1
On 7/19/2022 9:28 AM, Shaoxuan Yuan wrote:
> Using check_dir_in_index without checking if the directory is on-disk
> could get a false positive for partially sparsified directory.

It can be helpful to point out that this is a relatively new
method, introduced in b91a2b6594 (mv: add check_dir_in_index()
and solve general dir check issue, 2022-06-30).

> + *
> + * Note: *always* check the directory is not on-disk before this function
> + * (i.e. using lstat());
> + * otherwise it may return a false positive for a partially sparsified
> + * directory.

I'm not sure what you mean by a "false positive" in this case.
The directory exists in the index, which is what the method is
defined as checking. This does not say anything about the
worktree.

Perhaps that's the real problem? Someone might interpret this
as meaning the directory does not exist in the worktree? That
would mean that this doc update needs to be changed significantly
to say "Note that this does not imply anything about the state
of the worktree" or something.

But I think I'd rather just see this patch be dropped, unless I
am missing something important.

Thanks,
-Stolee
Victoria Dye July 19, 2022, 6:01 p.m. UTC | #2
Shaoxuan Yuan wrote:
> Using check_dir_in_index without checking if the directory is on-disk
> could get a false positive for partially sparsified directory.
> 
> Add a note in the documentation to warn potential user.
> 
> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
>  builtin/mv.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/builtin/mv.c b/builtin/mv.c
> index 4729bb1a1a..c8b9069db8 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char *src, int length,
>   * Return 0 if such directory exist (i.e. with any of its contained files not
>   * marked with CE_SKIP_WORKTREE, the directory would be present in working tree).
>   * Return 1 otherwise.
> + *
> + * Note: *always* check the directory is not on-disk before this function
> + * (i.e. using lstat());
> + * otherwise it may return a false positive for a partially sparsified
> + * directory.

To me, a comment like this indicates that either the function is not doing
what it should be doing, or its name doesn't properly describe the
function's behavior.

Per the function description:

	Check if an out-of-cone directory should be in the index. Imagine
	this case that all the files under a directory are marked with
	'CE_SKIP_WORKTREE' bit and thus the directory is sparsified.

But neither the name of the function ('check_dir_in_index') nor the
function's behavior (hence the comment you're adding here) match that
description.

Since this function is intended to verify that a directory 1) exists in the
index, and 2) is *entirely* sparse, I have two suggestions:

* Change the description to specify that the non-existence of the directory
  on disk is an *assumption*, not an opportunity for a "false positive."
  It's a subtle distinction, but it clarifies that the function should be
  used only when the caller already knows the directory is empty. For
  example:

	/*
	 * Given the path of a directory that does not exist on-disk, check whether the
	 * directory contains any entries in the index with the SKIP_WORKTREE flag 
	 * enabled.
	 *
	 * Return 1 if such index entries exist.
	 * Return 0 otherwise.
	 */

* 'check_dir_in_index' doesn't reflect the "is not on disk" prerequisite, so
  the function name can be updated to clarify that (e.g.,
  'empty_dir_has_sparse_contents')

>   */
>  static int check_dir_in_index(const char *name)
>  {
Victoria Dye July 19, 2022, 6:10 p.m. UTC | #3
Victoria Dye wrote:
> Shaoxuan Yuan wrote:
>> Using check_dir_in_index without checking if the directory is on-disk
>> could get a false positive for partially sparsified directory.
>>
>> Add a note in the documentation to warn potential user.
>>
>> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>> ---
>>  builtin/mv.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/builtin/mv.c b/builtin/mv.c
>> index 4729bb1a1a..c8b9069db8 100644
>> --- a/builtin/mv.c
>> +++ b/builtin/mv.c
>> @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char *src, int length,
>>   * Return 0 if such directory exist (i.e. with any of its contained files not
>>   * marked with CE_SKIP_WORKTREE, the directory would be present in working tree).
>>   * Return 1 otherwise.
>> + *
>> + * Note: *always* check the directory is not on-disk before this function
>> + * (i.e. using lstat());
>> + * otherwise it may return a false positive for a partially sparsified
>> + * directory.
> 
> To me, a comment like this indicates that either the function is not doing
> what it should be doing, or its name doesn't properly describe the
> function's behavior.
> 
> Per the function description:
> 
> 	Check if an out-of-cone directory should be in the index. Imagine
> 	this case that all the files under a directory are marked with
> 	'CE_SKIP_WORKTREE' bit and thus the directory is sparsified.
> 
> But neither the name of the function ('check_dir_in_index') nor the
> function's behavior (hence the comment you're adding here) match that
> description.
> 
> Since this function is intended to verify that a directory 1) exists in the
> index, and 2) is *entirely* sparse, I have two suggestions:
> 
> * Change the description to specify that the non-existence of the directory
>   on disk is an *assumption*, not an opportunity for a "false positive."
>   It's a subtle distinction, but it clarifies that the function should be
>   used only when the caller already knows the directory is empty. For
>   example:
> 
> 	/*
> 	 * Given the path of a directory that does not exist on-disk, check whether the
> 	 * directory contains any entries in the index with the SKIP_WORKTREE flag 
> 	 * enabled.
> 	 *
> 	 * Return 1 if such index entries exist.
> 	 * Return 0 otherwise.
> 	 */

Whoops, I mixed up the return values (I assumed this returned a boolean
based on the 'check' in the function name). In that case...
> 
> * 'check_dir_in_index' doesn't reflect the "is not on disk" prerequisite, so
>   the function name can be updated to clarify that (e.g.,
>   'empty_dir_has_sparse_contents')

...there are two options. Either you can use a more "boolean-y" name (like
the one I suggest above) and flip the return values (where "1" means "the
empty dir has sparse contents"), or change the name to something that
explicitly *doesn't* imply boolean. I'm personally in favor of the former
(I'm really struggling to come up with a descriptive, non-boolean name for
this function), but I'm fine with either if you can come up with a good
function name.

> 
>>   */
>>  static int check_dir_in_index(const char *name)
>>  {
>
Shaoxuan Yuan July 21, 2022, 1:58 p.m. UTC | #4
On 7/20/2022 1:43 AM, Derrick Stolee wrote:
 >> + *
 >> + * Note: *always* check the directory is not on-disk before this 
function
 >> + * (i.e. using lstat());
 >> + * otherwise it may return a false positive for a partially sparsified
 >> + * directory.
 >
 > I'm not sure what you mean by a "false positive" in this case.
 > The directory exists in the index, which is what the method is
 > defined as checking. This does not say anything about the
 > worktree.
 >
 > Perhaps that's the real problem? Someone might interpret this
 > as meaning the directory does not exist in the worktree? That
 > would mean that this doc update needs to be changed significantly
 > to say "Note that this does not imply anything about the state
 > of the worktree" or something.

This method assumes that the directory being checking does not exist
in the working tree, but the method itself does not check this. And
if the user does not make sure the directory is absent from the
worktree, this method may return a success for a partially sparsified
directory, which is not intended.

 > But I think I'd rather just see this patch be dropped, unless I
 > am missing something important.

I found Victoria's paraphrase [1] makes my point much clearer.

--
Thanks,
Shaoxuan
Shaoxuan Yuan July 21, 2022, 2:20 p.m. UTC | #5
On 7/20/2022 2:01 AM, Victoria Dye wrote:
 > Shaoxuan Yuan wrote:
 >> Using check_dir_in_index without checking if the directory is on-disk
 >> could get a false positive for partially sparsified directory.
 >>
 >> Add a note in the documentation to warn potential user.
 >>
 >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
 >> ---
 >>  builtin/mv.c | 5 +++++
 >>  1 file changed, 5 insertions(+)
 >>
 >> diff --git a/builtin/mv.c b/builtin/mv.c
 >> index 4729bb1a1a..c8b9069db8 100644
 >> --- a/builtin/mv.c
 >> +++ b/builtin/mv.c
 >> @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char 
*src, int length,
 >>   * Return 0 if such directory exist (i.e. with any of its contained 
files not
 >>   * marked with CE_SKIP_WORKTREE, the directory would be present in 
working tree).
 >>   * Return 1 otherwise.
 >> + *
 >> + * Note: *always* check the directory is not on-disk before this 
function
 >> + * (i.e. using lstat());
 >> + * otherwise it may return a false positive for a partially sparsified
 >> + * directory.
 >
 > To me, a comment like this indicates that either the function is not 
doing
 > what it should be doing, or its name doesn't properly describe the
 > function's behavior.
 >
 > Per the function description:
 >
 >     Check if an out-of-cone directory should be in the index. Imagine
 >     this case that all the files under a directory are marked with
 >     'CE_SKIP_WORKTREE' bit and thus the directory is sparsified.
 >
 > But neither the name of the function ('check_dir_in_index') nor the
 > function's behavior (hence the comment you're adding here) match that
 > description.
 >
 > Since this function is intended to verify that a directory 1) exists 
in the
 > index, and 2) is *entirely* sparse, I have two suggestions:
 >
 > * Change the description to specify that the non-existence of the 
directory
 >   on disk is an *assumption*, not an opportunity for a "false positive."
 >   It's a subtle distinction, but it clarifies that the function should be
 >   used only when the caller already knows the directory is empty. For
 >   example:
 >
 >     /*
 >      * Given the path of a directory that does not exist on-disk, 
check whether the
 >      * directory contains any entries in the index with the 
SKIP_WORKTREE flag
 >      * enabled.
 >      *
 >      * Return 1 if such index entries exist.
 >      * Return 0 otherwise.
 >      */
 >
 > * 'check_dir_in_index' doesn't reflect the "is not on disk" 
prerequisite, so
 >   the function name can be updated to clarify that (e.g.,
 >   'empty_dir_has_sparse_contents')
 >

I really breathed a sigh of relief after seeing these two points :-)
They word things out much better than the original ones.

--
Thanks,
Shaoxuan
diff mbox series

Patch

diff --git a/builtin/mv.c b/builtin/mv.c
index 4729bb1a1a..c8b9069db8 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -132,6 +132,11 @@  static int index_range_of_same_dir(const char *src, int length,
  * Return 0 if such directory exist (i.e. with any of its contained files not
  * marked with CE_SKIP_WORKTREE, the directory would be present in working tree).
  * Return 1 otherwise.
+ *
+ * Note: *always* check the directory is not on-disk before this function
+ * (i.e. using lstat());
+ * otherwise it may return a false positive for a partially sparsified
+ * directory.
  */
 static int check_dir_in_index(const char *name)
 {