diff mbox series

[v3,6/8] attr: be careful about sparse directories

Message ID c9e100e68f80196a35a37b5d0aad74e8e1174766.1629206603.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse index: delete ignored files outside sparse cone | expand

Commit Message

Derrick Stolee Aug. 17, 2021, 1:23 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 attr.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Johannes Schindelin Aug. 19, 2021, 8:11 a.m. UTC | #1
Hi Stolee,

On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  attr.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/attr.c b/attr.c
> index d029e681f28..a1009f78029 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -14,6 +14,7 @@
>  #include "utf8.h"
>  #include "quote.h"
>  #include "thread-utils.h"
> +#include "dir.h"
>
>  const char git_attr__true[] = "(builtin)true";
>  const char git_attr__false[] = "\0(builtin)false";
> @@ -744,6 +745,19 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  	if (!istate)
>  		return NULL;
>
> +	/*
> +	 * In the case of cone-mode sparse-checkout, getting the
> +	 * .gitattributes file from a directory is meaningless: all
> +	 * contained paths will be sparse if the .gitattributes is also
> +	 * sparse. In the case of a sparse index, it is critical that we
> +	 * don't go looking for one as it will expand the index.
> +	 */
> +	init_sparse_checkout_patterns(istate);
At first I thought that `init_sparse_checkout_patterns()` is called by
`path_in_sparse_checkout()` below, and therefore would not be necessary.

But it is! Without it, we have no way to test whether `use_cone_patterns`
is set, because, well, it gets set by `init_sparse_checkout_patterns()`.

Would it therefore make sense to refactor the code to have a
`path_in_sparse_checkout_cone()` function? Or add a flag
`only_in_cone_mode` as function parameter to `path_in_sparse_checkout()`?

Ciao,
Dscho

> +	if (istate->sparse_checkout_patterns &&
> +	    istate->sparse_checkout_patterns->use_cone_patterns &&
> +	    path_in_sparse_checkout(path, istate) == NOT_MATCHED)

> +		return NULL;
> +
>  	buf = read_blob_data_from_index(istate, path, NULL);
>  	if (!buf)
>  		return NULL;
> --
> gitgitgadget
>
>
Elijah Newren Aug. 19, 2021, 8:53 p.m. UTC | #2
On Tue, Aug 17, 2021 at 6:23 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  attr.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/attr.c b/attr.c
> index d029e681f28..a1009f78029 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -14,6 +14,7 @@
>  #include "utf8.h"
>  #include "quote.h"
>  #include "thread-utils.h"
> +#include "dir.h"
>
>  const char git_attr__true[] = "(builtin)true";
>  const char git_attr__false[] = "\0(builtin)false";
> @@ -744,6 +745,19 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate,
>         if (!istate)
>                 return NULL;
>
> +       /*
> +        * In the case of cone-mode sparse-checkout, getting the
> +        * .gitattributes file from a directory is meaningless: all
> +        * contained paths will be sparse if the .gitattributes is also
> +        * sparse. In the case of a sparse index, it is critical that we
> +        * don't go looking for one as it will expand the index.
> +        */

"all contained paths will be sparse if the .gitattributes is also sparse"?

Do you mean something more like "the .gitattributes only applies for
files under the given directory, and if the directory is sparse, then
neither the .gitattributes file or any other file under that directory
will be present" ?

Also, out of curiosity, I was suggesting in the past we do something
like this for .gitignore files, for the same reason.  Do we have such
logic in place, or is that another area of the code that hasn't been
handled yet?

> +       init_sparse_checkout_patterns(istate);
> +       if (istate->sparse_checkout_patterns &&
> +           istate->sparse_checkout_patterns->use_cone_patterns &&
> +           path_in_sparse_checkout(path, istate) == NOT_MATCHED)
> +               return NULL;
> +
>         buf = read_blob_data_from_index(istate, path, NULL);
>         if (!buf)
>                 return NULL;
> --
> gitgitgadget

Though the code appears correct, I too am curious about the questions
Dscho asked about the code in this patch.
Derrick Stolee Aug. 20, 2021, 3:36 p.m. UTC | #3
On 8/19/2021 4:11 AM, Johannes Schindelin wrote:
> On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote:
>> +	/*
>> +	 * In the case of cone-mode sparse-checkout, getting the
>> +	 * .gitattributes file from a directory is meaningless: all
>> +	 * contained paths will be sparse if the .gitattributes is also
>> +	 * sparse. In the case of a sparse index, it is critical that we
>> +	 * don't go looking for one as it will expand the index.
>> +	 */
>> +	init_sparse_checkout_patterns(istate);
> At first I thought that `init_sparse_checkout_patterns()` is called by
> `path_in_sparse_checkout()` below, and therefore would not be necessary.
> 
> But it is! Without it, we have no way to test whether `use_cone_patterns`
> is set, because, well, it gets set by `init_sparse_checkout_patterns()`.
> 
> Would it therefore make sense to refactor the code to have a
> `path_in_sparse_checkout_cone()` function? Or add a flag
> `only_in_cone_mode` as function parameter to `path_in_sparse_checkout()`?

One way to clean this up as-is would be to replace
 
>> +	if (istate->sparse_checkout_patterns &&
>> +	    istate->sparse_checkout_patterns->use_cone_patterns &&
>> +	    path_in_sparse_checkout(path, istate) == NOT_MATCHED)

with

	if (!path_in_sparse_checkout(path, istate) &&
	    istate->sparse_checkout_patterns->use_cone_patterns)

to get a similar effect. However, it creates wasted pattern-match
checks when not in cone-mode, so you are probably right that a
path_in_cone_mode_sparse_checkout() would be best to short-circuit
in the non-cone-mode case.

This special casing should be rare, so I don't think it is worth
adding a flag to path_in_sparse_checkout() and making those call
sites more complicated.

Thanks,
-Stolee
Derrick Stolee Aug. 20, 2021, 3:39 p.m. UTC | #4
On 8/19/2021 4:53 PM, Elijah Newren wrote:
> On Tue, Aug 17, 2021 at 6:23 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +       /*
>> +        * In the case of cone-mode sparse-checkout, getting the
>> +        * .gitattributes file from a directory is meaningless: all
>> +        * contained paths will be sparse if the .gitattributes is also
>> +        * sparse. In the case of a sparse index, it is critical that we
>> +        * don't go looking for one as it will expand the index.
>> +        */
> 
> "all contained paths will be sparse if the .gitattributes is also sparse"?
> 
> Do you mean something more like "the .gitattributes only applies for
> files under the given directory, and if the directory is sparse, then
> neither the .gitattributes file or any other file under that directory
> will be present" ?

Yes, you understand correctly and explain it better. Thanks.
 
> Also, out of curiosity, I was suggesting in the past we do something
> like this for .gitignore files, for the same reason.  Do we have such
> logic in place, or is that another area of the code that hasn't been
> handled yet?

I don't believe this has been handled. It definitely is less obvious
what to do there, because the point of .gitignore is to skip files that
exist in the working tree even if Git didn't put them there. Meanwhile,
.gitattributes is about how Git writes tracked files, but Git doesn't
write sparse tracked files.

> Though the code appears correct, I too am curious about the questions
> Dscho asked about the code in this patch.
 
Thanks,
-Stolee
Elijah Newren Aug. 20, 2021, 4:05 p.m. UTC | #5
On Fri, Aug 20, 2021 at 8:39 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/19/2021 4:53 PM, Elijah Newren wrote:
> > On Tue, Aug 17, 2021 at 6:23 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> +       /*
> >> +        * In the case of cone-mode sparse-checkout, getting the
> >> +        * .gitattributes file from a directory is meaningless: all
> >> +        * contained paths will be sparse if the .gitattributes is also
> >> +        * sparse. In the case of a sparse index, it is critical that we
> >> +        * don't go looking for one as it will expand the index.
> >> +        */
> >
> > "all contained paths will be sparse if the .gitattributes is also sparse"?
> >
> > Do you mean something more like "the .gitattributes only applies for
> > files under the given directory, and if the directory is sparse, then
> > neither the .gitattributes file or any other file under that directory
> > will be present" ?
>
> Yes, you understand correctly and explain it better. Thanks.
>
> > Also, out of curiosity, I was suggesting in the past we do something
> > like this for .gitignore files, for the same reason.  Do we have such
> > logic in place, or is that another area of the code that hasn't been
> > handled yet?
>
> I don't believe this has been handled. It definitely is less obvious
> what to do there, because the point of .gitignore is to skip files that
> exist in the working tree even if Git didn't put them there. Meanwhile,
> .gitattributes is about how Git writes tracked files, but Git doesn't
> write sparse tracked files.

Well, one advantage of deleting ignored files in sparse directories
when we sparsify, is that we know if any files are left, they are all
untracked and not ignored.  So we don't need to load the .gitignore
file for those sparse directories.

Sure, there's a small edge case of users adding new untracked files
that would have matched the .gitignore file, but it's also clear that
they removed the .gitignore file when they sparsified, so I don't see
a problem in reporting it as untracked while that directory was
as-sparsified-away-as-possible.
diff mbox series

Patch

diff --git a/attr.c b/attr.c
index d029e681f28..a1009f78029 100644
--- a/attr.c
+++ b/attr.c
@@ -14,6 +14,7 @@ 
 #include "utf8.h"
 #include "quote.h"
 #include "thread-utils.h"
+#include "dir.h"
 
 const char git_attr__true[] = "(builtin)true";
 const char git_attr__false[] = "\0(builtin)false";
@@ -744,6 +745,19 @@  static struct attr_stack *read_attr_from_index(struct index_state *istate,
 	if (!istate)
 		return NULL;
 
+	/*
+	 * In the case of cone-mode sparse-checkout, getting the
+	 * .gitattributes file from a directory is meaningless: all
+	 * contained paths will be sparse if the .gitattributes is also
+	 * sparse. In the case of a sparse index, it is critical that we
+	 * don't go looking for one as it will expand the index.
+	 */
+	init_sparse_checkout_patterns(istate);
+	if (istate->sparse_checkout_patterns &&
+	    istate->sparse_checkout_patterns->use_cone_patterns &&
+	    path_in_sparse_checkout(path, istate) == NOT_MATCHED)
+		return NULL;
+
 	buf = read_blob_data_from_index(istate, path, NULL);
 	if (!buf)
 		return NULL;