diff mbox series

[1/1] attr: add builtin objectmode values support

Message ID 20231114214934.3484892-1-jojwang@google.com (mailing list archive)
State Superseded
Headers show
Series [1/1] attr: add builtin objectmode values support | expand

Commit Message

Joanna Wang Nov. 14, 2023, 9:49 p.m. UTC
Gives all paths builtin objectmode values based on the paths' modes
(one of 100644, 100755, 120000, 040000, 160000). Users may use
this feature to filter by file types. For example a pathspec such as
':(attr:builtin_objectmode=160000)' could filter for submodules without
needing to have `builtin_objectmode=160000` to be set in .gitattributes
for every submodule path.

These values are also reflected in `git check-attr` results.
If the git_attr_direction is set to GIT_ATTR_INDEX or GIT_ATTR_CHECKIN
and a path is not found in the index, the value will be unspecified.

This patch also reserves the builtin_* attribute namespace for objectmode
and any future builtin attributes. Any user defined attributes using this
reserved namespace will result in a warning. This is a breaking change for
any existing builtin_* attributes.
Pathspecs with some builtin_* attribute name (excluding builtin_objectmode)
will behave like any attribute where there are no user specified values.

Signed-off-by: Joanna Wang <jojwang@google.com>
---

>But the usual practice is *not* to cover all paths with explicit
> attribute definition, isn't it?
Got it, the breaking change is that we're giving mode a value when
user may expect no value.

>When checking things out of the index, the index should be the
>source of the truth.  If something is not in the index, is there a
>point in falling back to the workint tree state to decide if the
>thing should be checked out of the index?
I don't think working tree objectmode fallback makes sense for
GIT_ATTR_CHECKOUT and GIT_ATTR_INDEX. Updated patch implements
without fallback.

>                static struct git_attr mode_attr;
>
>                if (!mode_attr)
>                        mode_attr = git_attr("mode");
Is there an idiomatic way to check if this git_attr (struct or pointer)
has been initialized? I could not find an anything for C but maybe I've
missed a common way to do this in the codebase.

I also addressed the style nits. Thanks for the review/fedback so far. 

 Documentation/gitattributes.txt | 14 +++++++
 attr.c                          | 67 +++++++++++++++++++++++++++++++--
 t/t0003-attributes.sh           | 44 ++++++++++++++++++++++
 t/t6135-pathspec-with-attrs.sh  | 25 ++++++++++++
 4 files changed, 147 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Nov. 16, 2023, 1:26 a.m. UTC | #1
Joanna Wang <jojwang@google.com> writes:

>>                static struct git_attr mode_attr;
>>
>>                if (!mode_attr)
>>                        mode_attr = git_attr("mode");
> Is there an idiomatic way to check if this git_attr (struct or pointer)
> has been initialized?

Sorry (I think the above is a typo from the "to illustrate the idea"
code snippet in my message), the static one should be a pointer to a
struct, as git_attr() is meant to "intern" the given name and return
a singleton instance.

The idiom is to see if it is still NULL and then call git_attr().
Of course, you could repeatedly call git_attr() with the same name
and get the same singleton instance, so without the "have we
initialized?" check, your code will still be correct, but it would
incur a locked hashmap lookup (to ensure that we do not create the
second instance of the same name), so...

> +RESERVED BUILTIN_* ATTRIBUTES
> +-----------------------------
> +
> +builtin_* is a reserved namespace for builtin attribute values. Any
> +user defined attributes under this namespace will result in a warning.

warning and then?  I think we would want to do "warn and ignore",
not "warn and disable all the builtin_ computation", i.e.,

	... will be ignored and a warning message is given.

> +/*
> + * Atribute name cannot begin with "builtin_" which
> + * is a reserved namespace for built in attributes values.
> + */
> +static int attr_name_reserved(const char *name)
> +{
> +	return !strncmp(name, "builtin_", strlen("builtin_"));
> +}
> +

You'd want to use starts_with() probably.

> @@ -1240,6 +1250,57 @@ static struct object_id *default_attr_source(void)
>  	return &attr_source;
>  }
>  
> +static const char *builtin_object_mode_attr(struct index_state *istate, const char *path)
> +{
> +	unsigned int mode;
> +	struct strbuf sb = STRBUF_INIT;
> +	
> +	if (direction == GIT_ATTR_CHECKIN) {
> +		struct object_id oid;
> +		struct stat st;
> +		if (lstat(path, &st))
> +			die_errno(_("unable to stat '%s'"), path);
> +		mode = canon_mode(st.st_mode);
> +		if (S_ISDIR(mode)) {
> +			/* `path` is either a directory or it is a submodule,
> +			 * in which case it is already indexed as submodule
> +			 * or it does not exist in the index yet and we need to
> +			 * check if we can resolve to a ref.
> +			*/

	/*
	 * Our multi-line comment begins and ends with
	 * slash-asterisk and asterisk-slash on their
	 * own lines without anything else.
	 */

> +			int pos = index_name_pos(istate, path, strlen(path));
> +			if (pos >= 0) {
> +				 if (S_ISGITLINK(istate->cache[pos]->ce_mode))
> +					 mode = istate->cache[pos]->ce_mode;
> +			} else if (resolve_gitlink_ref(path, "HEAD", &oid) == 0)
> +				mode = S_IFGITLINK;
> +		}

Give {braces} around the else if block as the other side has it
(Documentation/CodingGuidelines):

	- When there are multiple arms to a conditional and some of them
	  require braces, enclose even a single line block in braces for
	  consistency.

> +	} else {
> +		/*
> +		 * For GIT_ATTR_CHECKOUT and GIT_ATTR_INDEX we only check
> +		 * for mode in the index.
> +		 */
> +		int pos = index_name_pos(istate, path, strlen(path));
> +		if (pos >= 0)
> +			mode = istate->cache[pos]->ce_mode;
> +		else 
> +			return ATTR__UNSET;
> +	}
> +	strbuf_addf(&sb, "%06o", mode);
> +	return sb.buf;
> +	

The struct itself will be reclaimed as it is on stack, and sb.buf is
given back to the caller, so there technically is no leak, but I think
a more kosher way to do this involves the use of strbuf_detach().

Let's lose the blank line at the end of the function that is not
serving much useful purpose.

>  void git_check_attr(struct index_state *istate,
>  		    const char *path,
>  		    struct attr_check *check)
> @@ -1253,7 +1314,7 @@ void git_check_attr(struct index_state *istate,
>  		unsigned int n = check->items[i].attr->attr_nr;
>  		const char *value = check->all_attrs[n].value;
>  		if (value == ATTR__UNKNOWN)
> -			value = ATTR__UNSET;
> +			value = compute_builtin_attr(istate, path, check->all_attrs[n].attr);
>  		check->items[i].value = value;
>  	}
>  }

Nicely done.

> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index aee2298f01..25aa3fbd05 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -19,6 +19,16 @@ attr_check () {
>  	test_must_be_empty err
>  }
>  
> +attr_check_object_mode () {
> +	path="$1" &&
> +	expect="$2" &&
> +	check_opts="$3" &&
> +	git check-attr $check_opts builtin_objectmode -- "$path" >actual 2>err &&
> +	echo "$path: builtin_objectmode: $expect" >expect &&
> +	test_cmp expect actual
> +	test_must_be_empty err
> +}
> +
>  attr_check_quote () {
>  	path="$1" quoted_path="$2" expect="$3" &&
>  
> @@ -558,4 +568,38 @@ test_expect_success EXPENSIVE 'large attributes file ignored in index' '
>  	test_cmp expect err
>  '
>  
> +test_expect_success 'native object mode attributes work' '

> +	>exec && chmod +x exec && attr_check_object_mode exec 100755 &&
> +	>normal && attr_check_object_mode normal 100644 &&
> +	mkdir dir && attr_check_object_mode dir 040000 &&

Just a style thing, but we tend to write one statement on each line.

> +	>to_sym ln -s to_sym sym && attr_check_object_mode sym 120000
> +'

You do not need to_sym file, do you?  Symbolic links can be left
dangling.

In any case, the above test would not work on a filesystem without
native executable bit support.  Also, you cannot do "ln -s" on
certain filesystems.  It is a good idea to split the above into
three tests, one that is run unconditionally (for "normal" and
"dir"), another that is run under the POSIXPERM prerequisite, and
the third that is run under the SYNLINKS prerequisite.

> +test_expect_success 'native object mode attributes work with --cached' '
> +	>normal && attr_check_object_mode normal unspecified --cached &&
> +	git add normal && attr_check_object_mode normal 100644 --cached
> +'

For "--cached test", on the other hand, we should be able to set the
executable bit or record a symbolic link regardless of the
filesystem using "update-index", e.g.,

	>normal && git add normal &&
	empty_blob=$(git rev-parse :normal)
	cat <<-EOF | git update-index --index-info
	100755 $empty_blob 0	exec
	120000 $empty_blob 0	symlink
	EOF
	attr_check_object_mode normal 100644 --cached &&
	attr_check_object_mode exec 100755 --cached &&
	attr_check_object_mode symlink 120000 --cached

or something.

> +test_expect_success 'check object mode attributes work for submodules' '
> +	mkdir sub &&
> +	(
> +		cd sub &&
> +		git init &&
> +		mv .git .real &&
> +		echo "gitdir: .real" >.git &&
> +		test_commit first
> +	) &&
> +	attr_check_object_mode sub 160000 &&
> +	attr_check_object_mode sub unspecified --cached &&
> +	git add sub &&
> +	attr_check_object_mode sub 160000 --cached
> +'

Sounds good.

> +test_expect_success 'we do not allow user defined builtin_* attributes' '
> +	echo "foo* builtin_foo" >.gitattributes &&
> +	git add .gitattributes 2>actual &&
> +	echo "builtin_foo is not a valid attribute name: .gitattributes:1" >expect &&
> +	test_cmp expect actual
> +'

Another thing that is worth checking is what happens when there is an
conflicting entry:

	echo "foo builtin_objectmode=200" >.gitattributes &&
	>foo &&
	attr_check_object_mode foo 100644

>  test_done
> diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
> index a9c1e4e0ec..b08a32ea68 100755
> --- a/t/t6135-pathspec-with-attrs.sh
> +++ b/t/t6135-pathspec-with-attrs.sh
> @@ -295,4 +295,29 @@ test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'pathspec with builtin_objectmode attr can be used' '
> +	>mode_exec_file_1 &&
> +
> +	git status -s ":(attr:builtin_objectmode=100644)mode_exec_*" >actual &&
> +	echo ?? mode_exec_file_1 >expect &&
> +	test_cmp expect actual &&
> +
> +	git add mode_exec_file_1 && chmod +x mode_exec_file_1 &&
> +	git status -s ":(attr:builtin_objectmode=100755)mode_exec_*" >actual &&
> +	echo AM mode_exec_file_1 >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'builtin_objectmode attr can be excluded' '
> +	>mode_1_regular &&
> +	>mode_1_exec  && chmod +x mode_1_exec &&
> +	git status -s ":(exclude,attr:builtin_objectmode=100644)" "mode_1_*" >actual &&
> +	echo ?? mode_1_exec >expect &&
> +	test_cmp expect actual &&
> +
> +	git status -s ":(exclude,attr:builtin_objectmode=100755)" "mode_1_*" >actual &&
> +	echo ?? mode_1_regular >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_done
Eric Sunshine Nov. 16, 2023, 1:37 a.m. UTC | #2
On Wed, Nov 15, 2023 at 8:26 PM Junio C Hamano <gitster@pobox.com> wrote:
> Joanna Wang <jojwang@google.com> writes:
> > +test_expect_success 'native object mode attributes work with --cached' '
> > +     >normal && attr_check_object_mode normal unspecified --cached &&
> > +     git add normal && attr_check_object_mode normal 100644 --cached
> > +'
>
> For "--cached test", on the other hand, we should be able to set the
> executable bit or record a symbolic link regardless of the
> filesystem using "update-index", e.g.,
>
>         empty_blob=$(git rev-parse :normal)
>         cat <<-EOF | git update-index --index-info
>         100755 $empty_blob 0    exec
>         120000 $empty_blob 0    symlink
>         EOF
>
> or something.

A bit more idiomatic in this codebase would be:

        git update-index --index-info <<-EOF
        100755 $empty_blob 0    exec
        120000 $empty_blob 0    symlink
        EOF

No need for `cat`.
Junio C Hamano Nov. 16, 2023, 2:53 a.m. UTC | #3
Eric Sunshine <sunshine@sunshineco.com> writes:

> A bit more idiomatic in this codebase would be:
>
>         git update-index --index-info <<-EOF
>         100755 $empty_blob 0    exec
>         120000 $empty_blob 0    symlink
>         EOF
>
> No need for `cat`.

Aye, of course.  We should not cat a single-source datastream into a
pipe.  Thanks for correcting me.
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8c1793c148..84bad3e741 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -100,6 +100,20 @@  for a path to `Unspecified` state.  This can be done by listing
 the name of the attribute prefixed with an exclamation point `!`.
 
 
+RESERVED BUILTIN_* ATTRIBUTES
+-----------------------------
+
+builtin_* is a reserved namespace for builtin attribute values. Any
+user defined attributes under this namespace will result in a warning.
+
+`builtin_objectmode`
+^^^^^^^^^^^^^^^^^^^^
+This attribute is for filtering files by their file bit modes (40000,
+120000, 160000, 100755, 100644). e.g. ':(attr:builtin_objectmode=160000)'.
+You may also check these values with `git check-attr builtin_objectmode -- <file>`.
+If the object is not in the index `git check-attr --cached` will return unspecified.
+
+
 EFFECTS
 -------
 
diff --git a/attr.c b/attr.c
index e62876dfd3..8493f2c5c0 100644
--- a/attr.c
+++ b/attr.c
@@ -17,6 +17,7 @@ 
 #include "utf8.h"
 #include "quote.h"
 #include "read-cache-ll.h"
+#include "refs.h"
 #include "revision.h"
 #include "object-store-ll.h"
 #include "setup.h"
@@ -183,6 +184,15 @@  static void all_attrs_init(struct attr_hashmap *map, struct attr_check *check)
 	}
 }
 
+/*
+ * Atribute name cannot begin with "builtin_" which
+ * is a reserved namespace for built in attributes values.
+ */
+static int attr_name_reserved(const char *name)
+{
+	return !strncmp(name, "builtin_", strlen("builtin_"));
+}
+
 static int attr_name_valid(const char *name, size_t namelen)
 {
 	/*
@@ -315,7 +325,7 @@  static const char *parse_attr(const char *src, int lineno, const char *cp,
 			cp++;
 			len--;
 		}
-		if (!attr_name_valid(cp, len)) {
+		if (!attr_name_valid(cp, len) || attr_name_reserved(cp)) {
 			report_invalid_attr(cp, len, src, lineno);
 			return NULL;
 		}
@@ -379,7 +389,7 @@  static struct match_attr *parse_attr_line(const char *line, const char *src,
 		name += strlen(ATTRIBUTE_MACRO_PREFIX);
 		name += strspn(name, blank);
 		namelen = strcspn(name, blank);
-		if (!attr_name_valid(name, namelen)) {
+		if (!attr_name_valid(name, namelen) || attr_name_reserved(name)) {
 			report_invalid_attr(name, namelen, src, lineno);
 			goto fail_return;
 		}
@@ -1240,6 +1250,57 @@  static struct object_id *default_attr_source(void)
 	return &attr_source;
 }
 
+static const char *builtin_object_mode_attr(struct index_state *istate, const char *path)
+{
+	unsigned int mode;
+	struct strbuf sb = STRBUF_INIT;
+	
+	if (direction == GIT_ATTR_CHECKIN) {
+		struct object_id oid;
+		struct stat st;
+		if (lstat(path, &st))
+			die_errno(_("unable to stat '%s'"), path);
+		mode = canon_mode(st.st_mode);
+		if (S_ISDIR(mode)) {
+			/* `path` is either a directory or it is a submodule,
+			 * in which case it is already indexed as submodule
+			 * or it does not exist in the index yet and we need to
+			 * check if we can resolve to a ref.
+			*/
+			int pos = index_name_pos(istate, path, strlen(path));
+			if (pos >= 0) {
+				 if (S_ISGITLINK(istate->cache[pos]->ce_mode))
+					 mode = istate->cache[pos]->ce_mode;
+			} else if (resolve_gitlink_ref(path, "HEAD", &oid) == 0)
+				mode = S_IFGITLINK;
+		}
+	} else {
+		/*
+		 * For GIT_ATTR_CHECKOUT and GIT_ATTR_INDEX we only check
+		 * for mode in the index.
+		 */
+		int pos = index_name_pos(istate, path, strlen(path));
+		if (pos >= 0)
+			mode = istate->cache[pos]->ce_mode;
+		else 
+			return ATTR__UNSET;
+	}
+	strbuf_addf(&sb, "%06o", mode);
+	return sb.buf;
+	
+}
+
+
+static const char *compute_builtin_attr(struct index_state *istate,
+					  const char *path,
+					  const struct git_attr *attr) {
+	const struct git_attr *object_mode_attr = git_attr("builtin_objectmode");
+	
+	if (attr == object_mode_attr)
+		return builtin_object_mode_attr(istate, path);
+	return ATTR__UNSET;
+}
+
 void git_check_attr(struct index_state *istate,
 		    const char *path,
 		    struct attr_check *check)
@@ -1253,7 +1314,7 @@  void git_check_attr(struct index_state *istate,
 		unsigned int n = check->items[i].attr->attr_nr;
 		const char *value = check->all_attrs[n].value;
 		if (value == ATTR__UNKNOWN)
-			value = ATTR__UNSET;
+			value = compute_builtin_attr(istate, path, check->all_attrs[n].attr);
 		check->items[i].value = value;
 	}
 }
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index aee2298f01..25aa3fbd05 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -19,6 +19,16 @@  attr_check () {
 	test_must_be_empty err
 }
 
+attr_check_object_mode () {
+	path="$1" &&
+	expect="$2" &&
+	check_opts="$3" &&
+	git check-attr $check_opts builtin_objectmode -- "$path" >actual 2>err &&
+	echo "$path: builtin_objectmode: $expect" >expect &&
+	test_cmp expect actual
+	test_must_be_empty err
+}
+
 attr_check_quote () {
 	path="$1" quoted_path="$2" expect="$3" &&
 
@@ -558,4 +568,38 @@  test_expect_success EXPENSIVE 'large attributes file ignored in index' '
 	test_cmp expect err
 '
 
+test_expect_success 'native object mode attributes work' '
+	>exec && chmod +x exec && attr_check_object_mode exec 100755 &&
+	>normal && attr_check_object_mode normal 100644 &&
+	mkdir dir && attr_check_object_mode dir 040000 &&
+	>to_sym ln -s to_sym sym && attr_check_object_mode sym 120000
+'
+
+test_expect_success 'native object mode attributes work with --cached' '
+	>normal && attr_check_object_mode normal unspecified --cached &&
+	git add normal && attr_check_object_mode normal 100644 --cached
+'
+
+test_expect_success 'check object mode attributes work for submodules' '
+	mkdir sub &&
+	(
+		cd sub &&
+		git init &&
+		mv .git .real &&
+		echo "gitdir: .real" >.git &&
+		test_commit first
+	) &&
+	attr_check_object_mode sub 160000 &&
+	attr_check_object_mode sub unspecified --cached &&
+	git add sub &&
+	attr_check_object_mode sub 160000 --cached
+'
+
+test_expect_success 'we do not allow user defined builtin_* attributes' '
+	echo "foo* builtin_foo" >.gitattributes &&
+	git add .gitattributes 2>actual &&
+	echo "builtin_foo is not a valid attribute name: .gitattributes:1" >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
index a9c1e4e0ec..b08a32ea68 100755
--- a/t/t6135-pathspec-with-attrs.sh
+++ b/t/t6135-pathspec-with-attrs.sh
@@ -295,4 +295,29 @@  test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'pathspec with builtin_objectmode attr can be used' '
+	>mode_exec_file_1 &&
+
+	git status -s ":(attr:builtin_objectmode=100644)mode_exec_*" >actual &&
+	echo ?? mode_exec_file_1 >expect &&
+	test_cmp expect actual &&
+
+	git add mode_exec_file_1 && chmod +x mode_exec_file_1 &&
+	git status -s ":(attr:builtin_objectmode=100755)mode_exec_*" >actual &&
+	echo AM mode_exec_file_1 >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'builtin_objectmode attr can be excluded' '
+	>mode_1_regular &&
+	>mode_1_exec  && chmod +x mode_1_exec &&
+	git status -s ":(exclude,attr:builtin_objectmode=100644)" "mode_1_*" >actual &&
+	echo ?? mode_1_exec >expect &&
+	test_cmp expect actual &&
+
+	git status -s ":(exclude,attr:builtin_objectmode=100755)" "mode_1_*" >actual &&
+	echo ?? mode_1_regular >expect &&
+	test_cmp expect actual
+'
+
 test_done