[v3,2/2] apply: reload .gitattributes after patching it
diff mbox series

Message ID 20190813024307.705016-3-sandals@crustytoothpaste.net
State New
Headers show
Series
  • Honor .gitattributes with rebase --am
Related show

Commit Message

brian m. carlson Aug. 13, 2019, 2:43 a.m. UTC
When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.

To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 apply.c           |  7 +++++++
 convert.c         |  9 ++++++++-
 convert.h         |  6 ++++++
 t/t3400-rebase.sh | 23 +++++++++++++++++++++++
 4 files changed, 44 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Aug. 13, 2019, 6:08 p.m. UTC | #1
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> When applying multiple patches with git am, or when rebasing using the
> am backend, it's possible that one of our patches has updated a
> gitattributes file. Currently, we cache this information, so if a
> file in a subsequent patch has attributes applied, the file will be
> written out with the attributes in place as of the time we started the
> rebase or am operation, not with the attributes applied by the previous
> patch. This problem does not occur when using the -m or -i flags to
> rebase.

Back when "am" was a script that repeatedly calls "apply --index &&
commit-tree", caching the original contents of the attributes file
and using it throughout the series would have been impossible to
begin with, so this must be a regression when we rewrote it in C and
moved to do everything in a single process without clearly the state
between the steps, I guess.

"rebase -m" and "rebase -i" are not repeated run_command() calls
that invoke "git cherry-pick" or "git merge" these days, either, so
I am somewhat curious how they avoid fallilng into the same trap.

Thanks for the fix.  Will queue.
Junio C Hamano Aug. 13, 2019, 10:37 p.m. UTC | #2
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> +test_expect_success 'rebase --am and .gitattributes' '
> +	test_create_repo attributes &&
> +	(
> +		cd attributes &&
> +		test_commit init &&
> +		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
> +		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&

These eventually invokes test-when-finished for cleaning things up,
that cannot be called inside a subshell.  The "attributes" test
repository is a throw-away reopsitory, so we should be able to just
use "git config" to set the variables locally in it, no?

> +		test_commit second &&
> +		git checkout -b test HEAD^ &&
> +
> +		echo "*.txt filter=test" >.gitattributes &&
> +		git add .gitattributes &&
> +		test_commit third &&
> +
> +		echo "This text is smudged." >a.txt &&
> +		git add a.txt &&
> +		test_commit fourth &&
> +		git rebase master &&
> +		grep "smudged" a.txt
> +	)
> +'
> +
>  test_expect_success 'rebase--merge.sh and --show-current-patch' '
>  	test_create_repo conflict-merge &&
>  	(
brian m. carlson Aug. 13, 2019, 10:48 p.m. UTC | #3
On 2019-08-13 at 22:37:56, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > +test_expect_success 'rebase --am and .gitattributes' '
> > +	test_create_repo attributes &&
> > +	(
> > +		cd attributes &&
> > +		test_commit init &&
> > +		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
> > +		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
> 
> These eventually invokes test-when-finished for cleaning things up,
> that cannot be called inside a subshell.  The "attributes" test
> repository is a throw-away reopsitory, so we should be able to just
> use "git config" to set the variables locally in it, no?

Yup, that's safe. I'll reroll with that.
Junio C Hamano Aug. 15, 2019, 10:10 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> When applying multiple patches with git am, or when rebasing using the
>> am backend, it's possible that one of our patches has updated a
>> gitattributes file. Currently, we cache this information, so if a
>> file in a subsequent patch has attributes applied, the file will be
>> written out with the attributes in place as of the time we started the
>> rebase or am operation, not with the attributes applied by the previous
>> patch. This problem does not occur when using the -m or -i flags to
>> rebase.
> ...
> "rebase -m" and "rebase -i" are not repeated run_command() calls
> that invoke "git cherry-pick" or "git merge" these days, either, so
> I am somewhat curious how they avoid fallilng into the same trap.
>
> Thanks for the fix.  Will queue.

Actually there still is one more thing I wasn't clear about the
change.

> To ensure we write the correct data into the working tree, expire the
> cache after each patch that touches a path ending in ".gitattributes".
> ...
> +			if (!flush_attributes && patch->new_name &&
> +			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
> +				flush_attributes = 1;

When an attribute file is removed by a patch, we should forget what
we read earlier from the file before it got removed.  Would such a
case, where patch->new_name would be NULL, be handled correctly?

The call to ends_with_path_components() is almost no cost, and I
would suspect that this call is easier to reason about without the
"!flush_attributes &&" in the conditional part, by the way.

Thanks.
brian m. carlson Aug. 16, 2019, 12:22 a.m. UTC | #5
On 2019-08-15 at 22:10:29, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> >> When applying multiple patches with git am, or when rebasing using the
> >> am backend, it's possible that one of our patches has updated a
> >> gitattributes file. Currently, we cache this information, so if a
> >> file in a subsequent patch has attributes applied, the file will be
> >> written out with the attributes in place as of the time we started the
> >> rebase or am operation, not with the attributes applied by the previous
> >> patch. This problem does not occur when using the -m or -i flags to
> >> rebase.
> > ...
> > "rebase -m" and "rebase -i" are not repeated run_command() calls
> > that invoke "git cherry-pick" or "git merge" these days, either, so
> > I am somewhat curious how they avoid fallilng into the same trap.
> >
> > Thanks for the fix.  Will queue.
> 
> Actually there still is one more thing I wasn't clear about the
> change.
> 
> > To ensure we write the correct data into the working tree, expire the
> > cache after each patch that touches a path ending in ".gitattributes".
> > ...
> > +			if (!flush_attributes && patch->new_name &&
> > +			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
> > +				flush_attributes = 1;
> 
> When an attribute file is removed by a patch, we should forget what
> we read earlier from the file before it got removed.  Would such a
> case, where patch->new_name would be NULL, be handled correctly?

That's a good question. I don't think we handle that case, so I'll
include that (and an appropriate test) with my reroll.

I suspect we probably want something like:

  if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) ||
      (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE)))

This should cause at most one flush per patch, and I think this should
cover both cases, as well as any we haven't thought of. It's also
possible that we could get a copy or rename that causes a false
positive, but considering the contents of a .gitattributes file, that
seems unlikely enough in practice that it's probably not worth worrying
about for now.

> The call to ends_with_path_components() is almost no cost, and I
> would suspect that this call is easier to reason about without the
> "!flush_attributes &&" in the conditional part, by the way.

Yeah, it probably is. It was added to avoid the allocation, but now that
we don't have one, it shouldn't be a problem.

Patch
diff mbox series

diff --git a/apply.c b/apply.c
index cde95369bb..e5373452ab 100644
--- a/apply.c
+++ b/apply.c
@@ -4643,6 +4643,7 @@  static int apply_patch(struct apply_state *state,
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 	int res = 0;
+	int flush_attributes = 0;
 
 	state->patch_input_file = filename;
 	if (read_patch_file(&buf, fd) < 0)
@@ -4670,6 +4671,10 @@  static int apply_patch(struct apply_state *state,
 			patch_stats(state, patch);
 			*listp = patch;
 			listp = &patch->next;
+
+			if (!flush_attributes && patch->new_name &&
+			    ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE))
+				flush_attributes = 1;
 		}
 		else {
 			if (state->apply_verbosity > verbosity_normal)
@@ -4746,6 +4751,8 @@  static int apply_patch(struct apply_state *state,
 	if (state->summary && state->apply_verbosity > verbosity_silent)
 		summary_patch_list(list);
 
+	if (flush_attributes)
+		reset_parsed_attributes();
 end:
 	free_patch_list(list);
 	strbuf_release(&buf);
diff --git a/convert.c b/convert.c
index 94ff837649..030e9b81b9 100644
--- a/convert.c
+++ b/convert.c
@@ -1293,10 +1293,11 @@  struct conv_attrs {
 	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
 };
 
+static struct attr_check *check;
+
 static void convert_attrs(const struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
-	static struct attr_check *check;
 	struct attr_check_item *ccheck = NULL;
 
 	if (!check) {
@@ -1339,6 +1340,12 @@  static void convert_attrs(const struct index_state *istate,
 		ca->crlf_action = CRLF_AUTO_INPUT;
 }
 
+void reset_parsed_attributes(void)
+{
+	attr_check_free(check);
+	check = NULL;
+}
+
 int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
diff --git a/convert.h b/convert.h
index 831559f10d..3710969d43 100644
--- a/convert.h
+++ b/convert.h
@@ -94,6 +94,12 @@  void convert_to_git_filter_fd(const struct index_state *istate,
 int would_convert_to_git_filter_fd(const struct index_state *istate,
 				   const char *path);
 
+/*
+ * Reset the internal list of attributes used by convert_to_git and
+ * convert_to_working_tree.
+ */
+void reset_parsed_attributes(void);
+
 /*****************************************************************
  *
  * Streaming conversion support
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 80b23fd326..062dc41df7 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -301,6 +301,29 @@  test_expect_success 'rebase --am and --show-current-patch' '
 	)
 '
 
+test_expect_success 'rebase --am and .gitattributes' '
+	test_create_repo attributes &&
+	(
+		cd attributes &&
+		test_commit init &&
+		test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" &&
+		test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" &&
+
+		test_commit second &&
+		git checkout -b test HEAD^ &&
+
+		echo "*.txt filter=test" >.gitattributes &&
+		git add .gitattributes &&
+		test_commit third &&
+
+		echo "This text is smudged." >a.txt &&
+		git add a.txt &&
+		test_commit fourth &&
+		git rebase master &&
+		grep "smudged" a.txt
+	)
+'
+
 test_expect_success 'rebase--merge.sh and --show-current-patch' '
 	test_create_repo conflict-merge &&
 	(