Message ID | 20190813024307.705016-3-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Honor .gitattributes with rebase --am | expand |
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > When applying multiple patches with git am, or when rebasing using the > am backend, it's possible that one of our patches has updated a > gitattributes file. Currently, we cache this information, so if a > file in a subsequent patch has attributes applied, the file will be > written out with the attributes in place as of the time we started the > rebase or am operation, not with the attributes applied by the previous > patch. This problem does not occur when using the -m or -i flags to > rebase. Back when "am" was a script that repeatedly calls "apply --index && commit-tree", caching the original contents of the attributes file and using it throughout the series would have been impossible to begin with, so this must be a regression when we rewrote it in C and moved to do everything in a single process without clearly the state between the steps, I guess. "rebase -m" and "rebase -i" are not repeated run_command() calls that invoke "git cherry-pick" or "git merge" these days, either, so I am somewhat curious how they avoid fallilng into the same trap. Thanks for the fix. Will queue.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > +test_expect_success 'rebase --am and .gitattributes' ' > + test_create_repo attributes && > + ( > + cd attributes && > + test_commit init && > + test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" && > + test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" && These eventually invokes test-when-finished for cleaning things up, that cannot be called inside a subshell. The "attributes" test repository is a throw-away reopsitory, so we should be able to just use "git config" to set the variables locally in it, no? > + test_commit second && > + git checkout -b test HEAD^ && > + > + echo "*.txt filter=test" >.gitattributes && > + git add .gitattributes && > + test_commit third && > + > + echo "This text is smudged." >a.txt && > + git add a.txt && > + test_commit fourth && > + git rebase master && > + grep "smudged" a.txt > + ) > +' > + > test_expect_success 'rebase--merge.sh and --show-current-patch' ' > test_create_repo conflict-merge && > (
On 2019-08-13 at 22:37:56, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > +test_expect_success 'rebase --am and .gitattributes' ' > > + test_create_repo attributes && > > + ( > > + cd attributes && > > + test_commit init && > > + test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" && > > + test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" && > > These eventually invokes test-when-finished for cleaning things up, > that cannot be called inside a subshell. The "attributes" test > repository is a throw-away reopsitory, so we should be able to just > use "git config" to set the variables locally in it, no? Yup, that's safe. I'll reroll with that.
Junio C Hamano <gitster@pobox.com> writes: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > >> When applying multiple patches with git am, or when rebasing using the >> am backend, it's possible that one of our patches has updated a >> gitattributes file. Currently, we cache this information, so if a >> file in a subsequent patch has attributes applied, the file will be >> written out with the attributes in place as of the time we started the >> rebase or am operation, not with the attributes applied by the previous >> patch. This problem does not occur when using the -m or -i flags to >> rebase. > ... > "rebase -m" and "rebase -i" are not repeated run_command() calls > that invoke "git cherry-pick" or "git merge" these days, either, so > I am somewhat curious how they avoid fallilng into the same trap. > > Thanks for the fix. Will queue. Actually there still is one more thing I wasn't clear about the change. > To ensure we write the correct data into the working tree, expire the > cache after each patch that touches a path ending in ".gitattributes". > ... > + if (!flush_attributes && patch->new_name && > + ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) > + flush_attributes = 1; When an attribute file is removed by a patch, we should forget what we read earlier from the file before it got removed. Would such a case, where patch->new_name would be NULL, be handled correctly? The call to ends_with_path_components() is almost no cost, and I would suspect that this call is easier to reason about without the "!flush_attributes &&" in the conditional part, by the way. Thanks.
On 2019-08-15 at 22:10:29, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > >> When applying multiple patches with git am, or when rebasing using the > >> am backend, it's possible that one of our patches has updated a > >> gitattributes file. Currently, we cache this information, so if a > >> file in a subsequent patch has attributes applied, the file will be > >> written out with the attributes in place as of the time we started the > >> rebase or am operation, not with the attributes applied by the previous > >> patch. This problem does not occur when using the -m or -i flags to > >> rebase. > > ... > > "rebase -m" and "rebase -i" are not repeated run_command() calls > > that invoke "git cherry-pick" or "git merge" these days, either, so > > I am somewhat curious how they avoid fallilng into the same trap. > > > > Thanks for the fix. Will queue. > > Actually there still is one more thing I wasn't clear about the > change. > > > To ensure we write the correct data into the working tree, expire the > > cache after each patch that touches a path ending in ".gitattributes". > > ... > > + if (!flush_attributes && patch->new_name && > > + ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) > > + flush_attributes = 1; > > When an attribute file is removed by a patch, we should forget what > we read earlier from the file before it got removed. Would such a > case, where patch->new_name would be NULL, be handled correctly? That's a good question. I don't think we handle that case, so I'll include that (and an appropriate test) with my reroll. I suspect we probably want something like: if ((patch->new_name && ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) || (patch->old_name && ends_with_path_components(patch->old_name, GITATTRIBUTES_FILE))) This should cause at most one flush per patch, and I think this should cover both cases, as well as any we haven't thought of. It's also possible that we could get a copy or rename that causes a false positive, but considering the contents of a .gitattributes file, that seems unlikely enough in practice that it's probably not worth worrying about for now. > The call to ends_with_path_components() is almost no cost, and I > would suspect that this call is easier to reason about without the > "!flush_attributes &&" in the conditional part, by the way. Yeah, it probably is. It was added to avoid the allocation, but now that we don't have one, it shouldn't be a problem.
diff --git a/apply.c b/apply.c index cde95369bb..e5373452ab 100644 --- a/apply.c +++ b/apply.c @@ -4643,6 +4643,7 @@ static int apply_patch(struct apply_state *state, struct patch *list = NULL, **listp = &list; int skipped_patch = 0; int res = 0; + int flush_attributes = 0; state->patch_input_file = filename; if (read_patch_file(&buf, fd) < 0) @@ -4670,6 +4671,10 @@ static int apply_patch(struct apply_state *state, patch_stats(state, patch); *listp = patch; listp = &patch->next; + + if (!flush_attributes && patch->new_name && + ends_with_path_components(patch->new_name, GITATTRIBUTES_FILE)) + flush_attributes = 1; } else { if (state->apply_verbosity > verbosity_normal) @@ -4746,6 +4751,8 @@ static int apply_patch(struct apply_state *state, if (state->summary && state->apply_verbosity > verbosity_silent) summary_patch_list(list); + if (flush_attributes) + reset_parsed_attributes(); end: free_patch_list(list); strbuf_release(&buf); diff --git a/convert.c b/convert.c index 94ff837649..030e9b81b9 100644 --- a/convert.c +++ b/convert.c @@ -1293,10 +1293,11 @@ struct conv_attrs { const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */ }; +static struct attr_check *check; + static void convert_attrs(const struct index_state *istate, struct conv_attrs *ca, const char *path) { - static struct attr_check *check; struct attr_check_item *ccheck = NULL; if (!check) { @@ -1339,6 +1340,12 @@ static void convert_attrs(const struct index_state *istate, ca->crlf_action = CRLF_AUTO_INPUT; } +void reset_parsed_attributes(void) +{ + attr_check_free(check); + check = NULL; +} + int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path) { struct conv_attrs ca; diff --git a/convert.h b/convert.h index 831559f10d..3710969d43 100644 --- a/convert.h +++ b/convert.h @@ -94,6 +94,12 @@ void convert_to_git_filter_fd(const struct index_state *istate, int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path); +/* + * Reset the internal list of attributes used by convert_to_git and + * convert_to_working_tree. + */ +void reset_parsed_attributes(void); + /***************************************************************** * * Streaming conversion support diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh index 80b23fd326..062dc41df7 100755 --- a/t/t3400-rebase.sh +++ b/t/t3400-rebase.sh @@ -301,6 +301,29 @@ test_expect_success 'rebase --am and --show-current-patch' ' ) ' +test_expect_success 'rebase --am and .gitattributes' ' + test_create_repo attributes && + ( + cd attributes && + test_commit init && + test_config filter.test.clean "sed -e '\''s/smudged/clean/g'\''" && + test_config filter.test.smudge "sed -e '\''s/clean/smudged/g'\''" && + + test_commit second && + git checkout -b test HEAD^ && + + echo "*.txt filter=test" >.gitattributes && + git add .gitattributes && + test_commit third && + + echo "This text is smudged." >a.txt && + git add a.txt && + test_commit fourth && + git rebase master && + grep "smudged" a.txt + ) +' + test_expect_success 'rebase--merge.sh and --show-current-patch' ' test_create_repo conflict-merge && (
When applying multiple patches with git am, or when rebasing using the am backend, it's possible that one of our patches has updated a gitattributes file. Currently, we cache this information, so if a file in a subsequent patch has attributes applied, the file will be written out with the attributes in place as of the time we started the rebase or am operation, not with the attributes applied by the previous patch. This problem does not occur when using the -m or -i flags to rebase. To ensure we write the correct data into the working tree, expire the cache after each patch that touches a path ending in ".gitattributes". Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- apply.c | 7 +++++++ convert.c | 9 ++++++++- convert.h | 6 ++++++ t/t3400-rebase.sh | 23 +++++++++++++++++++++++ 4 files changed, 44 insertions(+), 1 deletion(-)