diff mbox series

[V2] git-apply: Allow simultaneous --cached and --3way options

Message ID 20210405221902.27998-1-jerry@skydio.com (mailing list archive)
State New, archived
Headers show
Series [V2] git-apply: Allow simultaneous --cached and --3way options | expand

Commit Message

Jerry Zhang April 5, 2021, 10:19 p.m. UTC
Previously, --cached and --3way were not
allowed to be used together, since --3way
wrote conflict markers into the working tree.

These changes change semantics so that if
these flags are given together and there is
a conflict, the conflicting objects are left
at a higher order in the cache, and the command
will return non-zero. If there is no conflict,
the patch is applied directly to cache as
expected and the command will return 0.

The user can use `git diff` to view the contents
of the conflict, or `git checkout -m -- .` to
regenerate the conflict markers in the working
directory.

With the combined --3way and --cached flags,
The conflict markers won't be written to the
working directory, so there is no point in
attempting rerere.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Jerry Zhang <jerryxzha@googlemail.com>
---
 Documentation/git-apply.txt | 3 ++-
 apply.c                     | 5 ++---
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Junio C Hamano April 5, 2021, 10:46 p.m. UTC | #1
Jerry Zhang <jerry@skydio.com> writes:

> Subject: Re: [PATCH V2] git-apply: Allow simultaneous --cached and --3way options

s/Allow/allow/ (cf. "git shortlog --no-merged" output for recent examples)

> Previously, --cached and --3way were not
> allowed to be used together, since --3way
> wrote conflict markers into the working tree.

Hint that you are talking about the "git apply" command by
mentioning the name somewhere.

Drop "previously"; we talk about the status quo in the present tense
in our proposed commit log messages to set the stage, and then describe
what the patch author percieves as a problem, before describing the
proposed solution to the problem.

cf. Documentation/SubmittingPatches[[describe-changes]] (the whole section)

> These changes change semantics so that if
> these flags are given together and there is
> a conflict, the conflicting objects are left
> at a higher order in the cache, and the command
> will return non-zero. If there is no conflict,
> the patch is applied directly to cache as
> expected and the command will return 0.

Give an order to the codebase to "be like so".  Here is my attempt.

    Teach "git apply" to accept "--cached" and "--3way" at the same
    time.  Only when all changes to all paths involved in the
    application auto-resolve cleanly, the result is placed in the
    index at stage #0 and the command exits with 0 status.  If there
    is any path whose conflict cannot be cleanly auto-resolved, the
    original contents from common ancestor (stage #1), our version
    (stage #2) and the contents from the patch (stage #3) for the
    conflicted paths are left at separate stages without any attempt
    to resolve the conflict at the content level, and the command
    exists with non-zero status, because there is no place (like the
    working tree files) to leave a half-resolved conflicted merge
    result to ask the end-user to resolve.

> The user can use `git diff` to view the contents
> of the conflict, or `git checkout -m -- .` to
> regenerate the conflict markers in the working
> directory.

Nice.

> With the combined --3way and --cached flags,
> The conflict markers won't be written to the
> working directory, so there is no point in
> attempting rerere.

I am not sure what this paragraph is trying to convey here.

I agree that when a *new* conflict is encountered in this new mode,
writing out a rerere pre-image, in preparation for accepting the
post-image the end-user gives us after the conflicts are resolved,
does not make sense, because we are not giving the end-user the
conflicted state and asking to help resolve it for us.

But if a rerere database entry records a previous merge result in
which conflicts were resolved by the end user, it would make sense
to try reusing the resolution, I would think.  I offhand do not know
how involved it would be to do so, so punting on that is fine, but
that is "there is no point", but it is "we are not trying".

Perhaps

    When there are conflicts, theoretically, it would be nice to be
    able to replay an existing entry in the rerere database that
    records already resolved conflict that match the current one,
    but that would be too much work, so let's not try it for now.

would be a good explanation why we are not doing (i.e. we made a
trade-off) and recording that is important, as it will allow others
in the future to try building on the change we are proposing here
(it is not like we decided that it is fundamentally wrong to try to
use rerere in this situation).

> Signed-off-by: Jerry Zhang <jerry@skydio.com>
> Signed-off-by: Jerry Zhang <jerryxzha@googlemail.com>

Unless we are interacting with two people with the same name, please
sign-off with the same name/address as the name/address that will be
recorded as the author of this change.  I am guessing that dropping
the latter should be sufficient?

Thanks.
Jerry Zhang April 6, 2021, 2:52 a.m. UTC | #2
Thanks for the comments! I've updated v3 with the changes. Let me know
if you have any
more thoughts on whether to block / warn the user before clobbering their cache.

On Mon, Apr 5, 2021 at 3:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jerry Zhang <jerry@skydio.com> writes:
>
> > Subject: Re: [PATCH V2] git-apply: Allow simultaneous --cached and --3way options
>
> s/Allow/allow/ (cf. "git shortlog --no-merged" output for recent examples)
>
> > Previously, --cached and --3way were not
> > allowed to be used together, since --3way
> > wrote conflict markers into the working tree.
>
> Hint that you are talking about the "git apply" command by
> mentioning the name somewhere.
>
> Drop "previously"; we talk about the status quo in the present tense
> in our proposed commit log messages to set the stage, and then describe
> what the patch author percieves as a problem, before describing the
> proposed solution to the problem.
>
> cf. Documentation/SubmittingPatches[[describe-changes]] (the whole section)
>
> > These changes change semantics so that if
> > these flags are given together and there is
> > a conflict, the conflicting objects are left
> > at a higher order in the cache, and the command
> > will return non-zero. If there is no conflict,
> > the patch is applied directly to cache as
> > expected and the command will return 0.
>
> Give an order to the codebase to "be like so".  Here is my attempt.
>
>     Teach "git apply" to accept "--cached" and "--3way" at the same
>     time.  Only when all changes to all paths involved in the
>     application auto-resolve cleanly, the result is placed in the
>     index at stage #0 and the command exits with 0 status.  If there
>     is any path whose conflict cannot be cleanly auto-resolved, the
>     original contents from common ancestor (stage #1), our version
>     (stage #2) and the contents from the patch (stage #3) for the
>     conflicted paths are left at separate stages without any attempt
>     to resolve the conflict at the content level, and the command
>     exists with non-zero status, because there is no place (like the
>     working tree files) to leave a half-resolved conflicted merge
>     result to ask the end-user to resolve.
>
> > The user can use `git diff` to view the contents
> > of the conflict, or `git checkout -m -- .` to
> > regenerate the conflict markers in the working
> > directory.
>
> Nice.
>
> > With the combined --3way and --cached flags,
> > The conflict markers won't be written to the
> > working directory, so there is no point in
> > attempting rerere.
>
> I am not sure what this paragraph is trying to convey here.
>
> I agree that when a *new* conflict is encountered in this new mode,
> writing out a rerere pre-image, in preparation for accepting the
> post-image the end-user gives us after the conflicts are resolved,
> does not make sense, because we are not giving the end-user the
> conflicted state and asking to help resolve it for us.
>
> But if a rerere database entry records a previous merge result in
> which conflicts were resolved by the end user, it would make sense
> to try reusing the resolution, I would think.  I offhand do not know
> how involved it would be to do so, so punting on that is fine, but
> that is "there is no point", but it is "we are not trying".
>
> Perhaps
>
>     When there are conflicts, theoretically, it would be nice to be
>     able to replay an existing entry in the rerere database that
>     records already resolved conflict that match the current one,
>     but that would be too much work, so let's not try it for now.
>
> would be a good explanation why we are not doing (i.e. we made a
> trade-off) and recording that is important, as it will allow others
> in the future to try building on the change we are proposing here
> (it is not like we decided that it is fundamentally wrong to try to
> use rerere in this situation).
>
> > Signed-off-by: Jerry Zhang <jerry@skydio.com>
> > Signed-off-by: Jerry Zhang <jerryxzha@googlemail.com>
>
> Unless we are interacting with two people with the same name, please
> sign-off with the same name/address as the name/address that will be
> recorded as the author of this change.  I am guessing that dropping
> the latter should be sufficient?
>
> Thanks.
Junio C Hamano April 6, 2021, 5:52 a.m. UTC | #3
Jerry Zhang <jerry@skydio.com> writes:

> Thanks for the comments! I've updated v3 with the changes. Let me know
> if you have any
> more thoughts on whether to block / warn the user before clobbering their cache.

Please do not top-post on this list.

I've already said that I think we should ensure the index is clean
by default, because, unlike the case where the application is done
on the working tree files, the use of "--cached" is a sign that the
next step is likely to write a tree out.  As I've already said so in
earlier reviews, there is nothing more from me to add on that issue.

>> Give an order to the codebase to "be like so".  Here is my attempt.
>>
>>     Teach "git apply" to accept "--cached" and "--3way" at the same
>>     time.  Only when all changes to all paths involved in the
>>     application auto-resolve cleanly, the result is placed in the
>>     index at stage #0 and the command exits with 0 status.  If there
>>     is any path whose conflict cannot be cleanly auto-resolved, the
>>     original contents from common ancestor (stage #1), our version
>>     (stage #2) and the contents from the patch (stage #3) for the
>>     conflicted paths are left at separate stages without any attempt
>>     to resolve the conflict at the content level, and the command
>>     exists with non-zero status, because there is no place (like the
>>     working tree files) to leave a half-resolved conflicted merge
>>     result to ask the end-user to resolve.

I wrote the above as an example to illustrate the tone and the level
of details expected in our proposed commit log message.  The
behaviour it describes may not necessarily match what you have
implemented in the patch.

For example, imagine that we are applying a patch for two paths,
where one auto-resolves cleanly and the other does not.  The above
description expects both paths will leave the higher stages (instead
of recording the auto-resolved path at stage #0, and leaving the
other path that cannot be auto-resolved at higher stages) and the
command exits with non-zero status, which may not be what you
implemented.  As an illustration, I didn't necessarily mean such an
all-or-none behaviour wrt resolving should be what we implement---I
do not want to choose, as this is your itch and I want _you_ with
the itch to think long and hard before deciding what the best design
for end-users would be, and present it as a proposed solution.  An
obvious alternative is to record auto-resolved paths at stage #0 and
leave only the paths for which auto-resolution failed in conflicted
state.

Thanks.
Jerry Zhang April 6, 2021, 9:56 p.m. UTC | #4
On Mon, Apr 5, 2021 at 10:52 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jerry Zhang <jerry@skydio.com> writes:
>
> > Thanks for the comments! I've updated v3 with the changes. Let me know
> > if you have any
> > more thoughts on whether to block / warn the user before clobbering their cache.
>
> Please do not top-post on this list.
>
> I've already said that I think we should ensure the index is clean
> by default, because, unlike the case where the application is done
> on the working tree files, the use of "--cached" is a sign that the
> next step is likely to write a tree out.  As I've already said so in
> earlier reviews, there is nothing more from me to add on that issue.
Understood, but please bear with me to explain the risks a bit more. I'm
having some difficulty coming up with a name and explanation for flags
for this case, because I don't completely understand the safety issue
we are trying to mitigate.

Let me enumerate some behaviors in 3 different cases where the user
has "file.txt" changes staged in the index, so index differs from HEAD.

"git apply --cached" would either 1. combine the patch and cached version
and put that in the cache or 2. do nothing (patch failed). In 2 nothing happened
so the user's changes are safe. In 1 the user's changes may be gone, but
since the user was forewarned, this is presumably what they wanted.

"git apply --3way" would either 1. apply cleanly to working dir or
2. conflict, in which case user's changes would be moved to stage #2
in cache. For 1 the user's changes are in the cache, so they can check that out
to restore the original state, since this invocation requires the cache
and working dir to match. For 2, the user's changes are moved to cache
in stage #2. Although the changes are preserved, there doesn't seem to
be any atomic way to move a cache entry from stage #2 to stage #0.
Something like "git restore --staged --ours file.txt" seems like it should
work, but "git restore" doesn't allow combining those flags.
The non atomic way we can do is "git checkout --ours file.txt &&
git add file.txt", this is ok in this case since we've required the index
and working tree to match.

"git apply --3way --cached" would either 1. apply cleanly to the cache or
2. conflict, and the user's changes are moved to stage #2. In 1, the user's
changes are lost because they're combined with the patch, but this is
the same as the "--cached" case by itself. In 2, the user's changes are
preserved in stage #2 similar to "--3way" by itself. What's somewhat
tricky here is restoring it to stage #0 since we can't use the working
tree, but I think that is more of a limitation in "git restore", since moving
a cache entry from stage #2 to stage #0 is a conceptually possible and
simple operation.

In summary it seems to me that merge or no merge, the safety semantics
for "--3way" + "--cached" as it is are pretty similar to the existing semantics
for those options individually. The user could be preparing to write
a tree out in either the "--cached" or the "--cached --3way" operation
so I don't understand why those must differ in safety. In addition, the
both "--3way" and "--3way --cached" perform mergey operations that
changes the stages of a file in cache, so I don't understand why those
must differ in safety either.

>
> >> Give an order to the codebase to "be like so".  Here is my attempt.
> >>
> >>     Teach "git apply" to accept "--cached" and "--3way" at the same
> >>     time.  Only when all changes to all paths involved in the
> >>     application auto-resolve cleanly, the result is placed in the
> >>     index at stage #0 and the command exits with 0 status.  If there
> >>     is any path whose conflict cannot be cleanly auto-resolved, the
> >>     original contents from common ancestor (stage #1), our version
> >>     (stage #2) and the contents from the patch (stage #3) for the
> >>     conflicted paths are left at separate stages without any attempt
> >>     to resolve the conflict at the content level, and the command
> >>     exists with non-zero status, because there is no place (like the
> >>     working tree files) to leave a half-resolved conflicted merge
> >>     result to ask the end-user to resolve.
>
> I wrote the above as an example to illustrate the tone and the level
> of details expected in our proposed commit log message.  The
> behaviour it describes may not necessarily match what you have
> implemented in the patch.
>
> For example, imagine that we are applying a patch for two paths,
> where one auto-resolves cleanly and the other does not.  The above
> description expects both paths will leave the higher stages (instead
> of recording the auto-resolved path at stage #0, and leaving the
> other path that cannot be auto-resolved at higher stages) and the
> command exits with non-zero status, which may not be what you
> implemented.  As an illustration, I didn't necessarily mean such an
> all-or-none behaviour wrt resolving should be what we implement---I
> do not want to choose, as this is your itch and I want _you_ with
> the itch to think long and hard before deciding what the best design
> for end-users would be, and present it as a proposed solution.  An
> obvious alternative is to record auto-resolved paths at stage #0 and
> leave only the paths for which auto-resolution failed in conflicted
> state.
I missed the "all changes to all paths" requirement in that description,
I'll update it to be more consistent with what it actually does. As you say,
the leaving entries at higher orders behavior only happens for conflicting
paths, not for all paths.
>
> Thanks.
Jerry Zhang April 7, 2021, 2:25 a.m. UTC | #5
On Tue, Apr 6, 2021 at 2:56 PM Jerry Zhang <jerry@skydio.com> wrote:
>
> On Mon, Apr 5, 2021 at 10:52 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Jerry Zhang <jerry@skydio.com> writes:
> >
> > > Thanks for the comments! I've updated v3 with the changes. Let me know
> > > if you have any
> > > more thoughts on whether to block / warn the user before clobbering their cache.
> >
> > Please do not top-post on this list.
> >
> > I've already said that I think we should ensure the index is clean
> > by default, because, unlike the case where the application is done
> > on the working tree files, the use of "--cached" is a sign that the
> > next step is likely to write a tree out.  As I've already said so in
> > earlier reviews, there is nothing more from me to add on that issue.
> Understood, but please bear with me to explain the risks a bit more. I'm
> having some difficulty coming up with a name and explanation for flags
> for this case, because I don't completely understand the safety issue
> we are trying to mitigate.
>
> Let me enumerate some behaviors in 3 different cases where the user
> has "file.txt" changes staged in the index, so index differs from HEAD.
>
> "git apply --cached" would either 1. combine the patch and cached version
> and put that in the cache or 2. do nothing (patch failed). In 2 nothing happened
> so the user's changes are safe. In 1 the user's changes may be gone, but
> since the user was forewarned, this is presumably what they wanted.
>
> "git apply --3way" would either 1. apply cleanly to working dir or
> 2. conflict, in which case user's changes would be moved to stage #2
> in cache. For 1 the user's changes are in the cache, so they can check that out
> to restore the original state, since this invocation requires the cache
> and working dir to match. For 2, the user's changes are moved to cache
> in stage #2. Although the changes are preserved, there doesn't seem to
> be any atomic way to move a cache entry from stage #2 to stage #0.
> Something like "git restore --staged --ours file.txt" seems like it should
> work, but "git restore" doesn't allow combining those flags.
> The non atomic way we can do is "git checkout --ours file.txt &&
> git add file.txt", this is ok in this case since we've required the index
> and working tree to match.
>
> "git apply --3way --cached" would either 1. apply cleanly to the cache or
> 2. conflict, and the user's changes are moved to stage #2. In 1, the user's
> changes are lost because they're combined with the patch, but this is
> the same as the "--cached" case by itself. In 2, the user's changes are
> preserved in stage #2 similar to "--3way" by itself. What's somewhat
> tricky here is restoring it to stage #0 since we can't use the working
> tree, but I think that is more of a limitation in "git restore", since moving
> a cache entry from stage #2 to stage #0 is a conceptually possible and
> simple operation.
Update: I found out that this can be done through "git update-index"
Say you start with
```
100755 adc1032303de8d262868c0a1b85a00fa3b97e9d8 1 file.sh
100755 d0bf2e33594ea7d9d7352df5511db562de819518 2 file.sh
100755 e9bf5dfdea804f4f1bcf24988bbc7c3f99991a09 3 file.sh
```
Pass into "git update-index --index-info the following
```
100755 d0bf2e33594ea7d9d7352df5511db562de819518 0 file.sh
0 0 1 file.sh
0 0 2 file.sh
0 0 3 file.sh
```
and you will have atomically moved the "ours" stage of file.sh into
stage #0. This is sort of what I'd expect "git checkout --ours file.sh"
to do, but it doesn't seem to do that.
>
> In summary it seems to me that merge or no merge, the safety semantics
> for "--3way" + "--cached" as it is are pretty similar to the existing semantics
> for those options individually. The user could be preparing to write
> a tree out in either the "--cached" or the "--cached --3way" operation
> so I don't understand why those must differ in safety. In addition, the
> both "--3way" and "--3way --cached" perform mergey operations that
> changes the stages of a file in cache, so I don't understand why those
> must differ in safety either.
>
> >
> > >> Give an order to the codebase to "be like so".  Here is my attempt.
> > >>
> > >>     Teach "git apply" to accept "--cached" and "--3way" at the same
> > >>     time.  Only when all changes to all paths involved in the
> > >>     application auto-resolve cleanly, the result is placed in the
> > >>     index at stage #0 and the command exits with 0 status.  If there
> > >>     is any path whose conflict cannot be cleanly auto-resolved, the
> > >>     original contents from common ancestor (stage #1), our version
> > >>     (stage #2) and the contents from the patch (stage #3) for the
> > >>     conflicted paths are left at separate stages without any attempt
> > >>     to resolve the conflict at the content level, and the command
> > >>     exists with non-zero status, because there is no place (like the
> > >>     working tree files) to leave a half-resolved conflicted merge
> > >>     result to ask the end-user to resolve.
> >
> > I wrote the above as an example to illustrate the tone and the level
> > of details expected in our proposed commit log message.  The
> > behaviour it describes may not necessarily match what you have
> > implemented in the patch.
> >
> > For example, imagine that we are applying a patch for two paths,
> > where one auto-resolves cleanly and the other does not.  The above
> > description expects both paths will leave the higher stages (instead
> > of recording the auto-resolved path at stage #0, and leaving the
> > other path that cannot be auto-resolved at higher stages) and the
> > command exits with non-zero status, which may not be what you
> > implemented.  As an illustration, I didn't necessarily mean such an
> > all-or-none behaviour wrt resolving should be what we implement---I
> > do not want to choose, as this is your itch and I want _you_ with
> > the itch to think long and hard before deciding what the best design
> > for end-users would be, and present it as a proposed solution.  An
> > obvious alternative is to record auto-resolved paths at stage #0 and
> > leave only the paths for which auto-resolution failed in conflicted
> > state.
> I missed the "all changes to all paths" requirement in that description,
> I'll update it to be more consistent with what it actually does. As you say,
> the leaving entries at higher orders behavior only happens for conflicting
> paths, not for all paths.
> >
> > Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index 91d9a8601c..392882d9a5 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -89,7 +89,8 @@  OPTIONS
 	and we have those blobs available locally, possibly leaving the
 	conflict markers in the files in the working tree for the user to
 	resolve.  This option implies the `--index` option, and is incompatible
-	with the `--reject` and the `--cached` options.
+	with the `--reject` option. When used with the --cached option, any conflicts
+    are left at higher stages in the cache.
 
 --build-fake-ancestor=<file>::
 	Newer 'git diff' output has embedded 'index information'
diff --git a/apply.c b/apply.c
index 6695a931e9..e59c77a1b7 100644
--- a/apply.c
+++ b/apply.c
@@ -133,8 +133,6 @@  int check_apply_state(struct apply_state *state, int force_apply)
 
 	if (state->apply_with_reject && state->threeway)
 		return error(_("--reject and --3way cannot be used together."));
-	if (state->cached && state->threeway)
-		return error(_("--cached and --3way cannot be used together."));
 	if (state->threeway) {
 		if (is_not_gitdir)
 			return error(_("--3way outside a repository"));
@@ -4646,7 +4644,8 @@  static int write_out_results(struct apply_state *state, struct patch *list)
 		}
 		string_list_clear(&cpath, 0);
 
-		repo_rerere(state->repo, 0);
+		if (!state->cached)
+			repo_rerere(state->repo, 0);
 	}
 
 	return errs;