diff mbox series

[1/1] git-apply: Allow simultaneous --cached and --3way options

Message ID 20210403013410.32064-2-jerry@skydio.com (mailing list archive)
State Superseded
Headers show
Series git-apply: Allow simultaneous --cached and --3way options | expand

Commit Message

Jerry Zhang April 3, 2021, 1:34 a.m. UTC
Previously, --cached and --3way were not
allowed to be used together, since --3way
wrote conflict markers into the working tree.

These changes change semantics so that if
these flags are given together and there is
a conflict, the conflict markers are added
directly to cache. If there is no conflict,
the patch is applied directly to cache as
expected.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Jerry Zhang <jerryxzha@googlemail.com>
---
 Documentation/git-apply.txt |  4 +++-
 apply.c                     | 13 +++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

Comments

Elijah Newren April 3, 2021, 3:46 a.m. UTC | #1
I'm not that familiar with apply.c, but let me attempt to take a look...

On Fri, Apr 2, 2021 at 6:36 PM Jerry Zhang <jerry@skydio.com> wrote:
>
> Previously, --cached and --3way were not
> allowed to be used together, since --3way
> wrote conflict markers into the working tree.
>
> These changes change semantics so that if
> these flags are given together and there is
> a conflict, the conflict markers are added
> directly to cache. If there is no conflict,
> the patch is applied directly to cache as
> expected.
>
> Signed-off-by: Jerry Zhang <jerry@skydio.com>
> Signed-off-by: Jerry Zhang <jerryxzha@googlemail.com>
> ---
>  Documentation/git-apply.txt |  4 +++-
>  apply.c                     | 13 +++++++------
>  2 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
> index 91d9a8601c..3dc0085066 100644
> --- a/Documentation/git-apply.txt
> +++ b/Documentation/git-apply.txt
> @@ -89,7 +89,9 @@ OPTIONS
>         and we have those blobs available locally, possibly leaving the
>         conflict markers in the files in the working tree for the user to
>         resolve.  This option implies the `--index` option, and is incompatible
> -       with the `--reject` and the `--cached` options.
> +       with the `--reject` option. When used with the --cached option, any
> +       conflict markers are added directly to the cache rather than the
> +       working tree.
>
>  --build-fake-ancestor=<file>::
>         Newer 'git diff' output has embedded 'index information'
> diff --git a/apply.c b/apply.c
> index 6695a931e9..fc94ca0e99 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -133,8 +133,6 @@ int check_apply_state(struct apply_state *state, int force_apply)
>
>         if (state->apply_with_reject && state->threeway)
>                 return error(_("--reject and --3way cannot be used together."));
> -       if (state->cached && state->threeway)
> -               return error(_("--cached and --3way cannot be used together."));
>         if (state->threeway) {
>                 if (is_not_gitdir)
>                         return error(_("--3way outside a repository"));
> @@ -4490,13 +4488,16 @@ static int create_file(struct apply_state *state, struct patch *patch)
>
>         if (!mode)
>                 mode = S_IFREG | 0644;
> -       if (create_one_file(state, path, mode, buf, size))
> -               return -1;
> +       if (!state->cached) {

Why add this check?  create_one_file() already has an early return if
state->cached is true.

> +               if (create_one_file(state, path, mode, buf, size))
> +                       return -1;
> +       }
>
> -       if (patch->conflicted_threeway)
> +       if (patch->conflicted_threeway && !state->cached)
>                 return add_conflicted_stages_file(state, patch);
> -       else if (state->update_index)
> +       else if (state->update_index) {
>                 return add_index_file(state, path, mode, buf, size);

So if something had conflicts, you ignore the various conflicted
modes, and just add it to the index as it stands.  What if it was
deleted upstream and modified locally?  Doesn't that just ignore the
conflict, make it invisible to the user, and add the locally modified
version?  Similarly if it was renamed upstream and modified locally,
doesn't that end up in both files being present?  And if there's a
directory/file conflict, due to the lack of ADD_CACHE_SKIP_DFCHECK (or
whatever it's called), the add is just going to fail, but perhaps
that's the most reasonable case as it'd print an error message and
return -1, I think.

Again, I didn't test any of this out and I'm not so familiar with this
code, so I'm guessing at these scenarios.  If I'm wrong about how this
works, the commit message probably deserves an explanation about why
they work, and we'd definitely need a few testcases for these types of
scenarios.  If I'm right, the current implementation is problematic at
least if not the idea of using these options together.
Junio C Hamano April 3, 2021, 4:26 a.m. UTC | #2
Elijah Newren <newren@gmail.com> writes:

> I'm not that familiar with apply.c, but let me attempt to take a look...

I am (well, at least I was the one who invented 3way and added the
"index" line to the diff output format) ;-)

> scenarios.  If I'm right, the current implementation is problematic at
> least if not the idea of using these options together.

Yes, the conflicted case cannot sanely be handled _without_ leaving
higher stage entries in the index, and that is exactly why I made it
incompatible with the "--cached" mode.

It might be OK to only allow the combination when everything auto
resolves cleanly and fail the operation without touching either the
index or the working tree.  Pretending there was no delete/modify
conflicts or adding contents with unresolved conflicts as if nothing
bad happened as stage 0 entries would never be acceptable.

Perhaps

 * Error out if the index does not match HEAD.

 * Try applying to the contents in the index.  If there are any
   structural conflicts, leave these paths at higher stage and do
   not touch their contents.

 * For paths without structural conflict but need content merges,
   attempt ll-merge of the contents.  If autoresolves cleanly,
   register the result at stage 0.  Otherwise, discard the failed
   conflicted merge, and leave stages 1, 2 and 3 as they are.

 * Exit with status 0 if and only if everything has resolved
   cleanly.  Otherwise, exit with non-zero status.

would be the minimally-acceptably-safe behaviour.
Junio C Hamano April 4, 2021, 1:02 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> It might be OK to only allow the combination when everything auto
> resolves cleanly and fail the operation without touching either the
> index or the working tree.  Pretending there was no delete/modify
> conflicts or adding contents with unresolved conflicts as if nothing
> bad happened as stage 0 entries would never be acceptable.
>
> Perhaps
>
>  * Error out if the index does not match HEAD.
>
>  * Try applying to the contents in the index.  If there are any
>    structural conflicts, leave these paths at higher stage and do
>    not touch their contents.
>
>  * For paths without structural conflict but need content merges,
>    attempt ll-merge of the contents.  If autoresolves cleanly,
>    register the result at stage 0.  Otherwise, discard the failed
>    conflicted merge, and leave stages 1, 2 and 3 as they are.
>
>  * Exit with status 0 if and only if everything has resolved
>    cleanly.  Otherwise, exit with non-zero status.
>
> would be the minimally-acceptably-safe behaviour.

Note that, while a lot unsatisfactory than the above, the following
would also be acceptable.

  * Error out if the index does not match HEAD.

  * Try applying to the contents in the index.  If there are any
    structural conflicts, abort without touching the index (or the
    working tree --- but that is best left unsaid as we all know we
    are talking about '--cached').

  * For paths without structural conflict but need content merges,
    attempt ll-merge of the contents.  If ALL SUCh PATHS autoresolve
    cleanly, register their result at stage 0.  Otherwise, abort
    without touching the index (or the working tree).

  * Exit with status 0 if and only if everything has resolved
    cleanly.  Otherwise, exit with non-zero status (and never touch
    the index or the working tree).

The version I earlier gave would give a good starting point to
manually resolve the conflicts in the index and when resolved fully,
it is safely recorded as the result of applying the patch on top of
HEAD, because the non-final results are all in higher stages, and
all the paths at stage 0 are either from the HEAD and unaffected by
the merge, or the ones that cleanly resolved.  The "the index must
match HEAD" upfront is to ensure that.  Otherwise it would make it
very tempting, after spending all that time to resolve the conflicts
only in the higher stages of the index, to commit the index as-is to
make a child commit of HEAD and record that it is the result of
applying the patch.  But if the starting condition had a change
unrelated to the change the patch brings in already in the index,
the resulting commit would be _more_ than what the patch did to the
codebase.

The simplified version would let the user proceed only when the
conflicts can mechanically resolved, but it still has the "make sure
what is recorded is only from the incoming patch" safety.

Of course, if the user is trying to cherry-pick parts of multiple
patches and combine them to create a new single commit, the second
and subsequent applycation of the patches would be thwarted by the
"the index must match HEAD" rule, but it is far safer to make each
step into its own snapshot commit during such a workflow to combine
multiple patch pieces and then squash them together after finishing,
than carrying an intermediate result only in the index and risk
losing work you did in the previous step(s) to incorrect resolution
in later step(s).
Jerry Zhang April 5, 2021, 10:08 p.m. UTC | #4
On Fri, Apr 2, 2021 at 8:46 PM Elijah Newren <newren@gmail.com> wrote:
>
> I'm not that familiar with apply.c, but let me attempt to take a look...
>
> On Fri, Apr 2, 2021 at 6:36 PM Jerry Zhang <jerry@skydio.com> wrote:
> >
> > Previously, --cached and --3way were not
> > allowed to be used together, since --3way
> > wrote conflict markers into the working tree.
> >
> > These changes change semantics so that if
> > these flags are given together and there is
> > a conflict, the conflict markers are added
> > directly to cache. If there is no conflict,
> > the patch is applied directly to cache as
> > expected.
> >
> > Signed-off-by: Jerry Zhang <jerry@skydio.com>
> > Signed-off-by: Jerry Zhang <jerryxzha@googlemail.com>
> > ---
> >  Documentation/git-apply.txt |  4 +++-
> >  apply.c                     | 13 +++++++------
> >  2 files changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
> > index 91d9a8601c..3dc0085066 100644
> > --- a/Documentation/git-apply.txt
> > +++ b/Documentation/git-apply.txt
> > @@ -89,7 +89,9 @@ OPTIONS
> >         and we have those blobs available locally, possibly leaving the
> >         conflict markers in the files in the working tree for the user to
> >         resolve.  This option implies the `--index` option, and is incompatible
> > -       with the `--reject` and the `--cached` options.
> > +       with the `--reject` option. When used with the --cached option, any
> > +       conflict markers are added directly to the cache rather than the
> > +       working tree.
> >
> >  --build-fake-ancestor=<file>::
> >         Newer 'git diff' output has embedded 'index information'
> > diff --git a/apply.c b/apply.c
> > index 6695a931e9..fc94ca0e99 100644
> > --- a/apply.c
> > +++ b/apply.c
> > @@ -133,8 +133,6 @@ int check_apply_state(struct apply_state *state, int force_apply)
> >
> >         if (state->apply_with_reject && state->threeway)
> >                 return error(_("--reject and --3way cannot be used together."));
> > -       if (state->cached && state->threeway)
> > -               return error(_("--cached and --3way cannot be used together."));
> >         if (state->threeway) {
> >                 if (is_not_gitdir)
> >                         return error(_("--3way outside a repository"));
> > @@ -4490,13 +4488,16 @@ static int create_file(struct apply_state *state, struct patch *patch)
> >
> >         if (!mode)
> >                 mode = S_IFREG | 0644;
> > -       if (create_one_file(state, path, mode, buf, size))
> > -               return -1;
> > +       if (!state->cached) {
>
> Why add this check?  create_one_file() already has an early return if
> state->cached is true.
>
removed in latest patch
> > +               if (create_one_file(state, path, mode, buf, size))
> > +                       return -1;
> > +       }
> >
> > -       if (patch->conflicted_threeway)
> > +       if (patch->conflicted_threeway && !state->cached)
> >                 return add_conflicted_stages_file(state, patch);
> > -       else if (state->update_index)
> > +       else if (state->update_index) {
> >                 return add_index_file(state, path, mode, buf, size);
>
> So if something had conflicts, you ignore the various conflicted
> modes, and just add it to the index as it stands.  What if it was
> deleted upstream and modified locally?  Doesn't that just ignore the
> conflict, make it invisible to the user, and add the locally modified
> version?  Similarly if it was renamed upstream and modified locally,
> doesn't that end up in both files being present?  And if there's a
> directory/file conflict, due to the lack of ADD_CACHE_SKIP_DFCHECK (or
> whatever it's called), the add is just going to fail, but perhaps
> that's the most reasonable case as it'd print an error message and
> return -1, I think.
>
> Again, I didn't test any of this out and I'm not so familiar with this
> code, so I'm guessing at these scenarios.  If I'm wrong about how this
> works, the commit message probably deserves an explanation about why
> they work, and we'd definitely need a few testcases for these types of
> scenarios.  If I'm right, the current implementation is problematic at
> least if not the idea of using these options together.
Echoing my other reply, I think I will give up on the conflict markers and
just leave the files at the higher stages. I think that will address any
safety concerns since that's what --3way does without the cached
flag
Jerry Zhang April 5, 2021, 10:12 p.m. UTC | #5
On Sat, Apr 3, 2021 at 6:02 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > It might be OK to only allow the combination when everything auto
> > resolves cleanly and fail the operation without touching either the
> > index or the working tree.  Pretending there was no delete/modify
> > conflicts or adding contents with unresolved conflicts as if nothing
> > bad happened as stage 0 entries would never be acceptable.
Yep I've changed it to just leave the higher stages in the index
> >
> > Perhaps
> >
> >  * Error out if the index does not match HEAD.
> >
> >  * Try applying to the contents in the index.  If there are any
> >    structural conflicts, leave these paths at higher stage and do
> >    not touch their contents.
> >
> >  * For paths without structural conflict but need content merges,
> >    attempt ll-merge of the contents.  If autoresolves cleanly,
> >    register the result at stage 0.  Otherwise, discard the failed
> >    conflicted merge, and leave stages 1, 2 and 3 as they are.
> >
> >  * Exit with status 0 if and only if everything has resolved
> >    cleanly.  Otherwise, exit with non-zero status.
> >
> > would be the minimally-acceptably-safe behaviour.
>
> Note that, while a lot unsatisfactory than the above, the following
> would also be acceptable.
>
>   * Error out if the index does not match HEAD.
>
>   * Try applying to the contents in the index.  If there are any
>     structural conflicts, abort without touching the index (or the
>     working tree --- but that is best left unsaid as we all know we
>     are talking about '--cached').
>
>   * For paths without structural conflict but need content merges,
>     attempt ll-merge of the contents.  If ALL SUCh PATHS autoresolve
>     cleanly, register their result at stage 0.  Otherwise, abort
>     without touching the index (or the working tree).
>
>   * Exit with status 0 if and only if everything has resolved
>     cleanly.  Otherwise, exit with non-zero status (and never touch
>     the index or the working tree).
>
> The version I earlier gave would give a good starting point to
> manually resolve the conflicts in the index and when resolved fully,
> it is safely recorded as the result of applying the patch on top of
> HEAD, because the non-final results are all in higher stages, and
> all the paths at stage 0 are either from the HEAD and unaffected by
> the merge, or the ones that cleanly resolved.  The "the index must
> match HEAD" upfront is to ensure that.  Otherwise it would make it
> very tempting, after spending all that time to resolve the conflicts
> only in the higher stages of the index, to commit the index as-is to
> make a child commit of HEAD and record that it is the result of
> applying the patch.  But if the starting condition had a change
> unrelated to the change the patch brings in already in the index,
> the resulting commit would be _more_ than what the patch did to the
> codebase.
I can see what you mean about the user safety issue. However,
my specific use case (see cover letter) involves an index that does not
match HEAD, and wouldn't be possible at all if we forced the index to
match HEAD. Furthermore git-apply --cached even without --3way
doesn't force the index to match HEAD either, so why force it now?

>
> The simplified version would let the user proceed only when the
> conflicts can mechanically resolved, but it still has the "make sure
> what is recorded is only from the incoming patch" safety.
>
> Of course, if the user is trying to cherry-pick parts of multiple
> patches and combine them to create a new single commit, the second
> and subsequent applycation of the patches would be thwarted by the
> "the index must match HEAD" rule, but it is far safer to make each
> step into its own snapshot commit during such a workflow to combine
> multiple patch pieces and then squash them together after finishing,
> than carrying an intermediate result only in the index and risk
> losing work you did in the previous step(s) to incorrect resolution
> in later step(s).
Junio C Hamano April 5, 2021, 10:23 p.m. UTC | #6
Jerry Zhang <jerry@skydio.com> writes:

> I can see what you mean about the user safety issue. However,
> my specific use case (see cover letter) involves an index that does not
> match HEAD, and wouldn't be possible at all if we forced the index to
> match HEAD. Furthermore git-apply --cached even without --3way
> doesn't force the index to match HEAD either, so why force it now?

Primarily because we tend to be extra careful before mergy operation
than any other operation.  Especially without --3way, apply (with or
without --cached/--index) is extra careful to make itself all-or-none
operation to be safe, so that there is no mixed mess that requires
manual intervention (which would further increase the risk of mistakes).

It is OK to introduce a new option to allow a dirty index, and your
tool can pass that option when it calls "apply --cached --3way", but
it would be safe to require a clean index (it does not matter how
dirty the working tree is ;-) by default.
Jerry Zhang April 5, 2021, 11:29 p.m. UTC | #7
On Mon, Apr 5, 2021 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jerry Zhang <jerry@skydio.com> writes:
>
> > I can see what you mean about the user safety issue. However,
> > my specific use case (see cover letter) involves an index that does not
> > match HEAD, and wouldn't be possible at all if we forced the index to
> > match HEAD. Furthermore git-apply --cached even without --3way
> > doesn't force the index to match HEAD either, so why force it now?
>
> Primarily because we tend to be extra careful before mergy operation
> than any other operation.  Especially without --3way, apply (with or
> without --cached/--index) is extra careful to make itself all-or-none
> operation to be safe, so that there is no mixed mess that requires
> manual intervention (which would further increase the risk of mistakes).
>
> It is OK to introduce a new option to allow a dirty index, and your
> tool can pass that option when it calls "apply --cached --3way", but
> it would be safe to require a clean index (it does not matter how
> dirty the working tree is ;-) by default.
>
Sure adding the staged files will definitely clobber whatever the user
had in the cache at stage 0. This will probably be unexpected. But
the normal invocation of --3way also does this without warning, since
it touches the cache as well. It just seems odd to me to be adding a safety
check on some paths that aren't there on other very similar ones. Maybe
another option would be to add a very stern warning for users of --3way?

Unrelatedly would you have context on why --3way falls back on 3way
rather than trying 3way first then falling back on apply_fragments if
blobs don't exist? I see some cases where the normal patch application
will succeed but apply the patch incorrectly, while 3way will apply the
patch correctly. In these cases it's impossible for the user to force 3way.
Are there downsides to 3way that aren't solved by falling back on
apply_fragments?
Junio C Hamano April 6, 2021, 12:10 a.m. UTC | #8
Jerry Zhang <jerry@skydio.com> writes:

> Unrelatedly would you have context on why --3way falls back on 3way
> rather than trying 3way first then falling back on apply_fragments if
> blobs don't exist?

Historical accident, following the order in which these features
were invented, plus applying a single patch straight has been faster
than doing a 3-way.

I tend to agree with what you are hinting at, though, as I do not
offhand think of a situation in which a successful 3way merge would
be less correct than a straight patch application, and if the user
explicitly has told us to do "--3way", that is a sign that it is
acceptable to try 3way first even if it costs more cycles.

Those who has been giving "--3way" from inertia would notice if the
performance difference is large enough and may complain, though.
diff mbox series

Patch

diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index 91d9a8601c..3dc0085066 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -89,7 +89,9 @@  OPTIONS
 	and we have those blobs available locally, possibly leaving the
 	conflict markers in the files in the working tree for the user to
 	resolve.  This option implies the `--index` option, and is incompatible
-	with the `--reject` and the `--cached` options.
+	with the `--reject` option. When used with the --cached option, any
+	conflict markers are added directly to the cache rather than the
+	working tree.
 
 --build-fake-ancestor=<file>::
 	Newer 'git diff' output has embedded 'index information'
diff --git a/apply.c b/apply.c
index 6695a931e9..fc94ca0e99 100644
--- a/apply.c
+++ b/apply.c
@@ -133,8 +133,6 @@  int check_apply_state(struct apply_state *state, int force_apply)
 
 	if (state->apply_with_reject && state->threeway)
 		return error(_("--reject and --3way cannot be used together."));
-	if (state->cached && state->threeway)
-		return error(_("--cached and --3way cannot be used together."));
 	if (state->threeway) {
 		if (is_not_gitdir)
 			return error(_("--3way outside a repository"));
@@ -4490,13 +4488,16 @@  static int create_file(struct apply_state *state, struct patch *patch)
 
 	if (!mode)
 		mode = S_IFREG | 0644;
-	if (create_one_file(state, path, mode, buf, size))
-		return -1;
+	if (!state->cached) {
+		if (create_one_file(state, path, mode, buf, size))
+			return -1;
+	}
 
-	if (patch->conflicted_threeway)
+	if (patch->conflicted_threeway && !state->cached)
 		return add_conflicted_stages_file(state, patch);
-	else if (state->update_index)
+	else if (state->update_index) {
 		return add_index_file(state, path, mode, buf, size);
+	}
 	return 0;
 }