diff mbox series

[v2] Give git-pull a --reset option

Message ID 20190421040823.24821-1-alexhenrie24@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] Give git-pull a --reset option | expand

Commit Message

Alex Henrie April 21, 2019, 4:08 a.m. UTC
A common workflow is to make a commit on a local branch, push the branch
to the remote, check out the remote branch on a second computer, amend
the commit on the second computer, force-push back to the remote branch,
and finally submit a pull request. However, if the user switches back to
the first computer, they must then run the cumbersome command
`git fetch && git reset --hard origin`. (Actually, at this point Git
novices often try running `git pull --force`, but it doesn't do what
they expect.) This patch adds the shortcut `git pull --reset` to serve
as a complement to `git push --force`.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
---
 Documentation/git-pull.txt |  8 ++++++++
 builtin/pull.c             | 23 +++++++++++++++++++++++
 t/t5520-pull.sh            | 24 ++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

Comments

Junio C Hamano April 21, 2019, 5:38 a.m. UTC | #1
Alex Henrie <alexhenrie24@gmail.com> writes:

> A common workflow is to make a commit on a local branch, push the branch
> to the remote, check out the remote branch on a second computer, amend
> the commit on the second computer, force-push back to the remote branch,
> and finally submit a pull request. However, if the user switches back to
> the first computer, they must then run the cumbersome command
> `git fetch && git reset --hard origin`. 

Doesn't anybody sense there is a larger problem if such a workflow
is "common" in the first place?  In that sequence, when you come
back to the first repository there is no guarantee that what you are
losing is exactly what you are willing to lose and nothing else
(i.e. your earlier WIP you sent to the second repository, which was
further polished, making the earlier WIP you had here irrelevant).

If the last "recovery at the first repository" step were "pull --rebase",
at least you would realize that you have the earlier WIP locally
that either

    (1) conflicts with the more polished work that have been
        accepted at the upstream, in which case you can tell the
        rebase machinery to drop that earlier WIP _after_ making
        sure that it is only that stale WIP you not only are willing
        to but actively do want to lose that is getting discarded.

    (2) replays cleanly on top of the updated upstream, which hasn't
        accepted your pull request made from the second repository
        with the more polished version, in which case you'd realize
        that you may have to work on the topic further.  And you
        have a chance to inspect what you ended up with before using
        "reset --hard" or "rebase -i" to discard what you no longer
        need.

At least, I think the longhand this attempts to replace, "fetch"
followed by "reset --hard origin" is better because of two reasons.
It is more explicit that the procedure is destructive, and more
importantly, it can allow to have (and we should encourage users to
make a habit to have) an extra step to inspect what the user is
about to lose with "git log origin.." after "fetch" but before
"reset --hard".

So I have a moderately strong suspicion that "git pull --reset"
promotes a wrong workflow and should not exist.
Alex Henrie April 21, 2019, 7:01 a.m. UTC | #2
On Sat, Apr 20, 2019 at 11:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Alex Henrie <alexhenrie24@gmail.com> writes:
>
> > A common workflow is to make a commit on a local branch, push the branch
> > to the remote, check out the remote branch on a second computer, amend
> > the commit on the second computer, force-push back to the remote branch,
> > and finally submit a pull request. However, if the user switches back to
> > the first computer, they must then run the cumbersome command
> > `git fetch && git reset --hard origin`.
>
> Doesn't anybody sense there is a larger problem if such a workflow
> is "common" in the first place?  In that sequence, when you come
> back to the first repository there is no guarantee that what you are
> losing is exactly what you are willing to lose and nothing else
> (i.e. your earlier WIP you sent to the second repository, which was
> further polished, making the earlier WIP you had here irrelevant).

You may be right. On the other hand, you're expected to think about
what you're doing before running `git push --force` and clobbering a
remote branch. Similarly, you would be expected to think about what
you're doing before running `git pull --reset` and clobbering a local
branch. It's actually easier to recover from accidentally clobbering a
local branch than accidentally clobbering a remote branch because you
can use `git reflog` to find the lost commits.

> If the last "recovery at the first repository" step were "pull --rebase",
> at least you would realize that you have the earlier WIP locally
> that either
>
>     (1) conflicts with the more polished work that have been
>         accepted at the upstream, in which case you can tell the
>         rebase machinery to drop that earlier WIP _after_ making
>         sure that it is only that stale WIP you not only are willing
>         to but actively do want to lose that is getting discarded.
>
>     (2) replays cleanly on top of the updated upstream, which hasn't
>         accepted your pull request made from the second repository
>         with the more polished version, in which case you'd realize
>         that you may have to work on the topic further.  And you
>         have a chance to inspect what you ended up with before using
>         "reset --hard" or "rebase -i" to discard what you no longer
>         need.

I understand that `git pull --rebase` followed by `git rebase --skip`
is a safer workflow. I just feel like an option like `git pull
--reset` should be there for users who know what they're doing, just
like `git push --force` is available for users who know what they're
doing.

> At least, I think the longhand this attempts to replace, "fetch"
> followed by "reset --hard origin" is better because of two reasons.
> It is more explicit that the procedure is destructive, and more
> importantly, it can allow to have (and we should encourage users to
> make a habit to have) an extra step to inspect what the user is
> about to lose with "git log origin.." after "fetch" but before
> "reset --hard".

I'd be happy to emphasize the destructive nature of this option by
calling it `git pull --hard-reset` instead.

> So I have a moderately strong suspicion that "git pull --reset"
> promotes a wrong workflow and should not exist.

It'd be great to get some feedback from other Git users, but in the
end it's up to you and I trust your decision.

-Alex
Philip Oakley April 21, 2019, 1:53 p.m. UTC | #3
Hi Alex

On 21/04/2019 08:01, Alex Henrie wrote:
> On Sat, Apr 20, 2019 at 11:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Alex Henrie <alexhenrie24@gmail.com> writes:
>>
>>> A common workflow is to make a commit on a local branch, push the branch
>>> to the remote, check out the remote branch on a second computer, amend
>>> the commit on the second computer, force-push back to the remote branch,
>>> and finally submit a pull request. However, if the user switches back to
>>> the first computer, they must then run the cumbersome command
>>> `git fetch && git reset --hard origin`.
This will be quite a common occurrence especially for personal repos on 
one of the hosting sites (GitHub, GitLab, etc), so we know that we are 
the only user of that repo.

I've certainly used that style of technique for Windows vs Linux machine 
transfers of Git work (when my old hack linux machine is functional:-).
>> Doesn't anybody sense there is a larger problem if such a workflow
>> is "common" in the first place?  In that sequence, when you come
>> back to the first repository there is no guarantee that what you are
>> losing is exactly what you are willing to lose and nothing else
>> (i.e. your earlier WIP you sent to the second repository, which was
>> further polished, making the earlier WIP you had here irrelevant).
I'd agree that a public/joint repository could have issues if a user 
blindly assumes that no one has cooperated with them and added more 
commits, but that becomes a social issue (plus worthy of a documentation 
mention!).

There are still a few blind spots in the functionality of Git regarding 
how users interact with their hosting provider, which is trying to be 
(from a user perspective) both a backup and an independent repo (side 
discussion at Git Merge). This would appear to be one of those cases 
where one is collaborating with no-one (oneself), but from two machines.

It may just be that we need to put aside `pull` because of its old 
semantics, and start with a fresh command name for the 'fetch + sort 
stuff out' for these use cases.
> You may be right. On the other hand, you're expected to think about
> what you're doing before running `git push --force` and clobbering a
> remote branch. Similarly, you would be expected to think about what
> you're doing before running `git pull --reset` and clobbering a local
> branch. It's actually easier to recover from accidentally clobbering a
> local branch than accidentally clobbering a remote branch because you
> can use `git reflog` to find the lost commits.
>
>> If the last "recovery at the first repository" step were "pull --rebase",
>> at least you would realize that you have the earlier WIP locally
>> that either
>>
>>      (1) conflicts with the more polished work that have been
>>          accepted at the upstream, in which case you can tell the
>>          rebase machinery to drop that earlier WIP _after_ making
>>          sure that it is only that stale WIP you not only are willing
>>          to but actively do want to lose that is getting discarded.
>>
>>      (2) replays cleanly on top of the updated upstream, which hasn't
>>          accepted your pull request made from the second repository
>>          with the more polished version, in which case you'd realize
>>          that you may have to work on the topic further.  And you
>>          have a chance to inspect what you ended up with before using
>>          "reset --hard" or "rebase -i" to discard what you no longer
>>          need.
> I understand that `git pull --rebase` followed by `git rebase --skip`
> is a safer workflow. I just feel like an option like `git pull
> --reset` should be there for users who know what they're doing, just
> like `git push --force` is available for users who know what they're
> doing.
>
>> At least, I think the longhand this attempts to replace, "fetch"
>> followed by "reset --hard origin" is better because of two reasons.
>> It is more explicit that the procedure is destructive, and more
>> importantly, it can allow to have (and we should encourage users to
>> make a habit to have) an extra step to inspect what the user is
>> about to lose with "git log origin.." after "fetch" but before
>> "reset --hard".
> I'd be happy to emphasize the destructive nature of this option by
> calling it `git pull --hard-reset` instead.
>
>> So I have a moderately strong suspicion that "git pull --reset"
>> promotes a wrong workflow and should not exist.
> It'd be great to get some feedback from other Git users, but in the
> end it's up to you and I trust your decision.
>
> -Alex
I though it worth chiming in that folks do use these simple dumb use cases.
--
Philip
brian m. carlson April 21, 2019, 9:18 p.m. UTC | #4
On Sun, Apr 21, 2019 at 02:38:38PM +0900, Junio C Hamano wrote:
> Alex Henrie <alexhenrie24@gmail.com> writes:
> 
> > A common workflow is to make a commit on a local branch, push the branch
> > to the remote, check out the remote branch on a second computer, amend
> > the commit on the second computer, force-push back to the remote branch,
> > and finally submit a pull request. However, if the user switches back to
> > the first computer, they must then run the cumbersome command
> > `git fetch && git reset --hard origin`. 
> 
> Doesn't anybody sense there is a larger problem if such a workflow
> is "common" in the first place?  In that sequence, when you come
> back to the first repository there is no guarantee that what you are
> losing is exactly what you are willing to lose and nothing else
> (i.e. your earlier WIP you sent to the second repository, which was
> further polished, making the earlier WIP you had here irrelevant).

It may be helpful to point out that this is essentially the workflow I
had before I had only one computer. I would make some changes on my
desktop, which was my primary computer, then need to travel somewhere
and use my laptop. I would want to go back to my desktop when I returned
home, because it was far more powerful, and I would know that I was the
only user of this branch.

Now, as a contributor and a moderately advanced user, I would likely end
up using "git commit --fixup" (or "--squash") for this purpose and
squash only before submitting, but there are situations where fixup
commits cause conflicts and it's necessary to do a rebase and force push
if you don't want extensive pain.

So while I think that "git pull --rebase" or "git pull --ff-only" are
probably better options in most situations, I can see the use in this
command, with appropriate foresight and knowledge. I can also see how
it's easy to blow away data with the proposed option, especially for
novice users, who are likely not aware of how to restore it from the
reflog.

I'm not sure if this email is an argument for or against this option,
but maybe it provides some helpful perspective.
Junio C Hamano April 21, 2019, 11:38 p.m. UTC | #5
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> It may be helpful to point out that this is essentially the workflow I
> had ...
> I'm not sure if this email is an argument for or against this option,
> but maybe it provides some helpful perspective.

I think you and Phillip misread me.

I did not question if the workflow is common the message you are
responding to.  I've done my fair share of "I know what I have on my
laptop is stale as I pushed it out to elsewhere to continue working
on it, so let's get rid of it from the updated upstream and start
afresh" myself.

What I questioned was if it is sensible to ensure that it stays
common.

We'd be encouraging the dangerous workflow, instead of analysing the
situation where people employ it and trying to give them a better
alternative to deal with the situation.  That is what we'd be doing
with "pull --reset", i.e. an easier to type short-hand that robs the
chance to double check in the middle of "fetch && inspect && reset"
sequence.

As to where the feature should go, if we decide that it is a good
idea to promote this workflow element as a supported feature, I
agree with Alex's design in the patch that it makes sense to have it
as "pull --reset".  Existing "pull --merge" and "pull -rebase" are
"fetch and then integrate by merging theirs into ours" and "fetch
and then integrate by rebasing our stuff on top of theirs"; the new
"pull --reset" would fit well sitting next to them with its "fetch
and then treat our stuff as valueless and replace it with theirs"
semantics.
Phillip Wood April 22, 2019, 3:14 p.m. UTC | #6
On 22/04/2019 00:38, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
>> It may be helpful to point out that this is essentially the workflow I
>> had ...
>> I'm not sure if this email is an argument for or against this option,
>> but maybe it provides some helpful perspective.
> 
> I think you and Phillip misread me.
> 
> I did not question if the workflow is common the message you are
> responding to.  I've done my fair share of "I know what I have on my
> laptop is stale as I pushed it out to elsewhere to continue working
> on it, so let's get rid of it from the updated upstream and start
> afresh" myself.
> 
> What I questioned was if it is sensible to ensure that it stays
> common.
> 
> We'd be encouraging the dangerous workflow, instead of analysing the
> situation where people employ it and trying to give them a better
> alternative to deal with the situation. 

I agree that it would be better to support this workflow in a safer 
manner. I use a script that looks at the reflog of the local and remote 
branches to work out if it is safe to reset --hard when pulling. If 
there are local changes since the last push/pull (because I forget to 
sync before starting work) then it will run rebase --onto <new-remote> 
<last-sync>. If the remote branch and local branches have both been 
rewritten it tells me to run range-diff to figure out how to fix it.

The value of <last-sync> is the most recent push or pull that updated 
the local branch. i.e. when both the local and remote refs pointed to 
the same commit at the same time (rebase complicates this as the remote 
we rebased onto does not appear in directly the local reflog - you have 
to look for "rebase .*finish.* <local-ref> <remote-oid>". Also an ide 
might auto fetch and update the remote ref while we're rebasing the 
local work which stops the reflog coinciding exactly.) The <last-sync> 
is also useful for making forced pushes safer. The script pushes with 
--force-with-lease=<ref>:<last-sync> if the local branch has been 
rewritten since the last sync so that any remote changes since the last 
sync are not overwritten.

Using the reflog to figure out the last sync makes forced pushes and 
pulls much safer. If we want to support this workflow in core git (and 
it does seem to be fairly common) then I think it needs to be based on 
something that prevents people from accidentally throwing away local 
changes when pulling and remote changes when pushing. There was some 
discussion of this with respect to pushing last summer[1]

Best Wishes

Phillip

[1] 
https://public-inbox.org/git/CAEFop40OJ5MRwM8zxE44yB0f2Fxw9YsUdM1e-H=Nn9e=sAGJ=w@mail.gmail.com/T/#u

  That is what we'd be doing
> with "pull --reset", i.e. an easier to type short-hand that robs the
> chance to double check in the middle of "fetch && inspect && reset"
> sequence.
> 
> As to where the feature should go, if we decide that it is a good
> idea to promote this workflow element as a supported feature, I
> agree with Alex's design in the patch that it makes sense to have it
> as "pull --reset".  Existing "pull --merge" and "pull -rebase" are
> "fetch and then integrate by merging theirs into ours" and "fetch
> and then integrate by rebasing our stuff on top of theirs"; the new
> "pull --reset" would fit well sitting next to them with its "fetch
> and then treat our stuff as valueless and replace it with theirs"
> semantics.
>
diff mbox series

Patch

diff --git a/Documentation/git-pull.txt b/Documentation/git-pull.txt
index 118d9d86f7..032a5c2e34 100644
--- a/Documentation/git-pull.txt
+++ b/Documentation/git-pull.txt
@@ -23,6 +23,7 @@  More precisely, 'git pull' runs 'git fetch' with the given
 parameters and calls 'git merge' to merge the retrieved branch
 heads into the current branch.
 With `--rebase`, it runs 'git rebase' instead of 'git merge'.
+With `--reset`, it runs `git reset --hard` instead of 'git merge'.
 
 <repository> should be the name of a remote repository as
 passed to linkgit:git-fetch[1].  <refspec> can name an
@@ -141,6 +142,13 @@  unless you have read linkgit:git-rebase[1] carefully.
 +
 This option is only valid when "--rebase" is used.
 
+--reset::
+	Reset the local branch to be identical to the remote branch, discarding
+	any local commits or other changes.
+
+--no-reset::
+	Override earlier --reset.
+
 Options related to fetching
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/builtin/pull.c b/builtin/pull.c
index 33db889955..b32134c1f1 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -95,6 +95,7 @@  static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 
 /* Options passed to git-merge or git-rebase */
 static enum rebase_type opt_rebase = -1;
+static char *opt_reset;
 static char *opt_diffstat;
 static char *opt_log;
 static char *opt_signoff;
@@ -144,6 +145,9 @@  static struct option pull_options[] = {
 	  "(false|true|merges|preserve|interactive)",
 	  N_("incorporate changes by rebasing rather than merging"),
 	  PARSE_OPT_OPTARG, parse_opt_rebase },
+	OPT_PASSTHRU(0, "reset", &opt_reset, NULL,
+		N_("discard all local changes rather than merging"),
+		PARSE_OPT_NOARG | PARSE_OPT_NONEG),
 	OPT_PASSTHRU('n', NULL, &opt_diffstat, NULL,
 		N_("do not show a diffstat at the end of the merge"),
 		PARSE_OPT_NOARG | PARSE_OPT_NONEG),
@@ -860,6 +864,16 @@  static int run_rebase(const struct object_id *curr_head,
 	return ret;
 }
 
+/**
+ * Runs git-reset, returning its exit status.
+ */
+static int run_reset(void)
+{
+	static const char *argv[] = { "reset", "--hard", "FETCH_HEAD", NULL };
+
+	return run_command_v_opt(argv, RUN_GIT_CMD);
+}
+
 int cmd_pull(int argc, const char **argv, const char *prefix)
 {
 	const char *repo, **refspecs;
@@ -892,6 +906,9 @@  int cmd_pull(int argc, const char **argv, const char *prefix)
 	if (get_oid("HEAD", &orig_head))
 		oidclr(&orig_head);
 
+	if (opt_rebase && opt_reset)
+		die(_("--rebase and --reset are mutually exclusive"));
+
 	if (!opt_rebase && opt_autostash != -1)
 		die(_("--[no-]autostash option is only valid with --rebase."));
 
@@ -986,6 +1003,12 @@  int cmd_pull(int argc, const char **argv, const char *prefix)
 			ret = rebase_submodules();
 
 		return ret;
+	} else if (opt_reset) {
+		int ret = run_reset();
+		if (!ret && (recurse_submodules == RECURSE_SUBMODULES_ON ||
+			     recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND))
+			ret = update_submodules();
+		return ret;
 	} else {
 		int ret = run_merge();
 		if (!ret && (recurse_submodules == RECURSE_SUBMODULES_ON ||
diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh
index cf4cc32fd0..597f2429d5 100755
--- a/t/t5520-pull.sh
+++ b/t/t5520-pull.sh
@@ -708,4 +708,28 @@  test_expect_success 'git pull --rebase against local branch' '
 	test file = "$(cat file2)"
 '
 
+test_expect_success 'git pull --rebase --reset is an invalid combination' '
+	test_must_fail git pull --rebase --reset .
+'
+
+test_expect_success 'git pull --reset overwrites or deletes all tracked files' '
+	git init cloned &&
+	(
+		cd cloned &&
+		echo committed > committed &&
+		echo staged > staged &&
+		echo untracked > file &&
+		echo untracked > untracked &&
+		git add committed &&
+		git commit -m original &&
+		git add staged &&
+		git pull --reset .. &&
+		test ! -f committed &&
+		test ! -f staged &&
+		test "$(cat untracked)" = "untracked" &&
+		test "$(git status --porcelain)" = "?? untracked"
+	) &&
+	test_cmp file cloned/file
+'
+
 test_done