doc: revisions: improve single range explanation

Message ID	20210613004434.10278-1-felipe.contreras@gmail.com (mailing list archive)
State	New, archived
Headers	Return-Path: <git-owner@kernel.org>
	From: Felipe Contreras <felipe.contreras@gmail.com>
	To: git@vger.kernel.org
	Cc: Bagas Sanjaya <bagasdotme@gmail.com>, Elijah Newren <newren@gmail.com>, Eric Sunshine <sunshine@sunshineco.com>, Junio C Hamano <gitster@pobox.com>, Felipe Contreras <felipe.contreras@gmail.com>
	Subject: [PATCH] doc: revisions: improve single range explanation
	Date: Sat, 12 Jun 2021 19:44:34 -0500
	Message-Id: <20210613004434.10278-1-felipe.contreras@gmail.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	doc: revisions: improve single range explanation

Felipe Contreras June 13, 2021, 12:44 a.m. UTC

The original explanation didn't seem clear enough to some people.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---
 Documentation/revisions.txt | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Bagas Sanjaya June 13, 2021, 2:50 a.m. UTC | #1

Hi,

On 13/06/21 07.44, Felipe Contreras wrote:
> The original explanation didn't seem clear enough to some people.
> 
> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> ---
>   Documentation/revisions.txt | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> index f5f17b65a1..d8cf512686 100644
> --- a/Documentation/revisions.txt
> +++ b/Documentation/revisions.txt
> @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
>   
>   Commands that are specifically designed to take two distinct ranges
>   (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but
> -they are exceptions.  Unless otherwise noted, all "git" commands
> +they are exceptions.  Unless otherwise noted, all git commands
>   that operate on a set of commits work on a single revision range.
> -In other words, writing two "two-dot range notation" next to each
> -other, e.g.
>   
> -    $ git log A..B C..D
> +For example, if you have a linear history like this:
>   
> -does *not* specify two revision ranges for most commands.  Instead
> -it will name a single connected set of commits, i.e. those that are
> -reachable from either B or D but are reachable from neither A or C.
> -In a linear history like this:
> +    ---A---B---C---D---E---F
>   
> -    ---A---B---o---o---C---D
> +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> +commits, but doing A..F B..E will not retrieve two revision ranges
> +totalling 8 commits. Instead the starting point A gets overriden by B,
> +and the ending point of E by F, effectively becoming B..F, a single
> +revision range.
> 


AFAIK, A..F means all commits from A to F. But in case of branched 
history like

     ---A---B---C---G---H---I <- main
                \
                 ---D---E---F <- mybranch

the notation main..mybranch means all commits that are reachable from 
mybranch but not from main, while the reverse (mybranch..main) means all 
commits that are reachable from main but not from mybranch.

So basically the right-hand side of two dot notation specifies from what 
commit I want to select the range, and the left-hand side specifies the 
commit which I don't want to reach.
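
To make that reading concrete, here is a minimal Python sketch of the
branched history above. It is only a model, not how git implements
revision walking: the PARENTS table and the helper names are made up
for illustration, and commit A is treated as the root even though the
leading "---" hints at older history.

```python
# Hypothetical model of the branched history drawn above.
# Each commit maps to its list of parents; A is assumed to be the root.
PARENTS = {
    "B": ["A"], "C": ["B"],
    "G": ["C"], "H": ["G"], "I": ["H"],   # commits only on main
    "D": ["C"], "E": ["D"], "F": ["E"],   # commits only on mybranch
}
TIPS = {"main": "I", "mybranch": "F"}


def reachable(tip):
    """Return the tip commit plus all of its ancestors."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS.get(commit, []))
    return seen


def two_dot(left, right):
    """left..right: reachable from right, minus reachable from left."""
    return reachable(TIPS[right]) - reachable(TIPS[left])


print(sorted(two_dot("main", "mybranch")))   # ['D', 'E', 'F']
print(sorted(two_dot("mybranch", "main")))   # ['G', 'H', 'I']
```

Swapping the two sides swaps which branch's private commits come back;
the shared commits A, B and C never appear in either result.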

Felipe Contreras June 13, 2021, 3:12 a.m. UTC | #2

Bagas Sanjaya wrote:
> On 13/06/21 07.44, Felipe Contreras wrote:
> > The original explanation didn't seem clear enough to some people.
> > 
> > Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> > ---
> >   Documentation/revisions.txt | 22 +++++++++++-----------
> >   1 file changed, 11 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> > index f5f17b65a1..d8cf512686 100644
> > --- a/Documentation/revisions.txt
> > +++ b/Documentation/revisions.txt
> > @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
> >   
> >   Commands that are specifically designed to take two distinct ranges
> >   (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but
> > -they are exceptions.  Unless otherwise noted, all "git" commands
> > +they are exceptions.  Unless otherwise noted, all git commands
> >   that operate on a set of commits work on a single revision range.
> > -In other words, writing two "two-dot range notation" next to each
> > -other, e.g.
> >   
> > -    $ git log A..B C..D
> > +For example, if you have a linear history like this:
> >   
> > -does *not* specify two revision ranges for most commands.  Instead
> > -it will name a single connected set of commits, i.e. those that are
> > -reachable from either B or D but are reachable from neither A or C.
> > -In a linear history like this:
> > +    ---A---B---C---D---E---F
> >   
> > -    ---A---B---o---o---C---D
> > +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> > +commits, but doing A..F B..E will not retrieve two revision ranges
> > +totalling 8 commits. Instead the starting point A gets overriden by B,
> > +and the ending point of E by F, effectively becoming B..F, a single
> > +revision range.
> 
> AFAIK, A..F means all commits from A to F. But in case of branched 
> history like
> 
>      ---A---B---C---G---H---I <- main
>                 \
>                  ---D---E---F <- mybranch
> 
> the notation main..mybranch means all commits that are reachable from 
> mybranch but not from main, while the reverse (mybranch..main) means all 
> commits that are reachable from main but not from mybranch.
> 
> So basically the right-hand side of two dot notation specifies from what 
> commit I want to select the range, and the left-hand side specifies the 
> commit which I don't want to reach.

Yes, `A..F` is the same as `^A F`.

Eric Sunshine June 13, 2021, 3:32 a.m. UTC | #3

On Sat, Jun 12, 2021 at 8:44 PM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> The original explanation didn't seem clear enough to some people.
>
> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> ---
> diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
> +For example, if you have a linear history like this:
>
> +    ---A---B---C---D---E---F
>
> +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> +commits, but doing A..F B..E will not retrieve two revision ranges
> +totalling 8 commits. Instead the starting point A gets overriden by B,
> +and the ending point of E by F, effectively becoming B..F, a single
> +revision range.

s/overriden/overridden/

For what it's worth, as a person who is far from expert at revision
ranges, I had to read this revised text five or six times and think
about it quite a bit to understand what it is saying, whereas with
Junio's original[1], I understood it on the first read with only a
little thought.

Also, if this explanation is aimed at newcomers, then saying only
"doing A..F will retrieve 5 commits" without actually saying _which_
commits those are is perhaps not so helpful. A newcomer might be
helped more by enumerating the precise commits:

    The range A..F represents five commits B, C, D, E, F, and the
    range B..E represents three commits C, D, E, ...

[1]: https://lore.kernel.org/git/xmqqv97g2svd.fsf@gitster.g/

Felipe Contreras June 13, 2021, 4:25 a.m. UTC | #4

Eric Sunshine wrote:
> On Sat, Jun 12, 2021 at 8:44 PM Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
> > The original explanation didn't seem clear enough to some people.
> >
> > Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> > ---
> > diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> > @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
> > +For example, if you have a linear history like this:
> >
> > +    ---A---B---C---D---E---F
> >
> > +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> > +commits, but doing A..F B..E will not retrieve two revision ranges
> > +totalling 8 commits. Instead the starting point A gets overriden by B,
> > +and the ending point of E by F, effectively becoming B..F, a single
> > +revision range.
> 
> s/overriden/overridden/
> 
> For what it's worth, as a person who is far from expert at revision
> ranges, I had to read this revised text five or six times and think
> about it quite a bit to understand what it is saying,

Can you explain why?

This is the context: commands don't generally take two ranges:

 1. Unless otherwise noted, all git commands that operate on a set of
    commits work on a single revision range.

 2. Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
    commits, but doing A..F B..E will not retrieve two revision ranges
    totalling 8 commits.

At this point what isn't clear? Isn't it clear that `A..F B..E` aren't
two revision ranges?

 3. Instead the starting point A gets overridden by B, and the ending
    point of E by F, effectively becoming B..F, a single revision range.

What isn't clear about that? A gets superseded by B because it's higher
in the graph. And if you do `git log D E F` it's clear that doing
`git log F` will get you the same thing, isn't it?

> Also, if this explanation is aimed at newcomers, then saying only
> "doing A..F will retrieve 5 commits" without actually saying _which_
> commits those are is perhaps not so helpful.

It doesn't matter which specific commits are retrieved, the only thing
that matters is that `X op Y` is not additive.

Elijah Newren June 13, 2021, 7:02 a.m. UTC | #5

On Sat, Jun 12, 2021 at 9:25 PM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
>
> Eric Sunshine wrote:
> > On Sat, Jun 12, 2021 at 8:44 PM Felipe Contreras
> > <felipe.contreras@gmail.com> wrote:
> > > The original explanation didn't seem clear enough to some people.
> > >
> > > Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> > > ---
> > > diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> > > @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
> > > +For example, if you have a linear history like this:
> > >
> > > +    ---A---B---C---D---E---F
> > >
> > > +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> > > +commits, but doing A..F B..E will not retrieve two revision ranges
> > > +totalling 8 commits. Instead the starting point A gets overriden by B,
> > > +and the ending point of E by F, effectively becoming B..F, a single
> > > +revision range.
> >
> > s/overriden/overridden/
> >
> > For what it's worth, as a person who is far from expert at revision
> > ranges, I had to read this revised text five or six times and think
> > about it quite a bit to understand what it is saying,
>
> Can you explain why?

I tend to agree with Eric.  I think the example you chose is likely to
be misinterpreted and your wording magnifies it.  A..F B..E simplifies
to B..F which is *almost* the union of A..F and B..E, it's only
missing A.  Off-by-one errors are easy to miss.  You make it more
likely that they'll miss it, because there are only 6 commits total in
the union, and you are trying to explain why listing A..F B..E will
not be 8 commits, which readers can easily respond with, "Well, of
course it's not 8 commits.  There's only 6.  When you do the union
operation, of course the duplicates go away", and miss the actual
point that A got excluded.

Junio's wording and example just seemed better to me here.
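
For anyone who wants to check the sets by hand, below is a small Python
sketch of the linear A-through-F history. It is a toy model under my
own assumptions (the PARENTS table and rev_list() helper are
hypothetical, and A is treated as the root); it only handles plain
names, "^X" and "X..Y", not the three-dot form.

```python
# Hypothetical model of the linear history ---A---B---C---D---E---F.
PARENTS = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"]}


def reachable(tips):
    """Return every commit reachable from any tip (tips included)."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS.get(commit, []))
    return seen


def rev_list(*args):
    """Evaluate all arguments together as ONE set of commits."""
    include, exclude = [], []
    for arg in args:
        if ".." in arg:                      # X..Y means ^X Y
            left, _, right = arg.partition("..")
            exclude.append(left)
            include.append(right)
        elif arg.startswith("^"):
            exclude.append(arg[1:])
        else:
            include.append(arg)
    return reachable(include) - reachable(exclude)


a_f = rev_list("A..F")
b_e = rev_list("B..E")
both = rev_list("A..F", "B..E")

print(sorted(a_f))                   # ['B', 'C', 'D', 'E', 'F']
print(sorted(b_e))                   # ['C', 'D', 'E']
print(sorted(both))                  # ['C', 'D', 'E', 'F']
print(sorted((a_f | b_e) - both))    # ['B']
print(both == rev_list("B..F"))      # True
```

In this model the union of the two ranges holds five commits, and the
single commit that drops out of "A..F B..E" is B (A is already outside
A..F on its own). That one-commit difference is the off-by-one that is
easy to overlook.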

>
> This is the context: commands don't generally take two ranges:
>
>  1. Unless otherwise noted, all git commands that operate on a set of
>     commits work on a single revision range.
>
>  2. Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
>     commits, but doing A..F B..E will not retrieve two revision ranges
>     totalling 8 commits.
>
> At this point what isn't clear? Isn't it clear that `A..F B..E` aren't
> two revision ranges?
>
>  3. Instead the starting point A gets overridden by B, and the ending
>     point of E by F, effectively becoming B..F, a single revision range.
>
> What isn't clear about that? A gets superseded by B because it's higher
> in the graph. And if you do `git log D E F` it's clear that doing
> `git log F` will get you the same thing, isn't it?
>
> > Also, if this explanation is aimed at newcomers, then saying only
> > "doing A..F will retrieve 5 commits" without actually saying _which_
> > commits those are is perhaps not so helpful.
>
> It doesn't matter which specific commits are retrieved, the only thing
> that matters is that `X op Y` is not additive.
>
> --
> Felipe Contreras

Eric Sunshine June 13, 2021, 8:11 a.m. UTC | #6

On Sun, Jun 13, 2021 at 12:26 AM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> Eric Sunshine wrote:
> > For what it's worth, as a person who is far from expert at revision
> > ranges, I had to read this revised text five or six times and think
> > about it quite a bit to understand what it is saying,
>
> Can you explain why?

Probably not to a degree which will satisfy you. And I'm not being
flippant by saying that. I mean only that it is more than a little
difficult to explain why one thing "clicks" easily in the brain while
something else doesn't. I can only relate (to some extent) what I
experienced while reading your revised text.

> This is the context: commands don't generally take two ranges:
>
>  1. Unless otherwise noted, all git commands that operate on a set of
>     commits work on a single revision range.
>
>  2. Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
>     commits, but doing A..F B..E will not retrieve two revision ranges
>     totalling 8 commits.
>
> At this point what isn't clear? Isn't it clear that `A..F B..E` aren't
> two revision ranges?

The documentation stating explicitly that `A..F B..E` is not two
ranges is fine. What was difficult to understand was your explanation
of _why_ those are not two ranges. In contrast, I had no difficulty
understanding Junio's explanation of why that is not two ranges.

>  3. Instead the starting point A gets overridden by B, and the ending
>     point of E by F, effectively becoming B..F, a single revision range.
>
> What isn't clear about that? A gets superseded by B because it's higher
> in the graph. And if you do `git log D E F` it's clear that doing
> `git log F` will get you the same thing, isn't it?

One of the reasons I had to re-read your text so many times was
because it was difficult to build a mental model of what you were
saying, and to follow along with all the "this replaces that" and
"this other thing replaces that other thing". While doing so, I
repeatedly had to glance back at the original `A..F B..E` to make sure
the mental model I was building was correct or still made sense. The
word "overridden" didn't help because I couldn't tell if, by
"overridden", you meant that something got replaced by something else
or if something was merely ignored. (Or maybe those are the same thing
in this case, but how will a newcomer -- who is trying to learn this
from scratch -- know which it is?)

However, an even bigger problem I experienced while reading your
revised text is that it felt like it was trying to express some rule
which the reader should internalize ("replace this with that, and
replace this other thing too") with no proper explanation of _why_ the
rule works that way. Worse, the rule (whatever it is) never actually
materialized or solidified in a way which I could understand and thus
apply to in other situations. Junio's explanation, on the other hand,
was simple and to the point, and (for whatever reason) clicked easily
in my brain, such that I came away feeling that I could apply the
knowledge immediately to other situations. On the other hand, after
reading your proposed text, I did not feel as if I had gained any
knowledge, and even had I picked up the rule which seems to be in
there, I likely still wouldn't have understood _why_ that rule works
or is needed; it would just have been some black box.

> > Also, if this explanation is aimed at newcomers, then saying only
> > "doing A..F will retrieve 5 commits" without actually saying _which_
> > commits those are is perhaps not so helpful.
>
> It doesn't matter which specific commits are retrieved, the only thing
> that matters is that `X op Y` is not additive.

The very first question which popped into my head upon reading "Doing
A..F will retrieve 5 commits" was "which five commits?". Not being
told the answer by the text did not help me feel confident that I knew
the correct five commits. Had the text stated explicitly "the five
commits B, C, D, E, F", then there would be no question and no feeling
of uncertainty about it. So, whatever precision your above statement
might have, it is likely to be lost on the general newcomer who is
simply trying to learn about and understand Git revisions.

Felipe Contreras June 13, 2021, 4:13 p.m. UTC | #7

Eric Sunshine wrote:
> On Sun, Jun 13, 2021 at 12:26 AM Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
> > Eric Sunshine wrote:
> > > For what it's worth, as a person who is far from expert at revision
> > > ranges, I had to read this revised text five or six times and think
> > > about it quite a bit to understand what it is saying,
> >
> > Can you explain why?
> 
> Probably not to a degree which will satisfy you. And I'm not being
> flippant by saying that. I mean only that it is more than a little
> difficult to explain why one thing "clicks" easily in the brain while
> something else doesn't. I can only relate (to some extent) what I
> experienced while reading your revised text.

Yes, but the documentation is not for you, it's for the majority of
users, so it behooves us to try to understand the reason to see if it
applies to the population in general.

> > This is the context: commands don't generally take two ranges:
> >
> >  1. Unless otherwise noted, all git commands that operate on a set of
> >     commits work on a single revision range.
> >
> >  2. Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> >     commits, but doing A..F B..E will not retrieve two revision ranges
> >     totalling 8 commits.
> >
> > At this point what isn't clear? Isn't it clear that `A..F B..E` aren't
> > two revision ranges?
> 
> The documentation stating explicitly that `A..F B..E` is not two
> ranges is fine. What was difficult to understand was your explanation
> of _why_ those are not two ranges.

At this point the _why_ has not been explained, merely that these two
things don't result in two ranges.

> >  3. Instead the starting point A gets overridden by B, and the ending
> >     point of E by F, effectively becoming B..F, a single revision range.
> >
> > What isn't clear about that? A gets superseded by B because it's higher
> > in the graph. And if you do `git log D E F` it's clear that doing
> > `git log F` will get you the same thing, isn't it?
> 
> One of the reasons I had to re-read your text so many times was
> because it was difficult to build a mental model of what you were
> saying, and to follow along with all the "this replaces that" and
> "this other thing replaces that other thing". While doing so, I
> repeatedly had to glance back at the original `A..F B..E` to make sure
> the mental model I was building was correct or still made sense.

I wonder why that is the case. A..F is so simple it doesn't have to be
explained, Ruby even expands that obvious range.

  ---A---B---C---D---E---F
     ^                   ^
    from                 to

And B..E:

  ---A---B---C---D---E---F
         ^           ^
        from         to

In Ruby the range can be defined simply as: 'A'..'F'

  ["A", "B", "C", "D", "E", "F"]

Would 1..6 be easier to picture?

> The word "overridden" didn't help because I couldn't tell if, by
> "overridden", you meant that something got replaced by something else
> or if something was merely ignored. (Or maybe those are the same thing
> in this case, but how will a newcomer -- who is trying to learn this
> from scratch -- know which it is?)

If I say Lucy is available from 1 to 6 p.m. and Michael from 2 to 5 p.m.
why would 2 p.m. supersede 1 p.m.?

If we are trying to define a starting point, obviously the latest
starting point is the one that wins. No?

> However, an even bigger problem I experienced while reading your
> revised text is that it felt like it was trying to express some rule
> which the reader should internalize ("replace this with that, and
> replace this other thing too")

The text starts with *for example*. Therefore it's not something
general, it's an example.

> Junio's explanation, on the other hand, was simple and to the point,
> and (for whatever reason) clicked easily in my brain, such that I came
> away feeling that I could apply the knowledge immediately to other
> situations.

Junio's explanation is inaccurate because it stated that this:

 Unless otherwise noted, all git commands that operate on a set of
 commits work on a single revision range.

Is the same as this:

 writing two "two-dot range notation" next to each other does *not* specify
 two revision ranges for most commands.

But it is not the same.

Can you tell me why?

> On the other hand, after reading your proposed text, I did not feel as
> if I had gained any knowledge, and even had I picked up the rule which
> seems to be in there,

The text never mentioned any rule.

> > > Also, if this explanation is aimed at newcomers, then saying only
> > > "doing A..F will retrieve 5 commits" without actually saying _which_
> > > commits those are is perhaps not so helpful.
> >
> > It doesn't matter which specific commits are retrieved, the only thing
> > that matters is that `X op Y` is not additive.
> 
> The very first question which popped into my head upon reading "Doing
> A..F will retrieve 5 commits" was "which five commits?".

Keep reading.

> So, whatever precision your above statement might have, it is likely
> to be lost on the general newcomer who is simply trying to learn about
> and understand Git revisions.

Or maybe it's something that only applies to you.

Cheers.

Felipe Contreras June 13, 2021, 5:09 p.m. UTC | #8

Elijah Newren wrote:
> On Sat, Jun 12, 2021 at 9:25 PM Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
> >
> > Eric Sunshine wrote:
> > > On Sat, Jun 12, 2021 at 8:44 PM Felipe Contreras
> > > <felipe.contreras@gmail.com> wrote:
> > > > The original explanation didn't seem clear enough to some people.
> > > >
> > > > Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> > > > ---
> > > > diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> > > > @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
> > > > +For example, if you have a linear history like this:
> > > >
> > > > +    ---A---B---C---D---E---F
> > > >
> > > > +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> > > > +commits, but doing A..F B..E will not retrieve two revision ranges
> > > > +totalling 8 commits. Instead the starting point A gets overriden by B,
> > > > +and the ending point of E by F, effectively becoming B..F, a single
> > > > +revision range.
> > >
> > > s/overriden/overridden/
> > >
> > > For what it's worth, as a person who is far from expert at revision
> > > ranges, I had to read this revised text five or six times and think
> > > about it quite a bit to understand what it is saying,
> >
> > Can you explain why?
> 
> I tend to agree with Eric.  I think the example you chose is likely to
> be misinterpreted and your wording magnifies it.  A..F B..E simplifies
> to B..F which is *almost* the union of A..F and B..E, it's only
> missing A.  Off-by-one errors are easy to miss.

Yes, but right before it's explained that the ending point is F.
Not E, F.

> You make it more likely that they'll miss it, because there are only 6
> commits total in the union, and you are trying to explain why listing
> A..F B..E will not be 8 commits, which readers can easily respond
> with, "Well, of course it's not 8 commits.  There's only 6.

If the reader understands that no more than 6 commits can be returned,
then the reader has understood the point that the operation is not
addition.

> When you do the union operation, of course the duplicates go away",
> and miss the actual point that A got excluded.

But that is not the point. This is the point:

  Unless otherwise noted, all git commands that operate on a set of
  commits work on a single revision range.

You are missing the forest for the trees.

In the context of gitrevisions(7) the user has just been told that:

  1. We are trying to specify a graph of commits reachable from a
     commit, or commits.

The user was shown this graph:

  G   H   I   J
   \ /     \ /
    D   E   F
     \  |  / \
      \ | /   |
       \|/    |
        B     C
         \   /
          \ /
           A

And that B is A^, therefore doing `git log A B` is redundant, as is
doing `git log A B D`.

  2. The caret notation `^r1 r2` means commits reachable from r2, but
     exclude commits reachable from r1 (r1 and its ancestors)

That means '^D A' will exclude D G and H.

  3. The two-dot range notation `r1..r2` is the same as `^r1 r2`

Now, with this context in mind, we are trying to hedge the corner-case
of `r1..r2 r3..r4` in other words: `^r1 r2 ^r3 r4`.

The user has been told already that C..A is the same as `^C A` (I'm
changing the order to be consistent with the graph above). And to make
my point clear I actually don't need two starting points.

So how about this:

  Commands that are specifically designed to take two distinct ranges
  (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but
  they are exceptions.  Unless otherwise noted, all git commands
  that operate on a set of commits work on a single revision range. Just
  like 'A A' coalesces to 'A', 'B..A C..A' is the same as the
  single revision range '^B ^C A'.
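
The three points of context above can be checked against the graph with
a short Python sketch. This is only a model under stated assumptions:
the PARENTS table is my transcription of the diagram (A has parents B
and C, and so on), and rev_list() is a hypothetical helper that
understands plain names, "^X" and "X..Y" only.

```python
# Parent table transcribed from the graph above (children listed first).
PARENTS = {
    "A": ["B", "C"],
    "B": ["D", "E", "F"],
    "C": ["F"],
    "D": ["G", "H"],
    "F": ["I", "J"],
}


def reachable(tips):
    """Return every commit reachable from any tip (tips included)."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS.get(commit, []))
    return seen


def rev_list(*args):
    """Evaluate all arguments together as ONE set of commits."""
    include, exclude = [], []
    for arg in args:
        if ".." in arg:                      # X..Y means ^X Y
            left, _, right = arg.partition("..")
            exclude.append(left)
            include.append(right)
        elif arg.startswith("^"):
            exclude.append(arg[1:])
        else:
            include.append(arg)
    return reachable(include) - reachable(exclude)


# Naming an ancestor alongside A adds nothing.
print(rev_list("A", "B", "D") == rev_list("A"))        # True

# '^D A' drops D and its ancestors G and H.
print(sorted(rev_list("A") - rev_list("^D", "A")))     # ['D', 'G', 'H']

# 'B..A C..A' and '^B ^C A' select the same commits.
print(sorted(rev_list("B..A", "C..A")))                # ['A']
print(rev_list("B..A", "C..A") == rev_list("^B", "^C", "A"))  # True
```

In this graph everything except A is reachable from B or C, so the
combined expression leaves only A.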

Elijah Newren June 14, 2021, 2:39 p.m. UTC | #9

On Sun, Jun 13, 2021 at 10:09 AM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
>
> Elijah Newren wrote:
> > On Sat, Jun 12, 2021 at 9:25 PM Felipe Contreras
> > <felipe.contreras@gmail.com> wrote:
> > >
> > > Eric Sunshine wrote:
> > > > On Sat, Jun 12, 2021 at 8:44 PM Felipe Contreras
> > > > <felipe.contreras@gmail.com> wrote:
> > > > > The original explanation didn't seem clear enough to some people.
> > > > >
> > > > > Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> > > > > ---
> > > > > diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> > > > > @@ -299,22 +299,22 @@ empty range that is both reachable and unreachable from HEAD.
> > > > > +For example, if you have a linear history like this:
> > > > >
> > > > > +    ---A---B---C---D---E---F
> > > > >
> > > > > +Doing A..F will retrieve 5 commits, and doing B..E will retrieve 3
> > > > > +commits, but doing A..F B..E will not retrieve two revision ranges
> > > > > +totalling 8 commits. Instead the starting point A gets overriden by B,
> > > > > +and the ending point of E by F, effectively becoming B..F, a single
> > > > > +revision range.
> > > >
> > > > s/overriden/overridden/
> > > >
> > > > For what it's worth, as a person who is far from expert at revision
> > > > ranges, I had to read this revised text five or six times and think
> > > > about it quite a bit to understand what it is saying,
> > >
> > > Can you explain why?
> >
> > I tend to agree with Eric.  I think the example you chose is likely to
> > be misinterpreted and your wording magnifies it.  A..F B..E simplifies
> > to B..F which is *almost* the union of A..F and B..E, it's only
> > missing A.  Off-by-one errors are easy to miss.
>
> Yes, but right before it's explained that the ending point is F.
> Not E, F.

I think this is somewhat of a useless distinction -- not for the end
result, but in terms of helping users understand.  We started adding
an explanation to the manual because users misunderstand how
"start1..end1 start2..end2" is treated and we want to correct their
misunderstandings.  In that context, the only misunderstanding I can
think of that is dispelled by specifying F is the endpoint would be
"two ranges are intersected to get the range of commits that log will
operate on".  I've never seen users assume that or make such a
mistake.  I've always seen them assume that the "two ranges are
combined with a union".  In that case, F matches their
misunderstanding, so this part of the explanation does nothing to help
correct their assumptions.

The only place their misunderstanding disagrees with the correct
answer for your example is on the other side of those ranges.  They
would have gotten an incorrect answer of "A..F B..E" == "A..F",
whereas the correct answer is "B..F".  That's an off-by-one error, but
I think they're likely to miss it.  Especially given that folks
already mess up the left hand side of single "FOO..BAR" expressions
with off-by-one errors.

> > You make it more likely that they'll miss it, because there are only 6
> > commits total in the union, and you are trying to explain why listing
> > A..F B..E will not be 8 commits, which readers can easily respond
> > with, "Well, of course it's not 8 commits.  There's only 6.
>
> If the reader understands that no more than 6 commits can be returned,
> then the reader has understood the point that the operation is not
> addition.

Who in the world ever assumes that "two dotted ranges are combined via
list addition"?  I've only ever come across users assuming the
operation is a union (or, equivalently, addition on sets).  I don't
understand why you even try to make that point, and think it's a
distraction that does more harm than good.

> > When you do the union operation, of course the duplicates go away",
> > and miss the actual point that A got excluded.
>
> But that is not the point. This is the point:
>
>   Unless otherwise noted, all git commands that operate on a set of
>   commits work on a single revision range.
>
> You are missing the forest for the trees.

I think you are missing the boat.

That sentence on its own is completely insufficient to dispel the
misunderstanding.  All that sentence says to users is that if they
specify what they think of as "two ranges" that we'll somehow treat it
as one; but since users are prone to think that "revision range" is
interchangeable with "set of revisions" (especially since we defined
A..B elsewhere in set operations), this will merely make them think in
terms of what set operation they need to perform on the "two ranges"
to get the set of commits the operation will function on.

Most users I've seen simply do that via applying a simple operation to
combine two ranges into one.  Everyone I've ever run across that
misunderstands this "two range" thing, does so in the same way: by
assuming that the two ranges are combined via a union to get an
interesting set of commits.

The example you provide should attempt to help explain why that mental
model is mistaken and provide them with a corrected one.  Your
response to Eric suggests you're not even trying to provide a
corrected mental model, and your response here suggests you are trying
to only correct mistakes of the form "take two revision ranges and add
them keeping duplicates" and "take two revision ranges and intersect
them", neither of which I've observed in the wild.

> In the context of gitrevisions(7) the user has just been told that:
>
>   1. We are trying to specify a graph of commits reachable from a
>      commit, or commits.
>
> The user was shown this graph:
>
>   G   H   I   J
>    \ /     \ /
>     D   E   F
>      \  |  / \
>       \ | /   |
>        \|/    |
>         B     C
>          \   /
>           \ /
>            A
>
> And that B is A^, therefore doing `git log A B` is redundant, as is
> doing `git log A B D`.
>
>   2. The caret notation `^r1 r2` means commits reachable from r2, but
>      exclude commits reachable from r1 (r1 and its ancestors)
>
> That means '^D A' will exclude D, G and H.
>
>   3. The two-dot range notation `r1..r2` is the same as `^r1 r2`
>
>
> Now, with this context in mind, we are trying to hedge the corner-case
> of `r1..r2 r3..r4` in other words: `^r1 r2 ^r3 r4`.
>
> The user has been told already that C..A is the same as `^C A` (I'm
> changing the order to be consistent with the graph above). And to make
> my point clear I actually don't need two starting points.
>
> So how about this:
>
>   Commands that are specifically designed to take two distinct ranges
>   (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but
>   they are exceptions.  Unless otherwise noted, all git commands
>   that operate on a set of commits work on a single revision range. Just
>   like 'A A' coalesces to 'A', 'B..A C..A' is the same as the
>   single revision range '^B ^C A'.

Your example here almost seems to suggest that we do an intersection
of the "two ranges" to get the answer.  It's certainly not your
intent, but I think the users I've helped would be prone to read it
that way due to your focus on coalescing, and due to your selection of
an example which happens to give the correct answer when using the
intersection misinterpretation.
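
The coincidence can be checked with a small Python sketch. It only
models reachability with sets and does not run git; the parent links
are transcribed from the gitrevisions(7) graph quoted above, and the
helper names (PARENTS, reachable, dotted) are hypothetical.

```python
# Parent links for the example graph (A is the tip; E, G, H, I, J are roots).
PARENTS = {
    "A": ["B", "C"],
    "B": ["D", "E", "F"],
    "C": ["F"],
    "D": ["G", "H"],
    "F": ["I", "J"],
    "E": [], "G": [], "H": [], "I": [], "J": [],
}


def reachable(commit):
    """Return the commit itself plus every ancestor of it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def dotted(start, end):
    """Commits selected by 'start..end', i.e. '^start end'."""
    return reachable(end) - reachable(start)


# 'B..A C..A' read as the single range '^B ^C A'.
actual = reachable("A") - reachable("B") - reachable("C")

# The two misreadings: treat each dotted range as a set, then combine.
intersection = dotted("B", "A") & dotted("C", "A")
union = dotted("B", "A") | dotted("C", "A")

print(sorted(actual))
print(sorted(intersection))
print(sorted(union))
```

In this model the intersection misreading yields the same set as
'^B ^C A', while the union misreading does not, so the example cannot
correct a reader who assumes intersection.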

I would be much happier with something like this:

"""
Note: There is no shorthand for getting a union or intersection of
multiple dotted ranges.

Commands that are specifically designed to take two distinct ranges
(e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
are exceptions.  Unless otherwise noted, all git commands that operate
on a set of commits work on a single revision range.  Thus, just as
"A..B" translates to "^A B", the expression "A..B C..D" translates to
"^A B ^C D", i.e. all commits reachable from either B or D, as long as
they are not reachable from either A or C.  This is much different
than you would get by trying to do either an intersection or union of
the two separate ranges A..B and C..D.  Compare the differences on the
following simple linear history:

    ---A---B---C---D---E---F---G---H

The command

$ git log A..E C..H

would be the same as

$ git log C..H

(since E is reachable from H, and A is reachable from C).  In contrast,
the union of A..E and C..H would be A..H, while the intersection would
be C..E.
"""
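
As a rough check of the linear-history example above, the following
Python sketch models the three interpretations with sets. It does not
invoke git; HISTORY, reachable, dotted and single_range are
hypothetical names used only for illustration.

```python
# Linear history from the example: A is the oldest commit, H the newest.
HISTORY = "ABCDEFGH"


def reachable(commit):
    """Return the commit and all of its ancestors in the linear history."""
    return set(HISTORY[: HISTORY.index(commit) + 1])


def dotted(start, end):
    """Commits selected by 'start..end', i.e. '^start end'."""
    return reachable(end) - reachable(start)


def single_range(*specs):
    """Read several 'X..Y' arguments as one range: everything reachable
    from any Y, minus everything reachable from any X."""
    included, excluded = set(), set()
    for spec in specs:
        start, end = spec.split("..")
        included |= reachable(end)
        excluded |= reachable(start)
    return included - excluded


combined = single_range("A..E", "C..H")
union = dotted("A", "E") | dotted("C", "H")
intersection = dotted("A", "E") & dotted("C", "H")

print("A..E C..H    ->", sorted(combined))
print("union        ->", sorted(union))
print("intersection ->", sorted(intersection))
```

Under this model the combined form matches C..H, the union matches
A..H, and the intersection matches C..E, which are the three results
stated in the proposed text.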

Felipe Contreras June 15, 2021, 11:53 a.m. UTC | #10

Elijah Newren wrote:
> On Sun, Jun 13, 2021 at 10:09 AM Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
> > Elijah Newren wrote:

> > > I tend to agree with Eric.  I think the example you chose is likely to
> > > be misinterpreted and your wording magnifies it.  A..F B..E simplifies
> > > to B..F which is *almost* the union of A..F and B..E, it's only
> > > missing A.  Off-by-one errors are easy to miss.
> >
> > Yes, but right before it's explained that the ending point is F.
> > Not E, F.
> 
> I think this is somewhat of a useless distinction -- not for the end
> result, but in terms of helping users understand.  We started adding
> an explanation to the manual because users misunderstand how
> "start1..end1 start2..end2" is treated and we want to correct their
> misunderstandings.  In that context, the only misunderstanding I can
> think of that is dispelled by specifying F is the endpoint would be
> "two ranges are intersected to get the range of commits that log will
> operate on".  I've never seen users assume that or make such a
> mistake.  I've always seen them assume that the "two ranges are
> combined with a union".

Then that warrants yet another paragraph, because this one is for:

  Commands that are specifically designed to take two distinct ranges
  (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
  are exceptions.

Probably outside the section of Dotted Range Notations, because if the
user is confused about what 'C B A A' should do, that has nothing to do
with dotted ranges.

Maybe after the user has been told that:

  Specifying several revisions means the set of commits reachable from
  any of the given commits.

  A commit's reachable set is the commit itself and the commits in
  its ancestry chain.
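
A minimal Python sketch of those two rules, assuming the example graph
from gitrevisions(7) quoted earlier in the thread. It models
reachability with sets rather than calling git, and PARENTS, reachable
and selected are hypothetical names.

```python
# Parent links for the example graph (A is the tip).
PARENTS = {
    "A": ["B", "C"],
    "B": ["D", "E", "F"],
    "C": ["F"],
    "D": ["G", "H"],
    "F": ["I", "J"],
    "E": [], "G": [], "H": [], "I": [], "J": [],
}


def reachable(commit):
    """A commit's reachable set: the commit itself plus its ancestry chain."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def selected(*revisions):
    """Several revisions select the commits reachable from any of them."""
    result = set()
    for revision in revisions:
        result |= reachable(revision)
    return result


# Listing ancestors of A, or repeating A, adds nothing to the set.
print(selected("A", "B") == selected("A"))
print(selected("C", "B", "A", "A") == selected("A"))
```

Both comparisons hold in this model because B and C are already in the
reachable set of A.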

> > > You make it more likely that they'll miss it, because there are only 6
> > > commits total in the union, and you are trying to explain why listing
> > > A..F B..E will not be 8 commits, which readers can easily respond
> > > with, "Well, of course it's not 8 commits.  There's only 6.
> >
> > If the reader understands that no more than 6 commits can be returned,
> > then the reader has understood the point that the operation is not
> > addition.
> 
> Who in the world ever assumes that "two dotted ranges are combined via
> list addition"?

I don't know, but that is the paragraph we are on:

  Commands that are specifically designed to take two distinct ranges
  (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
  are exceptions.

If you are arguing for the removal of this entire paragraph and its
examples, I'd be fine with that.

> I've only ever come across users assuming the
> operation is a union (or, equivalently, addition on sets).  I don't
> understand why you even try to make that point, and think it's a
> distraction that does more harm than good.

If you think it's impossible for the user to assume two dotted ranges
means addition, please explain what is the point of this sentence:

  Unless otherwise noted, all "git" commands that operate on a set of
  commits work on a single revision range.

> > > When you do the union operation, of course the duplicates go away",
> > > and miss the actual point that A got excluded.
> >
> > But that is not the point. This is the point:
> >
> >   Unless otherwise noted, all git commands that operate on a set of
> >   commits work on a single revision range.
> >
> > You are missing the forest for the trees.
> 
> I think you are missing the boat.
> 
> That sentence on its own is completely insufficient to dispel the
> misunderstanding.

One misunderstanding, perhaps, not the one we are trying to tackle
here.

> All that sentence says to users is that if they specify what they
> think of as "two ranges" that we'll somehow treat it as one;

Didn't you just say the user would never think it's actually two
ranges?

What's the point in saying that if the user already knows it?

> but since users are prone to think that "revision range" is
> interchangeable with "set of revisions" (especially since we defined
> A..B elsewhere in set operations), this will merely make them think in
> terms of what set operation they need to perform on the "two ranges"
> to get the set of commits the operation will function on.

That belongs in a separate paragraph.

> The example you provide should attempt to help explain why that mental
> model is mistaken and provide them with a corrected one.  Your
> response to Eric suggests you're not even trying to provide a
> corrected mental model, and your response here suggests you are trying
> to only correct mistakes of the form "take two revision ranges and add
> them keeping duplicates" and "take two revision ranges and intersect
> them", neither of which I've observed in the wild.

I'm providing an example for the paragraph that is already written.

If you want me to rewrite the entire section I can certainly give it a
try.

> Commands that are specifically designed to take two distinct ranges
> (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
> are exceptions.  Unless otherwise noted, all git commands that operate
> on a set of commits work on a single revision range.

Isn't this obvious for all users?

> Thus, just as "A..B" translates to "^A B", the expression "A..B C..D"
> translates to "^A B ^C D", i.e. all commits reachable from either B or
> D, as long as they are not reachable from either A or C.

How about we remove the entire paragraph and replace it with:

  When specifying two ranges, such as 'A..B C..D', the way this is
  interpreted is as a single range '^A B ^C D', that is: all commits
  reachable from either B or D, as long as they are not reachable from
  either A or C. Assuming a linear history, B would be reachable from C,
  so this is the same as '^C D'.
