describe-doc: clarify default length of abbreviation

Message ID pull.1026.git.git.1621150366442.gitgitgadget@gmail.com (mailing list archive)
State Superseded

Commit Message

Anders Höckersten May 16, 2021, 7:32 a.m. UTC
From: Anders Höckersten <anders@hockersten.se>

Clarify the default length used for the abbreviated form used for
commits in git describe.

The behavior was modified in Git 2.11.0, but the documentation was not
updated to clarify the new behavior.

Signed-off-by: Anders Höckersten <anders@hockersten.se>
---
    describe-doc: clarify default length of abbreviation
    
    Clarify the default length used for the abbreviated form used for
    commits in git describe.
    
    The behavior was modified in Git 2.11.0, but the documentation was not
    updated to clarify the new behavior.
    
    Signed-off-by: Anders Höckersten anders@hockersten.se

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1026%2Fahockersten%2Fdescribe-doc-abbreviation-clarification-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1026/ahockersten/describe-doc-abbreviation-clarification-v1
Pull-Request: https://github.com/git/git/pull/1026

 Documentation/git-describe.txt | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)


base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7

Comments

Junio C Hamano May 16, 2021, 11:34 a.m. UTC | #1
"Anders Höckersten via GitGitGadget"  <gitgitgadget@gmail.com>
writes:

> From: Anders Höckersten <anders@hockersten.se>
>
> Clarify the default length used for the abbreviated form used for
> commits in git describe.
>
> The behavior was modified in Git 2.11.0, but the documentation was not
> updated to clarify the new behavior.
>
> Signed-off-by: Anders Höckersten <anders@hockersten.se>
> ---
>     describe-doc: clarify default length of abbreviation
>     
>     Clarify the default length used for the abbreviated form used for
>     commits in git describe.
>     
>     The behavior was modified in Git 2.11.0, but the documentation was not
>     updated to clarify the new behavior.
>     
>     Signed-off-by: Anders Höckersten anders@hockersten.se
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1026%2Fahockersten%2Fdescribe-doc-abbreviation-clarification-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1026/ahockersten/describe-doc-abbreviation-clarification-v1
> Pull-Request: https://github.com/git/git/pull/1026


.git/rebase-apply/patch:32: trailing whitespace.
	will vary according to the size of the repository with a default of 
.git/rebase-apply/patch:34: trailing whitespace.
	as needed to form a unique object name.  An <n> of 0 will suppress 
.git/rebase-apply/patch:46: trailing whitespace.
of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The 
.git/rebase-apply/patch:47: trailing whitespace.
length of the abbreviation scales as the repository grows, using the 
.git/rebase-apply/patch:48: trailing whitespace.
approximate number of objects in the repository and a bit of math 
warning: 5 lines applied after fixing whitespace errors.
Applying: describe-doc: clarify default length of abbreviation

Will fix them up while applying, but please be careful the next
time.  A good way to double check is to do

    $ git format-patch -o my.patch origin..
    $ git checkout --detach origin
    $ git am --whitespace=warn my.patch

that is, (1) create a patch series out of the branch you have worked
on, designed to be applied on top of the origin, (2) detach HEAD at
the origin, and (3) apply the patch.  Once you are done, you can go
back to the original branch with

    $ git checkout -
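
As a rough complement to the check above, here is a minimal sketch that scans a patch for added lines ending in whitespace. The helper name `trailing_whitespace_lines` is hypothetical, and it only approximates what `git am --whitespace=warn` reports: it looks at trailing blanks on added lines and nothing else.

```ruby
# Hypothetical helper, not part of Git: list the added lines of a
# unified diff that end in a space or tab.
# Returns [line_number_in_patch, text] pairs.
def trailing_whitespace_lines(patch_text)
  hits = []
  patch_text.each_line.with_index(1) do |line, number|
    line = line.chomp
    next unless line.start_with?("+")  # only lines the patch adds
    next if line.start_with?("+++")    # skip the "+++ b/file" header
    hits << [number, line] if line.match?(/[ \t]+\z/)
  end
  hits
end
```

It could be run on a patch file with something like `trailing_whitespace_lines(File.read(path))`, where `path` is whatever file `git format-patch` produced. An added line whose own content starts with `++` would be skipped by the header check, so this stays a sketch rather than a replacement for the real check.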

The patch text looks good.

Thanks.


>  Documentation/git-describe.txt | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index a88f6ae2c6e7..f3f41cfac496 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -63,10 +63,11 @@ OPTIONS
>  	Automatically implies --tags.
>  
>  --abbrev=<n>::
> -	Instead of using the default 7 hexadecimal digits as the
> -	abbreviated object name, use <n> digits, or as many digits
> -	as needed to form a unique object name.  An <n> of 0
> -	will suppress long format, only showing the closest tag.
> +	Instead of using the default number of hexadecimal digits (which
> +	will vary according to the size of the repository with a default of 
> +	7) of the abbreviated object name, use <n> digits, or as many digits
> +	as needed to form a unique object name.  An <n> of 0 will suppress 
> +	long format, only showing the closest tag.
>  
>  --candidates=<n>::
>  	Instead of considering only the 10 most recent tags as
> @@ -139,8 +140,11 @@ at the end.
>  
>  The number of additional commits is the number
>  of commits which would be displayed by "git log v1.0.4..parent".
> -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The 
> +length of the abbreviation scales as the repository grows, using the 
> +approximate number of objects in the repository and a bit of math 
> +around the birthday paradox, and defaults to a minimum of 7.
>  The "g" prefix stands for "git" and is used to allow describing the version of
>  a software depending on the SCM the software is managed with. This is useful
>  in an environment where people may use different SCMs.
>
> base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7
Bagas Sanjaya May 16, 2021, noon UTC | #2
On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
  >   --abbrev=<n>::
> -	Instead of using the default 7 hexadecimal digits as the
> -	abbreviated object name, use <n> digits, or as many digits
> -	as needed to form a unique object name.  An <n> of 0
> -	will suppress long format, only showing the closest tag.
> +	Instead of using the default number of hexadecimal digits (which
> +	will vary according to the size of the repository with a default of
> +	7) of the abbreviated object name, use <n> digits, or as many digits
> +	as needed to form a unique object name.  An <n> of 0 will suppress
> +	long format, only showing the closest tag.
>   

I think it is more correct to say that the abbreviated hash length is
determined by the number of objects.

>   --candidates=<n>::
>   	Instead of considering only the 10 most recent tags as
> @@ -139,8 +140,11 @@ at the end.
>   
>   The number of additional commits is the number
>   of commits which would be displayed by "git log v1.0.4..parent".
> -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> +length of the abbreviation scales as the repository grows, using the
> +approximate number of objects in the repository and a bit of math
> +around the birthday paradox, and defaults to a minimum of 7.

What is the birthday paradox then?

Also, better say "... and by default the minimum length is 7.".

Thanks.
Anders Höckersten May 16, 2021, 12:47 p.m. UTC | #3
On Sun, May 16, 2021, at 14:00, Bagas Sanjaya wrote:
> On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
>   >   --abbrev=<n>::
> > -	Instead of using the default 7 hexadecimal digits as the
> > -	abbreviated object name, use <n> digits, or as many digits
> > -	as needed to form a unique object name.  An <n> of 0
> > -	will suppress long format, only showing the closest tag.
> > +	Instead of using the default number of hexadecimal digits (which
> > +	will vary according to the size of the repository with a default of
> > +	7) of the abbreviated object name, use <n> digits, or as many digits
> > +	as needed to form a unique object name.  An <n> of 0 will suppress
> > +	long format, only showing the closest tag.
> >   
> 
> I think it is more correct to say that the abbreviated hash length is
> determined by the number of objects.

I agree. I will modify this to:  "(which will vary according to the number of objects in the repository with a default of 7)" unless you have a different suggestion.

> >   --candidates=<n>::
> >   	Instead of considering only the 10 most recent tags as
> > @@ -139,8 +140,11 @@ at the end.
> >   
> >   The number of additional commits is the number
> >   of commits which would be displayed by "git log v1.0.4..parent".
> > -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> > -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> > +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> > +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> > +length of the abbreviation scales as the repository grows, using the
> > +approximate number of objects in the repository and a bit of math
> > +around the birthday paradox, and defaults to a minimum of 7.
> 
> What is the birthday paradox then?
> 
> Also, better say "... and by default the minimum length is 7.".

The explanation was mostly copied from the 2.11.0 release notes, but mentioning the birthday paradox is unnecessary. I suggest changing this sentence to:
"The length of the abbreviation scales as the repository grows using the approximate number of objects in the repository, and by default the minimum length is 7."

Best regards,
Anders
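
The scaling rule discussed in this message can be sketched as follows. This is an illustration of the reasoning in the patch text (object count, birthday bound, minimum of 7), not necessarily Git's exact implementation; the names `estimated_abbrev_length` and `FALLBACK_MINIMUM` are assumptions made for the example.

```ruby
# Assumed minimum abbreviation length, taken from the patch text.
FALLBACK_MINIMUM = 7

# Hypothetical helper: estimate how many hex digits are needed so that
# abbreviated names are unlikely to collide among `object_count` objects.
def estimated_abbrev_length(object_count)
  # With roughly 2**bits objects, the birthday bound says collisions
  # become likely unless the name space has about 2**(2 * bits) values.
  bits = object_count.bit_length
  # Each hex digit carries 4 bits, so 2 * bits / 4 = bits / 2 digits,
  # rounded up.
  hex_digits = (bits + 1) / 2
  [hex_digits, FALLBACK_MINIMUM].max
end
```

Under these assumptions a repository with about a thousand objects stays at 7 digits, while `estimated_abbrev_length(1_000_000)` gives 10.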
Junio C Hamano May 16, 2021, 12:58 p.m. UTC | #4
Anders Höckersten <anders@hockersten.se> writes:

> The explanation was mostly copied from the 2.11.0 release notes,
> but mentioning the birthday paradox is unnecessary. I suggest
> changing this sentence to: "The length of the abbreviation scales
> as the repository grows using the approximate number of objects in
> the repository, and by default the minimum length is 7."

Heh.  In my private review, I said that I very much liked the way the
new description was phrased with "a bit of math around the birthday
paradox".  Now I know why I liked that phrasing---it turns out to be
my own ;-)

I don't mind with or without mention of the birthday math.  Thanks
for working on this.
Felipe Contreras May 16, 2021, 6:51 p.m. UTC | #5
Bagas Sanjaya wrote:
> On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
> > @@ -139,8 +140,11 @@ at the end.
> >   
> >   The number of additional commits is the number
> >   of commits which would be displayed by "git log v1.0.4..parent".
> > -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> > -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> > +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> > +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> > +length of the abbreviation scales as the repository grows, using the
> > +approximate number of objects in the repository and a bit of math
> > +around the birthday paradox, and defaults to a minimum of 7.
> 
> What is the birthday paradox then?

It's a probability fact that goes against common sense. In a room with 23
people you are 50% likely to find two people with the same birthday.

https://en.wikipedia.org/wiki/Birthday_problem
Robert P. J. Day May 16, 2021, 7 p.m. UTC | #6
On Sun, 16 May 2021, Felipe Contreras wrote:

> Bagas Sanjaya wrote:
> > On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
> > > @@ -139,8 +140,11 @@ at the end.
> > >
> > >   The number of additional commits is the number
> > >   of commits which would be displayed by "git log v1.0.4..parent".
> > > -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> > > -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> > > +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> > > +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> > > +length of the abbreviation scales as the repository grows, using the
> > > +approximate number of objects in the repository and a bit of math
> > > +around the birthday paradox, and defaults to a minimum of 7.
> >
> > What is the birthday paradox then?
>
> It's a probability fact that goes against common sense. In a room
> with 23 people you are 50% likely to find two people with the same
> birthday.
>
> https://en.wikipedia.org/wiki/Birthday_problem

  i've had to explain the logic behind this to people who really have
a tough time understanding this, and it's a concept that applies in a
lot of places (surprisingly).

  what trips people up is thinking they need to calculate the
probability that two or more people have the same birthday.

  no.

  what you want to do is calculate the ongoing probability that each
person's birthday is *different* from all the earlier ones. as in:

  * prob that 2nd person has a different bday is 364/365
  * prob that 3rd person has a different bday is 363/365
  * prob that 4th person has a different bday is 362/365

and so on, and as you multiply those together, it's right at 23 people
that the chance that everyone so far has a different bday drops below
50%.

  what's neat is that this way of looking at things applies in a lot
of places.

rday
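
The multiplication described in the message above can be written out directly. This is a minimal sketch assuming 365 equally likely birthdays; the function name `birthday_threshold` is made up for the example.

```ruby
# Smallest group size at which the probability that all birthdays are
# distinct falls below `threshold`, assuming `days` equally likely days.
def birthday_threshold(days = 365, threshold = 0.5)
  all_distinct = 1.0
  people = 1
  while all_distinct >= threshold
    people += 1
    # The newcomer must avoid the (people - 1) birthdays already taken.
    all_distinct *= (days - (people - 1)).to_f / days
  end
  people
end
```

With the defaults, `birthday_threshold` returns 23. Passing a larger `days` value, such as the number of possible 7-digit hex abbreviations (`16**7`), applies the same product to abbreviated object names.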
Felipe Contreras May 16, 2021, 9:07 p.m. UTC | #7
On Sun, May 16, 2021 at 2:00 PM Robert P. J. Day <rpjday@crashcourse.ca> wrote:
> On Sun, 16 May 2021, Felipe Contreras wrote:
> > Bagas Sanjaya wrote:

> > > What is the birthday paradox then?
> >
> > It's a probability fact that goes against common sense. In a room
> > with 23 people you are 50% likely to find two people with the same
> > birthday.
> >
> > https://en.wikipedia.org/wiki/Birthday_problem
>
>   i've had to explain the logic behind this to people who really have
> a tough time understanding this, and it's a concept that applies in a
> lot of places (surprisingly).

Indeed. Very very few people actually understand probability. Any
intuition you have is almost always wrong. Even professional
probabilists get probability wrong consistently.

I've found it's safer and easier not to trust my intuition, but to write
code and get the probability that way (also called the Monte Carlo
method).

I have a git repository with tricky simulations and I actually had
written one for the birthday paradox, but I had not pushed it. Now I
have [1].

The actual code is just two lines:

  birthdays = Array.new($n) { rand(365.25) }
  birthdays.any? { |e| birthdays.count(e) > 1 }

Yet our brain somehow has trouble figuring out the approximate result
of that computation.

Cheers.

[1] https://github.com/felipec/simulation/blob/master/examples/birthday
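
For readers who want to run the idea from the message above, here is a self-contained Monte Carlo sketch. It is not the code from the linked repository; the function names, the trial count, and the seed are assumptions chosen for the example, and it uses 365 equally likely days.

```ruby
# One trial: draw `n` random birthdays and report whether any two match.
def shared_birthday?(n, rng)
  birthdays = Array.new(n) { rng.rand(365) }  # integers 0..364
  birthdays.uniq.length < n
end

# Estimate the probability of a shared birthday among `n` people by
# repeating the trial many times. The seed makes runs repeatable.
def estimate_shared_birthday(n, trials: 20_000, seed: 1)
  rng = Random.new(seed)
  hits = trials.times.count { shared_birthday?(n, rng) }
  hits.to_f / trials
end
```

Calling `estimate_shared_birthday(23)` should land close to one half, in line with the exact calculation, and the estimate tightens as `trials` grows.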
Anders Höckersten May 17, 2021, 5:51 a.m. UTC | #8
On Sun, May 16, 2021, at 14:58, Junio C Hamano wrote:
> Anders Höckersten <anders@hockersten.se> writes:
> 
> > The explanation was mostly copied from the 2.11.0 release notes,
> > but mentioning the birthday paradox is unnecessary. I suggest
> > changing this sentence to: "The length of the abbreviation scales
> > as the repository grows using the approximate number of objects in
> > the repository, and by default the minimum length is 7."
> 
> Heh.  In my private review, I said that I very much liked the way the
> new description was phrased with "a bit of math around the birthday
> paradox".  Now I know why I liked that phrasing---it turns out to be
> my own ;-)
> 
> I don't mind with or without mention of the birthday math.  Thanks
> for working on this.

Actually, changed my mind again. I like the phrasing and it's in the "examples" section so a bit of verbosity doesn't hurt. Will submit a new patch with the other changes mentioned + whitespace fixes momentarily.

/A
Patch

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index a88f6ae2c6e7..f3f41cfac496 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -63,10 +63,11 @@  OPTIONS
 	Automatically implies --tags.
 
 --abbrev=<n>::
-	Instead of using the default 7 hexadecimal digits as the
-	abbreviated object name, use <n> digits, or as many digits
-	as needed to form a unique object name.  An <n> of 0
-	will suppress long format, only showing the closest tag.
+	Instead of using the default number of hexadecimal digits (which
+	will vary according to the size of the repository with a default of 
+	7) of the abbreviated object name, use <n> digits, or as many digits
+	as needed to form a unique object name.  An <n> of 0 will suppress 
+	long format, only showing the closest tag.
 
 --candidates=<n>::
 	Instead of considering only the 10 most recent tags as
@@ -139,8 +140,11 @@  at the end.
 
 The number of additional commits is the number
 of commits which would be displayed by "git log v1.0.4..parent".
-The hash suffix is "-g" + unambiguous abbreviation for the tip commit
-of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
+The hash suffix is "-g" + an unambigous abbreviation for the tip commit
+of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The 
+length of the abbreviation scales as the repository grows, using the 
+approximate number of objects in the repository and a bit of math 
+around the birthday paradox, and defaults to a minimum of 7.
 The "g" prefix stands for "git" and is used to allow describing the version of
 a software depending on the SCM the software is managed with. This is useful
 in an environment where people may use different SCMs.