Message ID | pull.1026.git.git.1621150366442.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | describe-doc: clarify default length of abbreviation |
"Anders Höckersten via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Anders Höckersten <anders@hockersten.se>
>
> Clarify the default length used for the abbreviated form used for
> commits in git describe.
>
> The behavior was modified in Git 2.11.0, but the documentation was not
> updated to clarify the new behavior.
>
> Signed-off-by: Anders Höckersten <anders@hockersten.se>
> ---
>     describe-doc: clarify default length of abbreviation
>
>     Clarify the default length used for the abbreviated form used for
>     commits in git describe.
>
>     The behavior was modified in Git 2.11.0, but the documentation was not
>     updated to clarify the new behavior.
>
>     Signed-off-by: Anders Höckersten anders@hockersten.se
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1026%2Fahockersten%2Fdescribe-doc-abbreviation-clarification-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1026/ahockersten/describe-doc-abbreviation-clarification-v1
> Pull-Request: https://github.com/git/git/pull/1026

.git/rebase-apply/patch:32: trailing whitespace.
will vary according to the size of the repository with a default of
.git/rebase-apply/patch:34: trailing whitespace.
as needed to form a unique object name. An <n> of 0 will suppress
.git/rebase-apply/patch:46: trailing whitespace.
of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
.git/rebase-apply/patch:47: trailing whitespace.
length of the abbreviation scales as the repository grows, using the
.git/rebase-apply/patch:48: trailing whitespace.
approximate number of objects in the repository and a bit of math
warning: 5 lines applied after fixing whitespace errors.
Applying: describe-doc: clarify default length of abbreviation

Will fix them up while applying, but please be careful the next
time. A good way to double check is to do

    $ git format-patch -o my.patch origin..
    $ git checkout --detach origin
    $ git am --whitespace=warn my.patch

that is, (1) create a patch series out of the branch you have worked
on, designed to be applied on top of the origin, (2) detach HEAD at
the origin, and (3) apply the patch. Once you are done, you can go
back to the original branch with

    $ git checkout -

The patch text looks good. Thanks.

>  Documentation/git-describe.txt | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index a88f6ae2c6e7..f3f41cfac496 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -63,10 +63,11 @@ OPTIONS
>  	Automatically implies --tags.
>
>  --abbrev=<n>::
> -	Instead of using the default 7 hexadecimal digits as the
> -	abbreviated object name, use <n> digits, or as many digits
> -	as needed to form a unique object name. An <n> of 0
> -	will suppress long format, only showing the closest tag.
> +	Instead of using the default number of hexadecimal digits (which
> +	will vary according to the size of the repository with a default of
> +	7) of the abbreviated object name, use <n> digits, or as many digits
> +	as needed to form a unique object name. An <n> of 0 will suppress
> +	long format, only showing the closest tag.
>
>  --candidates=<n>::
>  	Instead of considering only the 10 most recent tags as
> @@ -139,8 +140,11 @@ at the end.
>
>  The number of additional commits is the number
>  of commits which would be displayed by "git log v1.0.4..parent".
> -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> +length of the abbreviation scales as the repository grows, using the
> +approximate number of objects in the repository and a bit of math
> +around the birthday paradox, and defaults to a minimum of 7.
>  The "g" prefix stands for "git" and is used to allow describing the version of
>  a software depending on the SCM the software is managed with. This is useful
>  in an environment where people may use different SCMs.
>
> base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7
On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
>  --abbrev=<n>::
> -	Instead of using the default 7 hexadecimal digits as the
> -	abbreviated object name, use <n> digits, or as many digits
> -	as needed to form a unique object name. An <n> of 0
> -	will suppress long format, only showing the closest tag.
> +	Instead of using the default number of hexadecimal digits (which
> +	will vary according to the size of the repository with a default of
> +	7) of the abbreviated object name, use <n> digits, or as many digits
> +	as needed to form a unique object name. An <n> of 0 will suppress
> +	long format, only showing the closest tag.
>

I think it is more correct to say that the abbreviated hash length is
determined by the number of objects.

>  --candidates=<n>::
>  	Instead of considering only the 10 most recent tags as
> @@ -139,8 +140,11 @@ at the end.
>
>  The number of additional commits is the number
>  of commits which would be displayed by "git log v1.0.4..parent".
> -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> +length of the abbreviation scales as the repository grows, using the
> +approximate number of objects in the repository and a bit of math
> +around the birthday paradox, and defaults to a minimum of 7.

What is the birthday paradox then?

Also, it would be better to say "... and by default the minimum length is 7.".

Thanks.
On Sun, May 16, 2021, at 14:00, Bagas Sanjaya wrote:
> On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
> >  --abbrev=<n>::
> > -	Instead of using the default 7 hexadecimal digits as the
> > -	abbreviated object name, use <n> digits, or as many digits
> > -	as needed to form a unique object name. An <n> of 0
> > -	will suppress long format, only showing the closest tag.
> > +	Instead of using the default number of hexadecimal digits (which
> > +	will vary according to the size of the repository with a default of
> > +	7) of the abbreviated object name, use <n> digits, or as many digits
> > +	as needed to form a unique object name. An <n> of 0 will suppress
> > +	long format, only showing the closest tag.
> >
>
> I think it is more correct to say that the abbreviated hash length is
> determined by the number of objects.

I agree. I will modify this to: "(which will vary according to the
number of objects in the repository with a default of 7)" unless you
have a different suggestion.

> >  --candidates=<n>::
> >  	Instead of considering only the 10 most recent tags as
> > @@ -139,8 +140,11 @@ at the end.
> >
> >  The number of additional commits is the number
> >  of commits which would be displayed by "git log v1.0.4..parent".
> > -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> > -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> > +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> > +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> > +length of the abbreviation scales as the repository grows, using the
> > +approximate number of objects in the repository and a bit of math
> > +around the birthday paradox, and defaults to a minimum of 7.
>
> What is the birthday paradox then?
>
> Also, it would be better to say "... and by default the minimum length is 7.".

The explanation was mostly copied from the 2.11.0 release notes, but
mentioning the birthday paradox is unnecessary. I suggest changing
this sentence to: "The length of the abbreviation scales as the
repository grows using the approximate number of objects in the
repository, and by default the minimum length is 7."

Best regards,
Anders
Anders Höckersten <anders@hockersten.se> writes:

> The explanation was mostly copied from the 2.11.0 release notes,
> but mentioning the birthday paradox is unnecessary. I suggest
> changing this sentence to: "The length of the abbreviation scales
> as the repository grows using the approximate number of objects in
> the repository, and by default the minimum length is 7."

Heh. In my private review, I said that I very much liked the way the
new description was phrased with "a bit of math around the birthday
paradox". Now I know why I liked that phrasing---it turns out to be
my own ;-)

I don't mind with or without mention of the birthday math. Thanks
for working on this.
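The "bit of math around the birthday paradox" can be made concrete. Below is a minimal Ruby sketch of the auto-sizing heuristic as read from Git's abbreviation code; `default_abbrev_len` is a hypothetical name, the constant mirrors Git's 7-digit fallback, and the real code starts from Git's own approximate object count and still lengthens the result whenever that prefix is ambiguous.

```ruby
# Hypothetical Ruby port of Git's auto-sized default abbreviation
# length; treat it as an approximation of the C code, not a copy.
FALLBACK_DEFAULT_ABBREV = 7 # minimum kept for small repositories

def default_abbrev_len(approx_object_count)
  # Bits needed to write the count, floor(log2(count)) + 1: the
  # repository holds on the order of 2**bits objects.
  bits = approx_object_count.bit_length

  # Birthday bound: with h-bit prefixes a collision is expected after
  # about 2**(h / 2) objects, so 2**bits objects call for h = 2 * bits.
  # Each hex digit carries 4 bits: (2 * bits) / 4 = bits / 2, rounded up.
  len = (bits + 1) / 2

  # Small repositories keep the traditional 7 digits.
  [len, FALLBACK_DEFAULT_ABBREV].max
end

[100, 16_383, 16_384, 300_000, 8_388_608].each do |n|
  puts "#{n} objects -> #{default_abbrev_len(n)} hex digits"
end
```

Under these assumptions the default first grows past the minimum of 7, to 8 digits, at 16,384 (2**14) objects, and gains one more digit each time the object count roughly quadruples.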
Bagas Sanjaya wrote:
> On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
> > @@ -139,8 +140,11 @@ at the end.
> >
> >  The number of additional commits is the number
> >  of commits which would be displayed by "git log v1.0.4..parent".
> > -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> > -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> > +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> > +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> > +length of the abbreviation scales as the repository grows, using the
> > +approximate number of objects in the repository and a bit of math
> > +around the birthday paradox, and defaults to a minimum of 7.
>
> What is the birthday paradox then?

It's a probability fact that goes against common sense. In a room
with 23 people you are 50% likely to find two people with the same
birthday.

https://en.wikipedia.org/wiki/Birthday_problem
On Sun, 16 May 2021, Felipe Contreras wrote:

> Bagas Sanjaya wrote:
> > On 16/05/21 14.32, Anders Höckersten via GitGitGadget wrote:
> > > @@ -139,8 +140,11 @@ at the end.
> > >
> > >  The number of additional commits is the number
> > >  of commits which would be displayed by "git log v1.0.4..parent".
> > > -The hash suffix is "-g" + unambiguous abbreviation for the tip commit
> > > -of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
> > > +The hash suffix is "-g" + an unambigous abbreviation for the tip commit
> > > +of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
> > > +length of the abbreviation scales as the repository grows, using the
> > > +approximate number of objects in the repository and a bit of math
> > > +around the birthday paradox, and defaults to a minimum of 7.
> >
> > What is the birthday paradox then?
>
> It's a probability fact that goes against common sense. In a room
> with 23 people you are 50% likely to find two people with the same
> birthday.
>
> https://en.wikipedia.org/wiki/Birthday_problem

  i've had to explain the logic behind this to people who really have
a tough time understanding this, and it's a concept that applies in a
lot of places (surprisingly).

  what trips people up is thinking they need to calculate the
probability that two or more people have the same birthday. no. what
you want to do is calculate the ongoing probability that each person's
birthday is *different* from all the earlier ones. as in:

  * prob that 2nd person has a different bday is 364/365
  * prob that 3rd person has a different bday is 363/365
  * prob that 4th person has a different bday is 362/365

and so on, and as you multiply those together, it's right at 23 people
that the running product (the chance that *everyone* so far has a
different bday from everyone else) drops below 50%.

  what's neat is that this way of looking at things applies in a lot
of places.

rday
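The running product described above can be computed directly. A short Ruby sketch, assuming 365 equally likely birthdays and ignoring leap years; the method names are illustrative:

```ruby
# Probability that n people all have different birthdays, assuming
# `days` equally likely days: (365/365) * (364/365) * (363/365) * ...
def prob_all_distinct(n, days = 365)
  (0...n).reduce(1.0) { |acc, k| acc * (days - k) / days }
end

# Smallest group in which a shared birthday is more likely than not,
# i.e. the first n where the all-distinct probability drops below 1/2.
def first_n_over_half(days = 365)
  n = 1
  n += 1 while prob_all_distinct(n, days) >= 0.5
  n
end

puts first_n_over_half                          # prints 23
puts format("%.3f", 1 - prob_all_distinct(23))  # prints 0.507
```

Passing the number of possible abbreviations as `days` (for example `prob_all_distinct(count, 16**7)` for seven hex digits) turns the same product into a rough estimate of how likely `count` random objects are to avoid sharing a 7-digit prefix.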
On Sun, May 16, 2021 at 2:00 PM Robert P. J. Day <rpjday@crashcourse.ca> wrote:
> On Sun, 16 May 2021, Felipe Contreras wrote:
> > Bagas Sanjaya wrote:
> > > What is the birthday paradox then?
> >
> > It's a probability fact that goes against common sense. In a room
> > with 23 people you are 50% likely to find two people with the same
> > birthday.
> >
> > https://en.wikipedia.org/wiki/Birthday_problem
>
>   i've had to explain the logic behind this to people who really have
> a tough time understanding this, and it's a concept that applies in a
> lot of places (surprisingly).

Indeed. Very very few people actually understand probability. Any
intuition you have is almost always wrong. Even professional
probabilists get probability wrong consistently.

I've found it's safer and easier to not trust my intuition, write
code, and get the probability that way (the Monte Carlo method). I
have a git repository with tricky simulations and I actually had
written one for the birthday paradox, but I had not pushed it. Now I
have [1].

The actual code is just two lines:

  birthdays = Array.new($n) { rand(365.25) }
  birthdays.any? { |e| birthdays.count(e) > 1 }

Yet our brain somehow has trouble figuring out the approximate result
of that computation.

Cheers.

[1] https://github.com/felipec/simulation/blob/master/examples/birthday
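The two-line simulation above reads a global `$n` that the snippet itself does not define. A self-contained Ruby variant of the same Monte Carlo idea, with a seeded generator so runs are reproducible; `shared_birthday?`, `estimate`, the seed and the trial count are illustrative choices, not taken from the linked repository:

```ruby
# One trial: draw n birthdays (integers 0...365) and report whether
# any day repeats. uniq keeps the check linear in n, unlike calling
# count once per element.
def shared_birthday?(n, rng)
  birthdays = Array.new(n) { rng.rand(365) }
  birthdays.uniq.size < n
end

# The fraction of trials with a shared birthday approximates the
# probability; a fixed seed makes the estimate repeatable.
def estimate(n, trials: 100_000, seed: 1)
  rng = Random.new(seed)
  hits = trials.times.count { shared_birthday?(n, rng) }
  hits.fdiv(trials)
end

puts estimate(23) # lands near the exact value of about 0.507
```

With 100,000 trials the standard error is roughly 0.0016, so the printed estimate should agree with the exact answer to about two decimal places; more trials tighten it further at a linear cost in run time.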
On Sun, May 16, 2021, at 14:58, Junio C Hamano wrote:
> Anders Höckersten <anders@hockersten.se> writes:
>
> > The explanation was mostly copied from the 2.11.0 release notes,
> > but mentioning the birthday paradox is unnecessary. I suggest
> > changing this sentence to: "The length of the abbreviation scales
> > as the repository grows using the approximate number of objects in
> > the repository, and by default the minimum length is 7."
>
> Heh. In my private review, I said that I very much liked the way the
> new description was phrased with "a bit of math around the birthday
> paradox". Now I know why I liked that phrasing---it turns out to be
> my own ;-)
>
> I don't mind with or without mention of the birthday math. Thanks
> for working on this.

Actually, I changed my mind again. I like the phrasing, and it's in
the "examples" section, so a bit of verbosity doesn't hurt. Will
submit a new patch with the other changes mentioned + whitespace
fixes momentarily.

/A
diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index a88f6ae2c6e7..f3f41cfac496 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -63,10 +63,11 @@ OPTIONS
 	Automatically implies --tags.
 
 --abbrev=<n>::
-	Instead of using the default 7 hexadecimal digits as the
-	abbreviated object name, use <n> digits, or as many digits
-	as needed to form a unique object name. An <n> of 0
-	will suppress long format, only showing the closest tag.
+	Instead of using the default number of hexadecimal digits (which
+	will vary according to the size of the repository with a default of
+	7) of the abbreviated object name, use <n> digits, or as many digits
+	as needed to form a unique object name. An <n> of 0 will suppress
+	long format, only showing the closest tag.
 
 --candidates=<n>::
 	Instead of considering only the 10 most recent tags as
@@ -139,8 +140,11 @@ at the end.
 
 The number of additional commits is the number
 of commits which would be displayed by "git log v1.0.4..parent".
-The hash suffix is "-g" + unambiguous abbreviation for the tip commit
-of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
+The hash suffix is "-g" + an unambigous abbreviation for the tip commit
+of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`). The
+length of the abbreviation scales as the repository grows, using the
+approximate number of objects in the repository and a bit of math
+around the birthday paradox, and defaults to a minimum of 7.
 The "g" prefix stands for "git" and is used to allow describing the version of
 a software depending on the SCM the software is managed with. This is useful
 in an environment where people may use different SCMs.
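Because the suffix after "-g" is no longer a fixed 7 digits, scripts that consume `git describe` output should not assume a fixed width. A hedged Ruby sketch; `DESCRIBE_RE` and `parse_describe` are illustrative helpers, not part of Git, and the sample string is made up in the documented `<tag>-<count>-g<hash>` shape:

```ruby
# Long format: <tag>-<number of commits>-g<abbreviated object name>.
# Tags may contain "-", so the greedy tag group takes everything up to
# the last "-<digits>-g<hex>" at the end of the string, and the hex
# group accepts any length instead of exactly 7 digits.
DESCRIBE_RE = /\A(?<tag>.+)-(?<count>\d+)-g(?<abbrev>\h+)\z/

def parse_describe(output)
  text = output.strip
  m = DESCRIBE_RE.match(text)
  # With --abbrev=0, or when the commit is exactly at a tag, only the
  # tag is printed.
  return { tag: text, count: 0, abbrev: nil } unless m

  { tag: m[:tag], count: m[:count].to_i, abbrev: m[:abbrev] }
end

p parse_describe("v1.0.4-14-g2414721")
```

A tag whose own name ends in something like `-3-gabc1234` would still be misread when only the tag is printed; running `git describe --long` makes the long format unconditional and avoids that ambiguity.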