
[v2] doc: remove misleading documentation on pack names

Message ID 20200722202629.109277-1-johannes@sipsolutions.net (mailing list archive)
State New, archived
Series [v2] doc: remove misleading documentation on pack names

Commit Message

Johannes Berg July 22, 2020, 8:26 p.m. UTC
The index-pack documentation explicitly states that the pack
name is derived from the sorted list of object names, but
that clearly isn't true. I can't seem to figure out whether
this was ever changed, though.

Be less explicit in the docs as to what the exact output is,
and just say that it's whatever goes into the pack name.

Really it seems to be the sha1 of the entire file, without
the checksum footer.
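A minimal C sketch of that claim, assuming a repository that uses SHA-1 (not the SHA-256 object format). Hashing every byte of the `.pack` file except its last 20 bytes should reproduce those 20 bytes, and their hex form is the `<hex>` in `pack-<hex>.pack`. The helpers `pack_trailer_ok()` and `pack_name_hex()` are hypothetical names, not part of git. SHA-1 is implemented inline only so the example needs no library; it is unoptimized and meant for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal, unoptimized SHA-1 (FIPS 180-4); illustration only, not git's code. */
static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Compress one 64-byte block into the running state h[0..4]. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
	for (i = 16; i < 80; i++)
		w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = rol(a, 5) + f + e + k + w[i];
		e = d; d = c; c = rol(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* SHA-1 of data[0..len) into out[20]. */
static void sha1(const unsigned char *data, size_t len, unsigned char out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	unsigned char tail[128] = { 0 };
	size_t i, rem = len % 64, full = len - rem;
	size_t tail_len = rem < 56 ? 64 : 128;
	uint64_t bits = (uint64_t)len * 8;

	for (i = 0; i < full; i += 64)
		sha1_block(h, data + i);
	/* Padding: leftover bytes, 0x80, zeros, 64-bit big-endian bit count. */
	memcpy(tail, data + full, rem);
	tail[rem] = 0x80;
	for (i = 0; i < 8; i++)
		tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
	for (i = 0; i < tail_len; i += 64)
		sha1_block(h, tail + i);
	for (i = 0; i < 20; i++)
		out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* 1 if the last 20 bytes equal the SHA-1 of everything before them. */
static int pack_trailer_ok(const unsigned char *pack, size_t len)
{
	unsigned char want[20];

	if (len < 20)
		return 0;
	sha1(pack, len - 20, want);
	return memcmp(want, pack + len - 20, 20) == 0;
}

/* Hex-encode the 20-byte trailer: the <hex> in pack-<hex>.pack/.idx. */
static void pack_name_hex(const unsigned char *pack, size_t len, char out[41])
{
	static const char digits[] = "0123456789abcdef";
	const unsigned char *trailer;
	size_t i;

	assert(len >= 20);
	trailer = pack + len - 20;
	for (i = 0; i < 20; i++) {
		out[2 * i] = digits[trailer[i] >> 4];
		out[2 * i + 1] = digits[trailer[i] & 0xf];
	}
	out[40] = '\0';
}
```

For a pack that git wrote, reading the whole `pack-<hex>.pack` file into memory and passing it to `pack_trailer_ok()` should return 1, and `pack_name_hex()` should produce the same `<hex>` as the file name. Nothing in this check looks at the name, which is consistent with the observation below that a mismatched name goes unnoticed.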

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
It was reported that bup writes pack files whose names differ
from the ones git would give them, quite possibly because of
this documentation. It doesn't actually *matter*, though: as
long as the file is internally consistent, nothing checks that
the name also matches the footer.

You can also take this as a bug report and fix the language in
some other, perhaps more precise way, if you prefer :-)

v2: correct bup list address, oops
---
 Documentation/git-index-pack.txt | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Comments

Junio C Hamano July 22, 2020, 9:09 p.m. UTC | #1
Johannes Berg <johannes@sipsolutions.net> writes:

> The index-pack documentation explicitly states that the pack
> name is derived from the sorted list of object names, but
> that clearly isn't true. I can't seem to figure
> out whether this was ever changed, though.
>
> Be less explicit in the docs as to what the exact output is,
> and just say that it's whatever goes into the pack name.
>
> Really it seems to be the sha1 of the entire file, without
> the checksum footer.

Please avoid "seems to be" and spend a bit of effort digging the
history especially when we are not in a hurry to get to the definite
answer.  We can go "less explicit", or be a bit more informative by
saying that it is the trailer hash that is standard practice shared
across our binary files like the index and the packfile.

I think this is 1190a1ac (pack-objects: name pack files after
trailer hash, 2013-12-05).  It forgot to update the comment before
write_idx_file() function when it did this change:

 /*
  * On entry *sha1 contains the pack content SHA1 hash, on exit it is
  * the SHA1 hash of sorted object names. The objects array passed in
  * will be sorted by SHA1 on exit.
  */
 const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects,
 			   int nr_objects, const struct pack_idx_option *opts,
-			   unsigned char *sha1)
+			   const unsigned char *sha1)
 {

Obviously, after it turned *sha1 into 'const', it no longer is
possible for it to have anything different from what was passed in
upon exit.

> +Once the index has been created, the hash that goes into the name of
> +the pack/idx file is printed to stdout. If --stdin was also used then
> +this is prefixed by either "pack\t", or "keep\t" if a new .keep file
> +was successfully created. This is useful to remove a .keep file used
> +as a lock to prevent the race with 'git repack' mentioned above.

The change is good---I made sure that among these five lines, what
changed was only the first one and a half lines.  I however would have
preferred not to see the line rewrapping.

Thanks.
Johannes Berg July 22, 2020, 9:14 p.m. UTC | #2
On Wed, 2020-07-22 at 14:09 -0700, Junio C Hamano wrote:
> 
> Please avoid "seems to be" and spend a bit of effort digging the
> history especially when we are not in a hurry to get to the definite
> answer.

I did ... but between all the file renames and moves, and not
understanding the code very well, I couldn't really follow what was
going on.

Ok, so maybe "seems to be" was a bit of a cop-out because I do
understand that's what git does *now* (having just replicated it in
bup), but I have no idea how it got there.

> We can go "less explicit", or be a bit more informative by
> saying that it is the trailer hash that is standard practice shared
> across our binary files like the index and the packfile.
> 
> I think this is 1190a1ac (pack-objects: name pack files after
> trailer hash, 2013-12-05).

Indeed, that makes sense. Somehow I didn't come across this commit, but
perhaps that's because I was looking too much at index-pack.c (and its
various renames).

>   It forgot to update the comment before
> write_idx_file() function when it did this change:
> 
>  /*
>   * On entry *sha1 contains the pack content SHA1 hash, on exit it is
>   * the SHA1 hash of sorted object names. The objects array passed in
>   * will be sorted by SHA1 on exit.
>   */
>  const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects,
>  			   int nr_objects, const struct pack_idx_option *opts,
> -			   unsigned char *sha1)
> +			   const unsigned char *sha1)
>  {
> 
> Obviously, after it turned *sha1 into 'const', it no longer is
> possible for it to have anything different from what was passed in
> upon exit.

Indeed :-)

> > +Once the index has been created, the hash that goes into the name of
> > +the pack/idx file is printed to stdout. If --stdin was also used then
> > +this is prefixed by either "pack\t", or "keep\t" if a new .keep file
> > +was successfully created. This is useful to remove a .keep file used
> > +as a lock to prevent the race with 'git repack' mentioned above.
> 
> The change is good---I made sure that among these five lines, what
> changed was only the first one and a half lines.  I however would have
> preferred not to see the line rewrapping.

Ok, fair - the text all seemed wrapped "nicely" so I preserved that
rather than have one line significantly shorter than the others, but if
you prefer that it's fine by me.

Really all I was trying to do is be a *little* more helpful than just
point out "the documentation is wrong"...

johannes
Junio C Hamano July 22, 2020, 9:16 p.m. UTC | #3
Johannes Berg <johannes@sipsolutions.net> writes:

> Really all I was trying to do is be a *little* more helpful than just
> point out "the documentation is wrong"...

Thanks.

Patch

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 9316d9a80b0d..ace40fa9f363 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -104,12 +104,11 @@  This option cannot be used with --stdin.
 NOTES
 -----
 
-Once the index has been created, the list of object names is sorted
-and the SHA-1 hash of that list is printed to stdout. If --stdin was
-also used then this is prefixed by either "pack\t", or "keep\t" if a
-new .keep file was successfully created. This is useful to remove a
-.keep file used as a lock to prevent the race with 'git repack'
-mentioned above.
+Once the index has been created, the hash that goes into the name of
+the pack/idx file is printed to stdout. If --stdin was also used then
+this is prefixed by either "pack\t", or "keep\t" if a new .keep file
+was successfully created. This is useful to remove a .keep file used
+as a lock to prevent the race with 'git repack' mentioned above.
 
 GIT
 ---
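The NOTES paragraph in this patch describes what a caller of `git index-pack --stdin` reads back on stdout. A minimal C sketch of consuming that line, assuming a SHA-1 repository (40 hex digits) and that the `.keep` file sits at `<objdir>/pack/pack-<hex>.keep`. `parse_index_pack_line()`, `keep_path()` and the `objdir` parameter are hypothetical names chosen for illustration, not git APIs. Without `--stdin` the line carries no `pack\t`/`keep\t` prefix, and this sketch deliberately rejects that form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* What `git index-pack --stdin` reported (SHA-1 repositories: 40 hex digits). */
enum pack_status { PACK_STATUS_INVALID, PACK_STATUS_PACK, PACK_STATUS_KEEP };

/*
 * Parse one output line such as "keep\t<40 hex>\n". On success the hex name
 * is copied to name_out (NUL-terminated); on failure name_out is untouched.
 */
static enum pack_status parse_index_pack_line(const char *line, char name_out[41])
{
	enum pack_status st;
	size_t i;

	assert(line != NULL && name_out != NULL);
	if (strncmp(line, "pack\t", 5) == 0)
		st = PACK_STATUS_PACK;
	else if (strncmp(line, "keep\t", 5) == 0)
		st = PACK_STATUS_KEEP;
	else
		return PACK_STATUS_INVALID;
	line += 5;
	/* Stops at the first non-hex byte, so a short line never reads past its NUL. */
	for (i = 0; i < 40; i++)
		if (!((line[i] >= '0' && line[i] <= '9') || (line[i] >= 'a' && line[i] <= 'f')))
			return PACK_STATUS_INVALID;
	if (line[40] != '\0' && line[40] != '\n')
		return PACK_STATUS_INVALID;
	memcpy(name_out, line, 40);
	name_out[40] = '\0';
	return st;
}

/* Build "<objdir>/pack/pack-<hex>.keep"; returns 0 if buf is too small. */
static int keep_path(const char *objdir, const char *name, char *buf, size_t size)
{
	int n = snprintf(buf, size, "%s/pack/pack-%s.keep", objdir, name);

	return n >= 0 && (size_t)n < size;
}
```

A caller that gets `PACK_STATUS_KEEP` would, once it has updated its refs, `unlink()` the path built by `keep_path()` to drop the lock and end the race with `git repack`. With `PACK_STATUS_PACK` there is no `.keep` file created by this invocation to remove.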