diff mbox series

[1/2] bitmap-format.txt: fix some formatting issues

Message ID 976361e624a3dd58c8f291358d42f4e4c66eb266.1654177966.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series bitmap-format.txt: fix some formatting issues and include checksum info

Commit Message

Abhradeep Chakraborty June 2, 2022, 1:52 p.m. UTC
From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The asciidoc generated html for `Documentation/technical/bitmap-
format.txt` is broken. This is mainly because `-` is used for nested
lists (which is not allowed in asciidoc) instead of `*`.

Fix these and also reformat it (e.g. removing some blank lines) for
better readability of the html page.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 96 +++++++++++------------
 1 file changed, 45 insertions(+), 51 deletions(-)

Comments

Junio C Hamano June 6, 2022, 3:55 p.m. UTC | #1
"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Cc: git@vger.kernel.org,  Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Identify those who may have input with "git log --no-merges" and add
them here, perhaps?

> The asciidoc generated html for `Documentation/technical/bitmap-
> format.txt` is broken. This is mainly because `-` is used for nested
> lists (which is not allowed in asciidoc) instead of `*`.

Are we missing another step that must come much earlier than this
patch?  It seems to me that Documentation/Makefile does not even
consider that we should feed this file to AsciiDoc.

> Fix these and also reformat it (e.g. removing some blank lines) for
> better readability of the html page.

Do these blank lines hurt very badly how the end-result is formatted
in HTML?  Does the extra indentation between the line with "The
following flags are supported" on it and the two bullet items in the
header make the output better in significant way?

These changes make the input text much harder to read, and are not
very welcome, so unless they are part of "fixing generated HTML is
broken", please omit them.  As evidenced by the lack of HTML output
in the build system, a lot more folks read this document in text than
in HTML, and readability of the source matters.

Thanks.

> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/technical/bitmap-format.txt | 96 +++++++++++------------
>  1 file changed, 45 insertions(+), 51 deletions(-)
>
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 04b3ec21785..110d7ddf8ed 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  
>  == On-disk format
>  
> -	- A header appears at the beginning:
> +	* A header appears at the beginning:
>  
>  		4-byte signature: {'B', 'I', 'T', 'M'}
>  
> @@ -48,35 +48,30 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			of the bitmap index (the same one as JGit).
>  
>  		2-byte flags (network byte order)
> -
>  			The following flags are supported:
> -
> -			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
> -			This flag must always be present. It implies that the
> -			bitmap index has been generated for a packfile or
> -			multi-pack index (MIDX) with full closure (i.e. where
> -			every single object in the packfile/MIDX can find its
> -			parent links inside the same packfile/MIDX). This is a
> -			requirement for the bitmap index format, also present in
> -			JGit, that greatly reduces the complexity of the
> -			implementation.
> -
> -			- BITMAP_OPT_HASH_CACHE (0x4)
> -			If present, the end of the bitmap file contains
> -			`N` 32-bit name-hash values, one per object in the
> -			pack/MIDX. The format and meaning of the name-hash is
> -			described below.
> +				- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
> +				This flag must always be present. It implies that the
> +				bitmap index has been generated for a packfile or
> +				multi-pack index (MIDX) with full closure (i.e. where
> +				every single object in the packfile/MIDX can find its
> +				parent links inside the same packfile/MIDX). This is a
> +				requirement for the bitmap index format, also present in
> +				JGit, that greatly reduces the complexity of the
> +				implementation.
> +				- BITMAP_OPT_HASH_CACHE (0x4)
> +				If present, the end of the bitmap file contains
> +				`N` 32-bit name-hash values, one per object in the
> +				pack/MIDX. The format and meaning of the name-hash is
> +				described below.
>  
>  		4-byte entry count (network byte order)
> -
>  			The total count of entries (bitmapped commits) in this bitmap index.
>  
>  		20-byte checksum
> -
>  			The SHA1 checksum of the pack/MIDX this bitmap index
>  			belongs to.
>  
> -	- 4 EWAH bitmaps that act as type indexes
> +	* 4 EWAH bitmaps that act as type indexes
>  
>  		Type indexes are serialized after the hash cache in the shape
>  		of four EWAH bitmaps stored consecutively (see Appendix A for
> @@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  
>  		There is a bitmap for each Git object type, stored in the following
>  		order:
> -
>  			- Commits
>  			- Trees
>  			- Blobs
> @@ -97,39 +91,39 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  		in a full set (all bits set), and the AND of all 4 bitmaps will
>  		result in an empty bitmap (no bits set).
>  
> -	- N entries with compressed bitmaps, one for each indexed commit
> +	* N entries with compressed bitmaps, one for each indexed commit
>  
>  		Where `N` is the total amount of entries in this bitmap index.
>  		Each entry contains the following:
>  
> -		- 4-byte object position (network byte order)
> -			The position **in the index for the packfile or
> -			multi-pack index** where the bitmap for this commit is
> -			found.
> -
> -		- 1-byte XOR-offset
> -			The xor offset used to compress this bitmap. For an entry
> -			in position `x`, a XOR offset of `y` means that the actual
> -			bitmap representing this commit is composed by XORing the
> -			bitmap for this entry with the bitmap in entry `x-y` (i.e.
> -			the bitmap `y` entries before this one).
> -
> -			Note that this compression can be recursive. In order to
> -			XOR this entry with a previous one, the previous entry needs
> -			to be decompressed first, and so on.
> -
> -			The hard-limit for this offset is 160 (an entry can only be
> -			xor'ed against one of the 160 entries preceding it). This
> -			number is always positive, and hence entries are always xor'ed
> -			with **previous** bitmaps, not bitmaps that will come afterwards
> -			in the index.
> -
> -		- 1-byte flags for this bitmap
> -			At the moment the only available flag is `0x1`, which hints
> -			that this bitmap can be re-used when rebuilding bitmap indexes
> -			for the repository.
> -
> -		- The compressed bitmap itself, see Appendix A.
> +			** 4-byte object position (network byte order)
> +				The position **in the index for the packfile or
> +				multi-pack index** where the bitmap for this commit is
> +				found.
> +
> +			** 1-byte XOR-offset
> +				The xor offset used to compress this bitmap. For an entry
> +				in position `x`, a XOR offset of `y` means that the actual
> +				bitmap representing this commit is composed by XORing the
> +				bitmap for this entry with the bitmap in entry `x-y` (i.e.
> +				the bitmap `y` entries before this one).
> +
> +				Note that this compression can be recursive. In order to
> +				XOR this entry with a previous one, the previous entry needs
> +				to be decompressed first, and so on.
> +
> +				The hard-limit for this offset is 160 (an entry can only be
> +				xor'ed against one of the 160 entries preceding it). This
> +				number is always positive, and hence entries are always xor'ed
> +				with **previous** bitmaps, not bitmaps that will come afterwards
> +				in the index.
> +
> +			** 1-byte flags for this bitmap
> +				At the moment the only available flag is `0x1`, which hints
> +				that this bitmap can be re-used when rebuilding bitmap indexes
> +				for the repository.
> +
> +			** The compressed bitmap itself, see Appendix A.
>  
>  == Appendix A: Serialization format for an EWAH bitmap
Abhradeep Chakraborty June 7, 2022, 10:25 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> wrote:

> Identify those who may have input with "git log --no-merges" and add
> them here, perhaps?

Thanks, I have hopefully cc'd everyone who can give input on the
patch, except Peff. I learned that he is taking a break, so I decided not
to cc him (I will do so if you say so). I would love to hear from other
people who have knowledge of asciidoc.

I previously informed Taylor and Kaartic about the patch but forgot to
cc them :P

Another thing to note is that the checksum that I included in the last
commit was suggested by Taylor himself. I was having trouble understanding
some portions of `load_bitmap_header()` (because I wasn't aware of the
trailing checksum) when he cleared up my doubt by saying that a trailing
checksum exists, and he also suggested making a PR addressing that -

> I'm glad that it was helpful! If you think others may be confused by the same, feel free to write a patch modifying Documentation/technical/bitmap-format.txt to point out the trailing checksum.

Junio wrote -

> Are we missing another step that must come much earlier than this
> patch?  It seems to me that Documentation/Makefile does not even
> consider that we should feed this file to AsciiDoc.

I think so too. At first, I thought this was intentional. When
I ran `make doc` (to test the resulting html file), it didn't generate
any html file for bitmap-format.txt. But thankfully there is an online
asciidoc editor[1] where you can check the resulting html file. You can
also check the resulting html by copy-pasting the content[2] of the
bitmap-format file from my github branch into that editor.

Will write a patch for it.

The current broken page can be found at - https://git-scm.com/docs/bitmap-format

> Do these blank lines hurt very badly how the end-result is formatted
> in HTML?  Does the extra indentation between the line with "The
> following flags are supported" on it and the two bullet items in the
> header make the output better in significant way?

To answer the first question: yes, those are necessary to improve
the html readability (you can verify that by including and removing the
blank lines in the editor and observing the changes). This ensures that
all the related paragraphs are contained in the same block.

The extra indentations are not necessary. I added those because I thought
that they would be visually better for html page readers. If you think
they do the opposite, I can remove them.

I tried to use double bullets as little as possible (in most cases, nested
lists came under <pre> blocks, so I didn't have to use double bullets).
But in one case, I had to use them for nested lists (try the editor to
see the rendered output).

> These changes make the input text much harder to read, and are not
> very welcome, so unless they are part of "fixing generated HTML is
> broken", please omit them.  As evidenced by the lack of HTML output
> in the build system, a lot more folks read this document in text than
> in HTML, and readability of the source matters.

Okay, I will then remove those extra indentations. But besides that, all
the other changes are necessary.

I admit that readability of the source matters, but I think html pages are
also important (even more important) for people who don't have the
source code and want to learn about git internals.

Thanks :)

[1] https://asciidoclive.com/edit/scratch/1
[2] https://github.com/Abhra303/git/blob/fix-doc-formatting/Documentation/technical/bitmap-format.txt

Patch

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..110d7ddf8ed 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -39,7 +39,7 @@  MIDXs, both the bit-cache and rev-cache extensions are required.
 
 == On-disk format
 
-	- A header appears at the beginning:
+	* A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
 
@@ -48,35 +48,30 @@  MIDXs, both the bit-cache and rev-cache extensions are required.
 			of the bitmap index (the same one as JGit).
 
 		2-byte flags (network byte order)
-
 			The following flags are supported:
-
-			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the
-			bitmap index has been generated for a packfile or
-			multi-pack index (MIDX) with full closure (i.e. where
-			every single object in the packfile/MIDX can find its
-			parent links inside the same packfile/MIDX). This is a
-			requirement for the bitmap index format, also present in
-			JGit, that greatly reduces the complexity of the
-			implementation.
-
-			- BITMAP_OPT_HASH_CACHE (0x4)
-			If present, the end of the bitmap file contains
-			`N` 32-bit name-hash values, one per object in the
-			pack/MIDX. The format and meaning of the name-hash is
-			described below.
+				- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+				This flag must always be present. It implies that the
+				bitmap index has been generated for a packfile or
+				multi-pack index (MIDX) with full closure (i.e. where
+				every single object in the packfile/MIDX can find its
+				parent links inside the same packfile/MIDX). This is a
+				requirement for the bitmap index format, also present in
+				JGit, that greatly reduces the complexity of the
+				implementation.
+				- BITMAP_OPT_HASH_CACHE (0x4)
+				If present, the end of the bitmap file contains
+				`N` 32-bit name-hash values, one per object in the
+				pack/MIDX. The format and meaning of the name-hash is
+				described below.
 
 		4-byte entry count (network byte order)
-
 			The total count of entries (bitmapped commits) in this bitmap index.
 
 		20-byte checksum
-
 			The SHA1 checksum of the pack/MIDX this bitmap index
 			belongs to.
 
-	- 4 EWAH bitmaps that act as type indexes
+	* 4 EWAH bitmaps that act as type indexes
 
 		Type indexes are serialized after the hash cache in the shape
 		of four EWAH bitmaps stored consecutively (see Appendix A for
@@ -84,7 +79,6 @@  MIDXs, both the bit-cache and rev-cache extensions are required.
 
 		There is a bitmap for each Git object type, stored in the following
 		order:
-
 			- Commits
 			- Trees
 			- Blobs
@@ -97,39 +91,39 @@  MIDXs, both the bit-cache and rev-cache extensions are required.
 		in a full set (all bits set), and the AND of all 4 bitmaps will
 		result in an empty bitmap (no bits set).
 
-	- N entries with compressed bitmaps, one for each indexed commit
+	* N entries with compressed bitmaps, one for each indexed commit
 
 		Where `N` is the total amount of entries in this bitmap index.
 		Each entry contains the following:
 
-		- 4-byte object position (network byte order)
-			The position **in the index for the packfile or
-			multi-pack index** where the bitmap for this commit is
-			found.
-
-		- 1-byte XOR-offset
-			The xor offset used to compress this bitmap. For an entry
-			in position `x`, a XOR offset of `y` means that the actual
-			bitmap representing this commit is composed by XORing the
-			bitmap for this entry with the bitmap in entry `x-y` (i.e.
-			the bitmap `y` entries before this one).
-
-			Note that this compression can be recursive. In order to
-			XOR this entry with a previous one, the previous entry needs
-			to be decompressed first, and so on.
-
-			The hard-limit for this offset is 160 (an entry can only be
-			xor'ed against one of the 160 entries preceding it). This
-			number is always positive, and hence entries are always xor'ed
-			with **previous** bitmaps, not bitmaps that will come afterwards
-			in the index.
-
-		- 1-byte flags for this bitmap
-			At the moment the only available flag is `0x1`, which hints
-			that this bitmap can be re-used when rebuilding bitmap indexes
-			for the repository.
-
-		- The compressed bitmap itself, see Appendix A.
+			** 4-byte object position (network byte order)
+				The position **in the index for the packfile or
+				multi-pack index** where the bitmap for this commit is
+				found.
+
+			** 1-byte XOR-offset
+				The xor offset used to compress this bitmap. For an entry
+				in position `x`, a XOR offset of `y` means that the actual
+				bitmap representing this commit is composed by XORing the
+				bitmap for this entry with the bitmap in entry `x-y` (i.e.
+				the bitmap `y` entries before this one).
+
+				Note that this compression can be recursive. In order to
+				XOR this entry with a previous one, the previous entry needs
+				to be decompressed first, and so on.
+
+				The hard-limit for this offset is 160 (an entry can only be
+				xor'ed against one of the 160 entries preceding it). This
+				number is always positive, and hence entries are always xor'ed
+				with **previous** bitmaps, not bitmaps that will come afterwards
+				in the index.
+
+			** 1-byte flags for this bitmap
+				At the moment the only available flag is `0x1`, which hints
+				that this bitmap can be re-used when rebuilding bitmap indexes
+				for the repository.
+
+			** The compressed bitmap itself, see Appendix A.
 
 == Appendix A: Serialization format for an EWAH bitmap
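
For readers following the on-disk layout that the patch reformats, here is
a minimal Python sketch of the header fields and of the XOR-offset rule
described above. It is an illustration under stated assumptions, not Git's
implementation. The names `parse_header` and `resolve_xor` are hypothetical.
The 2-byte version field between the signature and the flags falls outside
the hunk context shown here and is taken from the full bitmap-format.txt.
The sketch also assumes each entry's bitmap has already been decoded from
EWAH into a plain Python integer.

```python
import struct

BITMAP_OPT_FULL_DAG = 0x1    # required flag
BITMAP_OPT_HASH_CACHE = 0x4  # optional name-hash cache at end of file
MAX_XOR_OFFSET = 160         # hard limit stated in the format description

# signature, version, flags, entry count, checksum; ">" = network byte order
HEADER = struct.Struct(">4sHHI20s")


def parse_header(data):
    """Decode the fixed-size header (hypothetical helper, not Git's code)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count, checksum = HEADER.unpack_from(data)
    if signature != b"BITM":
        raise ValueError("bad signature")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG must always be present")
    return {
        "version": version,
        "has_hash_cache": bool(flags & BITMAP_OPT_HASH_CACHE),
        "entry_count": entry_count,
        "checksum": checksum,
    }


def resolve_xor(stored, offsets):
    """Undo XOR compression for a list of entries.

    stored[x]  -- bitmap of entry x as read from disk (assumed to be an int)
    offsets[x] -- XOR offset y of entry x; 0 means "not XORed"
    """
    actual = []
    for x, (bits, y) in enumerate(zip(stored, offsets)):
        if y > MAX_XOR_OFFSET or y > x:
            raise ValueError("XOR offset out of range")
        # Entry x-y was resolved earlier in this loop, so chains of XORed
        # entries unwind naturally without explicit recursion.
        actual.append(bits ^ actual[x - y] if y else bits)
    return actual
```

Because an offset can only point at preceding entries, walking the entries
in file order is enough to resolve every chain.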