[2/2] git-format-patch: Document format for binary patch

Message ID 20210324123027.29460-3-bagasdotme@gmail.com (mailing list archive)
State New
Series Diff format documentation for git-format-patch

Commit Message

Bagas Sanjaya March 24, 2021, 12:30 p.m. UTC
Document binary file patch formats that are different from text file
patch.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
---
 Documentation/git-format-patch.txt | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Junio C Hamano March 24, 2021, 5:53 p.m. UTC | #1
Bagas Sanjaya <bagasdotme@gmail.com> writes:

> Document binary file patch formats that are different from text file
> patch.
>
> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
> ---
>  Documentation/git-format-patch.txt | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
> index 247033f8fc..8de172b1f4 100644
> --- a/Documentation/git-format-patch.txt
> +++ b/Documentation/git-format-patch.txt
> @@ -725,6 +725,28 @@ diff format is described as below:
>  
>  include::diff-generate-patch.txt[]
>  
> +Binary Files
> +~~~~~~~~~~~~
> +For binary files, the diff format have some differences compared to text
> +files:

I do not think this is specific to 'format-patch'.  If we need to
describe 'git diff --binary', it should be done there, so that
readers of "git diff --help" would also be able to learn the format.

> +1. Object hashes in index header line (`index <hash>..<hash> <mode>`)

s/Object hash/Object name/;

> +   are always given in full form, as binary patch is designed to be
> +   applied only to an exact copy of original file. This is to ensure
> +   that such patch don't apply to file with similar name but different
> +   hash.

... with similar but different object name.

cf. Documentation/glossary-contents.txt tells you what "object name" is.
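
To illustrate why a full object name pins down the exact preimage, here is
a minimal sketch of how a blob's object name is derived from its content.
It assumes a SHA-1 repository; `blob_object_name` is a hypothetical helper
written for this note, not something Git ships.

```python
import hashlib


def blob_object_name(content: bytes) -> str:
    """Return the object name of a blob, assuming a SHA-1 repository."""
    # The name is the hash of a "blob <size>\0" header followed by the content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Any change to the content yields a different 40-hex-digit name.
    print(blob_object_name(b""))
    print(blob_object_name(b"\x00"))
```

An abbreviated name could match more than one object, whereas the full
name identifies exactly one content, which is what a binary patch relies on.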

> +2. There are additional extended header lines specific to binary files:
> +
> +        GIT binary patch
> +        delta <bytes>
> +        literal <bytes>
> +
> +3. The diff body can be either delta or full (literal) content,
> +   whichever is the smallest size. It is encoded with base85 algorithm,
> +   and emitted in 64 characters each line. All but the last line in
> +   the body are prefixed with `z`.

I do not think this is all that useful; it clutters the description
for a reader who is not interested in reimplementing an encoder or a
decoder from the document.

And it is way too insufficient for a reader who wants to reimplement
an encoder or a decoder.  For example,

 - It does not say anything about what the delta is and how it is
   computed.

 - The 'z' is redundant; what matters more is to say that the first
   byte signals how many bytes are on that line, and that it is a mere
   artifact that we cram up to 52 bytes on a line.

 - It does not say anything about how the binary patch ensures that
   it is reversible (i.e. can be given to "git apply -R").
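
As a rough sketch of the length-byte point above: 'A'..'Z' stand for 1..26
bytes and 'a'..'z' for 27..52, so 'z' is simply what a full 52-byte line
looks like. The helper names below are hypothetical, and the sketch leans
on Python's `base64.b85encode`, which uses the same alphabet; it covers the
per-line framing only, not the compression or delta computation that
happens before it.

```python
import base64

MAX_BYTES_PER_LINE = 52  # the per-line limit mentioned above


def length_char(n: int) -> str:
    """Map a byte count (1..52) to the leading character of a line."""
    if not 1 <= n <= MAX_BYTES_PER_LINE:
        raise ValueError("a line carries between 1 and 52 bytes")
    return chr(ord("A") + n - 1) if n <= 26 else chr(ord("a") + n - 27)


def length_from_char(c: str) -> int:
    """Inverse of length_char: recover the byte count from the first character."""
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 1
    if "a" <= c <= "z":
        return ord(c) - ord("a") + 27
    raise ValueError("not a valid length character")


def encode_line(chunk: bytes) -> str:
    """Frame one chunk: length character, then zero-padded base85 groups."""
    body = base64.b85encode(chunk, pad=True).decode("ascii")
    return length_char(len(chunk)) + body


def decode_line(line: str) -> bytes:
    """Decode one line and drop the padding beyond the declared length."""
    n = length_from_char(line[0])
    return base64.b85decode(line[1:])[:n]


if __name__ == "__main__":
    full = encode_line(bytes(range(52)))
    print(full[0], len(full))  # length character and total line width
    print(decode_line(encode_line(b"hi")))
```

Under these assumptions a full line is 1 + 65 characters wide, and only the
final, shorter line starts with something other than 'z'.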

Thanks.

Bagas Sanjaya March 25, 2021, 6:22 a.m. UTC | #2
On 25/03/21 00.53, Junio C Hamano wrote:
> I do not think this is all that useful; it clutters the description
> for a reader who is not interested in reimplementing an encoder or a
> decoder from the document.
> 
> And it is way too insufficient for a reader who wants to reimplement
> an encoder or a decoder.  For example,
> 
>   - It does not say anything about what the delta is and how it is
>     computed.
> 
>   - The 'z' is redundant; what matters more is to say that the first
>     byte signals how many bytes are on that line, and that it is a mere
>     artifact that we cram up to 52 bytes on a line.
> 
>   - It does not say anything about how the binary patch ensures that
>     it is reversible (i.e. can be given to "git apply -R").
> 
> Thanks.
> 
Hmmm...

I wrote this patch based on "naive" observation of git format-patch's
behavior when given binary files in the commit.

Perhaps someone who is more familiar with base85 {en,de}coders and binary
patches in general can write better documentation than what I sent here.

Junio C Hamano March 25, 2021, 6:25 p.m. UTC | #3
Bagas Sanjaya <bagasdotme@gmail.com> writes:

> On 25/03/21 00.53, Junio C Hamano wrote:
>> I do not think this is all that useful; it clutters the description
>> for a reader who is not interested in reimplementing an encoder or a
>> decoder from the document.
>> And it is way too insufficient for a reader who wants to reimplement
>> an encoder or a decoder.  For example,
>>   - It does not say anything about what the delta is and how it is
>>     computed.
>>   - The 'z' is redundant; what matters more is to say that the first
>>     byte signals how many bytes are on that line, and that it is a mere
>>     artifact that we cram up to 52 bytes on a line.
>>   - It does not say anything about how the binary patch ensures that
>>     it is reversible (i.e. can be given to "git apply -R").
>> Thanks.
>> 
> Hmmm...
>
> I wrote this patch based on "naive" observation of git format-patch's
> behavior when given binary files in the commit.
>
> Perhaps someone who is more familiar with base85 {en,de}coders and binary
> patches in general can write better documentation than what I sent here.

I do not mind reviewing an update to an existing document or a new
document in Documentation/technical/ somewhere, if somebody is
motivated enough to write the details to a degree that would allow
reimplementation of the encoder and the decoder.  I just do not think
it belongs to the end-user-facing document of "format-patch", whose
target is users of the "format-patch" command, not reimplementors of
the command.

Patch

diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
index 247033f8fc..8de172b1f4 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -725,6 +725,28 @@  diff format is described as below:
 
 include::diff-generate-patch.txt[]
 
+Binary Files
+~~~~~~~~~~~~
+For binary files, the diff format have some differences compared to text
+files:
+
+1. Object hashes in index header line (`index <hash>..<hash> <mode>`)
+   are always given in full form, as binary patch is designed to be
+   applied only to an exact copy of original file. This is to ensure
+   that such patch don't apply to file with similar name but different
+   hash.
+
+2. There are additional extended header lines specific to binary files:
+
+        GIT binary patch
+        delta <bytes>
+        literal <bytes>
+
+3. The diff body can be either delta or full (literal) content,
+   whichever is the smallest size. It is encoded with base85 algorithm,
+   and emitted in 64 characters each line. All but the last line in
+   the body are prefixed with `z`.
+
 SEE ALSO
 --------
 linkgit:git-am[1], linkgit:git-send-email[1]