diff mbox series

docs: clarify how the text and text=auto attributes are different

Message ID 20230421055641.550199-1-alexhenrie24@gmail.com (mailing list archive)
State New, archived
Headers show
Series docs: clarify how the text and text=auto attributes are different | expand

Commit Message

Alex Henrie April 21, 2023, 5:56 a.m. UTC
These two sentences are confusing because the description of the text
attribute sounds exactly the same as the description of the text=auto
attribute:

"Setting the text attribute on a path enables end-of-line normalization"

"When text is set to "auto", the path is marked for automatic
end-of-line conversion"

Unless the reader is already familiar with the two variants, there's a
high probability that they will think that "end-of-line normalization"
is the same thing as "automatic end-of-line conversion".

It's also confusing that the explanation of how end-of-line conversion
works is in the paragraph for text=auto even though it applies equally
to the text attribute which is described earlier.

On top of that, "When the file has been committed with CRLF, no
conversion is done" implies that normalization is only suppressed if the
file has been committed. In fact, running `git add` on a CRLF file,
adding the text attribute to the file, and running `git add` again does
not do anything to the line endings either.

Rephrase the documentation of text and text=auto to be clear about how
they are the same, how they are different, and in what cases
normalization is performed.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
---
 Documentation/gitattributes.txt | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Comments

Junio C Hamano April 21, 2023, 4:35 p.m. UTC | #1
Alex Henrie <alexhenrie24@gmail.com> writes:

> These two sentences are confusing because the description of the text
> attribute sounds exactly the same as the description of the text=auto
> attribute:
>
> "Setting the text attribute on a path enables end-of-line normalization"
>
> "When text is set to "auto", the path is marked for automatic
> end-of-line conversion"
>
> Unless the reader is already familiar with the two variants, there's a
> high probability that they will think that "end-of-line normalization"
> is the same thing as "automatic end-of-line conversion".

True.  

The prerequisite to understand these two seemingly contradicting
sentences is to know that "attribute" can be "set" (in the sense of
Boolean "yes") and can be "set to value" (here in this case, to the
string "auto").  They are treated differently.

It is very good that you decided to clarify it.

> It's also confusing that the explanation of how end-of-line conversion
> works is in the paragraph for text=auto even though it applies equally
> to the text attribute which is described earlier.

Good observation.

Instead of half-duplicating the same explanation in two places, it
is very good to do so just once, in the introductory description of
the 'text' attribute before the individual description of what
happens when the attribute is in any of the four states.

> On top of that, "When the file has been committed with CRLF, no
> conversion is done" implies that normalization is only suppressed if the
> file has been committed. In fact, running `git add` on a CRLF file,
> adding the text attribute to the file, and running `git add` again does
> not do anything to the line endings either.

True again.  The conversion happens between the index and the
working tree, and "committing" does not have much say in the
process.  Interestingly, the description of what happens when the
attribute is "Unset" (in the sense of setting it to Boolean "no")
uses a better terminology---"checkin or checkout", which is about
getting things out of or into the index.

It is very good that you decided to clarify it.

> Rephrase the documentation of text and text=auto to be clear about how
> they are the same, how they are different, and in what cases
> normalization is performed.

Thanks.

> Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
> ---
>  Documentation/gitattributes.txt | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 39bfbca1ff..6db4ecd794 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -131,9 +131,12 @@ linkgit:git-config[1]).
>  
>  Set::
>  
> -	Setting the `text` attribute on a path enables end-of-line
> -	normalization and marks the path as a text file.  End-of-line
> -	conversion takes place without guessing the content type.
> +	Setting the `text` attribute on a path marks the path as a text
> +	file, which enables end-of-line normalization: When a matching file
> +	is added to the index, even if it has CRLF line endings in the
> +	working directory, the file is stored in Git with LF line endings.
> +	However, if the file was already in Git with CRLF endings, no
> +	conversion is done.

I think most of the new text should go to the paragraph before this
hunk (i.e. "... its line endings are converted to ..."), where not
just CRLF->LF conversion is described, but how it does and does not
happen depending on the eol, core.eol, and core.autocrlf are set.

>  Unset::
>  
> @@ -142,10 +145,9 @@ Unset::
>  
>  Set to string value "auto"::
>  
> -	When `text` is set to "auto", the path is marked for automatic
> -	end-of-line conversion.  If Git decides that the content is
> -	text, its line endings are converted to LF on checkin.
> -	When the file has been committed with CRLF, no conversion is done.
> +	When text is set to "auto", Git decides by itself whether the file
> +	is text or binary.  If it is text, line endings are converted as
> +	described above.  If it is binary, they are not.

This side is good.  "As described above" can still be used as-is
even after the explanation of eol conversion is made before we start
enumerating Set/Unset/set to auto/unspecified.

Thanks.
Torsten Bögershausen April 21, 2023, 6:25 p.m. UTC | #2
On Thu, Apr 20, 2023 at 11:56:41PM -0600, Alex Henrie wrote:

Thanks for that, fully agree with comments from Junio.

>  Set::
>
> -	Setting the `text` attribute on a path enables end-of-line
> -	normalization and marks the path as a text file.  End-of-line
> -	conversion takes place without guessing the content type.
> +	Setting the `text` attribute on a path marks the path as a text
> +	file, which enables end-of-line normalization: When a matching file
> +	is added to the index, even if it has CRLF line endings in the
> +	working directory, the file is stored in Git with LF line endings.

Hm, I stumbled a little bit over the word "even" here.
But it makes things very clear, so this is good.


> +	However, if the file was already in Git with CRLF endings, no
> +	conversion is done.

But I think that this is wrong, unless I missed something.

Using
*.txt text
in .gitattributes
will convert CRLF into LF in the repo at the next git commit/git add

When you create a file with CRLF (no atttibutes), commit it into the repo,
and later add the attribute "text", the file may be reported as modified.
Not always. A `touch file.txt` typically asks Git to re-investigate things,
and it may show up as modified.
After a `git clone` it may be shown as modified or not.
End of side-note.

Setting




>
>  Unset::
>
> @@ -142,10 +145,9 @@ Unset::
>
>  Set to string value "auto"::
>
> -	When `text` is set to "auto", the path is marked for automatic
> -	end-of-line conversion.  If Git decides that the content is
> -	text, its line endings are converted to LF on checkin.
> -	When the file has been committed with CRLF, no conversion is done.
> +	When text is set to "auto", Git decides by itself whether the file
> +	is text or binary.  If it is text, line endings are converted as
> +	described above.  If it is binary, they are not.

Here we need to have the lines from above:

When text is set to "auto", Git decides by itself whether the file
is text or binary.  If it is text, line endings are converted as
described above.  If it is binary, they are not.
However, if the file was already in Git with CRLF endings, no
conversion is done.

I hope that I didn't messed up while reading this patch ?
Thanks again for bringing this up
Alex Henrie April 26, 2023, 1:18 p.m. UTC | #3
On Fri, Apr 21, 2023 at 12:26 PM Torsten Bögershausen <tboegi@web.de> wrote:

> Using
> *.txt text
> in .gitattributes
> will convert CRLF into LF in the repo at the next git commit/git add
>
> When you create a file with CRLF (no atttibutes), commit it into the repo,
> and later add the attribute "text", the file may be reported as modified.
> Not always. A `touch file.txt` typically asks Git to re-investigate things,
> and it may show up as modified.
> After a `git clone` it may be shown as modified or not.
> End of side-note.

Thank you very much for pointing that out. I had assumed that "text"
and "text=auto" behaved the same for text files, but they do not:
"text" normalizes line endings even if the file is already in Git.
That means that my proposed text is not correct, and I will have to
change it similarly to how you suggested below.

> >  Set to string value "auto"::
> >
> > -     When `text` is set to "auto", the path is marked for automatic
> > -     end-of-line conversion.  If Git decides that the content is
> > -     text, its line endings are converted to LF on checkin.
> > -     When the file has been committed with CRLF, no conversion is done.
> > +     When text is set to "auto", Git decides by itself whether the file
> > +     is text or binary.  If it is text, line endings are converted as
> > +     described above.  If it is binary, they are not.
>
> Here we need to have the lines from above:
>
> When text is set to "auto", Git decides by itself whether the file
> is text or binary.  If it is text, line endings are converted as
> described above.  If it is binary, they are not.
> However, if the file was already in Git with CRLF endings, no
> conversion is done.

Yes, the note about what happens if the file is already in Git with
CRLF endings applies specifically to text=auto, so it needs to be in
the text=auto section. The documentation of the eol attribute actually
explains it more clearly:

"Note that setting this attribute on paths which are in the index with
CRLF line endings may make the paths to be considered dirty unless
text=auto is set. Adding the path to the index again will normalize
the line endings in the index."

I think that explanation needs to be moved from the eol attribute
documentation to the text attribute documentation and edited for
clarity. I will send a v2 that revises the documentation of both
attributes, taking into account your and Junio's feedback.

Thanks again,

-Alex
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 39bfbca1ff..6db4ecd794 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -131,9 +131,12 @@  linkgit:git-config[1]).
 
 Set::
 
-	Setting the `text` attribute on a path enables end-of-line
-	normalization and marks the path as a text file.  End-of-line
-	conversion takes place without guessing the content type.
+	Setting the `text` attribute on a path marks the path as a text
+	file, which enables end-of-line normalization: When a matching file
+	is added to the index, even if it has CRLF line endings in the
+	working directory, the file is stored in Git with LF line endings.
+	However, if the file was already in Git with CRLF endings, no
+	conversion is done.
 
 Unset::
 
@@ -142,10 +145,9 @@  Unset::
 
 Set to string value "auto"::
 
-	When `text` is set to "auto", the path is marked for automatic
-	end-of-line conversion.  If Git decides that the content is
-	text, its line endings are converted to LF on checkin.
-	When the file has been committed with CRLF, no conversion is done.
+	When text is set to "auto", Git decides by itself whether the file
+	is text or binary.  If it is text, line endings are converted as
+	described above.  If it is binary, they are not.
 
 Unspecified::