diff mbox series

[v2] docs: rewrite the documentation of the text and eol attributes

Message ID 20230428042221.871095-1-alexhenrie24@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] docs: rewrite the documentation of the text and eol attributes | expand

Commit Message

Alex Henrie April 28, 2023, 4:22 a.m. UTC
These two sentences are confusing because the description of the text
attribute sounds exactly the same as the description of the text=auto
attribute:

"Setting the text attribute on a path enables end-of-line normalization"

"When text is set to "auto", the path is marked for automatic
end-of-line conversion"

Unless the reader is already familiar with the two variants, there's a
high probability that they will think that "end-of-line normalization"
is the same thing as "automatic end-of-line conversion".

It's also not clear that the phrase "When the file has been committed
with CRLF, no conversion is done" in the paragraph for text=auto does
not apply equally to the bare text attribute which is described earlier.
Moreover, it falsely implies that normalization is only suppressed if
the file has been committed. In fact, running `git add` on a CRLF file,
adding the text=auto attribute to the file, and running `git add` again
does not do anything to the line endings either.

On top of that, in several places the documentation for the eol
attribute sounds like it can force normalization on checkin and checkout
all by itself, but eol doesn't control normalization on checkin and
doesn't control normalization on checkout either unless accompanied by
the text attribute.

Rephrase the documentation of text, text=auto, eol, eol=crlf, and eol=lf
to be clear about how they are the same, how they are different, and in
what cases normalization is performed.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
---
v2: rewrite completely and rewrite the eol documentation too
---
 Documentation/gitattributes.txt | 58 +++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 25 deletions(-)

Comments

Junio C Hamano April 28, 2023, 6:05 p.m. UTC | #1
Alex Henrie <alexhenrie24@gmail.com> writes:

> These two sentences are confusing because the description of the text
> attribute sounds exactly the same as the description of the text=auto
> attribute:
>
> "Setting the text attribute on a path enables end-of-line normalization"
>
> "When text is set to "auto", the path is marked for automatic
> end-of-line conversion"
>
> Unless the reader is already familiar with the two variants, there's a
> high probability that they will think that "end-of-line normalization"
> is the same thing as "automatic end-of-line conversion".
>
> It's also not clear that the phrase "When the file has been committed
> with CRLF, no conversion is done" in the paragraph for text=auto does
> not apply equally to the bare text attribute which is described earlier.
> Moreover, it falsely implies that normalization is only suppressed if
> the file has been committed. In fact, running `git add` on a CRLF file,
> adding the text=auto attribute to the file, and running `git add` again
> does not do anything to the line endings either.
>
> On top of that, in several places the documentation for the eol
> attribute sounds like it can force normalization on checkin and checkout
> all by itself, but eol doesn't control normalization on checkin and
> doesn't control normalization on checkout either unless accompanied by
> the text attribute.
>
> Rephrase the documentation of text, text=auto, eol, eol=crlf, and eol=lf
> to be clear about how they are the same, how they are different, and in
> what cases normalization is performed.

All good observations, appropriate motivation of the change.

> Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
> ---
> v2: rewrite completely and rewrite the eol documentation too
> ---

Thanks.  I wonder helped-by tboegi@ might be in order, but hopefully
it may come during the review of this round.

> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 39bfbca1ff..bcea84f439 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -120,20 +120,22 @@ repository upon 'git add' and 'git commit'.
>  `text`
>  ^^^^^^
>  
> -This attribute enables and controls end-of-line normalization.  When a
> -text file is normalized, its line endings are converted to LF in the
> -repository.  To control what line ending style is used in the working
> -directory, use the `eol` attribute for a single file and the
> -`core.eol` configuration variable for all text files.
> -Note that setting `core.autocrlf` to `true` or `input` overrides
> -`core.eol` (see the definitions of those options in
> -linkgit:git-config[1]).
> +This attribute marks the path as a text file, which enables end-of-line
> +normalization on checkin and possibly also checkout: When a matching
> +file is added to the index, even if it has CRLF line endings in the
> +working directory, the file is stored in the index with LF line endings.
> +Conversely, when the file is copied from the index to the working
> +directory, its line endings may be converted from LF to CRLF depending
> +on the `eol` attribute, the Git config, and the platform (see
> +explanation of `eol` below).

Four points.

1. Overall, the new text is clearly written, and I think it made the
   updated description of "Set" and "Set to string 'auto'" easier to
   understand.

2. "on checkin and possibly also checkout" may be redundant, as the
   section in which the `text` attribute is described is all about
   various conversions upon checking in (i.e. moving contents to the
   index) and checking out (i.e. moving contents out of the index).
   The introductory text for the whole section that does not define
   what "check-in" and "check-out" are and instead loosely says
   "commands such as" may want to be clarified, but that can be done
   as a follow-on clean-up work after this patch lands.   

3. It is unclear why we would want to mention "even if it has CRLF
   line endings in the working directory".  "Regardless of the line
   endings in the working tree files" may slightly be better [*1*],
   but I think the text reads better without it as storing contents
   with possibly different end-of-line convention is the point of
   "end-of-line normalization" in the first place.

   "the file is stored in the index with LF line endings" may better
   be rephrased to "the contents is stored, after converting to use
   LF line endings if necessary, in the index", so that we can use
   the verb "convert" to stress the point of setting this attribute.

4. This problem is shared with the original text, but "This
   attribute marks the path as a text file" might be a bit
   misleading.  If it is left unspecified (i.e. unmentioned, or any
   earlier settings by explicitly overriding them with "!text") or
   worse yet, explicitly unset (i.e. "-text"), it marks the path as
   a non text.

      This attribute is used to mark the path as a text file, which
      needs certain end-of-line normalization upon check-in and
      check-out, or a non-text file, which do not.

  I dunno if the extra verbosity is too much, though.

>  Set::
>  
>  	Setting the `text` attribute on a path enables end-of-line
> -	normalization and marks the path as a text file.  End-of-line
> -	conversion takes place without guessing the content type.
> +	normalization on checkin and checkout as described above.  Line
> +	endings are normalized in the index the next time the file is
> +	checked in, even if the file was previously added to Git with CRLF
> +	line endings.

"the next time" -> "every time", but that is probably moot, because
I would question the need for the whole sentence.

The last sentence, starting with "Line endigns are normalized", may
be redundant and we probably are better off without it.  It would
risk becoming maintenance burden because we have to make sure it
stays in sync with what you wrote in the introductory text.  "as
described above" before that sentence is clear enough.

> @@ -142,10 +144,11 @@ Unset::
>  
>  Set to string value "auto"::
>  
> -	When `text` is set to "auto", the path is marked for automatic
> -	end-of-line conversion.  If Git decides that the content is
> -	text, its line endings are converted to LF on checkin.
> -	When the file has been committed with CRLF, no conversion is done.
> +	When `text` is set to "auto", Git decides by itself whether the file
> +	is text or binary.  If it is text and the file was not already in
> +	Git with CRLF endings, line endings are converted on checkin and
> +	checkout as described above.  Otherwise, no conversion is done on
> +	checkin or checkout.

Nice.

> @@ -162,23 +165,28 @@ unspecified.
>  This attribute sets a specific line-ending style to be used in the
>  working directory.  This attribute has effect only if the `text`
>  attribute is set or unspecified, or if it is set to `auto`, the file is
> -detected as text, and it is stored with LF endings in the index.  Note
> -that setting this attribute on paths which are in the index with CRLF
> -line endings may make the paths to be considered dirty unless
> -`text=auto` is set. Adding the path to the index again will normalize
> -the line endings in the index.
> +detected as text, and it is stored with LF endings in the index.

This level of clarification is very much appreciated.

We may want to use the phrase "checking out" somewhere in
conjunction with "in the working directory" to be consistent with
how the section is titled.  Something like

    This attribute marks a path to use a specific line-ending style
    in the working tree when it is checked out.

>  Set to string value "crlf"::
>  
> +	This setting converts the file's line endings in the working
> +	directory to CRLF when the file is checked out.

Nice.

>  Set to string value "lf"::
>  
> -	This setting forces Git to normalize line endings to LF on
> -	checkin and prevents conversion to CRLF when the file is
> -	checked out.
> +	This setting uses the same line endings in the working directory as
> +	in the index, whether they are LF or CRLF.  However, unless
> +	`text=auto`, adding the file to the index again will normalize its
> +	line endings to LF in the index.

First of all, "uses the same, except if ... then LF is used" is much
harder to understand than "use LF, except if ..." as an explanation
for "lf".

The description may be correct but is 'eol=lf' what controls how the
normalization is done upon checking in?

As you said in the explanation for the `eol` attribute as a whole,
`eol` is effective only when `text` says the path is a text file,
and that (i.e. the fact that `text` controls the end-of-line) makes
the path to use LF line endings, isn't it?  The last sentence
starting with "However, unless" that talks about conversion going
into the index feels out of place.

There are some exception handling in the code for odd cases like the
contents with mixed line endings, a path set to use LF but the file
actually has CRLF, etc.  While they are worth describing, I wonder
if they should be done in a separate paragraph.

> +Unspecified::
> +
> +	If the `eol` attribute is unspecified for a file, its line endings
> +	in the working directory are determined by the `core.autocrlf` or
> +	`core.eol` configuration variable (see the definitions of those
> +	options in linkgit:git-config[1]).  The default if `text` is set but
> +	neither of those variables is is `eol=lf` on Unix and `eol=crlf` on
> +	Windows.

Nice.


[Footnote]

*1* The intent is not about special-casing CRLF but any other; if
    Git were to be extended to support ancient MacOS, we would
    convert even if it uses CR line endings.
Alex Henrie April 29, 2023, 7:19 p.m. UTC | #2
On Fri, Apr 28, 2023 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote:

> Thanks.  I wonder helped-by tboegi@ might be in order, but hopefully
> it may come during the review of this round.

Sure, I'll add a Helped-by line in the next revision.

> 1. Overall, the new text is clearly written, and I think it made the
>    updated description of "Set" and "Set to string 'auto'" easier to
>    understand.

I'm glad you like it!

> 2. "on checkin and possibly also checkout" may be redundant, as the
>    section in which the `text` attribute is described is all about
>    various conversions upon checking in (i.e. moving contents to the
>    index) and checking out (i.e. moving contents out of the index).

For the `text` attribute's behavior to make sense, the first thing a
newbie has to understand is that `text` definitely enables
normalization on checkin and might or might not enable normalization
on checkout. The original text made it sound like `text` only controls
checkin and `eol` controls checkout, a false impression that is
especially misleading because on Linux and OS X, `text` without `eol`
(or core.autocrlf or core.eol) does only enable normalization on
checkin. The new text may be slightly redundant, but it makes the
important part crystal clear.

> 3. It is unclear why we would want to mention "even if it has CRLF
>    line endings in the working directory".  "Regardless of the line
>    endings in the working tree files" may slightly be better [*1*],
>    but I think the text reads better without it as storing contents
>    with possibly different end-of-line convention is the point of
>    "end-of-line normalization" in the first place.

> *1* The intent is not about special-casing CRLF but any other; if
>     Git were to be extended to support ancient MacOS, we would
>     convert even if it uses CR line endings.

I called out CRLF precisely because I was thinking about platforms
that use CR instead of LF or CRLF. I wanted to make it clear that if
you add a file that was written on or for an ancient Mac or Commodore,
the line endings are not normalized. Git can be used (and probably is
being used by someone) to version-control files for those platforms
without actually running on those platforms.

Then again, maybe the fact that the `text` attribute does not
normalize CR line endings is a bug in Git, and we shouldn't advertise
it in the documentation as if it were intended behavior. What do you
think?

>    "the file is stored in the index with LF line endings" may better
>    be rephrased to "the contents is stored, after converting to use
>    LF line endings if necessary, in the index", so that we can use
>    the verb "convert" to stress the point of setting this attribute.

The phrase "if necessary" would be a bit confusing. What makes
conversion on checkin "necessary"? The reader would wonder if it
depends on the Git config and the platform like conversion on checkout
does.

Would you be OK with your proposed wording minus "if necessary"?

> 4. This problem is shared with the original text, but "This
>    attribute marks the path as a text file" might be a bit
>    misleading.  If it is left unspecified (i.e. unmentioned, or any
>    earlier settings by explicitly overriding them with "!text") or
>    worse yet, explicitly unset (i.e. "-text"), it marks the path as
>    a non text.
>
>       This attribute is used to mark the path as a text file, which
>       needs certain end-of-line normalization upon check-in and
>       check-out, or a non-text file, which do not.
>
>   I dunno if the extra verbosity is too much, though.

Hmm, I would find your proposed wording misleading because it seems to
imply that the default is `text=auto` and that `-text` (or a config
variable) is necessary to turn end-of-line normalization off. In my
opinion it's already clear enough that the attribute is unset by
default and that `-text` unsets a previous `text`.

> >  Set::
> >
> >       Setting the `text` attribute on a path enables end-of-line
> > -     normalization and marks the path as a text file.  End-of-line
> > -     conversion takes place without guessing the content type.
> > +     normalization on checkin and checkout as described above.  Line
> > +     endings are normalized in the index the next time the file is
> > +     checked in, even if the file was previously added to Git with CRLF
> > +     line endings.
>
> "the next time" -> "every time", but that is probably moot, because
> I would question the need for the whole sentence.
>
> The last sentence, starting with "Line endigns are normalized", may
> be redundant and we probably are better off without it.  It would
> risk becoming maintenance burden because we have to make sure it
> stays in sync with what you wrote in the introductory text.  "as
> described above" before that sentence is clear enough.

I added that sentence to make the difference between `text` and
`text=auto` more clear. If all we have is the introductory paragraph,
it sounds like Git only converts CRLF to LF on checkin if it had
converted LF to CRLF on checkout. However, if you feel strongly that
the sentence is unhelpful here, I'll remove it.

> We may want to use the phrase "checking out" somewhere in
> conjunction with "in the working directory" to be consistent with
> how the section is titled.  Something like
>
>     This attribute marks a path to use a specific line-ending style
>     in the working tree when it is checked out.

That's fine.

> >  Set to string value "lf"::
> >
> > -     This setting forces Git to normalize line endings to LF on
> > -     checkin and prevents conversion to CRLF when the file is
> > -     checked out.
> > +     This setting uses the same line endings in the working directory as
> > +     in the index, whether they are LF or CRLF.  However, unless
> > +     `text=auto`, adding the file to the index again will normalize its
> > +     line endings to LF in the index.
>
> First of all, "uses the same, except if ... then LF is used" is much
> harder to understand than "use LF, except if ..." as an explanation
> for "lf".
>
> The description may be correct but is 'eol=lf' what controls how the
> normalization is done upon checking in?
>
> As you said in the explanation for the `eol` attribute as a whole,
> `eol` is effective only when `text` says the path is a text file,
> and that (i.e. the fact that `text` controls the end-of-line) makes
> the path to use LF line endings, isn't it?  The last sentence
> starting with "However, unless" that talks about conversion going
> into the index feels out of place.

That's a good point, it would be more clear here to explicitly say
that the setting affects checkout, and not mention what happens on
checkin (which was described previously in the `text` documentation).
Also, if we delete the sentence here then the seemingly redundant
sentence that starts with "Line endings are normalized" wouldn't be as
redundant.

> There are some exception handling in the code for odd cases like the
> contents with mixed line endings, a path set to use LF but the file
> actually has CRLF, etc.  While they are worth describing, I wonder
> if they should be done in a separate paragraph.

Probably best done in a separate patch after this rewrite is done.

Wow, that was a LOT of feedback on a relatively short piece of text.
Are you sure you don't want to rewrite the documentation yourself? ;-)

-Alex
Junio C Hamano April 30, 2023, 5:16 a.m. UTC | #3
Alex Henrie <alexhenrie24@gmail.com> writes:

> The new text may be slightly redundant, but it makes the
> important part crystal clear.

I tend to disagree and think it is a bit too redundant, but we do
not have to agree on everything---after all you are doing all the
work, and it is your motivation and your topic.

> Then again, maybe the fact that the `text` attribute does not
> normalize CR line endings is a bug in Git, and we shouldn't advertise
> it in the documentation as if it were intended behavior. What do you
> think?

Just not mentioning CRLF specifically would be sufficient.  When
(and if) somebody actually comes and complains, we can say "CR
delimited files are not considered text these days, we aren't doing
MacOS System 7" ;-).

> The phrase "if necessary" would be a bit confusing. What makes
> conversion on checkin "necessary"? The reader would wonder if it
> depends on the Git config and the platform like conversion on checkout
> does.
>
> Would you be OK with your proposed wording minus "if necessary"?

I added it only because it would not be necessary if the original
already uses LF line endings, in which you do not have to do any
converison.  As long as it is clear that no conversion happens in
such a case without saying, I am perfectly fine to drop it.

>> There are some exception handling in the code for odd cases like the
>> contents with mixed line endings, a path set to use LF but the file
>> actually has CRLF, etc.  While they are worth describing, I wonder
>> if they should be done in a separate paragraph.
>
> Probably best done in a separate patch after this rewrite is done.

What I meant was that that your "unless text=auto" and everything
after it was one of thoese exception handling that should be dealt
in a separate paragraph.  Leaving it as a follow-on work is fine.

> Wow, that was a LOT of feedback on a relatively short piece of text.
> Are you sure you don't want to rewrite the documentation yourself? ;-)

Absolutely not.  It is not my itch to update this document.

My itch is to make sure any change makes the resulting text better
than the previous one ;-).

Thanks.
Torsten Bögershausen April 30, 2023, 1:19 p.m. UTC | #4
On Thu, Apr 27, 2023 at 10:22:21PM -0600, Alex Henrie wrote:

Thanks for picking this up.
There had been some comments from Junio, I haven't had time to look at
them yet.
Some of my comments inline, lets see how we can converge things.

> These two sentences are confusing because the description of the text
> attribute sounds exactly the same as the description of the text=auto
> attribute:
>
> "Setting the text attribute on a path enables end-of-line normalization"
>
> "When text is set to "auto", the path is marked for automatic
> end-of-line conversion"
>
> Unless the reader is already familiar with the two variants, there's a
> high probability that they will think that "end-of-line normalization"
> is the same thing as "automatic end-of-line conversion".
>
> It's also not clear that the phrase "When the file has been committed
> with CRLF, no conversion is done" in the paragraph for text=auto does
> not apply equally to the bare text attribute which is described earlier.
> Moreover, it falsely implies that normalization is only suppressed if
> the file has been committed. In fact, running `git add` on a CRLF file,
> adding the text=auto attribute to the file, and running `git add` again
> does not do anything to the line endings either.
>
> On top of that, in several places the documentation for the eol
> attribute sounds like it can force normalization on checkin and checkout
> all by itself, but eol doesn't control normalization on checkin and
> doesn't control normalization on checkout either unless accompanied by
> the text attribute.
>
> Rephrase the documentation of text, text=auto, eol, eol=crlf, and eol=lf
> to be clear about how they are the same, how they are different, and in
> what cases normalization is performed.
>
> Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
> ---
> v2: rewrite completely and rewrite the eol documentation too
> ---
>  Documentation/gitattributes.txt | 58 +++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 25 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 39bfbca1ff..bcea84f439 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -120,20 +120,22 @@ repository upon 'git add' and 'git commit'.
>  `text`
>  ^^^^^^
>
> -This attribute enables and controls end-of-line normalization.  When a

Hm, not only, I think. The terminologie is probably not very well specified.
I would say that it "controls end-of-line conversion".
There are 2 type of conversions, from CRLF into LF and LF into CRLF.
The CRLF -> LF conversion happens only at `git commit`
(strictly speaking already at `git add`) and is called normalization.
Because in Git a "normalized" file has LF in the repo (and index).
The term normalize has even been add to 2 commands:

`git add --renormalize .`
inspired by
`git merge -Xrenormalize`

> -text file is normalized, its line endings are converted to LF in the
> -repository.  To control what line ending style is used in the working
> -directory, use the `eol` attribute for a single file and the
> -`core.eol` configuration variable for all text files.

In general, the `eol` attribute can be used by more that a single file.
And that is what we (or at least myself) recommend to do, in a kind
of best practice fashion, as an example:

*.sh text eol=lf

The core.eol is the fallback, when the eol attribute is not specified.
But the we look at core.autocrlf, before looking at core.eol,
as pointed out below.
And if none of them is set, Git uses the platform native setting,
whih is CRLF for Windows and LF for all other systems.


> -Note that setting `core.autocrlf` to `true` or `input` overrides
> -`core.eol` (see the definitions of those options in

Looking with fresh eyes: I am not sure if like this historical construct.
First we say the the "core.eol" sets the line endings (if not defined in
the attribute) and the we say that core.autcrlf overrides core.eol

This is mainly due to historically resons.
I think that things goes like this:
When text or text=auto (and Git identifies the file as text),
and the eol attribute is not set, then:
core.autocrlf=true gives CRLF
core.autocrlf=input give LF
core.autocrlf=false looks at core.eol:
core.eol=clrf gives CRLF
core.eol=lf give LF
core.eol unset gives the platform default

> -linkgit:git-config[1]).
> +This attribute marks the path as a text file, which enables end-of-line
> +normalization on checkin and possibly also checkout: When a matching
As said before.
 ...normalization on checkin and possibly conversion at checkout...
 or
  ... conversion on checkin and possibly also checkout...

> +file is added to the index, even if it has CRLF line endings in the
> +working directory, the file is stored in the index with LF line endings.
> +Conversely, when the file is copied from the index to the working
"copied" is not an ideal word here:
We may specify a filter and/or an encoding as well.
Would "transferred and possibly filtered/encoded" be better ?

> +directory, its line endings may be converted from LF to CRLF depending
> +on the `eol` attribute, the Git config, and the platform (see
> +explanation of `eol` below).
>
>  Set::
>
>  	Setting the `text` attribute on a path enables end-of-line
> -	normalization and marks the path as a text file.  End-of-line
> -	conversion takes place without guessing the content type.
> +	normalization on checkin and checkout as described above.  Line
> +	endings are normalized in the index the next time the file is
> +	checked in, even if the file was previously added to Git with CRLF
> +	line endings.
>
>  Unset::
>
> @@ -142,10 +144,11 @@ Unset::
>
>  Set to string value "auto"::
>
> -	When `text` is set to "auto", the path is marked for automatic
> -	end-of-line conversion.  If Git decides that the content is
> -	text, its line endings are converted to LF on checkin.
> -	When the file has been committed with CRLF, no conversion is done.
> +	When `text` is set to "auto", Git decides by itself whether the file
> +	is text or binary.  If it is text and the file was not already in
> +	Git with CRLF endings, line endings are converted on checkin and
> +	checkout as described above.  Otherwise, no conversion is done on
> +	checkin or checkout.

Side note: We previously talked about files. path is better.
>
>  Unspecified::
>
> @@ -162,23 +165,28 @@ unspecified.
>  This attribute sets a specific line-ending style to be used in the
>  working directory.  This attribute has effect only if the `text`
>  attribute is set or unspecified, or if it is set to `auto`, the file is
> -detected as text, and it is stored with LF endings in the index.  Note
> -that setting this attribute on paths which are in the index with CRLF
> -line endings may make the paths to be considered dirty unless
> -`text=auto` is set.
... Or `git add --renormalize <path> is run.

Adding the path to the index again will normalize
> -the line endings in the index.
> +detected as text, and it is stored with LF endings in the index.
>
>  Set to string value "crlf"::
>
> -	This setting forces Git to normalize line endings for this
> -	file on checkin and convert them to CRLF when the file is
> -	checked out.
> +	This setting converts the file's line endings in the working
> +	directory to CRLF when the file is checked out.
>
>  Set to string value "lf"::
>
> -	This setting forces Git to normalize line endings to LF on
> -	checkin and prevents conversion to CRLF when the file is
> -	checked out.
> +	This setting uses the same line endings in the working directory as
> +	in the index, whether they are LF or CRLF.  However, unless
> +	`text=auto`, adding the file to the index again will normalize its
> +	line endings to LF in the index.
> +
> +Unspecified::
> +
> +	If the `eol` attribute is unspecified for a file, its line endings
> +	in the working directory are determined by the `core.autocrlf` or
> +	`core.eol` configuration variable (see the definitions of those
> +	options in linkgit:git-config[1]).  The default if `text` is set but
> +	neither of those variables is is `eol=lf` on Unix and `eol=crlf` on
> +	Windows.
>
>  Backwards compatibility with `crlf` attribute
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> --
> 2.40.0
>

Thanks again for working on this.
/Torsten
Alex Henrie April 30, 2023, 11:43 p.m. UTC | #5
On Sun, Apr 30, 2023 at 7:19 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Thu, Apr 27, 2023 at 10:22:21PM -0600, Alex Henrie wrote:

> > -This attribute enables and controls end-of-line normalization.  When a
>
> Hm, not only, I think. The terminologie is probably not very well specified.
> I would say that it "controls end-of-line conversion".
> There are 2 type of conversions, from CRLF into LF and LF into CRLF.
> The CRLF -> LF conversion happens only at `git commit`
> (strictly speaking already at `git add`) and is called normalization.
> Because in Git a "normalized" file has LF in the repo (and index).
> The term normalize has even been add to 2 commands:
>
> `git add --renormalize .`
> inspired by
> `git merge -Xrenormalize`

Yes, we should probably be consistent about using the word "normalize"
to refer to conversion to LF and the word "convert" to refer to
conversion to CRLF or conversion in general. That would also allow us
to use the word "normalize" for emphasis without wondering whether to
add "if necessary", because normalization certainly does not imply
conversion if the file is already normalized.

> > -Note that setting `core.autocrlf` to `true` or `input` overrides
> > -`core.eol` (see the definitions of those options in
> > -linkgit:git-config[1]).
>
> Looking with fresh eyes: I am not sure if like this historical construct.
> First we say the the "core.eol" sets the line endings (if not defined in
> the attribute) and the we say that core.autcrlf overrides core.eol
>
> This is mainly due to historically resons.
> I think that things goes like this:
> When text or text=auto (and Git identifies the file as text),
> and the eol attribute is not set, then:
> core.autocrlf=true gives CRLF
> core.autocrlf=input give LF
> core.autocrlf=false looks at core.eol:
> core.eol=clrf gives CRLF
> core.eol=lf give LF
> core.eol unset gives the platform default

Yes, core.autocrlf takes precedence over core.eol. That's explained in
the git-config documentation and doesn't need to be explained again in
the attribute documentation. However, in the documentation for
unspecified `eol`, I did make sure to name `eol`, core.autocrlf, and
core.eol in order of precedence.

> > +file is added to the index, even if it has CRLF line endings in the
> > +working directory, the file is stored in the index with LF line endings.
> > +Conversely, when the file is copied from the index to the working
> "copied" is not an ideal word here:
> We may specify a filter and/or an encoding as well.
> Would "transferred and possibly filtered/encoded" be better ?

Although "transferred and possibly filtered/encoded" might be
technically more accurate, it is a mouthful. For readability, it would
be better to keep the documentation concise. It's clear in this
sentence that "copy" does not mean a byte-for-byte copy (because the
line endings may be changed), which leaves open the possibility that
the file may be transformed in other ways during checkout.

> > +     When `text` is set to "auto", Git decides by itself whether the file
> > +     is text or binary.  If it is text and the file was not already in
> > +     Git with CRLF endings, line endings are converted on checkin and
> > +     checkout as described above.  Otherwise, no conversion is done on
> > +     checkin or checkout.
>
> Side note: We previously talked about files. path is better.

It depends: When talking about applying the attribute to a path, path
is better, but the actual transformation happens on the file contents.
The file is what's text or binary and the file is what's transformed;
the path is always text and never transformed.

Thanks for the feedback. I will send a v3 that takes into account your
feedback and Junio's.

-Alex
Alex Henrie May 1, 2023, 12:40 a.m. UTC | #6
On Sat, Apr 29, 2023 at 11:16 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Alex Henrie <alexhenrie24@gmail.com> writes:
>
> > The new text may be slightly redundant, but it makes the
> > important part crystal clear.
>
> I tend to disagree and think it is a bit too redundant, but we do
> not have to agree on everything---after all you are doing all the
> work, and it is your motivation and your topic.

I just now realized that `eol=crlf` works when `text` is unspecified.
In that light, the phrase "on checkin and possibly also checkout"
seems to go a little too far towards implying that the eol attribute
and core.autocrlf config variable depend on the text attribute being
set. Omitting that phrase would leave the introduction with a better
balance between precision and brevity, and the remainder of the
documentation would still be there to flesh out the details.

> > Then again, maybe the fact that the `text` attribute does not
> > normalize CR line endings is a bug in Git, and we shouldn't advertise
> > it in the documentation as if it were intended behavior. What do you
> > think?
>
> Just not mentioning CRLF specifically would be sufficient.  When
> (and if) somebody actually comes and complains, we can say "CR
> delimited files are not considered text these days, we aren't doing
> MacOS System 7" ;-).

I don't think I would dare say "CR-delimited text is not text" to a
retro computing enthusiast. But I'm fine with not mentioning CRLF in
this sentence, and when the angry mob shows up on my doorstep, I will
just send them your way ;-)

-Alex
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 39bfbca1ff..bcea84f439 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -120,20 +120,22 @@  repository upon 'git add' and 'git commit'.
 `text`
 ^^^^^^
 
-This attribute enables and controls end-of-line normalization.  When a
-text file is normalized, its line endings are converted to LF in the
-repository.  To control what line ending style is used in the working
-directory, use the `eol` attribute for a single file and the
-`core.eol` configuration variable for all text files.
-Note that setting `core.autocrlf` to `true` or `input` overrides
-`core.eol` (see the definitions of those options in
-linkgit:git-config[1]).
+This attribute marks the path as a text file, which enables end-of-line
+normalization on checkin and possibly also checkout: When a matching
+file is added to the index, even if it has CRLF line endings in the
+working directory, the file is stored in the index with LF line endings.
+Conversely, when the file is copied from the index to the working
+directory, its line endings may be converted from LF to CRLF depending
+on the `eol` attribute, the Git config, and the platform (see
+explanation of `eol` below).
 
 Set::
 
 	Setting the `text` attribute on a path enables end-of-line
-	normalization and marks the path as a text file.  End-of-line
-	conversion takes place without guessing the content type.
+	normalization on checkin and checkout as described above.  Line
+	endings are normalized in the index the next time the file is
+	checked in, even if the file was previously added to Git with CRLF
+	line endings.
 
 Unset::
 
@@ -142,10 +144,11 @@  Unset::
 
 Set to string value "auto"::
 
-	When `text` is set to "auto", the path is marked for automatic
-	end-of-line conversion.  If Git decides that the content is
-	text, its line endings are converted to LF on checkin.
-	When the file has been committed with CRLF, no conversion is done.
+	When `text` is set to "auto", Git decides by itself whether the file
+	is text or binary.  If it is text and the file was not already in
+	Git with CRLF endings, line endings are converted on checkin and
+	checkout as described above.  Otherwise, no conversion is done on
+	checkin or checkout.
 
 Unspecified::
 
@@ -162,23 +165,28 @@  unspecified.
 This attribute sets a specific line-ending style to be used in the
 working directory.  This attribute has effect only if the `text`
 attribute is set or unspecified, or if it is set to `auto`, the file is
-detected as text, and it is stored with LF endings in the index.  Note
-that setting this attribute on paths which are in the index with CRLF
-line endings may make the paths to be considered dirty unless
-`text=auto` is set. Adding the path to the index again will normalize
-the line endings in the index.
+detected as text, and it is stored with LF endings in the index.
 
 Set to string value "crlf"::
 
-	This setting forces Git to normalize line endings for this
-	file on checkin and convert them to CRLF when the file is
-	checked out.
+	This setting converts the file's line endings in the working
+	directory to CRLF when the file is checked out.
 
 Set to string value "lf"::
 
-	This setting forces Git to normalize line endings to LF on
-	checkin and prevents conversion to CRLF when the file is
-	checked out.
+	This setting uses the same line endings in the working directory as
+	in the index, whether they are LF or CRLF.  However, unless
+	`text=auto`, adding the file to the index again will normalize its
+	line endings to LF in the index.
+
+Unspecified::
+
+	If the `eol` attribute is unspecified for a file, its line endings
+	in the working directory are determined by the `core.autocrlf` or
+	`core.eol` configuration variable (see the definitions of those
+	options in linkgit:git-config[1]).  The default if `text` is set but
+	neither of those variables is is `eol=lf` on Unix and `eol=crlf` on
+	Windows.
 
 Backwards compatibility with `crlf` attribute
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^