Message ID | d3b8ed97a105ea1d7e656c964b7eee378e11ede6.1657385781.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add some Glossary terms, and extra renormalize information. | expand |
"Philip Oakley via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Philip Oakley <philipoakley@iee.email> > > Bug report > https://lore.kernel.org/git/AM0PR02MB56357CC96B702244F3271014E8DC9@AM0PR02MB5635.eurprd02.prod.outlook.com/ > noted that a file containing /r/r/n needed renormalising twice. Did you mean backslash, not forward? > This is by design. Lone CR characters, not paired with an LF, are left > unchanged. Note the lack of idempotentness of the "clean" filter in the > documentation. OK. > Renormalize was introduced at 9472935d81e (add: introduce "--renormalize", > Torsten Bögershausen, 2017-11-16) Does this need to be said "HERE", rather than leaving it to run "git blame" for those who became curious? > Signed-off-by: Philip Oakley <philipoakley@iee.email> > --- > Documentation/git-add.txt | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt > index 11eb70f16c7..c4a5ad11a6b 100644 > --- a/Documentation/git-add.txt > +++ b/Documentation/git-add.txt > @@ -188,7 +188,8 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files. > forcibly add them again to the index. This is useful after > changing `core.autocrlf` configuration or the `text` attribute > in order to correct files added with wrong CRLF/LF line endings. > - This option implies `-u`. > + This option implies `-u`. Lone CR characters are untouched, so > + cleaning not idempotent. A CRCRLF sequence cleans to CRLF. Lack of verb BE somewhere. Do we expect our readers all understand the math-y word? It is not too hard to explain it to math-uninitiated, e.g. This option implies `-u`. Note that running renormalize again on the result of running renormalize may make it even "more normal". A CR-CR-LF sequence would first renormalize to CR-LF (the first CR, a lone CR, is left intact, and CR-LF that follows normalizes to LF). If you run renormalize again, the resulting CR-LF will normalize down to LF.
On Sat, Jul 09, 2022 at 04:56:21PM +0000, Philip Oakley via GitGitGadget wrote: > From: Philip Oakley <philipoakley@iee.email> > > Bug report > https://lore.kernel.org/git/AM0PR02MB56357CC96B702244F3271014E8DC9@AM0PR02MB5635.eurprd02.prod.outlook.com/ > noted that a file containing /r/r/n needed renormalising twice. > > This is by design. Lone CR characters, not paired with an LF, are left > unchanged. This is all fine. > Note the lack of idempotentness of the "clean" filter in the > documentation. The clean filter is idempotent, I would claim, see below. You can run it, and re-run, and re-run, there will no other changes. CRLF in the worktree will become LF in the repo, 'lone CR' stay as they are. In that sense, CRCRLF in the worktree will become CRLF in the repo. You can the renormalize again and again. The "trick" is that the user has to decide what CRCRLF mean and what should happen in the repo: CRCRLF in the worktree becomes one line ending (one LF in the repo) or CRCRLF in the worktree becomes two line endings ( LFLF in the repo) For a) you can use dos2unix twice. Or run `git add --renormalize` followed by `rm git.bdf` `git restore .` The thing is that we used a combination of different commands $ git add --renormalize . $ git commit -m "Renormalize bdf.txt" $ rm git.bdf $ git restore . $ git add --renormalize . $ git commit -m "Renormalize a second time bdf.txt" ... to clean up this very situation. And, if CRCRLF should have become LFLF instead ? Probably a python script is needed to fix this. (or some other script/program in the language of your choice) We could argue that `git add --renormalize` is idempotent, but a series of carefully crafted commands is not. In short, what is missing is the documentation how CRCRLF is handled by Git. > > Renormalize was introduced at 9472935d81e (add: introduce "--renormalize", > Torsten Bögershausen, 2017-11-16) > > Signed-off-by: Philip Oakley <philipoakley@iee.email> > --- > Documentation/git-add.txt | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt > index 11eb70f16c7..c4a5ad11a6b 100644 > --- a/Documentation/git-add.txt > +++ b/Documentation/git-add.txt > @@ -188,7 +188,8 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files. > forcibly add them again to the index. This is useful after > changing `core.autocrlf` configuration or the `text` attribute > in order to correct files added with wrong CRLF/LF line endings. > - This option implies `-u`. > + This option implies `-u`. Lone CR characters are untouched, so > + cleaning not idempotent. A CRCRLF sequence cleans to CRLF. How about this: This option implies `-u`. Lone CR characters are untouched. CRCRLF cleans to CRLF.
On 09/07/2022 22:06, Junio C Hamano wrote: > "Philip Oakley via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Philip Oakley <philipoakley@iee.email> >> >> Bug report >> https://lore.kernel.org/git/AM0PR02MB56357CC96B702244F3271014E8DC9@AM0PR02MB5635.eurprd02.prod.outlook.com/ >> noted that a file containing /r/r/n needed renormalising twice. > Did you mean backslash, not forward? Correct. Too many years of Windows. > >> This is by design. Lone CR characters, not paired with an LF, are left >> unchanged. Note the lack of idempotentness of the "clean" filter in the >> documentation. > OK. > > >> Renormalize was introduced at 9472935d81e (add: introduce "--renormalize", >> Torsten Bögershausen, 2017-11-16) > Does this need to be said "HERE", rather than leaving it to run "git > blame" for those who became curious? It was a misguided reminder to cc Torsten about his recollection of the CRCRLF issue. I'll remove it. I see Torsten has also commented. > >> Signed-off-by: Philip Oakley <philipoakley@iee.email> >> --- >> Documentation/git-add.txt | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt >> index 11eb70f16c7..c4a5ad11a6b 100644 >> --- a/Documentation/git-add.txt >> +++ b/Documentation/git-add.txt >> @@ -188,7 +188,8 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files. >> forcibly add them again to the index. This is useful after >> changing `core.autocrlf` configuration or the `text` attribute >> in order to correct files added with wrong CRLF/LF line endings. >> - This option implies `-u`. >> + This option implies `-u`. Lone CR characters are untouched, so >> + cleaning *^* not idempotent. A CRCRLF sequence cleans to CRLF. > Lack of verb BE somewhere. '^' It took me three re-reads to see my mistyping as my head knew what I'd meant to write, I've marked above as a note to self. Aside: Are there any guides / suggestions / how-to's for on-line reviewing that you can recommend o > Do we expect our readers all understand the math-y word? Ok. It's mainly used in the test directory, and fsmonitor.h, but not in the user docs. > It is not > too hard to explain it to math-uninitiated, e.g. > > This option implies `-u`. Note that running renormalize again > on the result of running renormalize may make it even "more > normal". A CR-CR-LF sequence would first renormalize to CR-LF > (the first CR, a lone CR, is left intact, and CR-LF that follows > normalizes to LF). If you run renormalize again, the resulting > CR-LF will normalize down to LF. > Torsten had a shorter suggestion I'll also look at. Philip
Philip Oakley <philipoakley@iee.email> writes: >>> + This option implies `-u`. Lone CR characters are untouched, so >>> + cleaning *^* not idempotent. A CRCRLF sequence cleans to CRLF. >> Lack of verb BE somewhere. > '^' It took me three re-reads to see my mistyping as my head knew what > I'd meant to write, I've marked above as a note to self. > Aside: Are there any guides / suggestions / how-to's for on-line > reviewing that you can recommend o Sorry, but I do not know of any good "trick" to fight against our common tendency to easily miss trivial typoes and thinkos in what we ourselves wrote. We can be surprisingly blind to what a colleague can spot immediately, and that is why it helps to have a thorough read-through by a reviewer with fresh eyes. When I was a more prolific contributor, I sometimes tried to read aloud what I wrote to myself, both docs and code, and caught silly mistakes before sending them out to the list, but I do not recommend it to others.
Hi Tortsen, Thanks for the reply and comments. On 10/07/2022 08:48, Torsten Bögershausen wrote: > On Sat, Jul 09, 2022 at 04:56:21PM +0000, Philip Oakley via GitGitGadget wrote: >> From: Philip Oakley <philipoakley@iee.email> >> >> Bug report >> https://lore.kernel.org/git/AM0PR02MB56357CC96B702244F3271014E8DC9@AM0PR02MB5635.eurprd02.prod.outlook.com/ >> noted that a file containing /r/r/n needed renormalising twice. >> >> This is by design. Lone CR characters, not paired with an LF, are left >> unchanged. > This is all fine. > >> Note the lack of idempotentness of the "clean" filter in the >> documentation. > The clean filter is idempotent, I would claim, see below. I'd disagree, on the basis that any second 'idempotent' cleaning should not change the file content at all. The need for a second clean was the surprise the user had. > You can run it, and re-run, and re-run, there will no other changes. > CRLF in the worktree will become LF in the repo, > 'lone CR' stay as they are. > In that sense, CRCRLF in the worktree will become CRLF in the repo. So the output isn't normalised, and warning messages ensue (if enabled, etc) > You can the renormalize again and again. > > The "trick" is that the user has to decide what CRCRLF mean and what > should happen in the repo .. for which they should be forewarned of the issue. In this case it was a large repository transfer of legacy data, so little knowledge of how the double CRs occured, but it was a real issue for them. > : > CRCRLF in the worktree becomes one line ending (one LF in the repo) > or > CRCRLF in the worktree becomes two line endings ( LFLF in the repo) > > For a) you can use dos2unix twice. > Or run `git add --renormalize` followed by > `rm git.bdf` > `git restore .` > > The thing is that we used a combination of different commands > $ git add --renormalize . > $ git commit -m "Renormalize bdf.txt" > $ rm git.bdf > $ git restore . > $ git add --renormalize . > $ git commit -m "Renormalize a second time bdf.txt" > > ... to clean up this very situation. > > And, if CRCRLF should have become LFLF instead ? > Probably a python script is needed to fix this. > (or some other script/program in the language of your choice) > > We could argue that > `git add --renormalize` is idempotent, but a series of carefully crafted > commands is not. > In short, what is missing is the documentation how CRCRLF is handled by > Git. *nod* > >> Renormalize was introduced at 9472935d81e (add: introduce "--renormalize", >> Torsten Bögershausen, 2017-11-16) >> >> Signed-off-by: Philip Oakley <philipoakley@iee.email> >> --- >> Documentation/git-add.txt | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt >> index 11eb70f16c7..c4a5ad11a6b 100644 >> --- a/Documentation/git-add.txt >> +++ b/Documentation/git-add.txt >> @@ -188,7 +188,8 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files. >> forcibly add them again to the index. This is useful after >> changing `core.autocrlf` configuration or the `text` attribute >> in order to correct files added with wrong CRLF/LF line endings. >> - This option implies `-u`. >> + This option implies `-u`. Lone CR characters are untouched, so >> + cleaning not idempotent. A CRCRLF sequence cleans to CRLF. > How about this: > > This option implies `-u`. Lone CR characters are untouched. CRCRLF cleans to CRLF. That is probably sufficient. It drops the awkward 'idempotent'. And indicates this edge case, though doesn't highlight that the resultant CRLF still leaves the file only partially renormalised. I'll reword. >
On 10/07/2022 23:04, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >>>> + This option implies `-u`. Lone CR characters are untouched, so >>>> + cleaning *^* not idempotent. A CRCRLF sequence cleans to CRLF. >>> Lack of verb BE somewhere. >> '^' It took me three re-reads to see my mistyping as my head knew what >> I'd meant to write, I've marked above as a note to self. >> Aside: Are there any guides / suggestions / how-to's for on-line >> reviewing that you can recommend o > Sorry, but I do not know of any good "trick" to fight against our > common tendency to easily miss trivial typoes and thinkos in what we > ourselves wrote. We can be surprisingly blind to what a colleague > can spot immediately, and that is why it helps to have a thorough > read-through by a reviewer with fresh eyes. When I was a more > prolific contributor, I sometimes tried to read aloud what I wrote > to myself, both docs and code, and caught silly mistakes before > sending them out to the list, but I do not recommend it to others. Thanks. There does appear to be a lack of literature or articles in this area of on-list reviewing I've not even seen an list of snippets collated from email advice. Other than the email etiquette's starter for ten on don't top post ;-) -- Philip
Philip Oakley <philipoakley@iee.email> writes: >> How about this: >> >> This option implies `-u`. Lone CR characters are untouched. CRCRLF cleans to CRLF. > That is probably sufficient. It drops the awkward 'idempotent'. And > indicates this edge case, though doesn't highlight that the resultant > CRLF still leaves the file only partially renormalised. > > I'll reword. It's been a few weeks since the last activity on this topic. Anything you guys need unblocked to move forward? Thanks.
On Fri, Aug 05, 2022 at 03:26:10PM -0700, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > > >> How about this: > >> > >> This option implies `-u`. Lone CR characters are untouched. CRCRLF cleans to CRLF. > > That is probably sufficient. It drops the awkward 'idempotent'. And > > indicates this edge case, though doesn't highlight that the resultant > > CRLF still leaves the file only partially renormalised. > > > > I'll reword. > > It's been a few weeks since the last activity on this topic. > Anything you guys need unblocked to move forward? > > Thanks. > Not from my point of view. My understanding is, that the short version is OK for everybody: This option implies `-u`. Lone CR characters are untouched. CRCRLF cleans to CRLF. Is it OK to ask you for a local ammend to push this further ?
I've unfortunately had some family issue which prevented me doing any work. If I haven't managed anything by the end of the week. I'd be happy if others took it forward. On 05/08/2022 23:26, Junio C Hamano wrote: > Philip Oakley <philipoakley@iee.email> writes: > >>> How about this: >>> >>> This option implies `-u`. Lone CR characters are untouched. CRCRLF cleans to CRLF. >> That is probably sufficient. It drops the awkward 'idempotent'. And >> indicates this edge case, though doesn't highlight that the resultant >> CRLF still leaves the file only partially renormalised. >> >> I'll reword. > It's been a few weeks since the last activity on this topic. > Anything you guys need unblocked to move forward? > > Thanks. >
Philip Oakley <philipoakley@iee.email> writes: > I've unfortunately had some family issue which prevented me doing any work. I hope everything will be well on your side. > If I haven't managed anything by the end of the week. I'd be happy if > others took it forward. Thanks for letting us know.
On Mon, Aug 08, 2022 at 03:32:35PM +0100, Philip Oakley wrote: > I've unfortunately had some family issue which prevented me doing any work. That is sad to here. I hope that things are getting better in one way or another. > > If I haven't managed anything by the end of the week. I'd be happy if > others took it forward. I can certainly have a look, after the weekend, and continue your work.
diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt index 11eb70f16c7..c4a5ad11a6b 100644 --- a/Documentation/git-add.txt +++ b/Documentation/git-add.txt @@ -188,7 +188,8 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files. forcibly add them again to the index. This is useful after changing `core.autocrlf` configuration or the `text` attribute in order to correct files added with wrong CRLF/LF line endings. - This option implies `-u`. + This option implies `-u`. Lone CR characters are untouched, so + cleaning not idempotent. A CRCRLF sequence cleans to CRLF. --chmod=(+|-)x:: Override the executable bit of the added files. The executable