
[06/53] docs: admin-guide: avoid using UTF-8 chars

Message ID 4b372b47487992fa0b4036b4bfbb6c879f497786.1620641727.git.mchehab+huawei@kernel.org
State New
Series Get rid of UTF-8 chars that can be mapped as ASCII

Commit Message

Mauro Carvalho Chehab May 10, 2021, 10:26 a.m. UTC
While UTF-8 characters can be used in the Linux documentation,
it is best to use them only when ASCII doesn't offer a good replacement.
So, replace the occurrences of the following UTF-8 characters:

	- U+00a0 (' '): NO-BREAK SPACE
	- U+2013 ('–'): EN DASH
	- U+2014 ('—'): EM DASH
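The substitution listed above amounts to a plain character translation. A minimal Python sketch, assuming the same one-to-one mapping the hunks below apply (both dashes become a single HYPHEN-MINUS, NBSP becomes SPACE); `asciify` is a hypothetical helper, not a script shipped in the kernel tree:

```python
# Hypothetical helper: applies the same one-to-one mapping as the diff
# below. Every other character, ASCII or not, is left unchanged.
REPLACEMENTS = str.maketrans({
    "\u00a0": " ",  # NO-BREAK SPACE -> SPACE
    "\u2013": "-",  # EN DASH        -> HYPHEN-MINUS
    "\u2014": "-",  # EM DASH        -> HYPHEN-MINUS
})


def asciify(text: str) -> str:
    """Return text with NBSP, EN DASH and EM DASH replaced by ASCII."""
    return text.translate(REPLACEMENTS)


# Fragments taken from the ras.rst and index.rst hunks of this patch.
print(asciify("* CPU \u2013 detect errors at instruction execution"))
# prints "* CPU - detect errors at instruction execution"
print(asciify("organization here \u2014 this material"))
# prints "organization here - this material"
```

Because `str.translate` only touches the three listed code points, accented letters or a MICRO SIGN in the same text pass through unchanged.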

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/admin-guide/index.rst           |  2 +-
 Documentation/admin-guide/module-signing.rst  |  4 +-
 Documentation/admin-guide/ras.rst             | 94 +++++++++----------
 .../admin-guide/reporting-issues.rst          | 12 +--
 4 files changed, 56 insertions(+), 56 deletions(-)

Comments

Gabriel Krisman Bertazi May 10, 2021, 6:40 p.m. UTC | #1
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> While UTF-8 characters can be used in the Linux documentation,
> it is best to use them only when ASCII doesn't offer a good replacement.
> So, replace the occurrences of the following UTF-8 characters:
>
> 	- U+00a0 (' '): NO-BREAK SPACE
> 	- U+2013 ('–'): EN DASH
> 	- U+2014 ('—'): EM DASH
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  Documentation/admin-guide/index.rst           |  2 +-
>  Documentation/admin-guide/module-signing.rst  |  4 +-
>  Documentation/admin-guide/ras.rst             | 94 +++++++++----------
>  .../admin-guide/reporting-issues.rst          | 12 +--
>  4 files changed, 56 insertions(+), 56 deletions(-)

Hi Mauro,

This patch misses one occurrence of U+2014 in
Documentation/admin-guide/sysctl/kernel.rst:1288.

There are also countless occurrences in Documentation/, outside of
Documentation/admin-guide.  I suppose another patch in the series, which
I didn't receive, will fix them?

These characters will just reappear elsewhere, eventually. I'm not sure
what the gain is here, other than minor consistency improvements. But we
should add a Warning during documentation generation (if there isn't one
already), to prevent them from spreading again.

Mauro Carvalho Chehab May 12, 2021, 8:44 a.m. UTC | #2
On Mon, 10 May 2021 14:40:09 -0400
Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> > 	- U+00a0 (' '): NO-BREAK SPACE
> > 	- U+2013 ('–'): EN DASH
> > 	- U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  Documentation/admin-guide/index.rst           |  2 +-
> >  Documentation/admin-guide/module-signing.rst  |  4 +-
> >  Documentation/admin-guide/ras.rst             | 94 +++++++++----------
> >  .../admin-guide/reporting-issues.rst          | 12 +--
> >  4 files changed, 56 insertions(+), 56 deletions(-)  
> 
> Hi Mauro,
> 
> This patch misses one occurrence of U+2014 in
> Documentation/admin-guide/sysctl/kernel.rst:1288.

It ended up in a separate patch.

> There are also countless occurrences in Documentation/, outside of
> Documentation/admin-guide.  I suppose another patch in the series, which
> I didn't receive, will fix them?

Yes. This series should fix all occurrences inside Documentation/, in
*.rst files and in the ABI files, except for Documentation/translations[1].

[1] Still, it probably makes sense to apply a subset of the changes
from this series there, but touching non-Latin translations is riskier.

> These characters will just reappear elsewhere, eventually. I'm not sure
> what the gain is here, other than minor consistency improvements.

The main point here is that a large number of those UTF-8 characters
appeared as a result of document conversion from DocBook/LaTeX/Markdown.

As the conversion is over, I don't expect the need to redo a series
like that in the near future.

There are even some cases where the UTF-8 characters were doing wrong
things, like using an EN DASH instead of a hyphen to pass a command
line parameter, and the addition of non-printable BOM characters.

So, IMO, this is a necessary cleanup after the conversion.
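Both failure modes mentioned above can be reproduced in a few lines of Python. This is an illustration under assumptions: the `-v`/`--verbose` option and the sample text are made up for the demo and do not come from any kernel document.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-v", "--verbose", action="store_true")

# "-v" copied from a document where the hyphen became an EN DASH: the string
# no longer starts with the "-" prefix character, so argparse does not treat
# it as an option; the flag stays unset and the string is left over.
args, leftover = parser.parse_known_args(["\u2013v"])

# The same option typed with an ASCII HYPHEN-MINUS is recognized.
ok, _ = parser.parse_known_args(["-v"])

# A UTF-8 BOM at the start of a file survives a plain "utf-8" decode as an
# invisible U+FEFF character; the "utf-8-sig" codec strips it.
raw = b"\xef\xbb\xbfReliability, Availability and Serviceability"
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")

print(args.verbose, leftover, ok.verbose)
print(repr(with_bom[0]), without_bom[:11])
```

Neither problem is visible when the rendered html/pdf is read, which is why such characters can go unnoticed after a format conversion.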

> But we
> should add a Warning during documentation generation (if there isn't one
> already), to prevent them from spreading again.

Not sure it is worth it... See: people can (and should) use UTF-8
characters when needed, for instance Latin accented characters in
names and translations, and Greek letters when pertinent, like
MICRO SIGN or GREEK SMALL LETTER MU to represent microseconds.

On the other hand, using curly quotes instead of ASCII ones, and
EN/EM DASH instead of -- and ---, only makes it harder for people to
type documents with normal editors, without any gain, as Sphinx already
converts those into curly quotes and EN/EM DASH when it generates
html/pdf docs.
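The build-time conversion referred to here is Sphinx's `smartquotes` option, enabled by default (a project's conf.py can turn it off), which runs the Docutils SmartQuotes transform; its default `smartquotes_action` of `"qDe"` educates quotes, dashes and ellipses, with `D` turning `--` into EN DASH and `---` into EM DASH. The function below is a hypothetical, much simplified Python approximation of just that dash rule, not the Docutils code:

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def educate_dashes_oldschool(text: str) -> str:
    """Rough approximation of the SmartQuotes 'D' dash rule.

    '---' becomes EM DASH and '--' becomes EN DASH. The longer run is
    replaced first; otherwise '---' would turn into EN DASH plus a stray '-'.
    Unlike the real transform, this ignores literal blocks and inline code.
    """
    return text.replace("---", EM_DASH).replace("--", EN_DASH)
```

The ordering comment is the main design point: replacing `--` first would break every `---` in the source.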


Thanks,
Mauro

David Woodhouse May 12, 2021, 9:25 a.m. UTC | #3
On Wed, 2021-05-12 at 10:44 +0200, Mauro Carvalho Chehab wrote:
> The main point here is that a large number of those UTF-8 characters
> appeared as a result of document conversion from DocBook/LaTeX/Markdown.
> 
> As the conversion is over, I don't expect the need to redo a series
> like that in the near future.
> 
> There are even some cases where the UTF-8 characters were doing wrong
> things, like using an EN DASH instead of a hyphen to pass a command
> line parameter, and the addition of non-printable BOM characters.
> 
> So, IMO, this is a necessary cleanup after the conversion.

That part — fixing characters that are *wrong*, such as converting a
UTF-8 U+2014 EM DASH to a UTF-8 U+002D HYPHEN-MINUS, is reasonable
enough.

But you're not "avoiding using UTF-8 chars" there, as it says in the
title of this patch. HYPHEN-MINUS encoded as 0x2D *is* UTF-8.

I think you meant "avoid using non-ASCII chars", and even *that* is an
entirely bogus reason for doing anything at all, as discussed.

Limit yourself to fixing characters which are actually wrong, and it's
fine. One level of pointless trivia below spelling errors, mind you,
but at least not actively wrong.

Mauro Carvalho Chehab May 12, 2021, 10:22 a.m. UTC | #4
On Wed, 12 May 2021 10:25:35 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Wed, 2021-05-12 at 10:44 +0200, Mauro Carvalho Chehab wrote:
> > The main point here is that a large number of those UTF-8 characters
> > appeared as a result of document conversion from DocBook/LaTeX/Markdown.
> > 
> > As the conversion is over, I don't expect the need to redo a series
> > like that in the near future.
> > 
> > There are even some cases where the UTF-8 characters were doing wrong
> > things, like using an EN DASH instead of a hyphen to pass a command
> > line parameter, and the addition of non-printable BOM characters.
> > 
> > So, IMO, this is a necessary cleanup after the conversion.
> 
> That part — fixing characters that are *wrong*, such as converting a
> UTF-8 U+2014 EM DASH to a UTF-8 U+002D HYPHEN-MINUS, is reasonable
> enough.
> 
> But you're not "avoiding using UTF-8 chars" there, as it says in the
> title of this patch. HYPHEN-MINUS encoded as 0x2D *is* UTF-8.

Yeah, you're right, as ASCII is a subset of UTF-8, as well as of
other charsets[1].

[1] ASCII is a subset of all charsets mentioned at:
       https://man7.org/linux/man-pages/man7/charsets.7.html
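Both halves of this exchange can be checked directly in Python: ASCII code points encode to the same single byte in UTF-8 and in other ASCII-compatible charsets, while the replaced look-alikes need two or three bytes in UTF-8. The codec list is an illustrative sample of Python codec names, not the full list from charsets(7).

```python
# A sample of Python codec names for ASCII-compatible charsets.
CODECS = ["ascii", "utf-8", "latin-1", "iso8859-15", "cp1252", "koi8-r"]


def ascii_bytes_identical(codec: str) -> bool:
    """True if every ASCII code point encodes to the same single byte."""
    return all(chr(b).encode(codec) == bytes([b]) for b in range(128))


for codec in CODECS:
    print(codec, ascii_bytes_identical(codec))

# Non-ASCII look-alikes versus HYPHEN-MINUS, encoded as UTF-8.
for ch in ("\u00a0", "\u2013", "\u2014", "-"):
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex())
```

The first loop prints `True` for every codec listed; the second prints `c2a0`, `e28093`, `e28094` and `2d`, showing that HYPHEN-MINUS is the same byte 0x2D whether the file is read as ASCII or as UTF-8.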

A more precise title would be something like:

	Use ASCII instead of non-ASCII UTF-8 alternate symbols
or
	Use ASCII subset instead of UTF-8 alternate symbols

See, the goal of this series is to address the cases where multiple
UTF-8 alternate symbols share the meaning of a character in the
original ASCII set. Most of them were introduced by tools like
DocBook/LaTeX/pandoc during document conversions[2], not by design,
but just because the UTF-8 non-ASCII symbols produce nicer output
in html or pdf. In other words, it was a toolset decision to change
them, diverging from what the author originally typed.

[2] I suspect that a few of them could have been introduced as a result
    of someone using an editor like LibreOffice (or equivalent), which
    behaves similarly.

With ReST, there's no need to use any of those, as the build tools
already do such conversion when generating html/pdf output.

So, it is better to stick with the ASCII subset in such cases, as it
makes tools like grep more useful and makes it easier to edit such
files in editors like vi, nano, emacs, etc.
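The grep argument is easy to demonstrate with Python's `re` module, which behaves like grep for this purpose. The sample line is the reporting-issues.rst text as it read before this patch, with an EN DASH between the clauses; the widened character class is one possible workaround, not something the series proposes.

```python
import re

# A line from reporting-issues.rst as it read before this patch.
before = ("scattered around the globe and thus might be in a different "
          "time zone \u2013 one")

# Searching with what most people would type finds nothing...
typed = re.search(r"time zone - one", before)

# ...unless the pattern is widened to accept the look-alike dashes too.
tolerant = re.search(r"time zone [-\u2013\u2014] one", before)

print(typed, tolerant is not None)
```

With plain ASCII in the source, the first, naive pattern is all anyone needs.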

Thanks,
Mauro

Patch

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index dc00afcabb95..b1692643718d 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -3,7 +3,7 @@  The Linux kernel user's and administrator's guide
 
 The following is a collection of user-oriented documents that have been
 added to the kernel over time.  There is, as yet, little overall order or
-organization here — this material was not written to be a single, coherent
+organization here - this material was not written to be a single, coherent
 document!  With luck things will improve quickly over time.
 
 This initial section contains overall information, including the README
diff --git a/Documentation/admin-guide/module-signing.rst b/Documentation/admin-guide/module-signing.rst
index 7d7c7c8a545c..bd1d2fef78e8 100644
--- a/Documentation/admin-guide/module-signing.rst
+++ b/Documentation/admin-guide/module-signing.rst
@@ -100,8 +100,8 @@  This has a number of options available:
      ``certs/signing_key.pem`` will disable the autogeneration of signing keys
      and allow the kernel modules to be signed with a key of your choosing.
      The string provided should identify a file containing both a private key
-     and its corresponding X.509 certificate in PEM form, or — on systems where
-     the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI as defined by
+     and its corresponding X.509 certificate in PEM form, or - on systems where
+     the OpenSSL ENGINE_pkcs11 is functional - a PKCS#11 URI as defined by
      RFC7512. In the latter case, the PKCS#11 URI should reference both a
      certificate and a private key.
 
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 7b481b2a368e..00445adf8708 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -40,10 +40,10 @@  it causes data loss or system downtime.
 
 Among the monitoring measures, the most usual ones include:
 
-* CPU – detect errors at instruction execution and at L1/L2/L3 caches;
-* Memory – add error correction logic (ECC) to detect and correct errors;
-* I/O – add CRC checksums for transferred data;
-* Storage – RAID, journal file systems, checksums,
+* CPU - detect errors at instruction execution and at L1/L2/L3 caches;
+* Memory - add error correction logic (ECC) to detect and correct errors;
+* I/O - add CRC checksums for transferred data;
+* Storage - RAID, journal file systems, checksums,
   Self-Monitoring, Analysis and Reporting Technology (SMART).
 
 By monitoring the number of occurrences of error detections, it is possible
@@ -443,49 +443,49 @@  A typical EDAC system has the following structure under
 
 	/sys/devices/system/edac/
 	├── mc
-	│   ├── mc0
-	│   │   ├── ce_count
-	│   │   ├── ce_noinfo_count
-	│   │   ├── dimm0
-	│   │   │   ├── dimm_ce_count
-	│   │   │   ├── dimm_dev_type
-	│   │   │   ├── dimm_edac_mode
-	│   │   │   ├── dimm_label
-	│   │   │   ├── dimm_location
-	│   │   │   ├── dimm_mem_type
-	│   │   │   ├── dimm_ue_count
-	│   │   │   ├── size
-	│   │   │   └── uevent
-	│   │   ├── max_location
-	│   │   ├── mc_name
-	│   │   ├── reset_counters
-	│   │   ├── seconds_since_reset
-	│   │   ├── size_mb
-	│   │   ├── ue_count
-	│   │   ├── ue_noinfo_count
-	│   │   └── uevent
-	│   ├── mc1
-	│   │   ├── ce_count
-	│   │   ├── ce_noinfo_count
-	│   │   ├── dimm0
-	│   │   │   ├── dimm_ce_count
-	│   │   │   ├── dimm_dev_type
-	│   │   │   ├── dimm_edac_mode
-	│   │   │   ├── dimm_label
-	│   │   │   ├── dimm_location
-	│   │   │   ├── dimm_mem_type
-	│   │   │   ├── dimm_ue_count
-	│   │   │   ├── size
-	│   │   │   └── uevent
-	│   │   ├── max_location
-	│   │   ├── mc_name
-	│   │   ├── reset_counters
-	│   │   ├── seconds_since_reset
-	│   │   ├── size_mb
-	│   │   ├── ue_count
-	│   │   ├── ue_noinfo_count
-	│   │   └── uevent
-	│   └── uevent
+	│   ├── mc0
+	│   │   ├── ce_count
+	│   │   ├── ce_noinfo_count
+	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
+	│   │   │   ├── dimm_dev_type
+	│   │   │   ├── dimm_edac_mode
+	│   │   │   ├── dimm_label
+	│   │   │   ├── dimm_location
+	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
+	│   │   │   ├── size
+	│   │   │   └── uevent
+	│   │   ├── max_location
+	│   │   ├── mc_name
+	│   │   ├── reset_counters
+	│   │   ├── seconds_since_reset
+	│   │   ├── size_mb
+	│   │   ├── ue_count
+	│   │   ├── ue_noinfo_count
+	│   │   └── uevent
+	│   ├── mc1
+	│   │   ├── ce_count
+	│   │   ├── ce_noinfo_count
+	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
+	│   │   │   ├── dimm_dev_type
+	│   │   │   ├── dimm_edac_mode
+	│   │   │   ├── dimm_label
+	│   │   │   ├── dimm_location
+	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
+	│   │   │   ├── size
+	│   │   │   └── uevent
+	│   │   ├── max_location
+	│   │   ├── mc_name
+	│   │   ├── reset_counters
+	│   │   ├── seconds_since_reset
+	│   │   ├── size_mb
+	│   │   ├── ue_count
+	│   │   ├── ue_noinfo_count
+	│   │   └── uevent
+	│   └── uevent
 	└── uevent
 
 In the ``dimmX`` directories are EDAC control and attribute files for
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 18d8e25ba9df..f691930e13c0 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -824,7 +824,7 @@  and look a little lower at the table. At its top you'll see a line starting with
 mainline, which most of the time will point to a pre-release with a version
 number like '5.8-rc2'. If that's the case, you'll want to use this mainline
 kernel for testing, as that where all fixes have to be applied first. Do not let
-that 'rc' scare you, these 'development kernels' are pretty reliable — and you
+that 'rc' scare you, these 'development kernels' are pretty reliable - and you
 made a backup, as you were instructed above, didn't you?
 
 In about two out of every nine to ten weeks, mainline might point you to a
@@ -866,7 +866,7 @@  How to obtain a fresh Linux kernel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 **Using a pre-compiled kernel**: This is often the quickest, easiest, and safest
-way for testing — especially is you are unfamiliar with the Linux kernel. The
+way for testing - especially is you are unfamiliar with the Linux kernel. The
 problem: most of those shipped by distributors or add-on repositories are build
 from modified Linux sources. They are thus not vanilla and therefore often
 unsuitable for testing and issue reporting: the changes might cause the issue
@@ -1248,7 +1248,7 @@  paragraph makes the severeness obvious.
 
 In case you performed a successful bisection, use the title of the change that
 introduced the regression as the second part of your subject. Make the report
-also mention the commit id of the culprit. In case of an unsuccessful bisection,
+also mention the commit id of the culprit. In case of an unsuccessful bisection,
 make your report mention the latest tested version that's working fine (say 5.7)
 and the oldest where the issue occurs (say 5.8-rc1).
 
@@ -1345,7 +1345,7 @@  about it to a chatroom or forum you normally hang out.
 
 **Be patient**: If you are really lucky you might get a reply to your report
 within a few hours. But most of the time it will take longer, as maintainers
-are scattered around the globe and thus might be in a different time zone – one
+are scattered around the globe and thus might be in a different time zone - one
 where they already enjoy their night away from keyboard.
 
 In general, kernel developers will take one to five business days to respond to
@@ -1388,7 +1388,7 @@  Here are your duties in case you got replies to your report:
 
 **Check who you deal with**: Most of the time it will be the maintainer or a
 developer of the particular code area that will respond to your report. But as
-issues are normally reported in public it could be anyone that's replying —
+issues are normally reported in public it could be anyone that's replying -
 including people that want to help, but in the end might guide you totally off
 track with their questions or requests. That rarely happens, but it's one of
 many reasons why it's wise to quickly run an internet search to see who you're
@@ -1716,7 +1716,7 @@  Maybe their test hardware broke, got replaced by something more fancy, or is so
 old that it's something you don't find much outside of computer museums
 anymore. Sometimes developer stops caring for their code and Linux at all, as
 something different in their life became way more important. In some cases
-nobody is willing to take over the job as maintainer – and nobody can be forced
+nobody is willing to take over the job as maintainer - and nobody can be forced
 to, as contributing to the Linux kernel is done on a voluntary basis. Abandoned
 drivers nevertheless remain in the kernel: they are still useful for people and
 removing would be a regression.