Message ID | 1441147759.9215.44.camel@decadent.org.uk (mailing list archive) |
---|---|
State | New, archived |
On Tue, 01 Sep 2015 23:49:19 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:

> Currently the encoding of documents generated by DocBook depends on
> the current locale. Make the output reproducible independently of
> the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
> preference, or ASCII (LC_CTYPE=C) as a fallback.

I guess I have to ask, though: doesn't it seem that having the docs
produced according to the current locale is the Right Thing to do? Users
have their locale set as it is for a reason, it seems like the production
of textual documents should respect their choice.

Am I missing something here?

Thanks,

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri 2015-09-11 15:30:59 -0400, Jonathan Corbet wrote:
> On Tue, 01 Sep 2015 23:49:19 +0100
> Ben Hutchings <ben@decadent.org.uk> wrote:
>
>> Currently the encoding of documents generated by DocBook depends on
>> the current locale. Make the output reproducible independently of
>> the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
>> preference, or ASCII (LC_CTYPE=C) as a fallback.
>
> I guess I have to ask, though: doesn't it seem that having the docs
> produced according to the current locale is the Right Thing to do? Users
> have their locale set as it is for a reason, it seems like the production
> of textual documents should respect their choice.
>
> Am I missing something here?

I sympathize with Jonathan's general concern here -- if this patchset
makes it impossible for people to build documentation with (for example)
their preferred collation order, it would be suboptimal.

On the other hand, this seems to focus on character encodings
specifically; do we really want to encourage any sort of encodings other
than UTF-8? The only plausible arguments i've heard are for documents that
are exclusively CJK characters, which could achieve a modest size
reduction using more targeted encodings. afaik, there are no such
documents in the kernel, and i doubt there ever will be.

--dkg
On Fri, 11 Sep 2015 17:40:33 -0400
Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:

> I sympathize with Jonathan's general concern here -- if this patchset
> makes it impossible for people to build documentation with (for example)
> their preferred collation order, it would be suboptimal.
>
> On the other hand, this seems to focus on character encodings
> specifically; do we really want to encourage any sort of encodings other
> than UTF-8? The only plausible arguments i've heard are for documents that
> are exclusively CJK characters, which could achieve a modest size
> reduction using more targeted encodings. afaik, there are no such
> documents in the kernel, and i doubt there ever will be.

Well, there are CJK documents in the kernel, actually, though none are in
the DocBook directory currently.

Regardless of this, it's not a matter of which encodings we are
encouraging. If we want to encourage utf-8 use, we might not want to
start in the kernel's documentation directory. I think we need to respect
the user's choice in this regard and not try to override it. If I take
this patch, I suspect somebody will yell at me for it...

With regard to reproducible builds: success in this area certainly
requires reproducing the build environment as well. Honestly, I think
that needs to include the locale settings.

Let me know if you think I've totally misunderstood things.

jon
On Fri, 2015-09-11 at 13:30 -0600, Jonathan Corbet wrote:
> On Tue, 01 Sep 2015 23:49:19 +0100
> Ben Hutchings <ben@decadent.org.uk> wrote:
>
> > Currently the encoding of documents generated by DocBook depends on
> > the current locale. Make the output reproducible independently of
> > the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
> > preference, or ASCII (LC_CTYPE=C) as a fallback.
>
> I guess I have to ask, though: doesn't it seem that having the docs
> produced according to the current locale is the Right Thing to do? Users
> have their locale set as it is for a reason, it seems like the production
> of textual documents should respect their choice.
>
> Am I missing something here?

Yes - the locale's character encoding applies to plain text, but rich
text formats can have a locale-independent encoding which the viewer
will automatically convert to the current locale's encoding.

For HTML, the document encoding can be explicit in the document header
(and is, in this case).

Manual pages were already consistently encoded in UTF-8, as this is the
default behaviour of DocBook-XSL (and is what man-db prefers as input).

PDF and Postscript documents have arbitrary and explicit mappings from
character numbers (or names) to glyphs, and PDF documents normally have
a mapping from glyphs back to Unicode code points to support searching
and copying text.

Ben.
On Mon, 14 Sep 2015 01:32:50 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:

> > I guess I have to ask, though: doesn't it seem that having the docs
> > produced according to the current locale is the Right Thing to do? Users
> > have their locale set as it is for a reason, it seems like the production
> > of textual documents should respect their choice.
> >
> > Am I missing something here?
>
> Yes - the locale's character encoding applies to plain text, but rich
> text formats can have a locale-independent encoding which the viewer
> will automatically convert to the current locale's encoding.
>
> For HTML, the document encoding can be explicit in the document header
> (and is, in this case).
>
> Manual pages were already consistently encoded in UTF-8, as this is the
> default behaviour of DocBook-XSL (and is what man-db prefers as input).
>
> PDF and Postscript documents have arbitrary and explicit mappings from
> character numbers (or names) to glyphs, and PDF documents normally have
> a mapping from glyphs back to Unicode code points to support searching
> and copying text.

OK, I guess you've talked me into it. Can I ask you for one last favor,
though: please resubmit this patch with a couple of tweaks:

 - Based off current mainline, please (or docs-next, but that shouldn't be
   necessary). The patch as sent doesn't apply.

 - Could you add a comment to the check-lc_ctype proglet so that somebody
   stumbling across it in the scripts directory knows why it's there?

Thanks,

jon
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 198e9b5..9af25da 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -68,6 +68,12 @@ installmandocs: mandocs
 #External programs used
 KERNELDOC = $(srctree)/scripts/kernel-doc
 DOCPROC = $(objtree)/scripts/docproc
+CHECK_LC_CTYPE = $(objtree)/scripts/check-lc_ctype
+
+# Use a fixed encoding - UTF-8 if the C library has support built-in
+# or ASCII if not
+LC_CTYPE := $(call try-run, LC_CTYPE=C.UTF-8 $(CHECK_LC_CTYPE),C.UTF-8,C)
+export LC_CTYPE
 
 XMLTOFLAGS = -m $(srctree)/$(src)/stylesheet.xsl
 XMLTOFLAGS += --skip-validation
diff --git a/Makefile b/Makefile
index 13270c0..5846c06 100644
--- a/Makefile
+++ b/Makefile
@@ -1338,7 +1338,7 @@ $(help-board-dirs): help-%:
 # Documentation targets
 # ---------------------------------------------------------------------------
 %docs: scripts_basic FORCE
-	$(Q)$(MAKE) $(build)=scripts build_docproc
+	$(Q)$(MAKE) $(build)=scripts build_docproc build_check-lc_ctype
 	$(Q)$(MAKE) $(build)=Documentation/DocBook $@
 
 else # KBUILD_EXTMOD
diff --git a/scripts/Makefile b/scripts/Makefile
index 2016a64..6f0349f 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -7,6 +7,7 @@
 # conmakehash: Create chartable
 # conmakehash: Create arrays for initializing the kernel console tables
 # docproc: Used in Documentation/DocBook
+# check-lc_ctype: Used in Documentation/DocBook
 
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
@@ -23,14 +24,16 @@ HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
 always := $(hostprogs-y) $(hostprogs-m)
 
 # The following hostprogs-y programs are only build on demand
-hostprogs-y += unifdef docproc
+hostprogs-y += unifdef docproc check-lc_ctype
 
 # These targets are used internally to avoid "is up to date" messages
-PHONY += build_unifdef build_docproc
+PHONY += build_unifdef build_docproc build_check-lc_ctype
 build_unifdef: $(obj)/unifdef
 	@:
 build_docproc: $(obj)/docproc
 	@:
+build_check-lc_ctype: $(obj)/check-lc_ctype
+	@:
 
 subdir-$(CONFIG_MODVERSIONS) += genksyms
 subdir-y += mod
diff --git a/scripts/check-lc_ctype.c b/scripts/check-lc_ctype.c
new file mode 100644
index 0000000..51fe229
--- /dev/null
+++ b/scripts/check-lc_ctype.c
@@ -0,0 +1,6 @@
+#include <locale.h>
+
+int main(void)
+{
+	return !setlocale(LC_CTYPE, "");
+}
Currently the encoding of documents generated by DocBook depends on
the current locale. Make the output reproducible independently of
the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
preference, or ASCII (LC_CTYPE=C) as a fallback.

LC_CTYPE can normally be overridden by LC_ALL, but the top-level
Makefile unsets that.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 Documentation/DocBook/Makefile | 6 ++++++
 Makefile                       | 2 +-
 scripts/Makefile               | 7 +++++--
 scripts/check-lc_ctype.c       | 6 ++++++
 4 files changed, 18 insertions(+), 3 deletions(-)
 create mode 100644 scripts/check-lc_ctype.c
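
For readers wondering what the check-lc_ctype proglet actually tests, and
what a commented variant along the lines requested in review might look
like, here is a minimal sketch. It is not the resubmitted patch: the
function names check_lc_ctype() and check_named_lc_ctype() are hypothetical
and exist only for illustration. The only library call used is the standard
setlocale(), which returns NULL when the requested locale cannot be honoured.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Illustrative sketch only; these helper names are hypothetical and do
 * not appear in the patch.
 *
 * Report whether the C library accepts the LC_CTYPE locale selected by
 * the environment. Passing "" makes setlocale() take the locale name
 * from the environment (LC_ALL first, then LC_CTYPE, then LANG).
 *
 * Returns 0 if the locale is usable and 1 if not, so the value can be
 * used directly as a process exit status.
 */
int check_lc_ctype(void)
{
	return setlocale(LC_CTYPE, "") == NULL;
}

/*
 * Same check for an explicitly named locale such as "C" or "C.UTF-8",
 * without consulting the environment.
 */
int check_named_lc_ctype(const char *name)
{
	return setlocale(LC_CTYPE, name) == NULL;
}
```

A main() that simply returns check_lc_ctype() would follow the same
exit-status convention as the six-line program in the diff, so a call such
as `LC_CTYPE=C.UTF-8 ./check-lc_ctype` succeeds only when the C library
knows the C.UTF-8 locale. That is the condition the try-run line in
Documentation/DocBook/Makefile uses to choose between C.UTF-8 and C.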