gitweb: switch to a modern DOCTYPE

Message ID: 20220601012647.1439480-1-jason@jasonyundt.email
State: Accepted
Commit: 0e1a85ca7558a9ec6f2e708dcc106c455a50776d
Series: gitweb: switch to a modern DOCTYPE

Commit Message

Jason Yundt June 1, 2022, 1:26 a.m. UTC
According to the HTML Standard FAQ:

	“What is the DOCTYPE for modern HTML documents?

	In text/html documents:

		<!DOCTYPE html>

	In documents delivered with an XML media type: no DOCTYPE is required
	and its use is generally unnecessary. However, you may use one if you
	want (see the following question). Note that the above is well-formed
	XML.”

	Source: [1]

Gitweb uses an XHTML 1.0 DOCTYPE:

	<!DOCTYPE html PUBLIC
	"-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

While that DOCTYPE is still valid [2], it has several disadvantages:

1. It’s misleading. The DTD that browsers are supposed to use with that
   DOCTYPE has nothing to do with XHTML 1.0 and isn’t available at the URL
   that is given [2].
2. It’s obsolete. XHTML 1.0 was last revised in 2002 and was superseded in
   2018 [3].
3. It’s unreliable. Gitweb uses &nbsp; and &sdot; but lets an external file
   define them. “[…U]sing entity references for characters in XML documents
   is unsafe if they are defined in an external file (except for &lt;, &gt;,
   &amp;, &quot;, and &apos;).” [4]

[1]: <https://github.com/whatwg/html/blob/main/FAQ.md#what-is-the-doctype-for-modern-html-documents>
[2]: <https://html.spec.whatwg.org/multipage/xhtml.html#parsing-xhtml-documents>
[3]: <https://www.w3.org/TR/xhtml1/#xhtml>
[4]: <https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
---
 gitweb/gitweb.perl                        |  5 ++++-
 t/t9502-gitweb-standalone-parse-output.sh | 14 ++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
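
The reliability concern in point 3 can be reproduced with any XML parser that
does not fetch external DTDs. The sketch below is only an illustration, not
part of the patch: it assumes Python's xml.etree.ElementTree (which does not
load external DTDs) and a hypothetical, trimmed-down page body. The names
BODY, OLD_DOCTYPE, NEW_DOCTYPE and page_text are made up for this example.

```python
import xml.etree.ElementTree as ET

XML_DECL = b'<?xml version="1.0" encoding="utf-8"?>\n'

# Hypothetical stand-in for a gitweb page; it uses both entities.
BODY = (
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<body><p>a&nbsp;b&sdot;c</p></body></html>'
)

# The DOCTYPE gitweb used before this patch: entities live in an external DTD.
OLD_DOCTYPE = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
)

# The DOCTYPE from this patch: entities are declared in the internal subset.
NEW_DOCTYPE = (
    b'<!DOCTYPE html [\n'
    b'\t<!ENTITY nbsp "&#xA0;">\n'
    b'\t<!ENTITY sdot "&#x22C5;">\n'
    b']>\n'
)


def page_text(doctype: bytes) -> str:
    """Parse the sample page with the given DOCTYPE and return its text."""
    root = ET.fromstring(XML_DECL + doctype + BODY)
    return "".join(root.itertext())


for label, doctype in (("XHTML 1.0 DOCTYPE", OLD_DOCTYPE),
                       ("internal subset", NEW_DOCTYPE)):
    try:
        print(f"{label}: parsed, text = {page_text(doctype)!r}")
    except ET.ParseError as err:
        # The parser never saw a declaration for &nbsp; or &sdot;.
        print(f"{label}: parse failed ({err})")
```

With the old DOCTYPE this parser has no declaration for &nbsp; and rejects the
page; with the internal subset both entities expand to the intended
characters. Parsers that consult a catalog or fetch the DTD may behave
differently.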

Comments

brian m. carlson June 2, 2022, 12:41 a.m. UTC | #1
On 2022-06-01 at 01:26:47, Jason Yundt wrote:
> According to the HTML Standard FAQ:
> 
> 	“What is the DOCTYPE for modern HTML documents?
> 
> 	In text/html documents:
> 
> 		<!DOCTYPE html>
> 
> 	In documents delivered with an XML media type: no DOCTYPE is required
> 	and its use is generally unnecessary. However, you may use one if you
> 	want (see the following question). Note that the above is well-formed
> 	XML.”
> 
> 	Source: [1]
> 
> Gitweb uses an XHTML 1.0 DOCTYPE:
> 
> 	<!DOCTYPE html PUBLIC
> 	"-//W3C//DTD XHTML 1.0 Strict//EN"
> 	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> 
> While that DOCTYPE is still valid [2], it has several disadvantages:
> 
> 1. It’s misleading. The DTD that browsers are supposed to use with that
>    DOCTYPE has nothing to do with XHTML 1.0 and isn’t available at the URL
>    that is given [2].

While the WHATWG may claim that, an XML parser is absolutely within its
rights to refer to and use that DTD, and in fact should do so unless its
catalog directs it elsewhere.  It may be that some browsers use an
internal catalog that refers to a different DTD, however.

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 606b50104c..1835487ab2 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -4219,7 +4219,10 @@ sub git_header_html {
>  	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
>  	print <<EOF;
>  <?xml version="1.0" encoding="utf-8"?>
> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> +<!DOCTYPE html [
> +	<!ENTITY nbsp "&#xA0;">
> +	<!ENTITY sdot "&#x22C5;">
> +]>

I think this should be fine.  It defines the entities we need and
appears to be valid XML.  I don't think there should be any problem
upgrading to XHTML 5 here.

Junio C Hamano June 2, 2022, 6:10 a.m. UTC | #2
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> While that DOCTYPE is still valid [2], it has several disadvantages:
>> 
>> 1. It’s misleading. The DTD that browsers are supposed to use with that
>>    DOCTYPE has nothing to do with XHTML 1.0 and isn’t available at the URL
>>    that is given [2].
>
> While the WHATWG may claim that, an XML parser is absolutely within its
> rights to refer to and use that DTD, and in fact should do so unless its
> catalog directs it elsewhere.  It may be that some browsers use an
> internal catalog that refers to a different DTD, however.
>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 606b50104c..1835487ab2 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -4219,7 +4219,10 @@ sub git_header_html {
>>  	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
>>  	print <<EOF;
>>  <?xml version="1.0" encoding="utf-8"?>
>> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>> +<!DOCTYPE html [
>> +	<!ENTITY nbsp "&#xA0;">
>> +	<!ENTITY sdot "&#x22C5;">
>> +]>
>
> I think this should be fine.  It defines the entities we need and
> appears to be valid XML.  I don't think there should be any problem
> upgrading to XHTML 5 here.

OK, so in short, the patch text looks OK and the proposed log
message needs a bit more work?

Thanks.

Bagas Sanjaya June 2, 2022, 7:26 a.m. UTC | #3
On 6/1/22 08:26, Jason Yundt wrote:
> According to the HTML Standard FAQ:
> 
> 	“What is the DOCTYPE for modern HTML documents?
> 
> 	In text/html documents:
> 
> 		<!DOCTYPE html>
> 
> 	In documents delivered with an XML media type: no DOCTYPE is required
> 	and its use is generally unnecessary. However, you may use one if you
> 	want (see the following question). Note that the above is well-formed
> 	XML.”
> 
> 	Source: [1]
> 
> Gitweb uses an XHTML 1.0 DOCTYPE:
> 
> 	<!DOCTYPE html PUBLIC
> 	"-//W3C//DTD XHTML 1.0 Strict//EN"
> 	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> 
> While that DOCTYPE is still valid [2], it has several disadvantages:
> 
> 1. It’s misleading. The DTD that browsers are supposed to use with that
>    DOCTYPE has nothing to do with XHTML 1.0 and isn’t available at the URL
>    that is given [2].
> 2. It’s obsolete. XHTML 1.0 was last revised in 2002 and was superseded in
>    2018 [3].
> 3. It’s unreliable. Gitweb uses &nbsp; and &sdot; but lets an external file
>    define them. “[…U]sing entity references for characters in XML documents
>    is unsafe if they are defined in an external file (except for &lt;, &gt;,
>    &amp;, &quot;, and &apos;).” [4]
> 
> [1]: <https://github.com/whatwg/html/blob/main/FAQ.md#what-is-the-doctype-for-modern-html-documents>
> [2]: <https://html.spec.whatwg.org/multipage/xhtml.html#parsing-xhtml-documents>
> [3]: <https://www.w3.org/TR/xhtml1/#xhtml>
> [4]: <https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents>
> 
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>

So basically this patch switches gitweb to HTML5, right? I ask because the
DOCTYPE is "upgraded" to "<!DOCTYPE html>", which is the HTML5 DOCTYPE. If so,
please mention HTML5 in v2.

Patch

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 606b50104c..1835487ab2 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4219,7 +4219,10 @@  sub git_header_html {
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
 	print <<EOF;
 <?xml version="1.0" encoding="utf-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<!DOCTYPE html [
+	<!ENTITY nbsp "&#xA0;">
+	<!ENTITY sdot "&#x22C5;">
+]>
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
 <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
 <!-- git core binaries version $git_version -->
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index 8cb582f0e6..81d5625557 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -220,4 +220,18 @@  test_expect_success 'no http-equiv="content-type" in XHTML' '
 	no_http_equiv_content_type "p=.git;a=tree"
 '
 
+proper_doctype() {
+	gitweb_run "$@" &&
+	grep -F "<!DOCTYPE html [" gitweb.body &&
+	grep "<!ENTITY nbsp" gitweb.body &&
+	grep "<!ENTITY sdot" gitweb.body
+}
+
+test_expect_success 'Proper DOCTYPE with entity declarations' '
+	proper_doctype &&
+	proper_doctype "p=.git" &&
+	proper_doctype "p=.git;a=log" &&
+	proper_doctype "p=.git;a=tree"
+'
+
 test_done