[v2,2/2] gitweb: remove invalid http-equiv="content-type"

Message ID 20220308010711.61817-3-jason@jasonyundt.email (mailing list archive)
State New, archived

Commit Message

Jason Yundt March 8, 2022, 1:07 a.m. UTC
Before this change, gitweb would generate pages which included:

	<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/>

A meta element with http-equiv="content-type" is said to be in the
"Encoding declaration state". According to the HTML Standard,

	The Encoding declaration state may be used in HTML documents,
	but elements with an http-equiv attribute in that state must not
	be used in XML documents.

	Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type>

Gitweb always generates XML documents, so its use of
http-equiv="content-type" was invalid. This change replaces that tag
with

	<meta charset="utf-8"/>

which is equivalent [1] and allowed in XML documents [2].

[1]: <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-http-equiv>
[2]: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-charset>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
---
 gitweb/gitweb.perl                        |  2 +-
 t/t9502-gitweb-standalone-parse-output.sh | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

Comments

brian m. carlson March 8, 2022, 1:50 a.m. UTC | #1
On 2022-03-08 at 01:07:11, Jason Yundt wrote:
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index fbd1c20a23..59457c1004 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -4225,7 +4225,7 @@ sub git_header_html {
>  <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
>  <!-- git core binaries version $git_version -->
>  <head>
> -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
> +<meta charset="utf-8"/>

I don't actually think this is an improvement.  I don't think it's
necessary, considering we have an XML declaration and the HTTP header,
both of which already say it's UTF-8 and will take precedence over this.
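
To see which in-document encoding declarations a saved page actually carries, something like the following could be used. This is only a sketch: list_encoding_declarations is a hypothetical helper, not part of gitweb or its test suite, and the sample page head is an assumed, shortened stand-in for real gitweb output. The HTTP header is not stored in the body, so it is not covered here.

```shell
#!/bin/sh
# Hypothetical helper, not part of gitweb: print one line per encoding
# declaration found in the saved page "$1". The HTTP Content-Type header
# is not part of the body, so it cannot be inspected here.
list_encoding_declarations() {
	if grep -qE '<\?xml[^>]*encoding=' "$1"; then
		echo "xml-declaration"
	fi
	if grep -qiE '<meta[^>]*http-equiv=.?content-type' "$1"; then
		echo "meta-http-equiv"
	fi
	if grep -qiE '<meta[[:space:]]+charset=' "$1"; then
		echo "meta-charset"
	fi
}

# Assumed sample page head; real gitweb output contains more than this.
page=$(mktemp)
cat >"$page" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/>
</head>
</html>
EOF
list_encoding_declarations "$page"
rm -f "$page"
```

If the XML declaration is listed, the page still names its encoding when saved to disk, whether or not either meta element is present.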

Ævar Arnfjörð Bjarmason March 8, 2022, 12:44 p.m. UTC | #2
On Tue, Mar 08 2022, brian m. carlson wrote:

> On 2022-03-08 at 01:07:11, Jason Yundt wrote:
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index fbd1c20a23..59457c1004 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -4225,7 +4225,7 @@ sub git_header_html {
>>  <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
>>  <!-- git core binaries version $git_version -->
>>  <head>
>> -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
>> +<meta charset="utf-8"/>
>
> I don't actually think this is an improvement.  I don't think it's
> necessary, considering we have an XML declaration and the HTTP header,
> both of which already say it's UTF-8 and will take precedence over this.

Agreed. I was a bit surprised per Jason's
https://lore.kernel.org/git/109813056.nniJfEyVGO@jason-desktop-linux/
that the removal wasn't kept.

I.e. he was replying to a question of mine asking whether we didn't need
this data at rest, e.g. if you save the page. I didn't notice the "<?xml
version..." we emit, which seems to be enough.

I.e. this seems to have always been redundant going back to c994d620cc8
(v220, 2005-08-07), or rather, the character set part of it.

Maybe I still don't understand this, but the commit message seems to me
to be conflating whether we send the *right* http-equiv with whether we
send it at all, i.e. if the problem is that XML documents shouldn't be
text/html isn't this correct?:
	
	diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
	index fbd1c20a232..c1c5af0b197 100755
	--- a/gitweb/gitweb.perl
	+++ b/gitweb/gitweb.perl
	@@ -4049,7 +4049,13 @@ sub get_page_title {
	 	return $title;
	 }
	 
	+sub get_content_type_xml {
	+	return 'application/xhtml+xml';
	+}
	+
	 sub get_content_type_html {
	+	my ($want_xml) = @_;
	+
	 	# require explicit support from the UA if we are to send the page as
	 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
	 	# we have to do this because MSIE sometimes globs '*/*', pretending to
	@@ -4057,7 +4063,7 @@ sub get_content_type_html {
	 	if (defined $cgi->http('HTTP_ACCEPT') &&
	 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
	 	    $cgi->Accept('application/xhtml+xml') != 0) {
	-		return 'application/xhtml+xml';
	+		return get_content_type_html();
	 	} else {
	 		return 'text/html';
	 	}
	@@ -4214,6 +4220,7 @@ sub git_header_html {
	 
	 	my $title = get_page_title();
	 	my $content_type = get_content_type_html();
	+	my $content_type_xml = get_content_type_html();
	 	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
	 	                   -status=> $status, -expires => $expires)
	 		unless ($opts{'-no_http_header'});
	@@ -4225,7 +4232,7 @@ sub git_header_html {
	 <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
	 <!-- git core binaries version $git_version -->
	 <head>
	-<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
	+<meta http-equiv="content-type" content="$content_type_xml; charset=utf-8"/>
	 <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/>
	 <meta name="robots" content="index, nofollow"/>
	 <title>$title</title>

Of course we might then *also* decide that <meta http-equiv> in this
case isn't needed at all, but isn't that a separate change?

And won't conforming browsers treat application/xhtml+xml differently
when the page is saved? A long time ago (I did some web development)
using it would enable pedantic strictness in browsers, i.e. unclosed
tags etc. would be a hard error, but I can't reproduce that locally in
either Firefox or Chrome now (with just the gitweb output as-is with
that http-equiv tweaked).

So maybe it does nothing, or maybe it's just those browsers...

Jason Yundt March 8, 2022, 2:54 p.m. UTC | #3
On Tuesday, March 8, 2022 7:44:35 AM EST Ævar Arnfjörð Bjarmason wrote:
> Maybe I still don't understand this, but the commit message seems to me
> to be conflating whether we send the *right* http-equiv with whether we
> send it at all,

The intent behind the commit message is to say that <meta
http-equiv="content-type" …> is never correct in XHTML.

> i.e. if the problem is that XML documents shouldn't be
> text/html isn't this correct?:
> 	
> 	diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> 	index fbd1c20a232..c1c5af0b197 100755
> 	--- a/gitweb/gitweb.perl
> 	+++ b/gitweb/gitweb.perl
> 	@@ -4049,7 +4049,13 @@ sub get_page_title {
> 	 	return $title;
> 	 }
> 	 
> 	+sub get_content_type_xml {
> 	+	return 'application/xhtml+xml';
> 	+}
> 	+
> 	 sub get_content_type_html {
> 	+	my ($want_xml) = @_;
> 	+
> 	 	# require explicit support from the UA if we are to send the page as
> 	 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
> 	 	# we have to do this because MSIE sometimes globs '*/*', pretending to
> 	@@ -4057,7 +4063,7 @@ sub get_content_type_html {
> 	 	if (defined $cgi->http('HTTP_ACCEPT') &&
> 	 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
> 	 	    $cgi->Accept('application/xhtml+xml') != 0) {
> 	-		return 'application/xhtml+xml';
> 	+		return get_content_type_html();

I’m guessing that you meant to call get_content_type_xml() here.

> 	 	} else {
> 	 		return 'text/html';
> 	 	}
> 	@@ -4214,6 +4220,7 @@ sub git_header_html {
> 	 
> 	 	my $title = get_page_title();
> 	 	my $content_type = get_content_type_html();
> 	+	my $content_type_xml = get_content_type_html();

I’m also guessing that you meant to call get_content_type_xml() here.

> 	 	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
> 	 	                   -status=> $status, -expires => $expires)
> 	 		unless ($opts{'-no_http_header'});
> 	@@ -4225,7 +4232,7 @@ sub git_header_html {
> 	 <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
> 	 <!-- git core binaries version $git_version -->
> 	 <head>
> 	-<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
> 	+<meta http-equiv="content-type" content="$content_type_xml; charset=utf-8"/>
> 	 <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/>
> 	 <meta name="robots" content="index, nofollow"/>
> 	 <title>$title</title>

With those assumptions in mind, I don’t think that your code is correct if
the problem is that XML documents shouldn't be text/html. Here’s why:

1. XML documents shouldn’t contain http-equiv="content-type" [1].
2. When a meta’s http-equiv attribute equals content-type, then its content
    attribute should equal “the literal string "text/html;", optionally
    followed by any number of ASCII whitespace, followed by the literal
    string "charset=utf-8".” [1]

[1]: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type>
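
The second point can be illustrated with a small check of a content attribute value against the wording quoted above. This is a sketch only: is_valid_content_type_value is a hypothetical helper, the match is case-sensitive, and only spaces and tabs are treated as whitespace. Both are assumptions, and the standard may be more permissive.

```shell
#!/bin/sh
# Hypothetical helper, not part of gitweb or its test suite.
# Succeeds when "$1" has the form quoted above:
#   "text/html;" + optional whitespace + "charset=utf-8"
# Assumptions: case-sensitive match; only spaces and tabs count as
# whitespace. The standard may accept more than this sketch does.
is_valid_content_type_value() {
	rest=$1
	tab=$(printf '\t')
	case $rest in
	"text/html;"*) rest=${rest#"text/html;"} ;;
	*) return 1 ;;
	esac
	# Drop any run of leading spaces or tabs.
	while :; do
		case $rest in
		" "*|"$tab"*) rest=${rest#?} ;;
		*) break ;;
		esac
	done
	[ "$rest" = "charset=utf-8" ]
}

for value in \
	"text/html; charset=utf-8" \
	"application/xhtml+xml; charset=utf-8"
do
	if is_valid_content_type_value "$value"; then
		echo "allowed: $value"
	else
		echo "not allowed: $value"
	fi
done
```

Under these assumptions the value with application/xhtml+xml is rejected, which is why substituting the XHTML media type into the http-equiv element does not make it conforming.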

Patch

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fbd1c20a23..59457c1004 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4225,7 +4225,7 @@  sub git_header_html {
 <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
 <!-- git core binaries version $git_version -->
 <head>
-<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
+<meta charset="utf-8"/>
 <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/>
 <meta name="robots" content="index, nofollow"/>
 <title>$title</title>
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index e7363511dd..0b06e2d6b0 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -207,4 +207,20 @@  test_expect_success 'xss checks' '
 	xss "" "$TAG+"
 '
 
+check_encoding_meta_element() {
+	gitweb_run "$@" &&
+	! grep -E "http-equiv=['\"]?content-type" gitweb.body &&
+	grep -F '<meta charset="utf-8"/>' gitweb.body
+}
+
+# One of those can be used in XHTML, the other one can't. See:
+# <https://html.spec.whatwg.org/dev/semantics.html#attr-meta-charset>
+# <https://html.spec.whatwg.org/dev/semantics.html#attr-meta-http-equiv-content-type>
+test_expect_success 'no http-equiv="content-type", yes charset="utf-8"' '
+	check_encoding_meta_element &&
+	check_encoding_meta_element "p=.git" &&
+	check_encoding_meta_element "p=.git;a=log" &&
+	check_encoding_meta_element "p=.git;a=tree"
+'
+
 test_done