Message ID | 20200329002028.26080-1-julm+git@sourcephile.fr |
---|---|
State | New, archived |
Series | gitweb: fix UTF-8 encoding when using CGI::Fast |
Julien Moutinho <julm+git@sourcephile.fr> writes:

>  	require CGI::Fast;
>  	our $CGI = 'CGI::Fast';
> +	# FCGI is not Unicode aware hence the UTF-8 encoding must be done manually.
> +	# However no encoding must be done within git_blob_plain() and git_snapshot()
> +	# which must still output in raw binary mode.

I guess this comment would be sufficient to help future developers
when they find that newer version of CGI::Fast has become Unicode
aware later can make this part conditional to the version of the
module, perhaps?

Would "use CGI::Fast (-utf8)" instead of the whole thing help, by
the way?

> +	no warnings 'redefine';
> +	my $enc = Encode::find_encoding('UTF-8');
> +	*FCGI::Stream::PRINT = sub {
> +		my @OUTPUT = @_;
> +		for (my $i = 1; $i < @_; $i++) {
> +			$OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
> +		}
> +		@_ = @OUTPUT;
> +		goto $FCGI_Stream_PRINT_raw;
> +	};
>
>  	my $request_number = 0;
>  	# let each child service 100 requests
> @@ -7079,6 +7093,7 @@ sub git_blob_plain {
>  			($sandbox ? 'attachment' : 'inline')
>  			. '; filename="' . $save_as . '"');
>  	local $/ = undef;
> +	local *FCGI::Stream::PRINT = $FCGI_Stream_PRINT_raw;
>  	binmode STDOUT, ':raw';
>  	print <$fd>;
>  	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
> @@ -7417,6 +7432,7 @@ sub git_snapshot {
>
>  	open my $fd, "-|", $cmd
>  		or die_error(500, "Execute git-archive failed");
> +	local *FCGI::Stream::PRINT = $FCGI_Stream_PRINT_raw;
>  	binmode STDOUT, ':raw';
>  	print <$fd>;
>  	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
On Sun, 29 Mar 2020 at 09:06 -0700, Junio C Hamano wrote:
> Julien Moutinho <julm+git@sourcephile.fr> writes:
>
> >  	require CGI::Fast;
> >  	our $CGI = 'CGI::Fast';
> > +	# FCGI is not Unicode aware hence the UTF-8 encoding must be done manually.
> > +	# However no encoding must be done within git_blob_plain() and git_snapshot()
> > +	# which must still output in raw binary mode.
>
> I guess this comment would be sufficient to help future developers
> when they find that newer version of CGI::Fast has become Unicode
> aware later can make this part conditional to the version of the
> module, perhaps?

Sure, though as long as CGI::Fast relies on FCGI, I would not bet on
any improvement regarding this bug, which has been waiting to be fixed
on FCGI's bug tracker since 2013:
https://rt.cpan.org/Public/Bug/Display.html?id=89383

> Would "use CGI::Fast (-utf8)" instead of the whole thing help, by
> the way?

Unfortunately not: the -utf8 option (aka. $CGI::PARAM_UTF8) controls
the decoding of the input parameters, not the encoding of the output.
- https://metacpan.org/pod/CGI#-utf8
- https://stackoverflow.com/questions/5005104/how-to-force-fastcgi-to-encode-form-data-as-utf-8-as-cgi-pm-has-option/7097698#7097698

> > our $FCGI_Stream_PRINT_raw = \&FCGI::Stream::PRINT;
> [...]
> > +	local *FCGI::Stream::PRINT = $FCGI_Stream_PRINT_raw;

I had forgotten to test the patch without FastCGI, but AFAICS it is
innocuous in non-FastCGI mode. Perl does not choke on
\&FCGI::Stream::PRINT even though that sub is not reachable. The local
binding also emits no redefine warning, because in this case it is not
a redefinition but a definition.

Regards,
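The non-FastCGI behaviour described above can be checked in isolation. The standalone Perl script below is a rough sketch. It uses a made-up package name, Hypothetical::Stream, in place of FCGI::Stream, because in plain CGI mode that package is simply never loaded. The sketch assumes a reasonably recent Perl 5, and its checks show three things:

- Taking `\&Name` of a nonexistent sub only creates an undefined stub.
- Localizing the glob and assigning that reference back produces no warnings.
- Only actually calling the stub would fail.

```perl
#!/usr/bin/perl
# Sketch of the non-FastCGI case. Hypothetical::Stream is an illustrative
# stand-in for FCGI::Stream; nothing in this script defines it.
use strict;
use warnings;

# Collect any runtime warnings instead of printing them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Taking a reference to a sub that does not exist does not die:
# Perl just creates an undefined stub for that name.
my $raw = \&Hypothetical::Stream::PRINT;
my $is_defined = defined &Hypothetical::Stream::PRINT ? 1 : 0;

{
    # local empties the glob for this scope, so assigning the stub back is a
    # fresh definition of the CODE slot and triggers no "redefined" warning.
    local *Hypothetical::Stream::PRINT = $raw;
}

# Only calling the stub fails ("Undefined subroutine ... called").
my $call_ok = eval { $raw->('x'); 1 } ? 1 : 0;
my $error   = $@;

print "defined=$is_defined warnings=", scalar(@warnings), " call_ok=$call_ok\n";
print "error: $error";
```

In gitweb the stub is never called in CGI mode, because the override and the local bindings only matter when PRINT is dispatched through a tied FCGI handle.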
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 65a3a9e..1a02a12 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1291,9 +1291,23 @@ sub run_request {
 our ($pre_dispatch_hook, $post_dispatch_hook, $pre_listen_hook);
 our $CGI = 'CGI';
 our $cgi;
+our $FCGI_Stream_PRINT_raw = \&FCGI::Stream::PRINT;
 sub configure_as_fcgi {
 	require CGI::Fast;
 	our $CGI = 'CGI::Fast';
+	# FCGI is not Unicode aware hence the UTF-8 encoding must be done manually.
+	# However no encoding must be done within git_blob_plain() and git_snapshot()
+	# which must still output in raw binary mode.
+	no warnings 'redefine';
+	my $enc = Encode::find_encoding('UTF-8');
+	*FCGI::Stream::PRINT = sub {
+		my @OUTPUT = @_;
+		for (my $i = 1; $i < @_; $i++) {
+			$OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
+		}
+		@_ = @OUTPUT;
+		goto $FCGI_Stream_PRINT_raw;
+	};
 
 	my $request_number = 0;
 	# let each child service 100 requests
@@ -7079,6 +7093,7 @@ sub git_blob_plain {
 			($sandbox ? 'attachment' : 'inline')
 			. '; filename="' . $save_as . '"');
 	local $/ = undef;
+	local *FCGI::Stream::PRINT = $FCGI_Stream_PRINT_raw;
 	binmode STDOUT, ':raw';
 	print <$fd>;
 	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
@@ -7417,6 +7432,7 @@ sub git_snapshot {
 
 	open my $fd, "-|", $cmd
 		or die_error(500, "Execute git-archive failed");
+	local *FCGI::Stream::PRINT = $FCGI_Stream_PRINT_raw;
 	binmode STDOUT, ':raw';
 	print <$fd>;
 	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
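The technique in this diff can be tried without FastCGI. It works in three steps:

1. Save a reference to the original PRINT.
2. Install a wrapper that UTF-8-encodes every argument after the tied object and tail-calls the original with `goto`.
3. Temporarily restore the raw PRINT with `local` for binary output.

The script below is an illustrative sketch, not gitweb code. Demo::Stream is a hypothetical tied-handle class whose PRINT, like the behaviour reported for FCGI::Stream, passes its arguments through without any encoding.

```perl
#!/usr/bin/perl
# Sketch of the PRINT-wrapping technique from the patch, applied to a
# hypothetical tied-handle class (Demo::Stream) instead of FCGI::Stream.
use strict;
use warnings;
use Encode ();

package Demo::Stream;
# Not Unicode aware: PRINT just appends whatever strings it is given.
sub TIEHANDLE { my ($class, $buf) = @_; return bless { buf => $buf }, $class }
sub PRINT     { my $self = shift; ${ $self->{buf} } .= join '', @_; return 1 }

package main;

my $out = '';
tie *OUT, 'Demo::Stream', \$out;

# Keep a reference to the original, non-encoding PRINT.
our $Demo_Stream_PRINT_raw = \&Demo::Stream::PRINT;
{
    no warnings 'redefine';
    my $enc = Encode::find_encoding('UTF-8');
    *Demo::Stream::PRINT = sub {
        my @OUTPUT = @_;
        # $_[0] is the tied object; only the data arguments are encoded.
        for (my $i = 1; $i < @_; $i++) {
            $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK() | Encode::LEAVE_SRC());
        }
        @_ = @OUTPUT;
        goto $Demo_Stream_PRINT_raw;    # tail-call the original with encoded args
    };
}

print OUT "caf\x{e9}";                  # character string, encoded by the wrapper
my $encoded = $out;

$out = '';
{
    # Binary section: bypass the wrapper for this dynamic scope only.
    local *Demo::Stream::PRINT = $Demo_Stream_PRINT_raw;
    print OUT "\xff\x00";               # raw bytes, passed through unchanged
}
my $binary = $out;

$out = '';
print OUT "\x{e9}";                     # the wrapper is active again here
my $restored = $out;

printf "encoded=%vd binary=%vd restored=%vd\n", $encoded, $binary, $restored;
```

Because `local` restores the glob when the enclosing scope exits, the encoding wrapper comes back automatically after the binary section. This is why git_blob_plain() and git_snapshot() need no explicit cleanup after their `local *FCGI::Stream::PRINT` line.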
FCGI streams are implemented with the older tied-handle API (TIEHANDLE),
so applying PerlIO layers with binmode() has no effect on them.

The solution in this patch is to redefine the FCGI::Stream::PRINT function
to use UTF-8 as the output encoding, except within git_blob_plain()
and git_snapshot(), which must still output in raw binary mode.

This problem and its solution were previously reported back in 2012:
- http://git.661346.n2.nabble.com/Gitweb-running-as-FCGI-does-not-print-its-output-in-UTF-8-td7573415.html
- http://stackoverflow.com/questions/5005104

Signed-off-by: Julien Moutinho <julm+git@sourcephile.fr>
---
 gitweb/gitweb.perl | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
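The root cause described in the commit message can be reproduced in a few lines of Perl. In this sketch, Demo::TiedStream is a hypothetical stand-in for FCGI::Stream. Its BINMODE method is an assumed no-op, modelling the reported behaviour that layers passed to binmode() are not applied. The same `:encoding(UTF-8)` layer is applied to two handles with different results:

- On a real (in-memory) filehandle, the layer turns one character into three UTF-8 bytes.
- On the tied handle, PRINT still receives the unencoded character string.

```perl
#!/usr/bin/perl
# Sketch: binmode() layers apply to real filehandles, but a tied handle only
# receives a BINMODE method call. Demo::TiedStream is hypothetical.
use strict;
use warnings;

package Demo::TiedStream;
sub TIEHANDLE { my ($class, $buf) = @_; return bless { buf => $buf }, $class }
sub PRINT     { my $self = shift; ${ $self->{buf} } .= join '', @_; return 1 }
sub BINMODE   { return 1 }    # assumption: layers are accepted but never applied

package main;

my $smiley = "\x{263A}";      # one character, three bytes in UTF-8

# Real (in-memory) filehandle: the :encoding layer turns characters into bytes.
open my $fh, '>', \my $file_buf or die "open: $!";
binmode $fh, ':encoding(UTF-8)';
print {$fh} $smiley;
close $fh or die "close: $!";

# Tied handle: binmode only calls BINMODE, which does nothing here,
# so PRINT is handed the unencoded character string.
my $tied_buf = '';
tie *TIED, 'Demo::TiedStream', \$tied_buf;
binmode TIED, ':encoding(UTF-8)';
print TIED $smiley;

printf "real handle: %d bytes, tied handle: %d character(s)\n",
    length($file_buf), length($tied_buf);
```

This is why the patch does the encoding inside PRINT itself rather than relying on `binmode STDOUT, ':utf8'`.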