git gui: fix branch name encoding error on git gui
diff mbox series

Message ID 20191207002842.32208-1-kato-k@ksysllc.co.jp
State New
Headers show
Series
  • git gui: fix branch name encoding error on git gui
Related show

Commit Message

加藤一博 Dec. 7, 2019, 12:29 a.m. UTC
After "git checkout -b '漢字'" to create a branch with UTF-8
character in it, "git gui" shows the branch name incorrectly,
as it forgets to turn the bytes
read from the "git for-each-ref" and
read from "HEAD" file
into Unicode characters.

Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp>
---
 git-gui.sh     | 1 +
 lib/branch.tcl | 2 ++
 2 files changed, 3 insertions(+)

Comments

Junio C Hamano Dec. 9, 2019, 7:15 p.m. UTC | #1
加藤一博 <kato-k@ksysllc.co.jp> writes:

> After "git checkout -b '漢字'" to create a branch with UTF-8
> character in it, "git gui" shows the branch name incorrectly,
> as it forgets to turn the bytes
> read from the "git for-each-ref" and
> read from "HEAD" file
> into Unicode characters.

Thanks.

Note to the git-gui mentainer.  The above may want to be
line-wrapped a bit.

> Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp>
> ---
>  git-gui.sh     | 1 +
>  lib/branch.tcl | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/git-gui.sh b/git-gui.sh
> index 0d21f56..8f4a9ae 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -684,6 +684,7 @@ proc load_current_branch {} {
>  	global current_branch is_detached
>  
>  	set fd [open [gitdir HEAD] r]
> +	fconfigure $fd -translation binary -encoding utf-8
>  	if {[gets $fd ref] < 1} {
>  		set ref {}
>  	}

A comment totally outside the scope of this fix to anybody
interested in further working on this code.

This piece of code is way too intimate with the implementation
details of HEAD and yet not intimate enough to know that HEAD can be
a symlink (in other words, it is a poor imitation of the real logic
implemented in git core).  A kosher way to implement this would be
to call

	git symbolic-ref --quiet --short HEAD

which would succeed and give the branch name to its standard output,
or would fail when the head is detached.  Set "current_branch" and
"is_detached" according to the outcome.

And yes, Kato-san's fconfigure fix in this patch will still be
relevant even after such a fix to the implementation of this proc.

> diff --git a/lib/branch.tcl b/lib/branch.tcl 
> index 777eeb7..8b0c485 100644
> --- a/lib/branch.tcl
> +++ b/lib/branch.tcl
> @@ -8,6 +8,7 @@ proc load_all_heads {} {
>  	set rh_len [expr {[string length $rh] + 1}]
>  	set all_heads [list]
>  	set fd [git_read for-each-ref --format=%(refname) $rh]
> +	fconfigure $fd -translation binary -encoding utf-8
>  	while {[gets $fd line] > 0} {
>  		if {!$some_heads_tracking || ![is_tracking_branch $line]} {
>  			lappend all_heads [string range $line $rh_len end]
> @@ -24,6 +25,7 @@ proc load_all_tags {} {
>  		--sort=-taggerdate \
>  		--format=%(refname) \
>  		refs/tags]
> +	fconfigure $fd -translation binary -encoding utf-8
>  	while {[gets $fd line] > 0} {
>  		if {![regsub ^refs/tags/ $line {} name]} continue
>  		lappend all_tags $name
Pratyush Yadav Dec. 9, 2019, 9:09 p.m. UTC | #2
On 09/12/19 11:15AM, Junio C Hamano wrote:
> 加藤一博 <kato-k@ksysllc.co.jp> writes:
> 
> > After "git checkout -b '漢字'" to create a branch with UTF-8
> > character in it, "git gui" shows the branch name incorrectly,
> > as it forgets to turn the bytes
> > read from the "git for-each-ref" and
> > read from "HEAD" file
> > into Unicode characters.
> 
> Thanks.
> 
> Note to the git-gui mentainer.  The above may want to be
> line-wrapped a bit.

Thanks. Already done :)
 
> > Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp>
> > ---
> >  git-gui.sh     | 1 +
> >  lib/branch.tcl | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/git-gui.sh b/git-gui.sh
> > index 0d21f56..8f4a9ae 100755
> > --- a/git-gui.sh
> > +++ b/git-gui.sh
> > @@ -684,6 +684,7 @@ proc load_current_branch {} {
> >  	global current_branch is_detached
> >  
> >  	set fd [open [gitdir HEAD] r]
> > +	fconfigure $fd -translation binary -encoding utf-8
> >  	if {[gets $fd ref] < 1} {
> >  		set ref {}
> >  	}
> 
> A comment totally outside the scope of this fix to anybody
> interested in further working on this code.
> 
> This piece of code is way too intimate with the implementation
> details of HEAD and yet not intimate enough to know that HEAD can be
> a symlink (in other words, it is a poor imitation of the real logic
> implemented in git core).  A kosher way to implement this would be
> to call
> 
> 	git symbolic-ref --quiet --short HEAD
> 
> which would succeed and give the branch name to its standard output,
> or would fail when the head is detached.  Set "current_branch" and
> "is_detached" according to the outcome.

It was introduced in fc4e8da (git-gui: Internalize symbolic-ref HEAD 
reading logic, 2007-05-30). The commit message is:

  To improve performance on fork+exec impoverished systems (such as
  Windows) we want to avoid running git-symbolic-ref on every rescan
  if we can do so.  A quick way to implement such an avoidance is to
  just read the HEAD ref ourselves; we'll either see it as a symref
  (starts with "ref: ") or we'll see it as a detached head (40 hex
  digits).  In either case we can treat that as our current branch.

Now I'm not sure how relevant this still is over 12 years later, but 
AFAIK a fork+exec is still very costly on Windows.

So I wonder whether we should manually check if HEAD is a symbolic link 
or we should just use git-symbolic-ref and hope the performance doesn't 
drop too much.
 
> And yes, Kato-san's fconfigure fix in this patch will still be
> relevant even after such a fix to the implementation of this proc.

Me and Jonathan (Cc) have been having a discussion [0] about whether 
hard-coding UTF-8 as the refname encoding is the right idea. The gist of 
it is that Git _technically_ allows refnames to be in other encodings as 
long as the strings are NULL terminated. It does not restrict itself to 
valid UTF-8 only. More details in the linked thread, of course.

My position is that we should default to UTF-8 given its popularity (at 
least in the Git world), but I'm wondering whether we should also add a 
config variable to allow users to configure their encodings.

If you don't mind, your thoughts on this would be appreciated :)

[0] https://github.com/prati0100/git-gui/pull/21
Junio C Hamano Dec. 9, 2019, 9:25 p.m. UTC | #3
Pratyush Yadav <me@yadavpratyush.com> writes:

> My position is that we should default to UTF-8 given its popularity (at 
> least in the Git world), but I'm wondering whether we should also add a 
> config variable to allow users to configure their encodings.
>
> If you don't mind, your thoughts on this would be appreciated :)
>
> [0] https://github.com/prati0100/git-gui/pull/21

Quite honestly, as long as we can keep the core part to stay
"pathnames and refnames are just runs of bytes, terminated by NUL,
and no other interpretation is necessary", I do not think I care too
deeply if the UI layers (either GUI or Porcelain) become a bit more
restrictive.

But that is just my gut feeling.
Pratyush Yadav Dec. 11, 2019, 4:05 p.m. UTC | #4
On 07/12/19 12:29AM, 加藤一博 wrote:
> After "git checkout -b '漢字'" to create a branch with UTF-8
> character in it, "git gui" shows the branch name incorrectly,
> as it forgets to turn the bytes
> read from the "git for-each-ref" and
> read from "HEAD" file
> into Unicode characters.
> 
> Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp>

Merged. Thanks.

Patch
diff mbox series

diff --git a/git-gui.sh b/git-gui.sh
index 0d21f56..8f4a9ae 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -684,6 +684,7 @@  proc load_current_branch {} {
 	global current_branch is_detached
 
 	set fd [open [gitdir HEAD] r]
+	fconfigure $fd -translation binary -encoding utf-8
 	if {[gets $fd ref] < 1} {
 		set ref {}
 	}
diff --git a/lib/branch.tcl b/lib/branch.tcl
index 777eeb7..8b0c485 100644
--- a/lib/branch.tcl
+++ b/lib/branch.tcl
@@ -8,6 +8,7 @@  proc load_all_heads {} {
 	set rh_len [expr {[string length $rh] + 1}]
 	set all_heads [list]
 	set fd [git_read for-each-ref --format=%(refname) $rh]
+	fconfigure $fd -translation binary -encoding utf-8
 	while {[gets $fd line] > 0} {
 		if {!$some_heads_tracking || ![is_tracking_branch $line]} {
 			lappend all_heads [string range $line $rh_len end]
@@ -24,6 +25,7 @@  proc load_all_tags {} {
 		--sort=-taggerdate \
 		--format=%(refname) \
 		refs/tags]
+	fconfigure $fd -translation binary -encoding utf-8
 	while {[gets $fd line] > 0} {
 		if {![regsub ^refs/tags/ $line {} name]} continue
 		lappend all_tags $name