Message ID | 20191207002842.32208-1-kato-k@ksysllc.co.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | git gui: fix branch name encoding error on git gui | expand |
加藤一博 <kato-k@ksysllc.co.jp> writes: > After "git checkout -b '漢字'" to create a branch with UTF-8 > character in it, "git gui" shows the branch name incorrectly, > as it forgets to turn the bytes > read from the "git for-each-ref" and > read from "HEAD" file > into Unicode characters. Thanks. Note to the git-gui mentainer. The above may want to be line-wrapped a bit. > Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp> > --- > git-gui.sh | 1 + > lib/branch.tcl | 2 ++ > 2 files changed, 3 insertions(+) > > diff --git a/git-gui.sh b/git-gui.sh > index 0d21f56..8f4a9ae 100755 > --- a/git-gui.sh > +++ b/git-gui.sh > @@ -684,6 +684,7 @@ proc load_current_branch {} { > global current_branch is_detached > > set fd [open [gitdir HEAD] r] > + fconfigure $fd -translation binary -encoding utf-8 > if {[gets $fd ref] < 1} { > set ref {} > } A comment totally outside the scope of this fix to anybody interested in further working on this code. This piece of code is way too intimate with the implementation details of HEAD and yet not intimate enough to know that HEAD can be a symlink (in other words, it is a poor imitation of the real logic implemented in git core). A kosher way to implement this would be to call git symbolic-ref --quiet --short HEAD which would succeed and give the branch name to its standard output, or would fail when the head is detached. Set "current_branch" and "is_detached" according to the outcome. And yes, Kato-san's fconfigure fix in this patch will still be relevant even after such a fix to the implementation of this proc. > diff --git a/lib/branch.tcl b/lib/branch.tcl > index 777eeb7..8b0c485 100644 > --- a/lib/branch.tcl > +++ b/lib/branch.tcl > @@ -8,6 +8,7 @@ proc load_all_heads {} { > set rh_len [expr {[string length $rh] + 1}] > set all_heads [list] > set fd [git_read for-each-ref --format=%(refname) $rh] > + fconfigure $fd -translation binary -encoding utf-8 > while {[gets $fd line] > 0} { > if {!$some_heads_tracking || ![is_tracking_branch $line]} { > lappend all_heads [string range $line $rh_len end] > @@ -24,6 +25,7 @@ proc load_all_tags {} { > --sort=-taggerdate \ > --format=%(refname) \ > refs/tags] > + fconfigure $fd -translation binary -encoding utf-8 > while {[gets $fd line] > 0} { > if {![regsub ^refs/tags/ $line {} name]} continue > lappend all_tags $name
On 09/12/19 11:15AM, Junio C Hamano wrote: > 加藤一博 <kato-k@ksysllc.co.jp> writes: > > > After "git checkout -b '漢字'" to create a branch with UTF-8 > > character in it, "git gui" shows the branch name incorrectly, > > as it forgets to turn the bytes > > read from the "git for-each-ref" and > > read from "HEAD" file > > into Unicode characters. > > Thanks. > > Note to the git-gui mentainer. The above may want to be > line-wrapped a bit. Thanks. Already done :) > > Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp> > > --- > > git-gui.sh | 1 + > > lib/branch.tcl | 2 ++ > > 2 files changed, 3 insertions(+) > > > > diff --git a/git-gui.sh b/git-gui.sh > > index 0d21f56..8f4a9ae 100755 > > --- a/git-gui.sh > > +++ b/git-gui.sh > > @@ -684,6 +684,7 @@ proc load_current_branch {} { > > global current_branch is_detached > > > > set fd [open [gitdir HEAD] r] > > + fconfigure $fd -translation binary -encoding utf-8 > > if {[gets $fd ref] < 1} { > > set ref {} > > } > > A comment totally outside the scope of this fix to anybody > interested in further working on this code. > > This piece of code is way too intimate with the implementation > details of HEAD and yet not intimate enough to know that HEAD can be > a symlink (in other words, it is a poor imitation of the real logic > implemented in git core). A kosher way to implement this would be > to call > > git symbolic-ref --quiet --short HEAD > > which would succeed and give the branch name to its standard output, > or would fail when the head is detached. Set "current_branch" and > "is_detached" according to the outcome. It was introduced in fc4e8da (git-gui: Internalize symbolic-ref HEAD reading logic, 2007-05-30). The commit message is: To improve performance on fork+exec impoverished systems (such as Windows) we want to avoid running git-symbolic-ref on every rescan if we can do so. A quick way to implement such an avoidance is to just read the HEAD ref ourselves; we'll either see it as a symref (starts with "ref: ") or we'll see it as a detached head (40 hex digits). In either case we can treat that as our current branch. Now I'm not sure how relevant this still is over 12 years later, but AFAIK a fork+exec is still very costly on Windows. So I wonder whether we should manually check if HEAD is a symbolic link or we should just use git-symbolic-ref and hope the performance doesn't drop too much. > And yes, Kato-san's fconfigure fix in this patch will still be > relevant even after such a fix to the implementation of this proc. Me and Jonathan (Cc) have been having a discussion [0] about whether hard-coding UTF-8 as the refname encoding is the right idea. The gist of it is that Git _technically_ allows refnames to be in other encodings as long as the strings are NULL terminated. It does not restrict itself to valid UTF-8 only. More details in the linked thread, of course. My position is that we should default to UTF-8 given its popularity (at least in the Git world), but I'm wondering whether we should also add a config variable to allow users to configure their encodings. If you don't mind, your thoughts on this would be appreciated :) [0] https://github.com/prati0100/git-gui/pull/21
Pratyush Yadav <me@yadavpratyush.com> writes: > My position is that we should default to UTF-8 given its popularity (at > least in the Git world), but I'm wondering whether we should also add a > config variable to allow users to configure their encodings. > > If you don't mind, your thoughts on this would be appreciated :) > > [0] https://github.com/prati0100/git-gui/pull/21 Quite honestly, as long as we can keep the core part to stay "pathnames and refnames are just runs of bytes, terminated by NUL, and no other interpretation is necessary", I do not think I care too deeply if the UI layers (either GUI or Porcelain) become a bit more restrictive. But that is just my gut feeling.
On 07/12/19 12:29AM, 加藤一博 wrote: > After "git checkout -b '漢字'" to create a branch with UTF-8 > character in it, "git gui" shows the branch name incorrectly, > as it forgets to turn the bytes > read from the "git for-each-ref" and > read from "HEAD" file > into Unicode characters. > > Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp> Merged. Thanks.
diff --git a/git-gui.sh b/git-gui.sh index 0d21f56..8f4a9ae 100755 --- a/git-gui.sh +++ b/git-gui.sh @@ -684,6 +684,7 @@ proc load_current_branch {} { global current_branch is_detached set fd [open [gitdir HEAD] r] + fconfigure $fd -translation binary -encoding utf-8 if {[gets $fd ref] < 1} { set ref {} } diff --git a/lib/branch.tcl b/lib/branch.tcl index 777eeb7..8b0c485 100644 --- a/lib/branch.tcl +++ b/lib/branch.tcl @@ -8,6 +8,7 @@ proc load_all_heads {} { set rh_len [expr {[string length $rh] + 1}] set all_heads [list] set fd [git_read for-each-ref --format=%(refname) $rh] + fconfigure $fd -translation binary -encoding utf-8 while {[gets $fd line] > 0} { if {!$some_heads_tracking || ![is_tracking_branch $line]} { lappend all_heads [string range $line $rh_len end] @@ -24,6 +25,7 @@ proc load_all_tags {} { --sort=-taggerdate \ --format=%(refname) \ refs/tags] + fconfigure $fd -translation binary -encoding utf-8 while {[gets $fd line] > 0} { if {![regsub ^refs/tags/ $line {} name]} continue lappend all_tags $name
After "git checkout -b '漢字'" to create a branch with UTF-8 character in it, "git gui" shows the branch name incorrectly, as it forgets to turn the bytes read from the "git for-each-ref" and read from "HEAD" file into Unicode characters. Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp> --- git-gui.sh | 1 + lib/branch.tcl | 2 ++ 2 files changed, 3 insertions(+)