diff mbox series

[2/2] fix: when resolving merge conflicts, japanese file names become garbled.

Message ID c698805f088e0643e5faf027d4eaa6de14d6c1ff.1739918546.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series gitk: Fixing file name encoding issues. | expand

Commit Message

Kazuhiro Kato Feb. 18, 2025, 10:42 p.m. UTC
From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>

Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>
---
 gitk-git/gitk | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Feb. 19, 2025, 6:11 p.m. UTC | #1
"Kazuhiro Kato via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>

Here is a place to give a bit more context.  In what way the current
code is wrong, what end-user visible symptoms are brought due to
that wrongness, what is the correct way to implement it, etc.

> Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>
> ---
>  gitk-git/gitk | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gitk-git/gitk b/gitk-git/gitk
> index 88951ed2384..f4f8dbd5fad 100755
> --- a/gitk-git/gitk
> +++ b/gitk-git/gitk
> @@ -8205,12 +8205,13 @@ proc parseblobdiffline {ids line} {
>  
>          if {$type eq "--cc"} {
>              # start of a new file in a merge diff
> -            set fname [string range $line 10 end]
> +            set fname_raw [string range $line 10 end]
> +            set fname [encoding convertfrom $fname_raw]

Is this "the Tcl read from git things as sequence of bytes, not
characters, so somebody needs to pass the bytes to "encoding"
function to turn them into a sequence of characters?  Unless
everything is US-ASCII, that is.

If that is the case, presumably the $line has a sequence of bytes,
so it may be wrong to chop it at 10th position (presumably that's
10th byte, not 10th character) when we are trying to teach the code
to deal with non-ASCII data, no?  

I am reasonably sure that [string length "diff --git"] is where 10
comes from, and that prefix will always be in ASCII, but it feels
safer and kosher if we converted the whole line first and then
chopped off the prefix.

The patch title says Japanese, but I would imagine this applies to
anything non-ASCII, so it would be better to retitle the patch to
say "non-ASCII" instead to signal that the issue the patch fixes
applies more widely.

Thanks.
diff mbox series

Patch

diff --git a/gitk-git/gitk b/gitk-git/gitk
index 88951ed2384..f4f8dbd5fad 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -8205,12 +8205,13 @@  proc parseblobdiffline {ids line} {
 
         if {$type eq "--cc"} {
             # start of a new file in a merge diff
-            set fname [string range $line 10 end]
+            set fname_raw [string range $line 10 end]
+            set fname [encoding convertfrom $fname_raw]
             if {[lsearch -exact $treediffs($ids) $fname] < 0} {
                 lappend treediffs($ids) $fname
                 add_flist [list $fname]
             }
-
+            set fname $fname_raw
         } else {
             set line [string range $line 11 end]
             # If the name hasn't changed the length will be odd,
@@ -8310,6 +8311,7 @@  proc parseblobdiffline {ids line} {
             set diffinhdr 0
             return
         }
+        set line [encoding convertfrom $line]
         $ctext insert end "$line\n" filesep
 
     } else {