Message ID | 20221008075509.26252-1-xdavidwuph@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | client: handle wide chars for table rows | expand |
Hi Pinghao, On Sat, 2022-10-08 at 15:55 +0800, Pinghao Wu wrote: > Try to find out printing width of wide chars using current locale and > its byte length before falling back to non-codepoint UTF-8 bytes > method. So IWD itself only supports utf-8 so I'm not sure adding support for wide chars in iwctl is really useful. This function is for printing table rows and any data in these rows comes from DBus, which will always be utf8. Unless you have some other use case I hadn't thought of? Thanks, James
The current non-codepoint UTF-8 method assumes that a codepoint occupies one column while printing. This is not true for some characters like Chinese characters, which occupies two columns. The motivation of this patch is to handle these "wide" (in printing columns) chars properly via wcwidth(). It's not actually about non-UTF-8, but it is also useful for that case.
On Tue, 2022-10-11 at 00:09 +0800, Pinghao Wu wrote: > The current non-codepoint UTF-8 method assumes that a codepoint > occupies > one column while printing. This is not true for some characters like > Chinese characters, which occupies two columns. Ah ok, in this case it may be best to combine you're changes with the non-codepoint check since you're changes are more complete for handling utf-8. I'll review the original patch. > > The motivation of this patch is to handle these "wide" (in printing > columns) chars properly via wcwidth(). It's not actually about > non-UTF-8, but it is also useful for that case.
On Sat, 2022-10-08 at 15:55 +0800, Pinghao Wu wrote: > Try to find out printing width of wide chars using current locale and > its byte length before falling back to non-codepoint UTF-8 bytes > method. > --- > client/display.c | 16 +++++++++++++++- > client/main.c | 3 +++ > 2 files changed, 18 insertions(+), 1 deletion(-) > > diff --git a/client/display.c b/client/display.c > index b729ad4c..dcbe11f8 100644 > --- a/client/display.c > +++ b/client/display.c > @@ -26,10 +26,13 @@ > > #define _GNU_SOURCE > #include <stdio.h> > +#include <stdlib.h> > +#include <string.h> > #include <signal.h> > #include <sys/stat.h> > #include <sys/ioctl.h> > #include <unistd.h> > +#include <wchar.h> > > #include <readline/history.h> > #include <readline/readline.h> > @@ -401,15 +404,26 @@ static char* next_line(char *s, unsigned int > *max, char **color_out) > unsigned int i; > int last_space = -1; > int last_color = -1; > + int s_len = strlen(s); > > /* Find the last space before 'max', as well as any color */ > - for (i = 0; i <= *max && s[i] != '\0'; i++) { > + for (i = 0; i <= *max && i != s_len; i++) { > if (s[i] == ' ') > last_space = i; > else if (s[i] == 0x1b) { > /* color escape won't count for column width > */ > *max += color_end(s + i); > last_color = i; > + /* Non-ASCII, try wchar */ > + } else if (s[i] & 0x80) { > + wchar_t w; > + int w_mblen; > + if ((w_mblen = mbtowc(&w, &s[i], s_len - i)) Instead of mbtowc() use l_utf8_get_codepoint() (an ELL utility). Since we assume the input is going to be utf-8 always. > > 0) { > + /* Compensate max bytes for wide char > */ > + *max += w_mblen - wcwidth(w); > + /* Skip other bytes for this wchar */ > + i += w_mblen - 1; > + } > /* Add width for non-codepoint UTF-8 bytes */ > } else if (((uint8_t)s[i] >> 6) == 2) > *max += 1; And you can go ahead and remove this if-clause entirely since the above case should now cover it. Plus the s[i] & 0x80 check above actually prevents this branch from ever being taken. > diff --git a/client/main.c b/client/main.c > index df5c0a61..7e8dead4 100644 > --- a/client/main.c > +++ b/client/main.c > @@ -25,6 +25,7 @@ > #endif > > #include <errno.h> > +#include <locale.h> > #include <signal.h> > #include <ell/ell.h> > > @@ -50,6 +51,8 @@ int main(int argc, char *argv[]) > { > bool all_done; > > + setlocale(LC_CTYPE, ""); > + > if (!l_main_init()) > return EXIT_FAILURE; > Thanks, James
diff --git a/client/display.c b/client/display.c index b729ad4c..dcbe11f8 100644 --- a/client/display.c +++ b/client/display.c @@ -26,10 +26,13 @@ #define _GNU_SOURCE #include <stdio.h> +#include <stdlib.h> +#include <string.h> #include <signal.h> #include <sys/stat.h> #include <sys/ioctl.h> #include <unistd.h> +#include <wchar.h> #include <readline/history.h> #include <readline/readline.h> @@ -401,15 +404,26 @@ static char* next_line(char *s, unsigned int *max, char **color_out) unsigned int i; int last_space = -1; int last_color = -1; + int s_len = strlen(s); /* Find the last space before 'max', as well as any color */ - for (i = 0; i <= *max && s[i] != '\0'; i++) { + for (i = 0; i <= *max && i != s_len; i++) { if (s[i] == ' ') last_space = i; else if (s[i] == 0x1b) { /* color escape won't count for column width */ *max += color_end(s + i); last_color = i; + /* Non-ASCII, try wchar */ + } else if (s[i] & 0x80) { + wchar_t w; + int w_mblen; + if ((w_mblen = mbtowc(&w, &s[i], s_len - i)) > 0) { + /* Compensate max bytes for wide char */ + *max += w_mblen - wcwidth(w); + /* Skip other bytes for this wchar */ + i += w_mblen - 1; + } /* Add width for non-codepoint UTF-8 bytes */ } else if (((uint8_t)s[i] >> 6) == 2) *max += 1; diff --git a/client/main.c b/client/main.c index df5c0a61..7e8dead4 100644 --- a/client/main.c +++ b/client/main.c @@ -25,6 +25,7 @@ #endif #include <errno.h> +#include <locale.h> #include <signal.h> #include <ell/ell.h> @@ -50,6 +51,8 @@ int main(int argc, char *argv[]) { bool all_done; + setlocale(LC_CTYPE, ""); + if (!l_main_init()) return EXIT_FAILURE;