Message ID | 20230523192949.1271671-3-calvinwan@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | git-compat-util cleanups | expand |
Calvin Wan <calvinwan@google.com> writes: > Splitting these macros from git-compat-util.h cleans up the file and > allows future third-party sources to not use these overrides if they do > not wish to. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > git-compat-util.h | 65 +------------------------------------------- > sane-ctype.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++ Any specific reason for the "sane-" prefix? I think it would make more sense if it was just named ctype.h: see below. > -/* in ctype.c, for kwset users */ > -extern const unsigned char tolower_trans_tbl[256]; I'm not sure what this has to do with sanity, but this is indeed defined in ctype.c, so it's easier to justify moving this out if the criterion was "what's in ctype.c" rather than "what's related to sane ctypes". > -extern const signed char hexval_table[256]; And this one has nothing to do with ctypes or sanity, but rather, what's considered to be a hex character. I think we need another patch to move this to hex.h. > -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1) Same for this one. With the above suggestions, I think we do get what we want - a split between things that make ctype more sane and between other compat stuff, and also a split between those two and more platform-agnostic things like what a hex character is.
Hi Calvin and Jonathan On 26/05/2023 22:19, Jonathan Tan wrote: > Calvin Wan <calvinwan@google.com> writes: >> Splitting these macros from git-compat-util.h cleans up the file and >> allows future third-party sources to not use these overrides if they do >> not wish to. I think splitting these out makes sense. If you have time to add a comment highlighting the differences from the standard library versions that would be really helpful. As far as I know the differences are - our versions are locale independent - our isspace() does not consider '\v' or '\f' to be space - our versions expect a signed or unsigned char rather than an int as their argument - our versions do not accept EOF - our versions do not require an explicit cast to avoid undefined behavior when passing char on systems where char is signed >> Signed-off-by: Calvin Wan <calvinwan@google.com> >> --- >> git-compat-util.h | 65 +------------------------------------------- >> sane-ctype.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++ > > Any specific reason for the "sane-" prefix? I think it would make more > sense if it was just named ctype.h: see below. I don't have a strong opinion either way but "sane-ctype.h" makes it clearer why we're doing something different to the standard library. >> -/* in ctype.c, for kwset users */ >> -extern const unsigned char tolower_trans_tbl[256]; > > I'm not sure what this has to do with sanity, but this is indeed defined > in ctype.c, so it's easier to justify moving this out if the criterion > was "what's in ctype.c" rather than "what's related to sane ctypes". There is only one user of this so it could be moved to diffcore-pickaxe.c or kwset.c where it would be more discoverable if anyone needs to add another user in the future. >> -extern const signed char hexval_table[256]; > > And this one has nothing to do with ctypes or sanity, but rather, what's > considered to be a hex character. I think we need another patch to move > this to hex.h. Isn't "what's considered to be a hex character" related to ctypes and sanity though as it is used by the "sane" definition of isxdigit(). >> -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1) > > Same for this one. I'd much rather keep all of our "sane" ctype replacements in one place as Calvin is proposing. isxdigit() is defined in <ctype.h> so it should be in our "sane" version of that header. Best Wishes Phillip > With the above suggestions, I think we do get what we want - a split > between things that make ctype more sane and between other compat stuff, > and also a split between those two and more platform-agnostic things > like what a hex character is.
Phillip Wood <phillip.wood123@gmail.com> writes: > > Any specific reason for the "sane-" prefix? I think it would make more > > sense if it was just named ctype.h: see below. > > I don't have a strong opinion either way but "sane-ctype.h" makes it > clearer why we're doing something different to the standard library. Ah...that's true. > >> -extern const signed char hexval_table[256]; > > > > And this one has nothing to do with ctypes or sanity, but rather, what's > > considered to be a hex character. I think we need another patch to move > > this to hex.h. > > Isn't "what's considered to be a hex character" related to ctypes and > sanity though as it is used by the "sane" definition of isxdigit(). > > >> -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1) > > > > Same for this one. > > I'd much rather keep all of our "sane" ctype replacements in one place > as Calvin is proposing. isxdigit() is defined in <ctype.h> so it should > be in our "sane" version of that header. Good point.
Jonathan Tan <jonathantanmy@google.com> writes: > Any specific reason for the "sane-" prefix? I think it would make more > sense if it was just named ctype.h: see below. I think it is a good idea to keep the sane prefix for two reasons that were described in 4546738b (Unlocalized isspace and friends, 2005-10-13): Do our own ctype.h, just to get the sane semantics: we want locale-independence, _and_ we want the right signed behaviour. Plus we only use a very small subset of ctype.h anyway (isspace, isalpha, isdigit and isalnum).
diff --git a/git-compat-util.h b/git-compat-util.h index f8e68baf29..9d3c21acbb 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1150,70 +1150,7 @@ static inline size_t xsize_t(off_t len) #define HOST_NAME_MAX 256 #endif -/* in ctype.c, for kwset users */ -extern const unsigned char tolower_trans_tbl[256]; - -/* Sane ctype - no locale, and works with signed chars */ -#undef isascii -#undef isspace -#undef isdigit -#undef isalpha -#undef isalnum -#undef isprint -#undef islower -#undef isupper -#undef tolower -#undef toupper -#undef iscntrl -#undef ispunct -#undef isxdigit - -extern const unsigned char sane_ctype[256]; -extern const signed char hexval_table[256]; -#define GIT_SPACE 0x01 -#define GIT_DIGIT 0x02 -#define GIT_ALPHA 0x04 -#define GIT_GLOB_SPECIAL 0x08 -#define GIT_REGEX_SPECIAL 0x10 -#define GIT_PATHSPEC_MAGIC 0x20 -#define GIT_CNTRL 0x40 -#define GIT_PUNCT 0x80 -#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) -#define isascii(x) (((x) & ~0x7f) == 0) -#define isspace(x) sane_istest(x,GIT_SPACE) -#define isdigit(x) sane_istest(x,GIT_DIGIT) -#define isalpha(x) sane_istest(x,GIT_ALPHA) -#define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT) -#define isprint(x) ((x) >= 0x20 && (x) <= 0x7e) -#define islower(x) sane_iscase(x, 1) -#define isupper(x) sane_iscase(x, 0) -#define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL) -#define is_regex_special(x) sane_istest(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL) -#define iscntrl(x) (sane_istest(x,GIT_CNTRL)) -#define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \ - GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC) -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1) -#define tolower(x) sane_case((unsigned char)(x), 0x20) -#define toupper(x) sane_case((unsigned char)(x), 0) -#define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC) - -static inline int sane_case(int x, int high) -{ - if (sane_istest(x, GIT_ALPHA)) - x = (x & ~0x20) | high; - return x; -} - -static inline int sane_iscase(int x, int is_lower) -{ - if (!sane_istest(x, GIT_ALPHA)) - return 0; - - if (is_lower) - return (x & 0x20) != 0; - else - return (x & 0x20) == 0; -} +#include "sane-ctype.h" /* * Like skip_prefix, but compare case-insensitively. Note that the comparison diff --git a/sane-ctype.h b/sane-ctype.h new file mode 100644 index 0000000000..df06bcbe14 --- /dev/null +++ b/sane-ctype.h @@ -0,0 +1,69 @@ +#ifndef SANE_CTYPE_H +#define SANE_CTYPE_H + +/* in ctype.c, for kwset users */ +extern const unsigned char tolower_trans_tbl[256]; + +/* Sane ctype - no locale, and works with signed chars */ +#undef isascii +#undef isspace +#undef isdigit +#undef isalpha +#undef isalnum +#undef isprint +#undef islower +#undef isupper +#undef tolower +#undef toupper +#undef iscntrl +#undef ispunct +#undef isxdigit + +extern const unsigned char sane_ctype[256]; +extern const signed char hexval_table[256]; +#define GIT_SPACE 0x01 +#define GIT_DIGIT 0x02 +#define GIT_ALPHA 0x04 +#define GIT_GLOB_SPECIAL 0x08 +#define GIT_REGEX_SPECIAL 0x10 +#define GIT_PATHSPEC_MAGIC 0x20 +#define GIT_CNTRL 0x40 +#define GIT_PUNCT 0x80 +#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) +#define isascii(x) (((x) & ~0x7f) == 0) +#define isspace(x) sane_istest(x,GIT_SPACE) +#define isdigit(x) sane_istest(x,GIT_DIGIT) +#define isalpha(x) sane_istest(x,GIT_ALPHA) +#define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT) +#define isprint(x) ((x) >= 0x20 && (x) <= 0x7e) +#define islower(x) sane_iscase(x, 1) +#define isupper(x) sane_iscase(x, 0) +#define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL) +#define is_regex_special(x) sane_istest(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL) +#define iscntrl(x) (sane_istest(x,GIT_CNTRL)) +#define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \ + GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC) +#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1) +#define tolower(x) sane_case((unsigned char)(x), 0x20) +#define toupper(x) sane_case((unsigned char)(x), 0) +#define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC) + +static inline int sane_case(int x, int high) +{ + if (sane_istest(x, GIT_ALPHA)) + x = (x & ~0x20) | high; + return x; +} + +static inline int sane_iscase(int x, int is_lower) +{ + if (!sane_istest(x, GIT_ALPHA)) + return 0; + + if (is_lower) + return (x & 0x20) != 0; + else + return (x & 0x20) == 0; +} + +#endif
Splitting these macros from git-compat-util.h cleans up the file and allows future third-party sources to not use these overrides if they do not wish to. Signed-off-by: Calvin Wan <calvinwan@google.com> --- git-compat-util.h | 65 +------------------------------------------- sane-ctype.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 64 deletions(-) create mode 100644 sane-ctype.h