diff mbox series

[v2,3/7] sane-ctype.h: move sane-ctype macros from git-compat-util.h

Message ID 20230523192949.1271671-3-calvinwan@google.com (mailing list archive)
State Superseded
Headers show
Series git-compat-util cleanups | expand

Commit Message

Calvin Wan May 23, 2023, 7:29 p.m. UTC
Splitting these macros from git-compat-util.h cleans up the file and
allows future third-party sources to not use these overrides if they do
not wish to.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 git-compat-util.h | 65 +-------------------------------------------
 sane-ctype.h      | 69 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 64 deletions(-)
 create mode 100644 sane-ctype.h

Comments

Jonathan Tan May 26, 2023, 9:19 p.m. UTC | #1
Calvin Wan <calvinwan@google.com> writes:
> Splitting these macros from git-compat-util.h cleans up the file and
> allows future third-party sources to not use these overrides if they do
> not wish to.
> 
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  git-compat-util.h | 65 +-------------------------------------------
>  sane-ctype.h      | 69 +++++++++++++++++++++++++++++++++++++++++++++++

Any specific reason for the "sane-" prefix? I think it would make more
sense if it was just named ctype.h: see below.

> -/* in ctype.c, for kwset users */
> -extern const unsigned char tolower_trans_tbl[256];

I'm not sure what this has to do with sanity, but this is indeed defined
in ctype.c, so it's easier to justify moving this out if the criterion
was "what's in ctype.c" rather than "what's related to sane ctypes".

> -extern const signed char hexval_table[256];

And this one has nothing to do with ctypes or sanity, but rather, what's
considered to be a hex character. I think we need another patch to move
this to hex.h.

> -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)

Same for this one.

With the above suggestions, I think we do get what we want - a split
between things that make ctype more sane and between other compat stuff,
and also a split between those two and more platform-agnostic things
like what a hex character is.
Phillip Wood May 28, 2023, 12:04 p.m. UTC | #2
Hi Calvin and Jonathan

On 26/05/2023 22:19, Jonathan Tan wrote:
> Calvin Wan <calvinwan@google.com> writes:
>> Splitting these macros from git-compat-util.h cleans up the file and
>> allows future third-party sources to not use these overrides if they do
>> not wish to.

I think splitting these out makes sense. If you have time to add a 
comment highlighting the differences from the standard library versions 
that would be really helpful. As far as I know the differences are

  - our versions are locale independent
  - our isspace() does not consider '\v' or '\f' to be space
  - our versions expect a signed or unsigned char rather than an int
    as their argument
  - our versions do not accept EOF
  - our versions do not require an explicit cast to avoid undefined
    behavior when passing char on systems where char is signed

>> Signed-off-by: Calvin Wan <calvinwan@google.com>
>> ---
>>   git-compat-util.h | 65 +-------------------------------------------
>>   sane-ctype.h      | 69 +++++++++++++++++++++++++++++++++++++++++++++++
> 
> Any specific reason for the "sane-" prefix? I think it would make more
> sense if it was just named ctype.h: see below.

I don't have a strong opinion either way but "sane-ctype.h" makes it 
clearer why we're doing something different to the standard library.

>> -/* in ctype.c, for kwset users */
>> -extern const unsigned char tolower_trans_tbl[256];
> 
> I'm not sure what this has to do with sanity, but this is indeed defined
> in ctype.c, so it's easier to justify moving this out if the criterion
> was "what's in ctype.c" rather than "what's related to sane ctypes".

There is only one user of this so it could be moved to 
diffcore-pickaxe.c or kwset.c where it would be more discoverable if 
anyone needs to add another user in the future.

>> -extern const signed char hexval_table[256];
> 
> And this one has nothing to do with ctypes or sanity, but rather, what's
> considered to be a hex character. I think we need another patch to move
> this to hex.h.

Isn't "what's considered  to be a hex character" related to ctypes and 
sanity though as it is used by the "sane" definition of isxdigit().

>> -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
> 
> Same for this one.

I'd much rather keep all of our "sane" ctype replacements in one place 
as Calvin is proposing. isxdigit() is defined in <ctype.h> so it should 
be in our "sane" version of that header.

Best Wishes

Phillip

> With the above suggestions, I think we do get what we want - a split
> between things that make ctype more sane and between other compat stuff,
> and also a split between those two and more platform-agnostic things
> like what a hex character is.
Jonathan Tan May 30, 2023, 5:29 p.m. UTC | #3
Phillip Wood <phillip.wood123@gmail.com> writes:
> > Any specific reason for the "sane-" prefix? I think it would make more
> > sense if it was just named ctype.h: see below.
> 
> I don't have a strong opinion either way but "sane-ctype.h" makes it 
> clearer why we're doing something different to the standard library.

Ah...that's true.

> >> -extern const signed char hexval_table[256];
> > 
> > And this one has nothing to do with ctypes or sanity, but rather, what's
> > considered to be a hex character. I think we need another patch to move
> > this to hex.h.
> 
> Isn't "what's considered  to be a hex character" related to ctypes and 
> sanity though as it is used by the "sane" definition of isxdigit().
> 
> >> -#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
> > 
> > Same for this one.
> 
> I'd much rather keep all of our "sane" ctype replacements in one place 
> as Calvin is proposing. isxdigit() is defined in <ctype.h> so it should 
> be in our "sane" version of that header.

Good point.
Junio C Hamano June 1, 2023, 4:12 a.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

> Any specific reason for the "sane-" prefix? I think it would make more
> sense if it was just named ctype.h: see below.

I think it is a good idea to keep the sane prefix for two reasons
that were described in 4546738b (Unlocalized isspace and friends,
2005-10-13):

    Do our own ctype.h, just to get the sane semantics: we want
    locale-independence, _and_ we want the right signed behaviour. Plus we
    only use a very small subset of ctype.h anyway (isspace, isalpha,
    isdigit and isalnum).
diff mbox series

Patch

diff --git a/git-compat-util.h b/git-compat-util.h
index f8e68baf29..9d3c21acbb 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1150,70 +1150,7 @@  static inline size_t xsize_t(off_t len)
 #define HOST_NAME_MAX 256
 #endif
 
-/* in ctype.c, for kwset users */
-extern const unsigned char tolower_trans_tbl[256];
-
-/* Sane ctype - no locale, and works with signed chars */
-#undef isascii
-#undef isspace
-#undef isdigit
-#undef isalpha
-#undef isalnum
-#undef isprint
-#undef islower
-#undef isupper
-#undef tolower
-#undef toupper
-#undef iscntrl
-#undef ispunct
-#undef isxdigit
-
-extern const unsigned char sane_ctype[256];
-extern const signed char hexval_table[256];
-#define GIT_SPACE 0x01
-#define GIT_DIGIT 0x02
-#define GIT_ALPHA 0x04
-#define GIT_GLOB_SPECIAL 0x08
-#define GIT_REGEX_SPECIAL 0x10
-#define GIT_PATHSPEC_MAGIC 0x20
-#define GIT_CNTRL 0x40
-#define GIT_PUNCT 0x80
-#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
-#define isascii(x) (((x) & ~0x7f) == 0)
-#define isspace(x) sane_istest(x,GIT_SPACE)
-#define isdigit(x) sane_istest(x,GIT_DIGIT)
-#define isalpha(x) sane_istest(x,GIT_ALPHA)
-#define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
-#define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
-#define islower(x) sane_iscase(x, 1)
-#define isupper(x) sane_iscase(x, 0)
-#define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
-#define is_regex_special(x) sane_istest(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL)
-#define iscntrl(x) (sane_istest(x,GIT_CNTRL))
-#define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \
-		GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC)
-#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
-#define tolower(x) sane_case((unsigned char)(x), 0x20)
-#define toupper(x) sane_case((unsigned char)(x), 0)
-#define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
-
-static inline int sane_case(int x, int high)
-{
-	if (sane_istest(x, GIT_ALPHA))
-		x = (x & ~0x20) | high;
-	return x;
-}
-
-static inline int sane_iscase(int x, int is_lower)
-{
-	if (!sane_istest(x, GIT_ALPHA))
-		return 0;
-
-	if (is_lower)
-		return (x & 0x20) != 0;
-	else
-		return (x & 0x20) == 0;
-}
+#include "sane-ctype.h"
 
 /*
  * Like skip_prefix, but compare case-insensitively. Note that the comparison
diff --git a/sane-ctype.h b/sane-ctype.h
new file mode 100644
index 0000000000..df06bcbe14
--- /dev/null
+++ b/sane-ctype.h
@@ -0,0 +1,69 @@ 
+#ifndef SANE_CTYPE_H
+#define SANE_CTYPE_H
+
+/* in ctype.c, for kwset users */
+extern const unsigned char tolower_trans_tbl[256];
+
+/* Sane ctype - no locale, and works with signed chars */
+#undef isascii
+#undef isspace
+#undef isdigit
+#undef isalpha
+#undef isalnum
+#undef isprint
+#undef islower
+#undef isupper
+#undef tolower
+#undef toupper
+#undef iscntrl
+#undef ispunct
+#undef isxdigit
+
+extern const unsigned char sane_ctype[256];
+extern const signed char hexval_table[256];
+#define GIT_SPACE 0x01
+#define GIT_DIGIT 0x02
+#define GIT_ALPHA 0x04
+#define GIT_GLOB_SPECIAL 0x08
+#define GIT_REGEX_SPECIAL 0x10
+#define GIT_PATHSPEC_MAGIC 0x20
+#define GIT_CNTRL 0x40
+#define GIT_PUNCT 0x80
+#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
+#define isascii(x) (((x) & ~0x7f) == 0)
+#define isspace(x) sane_istest(x,GIT_SPACE)
+#define isdigit(x) sane_istest(x,GIT_DIGIT)
+#define isalpha(x) sane_istest(x,GIT_ALPHA)
+#define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
+#define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
+#define islower(x) sane_iscase(x, 1)
+#define isupper(x) sane_iscase(x, 0)
+#define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
+#define is_regex_special(x) sane_istest(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL)
+#define iscntrl(x) (sane_istest(x,GIT_CNTRL))
+#define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \
+		GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC)
+#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
+#define tolower(x) sane_case((unsigned char)(x), 0x20)
+#define toupper(x) sane_case((unsigned char)(x), 0)
+#define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
+
+static inline int sane_case(int x, int high)
+{
+	if (sane_istest(x, GIT_ALPHA))
+		x = (x & ~0x20) | high;
+	return x;
+}
+
+static inline int sane_iscase(int x, int is_lower)
+{
+	if (!sane_istest(x, GIT_ALPHA))
+		return 0;
+
+	if (is_lower)
+		return (x & 0x20) != 0;
+	else
+		return (x & 0x20) == 0;
+}
+
+#endif