diff mbox series

[PATCHv2,1/4] add-patch: Fix type missmatch rom msvc

Message ID 20250106190855.3098-2-soekkle@freenet.de (mailing list archive)
State New
Headers show
Series Fixes typemissmatch warinigs from msvc | expand

Commit Message

Sören Krecker Jan. 6, 2025, 7:08 p.m. UTC
Fix some compiler warings from msvw in add-patch.c for value truncation
form 64 bit to 32 bit integers.Change unsigned long to size_t for
correct variable size on linux and windows.
Add macro strtos for converting a string to size_t.
Test if convertion fails with over or underflow.

Signed-off-by: Sören Krecker <soekkle@freenet.de>

Uses strtouq

impove linux support

Change Macro name
---
 add-patch.c       | 53 +++++++++++++++++++++++++++--------------------
 gettext.h         |  2 +-
 git-compat-util.h |  6 ++++++
 3 files changed, 38 insertions(+), 23 deletions(-)

Comments

brian m. carlson Jan. 7, 2025, 12:13 a.m. UTC | #1
On 2025-01-06 at 19:08:52, Sören Krecker wrote:
> Fix some compiler warings from msvw in add-patch.c for value truncation
> form 64 bit to 32 bit integers.Change unsigned long to size_t for
> correct variable size on linux and windows.
> Add macro strtos for converting a string to size_t.
> Test if convertion fails with over or underflow.

A few minor nits here.  We want to say "from" both here and in the title
and "conversion" (and in the title, "mismatch"), and put a space after
the period in a sentence.  I think you meant "MSVC" instead of "msvw",
but if not, please do explain what that is, since I'm not familiar with
it and I'm curious.  The commit message is a good place to explain lots
in detail.

> Signed-off-by: Sören Krecker <soekkle@freenet.de>
> 
> Uses strtouq

I don't see that we're using this function.

> impove linux support
> 
> Change Macro name

We don't typically put comments about the revisions we've made to a
patch in the commit message.  We may put them below the --- so that
they're visible to readers and reviewers, which is helpful, but we
pretend that our patches were perfect to begin with in terms of the
commit message, since the future reader of the history only cares about
the actual end result and not what changes we made along the way.

> ---
>  add-patch.c       | 53 +++++++++++++++++++++++++++--------------------
>  gettext.h         |  2 +-
>  git-compat-util.h |  6 ++++++
>  3 files changed, 38 insertions(+), 23 deletions(-)
> 
> diff --git a/add-patch.c b/add-patch.c
> index 7b598e14df..67a7f68d23 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -242,7 +242,7 @@ static struct patch_mode patch_mode_worktree_nothead = {
>  };
>  
>  struct hunk_header {
> -	unsigned long old_offset, old_count, new_offset, new_count;
> +	size_t old_offset, old_count, new_offset, new_count;
>  	/*
>  	 * Start/end offsets to the extra text after the second `@@` in the
>  	 * hunk header, e.g. the function signature. This is expected to
> @@ -322,11 +322,12 @@ static void setup_child_process(struct add_p_state *s,
>  }
>  
>  static int parse_range(const char **p,
> -		       unsigned long *offset, unsigned long *count)
> +		       size_t *offset, size_t *count)
>  {
>  	char *pend;
> -
> -	*offset = strtoul(*p, &pend, 10);
> +	*offset = strtos(*p, &pend, 10);

I see you've defined this below.

> +	if (errno == ERANGE)
> +		return error("Number dose not fit datatype");

I think the word you wanted was "does".  However, perhaps we should
provide a better, more meaningful error message so the user knows what
data they provided that was invalid.  Maybe "absurdly large value in
diff header range"?  It would be quite bizarre to get a value even as
large as the maximum value of a 32-bit integer, and I don't think our
diff code can even handle values larger than INT_MAX.

In that context, it might not even be necessary to handle values larger
than unsigned long, since we can't generate them.  However, in the
interests of compatibility with other implementations which might not
have that limitation, size_t seems reasonable as a choice to handle more
generally.

Assuming we keep this, we probably also want to mark this for
translation by wrapping it in `_(` and `)`.

I also don't think this order is correct.  In general, errno is not
reset implicitly, so unless we know that an error occurred, errno is
meaningless, since another function could have set it to ERANGE.  We'd
probably need to save errno, set it to 0, and restore to verify that we
got the right value, since we can't distinguish here between a truncated
value for range reasons and for other reasons.

>  	if (pend == *p)
>  		return -1;
>
>  	if (*pend != ',') {
> @@ -334,7 +335,9 @@ static int parse_range(const char **p,
>  		*p = pend;
>  		return 0;
>  	}
> -	*count = strtoul(pend + 1, (char **)p, 10);
> +	*count = strtos(pend + 1, (char **)p, 10);
> +	if (errno == ERANGE)
> +		return error("Number dose not fit datatype");

Same comment here.

>  	return *p == pend + 1 ? -1 : 0;
>  }
>  
> @@ -673,8 +676,8 @@ static void render_hunk(struct add_p_state *s, struct hunk *hunk,
>  		 */
>  		const char *p;
>  		size_t len;
> -		unsigned long old_offset = header->old_offset;
> -		unsigned long new_offset = header->new_offset;
> +		size_t old_offset = header->old_offset;
> +		size_t new_offset = header->new_offset;
>  
>  		if (!colored) {
>  			p = s->plain.buf + header->extra_start;
> @@ -700,12 +703,14 @@ static void render_hunk(struct add_p_state *s, struct hunk *hunk,
>  		else
>  			new_offset += delta;
>  
> -		strbuf_addf(out, "@@ -%lu", old_offset);
> +		strbuf_addf(out, "@@ -%" PRIuMAX, (uintmax_t)old_offset);
>  		if (header->old_count != 1)
> -			strbuf_addf(out, ",%lu", header->old_count);
> -		strbuf_addf(out, " +%lu", new_offset);
> +			strbuf_addf(out, ",%" PRIuMAX,
> +				    (uintmax_t)header->old_count);
> +		strbuf_addf(out, " +%" PRIuMAX, (uintmax_t)new_offset);

If we're using size_t, we can use %zu.  That's specified in C99 as the
appropriate formatting type for size_t, and we require C99 or C11 for
all systems.  We don't need to cast to uintmax_t.

> diff --git a/gettext.h b/gettext.h
> index 484cafa562..d36f5a7ade 100644
> --- a/gettext.h
> +++ b/gettext.h
> @@ -53,7 +53,7 @@ static inline FORMAT_PRESERVING(1) const char *_(const char *msgid)
>  }
>  
>  static inline FORMAT_PRESERVING(1) FORMAT_PRESERVING(2)
> -const char *Q_(const char *msgid, const char *plu, unsigned long n)
> +const char *Q_(const char *msgid, const char *plu, size_t n)
>  {
>  	if (!git_gettext_enabled)
>  		return n == 1 ? msgid : plu;
> diff --git a/git-compat-util.h b/git-compat-util.h
> index e283c46c6f..4c33990a05 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -291,6 +291,12 @@ static inline int _have_unix_sockets(void)
>  #ifdef HAVE_BSD_SYSCTL
>  #include <sys/sysctl.h>
>  #endif
> +#if defined _WIN64
> +# define strtos strtoull
> +#else
> +#define strtos strtoul
> +#endif

This is not a great name for the function.  First of all, it resembles
the standard functions a lot, so it's something that POSIX could
standardize or an OS could add, and then we'll have some fun compilation
errors when we redefine things.

Second, it's a lot less future-proof.  While I do agree that only
Windows 64-bit systems are likely to fall into this case, since we
already include <limits.h>, we probably should do this:

  #if SIZE_MAX == ULONG_MAX
  #define str_to_size_t strtoul
  #else
  #define str_to_size_t strtoull
  #endif

(or whatever you want to call the function).

That expresses what we care about—that the type is suitable for the
value we want to parse—and doesn't use the OS as a proxy for that data.
Otherwise, the Unix developer who doesn't use Windows may not
understand _why_ Windows is special and the reason we've chosen this
change.

On that note, it would be helpful if you explained in the commit message
why that is for people who don't know.  Maybe something like this:

  On 64-bit systems, size_t is a 64-bit type.  On most Unix systems,
  unsigned long is also 64 bits in size, so we can use functions for
  that type to parse values of size_t.  However, on Windows, unsigned
  long is always 32 bits, and if we want a 64-bit type, we must use
  unsigned long long.  To future-proof our changes against other
  platforms that might be added in the future, we first check if
  unsigned long is sufficient, and otherwise, use unsigned long long,
  which will work in both cases.

Of course, please feel free to edit as you see fit.
Junio C Hamano Jan. 7, 2025, 12:26 a.m. UTC | #2
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We don't typically put comments about the revisions we've made to a
> patch in the commit message.  We may put them below the --- so that
> they're visible to readers and reviewers, which is helpful, but we
> pretend that our patches were perfect to begin with in terms of the
> commit message, since the future reader of the history only cares about
> the actual end result and not what changes we made along the way.

Thanks, all true.  The future readers would only *see* the end
results, and we do not want to hear about the previous stumblings
the author made before reaching an acceptable version.

>> ---
>>  add-patch.c       | 53 +++++++++++++++++++++++++++--------------------
>>  gettext.h         |  2 +-
>>  git-compat-util.h |  6 ++++++
>>  3 files changed, 38 insertions(+), 23 deletions(-)

I already made this comment, but I think the offset/count being
ulong is a very sane design decision, and what is causing the
compiler warning is some earlier change that introduced size_t
variables or parameters in the callchain.  As far as I can tell,
there is no system functions that yields size_t (hence we must use
size_t everywhere) in the code paths that deal with offset and
count.  I suggested to find these abused size_t and fix them to use
the matching type, i.e. "unsigned long", instead, as an alternative
fix.  I did not get an impression that the author tried the approach
and found why we must use size_t for offset/count instead.

And if we go that route, there is *no* need to talk about 64-bit ve
32-bit platforms.  ulong used consistently everywhere would let you
use offset/count that fits in ulong, and the apply machinery is
artificially limited to limit the patch size to a few gigabytes, so
32-bit ulong should be plenty as Phillip pointed out earlier.

If we needed to parse an integer into a large integer, the existing
code seem to use strtoumax() into uintmax_t and move it to the
target (while checking for truncation).  "Ah we are on windows, so
use strtoll, otherwise use strtol" is not something we want to see
in our codebase.

> If we're using size_t, we can use %zu.  That's specified in C99 as the
> appropriate formatting type for size_t, and we require C99 or C11 for
> all systems.  We don't need to cast to uintmax_t.

You and Documentation/CodingGuidelines contradict with each other
here.

Thanks for a review.
Junio C Hamano Jan. 7, 2025, 12:53 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

>> If we're using size_t, we can use %zu.  That's specified in C99 as the
>> appropriate formatting type for size_t, and we require C99 or C11 for
>> all systems.  We don't need to cast to uintmax_t.
>
> You and Documentation/CodingGuidelines contradict with each other
> here.

By this, I do not necessarily mean that we should stick to the past
tradition since d7d850e2 (CodingGuidelines: mention C99 features we
can't use, 2022-10-10), written back when MSVC was claiming to do
C99 without letting us use %z conversion.

What I meant was that if we are to update our stance against %z
conversion after re-evaluating the situation (and such time will
certainly come someday---I do not offhand know if it can be today),
we should update the documentation before or at least at the same
time we recommend its use to new people.

Thanks.
diff mbox series

Patch

diff --git a/add-patch.c b/add-patch.c
index 7b598e14df..67a7f68d23 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -242,7 +242,7 @@  static struct patch_mode patch_mode_worktree_nothead = {
 };
 
 struct hunk_header {
-	unsigned long old_offset, old_count, new_offset, new_count;
+	size_t old_offset, old_count, new_offset, new_count;
 	/*
 	 * Start/end offsets to the extra text after the second `@@` in the
 	 * hunk header, e.g. the function signature. This is expected to
@@ -322,11 +322,12 @@  static void setup_child_process(struct add_p_state *s,
 }
 
 static int parse_range(const char **p,
-		       unsigned long *offset, unsigned long *count)
+		       size_t *offset, size_t *count)
 {
 	char *pend;
-
-	*offset = strtoul(*p, &pend, 10);
+	*offset = strtos(*p, &pend, 10);
+	if (errno == ERANGE)
+		return error("Number dose not fit datatype");
 	if (pend == *p)
 		return -1;
 	if (*pend != ',') {
@@ -334,7 +335,9 @@  static int parse_range(const char **p,
 		*p = pend;
 		return 0;
 	}
-	*count = strtoul(pend + 1, (char **)p, 10);
+	*count = strtos(pend + 1, (char **)p, 10);
+	if (errno == ERANGE)
+		return error("Number dose not fit datatype");
 	return *p == pend + 1 ? -1 : 0;
 }
 
@@ -673,8 +676,8 @@  static void render_hunk(struct add_p_state *s, struct hunk *hunk,
 		 */
 		const char *p;
 		size_t len;
-		unsigned long old_offset = header->old_offset;
-		unsigned long new_offset = header->new_offset;
+		size_t old_offset = header->old_offset;
+		size_t new_offset = header->new_offset;
 
 		if (!colored) {
 			p = s->plain.buf + header->extra_start;
@@ -700,12 +703,14 @@  static void render_hunk(struct add_p_state *s, struct hunk *hunk,
 		else
 			new_offset += delta;
 
-		strbuf_addf(out, "@@ -%lu", old_offset);
+		strbuf_addf(out, "@@ -%" PRIuMAX, (uintmax_t)old_offset);
 		if (header->old_count != 1)
-			strbuf_addf(out, ",%lu", header->old_count);
-		strbuf_addf(out, " +%lu", new_offset);
+			strbuf_addf(out, ",%" PRIuMAX,
+				    (uintmax_t)header->old_count);
+		strbuf_addf(out, " +%" PRIuMAX, (uintmax_t)new_offset);
 		if (header->new_count != 1)
-			strbuf_addf(out, ",%lu", header->new_count);
+			strbuf_addf(out, ",%" PRIuMAX,
+				    (uintmax_t)header->new_count);
 		strbuf_addstr(out, " @@");
 
 		if (len)
@@ -1066,11 +1071,13 @@  static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
 
 	/* last hunk simply gets the rest */
 	if (header->old_offset != remaining.old_offset)
-		BUG("miscounted old_offset: %lu != %lu",
-		    header->old_offset, remaining.old_offset);
+		BUG("miscounted old_offset: %"PRIuMAX" != %"PRIuMAX,
+		    (uintmax_t)header->old_offset,
+		    (uintmax_t)remaining.old_offset);
 	if (header->new_offset != remaining.new_offset)
-		BUG("miscounted new_offset: %lu != %lu",
-		    header->new_offset, remaining.new_offset);
+		BUG("miscounted new_offset: %"PRIuMAX" != %"PRIuMAX,
+		    (uintmax_t)header->new_offset,
+		    (uintmax_t)remaining.new_offset);
 	header->old_count = remaining.old_count;
 	header->new_count = remaining.new_count;
 	hunk->end = end;
@@ -1354,9 +1361,10 @@  static void summarize_hunk(struct add_p_state *s, struct hunk *hunk,
 	struct strbuf *plain = &s->plain;
 	size_t len = out->len, i;
 
-	strbuf_addf(out, " -%lu,%lu +%lu,%lu ",
-		    header->old_offset, header->old_count,
-		    header->new_offset, header->new_count);
+	strbuf_addf(out,
+		    " -%"PRIuMAX",%"PRIuMAX" +%"PRIuMAX",%"PRIuMAX" ",
+		    (uintmax_t)header->old_offset, (uintmax_t)header->old_count,
+		    (uintmax_t)header->new_offset, (uintmax_t)header->new_count);
 	if (out->len - len < SUMMARY_HEADER_WIDTH)
 		strbuf_addchars(out, ' ',
 				SUMMARY_HEADER_WIDTH + len - out->len);
@@ -1625,10 +1633,11 @@  static int patch_update_file(struct add_p_state *s,
 			else if (0 < response && response <= file_diff->hunk_nr)
 				hunk_index = response - 1;
 			else
-				err(s, Q_("Sorry, only %d hunk available.",
-					  "Sorry, only %d hunks available.",
-					  file_diff->hunk_nr),
-				    (int)file_diff->hunk_nr);
+				err(s,
+				    Q_("Sorry, only %"PRIuMAX" hunk available.",
+				       "Sorry, only %"PRIuMAX" hunks available.",
+				       (uintmax_t)file_diff->hunk_nr),
+				    (uintmax_t)file_diff->hunk_nr);
 		} else if (s->answer.buf[0] == '/') {
 			regex_t regex;
 			int ret;
diff --git a/gettext.h b/gettext.h
index 484cafa562..d36f5a7ade 100644
--- a/gettext.h
+++ b/gettext.h
@@ -53,7 +53,7 @@  static inline FORMAT_PRESERVING(1) const char *_(const char *msgid)
 }
 
 static inline FORMAT_PRESERVING(1) FORMAT_PRESERVING(2)
-const char *Q_(const char *msgid, const char *plu, unsigned long n)
+const char *Q_(const char *msgid, const char *plu, size_t n)
 {
 	if (!git_gettext_enabled)
 		return n == 1 ? msgid : plu;
diff --git a/git-compat-util.h b/git-compat-util.h
index e283c46c6f..4c33990a05 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -291,6 +291,12 @@  static inline int _have_unix_sockets(void)
 #ifdef HAVE_BSD_SYSCTL
 #include <sys/sysctl.h>
 #endif
+#if defined _WIN64
+# define strtos strtoull
+#else
+#define strtos strtoul
+#endif
+
 
 /* Used by compat/win32/path-utils.h, and more */
 static inline int is_xplatform_dir_sep(int c)