
[4/4] dir.c: avoid "exceeds maximum object size" error with GCC v12.x

Message ID 365889ee96e37dc9dcbe60d98880eb256dae90ee.1653351786.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 2acf4cf0010379f10b39eba1fb4e0868a5ba4114
Series ci: fix windows-build with GCC v12.x

Commit Message

Johannes Schindelin May 24, 2022, 12:23 a.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

Technically, the pointer difference `end - start` _could_ be negative,
and when cast to an (unsigned) `size_t` that would cause problems. In
this instance, the symptom is:

dir.c: In function 'git_url_basename':
dir.c:3087:13: error: 'memchr' specified bound [9223372036854775808, 0]
       exceeds maximum object size 9223372036854775807
       [-Werror=stringop-overread]
    CC ewah/bitmap.o
 3087 |         if (memchr(start, '/', end - start) == NULL
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While it is a bit far-fetched to think that `end` (which is defined as
`repo + strlen(repo)`) and `start` (which starts at `repo` and never
steps beyond the NUL terminator) could result in such a negative
difference, GCC has no way of knowing that.

See also https://gcc.gnu.org/bugzilla//show_bug.cgi?id=85783.

Let's just add a safety check, primarily for GCC's benefit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 dir.c | 9 +++++++++
 1 file changed, 9 insertions(+)
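
The conversion problem described in the commit message can be illustrated with a minimal sketch. The helper names `bound_from_diff` and `find_slash` are hypothetical and do not appear in dir.c; they only show why a negative `ptrdiff_t` turns into a bound larger than the maximum object size once it is converted to `size_t`, and how a guard placed before the call rules that out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from dir.c): shows the value memchr() would
 * receive as its size_t bound for a given pointer difference.
 */
static size_t bound_from_diff(ptrdiff_t diff)
{
	return (size_t)diff; /* a negative value wraps; -1 becomes SIZE_MAX */
}

/*
 * Hypothetical guarded search: reject a negative range before the
 * conversion to size_t can take place.
 */
static const char *find_slash(const char *start, const char *end)
{
	if (end - start < 0)
		return NULL;
	return memchr(start, '/', (size_t)(end - start));
}
```

With the guard in place, the compiler can see that the bound passed to `memchr()` is never derived from a negative difference, which is the property the diagnostic above is asking for.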

Comments

Ævar Arnfjörð Bjarmason May 24, 2022, 5:53 a.m. UTC | #1
On Tue, May 24 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Technically, the pointer difference `end - start` _could_ be negative,
> and when cast to an (unsigned) `size_t` that would cause problems. In
> this instance, the symptom is:
>
> dir.c: In function 'git_url_basename':
> dir.c:3087:13: error: 'memchr' specified bound [9223372036854775808, 0]
>        exceeds maximum object size 9223372036854775807
>        [-Werror=stringop-overread]
>     CC ewah/bitmap.o
>  3087 |         if (memchr(start, '/', end - start) == NULL
>       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> While it is a bit far-fetched to think that `end` (which is defined as
> `repo + strlen(repo)`) and `start` (which starts at `repo` and never
> steps beyond the NUL terminator) could result in such a negative
> difference, GCC has no way of knowing that.
>
> See also https://gcc.gnu.org/bugzilla//show_bug.cgi?id=85783.
>
> Let's just add a safety check, primarily for GCC's benefit.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  dir.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index 5aa6fbad0b7..ea78f606230 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -3076,6 +3076,15 @@ char *git_url_basename(const char *repo, int is_bundle, int is_bare)
>  			end--;
>  	}
>  
> +	/*
> +	 * It should not be possible to overflow `ptrdiff_t` by passing in an
> +	 * insanely long URL, but GCC does not know that and will complain
> +	 * without this check.
> +	 */
> +	if (end - start < 0)
> +		die(_("No directory name could be guessed.\n"

This should start with a lower-case letter, see CodingGuidelines.

> +		      "Please specify a directory on the command line"));
> +
>  	/*
>  	 * Strip trailing port number if we've got only a
>  	 * hostname (that is, there is no dir separator but a
Johannes Schindelin May 24, 2022, 9:05 p.m. UTC | #2
Hi Ævar,

On Tue, 24 May 2022, Ævar Arnfjörð Bjarmason wrote:

>
> On Tue, May 24 2022, Johannes Schindelin via GitGitGadget wrote:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > Technically, the pointer difference `end - start` _could_ be negative,
> > and when cast to an (unsigned) `size_t` that would cause problems. In
> > this instance, the symptom is:
> >
> > dir.c: In function 'git_url_basename':
> > dir.c:3087:13: error: 'memchr' specified bound [9223372036854775808, 0]
> >        exceeds maximum object size 9223372036854775807
> >        [-Werror=stringop-overread]
> >     CC ewah/bitmap.o
> >  3087 |         if (memchr(start, '/', end - start) == NULL
> >       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > While it is a bit far-fetched to think that `end` (which is defined as
> > `repo + strlen(repo)`) and `start` (which starts at `repo` and never
> > steps beyond the NUL terminator) could result in such a negative
> > difference, GCC has no way of knowing that.
> >
> > See also https://gcc.gnu.org/bugzilla//show_bug.cgi?id=85783.
> >
> > Let's just add a safety check, primarily for GCC's benefit.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  dir.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/dir.c b/dir.c
> > index 5aa6fbad0b7..ea78f606230 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -3076,6 +3076,15 @@ char *git_url_basename(const char *repo, int is_bundle, int is_bare)
> >  			end--;
> >  	}
> >
> > +	/*
> > +	 * It should not be possible to overflow `ptrdiff_t` by passing in an
> > +	 * insanely long URL, but GCC does not know that and will complain
> > +	 * without this check.
> > +	 */
> > +	if (end - start < 0)
> > +		die(_("No directory name could be guessed.\n"
>
> This should start with a lower-case letter, see CodingGuidelines.

This message is copied from existing code later in the same function.
Since it is a translatable message, I do not want to edit it because that
would cause unnecessary work for the translators, especially given that we
do not expect this message to be shown, ever; we only add this
hunk for GCC's benefit.

Thank you,
Johannes

>
> > +		      "Please specify a directory on the command line"));
> > +
> >  	/*
> >  	 * Strip trailing port number if we've got only a
> >  	 * hostname (that is, there is no dir separator but a
>
>
Derrick Stolee May 25, 2022, 1:39 p.m. UTC | #3
On 5/24/2022 5:05 PM, Johannes Schindelin wrote:
> On Tue, 24 May 2022, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, May 24 2022, Johannes Schindelin via GitGitGadget wrote:
>>> +	/*
>>> +	 * It should not be possible to overflow `ptrdiff_t` by passing in an
>>> +	 * insanely long URL, but GCC does not know that and will complain
>>> +	 * without this check.
>>> +	 */
>>> +	if (end - start < 0)
>>> +		die(_("No directory name could be guessed.\n"
>>
>> This should start with a lower-case letter, see CodingGuidelines.
> 
> This message is copied from existing code later in the same function.
> Since it is a translatable message, I do not want to edit it because that
> would cause unnecessary work for the translators, especially given that we
> do not expect this message to be shown, ever; we only add this
> hunk for GCC's benefit.

Perhaps this should be a BUG() statement, then? Without any
translation?

Thanks,
-Stolee
Junio C Hamano May 25, 2022, 6:27 p.m. UTC | #4
Derrick Stolee <derrickstolee@github.com> writes:

> On 5/24/2022 5:05 PM, Johannes Schindelin wrote:
>> On Tue, 24 May 2022, Ævar Arnfjörð Bjarmason wrote:
>>> On Tue, May 24 2022, Johannes Schindelin via GitGitGadget wrote:
>>>> +	/*
>>>> +	 * It should not be possible to overflow `ptrdiff_t` by passing in an
>>>> +	 * insanely long URL, but GCC does not know that and will complain
>>>> +	 * without this check.
>>>> +	 */
>>>> +	if (end - start < 0)
>>>> +		die(_("No directory name could be guessed.\n"
>>>
>>> This should start with a lower-case letter, see CodingGuidelines.
>> 
>> This message is copied from existing code later in the same function.
>> Since it is a translatable message, I do not want to edit it because that
>> would cause unnecessary work for the translators, especially given that we
>> do not expect this message to be shown, ever; we only add this
>> hunk for GCC's benefit.
>
> Perhaps this should be a BUG() statement, then? Without any
> translation?

Yeah, both are good.  If somehow the caller managed to pass such a
long URL then it can be considered a data error at runtime, and not
that the user detected a bug in our code, so in that sense die()
would be appropriate.  It is like xmalloc() running out of memory.

On the other hand, the "should not be possible to overflow" in the
comment implicitly assumes that it is impossible to pass an insanely
long URL to trigger the condition from places we think of offhand,
like the command line, where the input is limited to a much shorter
string.  As "we detected a situation that should not happen unless
there is a programming or design bug" is what BUG() means, it is
also good here---our assumption that this should not be possible
turned out to be faulty, so we noticed a design bug.

I wonder if we can add a separate macro to add more to the
documentation value, though.  With something like

    #define FALSE_WARNING(expression, message) \
	do { if (expression) { BUG(message); } } while (0)

the above would just become

	FALSE_WARNING(end - start < 0, "ptrdiff_t would not overflow here");

without a need for a big comment before it.  We might even be able
to optimize it out when building with compilers that do not need the
workaround.
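
As a rough, self-contained sketch of how the macro suggested above might be used: the `BUG()` defined here is only a simplified stand-in (Git's real `BUG()` is declared in its own headers and takes printf-style arguments), and `checked_span` is a hypothetical function written for illustration, not code from dir.c.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified stand-in for Git's BUG(), assumed here so that the sketch
 * compiles on its own: report the message and abort.
 */
#define BUG(message) \
	do { fprintf(stderr, "BUG: %s\n", (message)); abort(); } while (0)

/* The macro from the mail above, with balanced braces. */
#define FALSE_WARNING(expression, message) \
	do { if (expression) { BUG(message); } } while (0)

/*
 * Hypothetical caller: documents the "cannot happen" case in one line
 * and lets the compiler see that the range is non-negative afterwards.
 */
static size_t checked_span(const char *start, const char *end)
{
	FALSE_WARNING(end - start < 0, "ptrdiff_t would not overflow here");
	return (size_t)(end - start);
}
```

Whether the check could be compiled out for compilers that do not need the workaround, as floated above, is left open in this sketch.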

Patch

diff --git a/dir.c b/dir.c
index 5aa6fbad0b7..ea78f606230 100644
--- a/dir.c
+++ b/dir.c
@@ -3076,6 +3076,15 @@  char *git_url_basename(const char *repo, int is_bundle, int is_bare)
 			end--;
 	}
 
+	/*
+	 * It should not be possible to overflow `ptrdiff_t` by passing in an
+	 * insanely long URL, but GCC does not know that and will complain
+	 * without this check.
+	 */
+	if (end - start < 0)
+		die(_("No directory name could be guessed.\n"
+		      "Please specify a directory on the command line"));
+
 	/*
 	 * Strip trailing port number if we've got only a
 	 * hostname (that is, there is no dir separator but a