mem-pool: use st_add() in mem_pool_strvfmt()

Message ID	bbe00b9e-64d8-4ec8-a2b9-2c6917c72dbd@web.de (mailing list archive)
State	New
Headers	show Received: from mout.web.de (mout.web.de [212.227.17.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87785146D6E for <git@vger.kernel.org>; Sun, 31 Mar 2024 18:53:09 +0000 (UTC) Message-ID: <bbe00b9e-64d8-4ec8-a2b9-2c6917c72dbd@web.de> Date: Sun, 31 Mar 2024 20:53:07 +0200 Precedence: bulk MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: Git List <git@vger.kernel.org> From: =?utf-8?q?Ren=C3=A9_Scharfe?= <l.s.r@web.de> Subject: [PATCH] mem-pool: use st_add() in mem_pool_strvfmt() Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable UI-OutboundReport: notjunk:1;M01:P0:cm+OCI/6fNw=;MWzKMJ2CSavUBI9htN7EphNHkf0 E1PUcwv6n5n5BdJVORPT/3P5acJasK8BNZSgjrWLz8afBbzogYNSN8DmVoj2KftioFnBzoQry 3y7iF0x9VDmgUF8bUDyb8lpCU1l3qwVBnBC9a8PebDT+ustQHH2fA5Gc8QAXis9iYNpDLFiyo mzTsnJYDfbVGn01ktZshgaDVx2NSYOM6FwvPF2kGnSd0r/vs3Owo1C2DI6zuqainNhtvF3ScB ocNNLDMF3GAnjC0vWzCa5sTjRBGt7ZfWOOtGz5sQhnzYWCHFm929sw9I46U+mYQYZ2I3L8D6a g8vOAp/6+EJiMOQNPqAHA+WUQLDSgZseaDZu0X/Zf3oj7Qibk4xkr0+yFS6AVfYvAfvGGqq47 m4C7DZSrtOm4+2ArS8gPkjmHMj9c3NnfXeA5Ml1HaXjmZLFSh+h+KZYrb4gjX28LGLlHIFEa/ Adc494jd6K1DB8MCBhz1XzAfg62Eb8OOBPBGzxweS/0AYXS3K+FwtfeMBY451bWvXfk0MVke5 VPO+Aj5QzKvr7cVnQUSF+J0JkeS5PmgZ/J+Gz5X6Cu8dfrsBebhFC7JLw//5RFNq1bZ7aJxDZ yuAoLdzwrT0wFl+pz5sKqEBBHTc+MYPU2o1f0CIh4UvyGnZK13hFnj6qSRYGj1NCcNkCot8wU YbX3SP27hYEtmYX/Riyb2ZwWvv5Ti6nratuKZ8jUDkObjbTyjLw8mlsM0+/th16R/7Hru4ed+ Y06pcppdQw4rxiDdZvIjth15L+SvIyBSHbtqmwdGtaaaGrgLl8CqGmO6EFikmbe2NUofEhNO6 K4C5zqdKsAV5SkOGGuCggoCm3jjhMYnAF0Avad+X3WMd0=
Series	mem-pool: use st_add() in mem_pool_strvfmt() \| expand mem-pool: use st_add() in mem_pool_strvfmt()

René Scharfe March 31, 2024, 6:53 p.m. UTC

If len is INT_MAX in mem_pool_strvfmt(), then len + 1 overflows.
Casting it to size_t would prevent that.  Use st_add() to go a step
further and make the addition *obviously* safe.  The compiler can
optimize the check away on platforms where SIZE_MAX > INT_MAX, i.e.
basically everywhere.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 mem-pool.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--
2.44.0

Jeff King April 1, 2024, 3:36 a.m. UTC | #1

On Sun, Mar 31, 2024 at 08:53:07PM +0200, René Scharfe wrote:

> If len is INT_MAX in mem_pool_strvfmt(), then len + 1 overflows.
> Casting it to size_t would prevent that.  Use st_add() to go a step
> further and make the addition *obviously* safe.  The compiler can
> optimize the check away on platforms where SIZE_MAX > INT_MAX, i.e.
> basically everywhere.

Yeah, I think this is a good thing to do. I was confused at first why we
had an "int" at all, but it's the usual crappy snprintf interface.

Which, by the way...

> @@ -123,13 +124,14 @@ static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt,
>  	if (len < 0)
>  		BUG("your vsnprintf is broken (returned %d)", len);

Not new in your patch, and I know this is copied from the strbuf code,
but I think a BUG() is probably the wrong thing. We added it long ago to
let us know about broken vsnprintf() implementations, but we'd have
flushed those out by now, as nothing in Git would work on such a
platform.

And meanwhile there are legitimate reasons for a non-broken vsnprintf()
to return -1: namely that it is the only useful thing they can do when
the requested string is larger than INT_MAX (e.g., "%s" on a string that
is over 2GB). This is sort of academic, of course. There's no useful
error to return here, and anybody who manages to shove 2GB into a place
where we expect a short string fully deserves to have their program
abort.

I don't have a good example of where you can trigger this (it used to be
easy with long attribute names, but these days we refuse to parse them).
But in general probably calling die() is more appropriate.

There's a similar call in vreportf() that tries to keep going, but it
ends up with lousy results. E.g., try:

  perl -le 'print "create refs/heads/", "a"x2147483648, "HEAD"' |
  git update-ref --stdin

which results in just "fatal: ", since formatting the error string
fails. Perhaps we should just print the unexpanded format string
("invalid ref format: %s" in this case). It's not great, but it's better
than nothing.

I guess I diverged quite far from reviewing your patch. ;) It obviously
looks fine, but the snprintf() int return value got me off on a tangent.

-Peff

René Scharfe April 2, 2024, 1:48 p.m. UTC | #2

Am 01.04.24 um 05:36 schrieb Jeff King:
> On Sun, Mar 31, 2024 at 08:53:07PM +0200, René Scharfe wrote:
>
> Which, by the way...
>
>> @@ -123,13 +124,14 @@ static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt,
>>  	if (len < 0)
>>  		BUG("your vsnprintf is broken (returned %d)", len);
>
> Not new in your patch, and I know this is copied from the strbuf code,
> but I think a BUG() is probably the wrong thing. We added it long ago to
> let us know about broken vsnprintf() implementations, but we'd have
> flushed those out by now, as nothing in Git would work on such a
> platform.
>
> And meanwhile there are legitimate reasons for a non-broken vsnprintf()
> to return -1: namely that it is the only useful thing they can do when
> the requested string is larger than INT_MAX (e.g., "%s" on a string that
> is over 2GB). This is sort of academic, of course. There's no useful
> error to return here, and anybody who manages to shove 2GB into a place
> where we expect a short string fully deserves to have their program
> abort.
>
> I don't have a good example of where you can trigger this (it used to be
> easy with long attribute names, but these days we refuse to parse them).
> But in general probably calling die() is more appropriate.

Makes sense.  Could be rolled into a new wrapper, xvsnprintf();
imap-send.c::nfvasprintf() could call it as well.

There are also callers of vsnprintf(3) that use its return value without
checking for error: builtin/receive-pack.c::report_message(),
path.c::mksnpath() and arguably imap-send.c::nfsnprintf().

> There's a similar call in vreportf() that tries to keep going, but it
> ends up with lousy results. E.g., try:
>
>   perl -le 'print "create refs/heads/", "a"x2147483648, "HEAD"' |
>   git update-ref --stdin
>
> which results in just "fatal: ", since formatting the error string
> fails. Perhaps we should just print the unexpanded format string
> ("invalid ref format: %s" in this case). It's not great, but it's better
> than nothing.

We can throw in errno to distinguish between EILSEQ (invalid wide
character) and EOVERFLOW.  And we'd better not call die_errno() to avoid
triggering a recursion warning.  We can open-code it instead:

        if (vsnprintf(p, pend - p, err, params) < 0) {
                fprintf(stderr, _("%sunable to format message '%s': %s\n"),
                        _("fatal: "), err, strerror(errno));
                exit(128);
        }

But when I ran your test command (on macOS 14.4.1) ten times with this
change I got:

fatal: unable to format message 'invalid ref format: %s': Invalid argument
fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
fatal: unable to format message 'invalid ref format: %s': Invalid argument
fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
fatal: unable to format message 'invalid ref format: %s': Invalid argument
fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
fatal: unable to format message 'invalid ref format: %s': Invalid argument
fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
fatal: unable to format message 'invalid ref format: %s': Undefined error: 0

Which scares me.  Why is errno sometimes zero?  Why EINVAL instead of
EOVERFLOW?  O_o

René

Jeff King April 3, 2024, 1:18 a.m. UTC | #3

On Tue, Apr 02, 2024 at 03:48:45PM +0200, René Scharfe wrote:

> Makes sense.  Could be rolled into a new wrapper, xvsnprintf();
> imap-send.c::nfvasprintf() could call it as well.
> 
> There are also callers of vsnprintf(3) that use its return value without
> checking for error: builtin/receive-pack.c::report_message(),
> path.c::mksnpath() and arguably imap-send.c::nfsnprintf().

Hmm, yeah. Those are all OK not to use xsnprintf(), since they handle
truncation themselves. But the first two don't look like they handle a
negative return well. In report_message(), we'd end up shrinking "sz".
That's potentially an out-of-bounds problem, except that we'll always
have put a non-empty prefix into the buffer. For mksnpath(), though, I
suspect that trying to format a very long name could result in the
output being full of uninitialized bytes.

It only has one caller, which creates "foo~1" when we got EEXIST from
"foo". So I doubt you can get up to too much mischief with it. But it
could easily be replaced by mkpathdup() (or even a reusable strbuf
outside the loop if you really wanted to hyper-optimize)

And then we could get rid of mksnpath() entirely, and its horrible
bad_path failure mode.

> We can throw in errno to distinguish between EILSEQ (invalid wide
> character) and EOVERFLOW.  And we'd better not call die_errno() to avoid
> triggering a recursion warning.  We can open-code it instead:
> 
>         if (vsnprintf(p, pend - p, err, params) < 0) {
>                 fprintf(stderr, _("%sunable to format message '%s': %s\n"),
>                         _("fatal: "), err, strerror(errno));
>                 exit(128);
>         }

We could also just throw it into the buffer and let the rest of the
function proceed, like:

diff --git a/usage.c b/usage.c
index 09f0ed509b..5baab98fa3 100644
--- a/usage.c
+++ b/usage.c
@@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params)
 	}
 	memcpy(msg, prefix, prefix_len);
 	p = msg + prefix_len;
-	if (vsnprintf(p, pend - p, err, params) < 0)
+	if (vsnprintf(p, pend - p, err, params) < 0) {
+		if (snprintf(p, pend - p, "could not format error: %s", err) < 0)
 		*p = '\0'; /* vsnprintf() failed, clip at prefix */
+	}

 	for (; p != pend - 1 && *p; p++) {
 		if (iscntrl(*p) && *p != '\t' && *p != '\n')

Though most of the rest of the function is not that useful (it is mostly
removing unreadable chars, and hopefully we do not have any of those in
our format strings!). I had not thought about showing strerror(). The C
standard does mention a negative value for encoding errors, but says
nothing about errno. POSIX does seem to mention EILSEQ and EOVERFLOW.
So this...

> But when I ran your test command (on macOS 14.4.1) ten times with this
> change I got:
> 
> fatal: unable to format message 'invalid ref format: %s': Invalid argument
> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
> fatal: unable to format message 'invalid ref format: %s': Invalid argument
> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
> fatal: unable to format message 'invalid ref format: %s': Invalid argument
> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
> fatal: unable to format message 'invalid ref format: %s': Invalid argument
> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
> 
> Which scares me.  Why is errno sometimes zero?  Why EINVAL instead of
> EOVERFLOW?  O_o

...is just confusing. I do think even without worrying about errno,
simply saying "I tried to format 'some error: %s' and couldn't" is going
to be way more useful than just "fatal: ". The only reason it would fail
is that there's something gross in that "%s". We can't be more specific
without interpreting the printf-format ourselves (which is probably not
worth it).

-Peff

René Scharfe April 3, 2024, 10:01 a.m. UTC | #4

Am 03.04.24 um 03:18 schrieb Jeff King:
> On Tue, Apr 02, 2024 at 03:48:45PM +0200, René Scharfe wrote:
>
>> Makes sense.  Could be rolled into a new wrapper, xvsnprintf();
>> imap-send.c::nfvasprintf() could call it as well.
>>
>> There are also callers of vsnprintf(3) that use its return value without
>> checking for error: builtin/receive-pack.c::report_message(),
>> path.c::mksnpath() and arguably imap-send.c::nfsnprintf().
>
> Hmm, yeah. Those are all OK not to use xsnprintf(), since they handle
> truncation themselves. But the first two don't look like they handle a
> negative return well. In report_message(), we'd end up shrinking "sz".
> That's potentially an out-of-bounds problem, except that we'll always
> have put a non-empty prefix into the buffer.

Getting only a truncated prefix is hopefully detected as invalid, but
explicit handling would probably be better.

> For mksnpath(), though, I
> suspect that trying to format a very long name could result in the
> output being full of uninitialized bytes.
>
> It only has one caller, which creates "foo~1" when we got EEXIST from
> "foo". So I doubt you can get up to too much mischief with it. But it
> could easily be replaced by mkpathdup() (or even a reusable strbuf
> outside the loop if you really wanted to hyper-optimize)
>
> And then we could get rid of mksnpath() entirely, and its horrible
> bad_path failure mode.

mkpath() would be perfect but unusable in multiple threads.  Cleaning
up after mkpathdup() is a bit iffy in that caller.  strbuf would be a
bit much in that error path, I think, and you might have to export or
reimplement strbuf_cleanup_path().

>> We can throw in errno to distinguish between EILSEQ (invalid wide
>> character) and EOVERFLOW.  And we'd better not call die_errno() to avoid
>> triggering a recursion warning.  We can open-code it instead:
>>
>>         if (vsnprintf(p, pend - p, err, params) < 0) {
>>                 fprintf(stderr, _("%sunable to format message '%s': %s\n"),
>>                         _("fatal: "), err, strerror(errno));
>>                 exit(128);
>>         }
>
> We could also just throw it into the buffer and let the rest of the
> function proceed, like:
>
> diff --git a/usage.c b/usage.c
> index 09f0ed509b..5baab98fa3 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params)
>  	}
>  	memcpy(msg, prefix, prefix_len);
>  	p = msg + prefix_len;
> -	if (vsnprintf(p, pend - p, err, params) < 0)
> +	if (vsnprintf(p, pend - p, err, params) < 0) {
> +		if (snprintf(p, pend - p, "could not format error: %s", err) < 0)
>  		*p = '\0'; /* vsnprintf() failed, clip at prefix */
> +	}
>
>  	for (; p != pend - 1 && *p; p++) {
>  		if (iscntrl(*p) && *p != '\t' && *p != '\n')
>
> Though most of the rest of the function is not that useful (it is mostly
> removing unreadable chars, and hopefully we do not have any of those in
> our format strings!).

For warnings and usage messages this would keep the prefix and not
die.  This would look a bit strange, no?

	usage: could not format error: TERRIBLE MESSAGE!

> I had not thought about showing strerror(). The C
> standard does mention a negative value for encoding errors, but says
> nothing about errno. POSIX does seem to mention EILSEQ and EOVERFLOW.
> So this...
>
>> But when I ran your test command (on macOS 14.4.1) ten times with this
>> change I got:
>>
>> fatal: unable to format message 'invalid ref format: %s': Invalid argument
>> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
>> fatal: unable to format message 'invalid ref format: %s': Invalid argument
>> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
>> fatal: unable to format message 'invalid ref format: %s': Invalid argument
>> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
>> fatal: unable to format message 'invalid ref format: %s': Invalid argument
>> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
>> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
>> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0
>>
>> Which scares me.  Why is errno sometimes zero?  Why EINVAL instead of
>> EOVERFLOW?  O_o
>
> ...is just confusing. I do think even without worrying about errno,
> simply saying "I tried to format 'some error: %s' and couldn't" is going
> to be way more useful than just "fatal: ". The only reason it would fail
> is that there's something gross in that "%s". We can't be more specific
> without interpreting the printf-format ourselves (which is probably not
> worth it).

Yes, showing errno doesn't add that much value.  Except in this case it
shows that there's something going on that I don't understand.  Dare I
dig deeper?  Probably not..

René

Jeff King April 3, 2024, 8:48 p.m. UTC | #5

On Wed, Apr 03, 2024 at 12:01:13PM +0200, René Scharfe wrote:

> > For mksnpath(), though, I
> > suspect that trying to format a very long name could result in the
> > output being full of uninitialized bytes.
> >
> > It only has one caller, which creates "foo~1" when we got EEXIST from
> > "foo". So I doubt you can get up to too much mischief with it. But it
> > could easily be replaced by mkpathdup() (or even a reusable strbuf
> > outside the loop if you really wanted to hyper-optimize)
> >
> > And then we could get rid of mksnpath() entirely, and its horrible
> > bad_path failure mode.
> 
> mkpath() would be perfect but unusable in multiple threads.  Cleaning
> up after mkpathdup() is a bit iffy in that caller.  strbuf would be a
> bit much in that error path, I think, and you might have to export or
> reimplement strbuf_cleanup_path().

Yeah, I'd prefer not to go to mkpath(), even though that's the simplest
thing, just because we should be reducing the subtle non-reentrant parts
of the code over time. I don't think the cleanup handling for
mkpathdup() is too bad:

diff --git a/apply.c b/apply.c
index 432837a674..15dfe607ff 100644
--- a/apply.c
+++ b/apply.c
@@ -4502,20 +4502,25 @@ static int create_one_file(struct apply_state *state,
 		unsigned int nr = getpid();
 
 		for (;;) {
-			char newpath[PATH_MAX];
-			mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr);
+			char *newpath = mkpathdup("%s~%u", path, nr);
 			res = try_create_file(state, newpath, mode, buf, size);
-			if (res < 0)
+			if (res < 0) {
+				free(newpath);
 				return -1;
+			}
 			if (!res) {
-				if (!rename(newpath, path))
+				if (!rename(newpath, path)) {
+					free(newpath);
 					return 0;
+				}
 				unlink_or_warn(newpath);
+				free(newpath);
 				break;
 			}
 			if (errno != EEXIST)
 				break;
 			++nr;
+			free(newpath);
 		}
 	}
 	return error_errno(_("unable to write file '%s' mode %o"),

It even gets a little easier if you hoist it to a strbuf outside the
loop, as you don't need to free on each loop iteration. That seems worth
it to me to get rid of a function that IMHO is mis-designed (it is
really only useful if you assume paths are bounded by PATH_MAX, which we
know isn't always true).

> > diff --git a/usage.c b/usage.c
> > index 09f0ed509b..5baab98fa3 100644
> > --- a/usage.c
> > +++ b/usage.c
> > @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params)
> >  	}
> >  	memcpy(msg, prefix, prefix_len);
> >  	p = msg + prefix_len;
> > -	if (vsnprintf(p, pend - p, err, params) < 0)
> > +	if (vsnprintf(p, pend - p, err, params) < 0) {
> > +		if (snprintf(p, pend - p, "could not format error: %s", err) < 0)
> >  		*p = '\0'; /* vsnprintf() failed, clip at prefix */
> > +	}
> >
> >  	for (; p != pend - 1 && *p; p++) {
> >  		if (iscntrl(*p) && *p != '\t' && *p != '\n')
> >
> > Though most of the rest of the function is not that useful (it is mostly
> > removing unreadable chars, and hopefully we do not have any of those in
> > our format strings!).
> 
> For warnings and usage messages this would keep the prefix and not
> die.  This would look a bit strange, no?
> 
> 	usage: could not format error: TERRIBLE MESSAGE!

Sure, but I think any solution here is going to look strange. Keep in
mind that we're trying to improve the case where we print _nothing_
useful at all. If you do see this on a non-fatal message, the subsequent
messages may be informative. E.g.:

  error: could not format error: unable to open loose object %s
  fatal: bad object HEAD~12

is probably better than just exiting after the first.

> Yes, showing errno doesn't add that much value.  Except in this case it
> shows that there's something going on that I don't understand.  Dare I
> dig deeper?  Probably not..

Well, let us know if you do. ;)

-Peff

Jeff King April 3, 2024, 9:20 p.m. UTC | #6

On Wed, Apr 03, 2024 at 04:48:36PM -0400, Jeff King wrote:

> Yeah, I'd prefer not to go to mkpath(), even though that's the simplest
> thing, just because we should be reducing the subtle non-reentrant parts
> of the code over time. I don't think the cleanup handling for
> mkpathdup() is too bad:
> 
> diff --git a/apply.c b/apply.c
> index 432837a674..15dfe607ff 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4502,20 +4502,25 @@ static int create_one_file(struct apply_state *state,
>  		unsigned int nr = getpid();
>  
>  		for (;;) {
> -			char newpath[PATH_MAX];
> -			mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr);
> +			char *newpath = mkpathdup("%s~%u", path, nr);
>  			res = try_create_file(state, newpath, mode, buf, size);
> -			if (res < 0)
> +			if (res < 0) {
> +				free(newpath);
>  				return -1;
> +			}
>  			if (!res) {
> -				if (!rename(newpath, path))
> +				if (!rename(newpath, path)) {
> +					free(newpath);
>  					return 0;
> +				}
>  				unlink_or_warn(newpath);
> +				free(newpath);
>  				break;
>  			}
>  			if (errno != EEXIST)
>  				break;
>  			++nr;
> +			free(newpath);
>  		}
>  	}
>  	return error_errno(_("unable to write file '%s' mode %o"),

OK, this misses one of the breaks, and potentially clobbers errno in
that path by calling free(). So it is harder than I said. I still think
it's worth doing, though.

-Peff

René Scharfe April 5, 2024, 1:10 p.m. UTC | #7

Am 03.04.24 um 22:48 schrieb Jeff King:
> On Wed, Apr 03, 2024 at 12:01:13PM +0200, René Scharfe wrote:
>
>>> diff --git a/usage.c b/usage.c
>>> index 09f0ed509b..5baab98fa3 100644
>>> --- a/usage.c
>>> +++ b/usage.c
>>> @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params)
>>>  	}
>>>  	memcpy(msg, prefix, prefix_len);
>>>  	p = msg + prefix_len;
>>> -	if (vsnprintf(p, pend - p, err, params) < 0)
>>> +	if (vsnprintf(p, pend - p, err, params) < 0) {
>>> +		if (snprintf(p, pend - p, "could not format error: %s", err) < 0)
>>>  		*p = '\0'; /* vsnprintf() failed, clip at prefix */
>>> +	}
>>>
>>>  	for (; p != pend - 1 && *p; p++) {
>>>  		if (iscntrl(*p) && *p != '\t' && *p != '\n')
>>>
>>> Though most of the rest of the function is not that useful (it is mostly
>>> removing unreadable chars, and hopefully we do not have any of those in
>>> our format strings!).

Hmm, this might be worth doing to avoid messing up the terminal if
'err' points into the weeds for some reason.

>> For warnings and usage messages this would keep the prefix and not
>> die.  This would look a bit strange, no?
>>
>> 	usage: could not format error: TERRIBLE MESSAGE!
>
> Sure, but I think any solution here is going to look strange. Keep in
> mind that we're trying to improve the case where we print _nothing_
> useful at all. If you do see this on a non-fatal message, the subsequent
> messages may be informative. E.g.:
>
>   error: could not format error: unable to open loose object %s
>   fatal: bad object HEAD~12
>
> is probably better than just exiting after the first.

OK, right, a format error doesn't have to be fatal and we can keep
running and possibly provide more details.

But mixing the format error with the actual payload message is not ideal.
At least we should give the format error its proper prefix, while still
reporting the prefix of the original message, e.g. like this:

   error: unable to format message: unable to open loose object %s
   fatal:

... or this:

   error: unable to format message: fatal: unable to open loose object %s

I tend to like the first one slightly better, even though the empty
message looks silly.  It doesn't mix the two and answers the questions
that I would have:  Why did the program stop?  Due to a fatal error.
Why is the fatal message silent?  The preceding error explains it.

While the second one only reveals the fatality somewhere in the middle
of the text, weakening the meaning of prefixes.

>> Yes, showing errno doesn't add that much value.  Except in this case it
>> shows that there's something going on that I don't understand.  Dare I
>> dig deeper?  Probably not..
>
> Well, let us know if you do. ;)

I still don't know why the error code varies between runs, but it
clearly does not come from vsnprintf(3) -- if I set errno to some
arbitrary value before the call, it is kept.  Which is enough to
convince me to ignore errno here.

René

Jeff King April 5, 2024, 5:45 p.m. UTC | #8

On Fri, Apr 05, 2024 at 03:10:33PM +0200, René Scharfe wrote:

> >>>  	memcpy(msg, prefix, prefix_len);
> >>>  	p = msg + prefix_len;
> >>> -	if (vsnprintf(p, pend - p, err, params) < 0)
> >>> +	if (vsnprintf(p, pend - p, err, params) < 0) {
> >>> +		if (snprintf(p, pend - p, "could not format error: %s", err) < 0)
> >>>  		*p = '\0'; /* vsnprintf() failed, clip at prefix */
> >>> +	}
> >>>
> >>>  	for (; p != pend - 1 && *p; p++) {
> >>>  		if (iscntrl(*p) && *p != '\t' && *p != '\n')
> >>>
> >>> Though most of the rest of the function is not that useful (it is mostly
> >>> removing unreadable chars, and hopefully we do not have any of those in
> >>> our format strings!).
> 
> Hmm, this might be worth doing to avoid messing up the terminal if
> 'err' points into the weeds for some reason.

I think we have bigger problems if "err" is messed up, because we'll be
interpreting garbage as a format string and almost certainly triggering
undefined behavior. And in general if we are not using a constant string
as the format it's a potential security vulnerability. So I think we can
mostly rely on the compile-time -Wformat checks for this.

> OK, right, a format error doesn't have to be fatal and we can keep
> running and possibly provide more details.
> 
> But mixing the format error with the actual payload message is not ideal.
> At least we should give the format error its proper prefix, while still
> reporting the prefix of the original message, e.g. like this:
> 
>    error: unable to format message: unable to open loose object %s
>    fatal:
> 
> ... or this:
> 
>    error: unable to format message: fatal: unable to open loose object %s
> 
> I tend to like the first one slightly better, even though the empty
> message looks silly.  It doesn't mix the two and answers the questions
> that I would have:  Why did the program stop?  Due to a fatal error.
> Why is the fatal message silent?  The preceding error explains it.
> 
> While the second one only reveals the fatality somewhere in the middle
> of the text, weakening the meaning of prefixes.

Yeah, I think your first one is good. It's _ugly_, but it's easy to
figure out what happened, and this is not something people generally see
(and the status quo is just "fatal:", so you're easily 50% less ugly).

> I still don't know why the error code varies between runs, but it
> clearly does not come from vsnprintf(3) -- if I set errno to some
> arbitrary value before the call, it is kept.  Which is enough to
> convince me to ignore errno here.

Agreed. Thanks for digging into it.

-Peff

mem-pool: use st_add() in mem_pool_strvfmt()

Commit Message

Comments

Patch