diff mbox series

[v2,1/1] vreportf(): avoid relying on stdio buffering

Message ID e426627e1494e31e548fe044c1c1806ff59340cf.1572356272.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Fix t5516 flakiness in Visual Studio builds | expand

Commit Message

Linus Arver via GitGitGadget Oct. 29, 2019, 1:37 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

The MSVC runtime behavior differs from glibc's with respect to
`fprintf(stderr, ...)` in that the former writes out the message
character by character.

In t5516, this leads to a funny problem where a `git fetch` process as
well as the `git upload-pack` process spawned by it _both_ call `die()`
at the same time. The output can look like this:

	fatal: git uploadfata-lp: raemcokte :error:  upload-pnot our arcef k6: n4ot our ea4cr1e3f 36d45ea94fca1398e86a771eda009872d63adb28598f6a9
	8e86a771eda009872d6ab2886

Let's avoid this predicament altogether by rendering the entire message,
including the prefix and the trailing newline, into the buffer we
already have (and which is still fixed size) and then write it out via
`xwrite()`.

We still clip the message to at most 4095 characters.

The history of `vreportf()` with regard to this issue includes the
following commits:

d048a96e (2007-11-09) - 'char msg[256]' is introduced to avoid interleaving
389d1767 (2009-03-25) - Buffer size increased to 1024 to avoid truncation
625a860c (2009-11-22) - Buffer size increased to 4096 to avoid truncation
f4c3edc0 (2015-08-11) - Buffer removed to avoid truncation
b5a9e435 (2017-01-11) - Reverts f4c3edc0 to be able to replace control
                        chars before sending to stderr
9ac13ec9 (2006-10-11) - Another attempt to solve interleaving.
                        This is seemingly related to d048a96e.
137a0d0e (2007-11-19) - Addresses out-of-order for display()
34df8aba (2009-03-10) - Switches xwrite() to fprintf() in recv_sideband()
                        to support UTF-8 emulation
eac14f89 (2012-01-14) - Removes the need for fprintf() for UTF-8 emulation,
                        so it's safe to use xwrite() again
5e5be9e2 (2016-06-28) - recv_sideband() uses xwrite() again

Note that we print nothing if the `vsnprintf()` call failed to render
the error message; There is little we can do in that case, and it should
not happen anyway.

Also please note that we `fflush(stderr)` here to help when running in a
Git Bash on Windows: in this case, `stderr` is not actually truly
unbuffered, and needs the extra help.

Helped-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 usage.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Comments

Alexandr Miloslavskiy Oct. 29, 2019, 2:21 p.m. UTC | #1
On 29.10.2019 14:37, Johannes Schindelin via GitGitGadget wrote:

 > -	vsnprintf(msg, sizeof(msg), err, params);
 > -	for (p = msg; *p; p++) {
 > +	p = msg + off < pend ? msg + off : pend - 1;

According to my understanding, this is undefined behavior:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf
... Unless both pointers point to elements of the same array object, or 
one past the last element of the array object, the behavior is undefined.

 > The MSVC runtime behavior differs from glibc's with respect to
 > `fprintf(stderr, ...)` in that the former writes out the message
 > character by character.

Can you please preserve the research text about fprintf() behavior on 
different platforms that I provided before? Change formatting to what 
you think fits best.

fprintf() has problems with any buffering settings
--------------------------------------------------
1) If `stderr` is fully buffered:
    the ordering of `stdout` and `stderr` messages could be wrong,
    because `stderr` output waits for the buffer to become full.
2) If `stderr` has any type of buffering:
    buffer has fixed size, which could lead to interleaved buffer blocks
    when two threads/processes write at the same time.
3) If `stderr` is not buffered:
    Some implementations, such as VC++ and MinGW, literally disable
    buffering and `fprintf()` will output char-by-char, which leads to
    unreadable char-interleaved writes if two processes write to
    `stderr` at the same time (threads are OK because `fprintf()`
    usually locks `FILE*` in current process).
4) If stderr is line buffered: MinGW/VC++ will enable full buffering
    instead. See MSDN for `setvbuf()`.

fprintf() behavior in git, per platform
---------------------------------------
1) libc - large outputs can be block-interleaved
    fprintf() enables temporary stream buffering.
    Code references:
        buffered_vfprintf()
2) VC++ - char-interleaved
    fprintf() enables temporary stream buffering, but only if stream was
    not set to no buffering. This has no effect, because stderr is not
    buffered by default, and git takes an extra step to ensure that in
    `swap_osfhnd()`.
    Code references:
        _iob[_IOB_ENTRIES]
        __acrt_stdio_temporary_buffering_guard
        has_any_buffer()
3) MinGW - char-interleaved (console), full buffering (file)
    `fprintf()` obeys `stderr` buffering. But it uses old MSVCRT.DLL,
    which eventually calls `_flsbuf()`, which enables buffering unless
    `isatty(stderr)` or buffering is disabled. Buffering is not disabled
    by default for `stderr`. Therefore, buffering is enabled only for
    file-redirected `stderr`.
    Code references:
        __mingw_vfprintf()
        __pformat_wcputs()
        _fputc_nolock()
        _flsbuf()
        _iob[_IOB_ENTRIES]
Johannes Schindelin Oct. 29, 2019, 7:57 p.m. UTC | #2
Hi Alex,

On Tue, 29 Oct 2019, Alexandr Miloslavskiy wrote:

> On 29.10.2019 14:37, Johannes Schindelin via GitGitGadget wrote:
>
> > -	vsnprintf(msg, sizeof(msg), err, params);
> > -	for (p = msg; *p; p++) {
> > +	p = msg + off < pend ? msg + off : pend - 1;
>
> According to my understanding, this is undefined behavior:

It is not entirely obvious to me what exactly you mean by "this",
assuming that you refer to comparing two pointers via `<`, I agree that
this is not the best idea, I changed it to `off < pend - msg`. If my
assumption is correct, however, we are still talking about C, so I
wonder how this C++ document you linked could bear any relevance:

> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf
> ... Unless both pointers point to elements of the same array object, or one
> past the last element of the array object, the behavior is undefined.
>
> > The MSVC runtime behavior differs from glibc's with respect to
> > `fprintf(stderr, ...)` in that the former writes out the message
> > character by character.
>
> Can you please preserve the research text about fprintf() behavior on
> different platforms that I provided before? Change formatting to what you
> think fits best.

Quite honestly, I purposefully refrained from copying that information.
While even the following paragraphs are incomplete by necessity (there
are many more standard C libraries we try to work against, e.g. musl and
newlibc, and it would be insanity to try to cover them all in an
analysis of stdio buffering), the only finding that is relevant to the
patch under discussion is that MSVC's runtime outputs to `stderr` byte
by byte (or more correctly, character by character, I guess), and I did
mention this in the commit message (in the part you quoted).

Thanks,
Dscho

>
> fprintf() has problems with any buffering settings
> --------------------------------------------------
> 1) If `stderr` is fully buffered:
>    the ordering of `stdout` and `stderr` messages could be wrong,
>    because `stderr` output waits for the buffer to become full.
> 2) If `stderr` has any type of buffering:
>    buffer has fixed size, which could lead to interleaved buffer blocks
>    when two threads/processes write at the same time.
> 3) If `stderr` is not buffered:
>    Some implementations, such as VC++ and MinGW, literally disable
>    buffering and `fprintf()` will output char-by-char, which leads to
>    unreadable char-interleaved writes if two processes write to
>    `stderr` at the same time (threads are OK because `fprintf()`
>    usually locks `FILE*` in current process).
> 4) If stderr is line buffered: MinGW/VC++ will enable full buffering
>    instead. See MSDN for `setvbuf()`.
>
> fprintf() behavior in git, per platform
> ---------------------------------------
> 1) libc - large outputs can be block-interleaved
>    fprintf() enables temporary stream buffering.
>    Code references:
>        buffered_vfprintf()
> 2) VC++ - char-interleaved
>    fprintf() enables temporary stream buffering, but only if stream was
>    not set to no buffering. This has no effect, because stderr is not
>    buffered by default, and git takes an extra step to ensure that in
>    `swap_osfhnd()`.
>    Code references:
>        _iob[_IOB_ENTRIES]
>        __acrt_stdio_temporary_buffering_guard
>        has_any_buffer()
> 3) MinGW - char-interleaved (console), full buffering (file)
>    `fprintf()` obeys `stderr` buffering. But it uses old MSVCRT.DLL,
>    which eventually calls `_flsbuf()`, which enables buffering unless
>    `isatty(stderr)` or buffering is disabled. Buffering is not disabled
>    by default for `stderr`. Therefore, buffering is enabled only for
>    file-redirected `stderr`.
>    Code references:
>        __mingw_vfprintf()
>        __pformat_wcputs()
>        _fputc_nolock()
>        _flsbuf()
>        _iob[_IOB_ENTRIES]
>
>
Jeff King Oct. 29, 2019, 8:09 p.m. UTC | #3
On Tue, Oct 29, 2019 at 08:57:33PM +0100, Johannes Schindelin wrote:

> > On 29.10.2019 14:37, Johannes Schindelin via GitGitGadget wrote:
> >
> > > -	vsnprintf(msg, sizeof(msg), err, params);
> > > -	for (p = msg; *p; p++) {
> > > +	p = msg + off < pend ? msg + off : pend - 1;
> >
> > According to my understanding, this is undefined behavior:
> 
> It is not entirely obvious to me what exactly you mean by "this",
> assuming that you refer to comparing two pointers via `<`, I agree that
> this is not the best idea, I changed it to `off < pend - msg`. If my
> assumption is correct, however, we are still talking about C, so I
> wonder how this C++ document you linked could bear any relevance:

I think the issue is not the comparison, but rather that forming the
pointer "msg + off" is undefined, since its point outside of any object
(including the "one past" rule). And this is illegal in both C and C++,
though of course it works fine in practice most of the time.

Doing "off < pend - msg" is legal. Or note that "pend" is just
"msg + sizeof(msg)", subtract out the common term and end up with
"off < sizeof(msg)". :)

> > Can you please preserve the research text about fprintf() behavior on
> > different platforms that I provided before? Change formatting to what you
> > think fits best.
> 
> Quite honestly, I purposefully refrained from copying that information.
> While even the following paragraphs are incomplete by necessity (there
> are many more standard C libraries we try to work against, e.g. musl and
> newlibc, and it would be insanity to try to cover them all in an
> analysis of stdio buffering), the only finding that is relevant to the
> patch under discussion is that MSVC's runtime outputs to `stderr` byte
> by byte (or more correctly, character by character, I guess), and I did
> mention this in the commit message (in the part you quoted).

One other case is impacted, which is:

> > 1) If `stderr` is fully buffered:
> >    the ordering of `stdout` and `stderr` messages could be wrong,
> >    because `stderr` output waits for the buffer to become full.

We'll now output the stderr message closer to its time-of-printing,
which is a good thing (it may still be in a weird place with respect to
a buffered stdout, but getting the error message out ASAP is the best we
can do there).

I agree that the rest of the research is not especially relevant to the
code change (though I'm glad it is available in the list archive).

-Peff
Alexandr Miloslavskiy Oct. 29, 2019, 8:11 p.m. UTC | #4
On 29.10.2019 20:57, Johannes Schindelin wrote:

> It is not entirely obvious to me what exactly you mean by "this",
> assuming that you refer to comparing two pointers via `<`, I agree that
> this is not the best idea, I changed it to `off < pend - msg`. If my
> assumption is correct, however, we are still talking about C, so I
> wonder how this C++ document you linked could bear any relevance:

The linked portion of text speaks about adding to pointer. The paragraph 
is a bit long and my quote was indeed hard to grasp. And yes, sorry, C++ 
habits. However, the same thing applies to C, let's have this link instead:

https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf

Hopefully a better quote this time (c/c++ standard are hard to read for 
mortals like me):

When an expression that has integer type is added to or subtracted from 
a pointer <...> If both the pointer operand and the result point
to elements of the same array object, or one past the last element of 
the array object, <...> otherwise, the behavior is undefined.

> Quite honestly, I purposefully refrained from copying that information.

That's a pity, I've put quite an effort into researching all that.
Alexandr Miloslavskiy Oct. 29, 2019, 8:24 p.m. UTC | #5
On 29.10.2019 21:09, Jeff King wrote:

> I think the issue is not the comparison, but rather that forming the
> pointer "msg + off" is undefined, since its point outside of any object
> (including the "one past" rule). And this is illegal in both C and C++,

Yes, thanks for clarifying.

 > +    p = msg + off < pend ? msg + off : pend - 1;

 > though of course it works fine in practice most of the time.

The easiest thing that an optimizer can do (and that's actually quite 
likely in my understanding) is to conclude that 'msg + off' is the only 
non-UB branch here, therefore 'msg + off < pend' can be taken as always 
true, and then discard 'pend - 1' branch. Afterall, UB serve the biggest 
role in optimizations.

That's even more true now that all parts of equation are local and easy 
to grasp for static analysis.
diff mbox series

Patch

diff --git a/usage.c b/usage.c
index 2fdb20086b..5dfd38acb8 100644
--- a/usage.c
+++ b/usage.c
@@ -9,14 +9,21 @@ 
 void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
-	char *p;
+	size_t off = strlcpy(msg, prefix, sizeof(msg));
+	char *p, *pend = msg + sizeof(msg);
 
-	vsnprintf(msg, sizeof(msg), err, params);
-	for (p = msg; *p; p++) {
+	p = msg + off < pend ? msg + off : pend - 1;
+	if (vsnprintf(p, pend - p, err, params) < 0)
+		return; /* vsnprintf() failed, there is nothing we can do */
+
+	for (; p != pend - 1 && *p; p++) {
 		if (iscntrl(*p) && *p != '\t' && *p != '\n')
 			*p = '?';
 	}
-	fprintf(stderr, "%s%s\n", prefix, msg);
+
+	*(p++) = '\n'; /* we no longer need a NUL */
+	fflush(stderr);
+	xwrite(2, msg, p - msg);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)