Message ID | e426627e1494e31e548fe044c1c1806ff59340cf.1572356272.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix t5516 flakiness in Visual Studio builds | expand |
On 29.10.2019 14:37, Johannes Schindelin via GitGitGadget wrote: > - vsnprintf(msg, sizeof(msg), err, params); > - for (p = msg; *p; p++) { > + p = msg + off < pend ? msg + off : pend - 1; According to my understanding, this is undefined behavior: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf ... Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined. > The MSVC runtime behavior differs from glibc's with respect to > `fprintf(stderr, ...)` in that the former writes out the message > character by character. Can you please preserve the research text about fprintf() behavior on different platforms that I provided before? Change formatting to what you think fits best. fprintf() has problems with any buffering settings -------------------------------------------------- 1) If `stderr` is fully buffered: the ordering of `stdout` and `stderr` messages could be wrong, because `stderr` output waits for the buffer to become full. 2) If `stderr` has any type of buffering: buffer has fixed size, which could lead to interleaved buffer blocks when two threads/processes write at the same time. 3) If `stderr` is not buffered: Some implementations, such as VC++ and MinGW, literally disable buffering and `fprintf()` will output char-by-char, which leads to unreadable char-interleaved writes if two processes write to `stderr` at the same time (threads are OK because `fprintf()` usually locks `FILE*` in current process). 4) If stderr is line buffered: MinGW/VC++ will enable full buffering instead. See MSDN for `setvbuf()`. fprintf() behavior in git, per platform --------------------------------------- 1) libc - large outputs can be block-interleaved fprintf() enables temporary stream buffering. Code references: buffered_vfprintf() 2) VC++ - char-interleaved fprintf() enables temporary stream buffering, but only if stream was not set to no buffering. This has no effect, because stderr is not buffered by default, and git takes an extra step to ensure that in `swap_osfhnd()`. Code references: _iob[_IOB_ENTRIES] __acrt_stdio_temporary_buffering_guard has_any_buffer() 3) MinGW - char-interleaved (console), full buffering (file) `fprintf()` obeys `stderr` buffering. But it uses old MSVCRT.DLL, which eventually calls `_flsbuf()`, which enables buffering unless `isatty(stderr)` or buffering is disabled. Buffering is not disabled by default for `stderr`. Therefore, buffering is enabled only for file-redirected `stderr`. Code references: __mingw_vfprintf() __pformat_wcputs() _fputc_nolock() _flsbuf() _iob[_IOB_ENTRIES]
Hi Alex, On Tue, 29 Oct 2019, Alexandr Miloslavskiy wrote: > On 29.10.2019 14:37, Johannes Schindelin via GitGitGadget wrote: > > > - vsnprintf(msg, sizeof(msg), err, params); > > - for (p = msg; *p; p++) { > > + p = msg + off < pend ? msg + off : pend - 1; > > According to my understanding, this is undefined behavior: It is not entirely obvious to me what exactly you mean by "this", assuming that you refer to comparing two pointers via `<`, I agree that this is not the best idea, I changed it to `off < pend - msg`. If my assumption is correct, however, we are still talking about C, so I wonder how this C++ document you linked could bear any relevance: > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf > ... Unless both pointers point to elements of the same array object, or one > past the last element of the array object, the behavior is undefined. > > > The MSVC runtime behavior differs from glibc's with respect to > > `fprintf(stderr, ...)` in that the former writes out the message > > character by character. > > Can you please preserve the research text about fprintf() behavior on > different platforms that I provided before? Change formatting to what you > think fits best. Quite honestly, I purposefully refrained from copying that information. While even the following paragraphs are incomplete by necessity (there are many more standard C libraries we try to work against, e.g. musl and newlibc, and it would be insanity to try to cover them all in an analysis of stdio buffering), the only finding that is relevant to the patch under discussion is that MSVC's runtime outputs to `stderr` byte by byte (or more correctly, character by character, I guess), and I did mention this in the commit message (in the part you quoted). Thanks, Dscho > > fprintf() has problems with any buffering settings > -------------------------------------------------- > 1) If `stderr` is fully buffered: > the ordering of `stdout` and `stderr` messages could be wrong, > because `stderr` output waits for the buffer to become full. > 2) If `stderr` has any type of buffering: > buffer has fixed size, which could lead to interleaved buffer blocks > when two threads/processes write at the same time. > 3) If `stderr` is not buffered: > Some implementations, such as VC++ and MinGW, literally disable > buffering and `fprintf()` will output char-by-char, which leads to > unreadable char-interleaved writes if two processes write to > `stderr` at the same time (threads are OK because `fprintf()` > usually locks `FILE*` in current process). > 4) If stderr is line buffered: MinGW/VC++ will enable full buffering > instead. See MSDN for `setvbuf()`. > > fprintf() behavior in git, per platform > --------------------------------------- > 1) libc - large outputs can be block-interleaved > fprintf() enables temporary stream buffering. > Code references: > buffered_vfprintf() > 2) VC++ - char-interleaved > fprintf() enables temporary stream buffering, but only if stream was > not set to no buffering. This has no effect, because stderr is not > buffered by default, and git takes an extra step to ensure that in > `swap_osfhnd()`. > Code references: > _iob[_IOB_ENTRIES] > __acrt_stdio_temporary_buffering_guard > has_any_buffer() > 3) MinGW - char-interleaved (console), full buffering (file) > `fprintf()` obeys `stderr` buffering. But it uses old MSVCRT.DLL, > which eventually calls `_flsbuf()`, which enables buffering unless > `isatty(stderr)` or buffering is disabled. Buffering is not disabled > by default for `stderr`. Therefore, buffering is enabled only for > file-redirected `stderr`. > Code references: > __mingw_vfprintf() > __pformat_wcputs() > _fputc_nolock() > _flsbuf() > _iob[_IOB_ENTRIES] > >
On Tue, Oct 29, 2019 at 08:57:33PM +0100, Johannes Schindelin wrote: > > On 29.10.2019 14:37, Johannes Schindelin via GitGitGadget wrote: > > > > > - vsnprintf(msg, sizeof(msg), err, params); > > > - for (p = msg; *p; p++) { > > > + p = msg + off < pend ? msg + off : pend - 1; > > > > According to my understanding, this is undefined behavior: > > It is not entirely obvious to me what exactly you mean by "this", > assuming that you refer to comparing two pointers via `<`, I agree that > this is not the best idea, I changed it to `off < pend - msg`. If my > assumption is correct, however, we are still talking about C, so I > wonder how this C++ document you linked could bear any relevance: I think the issue is not the comparison, but rather that forming the pointer "msg + off" is undefined, since its point outside of any object (including the "one past" rule). And this is illegal in both C and C++, though of course it works fine in practice most of the time. Doing "off < pend - msg" is legal. Or note that "pend" is just "msg + sizeof(msg)", subtract out the common term and end up with "off < sizeof(msg)". :) > > Can you please preserve the research text about fprintf() behavior on > > different platforms that I provided before? Change formatting to what you > > think fits best. > > Quite honestly, I purposefully refrained from copying that information. > While even the following paragraphs are incomplete by necessity (there > are many more standard C libraries we try to work against, e.g. musl and > newlibc, and it would be insanity to try to cover them all in an > analysis of stdio buffering), the only finding that is relevant to the > patch under discussion is that MSVC's runtime outputs to `stderr` byte > by byte (or more correctly, character by character, I guess), and I did > mention this in the commit message (in the part you quoted). One other case is impacted, which is: > > 1) If `stderr` is fully buffered: > > the ordering of `stdout` and `stderr` messages could be wrong, > > because `stderr` output waits for the buffer to become full. We'll now output the stderr message closer to its time-of-printing, which is a good thing (it may still be in a weird place with respect to a buffered stdout, but getting the error message out ASAP is the best we can do there). I agree that the rest of the research is not especially relevant to the code change (though I'm glad it is available in the list archive). -Peff
On 29.10.2019 20:57, Johannes Schindelin wrote: > It is not entirely obvious to me what exactly you mean by "this", > assuming that you refer to comparing two pointers via `<`, I agree that > this is not the best idea, I changed it to `off < pend - msg`. If my > assumption is correct, however, we are still talking about C, so I > wonder how this C++ document you linked could bear any relevance: The linked portion of text speaks about adding to pointer. The paragraph is a bit long and my quote was indeed hard to grasp. And yes, sorry, C++ habits. However, the same thing applies to C, let's have this link instead: https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf Hopefully a better quote this time (c/c++ standard are hard to read for mortals like me): When an expression that has integer type is added to or subtracted from a pointer <...> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, <...> otherwise, the behavior is undefined. > Quite honestly, I purposefully refrained from copying that information. That's a pity, I've put quite an effort into researching all that.
On 29.10.2019 21:09, Jeff King wrote: > I think the issue is not the comparison, but rather that forming the > pointer "msg + off" is undefined, since its point outside of any object > (including the "one past" rule). And this is illegal in both C and C++, Yes, thanks for clarifying. > + p = msg + off < pend ? msg + off : pend - 1; > though of course it works fine in practice most of the time. The easiest thing that an optimizer can do (and that's actually quite likely in my understanding) is to conclude that 'msg + off' is the only non-UB branch here, therefore 'msg + off < pend' can be taken as always true, and then discard 'pend - 1' branch. Afterall, UB serve the biggest role in optimizations. That's even more true now that all parts of equation are local and easy to grasp for static analysis.
diff --git a/usage.c b/usage.c index 2fdb20086b..5dfd38acb8 100644 --- a/usage.c +++ b/usage.c @@ -9,14 +9,21 @@ void vreportf(const char *prefix, const char *err, va_list params) { char msg[4096]; - char *p; + size_t off = strlcpy(msg, prefix, sizeof(msg)); + char *p, *pend = msg + sizeof(msg); - vsnprintf(msg, sizeof(msg), err, params); - for (p = msg; *p; p++) { + p = msg + off < pend ? msg + off : pend - 1; + if (vsnprintf(p, pend - p, err, params) < 0) + return; /* vsnprintf() failed, there is nothing we can do */ + + for (; p != pend - 1 && *p; p++) { if (iscntrl(*p) && *p != '\t' && *p != '\n') *p = '?'; } - fprintf(stderr, "%s%s\n", prefix, msg); + + *(p++) = '\n'; /* we no longer need a NUL */ + fflush(stderr); + xwrite(2, msg, p - msg); } static NORETURN void usage_builtin(const char *err, va_list params)