Message ID | 20210805000435.10833-1-alex_y_xu@yahoo.ca (mailing list archive) |
---|---|
State | New, archived |
Series | pipe: increase minimum default pipe size to 2 pages |
On Wed, Aug 04, 2021 at 08:04:35PM -0400, Alex Xu (Hello71) wrote:
> Before this patch, the following program prints 4096 and hangs.
> Afterwards, it prints 8192 and exits successfully. Note that you may
> need to increase your RLIMIT_NOFILE before running the program.
>
> int main() {
>     int pipefd[2];
>     for (int i = 0; i < 1025; i++)
>         if (pipe(pipefd) == -1)
>             return 1;
>     size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
>     printf("%zd\n", bufsz);
>     char *buf = calloc(bufsz, 1);
>     write(pipefd[1], buf, bufsz);
>     read(pipefd[0], buf, bufsz-1);
>     write(pipefd[1], buf, 1);
> }
>
> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
> ---

Is this due to the changes that happened in 5.5? If so, a cc: stable
and a fixes tag would be nice to have :)

> See discussion at https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.

This can go up in the changelog text too.

thanks,

greg k-h
Excerpts from Greg KH's message of August 5, 2021 4:35 am:
> On Wed, Aug 04, 2021 at 08:04:35PM -0400, Alex Xu (Hello71) wrote:
>> Before this patch, the following program prints 4096 and hangs.
>> Afterwards, it prints 8192 and exits successfully. Note that you may
>> need to increase your RLIMIT_NOFILE before running the program.
>>
>> int main() {
>>     int pipefd[2];
>>     for (int i = 0; i < 1025; i++)
>>         if (pipe(pipefd) == -1)
>>             return 1;
>>     size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
>>     printf("%zd\n", bufsz);
>>     char *buf = calloc(bufsz, 1);
>>     write(pipefd[1], buf, bufsz);
>>     read(pipefd[0], buf, bufsz-1);
>>     write(pipefd[1], buf, 1);
>> }
>>
>> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>> ---
>
> Is this due to the changes that happened in 5.5? If so, a cc: stable
> and a fixes tag would be nice to have :)
>
>> See discussion at https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.
>
> This can go up in the changelog text too.
>
> thanks,
>
> greg k-h
>

I tested 5.4 and it exhibits the same problem as master using this
non-racy program. I think the problem goes back to v4.5, the first
release with 759c01142a ("pipe: limit the per-user amount of pages
allocated in pipes"). The issue likely became more apparent with the
improvement in pipe performance from v5.5, whereas before that, pipes
were too slow for the issue to manifest in racy environments.

I'll send a new patch with #include lines and a Fixes: 759c01142a. I'm
not 100% sure that it actually goes back that far, but the worst thing
that can plausibly happen is that applications opening very large
numbers of pipes suddenly use slightly more memory. I certainly hope
nobody is relying on pipes randomly blocking roughly 1/4096 of the time.

Regards,
Alex.
On Thu, Aug 5, 2021 at 7:18 AM Alex Xu (Hello71) <alex_y_xu@yahoo.ca> wrote:
>
> I tested 5.4 and it exhibits the same problem as master using this
> non-racy program. I think the problem goes back to v4.5, the first
> release with 759c01142a ("pipe: limit the per-user amount of pages
> allocated in pipes").

Yeah, our pipe buffer allocation strategy has been fairly consistent,
although the exact locking details etc have certainly changed over
time.

I do think the behavior goes back all the way to that "limit to one
single buffer if you hit the pipe size soft limit" commit, because the
thing that example program tests has been true for the whole time,
afaik: first fill up the first pipe buffer completely, then (a) read
everything but one byte, and then (b) try to write another byte.

Doing (a) will leave the pipe buffer still allocated and in use, and
then (b) will fundamentally want to allocate a new buffer for the new
write. Which will obviously not then be allowed if we have said "one
pipe buffer only".

So a lot of the code around pipe buffers has changed over the years,
and the exact patterns and timing and wakeups have been completely
rewritten, but that buffer allocation pattern is pretty fundamental
and I don't think that has changed at all.

(A long LONG time ago, we had only one pipe buffer, and it was a
single circular queue, and you never had this kind of "used up one
buffer, need to allocate a new one" issue, so it's not like this goes
back to Linux 0.01, but the pipe buffers go back a _loong_ time).

Allowing two buffers obviously doesn't change the basic pattern at all
- but it means that we will always allow having at least PIPE_BUF
bytes in the pipe. So you can obviously still trigger that "cannot
write any more, will block any future writes", but at that point it's
a clear user bug in thinking that pipes have some infinite buffer
size.

In contrast, expecting pipes to be able to hold 2 bytes at a time is
quite reasonable, with POSIX guaranteeing PIPE_BUF of at least 512
bytes.

I've applied Alex's patch.

            Linus
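The (a)/(b) sequence described above can be tried on a single pipe, without exhausting the per-user quota, by shrinking the pipe to one page with F_SETPIPE_SZ. The following is a minimal sketch under stated assumptions, not part of the patch: the function name `second_write_would_block` is hypothetical, the pipe is opened with O_NONBLOCK so that a write that would hang reports EAGAIN instead, and the result depends on the kernel's pipe implementation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch).
 * Returns 1 if the final one-byte write would block, 0 if it succeeded,
 * and -1 if the setup failed (for example, F_SETPIPE_SZ is unsupported).
 */
int second_write_would_block(void)
{
    int pipefd[2];
    int result = -1;
    int cap;
    long page;
    ssize_t n;
    char *buf = NULL;

    /* Non-blocking, so a write that would hang returns EAGAIN instead. */
    if (pipe2(pipefd, O_NONBLOCK) == -1)
        return -1;

    /* Shrink the pipe to one page, i.e. a single pipe buffer. */
    page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || fcntl(pipefd[1], F_SETPIPE_SZ, (int)page) == -1)
        goto out;

    /* Ask the kernel what capacity it actually set. */
    cap = fcntl(pipefd[1], F_GETPIPE_SZ);
    if (cap <= 0)
        goto out;

    buf = calloc((size_t)cap, 1);
    if (buf == NULL)
        goto out;

    /* Fill the pipe completely. */
    if (write(pipefd[1], buf, (size_t)cap) != (ssize_t)cap)
        goto out;

    /* (a) Read everything but one byte; the buffer stays in use. */
    if (read(pipefd[0], buf, (size_t)cap - 1) != (ssize_t)cap - 1)
        goto out;

    /* (b) Try to write another byte; this needs a fresh buffer. */
    n = write(pipefd[1], buf, 1);
    if (n == 1)
        result = 0;
    else if (n == -1 && errno == EAGAIN)
        result = 1;

out:
    free(buf);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Calling this from `main` and printing the result should show whether a one-buffer pipe on the running kernel refuses the second write. Because the pipe is shrunk explicitly, the outcome is the same with or without the patch, which only changes the default applied when the soft limit is exceeded.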
diff --git a/fs/pipe.c b/fs/pipe.c
index 9ef4231cce61..8e6ef62aeb1c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -31,6 +31,21 @@
 
 #include "internal.h"
 
+/*
+ * New pipe buffers will be restricted to this size while the user is exceeding
+ * their pipe buffer quota. The general pipe use case needs at least two
+ * buffers: one for data yet to be read, and one for new data. If this is less
+ * than two, then a write to a non-empty pipe may block even if the pipe is not
+ * full. This can occur with GNU make jobserver or similar uses of pipes as
+ * semaphores: multiple processes may be waiting to write tokens back to the
+ * pipe before reading tokens: https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.
+ *
+ * Users can reduce their pipe buffers with F_SETPIPE_SZ below this at their
+ * own risk, namely: pipe writes to non-full pipes may block until the pipe is
+ * emptied.
+ */
+#define PIPE_MIN_DEF_BUFFERS 2
+
 /*
  * The max size that a non-root user is allowed to grow the pipe. Can
  * be set by root in /proc/sys/fs/pipe-max-size
@@ -781,8 +796,8 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
 
 	if (too_many_pipe_buffers_soft(user_bufs) && pipe_is_unprivileged_user()) {
-		user_bufs = account_pipe_buffers(user, pipe_bufs, 1);
-		pipe_bufs = 1;
+		user_bufs = account_pipe_buffers(user, pipe_bufs, PIPE_MIN_DEF_BUFFERS);
+		pipe_bufs = PIPE_MIN_DEF_BUFFERS;
 	}
 
 	if (too_many_pipe_buffers_hard(user_bufs) && pipe_is_unprivileged_user())
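The sizing decision changed by the second hunk can be modelled in userspace as a rough sketch. The function `model_new_pipe_bufs` and its parameters are hypothetical and exist only for illustration. The default of 16 buffers per pipe (PIPE_DEF_BUFFERS) and the "non-zero limit, strictly exceeded" form of the soft-limit check are my understanding of the kernel source rather than something stated in this thread.

```c
#include <stdbool.h>

/* Assumed kernel default: buffers (pages) given to a new pipe. */
#define PIPE_DEF_BUFFERS     16
/* Value introduced by the patch above. */
#define PIPE_MIN_DEF_BUFFERS 2

/*
 * Hypothetical userspace model of the patched check in alloc_pipe_info().
 * user_bufs is the user's total pipe page count, already including the
 * default allocation for the pipe being created. A soft_limit of 0 is
 * treated as "no soft limit".
 */
unsigned long model_new_pipe_bufs(unsigned long user_bufs,
                                  unsigned long soft_limit,
                                  bool unprivileged)
{
    bool over_soft = soft_limit != 0 && user_bufs > soft_limit;

    if (over_soft && unprivileged)
        return PIPE_MIN_DEF_BUFFERS; /* was 1 before the patch */
    return PIPE_DEF_BUFFERS;
}
```

Assuming a soft limit of 16384 pages, 1024 default-sized pipes use exactly 16384 pages and stay within it, while the 1025th pushes the count to 16400 and is restricted. That is consistent with the `i < 1025` loop in the test program.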
Before this patch, the following program prints 4096 and hangs.
Afterwards, it prints 8192 and exits successfully. Note that you may
need to increase your RLIMIT_NOFILE before running the program.

int main() {
    int pipefd[2];
    for (int i = 0; i < 1025; i++)
        if (pipe(pipefd) == -1)
            return 1;
    size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
    printf("%zd\n", bufsz);
    char *buf = calloc(bufsz, 1);
    write(pipefd[1], buf, bufsz);
    read(pipefd[0], buf, bufsz-1);
    write(pipefd[1], buf, 1);
}

Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
---

See discussion at https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.

 fs/pipe.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)