diff mbox series

pipe: increase minimum default pipe size to 2 pages

Message ID 20210805000435.10833-1-alex_y_xu@yahoo.ca (mailing list archive)
State New, archived
Headers show
Series pipe: increase minimum default pipe size to 2 pages | expand

Commit Message

Alex Xu (Hello71) Aug. 5, 2021, 12:04 a.m. UTC
Before this patch, the following program prints 4096 and hangs.
Afterwards, it prints 8192 and exits successfully. Note that you may
need to increase your RLIMIT_NOFILE before running the program.

int main() {
    int pipefd[2];
    for (int i = 0; i < 1025; i++)
        if (pipe(pipefd) == -1)
            return 1;
    size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
    printf("%zd\n", bufsz);
    char *buf = calloc(bufsz, 1);
    write(pipefd[1], buf, bufsz);
    read(pipefd[0], buf, bufsz-1);
    write(pipefd[1], buf, 1);
}

Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
---

See discussion at https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.

 fs/pipe.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Greg KH Aug. 5, 2021, 8:35 a.m. UTC | #1
On Wed, Aug 04, 2021 at 08:04:35PM -0400, Alex Xu (Hello71) wrote:
> Before this patch, the following program prints 4096 and hangs.
> Afterwards, it prints 8192 and exits successfully. Note that you may
> need to increase your RLIMIT_NOFILE before running the program.
> 
> int main() {
>     int pipefd[2];
>     for (int i = 0; i < 1025; i++)
>         if (pipe(pipefd) == -1)
>             return 1;
>     size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
>     printf("%zd\n", bufsz);
>     char *buf = calloc(bufsz, 1);
>     write(pipefd[1], buf, bufsz);
>     read(pipefd[0], buf, bufsz-1);
>     write(pipefd[1], buf, 1);
> }
> 
> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
> ---

Is this due to the changes that happened in 5.5?  If so, a cc: stable
and a fixes tag would be nice to have :)

> See discussion at https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.

This can go up in the changelog text too.

thanks,

greg k-h
Alex Xu (Hello71) Aug. 5, 2021, 2:18 p.m. UTC | #2
Excerpts from Greg KH's message of August 5, 2021 4:35 am:
> On Wed, Aug 04, 2021 at 08:04:35PM -0400, Alex Xu (Hello71) wrote:
>> Before this patch, the following program prints 4096 and hangs.
>> Afterwards, it prints 8192 and exits successfully. Note that you may
>> need to increase your RLIMIT_NOFILE before running the program.
>> 
>> int main() {
>>     int pipefd[2];
>>     for (int i = 0; i < 1025; i++)
>>         if (pipe(pipefd) == -1)
>>             return 1;
>>     size_t bufsz = fcntl(pipefd[1], F_GETPIPE_SZ);
>>     printf("%zd\n", bufsz);
>>     char *buf = calloc(bufsz, 1);
>>     write(pipefd[1], buf, bufsz);
>>     read(pipefd[0], buf, bufsz-1);
>>     write(pipefd[1], buf, 1);
>> }
>> 
>> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>> ---
> 
> Is this due to the changes that happened in 5.5?  If so, a cc: stable
> and a fixes tag would be nice to have :)
> 
>> See discussion at https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.
> 
> This can go up in the changelog text too.
> 
> thanks,
> 
> greg k-h
> 

I tested 5.4 and it exhibits the same problem as master using this 
non-racy program. I think the problem goes back to v4.5, the first 
release with 759c01142a ("pipe: limit the per-user amount of pages 
allocated in pipes"). The issue likely become more apparent with the 
improvement in pipe performance from v5.5, whereas before that, pipes 
were too slow for the issue to manifest in racy environments.

I'll send a new patch with #include lines and a Fixes: 759c01142a. I'm 
not 100% sure that it actually goes back that far, but the worst thing 
that can plausibly happen is that applications opening very large 
numbers of pipes suddenly use slightly more memory. I certainly hope 
nobody is relying on pipes randomly blocking roughly 1/4096 of the 
time.

Regards,
Alex.
Linus Torvalds Aug. 5, 2021, 5:41 p.m. UTC | #3
On Thu, Aug 5, 2021 at 7:18 AM Alex Xu (Hello71) <alex_y_xu@yahoo.ca> wrote:
>
> I tested 5.4 and it exhibits the same problem as master using this
> non-racy program. I think the problem goes back to v4.5, the first
> release with 759c01142a ("pipe: limit the per-user amount of pages
> allocated in pipes").

Yeah, our pipe buffer allocation strategy has been fairly consistent,
although the exact locking details etc have certainly changed over
time.

I do think the behavior goes back all the way to that "limit to one
single buffer if you hit the pipe size soft limit" commit, because the
thing that example program tests has been true for the whole time,
afaik: first fill up the first pipe buffer completely, then (a) read
everything but one byte, and then (b) try to write another byte.

Doing (a) will leave the pipe buffer still allocated and in use, and
then (b) will fundamentally want to allocate a new buffer for the new
write.  Which will obviously not then be allowed if we have said "one
pipe buffer only".

So a lot of the code around pipe buffers has changed over the years,
and the exact patterns and timing and wakeups has been completely
rewritten, but that buffer allocation pattern is pretty fundamental
and I don't think that has changed at all.

(A long LONG time ago, we had only one pipe buffer, and it was a
single circular queue, and you never had this kind of "used up one
buffer, need to allocate a new one" issue, so it's not like this goes
back to Linux 0.01, but the pipe buffers go back a _loong_ time).

Allowing two buffers obviously doesn't change the basic pattern at all
- but it means that we will always allow having at least PIPE_BUF
bytes in the pipe. So you can obviously still trigger that "cannot
write any more, will block any future writes", but at that point it's
a clear user bug in thinking that pipes have some infinite buffer
size.

In contrast, expecting pipes to be able to hold 2 bytes at a time is
quite reasonable, with POSIX guaranteeing PIPE_BUF of at least 512
bytes.

I've applied Alex's patch.

                 Linus
diff mbox series

Patch

diff --git a/fs/pipe.c b/fs/pipe.c
index 9ef4231cce61..8e6ef62aeb1c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -31,6 +31,21 @@ 
 
 #include "internal.h"
 
+/*
+ * New pipe buffers will be restricted to this size while the user is exceeding
+ * their pipe buffer quota. The general pipe use case needs at least two
+ * buffers: one for data yet to be read, and one for new data. If this is less
+ * than two, then a write to a non-empty pipe may block even if the pipe is not
+ * full. This can occur with GNU make jobserver or similar uses of pipes as
+ * semaphores: multiple processes may be waiting to write tokens back to the
+ * pipe before reading tokens: https://lore.kernel.org/lkml/1628086770.5rn8p04n6j.none@localhost/.
+ *
+ * Users can reduce their pipe buffers with F_SETPIPE_SZ below this at their
+ * own risk, namely: pipe writes to non-full pipes may block until the pipe is
+ * emptied.
+ */
+#define PIPE_MIN_DEF_BUFFERS 2
+
 /*
  * The max size that a non-root user is allowed to grow the pipe. Can
  * be set by root in /proc/sys/fs/pipe-max-size
@@ -781,8 +796,8 @@  struct pipe_inode_info *alloc_pipe_info(void)
 	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
 
 	if (too_many_pipe_buffers_soft(user_bufs) && pipe_is_unprivileged_user()) {
-		user_bufs = account_pipe_buffers(user, pipe_bufs, 1);
-		pipe_bufs = 1;
+		user_bufs = account_pipe_buffers(user, pipe_bufs, PIPE_MIN_DEF_BUFFERS);
+		pipe_bufs = PIPE_MIN_DEF_BUFFERS;
 	}
 
 	if (too_many_pipe_buffers_hard(user_bufs) && pipe_is_unprivileged_user())