From patchwork Wed Oct 23 20:17:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207583 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39F4813B1 for ; Wed, 23 Oct 2019 20:17:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 18C3921872 for ; Wed, 23 Oct 2019 20:17:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OttP3Ril" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405554AbfJWUR0 (ORCPT ); Wed, 23 Oct 2019 16:17:26 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:34074 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2405621AbfJWUR0 (ORCPT ); Wed, 23 Oct 2019 16:17:26 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861844; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1ofVwMOE/ATx0q/xqI7IgeNPgnC6+tkVYzw7GkmuoBw=; b=OttP3RilWF54D3r2QOAkuXw4B9LeX+HMCwDXBHpUXozzfOtu95dm1OBehPmeMrY3ZRmKDW EGBaMSPymM6Oi2QLLLUNQKLYxhxPHtNQvEAzpeP2F/a1dnuJlMs5mHOx4IpVV1lZ01pKrL HHYCIlr90r3D1neI8udmeWxjod892JY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-151-zmJeb5HCO2eT0FmCK4nFdw-1; Wed, 23 Oct 2019 16:17:19 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9ADC1800D5A; Wed, 23 Oct 2019 20:17:17 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id BE5645C1D4; Wed, 23 Oct 2019 20:17:14 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 01/10] pipe: Reduce #inclusion of pipe_fs_i.h [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:17:14 +0100 Message-ID: <157186183402.3995.13580610732742590957.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-MC-Unique: zmJeb5HCO2eT0FmCK4nFdw-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Remove some #inclusions of linux/pipe_fs_i.h that don't seem to be necessary any more. Signed-off-by: David Howells --- fs/exec.c | 1 - fs/ocfs2/aops.c | 1 - security/smack/smack_lsm.c | 1 - 3 files changed, 3 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 555e93c7dec8..57bc7ef8d31b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -59,7 +59,6 @@ #include #include #include -#include #include #include #include diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 8de1c9d644f6..c50ac6b7415b 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -11,7 +11,6 @@ #include #include #include -#include #include #include #include diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index abeb09c30633..ecea41ce919b 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include From patchwork Wed Oct 23 20:17:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207593 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 46F3E1747 for ; Wed, 23 Oct 2019 20:17:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1C76C21872 for ; Wed, 23 Oct 2019 20:17:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="KUg1t7Ae" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405673AbfJWURd (ORCPT ); Wed, 23 Oct 2019 16:17:33 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:24221 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2405664AbfJWURd (ORCPT ); Wed, 23 Oct 2019 16:17:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861851; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XsR5JVvlz2axvxZwMdTJ7DzX9ow3E3JJWkMAMhNB57M=; b=KUg1t7AentdkJD0/6grYk76NSispahMY2UqFTjQBiuZZAbNZfpaoevNwjFpvDnZqXK5vlA Lgf7DCElYeHcHAiY2kOzUVNhdiY93Cv8Rv0YsPXiDJsTHGKMbivTRvLLF32EMxazKYgL81 i7WXK0cNoilFdFruThFNPeYZuWjzpvQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-223-N0f243VWNhW2UxMcsze37g-1; Wed, 23 Oct 2019 16:17:28 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5148F80183D; Wed, 23 Oct 2019 20:17:26 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 92112194B6; Wed, 23 Oct 2019 20:17:23 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 02/10] Remove the nr_exclusive argument from __wake_up_sync_key() [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:17:22 +0100 Message-ID: <157186184284.3995.13755220637594977072.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-MC-Unique: N0f243VWNhW2UxMcsze37g-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Remove the nr_exclusive argument from __wake_up_sync_key() and derived functions as everything seems to set it to 1. Note also that if it wasn't set to 1, it would clear WF_SYNC anyway. Signed-off-by: David Howells --- include/linux/wait.h | 8 ++++---- kernel/exit.c | 2 +- kernel/sched/wait.c | 14 ++++---------- 3 files changed, 9 insertions(+), 15 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index 3eb7cae8206c..bb7676d396cd 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -201,9 +201,9 @@ void __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head, unsigned int mode, void *key, wait_queue_entry_t *bookmark); -void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key); +void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr); -void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode, int nr); +void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode); #define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL) #define wake_up_nr(x, nr) __wake_up(x, TASK_NORMAL, nr, NULL) @@ -214,7 +214,7 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode, int nr); #define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL) #define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL) #define wake_up_interruptible_all(x) __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL) -#define wake_up_interruptible_sync(x) __wake_up_sync((x), TASK_INTERRUPTIBLE, 1) +#define wake_up_interruptible_sync(x) __wake_up_sync((x), TASK_INTERRUPTIBLE) /* * Wakeup macros to be used to report events to the targets. @@ -228,7 +228,7 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode, int nr); #define wake_up_interruptible_poll(x, m) \ __wake_up(x, TASK_INTERRUPTIBLE, 1, poll_to_key(m)) #define wake_up_interruptible_sync_poll(x, m) \ - __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, poll_to_key(m)) + __wake_up_sync_key((x), TASK_INTERRUPTIBLE, poll_to_key(m)) #define ___wait_cond_timeout(condition) \ ({ \ diff --git a/kernel/exit.c b/kernel/exit.c index a46a50d67002..a1ff25ef050e 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1435,7 +1435,7 @@ static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode, void __wake_up_parent(struct task_struct *p, struct task_struct *parent) { __wake_up_sync_key(&parent->signal->wait_chldexit, - TASK_INTERRUPTIBLE, 1, p); + TASK_INTERRUPTIBLE, p); } static long do_wait(struct wait_opts *wo) diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index c1e566a114ca..b4b52361dab7 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -169,7 +169,6 @@ EXPORT_SYMBOL_GPL(__wake_up_locked_key_bookmark); * __wake_up_sync_key - wake up threads blocked on a waitqueue. * @wq_head: the waitqueue * @mode: which threads - * @nr_exclusive: how many wake-one or wake-many threads to wake up * @key: opaque value to be passed to wakeup targets * * The sync wakeup differs that the waker knows that it will schedule @@ -183,26 +182,21 @@ EXPORT_SYMBOL_GPL(__wake_up_locked_key_bookmark); * accessing the task state. */ void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, - int nr_exclusive, void *key) + void *key) { - int wake_flags = 1; /* XXX WF_SYNC */ - if (unlikely(!wq_head)) return; - if (unlikely(nr_exclusive != 1)) - wake_flags = 0; - - __wake_up_common_lock(wq_head, mode, nr_exclusive, wake_flags, key); + __wake_up_common_lock(wq_head, mode, 1, WF_SYNC, key); } EXPORT_SYMBOL_GPL(__wake_up_sync_key); /* * __wake_up_sync - see __wake_up_sync_key() */ -void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode, int nr_exclusive) +void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode) { - __wake_up_sync_key(wq_head, mode, nr_exclusive, NULL); + __wake_up_sync_key(wq_head, mode, NULL); } EXPORT_SYMBOL_GPL(__wake_up_sync); /* For internal use only */ From patchwork Wed Oct 23 20:17:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B627913 for ; Wed, 23 Oct 2019 20:17:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6876621872 for ; Wed, 23 Oct 2019 20:17:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FToW/3WY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405796AbfJWURp (ORCPT ); Wed, 23 Oct 2019 16:17:45 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:55461 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2389389AbfJWURo (ORCPT ); Wed, 23 Oct 2019 16:17:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861862; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kuVoT4rCIVndCHydJQvniYBLCRBG38HpqhPFp7A4Zoc=; b=FToW/3WYG1QBzuPVMa3u9DdHYGGV+zTYOe4axvkEun7Bd2+vSrZmtYf3gel1df2FlX5T1T IMsvBSL0MAKzEl5Y3QKAhG7hfhEnO6uaDKZDPVaVU+cB2X0/2heMfyx7vLYsAz34fZha+V +L0BWsQyJGt/nvCcRHBBjtznNKr3fvU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-197-LUFVp6z8MrC7FGPX9b4R9A-1; Wed, 23 Oct 2019 16:17:38 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 71DF7800D5A; Wed, 23 Oct 2019 20:17:36 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 440B6194B6; Wed, 23 Oct 2019 20:17:32 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 03/10] Add wake_up_interruptible_sync_poll_locked() [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:17:31 +0100 Message-ID: <157186185154.3995.9042853314167941790.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-MC-Unique: LUFVp6z8MrC7FGPX9b4R9A-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Add a wakeup call for a case whereby the caller already has the waitqueue spinlock held. This can be used by pipes to alter the ring buffer indices and issue a wakeup under the same spinlock. Signed-off-by: David Howells --- include/linux/wait.h | 3 +++ kernel/sched/wait.c | 23 +++++++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/include/linux/wait.h b/include/linux/wait.h index bb7676d396cd..3283c8d02137 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -202,6 +202,7 @@ void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, vo void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head, unsigned int mode, void *key, wait_queue_entry_t *bookmark); void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); +void __wake_up_locked_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr); void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode); @@ -229,6 +230,8 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode); __wake_up(x, TASK_INTERRUPTIBLE, 1, poll_to_key(m)) #define wake_up_interruptible_sync_poll(x, m) \ __wake_up_sync_key((x), TASK_INTERRUPTIBLE, poll_to_key(m)) +#define wake_up_interruptible_sync_poll_locked(x, m) \ + __wake_up_locked_sync_key((x), TASK_INTERRUPTIBLE, poll_to_key(m)) #define ___wait_cond_timeout(condition) \ ({ \ diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index b4b52361dab7..ba059fbfc53a 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -191,6 +191,29 @@ void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, } EXPORT_SYMBOL_GPL(__wake_up_sync_key); +/** + * __wake_up_locked_sync_key - wake up a thread blocked on a locked waitqueue. + * @wq_head: the waitqueue + * @mode: which threads + * @key: opaque value to be passed to wakeup targets + * + * The sync wakeup differs in that the waker knows that it will schedule + * away soon, so while the target thread will be woken up, it will not + * be migrated to another CPU - ie. the two threads are 'synchronized' + * with each other. This can prevent needless bouncing between CPUs. + * + * On UP it can prevent extra preemption. + * + * If this function wakes up a task, it executes a full memory barrier before + * accessing the task state. + */ +void __wake_up_locked_sync_key(struct wait_queue_head *wq_head, + unsigned int mode, void *key) +{ + __wake_up_common(wq_head, mode, 1, WF_SYNC, key, NULL); +} +EXPORT_SYMBOL_GPL(__wake_up_locked_sync_key); + /* * __wake_up_sync - see __wake_up_sync_key() */ From patchwork Wed Oct 23 20:17:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207609 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1FE2B913 for ; Wed, 23 Oct 2019 20:18:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BA63F21872 for ; Wed, 23 Oct 2019 20:17:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CcZd/xN3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405906AbfJWUR6 (ORCPT ); Wed, 23 Oct 2019 16:17:58 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:27000 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2405830AbfJWURx (ORCPT ); Wed, 23 Oct 2019 16:17:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861870; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2uyXUP0BHQMJuz45Ak3uslkSalNYJewLHtyWEXntxNU=; b=CcZd/xN3gfju/ebwC2t0kApitmENCp7wzwF3Qrw2pQNU7NT8CcnH1Kk4KGerU5qTrPl1xv m5wm4tacX9jzYghKdgdL4FXVddN1xP9Kfvk6NxrAOP43SIYzr4t8XnrZYfWZ4ghbBLWIEM Z8fi8fAImmgGtAiXNrvv2NBLVfFXJ/o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-307-9INAwdCuNoOt2NnkpOkkHQ-1; Wed, 23 Oct 2019 16:17:47 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8279A1800E06; Wed, 23 Oct 2019 20:17:45 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6ABBA3DE0; Wed, 23 Oct 2019 20:17:42 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:17:41 +0100 Message-ID: <157186186167.3995.7568100174393739543.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-MC-Unique: 9INAwdCuNoOt2NnkpOkkHQ-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Convert pipes to use head and tail pointers for the buffer ring rather than pointer and length as the latter requires two atomic ops to update (or a combined op) whereas the former only requires one. (1) The head pointer is the point at which production occurs and points to the slot in which the next buffer will be placed. This is equivalent to pipe->curbuf + pipe->nrbufs. The head pointer belongs to the write-side. (2) The tail pointer is the point at which consumption occurs. It points to the next slot to be consumed. This is equivalent to pipe->curbuf. The tail pointer belongs to the read-side. (3) head and tail are allowed to run to UINT_MAX and wrap naturally. They are only masked off when the array is being accessed, e.g.: pipe->bufs[head & mask] This means that it is not necessary to have a dead slot in the ring as head == tail isn't ambiguous. (4) The ring is empty if "head == tail". A helper, pipe_empty(), is provided for this. (5) The occupancy of the ring is "head - tail". A helper, pipe_occupancy(), is provided for this. (6) The number of free slots in the ring is "pipe->ring_size - occupancy". A helper, pipe_space_for_user() is provided to indicate how many slots userspace may use. (7) The ring is full if "head - tail >= pipe->ring_size". A helper, pipe_full(), is provided for this. Signed-off-by: David Howells --- fs/fuse/dev.c | 31 +++-- fs/pipe.c | 169 ++++++++++++++++------------- fs/splice.c | 188 ++++++++++++++++++++------------ include/linux/pipe_fs_i.h | 86 ++++++++++++++- include/linux/uio.h | 4 - lib/iov_iter.c | 266 +++++++++++++++++++++++++-------------------- 6 files changed, 464 insertions(+), 280 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index dadd617d826c..1e4bc27573cc 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -703,7 +703,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) cs->pipebufs++; cs->nr_segs--; } else { - if (cs->nr_segs == cs->pipe->buffers) + if (cs->nr_segs >= cs->pipe->ring_size) return -EIO; page = alloc_page(GFP_HIGHUSER); @@ -879,7 +879,7 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, struct pipe_buffer *buf; int err; - if (cs->nr_segs == cs->pipe->buffers) + if (cs->nr_segs >= cs->pipe->ring_size) return -EIO; err = unlock_request(cs->req); @@ -1341,7 +1341,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, if (!fud) return -EPERM; - bufs = kvmalloc_array(pipe->buffers, sizeof(struct pipe_buffer), + bufs = kvmalloc_array(pipe->ring_size, sizeof(struct pipe_buffer), GFP_KERNEL); if (!bufs) return -ENOMEM; @@ -1353,7 +1353,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, if (ret < 0) goto out; - if (pipe->nrbufs + cs.nr_segs > pipe->buffers) { + if (pipe_occupancy(pipe->head, pipe->tail) + cs.nr_segs > pipe->ring_size) { ret = -EIO; goto out; } @@ -1935,6 +1935,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags) { + unsigned int head, tail, mask, count; unsigned nbuf; unsigned idx; struct pipe_buffer *bufs; @@ -1949,8 +1950,12 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, pipe_lock(pipe); - bufs = kvmalloc_array(pipe->nrbufs, sizeof(struct pipe_buffer), - GFP_KERNEL); + head = pipe->head; + tail = pipe->tail; + mask = pipe->ring_size - 1; + count = head - tail; + + bufs = kvmalloc_array(count, sizeof(struct pipe_buffer), GFP_KERNEL); if (!bufs) { pipe_unlock(pipe); return -ENOMEM; @@ -1958,8 +1963,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, nbuf = 0; rem = 0; - for (idx = 0; idx < pipe->nrbufs && rem < len; idx++) - rem += pipe->bufs[(pipe->curbuf + idx) & (pipe->buffers - 1)].len; + for (idx = tail; idx < head && rem < len; idx++) + rem += pipe->bufs[idx & mask].len; ret = -EINVAL; if (rem < len) @@ -1970,16 +1975,16 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, struct pipe_buffer *ibuf; struct pipe_buffer *obuf; - BUG_ON(nbuf >= pipe->buffers); - BUG_ON(!pipe->nrbufs); - ibuf = &pipe->bufs[pipe->curbuf]; + BUG_ON(nbuf >= pipe->ring_size); + BUG_ON(tail == head); + ibuf = &pipe->bufs[tail & mask]; obuf = &bufs[nbuf]; if (rem >= ibuf->len) { *obuf = *ibuf; ibuf->ops = NULL; - pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); - pipe->nrbufs--; + tail++; + pipe_commit_read(pipe, tail); } else { if (!pipe_buf_get(pipe, ibuf)) goto out_free; diff --git a/fs/pipe.c b/fs/pipe.c index 8a2ab2f974bd..8a0806fe12d3 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -43,10 +43,11 @@ unsigned long pipe_user_pages_hard; unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; /* - * We use a start+len construction, which provides full use of the - * allocated memory. - * -- Florian Coosmann (FGC) - * + * We use head and tail indices that aren't masked off, except at the point of + * dereference, but rather they're allowed to wrap naturally. This means there + * isn't a dead spot in the buffer, provided the ring size < INT_MAX. + * -- David Howells 2019-09-23. + * * Reads with count = 0 should always return 0. * -- Julian Bradfield 1999-06-07. * @@ -285,10 +286,12 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) ret = 0; __pipe_lock(pipe); for (;;) { - int bufs = pipe->nrbufs; - if (bufs) { - int curbuf = pipe->curbuf; - struct pipe_buffer *buf = pipe->bufs + curbuf; + unsigned int head = pipe->head; + unsigned int tail = pipe->tail; + unsigned int mask = pipe->ring_size - 1; + + if (!pipe_empty(head, tail)) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; size_t chars = buf->len; size_t written; int error; @@ -321,17 +324,17 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (!buf->len) { pipe_buf_release(pipe, buf); - curbuf = (curbuf + 1) & (pipe->buffers - 1); - pipe->curbuf = curbuf; - pipe->nrbufs = --bufs; + tail++; + pipe_commit_read(pipe, tail); do_wakeup = 1; } total_len -= chars; if (!total_len) break; /* common path: read succeeded */ + if (!pipe_empty(head, tail)) /* More to do? */ + continue; } - if (bufs) /* More to do? */ - continue; + if (!pipe->writers) break; if (!pipe->waiting_writers) { @@ -380,6 +383,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) { struct file *filp = iocb->ki_filp; struct pipe_inode_info *pipe = filp->private_data; + unsigned int head, tail, max_usage, mask; ssize_t ret = 0; int do_wakeup = 0; size_t total_len = iov_iter_count(from); @@ -397,12 +401,15 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) goto out; } + tail = pipe->tail; + head = pipe->head; + max_usage = pipe->ring_size; + mask = pipe->ring_size - 1; + /* We try to merge small writes */ chars = total_len & (PAGE_SIZE-1); /* size of the last buffer */ - if (pipe->nrbufs && chars != 0) { - int lastbuf = (pipe->curbuf + pipe->nrbufs - 1) & - (pipe->buffers - 1); - struct pipe_buffer *buf = pipe->bufs + lastbuf; + if (!pipe_empty(head, tail) && chars != 0) { + struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask]; int offset = buf->offset + buf->len; if (pipe_buf_can_merge(buf) && offset + chars <= PAGE_SIZE) { @@ -423,18 +430,16 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) } for (;;) { - int bufs; - if (!pipe->readers) { send_sig(SIGPIPE, current, 0); if (!ret) ret = -EPIPE; break; } - bufs = pipe->nrbufs; - if (bufs < pipe->buffers) { - int newbuf = (pipe->curbuf + bufs) & (pipe->buffers-1); - struct pipe_buffer *buf = pipe->bufs + newbuf; + + tail = pipe->tail; + if (!pipe_full(head, tail, max_usage)) { + struct pipe_buffer *buf = &pipe->bufs[head & mask]; struct page *page = pipe->tmp_page; int copied; @@ -470,14 +475,19 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) buf->ops = &packet_pipe_buf_ops; buf->flags = PIPE_BUF_FLAG_PACKET; } - pipe->nrbufs = ++bufs; + + head++; + pipe_commit_write(pipe, head); pipe->tmp_page = NULL; if (!iov_iter_count(from)) break; } - if (bufs < pipe->buffers) + + if (!pipe_full(head, tail, max_usage)) continue; + + /* Wait for buffer space to become available. */ if (filp->f_flags & O_NONBLOCK) { if (!ret) ret = -EAGAIN; @@ -515,17 +525,19 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct pipe_inode_info *pipe = filp->private_data; - int count, buf, nrbufs; + int count, head, tail, mask; switch (cmd) { case FIONREAD: __pipe_lock(pipe); count = 0; - buf = pipe->curbuf; - nrbufs = pipe->nrbufs; - while (--nrbufs >= 0) { - count += pipe->bufs[buf].len; - buf = (buf+1) & (pipe->buffers - 1); + head = pipe->head; + tail = pipe->tail; + mask = pipe->ring_size - 1; + + while (tail != head) { + count += pipe->bufs[tail & mask].len; + tail++; } __pipe_unlock(pipe); @@ -541,21 +553,25 @@ pipe_poll(struct file *filp, poll_table *wait) { __poll_t mask; struct pipe_inode_info *pipe = filp->private_data; - int nrbufs; + unsigned int head = READ_ONCE(pipe->head); + unsigned int tail = READ_ONCE(pipe->tail); poll_wait(filp, &pipe->wait, wait); + BUG_ON(pipe_occupancy(head, tail) > pipe->ring_size); + /* Reading only -- no need for acquiring the semaphore. */ - nrbufs = pipe->nrbufs; mask = 0; if (filp->f_mode & FMODE_READ) { - mask = (nrbufs > 0) ? EPOLLIN | EPOLLRDNORM : 0; + if (!pipe_empty(head, tail)) + mask |= EPOLLIN | EPOLLRDNORM; if (!pipe->writers && filp->f_version != pipe->w_counter) mask |= EPOLLHUP; } if (filp->f_mode & FMODE_WRITE) { - mask |= (nrbufs < pipe->buffers) ? EPOLLOUT | EPOLLWRNORM : 0; + if (!pipe_full(head, tail, pipe->ring_size)) + mask |= EPOLLOUT | EPOLLWRNORM; /* * Most Unices do not set EPOLLERR for FIFOs but on Linux they * behave exactly like pipes for poll(). @@ -679,7 +695,7 @@ struct pipe_inode_info *alloc_pipe_info(void) if (pipe->bufs) { init_waitqueue_head(&pipe->wait); pipe->r_counter = pipe->w_counter = 1; - pipe->buffers = pipe_bufs; + pipe->ring_size = pipe_bufs; pipe->user = user; mutex_init(&pipe->mutex); return pipe; @@ -697,9 +713,9 @@ void free_pipe_info(struct pipe_inode_info *pipe) { int i; - (void) account_pipe_buffers(pipe->user, pipe->buffers, 0); + (void) account_pipe_buffers(pipe->user, pipe->ring_size, 0); free_uid(pipe->user); - for (i = 0; i < pipe->buffers; i++) { + for (i = 0; i < pipe->ring_size; i++) { struct pipe_buffer *buf = pipe->bufs + i; if (buf->ops) pipe_buf_release(pipe, buf); @@ -880,7 +896,7 @@ SYSCALL_DEFINE1(pipe, int __user *, fildes) static int wait_for_partner(struct pipe_inode_info *pipe, unsigned int *cnt) { - int cur = *cnt; + int cur = *cnt; while (cur == *cnt) { pipe_wait(pipe); @@ -955,7 +971,7 @@ static int fifo_open(struct inode *inode, struct file *filp) } } break; - + case FMODE_WRITE: /* * O_WRONLY @@ -975,7 +991,7 @@ static int fifo_open(struct inode *inode, struct file *filp) goto err_wr; } break; - + case FMODE_READ | FMODE_WRITE: /* * O_RDWR @@ -1054,14 +1070,14 @@ unsigned int round_pipe_size(unsigned long size) static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) { struct pipe_buffer *bufs; - unsigned int size, nr_pages; + unsigned int size, nr_slots, head, tail, mask, n; unsigned long user_bufs; long ret = 0; size = round_pipe_size(arg); - nr_pages = size >> PAGE_SHIFT; + nr_slots = size >> PAGE_SHIFT; - if (!nr_pages) + if (!nr_slots) return -EINVAL; /* @@ -1071,13 +1087,13 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) * Decreasing the pipe capacity is always permitted, even * if the user is currently over a limit. */ - if (nr_pages > pipe->buffers && + if (nr_slots > pipe->ring_size && size > pipe_max_size && !capable(CAP_SYS_RESOURCE)) return -EPERM; - user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages); + user_bufs = account_pipe_buffers(pipe->user, pipe->ring_size, nr_slots); - if (nr_pages > pipe->buffers && + if (nr_slots > pipe->ring_size && (too_many_pipe_buffers_hard(user_bufs) || too_many_pipe_buffers_soft(user_bufs)) && is_unprivileged_user()) { @@ -1086,17 +1102,21 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) } /* - * We can shrink the pipe, if arg >= pipe->nrbufs. Since we don't - * expect a lot of shrink+grow operations, just free and allocate - * again like we would do for growing. If the pipe currently + * We can shrink the pipe, if arg is greater than the ring occupancy. + * Since we don't expect a lot of shrink+grow operations, just free and + * allocate again like we would do for growing. If the pipe currently * contains more buffers than arg, then return busy. */ - if (nr_pages < pipe->nrbufs) { + mask = pipe->ring_size - 1; + head = pipe->head; + tail = pipe->tail; + n = pipe_occupancy(pipe->head, pipe->tail); + if (nr_slots < n) { ret = -EBUSY; goto out_revert_acct; } - bufs = kcalloc(nr_pages, sizeof(*bufs), + bufs = kcalloc(nr_slots, sizeof(*bufs), GFP_KERNEL_ACCOUNT | __GFP_NOWARN); if (unlikely(!bufs)) { ret = -ENOMEM; @@ -1105,33 +1125,36 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) /* * The pipe array wraps around, so just start the new one at zero - * and adjust the indexes. + * and adjust the indices. */ - if (pipe->nrbufs) { - unsigned int tail; - unsigned int head; - - tail = pipe->curbuf + pipe->nrbufs; - if (tail < pipe->buffers) - tail = 0; - else - tail &= (pipe->buffers - 1); - - head = pipe->nrbufs - tail; - if (head) - memcpy(bufs, pipe->bufs + pipe->curbuf, head * sizeof(struct pipe_buffer)); - if (tail) - memcpy(bufs + head, pipe->bufs, tail * sizeof(struct pipe_buffer)); + if (n > 0) { + unsigned int h = head & mask; + unsigned int t = tail & mask; + if (h > t) { + memcpy(bufs, pipe->bufs + t, + n * sizeof(struct pipe_buffer)); + } else { + unsigned int tsize = pipe->ring_size - t; + if (h > 0) + memcpy(bufs + tsize, pipe->bufs, + h * sizeof(struct pipe_buffer)); + memcpy(bufs, pipe->bufs + t, + tsize * sizeof(struct pipe_buffer)); + } } - pipe->curbuf = 0; + head = n; + tail = 0; + kfree(pipe->bufs); pipe->bufs = bufs; - pipe->buffers = nr_pages; - return nr_pages * PAGE_SIZE; + pipe->ring_size = nr_slots; + pipe->tail = tail; + pipe->head = head; + return pipe->ring_size * PAGE_SIZE; out_revert_acct: - (void) account_pipe_buffers(pipe->user, nr_pages, pipe->buffers); + (void) account_pipe_buffers(pipe->user, nr_slots, pipe->ring_size); return ret; } @@ -1161,7 +1184,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg) ret = pipe_set_size(pipe, arg); break; case F_GETPIPE_SZ: - ret = pipe->buffers * PAGE_SIZE; + ret = pipe->ring_size * PAGE_SIZE; break; default: ret = -EINVAL; diff --git a/fs/splice.c b/fs/splice.c index 98412721f056..bbc025236ff9 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -185,6 +185,9 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, struct splice_pipe_desc *spd) { unsigned int spd_pages = spd->nr_pages; + unsigned int tail = pipe->tail; + unsigned int head = pipe->head; + unsigned int mask = pipe->ring_size - 1; int ret = 0, page_nr = 0; if (!spd_pages) @@ -196,9 +199,8 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, goto out; } - while (pipe->nrbufs < pipe->buffers) { - int newbuf = (pipe->curbuf + pipe->nrbufs) & (pipe->buffers - 1); - struct pipe_buffer *buf = pipe->bufs + newbuf; + while (!pipe_full(head, tail, pipe->ring_size)) { + struct pipe_buffer *buf = &pipe->bufs[head & mask]; buf->page = spd->pages[page_nr]; buf->offset = spd->partial[page_nr].offset; @@ -207,7 +209,8 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, buf->ops = spd->ops; buf->flags = 0; - pipe->nrbufs++; + head++; + pipe_commit_write(pipe, head); page_nr++; ret += buf->len; @@ -228,17 +231,19 @@ EXPORT_SYMBOL_GPL(splice_to_pipe); ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { + unsigned int head = pipe->head; + unsigned int tail = pipe->tail; + unsigned int mask = pipe->ring_size - 1; int ret; if (unlikely(!pipe->readers)) { send_sig(SIGPIPE, current, 0); ret = -EPIPE; - } else if (pipe->nrbufs == pipe->buffers) { + } else if (pipe_full(head, tail, pipe->ring_size)) { ret = -EAGAIN; } else { - int newbuf = (pipe->curbuf + pipe->nrbufs) & (pipe->buffers - 1); - pipe->bufs[newbuf] = *buf; - pipe->nrbufs++; + pipe->bufs[head & mask] = *buf; + pipe_commit_write(pipe, head + 1); return buf->len; } pipe_buf_release(pipe, buf); @@ -252,14 +257,14 @@ EXPORT_SYMBOL(add_to_pipe); */ int splice_grow_spd(const struct pipe_inode_info *pipe, struct splice_pipe_desc *spd) { - unsigned int buffers = READ_ONCE(pipe->buffers); + unsigned int max_usage = READ_ONCE(pipe->ring_size); - spd->nr_pages_max = buffers; - if (buffers <= PIPE_DEF_BUFFERS) + spd->nr_pages_max = max_usage; + if (max_usage <= PIPE_DEF_BUFFERS) return 0; - spd->pages = kmalloc_array(buffers, sizeof(struct page *), GFP_KERNEL); - spd->partial = kmalloc_array(buffers, sizeof(struct partial_page), + spd->pages = kmalloc_array(max_usage, sizeof(struct page *), GFP_KERNEL); + spd->partial = kmalloc_array(max_usage, sizeof(struct partial_page), GFP_KERNEL); if (spd->pages && spd->partial) @@ -298,10 +303,11 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos, { struct iov_iter to; struct kiocb kiocb; - int idx, ret; + unsigned int i_head; + int ret; iov_iter_pipe(&to, READ, pipe, len); - idx = to.idx; + i_head = to.head; init_sync_kiocb(&kiocb, in); kiocb.ki_pos = *ppos; ret = call_read_iter(in, &kiocb, &to); @@ -309,7 +315,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos, *ppos = kiocb.ki_pos; file_accessed(in); } else if (ret < 0) { - to.idx = idx; + to.head = i_head; to.iov_offset = 0; iov_iter_advance(&to, 0); /* to free what was emitted */ /* @@ -370,11 +376,12 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, struct iov_iter to; struct page **pages; unsigned int nr_pages; + unsigned int mask; size_t offset, base, copied = 0; ssize_t res; int i; - if (pipe->nrbufs == pipe->buffers) + if (pipe_full(pipe->head, pipe->tail, pipe->ring_size)) return -EAGAIN; /* @@ -400,8 +407,9 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, } } - pipe->bufs[to.idx].offset = offset; - pipe->bufs[to.idx].len -= offset; + mask = pipe->ring_size - 1; + pipe->bufs[to.head & mask].offset = offset; + pipe->bufs[to.head & mask].len -= offset; for (i = 0; i < nr_pages; i++) { size_t this_len = min_t(size_t, len, PAGE_SIZE - offset); @@ -443,7 +451,8 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; - if (sd->len < sd->total_len && pipe->nrbufs > 1) + if (sd->len < sd->total_len && + pipe_occupancy(pipe->head, pipe->tail) > 1) more |= MSG_SENDPAGE_NOTLAST; return file->f_op->sendpage(file, buf->page, buf->offset, @@ -481,10 +490,13 @@ static void wakeup_pipe_writers(struct pipe_inode_info *pipe) static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_desc *sd, splice_actor *actor) { + unsigned int head = pipe->head; + unsigned int tail = pipe->tail; + unsigned int mask = pipe->ring_size - 1; int ret; - while (pipe->nrbufs) { - struct pipe_buffer *buf = pipe->bufs + pipe->curbuf; + while (!pipe_empty(tail, head)) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; sd->len = buf->len; if (sd->len > sd->total_len) @@ -511,8 +523,8 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des if (!buf->len) { pipe_buf_release(pipe, buf); - pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); - pipe->nrbufs--; + tail++; + pipe_commit_read(pipe, tail); if (pipe->files) sd->need_wakeup = true; } @@ -543,7 +555,7 @@ static int splice_from_pipe_next(struct pipe_inode_info *pipe, struct splice_des if (signal_pending(current)) return -ERESTARTSYS; - while (!pipe->nrbufs) { + while (pipe_empty(pipe->head, pipe->tail)) { if (!pipe->writers) return 0; @@ -686,7 +698,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, .pos = *ppos, .u.file = out, }; - int nbufs = pipe->buffers; + int nbufs = pipe->ring_size; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); ssize_t ret; @@ -699,16 +711,19 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, splice_from_pipe_begin(&sd); while (sd.total_len) { struct iov_iter from; + unsigned int head = pipe->head; + unsigned int tail = pipe->tail; + unsigned int mask = pipe->ring_size - 1; size_t left; - int n, idx; + int n; ret = splice_from_pipe_next(pipe, &sd); if (ret <= 0) break; - if (unlikely(nbufs < pipe->buffers)) { + if (unlikely(nbufs < pipe->ring_size)) { kfree(array); - nbufs = pipe->buffers; + nbufs = pipe->ring_size; array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); if (!array) { @@ -719,16 +734,13 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, /* build the vector */ left = sd.total_len; - for (n = 0, idx = pipe->curbuf; left && n < pipe->nrbufs; n++, idx++) { - struct pipe_buffer *buf = pipe->bufs + idx; + for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++, n++) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; size_t this_len = buf->len; if (this_len > left) this_len = left; - if (idx == pipe->buffers - 1) - idx = -1; - ret = pipe_buf_confirm(pipe, buf); if (unlikely(ret)) { if (ret == -ENODATA) @@ -752,14 +764,15 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, *ppos = sd.pos; /* dismiss the fully eaten buffers, adjust the partial one */ + tail = pipe->tail; while (ret) { - struct pipe_buffer *buf = pipe->bufs + pipe->curbuf; + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; if (ret >= buf->len) { ret -= buf->len; buf->len = 0; pipe_buf_release(pipe, buf); - pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1); - pipe->nrbufs--; + tail++; + pipe_commit_read(pipe, tail); if (pipe->files) sd.need_wakeup = true; } else { @@ -942,15 +955,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, sd->flags &= ~SPLICE_F_NONBLOCK; more = sd->flags & SPLICE_F_MORE; - WARN_ON_ONCE(pipe->nrbufs != 0); + WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail)); while (len) { + unsigned int p_space; size_t read_len; loff_t pos = sd->pos, prev_pos = pos; /* Don't try to read more the pipe has space for. */ - read_len = min_t(size_t, len, - (pipe->buffers - pipe->nrbufs) << PAGE_SHIFT); + p_space = pipe->ring_size - + pipe_occupancy(pipe->head, pipe->tail); + read_len = min_t(size_t, len, p_space << PAGE_SHIFT); ret = do_splice_to(in, &pos, pipe, read_len, flags); if (unlikely(ret <= 0)) goto out_release; @@ -989,7 +1004,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, } done: - pipe->nrbufs = pipe->curbuf = 0; + pipe->tail = pipe->head = 0; file_accessed(in); return bytes; @@ -998,8 +1013,8 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, * If we did an incomplete transfer we must release * the pipe buffers in question: */ - for (i = 0; i < pipe->buffers; i++) { - struct pipe_buffer *buf = pipe->bufs + i; + for (i = 0; i < pipe->ring_size; i++) { + struct pipe_buffer *buf = &pipe->bufs[i]; if (buf->ops) pipe_buf_release(pipe, buf); @@ -1075,7 +1090,7 @@ static int wait_for_space(struct pipe_inode_info *pipe, unsigned flags) send_sig(SIGPIPE, current, 0); return -EPIPE; } - if (pipe->nrbufs != pipe->buffers) + if (!pipe_full(pipe->head, pipe->tail, pipe->ring_size)) return 0; if (flags & SPLICE_F_NONBLOCK) return -EAGAIN; @@ -1442,16 +1457,16 @@ static int ipipe_prep(struct pipe_inode_info *pipe, unsigned int flags) int ret; /* - * Check ->nrbufs without the inode lock first. This function + * Check the pipe occupancy without the inode lock first. This function * is speculative anyways, so missing one is ok. */ - if (pipe->nrbufs) + if (!pipe_empty(pipe->head, pipe->tail)) return 0; ret = 0; pipe_lock(pipe); - while (!pipe->nrbufs) { + while (pipe_empty(pipe->head, pipe->tail)) { if (signal_pending(current)) { ret = -ERESTARTSYS; break; @@ -1483,13 +1498,13 @@ static int opipe_prep(struct pipe_inode_info *pipe, unsigned int flags) * Check ->nrbufs without the inode lock first. This function * is speculative anyways, so missing one is ok. */ - if (pipe->nrbufs < pipe->buffers) + if (pipe_full(pipe->head, pipe->tail, pipe->ring_size)) return 0; ret = 0; pipe_lock(pipe); - while (pipe->nrbufs >= pipe->buffers) { + while (pipe_full(pipe->head, pipe->tail, pipe->ring_size)) { if (!pipe->readers) { send_sig(SIGPIPE, current, 0); ret = -EPIPE; @@ -1520,7 +1535,10 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, size_t len, unsigned int flags) { struct pipe_buffer *ibuf, *obuf; - int ret = 0, nbuf; + unsigned int i_head, o_head; + unsigned int i_tail, o_tail; + unsigned int i_mask, o_mask; + int ret = 0; bool input_wakeup = false; @@ -1540,7 +1558,14 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, */ pipe_double_lock(ipipe, opipe); + i_tail = ipipe->tail; + i_mask = ipipe->ring_size - 1; + o_head = opipe->head; + o_mask = opipe->ring_size - 1; + do { + size_t o_len; + if (!opipe->readers) { send_sig(SIGPIPE, current, 0); if (!ret) @@ -1548,14 +1573,18 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, break; } - if (!ipipe->nrbufs && !ipipe->writers) + i_head = ipipe->head; + o_tail = opipe->tail; + + if (pipe_empty(i_head, i_tail) && !ipipe->writers) break; /* * Cannot make any progress, because either the input * pipe is empty or the output pipe is full. */ - if (!ipipe->nrbufs || opipe->nrbufs >= opipe->buffers) { + if (pipe_empty(i_head, i_tail) || + pipe_full(o_head, o_tail, opipe->ring_size)) { /* Already processed some buffers, break */ if (ret) break; @@ -1575,9 +1604,8 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, goto retry; } - ibuf = ipipe->bufs + ipipe->curbuf; - nbuf = (opipe->curbuf + opipe->nrbufs) & (opipe->buffers - 1); - obuf = opipe->bufs + nbuf; + ibuf = &ipipe->bufs[i_tail & i_mask]; + obuf = &opipe->bufs[o_head & o_mask]; if (len >= ibuf->len) { /* @@ -1585,10 +1613,12 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, */ *obuf = *ibuf; ibuf->ops = NULL; - opipe->nrbufs++; - ipipe->curbuf = (ipipe->curbuf + 1) & (ipipe->buffers - 1); - ipipe->nrbufs--; + i_tail++; + pipe_commit_read(ipipe, i_tail); input_wakeup = true; + o_len = obuf->len; + o_head++; + pipe_commit_write(opipe, o_head); } else { /* * Get a reference to this pipe buffer, @@ -1610,12 +1640,14 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, pipe_buf_mark_unmergeable(obuf); obuf->len = len; - opipe->nrbufs++; - ibuf->offset += obuf->len; - ibuf->len -= obuf->len; + ibuf->offset += len; + ibuf->len -= len; + o_len = len; + o_head++; + pipe_commit_write(opipe, o_head); } - ret += obuf->len; - len -= obuf->len; + ret += o_len; + len -= o_len; } while (len); pipe_unlock(ipipe); @@ -1641,7 +1673,10 @@ static int link_pipe(struct pipe_inode_info *ipipe, size_t len, unsigned int flags) { struct pipe_buffer *ibuf, *obuf; - int ret = 0, i = 0, nbuf; + unsigned int i_head, o_head; + unsigned int i_tail, o_tail; + unsigned int i_mask, o_mask; + int ret = 0; /* * Potential ABBA deadlock, work around it by ordering lock @@ -1650,6 +1685,11 @@ static int link_pipe(struct pipe_inode_info *ipipe, */ pipe_double_lock(ipipe, opipe); + i_tail = ipipe->tail; + i_mask = ipipe->ring_size - 1; + o_head = opipe->head; + o_mask = opipe->ring_size - 1; + do { if (!opipe->readers) { send_sig(SIGPIPE, current, 0); @@ -1658,15 +1698,19 @@ static int link_pipe(struct pipe_inode_info *ipipe, break; } + i_head = ipipe->head; + o_tail = opipe->tail; + /* - * If we have iterated all input buffers or ran out of + * If we have iterated all input buffers or run out of * output room, break. */ - if (i >= ipipe->nrbufs || opipe->nrbufs >= opipe->buffers) + if (pipe_empty(i_head, i_tail) || + pipe_full(o_head, o_tail, opipe->ring_size)) break; - ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (ipipe->buffers-1)); - nbuf = (opipe->curbuf + opipe->nrbufs) & (opipe->buffers - 1); + ibuf = &ipipe->bufs[i_tail & i_mask]; + obuf = &opipe->bufs[o_head & o_mask]; /* * Get a reference to this pipe buffer, @@ -1678,7 +1722,6 @@ static int link_pipe(struct pipe_inode_info *ipipe, break; } - obuf = opipe->bufs + nbuf; *obuf = *ibuf; /* @@ -1691,11 +1734,12 @@ static int link_pipe(struct pipe_inode_info *ipipe, if (obuf->len > len) obuf->len = len; - - opipe->nrbufs++; ret += obuf->len; len -= obuf->len; - i++; + + o_head++; + pipe_commit_write(opipe, o_head); + i_tail++; } while (len); /* diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 5c626fdc10db..fad096697ff5 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -30,9 +30,9 @@ struct pipe_buffer { * struct pipe_inode_info - a linux kernel pipe * @mutex: mutex protecting the whole thing * @wait: reader/writer wait point in case of empty/full pipe - * @nrbufs: the number of non-empty pipe buffers in this pipe - * @buffers: total number of buffers (should be a power of 2) - * @curbuf: the current pipe buffer entry + * @head: The point of buffer production + * @tail: The point of buffer consumption + * @ring_size: total number of buffers (should be a power of 2) * @tmp_page: cached released page * @readers: number of current readers of this pipe * @writers: number of current writers of this pipe @@ -48,7 +48,9 @@ struct pipe_buffer { struct pipe_inode_info { struct mutex mutex; wait_queue_head_t wait; - unsigned int nrbufs, curbuf, buffers; + unsigned int head; + unsigned int tail; + unsigned int ring_size; unsigned int readers; unsigned int writers; unsigned int files; @@ -104,6 +106,82 @@ struct pipe_buf_operations { bool (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; +/** + * pipe_commit_read - Set pipe buffer tail pointer in the read-side + * @pipe: The pipe in question + * @tail: The new tail pointer + * + * Update the tail pointer in the read-side code after a read has taken place. + */ +static inline void pipe_commit_read(struct pipe_inode_info *pipe, + unsigned int tail) +{ + pipe->tail = tail; +} + +/** + * pipe_commit_write - Set pipe buffer head pointer in the write-side + * @pipe: The pipe in question + * @head: The new head pointer + * + * Update the head pointer in the write-side code after a write has taken place. + */ +static inline void pipe_commit_write(struct pipe_inode_info *pipe, + unsigned int head) +{ + pipe->head = head; +} + +/** + * pipe_empty - Return true if the pipe is empty + * @head: The pipe ring head pointer + * @tail: The pipe ring tail pointer + */ +static inline bool pipe_empty(unsigned int head, unsigned int tail) +{ + return head == tail; +} + +/** + * pipe_occupancy - Return number of slots used in the pipe + * @head: The pipe ring head pointer + * @tail: The pipe ring tail pointer + */ +static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail) +{ + return head - tail; +} + +/** + * pipe_full - Return true if the pipe is full + * @head: The pipe ring head pointer + * @tail: The pipe ring tail pointer + * @limit: The maximum amount of slots available. + */ +static inline bool pipe_full(unsigned int head, unsigned int tail, + unsigned int limit) +{ + return pipe_occupancy(head, tail) >= limit; +} + +/** + * pipe_space_for_user - Return number of slots available to userspace + * @head: The pipe ring head pointer + * @tail: The pipe ring tail pointer + * @pipe: The pipe info structure + */ +static inline unsigned int pipe_space_for_user(unsigned int head, unsigned int tail, + struct pipe_inode_info *pipe) +{ + unsigned int p_occupancy, p_space; + + p_occupancy = pipe_occupancy(head, tail); + if (p_occupancy >= pipe->ring_size) + return 0; + p_space = pipe->ring_size - p_occupancy; + return p_space; +} + /** * pipe_buf_get - get a reference to a pipe_buffer * @pipe: the pipe that the buffer belongs to diff --git a/include/linux/uio.h b/include/linux/uio.h index ab5f523bc0df..9576fd8158d7 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -45,8 +45,8 @@ struct iov_iter { union { unsigned long nr_segs; struct { - int idx; - int start_idx; + unsigned int head; + unsigned int start_head; }; }; }; diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 639d5e7014c1..150a40bdb21a 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -325,28 +325,33 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t static bool sanity(const struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; - int idx = i->idx; - int next = pipe->curbuf + pipe->nrbufs; + unsigned int p_head = pipe->head; + unsigned int p_tail = pipe->tail; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int p_occupancy = pipe_occupancy(p_head, p_tail); + unsigned int i_head = i->head; + unsigned int idx; + if (i->iov_offset) { struct pipe_buffer *p; - if (unlikely(!pipe->nrbufs)) + if (unlikely(p_occupancy == 0)) goto Bad; // pipe must be non-empty - if (unlikely(idx != ((next - 1) & (pipe->buffers - 1)))) + if (unlikely(i_head != p_head - 1)) goto Bad; // must be at the last buffer... - p = &pipe->bufs[idx]; + p = &pipe->bufs[i_head & p_mask]; if (unlikely(p->offset + p->len != i->iov_offset)) goto Bad; // ... at the end of segment } else { - if (idx != (next & (pipe->buffers - 1))) + if (i_head != p_head) goto Bad; // must be right after the last buffer } return true; Bad: - printk(KERN_ERR "idx = %d, offset = %zd\n", i->idx, i->iov_offset); - printk(KERN_ERR "curbuf = %d, nrbufs = %d, buffers = %d\n", - pipe->curbuf, pipe->nrbufs, pipe->buffers); - for (idx = 0; idx < pipe->buffers; idx++) + printk(KERN_ERR "idx = %d, offset = %zd\n", i_head, i->iov_offset); + printk(KERN_ERR "head = %d, tail = %d, buffers = %d\n", + p_head, p_tail, pipe->ring_size); + for (idx = 0; idx < pipe->ring_size; idx++) printk(KERN_ERR "[%p %p %d %d]\n", pipe->bufs[idx].ops, pipe->bufs[idx].page, @@ -359,18 +364,15 @@ static bool sanity(const struct iov_iter *i) #define sanity(i) true #endif -static inline int next_idx(int idx, struct pipe_inode_info *pipe) -{ - return (idx + 1) & (pipe->buffers - 1); -} - static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes, struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; struct pipe_buffer *buf; + unsigned int p_tail = pipe->tail; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head = i->head; size_t off; - int idx; if (unlikely(bytes > i->count)) bytes = i->count; @@ -382,8 +384,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by return 0; off = i->iov_offset; - idx = i->idx; - buf = &pipe->bufs[idx]; + buf = &pipe->bufs[i_head & p_mask]; if (off) { if (offset == off && buf->page == page) { /* merge with the last one */ @@ -391,18 +392,21 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by i->iov_offset += bytes; goto out; } - idx = next_idx(idx, pipe); - buf = &pipe->bufs[idx]; + i_head++; + buf = &pipe->bufs[i_head & p_mask]; } - if (idx == pipe->curbuf && pipe->nrbufs) + if (pipe_full(i_head, p_tail, pipe->ring_size)) return 0; - pipe->nrbufs++; + buf->ops = &page_cache_pipe_buf_ops; - get_page(buf->page = page); + get_page(page); + buf->page = page; buf->offset = offset; buf->len = bytes; + + pipe_commit_read(pipe, i_head); i->iov_offset = offset + bytes; - i->idx = idx; + i->head = i_head; out: i->count -= bytes; return bytes; @@ -480,24 +484,30 @@ static inline bool allocated(struct pipe_buffer *buf) return buf->ops == &default_pipe_buf_ops; } -static inline void data_start(const struct iov_iter *i, int *idxp, size_t *offp) +static inline void data_start(const struct iov_iter *i, + unsigned int *iter_headp, size_t *offp) { + unsigned int p_mask = i->pipe->ring_size - 1; + unsigned int iter_head = i->head; size_t off = i->iov_offset; - int idx = i->idx; - if (off && (!allocated(&i->pipe->bufs[idx]) || off == PAGE_SIZE)) { - idx = next_idx(idx, i->pipe); + + if (off && (!allocated(&i->pipe->bufs[iter_head & p_mask]) || + off == PAGE_SIZE)) { + iter_head++; off = 0; } - *idxp = idx; + *iter_headp = iter_head; *offp = off; } static size_t push_pipe(struct iov_iter *i, size_t size, - int *idxp, size_t *offp) + int *iter_headp, size_t *offp) { struct pipe_inode_info *pipe = i->pipe; + unsigned int p_tail = pipe->tail; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int iter_head; size_t off; - int idx; ssize_t left; if (unlikely(size > i->count)) @@ -506,33 +516,34 @@ static size_t push_pipe(struct iov_iter *i, size_t size, return 0; left = size; - data_start(i, &idx, &off); - *idxp = idx; + data_start(i, &iter_head, &off); + *iter_headp = iter_head; *offp = off; if (off) { left -= PAGE_SIZE - off; if (left <= 0) { - pipe->bufs[idx].len += size; + pipe->bufs[iter_head & p_mask].len += size; return size; } - pipe->bufs[idx].len = PAGE_SIZE; - idx = next_idx(idx, pipe); + pipe->bufs[iter_head & p_mask].len = PAGE_SIZE; + iter_head++; } - while (idx != pipe->curbuf || !pipe->nrbufs) { + while (!pipe_full(iter_head, p_tail, pipe->ring_size)) { + struct pipe_buffer *buf = &pipe->bufs[iter_head & p_mask]; struct page *page = alloc_page(GFP_USER); if (!page) break; - pipe->nrbufs++; - pipe->bufs[idx].ops = &default_pipe_buf_ops; - pipe->bufs[idx].page = page; - pipe->bufs[idx].offset = 0; - if (left <= PAGE_SIZE) { - pipe->bufs[idx].len = left; + + buf->ops = &default_pipe_buf_ops; + buf->page = page; + buf->offset = 0; + buf->len = max_t(ssize_t, left, PAGE_SIZE); + left -= buf->len; + iter_head++; + pipe_commit_write(pipe, iter_head); + + if (left == 0) return size; - } - pipe->bufs[idx].len = PAGE_SIZE; - left -= PAGE_SIZE; - idx = next_idx(idx, pipe); } return size - left; } @@ -541,23 +552,26 @@ static size_t copy_pipe_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head; size_t n, off; - int idx; if (!sanity(i)) return 0; - bytes = n = push_pipe(i, bytes, &idx, &off); + bytes = n = push_pipe(i, bytes, &i_head, &off); if (unlikely(!n)) return 0; - for ( ; n; idx = next_idx(idx, pipe), off = 0) { + do { size_t chunk = min_t(size_t, n, PAGE_SIZE - off); - memcpy_to_page(pipe->bufs[idx].page, off, addr, chunk); - i->idx = idx; + memcpy_to_page(pipe->bufs[i_head & p_mask].page, off, addr, chunk); + i->head = i_head; i->iov_offset = off + chunk; n -= chunk; addr += chunk; - } + off = 0; + i_head++; + } while (n); i->count -= bytes; return bytes; } @@ -573,28 +587,31 @@ static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes, __wsum *csum, struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head; size_t n, r; size_t off = 0; __wsum sum = *csum; - int idx; if (!sanity(i)) return 0; - bytes = n = push_pipe(i, bytes, &idx, &r); + bytes = n = push_pipe(i, bytes, &i_head, &r); if (unlikely(!n)) return 0; - for ( ; n; idx = next_idx(idx, pipe), r = 0) { + do { size_t chunk = min_t(size_t, n, PAGE_SIZE - r); - char *p = kmap_atomic(pipe->bufs[idx].page); + char *p = kmap_atomic(pipe->bufs[i_head & p_mask].page); sum = csum_and_memcpy(p + r, addr, chunk, sum, off); kunmap_atomic(p); - i->idx = idx; + i->head = i_head; i->iov_offset = r + chunk; n -= chunk; off += chunk; addr += chunk; - } + r = 0; + i_head++; + } while (n); i->count -= bytes; *csum = sum; return bytes; @@ -645,29 +662,32 @@ static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head; size_t n, off, xfer = 0; - int idx; if (!sanity(i)) return 0; - bytes = n = push_pipe(i, bytes, &idx, &off); + bytes = n = push_pipe(i, bytes, &i_head, &off); if (unlikely(!n)) return 0; - for ( ; n; idx = next_idx(idx, pipe), off = 0) { + do { size_t chunk = min_t(size_t, n, PAGE_SIZE - off); unsigned long rem; - rem = memcpy_mcsafe_to_page(pipe->bufs[idx].page, off, addr, - chunk); - i->idx = idx; + rem = memcpy_mcsafe_to_page(pipe->bufs[i_head & p_mask].page, + off, addr, chunk); + i->head = i_head; i->iov_offset = off + chunk - rem; xfer += chunk - rem; if (rem) break; n -= chunk; addr += chunk; - } + off = 0; + i_head++; + } while (n); i->count -= xfer; return xfer; } @@ -925,6 +945,8 @@ EXPORT_SYMBOL(copy_page_from_iter); static size_t pipe_zero(size_t bytes, struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head; size_t n, off; int idx; @@ -935,13 +957,15 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i) if (unlikely(!n)) return 0; - for ( ; n; idx = next_idx(idx, pipe), off = 0) { + do { size_t chunk = min_t(size_t, n, PAGE_SIZE - off); - memzero_page(pipe->bufs[idx].page, off, chunk); - i->idx = idx; + memzero_page(pipe->bufs[i_head & p_mask].page, off, chunk); + i->head = i_head; i->iov_offset = off + chunk; n -= chunk; - } + off = 0; + i_head++; + } while (n); i->count -= bytes; return bytes; } @@ -987,20 +1011,26 @@ EXPORT_SYMBOL(iov_iter_copy_from_user_atomic); static inline void pipe_truncate(struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; - if (pipe->nrbufs) { + unsigned int p_tail = pipe->tail; + unsigned int p_head = pipe->head; + unsigned int p_mask = pipe->ring_size - 1; + + if (!pipe_empty(p_head, p_tail)) { + struct pipe_buffer *buf; + unsigned int i_head = i->head; size_t off = i->iov_offset; - int idx = i->idx; - int nrbufs = (idx - pipe->curbuf) & (pipe->buffers - 1); + if (off) { - pipe->bufs[idx].len = off - pipe->bufs[idx].offset; - idx = next_idx(idx, pipe); - nrbufs++; + buf = &pipe->bufs[i_head & p_mask]; + buf->len = off - buf->offset; + i_head++; } - while (pipe->nrbufs > nrbufs) { - pipe_buf_release(pipe, &pipe->bufs[idx]); - idx = next_idx(idx, pipe); - pipe->nrbufs--; + while (p_head != i_head) { + p_head--; + pipe_buf_release(pipe, &pipe->bufs[p_head & p_mask]); } + + pipe_commit_write(pipe, p_head); } } @@ -1011,18 +1041,20 @@ static void pipe_advance(struct iov_iter *i, size_t size) size = i->count; if (size) { struct pipe_buffer *buf; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head = i->head; size_t off = i->iov_offset, left = size; - int idx = i->idx; + if (off) /* make it relative to the beginning of buffer */ - left += off - pipe->bufs[idx].offset; + left += off - pipe->bufs[i_head & p_mask].offset; while (1) { - buf = &pipe->bufs[idx]; + buf = &pipe->bufs[i_head & p_mask]; if (left <= buf->len) break; left -= buf->len; - idx = next_idx(idx, pipe); + i_head++; } - i->idx = idx; + i->head = i_head; i->iov_offset = buf->offset + left; } i->count -= size; @@ -1053,25 +1085,27 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll) i->count += unroll; if (unlikely(iov_iter_is_pipe(i))) { struct pipe_inode_info *pipe = i->pipe; - int idx = i->idx; + unsigned int p_mask = pipe->ring_size - 1; + unsigned int i_head = i->head; size_t off = i->iov_offset; while (1) { - size_t n = off - pipe->bufs[idx].offset; + struct pipe_buffer *b = &pipe->bufs[i_head & p_mask]; + size_t n = off - b->offset; if (unroll < n) { off -= unroll; break; } unroll -= n; - if (!unroll && idx == i->start_idx) { + if (!unroll && i_head == i->start_head) { off = 0; break; } - if (!idx--) - idx = pipe->buffers - 1; - off = pipe->bufs[idx].offset + pipe->bufs[idx].len; + i_head--; + b = &pipe->bufs[i_head & p_mask]; + off = b->offset + b->len; } i->iov_offset = off; - i->idx = idx; + i->head = i_head; pipe_truncate(i); return; } @@ -1159,13 +1193,13 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, size_t count) { BUG_ON(direction != READ); - WARN_ON(pipe->nrbufs == pipe->buffers); + WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size)); i->type = ITER_PIPE | READ; i->pipe = pipe; - i->idx = (pipe->curbuf + pipe->nrbufs) & (pipe->buffers - 1); + i->head = pipe->head; i->iov_offset = 0; i->count = count; - i->start_idx = i->idx; + i->start_head = i->head; } EXPORT_SYMBOL(iov_iter_pipe); @@ -1189,11 +1223,12 @@ EXPORT_SYMBOL(iov_iter_discard); unsigned long iov_iter_alignment(const struct iov_iter *i) { + unsigned int p_mask = i->pipe->ring_size - 1; unsigned long res = 0; size_t size = i->count; if (unlikely(iov_iter_is_pipe(i))) { - if (size && i->iov_offset && allocated(&i->pipe->bufs[i->idx])) + if (size && i->iov_offset && allocated(&i->pipe->bufs[i->head & p_mask])) return size | i->iov_offset; return size; } @@ -1231,19 +1266,20 @@ EXPORT_SYMBOL(iov_iter_gap_alignment); static inline ssize_t __pipe_get_pages(struct iov_iter *i, size_t maxsize, struct page **pages, - int idx, + int iter_head, size_t *start) { struct pipe_inode_info *pipe = i->pipe; - ssize_t n = push_pipe(i, maxsize, &idx, start); + unsigned int p_mask = pipe->ring_size - 1; + ssize_t n = push_pipe(i, maxsize, &iter_head, start); if (!n) return -EFAULT; maxsize = n; n += *start; while (n > 0) { - get_page(*pages++ = pipe->bufs[idx].page); - idx = next_idx(idx, pipe); + get_page(*pages++ = pipe->bufs[iter_head & p_mask].page); + iter_head++; n -= PAGE_SIZE; } @@ -1254,9 +1290,8 @@ static ssize_t pipe_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start) { - unsigned npages; + unsigned int iter_head, npages; size_t capacity; - int idx; if (!maxsize) return 0; @@ -1264,12 +1299,12 @@ static ssize_t pipe_get_pages(struct iov_iter *i, if (!sanity(i)) return -EFAULT; - data_start(i, &idx, start); - /* some of this one + all after this one */ - npages = ((i->pipe->curbuf - idx - 1) & (i->pipe->buffers - 1)) + 1; - capacity = min(npages,maxpages) * PAGE_SIZE - *start; + data_start(i, &iter_head, start); + /* Amount of free space: some of this one + all after this one */ + npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe); + capacity = min(npages, maxpages) * PAGE_SIZE - *start; - return __pipe_get_pages(i, min(maxsize, capacity), pages, idx, start); + return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start); } ssize_t iov_iter_get_pages(struct iov_iter *i, @@ -1323,9 +1358,8 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i, size_t *start) { struct page **p; + unsigned int iter_head, npages; ssize_t n; - int idx; - int npages; if (!maxsize) return 0; @@ -1333,9 +1367,9 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i, if (!sanity(i)) return -EFAULT; - data_start(i, &idx, start); - /* some of this one + all after this one */ - npages = ((i->pipe->curbuf - idx - 1) & (i->pipe->buffers - 1)) + 1; + data_start(i, &iter_head, start); + /* Amount of free space: some of this one + all after this one */ + npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe); n = npages * PAGE_SIZE - *start; if (maxsize > n) maxsize = n; @@ -1344,7 +1378,7 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i, p = get_pages_array(npages); if (!p) return -ENOMEM; - n = __pipe_get_pages(i, maxsize, p, idx, start); + n = __pipe_get_pages(i, maxsize, p, iter_head, start); if (n > 0) *pages = p; else @@ -1560,15 +1594,15 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) if (unlikely(iov_iter_is_pipe(i))) { struct pipe_inode_info *pipe = i->pipe; + unsigned int iter_head; size_t off; - int idx; if (!sanity(i)) return 0; - data_start(i, &idx, &off); + data_start(i, &iter_head, &off); /* some of this one + all after this one */ - npages = ((pipe->curbuf - idx - 1) & (pipe->buffers - 1)) + 1; + npages = pipe_space_for_user(iter_head, pipe->tail, pipe); if (npages >= maxpages) return maxpages; } else iterate_all_kinds(i, size, v, ({ From patchwork Wed Oct 23 20:17:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207613 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E0A313B1 for ; Wed, 23 Oct 2019 20:18:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 487D221A4A for ; Wed, 23 Oct 2019 20:18:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="A+exkiOv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405937AbfJWUSF (ORCPT ); Wed, 23 Oct 2019 16:18:05 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:51293 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2405913AbfJWUSE (ORCPT ); Wed, 23 Oct 2019 16:18:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861882; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mAR3jWLLwceyOI1YGmnewtsmpk61ffiuSTaVd9UeOEc=; b=A+exkiOvNHKuvkb2/5HjmLS2OTP//M8AatXoGzUXo2TucZ9oy8+0l3DHMFNYIBtn/kk//p xhRFJkIZ9EndV3ngUSGKzKnL41+UhK3xDaUZBU6jHZWLkYu8cvn4TnSExJO4JmtFWfpUXL lSRL1cW5DMhBWYFWz5Tx5G/NWDxpPJo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-229-K25Bhp62OCCbaB28Qh2ePA-1; Wed, 23 Oct 2019 16:17:58 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CDA895E4; Wed, 23 Oct 2019 20:17:56 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7BC1A194B6; Wed, 23 Oct 2019 20:17:51 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 05/10] pipe: Allow pipes to have kernel-reserved slots [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:17:50 +0100 Message-ID: <157186187074.3995.5299010277211516015.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-MC-Unique: K25Bhp62OCCbaB28Qh2ePA-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Split pipe->ring_size into two numbers: (1) pipe->ring_size - indicates the hard size of the pipe ring. (2) pipe->max_usage - indicates the maximum number of pipe ring slots that userspace orchestrated events can fill. This allows for a pipe that is both writable by the general kernel notification facility and by userspace, allowing plenty of ring space for notifications to be added whilst preventing userspace from being able to pin too much unswappable kernel space. Signed-off-by: David Howells --- fs/fuse/dev.c | 8 ++++---- fs/pipe.c | 10 ++++++---- fs/splice.c | 26 +++++++++++++------------- include/linux/pipe_fs_i.h | 6 +++++- lib/iov_iter.c | 4 ++-- 5 files changed, 30 insertions(+), 24 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 1e4bc27573cc..5ef57a322cb8 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -703,7 +703,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) cs->pipebufs++; cs->nr_segs--; } else { - if (cs->nr_segs >= cs->pipe->ring_size) + if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; page = alloc_page(GFP_HIGHUSER); @@ -879,7 +879,7 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, struct pipe_buffer *buf; int err; - if (cs->nr_segs >= cs->pipe->ring_size) + if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; err = unlock_request(cs->req); @@ -1341,7 +1341,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, if (!fud) return -EPERM; - bufs = kvmalloc_array(pipe->ring_size, sizeof(struct pipe_buffer), + bufs = kvmalloc_array(pipe->max_usage, sizeof(struct pipe_buffer), GFP_KERNEL); if (!bufs) return -ENOMEM; @@ -1353,7 +1353,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, if (ret < 0) goto out; - if (pipe_occupancy(pipe->head, pipe->tail) + cs.nr_segs > pipe->ring_size) { + if (pipe_occupancy(pipe->head, pipe->tail) + cs.nr_segs > pipe->max_usage) { ret = -EIO; goto out; } diff --git a/fs/pipe.c b/fs/pipe.c index 8a0806fe12d3..12e47cae9425 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -403,7 +403,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) tail = pipe->tail; head = pipe->head; - max_usage = pipe->ring_size; + max_usage = pipe->max_usage; mask = pipe->ring_size - 1; /* We try to merge small writes */ @@ -570,7 +570,7 @@ pipe_poll(struct file *filp, poll_table *wait) } if (filp->f_mode & FMODE_WRITE) { - if (!pipe_full(head, tail, pipe->ring_size)) + if (!pipe_full(head, tail, pipe->max_usage)) mask |= EPOLLOUT | EPOLLWRNORM; /* * Most Unices do not set EPOLLERR for FIFOs but on Linux they @@ -695,6 +695,7 @@ struct pipe_inode_info *alloc_pipe_info(void) if (pipe->bufs) { init_waitqueue_head(&pipe->wait); pipe->r_counter = pipe->w_counter = 1; + pipe->max_usage = pipe_bufs; pipe->ring_size = pipe_bufs; pipe->user = user; mutex_init(&pipe->mutex); @@ -1149,9 +1150,10 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) kfree(pipe->bufs); pipe->bufs = bufs; pipe->ring_size = nr_slots; + pipe->max_usage = nr_slots; pipe->tail = tail; pipe->head = head; - return pipe->ring_size * PAGE_SIZE; + return pipe->max_usage * PAGE_SIZE; out_revert_acct: (void) account_pipe_buffers(pipe->user, nr_slots, pipe->ring_size); @@ -1184,7 +1186,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg) ret = pipe_set_size(pipe, arg); break; case F_GETPIPE_SZ: - ret = pipe->ring_size * PAGE_SIZE; + ret = pipe->max_usage * PAGE_SIZE; break; default: ret = -EINVAL; diff --git a/fs/splice.c b/fs/splice.c index bbc025236ff9..03b96eae084b 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -199,7 +199,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, goto out; } - while (!pipe_full(head, tail, pipe->ring_size)) { + while (!pipe_full(head, tail, pipe->max_usage)) { struct pipe_buffer *buf = &pipe->bufs[head & mask]; buf->page = spd->pages[page_nr]; @@ -239,7 +239,7 @@ ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf) if (unlikely(!pipe->readers)) { send_sig(SIGPIPE, current, 0); ret = -EPIPE; - } else if (pipe_full(head, tail, pipe->ring_size)) { + } else if (pipe_full(head, tail, pipe->max_usage)) { ret = -EAGAIN; } else { pipe->bufs[head & mask] = *buf; @@ -257,7 +257,7 @@ EXPORT_SYMBOL(add_to_pipe); */ int splice_grow_spd(const struct pipe_inode_info *pipe, struct splice_pipe_desc *spd) { - unsigned int max_usage = READ_ONCE(pipe->ring_size); + unsigned int max_usage = READ_ONCE(pipe->max_usage); spd->nr_pages_max = max_usage; if (max_usage <= PIPE_DEF_BUFFERS) @@ -381,7 +381,7 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, ssize_t res; int i; - if (pipe_full(pipe->head, pipe->tail, pipe->ring_size)) + if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) return -EAGAIN; /* @@ -698,7 +698,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, .pos = *ppos, .u.file = out, }; - int nbufs = pipe->ring_size; + int nbufs = pipe->max_usage; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); ssize_t ret; @@ -721,9 +721,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (ret <= 0) break; - if (unlikely(nbufs < pipe->ring_size)) { + if (unlikely(nbufs < pipe->max_usage)) { kfree(array); - nbufs = pipe->ring_size; + nbufs = pipe->max_usage; array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); if (!array) { @@ -963,7 +963,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, loff_t pos = sd->pos, prev_pos = pos; /* Don't try to read more the pipe has space for. */ - p_space = pipe->ring_size - + p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail); read_len = min_t(size_t, len, p_space << PAGE_SHIFT); ret = do_splice_to(in, &pos, pipe, read_len, flags); @@ -1090,7 +1090,7 @@ static int wait_for_space(struct pipe_inode_info *pipe, unsigned flags) send_sig(SIGPIPE, current, 0); return -EPIPE; } - if (!pipe_full(pipe->head, pipe->tail, pipe->ring_size)) + if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) return 0; if (flags & SPLICE_F_NONBLOCK) return -EAGAIN; @@ -1498,13 +1498,13 @@ static int opipe_prep(struct pipe_inode_info *pipe, unsigned int flags) * Check ->nrbufs without the inode lock first. This function * is speculative anyways, so missing one is ok. */ - if (pipe_full(pipe->head, pipe->tail, pipe->ring_size)) + if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) return 0; ret = 0; pipe_lock(pipe); - while (pipe_full(pipe->head, pipe->tail, pipe->ring_size)) { + while (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) { if (!pipe->readers) { send_sig(SIGPIPE, current, 0); ret = -EPIPE; @@ -1584,7 +1584,7 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, * pipe is empty or the output pipe is full. */ if (pipe_empty(i_head, i_tail) || - pipe_full(o_head, o_tail, opipe->ring_size)) { + pipe_full(o_head, o_tail, opipe->max_usage)) { /* Already processed some buffers, break */ if (ret) break; @@ -1706,7 +1706,7 @@ static int link_pipe(struct pipe_inode_info *ipipe, * output room, break. */ if (pipe_empty(i_head, i_tail) || - pipe_full(o_head, o_tail, opipe->ring_size)) + pipe_full(o_head, o_tail, opipe->max_usage)) break; ibuf = &ipipe->bufs[i_tail & i_mask]; diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index fad096697ff5..90055ff16550 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -32,6 +32,7 @@ struct pipe_buffer { * @wait: reader/writer wait point in case of empty/full pipe * @head: The point of buffer production * @tail: The point of buffer consumption + * @max_usage: The maximum number of slots that may be used in the ring * @ring_size: total number of buffers (should be a power of 2) * @tmp_page: cached released page * @readers: number of current readers of this pipe @@ -50,6 +51,7 @@ struct pipe_inode_info { wait_queue_head_t wait; unsigned int head; unsigned int tail; + unsigned int max_usage; unsigned int ring_size; unsigned int readers; unsigned int writers; @@ -176,9 +178,11 @@ static inline unsigned int pipe_space_for_user(unsigned int head, unsigned int t unsigned int p_occupancy, p_space; p_occupancy = pipe_occupancy(head, tail); - if (p_occupancy >= pipe->ring_size) + if (p_occupancy >= pipe->max_usage) return 0; p_space = pipe->ring_size - p_occupancy; + if (p_space > pipe->max_usage) + p_space = pipe->max_usage; return p_space; } diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 150a40bdb21a..134737e9d4ee 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -395,7 +395,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by i_head++; buf = &pipe->bufs[i_head & p_mask]; } - if (pipe_full(i_head, p_tail, pipe->ring_size)) + if (pipe_full(i_head, p_tail, pipe->max_usage)) return 0; buf->ops = &page_cache_pipe_buf_ops; @@ -528,7 +528,7 @@ static size_t push_pipe(struct iov_iter *i, size_t size, pipe->bufs[iter_head & p_mask].len = PAGE_SIZE; iter_head++; } - while (!pipe_full(iter_head, p_tail, pipe->ring_size)) { + while (!pipe_full(iter_head, p_tail, pipe->max_usage)) { struct pipe_buffer *buf = &pipe->bufs[iter_head & p_mask]; struct page *page = alloc_page(GFP_USER); if (!page) From patchwork Wed Oct 23 20:18:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207623 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84F0F913 for ; Wed, 23 Oct 2019 20:18:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 63A1C2064A for ; Wed, 23 Oct 2019 20:18:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="M4erFpeG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2407416AbfJWUSM (ORCPT ); Wed, 23 Oct 2019 16:18:12 -0400 Received: from us-smtp-1.mimecast.com ([205.139.110.61]:44025 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2405927AbfJWUSL (ORCPT ); Wed, 23 Oct 2019 16:18:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861890; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PTa8/R0oI/33uh7QoOdTHDItHL1AUlg5WLx25CuXu0w=; b=M4erFpeGVQh97cMmgKy7wsyesEd+/qi+0bmhQyscWr2rzz6QA1/YmZeC2QDoBTB1ON2I5F 7SNEBDFHLFt65Ry/4ew8YGuZ+T3XaUlPHq1HCYvvuk9twGcG29KALwnl3Kot1y4HUv9wgd c3uenV0OMbxjLL1vZcnZR2CUVUhFEKI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-342-Z1qnm_rTOfe_DMgLiIc_Rg-1; Wed, 23 Oct 2019 16:18:07 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 739BE80183D; Wed, 23 Oct 2019 20:18:05 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id C0BC760BF1; Wed, 23 Oct 2019 20:18:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 06/10] pipe: Advance tail pointer inside of wait spinlock in pipe_read() [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:18:02 +0100 Message-ID: <157186188205.3995.4257625973468561091.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-MC-Unique: Z1qnm_rTOfe_DMgLiIc_Rg-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Advance the pipe ring tail pointer inside of wait spinlock in pipe_read() so that the pipe can be written into with kernel notifications from contexts where pipe->mutex cannot be taken. Signed-off-by: David Howells --- fs/pipe.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/pipe.c b/fs/pipe.c index 12e47cae9425..1274305772fb 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -324,9 +324,14 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (!buf->len) { pipe_buf_release(pipe, buf); + spin_lock_irq(&pipe->wait.lock); tail++; pipe_commit_read(pipe, tail); - do_wakeup = 1; + do_wakeup = 0; + wake_up_interruptible_sync_poll_locked( + &pipe->wait, EPOLLOUT | EPOLLWRNORM); + spin_unlock_irq(&pipe->wait.lock); + kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); } total_len -= chars; if (!total_len) @@ -358,6 +363,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (do_wakeup) { wake_up_interruptible_sync_poll(&pipe->wait, EPOLLOUT | EPOLLWRNORM); kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); + do_wakeup = 0; } pipe_wait(pipe); } From patchwork Wed Oct 23 20:18:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207633 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 75E5E13B1 for ; Wed, 23 Oct 2019 20:18:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 54C7D21872 for ; Wed, 23 Oct 2019 20:18:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DzY/CDPu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2406052AbfJWUSU (ORCPT ); Wed, 23 Oct 2019 16:18:20 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:39809 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2406041AbfJWUSU (ORCPT ); Wed, 23 Oct 2019 16:18:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861899; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TsICzw/12FA1ze+4AGSV0OSb28CD+mZ6S94YG04i4d8=; b=DzY/CDPuESKUdWEZda60BL/+fMCerqej3Xw0sB2sWAB2GYKNnE6Lm86N/apJXHuHpAS1v0 LqoVGz75iPxb06rn3e++we18gXgVThojNXg7bUu7FnBEnWjb9umuMntO/g/RPu1kVoNfEW O/JQXeg+gyj2QP5lY3YYGep3eLO3MHg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-158-GYEDDBkSPA-9sgTXCU1ilg-1; Wed, 23 Oct 2019 16:18:16 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 340FE107AD31; Wed, 23 Oct 2019 20:18:14 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 637C7600CC; Wed, 23 Oct 2019 20:18:11 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 07/10] pipe: Conditionalise wakeup in pipe_read() [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:18:10 +0100 Message-ID: <157186189069.3995.10292601951655075484.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-MC-Unique: GYEDDBkSPA-9sgTXCU1ilg-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Only do a wakeup in pipe_read() if we made space in a completely full buffer. The producer shouldn't be waiting on pipe->wait otherwise. Signed-off-by: David Howells --- fs/pipe.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index 1274305772fb..e3a8f10750c9 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -327,11 +327,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) spin_lock_irq(&pipe->wait.lock); tail++; pipe_commit_read(pipe, tail); - do_wakeup = 0; - wake_up_interruptible_sync_poll_locked( - &pipe->wait, EPOLLOUT | EPOLLWRNORM); + do_wakeup = 1; + if (head - (tail - 1) == pipe->max_usage) + wake_up_interruptible_sync_poll_locked( + &pipe->wait, EPOLLOUT | EPOLLWRNORM); spin_unlock_irq(&pipe->wait.lock); - kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); + if (head - (tail - 1) == pipe->max_usage) + kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); } total_len -= chars; if (!total_len) @@ -360,11 +362,6 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) ret = -ERESTARTSYS; break; } - if (do_wakeup) { - wake_up_interruptible_sync_poll(&pipe->wait, EPOLLOUT | EPOLLWRNORM); - kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); - do_wakeup = 0; - } pipe_wait(pipe); } __pipe_unlock(pipe); From patchwork Wed Oct 23 20:18:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A7300913 for ; Wed, 23 Oct 2019 20:18:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7D47421872 for ; Wed, 23 Oct 2019 20:18:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Y9wbyAv/" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2407445AbfJWUSc (ORCPT ); Wed, 23 Oct 2019 16:18:32 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:30369 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2407427AbfJWUS3 (ORCPT ); Wed, 23 Oct 2019 16:18:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861907; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Mwljw2NG5FNU7TKvMqy9lThobSeo2IGdh0QPYxJP5hQ=; b=Y9wbyAv//0PYLRmhz/hionO8k7Q+tsCc+a3hvKkowncxmsxX1RCjaJEFgmoqz0Avc+WJrL l6coyN+mRImOJc747HTnUPVyYnQxPhPfOTpSiPAf/St9vSpnuWdBJgjF/CSgQY47DMtjo8 NwiGy3QSX0f4hqiU62oV/Za4BTjV2vM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-381-hYpi1Gq-PSWpwHxFAVr66A-1; Wed, 23 Oct 2019 16:18:24 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C6C44100551A; Wed, 23 Oct 2019 20:18:22 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 202516012D; Wed, 23 Oct 2019 20:18:19 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 08/10] pipe: Rearrange sequence in pipe_write() to preallocate slot [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:18:19 +0100 Message-ID: <157186189940.3995.3273332042636743290.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-MC-Unique: hYpi1Gq-PSWpwHxFAVr66A-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Rearrange the sequence in pipe_write() so that the allocation of the new buffer, the allocation of a ring slot and the attachment to the ring is done under the pipe wait spinlock and then the lock is dropped and the buffer can be filled. The data copy needs to be done with the spinlock unheld and irqs enabled, so the lock needs to be dropped first. However, the reader can't progress as we're holding pipe->mutex. We also need to drop the lock as that would impact others looking at the pipe waitqueue, such as poll(), the consumer and a future kernel message writer. We just abandon the preallocated slot if we get a copy error. Future writes may continue it and a future read will eventually recycle it. Signed-off-by: David Howells --- fs/pipe.c | 51 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index e3a8f10750c9..1bfad2212b95 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -386,7 +386,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) { struct file *filp = iocb->ki_filp; struct pipe_inode_info *pipe = filp->private_data; - unsigned int head, tail, max_usage, mask; + unsigned int head, max_usage, mask; ssize_t ret = 0; int do_wakeup = 0; size_t total_len = iov_iter_count(from); @@ -404,14 +404,13 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) goto out; } - tail = pipe->tail; head = pipe->head; max_usage = pipe->max_usage; mask = pipe->ring_size - 1; /* We try to merge small writes */ chars = total_len & (PAGE_SIZE-1); /* size of the last buffer */ - if (!pipe_empty(head, tail) && chars != 0) { + if (!pipe_empty(head, pipe->tail) && chars != 0) { struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask]; int offset = buf->offset + buf->len; @@ -440,8 +439,8 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) break; } - tail = pipe->tail; - if (!pipe_full(head, tail, max_usage)) { + head = pipe->head; + if (!pipe_full(head, pipe->tail, max_usage)) { struct pipe_buffer *buf = &pipe->bufs[head & mask]; struct page *page = pipe->tmp_page; int copied; @@ -454,40 +453,56 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) } pipe->tmp_page = page; } + + /* Allocate a slot in the ring in advance and attach an + * empty buffer. If we fault or otherwise fail to use + * it, either the reader will consume it or it'll still + * be there for the next write. + */ + spin_lock_irq(&pipe->wait.lock); + + head = pipe->head; + pipe_commit_write(pipe, head + 1); + /* Always wake up, even if the copy fails. Otherwise * we lock up (O_NONBLOCK-)readers that sleep due to * syscall merging. * FIXME! Is this really true? */ - do_wakeup = 1; - copied = copy_page_from_iter(page, 0, PAGE_SIZE, from); - if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) { - if (!ret) - ret = -EFAULT; - break; - } - ret += copied; + wake_up_interruptible_sync_poll_locked( + &pipe->wait, EPOLLIN | EPOLLRDNORM); + + spin_unlock_irq(&pipe->wait.lock); + kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); /* Insert it into the buffer array */ + buf = &pipe->bufs[head & mask]; buf->page = page; buf->ops = &anon_pipe_buf_ops; buf->offset = 0; - buf->len = copied; + buf->len = 0; buf->flags = 0; if (is_packetized(filp)) { buf->ops = &packet_pipe_buf_ops; buf->flags = PIPE_BUF_FLAG_PACKET; } - - head++; - pipe_commit_write(pipe, head); pipe->tmp_page = NULL; + copied = copy_page_from_iter(page, 0, PAGE_SIZE, from); + if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) { + if (!ret) + ret = -EFAULT; + break; + } + ret += copied; + buf->offset = 0; + buf->len = copied; + if (!iov_iter_count(from)) break; } - if (!pipe_full(head, tail, max_usage)) + if (!pipe_full(head, pipe->tail, max_usage)) continue; /* Wait for buffer space to become available. */ From patchwork Wed Oct 23 20:18:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207649 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5A7313B1 for ; Wed, 23 Oct 2019 20:18:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B3F892064A for ; Wed, 23 Oct 2019 20:18:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Ea287AW2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2407471AbfJWUSh (ORCPT ); Wed, 23 Oct 2019 16:18:37 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:34690 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2407469AbfJWUSh (ORCPT ); Wed, 23 Oct 2019 16:18:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861916; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y5oBrUtAQ9C0QhrzjBJAL+muXoHgzb+Th6vM8rdAcVs=; b=Ea287AW2XIPF+citlEhjKs4KbG6R09vARa2undLGO5ndEzvP07Yu0z37g4zrBVrhKDfwoC 27LNLDZNufmTzexIWWkWYQikrB8YqMkM+QJnv+v8vh+LR+Zruw5mo60o+bL37LMn/K+Jlg dyImMzyuoDv92m5VdLq//ZgudM5ggzI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-197-15ykxvHfOZeZpt2NssVyQw-1; Wed, 23 Oct 2019 16:18:33 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4FEC7107AD31; Wed, 23 Oct 2019 20:18:31 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id BB1715DD7A; Wed, 23 Oct 2019 20:18:28 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 09/10] pipe: Remove redundant wakeup from pipe_write() [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:18:28 +0100 Message-ID: <157186190801.3995.3420819228734631242.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-MC-Unique: 15ykxvHfOZeZpt2NssVyQw-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Remove a redundant wakeup from pipe_write(). Signed-off-by: David Howells --- fs/pipe.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index 1bfad2212b95..3df93990dd9d 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -516,11 +516,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) ret = -ERESTARTSYS; break; } - if (do_wakeup) { - wake_up_interruptible_sync_poll(&pipe->wait, EPOLLIN | EPOLLRDNORM); - kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); - do_wakeup = 0; - } pipe->waiting_writers++; pipe_wait(pipe); pipe->waiting_writers--; From patchwork Wed Oct 23 20:18:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11207659 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 183211747 for ; Wed, 23 Oct 2019 20:18:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EC2AF21872 for ; Wed, 23 Oct 2019 20:18:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aMF3H5Iz" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2406028AbfJWUSq (ORCPT ); Wed, 23 Oct 2019 16:18:46 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:40243 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2407488AbfJWUSq (ORCPT ); Wed, 23 Oct 2019 16:18:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571861925; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Pfelg5hi6R03mZ1vLzjmM92WqqQzhNWSQkGO0kPCCJs=; b=aMF3H5Izte9cm8ahEdm9eAAreVhSy7vK3LbSb8/FLtgZXXtKhQHI7He3ZPAPQGpm41AE+y 8GKylcpzZEjLEdjLmWF1SZ6PhPg3vU29+sonb+Lzp+NBXD6yF1fcyKWiaUVw2fus5CLwA3 k4OIPsnvmLu+H1tEjmJCSd6njY2xM/E= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-15-jlueH4OLP0adVaz8-ap9CA-1; Wed, 23 Oct 2019 16:18:41 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 04489100551A; Wed, 23 Oct 2019 20:18:40 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6F5615DC18; Wed, 23 Oct 2019 20:18:37 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 10/10] pipe: Check for ring full inside of the spinlock in pipe_write() [ver #2] From: David Howells To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , dhowells@redhat.com, keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 23 Oct 2019 21:18:36 +0100 Message-ID: <157186191654.3995.6474781916564747415.stgit@warthog.procyon.org.uk> In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-MC-Unique: jlueH4OLP0adVaz8-ap9CA-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: Make pipe_write() check to see if the ring has become full between it taking the pipe mutex, checking the ring status and then taking the spinlock. This can happen if a notification is written into the pipe as that happens without the pipe mutex. Signed-off-by: David Howells --- fs/pipe.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/pipe.c b/fs/pipe.c index 3df93990dd9d..6a982a88f658 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -462,6 +462,11 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) spin_lock_irq(&pipe->wait.lock); head = pipe->head; + if (pipe_full(head, pipe->tail, max_usage)) { + spin_unlock_irq(&pipe->wait.lock); + continue; + } + pipe_commit_write(pipe, head + 1); /* Always wake up, even if the copy fails. Otherwise From patchwork Thu Oct 24 16:57:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11210345 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F135814ED for ; Thu, 24 Oct 2019 16:57:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C1A7420679 for ; Thu, 24 Oct 2019 16:57:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hpbu6utc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2410003AbfJXQ5r (ORCPT ); Thu, 24 Oct 2019 12:57:47 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:27526 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2410011AbfJXQ5q (ORCPT ); Thu, 24 Oct 2019 12:57:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1571936265; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NYl4FrLbBWaBG1625TSNrJR25CyJ2Un0Gw1Isq0q23I=; b=hpbu6utcg3EbBFzZbS/dRnVGuV40kL4RX7b7lNhVvS3o+ouXRb0WoWrD5XC/n7Cnn1n/Ph Q3PDdXpbuEejwMaxWhtvA4lhyC8KlTwwgNhiotYo8UoMx6I4ltR7gbqb2TPZMn96N9pOw8 r6gnAoGQvyyzBg05lk1PlcewOXOtkZk= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-377-1QOKCWTXMqehQKb2ajkHSg-1; Thu, 24 Oct 2019 12:57:38 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 14C19801E66; Thu, 24 Oct 2019 16:57:36 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-40.rdu2.redhat.com [10.10.121.40]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2966B5888; Thu, 24 Oct 2019 16:57:33 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells In-Reply-To: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> References: <157186182463.3995.13922458878706311997.stgit@warthog.procyon.org.uk> To: torvalds@linux-foundation.org Cc: dhowells@redhat.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , nicolas.dichtel@6wind.com, raven@themaw.net, Christian Brauner , keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 11/10] pipe: Add fsync() support [ver #2] MIME-Version: 1.0 Content-ID: <30392.1571936252.1@warthog.procyon.org.uk> Date: Thu, 24 Oct 2019 17:57:32 +0100 Message-ID: <30394.1571936252@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-MC-Unique: 1QOKCWTXMqehQKb2ajkHSg-1 X-Mimecast-Spam-Score: 0 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: pipe: Add fsync() support The keyrings testsuite needs the ability to wait for all the outstanding notifications in the queue to have been processed so that it can then go through them to find out whether the notifications it expected have been emitted. Implement fsync() support for pipes to provide this. The tailmost buffer at the point of calling is marked and fsync adds itself to the list of waiters, noting the tail position to be waited for and marking the buffer as no longer mergeable. Then when the buffer is consumed, if the flag is set, any matching waiters are woken up. Signed-off-by: David Howells --- fs/fuse/dev.c | 1 fs/pipe.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++ fs/splice.c | 3 ++ include/linux/pipe_fs_i.h | 22 ++++++++++++++++ lib/iov_iter.c | 2 - 5 files changed, 88 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 5ef57a322cb8..9617a35579cb 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1983,6 +1983,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, if (rem >= ibuf->len) { *obuf = *ibuf; ibuf->ops = NULL; + pipe_wake_fsync(pipe, ibuf, tail); tail++; pipe_commit_read(pipe, tail); } else { diff --git a/fs/pipe.c b/fs/pipe.c index 6a982a88f658..8e5fd7314be1 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -30,6 +30,12 @@ #include "internal.h" +struct pipe_fsync { + struct list_head link; /* Link in pipe->fsync */ + struct completion done; + unsigned int tail; /* The buffer being waited for */ +}; + /* * The max size that a non-root user is allowed to grow the pipe. Can * be set by root in /proc/sys/fs/pipe-max-size @@ -269,6 +275,58 @@ static bool pipe_buf_can_merge(struct pipe_buffer *buf) return buf->ops == &anon_pipe_buf_ops; } +/* + * Wait for all the data currently in the pipe to be consumed. + */ +static int pipe_fsync(struct file *file, loff_t a, loff_t b, int datasync) +{ + struct pipe_inode_info *pipe = file->private_data; + struct pipe_buffer *buf; + struct pipe_fsync fsync; + unsigned int head, tail, mask; + + pipe_lock(pipe); + + head = pipe->head; + tail = pipe->tail; + mask = pipe->ring_size - 1; + + if (pipe_empty(head, tail)) { + pipe_unlock(pipe); + return 0; + } + + init_completion(&fsync.done); + fsync.tail = tail; + buf = &pipe->bufs[tail & mask]; + buf->flags |= PIPE_BUF_FLAG_FSYNC; + pipe_buf_mark_unmergeable(buf); + list_add_tail(&fsync.link, &pipe->fsync); + pipe_unlock(pipe); + + if (wait_for_completion_interruptible(&fsync.done) < 0) { + pipe_lock(pipe); + list_del(&fsync.link); + pipe_unlock(pipe); + return -EINTR; + } + + return 0; +} + +void __pipe_wake_fsync(struct pipe_inode_info *pipe, unsigned int tail) +{ + struct pipe_fsync *fsync, *p; + + list_for_each_entry_safe(fsync, p, &pipe->fsync, link) { + if (fsync->tail == tail) { + list_del_init(&fsync->link); + complete(&fsync->done); + } + } +} +EXPORT_SYMBOL(__pipe_wake_fsync); + static ssize_t pipe_read(struct kiocb *iocb, struct iov_iter *to) { @@ -325,6 +383,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (!buf->len) { pipe_buf_release(pipe, buf); spin_lock_irq(&pipe->wait.lock); + pipe_wake_fsync(pipe, buf, tail); tail++; pipe_commit_read(pipe, tail); do_wakeup = 1; @@ -717,6 +776,7 @@ struct pipe_inode_info *alloc_pipe_info(void) pipe->ring_size = pipe_bufs; pipe->user = user; mutex_init(&pipe->mutex); + INIT_LIST_HEAD(&pipe->fsync); return pipe; } @@ -1060,6 +1120,7 @@ const struct file_operations pipefifo_fops = { .llseek = no_llseek, .read_iter = pipe_read, .write_iter = pipe_write, + .fsync = pipe_fsync, .poll = pipe_poll, .unlocked_ioctl = pipe_ioctl, .release = pipe_release, diff --git a/fs/splice.c b/fs/splice.c index 3f72bc31b6ec..e106367e1be6 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -523,6 +523,7 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des if (!buf->len) { pipe_buf_release(pipe, buf); + pipe_wake_fsync(pipe, buf, tail); tail++; pipe_commit_read(pipe, tail); if (pipe->files) @@ -771,6 +772,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, ret -= buf->len; buf->len = 0; pipe_buf_release(pipe, buf); + pipe_wake_fsync(pipe, buf, tail); tail++; pipe_commit_read(pipe, tail); if (pipe->files) @@ -1613,6 +1615,7 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, */ *obuf = *ibuf; ibuf->ops = NULL; + pipe_wake_fsync(ipipe, ibuf, i_tail); i_tail++; pipe_commit_read(ipipe, i_tail); input_wakeup = true; diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 90055ff16550..1a3027089558 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -8,6 +8,7 @@ #define PIPE_BUF_FLAG_ATOMIC 0x02 /* was atomically mapped */ #define PIPE_BUF_FLAG_GIFT 0x04 /* page is a gift */ #define PIPE_BUF_FLAG_PACKET 0x08 /* read() as a packet */ +#define PIPE_BUF_FLAG_FSYNC 0x10 /* fsync() is waiting for this buffer to die */ /** * struct pipe_buffer - a linux kernel pipe buffer @@ -43,6 +44,7 @@ struct pipe_buffer { * @w_counter: writer counter * @fasync_readers: reader side fasync * @fasync_writers: writer side fasync + * @fsync: Waiting fsyncs * @bufs: the circular array of pipe buffers * @user: the user who created this pipe **/ @@ -62,6 +64,7 @@ struct pipe_inode_info { struct page *tmp_page; struct fasync_struct *fasync_readers; struct fasync_struct *fasync_writers; + struct list_head fsync; struct pipe_buffer *bufs; struct user_struct *user; }; @@ -268,6 +271,25 @@ extern const struct pipe_buf_operations nosteal_pipe_buf_ops; long pipe_fcntl(struct file *, unsigned int, unsigned long arg); struct pipe_inode_info *get_pipe_info(struct file *file); +void __pipe_wake_fsync(struct pipe_inode_info *pipe, unsigned int tail); + +/** + * pipe_wake_fsync - Wake up anyone waiting with fsync for this point + * @pipe: The pipe that owns the buffer + * @buf: The pipe buffer in question + * @tail: The index in the ring of the buffer + * + * Check to see if anyone is waiting for the pipe ring to clear up to and + * including this buffer, and, if they are, wake them up. + */ +static inline void pipe_wake_fsync(struct pipe_inode_info *pipe, + struct pipe_buffer *buf, + unsigned int tail) +{ + if (unlikely(buf->flags & PIPE_BUF_FLAG_FSYNC)) + __pipe_wake_fsync(pipe, tail); +} + int create_pipe_files(struct file **, int); unsigned int round_pipe_size(unsigned long size); diff --git a/lib/iov_iter.c b/lib/iov_iter.c index e22f4e283f6d..38d52524cd21 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -404,7 +404,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by buf->offset = offset; buf->len = bytes; - pipe_commit_read(pipe, i_head); + pipe_commit_write(pipe, i_head); i->iov_offset = offset + bytes; i->head = i_head; out: