From patchwork Sat Apr 13 16:54:58 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirill Smelkov
X-Patchwork-Id: 10899579
From: Kirill Smelkov
Subject: [PATCH 1/2] vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
X-Mailer: git-send-email 2.21.0.593.g511ec345e1
To: Linus Torvalds
Cc: Al Viro, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman,
 Kirill Smelkov, Rasmus Villemoes
Message-Id: <20190413165449.11168-1-kirr@nexedi.com>
In-Reply-To: <20190413165116.GB10314@deco.navytux.spb.ru>
Date: Sat, 13 Apr 2019 16:54:58 +0000
Sender: linux-fsdevel-owner@vger.kernel.org
X-Mailing-List: linux-fsdevel@vger.kernel.org

This amends commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") in how position is passed into
.read()/.write() handler for stream-like files:

Rasmus noticed that we currently pass 0 as position and ignore any position
change if that is done by a file implementation. This papers over bugs if ppos
is used in files that declare themselves as being stream-like, as such bugs
will go unnoticed. Even if a file implementation is correctly converted into
using stream_open, its read/write could later be changed to use ppos, and even
though that won't work correctly, the bug might go unnoticed unless someone
analyzes the wrong behaviour. It is thus better to pass ppos=NULL into
read/write for stream-like files, as that doesn't leave any chance for ppos
usage bugs: it will oops if ppos is ever used inside .read() or .write().

Note 1: rw_verify_area, new_sync_{read,write} need to be updated because they
are called by vfs_read/vfs_write & friends before file_operations
.read/.write .

Note 2: if file backend uses new-style .read_iter/.write_iter, position is
still passed into there as non-pointer kiocb.ki_pos . Currently
stream_open.cocci (semantic patch added by 10dce8af3422) ignores files whose
file_operations has *_iter methods.

Suggested-by: Rasmus Villemoes
Signed-off-by: Kirill Smelkov
---
 fs/open.c       |  5 ++--
 fs/read_write.c | 75 +++++++++++++++++++++++++++++--------------------
 2 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index a00350018a47..9c7d724a6f67 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1219,8 +1219,9 @@ EXPORT_SYMBOL(nonseekable_open);
 /*
  * stream_open is used by subsystems that want stream-like file descriptors.
  * Such file descriptors are not seekable and don't have notion of position
- * (file.f_pos is always 0). Contrary to file descriptors of other regular
- * files, .read() and .write() can run simultaneously.
+ * (file.f_pos is always 0 and ppos passed to .read()/.write() is always NULL).
+ * Contrary to file descriptors of other regular files, .read() and .write()
+ * can run simultaneously.
  *
  * stream_open never fails and is marked to return int so that it could be
  * directly used as file_operations.open .
diff --git a/fs/read_write.c b/fs/read_write.c
index 61b43ad7608e..d62556be6848 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -371,7 +371,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 	inode = file_inode(file);
 	if (unlikely((ssize_t) count < 0))
 		return retval;
-	pos = *ppos;
+	pos = (ppos ? *ppos : 0);
 	if (unlikely(pos < 0)) {
 		if (!unsigned_offsets(file))
 			return retval;
@@ -400,12 +400,13 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 	ssize_t ret;
 
 	init_sync_kiocb(&kiocb, filp);
-	kiocb.ki_pos = *ppos;
+	kiocb.ki_pos = (ppos ? *ppos : 0);
 	iov_iter_init(&iter, READ, &iov, 1, len);
 
 	ret = call_read_iter(filp, &kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
-	*ppos = kiocb.ki_pos;
+	if (ppos)
+		*ppos = kiocb.ki_pos;
 	return ret;
 }
 
@@ -468,12 +469,12 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 	ssize_t ret;
 
 	init_sync_kiocb(&kiocb, filp);
-	kiocb.ki_pos = *ppos;
+	kiocb.ki_pos = (ppos ? *ppos : 0);
 	iov_iter_init(&iter, WRITE, &iov, 1, len);
 
 	ret = call_write_iter(filp, &kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
-	if (ret > 0)
+	if (ret > 0 && ppos)
 		*ppos = kiocb.ki_pos;
 	return ret;
 }
@@ -558,15 +559,10 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 	return ret;
 }
 
-static inline loff_t file_pos_read(struct file *file)
-{
-	return file->f_mode & FMODE_STREAM ? 0 : file->f_pos;
-}
-
-static inline void file_pos_write(struct file *file, loff_t pos)
+/* file_ppos returns &file->f_pos or NULL if file is stream */
+static inline loff_t *file_ppos(struct file *file)
 {
-	if ((file->f_mode & FMODE_STREAM) == 0)
-		file->f_pos = pos;
+	return file->f_mode & FMODE_STREAM ? NULL : &file->f_pos;
 }
 
 ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
@@ -575,10 +571,14 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos = file_pos_read(f.file);
-		ret = vfs_read(f.file, buf, count, &pos);
-		if (ret >= 0)
-			file_pos_write(f.file, pos);
+		loff_t pos, *ppos = file_ppos(f.file);
+		if (ppos) {
+			pos = *ppos;
+			ppos = &pos;
+		}
+		ret = vfs_read(f.file, buf, count, ppos);
+		if (ret >= 0 && ppos)
+			f.file->f_pos = pos;
 		fdput_pos(f);
 	}
 	return ret;
@@ -595,10 +595,14 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos = file_pos_read(f.file);
-		ret = vfs_write(f.file, buf, count, &pos);
-		if (ret >= 0)
-			file_pos_write(f.file, pos);
+		loff_t pos, *ppos = file_ppos(f.file);
+		if (ppos) {
+			pos = *ppos;
+			ppos = &pos;
+		}
+		ret = vfs_write(f.file, buf, count, ppos);
+		if (ret >= 0 && ppos)
+			f.file->f_pos = pos;
 		fdput_pos(f);
 	}
 
@@ -673,14 +677,15 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 	ret = kiocb_set_rw_flags(&kiocb, flags);
 	if (ret)
 		return ret;
-	kiocb.ki_pos = *ppos;
+	kiocb.ki_pos = (ppos ? *ppos : 0);
 
 	if (type == READ)
 		ret = call_read_iter(filp, &kiocb, iter);
 	else
 		ret = call_write_iter(filp, &kiocb, iter);
 	BUG_ON(ret == -EIOCBQUEUED);
-	*ppos = kiocb.ki_pos;
+	if (ppos)
+		*ppos = kiocb.ki_pos;
 	return ret;
 }
 
@@ -1013,10 +1018,14 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos, flags);
-		if (ret >= 0)
-			file_pos_write(f.file, pos);
+		loff_t pos, *ppos = file_ppos(f.file);
+		if (ppos) {
+			pos = *ppos;
+			ppos = &pos;
+		}
+		ret = vfs_readv(f.file, vec, vlen, ppos, flags);
+		if (ret >= 0 && ppos)
+			f.file->f_pos = pos;
 		fdput_pos(f);
 	}
 
@@ -1033,10 +1042,14 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos, flags);
-		if (ret >= 0)
-			file_pos_write(f.file, pos);
+		loff_t pos, *ppos = file_ppos(f.file);
+		if (ppos) {
+			pos = *ppos;
+			ppos = &pos;
+		}
+		ret = vfs_writev(f.file, vec, vlen, ppos, flags);
+		if (ret >= 0 && ppos)
+			f.file->f_pos = pos;
 		fdput_pos(f);
 	}
 

From patchwork Sat Apr 13 16:55:06 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirill Smelkov
X-Patchwork-Id: 10899581
From: Kirill Smelkov
Subject: [PATCH 2/2] vfs: use &file->f_pos directly on files that have position
X-Mailer: git-send-email 2.21.0.593.g511ec345e1
To: Linus Torvalds
Cc: Al Viro, Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman,
 Kirill Smelkov
Message-Id: <20190413165449.11168-2-kirr@nexedi.com>
In-Reply-To: <20190413165116.GB10314@deco.navytux.spb.ru>
References: <20190413165449.11168-1-kirr@nexedi.com>
Date: Sat, 13 Apr 2019 16:55:06 +0000
Sender: linux-fsdevel-owner@vger.kernel.org
X-Mailing-List: linux-fsdevel@vger.kernel.org

Long ago vfs read/write operations were passing ppos=&file->f_pos directly to
.read / .write file_operations methods. That changed in 2004 in 55f09ec0087c
("read/write: pass down a copy of f_pos, not f_pos itself.") which started to
pass ppos=&local_var, trying to avoid simultaneous read/write/lseek stepping
on each other's toes and overwriting file->f_pos racily. That measure was
not complete, and in 2014 commit 9c225f2655e36 ("vfs: atomic f_pos accesses as
per POSIX") added file->f_pos_lock to completely disable simultaneous
read/write/lseek runs. After f_pos_lock was introduced, the concurrency reason
for not passing ppos=&file->f_pos directly vanished.

Linus explains[1]:

    In fact, we *used* to (long ago) pass in the address of "file->f_pos"
    itself to the low-level read/write routines. We then changed it to do
    that indirection through a local copy of pos (and
    file_pos_read/file_pos_write) because we didn't do the proper locking,
    so different read/write versions could mess with each other (and with
    lseek).

    But one of the things that commit 9c225f2655e36 ("vfs: atomic f_pos
    accesses as per POSIX") did was to add the proper locking at least for
    the cases that we care about deeply, so we *could* say that we have
    three cases:

     - FMODE_ATOMIC_POS: properly locked,
     - FMODE_STREAM: no pos at all
     - otherwise a "mostly don't care - don't mix!"

    and so we could go back to not copying the pos at all, and instead do
    something like

        loff_t *ppos = f.file->f_mode & FMODE_STREAM ? NULL : &file->f_pos;
        ret = vfs_write(f.file, buf, count, ppos);

    and perhaps have a long-term plan to try to get rid of the "don't mix"
    case entirely (ie "if you use f_pos, then we'll do the proper
    locking")

    (The above is obviously surrounded by the fdget_pos()/fdput_pos() that
    implements the locking decision).

Currently for regular files we always set FMODE_ATOMIC_POS and change that to
FMODE_STREAM if stream_open is used explicitly on open. That leaves other
files, e.g. sockets and pipes, in the "mostly don't care - don't mix!" case.

Sockets, for example, always check that on read/write the initial pos they
receive is 0 and don't update it. And if it is !0 they return -ESPIPE.
That suggests that we can switch to passing &file->f_pos directly now
and incrementally convert to FMODE_STREAM the files that were doing the
stream-like checking manually in their low-level .read/.write handlers.

Note: it is theoretically possible that a driver updates *ppos even if
read/write returns an error. For such cases the conversion will change IO
semantics a bit. The semantics being changed here were introduced in 2013 in
commit 5faf153ebf61 "don't call file_pos_write() if vfs_{read,write}{,v}()
fails".
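
To make the note above concrete, here is a small userspace model of the two
calling conventions. It is only a sketch: struct model_file,
model_failing_read() and the other model_* names are hypothetical stand-ins
and not kernel APIs, and a handler that moves *ppos before failing is an
assumed worst case rather than a known driver.

```c
#include <assert.h>
#include <stddef.h>

typedef long long model_loff_t;		/* stand-in for the kernel's loff_t */

struct model_file {			/* hypothetical stand-in for struct file */
	model_loff_t f_pos;
};

/* Hypothetical handler: advances *ppos and then reports an error. */
static long model_failing_read(model_loff_t *ppos)
{
	assert(ppos != NULL);		/* this model only covers files with a position */
	*ppos += 16;			/* position is moved before the failure */
	return -5;			/* stands in for -EIO */
}

/* Old convention: handler sees a local copy, written back on success only. */
static long model_read_via_copy(struct model_file *file)
{
	model_loff_t pos = file->f_pos;
	long ret = model_failing_read(&pos);

	if (ret >= 0)
		file->f_pos = pos;
	return ret;
}

/* New convention: handler is given &file->f_pos itself. */
static long model_read_direct(struct model_file *file)
{
	return model_failing_read(&file->f_pos);
}
```

With both files starting at f_pos = 100, model_read_via_copy() leaves f_pos at
100 after the failed call, while model_read_direct() leaves it at 116. That
difference is the only behaviour change the note describes.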
[1] https://lore.kernel.org/linux-fsdevel/CAHk-=whJtZt52SnhBGrNMnuxFn3GE9X_e02x8BPxtkqrfyZukw@mail.gmail.com/

Suggested-by: Linus Torvalds
Signed-off-by: Kirill Smelkov
---
 fs/read_write.c | 36 ++++--------------------------------
 1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index d62556be6848..13550b65cb2c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -571,14 +571,7 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
-		if (ppos) {
-			pos = *ppos;
-			ppos = &pos;
-		}
-		ret = vfs_read(f.file, buf, count, ppos);
-		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+		ret = vfs_read(f.file, buf, count, file_ppos(f.file));
 		fdput_pos(f);
 	}
 	return ret;
@@ -595,14 +588,7 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
-		if (ppos) {
-			pos = *ppos;
-			ppos = &pos;
-		}
-		ret = vfs_write(f.file, buf, count, ppos);
-		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+		ret = vfs_write(f.file, buf, count, file_ppos(f.file));
 		fdput_pos(f);
 	}
 
@@ -1018,14 +1004,7 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
-		if (ppos) {
-			pos = *ppos;
-			ppos = &pos;
-		}
-		ret = vfs_readv(f.file, vec, vlen, ppos, flags);
-		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+		ret = vfs_readv(f.file, vec, vlen, file_ppos(f.file), flags);
 		fdput_pos(f);
 	}
 
@@ -1042,14 +1021,7 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
-		if (ppos) {
-			pos = *ppos;
-			ppos = &pos;
-		}
-		ret = vfs_writev(f.file, vec, vlen, ppos, flags);
-		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+		ret = vfs_writev(f.file, vec, vlen, file_ppos(f.file), flags);
 		fdput_pos(f);
 	}
 
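
For illustration, the NULL-ppos convention that file_ppos() establishes in
this series can be modelled in plain userspace C. This is a sketch under
stated assumptions: MODEL_FMODE_STREAM is a made-up flag value (not the
kernel's FMODE_STREAM), struct model_file is a hypothetical stand-in for
struct file, and model_sync_read() only mimics the "ppos ? *ppos : 0" and
"if (ppos) *ppos = ..." pattern from new_sync_read() without doing any IO.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FMODE_STREAM 0x1u		/* hypothetical flag value */

typedef long long model_loff_t;		/* stand-in for the kernel's loff_t */

struct model_file {			/* hypothetical stand-in for struct file */
	unsigned int f_mode;
	model_loff_t f_pos;
};

/* Mirrors file_ppos(): NULL for stream files, &f_pos otherwise. */
static model_loff_t *model_file_ppos(struct model_file *file)
{
	assert(file != NULL);
	return (file->f_mode & MODEL_FMODE_STREAM) ? NULL : &file->f_pos;
}

/* Mimics the NULL-tolerant position handling around a read_iter call. */
static long model_sync_read(struct model_file *file, size_t len)
{
	model_loff_t *ppos = model_file_ppos(file);
	model_loff_t ki_pos = ppos ? *ppos : 0;	/* stream files start from 0 */

	ki_pos += (model_loff_t)len;	/* pretend the backend consumed len bytes */
	if (ppos)			/* write back only if the file has a position */
		*ppos = ki_pos;
	return (long)len;
}
```

A regular file starting at f_pos = 10 ends at 14 after a 4-byte
model_sync_read(), while a file with MODEL_FMODE_STREAM set keeps f_pos at 0
because no position pointer is ever handed out for it.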