From patchwork Thu Nov 29 05:55:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sasha Levin X-Patchwork-Id: 10703937 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2192E17F0 for ; Thu, 29 Nov 2018 05:59:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 14C7E2E6A0 for ; Thu, 29 Nov 2018 05:59:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 087942E6A9; Thu, 29 Nov 2018 05:59:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4C0A32E6A0 for ; Thu, 29 Nov 2018 05:59:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728783AbeK2RD0 (ORCPT ); Thu, 29 Nov 2018 12:03:26 -0500 Received: from mail.kernel.org ([198.145.29.99]:37600 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727402AbeK2RDZ (ORCPT ); Thu, 29 Nov 2018 12:03:25 -0500 Received: from sasha-vm.mshome.net (unknown [37.142.5.207]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 8627321104; Thu, 29 Nov 2018 05:59:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1543471154; bh=e5Nv0J2oHwauCYmFw66YMHAf7Gfp+g+y45/Isxcld2Y=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CCdORK6UOIIdXfJ0NiD+GnUjouCkuIxO8q0Cj+HUVxLIRgJj8csOgpPnvhVFFtOdb xbJSwW/4FmYlhU1xhxko+UfJu2Elqzkk+iGefSbgcMxqiLf3ahSxp9nUl4i01yZARm +BH9049Ra7qPgQUWhfjSPzeqSWnXnzJnNui8ccb8= From: Sasha Levin To: stable@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Chanho Min , "Rafael J . Wysocki" , Sasha Levin , linux-fsdevel@vger.kernel.org Subject: [PATCH AUTOSEL 4.19 39/68] exec: make de_thread() freezable Date: Thu, 29 Nov 2018 00:55:30 -0500 Message-Id: <20181129055559.159228-39-sashal@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181129055559.159228-1-sashal@kernel.org> References: <20181129055559.159228-1-sashal@kernel.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Chanho Min [ Upstream commit c22397888f1eed98cd59f0a88f2a5f6925f80e15 ] Suspend fails due to the exec family of functions blocking the freezer. The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for all sub-threads to die, and we have the deadlock if one of them is frozen. This also can occur with the schedule() waiting for the group thread leader to exit if it is frozen. In our machine, it causes freeze timeout as bellows. Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0): setcpushares-ls D ffffffc00008ed70 0 5817 1483 0x0040000d Call trace: [] __switch_to+0x88/0xa0 [] __schedule+0x1bc/0x720 [] schedule+0x40/0xa8 [] flush_old_exec+0xdc/0x640 [] load_elf_binary+0x2a8/0x1090 [] search_binary_handler+0x9c/0x240 [] load_script+0x20c/0x228 [] search_binary_handler+0x9c/0x240 [] do_execveat_common.isra.14+0x4f8/0x6e8 [] compat_SyS_execve+0x38/0x48 [] el0_svc_naked+0x24/0x28 To fix this, make de_thread() freezable. It looks safe and works fine. Suggested-by: Oleg Nesterov Signed-off-by: Chanho Min Acked-by: Oleg Nesterov Acked-by: Pavel Machek Acked-by: Michal Hocko Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- fs/exec.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 1ebf6e5a521d..6da8745857cb 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #include @@ -1083,7 +1084,7 @@ static int de_thread(struct task_struct *tsk) while (sig->notify_count) { __set_current_state(TASK_KILLABLE); spin_unlock_irq(lock); - schedule(); + freezable_schedule(); if (unlikely(__fatal_signal_pending(tsk))) goto killed; spin_lock_irq(lock); @@ -1111,7 +1112,7 @@ static int de_thread(struct task_struct *tsk) __set_current_state(TASK_KILLABLE); write_unlock_irq(&tasklist_lock); cgroup_threadgroup_change_end(tsk); - schedule(); + freezable_schedule(); if (unlikely(__fatal_signal_pending(tsk))) goto killed; } From patchwork Thu Nov 29 05:55:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sasha Levin X-Patchwork-Id: 10704009 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A64413BF for ; Thu, 29 Nov 2018 06:11:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1C5242E778 for ; Thu, 29 Nov 2018 06:11:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0DAF32E78D; Thu, 29 Nov 2018 06:11:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 97BB52E778 for ; Thu, 29 Nov 2018 06:11:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729037AbeK2REO (ORCPT ); Thu, 29 Nov 2018 12:04:14 -0500 Received: from mail.kernel.org ([198.145.29.99]:38816 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727570AbeK2REN (ORCPT ); Thu, 29 Nov 2018 12:04:13 -0500 Received: from sasha-vm.mshome.net (unknown [37.142.5.207]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6836921019; Thu, 29 Nov 2018 06:00:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1543471202; bh=DQ3ixg0MFFH2XgMjali88FIJqJohl84f9waLu0eLn4Y=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=J3z6TBUr775tXtpzErfXjUgsiuaWvReLin6UXDjgGTRVLv2yCdFRiDwUS9TtifuJX wklYnXexwlErQaSHP6eIUVbhVSaCFAm7o6upK/TeoGaa35blci6rbR2abGrcEFiCVr TMUfbEYUsWICkmH2t2HcNDjfkMrcjZr/JZxifkQA= From: Sasha Levin To: stable@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Dave Chinner , "Darrick J . Wong" , Sasha Levin , linux-fsdevel@vger.kernel.org Subject: [PATCH AUTOSEL 4.19 51/68] iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents Date: Thu, 29 Nov 2018 00:55:42 -0500 Message-Id: <20181129055559.159228-51-sashal@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181129055559.159228-1-sashal@kernel.org> References: <20181129055559.159228-1-sashal@kernel.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner [ Upstream commit 0929d8580071c6a1cec1a7916a8f674c243ceee1 ] When we write into an unwritten extent via direct IO, we dirty metadata on IO completion to convert the unwritten extent to written. However, when we do the FUA optimisation checks, the inode may be clean and so we issue a FUA write into the unwritten extent. This means we then bypass the generic_write_sync() call after unwritten extent conversion has ben done and we don't force the modified metadata to stable storage. This violates O_DSYNC semantics. The window of exposure is a single IO, as the next DIO write will see the inode has dirty metadata and hence will not use the FUA optimisation. Calling generic_write_sync() after completion of the second IO will also sync the first write and it's metadata. Fix this by avoiding the FUA optimisation when writing to unwritten extents. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/iomap.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index ec15cf2ec696..fa46e3ed8f53 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1597,12 +1597,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, if (iomap->flags & IOMAP_F_NEW) { need_zeroout = true; - } else { + } else if (iomap->type == IOMAP_MAPPED) { /* - * Use a FUA write if we need datasync semantics, this - * is a pure data IO that doesn't require any metadata - * updates and the underlying device supports FUA. This - * allows us to avoid cache flushes on IO completion. + * Use a FUA write if we need datasync semantics, this is a pure + * data IO that doesn't require any metadata updates (including + * after IO completion such as unwritten extent conversion) and + * the underlying device supports FUA. This allows us to avoid + * cache flushes on IO completion. */ if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) && (dio->flags & IOMAP_DIO_WRITE_FUA) && From patchwork Thu Nov 29 05:55:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sasha Levin X-Patchwork-Id: 10704007 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C157C14E2 for ; Thu, 29 Nov 2018 06:11:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD1282E778 for ; Thu, 29 Nov 2018 06:11:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9C6DD2E78D; Thu, 29 Nov 2018 06:11:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2B5152E778 for ; Thu, 29 Nov 2018 06:11:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729067AbeK2REQ (ORCPT ); Thu, 29 Nov 2018 12:04:16 -0500 Received: from mail.kernel.org ([198.145.29.99]:38852 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727570AbeK2REQ (ORCPT ); Thu, 29 Nov 2018 12:04:16 -0500 Received: from sasha-vm.mshome.net (unknown [37.142.5.207]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 3A9C421104; Thu, 29 Nov 2018 06:00:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1543471205; bh=WdKGbFxZM9VvN1QzJOdBk7mXBsibUXd5Ruvhu4fCo+c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Payb8tCMypKN8BcfrMPElDbcA0viFKGWJobXaEOYQdxTrWYaGIISKUG/XHGhY9YXP fntTRGcAnHurLwFA3GGWCxN0lgoxPpWlrp1qLePB2oY+7W3uExvNKgWAclYRl0jH7j mFeqiCjwkJMRI3/TmhJTsCApiMVZSxM3/bU1qtAY= From: Sasha Levin To: stable@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Dave Chinner , "Darrick J . Wong" , Sasha Levin , linux-fsdevel@vger.kernel.org Subject: [PATCH AUTOSEL 4.19 52/68] iomap: sub-block dio needs to zeroout beyond EOF Date: Thu, 29 Nov 2018 00:55:43 -0500 Message-Id: <20181129055559.159228-52-sashal@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181129055559.159228-1-sashal@kernel.org> References: <20181129055559.159228-1-sashal@kernel.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner [ Upstream commit b450672fb66b4a991a5b55ee24209ac7ae7690ce ] If we are doing sub-block dio that extends EOF, we need to zero the unused tail of the block to initialise the data in it it. If we do not zero the tail of the block, then an immediate mmap read of the EOF block will expose stale data beyond EOF to userspace. Found with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations. Fix this by detecting if the end of the DIO write is beyond EOF and zeroing the tail if necessary. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/iomap.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/iomap.c b/fs/iomap.c index fa46e3ed8f53..82e35265679d 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1678,7 +1678,14 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, dio->submit.cookie = submit_bio(bio); } while (nr_pages); - if (need_zeroout) { + /* + * We need to zeroout the tail of a sub-block write if the extent type + * requires zeroing or the write extends beyond EOF. If we don't zero + * the block tail in the latter case, we can expose stale data via mmap + * reads of the EOF block. + */ + if (need_zeroout || + ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) { /* zero out from the end of the write to the end of the block */ pad = pos & (fs_block_size - 1); if (pad) From patchwork Thu Nov 29 05:55:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sasha Levin X-Patchwork-Id: 10704005 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A12414E2 for ; Thu, 29 Nov 2018 06:11:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ED4A72E778 for ; Thu, 29 Nov 2018 06:11:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC3D62E78D; Thu, 29 Nov 2018 06:11:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6EFCA2E778 for ; Thu, 29 Nov 2018 06:11:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728489AbeK2REU (ORCPT ); Thu, 29 Nov 2018 12:04:20 -0500 Received: from mail.kernel.org ([198.145.29.99]:38952 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727570AbeK2RET (ORCPT ); Thu, 29 Nov 2018 12:04:19 -0500 Received: from sasha-vm.mshome.net (unknown [37.142.5.207]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 250052133F; Thu, 29 Nov 2018 06:00:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1543471208; bh=r5NdYZc29/u2tP7cREPgP6768x9pRFyKpQDS/MKWgfY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=IU6Yrwihy5+wMzhwccwnyFd0R5lPjVlAKzpTY3yFMb978s+S4tDvTIxRXeRr95v94 4HJhD5TPlEOs7j1XIzUFUBRg84zSONWB4SCCthqxwOD400V58Uxb3wsKS2Ke36Zxxb sDC/DBd8uX20QxmSebPrRiJhPFN0l7vh44S5tm7s= From: Sasha Levin To: stable@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Dave Chinner , "Darrick J . Wong" , Sasha Levin , linux-fsdevel@vger.kernel.org Subject: [PATCH AUTOSEL 4.19 53/68] iomap: dio data corruption and spurious errors when pipes fill Date: Thu, 29 Nov 2018 00:55:44 -0500 Message-Id: <20181129055559.159228-53-sashal@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181129055559.159228-1-sashal@kernel.org> References: <20181129055559.159228-1-sashal@kernel.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner [ Upstream commit 4721a6010990971440b4ffefbdf014976b8eda2f ] When doing direct IO to a pipe for do_splice_direct(), then pipe is trivial to fill up and overflow as it can only hold 16 pages. At this point bio_iov_iter_get_pages() then returns -EFAULT, and we abort the IO submission process. Unfortunately, iomap_dio_rw() propagates the error back up the stack. The error is converted from the EFAULT to EAGAIN in generic_file_splice_read() to tell the splice layers that the pipe is full. do_splice_direct() completely fails to handle EAGAIN errors (it aborts on error) and returns EAGAIN to the caller. copy_file_write() then completely fails to handle EAGAIN as well, and so returns EAGAIN to userspace, having failed to copy the data it was asked to. Avoid this whole steaming pile of fail by having iomap_dio_rw() silently swallow EFAULT errors and so do short reads. To make matters worse, iomap_dio_actor() has a stale data exposure bug bio_iov_iter_get_pages() fails - it does not zero the tail block that it may have been left uncovered by partial IO. Fix the error handling case to drop to the sub-block zeroing rather than immmediately returning the -EFAULT error. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/iomap.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index 82e35265679d..e2e4a1154c90 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1581,7 +1581,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, struct bio *bio; bool need_zeroout = false; bool use_fua = false; - int nr_pages, ret; + int nr_pages, ret = 0; size_t copied = 0; if ((pos | length | align) & ((1 << blkbits) - 1)) @@ -1646,8 +1646,14 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, ret = bio_iov_iter_get_pages(bio, &iter); if (unlikely(ret)) { + /* + * We have to stop part way through an IO. We must fall + * through to the sub-block tail zeroing here, otherwise + * this short IO may expose stale data in the tail of + * the block we haven't written data to. + */ bio_put(bio); - return copied ? copied : ret; + goto zero_tail; } n = bio->bi_iter.bi_size; @@ -1684,6 +1690,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, * the block tail in the latter case, we can expose stale data via mmap * reads of the EOF block. */ +zero_tail: if (need_zeroout || ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) { /* zero out from the end of the write to the end of the block */ @@ -1691,7 +1698,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, if (pad) iomap_dio_zero(dio, iomap, pos, fs_block_size - pad); } - return copied; + return copied ? copied : ret; } static loff_t @@ -1866,6 +1873,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->wait_for_completion = true; ret = 0; } + + /* + * Splicing to pipes can fail on a full pipe. We have to + * swallow this to make it look like a short IO + * otherwise the higher splice layers will completely + * mishandle the error and stop moving data. + */ + if (ret == -EFAULT) + ret = 0; break; } pos += ret; From patchwork Thu Nov 29 05:55:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sasha Levin X-Patchwork-Id: 10703947 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D071017F0 for ; Thu, 29 Nov 2018 06:00:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C2AF52E745 for ; Thu, 29 Nov 2018 06:00:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B6C472E754; Thu, 29 Nov 2018 06:00:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 156D32E745 for ; Thu, 29 Nov 2018 06:00:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729105AbeK2REX (ORCPT ); Thu, 29 Nov 2018 12:04:23 -0500 Received: from mail.kernel.org ([198.145.29.99]:39036 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727570AbeK2REW (ORCPT ); Thu, 29 Nov 2018 12:04:22 -0500 Received: from sasha-vm.mshome.net (unknown [37.142.5.207]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id E292B21019; Thu, 29 Nov 2018 06:00:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1543471210; bh=DpzPbtrVPMoXuGHy1EnsP1vGzaKA8GeCgDpIEH4MeBU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=nLMzFvsxq2NYbLm+nR9A918aEws2wyDTxpd3bJN7mGPgXKs1sg7Q/dGUuK9TnqI5y Kx3GpwxUmotqxOgQsW06eu769YALqOCTJBqXYJmnTXQT1QXn8A8TBmxzc7E1i0hxd8 jJGNG18jTMHIHoNfwjShp49rbJVSdOEd040baJyE= From: Sasha Levin To: stable@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Dave Chinner , "Darrick J . Wong" , Sasha Levin , linux-fsdevel@vger.kernel.org Subject: [PATCH AUTOSEL 4.19 54/68] iomap: readpages doesn't zero page tail beyond EOF Date: Thu, 29 Nov 2018 00:55:45 -0500 Message-Id: <20181129055559.159228-54-sashal@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181129055559.159228-1-sashal@kernel.org> References: <20181129055559.159228-1-sashal@kernel.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner [ Upstream commit 8c110d43c6bca4b24dd13272a9d4e0ba6f2ec957 ] When we read the EOF page of the file via readpages, we need to zero the region beyond EOF that we either do not read or should not contain data so that mmap does not expose stale data to user applications. However, iomap_adjust_read_range() fails to detect EOF correctly, and so fsx on 1k block size filesystems fails very quickly with mapreads exposing data beyond EOF. There are two problems here. Firstly, when calculating the end block of the EOF byte, we have to round the size by one to avoid a block aligned EOF from reporting a block too large. i.e. a size of 1024 bytes is 1 block, which in index terms is block 0. Therefore we have to calculate the end block from (isize - 1), not isize. The second bug is determining if the current page spans EOF, and so whether we need split it into two half, one for the IO, and the other for zeroing. Unfortunately, the code that checks whether we should split the block doesn't actually check if we span EOF, it just checks if the read spans the /offset in the page/ that EOF sits on. So it splits every read into two if EOF is not page aligned, regardless of whether we are reading the EOF block or not. Hence we need to restrict the "does the read span EOF" check to just the page that spans EOF, not every page we read. This patch results in correct EOF detection through readpages: xfs_vm_readpages: dev 259:0 ino 0x43 nr_pages 24 xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4 iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864 xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864 iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864 As you can see, it now does full page reads until the last one which is split correctly at the block aligned EOF, reading 3072 bytes and zeroing the last 1024 bytes. The original version of the patch got this right, but it got another case wrong. The EOF detection crossing really needs to the the original length as plen, while it starts at the end of the block, will be shortened as up-to-date blocks are found on the page. This means "orig_pos + plen" no longer points to the end of the page, and so will not correctly detect EOF crossing. Hence we have to use the length passed in to detect this partial page case: xfs_filemap_fault: dev 259:1 ino 0x43 write_fault 0 xfs_vm_readpage: dev 259:1 ino 0x43 nr_pages 1 xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4 iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296 xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1 iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296 Heere we see a trace where the first block on the EOF page is up to date, hence poff = 1024 bytes. The offset into the page of EOF is 3072, so the range we want to read is 1024 - 3071, and the range we want to zero is 3072 - 4095. You can see this is split correctly now. This fixes the stale data beyond EOF problem that fsx quickly uncovers on 1k block size filesystems. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin --- fs/iomap.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index e2e4a1154c90..cc080445b057 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -143,13 +143,14 @@ static void iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop, loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp) { + loff_t orig_pos = *pos; + loff_t isize = i_size_read(inode); unsigned block_bits = inode->i_blkbits; unsigned block_size = (1 << block_bits); unsigned poff = offset_in_page(*pos); unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length); unsigned first = poff >> block_bits; unsigned last = (poff + plen - 1) >> block_bits; - unsigned end = offset_in_page(i_size_read(inode)) >> block_bits; /* * If the block size is smaller than the page size we need to check the @@ -184,8 +185,12 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop, * handle both halves separately so that we properly zero data in the * page cache for blocks that are entirely outside of i_size. */ - if (first <= end && last > end) - plen -= (last - end) * block_size; + if (orig_pos <= isize && orig_pos + length > isize) { + unsigned end = offset_in_page(isize - 1) >> block_bits; + + if (first <= end && last > end) + plen -= (last - end) * block_size; + } *offp = poff; *lenp = plen;