From patchwork Tue Nov 5 11:58:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227587 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B5A014E5 for ; Tue, 5 Nov 2019 11:59:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 194D721D71 for ; Tue, 5 Nov 2019 11:59:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="u65yxE2w" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730922AbfKEL7E (ORCPT ); Tue, 5 Nov 2019 06:59:04 -0500 Received: from mail-pl1-f195.google.com ([209.85.214.195]:32859 "EHLO mail-pl1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730880AbfKEL7E (ORCPT ); Tue, 5 Nov 2019 06:59:04 -0500 Received: by mail-pl1-f195.google.com with SMTP id ay6so2292626plb.0 for ; Tue, 05 Nov 2019 03:59:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=3VdKsd//dFRQ9+Fa8NSM2awvny3x28JChjcL7z2bSow=; b=u65yxE2wOCCjxGBRaPCG/rOGpX2Twv/PDVqQzPPBYm/w0c/Jk2T5953b5F36jTIchw xGbVJdUM2MFzgXacBQN5MuBXajw4c7KzLpv0lxWeEFuUa/NFstbyioNZPh/uV2ktZQn8 tyxeOGSHPfD4prABCaVEAX3RKHi4TUft5iratmp+cvdtidV2aj9zlRXmQFNAuCDvhojM dltROoO9jOfPaY3dkDqF6gokRZsd5X2Zm8ew/WM52H+ppplKsA9o/tHkt5YPjy8ikt4p C6RvsJbGsVnOQXlFNZuY10hD9olTozHlEUqdpSuC6qAmpqDfm8oEg9V6RJm0GOlsXMqz b9FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references 
:mime-version:content-disposition:in-reply-to:user-agent; bh=3VdKsd//dFRQ9+Fa8NSM2awvny3x28JChjcL7z2bSow=; b=Rws1gkK6XyzoGv+FWB2KjBoSgGE1+qQ+ikYxXg1RVn0gxFJCTws7BJfC7bgBPi5oB/ C5/BHUWVbnqktT/2UJz0hvaKteeVjALqzIpwJgGGBmYcio2N9awinX80mXVRQsXwBQ4L nUY7YhR2NYplclJSglZZhJxIXLAxHD0J9i1G00Yhqy3Iemrtsv+XyEll52A4jjAWkBvS g0bqij12S/Gwr5HsMfyvQlvpn4ANLWTyFvr3Rs4ngZj6H3hdxH6E/BMsvQaiFfRlmbW6 JhcIuSVuRmY+pcVzBFYTbYW6B5q9NK4BfuH1KDumK1U8CXM4QltJIOi+ti8KRBZucu6y Kl0w== X-Gm-Message-State: APjAAAWZi9hkoIvQsziAd9DwD/9iv278P8SPsulnvUp1rzcYtc1G3gWG wGEpIDzlTDs0J9Y4KfjA96kGCl17jw== X-Google-Smtp-Source: APXvYqxLZgDUPW0YSoxvlIBvju48Z+fGQ6cHAThEnnmp+LPOgzqGUQGVeHFqEl77HxXzNX6ZnwsVeA== X-Received: by 2002:a17:902:fe11:: with SMTP id g17mr19690622plj.329.1572955141857; Tue, 05 Nov 2019 03:59:01 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id z18sm22687374pfq.182.2019.11.05.03.58.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 03:59:01 -0800 (PST) Date: Tue, 5 Nov 2019 22:58:55 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 01/11] ext4: reorder map.m_flags checks within ext4_iomap_begin() Message-ID: <1309ad80d31a637b2deed55a85283d582a54a26a.1572949325.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org For the direct I/O changes that follow in this patch series, we need to accommodate for the case where the block mapping flags passed through to ext4_map_blocks() result in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits set. 
In order for any allocated unwritten extents to be converted correctly in the ->end_io() handler, the iomap->type must be set to IOMAP_UNWRITTEN for cases where the EXT4_MAP_UNWRITTEN bit has been set within m_flags. Hence the reason why we need to reshuffle this conditional statement around. This change is a no-op for DAX as the block mapping flags passed through to ext4_map_blocks() i.e. EXT4_GET_BLOCKS_CREATE_ZERO never results in both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN being set at once. Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/inode.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0d8971b819e9..e4b0722717b3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3577,10 +3577,20 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE; iomap->addr = IOMAP_NULL_ADDR; } else { - if (map.m_flags & EXT4_MAP_MAPPED) { - iomap->type = IOMAP_MAPPED; - } else if (map.m_flags & EXT4_MAP_UNWRITTEN) { + /* + * Flags passed into ext4_map_blocks() for direct I/O writes + * can result in m_flags having both EXT4_MAP_MAPPED and + * EXT4_MAP_UNWRITTEN bits set. In order for any allocated + * unwritten extents to be converted into written extents + * correctly within the ->end_io() handler, we need to ensure + * that the iomap->type is set appropriately. Hence the reason + * why we need to check whether EXT4_MAP_UNWRITTEN is set + * first. 
+ */ + if (map.m_flags & EXT4_MAP_UNWRITTEN) { iomap->type = IOMAP_UNWRITTEN; + } else if (map.m_flags & EXT4_MAP_MAPPED) { + iomap->type = IOMAP_MAPPED; } else { WARN_ON_ONCE(1); return -EIO; From patchwork Tue Nov 5 11:59:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227589 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD5B114E5 for ; Tue, 5 Nov 2019 11:59:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ABBE521D7C for ; Tue, 5 Nov 2019 11:59:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="nPiHZ8jx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730935AbfKEL73 (ORCPT ); Tue, 5 Nov 2019 06:59:29 -0500 Received: from mail-pf1-f196.google.com ([209.85.210.196]:43833 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730852AbfKEL73 (ORCPT ); Tue, 5 Nov 2019 06:59:29 -0500 Received: by mail-pf1-f196.google.com with SMTP id 3so15131057pfb.10 for ; Tue, 05 Nov 2019 03:59:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=LcQBmC1CbJ1RDhTAUWqHhOSwn0GfYhx2KOpxTYNCMvg=; b=nPiHZ8jxM0wm48aejSLVJS+siylmp4UzJPTDZ0tM8pI1kBrxL+mTorpO1hXya2RnU5 NDOl7ihkPddhpC3A3fGEG3HjokZgjfieDehIVEM9m4MiI2ChQrIVj089FWqqW4X24RRN j6Q/o3cARG6v3bcaDN3yWhvWounx8mVaHXMw+EQXemwL/HxMXdlw5cqLyPNdbEBNypvs D4G4YSp8+KdperEVTAD6/rrtbdBx6pnebGPBGTixbX6FvM739DVOIj1gOaG2eeVcDydF 
gwTmmqMs7rKp2W1yuntfL2ZKGzFtgVd4asoA6QMSyCUU4yF+K4Ialyi0JrN0Xsl0Fr6C chnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=LcQBmC1CbJ1RDhTAUWqHhOSwn0GfYhx2KOpxTYNCMvg=; b=fckyS5Q1jzOmlr9ksWy5F4pHRydtaKJ6XR1At4yg7gG3vp0wfrHeJHBOP+iIlMClKb bXXsIL8EPOn8CqKcQ71tc+CMmzhfHjzGSwkB9PVPXE6L4fGA4ODoDWS9N+sQzPipWUfb C2nkUmAc2vTHIjL7kt6j4c7aqrvxz+ZylXIXNxE6DQtZ/Nk5IrHdL5HHjW0u8R45wb/r t1B9UIZhaGWC3MqHiOjEoXXXUgQYsNpdhJQIz52vXdApYrKdZ/HTqypR/LtKNXjl3sCq G3vUeImlLkpJWe7lA62lSh2VU5w7LE+En4M1OvO+IKEmaxPyl493KTPZh45WOUBL/CJU T0Xw== X-Gm-Message-State: APjAAAWujV63HgKJ5KH7NL+JAdByf3SFANslnN3I8Y9tDQTSM9tiOJvi 0ynheSVOHFpHQBJbCAPkHeSP X-Google-Smtp-Source: APXvYqxBk8sFHdjCziZsEkA4GQBpYmWh6FCuVbjKthvUyG44q3eE3nYGCpwaYtq7dWkBaOrzjRbc7g== X-Received: by 2002:a17:90a:6d27:: with SMTP id z36mr6215476pjj.38.1572955168558; Tue, 05 Nov 2019 03:59:28 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id k20sm13102609pgn.40.2019.11.05.03.59.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 03:59:27 -0800 (PST) Date: Tue, 5 Nov 2019 22:59:22 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 02/11] ext4: update direct I/O read lock pattern for IOCB_NOWAIT Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch updates the lock pattern in ext4_direct_IO_read() to not block on inode lock in cases of IOCB_NOWAIT direct I/O reads. 
The locking condition implemented here is similar to that of 942491c9e6d6 ("xfs: fix AIM7 regression"). Fixes: 16c54688592c ("ext4: Allow parallel DIO reads") Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/inode.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e4b0722717b3..f33fa86fff67 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3881,7 +3881,13 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter) * writes & truncates and since we take care of writing back page cache, * we are protected against page writeback as well. */ - inode_lock_shared(inode); + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock_shared(inode)) + return -EAGAIN; + } else { + inode_lock_shared(inode); + } + ret = filemap_write_and_wait_range(mapping, iocb->ki_pos, iocb->ki_pos + count - 1); if (ret) From patchwork Tue Nov 5 11:59:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227591 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A568E14E5 for ; Tue, 5 Nov 2019 11:59:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 83BFE217F4 for ; Tue, 5 Nov 2019 11:59:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="vE2uEZ+M" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730945AbfKEL7o (ORCPT ); Tue, 5 Nov 2019 06:59:44 -0500 Received: from mail-pg1-f193.google.com ([209.85.215.193]:43862 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1730852AbfKEL7o (ORCPT ); Tue, 5 Nov 2019 06:59:44 -0500 Received: by mail-pg1-f193.google.com with SMTP id l24so13970166pgh.10 for ; Tue, 05 Nov 2019 03:59:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=IalHleKHLFMzSLg0SYJL+fw2PdceORLMv9WENP/4q30=; b=vE2uEZ+MnatULuVJetsFF5gEPeIzWNhZ95NapXKd4imaR/SqowpoGqqjFw/tVbdzG+ CI9snjgoCBDH8YMPrANIklT8ADrrEDUFXSrgLVl7M6uqnYJ+SeRINsjiuMH/3MzZjCRC x7uAmLnMEuClE+Mrfc8NKFi41jLNmE9Xy9bFj/+QshX8tGHRTNr6Z7FGBGPEAEIXws4E GP31NNnaNH0ktW/lhL1GcZhQzUoURdnz5KyqharDQ7f30iaEiHD+Nps2dz+FrrSgCIL6 tP9Meg4BlA0z0xswlmititxW6GAK5DkszjR9AKm43+GAfOyovCMgq1ySEK+5f/D4iKkG NVSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=IalHleKHLFMzSLg0SYJL+fw2PdceORLMv9WENP/4q30=; b=LadXn7A57zOnAJ9IfhxXyEOAaOTp/XY0shIILon1lpTQDB9aMqjU3JC2hcQNfLB0JL ahGgA5m67eyDwPszaaDJRmvRD0A8JXoH/2vF49V9jyrM0SQdmlpyxkC7zPqlKG9csx98 iQ78yx8Jjk6TvM4cI7VGCpGaSKnwW0NlPGKx6bj+q/8pnD9k3jafHdkib0wTWycaQvPm qqTT7EyS4L3DlEpq/VGpERfxUSwIa5kgcFNCb/4mEzZpGZkboJF33KGL3wolFWFN+nie SWqSOG0FvKTyF4jGD2Dycyl1fGktfn3Co8sXz5p0o+eIXFtTsG55CKvZ4F6hKl9j4NA1 4BZQ== X-Gm-Message-State: APjAAAUZLjQA5QiY/gSOyaUf9VRSZz88Ut8aF0Sa1rT1ISPQgRfR4MRl CTVMvLrw3DzBOpBfIokcFyvvxp+x3g== X-Google-Smtp-Source: APXvYqx42HPFhiqLRP7eQKgRiEHCLM07ajKcCvwhrVhnYOxkKyej9oG0EDT/Nx1XkNh3zvuThXkgwA== X-Received: by 2002:a65:4183:: with SMTP id a3mr36366936pgq.440.1572955183724; Tue, 05 Nov 2019 03:59:43 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id 31sm18822992pgy.63.2019.11.05.03.59.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 03:59:43 -0800 (PST) Date: Tue, 5 Nov 2019 22:59:37 +1100 
From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 03/11] ext4: iomap that extends beyond EOF should be marked dirty Message-ID: <8b43ee9ee94bee5328da56ba0909b7d2229ef150.1572949325.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch addresses what Dave Chinner had discovered and fixed within commit: 7684e2c4384d. This change does not have any user-visible impact for ext4 as none of the current users of ext4_iomap_begin() that extend files depend on IOMAP_F_DIRTY. When doing a direct IO that spans the current EOF, and there are written blocks beyond EOF that extend beyond the current write, the only metadata update that needs to be done is a file size extension. However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that there are IO completion metadata updates required, and hence we may fail to correctly sync file size extensions made in IO completion when O_DSYNC writes are being used and the hardware supports FUA. Hence when setting IOMAP_F_DIRTY, we need to also take into account whether the iomap spans the current EOF. If it does, then we need to mark it dirty so that IO completion will call generic_write_sync() to flush the inode size update to stable storage correctly.
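The condition described above can be sketched in isolation as a small userspace predicate. This is only an illustration under stated assumptions, not the kernel code: iomap_needs_dirty() is a hypothetical name, inode_dirty stands in for the result of ext4_inode_datasync_dirty(), and i_size stands in for the value of i_size_read().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch only. "iomap_needs_dirty" is a hypothetical helper:
 * "inode_dirty" stands in for ext4_inode_datasync_dirty(inode) and
 * "i_size" for i_size_read(inode).
 */
static bool iomap_needs_dirty(bool inode_dirty, int64_t offset,
			      int64_t length, int64_t i_size)
{
	/*
	 * A request whose end lies past the current EOF may trigger a file
	 * size update on I/O completion, so it is treated as dirty even
	 * when no other metadata changes are pending.
	 */
	return inode_dirty || offset + length > i_size;
}
```

Note that a request ending exactly at EOF does not extend the file, so in this sketch it is only reported as dirty when the inode already has pending datasync metadata.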
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/inode.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f33fa86fff67..b422d9b8c0bd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3565,8 +3565,14 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, return ret; } + /* + * Writes that span EOF might trigger an I/O size update on completion, + * so consider them to be dirty for the purposes of O_DSYNC, even if + * there is no other metadata changes being made or are pending here. + */ iomap->flags = 0; - if (ext4_inode_datasync_dirty(inode)) + if (ext4_inode_datasync_dirty(inode) || + offset + length > i_size_read(inode)) iomap->flags |= IOMAP_F_DIRTY; iomap->bdev = inode->i_sb->s_bdev; iomap->dax_dev = sbi->s_daxdev; From patchwork Tue Nov 5 11:59:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227593 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECA2C1515 for ; Tue, 5 Nov 2019 12:00:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C0F5D21D7C for ; Tue, 5 Nov 2019 12:00:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="IySyQPBd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730946AbfKEMAF (ORCPT ); Tue, 5 Nov 2019 07:00:05 -0500 Received: from mail-pf1-f195.google.com ([209.85.210.195]:40393 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726751AbfKEMAF (ORCPT ); Tue, 5 Nov 2019 07:00:05 -0500 Received: by 
mail-pf1-f195.google.com with SMTP id r4so15139757pfl.7 for ; Tue, 05 Nov 2019 04:00:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=9DvOG+Bzv+9J/Y7WvEj6a/mdMF4pBTR9bq5BO8IzC1o=; b=IySyQPBdtAQQIY0PRyNHAxgm2468OhiEY0eml6eaF/5afuGwHCwY63QklmMHmzDrVn 0b69uPXsT2MreXOnf6JKXXxTXnvKZ/eHUPLBSffZ1NgLL1tsCeEDMin7T0JRamYTp8Gl mKe/gQHgPCywWAoWrVqWiM/fnY7TbCGfbU794PH/6IN5T7OA+S0xs/iHRpGw4gMOnEQK kBsniYMYpS8GF2a/SVHrlAz0f7F5CXDQ9oKvp6DTlLBnqPeFrZqOdXU4iSPenKvnhcYq fEbVJS9qN2yqD9l27dPRD4CHeTDpGBb3qPx1fVGdV1TWfbVkW+4uS9kD4eDq0HjDJB/Q jb+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=9DvOG+Bzv+9J/Y7WvEj6a/mdMF4pBTR9bq5BO8IzC1o=; b=ff9O361os5E1+kKYPXZ6PcAiIk7QNT2AZf6JOEBkz/CIPXD/P+sWoH5NhurUNwMpkQ bf0eHhVG4251Bh8l9TLxLDxKPL6TpBZtLfnyj7vhscBqyQK95M42reSa21+rYEt6NeWY JB3XKVB1gmMkKvFsx3cBjcI1ohM1On2Pug53b++bOWFtxTm1/wve0SpYUlmAxGSAmWxs G5nApAuMwgQbu4Zqaiw85pvP24mjwm7BK8LIvdCrwcG2HBzjsevDhCqwWgvxVqyBfZ7k JrU/CPapagHUgopZkfg5yknJjybYucnUXHbrvYcQS278a8ho1RY43XtFLTrRueCrVTCv Bo4w== X-Gm-Message-State: APjAAAVjnxfUDxcTFKE4b9wHZVX/WGFrtml1JV7jypGXVGWxtVpupJWl wVM0ID+kxlGQoulWGehKlzsN X-Google-Smtp-Source: APXvYqx7GKubfC6doj+f7ebYapYbfwM+Ry9OnpW/snQnOT6awK9qf7gNmUrlqaH9hSBdoAbNO4EsLQ== X-Received: by 2002:a63:1703:: with SMTP id x3mr36987546pgl.263.1572955202310; Tue, 05 Nov 2019 04:00:02 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id e8sm22115227pga.17.2019.11.05.03.59.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:00:01 -0800 (PST) Date: Tue, 5 Nov 2019 22:59:56 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca 
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 04/11] ext4: move set iomap routines into a separate helper ext4_set_iomap() Message-ID: <1ea34da65eecffcddffb2386668ae06134e8deaf.1572949325.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Separate the iomap field population code that is currently within ext4_iomap_begin() into a separate helper ext4_set_iomap(). The intent of this function is self-explanatory; however, the rationale behind taking this step is to reduce the overall clutter that we currently have within the ext4_iomap_begin() callback. Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/inode.c | 90 ++++++++++++++++++++++++++----------------------- 1 file changed, 48 insertions(+), 42 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b422d9b8c0bd..9e1ac9fe816b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3448,10 +3448,54 @@ static bool ext4_inode_datasync_dirty(struct inode *inode) return inode->i_state & I_DIRTY_DATASYNC; } +static void ext4_set_iomap(struct inode *inode, struct iomap *iomap, + struct ext4_map_blocks *map, loff_t offset, + loff_t length) +{ + u8 blkbits = inode->i_blkbits; + + /* + * Writes that span EOF might trigger an I/O size update on completion, + * so consider them to be dirty for the purpose of O_DSYNC, even if + * there is no other metadata changes being made or are pending.
+ */ + iomap->flags = 0; + if (ext4_inode_datasync_dirty(inode) || + offset + length > i_size_read(inode)) + iomap->flags |= IOMAP_F_DIRTY; + + if (map->m_flags & EXT4_MAP_NEW) + iomap->flags |= IOMAP_F_NEW; + + iomap->bdev = inode->i_sb->s_bdev; + iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev; + iomap->offset = (u64) map->m_lblk << blkbits; + iomap->length = (u64) map->m_len << blkbits; + + /* + * Flags passed to ext4_map_blocks() for direct I/O writes can result + * in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits + * set. In order for any allocated unwritten extents to be converted + * into written extents correctly within the ->end_io() handler, we + * need to ensure that the iomap->type is set appropriately. Hence, the + * reason why we need to check whether the EXT4_MAP_UNWRITTEN bit has + * been set first. + */ + if (map->m_flags & EXT4_MAP_UNWRITTEN) { + iomap->type = IOMAP_UNWRITTEN; + iomap->addr = (u64) map->m_pblk << blkbits; + } else if (map->m_flags & EXT4_MAP_MAPPED) { + iomap->type = IOMAP_MAPPED; + iomap->addr = (u64) map->m_pblk << blkbits; + } else { + iomap->type = IOMAP_HOLE; + iomap->addr = IOMAP_NULL_ADDR; + } +} + static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned flags, struct iomap *iomap, struct iomap *srcmap) { - struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); unsigned int blkbits = inode->i_blkbits; unsigned long first_block, last_block; struct ext4_map_blocks map; @@ -3565,47 +3609,9 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, return ret; } - /* - * Writes that span EOF might trigger an I/O size update on completion, - * so consider them to be dirty for the purposes of O_DSYNC, even if - * there is no other metadata changes being made or are pending here. 
- */ - iomap->flags = 0; - if (ext4_inode_datasync_dirty(inode) || - offset + length > i_size_read(inode)) - iomap->flags |= IOMAP_F_DIRTY; - iomap->bdev = inode->i_sb->s_bdev; - iomap->dax_dev = sbi->s_daxdev; - iomap->offset = (u64)first_block << blkbits; - iomap->length = (u64)map.m_len << blkbits; - - if (ret == 0) { - iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE; - iomap->addr = IOMAP_NULL_ADDR; - } else { - /* - * Flags passed into ext4_map_blocks() for direct I/O writes - * can result in m_flags having both EXT4_MAP_MAPPED and - * EXT4_MAP_UNWRITTEN bits set. In order for any allocated - * unwritten extents to be converted into written extents - * correctly within the ->end_io() handler, we need to ensure - * that the iomap->type is set appropriately. Hence the reason - * why we need to check whether EXT4_MAP_UNWRITTEN is set - * first. - */ - if (map.m_flags & EXT4_MAP_UNWRITTEN) { - iomap->type = IOMAP_UNWRITTEN; - } else if (map.m_flags & EXT4_MAP_MAPPED) { - iomap->type = IOMAP_MAPPED; - } else { - WARN_ON_ONCE(1); - return -EIO; - } - iomap->addr = (u64)map.m_pblk << blkbits; - } - - if (map.m_flags & EXT4_MAP_NEW) - iomap->flags |= IOMAP_F_NEW; + ext4_set_iomap(inode, iomap, &map, offset, length); + if (delalloc && iomap->type == IOMAP_HOLE) + iomap->type = IOMAP_DELALLOC; return 0; } From patchwork Tue Nov 5 12:00:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227595 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B65831515 for ; Tue, 5 Nov 2019 12:00:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8B2B521D7C for ; Tue, 5 Nov 2019 12:00:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) 
header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="cgWsMGk3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730959AbfKEMAW (ORCPT ); Tue, 5 Nov 2019 07:00:22 -0500 Received: from mail-pl1-f196.google.com ([209.85.214.196]:41793 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726751AbfKEMAV (ORCPT ); Tue, 5 Nov 2019 07:00:21 -0500 Received: by mail-pl1-f196.google.com with SMTP id d29so2936139plj.8 for ; Tue, 05 Nov 2019 04:00:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Rm51/8A1FKCRPHUWGAek8qkPPsfDsFDO53iwbaAiymA=; b=cgWsMGk3ooN4Zx1i2OmisZ2e3J13bSH2R7g3MeS32R8ie1nAI6BiB289EX4mM6VUgX nZn3CoVWzehrY3qLXzA2MFDhyOoZpqX7UWH7JgKnJZXbb2fBDh3yQDpgrPXYNQ0C2Qgf asH3QROVITNEOl+yWtHEJ7QqNk6UXK7Sip/58rnZl+VjEm0tXL23yiRh21W4Wgy2AeGc kokqVnSR8LTbx3W9IA61J4D1hM0orcTdn8N1WYTXpjAEDl3S18GJCXMg8QLCN8dI3HxT KZMjFYD6Uk9bOaInP6YA5H1dD+G5m3gbv2fihUB8e0zgPLKn5c2HVqN568tWL/zneYbp L6vA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=Rm51/8A1FKCRPHUWGAek8qkPPsfDsFDO53iwbaAiymA=; b=FIXJfHPeg1XzdL3smD89b1DVGk9wIiZ6ET8yVzSBb+CAnH/wqCTDztBAOWLL1o11PL xNXUnmT6Ik0J7w29z3JSZ2nMTo5Ue4Y4Le9JRO65zSxx5vtqrPB9VYoNqNkswHkLuEGX ok6RBJwVLKEWJY/6cswimf8TAaKCDSxkuQjAvMunhCTVJtMYd5qofCz5tjGpLYc7Zcz8 v7N40kb/PAmOna0aOQ7SYn6XY0DE26v9zjtrr1BlkEjrGkt5BPRGg33K+OuLEwXiuxzH 2iLjwuQ4/96v2fsL8OQ8c/wUFA/kyGrLZQobQDknRrBoY00oDMVK3hiNPx7iUCIFAl7X EmNA== X-Gm-Message-State: APjAAAWu9vhpbAqhA0N84td2S+eoXLY6MP+7fh7/M3YKdiINJAPNJhSY UkTIZNoVzJK86NLhjrSzdDL+ X-Google-Smtp-Source: 
APXvYqx+yuz+55P2OkZTQLdWFLk8ay5ouJrH/tCnzKQV+MRpP8dWD/8n0jFlcPxnqGIx3Owh1cKOog== X-Received: by 2002:a17:902:7e45:: with SMTP id a5mr9164172pln.315.1572955220806; Tue, 05 Nov 2019 04:00:20 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id q34sm23038586pjb.15.2019.11.05.04.00.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:00:20 -0800 (PST) Date: Tue, 5 Nov 2019 23:00:14 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 05/11] ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helper Message-ID: <50eef383add1ea529651640574111076c55aca9f.1572949325.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org In preparation for porting the ext4 direct I/O path over to the iomap infrastructure, split up the IOMAP_WRITE branch that's currently within ext4_iomap_begin() into a separate helper ext4_iomap_alloc(). This way, when we add in the necessary code for direct I/O, we don't end up with ext4_iomap_begin() becoming a monstrous twisty maze.
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/inode.c | 113 ++++++++++++++++++++++++++---------------------- 1 file changed, 61 insertions(+), 52 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9e1ac9fe816b..b540f2903faa 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3493,6 +3493,63 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap, } } +static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, + unsigned int flags) +{ + handle_t *handle; + u8 blkbits = inode->i_blkbits; + int ret, dio_credits, retries = 0; + + /* + * Trim the mapping request to the maximum value that we can map at + * once for direct I/O. + */ + if (map->m_len > DIO_MAX_BLOCKS) + map->m_len = DIO_MAX_BLOCKS; + dio_credits = ext4_chunk_trans_blocks(inode, map->m_len); + +retry: + /* + * Either we allocate blocks and then don't get an unwritten extent, so + * in that case we have reserved enough credits. Or, the blocks are + * already allocated and unwritten. In that case, the extent conversion + * fits into the credits as well. + */ + handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, dio_credits); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + ret = ext4_map_blocks(handle, inode, map, EXT4_GET_BLOCKS_CREATE_ZERO); + if (ret < 0) + goto journal_stop; + + /* + * If we've allocated blocks beyond EOF, we need to ensure that they're + * truncated if we crash before updating the inode size metadata within + * ext4_iomap_end(). For faults, we don't need to do that (and cannot + * due to orphan list operations needing an inode_lock()). If we happen + * to instantiate blocks beyond EOF, it is because we race with a + * truncate operation, which already has added the inode onto the + * orphan list. 
+ */ + if (!(flags & IOMAP_FAULT) && map->m_lblk + map->m_len > + (i_size_read(inode) + (1 << blkbits) - 1) >> blkbits) { + int err; + + err = ext4_orphan_add(handle, inode); + if (err < 0) + ret = err; + } + +journal_stop: + ext4_journal_stop(handle); + if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) + goto retry; + + return ret; +} + + static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned flags, struct iomap *iomap, struct iomap *srcmap) { @@ -3553,62 +3610,14 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, } } } else if (flags & IOMAP_WRITE) { - int dio_credits; - handle_t *handle; - int retries = 0; - - /* Trim mapping request to maximum we can map at once for DIO */ - if (map.m_len > DIO_MAX_BLOCKS) - map.m_len = DIO_MAX_BLOCKS; - dio_credits = ext4_chunk_trans_blocks(inode, map.m_len); -retry: - /* - * Either we allocate blocks and then we don't get unwritten - * extent so we have reserved enough credits, or the blocks - * are already allocated and unwritten and in that case - * extent conversion fits in the credits as well. - */ - handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, - dio_credits); - if (IS_ERR(handle)) - return PTR_ERR(handle); - - ret = ext4_map_blocks(handle, inode, &map, - EXT4_GET_BLOCKS_CREATE_ZERO); - if (ret < 0) { - ext4_journal_stop(handle); - if (ret == -ENOSPC && - ext4_should_retry_alloc(inode->i_sb, &retries)) - goto retry; - return ret; - } - - /* - * If we added blocks beyond i_size, we need to make sure they - * will get truncated if we crash before updating i_size in - * ext4_iomap_end(). For faults we don't need to do that (and - * even cannot because for orphan list operations inode_lock is - * required) - if we happen to instantiate block beyond i_size, - * it is because we race with truncate which has already added - * the inode to the orphan list. 
- */ - if (!(flags & IOMAP_FAULT) && first_block + map.m_len > - (i_size_read(inode) + (1 << blkbits) - 1) >> blkbits) { - int err; - - err = ext4_orphan_add(handle, inode); - if (err < 0) { - ext4_journal_stop(handle); - return err; - } - } - ext4_journal_stop(handle); + ret = ext4_iomap_alloc(inode, &map, flags); } else { ret = ext4_map_blocks(NULL, inode, &map, 0); - if (ret < 0) - return ret; } + if (ret < 0) + return ret; + ext4_set_iomap(inode, iomap, &map, offset, length); if (delalloc && iomap->type == IOMAP_HOLE) iomap->type = IOMAP_DELALLOC; From patchwork Tue Nov 5 12:03:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9211A1515 for ; Tue, 5 Nov 2019 12:03:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 65B0821D71 for ; Tue, 5 Nov 2019 12:03:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="tM4Lzb8r" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730900AbfKEMDi (ORCPT ); Tue, 5 Nov 2019 07:03:38 -0500 Received: from mail-pl1-f195.google.com ([209.85.214.195]:44104 "EHLO mail-pl1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730816AbfKEMDi (ORCPT ); Tue, 5 Nov 2019 07:03:38 -0500 Received: by mail-pl1-f195.google.com with SMTP id q16so9296850pll.11 for ; Tue, 05 Nov 2019 04:03:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; 
bh=cFgPyT+ODS1OrE0hwd9zKr2rgvqb/PsZ4tSjiSvlbbI=; b=tM4Lzb8r0Y3azoMYlfMr2tjxpkD/t7EJ1W+mHQD92JAZkhZunDsIbNAgsVmy4HAx2A 6XKVJ9zMtcDBeHKOrEFwttX7/gYRBPxxLztFmcYKxf+awDYrMblE8mC5ag+3B20Hxk8B ci4OKEqZLcDocpfv++pNjbpTt06caKr8nEOZyIisK9ccoyDIeRgJJn6WZQQBM9khjFuK 7Ta/owPGktqqQnwatf170XRX5YtaAzsgbyEUZeqsSbfc9F7qFUHTjHfww5Q4peJ9ndq+ +aHm+wsTZAIOezIPc4XV7P4zEXsuP/MuC3UcoVRfwejs74M4sZ8ieJuR9yBRNu4ypS3Y X2Ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=cFgPyT+ODS1OrE0hwd9zKr2rgvqb/PsZ4tSjiSvlbbI=; b=DPMpQrxwdE86D/IBN6OmbSLYPHtt/yMg8F421xfDJbt0cJXluCvLDfDIEqcNVyKFmP An2KrxjUKNAi8V4QwME05XdioPxaFZkMaHbyceKsAvCWAf4f9r4v/h0yTkZ/14ejB2yB Foy5OrkeT/vPFOjt/C7XKSfzpQp4ZMFAanC0m6JNsLXzMXur1PCuHHtX4gI1GPcj4uKn keZRU+FdotjaMy5Wh6Z0pSq/FhqlfnVP+QCFEin4AXluxw8qQggqrZ2yOa4JfcBmptxu kQ4XMQhQlfkeAI4Qv7FbiV9TIDrZrHvijVaerLRRAvkLXAah7Vdk3f888DHmNb0Zo7nB phOg== X-Gm-Message-State: APjAAAWSml9nqZYTS4aTyR4k+gLNLX9+Dzbjd8iqoIIrs7X/pQ/Jq9GE a4AVlajSQeqTu8OXMAezM7WOzrzbtA== X-Google-Smtp-Source: APXvYqy//gLInV/z0P1L70ZmRo5JX29PhNoALIeBgwlxEJoIk1PjOVNaCNOIr+X57s2rUsVBZUhuTw== X-Received: by 2002:a17:902:7c8f:: with SMTP id y15mr5008661pll.341.1572955417149; Tue, 05 Nov 2019 04:03:37 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id n3sm21268023pff.102.2019.11.05.04.03.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:03:36 -0800 (PST) Date: Tue, 5 Nov 2019 23:03:31 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 06/11] ext4: introduce new callback for IOMAP_REPORT Message-ID: <5c97a569e26ddb6696e3d3ac9fbde41317e029a0.1572949325.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 
Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org As part of the ext4_iomap_begin() cleanups that precede this patch, we also split up the IOMAP_REPORT branch into a completely separate ->iomap_begin() callback named ext4_iomap_begin_report(). Again, the rationale for this change is to reduce the overall clutter within ext4_iomap_begin(). Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/ext4.h | 1 + fs/ext4/file.c | 6 ++- fs/ext4/inode.c | 134 +++++++++++++++++++++++++++++------------------- 3 files changed, 85 insertions(+), 56 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 3616f1b0c987..5c6c4acea8b1 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3388,6 +3388,7 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end) } extern const struct iomap_ops ext4_iomap_ops; +extern const struct iomap_ops ext4_iomap_report_ops; static inline int ext4_buffer_uptodate(struct buffer_head *bh) { diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 8d2bbcc2d813..ab75aee3e687 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -494,12 +494,14 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence) maxbytes, i_size_read(inode)); case SEEK_HOLE: inode_lock_shared(inode); - offset = iomap_seek_hole(inode, offset, &ext4_iomap_ops); + offset = iomap_seek_hole(inode, offset, + &ext4_iomap_report_ops); inode_unlock_shared(inode); break; case SEEK_DATA: inode_lock_shared(inode); - offset = iomap_seek_data(inode, offset, &ext4_iomap_ops); + offset = iomap_seek_data(inode, offset, + &ext4_iomap_report_ops); inode_unlock_shared(inode); break; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b540f2903faa..b5ba6767b276 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3553,74 +3553,32 @@ static int ext4_iomap_alloc(struct inode *inode, struct
ext4_map_blocks *map, static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned flags, struct iomap *iomap, struct iomap *srcmap) { - unsigned int blkbits = inode->i_blkbits; - unsigned long first_block, last_block; - struct ext4_map_blocks map; - bool delalloc = false; int ret; + struct ext4_map_blocks map; + u8 blkbits = inode->i_blkbits; if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) return -EINVAL; - first_block = offset >> blkbits; - last_block = min_t(loff_t, (offset + length - 1) >> blkbits, - EXT4_MAX_LOGICAL_BLOCK); - - if (flags & IOMAP_REPORT) { - if (ext4_has_inline_data(inode)) { - ret = ext4_inline_data_iomap(inode, iomap); - if (ret != -EAGAIN) { - if (ret == 0 && offset >= iomap->length) - ret = -ENOENT; - return ret; - } - } - } else { - if (WARN_ON_ONCE(ext4_has_inline_data(inode))) - return -ERANGE; - } - map.m_lblk = first_block; - map.m_len = last_block - first_block + 1; - - if (flags & IOMAP_REPORT) { - ret = ext4_map_blocks(NULL, inode, &map, 0); - if (ret < 0) - return ret; - - if (ret == 0) { - ext4_lblk_t end = map.m_lblk + map.m_len - 1; - struct extent_status es; - - ext4_es_find_extent_range(inode, &ext4_es_is_delayed, - map.m_lblk, end, &es); + if (WARN_ON_ONCE(ext4_has_inline_data(inode))) + return -ERANGE; - if (!es.es_len || es.es_lblk > end) { - /* entire range is a hole */ - } else if (es.es_lblk > map.m_lblk) { - /* range starts with a hole */ - map.m_len = es.es_lblk - map.m_lblk; - } else { - ext4_lblk_t offs = 0; + /* + * Calculate the first and last logical blocks respectively. 
+ */ + map.m_lblk = offset >> blkbits; + map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, + EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; - if (es.es_lblk < map.m_lblk) - offs = map.m_lblk - es.es_lblk; - map.m_lblk = es.es_lblk + offs; - map.m_len = es.es_len - offs; - delalloc = true; - } - } - } else if (flags & IOMAP_WRITE) { + if (flags & IOMAP_WRITE) ret = ext4_iomap_alloc(inode, &map, flags); - } else { + else ret = ext4_map_blocks(NULL, inode, &map, 0); - } if (ret < 0) return ret; ext4_set_iomap(inode, iomap, &map, offset, length); - if (delalloc && iomap->type == IOMAP_HOLE) - iomap->type = IOMAP_DELALLOC; return 0; } @@ -3682,6 +3640,74 @@ const struct iomap_ops ext4_iomap_ops = { .iomap_end = ext4_iomap_end, }; +static bool ext4_iomap_is_delalloc(struct inode *inode, + struct ext4_map_blocks *map) +{ + struct extent_status es; + ext4_lblk_t offset = 0, end = map->m_lblk + map->m_len - 1; + + ext4_es_find_extent_range(inode, &ext4_es_is_delayed, + map->m_lblk, end, &es); + + if (!es.es_len || es.es_lblk > end) + return false; + + if (es.es_lblk > map->m_lblk) { + map->m_len = es.es_lblk - map->m_lblk; + return false; + } + + offset = map->m_lblk - es.es_lblk; + map->m_len = es.es_len - offset; + + return true; +} + +static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + int ret; + bool delalloc = false; + struct ext4_map_blocks map; + u8 blkbits = inode->i_blkbits; + + if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) + return -EINVAL; + + if (ext4_has_inline_data(inode)) { + ret = ext4_inline_data_iomap(inode, iomap); + if (ret != -EAGAIN) { + if (ret == 0 && offset >= iomap->length) + ret = -ENOENT; + return ret; + } + } + + /* + * Calculate the first and last logical block respectively. 
+ */ + map.m_lblk = offset >> blkbits; + map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, + EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (ret < 0) + return ret; + if (ret == 0) + delalloc = ext4_iomap_is_delalloc(inode, &map); + + ext4_set_iomap(inode, iomap, &map, offset, length); + if (delalloc && iomap->type == IOMAP_HOLE) + iomap->type = IOMAP_DELALLOC; + + return 0; +} + +const struct iomap_ops ext4_iomap_report_ops = { + .iomap_begin = ext4_iomap_begin_report, +}; + static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset, ssize_t size, void *private) { From patchwork Tue Nov 5 12:01:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227597 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14FBB1515 for ; Tue, 5 Nov 2019 12:01:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DE8D620650 for ; Tue, 5 Nov 2019 12:01:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="CQTh26qn" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730876AbfKEMBp (ORCPT ); Tue, 5 Nov 2019 07:01:45 -0500 Received: from mail-pg1-f196.google.com ([209.85.215.196]:46195 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730848AbfKEMBo (ORCPT ); Tue, 5 Nov 2019 07:01:44 -0500 Received: by mail-pg1-f196.google.com with SMTP id f19so13965133pgn.13 for ; Tue, 05 Nov 2019 04:01:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; 
h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=6LibSywqRXYWNf37STd+dCRG3O5Iq29JKhK3bKCXC1M=; b=CQTh26qnHLvJp4R7YJ7HUoMMH8QKakW5Bs+bEAqxTf+TrjmjvjW9wGx8vSaSMKPuRE eytfZ4TXZd6u6sCpAXNJobbqFbfb2TtQWdc6ZoeJhs8VgOAwC5IRbHaQtTy41UipWKEy z864eG35bZh2hyRIyas1BtKz9LPOuGmZS2nCPCDCa0yG9aEJE8hExpLU9OKmIniZMOdc yW9Xk9Ymvh2H1+40Kk+6hg44MIX3Z57ZNV8rUR0PJhW78s0F/5SGTpTjg5wswrzZDCtN JMwlwBd00RUztptucvamfBcpZmflHE4IO5l3vFmUmJip3rd5TO1DbmhNsMo3Od9FtchE f3Ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=6LibSywqRXYWNf37STd+dCRG3O5Iq29JKhK3bKCXC1M=; b=q55b6aDxCzEaxIVRniqku9jd7PhdjqGp+X/fUFKzj26q+rP5kXA//XkrW3zUSBt6Zh UjmtDxlLg8dCmh4F0BWDY4sKlLToRI6/YSrzSd3wzW6bGBVZUw7JopidNhigb3omwIRW xo9QL7c841oqXdFnt1i7naxiVNFEcqe3U9LM+VY4ZfDL+m6XQYHjurX189MuH3evPoPg IfTiQnr2HFPgv0lm2D7e9KR5fthvYocA3egJeFYsYzBWSgviw+kJJnddMc+SNAxX2sG2 yukmu4ZkbN3xGUpLhL/DrnyXsPJ+W8XTH7RQsDxNJKaSOt7/4I6Uk0jRgHeOnjI0S27C Z+KA== X-Gm-Message-State: APjAAAXdgskAtiiAvCn2yhL1vxwTfHitL0tEsbSzs5Yz0iFsYWeDZL8k wDsIJ+PMgMuxxbVw4oFm+f3b X-Google-Smtp-Source: APXvYqxGgRwW6wQeS06JUD8OJFOjC9nZ0rdGXWb/h+t0ik9/XB+YjEWTAK65t+EWoZGOPI8KAINd1Q== X-Received: by 2002:a17:90a:9741:: with SMTP id i1mr6282771pjw.2.1572955303619; Tue, 05 Nov 2019 04:01:43 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id f189sm28845329pgc.94.2019.11.05.04.01.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:01:42 -0800 (PST) Date: Tue, 5 Nov 2019 23:01:37 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 07/11] ext4: introduce direct I/O read using iomap infrastructure Message-ID: References: 
MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch introduces a new direct I/O read path which makes use of the iomap infrastructure. The new function ext4_dio_read_iter() is responsible for calling into the iomap infrastructure via iomap_dio_rw(). If the read operation performed on the inode is not supported, which is checked via ext4_dio_supported(), then we simply fall back and complete the I/O using buffered I/O. The existing direct I/O read code path has been removed, as it is now redundant. Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/file.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-- fs/ext4/inode.c | 38 +--------------------------------- 2 files changed, 54 insertions(+), 39 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index ab75aee3e687..440f4c6ba4ee 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -34,6 +34,52 @@ #include "xattr.h" #include "acl.h" +static bool ext4_dio_supported(struct inode *inode) +{ + if (IS_ENABLED(CONFIG_FS_ENCRYPTION) && IS_ENCRYPTED(inode)) + return false; + if (fsverity_active(inode)) + return false; + if (ext4_should_journal_data(inode)) + return false; + if (ext4_has_inline_data(inode)) + return false; + return true; +} + +static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + ssize_t ret; + struct inode *inode = file_inode(iocb->ki_filp); + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock_shared(inode)) + return -EAGAIN; + } else { + inode_lock_shared(inode); + } + + if (!ext4_dio_supported(inode)) { + inode_unlock_shared(inode); + /* + * Fallback to buffered I/O if the operation being performed on + * the inode is not supported by direct I/O.
The IOCB_DIRECT + * flag needs to be cleared here in order to ensure that the + * direct I/O path within generic_file_read_iter() is not + * taken. + */ + iocb->ki_flags &= ~IOCB_DIRECT; + return generic_file_read_iter(iocb, to); + } + + ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, + is_sync_kiocb(iocb)); + inode_unlock_shared(inode); + + file_accessed(iocb->ki_filp); + return ret; +} + #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { @@ -64,16 +110,21 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { - if (unlikely(ext4_forced_shutdown(EXT4_SB(file_inode(iocb->ki_filp)->i_sb)))) + struct inode *inode = file_inode(iocb->ki_filp); + + if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO; if (!iov_iter_count(to)) return 0; /* skip atime */ #ifdef CONFIG_FS_DAX - if (IS_DAX(file_inode(iocb->ki_filp))) + if (IS_DAX(inode)) return ext4_dax_read_iter(iocb, to); #endif + if (iocb->ki_flags & IOCB_DIRECT) + return ext4_dio_read_iter(iocb, to); + return generic_file_read_iter(iocb, to); } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b5ba6767b276..9bd80df6b856 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -863,9 +863,6 @@ int ext4_dio_get_block(struct inode *inode, sector_t iblock, { /* We don't expect handle for direct IO */ WARN_ON_ONCE(ext4_journal_current_handle()); - - if (!create) - return _ext4_get_block(inode, iblock, bh, 0); return ext4_get_block_trans(inode, iblock, bh, EXT4_GET_BLOCKS_CREATE); } @@ -3916,36 +3913,6 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) return ret; } -static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter) -{ - struct address_space *mapping = iocb->ki_filp->f_mapping; - struct inode *inode = mapping->host; - size_t count = iov_iter_count(iter); - ssize_t ret; - - /* - * Shared inode_lock is 
enough for us - it protects against concurrent - * writes & truncates and since we take care of writing back page cache, - * we are protected against page writeback as well. - */ - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!inode_trylock_shared(inode)) - return -EAGAIN; - } else { - inode_lock_shared(inode); - } - - ret = filemap_write_and_wait_range(mapping, iocb->ki_pos, - iocb->ki_pos + count - 1); - if (ret) - goto out_unlock; - ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, - iter, ext4_dio_get_block, NULL, NULL, 0); -out_unlock: - inode_unlock_shared(inode); - return ret; -} - static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; @@ -3972,10 +3939,7 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return 0; trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); - if (iov_iter_rw(iter) == READ) - ret = ext4_direct_IO_read(iocb, iter); - else - ret = ext4_direct_IO_write(iocb, iter); + ret = ext4_direct_IO_write(iocb, iter); trace_ext4_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret); return ret; } From patchwork Tue Nov 5 12:01:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227599 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 179A61515 for ; Tue, 5 Nov 2019 12:02:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DF0D221929 for ; Tue, 5 Nov 2019 12:02:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="JliYhNuw" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730896AbfKEMCB (ORCPT 
); Tue, 5 Nov 2019 07:02:01 -0500 Received: from mail-pf1-f195.google.com ([209.85.210.195]:36409 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730848AbfKEMCB (ORCPT ); Tue, 5 Nov 2019 07:02:01 -0500 Received: by mail-pf1-f195.google.com with SMTP id v19so15157823pfm.3 for ; Tue, 05 Nov 2019 04:02:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Ao1rlHcLUQ5Y1RuA5Fm1CjTs0sc8ov2JwdhgeIweKhc=; b=JliYhNuwH4T7aq/teXHCcWlm4i9B7mYbqlSRgvnvEXFSL3wUteMYgpGT2V7gDb++Fl N2V0IStiqPhLo8WwElLO3H/LedZNCreosAOfUrA+20VFGeu/3vuOu3OfEdYf+nVWSVsR BzL3FUoF2CJMtfBKPcZ3WKlQ2+N45IH1cKJvETefpF7As9QSmfWTUD3Gq2UiQCCEDVJ/ xhoR0P+8PCkRD3Bwxfk3VljEhXW5PwvNLqNG0XWUlJEC2Q0FMaA2k89JZzOrumC5LwSO 8DXYHDAhj22BuJNqA7BDtfQu7N3KYKvP+r+HhX8JIjYd/nvev5Ad0utpM1yYaUSr1Gv1 N+YA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=Ao1rlHcLUQ5Y1RuA5Fm1CjTs0sc8ov2JwdhgeIweKhc=; b=mS7fhRdCynmt640yrr3AvUA+caLJUEphkwZTdEZCC0SGp3kpbVInOXvxXZazWHtXdG oSZhe/T0SeQd0kTe0Gc4CL2cxoUazRWDahGy0PuhNzI9dIWqNBql+ya7jWc6J2IP7ydv Hn84cGe9ln5q6lGPrdWUGfE0BrFDBLmyLr3YNKh+vzrhSg7GBUSHhbafmOodF5Qp1EAq gcn/0lGxqCUe9QEHCRVYUs6MoYn+aU/LTuEwDWcbcNDn46VDGkxjzQsI+eTcB0XZYAhB npnJFXDj1DqqvrGrgT7neekKVbEWpAPbdA2+vqVMFkLs/8PqxUlwT6lN177X3RrxN3DP j8yg== X-Gm-Message-State: APjAAAXnrA7MWic6eqKv4ht8BIkiE6edHi5LjtKOOK331TZbJ99AcVyV WIkk3f0rNj99r5wjNzc3U+4K X-Google-Smtp-Source: APXvYqwyvm39n3nGWZHzGGsGs6NFXZ+OM31pC97J7kCT/lGvZDVzWcCtlnHCMX2HdEbT8W6QZhjXNg== X-Received: by 2002:a63:67c3:: with SMTP id b186mr36117014pgc.152.1572955320145; Tue, 05 Nov 2019 04:02:00 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA 
id z23sm7311835pgj.43.2019.11.05.04.01.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:01:58 -0800 (PST) Date: Tue, 5 Nov 2019 23:01:51 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 08/11] ext4: move inode extension/truncate code out from ->iomap_end() callback Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org In preparation for implementing the iomap direct I/O modifications, the inode extension/truncate code needs to be moved out from the ext4_iomap_end() callback. For direct I/O, if the current code remained, it would behave incorrectly. Updating the inode size prior to converting unwritten extents would potentially allow a racing direct I/O read to find unwritten extents before they have been converted. The inode extension/truncate code now resides within a new helper ext4_handle_inode_extension(). This function has been designed so that it can accommodate both DAX and direct I/O extension/truncate operations.
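As a rough illustration of the short-write check that the helper performs in the diff below, here is a minimal userspace sketch in C. The names align_up() and needs_truncate() are hypothetical and are not part of ext4; the sketch assumes plain byte offsets and leaves out the ext4_can_truncate() test, the orphan list and all journalling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical model of the "truncate?" decision: returns true when a
 * short write leaves at least one whole block between the block-aligned
 * end of the data actually written and the block-aligned end of the range
 * that was requested.
 */
static bool needs_truncate(uint64_t offset, uint64_t written, uint64_t count,
			   unsigned int blkbits)
{
	uint64_t blksize;

	/* Assumption of this sketch: sane block size and written <= count. */
	assert(blkbits < 32 && written <= count);
	blksize = UINT64_C(1) << blkbits;

	return align_up(offset + written, blksize) <
	       align_up(offset + count, blksize);
}
```

With 4 KiB blocks (blkbits of 12), a write of 8192 bytes at offset 0 that only completes 4096 bytes leaves one fully unwritten block, so the model reports that a truncate is needed. If 5000 bytes complete instead, the written end already rounds up to the end of the requested range and nothing needs trimming.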
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/file.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/inode.c | 48 +------------------------- 2 files changed, 89 insertions(+), 48 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 440f4c6ba4ee..ec54fec96a81 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -33,6 +33,7 @@ #include "ext4_jbd2.h" #include "xattr.h" #include "acl.h" +#include "truncate.h" static bool ext4_dio_supported(struct inode *inode) { @@ -234,12 +235,95 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return iov_iter_count(from); } +static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, + ssize_t written, size_t count) +{ + handle_t *handle; + bool truncate = false; + u8 blkbits = inode->i_blkbits; + ext4_lblk_t written_blk, end_blk; + + /* + * Note that EXT4_I(inode)->i_disksize can get extended up to + * inode->i_size while the I/O was running due to writeback of delalloc + * blocks. But, the code in ext4_iomap_alloc() is careful to use + * zeroed/unwritten extents if this is possible; thus we won't leave + * uninitialized blocks in a file even if we didn't succeed in writing + * as much as we intended. + */ + WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize); + if (offset + count <= EXT4_I(inode)->i_disksize) { + /* + * We need to ensure that the inode is removed from the orphan + * list if it has been added prematurely, due to writeback of + * delalloc blocks. 
+ */ + if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) { + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + + if (IS_ERR(handle)) { + ext4_orphan_del(NULL, inode); + return PTR_ERR(handle); + } + + ext4_orphan_del(handle, inode); + ext4_journal_stop(handle); + } + + return written; + } + + if (written < 0) + goto truncate; + + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + if (IS_ERR(handle)) { + written = PTR_ERR(handle); + goto truncate; + } + + if (ext4_update_inode_size(inode, offset + written)) + ext4_mark_inode_dirty(handle, inode); + + /* + * We may need to truncate allocated but not written blocks beyond EOF. + */ + written_blk = ALIGN(offset + written, 1 << blkbits); + end_blk = ALIGN(offset + count, 1 << blkbits); + if (written_blk < end_blk && ext4_can_truncate(inode)) + truncate = true; + + /* + * Remove the inode from the orphan list if it has been extended and + * everything went OK. + */ + if (!truncate && inode->i_nlink) + ext4_orphan_del(handle, inode); + ext4_journal_stop(handle); + + if (truncate) { +truncate: + ext4_truncate_failed_write(inode); + /* + * If the truncate operation failed early, then the inode may + * still be on the orphan list. In that case, we need to try + * remove the inode from the in-memory linked list. 
+ */ + if (inode->i_nlink) + ext4_orphan_del(NULL, inode); + } + + return written; +} + #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) { - struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; + size_t count; + loff_t offset; + struct inode *inode = file_inode(iocb->ki_filp); if (!inode_trylock(inode)) { if (iocb->ki_flags & IOCB_NOWAIT) @@ -256,7 +340,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) if (ret) goto out; + offset = iocb->ki_pos; + count = iov_iter_count(from); ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); + ret = ext4_handle_inode_extension(inode, offset, ret, count); out: inode_unlock(inode); if (ret > 0) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9bd80df6b856..071a1f976aab 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3583,53 +3583,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, ssize_t written, unsigned flags, struct iomap *iomap) { - int ret = 0; - handle_t *handle; - int blkbits = inode->i_blkbits; - bool truncate = false; - - if (!(flags & IOMAP_WRITE) || (flags & IOMAP_FAULT)) - return 0; - - handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - goto orphan_del; - } - if (ext4_update_inode_size(inode, offset + written)) - ext4_mark_inode_dirty(handle, inode); - /* - * We may need to truncate allocated but not written blocks beyond EOF. - */ - if (iomap->offset + iomap->length > - ALIGN(inode->i_size, 1 << blkbits)) { - ext4_lblk_t written_blk, end_blk; - - written_blk = (offset + written) >> blkbits; - end_blk = (offset + length) >> blkbits; - if (written_blk < end_blk && ext4_can_truncate(inode)) - truncate = true; - } - /* - * Remove inode from orphan list if we were extending a inode and - * everything went fine. 
- */ - if (!truncate && inode->i_nlink && - !list_empty(&EXT4_I(inode)->i_orphan)) - ext4_orphan_del(handle, inode); - ext4_journal_stop(handle); - if (truncate) { - ext4_truncate_failed_write(inode); -orphan_del: - /* - * If truncate failed early the inode might still be on the - * orphan list; we need to make sure the inode is removed from - * the orphan list in that case. - */ - if (inode->i_nlink) - ext4_orphan_del(NULL, inode); - } - return ret; + return 0; } const struct iomap_ops ext4_iomap_ops = { From patchwork Tue Nov 5 12:02:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C7A541515 for ; Tue, 5 Nov 2019 12:02:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A57F021929 for ; Tue, 5 Nov 2019 12:02:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="mYAc6W7n" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730918AbfKEMCR (ORCPT ); Tue, 5 Nov 2019 07:02:17 -0500 Received: from mail-pl1-f193.google.com ([209.85.214.193]:40728 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726612AbfKEMCQ (ORCPT ); Tue, 5 Nov 2019 07:02:16 -0500 Received: by mail-pl1-f193.google.com with SMTP id e3so7177675plt.7 for ; Tue, 05 Nov 2019 04:02:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=C4JTtLfW0Tocso0HHtIJVM1LwapQ7u0zpOXtwObQFvU=; 
b=mYAc6W7nwc4HxmM0tO6mmTrtCouBkNtp3MTioj+dDnC69pgwP4DOdEqBXSrz0287UW BWdFgzGMK/pCscuI3gI/11GR1CgmTp7qNI8BdLGgqY6vo8jdfAEgEUHmtCLZHMr/7rL1 4SWPJJ24UfORj7boiOKhQDhyac/DQfPYtRJfOa34y0nPxRVXJzaTbuBgTLpWSuBibR+E e1KEgHlgEhym9Qm7hpAhfHgvL0SJnE71JlQ4Fo4wqKgObhlrglsXS1iVC+QoP5umpD8l Dox8qpIpnceY3spsnuu1jHc22fOSAjCrWNY3t0GXGIkGcM/tndmHHZ+yCB88JT5akqCK eI/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=C4JTtLfW0Tocso0HHtIJVM1LwapQ7u0zpOXtwObQFvU=; b=gWFTjW/mv/yZfQh8LO7yGtkxLHf0/bUGw+JBXj1UBtFPKFSmN1D3WD2ztVR6m3WlHy tbxF2cSse+c3Ppzuw4AiTAyjM4+ioX2TRAv7eM0MIi6FVaMqW0nFu9cevtAk9a+lf0No yA3f/BEREUxMFPTGx/0i1bG52Yju6FE3d/lx/i8qYKNJq2MGE4+l0pX36NMDCIjeiUaS MD4PiINfMAVwYeZYqOHlm4CbjGs6rN11ZoGDnBrDTVVX3lBBtA6elao1Od85j1x047HX Q7IMUy5V+g9XGBSWDeAJ9erhlx+1ttuK7CtsPkWc5BZlZyqA+PpFiYxRR5iYv4oiiaGf nLrA== X-Gm-Message-State: APjAAAWeFQ9WMnE64wKrT2Yo7mU7Oma2Ya5UzLeJIp6f7lO0imZfs1pI YJB9tW937XsGbwyjRpxw9g9E X-Google-Smtp-Source: APXvYqw91hnJKX/xpc0YJQ3Cr+NkpzwXHzeGyCUvxHp8dDVV4Mvzz1qbgM1xnVoXlWFZItA1S0JWGA== X-Received: by 2002:a17:902:bcc7:: with SMTP id o7mr33399536pls.333.1572955335852; Tue, 05 Nov 2019 04:02:15 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id j14sm19921156pje.17.2019.11.05.04.02.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:02:14 -0800 (PST) Date: Tue, 5 Nov 2019 23:02:08 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 09/11] ext4: move inode extension check out from ext4_iomap_alloc() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk 
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Lift the inode extension/orphan list handling code out from ext4_iomap_alloc() and apply it within ext4_dax_write_iter(). Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/file.c | 24 +++++++++++++++++++++++- fs/ext4/inode.c | 22 ---------------------- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index ec54fec96a81..83ef9c9ed208 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -323,6 +323,8 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) ssize_t ret; size_t count; loff_t offset; + handle_t *handle; + bool extend = false; struct inode *inode = file_inode(iocb->ki_filp); if (!inode_trylock(inode)) { @@ -342,8 +344,28 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) offset = iocb->ki_pos; count = iov_iter_count(from); + + if (offset + count > EXT4_I(inode)->i_disksize) { + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + goto out; + } + + ret = ext4_orphan_add(handle, inode); + if (ret) { + ext4_journal_stop(handle); + goto out; + } + + extend = true; + ext4_journal_stop(handle); + } + ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); - ret = ext4_handle_inode_extension(inode, offset, ret, count); + + if (extend) + ret = ext4_handle_inode_extension(inode, offset, ret, count); out: inode_unlock(inode); if (ret > 0) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 071a1f976aab..392085aa7809 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3494,7 +3494,6 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, unsigned int flags) { handle_t *handle; - u8 blkbits = inode->i_blkbits; int ret, dio_credits, retries = 0; /* @@ -3517,28 +3516,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, return PTR_ERR(handle); ret = ext4_map_blocks(handle, inode, map,
EXT4_GET_BLOCKS_CREATE_ZERO); - if (ret < 0) - goto journal_stop; - - /* - * If we've allocated blocks beyond EOF, we need to ensure that they're - * truncated if we crash before updating the inode size metadata within - * ext4_iomap_end(). For faults, we don't need to do that (and cannot - * due to orphan list operations needing an inode_lock()). If we happen - * to instantiate blocks beyond EOF, it is because we race with a - * truncate operation, which already has added the inode onto the - * orphan list. - */ - if (!(flags & IOMAP_FAULT) && map->m_lblk + map->m_len > - (i_size_read(inode) + (1 << blkbits) - 1) >> blkbits) { - int err; - - err = ext4_orphan_add(handle, inode); - if (err < 0) - ret = err; - } -journal_stop: ext4_journal_stop(handle); if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) goto retry; From patchwork Tue Nov 5 12:02:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 974411850 for ; Tue, 5 Nov 2019 12:02:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6D03121D71 for ; Tue, 5 Nov 2019 12:02:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="EqT1vkY1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730971AbfKEMCc (ORCPT ); Tue, 5 Nov 2019 07:02:32 -0500 Received: from mail-pg1-f196.google.com ([209.85.215.196]:43232 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730894AbfKEMCc (ORCPT ); Tue, 5 Nov 2019 07:02:32 -0500 Received: by mail-pg1-f196.google.com with SMTP id 
l24so13977016pgh.10 for ; Tue, 05 Nov 2019 04:02:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=FCXNM1/u8MXFatH/bliHUcCJojxaabMwp7gXdYHudrI=; b=EqT1vkY1v+5H8zuPL7noN3DGNkf1QzdTS6vsDZX4W/+xR95EgvUTc4G9huJbxU71ou 5thw4DMywxBhAbA9WaPrsLZEuIWkZghIOId8G8q2rFxAmQsGQiuxtR/CaM/X6Udempr0 rgktWr+qXx8+GWwXbvV16gPVfMcB2eN5hE0FYxikhqX2rqCRcV/36XricXbX0Txiqwxh Pb7+5CVjOun7u6w1ZopKTBNRlSgPnzMeKjmJykGFMBFsi0Mc/TRtM7FKVOOtYVTBJjCe 7xOagpQkoyYwicKK3Q5Ko3LED8K47+yz2NvF3pwfgC/YF3Yf3hntHlxKafA/zK4v77+T 4ZKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=FCXNM1/u8MXFatH/bliHUcCJojxaabMwp7gXdYHudrI=; b=nuW+r84O8lIQgWJXmVIFbBhiQ9M5hQhSAx3vt4BgJNdAOddv4OMVhQdJz21dnb5uim piwJdTNvoUxGKZqZsGIUR5rJueQj6kobj9FTghJ47pX/6SKvlxN3aS+7oXSF0BFLy04z sxGhYUl+50NFZUS6CPu+cOML5WBKKohZjNKRfXV70TioB+g99snqeuewhUC4dIKOzDK3 hJcc9+flEuV5BCrZC4T123skRgKAn0oy5FH472yEC2XajhHkTT+VVgXsXmgxT6op99/w EZqfeuF7OKCdTjT9FstRHWWhRCbc6obXw193XkBSToTGykt4swdKI4Ylq5jk+TaeiCnR XHhg== X-Gm-Message-State: APjAAAWUUuibtbg1X4BA5uzLQCi3ziDxhf4eHxaqxnFMRXj4U1myYKiS /rTI2C7jJOIoD/4K5iHhC0Tc X-Google-Smtp-Source: APXvYqwess4EqxYZl4426W7Pk1Tj4E9hROBvrGLbtzjDO9spCXeuQxvJ5dRc5IGuOa8g/U/t5QsVtA== X-Received: by 2002:a17:90a:1788:: with SMTP id q8mr6034299pja.139.1572955351511; Tue, 05 Nov 2019 04:02:31 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id m68sm18781631pfb.122.2019.11.05.04.02.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:02:30 -0800 (PST) Date: Tue, 5 Nov 2019 23:02:23 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, 
linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 10/11] ext4: update ext4_sync_file() to not use __generic_file_fsync() Message-ID: <3495f35ef67f2021b567e28e6f59222e583689b8.1572949325.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

When the filesystem is created without a journal, we eventually call
into __generic_file_fsync() in order to write out all the modified
in-core data to the permanent storage device. This function happens to
try to obtain an inode_lock() while synchronizing the file's buffers and
its associated metadata.

Generally, this is fine; however, it becomes a problem when there is
higher-level code that has already obtained an inode_lock(), as this
leads to a recursive lock situation. This case is especially true when
porting direct I/O across to the iomap infrastructure, as we obtain an
inode_lock() early on in the I/O within ext4_dio_write_iter() and hold
it until the I/O has been completed. Consequently, to not run into this
specific issue, we move away from calling into __generic_file_fsync()
and perform the necessary synchronization tasks within
ext4_sync_file().

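As a rough userspace illustration of the recursive lock situation
described above (not kernel code), the sketch below models a
non-recursive lock with a plain flag. Every name in it (toy_lock,
toy_sync_locking, toy_sync_unlocked, toy_write_then_sync) is
hypothetical and exists only for this example. A real inode_lock()
would deadlock where the toy lock reports a failure.

```c
#include <stdbool.h>

/* Toy stand-in for a non-recursive lock such as the inode lock. */
struct toy_lock {
	bool held;
};

/* Take the lock; returns false where a real lock would deadlock. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Models a sync helper that takes the lock itself (the old pattern). */
static int toy_sync_locking(struct toy_lock *l)
{
	if (!toy_lock_acquire(l))
		return -1;	/* recursive acquisition detected */
	/* ... flush buffers and metadata here ... */
	toy_lock_release(l);
	return 0;
}

/* Models a sync helper that leaves locking to its caller (the new pattern). */
static int toy_sync_unlocked(struct toy_lock *l)
{
	(void)l;
	/* ... flush buffers and metadata here ... */
	return 0;
}

/* Models a write path that holds the lock across the sync call. */
static int toy_write_then_sync(struct toy_lock *l,
			       int (*sync)(struct toy_lock *))
{
	int ret;

	if (!toy_lock_acquire(l))
		return -1;
	ret = sync(l);
	toy_lock_release(l);
	return ret;
}
```

Calling toy_write_then_sync() with toy_sync_locking fails because the
lock is already held by the caller, whereas passing toy_sync_unlocked
succeeds. This is the shape of the change made here: the sync work is
done without re-taking a lock the write path already owns.
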
Signed-off-by: Matthew Bobrowski
Reviewed-by: Ritesh Harjani
Reviewed-by: Jan Kara
---
 fs/ext4/fsync.c | 72 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 5508baa11bb6..e10206e7f4bb 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -80,6 +80,43 @@ static int ext4_sync_parent(struct inode *inode)
 	return ret;
 }
 
+static int ext4_fsync_nojournal(struct inode *inode, bool datasync,
+				bool *needs_barrier)
+{
+	int ret, err;
+
+	ret = sync_mapping_buffers(inode->i_mapping);
+	if (!(inode->i_state & I_DIRTY_ALL))
+		return ret;
+	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+		return ret;
+
+	err = sync_inode_metadata(inode, 1);
+	if (!ret)
+		ret = err;
+
+	if (!ret)
+		ret = ext4_sync_parent(inode);
+	if (test_opt(inode->i_sb, BARRIER))
+		*needs_barrier = true;
+
+	return ret;
+}
+
+static int ext4_fsync_journal(struct inode *inode, bool datasync,
+			      bool *needs_barrier)
+{
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+	tid_t commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
+
+	if (journal->j_flags & JBD2_BARRIER &&
+	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
+		*needs_barrier = true;
+
+	return jbd2_complete_transaction(journal, commit_tid);
+}
+
 /*
  * akpm: A new design for ext4_sync_file().
  *
@@ -91,17 +128,14 @@ static int ext4_sync_parent(struct inode *inode)
  * What we do is just kick off a commit and wait on it. This will snapshot the
  * inode to disk.
  */
-
 int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	struct inode *inode = file->f_mapping->host;
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 	int ret = 0, err;
-	tid_t commit_tid;
 	bool needs_barrier = false;
+	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	J_ASSERT(ext4_journal_current_handle() == NULL);
@@ -111,23 +145,15 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	if (sb_rdonly(inode->i_sb)) {
 		/* Make sure that we read updated s_mount_flags value */
 		smp_rmb();
-		if (EXT4_SB(inode->i_sb)->s_mount_flags & EXT4_MF_FS_ABORTED)
+		if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED)
 			ret = -EROFS;
 		goto out;
 	}
 
-	if (!journal) {
-		ret = __generic_file_fsync(file, start, end, datasync);
-		if (!ret)
-			ret = ext4_sync_parent(inode);
-		if (test_opt(inode->i_sb, BARRIER))
-			goto issue_flush;
-		goto out;
-	}
-
 	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;
+
 	/*
 	 * data=writeback,ordered:
 	 * The caller's filemap_fdatawrite()/wait will sync the data.
@@ -142,18 +168,14 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * (they were dirtied by commit). But that's OK - the blocks are
 	 * safe in-journal, which is all fsync() needs to ensure.
 	 */
-	if (ext4_should_journal_data(inode)) {
+	if (!sbi->s_journal)
+		ret = ext4_fsync_nojournal(inode, datasync, &needs_barrier);
+	else if (ext4_should_journal_data(inode))
 		ret = ext4_force_commit(inode->i_sb);
-		goto out;
-	}
+	else
+		ret = ext4_fsync_journal(inode, datasync, &needs_barrier);
 
-	commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
-	if (journal->j_flags & JBD2_BARRIER &&
-	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
-		needs_barrier = true;
-	ret = jbd2_complete_transaction(journal, commit_tid);
 	if (needs_barrier) {
-	issue_flush:
 		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 		if (!ret)
 			ret = err;

From patchwork Tue Nov 5 12:02:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 11227605 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D01A1850 for ; Tue, 5 Nov 2019 12:02:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0E59721D7C for ; Tue, 5 Nov 2019 12:02:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="jKbH1v2Q" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730975AbfKEMCr (ORCPT ); Tue, 5 Nov 2019 07:02:47 -0500 Received: from mail-pg1-f193.google.com ([209.85.215.193]:37625 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730816AbfKEMCr (ORCPT ); Tue, 5 Nov 2019 07:02:47 -0500 Received: by mail-pg1-f193.google.com with SMTP id z24so9439782pgu.4 for ; Tue, 05 Nov 2019 04:02:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=vBDhpUbjkhtYpTJvOYcL9VLVatbn3pGKiaa+R8E5I90=; b=jKbH1v2QRYgKpR8zYiCcrwDJKXZ8H74BvdVzr7/0t7PyjXOwct8/0o7fqEvj9yQ4Rw znnE3MnLwXIA2WvbKYMg0Y8RRT2u7ffNFzJ6go0+4TFyBv5j11DuEe4E8hMffSSCeN7q
hG40WcgQbkS7OXJM3HICRxMWBPrTcHb6+B8aCzLtXT571nUC3W6h1g7jwMpyan+SsyYw Oz49XlJbzVotFRQe7+4VcGEgH66bWoaztNEG8BfEQ7KsGJSBRaXNaEwPV2idnHjUFg6R MwaCL2G8Jh6QBbDJyUaJaW7uLM+TQioYzIQGfvd6mMRz8ia0cRsmxLFRMAR4ZTO0hyQx x9LQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=vBDhpUbjkhtYpTJvOYcL9VLVatbn3pGKiaa+R8E5I90=; b=W5yz6Hw91T9jNqgiGwg29RUUza4PDa2ZYog6IFggxVQvk92BjSjqM5lpjPvyov9VP7 IytJvP/M9BVeziBRxEvt+ei5RYeqTtn8mEGb22VEm5MHUPaP+KQkiqaeGIUNlSFAIHXl +pOL+N4TgWgnSLK8aqns/iP0NRh3eV06diT1Jq4nuupuF1Kd6XF4c18Ed46QOufm9rU5 3eDui8jW6iN9rlM0uu6LK10ezBQFcZa4WqGM4YJZCa0gbYkkRj4mmCe1GTDcZ3ZaSVHF qLA2gJinV6q5ihnaA0v6FvR+UK1evciTb1r0xfy04QENWimIxrFywAUJYxUB9ul31CcN 10uw== X-Gm-Message-State: APjAAAUkrVIashRK760dscwIE6PJDO8oGVyimPgCb1CjsUIWX+9B8RpU kgeiJBE6yt5xLCkA78hpQMN2 X-Google-Smtp-Source: APXvYqxciO3aGY97WHI99gkP03fNwYOPVLPQDUffODpVqZIRRq8n+AvYsfPeOTatAWuGOtCBaAh5Ww== X-Received: by 2002:a17:90a:23e2:: with SMTP id g89mr5887459pje.127.1572955365454; Tue, 05 Nov 2019 04:02:45 -0800 (PST) Received: from poseidon.bobrowski.net ([114.78.226.167]) by smtp.gmail.com with ESMTPSA id c62sm20184313pfa.92.2019.11.05.04.02.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Nov 2019 04:02:44 -0800 (PST) Date: Tue, 5 Nov 2019 23:02:39 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, riteshh@linux.ibm.com Subject: [PATCH v7 11/11] ext4: introduce direct I/O write using iomap infrastructure Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch introduces a new direct I/O write path which makes use of the iomap 
infrastructure. All direct I/O writes are now passed from the
->write_iter() callback through to the new direct I/O handler
ext4_dio_write_iter(). This function is responsible for calling into
the iomap infrastructure via iomap_dio_rw().

Code snippets from the existing direct I/O write code within
ext4_file_write_iter(), such as checking whether the I/O request is
unaligned asynchronous I/O or whether the write will result in an
overwrite, have effectively been moved out and into the new direct I/O
->write_iter() handler.

The block mapping flags that are eventually passed down to
ext4_map_blocks() from the *_get_block_*() suite of routines have been
taken out and introduced within ext4_iomap_alloc().

For inode extension cases, ext4_handle_inode_extension() is
effectively the function responsible for performing such metadata
updates. This is called after iomap_dio_rw() has returned so that we
can safely determine whether we need to potentially truncate any
allocated blocks that may have been prepared for this direct I/O
write. We don't perform the inode extension or truncate operations
from the ->end_io() handler, as we don't have the original I/O 'length'
available there. The ->end_io() handler, however, is responsible for
converting allocated unwritten extents to written extents.

In the instance of a short write, we fall back and complete the
remainder of the I/O using buffered I/O via
ext4_buffered_write_iter().

The existing buffer_head direct I/O implementation has been removed as
it's now redundant.

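To make the block mapping flag selection that now lives in
ext4_iomap_alloc() easier to follow, here is a minimal userspace
sketch of the decision order, assuming it mirrors the if/else chain in
the diff below. The enum values and the toy_pick_flags() helper are
hypothetical stand-ins rather than kernel symbols, and the inode state
is passed in as plain arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EXT4_GET_BLOCKS_* flags; values are arbitrary. */
enum toy_map_flags {
	TOY_NONE = 0,		/* nothing may be allocated: fall back to buffered I/O */
	TOY_CREATE_ZERO,	/* models EXT4_GET_BLOCKS_CREATE_ZERO (DAX) */
	TOY_CREATE,		/* models EXT4_GET_BLOCKS_CREATE (write at or past i_size) */
	TOY_IO_CREATE_EXT,	/* models EXT4_GET_BLOCKS_IO_CREATE_EXT (unwritten extents) */
};

/*
 * Pick mapping flags in the same order as the patch: DAX first, then
 * writes starting at or beyond i_size, then extent-mapped inodes.
 * Anything else gets no flags, which the patch turns into -ENOTBLK.
 */
static enum toy_map_flags toy_pick_flags(bool is_dax, bool has_extents,
					 uint64_t lblk, unsigned int blkbits,
					 uint64_t i_size)
{
	if (is_dax)
		return TOY_CREATE_ZERO;
	if ((lblk << blkbits) >= i_size)
		return TOY_CREATE;
	if (has_extents)
		return TOY_IO_CREATE_EXT;
	return TOY_NONE;
}
```

With 4 KiB blocks (blkbits of 12) and an i_size of 8192, a write
starting at logical block 2 takes the plain allocation path, while a
write starting at block 1 of an extent-mapped inode allocates unwritten
extents that ->end_io() later converts. A hole inside i_size on an
indirect-mapped inode yields no flags and falls back to buffered I/O.
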
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/ext4.h | 3 - fs/ext4/extents.c | 11 +- fs/ext4/file.c | 246 +++++++++++++++++++-------- fs/ext4/inode.c | 413 +++++----------------------------------------- 4 files changed, 218 insertions(+), 455 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 5c6c4acea8b1..24f79035c731 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1584,7 +1584,6 @@ enum { EXT4_STATE_NO_EXPAND, /* No space for expansion */ EXT4_STATE_DA_ALLOC_CLOSE, /* Alloc DA blks on close */ EXT4_STATE_EXT_MIGRATE, /* Inode is migrating */ - EXT4_STATE_DIO_UNWRITTEN, /* need convert on dio done*/ EXT4_STATE_NEWENTRY, /* File just added to dir */ EXT4_STATE_MAY_INLINE_DATA, /* may have in-inode data */ EXT4_STATE_EXT_PRECACHED, /* extents have been precached */ @@ -2565,8 +2564,6 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create); int ext4_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create); -int ext4_dio_get_block(struct inode *inode, sector_t iblock, - struct buffer_head *bh_result, int create); int ext4_da_get_block_prep(struct inode *inode, sector_t iblock, struct buffer_head *bh, int create); int ext4_walk_page_buffers(handle_t *handle, diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index cf6c5f64cb58..56a4cee00fb7 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1753,16 +1753,9 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1, */ if (ext1_ee_len + ext2_ee_len > EXT_INIT_MAX_LEN) return 0; - /* - * The check for IO to unwritten extent is somewhat racy as we - * increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after - * dropping i_data_sem. But reserved blocks should save us in that - * case. 
- */ + if (ext4_ext_is_unwritten(ex1) && - (ext4_test_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN) || - atomic_read(&EXT4_I(inode)->i_unwritten) || - (ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN))) + ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN) return 0; #ifdef AGGRESSIVE_TEST if (ext1_ee_len >= 4) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 83ef9c9ed208..3a8423bec372 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "ext4.h" #include "ext4_jbd2.h" #include "xattr.h" @@ -155,13 +156,6 @@ static int ext4_release_file(struct inode *inode, struct file *filp) return 0; } -static void ext4_unwritten_wait(struct inode *inode) -{ - wait_queue_head_t *wq = ext4_ioend_wq(inode); - - wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_unwritten) == 0)); -} - /* * This tests whether the IO in question is block-aligned or not. * Ext4 utilizes unwritten extents when hole-filling during direct IO, and they @@ -214,13 +208,13 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; + if (unlikely(IS_IMMUTABLE(inode))) + return -EPERM; + ret = generic_write_checks(iocb, from); if (ret <= 0) return ret; - if (unlikely(IS_IMMUTABLE(inode))) - return -EPERM; - /* * If we have encountered a bitmap-format file, the size limit * is smaller than s_maxbytes, which is for extent-mapped files. 
@@ -232,9 +226,42 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return -EFBIG; iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos); } + + ret = file_modified(iocb->ki_filp); + if (ret) + return ret; + return iov_iter_count(from); } +static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, + struct iov_iter *from) +{ + ssize_t ret; + struct inode *inode = file_inode(iocb->ki_filp); + + if (iocb->ki_flags & IOCB_NOWAIT) + return -EOPNOTSUPP; + + inode_lock(inode); + ret = ext4_write_checks(iocb, from); + if (ret <= 0) + goto out; + + current->backing_dev_info = inode_to_bdi(inode); + ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos); + current->backing_dev_info = NULL; + +out: + inode_unlock(inode); + if (likely(ret > 0)) { + iocb->ki_pos += ret; + ret = generic_write_sync(iocb, ret); + } + + return ret; +} + static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, ssize_t written, size_t count) { @@ -316,6 +343,139 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, return written; } +static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size, + int error, unsigned int flags) +{ + loff_t offset = iocb->ki_pos; + struct inode *inode = file_inode(iocb->ki_filp); + + if (error) + return error; + + if (size && flags & IOMAP_DIO_UNWRITTEN) + return ext4_convert_unwritten_extents(NULL, inode, + offset, size); + + return 0; +} + +static const struct iomap_dio_ops ext4_dio_write_ops = { + .end_io = ext4_dio_write_end_io, +}; + +static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + ssize_t ret; + size_t count; + loff_t offset; + handle_t *handle; + struct inode *inode = file_inode(iocb->ki_filp); + bool extend = false, overwrite = false, unaligned_aio = false; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock(inode)) + return -EAGAIN; + } else { + inode_lock(inode); + } + + if (!ext4_dio_supported(inode)) { + 
inode_unlock(inode); + /* + * Fallback to buffered I/O if the inode does not support + * direct I/O. + */ + return ext4_buffered_write_iter(iocb, from); + } + + ret = ext4_write_checks(iocb, from); + if (ret <= 0) { + inode_unlock(inode); + return ret; + } + + /* + * Unaligned asynchronous direct I/O must be serialized among each + * other as the zeroing of partial blocks of two competing unaligned + * asynchronous direct I/O writes can result in data corruption. + */ + offset = iocb->ki_pos; + count = iov_iter_count(from); + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) && + !is_sync_kiocb(iocb) && ext4_unaligned_aio(inode, from, offset)) { + unaligned_aio = true; + inode_dio_wait(inode); + } + + /* + * Determine whether the I/O will overwrite allocated and initialized + * blocks. If so, check to see whether it is possible to take the + * dioread_nolock path. + */ + if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) && + ext4_should_dioread_nolock(inode)) { + overwrite = true; + downgrade_write(&inode->i_rwsem); + } + + if (offset + count > EXT4_I(inode)->i_disksize) { + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + goto out; + } + + ret = ext4_orphan_add(handle, inode); + if (ret) { + ext4_journal_stop(handle); + goto out; + } + + extend = true; + ext4_journal_stop(handle); + } + + ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops, + is_sync_kiocb(iocb) || unaligned_aio || extend); + + if (extend) + ret = ext4_handle_inode_extension(inode, offset, ret, count); + +out: + if (overwrite) + inode_unlock_shared(inode); + else + inode_unlock(inode); + + if (ret >= 0 && iov_iter_count(from)) { + ssize_t err; + loff_t endbyte; + + offset = iocb->ki_pos; + err = ext4_buffered_write_iter(iocb, from); + if (err < 0) + return err; + + /* + * We need to ensure that the pages within the page cache for + * the range covered by this I/O are written to disk and + * invalidated. 
This is in attempt to preserve the expected + * direct I/O semantics in the case we fallback to buffered I/O + * to complete off the I/O request. + */ + ret += err; + endbyte = offset + ret - 1; + err = filemap_write_and_wait_range(iocb->ki_filp->f_mapping, + offset, endbyte); + if (!err) + invalidate_mapping_pages(iocb->ki_filp->f_mapping, + offset >> PAGE_SHIFT, + endbyte >> PAGE_SHIFT); + } + + return ret; +} + #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) @@ -332,15 +492,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) return -EAGAIN; inode_lock(inode); } + ret = ext4_write_checks(iocb, from); if (ret <= 0) goto out; - ret = file_remove_privs(iocb->ki_filp); - if (ret) - goto out; - ret = file_update_time(iocb->ki_filp); - if (ret) - goto out; offset = iocb->ki_pos; count = iov_iter_count(from); @@ -378,10 +533,6 @@ static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); - int o_direct = iocb->ki_flags & IOCB_DIRECT; - int unaligned_aio = 0; - int overwrite = 0; - ssize_t ret; if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO; @@ -390,59 +541,10 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (IS_DAX(inode)) return ext4_dax_write_iter(iocb, from); #endif + if (iocb->ki_flags & IOCB_DIRECT) + return ext4_dio_write_iter(iocb, from); - if (!inode_trylock(inode)) { - if (iocb->ki_flags & IOCB_NOWAIT) - return -EAGAIN; - inode_lock(inode); - } - - ret = ext4_write_checks(iocb, from); - if (ret <= 0) - goto out; - - /* - * Unaligned direct AIO must be serialized among each other as zeroing - * of partial blocks of two competing unaligned AIOs can result in data - * corruption. 
- */ - if (o_direct && ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) && - !is_sync_kiocb(iocb) && - ext4_unaligned_aio(inode, from, iocb->ki_pos)) { - unaligned_aio = 1; - ext4_unwritten_wait(inode); - } - - iocb->private = &overwrite; - /* Check whether we do a DIO overwrite or not */ - if (o_direct && !unaligned_aio) { - if (ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from))) { - if (ext4_should_dioread_nolock(inode)) - overwrite = 1; - } else if (iocb->ki_flags & IOCB_NOWAIT) { - ret = -EAGAIN; - goto out; - } - } - - ret = __generic_file_write_iter(iocb, from); - /* - * Unaligned direct AIO must be the only IO in flight. Otherwise - * overlapping aligned IO after unaligned might result in data - * corruption. - */ - if (ret == -EIOCBQUEUED && unaligned_aio) - ext4_unwritten_wait(inode); - inode_unlock(inode); - - if (ret > 0) - ret = generic_write_sync(iocb, ret); - - return ret; - -out: - inode_unlock(inode); - return ret; + return ext4_buffered_write_iter(iocb, from); } #ifdef CONFIG_FS_DAX diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 392085aa7809..c103362b9cf9 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -826,133 +826,6 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock, /* Maximum number of blocks we map for direct IO at once. */ #define DIO_MAX_BLOCKS 4096 -/* - * Get blocks function for the cases that need to start a transaction - - * generally difference cases of direct IO and DAX IO. It also handles retries - * in case of ENOSPC. 
- */ -static int ext4_get_block_trans(struct inode *inode, sector_t iblock, - struct buffer_head *bh_result, int flags) -{ - int dio_credits; - handle_t *handle; - int retries = 0; - int ret; - - /* Trim mapping request to maximum we can map at once for DIO */ - if (bh_result->b_size >> inode->i_blkbits > DIO_MAX_BLOCKS) - bh_result->b_size = DIO_MAX_BLOCKS << inode->i_blkbits; - dio_credits = ext4_chunk_trans_blocks(inode, - bh_result->b_size >> inode->i_blkbits); -retry: - handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, dio_credits); - if (IS_ERR(handle)) - return PTR_ERR(handle); - - ret = _ext4_get_block(inode, iblock, bh_result, flags); - ext4_journal_stop(handle); - - if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) - goto retry; - return ret; -} - -/* Get block function for DIO reads and writes to inodes without extents */ -int ext4_dio_get_block(struct inode *inode, sector_t iblock, - struct buffer_head *bh, int create) -{ - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - return ext4_get_block_trans(inode, iblock, bh, EXT4_GET_BLOCKS_CREATE); -} - -/* - * Get block function for AIO DIO writes when we create unwritten extent if - * blocks are not allocated yet. The extent will be converted to written - * after IO is complete. - */ -static int ext4_dio_get_block_unwritten_async(struct inode *inode, - sector_t iblock, struct buffer_head *bh_result, int create) -{ - int ret; - - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - - ret = ext4_get_block_trans(inode, iblock, bh_result, - EXT4_GET_BLOCKS_IO_CREATE_EXT); - - /* - * When doing DIO using unwritten extents, we need io_end to convert - * unwritten extents to written on IO completion. We allocate io_end - * once we spot unwritten extent and store it in b_private. Generic - * DIO code keeps b_private set and furthermore passes the value to - * our completion callback in 'private' argument. 
- */ - if (!ret && buffer_unwritten(bh_result)) { - if (!bh_result->b_private) { - ext4_io_end_t *io_end; - - io_end = ext4_init_io_end(inode, GFP_KERNEL); - if (!io_end) - return -ENOMEM; - bh_result->b_private = io_end; - ext4_set_io_unwritten_flag(inode, io_end); - } - set_buffer_defer_completion(bh_result); - } - - return ret; -} - -/* - * Get block function for non-AIO DIO writes when we create unwritten extent if - * blocks are not allocated yet. The extent will be converted to written - * after IO is complete by ext4_direct_IO_write(). - */ -static int ext4_dio_get_block_unwritten_sync(struct inode *inode, - sector_t iblock, struct buffer_head *bh_result, int create) -{ - int ret; - - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - - ret = ext4_get_block_trans(inode, iblock, bh_result, - EXT4_GET_BLOCKS_IO_CREATE_EXT); - - /* - * Mark inode as having pending DIO writes to unwritten extents. - * ext4_direct_IO_write() checks this flag and converts extents to - * written. - */ - if (!ret && buffer_unwritten(bh_result)) - ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN); - - return ret; -} - -static int ext4_dio_get_block_overwrite(struct inode *inode, sector_t iblock, - struct buffer_head *bh_result, int create) -{ - int ret; - - ext4_debug("ext4_dio_get_block_overwrite: inode %lu, create flag %d\n", - inode->i_ino, create); - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - - ret = _ext4_get_block(inode, iblock, bh_result, 0); - /* - * Blocks should have been preallocated! ext4_file_write_iter() checks - * that. 
- */ - WARN_ON_ONCE(!buffer_mapped(bh_result) || buffer_unwritten(bh_result)); - - return ret; -} - - /* * `handle' can be NULL if create is zero */ @@ -3494,7 +3367,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, unsigned int flags) { handle_t *handle; - int ret, dio_credits, retries = 0; + u8 blkbits = inode->i_blkbits; + int ret, dio_credits, m_flags = 0, retries = 0; /* * Trim the mapping request to the maximum value that we can map at @@ -3515,7 +3389,33 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, if (IS_ERR(handle)) return PTR_ERR(handle); - ret = ext4_map_blocks(handle, inode, map, EXT4_GET_BLOCKS_CREATE_ZERO); + /* + * DAX and direct I/O are the only two operations that are currently + * supported with IOMAP_WRITE. + */ + WARN_ON(!IS_DAX(inode) && !(flags & IOMAP_DIRECT)); + if (IS_DAX(inode)) + m_flags = EXT4_GET_BLOCKS_CREATE_ZERO; + /* + * We use i_size instead of i_disksize here because delalloc writeback + * can complete at any point during the I/O and subsequently push the + * i_disksize out to i_size. This could be beyond where direct I/O is + * happening and thus expose allocated blocks to direct I/O reads. + */ + else if ((map->m_lblk * (1 << blkbits)) >= i_size_read(inode)) + m_flags = EXT4_GET_BLOCKS_CREATE; + else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT; + + ret = ext4_map_blocks(handle, inode, map, m_flags); + + /* + * We cannot fill holes in indirect tree based inodes as that could + * expose stale data in the case of a crash. Use the magic error code + * to fallback to buffered I/O. 
+ */ + if (!m_flags && !ret) + ret = -ENOTBLK; ext4_journal_stop(handle); if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) @@ -3561,6 +3461,16 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, ssize_t written, unsigned flags, struct iomap *iomap) { + /* + * Check to see whether an error occurred while writing out the data to + * the allocated blocks. If so, return the magic error code so that we + * fallback to buffered I/O and attempt to complete the remainder of + * the I/O. Any blocks that may have been allocated in preparation for + * the direct I/O will be reused during buffered I/O. + */ + if (flags & (IOMAP_WRITE | IOMAP_DIRECT) && written == 0) + return -ENOTBLK; + return 0; } @@ -3637,245 +3547,6 @@ const struct iomap_ops ext4_iomap_report_ops = { .iomap_begin = ext4_iomap_begin_report, }; -static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset, - ssize_t size, void *private) -{ - ext4_io_end_t *io_end = private; - struct ext4_io_end_vec *io_end_vec; - - /* if not async direct IO just return */ - if (!io_end) - return 0; - - ext_debug("ext4_end_io_dio(): io_end 0x%p " - "for inode %lu, iocb 0x%p, offset %llu, size %zd\n", - io_end, io_end->inode->i_ino, iocb, offset, size); - - /* - * Error during AIO DIO. We cannot convert unwritten extents as the - * data was not written. Just clear the unwritten flag and drop io_end. - */ - if (size <= 0) { - ext4_clear_io_unwritten_flag(io_end); - size = 0; - } - io_end_vec = ext4_alloc_io_end_vec(io_end); - io_end_vec->offset = offset; - io_end_vec->size = size; - ext4_put_io_end(io_end); - - return 0; -} - -/* - * Handling of direct IO writes. - * - * For ext4 extent files, ext4 will do direct-io write even to holes, - * preallocated extents, and those write extend the file, no need to - * fall back to buffered IO. 
- * - * For holes, we fallocate those blocks, mark them as unwritten - * If those blocks were preallocated, we mark sure they are split, but - * still keep the range to write as unwritten. - * - * The unwritten extents will be converted to written when DIO is completed. - * For async direct IO, since the IO may still pending when return, we - * set up an end_io call back function, which will do the conversion - * when async direct IO completed. - * - * If the O_DIRECT write will extend the file then add this inode to the - * orphan list. So recovery will truncate it back to the original size - * if the machine crashes during the write. - * - */ -static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) -{ - struct file *file = iocb->ki_filp; - struct inode *inode = file->f_mapping->host; - struct ext4_inode_info *ei = EXT4_I(inode); - ssize_t ret; - loff_t offset = iocb->ki_pos; - size_t count = iov_iter_count(iter); - int overwrite = 0; - get_block_t *get_block_func = NULL; - int dio_flags = 0; - loff_t final_size = offset + count; - int orphan = 0; - handle_t *handle; - - if (final_size > inode->i_size || final_size > ei->i_disksize) { - /* Credits for sb + inode write */ - handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - goto out; - } - ret = ext4_orphan_add(handle, inode); - if (ret) { - ext4_journal_stop(handle); - goto out; - } - orphan = 1; - ext4_update_i_disksize(inode, inode->i_size); - ext4_journal_stop(handle); - } - - BUG_ON(iocb->private == NULL); - - /* - * Make all waiters for direct IO properly wait also for extent - * conversion. This also disallows race between truncate() and - * overwrite DIO as i_dio_count needs to be incremented under i_mutex. 
- */ - inode_dio_begin(inode); - - /* If we do a overwrite dio, i_mutex locking can be released */ - overwrite = *((int *)iocb->private); - - if (overwrite) - inode_unlock(inode); - - /* - * For extent mapped files we could direct write to holes and fallocate. - * - * Allocated blocks to fill the hole are marked as unwritten to prevent - * parallel buffered read to expose the stale data before DIO complete - * the data IO. - * - * As to previously fallocated extents, ext4 get_block will just simply - * mark the buffer mapped but still keep the extents unwritten. - * - * For non AIO case, we will convert those unwritten extents to written - * after return back from blockdev_direct_IO. That way we save us from - * allocating io_end structure and also the overhead of offloading - * the extent convertion to a workqueue. - * - * For async DIO, the conversion needs to be deferred when the - * IO is completed. The ext4 end_io callback function will be - * called to take care of the conversion work. Here for async - * case, we allocate an io_end structure to hook to the iocb. 
-	 */
-	iocb->private = NULL;
-	if (overwrite)
-		get_block_func = ext4_dio_get_block_overwrite;
-	else if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) ||
-		   round_down(offset, i_blocksize(inode)) >= inode->i_size) {
-		get_block_func = ext4_dio_get_block;
-		dio_flags = DIO_LOCKING | DIO_SKIP_HOLES;
-	} else if (is_sync_kiocb(iocb)) {
-		get_block_func = ext4_dio_get_block_unwritten_sync;
-		dio_flags = DIO_LOCKING;
-	} else {
-		get_block_func = ext4_dio_get_block_unwritten_async;
-		dio_flags = DIO_LOCKING;
-	}
-	ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-				   get_block_func, ext4_end_io_dio, NULL,
-				   dio_flags);
-
-	if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
-						EXT4_STATE_DIO_UNWRITTEN)) {
-		int err;
-		/*
-		 * for non AIO case, since the IO is already
-		 * completed, we could do the conversion right here
-		 */
-		err = ext4_convert_unwritten_extents(NULL, inode,
-						     offset, ret);
-		if (err < 0)
-			ret = err;
-		ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
-	}
-
-	inode_dio_end(inode);
-	/* take i_mutex locking again if we do a ovewrite dio */
-	if (overwrite)
-		inode_lock(inode);
-
-	if (ret < 0 && final_size > inode->i_size)
-		ext4_truncate_failed_write(inode);
-
-	/* Handle extending of i_size after direct IO write */
-	if (orphan) {
-		int err;
-
-		/* Credits for sb + inode write */
-		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
-		if (IS_ERR(handle)) {
-			/*
-			 * We wrote the data but cannot extend
-			 * i_size. Bail out. In async io case, we do
-			 * not return error here because we have
-			 * already submmitted the corresponding
-			 * bio. Returning error here makes the caller
-			 * think that this IO is done and failed
-			 * resulting in race with bio's completion
-			 * handler.
-			 */
-			if (!ret)
-				ret = PTR_ERR(handle);
-			if (inode->i_nlink)
-				ext4_orphan_del(NULL, inode);
-
-			goto out;
-		}
-		if (inode->i_nlink)
-			ext4_orphan_del(handle, inode);
-		if (ret > 0) {
-			loff_t end = offset + ret;
-			if (end > inode->i_size || end > ei->i_disksize) {
-				ext4_update_i_disksize(inode, end);
-				if (end > inode->i_size)
-					i_size_write(inode, end);
-				/*
-				 * We're going to return a positive `ret'
-				 * here due to non-zero-length I/O, so there's
-				 * no way of reporting error returns from
-				 * ext4_mark_inode_dirty() to userspace. So
-				 * ignore it.
-				 */
-				ext4_mark_inode_dirty(handle, inode);
-			}
-		}
-		err = ext4_journal_stop(handle);
-		if (ret == 0)
-			ret = err;
-	}
-out:
-	return ret;
-}
-
-static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	size_t count = iov_iter_count(iter);
-	loff_t offset = iocb->ki_pos;
-	ssize_t ret;
-
-#ifdef CONFIG_FS_ENCRYPTION
-	if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode))
-		return 0;
-#endif
-	if (fsverity_active(inode))
-		return 0;
-
-	/*
-	 * If we are doing data journalling we don't support O_DIRECT
-	 */
-	if (ext4_should_journal_data(inode))
-		return 0;
-
-	/* Let buffer I/O handle the inline data case. */
-	if (ext4_has_inline_data(inode))
-		return 0;
-
-	trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
-	ret = ext4_direct_IO_write(iocb, iter);
-	trace_ext4_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret);
-	return ret;
-}
-
 /*
  * Pages can be marked dirty completely asynchronously from ext4's journalling
  * activity. By filemap_sync_pte(), try_to_unmap_one(), etc.
We cannot do
@@ -3913,7 +3584,7 @@ static const struct address_space_operations ext4_aops = {
 	.bmap = ext4_bmap,
 	.invalidatepage = ext4_invalidatepage,
 	.releasepage = ext4_releasepage,
-	.direct_IO = ext4_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
@@ -3930,7 +3601,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.bmap = ext4_bmap,
 	.invalidatepage = ext4_journalled_invalidatepage,
 	.releasepage = ext4_releasepage,
-	.direct_IO = ext4_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,
 };
@@ -3946,7 +3617,7 @@ static const struct address_space_operations ext4_da_aops = {
 	.bmap = ext4_bmap,
 	.invalidatepage = ext4_invalidatepage,
 	.releasepage = ext4_releasepage,
-	.direct_IO = ext4_direct_IO,
+	.direct_IO = noop_direct_IO,
 	.migratepage = buffer_migrate_page,
 	.is_partially_uptodate = block_is_partially_uptodate,
 	.error_remove_page = generic_error_remove_page,