From patchwork Thu Apr 25 13:28:45 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643375
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 1/7] ext2: Remove comment related to journal handle
Date: Thu, 25 Apr 2024 18:58:45 +0530
Message-ID: <08f3371e0c0932b5e1367ebbdd77cf61b7e4850b.1714046808.git.ritesh.list@gmail.com>

Signed-off-by: Ritesh Harjani (IBM)
Reviewed-by: Darrick J. Wong
---
 fs/ext2/inode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f3d570a9302b..c4de3a94c4b2 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -615,8 +615,6 @@ static void ext2_splice_branch(struct inode *inode,
  * allocations is needed - we simply release blocks and do not touch anything
  * reachable from inode.
  *
- * `handle' can be NULL if create == 0.
- *
  * return > 0, # of blocks mapped or allocated.
  * return = 0, if plain lookup failed.
  * return < 0, error case.

From patchwork Thu Apr 25 13:28:46 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643376
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 2/7] ext2: Convert ext2 regular file buffered I/O to use iomap
Date: Thu, 25 Apr 2024 18:58:46 +0530
Message-ID: <54d3fdabeb82e494fab83204cd49e75b58ef298e.1714046808.git.ritesh.list@gmail.com>

This patch converts ext2 regular file's buffered-io path to use iomap.
- buffered-io path now uses iomap_file_buffered_write
- DIO fallback to buffered-io now uses iomap_file_buffered_write
- writeback path now uses a new aops - ext2_file_aops
- truncate now uses iomap_truncate_page
- mmap path of ext2 continues to use generic_file_vm_ops

Signed-off-by: Ritesh Harjani (IBM)
Reviewed-by: Darrick J. Wong
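[For readers following the conversion: the patch wires the buffered write and
writeback paths to the existing ext2_iomap_ops, whose ->iomap_begin() is not
shown in the diff below. A rough, illustrative sketch of the shape of such a
callback follows; names prefixed with demo_ are hypothetical, and field and
type names are assumed to match include/linux/iomap.h. This is not the actual
ext2 implementation.]

static int demo_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
			    unsigned flags, struct iomap *iomap,
			    struct iomap *srcmap)
{
	/*
	 * Look up (and, for IOMAP_WRITE, allocate) blocks covering
	 * [offset, offset + length) and describe the result for iomap.
	 */
	iomap->bdev = inode->i_sb->s_bdev;
	iomap->offset = offset;		/* start of the mapped range */
	iomap->length = length;		/* bytes this mapping covers */
	iomap->type = IOMAP_MAPPED;	/* or IOMAP_HOLE / IOMAP_UNWRITTEN */
	iomap->addr = 0;		/* disk address in bytes (assumed) */
	return 0;
}

static const struct iomap_ops demo_iomap_ops = {
	.iomap_begin	= demo_iomap_begin,
};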
---
 fs/ext2/file.c  | 20 ++++++++++++--
 fs/ext2/inode.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 4ddc36f4dbd4..ee5cd4a2f24f 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -252,7 +252,7 @@ static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	iocb->ki_flags &= ~IOCB_DIRECT;
 	pos = iocb->ki_pos;
-	status = generic_perform_write(iocb, from);
+	status = iomap_file_buffered_write(iocb, from, &ext2_iomap_ops);
 	if (unlikely(status < 0)) {
 		ret = status;
 		goto out_unlock;
@@ -278,6 +278,22 @@ static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }

+static ssize_t ext2_buffered_write_iter(struct kiocb *iocb,
+					struct iov_iter *from)
+{
+	ssize_t ret = 0;
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret > 0)
+		ret = iomap_file_buffered_write(iocb, from, &ext2_iomap_ops);
+	inode_unlock(inode);
+	if (ret > 0)
+		ret = generic_write_sync(iocb, ret);
+	return ret;
+}
+
 static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 #ifdef CONFIG_FS_DAX
@@ -299,7 +315,7 @@ static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (iocb->ki_flags & IOCB_DIRECT)
 		return ext2_dio_write_iter(iocb, from);

-	return generic_file_write_iter(iocb, from);
+	return ext2_buffered_write_iter(iocb, from);
 }

 const struct file_operations ext2_file_operations = {
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index c4de3a94c4b2..f90d280025d9 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -877,10 +877,14 @@ ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 	if ((flags & IOMAP_DIRECT) && (flags & IOMAP_WRITE) && written == 0)
 		return -ENOTBLK;

-	if (iomap->type == IOMAP_MAPPED &&
-	    written < length &&
-	    (flags & IOMAP_WRITE))
+	if (iomap->type == IOMAP_MAPPED && written < length &&
+	    (flags & IOMAP_WRITE)) {
 		ext2_write_failed(inode->i_mapping, offset + length);
+		return 0;
+	}
+
+	if (iomap->flags & IOMAP_F_SIZE_CHANGED)
+		mark_inode_dirty(inode);

 	return 0;
 }
@@ -912,6 +916,16 @@ static void ext2_readahead(struct readahead_control *rac)
 	mpage_readahead(rac, ext2_get_block);
 }

+static int ext2_file_read_folio(struct file *file, struct folio *folio)
+{
+	return iomap_read_folio(folio, &ext2_iomap_ops);
+}
+
+static void ext2_file_readahead(struct readahead_control *rac)
+{
+	iomap_readahead(rac, &ext2_iomap_ops);
+}
+
 static int
 ext2_write_begin(struct file *file, struct address_space *mapping, loff_t pos,
 		unsigned len, struct page **pagep, void **fsdata)
@@ -941,12 +955,41 @@ static sector_t ext2_bmap(struct address_space *mapping, sector_t block)
 	return generic_block_bmap(mapping,block,ext2_get_block);
 }

+static sector_t ext2_file_bmap(struct address_space *mapping, sector_t block)
+{
+	return iomap_bmap(mapping, block, &ext2_iomap_ops);
+}
+
 static int
 ext2_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
 	return mpage_writepages(mapping, wbc, ext2_get_block);
 }

+static int ext2_write_map_blocks(struct iomap_writepage_ctx *wpc,
+				 struct inode *inode, loff_t offset,
+				 unsigned len)
+{
+	if (offset >= wpc->iomap.offset &&
+	    offset < wpc->iomap.offset + wpc->iomap.length)
+		return 0;
+
+	return ext2_iomap_begin(inode, offset, inode->i_sb->s_blocksize,
+				IOMAP_WRITE, &wpc->iomap, NULL);
+}
+
+static const struct iomap_writeback_ops ext2_writeback_ops = {
+	.map_blocks		= ext2_write_map_blocks,
+};
+
+static int ext2_file_writepages(struct address_space *mapping,
+				struct writeback_control *wbc)
+{
+	struct iomap_writepage_ctx wpc = { };
+
+	return iomap_writepages(mapping, wbc, &wpc, &ext2_writeback_ops);
+}
+
 static int
 ext2_dax_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
@@ -955,6 +998,20 @@ ext2_dax_writepages(struct address_space *mapping, struct writeback_control *wbc
 	return dax_writeback_mapping_range(mapping, sbi->s_daxdev, wbc);
 }

+const struct address_space_operations ext2_file_aops = {
+	.dirty_folio		= iomap_dirty_folio,
+	.release_folio		= iomap_release_folio,
+	.invalidate_folio	= iomap_invalidate_folio,
+	.read_folio		= ext2_file_read_folio,
+	.readahead		= ext2_file_readahead,
+	.bmap			= ext2_file_bmap,
+	.direct_IO		= noop_direct_IO,
+	.writepages		= ext2_file_writepages,
+	.migrate_folio		= filemap_migrate_folio,
+	.is_partially_uptodate	= iomap_is_partially_uptodate,
+	.error_remove_folio	= generic_error_remove_folio,
+};
+
 const struct address_space_operations ext2_aops = {
 	.dirty_folio		= block_dirty_folio,
 	.invalidate_folio	= block_invalidate_folio,
@@ -1279,8 +1336,8 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
 		error = dax_truncate_page(inode, newsize, NULL,
 					  &ext2_iomap_ops);
 	else
-		error = block_truncate_page(inode->i_mapping,
-				newsize, ext2_get_block);
+		error = iomap_truncate_page(inode, newsize, NULL,
+					    &ext2_iomap_ops);
 	if (error)
 		return error;

@@ -1370,7 +1427,7 @@ void ext2_set_file_ops(struct inode *inode)
 	if (IS_DAX(inode))
 		inode->i_mapping->a_ops = &ext2_dax_aops;
 	else
-		inode->i_mapping->a_ops = &ext2_aops;
+		inode->i_mapping->a_ops = &ext2_file_aops;
 }

 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)

From patchwork Thu Apr 25 13:28:47 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643377
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 3/7] ext2: Enable large folio support
Date: Thu, 25 Apr 2024 18:58:47 +0530
Message-ID: <581b2ed21a709093522f3747c06e8171c82f2d8c.1714046808.git.ritesh.list@gmail.com>

Now that the ext2 regular file buffered-io path is converted to use iomap,
we can also enable large folio support for ext2.

Signed-off-by: Ritesh Harjani (IBM)
Reviewed-by: Darrick J. Wong
---
 fs/ext2/inode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f90d280025d9..2b62786130b5 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1424,10 +1424,12 @@ void ext2_set_file_ops(struct inode *inode)
 {
 	inode->i_op = &ext2_file_inode_operations;
 	inode->i_fop = &ext2_file_operations;
-	if (IS_DAX(inode))
+	if (IS_DAX(inode)) {
 		inode->i_mapping->a_ops = &ext2_dax_aops;
-	else
+	} else {
 		inode->i_mapping->a_ops = &ext2_file_aops;
+		mapping_set_large_folios(inode->i_mapping);
+	}
 }

 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)

From patchwork Thu Apr 25 13:28:48 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643378
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 4/7] ext2: Implement seq counter for validating cached iomap
Date: Thu, 25 Apr 2024 18:58:48 +0530
Message-ID: <009d08646b77e0d774b4ce248675b86564bca9ee.1714046808.git.ritesh.list@gmail.com>

There is a possibility of the following race with iomap during writeback:

    write_cache_pages()
      cache extent covering 0..1MB range
      write page at offset 0k
                                    truncate(file, 4k)
                                      drops all relevant pages
                                      frees fs blocks
                                    pwrite(file, 4k, 4k)
                                      creates dirty page in the page cache
      writes page at offset 4k to a stale block

This race can happen because iomap_writepages() keeps a cached extent mapping
within struct iomap. While write_cache_pages() is iterating over each folio
(the cached extent can cover a large range), a truncate that happens in
parallel on the next folio, followed by a buffered write to the same offset
within the file, can change the logical-to-physical offset of the cached iomap
mapping. That means the cached iomap has become stale.

This patch implements the seq counter approach for revalidation of stale iomap
mappings. i_blkseq gets incremented for every block allocation/free. Here is
what we do:

For ext2 buffered-writes, the block allocation happens at ->write_iter time
itself. So at writeback time:
1. We first cache the i_blkseq.
2. Call ext2_get_blocks(, create = 0) to get the no. of blocks already
   allocated.
3. Call ext2_get_blocks() a second time with the length set to the no. of
   blocks we know were already allocated.
4. Until now the cached i_blkseq remains valid, since no block allocation has
   happened yet. This means that on the next call to ->map_blocks() we can
   verify whether i_blkseq has raced with a truncate or not. If not, then
   i_blkseq will remain valid.

In case of a hole (which could happen with mmaped writes), we only allocate
one block at a time anyway. So even if the i_blkseq value changes right after,
we still need to allocate the next block in the subsequent ->map_blocks()
call.
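[The validation scheme above can be summarized with a small userspace model;
this is illustrative only, uses hypothetical demo_* names, and is not the
kernel code in the diff below.]

#include <stdatomic.h>
#include <stdbool.h>

struct demo_inode {
	atomic_uint blkseq;		/* bumped on every block alloc/free */
};

struct demo_cached_map {
	unsigned int validity_cookie;	/* blkseq snapshot taken at map time */
	long long offset, length;	/* byte range the cached mapping covers */
};

static void demo_alloc_or_free_block(struct demo_inode *inode)
{
	atomic_fetch_add(&inode->blkseq, 1);
}

static bool demo_map_still_valid(const struct demo_cached_map *map,
				 struct demo_inode *inode, long long pos)
{
	/* stale if pos falls outside the cached range ... */
	if (pos < map->offset || pos >= map->offset + map->length)
		return false;
	/* ... or if any allocation/free happened since the snapshot */
	return map->validity_cookie == atomic_load(&inode->blkseq);
}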
Signed-off-by: Ritesh Harjani (IBM)
---
 fs/ext2/balloc.c |  1 +
 fs/ext2/ext2.h   |  6 +++++
 fs/ext2/inode.c  | 57 ++++++++++++++++++++++++++++++++++++++++++++----
 fs/ext2/super.c  |  2 +-
 4 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index 1bfd6ab11038..047a8f41a6f5 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -495,6 +495,7 @@ void ext2_free_blocks(struct inode * inode, ext2_fsblk_t block,
 	}
 	ext2_debug ("freeing block(s) %lu-%lu\n", block, block + count - 1);

+	ext2_inc_i_blkseq(EXT2_I(inode));

 do_more:
 	overflow = 0;
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index f38bdd46e4f7..67b1acb08eb2 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -663,6 +663,7 @@ struct ext2_inode_info {
 	struct rw_semaphore xattr_sem;
 #endif
 	rwlock_t i_meta_lock;
+	unsigned int i_blkseq;

 	/*
 	 * truncate_mutex is for serialising ext2_truncate() against
@@ -698,6 +699,11 @@ static inline struct ext2_inode_info *EXT2_I(struct inode *inode)
 	return container_of(inode, struct ext2_inode_info, vfs_inode);
 }

+static inline void ext2_inc_i_blkseq(struct ext2_inode_info *ei)
+{
+	WRITE_ONCE(ei->i_blkseq, READ_ONCE(ei->i_blkseq) + 1);
+}
+
 /* balloc.c */
 extern int ext2_bg_has_super(struct super_block *sb, int group);
 extern unsigned long ext2_bg_num_gdb(struct super_block *sb, int group);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 2b62786130b5..946a614ddfc0 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -406,6 +406,8 @@ static int ext2_alloc_blocks(struct inode *inode,
 	ext2_fsblk_t current_block = 0;
 	int ret = 0;

+	ext2_inc_i_blkseq(EXT2_I(inode));
+
 	/*
 	 * Here we try to allocate the requested multiple blocks at once,
 	 * on a best-effort basis.
@@ -966,15 +968,62 @@ ext2_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	return mpage_writepages(mapping, wbc, ext2_get_block);
 }

+static bool ext2_imap_valid(struct iomap_writepage_ctx *wpc, struct inode *inode,
+			    loff_t offset)
+{
+	if (offset < wpc->iomap.offset ||
+	    offset >= wpc->iomap.offset + wpc->iomap.length)
+		return false;
+
+	if (wpc->iomap.validity_cookie != READ_ONCE(EXT2_I(inode)->i_blkseq))
+		return false;
+
+	return true;
+}
+
 static int ext2_write_map_blocks(struct iomap_writepage_ctx *wpc,
 				 struct inode *inode, loff_t offset,
 				 unsigned len)
 {
-	if (offset >= wpc->iomap.offset &&
-	    offset < wpc->iomap.offset + wpc->iomap.length)
+	loff_t maxblocks = (loff_t)INT_MAX;
+	u8 blkbits = inode->i_blkbits;
+	u32 bno;
+	bool new, boundary;
+	int ret;
+
+	if (ext2_imap_valid(wpc, inode, offset))
 		return 0;

-	return ext2_iomap_begin(inode, offset, inode->i_sb->s_blocksize,
+	/*
+	 * For ext2 buffered-writes, the block allocation happens at the
+	 * ->write_iter time itself. So at writeback time -
+	 * 1. We first cache the i_blkseq.
+	 * 2. Call ext2_get_blocks(, create = 0) to get the no. of blocks
+	 *    already allocated.
+	 * 3. Call ext2_get_blocks() the second time with length to be same as
+	 *    the no. of blocks we know were already allocated.
+	 * 4. Till now it means, the cached i_blkseq remains valid as no block
+	 *    allocation has happened yet.
+	 * This means the next call to ->map_blocks(), we can verify whether the
+	 * i_blkseq has raced with truncate or not. If not, then i_blkseq will
+	 * remain valid.
+	 *
+	 * In case of a hole (could happen with mmaped writes), we only allocate
+	 * 1 block at a time anyways. So even if the i_blkseq value changes, we
+	 * anyway need to allocate the next block in subsequent ->map_blocks()
+	 * call.
+	 */
+	wpc->iomap.validity_cookie = READ_ONCE(EXT2_I(inode)->i_blkseq);
+
+	ret = ext2_get_blocks(inode, offset >> blkbits, maxblocks << blkbits,
+			      &bno, &new, &boundary, 0);
+	if (ret < 0)
+		return ret;
+	/*
+	 * ret can be 0 in case of a hole which is possible for mmaped writes.
+	 */
+	ret = ret ? ret : 1;
+	return ext2_iomap_begin(inode, offset, (loff_t)ret << blkbits,
 				IOMAP_WRITE, &wpc->iomap, NULL);
 }

@@ -1000,7 +1049,7 @@ ext2_dax_writepages(struct address_space *mapping, struct writeback_control *wbc

 const struct address_space_operations ext2_file_aops = {
 	.dirty_folio		= iomap_dirty_folio,
-	.release_folio		= iomap_release_folio,
+	.release_folio		= iomap_release_folio,
 	.invalidate_folio	= iomap_invalidate_folio,
 	.read_folio		= ext2_file_read_folio,
 	.readahead		= ext2_file_readahead,
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 37f7ce56adce..32f5386284d6 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -188,7 +188,7 @@ static struct inode *ext2_alloc_inode(struct super_block *sb)
 #ifdef CONFIG_QUOTA
 	memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
 #endif
-
+	WRITE_ONCE(ei->i_blkseq, 0);
 	return &ei->vfs_inode;
 }

From patchwork Thu Apr 25 13:28:49 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643379
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 5/7] iomap: Fix iomap_adjust_read_range for plen calculation
Date: Thu, 25 Apr 2024 18:58:49 +0530

If the extent spans the block that contains i_size, we need to handle both
halves separately, but only when i_size is within the current folio under
processing. "orig_pos + length > isize" can be true for all folios if the
mapped extent length is greater than the folio size. That makes plen break
for every folio instead of only the last folio. So use orig_plen to check
whether "orig_pos + orig_plen > isize".

Signed-off-by: Ritesh Harjani (IBM)
cc: Ojaswin Mujoo
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
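[A worked example of the claim above, illustrative only: with a 16k mapped
extent, 4k folios and i_size = 6k, the old check fires for every folio up to
i_size, while the fixed check fires only for the folio that actually contains
i_size. The numbers are made up for the demonstration.]

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	long long isize = 6 * 1024;		/* i_size lands inside folio 1 */
	long long extent_len = 16 * 1024;	/* mapped extent covers 4 folios */
	long long folio_size = 4 * 1024;

	for (long long pos = 0; pos < extent_len; pos += folio_size) {
		long long length = extent_len - pos;	/* remaining mapped length */
		long long orig_plen = folio_size;	/* plen clamped to this folio */
		bool old_check = pos <= isize && pos + length > isize;
		bool new_check = pos <= isize && pos + orig_plen > isize;

		printf("folio at %6lld: old=%d new=%d\n", pos, old_check, new_check);
	}
	return 0;
}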
---
 fs/iomap/buffered-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0..9f79c82d1f73 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -241,6 +241,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	unsigned block_size = (1 << block_bits);
 	size_t poff = offset_in_folio(folio, *pos);
 	size_t plen = min_t(loff_t, folio_size(folio) - poff, length);
+	size_t orig_plen = plen;
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;

@@ -277,7 +278,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	 * handle both halves separately so that we properly zero data in the
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
-	if (orig_pos <= isize && orig_pos + length > isize) {
+	if (orig_pos <= isize && orig_pos + orig_plen > isize) {
 		unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;

 		if (first <= end && last > end)

From patchwork Thu Apr 25 13:28:50 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643380
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 6/7] iomap: Optimize iomap_read_folio
Date: Thu, 25 Apr 2024 18:58:50 +0530

iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks"
within a folio separately. This makes iomap_read_folio() call into
->iomap_begin() to request an extent mapping even though it might already
have an extent which is not fully processed yet. This happens when we either
have a large folio or bs < ps. In these cases some sub-blocks can already be
uptodate (say, due to previous writes). With iomap_read_folio_iter(), this is
handled more efficiently by not calling ->iomap_begin() until all the
sub-blocks within the current folio are processed.
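[A simplified model of the call pattern described above, illustrative only and
not the kernel code: for one 4-block folio where blocks 0 and 2 are already
uptodate, the old loop goes back to ->iomap_begin() once per not-uptodate
chunk, while the new inner loop needs only one mapping call, assuming that
mapping covers the whole folio.]

#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 4

int main(void)
{
	bool uptodate[NBLOCKS] = { true, false, true, false };
	int old_calls = 0, new_calls = 1;	/* new path: one mapping for the folio */

	/* old path: every return to the outer iterator re-runs ->iomap_begin() */
	for (int b = 0; b < NBLOCKS; ) {
		while (b < NBLOCKS && uptodate[b])	/* skip uptodate sub-blocks */
			b++;
		if (b == NBLOCKS)
			break;
		old_calls++;				/* one ->iomap_begin() per chunk */
		while (b < NBLOCKS && !uptodate[b])	/* read one contiguous chunk */
			b++;
	}

	printf("->iomap_begin() calls: old=%d new=%d\n", old_calls, new_calls);
	return 0;
}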
Signed-off-by: Ritesh Harjani (IBM)
cc: Ojaswin Mujoo
---
 fs/iomap/buffered-io.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9f79c82d1f73..0a4269095ae2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -444,6 +444,24 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	return pos - orig_pos + plen;
 }

+static loff_t iomap_read_folio_iter(const struct iomap_iter *iter,
+		struct iomap_readpage_ctx *ctx)
+{
+	struct folio *folio = ctx->cur_folio;
+	size_t pos = offset_in_folio(folio, iter->pos);
+	loff_t length = min_t(loff_t, folio_size(folio) - pos,
+			      iomap_length(iter));
+	loff_t done, ret;
+
+	for (done = 0; done < length; done += ret) {
+		ret = iomap_readpage_iter(iter, ctx, done);
+		if (ret <= 0)
+			return ret;
+	}
+
+	return done;
+}
+
 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
 {
 	struct iomap_iter iter = {
@@ -459,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
 	trace_iomap_readpage(iter.inode, 1);

 	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_readpage_iter(&iter, &ctx, 0);
+		iter.processed = iomap_read_folio_iter(&iter, &ctx);

 	if (ret < 0)
 		folio_set_error(folio);

From patchwork Thu Apr 25 13:28:51 2024
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13643381
From: "Ritesh Harjani (IBM)"
To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, "Darrick J. Wong",
    Ojaswin Mujoo, Ritesh Harjani, Jan Kara
Subject: [RFCv3 7/7] iomap: Optimize data access patterns for filesystems with indirect mappings
Date: Thu, 25 Apr 2024 18:58:51 +0530
Message-ID: <4e2752e99f55469c4eb5f2fe83e816d529110192.1714046808.git.ritesh.list@gmail.com>

This patch optimizes the data access patterns for filesystems with indirect
block mapping by implementing BH_Boundary handling within iomap.

Currently the bios for reads within iomap are only submitted at 2 places:
1. If we cannot merge the new request with the previous bio, only then do we
   submit the previous bio.
2. At the end of the entire read processing.

This means for filesystems with indirect block mapping, we call into
->iomap_begin() again w/o submitting the previous bios. That causes
unoptimized data access patterns for blocks which are of BH_Boundary type.

For example, consider the following file mapping:

    logical block (4k)    physical block (4k)
    0-11                  1000-1011
    12-15                 1013-1016

In the above, physical block 1012 is an indirect metadata block which has the
mapping information for the next set of indirect blocks (1013-1016).
With iomap buffered reads for reading the first 16 logical blocks of the file
(0-15), we get the below I/O pattern:

    submit a bio for 1012
    complete the bio for 1012
    submit a bio for 1000-1011
    submit a bio for 1013-1016
    complete the bios for 1000-1011
    complete the bios for 1013-1016

As we can see, this is a non-optimal I/O access pattern, and we also get 3 bio
completions instead of 2.

This patch changes this behavior by doing submit_bio() on any bios already
prepared, before calling ->iomap_begin() again. That means blocks which are
already processed get submitted for I/O earlier, and then, within
->iomap_begin(), if we get a request for reading an indirect metadata block,
the block layer can merge those bios with the already submitted read request
to reduce the no. of bio completions.

Now, for bs < ps or for large folios, this patch requires proper handling of
"ifs->read_bytes_pending". For that we first set ifs->read_bytes_pending to
the folio size. Then we handle all the cases where ifs->read_bytes_pending
needs to be subtracted, either on the submission side (if we don't need to
submit any I/O, e.g. for uptodate sub-blocks), or on an I/O error, or at the
completion of an I/O.

Here is the ftrace output of iomap and the block layer with the ext2 iomap
conversion patches:

root# filefrag -b512 -v /mnt1/test/f1
Filesystem type is: ef53
Filesystem cylinder groups approximately 32
File size of /mnt1/test/f1 is 65536 (128 blocks of 512 bytes)
 ext: logical_offset:  physical_offset:  length:  expected: flags:
   0:    0..     95:   98304..  98399:      96:            merged
   1:   96..    127:   98408..  98439:      32:     98400: last,merged,eof
/mnt1/test/f1: 2 extents found

root# #This reads 4 blocks starting from lblk 10, 11, 12, 13
root# xfs_io -c "pread -b$((4*4096)) $((10*4096)) $((4*4096))" /mnt1/test/f1

w/o this patch - (indirect block is submitted before and does not get merged,
resulting in 3 bio completions)

xfs_io-907  [002] ..... 185.608791: iomap_readahead: dev 8:16 ino 0xc nr_pages 4
xfs_io-907  [002] ..... 185.608819: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x4000 processed 0 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x9d/0x2c0
xfs_io-907  [002] ..... 185.608823: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300a000 offset 0xa000 length 0x2000 type MAPPED flags MERGED
xfs_io-907  [002] ..... 185.608831: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1e1/0x2c0
xfs_io-907  [002] ..... 185.608859: block_bio_queue: 8,16 R 98400 + 8 [xfs_io]
xfs_io-907  [002] ..... 185.608865: block_getrq: 8,16 R 98400 + 8 [xfs_io]
xfs_io-907  [002] ..... 185.608867: block_io_start: 8,16 R 4096 () 98400 + 8 [xfs_io]
xfs_io-907  [002] ..... 185.608869: block_plug: [xfs_io]
xfs_io-907  [002] ..... 185.608872: block_unplug: [xfs_io] 1
xfs_io-907  [002] ..... 185.608874: block_rq_insert: 8,16 R 4096 () 98400 + 8 [xfs_io]
kworker/2:1H-198  [002] ..... 185.608908: block_rq_issue: 8,16 R 4096 () 98400 + 8 [kworker/2:1H]
-0  [002] d.h2. 185.609579: block_rq_complete: 8,16 R () 98400 + 8 [0]
-0  [002] dNh2. 185.609631: block_io_done: 8,16 R 0 () 98400 + 0 [swapper/2]
xfs_io-907  [002] ..... 185.609694: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300d000 offset 0xc000 length 0x2000 type MAPPED flags MERGED
xfs_io-907  [002] ..... 185.609704: block_bio_queue: 8,16 RA 98384 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609718: block_getrq: 8,16 RA 98384 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609721: block_io_start: 8,16 RA 8192 () 98384 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609726: block_plug: [xfs_io]
xfs_io-907  [002] ..... 185.609735: iomap_iter: dev 8:16 ino 0xc pos 0xc000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1e1/0x2c0
xfs_io-907  [002] ..... 185.609736: block_bio_queue: 8,16 RA 98408 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609740: block_getrq: 8,16 RA 98408 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609741: block_io_start: 8,16 RA 8192 () 98408 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609756: block_rq_issue: 8,16 RA 8192 () 98408 + 16 [xfs_io]
xfs_io-907  [002] ..... 185.609769: block_rq_issue: 8,16 RA 8192 () 98384 + 16 [xfs_io]
-0  [002] d.H2. 185.610280: block_rq_complete: 8,16 RA () 98408 + 16 [0]
-0  [002] d.H2. 185.610289: block_io_done: 8,16 RA 0 () 98408 + 0 [swapper/2]
-0  [002] d.H2. 185.610292: block_rq_complete: 8,16 RA () 98384 + 16 [0]
-0  [002] dNH2. 185.610301: block_io_done: 8,16 RA 0 () 98384 + 0 [swapper/2]

v/s with the patch - (optimized I/O access pattern and bios get merged,
resulting in only 2 bio completions)

xfs_io-944  [005] ..... 99.926187: iomap_readahead: dev 8:16 ino 0xc nr_pages 4
xfs_io-944  [005] ..... 99.926208: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x4000 processed 0 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x9d/0x2c0
xfs_io-944  [005] ..... 99.926211: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300a000 offset 0xa000 length 0x2000 type MAPPED flags MERGED
xfs_io-944  [005] ..... 99.926222: block_bio_queue: 8,16 RA 98384 + 16 [xfs_io]
xfs_io-944  [005] ..... 99.926232: block_getrq: 8,16 RA 98384 + 16 [xfs_io]
xfs_io-944  [005] ..... 99.926233: block_io_start: 8,16 RA 8192 () 98384 + 16 [xfs_io]
xfs_io-944  [005] ..... 99.926234: block_plug: [xfs_io]
xfs_io-944  [005] ..... 99.926235: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1f9/0x2c0
xfs_io-944  [005] ..... 99.926261: block_bio_queue: 8,16 R 98400 + 8 [xfs_io]
xfs_io-944  [005] ..... 99.926266: block_bio_backmerge: 8,16 R 98400 + 8 [xfs_io]
xfs_io-944  [005] ..... 99.926271: block_unplug: [xfs_io] 1
xfs_io-944  [005] ..... 99.926272: block_rq_insert: 8,16 RA 12288 () 98384 + 24 [xfs_io]
kworker/5:1H-234  [005] ..... 99.926314: block_rq_issue: 8,16 RA 12288 () 98384 + 24 [kworker/5:1H]
-0  [005] d.h2. 99.926905: block_rq_complete: 8,16 RA () 98384 + 24 [0]
-0  [005] dNh2. 99.926931: block_io_done: 8,16 RA 0 () 98384 + 0 [swapper/5]
xfs_io-944  [005] ..... 99.926971: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300d000 offset 0xc000 length 0x2000 type MAPPED flags MERGED
xfs_io-944  [005] ..... 99.926981: block_bio_queue: 8,16 RA 98408 + 16 [xfs_io]
xfs_io-944  [005] ..... 99.926989: block_getrq: 8,16 RA 98408 + 16 [xfs_io]
xfs_io-944  [005] ..... 99.926989: block_io_start: 8,16 RA 8192 () 98408 + 16 [xfs_io]
xfs_io-944  [005] ..... 99.926991: block_plug: [xfs_io]
xfs_io-944  [005] ..... 99.926993: iomap_iter: dev 8:16 ino 0xc pos 0xc000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1f9/0x2c0
xfs_io-944  [005] ..... 99.927001: block_rq_issue: 8,16 RA 8192 () 98408 + 16 [xfs_io]
-0  [005] d.h2. 99.927397: block_rq_complete: 8,16 RA () 98408 + 16 [0]
-0  [005] dNh2. 99.927414: block_io_done: 8,16 RA 0 () 98408 + 0 [swapper/5]

Suggested-by: Matthew Wilcox
Signed-off-by: Ritesh Harjani (IBM)
cc: Ojaswin Mujoo
---
 fs/iomap/buffered-io.c | 112 +++++++++++++++++++++++++++++++----------
 1 file changed, 85 insertions(+), 27 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0a4269095ae2..a1d50086a3f5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -30,7 +30,7 @@ typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
  */
 struct iomap_folio_state {
 	spinlock_t state_lock;
-	unsigned int read_bytes_pending;
+	size_t read_bytes_pending;
 	atomic_t write_bytes_pending;

 	/*
@@ -380,6 +380,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	loff_t orig_pos = pos;
 	size_t poff, plen;
 	sector_t sector;
+	bool rbp_finished = false;

 	if (iomap->type == IOMAP_INLINE)
 		return iomap_read_inline_data(iter, folio);
@@ -387,21 +388,39 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	/* zero post-eof blocks as the page may be mapped */
 	ifs = ifs_alloc(iter->inode, folio, iter->flags);
 	iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
+
+	if (ifs) {
+		loff_t to_read = min_t(loff_t, iter->len - offset,
+			folio_size(folio) - offset_in_folio(folio, orig_pos));
+		size_t padjust;
+
+		spin_lock_irq(&ifs->state_lock);
+		if (!ifs->read_bytes_pending)
+			ifs->read_bytes_pending = to_read;
+		padjust = pos - orig_pos;
+		ifs->read_bytes_pending -= padjust;
+		if (!ifs->read_bytes_pending)
+			rbp_finished = true;
+		spin_unlock_irq(&ifs->state_lock);
+	}
+
 	if (plen == 0)
 		goto done;

 	if (iomap_block_needs_zeroing(iter, pos)) {
+		if (ifs) {
+			spin_lock_irq(&ifs->state_lock);
+			ifs->read_bytes_pending -= plen;
+			if (!ifs->read_bytes_pending)
+				rbp_finished = true;
+			spin_unlock_irq(&ifs->state_lock);
+		}
 		folio_zero_range(folio, poff, plen);
 		iomap_set_range_uptodate(folio, poff, plen);
 		goto done;
 	}

 	ctx->cur_folio_in_bio = true;
-	if (ifs) {
-		spin_lock_irq(&ifs->state_lock);
-		ifs->read_bytes_pending += plen;
-		spin_unlock_irq(&ifs->state_lock);
-	}

 	sector = iomap_sector(iomap, pos);
 	if (!ctx->bio ||
@@ -435,6 +454,14 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	}

 done:
+	/*
+	 * If there is no bio prepared and if rbp is finished and
+	 * this was the last offset within this folio then mark
+	 * cur_folio_in_bio to false.
+	 */
+	if (!ctx->bio && rbp_finished &&
+	    offset_in_folio(folio, pos + plen) == 0)
+		ctx->cur_folio_in_bio = false;
 	/*
 	 * Move the caller beyond our range so that it keeps making progress.
	 * For that, we have to include any leading non-uptodate ranges, but
@@ -459,9 +486,43 @@ static loff_t iomap_read_folio_iter(const struct iomap_iter *iter,
 			return ret;
 	}

+	if (ctx->bio) {
+		submit_bio(ctx->bio);
+		WARN_ON_ONCE(!ctx->cur_folio_in_bio);
+		ctx->bio = NULL;
+	}
+	if (offset_in_folio(folio, iter->pos + done) == 0 &&
+	    !ctx->cur_folio_in_bio) {
+		folio_unlock(ctx->cur_folio);
+	}
+
 	return done;
 }

+static void iomap_handle_read_error(struct iomap_readpage_ctx *ctx,
+		struct iomap_iter *iter)
+{
+	struct folio *folio = ctx->cur_folio;
+	struct iomap_folio_state *ifs;
+	unsigned long flags;
+	bool rbp_finished = false;
+	size_t rbp_adjust = folio_size(folio) - offset_in_folio(folio,
+			iter->pos);
+	ifs = folio->private;
+	if (!ifs || !ifs->read_bytes_pending)
+		goto unlock;
+
+	spin_lock_irqsave(&ifs->state_lock, flags);
+	ifs->read_bytes_pending -= rbp_adjust;
+	if (!ifs->read_bytes_pending)
+		rbp_finished = true;
+	spin_unlock_irqrestore(&ifs->state_lock, flags);
+
+unlock:
+	if (rbp_finished || !ctx->cur_folio_in_bio)
+		folio_unlock(folio);
+}
+
 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
 {
 	struct iomap_iter iter = {
@@ -479,15 +540,9 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
 	while ((ret = iomap_iter(&iter, ops)) > 0)
 		iter.processed = iomap_read_folio_iter(&iter, &ctx);

-	if (ret < 0)
+	if (ret < 0) {
 		folio_set_error(folio);
-
-	if (ctx.bio) {
-		submit_bio(ctx.bio);
-		WARN_ON_ONCE(!ctx.cur_folio_in_bio);
-	} else {
-		WARN_ON_ONCE(ctx.cur_folio_in_bio);
-		folio_unlock(folio);
+		iomap_handle_read_error(&ctx, &iter);
 	}

 	/*
@@ -506,12 +561,6 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
 	loff_t done, ret;

 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_folio &&
-		    offset_in_folio(ctx->cur_folio, iter->pos + done) == 0) {
-			if (!ctx->cur_folio_in_bio)
-				folio_unlock(ctx->cur_folio);
-			ctx->cur_folio = NULL;
-		}
 		if (!ctx->cur_folio) {
 			ctx->cur_folio = readahead_folio(ctx->rac);
 			ctx->cur_folio_in_bio = false;
@@ -519,6 +568,17 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
 		ret = iomap_readpage_iter(iter, ctx, done);
 		if (ret <= 0)
 			return ret;
+		if (ctx->cur_folio && offset_in_folio(ctx->cur_folio,
+				iter->pos + done + ret) == 0) {
+			if (!ctx->cur_folio_in_bio)
+				folio_unlock(ctx->cur_folio);
+			ctx->cur_folio = NULL;
+		}
+	}
+
+	if (ctx->bio) {
+		submit_bio(ctx->bio);
+		ctx->bio = NULL;
 	}

 	return done;
@@ -549,18 +609,16 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 	struct iomap_readpage_ctx ctx = {
 		.rac	= rac,
 	};
+	int ret = 0;

 	trace_iomap_readahead(rac->mapping->host, readahead_count(rac));

-	while (iomap_iter(&iter, ops) > 0)
+	while ((ret = iomap_iter(&iter, ops)) > 0)
 		iter.processed = iomap_readahead_iter(&iter, &ctx);

-	if (ctx.bio)
-		submit_bio(ctx.bio);
-	if (ctx.cur_folio) {
-		if (!ctx.cur_folio_in_bio)
-			folio_unlock(ctx.cur_folio);
-	}
+	if (ret < 0 && ctx.cur_folio)
+		iomap_handle_read_error(&ctx, &iter);
+
 }
 EXPORT_SYMBOL_GPL(iomap_readahead);