mbox series

[00/21] erofs: patchset addressing Christoph's comments

Message ID 20190901055130.30572-1-hsiangkao@aol.com (mailing list archive)
Headers show
Series erofs: patchset addressing Christoph's comments | expand

Message

Gao Xiang Sept. 1, 2019, 5:51 a.m. UTC
Hi,

This patchset is based on the following patch by Pratik Shinde,
https://lore.kernel.org/linux-erofs/20190830095615.10995-1-pratikshinde320@gmail.com/

All patches addressing Christoph's comments on v6, which are trivial,
most deleted code are from erofs specific fault injection, which was
followed f2fs and previously discussed in earlier topic [1], but
let's follow what Christoph's said now.

Comments and suggestions are welcome...

[1] https://lore.kernel.org/r/1eed1e6b-f95e-aa8e-c3e7-e9870401ee23@kernel.org/

Thanks,
Gao Xiang

Gao Xiang (21):
  erofs: remove all the byte offset comments
  erofs: on-disk format should have explicitly assigned numbers
  erofs: some macros are much more readable as a function
  erofs: kill __packed for on-disk structures
  erofs: update erofs_inode_is_data_compressed helper
  erofs: kill erofs_{init,exit}_inode_cache
  erofs: use erofs_inode naming
  erofs: update comments in inode.c
  erofs: update erofs symlink stuffs
  erofs: kill is_inode_layout_compression()
  erofs: use dsb instead of layout for ondisk super_block
  erofs: kill verbose debug info in erofs_fill_super
  erofs: simplify erofs_grab_bio() since bio_alloc() never fail
  erofs: kill prio and nofail of erofs_get_meta_page()
  erofs: kill __submit_bio()
  erofs: kill magic underscores
  erofs: use a switch statement when dealing with the file modes
  erofs: add "erofs_" prefix for common and short functions
  erofs: kill all erofs specific fault injection
  erofs: kill use_vmap module parameter
  erofs: save one level of indentation

 Documentation/filesystems/erofs.txt |   9 --
 fs/erofs/Kconfig                    |   7 --
 fs/erofs/data.c                     |  62 +++-------
 fs/erofs/decompressor.c             |  34 ++---
 fs/erofs/dir.c                      |   6 +-
 fs/erofs/erofs_fs.h                 | 162 ++++++++++++------------
 fs/erofs/inode.c                    | 176 +++++++++++++-------------
 fs/erofs/internal.h                 | 156 +++--------------------
 fs/erofs/namei.c                    |  12 +-
 fs/erofs/super.c                    | 185 ++++++++--------------------
 fs/erofs/xattr.c                    |  33 +++--
 fs/erofs/xattr.h                    |   4 +-
 fs/erofs/zdata.c                    |  44 +++----
 fs/erofs/zmap.c                     |  32 ++---
 include/trace/events/erofs.h        |  14 +--
 15 files changed, 338 insertions(+), 598 deletions(-)

Comments

Christoph Hellwig Sept. 2, 2019, 12:46 p.m. UTC | #1
On Sun, Sep 01, 2019 at 01:51:09PM +0800, Gao Xiang wrote:
> Hi,
> 
> This patchset is based on the following patch by Pratik Shinde,
> https://lore.kernel.org/linux-erofs/20190830095615.10995-1-pratikshinde320@gmail.com/
> 
> All patches addressing Christoph's comments on v6, which are trivial,
> most deleted code are from erofs specific fault injection, which was
> followed f2fs and previously discussed in earlier topic [1], but
> let's follow what Christoph's said now.

I like where the cleanups are going.  But this is really just basic
code quality stuff.  We're not addressing the issues with large amounts
of functionality duplicating VFS helpers.
Gao Xiang Sept. 2, 2019, 2:24 p.m. UTC | #2
Hi Christoph,

On Mon, Sep 02, 2019 at 05:46:45AM -0700, Christoph Hellwig wrote:
> On Sun, Sep 01, 2019 at 01:51:09PM +0800, Gao Xiang wrote:
> > Hi,
> > 
> > This patchset is based on the following patch by Pratik Shinde,
> > https://lore.kernel.org/linux-erofs/20190830095615.10995-1-pratikshinde320@gmail.com/
> > 
> > All patches addressing Christoph's comments on v6, which are trivial,
> > most deleted code are from erofs specific fault injection, which was
> > followed f2fs and previously discussed in earlier topic [1], but
> > let's follow what Christoph's said now.
> 
> I like where the cleanups are going.  But this is really just basic

For now, I think it will go to Greg's tree. Before that, I will do patchset
v2 to address all remaining comments...

> code quality stuff.  We're not addressing the issues with large amounts
> of functionality duplicating VFS helpers.

You means killing erofs_get_meta_page or avoid erofs_read_raw_page?

 - For killing erofs_get_meta_page, here is the current erofs_get_meta_page:

 35 struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
 36 {
 37         struct inode *const bd_inode = sb->s_bdev->bd_inode;
 38         struct address_space *const mapping = bd_inode->i_mapping;
 39         /* prefer retrying in the allocator to blindly looping below */
 40         const gfp_t gfp = mapping_gfp_constraint(mapping, ~__GFP_FS);
 41         struct page *page;
 42         int err;
 43
 44 repeat:
 45         page = find_or_create_page(mapping, blkaddr, gfp);
 46         if (!page)
 47                 return ERR_PTR(-ENOMEM);
 48
 49         DBG_BUGON(!PageLocked(page));
 50
 51         if (!PageUptodate(page)) {
 52                 struct bio *bio;
 53
 54                 bio = erofs_grab_bio(sb, REQ_OP_READ | REQ_META,
 55                                      blkaddr, 1, sb, erofs_readendio);
 56
 57                 if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) {
 58                         err = -EFAULT;
 59                         goto err_out;
 60                 }
 61
 62                 submit_bio(bio);
 63                 lock_page(page);
 64
 65                 /* this page has been truncated by others */
 66                 if (page->mapping != mapping) {
 67                         unlock_page(page);
 68                         put_page(page);
 69                         goto repeat;
 70                 }
 71
 72                 /* more likely a read error */
 73                 if (!PageUptodate(page)) {
 74                         err = -EIO;
 75                         goto err_out;
 76                 }
 77         }
 78         return page;
 79
 80 err_out:
 81         unlock_page(page);
 82         put_page(page);
 83         return ERR_PTR(err);
 84 }

I think it is simple enough. read_cache_page need write a similar
filler, or read_cache_page_gfp will call .readpage, and then
introduce buffer_heads, that is what I'd like to avoid now (no need these
bd_inode buffer_heads in memory...)

 - For erofs_read_raw_page, it can be avoided after iomap tail-end packing
   feature is done... If we remove it now, it will make EROFS broken.
   It is no urgent and Chao will focus on iomap tail-end packing feature.

Thanks,
Gao Xiang
Christoph Hellwig Sept. 2, 2019, 3:23 p.m. UTC | #3
On Mon, Sep 02, 2019 at 10:24:52PM +0800, Gao Xiang wrote:
> > code quality stuff.  We're not addressing the issues with large amounts
> > of functionality duplicating VFS helpers.
> 
> You means killing erofs_get_meta_page or avoid erofs_read_raw_page?
> 
>  - For killing erofs_get_meta_page, here is the current erofs_get_meta_page:

> I think it is simple enough. read_cache_page need write a similar
> filler, or read_cache_page_gfp will call .readpage, and then
> introduce buffer_heads, that is what I'd like to avoid now (no need these
> bd_inode buffer_heads in memory...)

If using read_cache_page_gfp and ->readpage works, please do.  The
fact that the block device inode uses buffer heads is an implementation
detail that might not last very long and should be invisible to you.
It also means you can get rid of a lot of code that you don't have
to maintain and others don't have to update for global API changes.

>  - For erofs_read_raw_page, it can be avoided after iomap tail-end packing
>    feature is done... If we remove it now, it will make EROFS broken.
>    It is no urgent and Chao will focus on iomap tail-end packing feature.

Ok.  I wish we would have just sorted this out beforehand, which we
could have trivially done without all that staging mess.
Gao Xiang Sept. 2, 2019, 3:50 p.m. UTC | #4
Hi Christoph,

On Mon, Sep 02, 2019 at 08:23:23AM -0700, Christoph Hellwig wrote:
> On Mon, Sep 02, 2019 at 10:24:52PM +0800, Gao Xiang wrote:
> > > code quality stuff.  We're not addressing the issues with large amounts
> > > of functionality duplicating VFS helpers.
> > 
> > You means killing erofs_get_meta_page or avoid erofs_read_raw_page?
> > 
> >  - For killing erofs_get_meta_page, here is the current erofs_get_meta_page:
> 
> > I think it is simple enough. read_cache_page need write a similar
> > filler, or read_cache_page_gfp will call .readpage, and then
> > introduce buffer_heads, that is what I'd like to avoid now (no need these
> > bd_inode buffer_heads in memory...)
> 
> If using read_cache_page_gfp and ->readpage works, please do.  The
> fact that the block device inode uses buffer heads is an implementation
> detail that might not last very long and should be invisible to you.
> It also means you can get rid of a lot of code that you don't have
> to maintain and others don't have to update for global API changes.

I care about those useless buffer_heads in memory for our products...

Since we are nobh filesystem (a little request, could I use it
after buffer_heads are fully avoided, I have no idea why I need
those buffer_heads in memory.... But I think bd_inode is good
for caching metadata...)

> 
> >  - For erofs_read_raw_page, it can be avoided after iomap tail-end packing
> >    feature is done... If we remove it now, it will make EROFS broken.
> >    It is no urgent and Chao will focus on iomap tail-end packing feature.
> 
> Ok.  I wish we would have just sorted this out beforehand, which we
> could have trivially done without all that staging mess.

Firstly, I'd like to introduce EROFS as a self-contained
filesystem to introduce new fixed-sized output compression
to upstream and promote it...

And then we can do many improvements for EROFS in parallel...
(if we introduce EROFS and touch many core modules like
iomap, mm readahead code or modify LZ4 code at once...
It could be more careful... Let's improve it step-by-step...
We are a dedicated team if the Linux community needs us,
we will still here... It will be actively maintained.)

Thanks,
Gao Xiang
Christoph Hellwig Sept. 3, 2019, 6:58 a.m. UTC | #5
On Mon, Sep 02, 2019 at 11:50:38PM +0800, Gao Xiang wrote:
> > > You means killing erofs_get_meta_page or avoid erofs_read_raw_page?
> > > 
> > >  - For killing erofs_get_meta_page, here is the current erofs_get_meta_page:
> > 
> > > I think it is simple enough. read_cache_page need write a similar
> > > filler, or read_cache_page_gfp will call .readpage, and then
> > > introduce buffer_heads, that is what I'd like to avoid now (no need these
> > > bd_inode buffer_heads in memory...)
> > 
> > If using read_cache_page_gfp and ->readpage works, please do.  The
> > fact that the block device inode uses buffer heads is an implementation
> > detail that might not last very long and should be invisible to you.
> > It also means you can get rid of a lot of code that you don't have
> > to maintain and others don't have to update for global API changes.
> 
> I care about those useless buffer_heads in memory for our products...
> 
> Since we are nobh filesystem (a little request, could I use it
> after buffer_heads are fully avoided, I have no idea why I need
> those buffer_heads in memory.... But I think bd_inode is good
> for caching metadata...)

Then please use read_cache_page with iomap_readpage(s), and write
comment explaining why your are not using read_cache_page_gfp.
Gao Xiang Sept. 3, 2019, 8:17 a.m. UTC | #6
Hi Christoph,

On Mon, Sep 02, 2019 at 11:58:03PM -0700, Christoph Hellwig wrote:
> On Mon, Sep 02, 2019 at 11:50:38PM +0800, Gao Xiang wrote:
> > > > You means killing erofs_get_meta_page or avoid erofs_read_raw_page?
> > > > 
> > > >  - For killing erofs_get_meta_page, here is the current erofs_get_meta_page:
> > > 
> > > > I think it is simple enough. read_cache_page need write a similar
> > > > filler, or read_cache_page_gfp will call .readpage, and then
> > > > introduce buffer_heads, that is what I'd like to avoid now (no need these
> > > > bd_inode buffer_heads in memory...)
> > > 
> > > If using read_cache_page_gfp and ->readpage works, please do.  The
> > > fact that the block device inode uses buffer heads is an implementation
> > > detail that might not last very long and should be invisible to you.
> > > It also means you can get rid of a lot of code that you don't have
> > > to maintain and others don't have to update for global API changes.
> > 
> > I care about those useless buffer_heads in memory for our products...
> > 
> > Since we are nobh filesystem (a little request, could I use it
> > after buffer_heads are fully avoided, I have no idea why I need
> > those buffer_heads in memory.... But I think bd_inode is good
> > for caching metadata...)
> 
> Then please use read_cache_page with iomap_readpage(s), and write
> comment explaining why your are not using read_cache_page_gfp.

I implement a prelimitary version, but I have no idea it is a really
cleanup for now.

From 001e3e64c81e4ced0d22b147e6abf90060e704b5 Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue, 3 Sep 2019 16:13:00 +0800
Subject: [PATCH] erofs: use iomap_readpage for erofs_get_meta_page

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 fs/erofs/Kconfig |  1 +
 fs/erofs/data.c  | 91 ++++++++++++++++++++++++++----------------------
 2 files changed, 51 insertions(+), 41 deletions(-)

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 9d634d3a1845..c9eeb0bf4737 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -3,6 +3,7 @@
 config EROFS_FS
 	tristate "EROFS filesystem support"
 	depends on BLOCK
+	select FS_IOMAP
 	help
 	  EROFS (Enhanced Read-Only File System) is a lightweight
 	  read-only file system with modern designs (eg. page-sized
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 3881c0689134..34c6e05fab71 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -5,6 +5,9 @@
  * Created by Gao Xiang <gaoxiang25@huawei.com>
  */
 #include "internal.h"
+#include <linux/iomap.h>
+#include <linux/mpage.h>
+#include <linux/sched/mm.h>
 #include <linux/prefetch.h>
 
 #include <trace/events/erofs.h>
@@ -51,54 +54,60 @@ static struct bio *erofs_grab_raw_bio(struct super_block *sb,
 	return bio;
 }
 
-struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
+static int erofs_meta_iomap_begin(struct inode *inode, loff_t pos,
+				  loff_t length, unsigned int flags,
+				  struct iomap *iomap)
 {
-	struct inode *const bd_inode = sb->s_bdev->bd_inode;
-	struct address_space *const mapping = bd_inode->i_mapping;
-	/* prefer retrying in the allocator to blindly looping below */
-	const gfp_t gfp = mapping_gfp_constraint(mapping, ~__GFP_FS);
-	struct page *page;
-	int err;
-
-repeat:
-	page = find_or_create_page(mapping, blkaddr, gfp);
-	if (!page)
-		return ERR_PTR(-ENOMEM);
-
-	DBG_BUGON(!PageLocked(page));
-
-	if (!PageUptodate(page)) {
-		struct bio *bio;
+	const unsigned int blkbits = inode->i_blkbits;
+
+	iomap->flags = 0;
+	iomap->bdev = I_BDEV(inode);
+	iomap->offset = round_down(pos, 1 << blkbits);
+	iomap->addr = iomap->offset;
+	iomap->length = round_up(length, 1 << blkbits);
+	iomap->type = IOMAP_MAPPED;
+	return 0;
+}
 
-		bio = erofs_grab_raw_bio(sb, blkaddr, 1, true);
+static const struct iomap_ops erofs_meta_iomap_ops = {
+	.iomap_begin = erofs_meta_iomap_begin,
+};
 
-		if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) {
-			err = -EFAULT;
-			goto err_out;
-		}
+static int
+erofs_meta_get_block(struct inode *inode, sector_t iblock,
+		     struct buffer_head *bh, int create)
+{
+	bh->b_bdev = I_BDEV(inode);
+	bh->b_blocknr = iblock;
+	set_buffer_mapped(bh);
+	return 0;
+}
 
-		submit_bio(bio);
-		lock_page(page);
+static int erofs_read_meta_page(void *file, struct page *page)
+{
+	/* in case of getting some pages with buffer_heads */
+	if (i_blocksize(page->mapping->host) == PAGE_SIZE &&
+	    !page_has_buffers(page))
+		return iomap_readpage(page, &erofs_meta_iomap_ops);
+
+	/*
+	 * cannot use blkdev_readpage or blkdev_get_block directly
+	 * since static in block_dev.c
+	 */
+	return mpage_readpage(page, erofs_meta_get_block);
+}
 
-		/* this page has been truncated by others */
-		if (page->mapping != mapping) {
-			unlock_page(page);
-			put_page(page);
-			goto repeat;
-		}
+struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
+{
+	struct inode *const bd_inode = sb->s_bdev->bd_inode;
+	struct address_space *const mapping = bd_inode->i_mapping;
+	unsigned int nofs_flag;
+	struct page *page;
 
-		/* more likely a read error */
-		if (!PageUptodate(page)) {
-			err = -EIO;
-			goto err_out;
-		}
-	}
+	nofs_flag = memalloc_nofs_save();
+	page = read_cache_page(mapping, blkaddr, erofs_read_meta_page, NULL);
+	memalloc_nofs_restore(nofs_flag);
 	return page;
-
-err_out:
-	unlock_page(page);
-	put_page(page);
-	return ERR_PTR(err);
 }
 
 static int erofs_map_blocks_flatmode(struct inode *inode,
Christoph Hellwig Sept. 3, 2019, 3:37 p.m. UTC | #7
On Tue, Sep 03, 2019 at 04:17:49PM +0800, Gao Xiang wrote:
> I implement a prelimitary version, but I have no idea it is a really
> cleanup for now.

The fact that this has to guess the block device address_space
implementation is indeed pretty ugly.  I'd much prefer to just use
read_cache_page_gfp, and live with the fact that this allocates
bufferheads behind you for now.  I'll try to speed up my attempts to
get rid of the buffer heads on the block device mapping instead.
Gao Xiang Sept. 3, 2019, 3:43 p.m. UTC | #8
Hi Christoph,

On Tue, Sep 03, 2019 at 08:37:04AM -0700, Christoph Hellwig wrote:
> On Tue, Sep 03, 2019 at 04:17:49PM +0800, Gao Xiang wrote:
> > I implement a prelimitary version, but I have no idea it is a really
> > cleanup for now.
> 
> The fact that this has to guess the block device address_space
> implementation is indeed pretty ugly.  I'd much prefer to just use
> read_cache_page_gfp, and live with the fact that this allocates
> bufferheads behind you for now.  I'll try to speed up my attempts to
> get rid of the buffer heads on the block device mapping instead.

Fully agree with your point. Let me use read_cache_page_gfp
instead for now, hoping block device quickly avoiding from
buffer_heads...

Thanks,
Gao Xiang