Message ID | 20190703075502.79782-1-yuchao0@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] iomap: generalize IOMAP_INLINE to cover tail-packing case |
Ping, any comments on this patch? On 2019/7/3 15:55, Chao Yu wrote: > Some filesystems like erofs/reiserfs have the ability to pack tail > data into metadata, e.g.: > IOMAP_MAPPED [0, 8192] > IOMAP_INLINE [8192, 8200] > > However current IOMAP_INLINE type has assumption that: > - inline data should be locating at page #0. > - inline size should equal to .i_size > Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, > so this patch tries to relieve above limits to make IOMAP_INLINE more > generic to cover tail-packing case. > > Signed-off-by: Chao Yu <yuchao0@huawei.com> > --- > fs/iomap.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/fs/iomap.c b/fs/iomap.c > index 12654c2e78f8..d1c16b692d31 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -264,13 +264,12 @@ static void > iomap_read_inline_data(struct inode *inode, struct page *page, > struct iomap *iomap) > { > - size_t size = i_size_read(inode); > + size_t size = iomap->length; > void *addr; > > if (PageUptodate(page)) > return; > > - BUG_ON(page->index); > BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > addr = kmap_atomic(page); > @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > sector_t sector; > > if (iomap->type == IOMAP_INLINE) { > - WARN_ON_ONCE(pos); > iomap_read_inline_data(inode, page, iomap); > return PAGE_SIZE; > } >
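For readers new to the iomap interface, the layout in the commit message could be described by a filesystem's ->iomap_begin roughly as sketched below. This is a minimal illustration, not part of the patch; tail_is_packed(), metadata_tail_addr() and data_block_addr() are hypothetical helpers.

/*
 * Hypothetical ->iomap_begin for the layout above: bytes [0, 8192) are
 * ordinary mapped blocks, and the 8-byte tail [8192, 8200) lives inside
 * the inode's metadata block.  tail_is_packed(), metadata_tail_addr()
 * and data_block_addr() are made-up helpers for illustration only.
 */
static int example_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
			       unsigned flags, struct iomap *iomap)
{
	loff_t isize = i_size_read(inode);
	loff_t tail_start = round_down(isize, i_blocksize(inode));

	if (tail_is_packed(inode) && pos >= tail_start) {
		/* The packed tail: IOMAP_INLINE [8192, 8200] above. */
		iomap->type = IOMAP_INLINE;
		iomap->offset = tail_start;
		iomap->length = isize - tail_start;
		iomap->inline_data = metadata_tail_addr(inode);
		return 0;
	}

	/* The leading full blocks: IOMAP_MAPPED [0, 8192] above. */
	iomap->type = IOMAP_MAPPED;
	iomap->offset = pos;
	iomap->length = tail_start - pos;
	iomap->addr = data_block_addr(inode, pos);
	iomap->bdev = inode->i_sb->s_bdev;
	return 0;
}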
On Wed, Jul 03, 2019 at 03:55:02PM +0800, Chao Yu wrote: > Some filesystems like erofs/reiserfs have the ability to pack tail > data into metadata, e.g.: > IOMAP_MAPPED [0, 8192] > IOMAP_INLINE [8192, 8200] > > However current IOMAP_INLINE type has assumption that: > - inline data should be locating at page #0. > - inline size should equal to .i_size > Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, > so this patch tries to relieve above limits to make IOMAP_INLINE more > generic to cover tail-packing case. > > Signed-off-by: Chao Yu <yuchao0@huawei.com> This looks good to me, but I'd also like to see a review and gfs2 testing from Andreas.
On 2019-7-9 0:03, Christoph Hellwig wrote: > On Wed, Jul 03, 2019 at 03:55:02PM +0800, Chao Yu wrote: >> Some filesystems like erofs/reiserfs have the ability to pack tail >> data into metadata, e.g.: >> IOMAP_MAPPED [0, 8192] >> IOMAP_INLINE [8192, 8200] >> >> However current IOMAP_INLINE type has assumption that: >> - inline data should be locating at page #0. >> - inline size should equal to .i_size >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, >> so this patch tries to relieve above limits to make IOMAP_INLINE more >> generic to cover tail-packing case. >> >> Signed-off-by: Chao Yu <yuchao0@huawei.com> > > This looks good to me, but I'd also like to see a review and gfs2 > testing from Andreas. Thanks for your reply. :) Well, so, Andreas, could you please take a look at this patch and do related test on gfs2 if you have time? Thanks, >
Hi Chao, Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>: > Some filesystems like erofs/reiserfs have the ability to pack tail > data into metadata, e.g.: > IOMAP_MAPPED [0, 8192] > IOMAP_INLINE [8192, 8200] > > However current IOMAP_INLINE type has assumption that: > - inline data should be locating at page #0. > - inline size should equal to .i_size > Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, > so this patch tries to relieve above limits to make IOMAP_INLINE more > generic to cover tail-packing case. this patch suffers from the following problems: * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use iomap_fiemap on a filesystem with tail packing, so if we don't make the same distinction in iomap, iomap_fiemap will silently produce the wrong result. This means that IOMAP_TAIL should become a separate mapping type. * IOMAP_INLINE mappings always start at offset 0 and span an entire page, so they are always page aligned. IOMAP_TAIL mappings only need to be block aligned. If the block size is smaller than the page size, a tail page can consist of more than one mapping. So iomap_read_inline_data needs to be changed to only copy a single block out of iomap->inline_data, and we need to adjust iomap_write_begin and iomap_readpage_actor accordingly. * Functions iomap_read_inline_data, iomap_write_end_inline, and iomap_dio_inline_actor currently all assume that we are operating on page 0, and that iomap->inline_data points at the data at offset 0. That's no longer the case for IOMAP_TAIL mappings. We need to look only at the offset within the page / block there. * There are some asserts like the following int he code: BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); Those should probably be tightened to check for block boundaries instead of page boundaries, i.e. something like: BUG_ON(size > i_blocksize(inode) - offset_in_block(iomap->inline_data, inode)); > Signed-off-by: Chao Yu <yuchao0@huawei.com> > --- > fs/iomap.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/fs/iomap.c b/fs/iomap.c > index 12654c2e78f8..d1c16b692d31 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -264,13 +264,12 @@ static void > iomap_read_inline_data(struct inode *inode, struct page *page, > struct iomap *iomap) > { > - size_t size = i_size_read(inode); > + size_t size = iomap->length; Function iomap_read_inline_data fills the entire page here, not only the iomap->length part, so this is wrong. But see my comment above about changing iomap_read_inline_data to read a single block above as well. > void *addr; > > if (PageUptodate(page)) > return; > > - BUG_ON(page->index); > BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > addr = kmap_atomic(page); > @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > sector_t sector; > > if (iomap->type == IOMAP_INLINE) { > - WARN_ON_ONCE(pos); > iomap_read_inline_data(inode, page, iomap); > return PAGE_SIZE; > } Those last two changes look right to me. Thanks, Andreas
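The block-granular assertion suggested above, written out as a sketch (offset_in_block() is the helper Andreas names; it is not an existing kernel macro, and the same definition reappears in his draft patch later in the thread):

/*
 * Sketch of the tightened check from the review: the inline data must fit
 * within the block that contains it, not merely within the page.
 * offset_in_block() mirrors offset_in_page() at filesystem block size and
 * is an assumed helper, not an existing one.
 */
#define offset_in_block(offset, inode) \
	((unsigned long)(offset) & (i_blocksize(inode) - 1))

static inline void check_inline_fits_in_block(struct inode *inode,
					      struct iomap *iomap, size_t size)
{
	BUG_ON(size > i_blocksize(inode) -
		      offset_in_block(iomap->inline_data, inode));
}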
Hi Andreas, Thanks for your review in advance. :) On 2019/7/10 7:32, Andreas Grünbacher wrote: > Hi Chao, > > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>: >> Some filesystems like erofs/reiserfs have the ability to pack tail >> data into metadata, e.g.: >> IOMAP_MAPPED [0, 8192] >> IOMAP_INLINE [8192, 8200] >> >> However current IOMAP_INLINE type has assumption that: >> - inline data should be locating at page #0. >> - inline size should equal to .i_size >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, >> so this patch tries to relieve above limits to make IOMAP_INLINE more >> generic to cover tail-packing case. > > this patch suffers from the following problems: > > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use > iomap_fiemap on a filesystem with tail packing, so if we don't make > the same distinction in iomap, iomap_fiemap will silently produce the > wrong result. This means that IOMAP_TAIL should become a separate > mapping type. I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are with different semantics: - IOMAP_TAIl: we may introduce this flag to cover tail-packing case, in where we merge small-sized data in the tail of file into free space of its metadata. - FIEMAP_EXTENT_DATA_TAIL: quoted from Documentation/filesystems/fiemap.txt " This will also set FIEMAP_EXTENT_NOT_ALIGNED Data is packed into a block with data from other files." It looks like this flag indicates that blocks from different files stores into one common block. I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case and tail-packing case? > > * IOMAP_INLINE mappings always start at offset 0 and span an entire > page, so they are always page aligned. IOMAP_TAIL mappings only need Why can't #0 page consist of more than one block/mapping? I didn't get what's difference here. > to be block aligned. If the block size is smaller than the page size, - reiserfs tries to find last page's content, if the size of that page's valid content is smaller than threshold (at least less than one block), reiserfs can do the packing. - erofs' block size is 4kb which is the same as page size. Actually, for IOMAP_TAIL, there is no users who need to map more than one block into metadata, so I'm not sure that we should support that, or we just need to let code preparing for that to those potential users? Thanks, > a tail page can consist of more than one mapping. So > iomap_read_inline_data needs to be changed to only copy a single block > out of iomap->inline_data, and we need to adjust iomap_write_begin and > iomap_readpage_actor accordingly. > > * Functions iomap_read_inline_data, iomap_write_end_inline, and > iomap_dio_inline_actor currently all assume that we are operating on > page 0, and that iomap->inline_data points at the data at offset 0. > That's no longer the case for IOMAP_TAIL mappings. We need to look > only at the offset within the page / block there. > > * There are some asserts like the following int he code: > > BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > Those should probably be tightened to check for block boundaries > instead of page boundaries, i.e. 
something like: > > BUG_ON(size > i_blocksize(inode) - > offset_in_block(iomap->inline_data, inode)); > >> Signed-off-by: Chao Yu <yuchao0@huawei.com> >> --- >> fs/iomap.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/fs/iomap.c b/fs/iomap.c >> index 12654c2e78f8..d1c16b692d31 100644 >> --- a/fs/iomap.c >> +++ b/fs/iomap.c >> @@ -264,13 +264,12 @@ static void >> iomap_read_inline_data(struct inode *inode, struct page *page, >> struct iomap *iomap) >> { >> - size_t size = i_size_read(inode); >> + size_t size = iomap->length; > > Function iomap_read_inline_data fills the entire page here, not only > the iomap->length part, so this is wrong. But see my comment above > about changing iomap_read_inline_data to read a single block above as > well. > >> void *addr; >> >> if (PageUptodate(page)) >> return; >> >> - BUG_ON(page->index); >> BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); >> >> addr = kmap_atomic(page); >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, >> sector_t sector; >> >> if (iomap->type == IOMAP_INLINE) { >> - WARN_ON_ONCE(pos); >> iomap_read_inline_data(inode, page, iomap); >> return PAGE_SIZE; >> } > > Those last two changes look right to me. > > Thanks, > Andreas > . >
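For reference on the fiemap point being discussed: iomap_fiemap() derives the fiemap extent flags from the mapping type returned by ->iomap_begin, roughly as below. This is paraphrased from memory rather than quoted from fs/iomap.c, so treat it as a sketch.

/*
 * Rough sketch of how iomap extent types map to fiemap flags: an
 * IOMAP_INLINE extent is reported as FIEMAP_EXTENT_DATA_INLINE, which is
 * why reusing IOMAP_INLINE for the packed tail keeps iomap_fiemap()
 * output sensible without needing a new FIEMAP flag.
 */
static u32 example_iomap_type_to_fiemap_flags(u16 type)
{
	switch (type) {
	case IOMAP_DELALLOC:
		return FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
	case IOMAP_UNWRITTEN:
		return FIEMAP_EXTENT_UNWRITTEN;
	case IOMAP_INLINE:
		return FIEMAP_EXTENT_DATA_INLINE;
	default:
		return 0;
	}
}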
Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@huawei.com>: > Hi Andreas, > > Thanks for your review in advance. :) > > On 2019/7/10 7:32, Andreas Grünbacher wrote: > > Hi Chao, > > > > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>: > >> Some filesystems like erofs/reiserfs have the ability to pack tail > >> data into metadata, e.g.: > >> IOMAP_MAPPED [0, 8192] > >> IOMAP_INLINE [8192, 8200] > >> > >> However current IOMAP_INLINE type has assumption that: > >> - inline data should be locating at page #0. > >> - inline size should equal to .i_size > >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, > >> so this patch tries to relieve above limits to make IOMAP_INLINE more > >> generic to cover tail-packing case. > > > > this patch suffers from the following problems: > > > > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and > > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use > > iomap_fiemap on a filesystem with tail packing, so if we don't make > > the same distinction in iomap, iomap_fiemap will silently produce the > > wrong result. This means that IOMAP_TAIL should become a separate > > mapping type. > > I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are with > different semantics: > > - IOMAP_TAIL: > we may introduce this flag to cover tail-packing case, in where we merge > small-sized data in the tail of file into free space of its metadata. > - FIEMAP_EXTENT_DATA_TAIL: > quoted from Documentation/filesystems/fiemap.txt > " This will also set FIEMAP_EXTENT_NOT_ALIGNED > Data is packed into a block with data from other files." > It looks like this flag indicates that blocks from different files stores into > one common block. Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate flag in your scenario. In that case, we should be fine on the fiemap front. > I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL > should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And > then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case and > tail-packing case? Those definitions are user-visible. > > * IOMAP_INLINE mappings always start at offset 0 and span an entire > > page, so they are always page aligned. IOMAP_TAIL mappings only need > > Why can't #0 page consist of more than one block/mapping? I didn't get what's > difference here. Currently, iomap_write_begin will read the inline data into page #0 and mark that page up-to-date. There's a check for that in iomap_write_end_inline. If a page contains more than one mapping, we won't be able to mark the entire page up to date anymore; we'd need to track which parts of the page are up to date and which are not. Iomap supports two tracking mechanisms, buffer heads and iomap_page, and we'd need to implement that tracking code for each of those cases. > > to be block aligned. If the block size is smaller than the page size, > > - reiserfs tries to find last page's content, if the size of that page's valid > content is smaller than threshold (at least less than one block), reiserfs can > do the packing. > > - erofs' block size is 4kb which is the same as page size. > > > Actually, for IOMAP_TAIL, there is no users who need to map more than one block > into metadata, so I'm not sure that we should support that, or we just need to > let code preparing for that to those potential users? How about architectures with PAGE_SIZE > 4096? 
I'm assuming that erofs doesn't require block size == PAGE_SIZE? > Thanks, > > > a tail page can consist of more than one mapping. So > > iomap_read_inline_data needs to be changed to only copy a single block > > out of iomap->inline_data, and we need to adjust iomap_write_begin and > > iomap_readpage_actor accordingly. > > > > * Functions iomap_read_inline_data, iomap_write_end_inline, and > > iomap_dio_inline_actor currently all assume that we are operating on > > page 0, and that iomap->inline_data points at the data at offset 0. > > That's no longer the case for IOMAP_TAIL mappings. We need to look > > only at the offset within the page / block there. > > > > * There are some asserts like the following int he code: > > > > BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > > > Those should probably be tightened to check for block boundaries > > instead of page boundaries, i.e. something like: > > > > BUG_ON(size > i_blocksize(inode) - > > offset_in_block(iomap->inline_data, inode)); > > > >> Signed-off-by: Chao Yu <yuchao0@huawei.com> > >> --- > >> fs/iomap.c | 4 +--- > >> 1 file changed, 1 insertion(+), 3 deletions(-) > >> > >> diff --git a/fs/iomap.c b/fs/iomap.c > >> index 12654c2e78f8..d1c16b692d31 100644 > >> --- a/fs/iomap.c > >> +++ b/fs/iomap.c > >> @@ -264,13 +264,12 @@ static void > >> iomap_read_inline_data(struct inode *inode, struct page *page, > >> struct iomap *iomap) > >> { > >> - size_t size = i_size_read(inode); > >> + size_t size = iomap->length; > > > > Function iomap_read_inline_data fills the entire page here, not only > > the iomap->length part, so this is wrong. But see my comment above > > about changing iomap_read_inline_data to read a single block above as > > well. > > > >> void *addr; > >> > >> if (PageUptodate(page)) > >> return; > >> > >> - BUG_ON(page->index); > >> BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > >> > >> addr = kmap_atomic(page); > >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > >> sector_t sector; > >> > >> if (iomap->type == IOMAP_INLINE) { > >> - WARN_ON_ONCE(pos); > >> iomap_read_inline_data(inode, page, iomap); > >> return PAGE_SIZE; > >> } > > > > Those last two changes look right to me. > > > > Thanks, > > Andreas > > . > > At this point, can I ask how important this packing mechanism is to you? I can see a point in implementing inline files, which help because there tends to be a large number of very small files. But for not-so-small files, is saving an extra block really worth the trouble, especially given how cheap storage has become? Thanks, Andreas
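A sketch of the per-block uptodate tracking described above, assuming the code lives inside fs/iomap.c where the static helper iomap_set_range_uptodate() is visible (the same call appears in Andreas's draft patch below):

/*
 * Instead of SetPageUptodate() on the whole page, mark only the block
 * covered by the IOMAP_INLINE extent; the rest of the page stays tracked
 * by struct iomap_page (or by buffer heads on filesystems using them).
 * Sketch only; assumes block size <= PAGE_SIZE and a block-aligned
 * iomap->offset.
 */
static void example_mark_inline_block_uptodate(struct inode *inode,
					       struct page *page,
					       struct iomap *iomap)
{
	unsigned int poff = offset_in_page(iomap->offset);

	iomap_set_range_uptodate(page, poff, i_blocksize(inode));
}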
At 2019/7/11 5:50, Andreas Grünbacher wrote: > At this point, can I ask how important this packing mechanism is to > you? I can see a point in implementing inline files, which help > because there tends to be a large number of very small files. But for > not-so-small files, is saving an extra block really worth the trouble, > especially given how cheap storage has become? I would try to answer the above. I think there are several advantages to using tail-end packing inline: 1) It is more cache-friendly. Considering a file "A" accessed by the user now or recently, we tend to (1) get more data about "A" and (2) leave more data about "A" according to an LRU-like assumption, because it is more likely to be used than the metadata of some other file "X", especially for files whose tail-end block is relatively small (less than a threshold, e.g. < 100B, just for example); 2) for directory files, tail-end packing will boost traversal performance; 3) I think tail-end packing is a more generic form of inline; it saves I/Os in generic cases, not just storage space. As for "is saving an extra block really worth the trouble", I don't understand what exactly the trouble is... Thanks, Gao Xiang
Something along the lines of the attached, broken patch might work in the end. Andreas --- fs/buffer.c | 10 ++++-- fs/iomap.c | 74 +++++++++++++++++++++++++++++-------------- include/linux/iomap.h | 3 ++ 3 files changed, 61 insertions(+), 26 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index e450c55f6434..8d8668e377ab 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to) EXPORT_SYMBOL(page_zero_new_buffers); static void -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, - struct iomap *iomap) +iomap_to_bh(struct inode *inode, struct page *page, sector_t block, + struct buffer_head *bh, struct iomap *iomap) { loff_t offset = block << inode->i_blkbits; @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, inode->i_blkbits; set_buffer_mapped(bh); break; + case IOMAP_INLINE: + __iomap_read_inline_data(inode, page, iomap); + set_buffer_uptodate(bh); + break; } } @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len, if (err) break; } else { - iomap_to_bh(inode, block, bh, iomap); + iomap_to_bh(inode, page, block, bh, iomap); } if (buffer_new(bh)) { diff --git a/fs/iomap.c b/fs/iomap.c index 45aa58e837b5..61188e95def2 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -260,24 +260,47 @@ struct iomap_readpage_ctx { struct list_head *pages; }; -static void -iomap_read_inline_data(struct inode *inode, struct page *page, +#define offset_in_block(offset, inode) \ + ((unsigned long)(offset) & (i_blocksize(inode) - 1)) + +static bool +inline_data_within_block(struct inode *inode, struct iomap *iomap, + unsigned int size) +{ + unsigned int off = offset_in_block(iomap->inline_data, inode); + + return size <= i_blocksize(inode) - off; +} + +void +__iomap_read_inline_data(struct inode *inode, struct page *page, struct iomap *iomap) { - size_t size = i_size_read(inode); + size_t size = offset_in_block(i_size_read(inode), inode); + unsigned int poff = offset_in_page(iomap->offset); + unsigned int bsize = i_blocksize(inode); void *addr; if (PageUptodate(page)) return; - BUG_ON(page->index); - BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); + BUG_ON(!inline_data_within_block(inode, iomap, size)); addr = kmap_atomic(page); - memcpy(addr, iomap->inline_data, size); - memset(addr + size, 0, PAGE_SIZE - size); + memcpy(addr + poff, iomap->inline_data, size); + memset(addr + poff + size, 0, bsize - size); kunmap_atomic(addr); - SetPageUptodate(page); +} + +static void +iomap_read_inline_data(struct inode *inode, struct page *page, + struct iomap *iomap) +{ + unsigned int poff = offset_in_page(iomap->offset); + unsigned int bsize = i_blocksize(inode); + + __iomap_read_inline_data(inode, page, iomap); + iomap_set_range_uptodate(page, poff, bsize); } static loff_t @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, unsigned poff, plen; sector_t sector; - if (iomap->type == IOMAP_INLINE) { - WARN_ON_ONCE(pos); + if (iomap->type == IOMAP_INLINE) iomap_read_inline_data(inode, page, iomap); - return PAGE_SIZE; - } /* zero post-eof blocks as the page may be mapped */ iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen); @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, if (PageUptodate(page)) return 0; + if (iomap->type == IOMAP_INLINE) { + iomap_read_inline_data(inode, page, iomap); + return 0; + } + do { 
iomap_adjust_read_range(inode, iop, &block_start, block_end - block_start, &poff, &plen); @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, goto out_no_page; } - if (iomap->type == IOMAP_INLINE) - iomap_read_inline_data(inode, page, iomap); - else if (iomap->flags & IOMAP_F_BUFFER_HEAD) + if (iomap->flags & IOMAP_F_BUFFER_HEAD) status = __block_write_begin_int(page, pos, len, NULL, iomap); else status = __iomap_write_begin(inode, pos, len, page, iomap); @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page, { void *addr; - WARN_ON_ONCE(!PageUptodate(page)); - BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data)); + BUG_ON(!inline_data_within_block(inode, iomap, pos + copied)); addr = kmap_atomic(page); - memcpy(iomap->inline_data + pos, addr + pos, copied); + memcpy(iomap->inline_data + offset_in_block(pos, inode), + addr + offset_in_page(pos), copied); kunmap_atomic(addr); mark_inode_dirty(inode); @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, const struct iomap_ops *ops) { unsigned int blocksize = i_blocksize(inode); - unsigned int off = pos & (blocksize - 1); + unsigned int off = offset_in_block(pos, inode); /* Block boundary? Nothing to do */ if (!off) @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, struct iov_iter *iter = dio->submit.iter; size_t copied; - BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data)); + BUG_ON(!inline_data_within_block(inode, iomap, pos + length)); if (dio->flags & IOMAP_DIO_WRITE) { loff_t size = inode->i_size; if (pos > size) - memset(iomap->inline_data + size, 0, pos - size); - copied = copy_from_iter(iomap->inline_data + pos, length, iter); + memset(iomap->inline_data + + offset_in_block(size, inode), 0, pos - size); + copied = copy_from_iter(iomap->inline_data + + offset_in_block(pos, inode), + length, iter); if (copied) { if (pos + copied > size) i_size_write(inode, pos + copied); mark_inode_dirty(inode); } } else { - copied = copy_to_iter(iomap->inline_data + pos, length, iter); + copied = copy_to_iter(iomap->inline_data + + offset_in_block(pos, inode), + length, iter); } dio->size += copied; return copied; diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 2103b94cb1bf..a8a60dd2fdc0 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page) return NULL; } +void __iomap_read_inline_data(struct inode *inode, struct page *page, + struct iomap *iomap); + ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, const struct iomap_ops *ops); int iomap_readpage(struct page *page, const struct iomap_ops *ops);
On Thu, Jul 11, 2019 at 07:42:20AM +0800, Gao Xiang wrote: > > At 2019/7/11 ??????5:50, Andreas Gr??nbacher Wrote: > > At this point, can I ask how important this packing mechanism is to > > you? I can see a point in implementing inline files, which help > > because there tends to be a large number of very small files. But for > > not-so-small files, is saving an extra block really worth the trouble, > > especially given how cheap storage has become? > > I would try to answer the above. I think there are several advantages by > using tail-end packing inline: > 1) It is more cache-friendly. Considering a file "A" accessed by user > now or recently, we > ?????? tend to (1) get more data about "A" (2) leave more data about "A" > according to LRU-like assumption > ?????? because it is more likely to be used than the metadata of some other > files "X", especially for files whose > ?????? tail-end block is relatively small enough (less than a threshold, > e.g. < 100B just for example); > > 2) for directories files, tail-end packing will boost up those traversal > performance; > > 3) I think tail-end packing is a more generic inline, it saves I/Os for > generic cases not just to > ?????? save the storage space; > > "is saving an extra block really worth the trouble" I dont understand > what exact the trouble is... "the trouble" is adding code complexity and additional things to test. I'm not sure you really understood Andreas' question. He's saying that he understands the performance and space gain from packing short files (eg files less than 100 bytes). But how many files are there between 4096 and 4196 bytes in size, let alone between 8192 and 8292, 12384 and 12484 ... Is optimising for _those_ files worth it?
On 2019/7/11 21:06, Matthew Wilcox wrote: > On Thu, Jul 11, 2019 at 07:42:20AM +0800, Gao Xiang wrote: >> >> At 2019/7/11 ??????5:50, Andreas Gr??nbacher Wrote: >>> At this point, can I ask how important this packing mechanism is to >>> you? I can see a point in implementing inline files, which help >>> because there tends to be a large number of very small files. But for >>> not-so-small files, is saving an extra block really worth the trouble, >>> especially given how cheap storage has become? >> >> I would try to answer the above. I think there are several advantages by >> using tail-end packing inline: >> 1) It is more cache-friendly. Considering a file "A" accessed by user >> now or recently, we >> ?????? tend to (1) get more data about "A" (2) leave more data about "A" >> according to LRU-like assumption >> ?????? because it is more likely to be used than the metadata of some other >> files "X", especially for files whose >> ?????? tail-end block is relatively small enough (less than a threshold, >> e.g. < 100B just for example); >> >> 2) for directories files, tail-end packing will boost up those traversal >> performance; >> >> 3) I think tail-end packing is a more generic inline, it saves I/Os for >> generic cases not just to >> ?????? save the storage space; >> >> "is saving an extra block really worth the trouble" I dont understand >> what exact the trouble is... > > "the trouble" is adding code complexity and additional things to test. > > I'm not sure you really understood Andreas' question. He's saying that he > understands the performance and space gain from packing short files > (eg files less than 100 bytes). But how many files are there between > 4096 and 4196 bytes in size, let alone between 8192 and 8292, 12384 and > 12484 ... > > Is optimising for _those_ files worth it? Hi Willy, Thanks for your kindly explanation.. I get it :) I try to express my thoughts in the following aspects... 1) In my thought, I personally think Chao's first patch which adds an additional type could be better for now, maybe we can reduce duplicate code based on that patch even further. What EROFS needs is only a read-only tail-end packing, I think for write we actually need to rethink more carefully (but it doesn't mean complex I think, but I don't do research on this.. I have to be silent...) and maybe we could leave it until a really fs user switching to iomap and mix INLINE and TAIL at that time... 2) EROFS actually has an unfinished feature which supports tail-end packing compresssed data, which means decompressed data could not be so small... and I know that is another matter... So to direct answer the question is that it depends on the userdata and user. For EROFS, tail-end packing inline is easy to implement, and it's a per-file optional feature (not mandatory) and the threshold (< 100B) is not a hardcoded limit as well, which is configured by mkfs users and only help mkfs decide whether the file should enable it or not. it should be useful for all directories at least, and I think it is more cache-friendly for regular files as well so a large range of files configured by users (not just < 100B) can be benefited from this... Sorry about my English... Thanks, Gao Xiang >
On 2019-7-11 5:50, Andreas Grünbacher wrote: > Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@huawei.com>: >> Hi Andreas, >> >> Thanks for your review in advance. :) >> >> On 2019/7/10 7:32, Andreas Grünbacher wrote: >>> Hi Chao, >>> >>> Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>: >>>> Some filesystems like erofs/reiserfs have the ability to pack tail >>>> data into metadata, e.g.: >>>> IOMAP_MAPPED [0, 8192] >>>> IOMAP_INLINE [8192, 8200] >>>> >>>> However current IOMAP_INLINE type has assumption that: >>>> - inline data should be locating at page #0. >>>> - inline size should equal to .i_size >>>> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, >>>> so this patch tries to relieve above limits to make IOMAP_INLINE more >>>> generic to cover tail-packing case. >>> >>> this patch suffers from the following problems: >>> >>> * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and >>> FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use >>> iomap_fiemap on a filesystem with tail packing, so if we don't make >>> the same distinction in iomap, iomap_fiemap will silently produce the >>> wrong result. This means that IOMAP_TAIL should become a separate >>> mapping type. >> >> I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are with >> different semantics: >> >> - IOMAP_TAIL: >> we may introduce this flag to cover tail-packing case, in where we merge >> small-sized data in the tail of file into free space of its metadata. >> - FIEMAP_EXTENT_DATA_TAIL: >> quoted from Documentation/filesystems/fiemap.txt >> " This will also set FIEMAP_EXTENT_NOT_ALIGNED >> Data is packed into a block with data from other files." >> It looks like this flag indicates that blocks from different files stores into >> one common block. > > Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate > flag in your scenario. In that case, we should be fine on the fiemap > front. Yup. > >> I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL >> should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And >> then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case and >> tail-packing case? > > Those definitions are user-visible. > >>> * IOMAP_INLINE mappings always start at offset 0 and span an entire >>> page, so they are always page aligned. IOMAP_TAIL mappings only need >> >> Why can't #0 page consist of more than one block/mapping? I didn't get what's >> difference here. > > Currently, iomap_write_begin will read the inline data into page #0 > and mark that page up-to-date. There's a check for that in > iomap_write_end_inline. If a page contains more than one mapping, we > won't be able to mark the entire page up to date anymore; we'd need to > track which parts of the page are up to date and which are not. Iomap > supports two tracking mechanisms, buffer heads and iomap_page, and > we'd need to implement that tracking code for each of those cases. Okay, I can understand now, so the reason here is the limitation (BUG_ON, WARM_ON in iomap_inline_xxx functions) makes inline page should only contain one mapping, then to generalize it, we should consider the unaligned case in between page size and block size. > >>> to be block aligned. 
If the block size is smaller than the page size, >> >> - reiserfs tries to find last page's content, if the size of that page's valid >> content is smaller than threshold (at least less than one block), reiserfs can >> do the packing. >> >> - erofs' block size is 4kb which is the same as page size. >> >> >> Actually, for IOMAP_TAIL, there is no users who need to map more than one block >> into metadata, so I'm not sure that we should support that, or we just need to >> let code preparing for that to those potential users? > > How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs We haven't considered the PAGE_SIZE > 4096 case yet. > doesn't require block size == PAGE_SIZE? At least now erofs block size is 4KB by default. > >> Thanks, >> >>> a tail page can consist of more than one mapping. So >>> iomap_read_inline_data needs to be changed to only copy a single block >>> out of iomap->inline_data, and we need to adjust iomap_write_begin and >>> iomap_readpage_actor accordingly. >>> >>> * Functions iomap_read_inline_data, iomap_write_end_inline, and >>> iomap_dio_inline_actor currently all assume that we are operating on >>> page 0, and that iomap->inline_data points at the data at offset 0. >>> That's no longer the case for IOMAP_TAIL mappings. We need to look >>> only at the offset within the page / block there. >>> >>> * There are some asserts like the following int he code: >>> >>> BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); >>> >>> Those should probably be tightened to check for block boundaries >>> instead of page boundaries, i.e. something like: >>> >>> BUG_ON(size > i_blocksize(inode) - >>> offset_in_block(iomap->inline_data, inode)); >>> >>>> Signed-off-by: Chao Yu <yuchao0@huawei.com> >>>> --- >>>> fs/iomap.c | 4 +--- >>>> 1 file changed, 1 insertion(+), 3 deletions(-) >>>> >>>> diff --git a/fs/iomap.c b/fs/iomap.c >>>> index 12654c2e78f8..d1c16b692d31 100644 >>>> --- a/fs/iomap.c >>>> +++ b/fs/iomap.c >>>> @@ -264,13 +264,12 @@ static void >>>> iomap_read_inline_data(struct inode *inode, struct page *page, >>>> struct iomap *iomap) >>>> { >>>> - size_t size = i_size_read(inode); >>>> + size_t size = iomap->length; >>> >>> Function iomap_read_inline_data fills the entire page here, not only >>> the iomap->length part, so this is wrong. But see my comment above >>> about changing iomap_read_inline_data to read a single block above as >>> well. >>> >>>> void *addr; >>>> >>>> if (PageUptodate(page)) >>>> return; >>>> >>>> - BUG_ON(page->index); >>>> BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); >>>> >>>> addr = kmap_atomic(page); >>>> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, >>>> sector_t sector; >>>> >>>> if (iomap->type == IOMAP_INLINE) { >>>> - WARN_ON_ONCE(pos); >>>> iomap_read_inline_data(inode, page, iomap); >>>> return PAGE_SIZE; >>>> } >>> >>> Those last two changes look right to me. >>> >>> Thanks, >>> Andreas >>> . >>> > > At this point, can I ask how important this packing mechanism is to > you? I can see a point in implementing inline files, which help > because there tends to be a large number of very small files. But for > not-so-small files, is saving an extra block really worth the trouble, > especially given how cheap storage has become? - Erofs is a readonly filesystem, we don't need to consider/design write path for tail-pack feature, including writeback and inline conversion cases. So code complex didn't trouble us. 
- The files from real user's scenario are always unaligned to page size or block size, so it can expect that part of files can be packed its tail data; and erofs is used in consumer-electronics now (cell phone), there is no such large size storage, we think that tail-packing can be one of compression/compaction solutions in erofs to save storage space as much as possible. Thanks, > > Thanks, > Andreas >
On 2019/7/11 20:28, Andreas Gruenbacher wrote: > Something along the lines of the attached, broken patch might work in > the end. > > Andreas > > --- > fs/buffer.c | 10 ++++-- > fs/iomap.c | 74 +++++++++++++++++++++++++++++-------------- > include/linux/iomap.h | 3 ++ > 3 files changed, 61 insertions(+), 26 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index e450c55f6434..8d8668e377ab 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to) > EXPORT_SYMBOL(page_zero_new_buffers); > > static void > -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, > - struct iomap *iomap) > +iomap_to_bh(struct inode *inode, struct page *page, sector_t block, > + struct buffer_head *bh, struct iomap *iomap) > { > loff_t offset = block << inode->i_blkbits; > > @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, > inode->i_blkbits; > set_buffer_mapped(bh); > break; > + case IOMAP_INLINE: > + __iomap_read_inline_data(inode, page, iomap); > + set_buffer_uptodate(bh); > + break; > } > } > > @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len, > if (err) > break; > } else { > - iomap_to_bh(inode, block, bh, iomap); > + iomap_to_bh(inode, page, block, bh, iomap); > } > > if (buffer_new(bh)) { > diff --git a/fs/iomap.c b/fs/iomap.c > index 45aa58e837b5..61188e95def2 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -260,24 +260,47 @@ struct iomap_readpage_ctx { > struct list_head *pages; > }; > > -static void > -iomap_read_inline_data(struct inode *inode, struct page *page, > +#define offset_in_block(offset, inode) \ > + ((unsigned long)(offset) & (i_blocksize(inode) - 1)) > + > +static bool > +inline_data_within_block(struct inode *inode, struct iomap *iomap, > + unsigned int size) > +{ > + unsigned int off = offset_in_block(iomap->inline_data, inode); > + > + return size <= i_blocksize(inode) - off; > +} > + > +void > +__iomap_read_inline_data(struct inode *inode, struct page *page, > struct iomap *iomap) > { > - size_t size = i_size_read(inode); > + size_t size = offset_in_block(i_size_read(inode), inode); > + unsigned int poff = offset_in_page(iomap->offset); > + unsigned int bsize = i_blocksize(inode); > void *addr; > > if (PageUptodate(page)) > return; > > - BUG_ON(page->index); > - BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > + BUG_ON(!inline_data_within_block(inode, iomap, size)); > > addr = kmap_atomic(page); > - memcpy(addr, iomap->inline_data, size); > - memset(addr + size, 0, PAGE_SIZE - size); > + memcpy(addr + poff, iomap->inline_data, size); > + memset(addr + poff + size, 0, bsize - size); > kunmap_atomic(addr); > - SetPageUptodate(page); > +} > + > +static void > +iomap_read_inline_data(struct inode *inode, struct page *page, > + struct iomap *iomap) > +{ > + unsigned int poff = offset_in_page(iomap->offset); > + unsigned int bsize = i_blocksize(inode); > + > + __iomap_read_inline_data(inode, page, iomap); > + iomap_set_range_uptodate(page, poff, bsize); > } > > static loff_t > @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > unsigned poff, plen; > sector_t sector; > > - if (iomap->type == IOMAP_INLINE) { > - WARN_ON_ONCE(pos); > + if (iomap->type == IOMAP_INLINE) > iomap_read_inline_data(inode, page, iomap); > - return PAGE_SIZE; Hi Andreas, Thanks for your patch. 
In my erofs test case, filled inline data will be zeroed out due to we fallback to following flow: if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) { zero_user(page, poff, plen); Should we return before this condition check? Thanks, > - } > > /* zero post-eof blocks as the page may be mapped */ > iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen); > @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, > if (PageUptodate(page)) > return 0; > > + if (iomap->type == IOMAP_INLINE) { > + iomap_read_inline_data(inode, page, iomap); > + return 0; > + } > + > do { > iomap_adjust_read_range(inode, iop, &block_start, > block_end - block_start, &poff, &plen); > @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, > goto out_no_page; > } > > - if (iomap->type == IOMAP_INLINE) > - iomap_read_inline_data(inode, page, iomap); > - else if (iomap->flags & IOMAP_F_BUFFER_HEAD) > + if (iomap->flags & IOMAP_F_BUFFER_HEAD) > status = __block_write_begin_int(page, pos, len, NULL, iomap); > else > status = __iomap_write_begin(inode, pos, len, page, iomap); > @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page, > { > void *addr; > > - WARN_ON_ONCE(!PageUptodate(page)); > - BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data)); > + BUG_ON(!inline_data_within_block(inode, iomap, pos + copied)); > > addr = kmap_atomic(page); > - memcpy(iomap->inline_data + pos, addr + pos, copied); > + memcpy(iomap->inline_data + offset_in_block(pos, inode), > + addr + offset_in_page(pos), copied); > kunmap_atomic(addr); > > mark_inode_dirty(inode); > @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, > const struct iomap_ops *ops) > { > unsigned int blocksize = i_blocksize(inode); > - unsigned int off = pos & (blocksize - 1); > + unsigned int off = offset_in_block(pos, inode); > > /* Block boundary? 
Nothing to do */ > if (!off) > @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, > struct iov_iter *iter = dio->submit.iter; > size_t copied; > > - BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data)); > + BUG_ON(!inline_data_within_block(inode, iomap, pos + length)); > > if (dio->flags & IOMAP_DIO_WRITE) { > loff_t size = inode->i_size; > > if (pos > size) > - memset(iomap->inline_data + size, 0, pos - size); > - copied = copy_from_iter(iomap->inline_data + pos, length, iter); > + memset(iomap->inline_data + > + offset_in_block(size, inode), 0, pos - size); > + copied = copy_from_iter(iomap->inline_data + > + offset_in_block(pos, inode), > + length, iter); > if (copied) { > if (pos + copied > size) > i_size_write(inode, pos + copied); > mark_inode_dirty(inode); > } > } else { > - copied = copy_to_iter(iomap->inline_data + pos, length, iter); > + copied = copy_to_iter(iomap->inline_data + > + offset_in_block(pos, inode), > + length, iter); > } > dio->size += copied; > return copied; > diff --git a/include/linux/iomap.h b/include/linux/iomap.h > index 2103b94cb1bf..a8a60dd2fdc0 100644 > --- a/include/linux/iomap.h > +++ b/include/linux/iomap.h > @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page) > return NULL; > } > > +void __iomap_read_inline_data(struct inode *inode, struct page *page, > + struct iomap *iomap); > + > ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, > const struct iomap_ops *ops); > int iomap_readpage(struct page *page, const struct iomap_ops *ops); >
On Fri, 12 Jul 2019 at 11:31, Chao Yu <yuchao0@huawei.com> wrote: > On 2019/7/11 20:28, Andreas Gruenbacher wrote: > > Something along the lines of the attached, broken patch might work in > > the end. > > > > Andreas > > > > --- > > fs/buffer.c | 10 ++++-- > > fs/iomap.c | 74 +++++++++++++++++++++++++++++-------------- > > include/linux/iomap.h | 3 ++ > > 3 files changed, 61 insertions(+), 26 deletions(-) > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > index e450c55f6434..8d8668e377ab 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to) > > EXPORT_SYMBOL(page_zero_new_buffers); > > > > static void > > -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, > > - struct iomap *iomap) > > +iomap_to_bh(struct inode *inode, struct page *page, sector_t block, > > + struct buffer_head *bh, struct iomap *iomap) > > { > > loff_t offset = block << inode->i_blkbits; > > > > @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, > > inode->i_blkbits; > > set_buffer_mapped(bh); > > break; > > + case IOMAP_INLINE: > > + __iomap_read_inline_data(inode, page, iomap); > > + set_buffer_uptodate(bh); > > + break; > > } > > } > > > > @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len, > > if (err) > > break; > > } else { > > - iomap_to_bh(inode, block, bh, iomap); > > + iomap_to_bh(inode, page, block, bh, iomap); > > } > > > > if (buffer_new(bh)) { > > diff --git a/fs/iomap.c b/fs/iomap.c > > index 45aa58e837b5..61188e95def2 100644 > > --- a/fs/iomap.c > > +++ b/fs/iomap.c > > @@ -260,24 +260,47 @@ struct iomap_readpage_ctx { > > struct list_head *pages; > > }; > > > > -static void > > -iomap_read_inline_data(struct inode *inode, struct page *page, > > +#define offset_in_block(offset, inode) \ > > + ((unsigned long)(offset) & (i_blocksize(inode) - 1)) > > + > > +static bool > > +inline_data_within_block(struct inode *inode, struct iomap *iomap, > > + unsigned int size) > > +{ > > + unsigned int off = offset_in_block(iomap->inline_data, inode); > > + > > + return size <= i_blocksize(inode) - off; > > +} > > + > > +void > > +__iomap_read_inline_data(struct inode *inode, struct page *page, > > struct iomap *iomap) > > { > > - size_t size = i_size_read(inode); > > + size_t size = offset_in_block(i_size_read(inode), inode); > > + unsigned int poff = offset_in_page(iomap->offset); > > + unsigned int bsize = i_blocksize(inode); > > void *addr; > > > > if (PageUptodate(page)) > > return; > > > > - BUG_ON(page->index); > > - BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > + BUG_ON(!inline_data_within_block(inode, iomap, size)); > > > > addr = kmap_atomic(page); > > - memcpy(addr, iomap->inline_data, size); > > - memset(addr + size, 0, PAGE_SIZE - size); > > + memcpy(addr + poff, iomap->inline_data, size); > > + memset(addr + poff + size, 0, bsize - size); > > kunmap_atomic(addr); > > - SetPageUptodate(page); > > +} > > + > > +static void > > +iomap_read_inline_data(struct inode *inode, struct page *page, > > + struct iomap *iomap) > > +{ > > + unsigned int poff = offset_in_page(iomap->offset); > > + unsigned int bsize = i_blocksize(inode); > > + > > + __iomap_read_inline_data(inode, page, iomap); > > + iomap_set_range_uptodate(page, poff, bsize); > > } > > > > static loff_t > > @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > > unsigned poff, 
plen; > > sector_t sector; > > > > - if (iomap->type == IOMAP_INLINE) { > > - WARN_ON_ONCE(pos); > > + if (iomap->type == IOMAP_INLINE) > > iomap_read_inline_data(inode, page, iomap); > > - return PAGE_SIZE; > > Hi Andreas, > > Thanks for your patch. > > In my erofs test case, filled inline data will be zeroed out due to we fallback > to following flow: > > if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) { > zero_user(page, poff, plen); > > Should we return before this condition check? Yes, probably by returning i_blocksize(inode) after iomap_read_inline_data, but that alone isn't enough to make the patch work completely. This really needs a review from Christoph and careful testing of all the code paths. Andreas > Thanks, > > > - } > > > > /* zero post-eof blocks as the page may be mapped */ > > iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen); > > @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, > > if (PageUptodate(page)) > > return 0; > > > > + if (iomap->type == IOMAP_INLINE) { > > + iomap_read_inline_data(inode, page, iomap); > > + return 0; > > + } > > + > > do { > > iomap_adjust_read_range(inode, iop, &block_start, > > block_end - block_start, &poff, &plen); > > @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, > > goto out_no_page; > > } > > > > - if (iomap->type == IOMAP_INLINE) > > - iomap_read_inline_data(inode, page, iomap); > > - else if (iomap->flags & IOMAP_F_BUFFER_HEAD) > > + if (iomap->flags & IOMAP_F_BUFFER_HEAD) > > status = __block_write_begin_int(page, pos, len, NULL, iomap); > > else > > status = __iomap_write_begin(inode, pos, len, page, iomap); > > @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page, > > { > > void *addr; > > > > - WARN_ON_ONCE(!PageUptodate(page)); > > - BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > + BUG_ON(!inline_data_within_block(inode, iomap, pos + copied)); > > > > addr = kmap_atomic(page); > > - memcpy(iomap->inline_data + pos, addr + pos, copied); > > + memcpy(iomap->inline_data + offset_in_block(pos, inode), > > + addr + offset_in_page(pos), copied); > > kunmap_atomic(addr); > > > > mark_inode_dirty(inode); > > @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, > > const struct iomap_ops *ops) > > { > > unsigned int blocksize = i_blocksize(inode); > > - unsigned int off = pos & (blocksize - 1); > > + unsigned int off = offset_in_block(pos, inode); > > > > /* Block boundary? 
Nothing to do */ > > if (!off) > > @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, > > struct iov_iter *iter = dio->submit.iter; > > size_t copied; > > > > - BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > + BUG_ON(!inline_data_within_block(inode, iomap, pos + length)); > > > > if (dio->flags & IOMAP_DIO_WRITE) { > > loff_t size = inode->i_size; > > > > if (pos > size) > > - memset(iomap->inline_data + size, 0, pos - size); > > - copied = copy_from_iter(iomap->inline_data + pos, length, iter); > > + memset(iomap->inline_data + > > + offset_in_block(size, inode), 0, pos - size); > > + copied = copy_from_iter(iomap->inline_data + > > + offset_in_block(pos, inode), > > + length, iter); > > if (copied) { > > if (pos + copied > size) > > i_size_write(inode, pos + copied); > > mark_inode_dirty(inode); > > } > > } else { > > - copied = copy_to_iter(iomap->inline_data + pos, length, iter); > > + copied = copy_to_iter(iomap->inline_data + > > + offset_in_block(pos, inode), > > + length, iter); > > } > > dio->size += copied; > > return copied; > > diff --git a/include/linux/iomap.h b/include/linux/iomap.h > > index 2103b94cb1bf..a8a60dd2fdc0 100644 > > --- a/include/linux/iomap.h > > +++ b/include/linux/iomap.h > > @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page) > > return NULL; > > } > > > > +void __iomap_read_inline_data(struct inode *inode, struct page *page, > > + struct iomap *iomap); > > + > > ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, > > const struct iomap_ops *ops); > > int iomap_readpage(struct page *page, const struct iomap_ops *ops); > >
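A sketch of the fix being suggested here, under the assumption that Andreas's draft patch above is applied: the IOMAP_INLINE branch of iomap_readpage_actor() should report the inline block as consumed instead of falling through to the "zero post-eof blocks" path that wiped the data in Chao's test. Untested, for illustration only.

/*
 * Sketch (untested): copy the inline block into the page, then return the
 * number of bytes consumed so the actor stops here rather than continuing
 * into the zeroing path for !IOMAP_MAPPED extents.
 */
static loff_t example_readpage_actor_inline(struct inode *inode,
					    struct page *page,
					    struct iomap *iomap)
{
	iomap_read_inline_data(inode, page, iomap);
	return i_blocksize(inode);	/* exactly one block consumed */
}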
On 2019/7/12 19:54, Andreas Gruenbacher wrote:
> On Fri, 12 Jul 2019 at 11:31, Chao Yu <yuchao0@huawei.com> wrote:
>> On 2019/7/11 20:28, Andreas Gruenbacher wrote:
>>> Something along the lines of the attached, broken patch might work in
>>> the end.
>>>
>>> Andreas
>>>
>>> ---
>>>  fs/buffer.c           | 10 ++++--
>>>  fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
>>>  include/linux/iomap.h |  3 ++
>>>  3 files changed, 61 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/fs/buffer.c b/fs/buffer.c
>>> index e450c55f6434..8d8668e377ab 100644
>>> --- a/fs/buffer.c
>>> +++ b/fs/buffer.c
>>> @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
>>>  EXPORT_SYMBOL(page_zero_new_buffers);
>>>
>>>  static void
>>> -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>> -		struct iomap *iomap)
>>> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
>>> +		struct buffer_head *bh, struct iomap *iomap)
>>>  {
>>>  	loff_t offset = block << inode->i_blkbits;
>>>
>>> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>>  			inode->i_blkbits;
>>>  		set_buffer_mapped(bh);
>>>  		break;
>>> +	case IOMAP_INLINE:
>>> +		__iomap_read_inline_data(inode, page, iomap);
>>> +		set_buffer_uptodate(bh);
>>> +		break;
>>>  	}
>>>  }
>>>
>>> @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>>>  			if (err)
>>>  				break;
>>>  		} else {
>>> -			iomap_to_bh(inode, block, bh, iomap);
>>> +			iomap_to_bh(inode, page, block, bh, iomap);
>>>  		}
>>>
>>>  		if (buffer_new(bh)) {
>>> diff --git a/fs/iomap.c b/fs/iomap.c
>>> index 45aa58e837b5..61188e95def2 100644
>>> --- a/fs/iomap.c
>>> +++ b/fs/iomap.c
>>> @@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
>>>  	struct list_head *pages;
>>>  };
>>>
>>> -static void
>>> -iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +#define offset_in_block(offset, inode) \
>>> +	((unsigned long)(offset) & (i_blocksize(inode) - 1))
>>> +
>>> +static bool
>>> +inline_data_within_block(struct inode *inode, struct iomap *iomap,
>>> +		unsigned int size)
>>> +{
>>> +	unsigned int off = offset_in_block(iomap->inline_data, inode);
>>> +
>>> +	return size <= i_blocksize(inode) - off;
>>> +}
>>> +
>>> +void
>>> +__iomap_read_inline_data(struct inode *inode, struct page *page,
>>>  		struct iomap *iomap)
>>>  {
>>> -	size_t size = i_size_read(inode);
>>> +	size_t size = offset_in_block(i_size_read(inode), inode);
>>> +	unsigned int poff = offset_in_page(iomap->offset);
>>> +	unsigned int bsize = i_blocksize(inode);
>>>  	void *addr;
>>>
>>>  	if (PageUptodate(page))
>>>  		return;
>>>
>>> -	BUG_ON(page->index);
>>> -	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>> +	BUG_ON(!inline_data_within_block(inode, iomap, size));
>>>
>>>  	addr = kmap_atomic(page);
>>> -	memcpy(addr, iomap->inline_data, size);
>>> -	memset(addr + size, 0, PAGE_SIZE - size);
>>> +	memcpy(addr + poff, iomap->inline_data, size);
>>> +	memset(addr + poff + size, 0, bsize - size);
>>>  	kunmap_atomic(addr);
>>> -	SetPageUptodate(page);
>>> +}
>>> +
>>> +static void
>>> +iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +		struct iomap *iomap)
>>> +{
>>> +	unsigned int poff = offset_in_page(iomap->offset);
>>> +	unsigned int bsize = i_blocksize(inode);
>>> +
>>> +	__iomap_read_inline_data(inode, page, iomap);
>>> +	iomap_set_range_uptodate(page, poff, bsize);
>>>  }
>>>
>>>  static loff_t
>>> @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>>>  	unsigned poff, plen;
>>>  	sector_t sector;
>>>
>>> -	if (iomap->type == IOMAP_INLINE) {
>>> -		WARN_ON_ONCE(pos);
>>> +	if (iomap->type == IOMAP_INLINE)
>>>  		iomap_read_inline_data(inode, page, iomap);
>>> -		return PAGE_SIZE;
>>
>> Hi Andreas,
>>
>> Thanks for your patch.
>>
>> In my erofs test case, filled inline data will be zeroed out due to we fallback
>> to following flow:
>>
>> if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
>> 	zero_user(page, poff, plen);
>>
>> Should we return before this condition check?
>
> Yes, probably by returning i_blocksize(inode) after
> iomap_read_inline_data, but that alone isn't enough to make the patch

Could you please resend this diff as a new patch with above fix?

> work completely. This really needs a review from Christoph and careful
> testing of all the code paths.

Later, I can do the test on read path.

Thanks,

>
> Andreas
>
>> Thanks,
>>
>>> -	}
>>>
>>>  	/* zero post-eof blocks as the page may be mapped */
>>>  	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
>>> @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>>>  	if (PageUptodate(page))
>>>  		return 0;
>>>
>>> +	if (iomap->type == IOMAP_INLINE) {
>>> +		iomap_read_inline_data(inode, page, iomap);
>>> +		return 0;
>>> +	}
>>> +
>>>  	do {
>>>  		iomap_adjust_read_range(inode, iop, &block_start,
>>>  				block_end - block_start, &poff, &plen);
>>> @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>>>  		goto out_no_page;
>>>  	}
>>>
>>> -	if (iomap->type == IOMAP_INLINE)
>>> -		iomap_read_inline_data(inode, page, iomap);
>>> -	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>>> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>>>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>>>  	else
>>>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
>>> @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>>>  {
>>>  	void *addr;
>>>
>>> -	WARN_ON_ONCE(!PageUptodate(page));
>>> -	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>> +	BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
>>>
>>>  	addr = kmap_atomic(page);
>>> -	memcpy(iomap->inline_data + pos, addr + pos, copied);
>>> +	memcpy(iomap->inline_data + offset_in_block(pos, inode),
>>> +	       addr + offset_in_page(pos), copied);
>>>  	kunmap_atomic(addr);
>>>
>>>  	mark_inode_dirty(inode);
>>> @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>>>  		const struct iomap_ops *ops)
>>>  {
>>>  	unsigned int blocksize = i_blocksize(inode);
>>> -	unsigned int off = pos & (blocksize - 1);
>>> +	unsigned int off = offset_in_block(pos, inode);
>>>
>>>  	/* Block boundary? Nothing to do */
>>>  	if (!off)
>>> @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>>>  	struct iov_iter *iter = dio->submit.iter;
>>>  	size_t copied;
>>>
>>> -	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>> +	BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
>>>
>>>  	if (dio->flags & IOMAP_DIO_WRITE) {
>>>  		loff_t size = inode->i_size;
>>>
>>>  		if (pos > size)
>>> -			memset(iomap->inline_data + size, 0, pos - size);
>>> -		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
>>> +			memset(iomap->inline_data +
>>> +			       offset_in_block(size, inode), 0, pos - size);
>>> +		copied = copy_from_iter(iomap->inline_data +
>>> +					offset_in_block(pos, inode),
>>> +					length, iter);
>>>  		if (copied) {
>>>  			if (pos + copied > size)
>>>  				i_size_write(inode, pos + copied);
>>>  			mark_inode_dirty(inode);
>>>  		}
>>>  	} else {
>>> -		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
>>> +		copied = copy_to_iter(iomap->inline_data +
>>> +				      offset_in_block(pos, inode),
>>> +				      length, iter);
>>>  	}
>>>  	dio->size += copied;
>>>  	return copied;
>>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>>> index 2103b94cb1bf..a8a60dd2fdc0 100644
>>> --- a/include/linux/iomap.h
>>> +++ b/include/linux/iomap.h
>>> @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>>>  	return NULL;
>>>  }
>>>
>>> +void __iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +		struct iomap *iomap);
>>> +
>>>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>>>  		const struct iomap_ops *ops);
>>>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>>>
> .
>
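The fix being discussed above amounts to making the IOMAP_INLINE branch of iomap_readpage_actor() return early again, so the actor no longer falls through to the zero_user() path that wipes the just-filled tail data. A minimal sketch of that idea on top of Andreas' draft (following his suggestion to return i_blocksize(inode); not a tested patch) would be:

	if (iomap->type == IOMAP_INLINE) {
		iomap_read_inline_data(inode, page, iomap);
		return i_blocksize(inode);
	}

Whether the returned length needs further clamping against the remaining request is one of the details the requested review and testing would still have to settle.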
Ping, Andreas. :P

On 2019/7/15 17:26, Chao Yu wrote:
> [...]
> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
> +		struct buffer_head *bh, struct iomap *iomap)
>  {
>  	loff_t offset = block << inode->i_blkbits;
>
> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>  			inode->i_blkbits;
>  		set_buffer_mapped(bh);
>  		break;
> +	case IOMAP_INLINE:
> +		__iomap_read_inline_data(inode, page, iomap);
> +		set_buffer_uptodate(bh);
> +		break;

I have to say I hate pushing this into the legacy buffer_head code. My hope was that we could get rid of this code rather than adding to it. The other issue is that this now calls iomap functions from buffer.c, which I'd also really avoid.

That being said until the tail packing fs (erofs?) actually uses buffer_heads we should not need this hunk, and I don't think erofs should have any reason to use buffer_heads.

> +#define offset_in_block(offset, inode) \
> +	((unsigned long)(offset) & (i_blocksize(inode) - 1))

Make this an inline function, please. I think we also have a few other places that could make use of this helper, maybe it might even go into fs.h.

Otherwise this looks sensible to me.
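A static inline version of the helper, along the lines Christoph asks for, might look like the sketch below. Where it should live (fs/iomap.c or include/linux/fs.h) is left open in the thread, and callers that currently pass the iomap->inline_data pointer to the macro would need an explicit cast or a separate pointer variant:

	static inline unsigned int offset_in_block(loff_t pos, struct inode *inode)
	{
		return pos & (i_blocksize(inode) - 1);
	}

This mirrors the spirit of offset_in_page(), but masks with the inode's block size instead of PAGE_SIZE.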
On Fri, Jul 12, 2019 at 01:54:07PM +0200, Andreas Gruenbacher wrote:
> Yes, probably by returning i_blocksize(inode) after
> iomap_read_inline_data, but that alone isn't enough to make the patch
> work completely. This really needs a review from Christoph and careful
> testing of all the code paths.

Well, let's start with the testing. Can we get a coherent series that includes the erofs bits?
On 2019/7/18 20:31, Christoph Hellwig wrote:
> That being said until the tail packing fs (erofs?) actually uses
> buffer_heads we should not need this hunk, and I don't think erofs
> should have any reason to use buffer_heads.

Yes, erofs will not use buffer_head to support sub-page block size in the long term; it is too heavy and there is no need for it.

Thanks,
Gao Xiang
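For context, the "erofs bits" would essentially be an ->iomap_begin implementation that reports the layout from the commit message as two extents. The following is a purely illustrative sketch for a hypothetical tail-packing filesystem, not erofs code: fs_data_addr() and fs_tail_addr() are made-up helpers, and error handling and the beyond-EOF case are omitted.

	static int fs_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
			unsigned flags, struct iomap *iomap)
	{
		loff_t tail_start = round_down(i_size_read(inode), i_blocksize(inode));

		if (pos < tail_start) {
			/* block-aligned data, e.g. IOMAP_MAPPED [0, 8192] */
			iomap->type = IOMAP_MAPPED;
			iomap->offset = pos;
			iomap->length = tail_start - pos;
			iomap->addr = fs_data_addr(inode, pos);	/* made-up helper */
			iomap->bdev = inode->i_sb->s_bdev;
		} else {
			/* tail packed into metadata, e.g. IOMAP_INLINE [8192, 8200] */
			iomap->type = IOMAP_INLINE;
			iomap->offset = tail_start;
			iomap->length = i_size_read(inode) - tail_start;
			iomap->addr = IOMAP_NULL_ADDR;
			iomap->inline_data = fs_tail_addr(inode);	/* made-up helper */
		}
		return 0;
	}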