Message ID | 20230707084027.91022-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: add a debug accounting for eb pages contiguousness |
On Fri, Jul 07, 2023 at 04:40:27PM +0800, Qu Wenruo wrote:
> !!! DO NOT MERGE !!!
>
> Although the current folio interface is not yet providing a way to
> allocate a range of contiguous pages,

What do you mean?  Both folios and compound pages will allocate larger
than PAGE_SIZE objects if you ask for it.
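For illustration, the two allocation paths referred to here might look
like this (a minimal sketch; the GFP flags and the order value are
illustrative only):

        /* An order-2 folio: four physically contiguous pages. */
        struct folio *folio = folio_alloc(GFP_KERNEL, 2);

        /* The equivalent using a compound page. */
        struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);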
On 2023/7/7 19:39, Christoph Hellwig wrote:
> On Fri, Jul 07, 2023 at 04:40:27PM +0800, Qu Wenruo wrote:
>> !!! DO NOT MERGE !!!
>>
>> Although the current folio interface is not yet providing a way to
>> allocate a range of contiguous pages,
>
> What do you mean?  Both folios and compound pages will allocate larger
> than PAGE_SIZE objects if you ask for it.
>
I mean a multi-page/folio version of find_or_create_page().

We have filemap_get_folios(), but we can not use it to create new
pages, while find_or_create_page()/pagecache_get_page() can only get
one page.

What I want here is something like find_or_create_folios() or
pagecache_get_pages().

Thanks,
Qu
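Neither helper exists in mainline; purely for illustration, the
wished-for interface might look something like this (the name and
parameters are hypothetical):

        /*
         * Hypothetical multi-folio analogue of find_or_create_page():
         * look up the folios covering [index, index + nr_pages),
         * creating and locking any that are missing, and return them
         * in @fbatch.  Not a real kernel API.
         */
        unsigned int find_or_create_folios(struct address_space *mapping,
                                           pgoff_t index,
                                           unsigned int nr_pages, gfp_t gfp,
                                           struct folio_batch *fbatch);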
On Fri, Jul 07, 2023 at 07:45:41PM +0800, Qu Wenruo wrote:
> I mean a multi-page/folio version of find_or_create_page().
__filemap_get_folio with FGP_CREAT.
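The suggested call would look roughly like this (the flags shown are
illustrative, and the exact signature and return convention of
__filemap_get_folio() vary across kernel versions):

        folio = __filemap_get_folio(mapping, index,
                                    FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
                                    GFP_NOFS);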
On 2023/7/7 19:51, Christoph Hellwig wrote:
> On Fri, Jul 07, 2023 at 07:45:41PM +0800, Qu Wenruo wrote:
>> I mean a multi-page/folio version of find_or_create_page().
>
> __filemap_get_folio with FGP_CREAT.
>
Unfortunately __filemap_get_folio() is still only creating folios of
order 0, under the no_page label:

        folio = filemap_alloc_folio(gfp, 0);

So in that case, it's really no different from find_or_create_page()
for new pages/folios.

Thanks,
Qu
On Sun, Jul 09, 2023 at 12:44:33PM +0800, Qu Wenruo wrote:
> Unfortunately __filemap_get_folio() is still only creating folios of
> order 0, under the no_page label:
>
> 	folio = filemap_alloc_folio(gfp, 0);
>
> So in that case, it's really no different from find_or_create_page()
> for new pages/folios.

You're right, willy's patches to pass the order haven't actually made
it to mainline yet, and I assumed they had.  So you'll have to wait
another merge window.
On 2023/7/10 13:28, Christoph Hellwig wrote:
> On Sun, Jul 09, 2023 at 12:44:33PM +0800, Qu Wenruo wrote:
>> Unfortunately __filemap_get_folio() is still only creating folios of
>> order 0, under the no_page label:
>>
>> 	folio = filemap_alloc_folio(gfp, 0);
>>
>> So in that case, it's really no different from find_or_create_page()
>> for new pages/folios.
>
> You're right, willy's patches to pass the order haven't actually made
> it to mainline yet, and I assumed they had.  So you'll have to wait
> another merge window.

I'm considering another solution like this:

        folio = filemap_alloc_folio();
        do_something_with_folio();
        ret = filemap_add_folio();

which is not that complex and is pretty much used everywhere in mm
code.

Although I still need the old per-page allocation as a fallback, which
may make it more complex than expected.

Thanks,
Qu
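For illustration, that pattern with the per-page fallback might look
like this (filemap_alloc_folio() and filemap_add_folio() are real
page-cache helpers; eb_attach_folio()/eb_attach_page() are hypothetical
placeholders and the error handling is heavily simplified):

/*
 * Sketch only: back a metadata range with one higher-order folio,
 * falling back to order-0 pages if the large allocation or the
 * page-cache insertion fails.
 */
static int eb_alloc_backing(struct address_space *mapping, pgoff_t index,
                            unsigned int order, gfp_t gfp)
{
        struct folio *folio = filemap_alloc_folio(gfp, order);
        unsigned int i;

        if (folio) {
                /* Insert the new folio into the page cache. */
                if (!filemap_add_folio(mapping, folio, index, gfp))
                        return eb_attach_folio(folio); /* hypothetical */
                /* e.g. -EEXIST: drop the folio, take the slow path. */
                folio_put(folio);
        }

        /* Fallback: the old one-page-at-a-time allocation. */
        for (i = 0; i < (1U << order); i++) {
                struct page *page;

                page = find_or_create_page(mapping, index + i, gfp);
                if (!page)
                        return -ENOMEM;
                eb_attach_page(page, i); /* hypothetical */
        }
        return 0;
}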
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7513388b0567..4ba51e1235c4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2698,6 +2698,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&fs_info->allocated_roots);
 	INIT_LIST_HEAD(&fs_info->allocated_ebs);
 	spin_lock_init(&fs_info->eb_leak_lock);
+	atomic64_set(&fs_info->eb_allocated, 0);
+	atomic64_set(&fs_info->eb_pages_contig, 0);
 #endif
 	extent_map_tree_init(&fs_info->mapping_tree);
 	btrfs_init_block_rsv(&fs_info->global_block_rsv,
@@ -4321,6 +4323,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	/* Cancel or finish ongoing discard work */
 	btrfs_discard_cleanup(fs_info);
 
+	btrfs_info(fs_info, "eb contig=%llu allocated=%llu",
+		   atomic64_read(&fs_info->eb_pages_contig),
+		   atomic64_read(&fs_info->eb_allocated));
 	if (!sb_rdonly(fs_info->sb)) {
 		/*
 		 * The cleaner kthread is stopped, so do one final pass over
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a91d5ad27984..995205441937 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3530,6 +3530,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	struct page *p;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
 	u64 lockdep_owner = owner_root;
+	bool pages_contig = true;
 	int uptodate = 1;
 	int ret;
 
@@ -3624,6 +3625,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		eb->pages[i] = p;
 		if (!btrfs_page_test_uptodate(fs_info, p, eb->start, eb->len))
 			uptodate = 0;
+		if (i && eb->pages[i - 1] + 1 != p)
+			pages_contig = false;
 
 		/*
 		 * We can't unlock the pages just yet since the extent buffer
@@ -3657,7 +3660,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	/* add one reference for the tree */
 	check_buffer_tree_ref(eb);
 	set_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags);
-
+	eb->pages_contig = pages_contig;
+	atomic64_inc(&fs_info->eb_allocated);
+	if (pages_contig)
+		atomic64_inc(&fs_info->eb_pages_contig);
 	/*
 	 * Now it's safe to unlock the pages because any calls to
 	 * btree_release_folio will correctly detect that a page belongs to a
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c5fae3a7d911..98b596bbac2e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -86,6 +86,9 @@ struct extent_buffer {
 	/* >= 0 if eb belongs to a log tree, -1 otherwise */
 	s8 log_index;
 
+	/* Set if the pages are contiguous. */
+	bool pages_contig;
+
 	struct rw_semaphore lock;
 
 	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..8cdd2af98bd6 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -802,6 +802,8 @@ struct btrfs_fs_info {
 	spinlock_t eb_leak_lock;
 	struct list_head allocated_ebs;
+	atomic64_t eb_allocated;
+	atomic64_t eb_pages_contig;
 #endif
 };
!!! DO NOT MERGE !!!

Although the current folio interface is not yet providing a way to
allocate a range of contiguous pages, I'm always curious whether we
can get rid of the cross-page handling.

So this patch is an attempt to gather some benchmark numbers on extent
buffer page contiguousness.

The patch handles this by:

- Every allocated extent buffer (except dummy ones) increases
  fs_info->eb_allocated

- If the pages of the extent buffer are contiguous, increase
  fs_info->eb_pages_contig

- On close_ctree(), output both numbers

The VM has the following setup for the benchmark:

- Mem: 4G
- CPU: 8 vCores
- Disk: 500G SATA SSD directly passed to the VM
- Fs: 2x10G LV, 16K nodesize, 4K sectorsize

I tested two situations:

- Metadata-heavy workload
  The workload is "fsstress -p 20 -n 10000 -w -d $mnt".
  The result is 566140 / 854009 = 66.3%, which is better than I
  expected.

- Data-heavy workload
  The workload is fio with a 12G file (3x system memory).
  The result is 1355 / 2358 = 57.5%.

Considering it's mostly beyond 50%, I believe it is worth looking into
reducing the cross-page boundary overhead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 5 +++++
 fs/btrfs/extent_io.c | 8 +++++++-
 fs/btrfs/extent_io.h | 3 +++
 fs/btrfs/fs.h        | 2 ++
 4 files changed, 17 insertions(+), 1 deletion(-)
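For illustration, the btrfs_info() call added to close_ctree() would
emit a dmesg line like the following at unmount (the device name is
hypothetical; the counts are taken from the metadata-heavy run above):

        BTRFS info (device dm-0): eb contig=566140 allocated=854009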