Message ID | 20240828033224.146584-1-haifeng.xu@shopee.com (mailing list archive) |
---|---|
State | New, archived |
Series | buffer: Associate the meta bio with blkg from buffer page |
On Wed, Aug 28, 2024 at 11:32:24AM +0800, Haifeng Xu wrote:
> +	} else if (buffer_meta(bh)) {
> +		struct folio *folio;
> +		struct cgroup_subsys_state *memcg_css, *blkcg_css;
> +
> +		folio = page_folio(bh->b_page);
> +		memcg_css = mem_cgroup_css_from_folio(folio);
> +		if (cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
> +		    cgroup_subsys_on_dfl(io_cgrp_subsys)) {
> +			blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys);
> +			bio_associate_blkg_from_css(bio, blkcg_css);
> +		}

I'll leave it to people more familiar with cgroups to decide if this is the right thing to do, but if it is, the code here should go into a helper so that other metadata bio submitters can reuse it. That helper should also have a good comment explaining it.
Hi Haifeng, kernel test robot noticed the following build errors: [auto build test ERROR on brauner-vfs/vfs.all] [also build test ERROR on linus/master v6.11-rc5 next-20240829] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Haifeng-Xu/buffer-Associate-the-meta-bio-with-blkg-from-buffer-page/20240828-113409 base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all patch link: https://lore.kernel.org/r/20240828033224.146584-1-haifeng.xu%40shopee.com patch subject: [PATCH] buffer: Associate the meta bio with blkg from buffer page config: alpha-defconfig (https://download.01.org/0day-ci/archive/20240830/202408300007.m9sHEOXo-lkp@intel.com/config) compiler: alpha-linux-gcc (GCC) 13.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240830/202408300007.m9sHEOXo-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408300007.m9sHEOXo-lkp@intel.com/ All errors (new ones prefixed by >>): fs/buffer.c: In function 'submit_bh_wbc': fs/buffer.c:2826:29: error: implicit declaration of function 'mem_cgroup_css_from_folio'; did you mean 'mem_cgroup_from_obj'? 
[-Werror=implicit-function-declaration] 2826 | memcg_css = mem_cgroup_css_from_folio(folio); | ^~~~~~~~~~~~~~~~~~~~~~~~~ | mem_cgroup_from_obj fs/buffer.c:2826:27: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion] 2826 | memcg_css = mem_cgroup_css_from_folio(folio); | ^ >> fs/buffer.c:2827:21: error: implicit declaration of function 'cgroup_subsys_on_dfl' [-Werror=implicit-function-declaration] 2827 | if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && | ^~~~~~~~~~~~~~~~~~~~ >> fs/buffer.c:2827:42: error: 'memory_cgrp_subsys' undeclared (first use in this function) 2827 | if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && | ^~~~~~~~~~~~~~~~~~ fs/buffer.c:2827:42: note: each undeclared identifier is reported only once for each function it appears in fs/buffer.c:2828:42: error: 'io_cgrp_subsys' undeclared (first use in this function) 2828 | cgroup_subsys_on_dfl(io_cgrp_subsys)) { | ^~~~~~~~~~~~~~ >> fs/buffer.c:2829:37: error: implicit declaration of function 'cgroup_e_css'; did you mean 'cgroup_exit'? 
[-Werror=implicit-function-declaration] 2829 | blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys); | ^~~~~~~~~~~~ | cgroup_exit >> fs/buffer.c:2829:59: error: invalid use of undefined type 'struct cgroup_subsys_state' 2829 | blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys); | ^~ cc1: some warnings being treated as errors vim +/cgroup_subsys_on_dfl +2827 fs/buffer.c 2778 2779 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh, 2780 enum rw_hint write_hint, 2781 struct writeback_control *wbc) 2782 { 2783 const enum req_op op = opf & REQ_OP_MASK; 2784 struct bio *bio; 2785 2786 BUG_ON(!buffer_locked(bh)); 2787 BUG_ON(!buffer_mapped(bh)); 2788 BUG_ON(!bh->b_end_io); 2789 BUG_ON(buffer_delay(bh)); 2790 BUG_ON(buffer_unwritten(bh)); 2791 2792 /* 2793 * Only clear out a write error when rewriting 2794 */ 2795 if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE)) 2796 clear_buffer_write_io_error(bh); 2797 2798 if (buffer_meta(bh)) 2799 opf |= REQ_META; 2800 if (buffer_prio(bh)) 2801 opf |= REQ_PRIO; 2802 2803 bio = bio_alloc(bh->b_bdev, 1, opf, GFP_NOIO); 2804 2805 fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO); 2806 2807 bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9); 2808 bio->bi_write_hint = write_hint; 2809 2810 __bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh)); 2811 2812 bio->bi_end_io = end_bio_bh_io_sync; 2813 bio->bi_private = bh; 2814 2815 /* Take care of bh's that straddle the end of the device */ 2816 guard_bio_eod(bio); 2817 2818 if (wbc) { 2819 wbc_init_bio(wbc, bio); 2820 wbc_account_cgroup_owner(wbc, bh->b_page, bh->b_size); 2821 } else if (buffer_meta(bh)) { 2822 struct folio *folio; 2823 struct cgroup_subsys_state *memcg_css, *blkcg_css; 2824 2825 folio = page_folio(bh->b_page); 2826 memcg_css = mem_cgroup_css_from_folio(folio); > 2827 if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && 2828 cgroup_subsys_on_dfl(io_cgrp_subsys)) { > 2829 blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys); 2830 
bio_associate_blkg_from_css(bio, blkcg_css); 2831 } 2832 } 2833 2834 submit_bio(bio); 2835 } 2836
Hi Haifeng, kernel test robot noticed the following build errors: [auto build test ERROR on brauner-vfs/vfs.all] [also build test ERROR on linus/master v6.11-rc5 next-20240829] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Haifeng-Xu/buffer-Associate-the-meta-bio-with-blkg-from-buffer-page/20240828-113409 base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all patch link: https://lore.kernel.org/r/20240828033224.146584-1-haifeng.xu%40shopee.com patch subject: [PATCH] buffer: Associate the meta bio with blkg from buffer page config: x86_64-buildonly-randconfig-002-20240829 (https://download.01.org/0day-ci/archive/20240830/202408300119.UQ0zNU1f-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240830/202408300119.UQ0zNU1f-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408300119.UQ0zNU1f-lkp@intel.com/ All error/warnings (new ones prefixed by >>): fs/buffer.c: In function 'submit_bh_wbc': >> fs/buffer.c:2826:29: error: implicit declaration of function 'mem_cgroup_css_from_folio'; did you mean 'mem_cgroup_from_obj'? 
[-Werror=implicit-function-declaration] 2826 | memcg_css = mem_cgroup_css_from_folio(folio); | ^~~~~~~~~~~~~~~~~~~~~~~~~ | mem_cgroup_from_obj >> fs/buffer.c:2826:27: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion] 2826 | memcg_css = mem_cgroup_css_from_folio(folio); | ^ In file included from include/linux/array_size.h:5, from include/linux/kernel.h:16, from fs/buffer.c:22: >> fs/buffer.c:2827:42: error: 'memory_cgrp_subsys_on_dfl_key' undeclared (first use in this function); did you mean 'misc_cgrp_subsys_on_dfl_key'? 2827 | if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && | ^~~~~~~~~~~~~~~~~~ include/linux/compiler.h:19:53: note: in definition of macro 'likely_notrace' 19 | #define likely_notrace(x) __builtin_expect(!!(x), 1) | ^ include/linux/jump_label.h:511:56: note: in expansion of macro 'static_key_enabled' 511 | #define static_branch_likely(x) likely_notrace(static_key_enabled(&(x)->key)) | ^~~~~~~~~~~~~~~~~~ include/linux/cgroup.h:95:9: note: in expansion of macro 'static_branch_likely' 95 | static_branch_likely(&ss ## _on_dfl_key) | ^~~~~~~~~~~~~~~~~~~~ fs/buffer.c:2827:21: note: in expansion of macro 'cgroup_subsys_on_dfl' 2827 | if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && | ^~~~~~~~~~~~~~~~~~~~ fs/buffer.c:2827:42: note: each undeclared identifier is reported only once for each function it appears in 2827 | if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && | ^~~~~~~~~~~~~~~~~~ include/linux/compiler.h:19:53: note: in definition of macro 'likely_notrace' 19 | #define likely_notrace(x) __builtin_expect(!!(x), 1) | ^ include/linux/jump_label.h:511:56: note: in expansion of macro 'static_key_enabled' 511 | #define static_branch_likely(x) likely_notrace(static_key_enabled(&(x)->key)) | ^~~~~~~~~~~~~~~~~~ include/linux/cgroup.h:95:9: note: in expansion of macro 'static_branch_likely' 95 | static_branch_likely(&ss ## _on_dfl_key) | ^~~~~~~~~~~~~~~~~~~~ fs/buffer.c:2827:21: note: in 
expansion of macro 'cgroup_subsys_on_dfl' 2827 | if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && | ^~~~~~~~~~~~~~~~~~~~ >> fs/buffer.c:2828:42: error: 'io_cgrp_subsys_on_dfl_key' undeclared (first use in this function); did you mean 'misc_cgrp_subsys_on_dfl_key'? 2828 | cgroup_subsys_on_dfl(io_cgrp_subsys)) { | ^~~~~~~~~~~~~~ include/linux/compiler.h:19:53: note: in definition of macro 'likely_notrace' 19 | #define likely_notrace(x) __builtin_expect(!!(x), 1) | ^ include/linux/jump_label.h:511:56: note: in expansion of macro 'static_key_enabled' 511 | #define static_branch_likely(x) likely_notrace(static_key_enabled(&(x)->key)) | ^~~~~~~~~~~~~~~~~~ include/linux/cgroup.h:95:9: note: in expansion of macro 'static_branch_likely' 95 | static_branch_likely(&ss ## _on_dfl_key) | ^~~~~~~~~~~~~~~~~~~~ fs/buffer.c:2828:21: note: in expansion of macro 'cgroup_subsys_on_dfl' 2828 | cgroup_subsys_on_dfl(io_cgrp_subsys)) { | ^~~~~~~~~~~~~~~~~~~~ >> fs/buffer.c:2829:70: error: 'io_cgrp_subsys' undeclared (first use in this function); did you mean 'misc_cgrp_subsys'? 
2829 | blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys); | ^~~~~~~~~~~~~~ | misc_cgrp_subsys cc1: some warnings being treated as errors vim +2826 fs/buffer.c 2778 2779 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh, 2780 enum rw_hint write_hint, 2781 struct writeback_control *wbc) 2782 { 2783 const enum req_op op = opf & REQ_OP_MASK; 2784 struct bio *bio; 2785 2786 BUG_ON(!buffer_locked(bh)); 2787 BUG_ON(!buffer_mapped(bh)); 2788 BUG_ON(!bh->b_end_io); 2789 BUG_ON(buffer_delay(bh)); 2790 BUG_ON(buffer_unwritten(bh)); 2791 2792 /* 2793 * Only clear out a write error when rewriting 2794 */ 2795 if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE)) 2796 clear_buffer_write_io_error(bh); 2797 2798 if (buffer_meta(bh)) 2799 opf |= REQ_META; 2800 if (buffer_prio(bh)) 2801 opf |= REQ_PRIO; 2802 2803 bio = bio_alloc(bh->b_bdev, 1, opf, GFP_NOIO); 2804 2805 fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO); 2806 2807 bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9); 2808 bio->bi_write_hint = write_hint; 2809 2810 __bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh)); 2811 2812 bio->bi_end_io = end_bio_bh_io_sync; 2813 bio->bi_private = bh; 2814 2815 /* Take care of bh's that straddle the end of the device */ 2816 guard_bio_eod(bio); 2817 2818 if (wbc) { 2819 wbc_init_bio(wbc, bio); 2820 wbc_account_cgroup_owner(wbc, bh->b_page, bh->b_size); 2821 } else if (buffer_meta(bh)) { 2822 struct folio *folio; 2823 struct cgroup_subsys_state *memcg_css, *blkcg_css; 2824 2825 folio = page_folio(bh->b_page); > 2826 memcg_css = mem_cgroup_css_from_folio(folio); > 2827 if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && > 2828 cgroup_subsys_on_dfl(io_cgrp_subsys)) { > 2829 blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys); 2830 bio_associate_blkg_from_css(bio, blkcg_css); 2831 } 2832 } 2833 2834 submit_bio(bio); 2835 } 2836
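Both build reports above appear to come from configurations in which the memory or io cgroup controllers are not built in, so the symbols used by the new branch are not declared. The usual kernel idiom for this is a helper with a real implementation under the config option and an empty stub otherwise, which would also match the request to move the logic into a commented helper. The following is only a userspace sketch of that idiom: `DEMO_CONFIG_CGROUPS`, `struct demo_bio`, `struct demo_bh` and `demo_bio_associate_meta_owner()` are hypothetical names invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical switch standing in for a kernel config option.
 * Uncomment to model a build with cgroup support enabled. */
/* #define DEMO_CONFIG_CGROUPS 1 */

/* Hypothetical, heavily simplified stand-ins for struct bio and
 * struct buffer_head; only the fields this sketch needs. */
struct demo_bio {
	const char *blkg_owner;   /* who the I/O is charged to */
};

struct demo_bh {
	const char *page_owner;   /* owner recorded on the buffer page */
	int is_meta;              /* models buffer_meta(bh) */
};

#ifdef DEMO_CONFIG_CGROUPS
/*
 * Charge metadata I/O to the owner of the buffer page rather than to
 * the submitting task. Non-metadata buffers are left untouched.
 */
static inline void demo_bio_associate_meta_owner(struct demo_bio *bio,
						 const struct demo_bh *bh)
{
	if (bh->is_meta)
		bio->blkg_owner = bh->page_owner;
}
#else
/* Stub: without cgroup support there is nothing to associate, and the
 * caller compiles without referencing any cgroup symbol. */
static inline void demo_bio_associate_meta_owner(struct demo_bio *bio,
						 const struct demo_bh *bh)
{
	(void)bio;
	(void)bh;
}
#endif
```

With the switch left undefined, a caller can invoke `demo_bio_associate_meta_owner()` unconditionally and the bio keeps its original owner; the caller itself needs no `#ifdef`.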
Hello, Haifeng.

On Wed, Aug 28, 2024 at 11:32:24AM +0800, Haifeng Xu wrote:
...
> The filesystem is ext4(ordered). The meta data can be written out by
> writeback, but if there are too many dirty pages, we had to do
> checkpoint to write out the meta data in current thread context.
>
> In this case, the blkg of thread1 has set io.max, so the j_checkpoint_mutex
> can't be released and many threads must wait for it. However, the blkg from
> buffer page didn't set any io policy. Therefore, for the meta buffer head,
> we can associate the bio with blkg from the buffer page instead of current
> thread context.
>
> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
> ---
>  fs/buffer.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e55ad471c530..a7889f258d0d 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2819,6 +2819,17 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
>  	if (wbc) {
>  		wbc_init_bio(wbc, bio);
>  		wbc_account_cgroup_owner(wbc, bh->b_page, bh->b_size);
> +	} else if (buffer_meta(bh)) {
> +		struct folio *folio;
> +		struct cgroup_subsys_state *memcg_css, *blkcg_css;
> +
> +		folio = page_folio(bh->b_page);
> +		memcg_css = mem_cgroup_css_from_folio(folio);
> +		if (cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
> +		    cgroup_subsys_on_dfl(io_cgrp_subsys)) {
> +			blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys);
> +			bio_associate_blkg_from_css(bio, blkcg_css);

I think the right way to do it is marking the bio with REQ_META and implement forced charging in blk-throtl similar to blk-iocost.

Thanks.
Hi, Tejun!

On 2024/08/31 3:37, Tejun Heo wrote:
> Hello, Haifeng.
>
> On Wed, Aug 28, 2024 at 11:32:24AM +0800, Haifeng Xu wrote:
> ...
>> The filesystem is ext4(ordered). The meta data can be written out by
>> writeback, but if there are too many dirty pages, we had to do
>> checkpoint to write out the meta data in current thread context.
>>
>> In this case, the blkg of thread1 has set io.max, so the j_checkpoint_mutex
>> can't be released and many threads must wait for it. However, the blkg from
>> buffer page didn't set any io policy. Therefore, for the meta buffer head,
>> we can associate the bio with blkg from the buffer page instead of current
>> thread context.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
>> ---
>>  fs/buffer.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index e55ad471c530..a7889f258d0d 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -2819,6 +2819,17 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
>>  	if (wbc) {
>>  		wbc_init_bio(wbc, bio);
>>  		wbc_account_cgroup_owner(wbc, bh->b_page, bh->b_size);
>> +	} else if (buffer_meta(bh)) {
>> +		struct folio *folio;
>> +		struct cgroup_subsys_state *memcg_css, *blkcg_css;
>> +
>> +		folio = page_folio(bh->b_page);
>> +		memcg_css = mem_cgroup_css_from_folio(folio);
>> +		if (cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
>> +		    cgroup_subsys_on_dfl(io_cgrp_subsys)) {
>> +			blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys);
>> +			bio_associate_blkg_from_css(bio, blkcg_css);
>
> I think the right way to do it is marking the bio with REQ_META and
> implement forced charging in blk-throtl similar to blk-iocost.

This is exactly what I did in the code I attached in the other thread; did you take a look?

https://lore.kernel.org/all/97fc38e6-a226-5e22-efc2-4405beb6d75b@huaweicloud.com/

Thanks,
Kuai

>
> Thanks.
>
Hello,

On Sat, Aug 31, 2024 at 02:11:08PM +0800, Yu Kuai wrote:
...
> > I think the right way to do it is marking the bio with REQ_META and
> > implement forced charging in blk-throtl similar to blk-iocost.
>
> This is the exact thing I did in the code I attached in the other
> thread, do you take a look?
>
> https://lore.kernel.org/all/97fc38e6-a226-5e22-efc2-4405beb6d75b@huaweicloud.com/

Sorry about missing it but yeah that *looks* like the right direction to be headed. Would you mind testing it and turning it into a proper patch?

Thanks.
Hi,

On 2024/08/31 16:03, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 31, 2024 at 02:11:08PM +0800, Yu Kuai wrote:
> ...
>>> I think the right way to do it is marking the bio with REQ_META and
>>> implement forced charging in blk-throtl similar to blk-iocost.
>>
>> This is the exact thing I did in the code I attached in the other
>> thread, do you take a look?
>>
>> https://lore.kernel.org/all/97fc38e6-a226-5e22-efc2-4405beb6d75b@huaweicloud.com/
>
> Sorry about missing it but yeah that *looks* like the right direction to be
> headed. Would you mind testing it and turning it into a proper patch?

Of course.

Thanks!
Kuai

>
> Thanks.
>
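The alternative discussed in the replies above, forced charging for bios marked REQ_META, can be pictured with a small userspace model. This is a minimal sketch of one possible reading of "forced charging" and is not the blk-throtl or blk-iocost implementation: a metadata bio that exceeds the group's remaining budget is dispatched anyway and the overage is recorded as debt, which is repaid out of the next refill, while ordinary bios over budget have to wait. All names (`DEMO_REQ_META`, `struct demo_throttle`, `demo_throttle_charge()`, `demo_throttle_refill()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag standing in for the kernel's REQ_META. */
#define DEMO_REQ_META (1u << 0)

struct demo_bio {
	uint32_t flags;
	uint64_t bytes;
};

struct demo_throttle {
	uint64_t budget;   /* bytes the group may still issue this window */
	uint64_t debt;     /* bytes already issued beyond the budget */
};

/*
 * Returns true if the bio may be dispatched now, false if it must wait.
 * Metadata bios are never made to wait: when the budget is too small,
 * the shortfall is added to the group's debt instead.
 */
static inline bool demo_throttle_charge(struct demo_throttle *tg,
					const struct demo_bio *bio)
{
	if (bio->bytes <= tg->budget) {
		tg->budget -= bio->bytes;
		return true;
	}
	if (bio->flags & DEMO_REQ_META) {
		tg->debt += bio->bytes - tg->budget;
		tg->budget = 0;
		return true;
	}
	return false;
}

/* Start a new window: repay outstanding debt before granting budget. */
static inline void demo_throttle_refill(struct demo_throttle *tg,
					uint64_t quota)
{
	uint64_t pay = tg->debt < quota ? tg->debt : quota;

	tg->debt -= pay;
	tg->budget = quota - pay;
}
```

In this model a task holding a shared lock such as j_checkpoint_mutex is not stalled by its own group's limit while writing metadata, yet the group still pays for that I/O in later windows, so the limit is not simply bypassed.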
diff --git a/fs/buffer.c b/fs/buffer.c
index e55ad471c530..a7889f258d0d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2819,6 +2819,17 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	if (wbc) {
 		wbc_init_bio(wbc, bio);
 		wbc_account_cgroup_owner(wbc, bh->b_page, bh->b_size);
+	} else if (buffer_meta(bh)) {
+		struct folio *folio;
+		struct cgroup_subsys_state *memcg_css, *blkcg_css;
+
+		folio = page_folio(bh->b_page);
+		memcg_css = mem_cgroup_css_from_folio(folio);
+		if (cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
+		    cgroup_subsys_on_dfl(io_cgrp_subsys)) {
+			blkcg_css = cgroup_e_css(memcg_css->cgroup, &io_cgrp_subsys);
+			bio_associate_blkg_from_css(bio, blkcg_css);
+		}
 	}
 
 	submit_bio(bio);
In our production environment, we found many tasks hung for a long time. Their call traces look like this:

thread 1:

PID: 189529 TASK: ffff92ab51e5c080 CPU: 34 COMMAND: "mc"
 [ffffa638db807800] __schedule at ffffffff83b19898
 [ffffa638db807888] schedule at ffffffff83b19e9e
 [ffffa638db8078a8] io_schedule at ffffffff83b1a316
 [ffffa638db8078c0] bit_wait_io at ffffffff83b1a751
 [ffffa638db8078d8] __wait_on_bit at ffffffff83b1a373
 [ffffa638db807918] out_of_line_wait_on_bit at ffffffff83b1a46d
 [ffffa638db807970] __wait_on_buffer at ffffffff831b9c64
 [ffffa638db807988] jbd2_log_do_checkpoint at ffffffff832b556e
 [ffffa638db8079e8] __jbd2_log_wait_for_space at ffffffff832b55dc
 [ffffa638db807a30] add_transaction_credits at ffffffff832af369
 [ffffa638db807a98] start_this_handle at ffffffff832af50f
 [ffffa638db807b20] jbd2__journal_start at ffffffff832afe1f
 [ffffa638db807b60] __ext4_journal_start_sb at ffffffff83241af3
 [ffffa638db807ba8] __ext4_new_inode at ffffffff83253be6
 [ffffa638db807c80] ext4_mkdir at ffffffff8327ec9e
 [ffffa638db807d10] vfs_mkdir at ffffffff83182a92
 [ffffa638db807d50] ovl_mkdir_real at ffffffffc0965c9f [overlay]
 [ffffa638db807d80] ovl_create_real at ffffffffc0965e8b [overlay]
 [ffffa638db807db8] ovl_create_or_link at ffffffffc09677cc [overlay]
 [ffffa638db807e10] ovl_create_object at ffffffffc0967a48 [overlay]
 [ffffa638db807e60] ovl_mkdir at ffffffffc0967ad3 [overlay]
 [ffffa638db807e70] vfs_mkdir at ffffffff83182a92
 [ffffa638db807eb0] do_mkdirat at ffffffff83184305
 [ffffa638db807f08] __x64_sys_mkdirat at ffffffff831843df
 [ffffa638db807f28] do_syscall_64 at ffffffff83b0bf1c
 [ffffa638db807f50] entry_SYSCALL_64_after_hwframe at ffffffff83c0007c

other threads:

PID: 21125 TASK: ffff929f5b9a0000 CPU: 44 COMMAND: "task_server"
 [ffffa638aff9b900] __schedule at ffffffff83b19898
 [ffffa638aff9b988] schedule at ffffffff83b19e9e
 [ffffa638aff9b9a8] schedule_preempt_disabled at ffffffff83b1a24e
 [ffffa638aff9b9b8] __mutex_lock at ffffffff83b1af28
 [ffffa638aff9ba38] __mutex_lock_slowpath at ffffffff83b1b1a3
 [ffffa638aff9ba48] mutex_lock at ffffffff83b1b1e2
 [ffffa638aff9ba60] mutex_lock_io at ffffffff83b1b210
 [ffffa638aff9ba80] __jbd2_log_wait_for_space at ffffffff832b563b
 [ffffa638aff9bac8] add_transaction_credits at ffffffff832af369
 [ffffa638aff9bb30] start_this_handle at ffffffff832af50f
 [ffffa638aff9bbb8] jbd2__journal_start at ffffffff832afe1f
 [ffffa638aff9bbf8] __ext4_journal_start_sb at ffffffff83241af3
 [ffffa638aff9bc40] ext4_dirty_inode at ffffffff83266d0a
 [ffffa638aff9bc60] __mark_inode_dirty at ffffffff831ab423
 [ffffa638aff9bca0] generic_update_time at ffffffff8319169d
 [ffffa638aff9bcb0] inode_update_time at ffffffff831916e5
 [ffffa638aff9bcc0] file_update_time at ffffffff83191b01
 [ffffa638aff9bd08] file_modified at ffffffff83191d47
 [ffffa638aff9bd20] ext4_write_checks at ffffffff8324e6e4
 [ffffa638aff9bd40] ext4_buffered_write_iter at ffffffff8324edfb
 [ffffa638aff9bd78] ext4_file_write_iter at ffffffff8324f553
 [ffffa638aff9bdf8] ext4_file_write_iter at ffffffff8324f505
 [ffffa638aff9be00] new_sync_write at ffffffff8316dfca
 [ffffa638aff9be90] vfs_write at ffffffff8316e975
 [ffffa638aff9bec8] ksys_write at ffffffff83170a97
 [ffffa638aff9bf08] __x64_sys_write at ffffffff83170b2a
 [ffffa638aff9bf18] do_syscall_64 at ffffffff83b0bf1c
 [ffffa638aff9bf38] asm_common_interrupt at ffffffff83c00cc8
 [ffffa638aff9bf50] entry_SYSCALL_64_after_hwframe at ffffffff83c0007c

The filesystem is ext4 (ordered mode). Metadata can be written out by writeback, but if there are too many dirty pages, we have to do a checkpoint to write out the metadata in the current thread context.

In this case, the blkg of thread 1 has io.max set, so its checkpoint I/O is throttled, j_checkpoint_mutex can't be released, and many other threads must wait for it. However, the blkg of the buffer page has no I/O policy set. Therefore, for a meta buffer head, we can associate the bio with the blkg from the buffer page instead of the current thread context.

Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
---
 fs/buffer.c | 11 +++++++++++
 1 file changed, 11 insertions(+)