Message ID | 1432329245-5844-27-git-send-email-tj@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri 22-05-15 17:13:40, Tejun Heo wrote: > Currently, balance_dirty_pages() always work on bdi->wb. This patch > updates it to work on the wb (bdi_writeback) matching memcg and blkcg > of the current task as that's what the inode is being dirtied against. > > balance_dirty_pages_ratelimited() now pins the current wb and passes > it to balance_dirty_pages(). > > As no filesystem has FS_CGROUP_WRITEBACK yet, this doesn't lead to > visible behavior differences. ... > void balance_dirty_pages_ratelimited(struct address_space *mapping) > { > - struct backing_dev_info *bdi = inode_to_bdi(mapping->host); > - struct bdi_writeback *wb = &bdi->wb; > + struct inode *inode = mapping->host; > + struct backing_dev_info *bdi = inode_to_bdi(inode); > + struct bdi_writeback *wb = NULL; > int ratelimit; > int *p; > > if (!bdi_cap_account_dirty(bdi)) > return; > > + if (inode_cgwb_enabled(inode)) > + wb = wb_get_create_current(bdi, GFP_KERNEL); > + if (!wb) > + wb = &bdi->wb; > + So this effectively adds a radix tree lookup (of wb belonging to memcg) for every set_page_dirty() call. That seems relatively costly to me. And all that just to check wb->dirty_exceeded. Cannot we just use inode_to_wb() instead? I understand results may be different if multiple memcgs share an inode and that's the reason why you use wb_get_create_current(), right? But for dirty_exceeded check it may be good enough? Honza > ratelimit = current->nr_dirtied_pause; > if (wb->dirty_exceeded) > ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10)); > @@ -1616,7 +1622,9 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping) > preempt_enable(); > > if (unlikely(current->nr_dirtied >= ratelimit)) > - balance_dirty_pages(mapping, current->nr_dirtied); > + balance_dirty_pages(mapping, wb, current->nr_dirtied); > + > + wb_put(wb); > } > EXPORT_SYMBOL(balance_dirty_pages_ratelimited); > > -- > 2.4.0 >
Hello, Jan. On Tue, Jun 30, 2015 at 04:31:00PM +0200, Jan Kara wrote: ... > > + if (inode_cgwb_enabled(inode)) > > + wb = wb_get_create_current(bdi, GFP_KERNEL); > > + if (!wb) > > + wb = &bdi->wb; > > + > > So this effectively adds a radix tree lookup (of wb belonging to memcg) for > every set_page_dirty() call. That seems relatively costly to me. And all Hmmm... idk, radix tree lookups should be cheap especially when shallow and set_page_dirty(). It's a glorified array indexing. If not, we should really be improving radix tree implementation. That said, > that just to check wb->dirty_exceeded. Cannot we just use inode_to_wb() > instead? I understand results may be different if multiple memcgs share an > inode and that's the reason why you use wb_get_create_current(), right? > But for dirty_exceeded check it may be good enough? Yeah, that probably should work. I'll think more about it. Thanks.
diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 4d0a9da..e31dea9 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1337,6 +1337,7 @@ static inline void wb_dirty_limits(struct bdi_writeback *wb, * perform some writeout. */ static void balance_dirty_pages(struct address_space *mapping, + struct bdi_writeback *wb, unsigned long pages_dirtied) { unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ @@ -1352,8 +1353,7 @@ static void balance_dirty_pages(struct address_space *mapping, unsigned long task_ratelimit; unsigned long dirty_ratelimit; unsigned long pos_ratio; - struct backing_dev_info *bdi = inode_to_bdi(mapping->host); - struct bdi_writeback *wb = &bdi->wb; + struct backing_dev_info *bdi = wb->bdi; bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT; unsigned long start_time = jiffies; @@ -1575,14 +1575,20 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0; */ void balance_dirty_pages_ratelimited(struct address_space *mapping) { - struct backing_dev_info *bdi = inode_to_bdi(mapping->host); - struct bdi_writeback *wb = &bdi->wb; + struct inode *inode = mapping->host; + struct backing_dev_info *bdi = inode_to_bdi(inode); + struct bdi_writeback *wb = NULL; int ratelimit; int *p; if (!bdi_cap_account_dirty(bdi)) return; + if (inode_cgwb_enabled(inode)) + wb = wb_get_create_current(bdi, GFP_KERNEL); + if (!wb) + wb = &bdi->wb; + ratelimit = current->nr_dirtied_pause; if (wb->dirty_exceeded) ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10)); @@ -1616,7 +1622,9 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping) preempt_enable(); if (unlikely(current->nr_dirtied >= ratelimit)) - balance_dirty_pages(mapping, current->nr_dirtied); + balance_dirty_pages(mapping, wb, current->nr_dirtied); + + wb_put(wb); } EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
Currently, balance_dirty_pages() always work on bdi->wb. This patch updates it to work on the wb (bdi_writeback) matching memcg and blkcg of the current task as that's what the inode is being dirtied against. balance_dirty_pages_ratelimited() now pins the current wb and passes it to balance_dirty_pages(). As no filesystem has FS_CGROUP_WRITEBACK yet, this doesn't lead to visible behavior differences. v2: Updated for per-inode wb association. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> --- mm/page-writeback.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)