diff mbox

[26/51] writeback: let balance_dirty_pages() work on the matching cgroup bdi_writeback

Message ID 1432329245-5844-27-git-send-email-tj@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Tejun Heo May 22, 2015, 9:13 p.m. UTC
Currently, balance_dirty_pages() always work on bdi->wb.  This patch
updates it to work on the wb (bdi_writeback) matching memcg and blkcg
of the current task as that's what the inode is being dirtied against.

balance_dirty_pages_ratelimited() now pins the current wb and passes
it to balance_dirty_pages().

As no filesystem has FS_CGROUP_WRITEBACK yet, this doesn't lead to
visible behavior differences.

v2: Updated for per-inode wb association.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
---
 mm/page-writeback.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

Comments

Jan Kara June 30, 2015, 2:31 p.m. UTC | #1
On Fri 22-05-15 17:13:40, Tejun Heo wrote:
> Currently, balance_dirty_pages() always work on bdi->wb.  This patch
> updates it to work on the wb (bdi_writeback) matching memcg and blkcg
> of the current task as that's what the inode is being dirtied against.
> 
> balance_dirty_pages_ratelimited() now pins the current wb and passes
> it to balance_dirty_pages().
> 
> As no filesystem has FS_CGROUP_WRITEBACK yet, this doesn't lead to
> visible behavior differences.
...
>  void balance_dirty_pages_ratelimited(struct address_space *mapping)
>  {
> -	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> -	struct bdi_writeback *wb = &bdi->wb;
> +	struct inode *inode = mapping->host;
> +	struct backing_dev_info *bdi = inode_to_bdi(inode);
> +	struct bdi_writeback *wb = NULL;
>  	int ratelimit;
>  	int *p;
>  
>  	if (!bdi_cap_account_dirty(bdi))
>  		return;
>  
> +	if (inode_cgwb_enabled(inode))
> +		wb = wb_get_create_current(bdi, GFP_KERNEL);
> +	if (!wb)
> +		wb = &bdi->wb;
> +

So this effectively adds a radix tree lookup (of wb belonging to memcg) for
every set_page_dirty() call. That seems relatively costly to me. And all
that just to check wb->dirty_exceeded. Cannot we just use inode_to_wb()
instead? I understand results may be different if multiple memcgs share an
inode and that's the reason why you use wb_get_create_current(), right?
But for dirty_exceeded check it may be good enough?

								Honza

>  	ratelimit = current->nr_dirtied_pause;
>  	if (wb->dirty_exceeded)
>  		ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10));
> @@ -1616,7 +1622,9 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
>  	preempt_enable();
>  
>  	if (unlikely(current->nr_dirtied >= ratelimit))
> -		balance_dirty_pages(mapping, current->nr_dirtied);
> +		balance_dirty_pages(mapping, wb, current->nr_dirtied);
> +
> +	wb_put(wb);
>  }
>  EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
>  
> -- 
> 2.4.0
>
Tejun Heo July 2, 2015, 1:26 a.m. UTC | #2
Hello, Jan.

On Tue, Jun 30, 2015 at 04:31:00PM +0200, Jan Kara wrote:
...
> > +	if (inode_cgwb_enabled(inode))
> > +		wb = wb_get_create_current(bdi, GFP_KERNEL);
> > +	if (!wb)
> > +		wb = &bdi->wb;
> > +
> 
> So this effectively adds a radix tree lookup (of wb belonging to memcg) for
> every set_page_dirty() call. That seems relatively costly to me. And all

Hmmm... idk, radix tree lookups should be cheap especially when
shallow and set_page_dirty().  It's a glorified array indexing.  If
not, we should really be improving radix tree implementation.  That
said,

> that just to check wb->dirty_exceeded. Cannot we just use inode_to_wb()
> instead? I understand results may be different if multiple memcgs share an
> inode and that's the reason why you use wb_get_create_current(), right?
> But for dirty_exceeded check it may be good enough?

Yeah, that probably should work.  I'll think more about it.

Thanks.
diff mbox

Patch

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4d0a9da..e31dea9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1337,6 +1337,7 @@  static inline void wb_dirty_limits(struct bdi_writeback *wb,
  * perform some writeout.
  */
 static void balance_dirty_pages(struct address_space *mapping,
+				struct bdi_writeback *wb,
 				unsigned long pages_dirtied)
 {
 	unsigned long nr_reclaimable;	/* = file_dirty + unstable_nfs */
@@ -1352,8 +1353,7 @@  static void balance_dirty_pages(struct address_space *mapping,
 	unsigned long task_ratelimit;
 	unsigned long dirty_ratelimit;
 	unsigned long pos_ratio;
-	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	struct bdi_writeback *wb = &bdi->wb;
+	struct backing_dev_info *bdi = wb->bdi;
 	bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT;
 	unsigned long start_time = jiffies;
 
@@ -1575,14 +1575,20 @@  DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
  */
 void balance_dirty_pages_ratelimited(struct address_space *mapping)
 {
-	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	struct bdi_writeback *wb = &bdi->wb;
+	struct inode *inode = mapping->host;
+	struct backing_dev_info *bdi = inode_to_bdi(inode);
+	struct bdi_writeback *wb = NULL;
 	int ratelimit;
 	int *p;
 
 	if (!bdi_cap_account_dirty(bdi))
 		return;
 
+	if (inode_cgwb_enabled(inode))
+		wb = wb_get_create_current(bdi, GFP_KERNEL);
+	if (!wb)
+		wb = &bdi->wb;
+
 	ratelimit = current->nr_dirtied_pause;
 	if (wb->dirty_exceeded)
 		ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10));
@@ -1616,7 +1622,9 @@  void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(mapping, current->nr_dirtied);
+		balance_dirty_pages(mapping, wb, current->nr_dirtied);
+
+	wb_put(wb);
 }
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited);