diff mbox series

[2/9] blkcg, writeback: Add wbc->no_wbc_acct

Message ID 20190615182453.843275-3-tj@kernel.org (mailing list archive)
State New, archived
Headers show
Series [1/9] cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages | expand

Commit Message

Tejun Heo June 15, 2019, 6:24 p.m. UTC
When writeback IOs are bounced through async layers, the IOs should
only be accounted against the wbc from the original bdi writeback to
avoid confusing cgroup inode ownership arbitration.  Add
wbc->no_wbc_acct to allow disabling wbc accounting.  This will be used
make btfs compression work well with cgroup IO control.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/fs-writeback.c         | 2 +-
 include/linux/writeback.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Jan Kara June 20, 2019, 3:21 p.m. UTC | #1
On Sat 15-06-19 11:24:46, Tejun Heo wrote:
> When writeback IOs are bounced through async layers, the IOs should
> only be accounted against the wbc from the original bdi writeback to
> avoid confusing cgroup inode ownership arbitration.  Add
> wbc->no_wbc_acct to allow disabling wbc accounting.  This will be used
> make btfs compression work well with cgroup IO control.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

I'm completely ignorant of how btrfs compressed writeback works so don't
quite understand implications of this. So does this mean that writeback to
btrfs compressed files won't be able to transition inodes from one memcg to
another? Or are you trying to say the 'wbc' used from async worker thread
is actually a dummy one and we would double-account the writeback?

Anyway, AFAICS no_wbc_acct means: "IO done as a result of this wbc will not
have influence on inode memcg ownership", doesn't it?

								Honza
> ---
>  fs/fs-writeback.c         | 2 +-
>  include/linux/writeback.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index c29cff345b1f..667ba07fffcd 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -724,7 +724,7 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
>  	 * behind a slow cgroup.  Ultimately, we want pageout() to kick off
>  	 * regular writeback instead of writing things out itself.
>  	 */
> -	if (!wbc->wb)
> +	if (!wbc->wb || wbc->no_wbc_acct)
>  		return;
>  
>  	id = mem_cgroup_css_from_page(page)->id;
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 738a0c24874f..b8f5f000cde4 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -68,6 +68,7 @@ struct writeback_control {
>  	unsigned for_reclaim:1;		/* Invoked from the page allocator */
>  	unsigned range_cyclic:1;	/* range_start is cyclic */
>  	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
> +	unsigned no_wbc_acct:1;		/* skip wbc IO accounting */
>  #ifdef CONFIG_CGROUP_WRITEBACK
>  	struct bdi_writeback *wb;	/* wb this writeback is issued under */
>  	struct inode *inode;		/* inode being written out */
> -- 
> 2.17.1
>
Tejun Heo June 20, 2019, 5:02 p.m. UTC | #2
Hello, Jan.

On Thu, Jun 20, 2019 at 05:21:45PM +0200, Jan Kara wrote:
> I'm completely ignorant of how btrfs compressed writeback works so don't
> quite understand implications of this. So does this mean that writeback to
> btrfs compressed files won't be able to transition inodes from one memcg to
> another? Or are you trying to say the 'wbc' used from async worker thread
> is actually a dummy one and we would double-account the writeback?

So, before, only the async compression workers would run through the
wbc accounting code regardless of who originated the dirty pages,
which is obviously wrong.  After the patch, the code accounts when the
dirty pages are being handed off to the compression workers and
no_wbc_acct is used to suppress spurious accounting from the workers.

> Anyway, AFAICS no_wbc_acct means: "IO done as a result of this wbc will not
> have influence on inode memcg ownership", doesn't it?

Yeap.

Thanks.
Jan Kara June 24, 2019, 8:21 a.m. UTC | #3
Hello Tejun!

On Thu 20-06-19 10:02:50, Tejun Heo wrote:
> On Thu, Jun 20, 2019 at 05:21:45PM +0200, Jan Kara wrote:
> > I'm completely ignorant of how btrfs compressed writeback works so don't
> > quite understand implications of this. So does this mean that writeback to
> > btrfs compressed files won't be able to transition inodes from one memcg to
> > another? Or are you trying to say the 'wbc' used from async worker thread
> > is actually a dummy one and we would double-account the writeback?
> 
> So, before, only the async compression workers would run through the
> wbc accounting code regardless of who originated the dirty pages,
> which is obviously wrong.  After the patch, the code accounts when the
> dirty pages are being handed off to the compression workers and
> no_wbc_acct is used to suppress spurious accounting from the workers.

OK, now I understand. Just one more question: So effectively, you are using
wbc->no_wbc_acct to pass information from btrfs code to btrfs code telling
it whether IO should or should not be accounted with wbc_account_io().
Wouldn't it make more sense to just pass this information internally
within btrfs? Granted, if this mechanism gets more widespread use by other
filesystems, then probably using wbc flag makes more sense. But I'm not
sure if this isn't a premature generalization...

								Honza
Tejun Heo June 24, 2019, 12:58 p.m. UTC | #4
Hello, Jan.

On Mon, Jun 24, 2019 at 10:21:30AM +0200, Jan Kara wrote:
> OK, now I understand. Just one more question: So effectively, you are using
> wbc->no_wbc_acct to pass information from btrfs code to btrfs code telling
> it whether IO should or should not be accounted with wbc_account_io().

Yes.

> Wouldn't it make more sense to just pass this information internally
> within btrfs? Granted, if this mechanism gets more widespread use by other
> filesystems, then probably using wbc flag makes more sense. But I'm not
> sure if this isn't a premature generalization...

The btrfs async issuers end up using generic writeback path and uses
the generic wbc owner mechanisms so that ios are attached to the right
cgroup too.  So, I kinda prefer to provide a generic mechanism from
wbc side.  That said, the names are a bit misleading and I think it'd
be better to rename them to something more explicit, e.g. sth along
the line of wbc_update_cgroup_owner() and wbc->no_cgroup_owner.  What
do you think?

Thanks.
Jan Kara June 24, 2019, 4:39 p.m. UTC | #5
On Mon 24-06-19 05:58:56, Tejun Heo wrote:
> Hello, Jan.
> 
> On Mon, Jun 24, 2019 at 10:21:30AM +0200, Jan Kara wrote:
> > OK, now I understand. Just one more question: So effectively, you are using
> > wbc->no_wbc_acct to pass information from btrfs code to btrfs code telling
> > it whether IO should or should not be accounted with wbc_account_io().
> 
> Yes.
> 
> > Wouldn't it make more sense to just pass this information internally
> > within btrfs? Granted, if this mechanism gets more widespread use by other
> > filesystems, then probably using wbc flag makes more sense. But I'm not
> > sure if this isn't a premature generalization...
> 
> The btrfs async issuers end up using generic writeback path and uses
> the generic wbc owner mechanisms so that ios are attached to the right
> cgroup too.  So, I kinda prefer to provide a generic mechanism from
> wbc side.

OK, I can live with that. We just have to be kind of careful so that people
just don't sprinkle no_wbc_acct writeback around because they don't know
better. Maybe you could at least add comment to no_wbc_acct mentioning that
this is for the cases where writeback has already been accounted for?

> That said, the names are a bit misleading and I think it'd
> be better to rename them to something more explicit, e.g. sth along
> the line of wbc_update_cgroup_owner() and wbc->no_cgroup_owner.  What
> do you think?

Yeah, renaming would probably make things clearer as well.

								Honza
Jan Kara June 24, 2019, 4:49 p.m. UTC | #6
On Sat 15-06-19 11:24:46, Tejun Heo wrote:
> When writeback IOs are bounced through async layers, the IOs should
> only be accounted against the wbc from the original bdi writeback to
> avoid confusing cgroup inode ownership arbitration.  Add
> wbc->no_wbc_acct to allow disabling wbc accounting.  This will be used
> make btfs compression work well with cgroup IO control.
       ^^ btrfs

> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
...
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 738a0c24874f..b8f5f000cde4 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -68,6 +68,7 @@ struct writeback_control {
>  	unsigned for_reclaim:1;		/* Invoked from the page allocator */
>  	unsigned range_cyclic:1;	/* range_start is cyclic */
>  	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
> +	unsigned no_wbc_acct:1;		/* skip wbc IO accounting */
>  #ifdef CONFIG_CGROUP_WRITEBACK
>  	struct bdi_writeback *wb;	/* wb this writeback is issued under */
>  	struct inode *inode;		/* inode being written out */

Can you add comment here that this is for writeback which has been already
accounted for?

Otherwise the patch looks good to me so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
diff mbox series

Patch

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c29cff345b1f..667ba07fffcd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -724,7 +724,7 @@  void wbc_account_io(struct writeback_control *wbc, struct page *page,
 	 * behind a slow cgroup.  Ultimately, we want pageout() to kick off
 	 * regular writeback instead of writing things out itself.
 	 */
-	if (!wbc->wb)
+	if (!wbc->wb || wbc->no_wbc_acct)
 		return;
 
 	id = mem_cgroup_css_from_page(page)->id;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 738a0c24874f..b8f5f000cde4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -68,6 +68,7 @@  struct writeback_control {
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
 	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
+	unsigned no_wbc_acct:1;		/* skip wbc IO accounting */
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct bdi_writeback *wb;	/* wb this writeback is issued under */
 	struct inode *inode;		/* inode being written out */