
[v2,2/2] btrfs: fix compressed write bio attribution

Message ID 20191212181934.GA33645@dennisz-mbp.dhcp.thefacebook.com
State New, archived

Commit Message

Dennis Zhou Dec. 12, 2019, 6:19 p.m. UTC
From a0569aebde08e31e994c92d0b70befb84f7f5563 Mon Sep 17 00:00:00 2001
From: Dennis Zhou <dennis@kernel.org>
Date: Wed, 11 Dec 2019 15:20:15 -0800

Bio attribution is handled at bio_set_dev(): once we have a device, we
have a corresponding request_queue and can derive the current css.
In special cases, we want to attribute the bio to someone else. This can
be done by calling bio_associate_blkg_from_css() or
kthread_associate_blkcg(), depending on the scenario. Btrfs does this for
compressed writeback, which is handled by kworkers, so the latter can
be used here.
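As a rough illustration of these attribution rules, here is a userspace model (not kernel code) of how bio_set_dev() picks a css: through bio_associate_blkg() it keeps an existing association, and otherwise uses blkcg_css(), which prefers a kthread_associate_blkcg() override over the task's own cgroup. The names css, disk, task, bio, current_task, associate_kthread(), current_css() and set_dev() are hypothetical stand-ins, and the model assumes the caller is a kthread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of bio attribution; these types and
 * helpers are illustrative stand-ins, not kernel APIs.
 */
struct css  { const char *name; };	/* stands in for a blkcg css */
struct disk { const char *name; };	/* stands in for gendisk + request_queue */

struct task {
	struct css *task_css;		/* css of the task's own cgroup */
	struct css *kthread_css;	/* override, like kthread_associate_blkcg() sets */
};

struct bio {
	struct disk *disk;
	struct css *blkg_css;		/* NULL until the bio is associated */
};

static struct task *current_task;	/* models "current"; assumed to be a kthread */

/* Models kthread_associate_blkcg(): set, or clear with NULL, the override. */
static void associate_kthread(struct css *css)
{
	current_task->kthread_css = css;
}

/* Models blkcg_css(): a kthread override wins over the task's own css. */
static struct css *current_css(void)
{
	return current_task->kthread_css ? current_task->kthread_css
					 : current_task->task_css;
}

/*
 * Models bio_set_dev(): the device is now known, so attribution can
 * happen. An existing association is kept; otherwise the current css
 * is used.
 */
static void set_dev(struct bio *bio, struct disk *disk)
{
	assert(disk != NULL);
	bio->disk = disk;
	if (!bio->blkg_css)
		bio->blkg_css = current_css();
}
```

In this model, a worker that calls associate_kthread() before creating bios gets every later set_dev() charged to that css without touching the bios directly.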

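Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
above assumption that we'll have a request_queue when we are doing
association: bio->bi_disk is still NULL when
bio_associate_blkg_from_css() looks up the queue. To fix this, switch to
using kthread_associate_blkcg(), so the association happens when
submit_stripe_bio() finally calls bio_set_dev().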
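A minimal sketch of why the early call broke, under the same kind of simplification: bio_associate_blkg_from_css() needs the device's request_queue, which a bio only has after bio_set_dev(). The model below uses hypothetical types (css, queue, disk, bio) and helpers (associate_from_css(), set_dev()), and it reports failure where the kernel would dereference a NULL bio->bi_disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: associating a bio with a css needs the
 * device's queue, which the bio only has after set_dev().
 */
struct css   { const char *name; };
struct queue { struct css *root_css; };	/* stands in for request_queue */
struct disk  { struct queue *queue; };	/* stands in for gendisk */

struct bio {
	struct disk *disk;		/* NULL until set_dev() */
	struct css *blkg_css;
};

/*
 * Models bio_associate_blkg_from_css(). The kernel dereferences
 * bio->bi_disk->queue unconditionally; this model returns false
 * instead of faulting when there is no disk yet.
 */
static bool associate_from_css(struct bio *bio, struct css *css)
{
	if (!bio->disk)
		return false;		/* the kernel would oops here */
	struct queue *q = bio->disk->queue;
	assert(q != NULL);		/* every modeled disk has a queue */
	/* Simplified: the kernel also maps the root css to root_blkg. */
	bio->blkg_css = css ? css : q->root_css;
	return true;
}

/* Models the device half of bio_set_dev(). */
static void set_dev(struct bio *bio, struct disk *disk)
{
	bio->disk = disk;
}
```

Deferring the association until after set_dev() is what the kthread_associate_blkcg() approach achieves.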

Without this, we crash in btrfs/024:
[ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
[ 3052.107013] #PF: supervisor read access in kernel mode
[ 3052.107014] #PF: error_code(0x0000) - not-present page
[ 3052.107015] PGD 0 P4D 0
[ 3052.107021] Oops: 0000 [#1] SMP
[ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
[ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
[ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
[ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
[ 3052.191377] Code: ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 08 48 8b 47 08 65 ff 05 ea 6e 9f 7e <48> 8b a8 10 05 00 00 45 31 c9 45 31 c0 31 d2 31 f6 b9 02 00 00 00
[ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
[ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
[ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
[ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
[ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
[ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
[ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
[ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
[ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3052.293372] PKRU: 55555554
[ 3052.293376] Call Trace:
[ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390
[ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
[ 3052.422401]  btrfs_work_helper+0x246/0x5a0
[ 3052.422408]  process_one_work+0x200/0x570
[ 3052.438601]  ? process_one_work+0x180/0x570
[ 3052.438605]  worker_thread+0x4c/0x3e0
[ 3052.438614]  kthread+0x103/0x140
[ 3052.460735]  ? process_one_work+0x570/0x570
[ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
[ 3052.460744]  ret_from_fork+0x24/0x30
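The register dump is consistent with a NULL bi_disk. The Code: bytes decode to mov 0x8(%rdi),%rax, loading the field at offset 0x8 of the bio in RDI, followed by the faulting mov 0x510(%rax),%rbp. With RAX = 0, the load address is 0 + 0x510, which matches CR2. The sketch below redoes that arithmetic with a made-up layout (fake_bio, fake_gendisk and queue_load_address() are hypothetical) that only mirrors the two offsets seen in the oops; reading offset 0x510 as the queue pointer is an assumption based on bio_associate_blkg_from_css() starting from bio->bi_disk->queue in kernels of this era, and the 0x8 offset assumes a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layouts that only mirror the two offsets visible in the
 * oops; the real struct bio and struct gendisk have many more fields.
 */
struct fake_gendisk {
	char other_fields[0x510];
	void *queue;			/* read by "mov 0x510(%rax),%rbp" */
};

struct fake_bio {
	void *bi_next;
	struct fake_gendisk *bi_disk;	/* read by "mov 0x8(%rdi),%rax" (64-bit) */
};

/* Address touched by bio->bi_disk->queue when bi_disk holds @base. */
static uintptr_t queue_load_address(uintptr_t base)
{
	return base + offsetof(struct fake_gendisk, queue);
}
```

With base 0 (a NULL bi_disk), queue_load_address() yields 0x510, the same value reported in CR2 and as the faulting address.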

Fixes: 1a41802701ec ("btrfs: drop bio_set_dev where not needed")
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
v2: rely on kthread_associate_blkcg() instead.

 fs/btrfs/compression.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

David Sterba Dec. 13, 2019, 12:24 p.m. UTC | #1
On Thu, Dec 12, 2019 at 10:19:34AM -0800, Dennis Zhou wrote:
> Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> above assumption that we'll have a request_queue when we are doing
> association. To fix this, switch to using kthread_associate_blkcg().

Can kthread_associate_blkcg() also be used for submit_extent_page(),
which calls bio_associate_blkg_from_css() indirectly when initializing
the bio from the wbc?

		bio_set_dev(bio, bdev);
		wbc_init_bio(wbc, bio);
		wbc_account_cgroup_owner(wbc, page, page_size);

wbc_init_bio:

	if (wbc->wb)
		bio_associate_blkg_from_css(bio, wbc->wb->blkcg_css);
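To make the question concrete: in kernels of this era, wbc_init_bio() (with CONFIG_CGROUP_WRITEBACK) re-associates the bio with wbc->wb->blkcg_css only when wbc->wb is set, and its kernel-doc requires the bio to already be associated with a device. The userspace model below mirrors that; writeback_ctl, associate_from_css() and wbc_init_bio_model() are hypothetical stand-ins, and bdi_writeback only reuses the kernel's type name for a one-field struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of wbc_init_bio(); types are stand-ins. */
struct css  { const char *name; };
struct disk { const char *name; };
struct bdi_writeback { struct css *blkcg_css; };
struct writeback_ctl { struct bdi_writeback *wb; };	/* models writeback_control */

struct bio {
	struct disk *disk;
	struct css *blkg_css;
};

static void associate_from_css(struct bio *bio, struct css *css)
{
	/* Like bio_associate_blkg_from_css(), this needs the device. */
	assert(bio->disk != NULL);
	bio->blkg_css = css;
}

/*
 * Models wbc_init_bio(): only cgroup writeback (wbc->wb != NULL)
 * re-attributes the bio; the pageout() path leaves it alone.
 */
static void wbc_init_bio_model(struct writeback_ctl *wbc, struct bio *bio)
{
	if (wbc->wb)
		associate_from_css(bio, wbc->wb->blkcg_css);
}
```

Because the call needs a device, the bio_set_dev() in the excerpt has to stay ahead of wbc_init_bio().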
Dennis Zhou Dec. 13, 2019, 10:21 p.m. UTC | #2
On Fri, Dec 13, 2019 at 01:24:01PM +0100, David Sterba wrote:
> Can kthread_associate_blkcg() also be used for submit_extent_page(),
> which calls bio_associate_blkg_from_css() indirectly when initializing
> the bio from the wbc?
> 
> 		bio_set_dev(bio, bdev);
> 		wbc_init_bio(wbc, bio);
> 		wbc_account_cgroup_owner(wbc, page, page_size);
> 
> wbc_init_bio:
> 
> 	if (wbc->wb)
> 		bio_associate_blkg_from_css(bio, wbc->wb->blkcg_css);

Correct me if I'm wrong, but I don't think submit_extent_page() is only
called from kthread contexts. So, we wouldn't be able to rely on
kthread_associate_blkcg().
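The kthread restriction comes from kthread_associate_blkcg() itself, which returns without doing anything unless current has PF_KTHREAD set. A small userspace model of that guard (task, associate_kthread() and current_css() are hypothetical names) shows why a non-kthread caller of submit_extent_page() would keep its own css instead of the intended owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct css { const char *name; };

struct task {
	bool is_kthread;		/* models PF_KTHREAD */
	struct css *task_css;		/* css of the task's own cgroup */
	struct css *kthread_css;	/* override, only honored for kthreads */
};

/*
 * Models kthread_associate_blkcg(): silently does nothing unless the
 * caller is a kthread.
 */
static void associate_kthread(struct task *cur, struct css *css)
{
	assert(cur != NULL);
	if (!cur->is_kthread)
		return;
	cur->kthread_css = css;
}

/* Models blkcg_css(): the css a later bio_set_dev() would use. */
static struct css *current_css(const struct task *cur)
{
	return cur->kthread_css ? cur->kthread_css : cur->task_css;
}
```

For the non-kthread task, the override request is dropped, so its bios would be charged to its own cgroup rather than to the owner passed in.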

I can think about how to make the wbc association better in general, but
it's only a percpu decrement and increment, so it shouldn't add much
overhead.

Thanks,
Dennis
David Sterba Dec. 17, 2019, 3:05 p.m. UTC | #3
On Fri, Dec 13, 2019 at 02:21:49PM -0800, Dennis Zhou wrote:
> Correct me if I'm wrong, but I don't think submit_extent_page() is only
> called from kthread contexts. So, we wouldn't be able to rely on
> kthread_associate_blkcg().

Yeah, a kthread context is not guaranteed here.

> I can think about how to make the wbc association better in general, but
> it's only a percpu decrement and increment, so it shouldn't add much
> overhead.

Performance is not my concern here. The association of bios with blkcgs
is new, and there were some integration bugs where I independently
removed the early bdev association while the blkg association relied on
it. I'm looking for ways to make this less error prone, and the kthread
association looks like exactly that, so I was curious whether it's
possible to use it everywhere. If not, the bdev needs to be found from
other available data.
Dennis Zhou Dec. 17, 2019, 6:44 p.m. UTC | #4
On Tue, Dec 17, 2019 at 04:05:48PM +0100, David Sterba wrote:
> Performance is not my concern here. The association of bios with blkcgs
> is new, and there were some integration bugs where I independently
> removed the early bdev association while the blkg association relied on
> it. I'm looking for ways to make this less error prone, and the kthread
> association looks like exactly that, so I was curious whether it's
> possible to use it everywhere. If not, the bdev needs to be found from
> other available data.

Yeah. At the time, going through bio_set_dev() was a way to guarantee we
weren't missing an association with a blk-cgroup. This simplified
auditing and prevented newer use cases from missing it. However, I do
agree it's quite error prone. I'll put it on my list and see if I can
come up with something better.

Thanks,
Dennis

Patch

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4ce81571f0cd..de95ad27722f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -447,7 +447,7 @@  blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 
 	if (blkcg_css) {
 		bio->bi_opf |= REQ_CGROUP_PUNT;
-		bio_associate_blkg_from_css(bio, blkcg_css);
+		kthread_associate_blkcg(blkcg_css);
 	}
 	refcount_set(&cb->pending_bios, 1);
 
@@ -491,10 +491,8 @@  blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
-			if (blkcg_css) {
+			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
-				bio_associate_blkg_from_css(bio, blkcg_css);
-			}
 			bio_add_page(bio, page, PAGE_SIZE, 0);
 		}
 		if (bytes_left < PAGE_SIZE) {
@@ -521,6 +519,9 @@  blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		bio_endio(bio);
 	}
 
+	if (blkcg_css)
+		kthread_associate_blkcg(NULL);
+
 	return 0;
 }
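The patched flow in btrfs_submit_compressed_write() can be modeled as bracketing the bio loop with kthread_associate_blkcg(blkcg_css) and kthread_associate_blkcg(NULL), so each bio is attributed when it is mapped and bio_set_dev() runs in submit_stripe_bio(). This is a hypothetical userspace sketch; map_and_submit(), submit_compressed() and the other names are stand-ins, not btrfs or block-layer functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the patched submission flow. */
struct css  { const char *name; };
struct disk { const char *name; };
struct bio  { struct disk *disk; struct css *blkg_css; };

struct task { struct css *task_css; struct css *kthread_css; };
static struct task *current_task;	/* models "current" (a kworker) */

/* Models kthread_associate_blkcg() for a kthread caller. */
static void associate_kthread(struct css *css)
{
	current_task->kthread_css = css;
}

/* Models blkcg_css(): override first, then the task's own css. */
static struct css *current_css(void)
{
	return current_task->kthread_css ? current_task->kthread_css
					 : current_task->task_css;
}

/* Models the late bio_set_dev() done when the stripe is mapped. */
static void map_and_submit(struct bio *bio, struct disk *disk)
{
	assert(disk != NULL);
	bio->disk = disk;
	if (!bio->blkg_css)
		bio->blkg_css = current_css();
}

/*
 * Models the patched btrfs_submit_compressed_write(): set the override
 * before building bios, clear it before returning.
 */
static void submit_compressed(struct bio *bios, size_t n, struct disk *disk,
			      struct css *blkcg_css)
{
	if (blkcg_css)
		associate_kthread(blkcg_css);
	for (size_t i = 0; i < n; i++)
		map_and_submit(&bios[i], disk);
	if (blkcg_css)
		associate_kthread(NULL);
}
```

Clearing the override at the end matters because kworkers are shared: if it were left set, unrelated work that later runs on the same worker would be charged to this cgroup.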