Message ID | 20190923151209.7466-1-ming.lei@redhat.com (mailing list archive)
---|---
State | New, archived
Series | block: don't release queue's sysfs lock during switching elevator
On 9/23/19 8:12 AM, Ming Lei wrote:
> @@ -523,11 +521,9 @@ void elv_unregister_queue(struct request_queue *q)
>  		kobject_uevent(&e->kobj, KOBJ_REMOVE);
>  		kobject_del(&e->kobj);
>
> -		mutex_lock(&q->sysfs_lock);
>  		e->registered = 0;
>  		/* Re-enable throttling in case elevator disabled it */
>  		wbt_enable_default(q);
> -		mutex_unlock(&q->sysfs_lock);
>  	}
>  }

Does this patch cause sysfs_lock to be held around kobject_del(&e->kobj)?
Since sysfs_lock is locked from inside elv_attr_show() and elv_attr_store(),
does this mean that this patch reintroduces the lock inversion problem that
was fixed recently?

Thanks,

Bart.
On Tue, Sep 24, 2019 at 11:37:09AM -0700, Bart Van Assche wrote:
> On 9/23/19 8:12 AM, Ming Lei wrote:
> > @@ -523,11 +521,9 @@ void elv_unregister_queue(struct request_queue *q)
> >  		kobject_uevent(&e->kobj, KOBJ_REMOVE);
> >  		kobject_del(&e->kobj);
> >
> > -		mutex_lock(&q->sysfs_lock);
> >  		e->registered = 0;
> >  		/* Re-enable throttling in case elevator disabled it */
> >  		wbt_enable_default(q);
> > -		mutex_unlock(&q->sysfs_lock);
> >  	}
> >  }
>
> Does this patch cause sysfs_lock to be held around kobject_del(&e->kobj)?

Yes.

> Since sysfs_lock is locked from inside elv_attr_show() and elv_attr_store(),

The request queue's sysfs_lock isn't required in elv_attr_show() and
elv_attr_store(); only the elevator's sysfs_lock is needed in those two
functions.

> does this mean that this patch reintroduces the lock inversion problem that
> was fixed recently?

No.

The lock inversion issue only existed on the kobjects q->kobj & q->mq_kobj,
and it was fixed already given that the queue's sysfs_lock is required in
the .show/.store callbacks of these two kobjects' attributes.

thanks,
Ming
On 9/24/19 5:13 PM, Ming Lei wrote:
> On Tue, Sep 24, 2019 at 11:37:09AM -0700, Bart Van Assche wrote:
>> On 9/23/19 8:12 AM, Ming Lei wrote:
>>> @@ -523,11 +521,9 @@ void elv_unregister_queue(struct request_queue *q)
>>>  		kobject_uevent(&e->kobj, KOBJ_REMOVE);
>>>  		kobject_del(&e->kobj);
>>> -		mutex_lock(&q->sysfs_lock);
>>>  		e->registered = 0;
>>>  		/* Re-enable throttling in case elevator disabled it */
>>>  		wbt_enable_default(q);
>>> -		mutex_unlock(&q->sysfs_lock);
>>>  	}
>>>  }
>>
>> Does this patch cause sysfs_lock to be held around kobject_del(&e->kobj)?
>
> Yes.
>
>> Since sysfs_lock is locked from inside elv_attr_show() and elv_attr_store(),
>
> The request queue's sysfs_lock isn't required in elv_attr_show() and
> elv_attr_store(); only the elevator's sysfs_lock is needed in those two
> functions.
>
>> does this mean that this patch reintroduces the lock inversion problem that
>> was fixed recently?
>
> No.
>
> The lock inversion issue only existed on the kobjects q->kobj & q->mq_kobj,
> and it was fixed already given that the queue's sysfs_lock is required in
> the .show/.store callbacks of these two kobjects' attributes.

I had confused e->kobj and eq->kobj, hence my comments. After having taken
another look at your patch:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On 9/23/19 5:12 PM, Ming Lei wrote:
> cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
> release & acquire sysfs_lock before registering/un-registering the
> elevator queue during elevator switching, to avoid a potential deadlock
> between showing & storing the 'queue/iosched' attributes and removing
> the elevator's kobject.
>
> Turns out there isn't such a deadlock, because 'q->sysfs_lock' isn't
> required in the .show & .store of queue/iosched's attributes; only the
> elevator's sysfs lock is acquired in elv_iosched_store() and
> elv_iosched_show(). So it is safe to hold the queue's sysfs lock when
> registering/un-registering the elevator queue.
>
> The biggest issue is that commit cecf5d87ff20 assumes that concurrent
> writes on 'queue/scheduler' can't happen. However, this assumption isn't
> true, because kernfs_fop_write() only guarantees that concurrent writes
> aren't done on the same open file; the writes could come from different
> opens of the file. So we can't release & re-acquire the queue's sysfs
> lock during elevator switching, otherwise a use-after-free on the
> elevator could be triggered.
>
> Fix the issue by not releasing the queue's sysfs lock during elevator
> switching.

Applied, thanks Ming.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b82736c781c5..962fc0c44381 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -989,13 +989,11 @@ int blk_register_queue(struct gendisk *disk)
 		blk_mq_debugfs_register(q);
 	}
 
-	/*
-	 * The flag of QUEUE_FLAG_REGISTERED isn't set yet, so elevator
-	 * switch won't happen at all.
-	 */
+	mutex_lock(&q->sysfs_lock);
 	if (q->elevator) {
 		ret = elv_register_queue(q, false);
 		if (ret) {
+			mutex_unlock(&q->sysfs_lock);
 			mutex_unlock(&q->sysfs_dir_lock);
 			kobject_del(&q->kobj);
 			blk_trace_remove_sysfs(dev);
@@ -1005,7 +1003,6 @@ int blk_register_queue(struct gendisk *disk)
 		has_elevator = true;
 	}
 
-	mutex_lock(&q->sysfs_lock);
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
 	wbt_enable_default(q);
 	blk_throtl_register_queue(q);
@@ -1062,12 +1059,10 @@ void blk_unregister_queue(struct gendisk *disk)
 	kobject_del(&q->kobj);
 	blk_trace_remove_sysfs(disk_to_dev(disk));
 
-	/*
-	 * q->kobj has been removed, so it is safe to check if elevator
-	 * exists without holding q->sysfs_lock.
-	 */
+	mutex_lock(&q->sysfs_lock);
 	if (q->elevator)
 		elv_unregister_queue(q);
+	mutex_unlock(&q->sysfs_lock);
 	mutex_unlock(&q->sysfs_dir_lock);
 
 	kobject_put(&disk_to_dev(disk)->kobj);
diff --git a/block/elevator.c b/block/elevator.c
index bba10e83478a..5437059c9261 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -503,9 +503,7 @@ int elv_register_queue(struct request_queue *q, bool uevent)
 		if (uevent)
 			kobject_uevent(&e->kobj, KOBJ_ADD);
 
-		mutex_lock(&q->sysfs_lock);
 		e->registered = 1;
-		mutex_unlock(&q->sysfs_lock);
 	}
 	return error;
 }
@@ -523,11 +521,9 @@ void elv_unregister_queue(struct request_queue *q)
 		kobject_uevent(&e->kobj, KOBJ_REMOVE);
 		kobject_del(&e->kobj);
 
-		mutex_lock(&q->sysfs_lock);
 		e->registered = 0;
 		/* Re-enable throttling in case elevator disabled it */
 		wbt_enable_default(q);
-		mutex_unlock(&q->sysfs_lock);
 	}
 }
 
@@ -590,32 +586,11 @@ int elevator_switch_mq(struct request_queue *q,
 	lockdep_assert_held(&q->sysfs_lock);
 
 	if (q->elevator) {
-		if (q->elevator->registered) {
-			mutex_unlock(&q->sysfs_lock);
-
-			/*
-			 * Concurrent elevator switch can't happen becasue
-			 * sysfs write is always exclusively on same file.
-			 *
-			 * Also the elevator queue won't be freed after
-			 * sysfs_lock is released becasue kobject_del() in
-			 * blk_unregister_queue() waits for completion of
-			 * .store & .show on its attributes.
-			 */
+		if (q->elevator->registered)
 			elv_unregister_queue(q);
-
-			mutex_lock(&q->sysfs_lock);
-		}
 		ioc_clear_queue(q);
 		elevator_exit(q, q->elevator);
-
-		/*
-		 * sysfs_lock may be dropped, so re-check if queue is
-		 * unregistered. If yes, don't switch to new elevator
-		 * any more
-		 */
-		if (!blk_queue_registered(q))
-			return 0;
 	}
 
 	ret = blk_mq_init_sched(q, new_e);
@@ -623,11 +598,7 @@ int elevator_switch_mq(struct request_queue *q,
 		goto out;
 
 	if (new_e) {
-		mutex_unlock(&q->sysfs_lock);
-
 		ret = elv_register_queue(q, true);
-
-		mutex_lock(&q->sysfs_lock);
 		if (ret) {
 			elevator_exit(q, q->elevator);
 			goto out;
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release & acquire sysfs_lock before registering/un-registering the
elevator queue during elevator switching, to avoid a potential deadlock
between showing & storing the 'queue/iosched' attributes and removing
the elevator's kobject.

Turns out there isn't such a deadlock, because 'q->sysfs_lock' isn't
required in the .show & .store of queue/iosched's attributes; only the
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold the queue's sysfs lock when
registering/un-registering the elevator queue.

The biggest issue is that commit cecf5d87ff20 assumes that concurrent
writes on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent writes
aren't done on the same open file; the writes could come from different
opens of the file. So we can't release & re-acquire the queue's sysfs
lock during elevator switching, otherwise a use-after-free on the
elevator could be triggered.

Fix the issue by not releasing the queue's sysfs lock during elevator
switching.

Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-sysfs.c | 13 ++++---------
 block/elevator.c  | 31 +------------------------------
 2 files changed, 5 insertions(+), 39 deletions(-)