
[for-6.0,2/2] block/nbd: ensure ->connection_thread is always valid

Message ID 20210409160406.1800272-3-rvkagan@yandex-team.ru (mailing list archive)
State New, archived
Series block/nbd: assorted bugfixes

Commit Message

Roman Kagan April 9, 2021, 4:04 p.m. UTC
Simplify lifetime management of BDRVNBDState->connection_thread by
delaying the possible cleanup of it until the BDRVNBDState itself goes
away.

This also fixes possible use-after-free in nbd_co_establish_connection
when it races with nbd_co_establish_connection_cancel.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
This is a more future-proof version of the use-after-free fix, yet less
intrusive than Vladimir's series, so that it can still go into 6.0.

 block/nbd.c | 58 ++++++++++++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)
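
For readers skimming the archive, the lifetime handoff the commit message describes boils down to a simple ownership rule: whichever side is last to let go of the shared connect-thread state frees it, and the decision is made under the state's mutex. The following is a generic, self-contained sketch of that pattern in plain pthreads with hypothetical names (ConnectState, owner_teardown, worker_finish); it is not QEMU code, the real version is in the patch at the bottom of the page.

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Generic sketch of the handoff (hypothetical names, plain pthreads,
 * not QEMU code): whichever side is last to let go of the shared
 * state frees it, decided under the mutex.
 */
typedef struct ConnectState {
    pthread_mutex_t mutex;
    bool running;    /* worker thread has not finished yet */
    bool detached;   /* owner is gone; worker must free on exit */
} ConnectState;

static void connect_state_free(ConnectState *cs)
{
    pthread_mutex_destroy(&cs->mutex);
    free(cs);
}

/* Owner teardown (the role nbd_clear_bdrvstate() plays in the patch). */
static void owner_teardown(ConnectState *cs)
{
    bool running;

    pthread_mutex_lock(&cs->mutex);
    running = cs->running;
    if (running) {
        cs->detached = true;      /* worker still alive: it will free cs */
    }
    pthread_mutex_unlock(&cs->mutex);

    if (!running) {
        connect_state_free(cs);   /* worker already finished: free here */
    }
}

/* Last thing the worker thread does before returning. */
static void worker_finish(ConnectState *cs)
{
    bool detached;

    pthread_mutex_lock(&cs->mutex);
    cs->running = false;
    detached = cs->detached;
    pthread_mutex_unlock(&cs->mutex);

    if (detached) {
        connect_state_free(cs);   /* owner already gone: free here */
    }
}

The CONNECT_THREAD_RUNNING_DETACHED state in the actual patch plays the role of the 'detached' flag here.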

Comments

Vladimir Sementsov-Ogievskiy April 10, 2021, 8:06 a.m. UTC | #1
09.04.2021 19:04, Roman Kagan wrote:
> Simplify lifetime management of BDRVNBDState->connection_thread by
> delaying the possible cleanup of it until the BDRVNBDState itself goes
> away.
> 
> This also fixes possible use-after-free in nbd_co_establish_connection
> when it races with nbd_co_establish_connection_cancel.
> 
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Vladimir Sementsov-Ogievskiy April 10, 2021, 8:38 a.m. UTC | #2
10.04.2021 11:06, Vladimir Sementsov-Ogievskiy wrote:
> 09.04.2021 19:04, Roman Kagan wrote:
>> Simplify lifetime management of BDRVNBDState->connection_thread by
>> delaying the possible cleanup of it until the BDRVNBDState itself goes
>> away.
>>
>> This also fixes possible use-after-free in nbd_co_establish_connection
>> when it races with nbd_co_establish_connection_cancel.
>>
>> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 

Oh wait, it crashes iotest 51, as nbd_clear_bdrvstate is also called from nbd_process_options.

And this shows that we also do the wrong thing when we simply return from the two ifs pre-patch (and from the one that remains after the patch). Yes, after a successful nbd_process_options we should call nbd_clear_bdrvstate() on the failure path.
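
For context: with the patch applied, nbd_clear_bdrvstate() dereferences s->connect_thread unconditionally, but on the nbd_process_options() failure path it runs before nbd_init_connect_thread() has ever been called, so s->connect_thread is still NULL. One possible way to tolerate that call site, shown purely as an untested illustration rather than a proposed v2, is an early NULL check:

static void nbd_clear_bdrvstate(BDRVNBDState *s)
{
    NBDConnectThread *thr = s->connect_thread;
    bool thr_running;

    /* Guard for the nbd_process_options() failure path, where the
     * connect thread was never initialized (illustration only). */
    if (thr) {
        qemu_mutex_lock(&thr->mutex);
        thr_running = thr->state == CONNECT_THREAD_RUNNING;
        if (thr_running) {
            thr->state = CONNECT_THREAD_RUNNING_DETACHED;
        }
        qemu_mutex_unlock(&thr->mutex);

        /* the runaway thread will clean it up itself */
        if (!thr_running) {
            nbd_free_connect_thread(thr);
        }
        s->connect_thread = NULL;
    }

    object_unref(OBJECT(s->tlscreds));
    qapi_free_SocketAddress(s->saddr);
    s->saddr = NULL;
    /* ... rest unchanged ... */
}

Such a guard would keep nbd_clear_bdrvstate() safe to call from both nbd_open() and the option-parsing failure path, though as the rest of the thread concludes, a narrower fix was preferred for 6.0.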
Vladimir Sementsov-Ogievskiy April 10, 2021, 9:56 a.m. UTC | #3
10.04.2021 11:38, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2021 11:06, Vladimir Sementsov-Ogievskiy wrote:
>> 09.04.2021 19:04, Roman Kagan wrote:
>>> Simplify lifetime management of BDRVNBDState->connection_thread by
>>> delaying the possible cleanup of it until the BDRVNBDState itself goes
>>> away.
>>>
>>> This also fixes possible use-after-free in nbd_co_establish_connection
>>> when it races with nbd_co_establish_connection_cancel.
>>>
>>> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
> 
> Oh wait, it crashes iotest 51, as nbd_clear_bdrvstate is also called from nbd_process_options.
> 
> And this shows that we also do the wrong thing when we simply return from the two ifs pre-patch (and from the one that remains after the patch). Yes, after a successful nbd_process_options we should call nbd_clear_bdrvstate() on the failure path.
> 

And it also shows that the patch is more difficult than it seems. I still think that for 6.0 we should take a simple use-after-free-only fix and not worry about anything else.
Roman Kagan April 10, 2021, 12:20 p.m. UTC | #4
On Sat, Apr 10, 2021 at 12:56:34PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2021 11:38, Vladimir Sementsov-Ogievskiy wrote:
> > 10.04.2021 11:06, Vladimir Sementsov-Ogievskiy wrote:
> > > 09.04.2021 19:04, Roman Kagan wrote:
> > > > Simplify lifetime management of BDRVNBDState->connection_thread by
> > > > delaying the possible cleanup of it until the BDRVNBDState itself goes
> > > > away.
> > > > 
> > > > This also fixes possible use-after-free in nbd_co_establish_connection
> > > > when it races with nbd_co_establish_connection_cancel.
> > > > 
> > > > Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
> > > 
> > > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > 
> > 
> > Oh wait, it crashes iotest 51, as nbd_clear_bdrvstate is also called from nbd_process_options.
> > 
> > And this shows that we also do the wrong thing when we simply return from the two ifs pre-patch (and from the one that remains after the patch). Yes, after a successful nbd_process_options we should call nbd_clear_bdrvstate() on the failure path.

The test somehow didn't crash for me, but you're right, there is a bug here.

> And it also shows that the patch is more difficult than it seems. I still think that for 6.0 we should take a simple use-after-free-only fix and not worry about anything else.

I agree it turned out more complex than appropriate for 6.0. Let's
proceed with the one you've posted.

Thanks,
Roman.

Patch

diff --git a/block/nbd.c b/block/nbd.c
index d86df3afcb..6e3879c0c5 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -144,16 +144,31 @@  typedef struct BDRVNBDState {
     NBDConnectThread *connect_thread;
 } BDRVNBDState;
 
+static void nbd_free_connect_thread(NBDConnectThread *thr);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
                                     Error **errp);
 static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
-                                               bool detach);
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
+    NBDConnectThread *thr = s->connect_thread;
+    bool thr_running;
+
+    qemu_mutex_lock(&thr->mutex);
+    thr_running = thr->state == CONNECT_THREAD_RUNNING;
+    if (thr_running) {
+        thr->state = CONNECT_THREAD_RUNNING_DETACHED;
+    }
+    qemu_mutex_unlock(&thr->mutex);
+
+    /* the runaway thread will clean it up itself */
+    if (!thr_running) {
+        nbd_free_connect_thread(thr);
+    }
+
     object_unref(OBJECT(s->tlscreds));
     qapi_free_SocketAddress(s->saddr);
     s->saddr = NULL;
@@ -293,7 +308,7 @@  static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
         qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
     }
 
-    nbd_co_establish_connection_cancel(bs, false);
+    nbd_co_establish_connection_cancel(bs);
 
     reconnect_delay_timer_del(s);
 
@@ -333,7 +348,7 @@  static void nbd_teardown_connection(BlockDriverState *bs)
         if (s->connection_co_sleep_ns_state) {
             qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
         }
-        nbd_co_establish_connection_cancel(bs, true);
+        nbd_co_establish_connection_cancel(bs);
     }
     if (qemu_in_coroutine()) {
         s->teardown_co = qemu_coroutine_self();
@@ -534,18 +549,12 @@  nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
  * nbd_co_establish_connection_cancel
  * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to
  * allow drained section to begin.
- *
- * If detach is true, also cleanup the state (or if thread is running, move it
- * to CONNECT_THREAD_RUNNING_DETACHED state). s->connect_thread becomes NULL if
- * detach is true.
  */
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
-                                               bool detach)
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
     NBDConnectThread *thr = s->connect_thread;
     bool wake = false;
-    bool do_free = false;
 
     qemu_mutex_lock(&thr->mutex);
 
@@ -556,21 +565,10 @@  static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
             s->wait_connect = false;
             wake = true;
         }
-        if (detach) {
-            thr->state = CONNECT_THREAD_RUNNING_DETACHED;
-            s->connect_thread = NULL;
-        }
-    } else if (detach) {
-        do_free = true;
     }
 
     qemu_mutex_unlock(&thr->mutex);
 
-    if (do_free) {
-        nbd_free_connect_thread(thr);
-        s->connect_thread = NULL;
-    }
-
     if (wake) {
         aio_co_wake(s->connection_co);
     }
@@ -2294,31 +2292,33 @@  static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
         return -EEXIST;
     }
 
+    nbd_init_connect_thread(s);
+
     /*
      * establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
     if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
-        yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
-        return -ECONNREFUSED;
+        ret = -ECONNREFUSED;
+        goto fail;
     }
 
     ret = nbd_client_handshake(bs, errp);
     if (ret < 0) {
-        yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
-        nbd_clear_bdrvstate(s);
-        return ret;
+        goto fail;
     }
     /* successfully connected */
     s->state = NBD_CLIENT_CONNECTED;
 
-    nbd_init_connect_thread(s);
-
     s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
     bdrv_inc_in_flight(bs);
     aio_co_schedule(bdrv_get_aio_context(bs), s->connection_co);
 
     return 0;
+fail:
+    yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
+    nbd_clear_bdrvstate(s);
+    return ret;
 }
 
 static int nbd_co_flush(BlockDriverState *bs)