
[05/16] iothread: release AioContext around aio_poll

Message ID 1452870739-28484-6-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini Jan. 15, 2016, 3:12 p.m. UTC
This is the first step towards having fine-grained critical sections in
dataplane threads, which resolves lock ordering problems between
address_space_* functions (which need the BQL when doing MMIO, even
after we complete RCU-based dispatch) and the AioContext.

Because AioContext does not use contention callbacks anymore, the
unit test has to be changed.

Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
then reverted.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                     | 22 ++++------------------
 docs/multiple-iothreads.txt | 25 +++++++++++++++----------
 include/block/aio.h         |  3 ---
 iothread.c                  | 11 ++---------
 tests/test-aio.c            | 19 +++++++++++--------
 5 files changed, 32 insertions(+), 48 deletions(-)

Comments

Stefan Hajnoczi Feb. 2, 2016, 2:52 p.m. UTC | #1
On Fri, Jan 15, 2016 at 04:12:08PM +0100, Paolo Bonzini wrote:
> @@ -120,9 +117,17 @@ Block layer code must therefore expect to run in an IOThread and avoid using
>  old APIs that implicitly use the main loop.  See the "How to program for
>  IOThreads" above for information on how to do that.
>  
> -If main loop code such as a QMP function wishes to access a BlockDriverState it
> -must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
> -IOThread does not run in parallel.
> +If main loop code such as a QMP function wishes to access a BlockDriverState
> +it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
> +that callbacks in the IOThread do not run in parallel.
> +
> +Code running in the monitor typically uses bdrv_drain() to ensure that
> +past requests from the guest are completed.  When a block device is
> +running in an IOThread, the IOThread can also process requests from
> +the guest (via ioeventfd).  These requests have to be blocked with
> +aio_disable_clients() before calling bdrv_drain().  You can then reenable
> +guest requests with aio_enable_clients() before finally releasing the
> +AioContext and completing the monitor command.

This is outdated.  Monitor commands do this:
bdrv_drained_begin() -> ... -> bdrv_drain() -> ... -> bdrv_drained_end()

> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void *opaque)
>      qemu_mutex_lock(&data->start_lock);
>      qemu_mutex_unlock(&data->start_lock);
>  
> +    g_usleep(500000);

Sleep?
Paolo Bonzini Feb. 2, 2016, 3:01 p.m. UTC | #2
On 02/02/2016 15:52, Stefan Hajnoczi wrote:
>>> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void
>>> *opaque) qemu_mutex_lock(&data->start_lock); 
>>> qemu_mutex_unlock(&data->start_lock);
>>> 
>>> +    g_usleep(500000);
> Sleep?

What about it? :)

Paolo
Stefan Hajnoczi Feb. 3, 2016, 9:34 a.m. UTC | #3
On Tue, Feb 02, 2016 at 04:01:55PM +0100, Paolo Bonzini wrote:
> On 02/02/2016 15:52, Stefan Hajnoczi wrote:
> >>> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void
> >>> *opaque) qemu_mutex_lock(&data->start_lock); 
> >>> qemu_mutex_unlock(&data->start_lock);
> >>> 
> >>> +    g_usleep(500000);
> > Sleep?
> 
> What about it? :)

Sleep in a loop is inefficient but at least correct.

A sleep outside a loop is a race condition.  If the machine is heavily
loaded, whatever you are waiting for may not happen in 500 ms and the
test case is unreliable.

What is the purpose of this sleep?

Stefan
Paolo Bonzini Feb. 3, 2016, 9:52 a.m. UTC | #4
On 03/02/2016 10:34, Stefan Hajnoczi wrote:
>>>>>>>>> +    g_usleep(500000);
>>>>> Sleep?
>>> 
>>> What about it? :)
> Sleep in a loop is inefficient but at least correct.
> 
> A sleep outside a loop is a race condition.  If the machine is
> heavily loaded, whatever you are waiting for may not happen in 500
> ms and the test case is unreliable.
> 
> What is the purpose of this sleep?

In the test the desired ordering of events is

main thread				additional thread
---------------------------------------------------------------------
lock start_lock
start thread
					lock start_lock (waits)
acquire context
unlock start_lock
					lock start_lock (returns)
					unlock start_lock
aio_poll
   poll()							<<<<
					event_notifier_set	<<<<

Comparing the test to QEMU, the roles of the threads are reversed. The
test's "main thread" is QEMU's dataplane thread, and the test's
additional thread is QEMU's main thread.

The sleep "ensures" that poll() blocks before event_notifier_set.  The
sleep represents how QEMU's main thread acquires the context rarely,
and not right after starting the dataplane thread.

Because this is a file descriptor source, the code behaves the same
whether aio_poll starts before or after the event_notifier_set.  The
test passes even if you remove the sleep.

Do you have any suggestion?  Just add a comment?

Paolo
Stefan Hajnoczi Feb. 5, 2016, 5:51 p.m. UTC | #5
On Wed, Feb 03, 2016 at 10:52:38AM +0100, Paolo Bonzini wrote:
> 
> 
> On 03/02/2016 10:34, Stefan Hajnoczi wrote:
> >>>>>>>>> +    g_usleep(500000);
> >>>>> Sleep?
> >>> 
> >>> What about it? :)
> > Sleep in a loop is inefficient but at least correct.
> > 
> > A sleep outside a loop is a race condition.  If the machine is
> > heavily loaded, whatever you are waiting for may not happen in 500
> > ms and the test case is unreliable.
> > 
> > What is the purpose of this sleep?
> 
> In the test the desired ordering of events is
> 
> main thread				additional thread
> ---------------------------------------------------------------------
> lock start_lock
> start thread
> 					lock start_lock (waits)
> acquire context
> unlock start_lock
> 					lock start_lock (returns)
> 					unlock start_lock
> aio_poll
>    poll()							<<<<
> 					event_notifier_set	<<<<
> 
> Comparing the test to QEMU, the roles of the threads are reversed. The
> test's "main thread" is QEMU's dataplane thread, and the test's
> additional thread is QEMU's main thread.
> 
> The sleep "ensures" that poll() blocks before event_notifier_set.  The
> sleep represents how QEMU's main thread acquires the context rarely,
> and not right after starting the dataplane thread.
> 
> Because this is a file descriptor source, the code behaves the same
> whether aio_poll starts before or after the event_notifier_set.  The
> test passes even if you remove the sleep.
> 
> Do you have any suggestion?  Just add a comment?

How about removing the sleep and adding a comment explaining that the
event_notifier_set() could happen either before or during poll()?

Stefan
Paolo Bonzini Feb. 8, 2016, 3:34 p.m. UTC | #6
On 05/02/2016 18:51, Stefan Hajnoczi wrote:
>> Because this is a file descriptor source, the code behaves the same
>> whether aio_poll starts before or after the event_notifier_set.  The
>> test passes even if you remove the sleep.
>> 
>> Do you have any suggestion?  Just add a comment?
> 
> How about removing the sleep and adding a comment explaining that 
> the event_notifier_set() could happen either before or during 
> poll()?

Okay.

Paolo

Patch

diff --git a/async.c b/async.c
index 29f2845..e4ef8b8 100644
--- a/async.c
+++ b/async.c
@@ -84,8 +84,8 @@  int aio_bh_poll(AioContext *ctx)
          * aio_notify again if necessary.
          */
         if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
-            /* Idle BHs and the notify BH don't count as progress */
-            if (!bh->idle && bh != ctx->notify_dummy_bh) {
+            /* Idle BHs don't count as progress */
+            if (!bh->idle) {
                 ret = 1;
             }
             bh->idle = 0;
@@ -237,7 +237,6 @@  aio_ctx_finalize(GSource     *source)
 {
     AioContext *ctx = (AioContext *) source;
 
-    qemu_bh_delete(ctx->notify_dummy_bh);
     thread_pool_free(ctx->thread_pool);
 
     qemu_mutex_lock(&ctx->bh_lock);
@@ -522,19 +521,6 @@  static void aio_timerlist_notify(void *opaque)
     aio_notify(opaque);
 }
 
-static void aio_rfifolock_cb(void *opaque)
-{
-    AioContext *ctx = opaque;
-
-    /* Kick owner thread in case they are blocked in aio_poll() */
-    qemu_bh_schedule(ctx->notify_dummy_bh);
-}
-
-static void notify_dummy_bh(void *opaque)
-{
-    /* Do nothing, we were invoked just to force the event loop to iterate */
-}
-
 static void event_notifier_dummy_cb(EventNotifier *e)
 {
 }
@@ -563,10 +549,10 @@  AioContext *aio_context_new(Error **errp)
                            event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
-    rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
+    rfifolock_init(&ctx->lock, NULL, NULL);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-    ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
+    qemu_event_init(&ctx->sync_io_event, true);
 
     return ctx;
 fail:
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index ac3e74c..22a89b2 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -103,13 +103,10 @@  a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
 acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
 sure to acquire the AioContext for aio_bh_new() if necessary.
 
-The relationship between AioContext and the block layer
--------------------------------------------------------
-The AioContext originates from the QEMU block layer because it provides a
-scoped way of running event loop iterations until all work is done.  This
-feature is used to complete all in-flight block I/O requests (see
-bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
-used by any QEMU subsystem.
+AioContext and the block layer
+------------------------------
+The AioContext originates from the QEMU block layer, even though nowadays
+AioContext is a generic event loop that can be used by any QEMU subsystem.
 
 The block layer has support for AioContext integrated.  Each BlockDriverState
 is associated with an AioContext using bdrv_set_aio_context() and
@@ -120,9 +117,17 @@  Block layer code must therefore expect to run in an IOThread and avoid using
 old APIs that implicitly use the main loop.  See the "How to program for
 IOThreads" above for information on how to do that.
 
-If main loop code such as a QMP function wishes to access a BlockDriverState it
-must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
-IOThread does not run in parallel.
+If main loop code such as a QMP function wishes to access a BlockDriverState
+it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
+that callbacks in the IOThread do not run in parallel.
+
+Code running in the monitor typically uses bdrv_drain() to ensure that
+past requests from the guest are completed.  When a block device is
+running in an IOThread, the IOThread can also process requests from
+the guest (via ioeventfd).  These requests have to be blocked with
+aio_disable_clients() before calling bdrv_drain().  You can then reenable
+guest requests with aio_enable_clients() before finally releasing the
+AioContext and completing the monitor command.
 
 Long-running jobs (usually in the form of coroutines) are best scheduled in the
 BlockDriverState's AioContext to avoid the need to acquire/release around each
diff --git a/include/block/aio.h b/include/block/aio.h
index 191bd3e..7223daf 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -114,9 +114,6 @@  struct AioContext {
     bool notified;
     EventNotifier notifier;
 
-    /* Scheduling this BH forces the event loop it iterate */
-    QEMUBH *notify_dummy_bh;
-
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
 
diff --git a/iothread.c b/iothread.c
index 1150912..14a12cd 100644
--- a/iothread.c
+++ b/iothread.c
@@ -38,7 +38,6 @@  bool aio_context_in_iothread(AioContext *ctx)
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
-    bool blocking;
 
     rcu_register_thread();
 
@@ -48,14 +47,8 @@  static void *iothread_run(void *opaque)
     qemu_cond_signal(&iothread->init_done_cond);
     qemu_mutex_unlock(&iothread->init_done_lock);
 
-    while (!iothread->stopping) {
-        aio_context_acquire(iothread->ctx);
-        blocking = true;
-        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
-            /* Progress was made, keep going */
-            blocking = false;
-        }
-        aio_context_release(iothread->ctx);
+    while (!atomic_read(&iothread->stopping)) {
+        aio_poll(iothread->ctx, true);
     }
 
     rcu_unregister_thread();
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 6ccea98..41e30f7 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -99,6 +99,7 @@  static void event_ready_cb(EventNotifier *e)
 
 typedef struct {
     QemuMutex start_lock;
+    EventNotifier notifier;
     bool thread_acquired;
 } AcquireTestData;
 
@@ -110,6 +111,8 @@  static void *test_acquire_thread(void *opaque)
     qemu_mutex_lock(&data->start_lock);
     qemu_mutex_unlock(&data->start_lock);
 
+    g_usleep(500000);
+    event_notifier_set(&data->notifier);
     aio_context_acquire(ctx);
     aio_context_release(ctx);
 
@@ -124,20 +127,19 @@  static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
     aio_set_event_notifier(ctx, notifier, false, handler);
 }
 
-static void dummy_notifier_read(EventNotifier *unused)
+static void dummy_notifier_read(EventNotifier *n)
 {
-    g_assert(false); /* should never be invoked */
+    event_notifier_test_and_clear(n);
 }
 
 static void test_acquire(void)
 {
     QemuThread thread;
-    EventNotifier notifier;
     AcquireTestData data;
 
     /* Dummy event notifier ensures aio_poll() will block */
-    event_notifier_init(&notifier, false);
-    set_event_notifier(ctx, &notifier, dummy_notifier_read);
+    event_notifier_init(&data.notifier, false);
+    set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
     g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
 
     qemu_mutex_init(&data.start_lock);
@@ -151,12 +153,13 @@  static void test_acquire(void)
     /* Block in aio_poll(), let other thread kick us and acquire context */
     aio_context_acquire(ctx);
     qemu_mutex_unlock(&data.start_lock); /* let the thread run */
-    g_assert(!aio_poll(ctx, true));
+    g_assert(aio_poll(ctx, true));
+    g_assert(!data.thread_acquired);
     aio_context_release(ctx);
 
     qemu_thread_join(&thread);
-    set_event_notifier(ctx, &notifier, NULL);
-    event_notifier_cleanup(&notifier);
+    set_event_notifier(ctx, &data.notifier, NULL);
+    event_notifier_cleanup(&data.notifier);
 
     g_assert(data.thread_acquired);
 }