[5/6] co-shared-resource: protect with a mutex

Message ID: 20210510085941.22769-6-eesposit@redhat.com (mailing list archive)
State: New, archived
Series: block-copy: make helper APIs thread safe

Commit Message

Emanuele Giuseppe Esposito May 10, 2021, 8:59 a.m. UTC
co-shared-resource is currently not thread-safe, as also reported
in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
can also be invoked from non-coroutine context.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy May 10, 2021, 11:40 a.m. UTC | #1
10.05.2021 11:59, Emanuele Giuseppe Esposito wrote:
> co-shared-resource is currently not thread-safe, as also reported
> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
> can also be invoked from non-coroutine context.

But it doesn't. It's called only from co_get_from_shres(). So, better make it a static function first.

> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>   1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/util/qemu-co-shared-resource.c b/util/qemu-co-shared-resource.c
> index 1c83cd9d29..c455d02a1e 100644
> --- a/util/qemu-co-shared-resource.c
> +++ b/util/qemu-co-shared-resource.c
> @@ -32,6 +32,7 @@ struct SharedResource {
>       uint64_t available;
>   
>       CoQueue queue;
> +    QemuMutex lock;
>   };
>   
>   SharedResource *shres_create(uint64_t total)
> @@ -40,17 +41,23 @@ SharedResource *shres_create(uint64_t total)
>   
>       s->total = s->available = total;
>       qemu_co_queue_init(&s->queue);
> +    qemu_mutex_init(&s->lock);
>   
>       return s;
>   }
>   
>   void shres_destroy(SharedResource *s)
>   {
> -    assert(s->available == s->total);
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        assert(s->available == s->total);
> +    }

shres_destroy can't be thread-safe anyway, as it does qemu_mutex_destroy() and g_free(). So let's not try to make part of shres_destroy() "thread safe".
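In other words, the destructor should just assert and tear down without taking the lock: any access racing with destroy is already a use-after-free bug, and the mutex cannot protect its own destruction. A standalone analogue of this shape (illustrative names; pthreads standing in for QemuMutex, plain malloc/free for g_free):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t total;
    uint64_t available;
    pthread_mutex_t lock;
} Shres;

static Shres *shres_create(uint64_t total)
{
    Shres *s = calloc(1, sizeof(*s));
    s->total = s->available = total;
    pthread_mutex_init(&s->lock, NULL);
    return s;
}

static void shres_destroy(Shres *s)
{
    /* No lock taken here: the caller must already have exclusive
     * access, because destroying the mutex and freeing the memory
     * cannot be protected by the very lock being destroyed. */
    assert(s->available == s->total);
    pthread_mutex_destroy(&s->lock);
    free(s);
}
```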

> +
> +    qemu_mutex_destroy(&s->lock);
>       g_free(s);
>   }
>   
> -bool co_try_get_from_shres(SharedResource *s, uint64_t n)
> +/* Called with lock held */

it should be called _locked() then.


> +static bool co_try_get_from_shres_unlocked(SharedResource *s, uint64_t n)
>   {
>       if (s->available >= n) {
>           s->available -= n;
> @@ -60,16 +67,27 @@ bool co_try_get_from_shres(SharedResource *s, uint64_t n)
>       return false;
>   }
>   
> +bool co_try_get_from_shres(SharedResource *s, uint64_t n)
> +{
> +    bool res;
> +    QEMU_LOCK_GUARD(&s->lock);
> +    res = co_try_get_from_shres_unlocked(s, n);
> +
> +    return res;
> +}
> +
>   void coroutine_fn co_get_from_shres(SharedResource *s, uint64_t n)
>   {
> +    QEMU_LOCK_GUARD(&s->lock);
>       assert(n <= s->total);
> -    while (!co_try_get_from_shres(s, n)) {
> -        qemu_co_queue_wait(&s->queue, NULL);
> +    while (!co_try_get_from_shres_unlocked(s, n)) {
> +        qemu_co_queue_wait(&s->queue, &s->lock);
>       }
>   }
>   
>   void coroutine_fn co_put_to_shres(SharedResource *s, uint64_t n)
>   {
> +    QEMU_LOCK_GUARD(&s->lock);
>       assert(s->total - s->available >= n);
>       s->available += n;
>       qemu_co_queue_restart_all(&s->queue);
>
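For readers outside QEMU, the overall pattern of the patch above can be sketched with portable pthread primitives: the mutex plays QemuMutex, and a condition variable stands in for the CoQueue (qemu_co_queue_wait(&s->queue, &s->lock) likewise drops the lock while waiting and reacquires it on wakeup). The names and the harness below are illustrative, not QEMU API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Analogue of SharedResource with the lock added by this patch. */
typedef struct {
    uint64_t total;
    uint64_t available;     /* protected by lock */
    pthread_mutex_t lock;
    pthread_cond_t queue;   /* stands in for CoQueue */
} SharedResource;

static SharedResource *shres_create(uint64_t total)
{
    SharedResource *s = calloc(1, sizeof(*s));
    s->total = s->available = total;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->queue, NULL);
    return s;
}

/* Caller must hold s->lock (the "_locked" variant from the review). */
static bool try_get_locked(SharedResource *s, uint64_t n)
{
    if (s->available >= n) {
        s->available -= n;
        return true;
    }
    return false;
}

static bool try_get(SharedResource *s, uint64_t n)
{
    pthread_mutex_lock(&s->lock);
    bool res = try_get_locked(s, n);
    pthread_mutex_unlock(&s->lock);
    return res;
}

static void get(SharedResource *s, uint64_t n)
{
    pthread_mutex_lock(&s->lock);
    while (!try_get_locked(s, n)) {
        /* Atomically releases the lock while waiting and reacquires
         * it on wakeup, like qemu_co_queue_wait(&s->queue, &s->lock). */
        pthread_cond_wait(&s->queue, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}

static void put(SharedResource *s, uint64_t n)
{
    pthread_mutex_lock(&s->lock);
    s->available += n;
    pthread_cond_broadcast(&s->queue);
    pthread_mutex_unlock(&s->lock);
}
```

This is only a sketch of the locking structure; coroutine queues differ from condition variables in that waiters are coroutines, not blocked threads.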
Paolo Bonzini May 11, 2021, 8:34 a.m. UTC | #2
On 10/05/21 13:40, Vladimir Sementsov-Ogievskiy wrote:
> 
>> co-shared-resource is currently not thread-safe, as also reported
>> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
>> can also be invoked from non-coroutine context.
> 
> But it doesn't. It's called only from co_get_from_shres(). So, better 
> make it a static function first.

It's a sensible interface though.  It lets you sleep or retry in your 
own way if you cannot get the resources, so (apart from the 
unlocked/locked confusion in the names) I like keeping it in the public API.

Paolo
Stefan Hajnoczi May 12, 2021, 3:44 p.m. UTC | #3
On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe Esposito wrote:
> co-shared-resource is currently not thread-safe, as also reported
> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
> can also be invoked from non-coroutine context.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)

Hmm...this thread-safety change is more fine-grained than I was
expecting. If we follow this strategy basically any data structure used
by coroutines needs its own fine-grained lock (like Java's Object base
class which has its own lock).

I'm not sure I like it since callers may still need coarser grained
locks to protect their own state or synchronize access to multiple
items of data. Also, some callers may not need thread-safety.

Can the caller be responsible for locking instead (e.g. using
CoMutex)?

> diff --git a/util/qemu-co-shared-resource.c b/util/qemu-co-shared-resource.c
> index 1c83cd9d29..c455d02a1e 100644
> --- a/util/qemu-co-shared-resource.c
> +++ b/util/qemu-co-shared-resource.c
> @@ -32,6 +32,7 @@ struct SharedResource {
>      uint64_t available;
>  
>      CoQueue queue;
> +    QemuMutex lock;

Please add a comment indicating what this lock protects.

Thread safety should also be documented in the header file so API users
know what to expect.
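Stefan's alternative — an unlocked resource guarded by the caller — would look roughly like the following. All names here are hypothetical, and a plain pthread mutex stands in for a CoMutex so the sketch is standalone:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Unlocked resource: no internal mutex, no thread-safety guarantees. */
typedef struct {
    uint64_t total;
    uint64_t available;
} UnlockedShres;

static bool shres_try_get(UnlockedShres *s, uint64_t n)
{
    if (s->available >= n) {
        s->available -= n;
        return true;
    }
    return false;
}

/* The caller owns the lock and can protect *several* structures with
 * it at once -- the coarser granularity Stefan is arguing for. */
typedef struct {
    pthread_mutex_t lock;       /* stands in for a CoMutex */
    UnlockedShres mem;
    uint64_t in_flight_bytes;   /* other caller state, same lock */
} CallerState;

static bool caller_start_task(CallerState *c, uint64_t bytes)
{
    bool ok;
    pthread_mutex_lock(&c->lock);
    ok = shres_try_get(&c->mem, bytes);
    if (ok) {
        c->in_flight_bytes += bytes;  /* updated under the same lock */
    }
    pthread_mutex_unlock(&c->lock);
    return ok;
}
```

The design trade-off: one caller-level lock covers the resource plus related state in a single critical section, at the cost of every caller having to remember to lock.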
Paolo Bonzini May 12, 2021, 6:30 p.m. UTC | #4
On 12/05/21 17:44, Stefan Hajnoczi wrote:
> If we follow this strategy basically any data structure used
> by coroutines needs its own fine-grained lock (like Java's Object base
> class which has its own lock).

Maybe not all, but only those that use CoQueue?  Interestingly, I was a 
bit less okay with ProgressMeter and didn't think twice about the other two.

Paolo
Emanuele Giuseppe Esposito May 14, 2021, 2:10 p.m. UTC | #5
On 12/05/2021 17:44, Stefan Hajnoczi wrote:
> On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe Esposito wrote:
>> co-shared-resource is currently not thread-safe, as also reported
>> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
>> can also be invoked from non-coroutine context.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>>   1 file changed, 22 insertions(+), 4 deletions(-)
> 
> Hmm...this thread-safety change is more fine-grained than I was
> expecting. If we follow this strategy basically any data structure used
> by coroutines needs its own fine-grained lock (like Java's Object base
> class which has its own lock).
> 
> I'm not sure I like it since callers may still need coarser grained
> locks to protect their own state or synchronize access to multiple
> items of data. Also, some callers may not need thread-safety.
> 
> Can the caller be responsible for locking instead (e.g. using
> CoMutex)?

Right now co-shared-resource is being used only by block-copy, so I 
guess locking it from the caller or within the API won't really matter 
in this case.

One possible idea on how to delegate this to the caller without adding 
additional small lock/unlock in block-copy is to move co_get_from_shres 
in block_copy_task_end, and calling it only when a boolean passed to 
block_copy_task_end is true.

Otherwise, make b_c_task_end always call co_put_to_shres and then 
include co_get_from_shres in block_copy_task_create, so that we always 
acquire the shared resource on task creation and release it on task end 
(including the error path).

Something like:

diff --git a/block/block-copy.c b/block/block-copy.c
index 3a447a7c3d..1e4914b0cb 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask 
*block_copy_task_create(BlockCopyState *s,
      /* region is dirty, so no existent tasks possible in it */
      assert(!find_conflicting_task(s, offset, bytes));
      QLIST_INSERT_HEAD(&s->tasks, task, list);
+    co_get_from_shres(s->mem, task->bytes);
      qemu_co_mutex_unlock(&s->tasks_lock);

      return task;
@@ -269,6 +270,7 @@ static void coroutine_fn 
block_copy_task_end(BlockCopyTask *task, int ret)
          bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, 
task->bytes);
      }
      qemu_co_mutex_lock(&task->s->tasks_lock);
+    co_put_to_shres(task->s->mem, task->bytes);
      task->s->in_flight_bytes -= task->bytes;
      QLIST_REMOVE(task, list);
      progress_set_remaining(task->s->progress,
@@ -379,7 +381,6 @@ static coroutine_fn int 
block_copy_task_run(AioTaskPool *pool,

      aio_task_pool_wait_slot(pool);
      if (aio_task_pool_status(pool) < 0) {
-        co_put_to_shres(task->s->mem, task->bytes);
          block_copy_task_end(task, -ECANCELED);
          g_free(task);
          return -ECANCELED;
@@ -498,7 +499,6 @@ static coroutine_fn int 
block_copy_task_entry(AioTask *task)
      }
      qemu_mutex_unlock(&t->s->calls_lock);

-    co_put_to_shres(t->s->mem, t->bytes);
      block_copy_task_end(t, ret);

      return ret;
@@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState 
*call_state)

          trace_block_copy_process(s, task->offset);

-        co_get_from_shres(s->mem, task->bytes);
-
          offset = task_end(task);
          bytes = end - offset;




> 
>> diff --git a/util/qemu-co-shared-resource.c b/util/qemu-co-shared-resource.c
>> index 1c83cd9d29..c455d02a1e 100644
>> --- a/util/qemu-co-shared-resource.c
>> +++ b/util/qemu-co-shared-resource.c
>> @@ -32,6 +32,7 @@ struct SharedResource {
>>       uint64_t available;
>>   
>>       CoQueue queue;
>> +    QemuMutex lock;
> 
> Please add a comment indicating what this lock protects.
> 
> Thread safety should also be documented in the header file so API users
> know what to expect.

Will do, thanks.

Emanuele
Vladimir Sementsov-Ogievskiy May 14, 2021, 2:26 p.m. UTC | #6
14.05.2021 17:10, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 12/05/2021 17:44, Stefan Hajnoczi wrote:
>> On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe Esposito wrote:
>>> co-shared-resource is currently not thread-safe, as also reported
>>> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
>>> can also be invoked from non-coroutine context.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> ---
>>>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>>>   1 file changed, 22 insertions(+), 4 deletions(-)
>>
>> Hmm...this thread-safety change is more fine-grained than I was
>> expecting. If we follow this strategy basically any data structure used
>> by coroutines needs its own fine-grained lock (like Java's Object base
>> class which has its own lock).
>>
>> I'm not sure I like it since callers may still need coarser grained
>> locks to protect their own state or synchronize access to multiple
>> items of data. Also, some callers may not need thread-safety.
>>
>> Can the caller be responsible for locking instead (e.g. using
>> CoMutex)?
> 
> Right now co-shared-resource is being used only by block-copy, so I guess locking it from the caller or within the API won't really matter in this case.
> 
> One possible idea on how to delegate this to the caller without adding additional small lock/unlock in block-copy is to move co_get_from_shres in block_copy_task_end, and calling it only when a boolean passed to block_copy_task_end is true.
> 
> Otherwise, make b_c_task_end always call co_put_to_shres and then include co_get_from_shres in block_copy_task_create, so that we always acquire the shared resource on task creation and release it on task end (including the error path).
> 
> Something like:
> 
> diff --git a/block/block-copy.c b/block/block-copy.c
> index 3a447a7c3d..1e4914b0cb 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>       /* region is dirty, so no existent tasks possible in it */
>       assert(!find_conflicting_task(s, offset, bytes));
>       QLIST_INSERT_HEAD(&s->tasks, task, list);
> +    co_get_from_shres(s->mem, task->bytes);
>       qemu_co_mutex_unlock(&s->tasks_lock);
> 
>       return task;
> @@ -269,6 +270,7 @@ static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
>       }
>       qemu_co_mutex_lock(&task->s->tasks_lock);
> +    co_put_to_shres(task->s->mem, task->bytes);
>       task->s->in_flight_bytes -= task->bytes;
>       QLIST_REMOVE(task, list);
>       progress_set_remaining(task->s->progress,
> @@ -379,7 +381,6 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
> 
>       aio_task_pool_wait_slot(pool);
>       if (aio_task_pool_status(pool) < 0) {
> -        co_put_to_shres(task->s->mem, task->bytes);
>           block_copy_task_end(task, -ECANCELED);
>           g_free(task);
>           return -ECANCELED;
> @@ -498,7 +499,6 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
>       }
>       qemu_mutex_unlock(&t->s->calls_lock);
> 
> -    co_put_to_shres(t->s->mem, t->bytes);
>       block_copy_task_end(t, ret);
> 
>       return ret;
> @@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
> 
>           trace_block_copy_process(s, task->offset);
> 
> -        co_get_from_shres(s->mem, task->bytes);

we want to get from shres here, after possible call to block_copy_task_shrink(), as task->bytes may be reduced.

> -
>           offset = task_end(task);
>           bytes = end - offset;
> 
> 
> 
> 
>>
>>> diff --git a/util/qemu-co-shared-resource.c b/util/qemu-co-shared-resource.c
>>> index 1c83cd9d29..c455d02a1e 100644
>>> --- a/util/qemu-co-shared-resource.c
>>> +++ b/util/qemu-co-shared-resource.c
>>> @@ -32,6 +32,7 @@ struct SharedResource {
>>>       uint64_t available;
>>>       CoQueue queue;
>>> +    QemuMutex lock;
>>
>> Please add a comment indicating what this lock protects.
>>
>> Thread safety should also be documented in the header file so API users
>> know what to expect.
> 
> Will do, thanks.
> 
> Emanuele
>
Emanuele Giuseppe Esposito May 14, 2021, 2:32 p.m. UTC | #7
On 14/05/2021 16:26, Vladimir Sementsov-Ogievskiy wrote:
> 14.05.2021 17:10, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 12/05/2021 17:44, Stefan Hajnoczi wrote:
>>> On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe Esposito 
>>> wrote:
>>>> co-shared-resource is currently not thread-safe, as also reported
>>>> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
>>>> can also be invoked from non-coroutine context.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>> ---
>>>>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>>>>   1 file changed, 22 insertions(+), 4 deletions(-)
>>>
>>> Hmm...this thread-safety change is more fine-grained than I was
>>> expecting. If we follow this strategy basically any data structure used
>>> by coroutines needs its own fine-grained lock (like Java's Object base
>>> class which has its own lock).
>>>
>>> I'm not sure I like it since callers may still need coarser grained
>>> locks to protect their own state or synchronize access to multiple
>>> items of data. Also, some callers may not need thread-safety.
>>>
>>> Can the caller be responsible for locking instead (e.g. using
>>> CoMutex)?
>>
>> Right now co-shared-resource is being used only by block-copy, so I 
>> guess locking it from the caller or within the API won't really matter 
>> in this case.
>>
>> One possible idea on how to delegate this to the caller without adding 
>> additional small lock/unlock in block-copy is to move 
>> co_get_from_shres in block_copy_task_end, and calling it only when a 
>> boolean passed to block_copy_task_end is true.
>>
>> Otherwise, make b_c_task_end always call co_put_to_shres and then 
>> include co_get_from_shres in block_copy_task_create, so that we always 
>> acquire the shared resource on task creation and release it on task 
>> end (including the error path).
>>
>> Something like:
>>
>> diff --git a/block/block-copy.c b/block/block-copy.c
>> index 3a447a7c3d..1e4914b0cb 100644
>> --- a/block/block-copy.c
>> +++ b/block/block-copy.c
>> @@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask 
>> *block_copy_task_create(BlockCopyState *s,
>>       /* region is dirty, so no existent tasks possible in it */
>>       assert(!find_conflicting_task(s, offset, bytes));
>>       QLIST_INSERT_HEAD(&s->tasks, task, list);
>> +    co_get_from_shres(s->mem, task->bytes);
>>       qemu_co_mutex_unlock(&s->tasks_lock);
>>
>>       return task;
>> @@ -269,6 +270,7 @@ static void coroutine_fn 
>> block_copy_task_end(BlockCopyTask *task, int ret)
>>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, 
>> task->bytes);
>>       }
>>       qemu_co_mutex_lock(&task->s->tasks_lock);
>> +    co_put_to_shres(task->s->mem, task->bytes);
>>       task->s->in_flight_bytes -= task->bytes;
>>       QLIST_REMOVE(task, list);
>>       progress_set_remaining(task->s->progress,
>> @@ -379,7 +381,6 @@ static coroutine_fn int 
>> block_copy_task_run(AioTaskPool *pool,
>>
>>       aio_task_pool_wait_slot(pool);
>>       if (aio_task_pool_status(pool) < 0) {
>> -        co_put_to_shres(task->s->mem, task->bytes);
>>           block_copy_task_end(task, -ECANCELED);
>>           g_free(task);
>>           return -ECANCELED;
>> @@ -498,7 +499,6 @@ static coroutine_fn int 
>> block_copy_task_entry(AioTask *task)
>>       }
>>       qemu_mutex_unlock(&t->s->calls_lock);
>>
>> -    co_put_to_shres(t->s->mem, t->bytes);
>>       block_copy_task_end(t, ret);
>>
>>       return ret;
>> @@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState 
>> *call_state)
>>
>>           trace_block_copy_process(s, task->offset);
>>
>> -        co_get_from_shres(s->mem, task->bytes);
> 
> we want to get from shres here, after possible call to 
> block_copy_task_shrink(), as task->bytes may be reduced.

Ah right, I missed that. So I guess if we want the caller to protect 
co-shared-resource, get_from_shres stays where it is, and put_ instead 
can still go into task_end (with a boolean enabling it).

Thank you
Emanuele
> 
>> -
>>           offset = task_end(task);
>>           bytes = end - offset;
>>
>>
>>
>>
>>>
>>>> diff --git a/util/qemu-co-shared-resource.c 
>>>> b/util/qemu-co-shared-resource.c
>>>> index 1c83cd9d29..c455d02a1e 100644
>>>> --- a/util/qemu-co-shared-resource.c
>>>> +++ b/util/qemu-co-shared-resource.c
>>>> @@ -32,6 +32,7 @@ struct SharedResource {
>>>>       uint64_t available;
>>>>       CoQueue queue;
>>>> +    QemuMutex lock;
>>>
>>> Please add a comment indicating what this lock protects.
>>>
>>> Thread safety should also be documented in the header file so API users
>>> know what to expect.
>>
>> Will do, thanks.
>>
>> Emanuele
>>
> 
>
Vladimir Sementsov-Ogievskiy May 14, 2021, 3:30 p.m. UTC | #8
14.05.2021 17:32, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 14/05/2021 16:26, Vladimir Sementsov-Ogievskiy wrote:
>> 14.05.2021 17:10, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> On 12/05/2021 17:44, Stefan Hajnoczi wrote:
>>>> On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe Esposito wrote:
>>>>> co-shared-resource is currently not thread-safe, as also reported
>>>>> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
>>>>> can also be invoked from non-coroutine context.
>>>>>
>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>> ---
>>>>>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>>>>>   1 file changed, 22 insertions(+), 4 deletions(-)
>>>>
>>>> Hmm...this thread-safety change is more fine-grained than I was
>>>> expecting. If we follow this strategy basically any data structure used
>>>> by coroutines needs its own fine-grained lock (like Java's Object base
>>>> class which has its own lock).
>>>>
>>>> I'm not sure I like it since callers may still need coarser grained
>>>> locks to protect their own state or synchronize access to multiple
>>>> items of data. Also, some callers may not need thread-safety.
>>>>
>>>> Can the caller be responsible for locking instead (e.g. using
>>>> CoMutex)?
>>>
>>> Right now co-shared-resource is being used only by block-copy, so I guess locking it from the caller or within the API won't really matter in this case.
>>>
>>> One possible idea on how to delegate this to the caller without adding additional small lock/unlock in block-copy is to move co_get_from_shres in block_copy_task_end, and calling it only when a boolean passed to block_copy_task_end is true.
>>>
>>> Otherwise, make b_c_task_end always call co_put_to_shres and then include co_get_from_shres in block_copy_task_create, so that we always acquire the shared resource on task creation and release it on task end (including the error path).
>>>
>>> Something like:
>>>
>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>> index 3a447a7c3d..1e4914b0cb 100644
>>> --- a/block/block-copy.c
>>> +++ b/block/block-copy.c
>>> @@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>>>       /* region is dirty, so no existent tasks possible in it */
>>>       assert(!find_conflicting_task(s, offset, bytes));
>>>       QLIST_INSERT_HEAD(&s->tasks, task, list);
>>> +    co_get_from_shres(s->mem, task->bytes);
>>>       qemu_co_mutex_unlock(&s->tasks_lock);
>>>
>>>       return task;
>>> @@ -269,6 +270,7 @@ static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
>>>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
>>>       }
>>>       qemu_co_mutex_lock(&task->s->tasks_lock);
>>> +    co_put_to_shres(task->s->mem, task->bytes);
>>>       task->s->in_flight_bytes -= task->bytes;
>>>       QLIST_REMOVE(task, list);
>>>       progress_set_remaining(task->s->progress,
>>> @@ -379,7 +381,6 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
>>>
>>>       aio_task_pool_wait_slot(pool);
>>>       if (aio_task_pool_status(pool) < 0) {
>>> -        co_put_to_shres(task->s->mem, task->bytes);
>>>           block_copy_task_end(task, -ECANCELED);
>>>           g_free(task);
>>>           return -ECANCELED;
>>> @@ -498,7 +499,6 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
>>>       }
>>>       qemu_mutex_unlock(&t->s->calls_lock);
>>>
>>> -    co_put_to_shres(t->s->mem, t->bytes);
>>>       block_copy_task_end(t, ret);
>>>
>>>       return ret;
>>> @@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
>>>
>>>           trace_block_copy_process(s, task->offset);
>>>
>>> -        co_get_from_shres(s->mem, task->bytes);
>>
>> we want to get from shres here, after possible call to block_copy_task_shrink(), as task->bytes may be reduced.
> 
> Ah right, I missed that. So I guess if we want the caller to protect co-shared-resource, get_from_shres stays where it is, and put_ instead can still go into task_end (with a boolean enabling it).

honestly, I don't follow how it helps thread-safety

>>
>>> -
>>>           offset = task_end(task);
>>>           bytes = end - offset;
>>>
>>>
>>>
>>>
>>>>
>>>>> diff --git a/util/qemu-co-shared-resource.c b/util/qemu-co-shared-resource.c
>>>>> index 1c83cd9d29..c455d02a1e 100644
>>>>> --- a/util/qemu-co-shared-resource.c
>>>>> +++ b/util/qemu-co-shared-resource.c
>>>>> @@ -32,6 +32,7 @@ struct SharedResource {
>>>>>       uint64_t available;
>>>>>       CoQueue queue;
>>>>> +    QemuMutex lock;
>>>>
>>>> Please add a comment indicating what this lock protects.
>>>>
>>>> Thread safety should also be documented in the header file so API users
>>>> know what to expect.
>>>
>>> Will do, thanks.
>>>
>>> Emanuele
>>>
>>
>>
>
Emanuele Giuseppe Esposito May 14, 2021, 5:28 p.m. UTC | #9
On 14/05/2021 17:30, Vladimir Sementsov-Ogievskiy wrote:
> 14.05.2021 17:32, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 14/05/2021 16:26, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.05.2021 17:10, Emanuele Giuseppe Esposito wrote:
>>>>
>>>>
>>>> On 12/05/2021 17:44, Stefan Hajnoczi wrote:
>>>>> On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe 
>>>>> Esposito wrote:
>>>>>> co-shared-resource is currently not thread-safe, as also reported
>>>>>> in co-shared-resource.h. Add a QemuMutex because 
>>>>>> co_try_get_from_shres
>>>>>> can also be invoked from non-coroutine context.
>>>>>>
>>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>>> ---
>>>>>>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>>>>>>   1 file changed, 22 insertions(+), 4 deletions(-)
>>>>>
>>>>> Hmm...this thread-safety change is more fine-grained than I was
>>>>> expecting. If we follow this strategy basically any data structure 
>>>>> used
>>>>> by coroutines needs its own fine-grained lock (like Java's Object base
>>>>> class which has its own lock).
>>>>>
>>>>> I'm not sure I like it since callers may still need coarser grained
>>>>> locks to protect their own state or synchronize access to multiple
>>>>> items of data. Also, some callers may not need thread-safety.
>>>>>
>>>>> Can the caller be responsible for locking instead (e.g. using
>>>>> CoMutex)?
>>>>
>>>> Right now co-shared-resource is being used only by block-copy, so I 
>>>> guess locking it from the caller or within the API won't really 
>>>> matter in this case.
>>>>
>>>> One possible idea on how to delegate this to the caller without 
>>>> adding additional small lock/unlock in block-copy is to move 
>>>> co_get_from_shres in block_copy_task_end, and calling it only when a 
>>>> boolean passed to block_copy_task_end is true.
>>>>
>>>> Otherwise, make b_c_task_end always call co_put_to_shres and then 
>>>> include co_get_from_shres in block_copy_task_create, so that we 
>>>> always acquire the shared resource on task creation and release it 
>>>> on task end (including the error path).
>>>>
>>>> Something like:
>>>>
>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>> index 3a447a7c3d..1e4914b0cb 100644
>>>> --- a/block/block-copy.c
>>>> +++ b/block/block-copy.c
>>>> @@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask 
>>>> *block_copy_task_create(BlockCopyState *s,
>>>>       /* region is dirty, so no existent tasks possible in it */
>>>>       assert(!find_conflicting_task(s, offset, bytes));
>>>>       QLIST_INSERT_HEAD(&s->tasks, task, list);
>>>> +    co_get_from_shres(s->mem, task->bytes);
>>>>       qemu_co_mutex_unlock(&s->tasks_lock);
>>>>
>>>>       return task;
>>>> @@ -269,6 +270,7 @@ static void coroutine_fn 
>>>> block_copy_task_end(BlockCopyTask *task, int ret)
>>>>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, 
>>>> task->bytes);
>>>>       }
>>>>       qemu_co_mutex_lock(&task->s->tasks_lock);
>>>> +    co_put_to_shres(task->s->mem, task->bytes);
>>>>       task->s->in_flight_bytes -= task->bytes;
>>>>       QLIST_REMOVE(task, list);
>>>>       progress_set_remaining(task->s->progress,
>>>> @@ -379,7 +381,6 @@ static coroutine_fn int 
>>>> block_copy_task_run(AioTaskPool *pool,
>>>>
>>>>       aio_task_pool_wait_slot(pool);
>>>>       if (aio_task_pool_status(pool) < 0) {
>>>> -        co_put_to_shres(task->s->mem, task->bytes);
>>>>           block_copy_task_end(task, -ECANCELED);
>>>>           g_free(task);
>>>>           return -ECANCELED;
>>>> @@ -498,7 +499,6 @@ static coroutine_fn int 
>>>> block_copy_task_entry(AioTask *task)
>>>>       }
>>>>       qemu_mutex_unlock(&t->s->calls_lock);
>>>>
>>>> -    co_put_to_shres(t->s->mem, t->bytes);
>>>>       block_copy_task_end(t, ret);
>>>>
>>>>       return ret;
>>>> @@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState 
>>>> *call_state)
>>>>
>>>>           trace_block_copy_process(s, task->offset);
>>>>
>>>> -        co_get_from_shres(s->mem, task->bytes);
>>>
>>> we want to get from shres here, after possible call to 
>>> block_copy_task_shrink(), as task->bytes may be reduced.
>>
>> Ah right, I missed that. So I guess if we want the caller to protect 
>> co-shared-resource, get_from_shres stays where it is, and put_ instead 
>> can still go into task_end (with a boolean enabling it).
> 
> honestly, I don't follow how it helps thread-safety

From my understanding, the whole point here is to have no lock in 
co-shared-resource but let the caller take care of it (block-copy).

The above was just an idea on how to do it.

> 
>>>
>>>> -
>>>>           offset = task_end(task);
>>>>           bytes = end - offset;
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>> diff --git a/util/qemu-co-shared-resource.c 
>>>>>> b/util/qemu-co-shared-resource.c
>>>>>> index 1c83cd9d29..c455d02a1e 100644
>>>>>> --- a/util/qemu-co-shared-resource.c
>>>>>> +++ b/util/qemu-co-shared-resource.c
>>>>>> @@ -32,6 +32,7 @@ struct SharedResource {
>>>>>>       uint64_t available;
>>>>>>       CoQueue queue;
>>>>>> +    QemuMutex lock;
>>>>>
>>>>> Please add a comment indicating what this lock protects.
>>>>>
>>>>> Thread safety should also be documented in the header file so API 
>>>>> users
>>>>> know what to expect.
>>>>
>>>> Will do, thanks.
>>>>
>>>> Emanuele
>>>>
>>>
>>>
>>
> 
>
Paolo Bonzini May 14, 2021, 5:55 p.m. UTC | #10
Il ven 14 mag 2021, 16:10 Emanuele Giuseppe Esposito <eesposit@redhat.com>
ha scritto:

> > I'm not sure I like it since callers may still need coarser grained
> > locks to protect their own state or synchronize access to multiple
> > items of data. Also, some callers may not need thread-safety.
> >
> > Can the caller be responsible for locking instead (e.g. using
> > CoMutex)?
>
> Right now co-shared-resource is being used only by block-copy, so I
> guess locking it from the caller or within the API won't really matter
> in this case.
>
> One possible idea on how to delegate this to the caller without adding
> additional small lock/unlock in block-copy is to move co_get_from_shres
> in block_copy_task_end, and calling it only when a boolean passed to
> block_copy_task_end is true.
>

The patch below won't work because qemu_co_queue_wait would have to unlock
the CoMutex; therefore you would have to pass it as an additional argument
to co_get_from_shres.

Overall, neither co_get_from_shres nor AioTaskPool should be fast paths, so 
using a local lock seems to produce the simplest API.

Paolo
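Paolo's objection rests on a classic property of condition waits: the wait primitive must atomically release the lock the caller holds, which is why qemu_co_queue_wait() takes the mutex as an argument — a caller-supplied CoMutex would have to be threaded through to the wait. A standalone sketch of the shape (hypothetical names; pthreads instead of QEMU primitives):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    uint64_t available;
    pthread_mutex_t lock;
    pthread_cond_t queue;
} Shres;

/* WRONG shape: waiting while still holding the lock deadlocks,
 * because the thread calling shres_put() can never acquire the lock
 * to replenish the resource and wake the waiter:
 *
 *     pthread_mutex_lock(&s->lock);
 *     while (s->available < n) {
 *         ;  // spins forever: put() cannot take the lock
 *     }
 *
 * RIGHT shape: the wait primitive receives the lock and releases it
 * atomically while sleeping, exactly like
 * qemu_co_queue_wait(&s->queue, &s->lock). */
static void shres_get(Shres *s, uint64_t n)
{
    pthread_mutex_lock(&s->lock);
    while (s->available < n) {
        pthread_cond_wait(&s->queue, &s->lock); /* drops lock, reacquires */
    }
    s->available -= n;
    pthread_mutex_unlock(&s->lock);
}

static void shres_put(Shres *s, uint64_t n)
{
    pthread_mutex_lock(&s->lock);
    s->available += n;
    pthread_cond_broadcast(&s->queue);
    pthread_mutex_unlock(&s->lock);
}
```

So keeping the lock inside the resource, next to its wait queue, avoids widening the public API with an extra mutex parameter.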


> Otherwise, make b_c_task_end always call co_put_to_shres and then
> include co_get_from_shres in block_copy_task_create, so that we always
> acquire the shared resource on task creation and release it on task end
> (including the error path).
>
> Something like:
>
> diff --git a/block/block-copy.c b/block/block-copy.c
> index 3a447a7c3d..1e4914b0cb 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask
> *block_copy_task_create(BlockCopyState *s,
>       /* region is dirty, so no existent tasks possible in it */
>       assert(!find_conflicting_task(s, offset, bytes));
>       QLIST_INSERT_HEAD(&s->tasks, task, list);
> +    co_get_from_shres(s->mem, task->bytes);
>       qemu_co_mutex_unlock(&s->tasks_lock);
>
>       return task;
> @@ -269,6 +270,7 @@ static void coroutine_fn
> block_copy_task_end(BlockCopyTask *task, int ret)
>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset,
> task->bytes);
>       }
>       qemu_co_mutex_lock(&task->s->tasks_lock);
> +    co_put_to_shres(task->s->mem, task->bytes);
>       task->s->in_flight_bytes -= task->bytes;
>       QLIST_REMOVE(task, list);
>       progress_set_remaining(task->s->progress,
> @@ -379,7 +381,6 @@ static coroutine_fn int
> block_copy_task_run(AioTaskPool *pool,
>
>       aio_task_pool_wait_slot(pool);
>       if (aio_task_pool_status(pool) < 0) {
> -        co_put_to_shres(task->s->mem, task->bytes);
>           block_copy_task_end(task, -ECANCELED);
>           g_free(task);
>           return -ECANCELED;
> @@ -498,7 +499,6 @@ static coroutine_fn int
> block_copy_task_entry(AioTask *task)
>       }
>       qemu_mutex_unlock(&t->s->calls_lock);
>
> -    co_put_to_shres(t->s->mem, t->bytes);
>       block_copy_task_end(t, ret);
>
>       return ret;
> @@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState
> *call_state)
>
>           trace_block_copy_process(s, task->offset);
>
> -        co_get_from_shres(s->mem, task->bytes);
> -
>           offset = task_end(task);
>           bytes = end - offset;
>
>
>
>
> >
> >> diff --git a/util/qemu-co-shared-resource.c
> b/util/qemu-co-shared-resource.c
> >> index 1c83cd9d29..c455d02a1e 100644
> >> --- a/util/qemu-co-shared-resource.c
> >> +++ b/util/qemu-co-shared-resource.c
> >> @@ -32,6 +32,7 @@ struct SharedResource {
> >>       uint64_t available;
> >>
> >>       CoQueue queue;
> >> +    QemuMutex lock;
> >
> > Please add a comment indicating what this lock protects.
> >
> > Thread safety should also be documented in the header file so API users
> > know what to expect.
>
> Will do, thanks.
>
> Emanuele
>
>
Vladimir Sementsov-Ogievskiy May 14, 2021, 9:15 p.m. UTC | #11
14.05.2021 20:28, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 14/05/2021 17:30, Vladimir Sementsov-Ogievskiy wrote:
>> 14.05.2021 17:32, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> On 14/05/2021 16:26, Vladimir Sementsov-Ogievskiy wrote:
>>>> 14.05.2021 17:10, Emanuele Giuseppe Esposito wrote:
>>>>>
>>>>>
>>>>> On 12/05/2021 17:44, Stefan Hajnoczi wrote:
>>>>>> On Mon, May 10, 2021 at 10:59:40AM +0200, Emanuele Giuseppe Esposito wrote:
>>>>>>> co-shared-resource is currently not thread-safe, as also reported
>>>>>>> in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
>>>>>>> can also be invoked from non-coroutine context.
>>>>>>>
>>>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>>>> ---
>>>>>>>   util/qemu-co-shared-resource.c | 26 ++++++++++++++++++++++----
>>>>>>>   1 file changed, 22 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> Hmm...this thread-safety change is more fine-grained than I was
>>>>>> expecting. If we follow this strategy basically any data structure used
>>>>>> by coroutines needs its own fine-grained lock (like Java's Object base
>>>>>> class which has its own lock).
>>>>>>
>>>>>> I'm not sure I like it since callers may still need coarser grained
>>>>>> locks to protect their own state or synchronize access to multiple
>>>>>> items of data. Also, some callers may not need thread-safety.
>>>>>>
>>>>>> Can the caller to be responsible for locking instead (e.g. using
>>>>>> CoMutex)?
>>>>>
>>>>> Right now co-shared-resource is being used only by block-copy, so I guess locking it from the caller or within the API won't really matter in this case.
>>>>>
>>>>> One possible idea on how to delegate this to the caller without adding additional small lock/unlock in block-copy is to move co_get_from_shres in block_copy_task_end, and calling it only when a boolean passed to block_copy_task_end is true.
>>>>>
>>>>> Otherwise make b_c_task_end always call co_get_from_shres and then include co_get_from_shres in block_copy_task_create, so that we always add and in case remove (if error) in the shared resource.
>>>>>
>>>>> Something like:
>>>>>
>>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>>> index 3a447a7c3d..1e4914b0cb 100644
>>>>> --- a/block/block-copy.c
>>>>> +++ b/block/block-copy.c
>>>>> @@ -233,6 +233,7 @@ static coroutine_fn BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>>>>>       /* region is dirty, so no existent tasks possible in it */
>>>>>       assert(!find_conflicting_task(s, offset, bytes));
>>>>>       QLIST_INSERT_HEAD(&s->tasks, task, list);
>>>>> +    co_get_from_shres(s->mem, task->bytes);
>>>>>       qemu_co_mutex_unlock(&s->tasks_lock);
>>>>>
>>>>>       return task;
>>>>> @@ -269,6 +270,7 @@ static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
>>>>>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
>>>>>       }
>>>>>       qemu_co_mutex_lock(&task->s->tasks_lock);
>>>>> +    co_put_to_shres(task->s->mem, task->bytes);
>>>>>       task->s->in_flight_bytes -= task->bytes;
>>>>>       QLIST_REMOVE(task, list);
>>>>>       progress_set_remaining(task->s->progress,
>>>>> @@ -379,7 +381,6 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
>>>>>
>>>>>       aio_task_pool_wait_slot(pool);
>>>>>       if (aio_task_pool_status(pool) < 0) {
>>>>> -        co_put_to_shres(task->s->mem, task->bytes);
>>>>>           block_copy_task_end(task, -ECANCELED);
>>>>>           g_free(task);
>>>>>           return -ECANCELED;
>>>>> @@ -498,7 +499,6 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
>>>>>       }
>>>>>       qemu_mutex_unlock(&t->s->calls_lock);
>>>>>
>>>>> -    co_put_to_shres(t->s->mem, t->bytes);
>>>>>       block_copy_task_end(t, ret);
>>>>>
>>>>>       return ret;
>>>>> @@ -687,8 +687,6 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
>>>>>
>>>>>           trace_block_copy_process(s, task->offset);
>>>>>
>>>>> -        co_get_from_shres(s->mem, task->bytes);
>>>>
>>>> we want to get from shres here, after possible call to block_copy_task_shrink(), as task->bytes may be reduced.
>>>
>>> Ah right, I missed that. So I guess if we want the caller to protect co-shared-resource, get_from_shres stays where it is, and put_ instead can still go into task_end (with a boolean enabling it).
>>
>> honestly, I don't follow how it helps thread-safety
> 
>  From my understanding, the whole point here is to have no lock in co-shared-resource but let the caller take care of it (block-copy).
> 
> The above was just an idea on how to do it.

But how does moving co_put_to_shres() make it thread-safe? Nothing in block-copy is thread-safe yet...
Emanuele Giuseppe Esposito May 14, 2021, 9:53 p.m. UTC | #12
>>>>> we want to get from shres here, after possible call to 
>>>>> block_copy_task_shrink(), as task->bytes may be reduced.
>>>>
>>>> Ah right, I missed that. So I guess if we want the caller to protect 
>>>> co-shared-resource, get_from_shres stays where it is, and put_ 
>>>> instead can still go into task_end (with a boolean enabling it).
>>>
>>> honestly, I don't follow how it helps thread-safety
>>
>>  From my understanding, the whole point here is to have no lock in 
>> co-shared-resource but let the caller take care of it (block-copy).
>>
>> The above was just an idea on how to do it.
> 
> But how does moving co_put_to_shres() make it thread-safe? Nothing in 
> block-copy is thread-safe yet...
> 
Sorry, this is my bad, I did not explain it properly. If you look closely 
at the diff I sent, there are locks in a similar way to my initial 
block-copy patch. So I am essentially assuming that block-copy already 
has locks, and moving co_put_to_shres into block_copy_task_end has the 
purpose of moving shres "in a function that has a critical section".

>>>>>>> @@ -269,6 +270,7 @@ static void coroutine_fn 
>>>>>>> block_copy_task_end(BlockCopyTask *task, int ret)
>>>>>>>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, 
>>>>>>> task->offset, task->bytes);
>>>>>>>       }
>>>>>>>       qemu_co_mutex_lock(&task->s->tasks_lock);
                ^^^ locks
>>>>>>> +    co_put_to_shres(task->s->mem, task->bytes);
>>>>>>>       task->s->in_flight_bytes -= task->bytes;
>>>>>>>       QLIST_REMOVE(task, list);
>>>>>>>       progress_set_remaining(task->s->progress,

		unlocks here (not shown in the diff)
	 }

Hopefully now it is clear. Apologies again for the confusion.

Emanuele
Vladimir Sementsov-Ogievskiy May 15, 2021, 7:11 a.m. UTC | #13
15.05.2021 00:53, Emanuele Giuseppe Esposito wrote:
> 
>>>>>> we want to get from shres here, after possible call to block_copy_task_shrink(), as task->bytes may be reduced.
>>>>>
>>>>> Ah right, I missed that. So I guess if we want the caller to protect co-shared-resource, get_from_shres stays where it is, and put_ instead can still go into task_end (with a boolean enabling it).
>>>>
>>>> honestly, I don't follow how it helps thread-safety
>>>
>>>  From my understanding, the whole point here is to have no lock in co-shared-resource but let the caller take care of it (block-copy).
>>>
>>> The above was just an idea on how to do it.
>>
>> But how does moving co_put_to_shres() make it thread-safe? Nothing in block-copy is thread-safe yet...
>>
> Sorry, this is my bad, I did not explain it properly. If you look closely at the diff I sent, there are locks in a similar way to my initial block-copy patch. So I am essentially assuming that block-copy already has locks, and moving co_put_to_shres into block_copy_task_end has the purpose of moving shres "in a function that has a critical section".


Hm. I understand now. If you go this way, I'd rather add a critical section and keep the shres operations where they are now. We don't have a shres operation in task creation, so we should not have one in task finalization either; the shres operations are kept separate. It would probably not be bad to refactor this so that the shres operations become part of the task manipulation functions, but that would require more careful refactoring; it is simpler to keep it as is.

Patch

diff --git a/util/qemu-co-shared-resource.c b/util/qemu-co-shared-resource.c
index 1c83cd9d29..c455d02a1e 100644
--- a/util/qemu-co-shared-resource.c
+++ b/util/qemu-co-shared-resource.c
@@ -32,6 +32,7 @@  struct SharedResource {
     uint64_t available;
 
     CoQueue queue;
+    QemuMutex lock;
 };
 
 SharedResource *shres_create(uint64_t total)
@@ -40,17 +41,23 @@  SharedResource *shres_create(uint64_t total)
 
     s->total = s->available = total;
     qemu_co_queue_init(&s->queue);
+    qemu_mutex_init(&s->lock);
 
     return s;
 }
 
 void shres_destroy(SharedResource *s)
 {
-    assert(s->available == s->total);
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        assert(s->available == s->total);
+    }
+
+    qemu_mutex_destroy(&s->lock);
     g_free(s);
 }
 
-bool co_try_get_from_shres(SharedResource *s, uint64_t n)
+/* Called with lock held */
+static bool co_try_get_from_shres_unlocked(SharedResource *s, uint64_t n)
 {
     if (s->available >= n) {
         s->available -= n;
@@ -60,16 +67,27 @@  bool co_try_get_from_shres(SharedResource *s, uint64_t n)
     return false;
 }
 
+bool co_try_get_from_shres(SharedResource *s, uint64_t n)
+{
+    bool res;
+    QEMU_LOCK_GUARD(&s->lock);
+    res = co_try_get_from_shres_unlocked(s, n);
+
+    return res;
+}
+
 void coroutine_fn co_get_from_shres(SharedResource *s, uint64_t n)
 {
+    QEMU_LOCK_GUARD(&s->lock);
     assert(n <= s->total);
-    while (!co_try_get_from_shres(s, n)) {
-        qemu_co_queue_wait(&s->queue, NULL);
+    while (!co_try_get_from_shres_unlocked(s, n)) {
+        qemu_co_queue_wait(&s->queue, &s->lock);
     }
 }
 
 void coroutine_fn co_put_to_shres(SharedResource *s, uint64_t n)
 {
+    QEMU_LOCK_GUARD(&s->lock);
     assert(s->total - s->available >= n);
     s->available += n;
     qemu_co_queue_restart_all(&s->queue);