
[2/3] qcow2: Assert that qcow2_cache_get() callers hold s->lock

Message ID 20191023152620.13166-3-kwolf@redhat.com (mailing list archive)
State New, archived
Series qcow2: Fix image corruption bug in 4.1

Commit Message

Kevin Wolf Oct. 23, 2019, 3:26 p.m. UTC
qcow2_cache_do_get() requires that s->lock is locked because it can
yield between picking a cache entry and actually taking ownership of it
by setting offset and increasing the reference count.

Add an assertion to make sure the caller really holds the lock. The
function can be called outside of coroutine context, where bdrv_pread
and flushes become synchronous operations. The lock cannot and need not
be taken in this case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cache.c | 5 +++++
 1 file changed, 5 insertions(+)
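
To illustrate the race window the commit message describes, here is a
minimal, self-contained model (names are invented for illustration;
this is not the actual QEMU code):

#include <assert.h>
#include <stdint.h>

/* Minimal model of the ownership protocol for a cache entry
 * (hypothetical names; not QEMU code). An entry is only owned once
 * offset is set and ref is raised. In coroutine context, the read
 * between picking the entry and that point can yield; without the
 * lock held, a second coroutine can pick the same entry. */
typedef struct Entry {
    uint64_t offset;    /* 0 means "free" */
    int ref;            /* reference count */
} Entry;

/* Stand-in for bdrv_pread(): in real coroutine code, this is where
 * the caller can yield and other coroutines can run. */
static void read_table_from_disk(uint64_t offset)
{
    (void)offset;       /* no-op in this model */
}

/* The caller must hold the cache mutex across this whole function. */
static Entry *claim_entry(Entry *e, uint64_t offset)
{
    assert(e->ref == 0 && e->offset == 0);  /* entry really free */
    read_table_from_disk(offset);           /* potential yield point */
    e->offset = offset;                     /* ownership taken here */
    e->ref++;
    return e;
}

int main(void)
{
    Entry e = {0, 0};
    claim_entry(&e, 0x10000);
    assert(e.ref == 1 && e.offset == 0x10000);
    return 0;
}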

Comments

Kevin Wolf Oct. 23, 2019, 3:37 p.m. UTC | #1
On 23.10.2019 at 17:26, Kevin Wolf wrote:
> qcow2_cache_do_get() requires that s->lock is locked because it can
> yield between picking a cache entry and actually taking ownership of it
> by setting offset and increasing the reference count.
> 
> Add an assertion to make sure the caller really holds the lock. The
> function can be called outside of coroutine context, where bdrv_pread
> and flushes become synchronous operations. The lock cannot and need not
> be taken in this case.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Oops, this one was a bit too optimistic. :-)

I'm still running tests to see if any other code paths trigger the
assertion, but image creation calls this without the lock held (which is
harmless because nobody else knows about the image so there won't be
concurrent requests). The following additional patch is needed to make
image creation work with the new assertion.

Kevin


diff --git a/block/qcow2.c b/block/qcow2.c
index 0bc69e6996..7761cf3e07 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3213,6 +3213,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
     BlockBackend *blk = NULL;
     BlockDriverState *bs = NULL;
     BlockDriverState *data_bs = NULL;
+    BDRVQcow2State *s;
     QCowHeader *header;
     size_t cluster_size;
     int version;
@@ -3424,7 +3425,12 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         goto out;
     }

+    s = blk_bs(blk)->opaque;
+
+    qemu_co_mutex_lock(&s->lock);
     ret = qcow2_alloc_clusters(blk_bs(blk), 3 * cluster_size);
+    qemu_co_mutex_unlock(&s->lock);
+
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not allocate clusters for qcow2 "
                          "header and refcount table");
@@ -3437,7 +3443,6 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)

     /* Set the external data file if necessary */
     if (data_bs) {
-        BDRVQcow2State *s = blk_bs(blk)->opaque;
         s->image_data_file = g_strdup(data_bs->filename);
     }
Denis V. Lunev Oct. 24, 2019, 10:01 a.m. UTC | #2
On 10/23/19 6:26 PM, Kevin Wolf wrote:
> qcow2_cache_do_get() requires that s->lock is locked because it can
> yield between picking a cache entry and actually taking ownership of it
> by setting offset and increasing the reference count.
>
> Add an assertion to make sure the caller really holds the lock. The
> function can be called outside of coroutine context, where bdrv_pread
> and flushes become synchronous operations. The lock cannot and need not
> be taken in this case.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cache.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index d29b038a67..75b13dad99 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>      int min_lru_index = -1;
>  
>      assert(offset != 0);
> +    if (qemu_in_coroutine()) {
> +        qemu_co_mutex_assert_locked(&s->lock);
> +    }

That does not look good to me. If this really requires the lock, we
should check for the lock always. On the other hand, we could face a
missed lock outside of coroutine context.

Den

>  
>      trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
>                            offset, read_from_disk);
> @@ -386,6 +389,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>          }
>      }
>  
> +    assert(c->entries[i].ref == 0);
> +    assert(c->entries[i].offset == 0);
>      c->entries[i].offset = offset;
>  
>      /* And return the right table */
Kevin Wolf Oct. 24, 2019, 10:57 a.m. UTC | #3
On 24.10.2019 at 12:01, Denis Lunev wrote:
> On 10/23/19 6:26 PM, Kevin Wolf wrote:
> > qcow2_cache_do_get() requires that s->lock is locked because it can
> > yield between picking a cache entry and actually taking ownership of it
> > by setting offset and increasing the reference count.
> >
> > Add an assertion to make sure the caller really holds the lock. The
> > function can be called outside of coroutine context, where bdrv_pread
> > and flushes become synchronous operations. The lock cannot and need not
> > be taken in this case.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/qcow2-cache.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> > index d29b038a67..75b13dad99 100644
> > --- a/block/qcow2-cache.c
> > +++ b/block/qcow2-cache.c
> > @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> >      int min_lru_index = -1;
> >  
> >      assert(offset != 0);
> > +    if (qemu_in_coroutine()) {
> > +        qemu_co_mutex_assert_locked(&s->lock);
> > +    }
> 
> That does not look good to me. If this really requires the lock, we
> should check for the lock always. On the other hand, we could face a
> missed lock outside of coroutine context.

As the commit message explains, outside of coroutine context, we can't
yield and bdrv_pread and bdrv_flush become synchronous operations
instead, so there is nothing else that we need to protect against.

Kevin
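
For reference, judging from the assertion message quoted later in this
thread, the qemu_co_mutex_assert_locked() helper this patch relies on
presumably boils down to something like the following sketch (the exact
definition lives in include/qemu/coroutine.h and may differ in detail):

static inline void qemu_co_mutex_assert_locked(CoMutex *mutex)
{
    /* The mutex must be taken, and taken by the current coroutine. */
    assert(mutex->locked && mutex->holder == qemu_coroutine_self());
}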
Denis V. Lunev Oct. 24, 2019, 11:14 a.m. UTC | #4
On 10/24/19 1:57 PM, Kevin Wolf wrote:
> On 24.10.2019 at 12:01, Denis Lunev wrote:
>> On 10/23/19 6:26 PM, Kevin Wolf wrote:
>>> qcow2_cache_do_get() requires that s->lock is locked because it can
>>> yield between picking a cache entry and actually taking ownership of it
>>> by setting offset and increasing the reference count.
>>>
>>> Add an assertion to make sure the caller really holds the lock. The
>>> function can be called outside of coroutine context, where bdrv_pread
>>> and flushes become synchronous operations. The lock cannot and need not
>>> be taken in this case.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  block/qcow2-cache.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>>> index d29b038a67..75b13dad99 100644
>>> --- a/block/qcow2-cache.c
>>> +++ b/block/qcow2-cache.c
>>> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>>>      int min_lru_index = -1;
>>>  
>>>      assert(offset != 0);
>>> +    if (qemu_in_coroutine()) {
>>> +        qemu_co_mutex_assert_locked(&s->lock);
>>> +    }
>> That does not look good to me. If this really requires the lock, we
>> should check for the lock always. On the other hand, we could face a
>> missed lock outside of coroutine context.
> As the commit message explains, outside of coroutine context, we can't
> yield and bdrv_pread and bdrv_flush become synchronous operations
> instead, so there is nothing else that we need to protect against.
>
> Kevin
>
Hmm. It seems I was not careful enough when reading the entire message.
I am fine with this, though it looks a bit tricky to me, as such things
can change in the future.

Anyway, you could consider this as

Reviewed-by: Denis V. Lunev <den@openvz.org>

Den
Kevin Wolf Oct. 24, 2019, 12:07 p.m. UTC | #5
On 24.10.2019 at 13:14, Denis Lunev wrote:
> On 10/24/19 1:57 PM, Kevin Wolf wrote:
> > On 24.10.2019 at 12:01, Denis Lunev wrote:
> >> On 10/23/19 6:26 PM, Kevin Wolf wrote:
> >>> qcow2_cache_do_get() requires that s->lock is locked because it can
> >>> yield between picking a cache entry and actually taking ownership of it
> >>> by setting offset and increasing the reference count.
> >>>
> >>> Add an assertion to make sure the caller really holds the lock. The
> >>> function can be called outside of coroutine context, where bdrv_pread
> >>> and flushes become synchronous operations. The lock cannot and need not
> >>> be taken in this case.
> >>>
> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>> ---
> >>>  block/qcow2-cache.c | 5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> >>> index d29b038a67..75b13dad99 100644
> >>> --- a/block/qcow2-cache.c
> >>> +++ b/block/qcow2-cache.c
> >>> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> >>>      int min_lru_index = -1;
> >>>  
> >>>      assert(offset != 0);
> >>> +    if (qemu_in_coroutine()) {
> >>> +        qemu_co_mutex_assert_locked(&s->lock);
> >>> +    }
> >> That does not look good to me. If this really requires the lock, we
> >> should check for the lock always. On the other hand, we could face a
> >> missed lock outside of coroutine context.
> > As the commit message explains, outside of coroutine context, we can't
> > yield and bdrv_pread and bdrv_flush become synchronous operations
> > instead, so there is nothing else that we need to protect against.
> >
> Hmm. It seems I was not careful enough when reading the entire message.
> I am fine with this, though it looks a bit tricky to me, as such things
> can change in the future.

In which way do you think this could change? It's a pretty fundamental
fact about non-coroutine code that it can't yield.

What could change, of course, is that some code switches from being
synchronous to using a coroutine. The assertion would automatically
apply then and catch the bug if adding proper locking is forgotten.

> Anyway, you could consider this as
> 
> Reviewed-by: Denis V. Lunev <den@openvz.org>

Thanks!

Kevin
Vladimir Sementsov-Ogievskiy Oct. 24, 2019, 1:03 p.m. UTC | #6
24.10.2019 13:57, Kevin Wolf wrote:
> On 24.10.2019 at 12:01, Denis Lunev wrote:
>> On 10/23/19 6:26 PM, Kevin Wolf wrote:
>>> qcow2_cache_do_get() requires that s->lock is locked because it can
>>> yield between picking a cache entry and actually taking ownership of it
>>> by setting offset and increasing the reference count.
>>>
>>> Add an assertion to make sure the caller really holds the lock. The
>>> function can be called outside of coroutine context, where bdrv_pread
>>> and flushes become synchronous operations. The lock cannot and need not
>>> be taken in this case.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>   block/qcow2-cache.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>>> index d29b038a67..75b13dad99 100644
>>> --- a/block/qcow2-cache.c
>>> +++ b/block/qcow2-cache.c
>>> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>>>       int min_lru_index = -1;
>>>   
>>>       assert(offset != 0);
>>> +    if (qemu_in_coroutine()) {
>>> +        qemu_co_mutex_assert_locked(&s->lock);
>>> +    }
>>
>> That does not look good to me. If this really requires the lock, we
>> should check for the lock always. On the other hand, we could face a
>> missed lock outside of coroutine context.
> 
> As the commit message explains, outside of coroutine context, we can't
> yield and bdrv_pread and bdrv_flush become synchronous operations
> instead, so there is nothing else that we need to protect against.
> 

Recently we discussed similar problems with the block-dirty-bitmap-* QMP commands,
which wanted to update qcow2 bitmap metadata from non-coroutine context:
"qcow2 lock"
<135df452-397a-30bb-7518-2184fa5971aa@virtuozzo.com>
https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg01419.html

And, as I understand it, the correct way is to convert to a coroutine and lock
the mutex appropriately. Alternatively, we want to hold the AioContext lock and
be in a drained section to avoid parallel requests accessing the critical
section. Should we assert such conditions in the !qemu_in_coroutine() case?

--
Final commit to fix bug about bitmaps:

commit d2c3080e41fd2c9bc36c996cc9d33804462ba803
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Fri Sep 20 11:25:43 2019 +0300

     block/qcow2: proper locking on bitmap add/remove paths
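
A hypothetical sketch of the extra checks suggested above for the
non-coroutine path (qemu_get_current_aio_context(),
bdrv_get_aio_context() and quiesce_counter are real QEMU interfaces,
but this exact combination only reflects the discussion, not committed
code):

    if (qemu_in_coroutine()) {
        qemu_co_mutex_assert_locked(&s->lock);
    } else {
        /* Hypothetical: outside coroutine context, require running in
         * the node's AioContext and inside a drained section, so that
         * no parallel request can enter the critical section. */
        assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
        assert(bs->quiesce_counter > 0);
    }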
Kevin Wolf Oct. 24, 2019, 1:17 p.m. UTC | #7
On 24.10.2019 at 15:03, Vladimir Sementsov-Ogievskiy wrote:
> 24.10.2019 13:57, Kevin Wolf wrote:
> > On 24.10.2019 at 12:01, Denis Lunev wrote:
> >> On 10/23/19 6:26 PM, Kevin Wolf wrote:
> >>> qcow2_cache_do_get() requires that s->lock is locked because it can
> >>> yield between picking a cache entry and actually taking ownership of it
> >>> by setting offset and increasing the reference count.
> >>>
> >>> Add an assertion to make sure the caller really holds the lock. The
> >>> function can be called outside of coroutine context, where bdrv_pread
> >>> and flushes become synchronous operations. The lock cannot and need not
> >>> be taken in this case.
> >>>
> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>> ---
> >>>   block/qcow2-cache.c | 5 +++++
> >>>   1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> >>> index d29b038a67..75b13dad99 100644
> >>> --- a/block/qcow2-cache.c
> >>> +++ b/block/qcow2-cache.c
> >>> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> >>>       int min_lru_index = -1;
> >>>   
> >>>       assert(offset != 0);
> >>> +    if (qemu_in_coroutine()) {
> >>> +        qemu_co_mutex_assert_locked(&s->lock);
> >>> +    }
> >>
> >> That does not look good to me. If this really requires the lock, we
> >> should check for the lock always. On the other hand, we could face a
> >> missed lock outside of coroutine context.
> > 
> > As the commit message explains, outside of coroutine context, we can't
> > yield and bdrv_pread and bdrv_flush become synchronous operations
> > instead, so there is nothing else that we need to protect against.
> > 
> 
> Recently we discussed similar problems with the block-dirty-bitmap-* QMP
> commands, which wanted to update qcow2 bitmap metadata from
> non-coroutine context: "qcow2 lock"
> <135df452-397a-30bb-7518-2184fa5971aa@virtuozzo.com>
> https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg01419.html

Hm, right, I already forgot about the nested event loop again...

> And, as I understand it, the correct way is to convert to a coroutine and
> lock the mutex appropriately. Alternatively, we want to hold the AioContext
> lock and be in a drained section to avoid parallel requests accessing the
> critical section. Should we assert such conditions in the
> !qemu_in_coroutine() case?

The AioContext lock must be held anyway, so I don't think this would be
a new requirement. As for draining, I'll have to see.

I'm currently still auditing all the callers of qcow2_cache_do_get().
The synchronous callers I already know are the snapshot functions. I
think these happen to be in a drained section anyway (or should be at
least), so AioContext lock + drain seems like a very reasonable option
for them.

For other synchronous callers, if any, maybe conversion to a coroutine
would make more sense.

Kevin
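
For context, the usual QEMU pattern for converting such a synchronous
caller to a coroutine looks roughly like the sketch below (generic and
hypothetical: do_the_work() and the Foo names are invented, while
qemu_coroutine_create(), bdrv_coroutine_enter() and BDRV_POLL_WHILE()
are the real interfaces):

typedef struct FooCo {
    BlockDriverState *bs;
    int ret;
    bool done;
} FooCo;

static void coroutine_fn foo_co_entry(void *opaque)
{
    FooCo *foo = opaque;
    BDRVQcow2State *s = foo->bs->opaque;

    /* Now that we are in coroutine context, taking s->lock is both
     * possible and required, and the new assertion is satisfied. */
    qemu_co_mutex_lock(&s->lock);
    foo->ret = do_the_work(foo->bs);    /* may yield; lock is held */
    qemu_co_mutex_unlock(&s->lock);
    foo->done = true;
}

static int foo(BlockDriverState *bs)
{
    FooCo foo = { .bs = bs, .done = false };

    if (qemu_in_coroutine()) {
        foo_co_entry(&foo);
    } else {
        Coroutine *co = qemu_coroutine_create(foo_co_entry, &foo);
        bdrv_coroutine_enter(bs, co);
        BDRV_POLL_WHILE(bs, !foo.done);
    }
    return foo.ret;
}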
Michael Weiser Oct. 25, 2019, 10:35 a.m. UTC | #8
Hi Kevin,

On Wed, Oct 23, 2019 at 05:37:49PM +0200, Kevin Wolf wrote:

> > qcow2_cache_do_get() requires that s->lock is locked because it can
> > yield between picking a cache entry and actually taking ownership of it
> > by setting offset and increasing the reference count.
> > 
> > Add an assertion to make sure the caller really holds the lock. The
> > function can be called outside of coroutine context, where bdrv_pread
> > and flushes become synchronous operations. The lock cannot and need not
> > be taken in this case.
> I'm still running tests to see if any other code paths trigger the
> assertion, but image creation calls this without the lock held (which is
> harmless because nobody else knows about the image so there won't be
> concurrent requests). The following additional patch is needed to make
> image creation work with the new assertion.

I can confirm that with all four patches applied, corruption no longer
occurs as of commit 69f47505ee66afaa513305de0c1895a224e52c45. With only
3/3 (qcow2: Fix corruption bug in
qcow2_detect_metadata_preallocation()) removed, the assertion triggers
after a few seconds, leaving behind a few leaked clusters but no errors
in the image.

(qemu) qemu-system-x86_64:qemu/include/qemu/coroutine.h:175:
qemu_co_mutex_assert_locked: Assertion `mutex->locked && mutex->holder
== qemu_coroutine_self()' failed.
Aborted (core dumped)

$ qemu-img check qtest.qcow2 
Leaked cluster 169257 refcount=3 reference=2
Leaked cluster 172001 refcount=1 reference=0
Leaked cluster 172002 refcount=1 reference=0
Leaked cluster 172003 refcount=1 reference=0
Leaked cluster 172004 refcount=1 reference=0
Leaked cluster 172005 refcount=1 reference=0
Leaked cluster 172006 refcount=1 reference=0
Leaked cluster 172007 refcount=1 reference=0
Leaked cluster 172008 refcount=1 reference=0
Leaked cluster 172009 refcount=1 reference=0
Leaked cluster 172010 refcount=1 reference=0
Leaked cluster 172011 refcount=1 reference=0
Leaked cluster 172012 refcount=1 reference=0

13 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
255525/327680 = 77.98% allocated, 3.22% fragmented, 0.00% compressed
clusters
Image end offset: 17106403328

I was going to test with master as well but got overtaken by v2. Will
move on to test v2 now. :)

Series:
Tested-by: Michael Weiser <michael.weiser@gmx.de>

No biggie but if there's a chance could you switch my address to the
above?
Kevin Wolf Oct. 25, 2019, 12:42 p.m. UTC | #9
On 25.10.2019 at 12:35, Michael Weiser wrote:
> I was going to test with master as well but got overtaken by v2. Will
> move on to test v2 now. :)
> 
> Series:
> Tested-by: Michael Weiser <michael.weiser@gmx.de>

Thanks for testing!

The fix itself is unchanged in v2, so I assume the result will be the
same, but testing it explicitly can't hurt. I'm going to send a pull
request today, but if you're quick, I can wait for your results.

> No biggie but if there's a chance could you switch my address to the
> above?

No problem, I've updated the address in the Reported-by tag.

Kevin

Patch

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index d29b038a67..75b13dad99 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     int min_lru_index = -1;
 
     assert(offset != 0);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_assert_locked(&s->lock);
+    }
 
     trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
                           offset, read_from_disk);
@@ -386,6 +389,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
         }
     }
 
+    assert(c->entries[i].ref == 0);
+    assert(c->entries[i].offset == 0);
     c->entries[i].offset = offset;
 
     /* And return the right table */