Message ID: 20191023152620.13166-3-kwolf@redhat.com (mailing list archive)
State:      New, archived
Series:     qcow2: Fix image corruption bug in 4.1
Am 23.10.2019 um 17:26 hat Kevin Wolf geschrieben:
> qcow2_cache_do_get() requires that s->lock is locked because it can
> yield between picking a cache entry and actually taking ownership of it
> by setting offset and increasing the reference count.
>
> Add an assertion to make sure the caller really holds the lock. The
> function can be called outside of coroutine context, where bdrv_pread
> and flushes become synchronous operations. The lock cannot and need not
> be taken in this case.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Oops, this one was a bit too optimistic. :-)

I'm still running tests to see if any other code paths trigger the
assertion, but image creation calls this without the lock held (which is
harmless because nobody else knows about the image yet, so there won't
be concurrent requests). The following patch is needed additionally to
make image creation work with the new assertion.

Kevin

diff --git a/block/qcow2.c b/block/qcow2.c
index 0bc69e6996..7761cf3e07 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3213,6 +3213,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
     BlockBackend *blk = NULL;
     BlockDriverState *bs = NULL;
     BlockDriverState *data_bs = NULL;
+    BDRVQcow2State *s;
     QCowHeader *header;
     size_t cluster_size;
     int version;
@@ -3424,7 +3425,12 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         goto out;
     }

+    s = blk_bs(blk)->opaque;
+
+    qemu_co_mutex_lock(&s->lock);
     ret = qcow2_alloc_clusters(blk_bs(blk), 3 * cluster_size);
+    qemu_co_mutex_unlock(&s->lock);
+
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not allocate clusters for qcow2 "
                          "header and refcount table");
@@ -3437,7 +3443,6 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)

     /* Set the external data file if necessary */
     if (data_bs) {
-        BDRVQcow2State *s = blk_bs(blk)->opaque;
         s->image_data_file = g_strdup(data_bs->filename);
     }
On 10/23/19 6:26 PM, Kevin Wolf wrote:
> qcow2_cache_do_get() requires that s->lock is locked because it can
> yield between picking a cache entry and actually taking ownership of it
> by setting offset and increasing the reference count.
>
> Add an assertion to make sure the caller really holds the lock. The
> function can be called outside of coroutine context, where bdrv_pread
> and flushes become synchronous operations. The lock cannot and need not
> be taken in this case.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cache.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index d29b038a67..75b13dad99 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>      int min_lru_index = -1;
>
>      assert(offset != 0);
> +    if (qemu_in_coroutine()) {
> +        qemu_co_mutex_assert_locked(&s->lock);
> +    }

This does not look good to me. If this really requires the lock, we
should check for the lock always. On the other hand, we could face a
missed lock outside of coroutine context.

Den

>
>      trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
>                            offset, read_from_disk);
> @@ -386,6 +389,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>          }
>      }
>
> +    assert(c->entries[i].ref == 0);
> +    assert(c->entries[i].offset == 0);
>      c->entries[i].offset = offset;
>
>      /* And return the right table */
Am 24.10.2019 um 12:01 hat Denis Lunev geschrieben:
> On 10/23/19 6:26 PM, Kevin Wolf wrote:
> > qcow2_cache_do_get() requires that s->lock is locked because it can
> > yield between picking a cache entry and actually taking ownership of it
> > by setting offset and increasing the reference count.
> > [...]
> >      assert(offset != 0);
> > +    if (qemu_in_coroutine()) {
> > +        qemu_co_mutex_assert_locked(&s->lock);
> > +    }
>
> This does not look good to me. If this really requires the lock, we
> should check for the lock always. On the other hand, we could face a
> missed lock outside of coroutine context.

As the commit message explains, outside of coroutine context we can't
yield, and bdrv_pread and bdrv_flush become synchronous operations
instead, so there is nothing else that we need to protect against.

Kevin
On 10/24/19 1:57 PM, Kevin Wolf wrote:
> Am 24.10.2019 um 12:01 hat Denis Lunev geschrieben:
>> [...]
>> This does not look good to me. If this really requires the lock, we
>> should check for the lock always. On the other hand, we could face a
>> missed lock outside of coroutine context.
>
> As the commit message explains, outside of coroutine context we can't
> yield, and bdrv_pread and bdrv_flush become synchronous operations
> instead, so there is nothing else that we need to protect against.

Hmm. It seems I was not careful enough when reading the entire message.
I am fine with this, though it looks a bit tricky to me, as such things
can change in the future. Anyway, you can consider this as

Reviewed-by: Denis V. Lunev <den@openvz.org>

Den
Am 24.10.2019 um 13:14 hat Denis Lunev geschrieben:
> On 10/24/19 1:57 PM, Kevin Wolf wrote:
> > As the commit message explains, outside of coroutine context we can't
> > yield, and bdrv_pread and bdrv_flush become synchronous operations
> > instead, so there is nothing else that we need to protect against.
>
> Hmm. It seems I was not careful enough when reading the entire message.
> I am fine with this, though it looks a bit tricky to me, as such things
> can change in the future.

In which way do you think this could change? It's a pretty fundamental
fact about non-coroutine code that it can't yield.

What could change, of course, is that some code switches from being
synchronous to using a coroutine. The assertion would automatically
apply then and catch the bug if adding proper locking is forgotten.

> Anyway, you can consider this as
>
> Reviewed-by: Denis V. Lunev <den@openvz.org>

Thanks!

Kevin
24.10.2019 13:57, Kevin Wolf wrote:
> Am 24.10.2019 um 12:01 hat Denis Lunev geschrieben:
>> [...]
>> This does not look good to me. If this really requires the lock, we
>> should check for the lock always. On the other hand, we could face a
>> missed lock outside of coroutine context.
>
> As the commit message explains, outside of coroutine context we can't
> yield, and bdrv_pread and bdrv_flush become synchronous operations
> instead, so there is nothing else that we need to protect against.

Recently we discussed similar problems with the block-dirty-bitmap-*
QMP commands, which wanted to update qcow2 metadata about bitmaps from
non-coroutine context.

"qcow2 lock"
<135df452-397a-30bb-7518-2184fa5971aa@virtuozzo.com>
https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg01419.html

And, as I understand it, the correct way is to convert to a coroutine
and lock the mutex appropriately. Or we want to lock the AioContext and
be in a drained section to avoid parallel requests accessing the
critical section. Should we assert such conditions in the
!qemu_in_coroutine() case?

--

The final commit fixing the bitmap bug:

commit d2c3080e41fd2c9bc36c996cc9d33804462ba803
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Fri Sep 20 11:25:43 2019 +0300

    block/qcow2: proper locking on bitmap add/remove paths
Am 24.10.2019 um 15:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Recently we discussed similar problems with the block-dirty-bitmap-*
> QMP commands, which wanted to update qcow2 metadata about bitmaps from
> non-coroutine context.
>
> "qcow2 lock"
> <135df452-397a-30bb-7518-2184fa5971aa@virtuozzo.com>
> https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg01419.html

Hm, right, I already forgot about the nested event loop again...

> And, as I understand it, the correct way is to convert to a coroutine
> and lock the mutex appropriately. Or we want to lock the AioContext and
> be in a drained section to avoid parallel requests accessing the
> critical section. Should we assert such conditions in the
> !qemu_in_coroutine() case?

The AioContext lock must be held anyway, so I don't think this would be
a new requirement. As for draining, I'll have to see. I'm currently
still auditing all the callers of qcow2_cache_do_get().

The synchronous callers I already know of are the snapshot functions. I
think these happen to be in a drained section anyway (or should be at
least), so AioContext lock + drain seems like a very reasonable option
for them. For other synchronous callers, if any, maybe conversion to a
coroutine would make more sense.

Kevin
Hi Kevin,

On Wed, Oct 23, 2019 at 05:37:49PM +0200, Kevin Wolf wrote:
> > qcow2_cache_do_get() requires that s->lock is locked because it can
> > yield between picking a cache entry and actually taking ownership of it
> > by setting offset and increasing the reference count.
> >
> > Add an assertion to make sure the caller really holds the lock. The
> > function can be called outside of coroutine context, where bdrv_pread
> > and flushes become synchronous operations. The lock cannot and need not
> > be taken in this case.
>
> I'm still running tests to see if any other code paths trigger the
> assertion, but image creation calls this without the lock held (which is
> harmless because nobody else knows about the image yet, so there won't
> be concurrent requests). The following patch is needed additionally to
> make image creation work with the new assertion.

I can confirm that with all four patches the corruption no longer
occurs as of commit 69f47505ee66afaa513305de0c1895a224e52c45. Removing
only 3/3 (qcow2: Fix corruption bug in
qcow2_detect_metadata_preallocation()), the assertion triggers after a
few seconds, leaving behind a few leaked clusters but no errors in the
image:

(qemu) qemu-system-x86_64: qemu/include/qemu/coroutine.h:175:
qemu_co_mutex_assert_locked: Assertion `mutex->locked && mutex->holder
== qemu_coroutine_self()' failed.
Aborted (core dumped)

$ qemu-img check qtest.qcow2
Leaked cluster 169257 refcount=3 reference=2
Leaked cluster 172001 refcount=1 reference=0
Leaked cluster 172002 refcount=1 reference=0
Leaked cluster 172003 refcount=1 reference=0
Leaked cluster 172004 refcount=1 reference=0
Leaked cluster 172005 refcount=1 reference=0
Leaked cluster 172006 refcount=1 reference=0
Leaked cluster 172007 refcount=1 reference=0
Leaked cluster 172008 refcount=1 reference=0
Leaked cluster 172009 refcount=1 reference=0
Leaked cluster 172010 refcount=1 reference=0
Leaked cluster 172011 refcount=1 reference=0
Leaked cluster 172012 refcount=1 reference=0

13 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
255525/327680 = 77.98% allocated, 3.22% fragmented, 0.00% compressed clusters
Image end offset: 17106403328

I was going to test with master as well but got overtaken by v2. Will
move on to test v2 now. :)

Series:
Tested-by: Michael Weiser <michael.weiser@gmx.de>

No biggie, but if there's a chance, could you switch my address to the
above?
Am 25.10.2019 um 12:35 hat Michael Weiser geschrieben:
> I was going to test with master as well but got overtaken by v2. Will
> move on to test v2 now. :)
>
> Series:
> Tested-by: Michael Weiser <michael.weiser@gmx.de>

Thanks for testing! The fix itself is unchanged in v2, so I assume the
result will be the same, but testing it explicitly can't hurt.

I'm going to send a pull request today, but if you're quick, I can wait
for your results.

> No biggie, but if there's a chance, could you switch my address to the
> above?

No problem, I've updated the address in the Reported-by tag.

Kevin
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index d29b038a67..75b13dad99 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     int min_lru_index = -1;

     assert(offset != 0);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_assert_locked(&s->lock);
+    }

     trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
                           offset, read_from_disk);
@@ -386,6 +389,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
         }
     }

+    assert(c->entries[i].ref == 0);
+    assert(c->entries[i].offset == 0);
     c->entries[i].offset = offset;

     /* And return the right table */
qcow2_cache_do_get() requires that s->lock is locked because it can
yield between picking a cache entry and actually taking ownership of it
by setting offset and increasing the reference count.

Add an assertion to make sure the caller really holds the lock. The
function can be called outside of coroutine context, where bdrv_pread
and flushes become synchronous operations. The lock cannot and need not
be taken in this case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cache.c | 5 +++++
 1 file changed, 5 insertions(+)