[v4] blockjob: Fix crash with IOthread when block commit after snapshot

Message ID 20210128013053.23266-1-08005325@163.com (mailing list archive)
State: New, archived
Series: [v4] blockjob: Fix crash with IOthread when block commit after snapshot

Commit Message

08005325@163.com Jan. 28, 2021, 1:30 a.m. UTC
From: Michael Qiu <qiudayu@huayun.com>

v4: rebase to latest code

v3: reformat the commit log, remove duplicate content

v2: update the coredump backtrace in the commit log to the latest
    QEMU master branch

Currently, if the guest has workloads, the IO thread will acquire the
aio_context lock before doing io_submit, which leads to a segmentation
fault when doing a block commit after a snapshot, as shown below:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
1437    ../block/mirror.c: No such file or directory.
(gdb) p s->job
$17 = (MirrorBlockJob *) 0x0
(gdb) p s->stop
$18 = false

(gdb) bt

Switch to qemu main thread:
/lib/../lib64/libpthread.so.0
/lib/../lib64/libpthread.so.0
../util/qemu-thread-posix.c:79
qapi/qapi-commands-block-core.c:346
../qapi/qmp-dispatch.c:110
/lib/../lib64/libglib-2.0.so.0

In the IO thread, when bdrv_mirror_top_pwritev() runs, the job is NULL and
the stop field is false, which means the MirrorBDSOpaque "s" object has not
been initialized yet. This object is initialized by block_job_create(), but
that initialization is stuck acquiring the lock.

The root cause is that QEMU does a release/acquire while already holding
the lock; the IO thread gets the lock right after the release stage, and
the crash occurs.

Actually, in this situation, job->job.aio_context will not equal
qemu_get_aio_context() and will be the same as bs->aio_context; thus,
there is no need to release the lock, because bdrv_root_attach_child()
will not change the context.
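
To make the race concrete, here is the pre-patch sequence in
block_job_add_bdrv() (taken from the diff below), with the racy window
annotated:

    bdrv_ref(bs);
    if (job->job.aio_context != qemu_get_aio_context()) {
        aio_context_release(job->job.aio_context);
        /* Racy window: the IO thread can acquire bs's AioContext here
         * and submit guest writes through mirror_top_bs while s->job is
         * still NULL, because block_job_create() has not finished. */
    }
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               job->job.aio_context, perm, shared_perm, job,
                               errp);
    if (job->job.aio_context != qemu_get_aio_context()) {
        aio_context_acquire(job->job.aio_context);
    }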

This patch fixes the issue.

Signed-off-by: Michael Qiu <qiudayu@huayun.com>
---
 blockjob.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Michael Qiu Jan. 28, 2021, 5:16 a.m. UTC | #1
Any comments?

Michael Qiu Feb. 1, 2021, 2:40 a.m. UTC | #2
Any comments?

It's really a bug and can cause QEMU to segfault.

Thanks,
Michael

Vladimir Sementsov-Ogievskiy Feb. 1, 2021, 10:27 a.m. UTC | #3
Hi!

Thanks for the fix, and sorry for the delay!

Please send each new version of a patch as a separate thread. It's a rule from https://wiki.qemu.org/Contribute/SubmitAPatch; it is more readable, and it makes it less likely that your patch will be missed.

28.01.2021 04:30, 08005325@163.com wrote:
> From: Michael Qiu <qiudayu@huayun.com>
> 
> v4: rebase to latest code
> 
> v3: reformat the commit log, remove duplicate content
> 
> v2: update the coredump backtrace in the commit log to the latest
>      QEMU master branch

Such things shouldn't be in a commit message. You may put such comments after the --- line[*] in a generated patch email.

> 
> Currently, if the guest has workloads, the IO thread will acquire the
> aio_context lock before doing io_submit, which leads to a segmentation
> fault when doing a block commit after a snapshot, as shown below:

Do you have some reproducer script?

> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
> 0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
> 1437    ../block/mirror.c: No such file or directory.
> (gdb) p s->job
> $17 = (MirrorBlockJob *) 0x0
> (gdb) p s->stop
> $18 = false
> 
> (gdb) bt
> 
> Switch to qemu main thread:
> /lib/../lib64/libpthread.so.0
> /lib/../lib64/libpthread.so.0
> ../util/qemu-thread-posix.c:79
> qapi/qapi-commands-block-core.c:346
> ../qapi/qmp-dispatch.c:110
> /lib/../lib64/libglib-2.0.so.0

Not a very informative bt...

> 
> In the IO thread, when bdrv_mirror_top_pwritev() runs, the job is NULL and
> the stop field is false, which means the MirrorBDSOpaque "s" object has not
> been initialized yet. This object is initialized by block_job_create(), but
> that initialization is stuck acquiring the lock.

Could you show the other thread's bt?

Hm, so you argue that we come to bdrv_mirror_top_pwritev() (which means that
the mirror-top node is already inserted into the block graph), but its
bs->opaque is not initialized?

Hmm, indeed, in mirror_start_job() we insert mirror_top_bs before the block_job_create() call.

But we should do all of that in a drained section, so that no parallel I/O requests can come in.

And we do have a drained section, but it ends immediately after bdrv_append(),
when bs_opaque is still not initialized. Probably we just need to expand it?


Maybe something like:

diff --git a/block/mirror.c b/block/mirror.c
index 8e1ad6eceb..0a6bfc1230 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1610,11 +1610,11 @@ static BlockJob *mirror_start_job(
      bdrv_ref(mirror_top_bs);
      bdrv_drained_begin(bs);
      bdrv_append(mirror_top_bs, bs, &local_err);
-    bdrv_drained_end(bs);
  
      if (local_err) {
          bdrv_unref(mirror_top_bs);
          error_propagate(errp, local_err);
+        bdrv_drained_end(bs);
          return NULL;
      }
  
@@ -1789,6 +1789,8 @@ static BlockJob *mirror_start_job(
      trace_mirror_start(bs, s, opaque);
      job_start(&s->common.job);
  
+    bdrv_drained_end(bs);
+
      return &s->common;
  
  fail:
@@ -1813,6 +1815,8 @@ fail:
  
      bdrv_unref(mirror_top_bs);
  
+    bdrv_drained_end(bs);
+
      return NULL;
  }
  


Could you check whether it helps?


> 
> The root cause is that QEMU does a release/acquire while already holding
> the lock; the IO thread gets the lock right after the release stage, and
> the crash occurs.
> 
> Actually, in this situation, job->job.aio_context will not equal
> qemu_get_aio_context() and will be the same as bs->aio_context; thus,
> there is no need to release the lock, because bdrv_root_attach_child()
> will not change the context.
> 
> This patch fixes the issue.
> 
> Signed-off-by: Michael Qiu <qiudayu@huayun.com>
> ---

[*] Here you can add any comments that will not go into the final commit message, like version history.

>   blockjob.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 98ac8af982..51a09f3b60 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -214,13 +214,15 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>       BdrvChild *c;
>   
>       bdrv_ref(bs);
> -    if (job->job.aio_context != qemu_get_aio_context()) {
> +    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
> +        job->job.aio_context != qemu_get_aio_context()) {
>           aio_context_release(job->job.aio_context);
>       }
>       c = bdrv_root_attach_child(bs, name, &child_job, 0,
>                                  job->job.aio_context, perm, shared_perm, job,
>                                  errp);
> -    if (job->job.aio_context != qemu_get_aio_context()) {
> +    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
> +        job->job.aio_context != qemu_get_aio_context()) {

That's a wrong check: it will never reacquire the lock on the success path, because after a successful attach the bs context will definitely equal the job context.

I think you need a boolean variable at the start of the function, initialized to the condition; after _attach_child() you don't recheck the condition but rely on the variable. See the sketch after the quoted hunk below.

>           aio_context_acquire(job->job.aio_context);
>       }
>       if (c == NULL) {
> 
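
A minimal sketch of that suggestion (assuming the surrounding blockjob.c
context; this is not the posted patch):

int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
                       uint64_t perm, uint64_t shared_perm, Error **errp)
{
    BdrvChild *c;
    bool need_context_ops;

    bdrv_ref(bs);

    /* Evaluate the condition once, before the attach can change bs's
     * AioContext */
    need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context &&
                       job->job.aio_context != qemu_get_aio_context();

    if (need_context_ops) {
        aio_context_release(job->job.aio_context);
    }
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               job->job.aio_context, perm, shared_perm, job,
                               errp);
    if (need_context_ops) {
        /* Reacquire based on the remembered value, not a re-check */
        aio_context_acquire(job->job.aio_context);
    }
    if (c == NULL) {
        return -EPERM;
    }
    /* ... rest of the function unchanged ... */
    return 0;
}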

The code was introduced by Kevin in 132ada80c4a6 "block: Adjust AioContexts when attaching nodes", so I think we need his opinion.
You may also add "Fixes: 132ada80c4a6fea7b67e8bb0a5fd299994d927c6", especially if you check that your case doesn't fail before this commit.

I think the idea itself is correct: bdrv_root_attach_child() will not call any of the *_set_aio_* functions, so there is no reason to release the lock. It shouldn't hurt, and it's great if it fixes the crash.

When a side effect of a function is temporarily releasing some lock, it's hard to be sure that all callers are OK with it...
Michael Qiu Feb. 1, 2021, 11:26 a.m. UTC | #4
I'm so sorry; forgive my mail client (Outlook).

I have tried your solution; it doesn't work, it still crashes.

The reason is: we come to bdrv_mirror_top_pwritev() (which means that the mirror-top node is already inserted into the block graph), but its bs->opaque->job is not initialized.

But the root cause is that in block_job_create() we release the aio_context unnecessarily, and the IO thread gets the lock.

The reproducer has two parts. One script runs in the VM (to generate some workload); we name it script A:
#!/bin/sh
for((i=1;i<=100000000;i++));
do
dd if=/dev/zero of=./xxx bs=1M count=200
sleep 6
done

The other runs on the hypervisor; we name it script B:
#!/bin/sh
for((i=1;i<=10000000;i++));
do
virsh snapshot-create-as fqtest --atomic --no-metadata --name fq6 --disk-only  --diskspec vda,snapshot=external,file=/home/michael/snapshot/fq6.qcow2;
virsh blockcommit fqtest /home/michael/snapshot/fq6.qcow2 --shallow --verbose --wait --pivot --top /home/michael/snapshot/fq6.qcow2;
rm -r fq6.qcow2
done

How to reproduce:
1. Start a VM; in my case it is managed by libvirt and named fqtest.
2. Run script B on the hypervisor.
3. After the guest boots up, log in and run script A on vda.

Make sure the IO thread is enabled for vda.

Usually, after just a few minutes of waiting, it will crash.

The whole thread backtrace is:

[Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
1437    ../block/mirror.c: No such file or directory.
(gdb) p s->job
$17 = (MirrorBlockJob *) 0x0
(gdb) p s->stop
$18 = false

(gdb) bt
#0  0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
#1  0x00005576d0f7f3ab in bdrv_driver_pwritev at ../block/io.c:1174
#2  0x00005576d0f8139d in bdrv_aligned_pwritev at ../block/io.c:1988
#3  0x00005576d0f81b65 in bdrv_co_pwritev_part at ../block/io.c:2156
#4  0x00005576d0f8e6b7 in blk_do_pwritev_part at ../block/block-backend.c:1260
#5  0x00005576d0f8e84d in blk_aio_write_entry at ../block/block-backend.c:1476
#6  0x00005576d1060ddb in coroutine_trampoline at ../util/coroutine-ucontext.c:173
#7  0x00007f7c8d3be0d0 in __start_context at /lib/../lib64/libc.so.6
#8  0x00007f7b52beb1e0 in  ()
#9  0x0000000000000000 in  ()

Switch to qemu main thread:
#0  0x00007f903be704ed in __lll_lock_wait at
/lib/../lib64/libpthread.so.0
#1  0x00007f903be6bde6 in _L_lock_941 at /lib/../lib64/libpthread.so.0
#2  0x00007f903be6bcdf in pthread_mutex_lock at
/lib/../lib64/libpthread.so.0
#3  0x0000564b21456889 in qemu_mutex_lock_impl at
../util/qemu-thread-posix.c:79
#4  0x0000564b213af8a5 in block_job_add_bdrv at ../blockjob.c:224
#5  0x0000564b213b00ad in block_job_create at ../blockjob.c:440
#6  0x0000564b21357c0a in mirror_start_job at ../block/mirror.c:1622
#7  0x0000564b2135a9af in commit_active_start at ../block/mirror.c:1867
#8  0x0000564b2133d132 in qmp_block_commit at ../blockdev.c:2768
#9  0x0000564b2141fef3 in qmp_marshal_block_commit at
qapi/qapi-commands-block-core.c:346
#10 0x0000564b214503c9 in do_qmp_dispatch_bh at
../qapi/qmp-dispatch.c:110
#11 0x0000564b21451996 in aio_bh_poll at ../util/async.c:164
#12 0x0000564b2146018e in aio_dispatch at ../util/aio-posix.c:381
#13 0x0000564b2145187e in aio_ctx_dispatch at ../util/async.c:306
#14 0x00007f9040239049 in g_main_context_dispatch at
/lib/../lib64/libglib-2.0.so.0
#15 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:232
#16 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:255
#17 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:531
#18 0x0000564b212304e1 in qemu_main_loop at ../softmmu/runstate.c:721
#19 0x0000564b20f7975e in main at ../softmmu/main.c:50

Thanks,
Michael

Peng Liang Feb. 1, 2021, 12:07 p.m. UTC | #5
Hi,

I encountered this problem months ago too. Could we move the creation of
the block job (block_job_create) before appending the new bs to
mirror_top_bs (bdrv_append), as I wrote in [*]? I found that after
bdrv_append, QEMU will write through mirror_top_bs, and when writing it
uses bs->opaque, which may be NULL.

[*]
http://patchwork.ozlabs.org/project/qemu-devel/patch/20200826131910.1879079-1-liangpeng10@huawei.com/

Thanks,
Peng

Vladimir Sementsov-Ogievskiy Feb. 1, 2021, 12:44 p.m. UTC | #6
Hi!

In this patch you create the job over the original bs, while jobs are normally created over the job-filter bs. I don't know whether that is wrong, but it at least requires some research, and the code that removes the filter would probably need to be adjusted somehow. Also, you make bs->opaque non-zero, but the job is still not fully initialized, and some other problem may occur. So, whether we create the job prior to filter insertion or after it, parallel I/O requests to bs should not interrupt mirror_start_job(). I think Michael's patch is closer to the real problem to fix.
Michael Qiu Feb. 1, 2021, 1:09 p.m. UTC | #7
Peng,

In my analysis, the root cause is the aio_context lock: the QEMU main thread does an unnecessary release/acquire, and that's why the IO thread can get a lock it shouldn't hold at this stage.
Thanks,
Michael

Patch

diff --git a/blockjob.c b/blockjob.c
index 98ac8af982..51a09f3b60 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -214,13 +214,15 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     BdrvChild *c;
 
     bdrv_ref(bs);
-    if (job->job.aio_context != qemu_get_aio_context()) {
+    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+        job->job.aio_context != qemu_get_aio_context()) {
         aio_context_release(job->job.aio_context);
     }
     c = bdrv_root_attach_child(bs, name, &child_job, 0,
                                job->job.aio_context, perm, shared_perm, job,
                                errp);
-    if (job->job.aio_context != qemu_get_aio_context()) {
+    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+        job->job.aio_context != qemu_get_aio_context()) {
         aio_context_acquire(job->job.aio_context);
     }
     if (c == NULL) {