iotests: 30: drop from auto group (and effectively from make check)

Message ID 20210205111021.715240-1-vsementsov@virtuozzo.com (mailing list archive)
State New, archived

Commit Message

Vladimir Sementsov-Ogievskiy Feb. 5, 2021, 11:10 a.m. UTC
I reproduced the following crash fairly quickly:

0  raise () at /lib64/libc.so.6
1  abort () at /lib64/libc.so.6
2  _nl_load_domain.cold () at /lib64/libc.so.6
3  annobin_assert.c_end () at /lib64/libc.so.6
4  bdrv_reopen_multiple (bs_queue=0x55de75fa9b70, errp=0x0)
   at ../block.c:3820
5  bdrv_reopen_set_read_only (bs=0x55de760fc020, read_only=true,
   errp=0x0) at ../block.c:3870
6  stream_clean (job=0x55de75fa9410) at ../block/stream.c:99
7  job_clean (job=0x55de75fa9410) at ../job.c:680
8  job_finalize_single (job=0x55de75fa9410) at ../job.c:696
9  job_txn_apply (job=0x55de75fa9410,
   fn=0x55de741eee27 <job_finalize_single>) at ../job.c:158
10 job_do_finalize (job=0x55de75fa9410) at ../job.c:805
11 job_completed_txn_success (job=0x55de75fa9410) at ../job.c:855
12 job_completed (job=0x55de75fa9410) at ../job.c:868
13 job_exit (opaque=0x55de75fa9410) at ../job.c:888
14 aio_bh_call (bh=0x55de76b9b4e0) at ../util/async.c:136
15 aio_bh_poll (ctx=0x55de75bc5300) at ../util/async.c:164
16 aio_dispatch (ctx=0x55de75bc5300) at ../util/aio-posix.c:381
17 aio_ctx_dispatch (source=0x55de75bc5300, callback=0x0,
   user_data=0x0) at ../util/async.c:306
18 g_main_context_dispatch () at /lib64/libglib-2.0.so.0
19 glib_pollfds_poll () at ../util/main-loop.c:232
20 os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255
21 main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
22 qemu_main_loop () at ../softmmu/runstate.c:722
23 main (argc=20, argv=0x7ffe218f0268, envp=0x7ffe218f0310) at
   ../softmmu/main.c:50

(gdb) fr 4
4  bdrv_reopen_multiple (bs_queue=0x55de75fa9b70, errp=0x0) at ../block.c:3820
3820                assert(perm == state->perm);
(gdb) list
3815
3816            if (ret == 0) {
3817                uint64_t perm, shared;
3818
3819                bdrv_get_cumulative_perm(state->bs, &perm, &shared);
3820                assert(perm == state->perm);
3821                assert(shared == state->shared_perm);
3822
3823                bdrv_set_perm(state->bs);
3824            } else {
(gdb) p perm
$1 = 1
(gdb) p state->perm
$2 = 0
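
The mismatch above (perm is 1, state->perm is 0) can be illustrated with a small model of how a node's cumulative permissions are derived from its parents. This is only a sketch: the constants are assumed to mirror QEMU's BLK_PERM_* bits, the function name is made up for illustration, and the scenario (a parent appearing between prepare and commit) is a hypothetical way such a mismatch could arise, not a diagnosis of this crash.

```python
# Illustrative model only. The bit values are assumed to mirror QEMU's
# BLK_PERM_* constants; cumulative_perm() is a hypothetical helper.
BLK_PERM_CONSISTENT_READ = 0x01
BLK_PERM_WRITE = 0x02
BLK_PERM_WRITE_UNCHANGED = 0x04
BLK_PERM_RESIZE = 0x08
BLK_PERM_GRAPH_MOD = 0x10
BLK_PERM_ALL = 0x1F


def cumulative_perm(parents):
    """Combine (perm, shared_perm) pairs, one pair per parent edge.

    Required permissions are OR-ed together; shared permissions are
    AND-ed, starting from "everything is shared".
    """
    perm = 0
    shared = BLK_PERM_ALL
    for parent_perm, parent_shared in parents:
        perm |= parent_perm
        shared &= parent_shared
    return perm, shared


# Hypothetical state recorded at prepare time: no parent requires anything.
expected = cumulative_perm([])
# Hypothetical state at commit time: one parent now requires consistent read.
actual = cumulative_perm([(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL)])
print(expected, actual)
```

In this model the recorded value and the recomputed value differ in exactly the way the two gdb prints do, which is the condition the assertion at block.c:3820 rejects.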

Then I had 38 successful iterations and another crash:
0  bdrv_check_update_perm (bs=0x5631ac97bc50, q=0x0, new_used_perm=1,
   new_shared_perm=31, ignore_children=0x0, errp=0x7ffd9d477cf8) at
   ../block.c:2197
1  bdrv_root_attach_child
    (child_bs=0x5631ac97bc50, child_name=0x5631aaf6b1f9 "backing",
    child_class=0x5631ab280ca0 <child_of_bds>, child_role=8,
    ctx=0x5631ab757300, perm=1, shared_perm=31, opaque=0x5631abb8c020,
    errp=0x7ffd9d477cf8)
    at ../block.c:2642
2  bdrv_attach_child (parent_bs=0x5631abb8c020,
   child_bs=0x5631ac97bc50, child_name=0x5631aaf6b1f9 "backing",
   child_class=0x5631ab280ca0 <child_of_bds>, child_role=8,
   errp=0x7ffd9d477cf8)
    at ../block.c:2719
3  bdrv_set_backing_hd (bs=0x5631abb8c020, backing_hd=0x5631ac97bc50,
   errp=0x7ffd9d477cf8) at ../block.c:2854
4  stream_prepare (job=0x5631ac751eb0) at ../block/stream.c:74
5  job_prepare (job=0x5631ac751eb0) at ../job.c:784
6  job_txn_apply (job=0x5631ac751eb0, fn=0x5631aacb1156 <job_prepare>)
   at ../job.c:158
7  job_do_finalize (job=0x5631ac751eb0) at ../job.c:801
8  job_completed_txn_success (job=0x5631ac751eb0) at ../job.c:855
9  job_completed (job=0x5631ac751eb0) at ../job.c:868
10 job_exit (opaque=0x5631ac751eb0) at ../job.c:888
11 aio_bh_call (bh=0x7f3d9c007680) at ../util/async.c:136
12 aio_bh_poll (ctx=0x5631ab757300) at ../util/async.c:164
13 aio_dispatch (ctx=0x5631ab757300) at ../util/aio-posix.c:381
14 aio_ctx_dispatch (source=0x5631ab757300, callback=0x0,
   user_data=0x0) at ../util/async.c:306
15 g_main_context_dispatch () at /lib64/libglib-2.0.so.0
16 glib_pollfds_poll () at ../util/main-loop.c:232
17 os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255
18 main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
19 qemu_main_loop () at ../softmmu/runstate.c:722
20 main (argc=20, argv=0x7ffd9d478198, envp=0x7ffd9d478240) at
   ../softmmu/main.c:50
(gdb) list
2192        QLIST_FOREACH(c, &bs->parents, next_parent) {
2193            if (g_slist_find(ignore_children, c)) {
2194                continue;
2195            }
2196
2197            if ((new_used_perm & c->shared_perm) != new_used_perm) {
2198                char *user = bdrv_child_user_desc(c);
2199                char *perm_names = bdrv_perm_names(new_used_perm & ~c->shared_perm);
2200
2201                error_setg(errp, "Conflicts with use by %s as '%s', which does not "
(gdb) p c
$1 = (BdrvChild *) 0x8585858585858585
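
The pointer value above is a single byte repeated eight times, which usually indicates memory overwritten by an allocator fill pattern (for example glibc's MALLOC_PERTURB_ setting) rather than a valid BdrvChild. That reading is an inference, not something stated in this message. A minimal sketch of the check, with a hypothetical helper name:

```python
def repeated_byte(value, width=8):
    """Return the fill byte if value is one byte repeated width times, else None.

    Hypothetical helper for eyeballing pointers printed by a debugger.
    Note that a NULL pointer (0) also counts as a repeated byte.
    """
    raw = value.to_bytes(width, "little")
    return raw[0] if len(set(raw)) == 1 else None


suspicious = repeated_byte(0x8585858585858585)   # value of c in the gdb output
ordinary = repeated_byte(0x5631AC97BC50)         # value of bs in frame 0
print(hex(suspicious), ordinary)
```

A repeated-byte pointer in a list walk such as QLIST_FOREACH is consistent with an element having been freed while still linked, but confirming that would need further debugging.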

Then, trying to reproduce it on top of
"block: update graph permissions update", I had 634 successful
iterations and then the following crash (which looks much better):
0  raise () at /lib64/libc.so.6
1  abort () at /lib64/libc.so.6
2  _nl_load_domain.cold () at /lib64/libc.so.6
3  annobin_assert.c_end () at /lib64/libc.so.6
4  bdrv_replace_child_noperm (child=0x5585bb632010,
   new_bs=0x5585bc4f42a0) at ../block.c:2589
5  bdrv_replace_child (child=0x5585bb632010, new_bs=0x5585bc4f42a0,
   tran=0x7fff5a14d8e0) at ../block.c:2211
6  bdrv_set_backing_noperm (bs=0x5585bb704020,
   backing_bs=0x5585bc4f42a0, tran=0x7fff5a14d8e0, errp=0x7fff5a14d918)
   at ../block.c:3030
7  bdrv_set_backing_hd (bs=0x5585bb704020, backing_hd=0x5585bc4f42a0,
   errp=0x7fff5a14d918) at ../block.c:3072
8  stream_prepare (job=0x5585bc2ef230) at ../block/stream.c:74
9  job_prepare (job=0x5585bc2ef230) at ../job.c:784
10 job_txn_apply (job=0x5585bc2ef230, fn=0x5585ba638ad0 <job_prepare>)
   at ../job.c:158
11 job_do_finalize (job=0x5585bc2ef230) at ../job.c:801
12 job_completed_txn_success (job=0x5585bc2ef230) at ../job.c:855
13 job_completed (job=0x5585bc2ef230) at ../job.c:868
14 job_exit (opaque=0x5585bc2ef230) at ../job.c:888
15 aio_bh_call (bh=0x7f62b8004270) at ../util/async.c:136
16 aio_bh_poll (ctx=0x5585bb2ce4a0) at ../util/async.c:164
17 aio_dispatch (ctx=0x5585bb2ce4a0) at ../util/aio-posix.c:381
18 aio_ctx_dispatch (source=0x5585bb2ce4a0, callback=0x0,
   user_data=0x0) at ../util/async.c:306
19 g_main_context_dispatch () at /lib64/libglib-2.0.so.0
20 glib_pollfds_poll () at ../util/main-loop.c:232
21 os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255
22 main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
23 qemu_main_loop () at ../softmmu/runstate.c:722
24 main (argc=20, argv=0x7fff5a14ddb8, envp=0x7fff5a14de60) at
   ../softmmu/main.c:50

(gdb) fr 4
4  bdrv_replace_child_noperm (child=0x5585bb632010, new_bs=0x5585bc4f42a0) at ../block.c:2589
2589            assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
(gdb) list
2584        int drain_saldo;
2585
2586        assert(!child->frozen);
2587
2588        if (old_bs && new_bs) {
2589            assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
2590        }
2591
2592        new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
2593        drain_saldo = new_bs_quiesce_counter - child->parent_quiesce_counter;

So it seems reasonable to drop the test from the auto group, at least until we
merge "block: update graph permissions update".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Note: feel free to shorten the commit message if needed :)

 tests/qemu-iotests/030 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Claudio Fontana Feb. 5, 2021, 11:15 a.m. UTC | #1
Just in case it helps,

I started getting this error only after rebasing onto the latest master a couple of days ago; I had been rebasing regularly and running a full make check every 1 or 2 days.

Ciao,

Claudio


Vladimir Sementsov-Ogievskiy Feb. 5, 2021, 11:24 a.m. UTC | #2
05.02.2021 14:15, Claudio Fontana wrote:
> Just in case it helps,
> 
> I started getting this error only when I rebased to latest master a couple days ago, after regular rebasing and testing with full make check every 1 or 2 days.
> 

The first crash in the patch is on an assertion that was added recently. I still think the assertion is correct, and it shows that we have problems that were previously hidden. We could probably make the assertion less restrictive, but I am pinning my hopes on my "block: update graph permissions update" series.

Eric Blake Feb. 5, 2021, 2:48 p.m. UTC | #3
On 2/5/21 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> I reproduced the following crash fast enough:
> 
> 0  raise () at /lib64/libc.so.6
> 1  abort () at /lib64/libc.so.6
> 2  _nl_load_domain.cold () at /lib64/libc.so.6
> 3  annobin_assert.c_end () at /lib64/libc.so.6
> 4  bdrv_reopen_multiple (bs_queue=0x55de75fa9b70, errp=0x0)
>    at ../block.c:3820
> 5  bdrv_reopen_set_read_only (bs=0x55de760fc020, read_only=true,
>    errp=0x0) at ../block.c:3870
> 6  stream_clean (job=0x55de75fa9410) at ../block/stream.c:99

> Then I had 38 successful iterations and another crash:
> 0  bdrv_check_update_perm (bs=0x5631ac97bc50, q=0x0, new_used_perm=1,
>    new_shared_perm=31, ignore_children=0x0, errp=0x7ffd9d477cf8) at
>    ../block.c:2197
> 1  bdrv_root_attach_child
>     (child_bs=0x5631ac97bc50, child_name=0x5631aaf6b1f9 "backing",
>     child_class=0x5631ab280ca0 <child_of_bds>, child_role=8,
>     ctx=0x5631ab757300, perm=1, shared_perm=31, opaque=0x5631abb8c020,
>     errp=0x7ffd9d477cf8)
>     at ../block.c:2642
> 2  bdrv_attach_child (parent_bs=0x5631abb8c020,
>    child_bs=0x5631ac97bc50, child_name=0x5631aaf6b1f9 "backing",
>    child_class=0x5631ab280ca0 <child_of_bds>, child_role=8,
>    errp=0x7ffd9d477cf8)
>     at ../block.c:2719
> 3  bdrv_set_backing_hd (bs=0x5631abb8c020, backing_hd=0x5631ac97bc50,
>    errp=0x7ffd9d477cf8) at ../block.c:2854
> 4  stream_prepare (job=0x5631ac751eb0) at ../block/stream.c:74

So we definitely have a race that can show a wide variety of knock-on
effects.
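
Because the failure is intermittent, counting successful iterations before the first failure (as in the "38 successful iterations" above) is a practical way to compare trees. A minimal sketch follows; the helper name is hypothetical, and the real command would presumably be something like ./check -qcow2 030 run from the iotests directory, which is an assumption about the local setup. The demo uses a stand-in command so that the snippet runs anywhere.

```python
import subprocess
import sys


def iterations_until_failure(cmd, limit):
    """Run cmd up to limit times.

    Returns the number of successful runs before the first non-zero exit
    status, or limit if every run succeeded.
    """
    for passed in range(limit):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return passed
    return limit


# Stand-in command that fails immediately; replace it with the real test
# invocation (for example ["./check", "-qcow2", "030"], an assumed command
# line) when running inside a QEMU build tree.
demo_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
print(iterations_until_failure(demo_cmd, limit=5))
```

A high count is only weak evidence that a race is gone, as the 634-iteration run in this thread shows.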


> 
> and trying to reproduce it on top of
> "block: update graph permissions update" I had 634 successful
> iterations
> and then the following crash (which looks much better):

This part of the commit message is odd - if we check it in to git as
written, you're pointing to a future commit, while still stating that it
is not a perfect commit.  But maybe by the time that commit gets in
we'll have figured out this last crash and corrected it as well.
Sticking to just the first two logs is fine by me.


> 
> So it seems reasonable to drop test from auto group at least until we
> merge "block: update graph permissions update"
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Note: be free to shorten commit message if needed :)

Indeed.  But as to the patch itself, I agree, and maybe Peter wants to
apply it directly to master instead of waiting for it to come through one
of the block maintainers?

Reviewed-by: Eric Blake <eblake@redhat.com>
Peter Maydell Feb. 5, 2021, 3:18 p.m. UTC | #4
On Fri, 5 Feb 2021 at 14:48, Eric Blake <eblake@redhat.com> wrote:
>
> On 2/5/21 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> > and trying to reproduce it on top of
> > "block: update graph permissions update" I had 634 successful
> > iterations
> > and then the following crash (which looks much better):
>
> This part of the commit message is odd - if we check it in to git as
> written, you're pointing to a future commit, while still stating that it
> is not a perfect commit.  But maybe by the time that commit gets in
> we'll have figured out this last crash and corrected it as well.
> Sticking to just the first two logs is fine by me.
>
>
> >
> > So it seems reasonable to drop test from auto group at least until we
> > merge "block: update graph permissions update"
> >
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> >
> > Note: be free to shorten commit message if needed :)
>
> Indeed.  But as to the patch itself, I agree, and maybe Peter wants to
> apply it directly to master instead of waiting for it to come through one
> of the block maintainers?
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks; I have applied this to master, after trimming the
part of the commit message that refers to an as-yet-unapplied
patch series.

-- PMM

Patch

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 832fe4a1e2..12aa9ed37e 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -1,5 +1,5 @@ 
 #!/usr/bin/env python3
-# group: rw auto backing
+# group: rw backing
 #
 # Tests for image streaming.
 #
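
The effect of the one-line change can be sketched as follows. This is not the iotests harness's own parser; groups_of is a hypothetical helper that simply reads the "# group:" header line shown in the diff, on the assumption that group membership is determined by the space-separated names on that line.

```python
def groups_of(header_lines):
    """Return the set of group names on the first '# group:' line, if any."""
    prefix = "# group:"
    for line in header_lines:
        if line.startswith(prefix):
            return set(line[len(prefix):].split())
    return set()


before = groups_of(["#!/usr/bin/env python3", "# group: rw auto backing"])
after = groups_of(["#!/usr/bin/env python3", "# group: rw backing"])
print("auto" in before, "auto" in after)
```

With "auto" gone from the header, the test is no longer picked up by anything that selects the auto group, which per the subject line includes make check; it can still be run explicitly by number.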