
file-posix: unlock qemu_global_mutex before pread when attach disk

Message ID 1560173684-6264-1-git-send-email-lizhengui@huawei.com (mailing list archive)
State New, archived
Headers show
Series file-posix: unlock qemu_global_mutex before pread when attach disk | expand

Commit Message

Zhengui li June 10, 2019, 1:34 p.m. UTC
A virtual machine CPU soft lockup occurs when attaching a disk to the VM in the case
that the backend storage network has a large delay or the IO pressure is too high.

The backtrace of the qemu main thread:
#0 0x0000ffff9d78402c in pread64 () from /lib64/libpthread.so.0
#1 0x0000aaaace3357d8 in pread64 (__offset=0, __nbytes=4096, __buf=0xaaaad47a5200, __fd=202) at /usr/include/bits/unistd.h:99
#2 raw_is_io_aligned (fd=fd@entry=202, buf=buf@entry=0xaaaad47a5200, len=len@entry=4096) at block/raw-posix.c:294
#3 0x0000aaaace33597c in raw_probe_alignment (bs=bs@entry=0xaaaad32ea920, fd=202, errp=errp@entry=0xfffffef7a330) at block/raw-posix.c:349
#4 0x0000aaaace335a48 in raw_refresh_limits (bs=0xaaaad32ea920, errp=0xfffffef7a330) at block/raw-posix.c:811
#5 0x0000aaaace3404b0 in bdrv_refresh_limits (bs=0xaaaad32ea920, errp=0xfffffef7a330, errp@entry=0xfffffef7a360) at block/io.c:122
#6 0x0000aaaace340504 in bdrv_refresh_limits (bs=bs@entry=0xaaaad09ce800, errp=errp@entry=0xfffffef7a3b0) at block/io.c:97
#7 0x0000aaaace2eb9f0 in bdrv_open_common (bs=bs@entry=0xaaaad09ce800, file=file@entry=0xaaaad0e89800, options=<optimized out>, errp=errp@entry=0xfffffef7a450)
at block.c:1194
#8 0x0000aaaace2eedec in bdrv_open_inherit (filename=<optimized out>, filename@entry=0xaaaad25f92d0 "/dev/mapper/36384c4f100630193359db7a80000011d",
reference=reference@entry=0x0, options=<optimized out>, options@entry=0xaaaad3d0f4b0, flags=<optimized out>, flags@entry=128, parent=parent@entry=0x0,
child_role=child_role@entry=0x0, errp=errp@entry=0xfffffef7a710) at block.c:1895
#9 0x0000aaaace2ef510 in bdrv_open (filename=filename@entry=0xaaaad25f92d0 "/dev/mapper/36384c4f100630193359db7a80000011d", reference=reference@entry=0x0,
options=options@entry=0xaaaad3d0f4b0, flags=flags@entry=128, errp=errp@entry=0xfffffef7a710) at block.c:1979
#10 0x0000aaaace331ef0 in blk_new_open (filename=filename@entry=0xaaaad25f92d0 "/dev/mapper/36384c4f100630193359db7a80000011d", reference=reference@entry=0x0,
options=options@entry=0xaaaad3d0f4b0, flags=128, errp=errp@entry=0xfffffef7a710) at block/block-backend.c:213
#11 0x0000aaaace0da1f4 in blockdev_init (file=file@entry=0xaaaad25f92d0 "/dev/mapper/36384c4f100630193359db7a80000011d", bs_opts=bs_opts@entry=0xaaaad3d0f4b0,
errp=errp@entry=0xfffffef7a710) at blockdev.c:603
#12 0x0000aaaace0dc478 in drive_new (all_opts=all_opts@entry=0xaaaad4dc31d0, block_default_type=<optimized out>) at blockdev.c:1116
#13 0x0000aaaace0e3ee0 in add_init_drive (
optstr=optstr@entry=0xaaaad0872ec0 "file=/dev/mapper/36384c4f100630193359db7a80000011d,format=raw,if=none,id=drive-scsi0-0-0-3,cache=none,aio=native")
at device-hotplug.c:46
#14 0x0000aaaace0e3f78 in hmp_drive_add (mon=0xfffffef7a810, qdict=0xaaaad0c8f000) at device-hotplug.c:67
#15 0x0000aaaacdf7d688 in handle_hmp_command (mon=0xfffffef7a810, cmdline=<optimized out>) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:3199
#16 0x0000aaaacdf7d778 in qmp_human_monitor_command (
command_line=0xaaaacfc8e3c0 "drive_add dummy file=/dev/mapper/36384c4f100630193359db7a80000011d,format=raw,if=none,id=drive-scsi0-0-0-3,cache=none,aio=native",
has_cpu_index=false, cpu_index=0, errp=errp@entry=0xfffffef7a968) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:660
#17 0x0000aaaace0fdb30 in qmp_marshal_human_monitor_command (args=<optimized out>, ret=0xfffffef7a9e0, errp=0xfffffef7a9d8) at qmp-marshal.c:2223
#18 0x0000aaaace3b6ad0 in do_qmp_dispatch (request=<optimized out>, errp=0xfffffef7aa20, errp@entry=0xfffffef7aa40) at qapi/qmp-dispatch.c:115
#19 0x0000aaaace3b6d58 in qmp_dispatch (request=<optimized out>) at qapi/qmp-dispatch.c:142
#20 0x0000aaaacdf79398 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:4010
#21 0x0000aaaace3bd6c0 in json_message_process_token (lexer=0xaaaacf834c80, input=<optimized out>, type=JSON_RCURLY, x=214, y=274) at qobject/json-streamer.c:105
#22 0x0000aaaace3f3d4c in json_lexer_feed_char (lexer=lexer@entry=0xaaaacf834c80, ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:319
#23 0x0000aaaace3f3e6c in json_lexer_feed (lexer=0xaaaacf834c80, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:369
#24 0x0000aaaacdf77c64 in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-kvm-2.8.1/monitor.c:4040
#25 0x0000aaaace0eab18 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0xaaaacf90b280) at qemu-char.c:3260
#26 0x0000ffff9dadf200 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#27 0x0000aaaace3c4a00 in glib_pollfds_poll () at util/main-loop.c:230
#28 0x0000aaaace3c4a88 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:278
#29 0x0000aaaace3c4bf0 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:534
#30 0x0000aaaace0f5d08 in main_loop () at vl.c:2120
#31 0x0000aaaacdf3a770 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5017

When doing a QMP command such as drive_add, the qemu main thread locks qemu_global_mutex and does a pread in raw_probe_alignment. pread is a
synchronous operation. If the backend storage network has a large delay or the IO pressure is too high, the pread will not return for a long time, which
means the vcpu threads cannot acquire qemu_global_mutex and therefore cannot be scheduled for a long time. So a virtual machine CPU soft lockup happens.

The qemu main thread should not hold qemu_global_mutex for a long time when executing QMP commands that involve synchronous IO operations such as pread,
ioctl, etc. So this patch unlocks qemu_global_mutex before the synchronous pread.
---
 block/file-posix.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Paolo Bonzini June 10, 2019, 1:51 p.m. UTC | #1
On 10/06/19 15:34, Zhengui li wrote:
> 
> When doing a QMP command such as drive_add, the qemu main thread locks
> qemu_global_mutex and does a pread in raw_probe_alignment. pread is a
> synchronous operation. If the backend storage network has a large delay
> or the IO pressure is too high, the pread will not return for
> a long time, which means the vcpu threads cannot acquire qemu_global_mutex
> and therefore cannot be scheduled for a
> long time. So a virtual machine CPU soft lockup happens.
> 
> The qemu main thread should not hold qemu_global_mutex for a long time
> when executing QMP commands that involve synchronous IO operations such as
> pread, ioctl, etc. So this patch unlocks qemu_global_mutex before the
> synchronous pread.

These preads are for 512-4096 bytes, can they really last much longer
than the "open" that precedes them?  If pread of 4K can trigger a soft
lockup, things are really screwed up---and it's hard to be sure that all
callers of raw_probe_alignment are okay with releasing the global mutex.

Paolo
no-reply@patchew.org June 10, 2019, 2:12 p.m. UTC | #2
Patchew URL: https://patchew.org/QEMU/1560173684-6264-1-git-send-email-lizhengui@huawei.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH] file-posix: unlock qemu_global_mutex before pread when attach disk
Type: series
Message-id: 1560173684-6264-1-git-send-email-lizhengui@huawei.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]               patchew/1560173684-6264-1-git-send-email-lizhengui@huawei.com -> patchew/1560173684-6264-1-git-send-email-lizhengui@huawei.com
Switched to a new branch 'test'
a5549055a3 file-posix: unlock qemu_global_mutex before pread when attach disk

=== OUTPUT BEGIN ===
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 9 lines checked

Commit a5549055a30d (file-posix: unlock qemu_global_mutex before pread when attach disk) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1560173684-6264-1-git-send-email-lizhengui@huawei.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Zhengui li June 10, 2019, 2:51 p.m. UTC | #3
The pread will hang while attaching the disk when the backend storage
network is disconnected.

I think the locking range of qemu_global_mutex is too large when doing a QMP
operation. What does qemu_global_mutex really protect? What is the risk of
unlocking qemu_global_mutex in QMP?


On 2019/6/10 21:51, Paolo Bonzini wrote:
> On 10/06/19 15:34, Zhengui li wrote:
>> When doing a QMP command such as drive_add, the qemu main thread locks
>> qemu_global_mutex and does a pread in raw_probe_alignment. pread is a
>> synchronous operation. If the backend storage network has a large delay
>> or the IO pressure is too high, the pread will not return for
>> a long time, which means the vcpu threads cannot acquire qemu_global_mutex
>> and therefore cannot be scheduled for a
>> long time. So a virtual machine CPU soft lockup happens.
>>
>> The qemu main thread should not hold qemu_global_mutex for a long time
>> when executing QMP commands that involve synchronous IO operations such as
>> pread, ioctl, etc. So this patch unlocks qemu_global_mutex before the
>> synchronous pread.
> These preads are for 512-4096 bytes, can they really last much longer
> than the "open" that precedes them?  If pread of 4K can trigger a soft
> lockup, things are really screwed up---and it's hard to be sure that all
> callers of raw_probe_alignment are okay with releasing the global mutex.
>
> Paolo
>
> .
>
Paolo Bonzini June 10, 2019, 4:03 p.m. UTC | #4
On 10/06/19 16:51, l00284672 wrote:
> The pread will hang in attaching disk just when backend storage network
> disconnection .

Would the "open" hang as well in that case?
> I think the locking range of qemu_global_mutex is too large when do qmp
> operation. what
> does the qemu_global_mutex  really protect?

Basically everything.

> what is the risk of unlocking qemu_global_mutex in qmp?

It's not QMP; it's specifically the code that calls raw_probe_alignment.

Paolo
Zhengui li June 11, 2019, 2:53 a.m. UTC | #5
> Would the "open" hang as well in that case?

The "open" doesn't hang in that case.

Do you have any better solutions to this problem in that case?


On 2019/6/11 0:03, Paolo Bonzini wrote:
> On 10/06/19 16:51, l00284672 wrote:
>> The pread will hang while attaching the disk when the backend storage
>> network is disconnected.
> Would the "open" hang as well in that case?
>> I think the locking range of qemu_global_mutex is too large when doing a
>> QMP operation. What does qemu_global_mutex really protect?
> Basically everything.
>
>> What is the risk of unlocking qemu_global_mutex in QMP?
> It's not QMP; it's specifically the code that calls raw_probe_alignment.
>
> Paolo
>
> .
>
Kevin Wolf June 11, 2019, 8:49 a.m. UTC | #6
On 11.06.2019 at 04:53, l00284672 wrote:
>> Would the "open" hang as well in that case?
> The "open" doesn't hang in that case.
> 
> Do you have any better solutions to solve this problem in the case?

Yes, but unfortunately it's a lot harder.

This is roughly what you'd have to do:

1. Make QMP command handlers async (patches from Marc-André are on the
   list)
2. Stop using HMP drive_add and instead switch to QMP blockdev-add
3. Move the blockdev-add code into a coroutine
4. Make .bdrv_open a coroutine_fn
5. Move the pread() in file-posix to the thread pool and let the
   coroutine yield while the request is running

Only with all of these pieces in place we'll be able to release the
global mutex while we're waiting for the pread() to complete.

Kevin

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index 1cf4ee4..192c779 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -297,7 +297,9 @@  static int probe_physical_blocksize(int fd, unsigned int *blk_size)
  */
 static bool raw_is_io_aligned(int fd, void *buf, size_t len)
 {
+    qemu_mutex_unlock_iothread();
     ssize_t ret = pread(fd, buf, len, 0);
+    qemu_mutex_lock_iothread();
 
     if (ret >= 0) {
         return true;