From patchwork Sat Aug 6 10:19:41 2016
X-Patchwork-Submitter: "Gonglei (Arei)"
X-Patchwork-Id: 9265905
From: "Gonglei (Arei)"
To: "Gonglei (Arei)", Paolo Bonzini, "jan.kiszka@siemens.com", "amit.shah@redhat.com", "quintela@redhat.com"
Date: Sat, 6 Aug 2016 10:19:41 +0000
Message-ID: <33183CC9F5247A488A2544077AF19020B038C0BA@SZXEMA503-MBS.china.huawei.com>
Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
Cc: "Huangweidong (C)", "qemu-devel@nongnu.org", "kvm@vger.kernel.org"

Hi all,

I think I have caught the bug.

Currently we rely on a vm_clock timer to ensure that the vcpus are started
before we invoke send_control_event(), which queues a message in the virtio
ring and triggers an irq. In fact, that guarantee does not hold in one
special situation:

1. The incoming migration on the destination runs in a coroutine
   (commit 82a4da79fd6).

2. The coroutine yields the CPU (by calling yield_until_fd_readable(s->fd))
   whenever QEMU hits EAGAIN while reading the QEMUFile (commit 595ab64169b):

static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
                                 size_t size)
{
    QEMUFileSocket *s = opaque;
    ssize_t len;

    for (;;) {
        len = qemu_recv(s->fd, buf, size, 0);
        if (len != -1) {
            break;
        }
        if (socket_error() == EAGAIN) {
            yield_until_fd_readable(s->fd);
        } else if (socket_error() != EINTR) {
            break;
        }
    }

    if (len == -1) {
        len = -socket_error();
    }
    return len;
}

3. The main thread then gets the CPU and the timers run. Assume we hit
   EAGAIN right after invoking fetch_active_ports_list().

4. I reproduced the problem by fault injection. The debug logs:

[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
[2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co() // enter the coroutine
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
[2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
[2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug ********** // fault injection begins, the migration coroutine yields the CPU
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592 // vm_clock timer fires
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb // calling the timer callback
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state // enter the migration coroutine again and restore the lapic registers; the previous lapic state (with the pending irq) gets overwritten :(
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
[2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
[2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start // enter vm_start here
[2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
[2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961

Below is my debugging diff (not based on the newest master branch).
Can we loop-wait until the migration process has finished when we hit EAGAIN?
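Another rough idea from my side (only a sketch, not a tested patch: whether
runstate_check(RUN_STATE_INMIGRATE) is the right guard, and the 100000 ns
re-arm delay, are my assumptions) would be to make the post-load timer
callback defer itself while the incoming migration is still loading device
state, so the control event and its irq are only raised after
kvm_put_apic_state() has restored the lapic:

static void virtio_serial_post_load_timer_cb(void *opaque)
{
    VirtIOSerial *s = VIRTIO_SERIAL(opaque);

    if (!s->post_load) {
        return;
    }
    /* Assumption: while qemu_loadvm_state() is still running (the incoming
     * coroutine may have yielded on EAGAIN), the runstate is still
     * RUN_STATE_INMIGRATE, so re-arm the timer instead of raising the irq
     * before the apic state has been restored. */
    if (runstate_check(RUN_STATE_INMIGRATE)) {
        timer_mod(s->post_load->timer,
                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 100000);
        return;
    }
    /* ... existing loop over s->post_load->connected[] unchanged ... */
}

That would keep the current timer mechanism but make it harmless if it fires
in the middle of the yielded coroutine; since the vm_clock only advances once
the guest runs, the deferred callback would effectively wait for vm_start.
I have not verified this, so please treat it only as a direction to discuss.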
Regards,
-Gonglei

> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Friday, August 05, 2016 5:14 PM
> To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> Subject: [Question] virtio-serial misses irq delivery on migration?
>
> Hi Paolo, Jan, Amit,
>
> Recently we encountered a problem where virtio-serial stops working after
> migration in a RH5.5 VM. The bigger problem is that I can't reproduce it. :(
>
> The symptom was much like BZ 867366: the usb-tablet mouse didn't work
> because uhci and virtio-serial shared the irq line (using IOAPIC, not MSI).
> The new interrupt can't be injected into the VM because the irq line bit
> had already been set to 1, but the frontend driver never handled it and
> never knew about it.
>
> Bug 867366 - virtio-serial misses irq delivery on migration
> https://bugzilla.redhat.com/show_bug.cgi?id=867366
>
> But my qemu is the newest qemu. Both commit 80dcfb8532 "virtio-serial-bus:
> post_load send_event when vm is running" and commit bc6b815d9e
> "virtio-serial: propagate guest_connected to the port on post_load" are
> applied.
>
> I noticed that Paolo posted another, possibly related, apic problem in
> Comment 23. But that patch
> https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> hasn't been merged into qemu master.
>
> Would you give me some clues please? Thanks!
>
> Regards,
> -Gonglei
>

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 6a45af9..a00b02f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
     if (!s->post_load) {
         return;
     }
+    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
     for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
         port = s->post_load->connected[i].port;
         host_connected = s->post_load->connected[i].host_connected;
@@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
              */
             send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
                                port->host_connected);
+            QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
         }
         vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
         if (vsc->set_guest_connected) {
@@ -729,6 +731,8 @@
         }
     }
     timer_mod(s->post_load->timer, 1);
+    QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
+    yield_until_fd_readable(qemu_get_fd(f));
     return 0;
 }
 
@@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id > 3) {
         return -EINVAL;
     }
-
+    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
     /* The virtio device */
     return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
 }
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 5b47056..2d988cb 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
 {
     APICCommonState *s = APIC_COMMON(dev);
     int i;
-
+    QEMU_LOG("gonglei: kvm_put_apic_state\n");
     memset(kapic, 0, sizeof(*kapic));
     kvm_apic_set_reg(kapic, 0x2, s->id << 24);
     kvm_apic_set_reg(kapic, 0x8, s->tpr);
diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index d2a6c4c..07699ec 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
 
     chip.chip_id = KVM_IRQCHIP_IOAPIC;
     kioapic = &chip.chip.ioapic;
-
+    QEMU_LOG("gonglei: kvm_ioapic_put\n");
     kioapic->id = s->id;
     kioapic->ioregsel = s->ioregsel;
     kioapic->base_address = s->busdev.mmio[0].addr;
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 042e960..110be20 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
     if (info->post_load) {
         info->post_load(s);
     }
+    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
     return 0;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 92c1427..d074c0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
     const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
     const char *portName = "charchannel1";
     size_t uRet;
-
+    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
diff --git a/qemu-timer.c b/qemu-timer.c
index 5741f0d..dd36bc9 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
     QEMUTimerList *tl;
     bool old = clock->enabled;
     clock->enabled = enabled;
+    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
     if (enabled && !old) {
         qemu_clock_notify(type);
     } else if (!enabled && old) {
@@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
     bool progress = false;
     QEMUTimerCB *cb;
     void *opaque;
+    static int count = 10;
 
     qemu_event_reset(&timer_list->timers_done_ev);
     if (!timer_list->clock->enabled) {
@@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
             qemu_mutex_unlock(&timer_list->active_timers_lock);
             break;
         }
-
+        if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
+            QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
+                     ts->expire_time, current_time);
+        }
         /* remove timer from the list before calling the callback */
         timer_list->active_timers = ts->next;
         ts->next = NULL;
diff --git a/vl.c b/vl.c
index 06f34fe..824b246 100644
--- a/vl.c
+++ b/vl.c
@@ -978,7 +978,7 @@ void vm_start(void)
 {
     RunState requested;
     int64_t start_time, end_time;
-
+    QEMU_LOG("gonglei: enter vm_start\n");
     qemu_vmstop_requested(&requested);
     if (runstate_is_running() && requested == RUN_STATE_MAX) {
         return;