From patchwork Tue Sep 19 16:48:01 2017
X-Patchwork-Submitter: Alexey Perevalov
X-Patchwork-Id: 9959725
From: Alexey Perevalov
To: qemu-devel@nongnu.org
Date: Tue, 19 Sep 2017 19:48:01 +0300
Message-id: <1505839684-10046-8-git-send-email-a.perevalov@samsung.com>
X-Mailer: git-send-email 1.9.1
In-reply-to: <1505839684-10046-1-git-send-email-a.perevalov@samsung.com>
References: <1505839684-10046-1-git-send-email-a.perevalov@samsung.com>
Subject: [Qemu-devel] [PATCH v10 07/10] migration: calculate vCPU blocktime on dst side
Cc: heetae82.ahn@samsung.com, quintela@redhat.com, Alexey Perevalov, peterx@redhat.com, dgilbert@redhat.com, i.maximets@samsung.com

This patch provides blocktime calculation per vCPU, both as a per-vCPU
summary and as an overlapped value across all vCPUs.

This approach was suggested by Peter Xu as an improvement over the previous
approach, where QEMU kept a tree keyed by faulted page address with a CPU
bitmask as the value. Now QEMU keeps an array with the faulted page address
as the value and the vCPU index as the key, which makes it easy to find the
proper vCPU at UFFD_COPY time.
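The idea, as a minimal standalone sketch (the names below are invented for
illustration and are not the identifiers used in this patch):

    /* Illustrative sketch only, not patch code: one slot per vCPU holds
     * the page address that vCPU is currently blocked on. */
    #include <stdint.h>

    #define MAX_VCPUS 256

    static uint64_t vcpu_fault_addr[MAX_VCPUS]; /* 0 means "not blocked" */

    /* On a userfault: remember which address blocked this vCPU. */
    static void record_fault(int vcpu, uint64_t addr)
    {
        vcpu_fault_addr[vcpu] = addr;
    }

    /* At UFFD_COPY time: every vCPU whose slot matches the copied page was
     * blocked on it; a plain O(n_vcpus) scan replaces the old tree lookup
     * and bitmask walk. */
    static int vcpus_unblocked_by(uint64_t addr, int n_vcpus)
    {
        int n = 0;

        for (int i = 0; i < n_vcpus; i++) {
            if (vcpu_fault_addr[i] == addr) {
                vcpu_fault_addr[i] = 0;
                n++;
            }
        }
        return n;
    }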
It also keeps a list of blocktimes per vCPU (which can be traced with
page_fault_addr). Blocktime will not be calculated if the postcopy_blocktime
field of MigrationIncomingState was not initialized.

Signed-off-by: Alexey Perevalov
---
 migration/postcopy-ram.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   5 +-
 2 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cc78981..9a5133f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -110,7 +110,6 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
 
     ctx->exit_notifier.notify = migration_exit_cb;
     qemu_add_exit_notifier(&ctx->exit_notifier);
-    add_migration_state_change_notifier(&ctx->postcopy_notifier);
     return ctx;
 }
 
@@ -559,6 +558,136 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(-1, pid);
+    return -1;
+}
+
+/*
+ * This function is called when a pagefault occurs. It tracks
+ * the vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+                                          RAMBlock *rb)
+{
+    int cpu, already_received;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int64_t now_ms;
+
+    if (!dc || ptid == 0) {
+        return;
+    }
+    cpu = get_mem_fault_cpu_index(ptid);
+    if (cpu < 0) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    if (dc->vcpu_addr[cpu] == 0) {
+        atomic_inc(&dc->smp_cpus_down);
+    }
+
+    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+    atomic_xchg__nocheck(&dc->last_begin, now_ms);
+    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
+    if (already_received) {
+        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+        atomic_sub(&dc->smp_cpus_down, 1);
+    }
+    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+                                        cpu, already_received);
+}
+
+/*
+ * This function provides the calculated blocktime per cpu and traces it.
+ * Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPUs
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match the condition, because the sequence S1,S2,E1
+ * doesn't include CPU3
+ * S3,S1,E2 - this sequence includes all CPUs, so the overlap is S1,E2 -
+ * it's a part of total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is as follows:
+ *              * - means blocktime per vCPU
+ *              x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int i, affected_cpu = 0;
+    int64_t now_ms;
+    bool vcpu_total_blocktime = false;
+
+    if (!dc) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* lookup the cpu, to clear it;
+     * this algorithm looks straightforward, but it's not
+     * optimal: a more optimal algorithm would keep a tree or hash
+     * where the key is an address and the value is a list of vCPUs */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_blocktime = 0;
+        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+            continue;
+        }
+        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+        vcpu_blocktime = now_ms -
+            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+        affected_cpu += 1;
+        /* we need to know whether mark_postcopy_blocktime_end was called
+         * for a faulted page; the other possible case is a prefetched
+         * page, and in that case we shouldn't be here */
+        if (!vcpu_total_blocktime &&
+            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+            vcpu_total_blocktime = true;
+        }
+        /* continue the cycle, since one page could affect several vCPUs */
+        dc->vcpu_blocktime[i] += vcpu_blocktime;
+    }
+
+    atomic_sub(&dc->smp_cpus_down, affected_cpu);
+    if (vcpu_total_blocktime) {
+        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+    }
+    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
+                                      affected_cpu);
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -636,8 +765,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
 
+        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
+                                      msg.arg.pagefault.feat.ptid, rb);
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -727,6 +859,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
     if (!ret) {
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
+        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
+
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index d2910a6..01f30fe 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -114,6 +114,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
+mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -190,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx pid=%u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -199,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
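
As a cross-check of the accounting above: the overlap rule from the diagram
(total blocktime accrues only while every vCPU is blocked) can be replayed
with a small standalone program. This is an illustrative sketch only: the
timestamps and addresses are invented, plain integers stand in for the
patch's atomics, and fault_begin/fault_end are hypothetical stand-ins for
mark_postcopy_blocktime_begin/mark_postcopy_blocktime_end.

    /* blocktime-demo.c - standalone sketch, NOT part of the patch.
     * Replays the sequence S1,S2,E1,S3,S1,E2,E3,E1 from the diagram. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <inttypes.h>

    #define NCPU 3

    static uint64_t vcpu_addr[NCPU];           /* faulted address, 0 = running */
    static int64_t page_fault_vcpu_time[NCPU]; /* when the vCPU stopped */
    static int64_t vcpu_blocktime[NCPU];
    static int64_t total_blocktime, last_begin;
    static int smp_cpus_down;

    static void fault_begin(int cpu, uint64_t addr, int64_t now) /* "S" */
    {
        if (vcpu_addr[cpu] == 0) {
            smp_cpus_down++;
        }
        vcpu_addr[cpu] = addr;
        last_begin = now;
        page_fault_vcpu_time[cpu] = now;
    }

    static void fault_end(uint64_t addr, int64_t now)            /* "E" */
    {
        bool all_down = false;
        int affected = 0;

        for (int i = 0; i < NCPU; i++) {
            if (vcpu_addr[i] != addr) {
                continue;
            }
            vcpu_addr[i] = 0;
            vcpu_blocktime[i] += now - page_fault_vcpu_time[i];
            affected++;
            if (smp_cpus_down == NCPU) {
                all_down = true; /* every vCPU was blocked: overlap */
            }
        }
        smp_cpus_down -= affected;
        if (all_down) {
            total_blocktime += now - last_begin; /* only the "xxx" part */
        }
    }

    int main(void)
    {
        fault_begin(0, 0xA000, 10); /* S1 */
        fault_begin(1, 0xB000, 20); /* S2 */
        fault_end(0xA000, 30);      /* E1 */
        fault_begin(2, 0xC000, 35); /* S3 */
        fault_begin(0, 0xD000, 40); /* S1: now all three vCPUs are down */
        fault_end(0xB000, 45);      /* E2: overlap 40..45 goes to total */
        fault_end(0xC000, 50);      /* E3 */
        fault_end(0xD000, 55);      /* E1 */

        for (int i = 0; i < NCPU; i++) {
            printf("vcpu %d blocktime: %" PRId64 "\n", i, vcpu_blocktime[i]);
        }
        printf("total (overlapped) blocktime: %" PRId64 "\n", total_blocktime);
        return 0;
    }

With these made-up timestamps it prints per-vCPU blocktimes of 35, 25 and 15,
and a total of 5: only the interval where all three vCPUs were simultaneously
blocked counts toward total blocktime, matching the "xxx" region above.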