From patchwork Tue Sep 19 16:48:01 2017
X-Patchwork-Submitter: Alexey Perevalov
X-Patchwork-Id: 9959725
From: Alexey Perevalov
To: qemu-devel@nongnu.org
Date: Tue, 19 Sep 2017 19:48:01 +0300
Message-id: <1505839684-10046-8-git-send-email-a.perevalov@samsung.com>
X-Mailer: git-send-email 1.9.1
In-reply-to: <1505839684-10046-1-git-send-email-a.perevalov@samsung.com>
References: <1505839684-10046-1-git-send-email-a.perevalov@samsung.com>
Subject: [Qemu-devel] [PATCH v10 07/10] migration: calculate vCPU blocktime on dst side
Cc: heetae82.ahn@samsung.com, quintela@redhat.com, Alexey Perevalov, peterx@redhat.com, dgilbert@redhat.com, i.maximets@samsung.com

This patch provides blocktime calculation per vCPU, both as a per-vCPU
summary and as an overlapped value across all vCPUs.

This approach was suggested by Peter Xu as an improvement over the previous
approach, where QEMU kept a tree keyed by faulted page address with a CPU
bitmask as the value. Now QEMU keeps an array with the faulted page address
as the value and the vCPU index as the key, which makes it easy to find the
proper vCPU at UFFD_COPY time.
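The idea, as a minimal standalone sketch (the names below are invented for
illustration and are not the identifiers used in this patch):

    /* Illustrative sketch only, not patch code: one slot per vCPU holds
     * the page address that vCPU is currently blocked on. */
    #include <stdint.h>

    #define MAX_VCPUS 256

    static uint64_t vcpu_fault_addr[MAX_VCPUS]; /* 0 means "not blocked" */

    /* On a userfault: remember which address blocked this vCPU. */
    static void record_fault(int vcpu, uint64_t addr)
    {
        vcpu_fault_addr[vcpu] = addr;
    }

    /* At UFFD_COPY time: every vCPU whose slot matches the copied page was
     * blocked on it; a plain O(n_vcpus) scan replaces the old tree lookup
     * and bitmask walk. */
    static int vcpus_unblocked_by(uint64_t addr, int n_vcpus)
    {
        int n = 0;

        for (int i = 0; i < n_vcpus; i++) {
            if (vcpu_fault_addr[i] == addr) {
                vcpu_fault_addr[i] = 0;
                n++;
            }
        }
        return n;
    }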
It also keeps a list of blocktimes per vCPU (which can be traced with
page_fault_addr). Blocktime will not be calculated if the postcopy_blocktime
field of MigrationIncomingState was not initialized.

Signed-off-by: Alexey Perevalov
---
 migration/postcopy-ram.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   5 +-
 2 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cc78981..9a5133f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -110,7 +110,6 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
 
     ctx->exit_notifier.notify = migration_exit_cb;
     qemu_add_exit_notifier(&ctx->exit_notifier);
-    add_migration_state_change_notifier(&ctx->postcopy_notifier);
     return ctx;
 }
 
@@ -559,6 +558,136 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(-1, pid);
+    return -1;
+}
+
+/*
+ * This function is called when a pagefault occurs. It tracks
+ * the vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+                                          RAMBlock *rb)
+{
+    int cpu, already_received;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int64_t now_ms;
+
+    if (!dc || ptid == 0) {
+        return;
+    }
+    cpu = get_mem_fault_cpu_index(ptid);
+    if (cpu < 0) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    if (dc->vcpu_addr[cpu] == 0) {
+        atomic_inc(&dc->smp_cpus_down);
+    }
+
+    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+    atomic_xchg__nocheck(&dc->last_begin, now_ms);
+    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
+    if (already_received) {
+        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+        atomic_sub(&dc->smp_cpus_down, 1);
+    }
+    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+                                        cpu, already_received);
+}
+
+/*
+ * This function provides the calculated blocktime per cpu and traces it.
+ * Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPUs
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match the condition, because the sequence S1,S2,E1
+ * doesn't include CPU3
+ * S3,S1,E2 - this sequence includes all CPUs, so the overlap is S1,E2 -
+ * it's a part of total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is as follows:
+ *              * - means blocktime per vCPU
+ *              x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int i, affected_cpu = 0;
+    int64_t now_ms;
+    bool vcpu_total_blocktime = false;
+
+    if (!dc) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* lookup the cpu, to clear it;
+     * this algorithm looks straightforward, but it's not
+     * optimal: a more optimal algorithm would keep a tree or hash
+     * where the key is an address and the value is a list of vCPUs */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_blocktime = 0;
+        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+            continue;
+        }
+        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+        vcpu_blocktime = now_ms -
+            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+        affected_cpu += 1;
+        /* we need to know whether mark_postcopy_blocktime_end was called
+         * for a faulted page; the other possible case is a prefetched
+         * page, and in that case we shouldn't be here */
+        if (!vcpu_total_blocktime &&
+            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+            vcpu_total_blocktime = true;
+        }
+        /* continue the cycle, since one page could affect several vCPUs */
+        dc->vcpu_blocktime[i] += vcpu_blocktime;
+    }
+
+    atomic_sub(&dc->smp_cpus_down, affected_cpu);
+    if (vcpu_total_blocktime) {
+        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+    }
+    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
+                                      affected_cpu);
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -636,8 +765,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
 
+        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
+                                      msg.arg.pagefault.feat.ptid, rb);
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -727,6 +859,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
     if (!ret) {
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
+        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
+
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index d2910a6..01f30fe 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -114,6 +114,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
+mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -190,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx pid=%u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -199,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
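
As a cross-check of the accounting above: the overlap rule from the diagram
(total blocktime accrues only while every vCPU is blocked) can be replayed
with a small standalone program. This is an illustrative sketch only: the
timestamps and addresses are invented, plain integers stand in for the
patch's atomics, and fault_begin/fault_end are hypothetical stand-ins for
mark_postcopy_blocktime_begin/mark_postcopy_blocktime_end.

    /* blocktime-demo.c - standalone sketch, NOT part of the patch.
     * Replays the sequence S1,S2,E1,S3,S1,E2,E3,E1 from the diagram. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <inttypes.h>

    #define NCPU 3

    static uint64_t vcpu_addr[NCPU];           /* faulted address, 0 = running */
    static int64_t page_fault_vcpu_time[NCPU]; /* when the vCPU stopped */
    static int64_t vcpu_blocktime[NCPU];
    static int64_t total_blocktime, last_begin;
    static int smp_cpus_down;

    static void fault_begin(int cpu, uint64_t addr, int64_t now) /* "S" */
    {
        if (vcpu_addr[cpu] == 0) {
            smp_cpus_down++;
        }
        vcpu_addr[cpu] = addr;
        last_begin = now;
        page_fault_vcpu_time[cpu] = now;
    }

    static void fault_end(uint64_t addr, int64_t now)            /* "E" */
    {
        bool all_down = false;
        int affected = 0;

        for (int i = 0; i < NCPU; i++) {
            if (vcpu_addr[i] != addr) {
                continue;
            }
            vcpu_addr[i] = 0;
            vcpu_blocktime[i] += now - page_fault_vcpu_time[i];
            affected++;
            if (smp_cpus_down == NCPU) {
                all_down = true; /* every vCPU was blocked: overlap */
            }
        }
        smp_cpus_down -= affected;
        if (all_down) {
            total_blocktime += now - last_begin; /* only the "xxx" part */
        }
    }

    int main(void)
    {
        fault_begin(0, 0xA000, 10); /* S1 */
        fault_begin(1, 0xB000, 20); /* S2 */
        fault_end(0xA000, 30);      /* E1 */
        fault_begin(2, 0xC000, 35); /* S3 */
        fault_begin(0, 0xD000, 40); /* S1: now all three vCPUs are down */
        fault_end(0xB000, 45);      /* E2: overlap 40..45 goes to total */
        fault_end(0xC000, 50);      /* E3 */
        fault_end(0xD000, 55);      /* E1 */

        for (int i = 0; i < NCPU; i++) {
            printf("vcpu %d blocktime: %" PRId64 "\n", i, vcpu_blocktime[i]);
        }
        printf("total (overlapped) blocktime: %" PRId64 "\n", total_blocktime);
        return 0;
    }

With these made-up timestamps it prints per-vCPU blocktimes of 35, 25 and 15,
and a total of 5: only the interval where all three vCPUs were simultaneously
blocked counts toward total blocktime, matching the "xxx" region above.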