From patchwork Mon Oct 30 13:16:27 2017
From: Alexey Perevalov
To: qemu-devel@nongnu.org
Date: Mon, 30 Oct 2017 16:16:27 +0300
Message-id: <1509369390-8285-4-git-send-email-a.perevalov@samsung.com>
In-reply-to: <1509369390-8285-1-git-send-email-a.perevalov@samsung.com>
References: <1509369390-8285-1-git-send-email-a.perevalov@samsung.com>
Subject: [Qemu-devel] [PATCH v12 3/6] migration: calculate vCPU blocktime on dst side
Cc: heetae82.ahn@samsung.com, quintela@redhat.com, dgilbert@redhat.com, peterx@redhat.com, Alexey Perevalov, i.maximets@samsung.com

This patch provides blocktime calculation per vCPU, both as a per-vCPU
sum and as an overlapped value across all vCPUs.

This approach was suggested by Peter Xu as an improvement over the
previous approach, where QEMU kept a tree with the faulted page address
and a bitmask of CPUs in it. Now QEMU keeps an array with the faulted
page address as the value and the vCPU as the index. This makes it easy
to find the proper vCPU at UFFD_COPY time. It also keeps a list of
blocktimes per vCPU (which can be traced with page_fault_addr).

Blocktime is not calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.

Signed-off-by: Alexey Perevalov
Reviewed-by: Dr. David Alan Gilbert
Reviewed-by: Juan Quintela
---
 migration/postcopy-ram.c | 143 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   5 +-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c18ec5a..6bf24e9 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -553,6 +553,142 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(-1, pid);
+    return -1;
+}
+
+/*
+ * This function is called when a page fault occurs. It
+ * tracks down vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+                                          RAMBlock *rb)
+{
+    int cpu, already_received;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int64_t now_ms;
+
+    if (!dc || ptid == 0) {
+        return;
+    }
+    cpu = get_mem_fault_cpu_index(ptid);
+    if (cpu < 0) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    if (dc->vcpu_addr[cpu] == 0) {
+        atomic_inc(&dc->smp_cpus_down);
+    }
+
+    atomic_xchg__nocheck(&dc->last_begin, now_ms);
+    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+
+    /* check it here, not at the beginning of the function,
+     * because the check could occur earlier than bitmap_set in
+     * qemu_ufd_copy_ioctl */
+    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
+    if (already_received) {
+        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+        atomic_dec(&dc->smp_cpus_down);
+    }
+    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+                                        cpu, already_received);
+}
+
+/*
+ * This function just provides the calculated blocktime per cpu and traces it.
+ * Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPUs
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match the condition, because the sequence S1,S2,E1
+ * doesn't include CPU3
+ * S3,S1,E2 - this sequence includes all CPUs, in this case the overlap
+ * will be S1,E2 - it's a part of the total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is the following:
+ * * - means blocktime per vCPU
+ * x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int i, affected_cpu = 0;
+    int64_t now_ms;
+    bool vcpu_total_blocktime = false;
+    int64_t read_vcpu_time;
+
+    if (!dc) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* lookup cpu, to clear it,
+     * this algorithm looks straightforward, but it's not
+     * optimal; a more optimal algorithm would keep a tree or hash
+     * where the key is an address and the value is a list of vCPUs */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_blocktime = 0;
+
+        read_vcpu_time = atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr ||
+            read_vcpu_time == 0) {
+            continue;
+        }
+        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+        vcpu_blocktime = now_ms - read_vcpu_time;
+        affected_cpu += 1;
+        /* we need to know whether mark_postcopy_end was due to a
+         * faulted page; the other possible case is a prefetched
+         * page, and in that case we shouldn't be here */
+        if (!vcpu_total_blocktime &&
+            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+            vcpu_total_blocktime = true;
+        }
+        /* continue the cycle, because one page could affect several vCPUs */
+        dc->vcpu_blocktime[i] += vcpu_blocktime;
+    }
+
+    atomic_sub(&dc->smp_cpus_down, affected_cpu);
+    if (vcpu_total_blocktime) {
+        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+    }
+    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
+                                      affected_cpu);
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -630,8 +766,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
+        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
+                                      msg.arg.pagefault.feat.ptid, rb);
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -721,6 +860,8 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
     if (!ret) {
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
+        mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host_addr);
+
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 6f29fcc..462d157 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -115,6 +115,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", cpu: %d, already_received: %d"
+mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %" PRId64 ", affected_cpu: %d"
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -191,7 +193,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx pid=%u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -200,6 +202,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"