From patchwork Wed Jun 7 07:35:29 2017
X-Patchwork-Submitter: Alexey Perevalov
X-Patchwork-Id: 9770759
From: Alexey Perevalov
To: qemu-devel@nongnu.org
Date: Wed, 07 Jun 2017 10:35:29 +0300
Message-id: <1496820931-27416-10-git-send-email-a.perevalov@samsung.com>
X-Mailer: git-send-email 1.9.1
In-reply-to: <1496820931-27416-1-git-send-email-a.perevalov@samsung.com>
References: <1496820931-27416-1-git-send-email-a.perevalov@samsung.com>
Subject: [Qemu-devel] [PATCH V7 09/11] migration: calculate vCPU blocktime on dst side
Cc: i.maximets@samsung.com, Alexey Perevalov, peterx@redhat.com, dgilbert@redhat.com

This patch provides blocktime calculation per vCPU, both as a per-vCPU
summary and as an overlapped value for all vCPUs.

This approach was suggested by Peter Xu as an improvement over the
previous approach, where QEMU kept a tree with the faulted page address
and a bitmask of CPUs in it. Now QEMU keeps an array indexed by vCPU,
with the faulted page address as the value. That makes it possible to
find the proper vCPU at UFFD_COPY time. It also keeps a per-vCPU list
of blocktimes (which can be traced with page_fault_addr).

Blocktime will not be calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.
Signed-off-by: Alexey Perevalov
---
 migration/postcopy-ram.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |   5 +-
 2 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 62a272a..0ad9f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,7 @@
 #include "ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include 
 #include "qemu/error-report.h"
 #include "trace.h"
@@ -561,6 +562,133 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid) {
+            return cpu_iter->cpu_index;
+        }
+    }
+    trace_get_mem_fault_cpu_index(pid);
+    return -1;
+}
+
+/*
+ * This function is called when a pagefault occurs. It
+ * tracks the vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+                                          RAMBlock *rb)
+{
+    int cpu;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int64_t now_ms;
+
+    if (!dc || ptid == 0) {
+        return;
+    }
+    cpu = get_mem_fault_cpu_index(ptid);
+    if (cpu < 0) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    if (dc->vcpu_addr[cpu] == 0) {
+        atomic_inc(&dc->smp_cpus_down);
+    }
+
+    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+    atomic_xchg__nocheck(&dc->last_begin, now_ms);
+    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+    if (test_copiedmap_by_addr(addr, rb)) {
+        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+        atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+        atomic_sub(&dc->smp_cpus_down, 1);
+    }
+    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+                                        cpu);
+}
+
+/*
+ * This function just provides the calculated blocktime per vCPU and
+ * traces it. The total blocktime is calculated in
+ * mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPUs
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match the condition, since the sequence S1,S2,E1
+ * doesn't include CPU3
+ * S3,S1,E2 - this sequence includes all CPUs; in this case the overlap
+ * is S1,E2 - it's a part of the total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture:
+ * * - means blocktime per vCPU
+ * x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+    int i, affected_cpu = 0;
+    int64_t now_ms;
+    bool vcpu_total_blocktime = false;
+
+    if (!dc) {
+        return;
+    }
+
+    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /* Look up the cpu in order to clear it. This algorithm looks
+     * straightforward, but it's not optimal; a more optimal algorithm
+     * would keep a tree or hash where the key is the address and the
+     * value is a list of */
+    for (i = 0; i < smp_cpus; i++) {
+        uint64_t vcpu_blocktime = 0;
+        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+            continue;
+        }
+        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+        vcpu_blocktime = now_ms -
+            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+        affected_cpu += 1;
+        /* We need to know whether this mark_postcopy_blocktime_end was
+         * due to a faulted page; the other possible case is a prefetched
+         * page, and in that case we shouldn't be here */
+        if (!vcpu_total_blocktime &&
+            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+            vcpu_total_blocktime = true;
+        }
+        /* continue the cycle, since one page could affect several vCPUs */
+        dc->vcpu_blocktime[i] += vcpu_blocktime;
+    }
+
+    atomic_sub(&dc->smp_cpus_down, affected_cpu);
+    if (vcpu_total_blocktime) {
+        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+    }
+    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime);
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -638,8 +766,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset,
+                                                msg.arg.pagefault.feat.ptid);
+        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
+                                      msg.arg.pagefault.feat.ptid, rb);
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -723,6 +854,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
     copy_struct.len = pagesize;
     copy_struct.mode = 0;
 
+    /* The copied page isn't a feature of the blocktime calculation;
+     * it's a more general entity, so keep it here. But the gap between
+     * the two following operations could be high, and in that case the
+     * blocktime for such a small interval will be lost */
+    set_copiedmap_by_addr((uint64_t)(uintptr_t)host, rb);
+    mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host);
     /* copy also acks to the kernel waking the stalled thread up
      * TODO: We can inhibit that ack and only do it if it was requested
      * which would be slightly cheaper, but we'd have to be careful
diff --git a/migration/trace-events b/migration/trace-events
index 5b8ccf3..7bdadbb 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -112,6 +112,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
+mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -188,7 +190,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -197,6 +199,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"