From patchwork Thu Mar 16 07:08:46 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Wang, Wei W" <wei.w.wang@intel.com>
X-Patchwork-Id: 9627475
From: Wei Wang <wei.w.wang@intel.com>
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com,
	david@redhat.com, dave.hansen@intel.com, cornelia.huck@de.ibm.com,
	akpm@linux-foundation.org, mgorman@techsingularity.net,
	aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com,
	wei.w.wang@intel.com, liliang.opensource@gmail.com
Date: Thu, 16 Mar 2017 15:08:46 +0800
Message-Id: <1489648127-37282-4-git-send-email-wei.w.wang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1489648127-37282-1-git-send-email-wei.w.wang@intel.com>
References: <1489648127-37282-1-git-send-email-wei.w.wang@intel.com>
Subject: [Qemu-devel] [PATCH kernel v8 3/4] mm: add interface to offer info about unused pages

From: Liang Li <liliang.opensource@gmail.com>

This patch adds a function that provides a snapshot of the pages
currently unused by the system. An important use of this function is to
provide the unused pages to the live migration thread, which then skips
transferring those unused pages. Newly used pages can be re-tracked by
the dirty page logging mechanisms.

Signed-off-by: Liang Li <liliang.opensource@gmail.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 include/linux/mm.h |   3 ++
 mm/page_alloc.c    | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b84615b..869749d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1764,6 +1764,9 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+extern int record_unused_pages(struct zone **start_zone, int order,
+			       __le64 *pages, unsigned int size,
+			       unsigned int *pos, bool part_fill);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3e0c69..b72a7ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4498,6 +4498,120 @@ void show_free_areas(unsigned int filter)
 	show_swap_cache_info();
 }
 
+static int __record_unused_pages(struct zone *zone, int order,
+				 __le64 *buf, unsigned int size,
+				 unsigned int *offset, bool part_fill)
+{
+	unsigned long pfn, flags;
+	int t, ret = 0;
+	struct list_head *curr;
+	__le64 *chunk;
+
+	if (zone_is_empty(zone))
+		return 0;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	if (*offset + zone->free_area[order].nr_free > size && !part_fill) {
+		ret = -ENOSPC;
+		goto out;
+	}
+	for (t = 0; t < MIGRATE_TYPES; t++) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			chunk = buf + *offset;
+			if (*offset + 2 > size) {
+				ret = -ENOSPC;
+				goto out;
+			}
+			/* Align to the chunk format used in virtio-balloon */
+			*chunk = cpu_to_le64(pfn << 12);
+			*(chunk + 1) = cpu_to_le64((1 << order) << 12);
+			*offset += 2;
+		}
+	}
+
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
+
+/*
+ * The record_unused_pages() function records the currently unused pages
+ * of the system, so that their transfer can be skipped during live
+ * migration. Though the set of unused pages changes dynamically, dirty
+ * page logging mechanisms are able to capture pages that become used
+ * after they were recorded as unused by this function.
+ *
+ * This function scans the free page list of the specified order and
+ * records the unused pages as chunks of contiguous pages, following the
+ * chunk format below:
+ * --------------------------------------
+ * |    Base (52-bit)   | Rsvd (12-bit) |
+ * --------------------------------------
+ * --------------------------------------
+ * |    Size (52-bit)   | Rsvd (12-bit) |
+ * --------------------------------------
+ *
+ * @start_zone: zone to start the record operation from.
+ * @order: order of the free page list to record.
+ * @buf: buffer to record the unused page info in chunks.
+ * @size: size of the buffer, in units of __le64.
+ * @offset: offset in the buffer to record at.
+ * @part_fill: indicate whether partial filling is allowed.
+ *
+ * return -EINVAL if a parameter is invalid
+ * return -ENOSPC when the buffer is too small to record all the unused pages
+ * return 0 on success
+ */
+int record_unused_pages(struct zone **start_zone, int order,
+			__le64 *buf, unsigned int size,
+			unsigned int *offset, bool part_fill)
+{
+	struct zone *zone;
+	int ret = 0;
+	bool skip_check = false;
+
+	/* Make sure all the parameters are valid */
+	if (buf == NULL || offset == NULL || order >= MAX_ORDER)
+		return -EINVAL;
+
+	if (*start_zone != NULL) {
+		bool found = false;
+
+		for_each_populated_zone(zone) {
+			if (zone != *start_zone)
+				continue;
+			found = true;
+			break;
+		}
+		if (!found)
+			return -EINVAL;
+	} else
+		skip_check = true;
+
+	for_each_populated_zone(zone) {
+		/* Start from *start_zone if it's not NULL */
+		if (!skip_check) {
+			if (*start_zone != zone)
+				continue;
+			else
+				skip_check = true;
+		}
+		ret = __record_unused_pages(zone, order, buf, size,
+					    offset, part_fill);
+		if (ret < 0) {
+			/* record the failed zone */
+			*start_zone = zone;
+			break;
+		}
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(record_unused_pages);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;