From patchwork Mon Jun 25 12:05:09 2018
X-Patchwork-Submitter: "Wang, Wei W"
X-Patchwork-Id: 10485819
From: Wei Wang
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org,
	akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, pbonzini@redhat.com, wei.w.wang@intel.com,
	liliang.opensource@gmail.com, yang.zhang.wz@gmail.com, quan.xu0@gmail.com,
	nilal@redhat.com, riel@redhat.com, peterx@redhat.com
Subject: [PATCH v34 1/4] mm: support to get hints of free page blocks
Date: Mon, 25 Jun 2018 20:05:09 +0800
Message-Id: <1529928312-30500-2-git-send-email-wei.w.wang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1529928312-30500-1-git-send-email-wei.w.wang@intel.com>
References: <1529928312-30500-1-git-send-email-wei.w.wang@intel.com>

This patch adds support to get free page blocks from a free page list.
The physical addresses of the blocks are stored to the arrays passed
from the caller. The obtained free page blocks are hints about free
pages, because there is no guarantee that they are still on the free
page list after the function returns.

One use example of this patch is to accelerate live migration by
skipping the transfer of free pages reported from the guest. A popular
method used by the hypervisor to track which part of memory is written
during live migration is to write-protect all the guest memory. So,
those pages that are hinted as free pages but are written after this
function returns will be captured by the hypervisor, and they will be
added to the next round of memory transfer.

Suggested-by: Linus Torvalds
Signed-off-by: Wei Wang
Signed-off-by: Liang Li
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Michael S. Tsirkin
Cc: Linus Torvalds
---
 include/linux/mm.h |  3 ++
 mm/page_alloc.c    | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9f..1b51d43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2007,6 +2007,9 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+uint32_t max_free_page_blocks(int order);
+uint32_t get_from_free_page_list(int order, uint32_t num, __le64 *buf[],
+				 uint32_t size);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..2e462ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5042,6 +5042,88 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+/**
+ * max_free_page_blocks - estimate the max number of free page blocks
+ * @order: the order of the free page blocks to estimate
+ *
+ * This function gives a rough estimation of the possible maximum number of
+ * free page blocks a free list may have. The estimation works on an assumption
+ * that all the system pages are on that list.
+ *
+ * Context: Any context.
+ *
+ * Return: The largest number of free page blocks that the free list can have.
+ */
+uint32_t max_free_page_blocks(int order)
+{
+	return totalram_pages / (1 << order);
+}
+EXPORT_SYMBOL_GPL(max_free_page_blocks);
+
+/**
+ * get_from_free_page_list - get hints of free pages from a free page list
+ * @order: the order of the free page list to check
+ * @num: the number of arrays
+ * @bufs: the arrays to store the physical addresses of the free page blocks
+ * @size: the number of entries each array has
+ *
+ * This function offers hints about free pages. The addresses of free page
+ * blocks are stored to the arrays passed from the caller. There is no
+ * guarantee that the obtained free pages are still on the free page list
+ * after the function returns. pfn_to_page on the obtained free pages is
+ * strongly discouraged and if there is an absolute need for that, make sure
+ * to contact MM people to discuss potential problems.
+ *
+ * The addresses are currently stored to an array in little endian. This
+ * avoids the overhead of converting endianness by the caller who needs data
+ * in the little endian format. Big endian support can be added on demand in
+ * the future. The maximum number of free page blocks that can be obtained is
+ * limited to the size of arrays.
+ *
+ * Context: Process context.
+ *
+ * Return: The number of free page blocks obtained from the free page list.
+ */
+uint32_t get_from_free_page_list(int order, uint32_t num, __le64 *bufs[],
+				 uint32_t size)
+{
+	struct zone *zone;
+	enum migratetype mt;
+	struct page *page;
+	struct list_head *list;
+	unsigned long addr;
+	uint32_t array_index = 0, entry_index = 0;
+	__le64 *array = bufs[array_index];
+
+	/* Validity check */
+	if (order < 0 || order >= MAX_ORDER)
+		return 0;
+
+	for_each_populated_zone(zone) {
+		spin_lock_irq(&zone->lock);
+		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
+			list = &zone->free_area[order].free_list[mt];
+			list_for_each_entry(page, list, lru) {
+				addr = page_to_pfn(page) << PAGE_SHIFT;
+				/* This array is full, so use the next one */
+				if (entry_index == size) {
+					/* All the arrays are consumed */
+					if (++array_index == num) {
+						spin_unlock_irq(&zone->lock);
+						return array_index * size;
+					}
+					array = bufs[array_index];
+					entry_index = 0;
+				}
+				array[entry_index++] = cpu_to_le64(addr);
+			}
+		}
+		spin_unlock_irq(&zone->lock);
+	}
+
+	return array_index * size + entry_index;
+}
+EXPORT_SYMBOL_GPL(get_from_free_page_list);
 
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
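
The core of get_from_free_page_list() is the two-level indexing that scatters
hint addresses across the caller's fixed-size arrays. That logic can be
modeled in plain userspace C. This is only an illustrative sketch: fill_bufs()
and every name in it are hypothetical stand-ins, not part of the patch; the
zone iteration, lock, and cpu_to_le64() conversion are replaced by a simple
input array of addresses.

```c
#include <stdint.h>

/*
 * Userspace model (hypothetical) of the buffer-filling loop in
 * get_from_free_page_list(): scatter a stream of block addresses across
 * `num` caller-provided arrays of `size` entries each. When the current
 * array fills up, switch to the next one; when all arrays are consumed,
 * stop early and report num * size entries stored.
 */
uint32_t fill_bufs(const uint64_t *addrs, uint32_t naddrs,
		   uint64_t *bufs[], uint32_t num, uint32_t size)
{
	uint32_t array_index = 0, entry_index = 0;
	uint64_t *array = bufs[array_index];
	uint32_t i;

	for (i = 0; i < naddrs; i++) {
		/* This array is full, so use the next one */
		if (entry_index == size) {
			/* All the arrays are consumed */
			if (++array_index == num)
				return array_index * size;
			array = bufs[array_index];
			entry_index = 0;
		}
		array[entry_index++] = addrs[i];
	}
	/* Full arrays plus the partially filled last one */
	return array_index * size + entry_index;
}
```

As in the kernel function, the return value is `array_index * size +
entry_index`, so with two arrays of two entries and five candidate addresses
the sketch stores four entries and drops the rest, mirroring how the patch
caps hints at the total array capacity.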