From patchwork Tue Jul 10 09:31:03 2018
X-Patchwork-Submitter: Wei Wang <wei.w.wang@intel.com>
X-Patchwork-Id: 10516737
From: Wei Wang <wei.w.wang@intel.com>
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org,
	akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, pbonzini@redhat.com,
	wei.w.wang@intel.com, liliang.opensource@gmail.com,
	yang.zhang.wz@gmail.com, quan.xu0@gmail.com, nilal@redhat.com,
	riel@redhat.com, peterx@redhat.com
Subject: [PATCH v35 1/5] mm: support to get hints of free page blocks
Date: Tue, 10 Jul 2018 17:31:03 +0800
Message-Id: <1531215067-35472-2-git-send-email-wei.w.wang@intel.com>
In-Reply-To: <1531215067-35472-1-git-send-email-wei.w.wang@intel.com>
References: <1531215067-35472-1-git-send-email-wei.w.wang@intel.com>

This patch adds support for getting free page blocks from a free page
list. The physical addresses of the blocks are stored in a list of
buffers passed from the caller. The obtained free page blocks are hints
about free pages, because there is no guarantee that they are still on
the free page list after the function returns.

One example use of this patch is to accelerate live migration by skipping
the transfer of free pages reported by the guest. A popular method used by
the hypervisor to track which parts of memory are written during live
migration is to write-protect all the guest memory. So, pages that are
hinted as free but are written to after this function returns will be
caught by the hypervisor and added to the next round of memory transfer.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liliang.opensource@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/mm.h |  3 ++
 mm/page_alloc.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9f..5ce654f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2007,6 +2007,9 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+unsigned long max_free_page_blocks(int order);
+int get_from_free_page_list(int order, struct list_head *pages,
+			    unsigned int size, unsigned long *loaded_num);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..b67839b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5043,6 +5043,104 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+/**
+ * max_free_page_blocks - estimate the max number of free page blocks
+ * @order: the order of the free page blocks to estimate
+ *
+ * This function gives a rough estimation of the possible maximum number of
+ * free page blocks a free list may have. The estimation works on the
+ * assumption that all the system pages are on that list.
+ *
+ * Context: Any context.
+ *
+ * Return: The largest number of free page blocks that the free list can have.
+ */
+unsigned long max_free_page_blocks(int order)
+{
+	return totalram_pages / (1 << order);
+}
+EXPORT_SYMBOL_GPL(max_free_page_blocks);
+
+/**
+ * get_from_free_page_list - get hints of free pages from a free page list
+ * @order: the order of the free page list to check
+ * @pages: the list of page blocks used as buffers to load the addresses
+ * @size: the size of each buffer in bytes
+ * @loaded_num: the number of addresses loaded to the buffers
+ *
+ * This function offers hints about free pages. The addresses of free page
+ * blocks are stored in the list of buffers passed from the caller. There is
+ * no guarantee that the obtained free pages are still on the free page list
+ * after the function returns. Calling pfn_to_page() on the obtained free
+ * pages is strongly discouraged; if there is an absolute need for that,
+ * make sure to contact the MM people to discuss potential problems.
+ *
+ * The addresses are currently stored in the buffers in little endian. This
+ * avoids the overhead of converting endianness by a caller who needs the
+ * data in little-endian format. Big-endian support can be added on demand
+ * in the future.
+ *
+ * Context: Process context.
+ *
+ * Return: 0 if all the free page block addresses are stored in the buffers;
+ *         -ENOSPC if the buffers are not sufficient to store all the
+ *         addresses; or -EINVAL if an unexpected argument is received (e.g.
+ *         an incorrect @order, an empty buffer list).
+ */
+int get_from_free_page_list(int order, struct list_head *pages,
+			    unsigned int size, unsigned long *loaded_num)
+{
+	struct zone *zone;
+	enum migratetype mt;
+	struct list_head *free_list;
+	struct page *free_page, *buf_page;
+	unsigned long addr;
+	__le64 *buf;
+	unsigned int used_buf_num = 0, entry_index = 0,
+		     entries = size / sizeof(__le64);
+
+	*loaded_num = 0;
+
+	/* Validity check */
+	if (order < 0 || order >= MAX_ORDER)
+		return -EINVAL;
+
+	buf_page = list_first_entry_or_null(pages, struct page, lru);
+	if (!buf_page)
+		return -EINVAL;
+	buf = (__le64 *)page_address(buf_page);
+
+	for_each_populated_zone(zone) {
+		spin_lock_irq(&zone->lock);
+		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
+			free_list = &zone->free_area[order].free_list[mt];
+			list_for_each_entry(free_page, free_list, lru) {
+				addr = page_to_pfn(free_page) << PAGE_SHIFT;
+				/* This buffer is full, so use the next one */
+				if (entry_index == entries) {
+					buf_page = list_next_entry(buf_page,
+								   lru);
+					/* All the buffers are consumed */
+					if (!buf_page) {
+						spin_unlock_irq(&zone->lock);
+						*loaded_num = used_buf_num *
+							      entries;
+						return -ENOSPC;
+					}
+					buf = (__le64 *)page_address(buf_page);
+					entry_index = 0;
+					used_buf_num++;
+				}
+				buf[entry_index++] = cpu_to_le64(addr);
+			}
+		}
+		spin_unlock_irq(&zone->lock);
+	}
+
+	*loaded_num = used_buf_num * entries + entry_index;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(get_from_free_page_list);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
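
To make the calling convention above concrete, here is a minimal sketch of
a hypothetical in-kernel caller. It is not part of this patch (the intended
consumer lives elsewhere in this series); the function name
example_report_free_pages and the pr_debug() consumer are invented for
illustration, and PAGE_SIZE buffers are assumed:

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/printk.h>

static void example_report_free_pages(int order)
{
	LIST_HEAD(pages);
	struct page *page, *next;
	unsigned long max_blocks, loaded_num = 0, buf_num, i;
	int ret;

	/*
	 * Size the buffer list for the worst case, where every page in
	 * the system sits on the requested free list.
	 */
	max_blocks = max_free_page_blocks(order);
	buf_num = DIV_ROUND_UP(max_blocks * sizeof(__le64), PAGE_SIZE);

	/* Chain the buffer pages through page->lru, as the API expects */
	for (i = 0; i < buf_num; i++) {
		page = alloc_page(GFP_KERNEL);
		if (!page)
			goto out;
		list_add_tail(&page->lru, &pages);
	}

	ret = get_from_free_page_list(order, &pages, PAGE_SIZE, &loaded_num);
	if (ret)
		goto out;

	/* Walk the hints: each entry is a little-endian physical address */
	i = 0;
	list_for_each_entry(page, &pages, lru) {
		__le64 *buf = page_address(page);
		unsigned long j, n = min_t(unsigned long, loaded_num - i,
					   PAGE_SIZE / sizeof(__le64));

		for (j = 0; j < n; j++)
			pr_debug("free block at 0x%llx\n",
				 le64_to_cpu(buf[j]));
		i += n;
		if (i >= loaded_num)
			break;
	}
out:
	list_for_each_entry_safe(page, next, &pages, lru) {
		list_del(&page->lru);
		__free_page(page);
	}
}

The sketch frees the hint buffers once it is done with them; a real
consumer would instead hand the little-endian addresses to whatever
transport needs them, which is why get_from_free_page_list() stores them
in little endian in the first place.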