From patchwork Mon Jun 3 17:03:05 2019
From: Nitesh Narayan Lal
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, lcapitulino@redhat.com, pagupta@redhat.com,
    wei.w.wang@intel.com, yang.zhang.wz@gmail.com, riel@surriel.com,
    david@redhat.com, mst@redhat.com, dodgen@google.com,
    konrad.wilk@oracle.com, dhildenb@redhat.com, aarcange@redhat.com,
    alexander.duyck@gmail.com
Subject: [RFC][Patch v10 1/2] mm: page_hinting: core infrastructure
Date: Mon, 3 Jun 2019 13:03:05 -0400
Message-Id: <20190603170306.49099-2-nitesh@redhat.com>
In-Reply-To: <20190603170306.49099-1-nitesh@redhat.com>
References: <20190603170306.49099-1-nitesh@redhat.com>

This patch introduces the core infrastructure for free page hinting in
virtual environments. It lets the kernel track free pages that can be
reported to its hypervisor, so that the hypervisor can free and reuse
that memory as it requires.

While the pages are being processed in the hypervisor (e.g., via
MADV_FREE), the guest must not use them; otherwise, data loss is
possible. To avoid this, these pages are temporarily removed from the
buddy allocator. The number of pages removed from the buddy at a time
is governed by the backend (virtio-balloon in our case).

To efficiently identify free pages that can be hinted to the hypervisor,
coarse-grained bitmaps are used. Only fairly big chunks, of order
"MAX_ORDER - 2" on x86, are reported to the hypervisor, mainly so as not
to break up THP in the hypervisor, and also to save space. A set bit in
the bitmap indicates that a page *might* be free; it is not a guarantee.
A new hook after buddy merging sets the bits.

Bitmaps are stored per zone, protected by the zone lock. A workqueue
asynchronously processes the bitmaps, trying to isolate and report pages
that are still free.
The backend (virtio-balloon) is responsible for reporting these batched
pages to the host synchronously. Once reporting/freeing is complete, the
isolated pages are returned to the buddy.

There are still various things to look into (e.g., memory hotplug, more
efficient locking, possible races when disabling).

Signed-off-by: Nitesh Narayan Lal
---
 drivers/virtio/Kconfig       |   1 +
 include/linux/page_hinting.h |  46 +++++++
 mm/Kconfig                   |   6 +
 mm/Makefile                  |   2 +
 mm/page_alloc.c              |  17 +--
 mm/page_hinting.c            | 236 +++++++++++++++++++++++++++++++++++
 6 files changed, 301 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/page_hinting.h
 create mode 100644 mm/page_hinting.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..5a96b7a2ed1e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -46,6 +46,7 @@ config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
 	select MEMORY_BALLOON
+	select PAGE_HINTING
 	---help---
 	 This driver supports increasing and decreasing the amount
 	 of memory within a KVM guest.
diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
new file mode 100644
index 000000000000..e65188fe1e6b
--- /dev/null
+++ b/include/linux/page_hinting.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PAGE_HINTING_H
+#define _LINUX_PAGE_HINTING_H
+
+/*
+ * Minimum page order required for a page to be hinted to the host.
+ */
+#define PAGE_HINTING_MIN_ORDER		(MAX_ORDER - 2)
+
+/*
+ * struct page_hinting_cb: holds the callbacks to store, report and cleanup
+ * isolated pages.
+ * @prepare:		Callback responsible for allocating an array to hold
+ *			the isolated pages.
+ * @hint_pages:		Callback which reports the isolated pages synchronously
+ *			to the host.
+ * @cleanup:		Callback to free the array used for reporting the
+ *			isolated pages.
+ * @max_pages:		Maximum number of pages, each of order >=
+ *			PAGE_HINTING_MIN_ORDER, hinted to the host at a time.
+ */
+struct page_hinting_cb {
+	int (*prepare)(void);
+	void (*hint_pages)(struct list_head *list);
+	void (*cleanup)(void);
+	int max_pages;
+};
+
+#ifdef CONFIG_PAGE_HINTING
+void page_hinting_enqueue(struct page *page, int order);
+void page_hinting_enable(const struct page_hinting_cb *cb);
+void page_hinting_disable(void);
+#else
+static inline void page_hinting_enqueue(struct page *page, int order)
+{
+}
+
+static inline void page_hinting_enable(const struct page_hinting_cb *cb)
+{
+}
+
+static inline void page_hinting_disable(void)
+{
+}
+#endif
+#endif /* _LINUX_PAGE_HINTING_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index ee8d1f311858..177d858de758 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -764,4 +764,10 @@ config GUP_BENCHMARK
 config ARCH_HAS_PTE_SPECIAL
 	bool
 
+# PAGE_HINTING allows the guest to periodically report free pages to
+# the host.
+config PAGE_HINTING
+	bool
+	def_bool n
+	depends on X86_64
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index ac5e5ba78874..bec456dfee34 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -41,6 +41,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 			   interval_tree.o list_lru.o workingset.o \
 			   debug.o $(mmu-y)
 
+
 # Give 'page_alloc' its own module-parameter namespace
 page-alloc-y := page_alloc.o
 page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o
@@ -94,6 +95,7 @@ obj-$(CONFIG_Z3FOLD)	+= z3fold.o
 obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
 obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
+obj-$(CONFIG_PAGE_HINTING) += page_hinting.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b13d3914176..d12f69e0e402 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -68,6 +68,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -873,10 +874,10 @@ compaction_capture(struct capture_control *capc, struct page *page,
  * -- nyc
  */
 
-static inline void __free_one_page(struct page *page,
+inline void __free_one_page(struct page *page,
 		unsigned long pfn,
 		struct zone *zone, unsigned int order,
-		int migratetype)
+		int migratetype, bool hint)
 {
 	unsigned long combined_pfn;
 	unsigned long uninitialized_var(buddy_pfn);
@@ -951,6 +952,8 @@ static inline void __free_one_page(struct page *page,
 
 done_merging:
 	set_page_order(page, order);
+	if (hint)
+		page_hinting_enqueue(page, order);
 	/*
 	 * If this is not the largest possible page, check if the buddy
 	 * of the next-highest order is free. If it is, it's possible
@@ -1262,7 +1265,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		if (unlikely(isolated_pageblocks))
 			mt = get_pageblock_migratetype(page);
 
-		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
+		__free_one_page(page, page_to_pfn(page), zone, 0, mt, true);
 		trace_mm_page_pcpu_drain(page, 0, mt);
 	}
 	spin_unlock(&zone->lock);
@@ -1271,14 +1274,14 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 static void free_one_page(struct zone *zone,
 				struct page *page, unsigned long pfn,
 				unsigned int order,
-				int migratetype)
+				int migratetype, bool hint)
 {
 	spin_lock(&zone->lock);
 	if (unlikely(has_isolate_pageblock(zone) ||
 		is_migrate_isolate(migratetype))) {
 		migratetype = get_pfnblock_migratetype(page, pfn);
 	}
-	__free_one_page(page, pfn, zone, order, migratetype);
+	__free_one_page(page, pfn, zone, order, migratetype, hint);
 	spin_unlock(&zone->lock);
 }
@@ -1368,7 +1371,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
-	free_one_page(page_zone(page), page, pfn, order, migratetype);
+	free_one_page(page_zone(page), page, pfn, order, migratetype, true);
 	local_irq_restore(flags);
 }
 
@@ -2968,7 +2971,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(zone, page, pfn, 0, migratetype);
+			free_one_page(zone, page, pfn, 0, migratetype, true);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
diff --git a/mm/page_hinting.c b/mm/page_hinting.c
new file mode 100644
index 000000000000..7341c6462de2
--- /dev/null
+++ b/mm/page_hinting.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Page hinting support to enable a VM to report the freed pages back
+ * to the host.
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Author(s): Nitesh Narayan Lal
+ */
+
+#include
+#include
+#include
+#include
+
+/*
+ * struct hinting_bitmap: holds the bitmap pointer which tracks the freed PFNs
+ * and other required parameters which could help in retrieving the original
+ * PFN value using the bitmap.
+ * @bitmap:		Pointer to the bitmap of free PFNs.
+ * @base_pfn:		Starting PFN value for the zone whose bitmap is stored.
+ * @free_pages:		Tracks the number of free pages of granularity
+ *			PAGE_HINTING_MIN_ORDER.
+ * @nbits:		Indicates the total size of the bitmap in bits allocated
+ *			at the time of initialization.
+ */
+struct hinting_bitmap {
+	unsigned long *bitmap;
+	unsigned long base_pfn;
+	atomic_t free_pages;
+	unsigned long nbits;
+} bm_zone[MAX_NR_ZONES];
+
+static void init_hinting_wq(struct work_struct *work);
+extern int __isolate_free_page(struct page *page, unsigned int order);
+extern void __free_one_page(struct page *page, unsigned long pfn,
+			    struct zone *zone, unsigned int order,
+			    int migratetype, bool hint);
+const struct page_hinting_cb *hcb;
+struct work_struct hinting_work;
+
+static unsigned long find_bitmap_size(struct zone *zone)
+{
+	unsigned long nbits = ALIGN(zone->spanned_pages,
+				    1UL << PAGE_HINTING_MIN_ORDER);
+
+	nbits = nbits >> PAGE_HINTING_MIN_ORDER;
+	return nbits;
+}
+
+void page_hinting_enable(const struct page_hinting_cb *callback)
+{
+	struct zone *zone;
+	int idx = 0;
+	unsigned long bitmap_size = 0;
+
+	for_each_populated_zone(zone) {
+		spin_lock(&zone->lock);
+		bitmap_size = find_bitmap_size(zone);
+		bm_zone[idx].bitmap = bitmap_zalloc(bitmap_size, GFP_KERNEL);
+		if (!bm_zone[idx].bitmap)
+			return;
+		bm_zone[idx].nbits = bitmap_size;
+		bm_zone[idx].base_pfn = zone->zone_start_pfn;
+		spin_unlock(&zone->lock);
+		idx++;
+	}
+	hcb = callback;
+	INIT_WORK(&hinting_work, init_hinting_wq);
+}
+EXPORT_SYMBOL_GPL(page_hinting_enable);
+
+void page_hinting_disable(void)
+{
+	struct zone *zone;
+	int idx = 0;
+
+	cancel_work_sync(&hinting_work);
+	hcb = NULL;
+	for_each_populated_zone(zone) {
+		spin_lock(&zone->lock);
+		bitmap_free(bm_zone[idx].bitmap);
+		bm_zone[idx].base_pfn = 0;
+		bm_zone[idx].nbits = 0;
+		atomic_set(&bm_zone[idx].free_pages, 0);
+		spin_unlock(&zone->lock);
+		idx++;
+	}
+}
+EXPORT_SYMBOL_GPL(page_hinting_disable);
+
+static unsigned long pfn_to_bit(struct page *page, int zonenum)
+{
+	unsigned long bitnr;
+
+	bitnr = (page_to_pfn(page) - bm_zone[zonenum].base_pfn)
+			 >> PAGE_HINTING_MIN_ORDER;
+	return bitnr;
+}
+
+static void release_buddy_pages(struct list_head *pages)
+{
+	int mt = 0, zonenum, order;
+	struct page *page, *next;
+	struct zone *zone;
+	unsigned long bitnr;
+
+	list_for_each_entry_safe(page, next, pages, lru) {
+		zonenum = page_zonenum(page);
+		zone = page_zone(page);
+		bitnr = pfn_to_bit(page, zonenum);
+		spin_lock(&zone->lock);
+		list_del(&page->lru);
+		order = page_private(page);
+		set_page_private(page, 0);
+		mt = get_pageblock_migratetype(page);
+		__free_one_page(page, page_to_pfn(page), zone,
+				order, mt, false);
+		spin_unlock(&zone->lock);
+	}
+}
+
+static void bm_set_pfn(struct page *page)
+{
+	unsigned long bitnr = 0;
+	int zonenum = page_zonenum(page);
+	struct zone *zone = page_zone(page);
+
+	lockdep_assert_held(&zone->lock);
+	bitnr = pfn_to_bit(page, zonenum);
+	if (bm_zone[zonenum].bitmap &&
+	    bitnr < bm_zone[zonenum].nbits &&
+	    !test_and_set_bit(bitnr, bm_zone[zonenum].bitmap))
+		atomic_inc(&bm_zone[zonenum].free_pages);
+}
+
+static void scan_hinting_bitmap(int zonenum, int free_pages)
+{
+	unsigned long set_bit, start = 0;
+	struct page *page;
+	struct zone *zone;
+	int scanned_pages = 0, ret = 0, order, isolated_cnt = 0;
+	LIST_HEAD(isolated_pages);
+
+	ret = hcb->prepare();
+	if (ret < 0)
+		return;
+	for (;;) {
+		ret = 0;
+		set_bit = find_next_bit(bm_zone[zonenum].bitmap,
+					bm_zone[zonenum].nbits, start);
+		if (set_bit >= bm_zone[zonenum].nbits)
+			break;
+		page = pfn_to_online_page((set_bit << PAGE_HINTING_MIN_ORDER) +
+				bm_zone[zonenum].base_pfn);
+		if (!page)
+			continue;
+		zone = page_zone(page);
+		spin_lock(&zone->lock);
+
+		if (PageBuddy(page) && page_private(page) >=
+		    PAGE_HINTING_MIN_ORDER) {
+			order = page_private(page);
+			ret = __isolate_free_page(page, order);
+		}
+		clear_bit(set_bit, bm_zone[zonenum].bitmap);
+		spin_unlock(&zone->lock);
+		if (ret) {
+			/*
+			 * Restore the page order so that it can be used
+			 * when releasing the pages back to the buddy.
+			 */
+			set_page_private(page, order);
+			list_add_tail(&page->lru, &isolated_pages);
+			isolated_cnt++;
+			if (isolated_cnt == hcb->max_pages) {
+				hcb->hint_pages(&isolated_pages);
+				release_buddy_pages(&isolated_pages);
+				isolated_cnt = 0;
+			}
+		}
+		start = set_bit + 1;
+		scanned_pages++;
+	}
+	if (isolated_cnt) {
+		hcb->hint_pages(&isolated_pages);
+		release_buddy_pages(&isolated_pages);
+	}
+	hcb->cleanup();
+	if (scanned_pages > free_pages)
+		atomic_sub((scanned_pages - free_pages),
+			   &bm_zone[zonenum].free_pages);
+}
+
+static bool check_hinting_threshold(void)
+{
+	int zonenum = 0;
+
+	for (; zonenum < MAX_NR_ZONES; zonenum++) {
+		if (atomic_read(&bm_zone[zonenum].free_pages) >=
+				hcb->max_pages)
+			return true;
+	}
+	return false;
+}
+
+static void init_hinting_wq(struct work_struct *work)
+{
+	int zonenum = 0, free_pages = 0;
+
+	for (; zonenum < MAX_NR_ZONES; zonenum++) {
+		free_pages = atomic_read(&bm_zone[zonenum].free_pages);
+		if (free_pages >= hcb->max_pages) {
+			/* Find a better way to synchronize per zone
+			 * free_pages.
+			 */
+			atomic_sub(free_pages,
+				   &bm_zone[zonenum].free_pages);
+			scan_hinting_bitmap(zonenum, free_pages);
+		}
+	}
+}
+
+void page_hinting_enqueue(struct page *page, int order)
+{
+	if (hcb && order >= PAGE_HINTING_MIN_ORDER)
+		bm_set_pfn(page);
+	else
+		return;
+
+	if (check_hinting_threshold()) {
+		int cpu = smp_processor_id();
+
+		queue_work_on(cpu, system_wq, &hinting_work);
+	}
+}

From patchwork Mon Jun 3 17:03:06 2019
From: Nitesh Narayan Lal
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, lcapitulino@redhat.com, pagupta@redhat.com,
    wei.w.wang@intel.com, yang.zhang.wz@gmail.com, riel@surriel.com,
    david@redhat.com, mst@redhat.com, dodgen@google.com,
    konrad.wilk@oracle.com, dhildenb@redhat.com, aarcange@redhat.com,
    alexander.duyck@gmail.com
Subject: [RFC][Patch v10 2/2] virtio-balloon: page_hinting: reporting to the host
Date: Mon, 3 Jun 2019 13:03:06 -0400
Message-Id: <20190603170306.49099-3-nitesh@redhat.com>
In-Reply-To: <20190603170306.49099-1-nitesh@redhat.com>
References: <20190603170306.49099-1-nitesh@redhat.com>

This patch enables the kernel to negotiate the VIRTIO_BALLOON_F_HINTING
feature with the host. If the feature is available and page_hinting_flag
is set to true, page hinting is enabled and its callbacks are registered
together with the max_pages count, which is the maximum number of pages
that can be isolated and hinted at a time.

Currently, only free pages of order >= (MAX_ORDER - 2) are reported. To
avoid false OOMs while pages are temporarily isolated, max_pages is set
to 16.

By default, the page hinting feature is enabled and becomes active as
soon as the virtio-balloon driver is loaded. It can be disabled through
page_hinting_flag, a virtio-balloon module parameter; since the parameter
is registered with mode 0444 (read-only in sysfs), it has to be set at
load time.

Signed-off-by: Nitesh Narayan Lal
---
 drivers/virtio/virtio_balloon.c     | 112 +++++++++++++++++++++++++++-
 include/uapi/linux/virtio_balloon.h |  14 ++++
 2 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f19061b585a4..40f09ea31643 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -48,6 +49,7 @@
 /* The size of a free page block in bytes */
 #define VIRTIO_BALLOON_FREE_PAGE_SIZE \
 	(1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+#define VIRTIO_BALLOON_PAGE_HINTING_MAX_PAGES	16
 
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
@@ -58,6 +60,7 @@ enum virtio_balloon_vq {
 	VIRTIO_BALLOON_VQ_DEFLATE,
 	VIRTIO_BALLOON_VQ_STATS,
 	VIRTIO_BALLOON_VQ_FREE_PAGE,
+	VIRTIO_BALLOON_VQ_HINTING,
 	VIRTIO_BALLOON_VQ_MAX
 };
 
@@ -67,7 +70,8 @@ enum virtio_balloon_config_read {
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq,
+								*hinting_vq;
 
 	/* Balloon's own wq for cpu-intensive work items */
 	struct workqueue_struct *balloon_wq;
@@ -125,6 +129,9 @@ struct virtio_balloon {
 
 	/* To register a shrinker to shrink memory upon memory pressure */
 	struct shrinker shrinker;
+
+	/* object pointing at the array of isolated pages ready for hinting */
+	struct hinting_data *hinting_arr;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -132,6 +139,85 @@ static struct virtio_device_id id_table[] = {
 	{ 0 },
 };
 
+#ifdef CONFIG_PAGE_HINTING
+struct virtio_balloon *hvb;
+bool page_hinting_flag = true;
+module_param(page_hinting_flag, bool, 0444);
+MODULE_PARM_DESC(page_hinting_flag, "Enable page hinting");
+
+static bool virtqueue_kick_sync(struct virtqueue *vq)
+{
+	u32 len;
+
+	if (likely(virtqueue_kick(vq))) {
+		while (!virtqueue_get_buf(vq, &len) &&
+		       !virtqueue_is_broken(vq))
+			cpu_relax();
+		return true;
+	}
+	return false;
+}
+
+static void page_hinting_report(int entries)
+{
+	struct scatterlist sg;
+	struct virtqueue *vq = hvb->hinting_vq;
+	int err = 0;
+	struct hinting_data *hint_req;
+	u64 gpaddr;
+
+	hint_req = kmalloc(sizeof(*hint_req), GFP_KERNEL);
+	if (!hint_req)
+		return;
+	gpaddr = virt_to_phys(hvb->hinting_arr);
+	hint_req->phys_addr = cpu_to_virtio64(hvb->vdev, gpaddr);
+	hint_req->size = cpu_to_virtio32(hvb->vdev, entries);
+	sg_init_one(&sg, hint_req, sizeof(*hint_req));
+	err = virtqueue_add_outbuf(vq, &sg, 1, hint_req, GFP_KERNEL);
+	if (!err)
+		virtqueue_kick_sync(hvb->hinting_vq);
+
+	kfree(hint_req);
+}
+
+int page_hinting_prepare(void)
+{
+	hvb->hinting_arr = kmalloc_array(VIRTIO_BALLOON_PAGE_HINTING_MAX_PAGES,
+					 sizeof(*hvb->hinting_arr), GFP_KERNEL);
+	if (!hvb->hinting_arr)
+		return -ENOMEM;
+	return 0;
+}
+
+void hint_pages(struct list_head *pages)
+{
+	struct page *page, *next;
+	unsigned long pfn;
+	int idx = 0, order;
+
+	list_for_each_entry_safe(page, next, pages, lru) {
+		pfn = page_to_pfn(page);
+		order = page_private(page);
+		hvb->hinting_arr[idx].phys_addr = pfn << PAGE_SHIFT;
+		hvb->hinting_arr[idx].size = (1 << order) * PAGE_SIZE;
+		idx++;
+	}
+	page_hinting_report(idx);
+}
+
+void page_hinting_cleanup(void)
+{
+	kfree(hvb->hinting_arr);
+}
+
+static const struct page_hinting_cb hcb = {
+	.prepare = page_hinting_prepare,
+	.hint_pages = hint_pages,
+	.cleanup = page_hinting_cleanup,
+	.max_pages = VIRTIO_BALLOON_PAGE_HINTING_MAX_PAGES,
+};
+#endif
+
 static u32 page_to_balloon_pfn(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page);
@@ -488,6 +574,7 @@ static int init_vqs(struct virtio_balloon *vb)
 	names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
 	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
 	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+	names[VIRTIO_BALLOON_VQ_HINTING] = NULL;
 
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
 		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
@@ -499,11 +586,18 @@ static int init_vqs(struct virtio_balloon *vb)
 		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
 	}
 
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING)) {
+		names[VIRTIO_BALLOON_VQ_HINTING] = "hinting_vq";
+		callbacks[VIRTIO_BALLOON_VQ_HINTING] = NULL;
+	}
 	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
 					 vqs, callbacks, names, NULL, NULL);
 	if (err)
 		return err;
 
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING))
+		vb->hinting_vq = vqs[VIRTIO_BALLOON_VQ_HINTING];
+
 	vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
 	vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
@@ -942,6 +1036,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		if (err)
 			goto out_del_balloon_wq;
 	}
+
+#ifdef CONFIG_PAGE_HINTING
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING) &&
+	    page_hinting_flag) {
+		hvb = vb;
+		page_hinting_enable(&hcb);
+	}
+#endif
 	virtio_device_ready(vdev);
 
 	if (towards_target(vb))
@@ -989,6 +1091,12 @@ static void virtballoon_remove(struct virtio_device *vdev)
 		destroy_workqueue(vb->balloon_wq);
 	}
 
+#ifdef CONFIG_PAGE_HINTING
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING)) {
+		hvb = NULL;
+		page_hinting_disable();
+	}
+#endif
 	remove_common(vb);
 #ifdef CONFIG_BALLOON_COMPACTION
 	if (vb->vb_dev_info.inode)
@@ -1043,8 +1151,10 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_HINTING,
 	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
 	VIRTIO_BALLOON_F_PAGE_POISON,
+	VIRTIO_BALLOON_F_HINTING,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index a1966cd7b677..25e4f817c660 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
@@ -36,6 +37,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_HINTING	5 /* Page hinting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -108,4 +110,16 @@ struct virtio_balloon_stat {
 	__virtio64 val;
 } __attribute__((packed));
 
+#ifdef CONFIG_PAGE_HINTING
+/*
+ * struct hinting_data - holds the information associated with hinting.
+ * @phys_addr:	physical address of an isolated page, or of the array
+ *		holding the isolated pages.
+ * @size:	size in bytes for a page, or number of entries for the array.
+ */
+struct hinting_data {
+	__virtio64 phys_addr;
+	__virtio32 size;
+};
+#endif
 #endif /* _LINUX_VIRTIO_BALLOON_H */