From patchwork Fri Mar 26 09:44:55 2021
X-Patchwork-Submitter: Xunlei Pang
X-Patchwork-Id: 12166149
From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Andrew Morton, Alexander Duyck, Mel Gorman
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xunlei Pang
Subject: [PATCH 1/4] mm/page_reporting: Introduce free page reported counters
Date: Fri, 26 Mar 2021 17:44:55 +0800
Message-Id: <1616751898-58393-2-git-send-email-xlpang@linux.alibaba.com>
In-Reply-To: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>
References: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>

It's useful to know how much memory has actually been reported, so add a new zone::reported_pages field to record it.

Add "/sys/kernel/mm/page_reporting/reported_kbytes" to show the amount of memory currently reported, and "/sys/kernel/mm/page_reporting/refault_kbytes" to show the accumulated amount of memory that has refaulted in after having been reported out.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 include/linux/mmzone.h |   3 ++
 mm/page_alloc.c        |   4 +-
 mm/page_reporting.c    | 112 ++++++++++++++++++++++++++++++++++++++++++++++--
 mm/page_reporting.h    |   5 +++
 4 files changed, 119 insertions(+), 5 deletions(-)
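
[Editor's note: a minimal userspace sketch, not part of the patch, showing how the two knobs added here can be consumed together. It assumes a kernel with this series applied; reported_kbytes is the amount currently reported, while refault_kbytes only ever accumulates.]

#include <stdio.h>

/* Read a single decimal value from a sysfs file; -1 on any failure. */
static long read_kbytes(const char *path)
{
        FILE *f = fopen(path, "r");
        long v = -1;

        if (!f)
                return -1;
        if (fscanf(f, "%ld", &v) != 1)
                v = -1;
        fclose(f);
        return v;
}

int main(void)
{
        long reported = read_kbytes("/sys/kernel/mm/page_reporting/reported_kbytes");
        long refault  = read_kbytes("/sys/kernel/mm/page_reporting/refault_kbytes");

        printf("currently reported: %ld KiB, refaulted back so far: %ld KiB\n",
               reported, refault);
        return 0;
}
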
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 47946ce..ebd169f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -530,6 +530,9 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+#ifdef CONFIG_PAGE_REPORTING
+	unsigned long		reported_pages;
+#endif
 #ifdef CONFIG_CMA
 	unsigned long		cma_pages;
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee..c2c5688 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -930,8 +930,10 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
 					   unsigned int order)
 {
 	/* clear reported state and update reported page count */
-	if (page_reported(page))
+	if (page_reported(page)) {
 		__ClearPageReported(page);
+		page_reporting_update_refault(zone, 1 << order);
+	}
 
 	list_del(&page->lru);
 	__ClearPageBuddy(page);
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index c50d93f..ba195ea 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/percpu_counter.h>
 #include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <linux/page_reporting.h>
@@ -19,6 +20,22 @@ enum {
 	PAGE_REPORTING_ACTIVE
 };
 
+#ifdef CONFIG_SYSFS
+static struct percpu_counter refault_pages;
+
+void page_reporting_update_refault(struct zone *zone, unsigned int pages)
+{
+	zone->reported_pages -= pages;
+	percpu_counter_add_batch(&refault_pages, pages, INT_MAX / 2);
+}
+#else
+void page_reporting_update_refault(struct zone *zone, unsigned int pages)
+{
+	zone->reported_pages -= pages;
+}
+#endif
+
+
 /* request page reporting */
 static void
 __page_reporting_request(struct page_reporting_dev_info *prdev)
@@ -66,7 +83,8 @@ void __page_reporting_notify(void)
 
 static void
 page_reporting_drain(struct page_reporting_dev_info *prdev,
-		     struct scatterlist *sgl, unsigned int nents, bool reported)
+		     struct scatterlist *sgl, struct zone *zone,
+		     unsigned int nents, bool reported)
 {
 	struct scatterlist *sg = sgl;
 
@@ -92,8 +110,10 @@ void __page_reporting_notify(void)
 		 * report on the new larger page when we make our way
 		 * up to that higher order.
 		 */
-		if (PageBuddy(page) && buddy_order(page) == order)
+		if (PageBuddy(page) && buddy_order(page) == order) {
 			__SetPageReported(page);
+			zone->reported_pages += (1 << order);
+		}
 	} while ((sg = sg_next(sg)));
 
 	/* reinitialize scatterlist now that it is empty */
@@ -197,7 +217,7 @@ void __page_reporting_notify(void)
 		spin_lock_irq(&zone->lock);
 
 		/* flush reported pages from the sg list */
-		page_reporting_drain(prdev, sgl, PAGE_REPORTING_CAPACITY, !err);
+		page_reporting_drain(prdev, sgl, zone, PAGE_REPORTING_CAPACITY, !err);
 
 		/*
 		 * Reset next to first entry, the old next isn't valid
@@ -260,7 +280,7 @@ void __page_reporting_notify(void)
 
 		/* flush any remaining pages out from the last report */
 		spin_lock_irq(&zone->lock);
-		page_reporting_drain(prdev, sgl, leftover, !err);
+		page_reporting_drain(prdev, sgl, zone, leftover, !err);
 		spin_unlock_irq(&zone->lock);
 	}
 
@@ -362,3 +382,87 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
 	mutex_unlock(&page_reporting_mutex);
 }
 EXPORT_SYMBOL_GPL(page_reporting_unregister);
+
+#ifdef CONFIG_SYSFS
+#define REPORTING_ATTR(_name)				\
+	static struct kobj_attribute _name##_attr =	\
+		__ATTR(_name, 0644, _name##_show, _name##_store)
+
+static unsigned long get_reported_kbytes(void)
+{
+	struct zone *z;
+	unsigned long nr_reported = 0;
+
+	for_each_populated_zone(z)
+		nr_reported += z->reported_pages;
+
+	return nr_reported << (PAGE_SHIFT - 10);
+}
+
+static ssize_t reported_kbytes_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", get_reported_kbytes());
+}
+
+static ssize_t reported_kbytes_store(struct kobject *kobj,
+				     struct kobj_attribute *attr,
+				     const char *buf, size_t count)
+{
+	return -EINVAL;
+}
+REPORTING_ATTR(reported_kbytes);
+
+static u64 get_refault_kbytes(void)
+{
+	u64 sum;
+
+	sum = percpu_counter_sum_positive(&refault_pages);
+	return sum << (PAGE_SHIFT - 10);
+}
+
+static ssize_t refault_kbytes_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%llu\n", get_refault_kbytes());
+}
+
+static ssize_t refault_kbytes_store(struct kobject *kobj,
+				    struct kobj_attribute *attr,
+				    const char *buf, size_t count)
+{
+	return -EINVAL;
+}
+REPORTING_ATTR(refault_kbytes);
+
+static struct attribute *reporting_attrs[] = {
+	&reported_kbytes_attr.attr,
+	&refault_kbytes_attr.attr,
+	NULL,
+};
+
+static struct attribute_group reporting_attr_group = {
+	.attrs = reporting_attrs,
+	.name = "page_reporting",
+};
+#endif
+
+static int __init page_reporting_init(void)
+{
+#ifdef CONFIG_SYSFS
+	int err;
+
+	if (percpu_counter_init(&refault_pages, 0, GFP_KERNEL))
+		panic("Failed to allocate refault_pages percpu counter\n");
+
+	err = sysfs_create_group(mm_kobj, &reporting_attr_group);
+	if (err) {
+		pr_err("%s: Unable to populate sysfs files\n", __func__);
+		return err;
+	}
+#endif
+
+	return 0;
+}
+
+module_init(page_reporting_init);
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index 2c385dd..19549c7 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -44,11 +44,16 @@ static inline void page_reporting_notify_free(unsigned int order)
 	/* This will add a few cycles, but should be called infrequently */
 	__page_reporting_notify();
 }
+
+void page_reporting_update_refault(struct zone *zone, unsigned int pages);
 #else /* CONFIG_PAGE_REPORTING */
 #define page_reported(_page)	false
 
 static inline void page_reporting_notify_free(unsigned int order)
 {
 }
+
+static inline void
+page_reporting_update_refault(struct zone *zone, unsigned int pages)
+{
+}
 #endif /* CONFIG_PAGE_REPORTING */
 #endif /*_MM_PAGE_REPORTING_H */

From patchwork Fri Mar 26 09:44:56 2021
X-Patchwork-Submitter: Xunlei Pang
X-Patchwork-Id: 12166147
From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Andrew Morton, Alexander Duyck, Mel Gorman
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xunlei Pang
Subject: [PATCH 2/4] mm/page_reporting: Introduce free page reporting factor
Date: Fri, 26 Mar 2021 17:44:56 +0800
Message-Id: <1616751898-58393-3-git-send-email-xlpang@linux.alibaba.com>
In-Reply-To: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>
References: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>

Add a new "/sys/kernel/mm/page_reporting/reporting_factor" knob with a range of [0, 100], and stop page reporting once the reported memory reaches the configured threshold. The default is 100, which imposes no limit. A percentage is used to reflect the fact that reporting works on a per-zone basis.

This knob lets us cap the total number of reported pages to limit EPT violations, which can hurt workload performance, for example during a guest memory-allocation burst or long-tail memory reclaim on the host. It helps build customized control policies according to VM priority, and is also useful for testing, staged rollouts ("gray release"), etc.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 mm/page_reporting.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)
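
[Editor's note: a standalone sketch, not part of the patch, of the reporting_factor threshold arithmetic that the hunks below add to page_reporting_cycle() and page_reporting_process_zone(). The zone size and factor are made-up inputs chosen only to exercise the check.]

#include <stdio.h>

int main(void)
{
        unsigned long managed_pages = 512UL * 1024 * 1024 / 4096; /* 512MiB zone, 4KiB pages */
        unsigned long reported_pages = 100000;  /* pages already reported in this zone */
        int reporting_factor = 50;              /* report at most half of the zone */
        unsigned long threshold = managed_pages * reporting_factor / 100;

        /* Mirrors the early-exit check: once reported_pages crosses the
         * per-zone threshold, reporting stops until pages refault back. */
        printf("threshold=%lu pages, reported=%lu -> %s\n",
               threshold, reported_pages,
               reported_pages >= threshold ? "stop reporting" : "keep reporting");
        return 0;
}
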
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index ba195ea..86c6479 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,6 +11,8 @@
 #include "page_reporting.h"
 #include "internal.h"
 
+static int reporting_factor = 100;
+
 #define PAGE_REPORTING_DELAY (2 * HZ)
 
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
@@ -134,6 +136,7 @@ void __page_reporting_notify(void)
 	struct list_head *list = &area->free_list[mt];
 	unsigned int page_len = PAGE_SIZE << order;
 	struct page *page, *next;
+	unsigned long threshold;
 	long budget;
 	int err = 0;
 
@@ -144,6 +147,7 @@ void __page_reporting_notify(void)
 	if (list_empty(list))
 		return err;
 
+	threshold = atomic_long_read(&zone->managed_pages) * reporting_factor / 100;
 	spin_lock_irq(&zone->lock);
 
 	/*
@@ -181,6 +185,8 @@ void __page_reporting_notify(void)
 
 		/* Attempt to pull page from list and place in scatterlist */
 		if (*offset) {
+			unsigned long nr_pages;
+
 			if (!__isolate_free_page(page, order)) {
 				next = page;
 				break;
 			}
@@ -190,6 +196,12 @@ void __page_reporting_notify(void)
 			--(*offset);
 			sg_set_page(&sgl[*offset], page, page_len, 0);
 
+			nr_pages = (PAGE_REPORTING_CAPACITY - *offset) << order;
+			if (zone->reported_pages + nr_pages >= threshold) {
+				err = 1;
+				break;
+			}
+
 			continue;
 		}
 
@@ -244,9 +256,13 @@ void __page_reporting_notify(void)
 			    struct scatterlist *sgl, struct zone *zone)
 {
 	unsigned int order, mt, leftover, offset = PAGE_REPORTING_CAPACITY;
-	unsigned long watermark;
+	unsigned long watermark, threshold;
 	int err = 0;
 
+	threshold = atomic_long_read(&zone->managed_pages) * reporting_factor / 100;
+	if (zone->reported_pages >= threshold)
+		return err;
+
 	/* Generate minimum watermark to be able to guarantee progress */
 	watermark = low_wmark_pages(zone) +
 		    (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
@@ -267,11 +283,18 @@ void __page_reporting_notify(void)
 			err = page_reporting_cycle(prdev, zone, order, mt,
 						   sgl, &offset);
 
+			/* Exceeded the threshold, go report the leftover */
+			if (err > 0) {
+				err = 0;
+				goto leftover;
+			}
+
 			if (err)
 				return err;
 		}
 	}
 
+leftover:
 	/* report the leftover pages before going idle */
 	leftover = PAGE_REPORTING_CAPACITY - offset;
 	if (leftover) {
@@ -435,9 +458,44 @@ static ssize_t refault_kbytes_store(struct kobject *kobj,
 }
 REPORTING_ATTR(refault_kbytes);
 
+static ssize_t reporting_factor_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", reporting_factor);
+}
+
+static ssize_t reporting_factor_store(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      const char *buf, size_t count)
+{
+	int new, old, err;
+	struct page *page;
+
+	err = kstrtoint(buf, 10, &new);
+	if (err || (new < 0 || new > 100))
+		return -EINVAL;
+
+	old = reporting_factor;
+	reporting_factor = new;
+
+	if (new <= old)
+		goto out;
+
+	/* Trigger reporting with the new, larger reporting_factor */
+	page = alloc_pages(__GFP_HIGHMEM | __GFP_NOWARN,
+			   PAGE_REPORTING_MIN_ORDER);
+	if (page)
+		__free_pages(page, PAGE_REPORTING_MIN_ORDER);
+
+out:
+	return count;
+}
+REPORTING_ATTR(reporting_factor);
+
 static struct attribute *reporting_attrs[] = {
 	&reported_kbytes_attr.attr,
 	&refault_kbytes_attr.attr,
+	&reporting_factor_attr.attr,
 	NULL,
 };

From patchwork Fri Mar 26 09:44:57 2021
X-Patchwork-Submitter: Xunlei Pang
X-Patchwork-Id: 12166143
From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Andrew Morton, Alexander Duyck, Mel Gorman
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xunlei Pang
Subject: [PATCH 3/4] mm/page_reporting: Introduce "page_reporting_factor=" boot parameter
Date: Fri, 26 Mar 2021 17:44:57 +0800
Message-Id: <1616751898-58393-4-git-send-email-xlpang@linux.alibaba.com>
In-Reply-To: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>
References: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>

Currently the default behaviour (value 100) is to do a full report. Although we can control it after boot via the "reporting_factor" sysfs knob, that can already be too late: some memory will have been reported before the knob is changed. Sometimes we really want reporting safely off by default and turned on as needed at runtime, so the "page_reporting_factor=" boot parameter provides that guarantee and meets different default-setting requirements.

There's also a real-world problem I noticed on tiny instances: the kernel always reports some memory at boot, before the application starts and uses up the memory, which retriggers EPT faults after boot. The following data (right after boot) indicates that 172032 KiB of pages were unnecessarily reported and then refaulted in:

  $ cat /sys/kernel/mm/page_reporting/refault_kbytes
  172032
  $ cat /sys/kernel/mm/page_reporting/reported_kbytes
  0

Thus it's reasonable to turn page reporting off by default and enable it at runtime as needed.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 mm/page_reporting.c                             | 13 +++++++++++++
 2 files changed, 16 insertions(+)
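
[Editor's note: a userspace model, not part of the patch, of the boot-parameter parsing added below; strtol stands in for kstrtoint. Malformed or out-of-range strings leave the compile-time default untouched, matching setup_reporting_factor().]

#include <stdio.h>
#include <stdlib.h>

static int reporting_factor = 100;      /* compile-time default: full reporting */

static void setup_reporting_factor(const char *str)
{
        char *end;
        long v = strtol(str, &end, 10);

        if (end != str && *end == '\0' && v >= 0 && v <= 100)
                reporting_factor = (int)v;
}

int main(void)
{
        setup_reporting_factor("0");    /* as if booted with page_reporting_factor=0 */
        printf("reporting_factor = %d\n", reporting_factor);   /* 0: reporting off */

        setup_reporting_factor("250");  /* rejected: keeps the current value */
        printf("reporting_factor = %d\n", reporting_factor);
        return 0;
}
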
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..46e296c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3524,6 +3524,9 @@
 			off: turn off poisoning (default)
 			on: turn on poisoning
 
+	page_reporting_factor= [KNL] Guest Free Page Reporting percentage.
+			[0, 100]: 0 - off, no reporting; 100 - default, full reporting.
+
 	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
 			timeout > 0: seconds before rebooting
 			timeout = 0: wait forever
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 86c6479..6ffedb8 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -524,3 +524,16 @@ static int __init page_reporting_init(void)
 }
 
 module_init(page_reporting_init);
+
+static int __init setup_reporting_factor(char *str)
+{
+	int v;
+
+	if (kstrtoint(str, 10, &v))
+		return -EINVAL;
+	if (v >= 0 && v <= 100)
+		reporting_factor = v;
+
+	return 0;
+}
+__setup("page_reporting_factor=", setup_reporting_factor);

From patchwork Fri Mar 26 09:44:58 2021
X-Patchwork-Submitter: Xunlei Pang
X-Patchwork-Id: 12166151
From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Andrew Morton, Alexander Duyck, Mel Gorman
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xunlei Pang
Subject: [PATCH 4/4] mm/page_reporting: Fix possible user allocation failure
Date: Fri, 26 Mar 2021 17:44:58 +0800
Message-Id: <1616751898-58393-5-git-send-email-xlpang@linux.alibaba.com>
In-Reply-To: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>
References: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>

We encountered user memory allocation failures (OOM) on our 512MiB tiny instances; they did not happen once page reporting was turned off. After some debugging, it turned out that 32*4MiB=128MiB of order-10 free pages had been isolated during the reporting window, leaving no free memory available. This might also happen on large instances when free memory is scarce.

This patch introduces a rule that limits the reporting capacity according to the current free memory, reducing the capacity for the higher orders that could break the rule. For example, with 100MiB free, the sgl capacity for the different orders becomes:
  order-9 : 32
  order-10: 16

Reported-by: Helin Guo
Tested-by: Helin Guo
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 mm/page_reporting.c | 89 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 17 deletions(-)
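
[Editor's note: a standalone sketch, not part of the patch, reproducing the "100MiB free" example above from the capacity-reduction arithmetic in calculate_zone_order_threshold() and page_reporting_process_zone(). The constants mirror common x86-64 defaults (4KiB pages, capacity 32, minimum order 9, MAX_ORDER 11), and the low watermark is ignored for brevity.]

#include <stdio.h>

#define PAGE_REPORTING_CAPACITY  32
#define PAGE_REPORTING_MIN_ORDER 9
#define MAX_ORDER                11

int main(void)
{
        long free_pages = 100L * 1024 * 1024 / 4096;    /* 100MiB of 4KiB pages */
        unsigned int order, order_threshold;

        /* First order whose full scatterlist would overshoot free memory. */
        for (order_threshold = PAGE_REPORTING_MIN_ORDER;
             order_threshold < MAX_ORDER; order_threshold++)
                if (((long)PAGE_REPORTING_CAPACITY << order_threshold) > free_pages)
                        break;

        for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
                unsigned int capacity = PAGE_REPORTING_CAPACITY;

                /* Halve the capacity for each order at or above the threshold. */
                if (order >= order_threshold) {
                        capacity >>= (order - order_threshold + 1);
                        if (capacity == 0)
                                capacity = 1;
                }
                printf("order-%u: capacity %u\n", order, capacity);     /* 32, then 16 */
        }
        return 0;
}
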
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 6ffedb8..2ec0ec0 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -129,8 +129,8 @@ void __page_reporting_notify(void)
  */
 static int
 page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
-		     unsigned int order, unsigned int mt,
-		     struct scatterlist *sgl, unsigned int *offset)
+		     unsigned int order, unsigned int mt, struct scatterlist *sgl,
+		     const unsigned int capacity, unsigned int *offset)
 {
 	struct free_area *area = &zone->free_area[order];
 	struct list_head *list = &area->free_list[mt];
@@ -161,10 +161,10 @@ void __page_reporting_notify(void)
 	 * list processed. This should result in us reporting all pages on
 	 * an idle system in about 30 seconds.
 	 *
-	 * The division here should be cheap since PAGE_REPORTING_CAPACITY
-	 * should always be a power of 2.
+	 * The division here should be cheap since capacity should
+	 * always be a power of 2.
 	 */
-	budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16);
+	budget = DIV_ROUND_UP(area->nr_free, capacity * 16);
 
 	/* loop through free list adding unreported pages to sg list */
 	list_for_each_entry_safe(page, next, list, lru) {
@@ -196,7 +196,7 @@ void __page_reporting_notify(void)
 			--(*offset);
 			sg_set_page(&sgl[*offset], page, page_len, 0);
 
-			nr_pages = (PAGE_REPORTING_CAPACITY - *offset) << order;
+			nr_pages = (capacity - *offset) << order;
 			if (zone->reported_pages + nr_pages >= threshold) {
 				err = 1;
 				break;
@@ -217,10 +217,10 @@ void __page_reporting_notify(void)
 		spin_unlock_irq(&zone->lock);
 
 		/* begin processing pages in local list */
-		err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY);
+		err = prdev->report(prdev, sgl, capacity);
 
 		/* reset offset since the full list was reported */
-		*offset = PAGE_REPORTING_CAPACITY;
+		*offset = capacity;
 
 		/* update budget to reflect call to report function */
 		budget--;
@@ -229,7 +229,7 @@ void __page_reporting_notify(void)
 		spin_lock_irq(&zone->lock);
 
 		/* flush reported pages from the sg list */
-		page_reporting_drain(prdev, sgl, zone, PAGE_REPORTING_CAPACITY, !err);
+		page_reporting_drain(prdev, sgl, zone, capacity, !err);
 
 		/*
 		 * Reset next to first entry, the old next isn't valid
@@ -251,12 +251,39 @@ void __page_reporting_notify(void)
 	return err;
 }
 
+/*
+ * For a guest with little free memory, we should tune the reporting
+ * capacity properly to avoid reporting too much at once, otherwise
+ * user allocations may fail and OOM during the reporting window between
+ * __isolate_free_page() and page_reporting_drain().
+ *
+ * Calculate the order from which we begin to reduce the scatterlist
+ * capacity, so as not to isolate so many pages that user allocations fail.
+ */
+static unsigned int calculate_zone_order_threshold(struct zone *z)
+{
+	unsigned int order;
+	long pages_threshold;
+
+	pages_threshold = zone_page_state(z, NR_FREE_PAGES) - low_wmark_pages(z);
+	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+		if ((PAGE_REPORTING_CAPACITY << order) > pages_threshold)
+			break;
+	}
+
+	return order;
+}
+
 static int
 page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 			    struct scatterlist *sgl, struct zone *zone)
 {
-	unsigned int order, mt, leftover, offset = PAGE_REPORTING_CAPACITY;
+	unsigned int order, mt, leftover, offset;
 	unsigned long watermark, threshold;
+	unsigned int capacity = PAGE_REPORTING_CAPACITY;
+	unsigned int capacity_curr;
+	struct scatterlist *sgl_curr;
+	unsigned int order_threshold;
 	int err = 0;
 
 	threshold = atomic_long_read(&zone->managed_pages) * reporting_factor / 100;
@@ -274,15 +301,28 @@ void __page_reporting_notify(void)
 	if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 		return err;
 
+	sgl_curr = sgl;
+	capacity_curr = offset = capacity;
+	order_threshold = calculate_zone_order_threshold(zone);
 	/* Process each free list starting from lowest order/mt */
 	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+		/* try to reduce an unexpectedly high order's reporting capacity */
+		if (order >= order_threshold) {
+			capacity_curr = capacity >> (order - order_threshold + 1);
+			if (capacity_curr == 0)
+				capacity_curr = 1;
+			sgl_curr = sgl + capacity - capacity_curr;
+			offset = capacity_curr;
+			sg_init_table(sgl_curr, capacity_curr);
+		}
+
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
 				continue;
 
 			err = page_reporting_cycle(prdev, zone, order, mt,
-						   sgl, &offset);
+						   sgl_curr, capacity_curr, &offset);
 
 			/* Exceeded the threshold, go report the leftover */
 			if (err > 0) {
 				err = 0;
@@ -292,18 +332,34 @@ void __page_reporting_notify(void)
 			if (err)
 				return err;
 		}
+
+		/* report the leftover pages for the next orders with reduced capacity */
+		leftover = capacity_curr - offset;
+		if (leftover && order + 1 >= order_threshold) {
+			sgl_curr = &sgl_curr[offset];
+			err = prdev->report(prdev, sgl_curr, leftover);
+			offset = capacity_curr;
+
+			/* flush any remaining pages out from the last report */
+			spin_lock_irq(&zone->lock);
+			page_reporting_drain(prdev, sgl_curr, zone, leftover, !err);
+			spin_unlock_irq(&zone->lock);
+
+			if (err)
+				return err;
+		}
 	}
 
 leftover:
 	/* report the leftover pages before going idle */
-	leftover = PAGE_REPORTING_CAPACITY - offset;
+	leftover = capacity_curr - offset;
 	if (leftover) {
-		sgl = &sgl[offset];
-		err = prdev->report(prdev, sgl, leftover);
+		sgl_curr = &sgl_curr[offset];
+		err = prdev->report(prdev, sgl_curr, leftover);
 
 		/* flush any remaining pages out from the last report */
 		spin_lock_irq(&zone->lock);
-		page_reporting_drain(prdev, sgl, zone, leftover, !err);
+		page_reporting_drain(prdev, sgl_curr, zone, leftover, !err);
 		spin_unlock_irq(&zone->lock);
 	}
 
@@ -332,9 +388,8 @@ static void page_reporting_process(struct work_struct *work)
 	if (!sgl)
 		goto err_out;
 
-	sg_init_table(sgl, PAGE_REPORTING_CAPACITY);
-
 	for_each_zone(zone) {
+		sg_init_table(sgl, PAGE_REPORTING_CAPACITY);
 		err = page_reporting_process_zone(prdev, sgl, zone);
 
 		if (err)
			break;