From patchwork Mon Dec 21 16:28:08 2020
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 11984877
From: Liang Li
Date: Mon, 21 Dec 2020 11:28:08 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [RFC v2 PATCH 1/4] mm: make page reporting worker work better for low order pages
Message-ID: <20201221162806.GA22524@open-light-1.localdomain>

'page_reporting_cycle' may hold the zone lock for too long when it
scans a low order free list, because a low order free list may contain
a lot of items but very few pages without the PG_reported flag. The
current implementation limits the minimum reported order to
pageblock_order, which is fine for most cases. If we want to report low
order pages, we have to prevent the reporting worker from holding the
zone lock for too long, or it may hurt system performance. This patch
makes 'page_reporting_cycle' work better for low order pages: the zone
lock is released periodically and the CPU is yielded voluntarily when
needed.

Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Signed-off-by: Liang Li
---
 mm/page_reporting.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index cd8e13d41df4..0b22db94ce2a 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -6,11 +6,14 @@
 #include
 #include
 #include
+#include

 #include "page_reporting.h"
 #include "internal.h"

 #define PAGE_REPORTING_DELAY (2 * HZ)
+#define MAX_SCAN_NUM 1024
+
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;

 enum {
@@ -115,7 +118,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
        unsigned int page_len = PAGE_SIZE << order;
        struct page *page, *next;
        long budget;
-       int err = 0;
+       int err = 0, scan_cnt = 0;

        /*
         * Perform early check, if free area is empty there is
@@ -145,8 +148,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
        /* loop through free list adding unreported pages to sg list */
        list_for_each_entry_safe(page, next, list, lru) {
                /* We are going to skip over the reported pages. */
-               if (PageReported(page))
+               if (PageReported(page)) {
+                       if (++scan_cnt >= MAX_SCAN_NUM) {
+                               err = scan_cnt;
+                               break;
+                       }
                        continue;
+               }
+
                /*
                 * If we fully consumed our budget then update our
@@ -219,6 +228,26 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
        return err;
 }

+static int
+reporting_order_type(struct page_reporting_dev_info *prdev, struct zone *zone,
+                    unsigned int order, unsigned int mt,
+                    struct scatterlist *sgl, unsigned int *offset)
+{
+       int ret = 0;
+       unsigned long total = 0;
+
+       might_sleep();
+       do {
+               cond_resched();
+               ret = page_reporting_cycle(prdev, zone, order, mt,
+                                          sgl, offset);
+               if (ret > 0)
+                       total += ret;
+       } while (ret > 0 && total < zone->free_area[order].nr_free);
+
+       return ret;
+}
+
 static int
 page_reporting_process_zone(struct page_reporting_dev_info *prdev,
                            struct scatterlist *sgl, struct zone *zone)
@@ -245,7 +274,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
                        if (is_migrate_isolate(mt))
                                continue;

-                       err = page_reporting_cycle(prdev, zone, order, mt,
+                       err = reporting_order_type(prdev, zone, order, mt,
                                                   sgl, &offset);
                        if (err)
                                return err;
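
To illustrate the scheme outside the kernel, here is a minimal
userspace sketch of the bounded-scan idea, with a pthread mutex
standing in for the zone lock. This is an analogy only, not kernel
code; names like scan_batch and the item struct are invented for the
example.

#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define MAX_SCAN_NUM 1024       /* same per-hold scan cap the patch uses */

struct item { struct item *next; int reported; };

/*
 * Scan at most MAX_SCAN_NUM already-reported items per lock hold, so a
 * long list of mostly-reported items cannot pin the lock holder.
 * Returns nonzero while there is more of the list left to scan.
 */
static int scan_batch(pthread_mutex_t *lock, struct item **pos)
{
        int scan_cnt = 0;

        pthread_mutex_lock(lock);
        while (*pos) {
                struct item *it = *pos;

                *pos = it->next;
                if (it->reported) {
                        if (++scan_cnt >= MAX_SCAN_NUM)
                                break;  /* drop the lock, come back later */
                        continue;
                }
                /* ...report the unreported item here... */
        }
        pthread_mutex_unlock(lock);
        return *pos != NULL;
}

static void scan_all(pthread_mutex_t *lock, struct item *head)
{
        struct item *pos = head;

        while (scan_batch(lock, &pos))
                sched_yield();  /* plays the role of cond_resched() */
}
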
From patchwork Mon Dec 21 16:28:53 2020
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 11984879

From: Liang Li
Date: Mon, 21 Dec 2020 11:28:53 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [RFC v2 PATCH 2/4] mm: Add batch size for free page reporting
Message-ID: <20201221162851.GA22528@open-light-1.localdomain>

Using the page order as the only threshold for page reporting is not
flexible and has some flaws. When the system's memory becomes very
fragmented, there will be a lot of low order free pages but very few
high order pages; limiting the minimum order to pageblock_order then
prevents most free pages from being reclaimed by the host when memory
is reclaimed through the virtio-balloon driver. Scanning a long free
list is not cheap, so it is better to wake up the page reporting worker
only when enough pages have accumulated; waking it up for a single page
may not be worth it. This patch adds a batch size as another threshold
to control when the reporting worker is woken up.

Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Signed-off-by: Liang Li
---
 mm/page_reporting.c |  2 ++
 mm/page_reporting.h | 12 ++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 0b22db94ce2a..2f8e3d032fab 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -14,6 +14,8 @@
 #define PAGE_REPORTING_DELAY (2 * HZ)
 #define MAX_SCAN_NUM 1024

+unsigned long page_report_batch_size __read_mostly = 4 * 1024 * 1024UL;
+
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;

 enum {
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index 2c385dd4ddbd..b8fb3bbb345f 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -12,6 +12,8 @@

 #define PAGE_REPORTING_MIN_ORDER pageblock_order

+extern unsigned long page_report_batch_size;
+
 #ifdef CONFIG_PAGE_REPORTING
 DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
 void __page_reporting_notify(void);
@@ -33,6 +35,8 @@ static inline bool page_reported(struct page *page)
  */
 static inline void page_reporting_notify_free(unsigned int order)
 {
+       static long batch_size;
+
        /* Called from hot path in __free_one_page() */
        if (!static_branch_unlikely(&page_reporting_enabled))
                return;
@@ -41,8 +45,12 @@ static inline void page_reporting_notify_free(unsigned int order)
        if (order < PAGE_REPORTING_MIN_ORDER)
                return;

-       /* This will add a few cycles, but should be called infrequently */
-       __page_reporting_notify();
+       batch_size += (1 << order) << PAGE_SHIFT;
+       if (batch_size >= page_report_batch_size) {
+               batch_size = 0;
+               /* This will add a few cycles, but should be called infrequently */
+               __page_reporting_notify();
+       }
 }
 #else /* CONFIG_PAGE_REPORTING */
 #define page_reported(_page) false
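
The accumulation logic can be sketched in plain C. This is only an
analogy to the hot-path code above, not the kernel implementation; the
PAGE_SHIFT value and the function names are illustrative.

#include <stdio.h>

#define PAGE_SHIFT 12                   /* 4K pages, as on x86-64 */

static unsigned long batch_threshold = 4 * 1024 * 1024; /* 4M default */
static long batch_bytes;                /* bytes freed since last wakeup */

static void wake_reporting_worker(void)
{
        printf("worker woken\n");
}

/* called on every page free; wakes the worker only once per batch */
static void notify_free(unsigned int order)
{
        batch_bytes += (1L << order) << PAGE_SHIFT;
        if (batch_bytes >= batch_threshold) {
                batch_bytes = 0;
                wake_reporting_worker();
        }
}

int main(void)
{
        /* 2048 x 4K = 8M freed -> exactly two wakeups */
        for (int i = 0; i < 2048; i++)
                notify_free(0);
        return 0;
}
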
From patchwork Mon Dec 21 16:29:39 2020
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 11984881
From: Liang Li
Date: Mon, 21 Dec 2020 11:29:39 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [RFC v2 PATCH 3/4] mm: let user decide page reporting option
Message-ID: <20201221162937.GA22530@open-light-1.localdomain>

Some key parameters for page reporting are currently hard coded.
Different users of the framework may have their own requirements, so
make these parameters configurable and let the user decide them.

Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Signed-off-by: Liang Li
---
 drivers/virtio/virtio_balloon.c |  3 +++
 include/linux/page_reporting.h  |  3 +++
 mm/page_reporting.c             | 18 ++++++++++--------
 mm/page_reporting.h             |  6 +++---
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8985fc2cea86..a298517079bb 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -993,6 +993,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
                        goto out_unregister_oom;
        }

+       vb->pr_dev_info.mini_order = 6;
+       vb->pr_dev_info.batch_size = 32 * 1024 * 1024; /* 32M */
+       vb->pr_dev_info.delay_jiffies = 2 * HZ; /* 2 seconds */
        err = page_reporting_register(&vb->pr_dev_info);
        if (err)
                goto out_unregister_oom;
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 3b99e0ec24f2..63e1e9fbcaa2 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -13,6 +13,9 @@ struct page_reporting_dev_info {
        int (*report)(struct page_reporting_dev_info *prdev,
                      struct scatterlist *sg, unsigned int nents);

+       unsigned long batch_size;
+       unsigned long delay_jiffies;
+       int mini_order;
        /* work struct for processing reports */
        struct delayed_work work;
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 2f8e3d032fab..20ec3fb1afc4 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,12 +11,10 @@
 #include "page_reporting.h"
 #include "internal.h"

-#define PAGE_REPORTING_DELAY (2 * HZ)
 #define MAX_SCAN_NUM 1024
-
-unsigned long page_report_batch_size __read_mostly = 4 * 1024 * 1024UL;
-
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
+int page_report_mini_order = pageblock_order;
+unsigned long page_report_batch_size = 32 * 1024 * 1024;

 enum {
        PAGE_REPORTING_IDLE = 0,
@@ -48,7 +46,7 @@ __page_reporting_request(struct page_reporting_dev_info *prdev)
         * now we are limiting this to running no more than once every
         * couple of seconds.
         */
-       schedule_delayed_work(&prdev->work, PAGE_REPORTING_DELAY);
+       schedule_delayed_work(&prdev->work, prdev->delay_jiffies);
 }

 /* notify prdev of free page reporting request */
@@ -260,7 +258,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,

        /* Generate minimum watermark to be able to guarantee progress */
        watermark = low_wmark_pages(zone) +
-                   (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
+                   (PAGE_REPORTING_CAPACITY << prdev->mini_order);

        /*
         * Cancel request if insufficient free memory or if we failed
@@ -270,7 +268,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
                return err;

        /* Process each free list starting from lowest order/mt */
-       for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+       for (order = prdev->mini_order; order < MAX_ORDER; order++) {
                for (mt = 0; mt < MIGRATE_TYPES; mt++) {
                        /* We do not pull pages from the isolate free list */
                        if (is_migrate_isolate(mt))
@@ -337,7 +335,7 @@ static void page_reporting_process(struct work_struct *work)
         */
        state = atomic_cmpxchg(&prdev->state, state, PAGE_REPORTING_IDLE);
        if (state == PAGE_REPORTING_REQUESTED)
-               schedule_delayed_work(&prdev->work, PAGE_REPORTING_DELAY);
+               schedule_delayed_work(&prdev->work, prdev->delay_jiffies);
 }

 static DEFINE_MUTEX(page_reporting_mutex);
@@ -365,6 +363,8 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)

        /* Assign device to allow notifications */
        rcu_assign_pointer(pr_dev_info, prdev);
+       page_report_mini_order = prdev->mini_order;
+       page_report_batch_size = prdev->batch_size;

        /* enable page reporting notification */
        if (!static_key_enabled(&page_reporting_enabled)) {
                static_branch_enable(&page_reporting_enabled);
@@ -382,6 +382,8 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
        mutex_lock(&page_reporting_mutex);

        if (rcu_access_pointer(pr_dev_info) == prdev) {
+               if (static_key_enabled(&page_reporting_enabled))
+                       static_branch_disable(&page_reporting_enabled);
                /* Disable page reporting notification */
                RCU_INIT_POINTER(pr_dev_info, NULL);
                synchronize_rcu();
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index b8fb3bbb345f..86ac6ffad970 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -9,9 +9,9 @@
 #include
 #include
 #include
+#include

-#define PAGE_REPORTING_MIN_ORDER pageblock_order
-
+extern int page_report_mini_order;
 extern unsigned long page_report_batch_size;

 #ifdef CONFIG_PAGE_REPORTING
@@ -42,7 +42,7 @@ static inline void page_reporting_notify_free(unsigned int order)
                return;

        /* Determine if we have crossed reporting threshold */
-       if (order < PAGE_REPORTING_MIN_ORDER)
+       if (order < page_report_mini_order)
                return;

        batch_size += (1 << order) << PAGE_SHIFT;
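
With these fields exposed, a client of the framework other than
virtio-balloon can set its own policy before registering. Below is a
hypothetical sketch against the API added above; the driver and
callback names (my_report, my_probe) are invented for the example.

#include <linux/jiffies.h>
#include <linux/page_reporting.h>
#include <linux/scatterlist.h>

/* hypothetical consumer: hand each scatterlist of free pages to a backend */
static int my_report(struct page_reporting_dev_info *prdev,
                     struct scatterlist *sgl, unsigned int nents)
{
        return 0;       /* 0 tells the core the pages were reported */
}

static struct page_reporting_dev_info my_pr_dev_info = {
        .report = my_report,
};

static int my_probe(void)
{
        my_pr_dev_info.mini_order = 6;                  /* report order >= 6 pages */
        my_pr_dev_info.batch_size = 32 * 1024 * 1024;   /* wake worker per 32M freed */
        my_pr_dev_info.delay_jiffies = 2 * HZ;          /* at most one run per 2s */

        return page_reporting_register(&my_pr_dev_info);
}
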
From patchwork Mon Dec 21 16:30:25 2020
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 11984883
From: Liang Li
Date: Mon, 21 Dec 2020 11:30:25 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
Message-ID: <20201221163024.GA22532@open-light-1.localdomain>

Zeroing out page content usually happens when pages are allocated with
the __GFP_ZERO flag. It is a time consuming operation and makes the
population of a large VMA area very slow. This patch introduces a new
feature that zeroes out free pages before allocation, which can speed
up page allocation with __GFP_ZERO.

My original intention for adding this feature was to shorten VM
creation time when an SR-IOV device is attached. It works well, and the
VM creation time is reduced by about 90%.

Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
=====================================================
QEMU uses 4K pages, THP is off
                 round1   round2   round3
w/o this patch:  23.5s    24.7s    24.6s
w/  this patch:  10.2s    10.3s    11.2s

QEMU uses 4K pages, THP is on
                 round1   round2   round3
w/o this patch:  17.9s    14.8s    14.9s
w/  this patch:   1.9s     1.8s     1.9s
=====================================================

Obviously, it can do more than this. We can benefit from this feature
in the following cases:

Interactive scenarios
=====================
Shorten application launch time on desktops or mobile phones, which
helps improve the user experience. Tests on a server [Intel(R) Xeon(R)
CPU E5-2620 v3 @ 2.40GHz] show that zeroing out 1GB of RAM in the
kernel takes about 200ms, while commonly used applications like the
Firefox browser or Office consume 100 ~ 300 MB of RAM just after
launch. With pre-zeroed free pages, application launch time would be
reduced by about 20~60ms (can this be sensed by people?). Maybe we can
use this feature to speed up the launch of Android apps (I did not run
any tests for Android).

Virtualization
==============
Speed up VM creation and shorten guest boot time, especially for the
PCI SR-IOV device passthrough scenario. Compared with some of the
para-virtualization solutions, it is easy to deploy because it is
transparent to the guest and can handle DMA properly in the BIOS stage,
which the para-virtualization solutions cannot handle well.

Improve guest performance when VIRTIO_BALLOON_F_REPORTING is used for
memory overcommit.
With the VIRTIO_BALLOON_F_REPORTING feature, the guest reports free
pages to the VMM, and the VMM unmaps the corresponding host pages for
reclaim. When the guest later allocates a page that was just reclaimed,
the host allocates a new page and zeroes it out for the guest; in this
case, pre-zeroed free pages help to speed up the fault-in process and
reduce the performance impact.

Speed up kernel routines
========================
This cannot be guaranteed because we do not pre-zero all the free
pages, but it is true for most cases. It can help to speed up some
important system calls, such as fork, which allocates zeroed pages for
building page tables. It also speeds up page fault handling, especially
for huge page faults. A POC of hugetlb free page pre-zeroing has been
done.

Security
========
This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
boot options", which zeroes out pages in an asynchronous way. For users
who cannot tolerate the impact that 'init_on_alloc=1' or
'init_on_free=1' brings, this feature provides another choice.

Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Signed-off-by: Liang Li
---
 include/linux/highmem.h        |  31 +++-
 include/linux/page-flags.h     |  16 +-
 include/trace/events/mmflags.h |   7 +
 mm/Kconfig                     |  10 ++
 mm/Makefile                    |   1 +
 mm/huge_memory.c               |   3 +-
 mm/page_alloc.c                |   4 +
 mm/page_prezero.c              | 266 +++++++++++++++++++++++++++++++++
 mm/page_prezero.h              |  13 ++
 9 files changed, 346 insertions(+), 5 deletions(-)
 create mode 100644 mm/page_prezero.c
 create mode 100644 mm/page_prezero.h

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d2c70d3772a3..badb5d0528f7 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -146,7 +146,13 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 #ifndef clear_user_highpage
 static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 {
-       void *addr = kmap_atomic(page);
+       void *addr;
+
+#ifdef CONFIG_PREZERO_PAGE
+       if (TestClearPageZero(page))
+               return;
+#endif
+       addr = kmap_atomic(page);
        clear_user_page(addr, vaddr, page);
        kunmap_atomic(addr);
 }
@@ -197,9 +203,30 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
        return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
 }

+#ifdef CONFIG_PREZERO_PAGE
+static inline void __clear_highpage(struct page *page)
+{
+       void *kaddr;
+
+       if (PageZero(page))
+               return;
+
+       kaddr = kmap_atomic(page);
+       clear_page(kaddr);
+       SetPageZero(page);
+       kunmap_atomic(kaddr);
+}
+#endif
+
 static inline void clear_highpage(struct page *page)
 {
-       void *kaddr = kmap_atomic(page);
+       void *kaddr;
+
+#ifdef CONFIG_PREZERO_PAGE
+       if (TestClearPageZero(page))
+               return;
+#endif
+       kaddr = kmap_atomic(page);
        clear_page(kaddr);
        kunmap_atomic(kaddr);
 }
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ec5d0290e0ee..b0a1ae41659c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -137,6 +137,9 @@ enum pageflags {
 #endif
 #ifdef CONFIG_64BIT
        PG_arch_2,
+#endif
+#ifdef CONFIG_PREZERO_PAGE
+       PG_zero,
 #endif
        __NR_PAGEFLAGS,
@@ -451,6 +454,15 @@ PAGEFLAG(Idle, idle, PF_ANY)
  */
 __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)

+#ifdef CONFIG_PREZERO_PAGE
+PAGEFLAG(Zero, zero, PF_ANY)
+TESTSCFLAG(Zero, zero, PF_ANY)
+#define __PG_ZERO (1UL << PG_zero)
+#else
+PAGEFLAG_FALSE(Zero)
+#define __PG_ZERO 0
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
@@ -823,7 +835,7 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
         1UL << PG_private      | 1UL << PG_private_2   |       \
         1UL << PG_writeback    | 1UL << PG_reserved    |       \
         1UL << PG_slab         | 1UL << PG_active      |       \
-        1UL << PG_unevictable  | __PG_MLOCKED)
+        1UL << PG_unevictable  | __PG_MLOCKED | __PG_ZERO)

 /*
  * Flags checked when a page is prepped for return by the page allocator.
@@ -834,7 +846,7 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
  * alloc-free cycle to prevent from reusing the page.
  */
 #define PAGE_FLAGS_CHECK_AT_PREP \
-       (((1UL << NR_PAGEFLAGS) - 1) & ~__PG_HWPOISON)
+       (((1UL << NR_PAGEFLAGS) - 1) & ~(__PG_HWPOISON | __PG_ZERO))

 #define PAGE_FLAGS_PRIVATE                             \
        (1UL << PG_private | 1UL << PG_private_2)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 67018d367b9f..16dfdbfed8d2 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -73,6 +73,12 @@
 #define IF_HAVE_PG_HWPOISON(flag,string)
 #endif

+#ifdef CONFIG_PREZERO_PAGE
+#define IF_HAVE_PG_ZERO(flag,string) ,{1UL << flag, string}
+#else
+#define IF_HAVE_PG_ZERO(flag,string)
+#endif
+
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
 #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string}
 #else
@@ -110,6 +116,7 @@
 IF_HAVE_PG_MLOCK(PG_mlocked,           "mlocked"       ) \
 IF_HAVE_PG_UNCACHED(PG_uncached,       "uncached"      ) \
 IF_HAVE_PG_HWPOISON(PG_hwpoison,       "hwpoison"      ) \
+IF_HAVE_PG_ZERO(PG_zero,               "zero"          ) \
 IF_HAVE_PG_IDLE(PG_young,              "young"         ) \
 IF_HAVE_PG_IDLE(PG_idle,               "idle"          ) \
 IF_HAVE_PG_ARCH_2(PG_arch_2,           "arch_2"        )
diff --git a/mm/Kconfig b/mm/Kconfig
index 4275c25b5d8a..22ac80fb4f4e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -253,6 +253,16 @@ config PAGE_REPORTING
          those pages to another entity, such as a hypervisor, so that the
          memory can be freed within the host for other uses.

+#
+# support for pre zero out free page
+config PREZERO_PAGE
+       bool "Pre zero out free page"
+       def_bool y
+       depends on PAGE_REPORTING
+       help
+         Allows pre zero out free pages in freelist based on free
+         page reporting
+
 #
 # support for page migration
 #
diff --git a/mm/Makefile b/mm/Makefile
index b6cd2fffa492..cf91c8b47d43 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -120,3 +120,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
 obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
+obj-$(CONFIG_PREZERO_PAGE) += page_prezero.o
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9237976abe72..b17472caf605 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2407,7 +2407,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
 #ifdef CONFIG_64BIT
                         (1L << PG_arch_2) |
 #endif
-                        (1L << PG_dirty)));
+                        (1L << PG_dirty) |
+                        __PG_ZERO));

        /* ->mapping in first tail page is compound_mapcount */
        VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3beeb8d722f3..217d61092592 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -79,6 +79,7 @@
 #include "internal.h"
 #include "shuffle.h"
 #include "page_reporting.h"
+#include "page_prezero.h"

 /*
  * Free Page Internal flags: for internal, non-pcp variants of free_pages().
  */
 typedef int __bitwise fpi_t;
@@ -1217,6 +1218,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
        VM_BUG_ON_PAGE(PageTail(page), page);

        trace_mm_page_free(page, order);
+       clear_zero_page_flag(page, order);

        if (unlikely(PageHWPoison(page)) && !order) {
                /*
@@ -1302,6 +1304,7 @@ static bool free_pcp_prepare(struct page *page)

 static bool bulkfree_pcp_prepare(struct page *page)
 {
+       clear_zero_page_flag(page, 0);
        if (debug_pagealloc_enabled_static())
                return check_free_page(page);
        else
@@ -1324,6 +1327,7 @@ static bool free_pcp_prepare(struct page *page)

 static bool bulkfree_pcp_prepare(struct page *page)
 {
+       clear_zero_page_flag(page, 0);
        return check_free_page(page);
 }
 #endif /* CONFIG_DEBUG_VM */
diff --git a/mm/page_prezero.c b/mm/page_prezero.c
new file mode 100644
index 000000000000..c8ce720bfc54
--- /dev/null
+++ b/mm/page_prezero.c
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (C) 2020 Didi chuxing.
+ *
+ * Authors: Liang Li
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include "internal.h"
+#include "page_prezero.h"
+
+#define ZERO_PAGE_STOP 0
+#define ZERO_PAGE_RUN  1
+
+static int mini_page_order = 0;
+static unsigned long batch_size = 64 * 1024 * 1024;
+static unsigned long delay_millisecs = 1000;
+static unsigned long zeropage_enable __read_mostly;
+static DEFINE_MUTEX(kzeropaged_mutex);
+static struct page_reporting_dev_info zero_page_dev_info;
+
+inline void clear_zero_page_flag(struct page *page, int order)
+{
+       int i;
+
+       for (i = 0; i < (1 << order); i++)
+               ClearPageZero(page + i);
+}
+
+static int zero_free_pages(struct page_reporting_dev_info *pr_dev_info,
+                          struct scatterlist *sgl, unsigned int nents)
+{
+       struct scatterlist *sg = sgl;
+
+       might_sleep();
+       do {
+               struct page *page = sg_page(sg);
+               unsigned int order = get_order(sg->length);
+               int i;
+
+               VM_BUG_ON(PageBuddy(page) || buddy_order(page));
+
+               pr_info("%s order=%d\n", __func__, order);
+               for (i = 0; i < (1 << order); i++) {
+                       cond_resched();
+                       __clear_highpage(page + i);
+               }
+       } while ((sg = sg_next(sg)));
+
+       return 0;
+}
+
+static int start_kzeropaged(void)
+{
+       int err = 0;
+
+       if (zeropage_enable) {
+               zero_page_dev_info.report = zero_free_pages;
+               zero_page_dev_info.mini_order = mini_page_order;
+               zero_page_dev_info.batch_size = batch_size;
+               zero_page_dev_info.delay_jiffies = msecs_to_jiffies(delay_millisecs);
+
+               err = page_reporting_register(&zero_page_dev_info);
+               pr_info("Zero page enabled\n");
+       } else {
+               page_reporting_unregister(&zero_page_dev_info);
+               pr_info("Zero page disabled\n");
+       }
+
+       return err;
+}
+
+static int restart_kzeropaged(void)
+{
+       int err = 0;
+
+       if (zeropage_enable) {
+               page_reporting_unregister(&zero_page_dev_info);
+
+               zero_page_dev_info.report = zero_free_pages;
+               zero_page_dev_info.mini_order = mini_page_order;
+               zero_page_dev_info.batch_size = batch_size;
+               zero_page_dev_info.delay_jiffies = msecs_to_jiffies(delay_millisecs);
+
+               err = page_reporting_register(&zero_page_dev_info);
+               pr_info("Zero page enabled\n");
+       }
+
+       return err;
+}
+
+static ssize_t enabled_show(struct kobject *kobj,
+                           struct kobj_attribute *attr, char *buf)
+{
+       return sprintf(buf, "%lu\n", zeropage_enable);
+}
+
+static ssize_t enabled_store(struct kobject *kobj,
+                            struct kobj_attribute *attr,
+                            const char *buf, size_t count)
+{
+       ssize_t ret = 0;
+       unsigned long flags;
+       int err;
+
+       err = kstrtoul(buf, 10, &flags);
+       if (err || flags > UINT_MAX)
+               return -EINVAL;
+       if (flags > ZERO_PAGE_RUN)
+               return -EINVAL;
+
+       if (zeropage_enable != flags) {
+               mutex_lock(&kzeropaged_mutex);
+               zeropage_enable = flags;
+               ret = start_kzeropaged();
+               mutex_unlock(&kzeropaged_mutex);
+       }
+
+       return count;
+}
+
+static struct kobj_attribute enabled_attr =
+       __ATTR(enabled, 0644, enabled_show, enabled_store);
+
+static ssize_t batch_size_show(struct kobject *kobj,
+                              struct kobj_attribute *attr, char *buf)
+{
+       return sprintf(buf, "%lu\n", batch_size);
+}
+
+static ssize_t batch_size_store(struct kobject *kobj,
+                               struct kobj_attribute *attr,
+                               const char *buf, size_t count)
+{
+       unsigned long size;
+       int err;
+
+       err = kstrtoul(buf, 10, &size);
+       if (err || size >= UINT_MAX)
+               return -EINVAL;
+
+       batch_size = size;
+
+       restart_kzeropaged();
+       return count;
+}
+
+static struct kobj_attribute batch_size_attr =
+       __ATTR(batch_size, 0644, batch_size_show, batch_size_store);
+
+static ssize_t delay_millisecs_show(struct kobject *kobj,
+                                   struct kobj_attribute *attr, char *buf)
+{
+       return sprintf(buf, "%lu\n", delay_millisecs);
+}
+
+static ssize_t delay_millisecs_store(struct kobject *kobj,
+                                    struct kobj_attribute *attr,
+                                    const char *buf, size_t count)
+{
+       unsigned long msecs;
+       int err;
+
+       err = kstrtoul(buf, 10, &msecs);
+       if (err || msecs >= UINT_MAX)
+               return -EINVAL;
+
+       delay_millisecs = msecs;
+
+       restart_kzeropaged();
+
+       return count;
+}
+
+static struct kobj_attribute wake_delay_millisecs_attr =
+       __ATTR(delay_millisecs, 0644, delay_millisecs_show,
+              delay_millisecs_store);
+
+static ssize_t mini_order_show(struct kobject *kobj,
+                              struct kobj_attribute *attr, char *buf)
+{
+       return sprintf(buf, "%u\n", mini_page_order);
+}
+
+static ssize_t mini_order_store(struct kobject *kobj,
+                               struct kobj_attribute *attr,
+                               const char *buf, size_t count)
+{
+       unsigned int order;
+       int err;
+
+       err = kstrtouint(buf, 10, &order);
+       if (err || order >= MAX_ORDER)
+               return -EINVAL;
+
+       if (mini_page_order != order) {
+               mutex_lock(&kzeropaged_mutex);
+               mini_page_order = order;
+               mutex_unlock(&kzeropaged_mutex);
+       }
+
+       restart_kzeropaged();
+       return count;
+}
+
+static struct kobj_attribute mini_order_attr =
+       __ATTR(mini_order, 0644, mini_order_show, mini_order_store);
+
+static struct attribute *zeropage_attr[] = {
+       &enabled_attr.attr,
+       &mini_order_attr.attr,
+       &wake_delay_millisecs_attr.attr,
+       &batch_size_attr.attr,
+       NULL,
+};
+
+static struct attribute_group zeropage_attr_group = {
+       .attrs = zeropage_attr,
+};
+
+static int __init zeropage_init_sysfs(struct kobject **zeropage_kobj)
+{
+       int err;
+
+       *zeropage_kobj = kobject_create_and_add("zero_page", mm_kobj);
+       if (unlikely(!*zeropage_kobj)) {
+               pr_err("zeropage: failed to create zeropage kobject\n");
+               return -ENOMEM;
+       }
+
+       err = sysfs_create_group(*zeropage_kobj, &zeropage_attr_group);
+       if (err) {
+               pr_err("zeropage: failed to register zeropage group\n");
+               goto delete_obj;
+       }
+
+       return 0;
+
+delete_obj:
+       kobject_put(*zeropage_kobj);
+       return err;
+}
+
+static int __init zeropage_init(void)
+{
+       int err;
+       struct kobject *zeropage_kobj;
+
+       err = zeropage_init_sysfs(&zeropage_kobj);
+       if (err)
+               return err;
+
+       start_kzeropaged();
+
+       return 0;
+}
+subsys_initcall(zeropage_init);
diff --git a/mm/page_prezero.h b/mm/page_prezero.h
new file mode 100644
index 000000000000..292df4b91585
--- /dev/null
+++ b/mm/page_prezero.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PREZERO_PAGE_H
+#define _LINUX_PREZERO_PAGE_H
+
+#ifdef CONFIG_PREZERO_PAGE
+extern void clear_zero_page_flag(struct page *page, int order);
+#else
+static inline void clear_zero_page_flag(struct page *page, int order)
+{
+}
+#endif
+#endif /* _LINUX_PREZERO_PAGE_H */
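
For reference, the knobs created by zeropage_init_sysfs() above end up
under /sys/kernel/mm/zero_page/, since mm_kobj corresponds to
/sys/kernel/mm. The following userspace sketch enables the worker with
tighter settings; it is untested against this RFC and the values are
only examples.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int write_knob(const char *path, const char *val)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0)
                return -1;
        if (write(fd, val, strlen(val)) < 0) {
                close(fd);
                return -1;
        }
        return close(fd);
}

int main(void)
{
        /* zero order >= 2 free pages, 64M per batch, wake once per second */
        write_knob("/sys/kernel/mm/zero_page/mini_order", "2");
        write_knob("/sys/kernel/mm/zero_page/batch_size", "67108864");
        write_knob("/sys/kernel/mm/zero_page/delay_millisecs", "1000");
        return write_knob("/sys/kernel/mm/zero_page/enabled", "1");
}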