From patchwork Wed Jan 6 03:47:18 2021
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 12000903
From: Liang Li
Date: Tue, 5 Jan 2021 22:47:18 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li, Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [PATCH 1/6] mm: Add batch size for free page reporting
Message-ID: <20210106034715.GA1138@open-light-1.localdomain>

Using the page order as the only threshold for free page reporting is
not flexible and has some flaws. Scanning a long free list is not
cheap, so it is better to wake the page reporting worker only once
enough pages have accumulated; waking it for a single page may not be
worth the cost. This patch adds a batch size as an additional
threshold to control when the reporting worker is woken.
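
The check being added boils down to the following standalone sketch
(plain userspace C; notify() is a stand-in for
__page_reporting_notify() and the constants mirror the patch's
defaults, none of this is kernel API):

#include <stdio.h>

#define PAGE_SHIFT	12
#define BATCH_THRESHOLD	(16UL * 1024 * 1024)	/* 16M default, as in the patch */

/* stand-in for __page_reporting_notify() */
static void notify(void)
{
	puts("wake page reporting worker");
}

static void notify_free(unsigned int order)
{
	/* accumulates across calls, like the static counter in the patch */
	static unsigned long batch_bytes;

	batch_bytes += (1UL << order) << PAGE_SHIFT;
	if (batch_bytes >= BATCH_THRESHOLD) {
		batch_bytes = 0;
		notify();
	}
}

int main(void)
{
	/* freeing 16 order-9 (2 MiB) blocks crosses the 16M threshold twice,
	 * so notify() fires twice instead of 16 times */
	for (int i = 0; i < 16; i++)
		notify_free(9);
	return 0;
}

Note that in the patch the counter is a plain static local in an
inline header function, updated without synchronization, so the
threshold is approximate under concurrency (and per translation
unit); a missed or early wakeup only shifts when the worker runs.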
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Cc: Liang Li
Signed-off-by: Liang Li
---
 mm/page_reporting.c |  1 +
 mm/page_reporting.h | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index cd8e13d41df4..694df981ddd2 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -12,6 +12,7 @@
 #define PAGE_REPORTING_DELAY	(2 * HZ)
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
+unsigned long page_report_batch_size __read_mostly = 16 * 1024 * 1024UL;
 
 enum {
 	PAGE_REPORTING_IDLE = 0,
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index 2c385dd4ddbd..b8fb3bbb345f 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -12,6 +12,8 @@
 #define PAGE_REPORTING_MIN_ORDER	pageblock_order
 
+extern unsigned long page_report_batch_size;
+
 #ifdef CONFIG_PAGE_REPORTING
 DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
 void __page_reporting_notify(void);
@@ -33,6 +35,8 @@ static inline bool page_reported(struct page *page)
  */
 static inline void page_reporting_notify_free(unsigned int order)
 {
+	static long batch_size;
+
 	/* Called from hot path in __free_one_page() */
 	if (!static_branch_unlikely(&page_reporting_enabled))
 		return;
@@ -41,8 +45,12 @@ static inline void page_reporting_notify_free(unsigned int order)
 	if (order < PAGE_REPORTING_MIN_ORDER)
 		return;
 
-	/* This will add a few cycles, but should be called infrequently */
-	__page_reporting_notify();
+	batch_size += (1 << order) << PAGE_SHIFT;
+	if (batch_size >= page_report_batch_size) {
+		batch_size = 0;
+		/* This will add a few cycles, but should be called infrequently */
+		__page_reporting_notify();
+	}
 }
 #else /* CONFIG_PAGE_REPORTING */
 #define page_reported(_page)	false

From patchwork Wed Jan 6 03:48:09 2021
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 12000915
From: Liang Li
Date: Tue, 5 Jan 2021 22:48:09 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li, Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [PATCH 2/6] mm: let user decide page reporting option
Message-ID: <20210106034806.GA1146@open-light-1.localdomain>
Tsirkin" , David Hildenbrand , Jason Wang , Dave Hansen , Michal Hocko , Liang Li , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Some key parameters for page reporting are now hard coded, different users of the framework may have their special requirements, make these parameter configrable and let the user decide them. Cc: Alexander Duyck Cc: Mel Gorman Cc: Andrea Arcangeli Cc: Dan Williams Cc: Dave Hansen Cc: David Hildenbrand Cc: Michal Hocko Cc: Andrew Morton Cc: Alex Williamson Cc: Michael S. Tsirkin Cc: Liang Li Signed-off-by: Liang Li --- drivers/virtio/virtio_balloon.c | 3 +++ include/linux/page_reporting.h | 3 +++ mm/page_reporting.c | 13 +++++++++---- mm/page_reporting.h | 6 +++--- 4 files changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 8985fc2cea86..684bcc39ef5a 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -993,6 +993,9 @@ static int virtballoon_probe(struct virtio_device *vdev) goto out_unregister_oom; } + vb->pr_dev_info.mini_order = pageblock_order; + vb->pr_dev_info.batch_size = 16 * 1024 * 1024; /* 16M */ + vb->pr_dev_info.delay_jiffies = 2 * HZ; /* 2 seconds */ err = page_reporting_register(&vb->pr_dev_info); if (err) goto out_unregister_oom; diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h index 3b99e0ec24f2..63e1e9fbcaa2 100644 --- a/include/linux/page_reporting.h +++ b/include/linux/page_reporting.h @@ -13,6 +13,9 @@ struct page_reporting_dev_info { int (*report)(struct page_reporting_dev_info *prdev, struct scatterlist *sg, unsigned int nents); + unsigned long batch_size; + unsigned long delay_jiffies; + int mini_order; /* work struct for processing reports */ struct delayed_work work; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index 694df981ddd2..39bc6a9d7b73 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -13,6 +13,7 @@ #define PAGE_REPORTING_DELAY (2 * HZ) static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly; unsigned long page_report_batch_size __read_mostly = 16 * 1024 * 1024UL; +int page_report_mini_order = pageblock_order; enum { PAGE_REPORTING_IDLE = 0, @@ -44,7 +45,7 @@ __page_reporting_request(struct page_reporting_dev_info *prdev) * now we are limiting this to running no more than once every * couple of seconds. 
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Cc: Liang Li
Signed-off-by: Liang Li
---
 drivers/virtio/virtio_balloon.c |  3 +++
 include/linux/page_reporting.h  |  3 +++
 mm/page_reporting.c             | 13 +++++++++----
 mm/page_reporting.h             |  6 +++---
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8985fc2cea86..684bcc39ef5a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -993,6 +993,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			goto out_unregister_oom;
 		}
 
+		vb->pr_dev_info.mini_order = pageblock_order;
+		vb->pr_dev_info.batch_size = 16 * 1024 * 1024; /* 16M */
+		vb->pr_dev_info.delay_jiffies = 2 * HZ; /* 2 seconds */
 		err = page_reporting_register(&vb->pr_dev_info);
 		if (err)
 			goto out_unregister_oom;
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 3b99e0ec24f2..63e1e9fbcaa2 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -13,6 +13,9 @@ struct page_reporting_dev_info {
 	int (*report)(struct page_reporting_dev_info *prdev,
 		      struct scatterlist *sg, unsigned int nents);
 
+	unsigned long batch_size;
+	unsigned long delay_jiffies;
+	int mini_order;
 
 	/* work struct for processing reports */
 	struct delayed_work work;
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 694df981ddd2..39bc6a9d7b73 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -13,6 +13,7 @@
 #define PAGE_REPORTING_DELAY	(2 * HZ)
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
 unsigned long page_report_batch_size __read_mostly = 16 * 1024 * 1024UL;
+int page_report_mini_order = pageblock_order;
 
 enum {
 	PAGE_REPORTING_IDLE = 0,
@@ -44,7 +45,7 @@ __page_reporting_request(struct page_reporting_dev_info *prdev)
 	 * now we are limiting this to running no more than once every
 	 * couple of seconds.
 	 */
-	schedule_delayed_work(&prdev->work, PAGE_REPORTING_DELAY);
+	schedule_delayed_work(&prdev->work, prdev->delay_jiffies);
 }
 
 /* notify prdev of free page reporting request */
@@ -230,7 +231,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 
 	/* Generate minimum watermark to be able to guarantee progress */
 	watermark = low_wmark_pages(zone) +
-		    (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
+		    (PAGE_REPORTING_CAPACITY << prdev->mini_order);
 
 	/*
 	 * Cancel request if insufficient free memory or if we failed
@@ -240,7 +241,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 		return err;
 
 	/* Process each free list starting from lowest order/mt */
-	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+	for (order = prdev->mini_order; order < MAX_ORDER; order++) {
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
@@ -307,7 +308,7 @@ static void page_reporting_process(struct work_struct *work)
 	 */
 	state = atomic_cmpxchg(&prdev->state, state, PAGE_REPORTING_IDLE);
 	if (state == PAGE_REPORTING_REQUESTED)
-		schedule_delayed_work(&prdev->work, PAGE_REPORTING_DELAY);
+		schedule_delayed_work(&prdev->work, prdev->delay_jiffies);
 }
 
 static DEFINE_MUTEX(page_reporting_mutex);
@@ -335,6 +336,8 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
 	/* Assign device to allow notifications */
 	rcu_assign_pointer(pr_dev_info, prdev);
 
+	page_report_mini_order = prdev->mini_order;
+	page_report_batch_size = prdev->batch_size;
 	/* enable page reporting notification */
 	if (!static_key_enabled(&page_reporting_enabled)) {
 		static_branch_enable(&page_reporting_enabled);
@@ -352,6 +355,8 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
 	mutex_lock(&page_reporting_mutex);
 
 	if (rcu_access_pointer(pr_dev_info) == prdev) {
+		if (static_key_enabled(&page_reporting_enabled))
+			static_branch_disable(&page_reporting_enabled);
 		/* Disable page reporting notification */
 		RCU_INIT_POINTER(pr_dev_info, NULL);
 		synchronize_rcu();
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index b8fb3bbb345f..86ac6ffad970 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -9,9 +9,9 @@
 #include
 #include
 #include
+#include
 
-#define PAGE_REPORTING_MIN_ORDER	pageblock_order
-
+extern int page_report_mini_order;
 extern unsigned long page_report_batch_size;
 
 #ifdef CONFIG_PAGE_REPORTING
@@ -42,7 +42,7 @@ static inline void page_reporting_notify_free(unsigned int order)
 		return;
 
 	/* Determine if we have crossed reporting threshold */
-	if (order < PAGE_REPORTING_MIN_ORDER)
+	if (order < page_report_mini_order)
 		return;
 
 	batch_size += (1 << order) << PAGE_SHIFT;

From patchwork Wed Jan 6 03:49:21 2021
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 12000917
From: Liang Li
Date: Tue, 5 Jan 2021 22:49:21 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li, Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [PATCH 3/6] hugetlb: add free page reporting support
Message-ID: <20210106034918.GA1154@open-light-1.localdomain>

hugetlb manages its pages in the hstate's free page lists, not in the
buddy system. This patch makes free page reporting work for hugetlbfs
as well. It can be used for memory overcommit in virtualization and
for hugetlb page pre-zeroing.
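
The cycle introduced below follows the same fill/report/drain pattern
as the existing buddy path. This standalone sketch (plain userspace C;
report() stands in for prdev->report(), and the two booleans model
isolate_free_huge_page()/putback_isolate_huge_page() plus the
PG_reported flag) shows the shape of one pass over a free list:

#include <stdbool.h>
#include <stdio.h>

#define NPAGES   8
#define CAPACITY 3	/* scatterlist capacity per report() call */

struct hpage {
	int id;
	bool free;	/* on the hstate free list */
	bool reported;	/* already handed to the hypervisor once */
};

/* stand-in for prdev->report(): hand one batch to the hypervisor */
static int report(struct hpage **batch, int n)
{
	printf("report %d page(s):", n);
	for (int i = 0; i < n; i++)
		printf(" %d", batch[i]->id);
	printf("\n");
	return 0;	/* 0 = success, as in the kernel callback */
}

/* drain: put pages back on the free list, flagging them on success */
static void drain(struct hpage **batch, int n, bool ok)
{
	for (int i = 0; i < n; i++) {
		batch[i]->free = true;			/* putback */
		if (ok)
			batch[i]->reported = true;	/* __SetPageReported() */
	}
}

int main(void)
{
	struct hpage pages[NPAGES];
	struct hpage *batch[CAPACITY];
	int fill = 0;

	for (int i = 0; i < NPAGES; i++)
		pages[i] = (struct hpage){ .id = i, .free = true };

	/* fill: walk the free list, skipping pages already reported */
	for (int i = 0; i < NPAGES; i++) {
		if (!pages[i].free || pages[i].reported)
			continue;
		pages[i].free = false;		/* isolate */
		batch[fill++] = &pages[i];
		if (fill == CAPACITY) {		/* report + drain a full batch */
			drain(batch, fill, report(batch, fill) == 0);
			fill = 0;
		}
	}
	/* report any leftover pages before going idle */
	if (fill)
		drain(batch, fill, report(batch, fill) == 0);
	return 0;
}

In the kernel version the isolate/putback steps run under hugetlb_lock,
which is dropped around the report() call itself; that is why the code
below rotates the list head and re-reads the "next" entry after each
report.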
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Cc: Liang Li
Signed-off-by: Liang Li
---
 include/linux/hugetlb.h        |   3 +
 include/linux/page_reporting.h |   4 +
 mm/Kconfig                     |   1 +
 mm/hugetlb.c                   |  19 +++
 mm/page_reporting.c            | 297 +++++++++++++++++++++++++++++++++
 mm/page_reporting.h            |  34 ++++
 6 files changed, 358 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..d55e6a00b3dc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct ctl_table;
 struct user_struct;
@@ -114,6 +115,8 @@ int hugetlb_treat_movable_handler(struct ctl_table *, int, void *, size_t *,
 int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
 
+void isolate_free_huge_page(struct page *page, struct hstate *h, int nid);
+void putback_isolate_huge_page(struct hstate *h, struct page *page);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
 			    struct vm_area_struct *);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, struct vm_area_struct **,
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 63e1e9fbcaa2..884afa2ad70c 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -26,4 +26,8 @@ struct page_reporting_dev_info {
 /* Tear-down and bring-up for page reporting devices */
 void page_reporting_unregister(struct page_reporting_dev_info *prdev);
 int page_reporting_register(struct page_reporting_dev_info *prdev);
+
+/* Tear-down and bring-up for hugepage reporting devices */
+void hugepage_reporting_unregister(struct page_reporting_dev_info *prdev);
+int hugepage_reporting_register(struct page_reporting_dev_info *prdev);
 #endif /*_LINUX_PAGE_REPORTING_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 4275c25b5d8a..630cde982186 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -247,6 +247,7 @@ config COMPACTION
 config PAGE_REPORTING
 	bool "Free page reporting"
 	def_bool n
+	select HUGETLBFS
 	help
 	  Free page reporting allows for the incremental acquisition of
 	  free pages from the buddy allocator for the purpose of reporting
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cbf32d2824fd..eb533995cb49 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include "page_reporting.h"
 #include "internal.h"
 
 int hugetlb_max_hstate __read_mostly;
@@ -1028,6 +1029,9 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	if (hugepage_reported(page))
+		__ClearPageReported(page);
+	hugepage_reporting_notify_free(h->order);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -5531,6 +5535,21 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags)
 	return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
 }
 
+void isolate_free_huge_page(struct page *page, struct hstate *h, int nid)
+{
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+
+	list_move(&page->lru, &h->hugepage_activelist);
+	set_page_refcounted(page);
+}
+
+void putback_isolate_huge_page(struct hstate *h, struct page *page)
+{
+	int nid = page_to_nid(page);
+
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
+}
+
 bool isolate_huge_page(struct page *page, struct list_head *list)
 {
 	bool ret = true;
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 39bc6a9d7b73..cc31696225bb 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -6,15 +6,22 @@
 #include
 #include
 #include
+#include
 
 #include "page_reporting.h"
 #include "internal.h"
 
 #define PAGE_REPORTING_DELAY	(2 * HZ)
+#define MAX_SCAN_NUM		1024
+
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
 unsigned long page_report_batch_size __read_mostly = 16 * 1024 * 1024UL;
 int page_report_mini_order = pageblock_order;
 
+static struct page_reporting_dev_info __rcu *hgpr_dev_info __read_mostly;
+int hugepage_report_mini_order = pageblock_order;
+unsigned long hugepage_report_batch_size = 16 * 1024 * 1024;
+
 enum {
 	PAGE_REPORTING_IDLE = 0,
 	PAGE_REPORTING_REQUESTED,
@@ -66,6 +73,24 @@ void __page_reporting_notify(void)
 	rcu_read_unlock();
 }
 
+/* notify prdev of free hugepage reporting request */
+void __hugepage_reporting_notify(void)
+{
+	struct page_reporting_dev_info *prdev;
+
+	/*
+	 * We use RCU to protect the hgpr_dev_info pointer. In almost all
+	 * cases this should be present, however in the unlikely case of
+	 * a shutdown this will be NULL and we should exit.
+	 */
+	rcu_read_lock();
+	prdev = rcu_dereference(hgpr_dev_info);
+	if (likely(prdev))
+		__page_reporting_request(prdev);
+
+	rcu_read_unlock();
+}
+
 static void
 page_reporting_drain(struct page_reporting_dev_info *prdev,
 		     struct scatterlist *sgl, unsigned int nents, bool reported)
@@ -102,6 +127,221 @@ page_reporting_drain(struct page_reporting_dev_info *prdev,
 	sg_init_table(sgl, nents);
 }
 
+static void
+hugepage_reporting_drain(struct page_reporting_dev_info *prdev,
+			 struct hstate *h, struct scatterlist *sgl,
+			 unsigned int nents, bool reported)
+{
+	struct scatterlist *sg = sgl;
+
+	/*
+	 * Drain the now reported pages back into their respective
+	 * free lists. We assume at least one page is populated.
+	 */
+	do {
+		struct page *page = sg_page(sg);
+
+		putback_isolate_huge_page(h, page);
+
+		/* If the pages were not reported due to error skip flagging */
+		if (!reported)
+			continue;
+
+		__SetPageReported(page);
+	} while ((sg = sg_next(sg)));
+
+	/* reinitialize scatterlist now that it is empty */
+	sg_init_table(sgl, nents);
+}
+
+/*
+ * The page reporting cycle consists of 4 stages, fill, report, drain, and
+ * idle. We will cycle through the first 3 stages until we cannot obtain a
+ * full scatterlist of pages, in that case we will switch to idle.
+ */
+static int
+hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
+			 struct hstate *h, unsigned int nid,
+			 struct scatterlist *sgl, unsigned int *offset,
+			 int max_items)
+{
+	struct list_head *list = &h->hugepage_freelists[nid];
+	unsigned int page_len = PAGE_SIZE << h->order;
+	struct page *page, *next;
+	long budget;
+	int ret = 0, scan_cnt = 0;
+
+	/*
+	 * Perform early check, if free list is empty there is
+	 * nothing to process so we can skip this free list.
+	 */
+	if (list_empty(list))
+		return ret;
+
+	spin_lock(&hugetlb_lock);
+
+	if (huge_page_order(h) > MAX_ORDER - 1)
+		budget = max_items;
+	else
+		budget = DIV_ROUND_UP(h->free_huge_pages_node[nid],
+				      max_items * 16);
+
+	/* loop through free list adding unreported pages to sg list */
+	list_for_each_entry_safe(page, next, list, lru) {
+		/* We are going to skip over the reported pages. */
+		if (PageReported(page)) {
+			if (++scan_cnt >= MAX_SCAN_NUM) {
+				ret = scan_cnt;
+				break;
+			}
+			continue;
+		}
+
+		/*
+		 * If we fully consumed our budget then update our
+		 * state to indicate that we are requesting additional
+		 * processing and exit this list.
+		 */
+		if (budget < 0) {
+			atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED);
+			next = page;
+			break;
+		}
+
+		/* Attempt to pull page from list and place in scatterlist */
+		if (*offset) {
+			isolate_free_huge_page(page, h, nid);
+			/* Add page to scatter list */
+			--(*offset);
+			sg_set_page(&sgl[*offset], page, page_len, 0);
+
+			continue;
+		}
+
+		/*
+		 * Make the first non-processed page in the free list
+		 * the new head of the free list before we release the
+		 * hugetlb lock.
+		 */
+		if (&page->lru != list && !list_is_first(&page->lru, list))
+			list_rotate_to_front(&page->lru, list);
+
+		/* release lock before waiting on report processing */
+		spin_unlock(&hugetlb_lock);
+
+		/* begin processing pages in local list */
+		ret = prdev->report(prdev, sgl, max_items);
+
+		/* reset offset since the full list was reported */
+		*offset = max_items;
+
+		/* update budget to reflect call to report function */
+		budget--;
+
+		/* reacquire hugetlb lock and resume processing */
+		spin_lock(&hugetlb_lock);
+
+		/* flush reported pages from the sg list */
+		hugepage_reporting_drain(prdev, h, sgl, max_items, !ret);
+
+		/*
+		 * Reset next to first entry, the old next isn't valid
+		 * since we dropped the lock to report the pages
+		 */
+		next = list_first_entry(list, struct page, lru);
+
+		/* exit on error */
+		if (ret)
+			break;
+	}
+
+	/* Rotate any leftover pages to the head of the freelist */
+	if (&next->lru != list && !list_is_first(&next->lru, list))
+		list_rotate_to_front(&next->lru, list);
+
+	spin_unlock(&hugetlb_lock);
+
+	return ret;
+}
+
+static int
+hugepage_reporting_process_hstate(struct page_reporting_dev_info *prdev,
+				  struct scatterlist *sgl, int max_items,
+				  struct hstate *h)
+{
+	unsigned int leftover, offset;
+	int ret = 0, nid;
+
+	offset = max_items;
+	for (nid = 0; nid < MAX_NUMNODES; nid++) {
+		ret = hugepage_reporting_cycle(prdev, h, nid, sgl, &offset,
+					       max_items);
+
+		if (ret < 0)
+			return ret;
+	}
+
+	/* report the leftover pages before going idle */
+	leftover = max_items - offset;
+	if (leftover) {
+		sgl = &sgl[offset];
+		ret = prdev->report(prdev, sgl, leftover);
+
+		/* flush any remaining pages out from the last report */
+		spin_lock(&hugetlb_lock);
+		hugepage_reporting_drain(prdev, h, sgl, leftover, !ret);
+		spin_unlock(&hugetlb_lock);
+	}
+
+	return ret;
+}
+
+static void hugepage_reporting_process(struct work_struct *work)
+{
+	struct delayed_work *d_work = to_delayed_work(work);
+	struct page_reporting_dev_info *prdev = container_of(d_work,
+			struct page_reporting_dev_info, work);
+	int err = 0, state = PAGE_REPORTING_ACTIVE;
+	struct scatterlist *sgl;
+	struct hstate *h;
+
+	/*
+	 * Change the state to "Active" so that we can track if there is
+	 * anyone requests page reporting after we complete our pass. If
+	 * the state is not altered by the end of the pass we will switch
+	 * to idle and quit scheduling reporting runs.
+	 */
+	atomic_set(&prdev->state, state);
+
+	/* allocate scatterlist to store pages being reported on */
+	sgl = kmalloc_array(PAGE_REPORTING_CAPACITY, sizeof(*sgl), GFP_KERNEL);
+	if (!sgl)
+		goto err_out;
+
+	for_each_hstate(h) {
+		int max_items;
+
+		if (huge_page_order(h) > MAX_ORDER - 1)
+			max_items = 1;
+		else
+			max_items = PAGE_REPORTING_CAPACITY;
+
+		sg_init_table(sgl, max_items);
+
+		err = hugepage_reporting_process_hstate(prdev, sgl, max_items, h);
+		if (err)
+			break;
+	}
+
+	kfree(sgl);
+err_out:
+	/*
+	 * If the state has reverted back to requested then there may be
+	 * additional pages to be processed. We will defer sometime to allow
+	 * more pages to accumulate.
+	 */
+	state = atomic_cmpxchg(&prdev->state, state, PAGE_REPORTING_IDLE);
+	if (state == PAGE_REPORTING_REQUESTED)
+		schedule_delayed_work(&prdev->work, prdev->delay_jiffies);
+}
+
 /*
  * The page reporting cycle consists of 4 stages, fill, report, drain, and
  * idle. We will cycle through the first 3 stages until we cannot obtain a
@@ -314,6 +554,9 @@ static void page_reporting_process(struct work_struct *work)
 
 static DEFINE_MUTEX(page_reporting_mutex);
 DEFINE_STATIC_KEY_FALSE(page_reporting_enabled);
 
+static DEFINE_MUTEX(hugepage_reporting_mutex);
+DEFINE_STATIC_KEY_FALSE(hugepage_reporting_enabled);
+
 int page_reporting_register(struct page_reporting_dev_info *prdev)
 {
 	int err = 0;
@@ -368,3 +611,57 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
 	mutex_unlock(&page_reporting_mutex);
 }
 EXPORT_SYMBOL_GPL(page_reporting_unregister);
+
+int hugepage_reporting_register(struct page_reporting_dev_info *prdev)
+{
+	int err = 0;
+
+	mutex_lock(&hugepage_reporting_mutex);
+
+	/* nothing to do if already in use */
+	if (rcu_access_pointer(hgpr_dev_info)) {
+		err = -EBUSY;
+		goto err_out;
+	}
+
+	/* initialize state and work structures */
+	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
+	INIT_DELAYED_WORK(&prdev->work, &hugepage_reporting_process);
+
+	/* Begin initial flush of zones */
+	__page_reporting_request(prdev);
+
+	/* Assign device to allow notifications */
+	rcu_assign_pointer(hgpr_dev_info, prdev);
+
+	hugepage_report_mini_order = prdev->mini_order;
+	hugepage_report_batch_size = prdev->batch_size;
+
+	/* enable hugepage reporting notification */
+	if (!static_key_enabled(&hugepage_reporting_enabled))
+		static_branch_enable(&hugepage_reporting_enabled);
+err_out:
+	mutex_unlock(&hugepage_reporting_mutex);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(hugepage_reporting_register);
+
+void hugepage_reporting_unregister(struct page_reporting_dev_info *prdev)
+{
+	mutex_lock(&hugepage_reporting_mutex);
+
+	if (rcu_access_pointer(hgpr_dev_info) == prdev) {
+		if (static_key_enabled(&hugepage_reporting_enabled))
+			static_branch_disable(&hugepage_reporting_enabled);
+		/* Disable huge page reporting notification */
+		RCU_INIT_POINTER(hgpr_dev_info, NULL);
+		synchronize_rcu();
+
+		/* Flush any existing work, and lock it out */
+		cancel_delayed_work_sync(&prdev->work);
+	}
+
+	mutex_unlock(&hugepage_reporting_mutex);
+}
+EXPORT_SYMBOL_GPL(hugepage_reporting_unregister);
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index 86ac6ffad970..f0c504b29722 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -18,12 +18,24 @@ extern unsigned long page_report_batch_size;
 DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
 void __page_reporting_notify(void);
 
+extern int hugepage_report_mini_order;
+extern unsigned long hugepage_report_batch_size;
+
+DECLARE_STATIC_KEY_FALSE(hugepage_reporting_enabled);
+void __hugepage_reporting_notify(void);
+
 static inline bool page_reported(struct page *page)
 {
 	return static_branch_unlikely(&page_reporting_enabled) &&
 	       PageReported(page);
 }
 
+static inline bool hugepage_reported(struct page *page)
+{
+	return static_branch_unlikely(&hugepage_reporting_enabled) &&
+	       PageReported(page);
+}
+
 /**
  * page_reporting_notify_free - Free page notification to start page processing
  *
@@ -52,11 +64,33 @@ static inline void page_reporting_notify_free(unsigned int order)
 		__page_reporting_notify();
 	}
 }
+
+static inline void hugepage_reporting_notify_free(unsigned int order)
+{
+	static long batch_size;
+
+	if (!static_branch_unlikely(&hugepage_reporting_enabled))
+		return;
+
+	/* Determine if we have crossed reporting threshold */
+	if (order < hugepage_report_mini_order)
+		return;
+
+	batch_size += (1 << order) << PAGE_SHIFT;
+	if (batch_size >= hugepage_report_batch_size) {
+		batch_size = 0;
+		__hugepage_reporting_notify();
+	}
+}
 #else /* CONFIG_PAGE_REPORTING */
 #define page_reported(_page)	false
 
 static inline void page_reporting_notify_free(unsigned int order)
 {
 }
+
+static inline void hugepage_reporting_notify_free(unsigned int order)
+{
+}
 #endif /* CONFIG_PAGE_REPORTING */
 #endif /*_MM_PAGE_REPORTING_H */

From patchwork Wed Jan 6 03:50:31 2021
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 12000919
From: Liang Li
Date: Tue, 5 Jan 2021 22:50:31 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li, Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [PATCH 4/6] hugetlb: avoid allocation failure while page reporting is in progress
Message-ID: <20210106035027.GA1160@open-light-1.localdomain>

Page reporting temporarily isolates free pages while it reports them.
This reduces the number of pages that are actually free and may cause
an application's allocation to fail even though enough memory would
otherwise be available. This patch addresses the issue: when no free
page is left and page reporting is in progress, wait until it is done.
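
The fix uses a mutex purely as a barrier: the reporting worker holds
h->mtx_prezero for the whole pass, and a starved allocator performs an
empty lock/unlock pair to block until the pass completes. A compilable
pthreads sketch of the idea (in the kernel the counters live in struct
hstate and are read under hugetlb_lock; plain ints are used here only
to keep the sketch short):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx_prezero = PTHREAD_MUTEX_INITIALIZER;
static int free_huge_pages = 1;
static int isolated_huge_pages = 1;

static void *reporting_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mtx_prezero);	/* held for the whole pass */
	sleep(1);				/* pretend to report/zero pages */
	isolated_huge_pages = 0;
	free_huge_pages = 2;
	pthread_mutex_unlock(&mtx_prezero);
	return NULL;
}

static void alloc_huge_page(void)
{
	/* mirrors the patch: if we are nearly out of free pages while pages
	 * are isolated, take and drop the mutex to wait for the pass to end */
	while (free_huge_pages <= 1 && isolated_huge_pages) {
		pthread_mutex_lock(&mtx_prezero);
		pthread_mutex_unlock(&mtx_prezero);
	}
	printf("allocated, %d free page(s) left\n", --free_huge_pages);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, reporting_worker, NULL);
	usleep(100 * 1000);	/* let the worker grab the mutex first */
	alloc_huge_page();
	pthread_join(t, NULL);
	return 0;
}

The empty lock/unlock works because mutex acquisition orders the
waiter behind the in-flight pass; it can spin if reporting has not yet
taken the mutex, which is why the in-kernel loop re-checks the
counters under hugetlb_lock on each iteration.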
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Cc: Liang Li
Signed-off-by: Liang Li
---
 include/linux/hugetlb.h | 2 ++
 mm/hugetlb.c            | 9 +++++++++
 mm/page_reporting.c     | 6 +++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d55e6a00b3dc..73b2934ba91c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -490,6 +490,7 @@ struct hstate {
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
+	unsigned long isolated_huge_pages;
 	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
@@ -500,6 +501,7 @@ struct hstate {
 	struct cftype cgroup_files_dfl[7];
 	struct cftype cgroup_files_legacy[9];
 #endif
+	struct mutex mtx_prezero;
 	char name[HSTATE_NAME_LEN];
 };
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eb533995cb49..0fccd5f96954 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2320,6 +2320,12 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 		goto out_uncharge_cgroup_reservation;
 
 	spin_lock(&hugetlb_lock);
+	while (h->free_huge_pages <= 1 && h->isolated_huge_pages) {
+		spin_unlock(&hugetlb_lock);
+		mutex_lock(&h->mtx_prezero);
+		mutex_unlock(&h->mtx_prezero);
+		spin_lock(&hugetlb_lock);
+	}
 	/*
 	 * glb_chg is passed to indicate whether or not a page must be taken
 	 * from the global free pool (global change). gbl_chg == 0 indicates
@@ -3208,6 +3214,7 @@ void __init hugetlb_add_hstate(unsigned int order)
 	INIT_LIST_HEAD(&h->hugepage_activelist);
 	h->next_nid_to_alloc = first_memory_node;
 	h->next_nid_to_free = first_memory_node;
+	mutex_init(&h->mtx_prezero);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 		 huge_page_size(h)/1024);
 
@@ -5541,6 +5548,7 @@ void isolate_free_huge_page(struct page *page, struct hstate *h, int nid)
 
 	list_move(&page->lru, &h->hugepage_activelist);
 	set_page_refcounted(page);
+	h->isolated_huge_pages++;
 }
 
 void putback_isolate_huge_page(struct hstate *h, struct page *page)
@@ -5548,6 +5556,7 @@ void putback_isolate_huge_page(struct hstate *h, struct page *page)
 	int nid = page_to_nid(page);
 
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
+	h->isolated_huge_pages--;
 }
 
 bool isolate_huge_page(struct page *page, struct list_head *list)
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index cc31696225bb..99e1e688d7c1 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -272,12 +272,15 @@ hugepage_reporting_process_hstate(struct page_reporting_dev_info *prdev,
 	int ret = 0, nid;
 
 	offset = max_items;
+	mutex_lock(&h->mtx_prezero);
 	for (nid = 0; nid < MAX_NUMNODES; nid++) {
 		ret = hugepage_reporting_cycle(prdev, h, nid, sgl, &offset,
 					       max_items);
 
-		if (ret < 0)
+		if (ret < 0) {
+			mutex_unlock(&h->mtx_prezero);
 			return ret;
+		}
 	}
 
 	/* report the leftover pages before going idle */
@@ -291,6 +294,7 @@ hugepage_reporting_process_hstate(struct page_reporting_dev_info *prdev,
 		hugepage_reporting_drain(prdev, h, sgl, leftover, !ret);
 		spin_unlock(&hugetlb_lock);
 	}
+	mutex_unlock(&h->mtx_prezero);
 
 	return ret;
 }

From patchwork Wed Jan 6 03:51:13 2021
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 12000921
From: Liang Li
Date: Tue, 5 Jan 2021 22:51:13 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li, Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org
Subject: [PATCH 5/6] virtio-balloon: report free hugetlb pages to host
Message-ID: <20210106035110.GA1170@open-light-1.localdomain>

Free page reporting only supports buddy pages; it cannot report the
free pages reserved for hugetlbfs. At the same time, hugetlbfs is a
good choice for a system with a large amount of RAM, because it helps
to reduce memory management overhead and improves system performance.
This patch adds support for reporting free hugepages to the host when
the guest uses hugetlbfs.
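
Both reporters feed the single reporting virtqueue, which is why the
patch below introduces vb->mtx_report around the add-buffer/kick/wait
sequence. A minimal pthreads sketch of that serialization
(queue_report() abstracts the virtqueue handshake and is not a virtio
API):

#include <pthread.h>
#include <stdio.h>

/* two reporters must not interleave their use of the shared queue;
 * vb->mtx_report in the patch plays the role of this mutex */
static pthread_mutex_t mtx_report = PTHREAD_MUTEX_INITIALIZER;

static int queue_report(const char *who, int nents)
{
	pthread_mutex_lock(&mtx_report);
	/* add buffers to the shared queue, kick, wait for the host ack */
	printf("%s: queued %d entries, kicked, waiting for ack\n", who, nents);
	pthread_mutex_unlock(&mtx_report);
	return 0;
}

static void *page_reporter(void *arg)
{
	(void)arg;
	queue_report("4K page reporter", 32);
	return NULL;
}

static void *hugepage_reporter(void *arg)
{
	(void)arg;
	queue_report("hugepage reporter", 1);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, page_reporter, NULL);
	pthread_create(&b, NULL, hugepage_reporter, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Without the lock, the hugepage reporter's buffers could be queued
between the 4K reporter's virtqueue_add_inbuf() and its wait, and each
caller could consume the other's completion in virtqueue_get_buf().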
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Cc: Liang Li
Signed-off-by: Liang Li
---
 drivers/virtio/virtio_balloon.c | 55 +++++++++++++++++++++++++++++++--
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 684bcc39ef5a..7bd7fcacee8c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -126,6 +126,10 @@ struct virtio_balloon {
 	/* Free page reporting device */
 	struct virtqueue *reporting_vq;
 	struct page_reporting_dev_info pr_dev_info;
+
+	/* Free hugepage reporting device */
+	struct page_reporting_dev_info hpr_dev_info;
+	struct mutex mtx_report;
 };
 
 static const struct virtio_device_id id_table[] = {
@@ -173,6 +177,38 @@ static int virtballoon_free_page_report(struct page_reporting_dev_info *pr_dev_info,
 	struct virtqueue *vq = vb->reporting_vq;
 	unsigned int unused, err;
 
+	mutex_lock(&vb->mtx_report);
+	/* We should always be able to add these buffers to an empty queue. */
+	err = virtqueue_add_inbuf(vq, sg, nents, vb, GFP_NOWAIT | __GFP_NOWARN);
+
+	/*
+	 * In the extremely unlikely case that something has occurred and we
+	 * are able to trigger an error we will simply display a warning
+	 * and exit without actually processing the pages.
+	 */
+	if (WARN_ON_ONCE(err)) {
+		mutex_unlock(&vb->mtx_report);
+		return err;
+	}
+
+	virtqueue_kick(vq);
+
+	/* When host has read buffer, this completes via balloon_ack */
+	wait_event(vb->acked, virtqueue_get_buf(vq, &unused));
+	mutex_unlock(&vb->mtx_report);
+
+	return 0;
+}
+
+static int virtballoon_free_hugepage_report(struct page_reporting_dev_info *hpr_dev_info,
+					    struct scatterlist *sg, unsigned int nents)
+{
+	struct virtio_balloon *vb =
+		container_of(hpr_dev_info, struct virtio_balloon, hpr_dev_info);
+	struct virtqueue *vq = vb->reporting_vq;
+	unsigned int unused, err;
+
+	mutex_lock(&vb->mtx_report);
 	/* We should always be able to add these buffers to an empty queue. */
 	err = virtqueue_add_inbuf(vq, sg, nents, vb, GFP_NOWAIT | __GFP_NOWARN);
 
@@ -181,13 +217,16 @@ static int virtballoon_free_page_report(struct page_reporting_dev_info *pr_dev_info,
 	 * are able to trigger an error we will simply display a warning
 	 * and exit without actually processing the pages.
 	 */
-	if (WARN_ON_ONCE(err))
+	if (WARN_ON_ONCE(err)) {
+		mutex_unlock(&vb->mtx_report);
 		return err;
+	}
 
 	virtqueue_kick(vq);
 
 	/* When host has read buffer, this completes via balloon_ack */
 	wait_event(vb->acked, virtqueue_get_buf(vq, &unused));
+	mutex_unlock(&vb->mtx_report);
 
 	return 0;
 }
@@ -984,9 +1023,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	}
 
 	vb->pr_dev_info.report = virtballoon_free_page_report;
+	vb->hpr_dev_info.report = virtballoon_free_hugepage_report;
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
 		unsigned int capacity;
 
+		mutex_init(&vb->mtx_report);
 		capacity = virtqueue_get_vring_size(vb->reporting_vq);
 		if (capacity < PAGE_REPORTING_CAPACITY) {
 			err = -ENOSPC;
@@ -999,6 +1040,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		err = page_reporting_register(&vb->pr_dev_info);
 		if (err)
 			goto out_unregister_oom;
+
+		vb->hpr_dev_info.mini_order = MAX_ORDER - 1;
+		vb->hpr_dev_info.batch_size = 16 * 1024 * 1024; /* 16M */
+		vb->hpr_dev_info.delay_jiffies = 2 * HZ; /* 2 seconds */
+		err = hugepage_reporting_register(&vb->hpr_dev_info);
+		if (err)
+			goto out_unregister_oom;
+
 	}
 
 	virtio_device_ready(vdev);
@@ -1051,8 +1100,10 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
 		page_reporting_unregister(&vb->pr_dev_info);
+		hugepage_reporting_unregister(&vb->hpr_dev_info);
+	}
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		unregister_oom_notifier(&vb->oom_nb);
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))

From patchwork Wed Jan 6 03:52:09 2021
X-Patchwork-Submitter: Liang Li
X-Patchwork-Id: 12000923
[205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6A73122D04 for ; Wed, 6 Jan 2021 03:52:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A73122D04 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E4EF48D00E4; Tue, 5 Jan 2021 22:52:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E23CE8D0090; Tue, 5 Jan 2021 22:52:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D3AD38D00E4; Tue, 5 Jan 2021 22:52:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0165.hostedemail.com [216.40.44.165]) by kanga.kvack.org (Postfix) with ESMTP id BECDE8D0090 for ; Tue, 5 Jan 2021 22:52:14 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 81D47180AD802 for ; Wed, 6 Jan 2021 03:52:14 +0000 (UTC) X-FDA: 77673977388.07.river12_401610c274de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 650EA1803F9B0 for ; Wed, 6 Jan 2021 03:52:14 +0000 (UTC) X-HE-Tag: river12_401610c274de X-Filterd-Recvd-Size: 14794 Received: from mail-pj1-f52.google.com (mail-pj1-f52.google.com [209.85.216.52]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Wed, 6 Jan 2021 03:52:13 +0000 (UTC) Received: by mail-pj1-f52.google.com with SMTP id v1so885083pjr.2 for ; Tue, 05 Jan 2021 19:52:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:date:to:cc:subject:message-id:mail-followup-to:mime-version :content-disposition:user-agent; bh=YXvVktv0fPXXQzAccQWu9oQX6wR5ZBzr4yUWR5mZn8E=; b=E+TqYkQ+vvGm52XKLcj7pzx/AoMoyjmhZionMWvB+iD2nGhN5HS2IXBsK0zsx48zld zXcZh0S3wExZQCQ90cZ1G7D/fmDZM3qxk72XPMvMbLoqjpR80jhTBVBuSVpTf2JXoAvm TBHyq0uqWJXFIW0qW/ZB9O6FUdO8i3Yj/9EPbaZWrGcZ/SzgCau2KsffScsl9EH/pv3Z O6sIpIGdP3opqIhvC8N6Bpxecb4GRYil97DjiX6lGs8SzC5kWmiCW7gf4PIlSmS/gVpk JNxtNxehprc5qZftfDr7Ru35WC6Zgp3MygwCulVKZZDxupbc7wvSmNmEbHFsXteaydHl sGhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:date:to:cc:subject:message-id :mail-followup-to:mime-version:content-disposition:user-agent; bh=YXvVktv0fPXXQzAccQWu9oQX6wR5ZBzr4yUWR5mZn8E=; b=dS2Icogc4x9WZWKqFG6ctNt40laeMnoZrP2Wc4QBNlNIMinaUXh0HuQ6ujMKSus+8Y 9xMWyQndYw6SitHJUL6frBEL1JKutVhW9nJSrgzSz1vTAOao5x/dC7c6Fm1Mdb1sSdSi 53Zx3Mf6YHkKlCnRpvLMBnNNLeozyw5CVxf6akWXoBKP31IKljcCvuM7Pj3ppsEIQpRF uihwipJq9vTM5j98Gw88Vm1eXn1dIe6qWGS1XM961nVV+lRMRqyhJq4nXJfdts7KS+OY injT1dTsNi4zbW3jxkPmMYok+qD6NCqHKXHKM3sT8mampa1kCgfHZu2ym6CgPeB87Xjh KcHw== X-Gm-Message-State: AOAM531Ci2xXxdy4yheoC2omscZ9SYxtoOrbe7fRirigCgV2cctw/lSc HjLH1a8UDe0d7+KVm7M63fg= X-Google-Smtp-Source: ABdhPJzv1+2srOGnaXnQG0stKz8wkd2+RJLuqGpq6an9LW/JaeXEr9uy1iQyEuOrnTNbFnyUiaJXbw== X-Received: by 2002:a17:902:a60c:b029:da:e036:5dbe with SMTP id u12-20020a170902a60cb02900dae0365dbemr2616785plq.43.1609905133063; Tue, 05 Jan 2021 19:52:13 -0800 (PST) Received: from open-light-1.localdomain (66.98.113.28.16clouds.com. 
[66.98.113.28]) by smtp.gmail.com with ESMTPSA id b11sm724605pfr.38.2021.01.05.19.52.11 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 05 Jan 2021 19:52:12 -0800 (PST) From: Liang Li X-Google-Original-From: Liang Li Date: Tue, 5 Jan 2021 22:52:09 -0500 To: Alexander Duyck , Mel Gorman , Andrew Morton , Andrea Arcangeli , Dan Williams , "Michael S. Tsirkin" , David Hildenbrand , Jason Wang , Dave Hansen , Michal Hocko , Liang Li , Liang Li , Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org Subject: [PATCH 6/6] hugetlb: support free hugepage pre zero out Message-ID: <20210106035206.GA1183@open-light-1.localdomain> Mail-Followup-To: Alexander Duyck , Mel Gorman , Andrew Morton , Andrea Arcangeli , Dan Williams , "Michael S. Tsirkin" , David Hildenbrand , Jason Wang , Dave Hansen , Michal Hocko , Liang Li , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch add support of pre zero out free hugepage, we can use this feature to speed up page population and page fault handing. Cc: Alexander Duyck Cc: Mel Gorman Cc: Andrea Arcangeli Cc: Dan Williams Cc: Dave Hansen Cc: David Hildenbrand Cc: Michal Hocko Cc: Andrew Morton Cc: Alex Williamson Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Liang Li Signed-off-by: Liang Li --- include/linux/page-flags.h | 12 ++ mm/Kconfig | 10 ++ mm/huge_memory.c | 3 +- mm/hugetlb.c | 243 +++++++++++++++++++++++++++++++++++++ mm/memory.c | 4 + 5 files changed, 271 insertions(+), 1 deletion(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index ec5d0290e0ee..f177c5e85632 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -173,6 +173,9 @@ enum pageflags { /* Only valid for buddy pages. Used to track pages that are reported */ PG_reported = PG_uptodate, + + /* Only valid for hugetlb pages. Used to mark zero pages */ + PG_zero = PG_slab, }; #ifndef __GENERATING_BOUNDS_H @@ -451,6 +454,15 @@ PAGEFLAG(Idle, idle, PF_ANY) */ __PAGEFLAG(Reported, reported, PF_NO_COMPOUND) +/* + * PageZero() is used to track hugetlb free pages within the free list + * of hugetlbfs. We can use the non-atomic version of the test and set + * operations as both should be shielded with the hugetlb lock to prevent + * any possible races on the setting or clearing of the bit. + */ +__PAGEFLAG(Zero, zero, PF_ONLY_HEAD) + + /* * On an anonymous page mapped into a user virtual memory area, * page->mapping points to its anon_vma, not to a struct address_space; diff --git a/mm/Kconfig b/mm/Kconfig index 630cde982186..1d91e182825d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -254,6 +254,16 @@ config PAGE_REPORTING those pages to another entity, such as a hypervisor, so that the memory can be freed within the host for other uses. 
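Before the diff itself, a sketch of the fast path this patch is after
may help. When the pre-zero worker has already cleared a hugepage while
it sat in the free list, the allocation/fault path can consume the
PG_zero flag instead of clearing the page again; for a 1 GiB hugepage
that skips clearing 262,144 4 KiB base pages at fault time. The
function below is illustrative only; the real hook is the PageZero()
check added to clear_huge_page() in the mm/memory.c hunk at the end of
this patch:

/* Illustrative only: mirrors the clear_huge_page() change below. */
static void clear_huge_page_sketch(struct page *page,
				   unsigned int pages_per_huge_page)
{
	unsigned int i;

	if (PageZero(page)) {		/* zeroed by the worker while free */
		__ClearPageZero(page);	/* flag is consumed on first use */
		return;
	}

	for (i = 0; i < pages_per_huge_page; i++) {
		cond_resched();
		clear_highpage(page + i);
	}
}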
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ec5d0290e0ee..f177c5e85632 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -173,6 +173,9 @@ enum pageflags {
 
 	/* Only valid for buddy pages. Used to track pages that are reported */
 	PG_reported = PG_uptodate,
+
+	/* Only valid for hugetlb pages. Used to mark zero pages */
+	PG_zero = PG_slab,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -451,6 +454,15 @@ PAGEFLAG(Idle, idle, PF_ANY)
  */
 __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
 
+/*
+ * PageZero() is used to track hugetlb free pages within the free list
+ * of hugetlbfs. We can use the non-atomic version of the test and set
+ * operations as both should be shielded with the hugetlb lock to prevent
+ * any possible races on the setting or clearing of the bit.
+ */
+__PAGEFLAG(Zero, zero, PF_ONLY_HEAD)
+
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
diff --git a/mm/Kconfig b/mm/Kconfig
index 630cde982186..1d91e182825d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -254,6 +254,16 @@ config PAGE_REPORTING
 	  those pages to another entity, such as a hypervisor, so that the
 	  memory can be freed within the host for other uses.
 
+#
+# support for pre zero out hugetlbfs free page
+config PREZERO_HPAGE
+	bool "Pre zero out hugetlbfs free page"
+	def_bool n
+	depends on PAGE_REPORTING
+	help
+	  Allows pre-zeroing of hugetlbfs free pages in the freelist,
+	  based on free page reporting.
+
 #
 # support for page migration
 #
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9237976abe72..4ff99724d669 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2407,7 +2407,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
 #ifdef CONFIG_64BIT
 			 (1L << PG_arch_2) |
 #endif
-			 (1L << PG_dirty)));
+			 (1L << PG_dirty) |
+			 (1L << PG_zero)));
 
 	/* ->mapping in first tail page is compound_mapcount */
 	VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0fccd5f96954..2029668a0864 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1029,6 +1029,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	__ClearPageZero(page);
 	if (hugepage_reported(page))
 		__ClearPageReported(page);
 	hugepage_reporting_notify_free(h->order);
@@ -1315,6 +1316,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
 	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
 	set_page_refcounted(page);
+	__ClearPageZero(page);
 	if (hstate_is_gigantic(h)) {
 		/*
 		 * Temporarily drop the hugetlb_lock, because
@@ -2963,6 +2965,237 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 	return retval;
 }
 
+#ifdef CONFIG_PREZERO_HPAGE
+
+#define PRE_ZERO_STOP	0
+#define PRE_ZERO_RUN	1
+
+static int mini_page_order;
+static unsigned long batch_size = 16 * 1024 * 1024;
+static unsigned long delay_millisecs = 1000;
+static unsigned long prezero_enable __read_mostly;
+static DEFINE_MUTEX(kprezerod_mutex);
+static struct page_reporting_dev_info pre_zero_hpage_dev_info;
+
+static int zero_out_pages(struct page_reporting_dev_info *pr_dev_info,
+			  struct scatterlist *sgl, unsigned int nents)
+{
+	struct scatterlist *sg = sgl;
+
+	might_sleep();
+	do {
+		struct page *page = sg_page(sg);
+		unsigned int i, order = get_order(sg->length);
+
+		VM_BUG_ON(PageBuddy(page) || buddy_order(page));
+
+		if (PageZero(page))
+			continue;
+		for (i = 0; i < (1 << order); i++) {
+			cond_resched();
+			clear_highpage(page + i);
+		}
+		__SetPageZero(page);
+	} while ((sg = sg_next(sg)));
+
+	return 0;
+}
+
+static int start_kprezerod(void)
+{
+	int err = 0;
+
+	if (prezero_enable == PRE_ZERO_RUN) {
+		pre_zero_hpage_dev_info.report = zero_out_pages;
+		pre_zero_hpage_dev_info.mini_order = mini_page_order;
+		pre_zero_hpage_dev_info.batch_size = batch_size;
+		pre_zero_hpage_dev_info.delay_jiffies = msecs_to_jiffies(delay_millisecs);
+
+		err = hugepage_reporting_register(&pre_zero_hpage_dev_info);
+		pr_info("Pre zero hugepage enabled\n");
+	} else {
+		hugepage_reporting_unregister(&pre_zero_hpage_dev_info);
+		pr_info("Pre zero hugepage disabled\n");
+	}
+
+	return err;
+}
+
+static int restart_kprezerod(void)
+{
+	int err = 0;
+
+	mutex_lock(&kprezerod_mutex);
+	if (prezero_enable == PRE_ZERO_RUN) {
+		hugepage_reporting_unregister(&pre_zero_hpage_dev_info);
+
+		pre_zero_hpage_dev_info.report = zero_out_pages;
+		pre_zero_hpage_dev_info.mini_order = mini_page_order;
+		pre_zero_hpage_dev_info.batch_size = batch_size;
+		pre_zero_hpage_dev_info.delay_jiffies = msecs_to_jiffies(delay_millisecs);
+
+		err = hugepage_reporting_register(&pre_zero_hpage_dev_info);
+	}
+	mutex_unlock(&kprezerod_mutex);
+
+	return err;
+}
+
+static ssize_t enabled_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", prezero_enable);
+}
+
+static ssize_t enabled_store(struct kobject *kobj,
+			     struct kobj_attribute *attr,
+			     const char *buf, size_t count)
+{
+	ssize_t ret = 0;
+	unsigned long flags;
+	int err;
+
+	err = kstrtoul(buf, 10, &flags);
+	if (err || flags > UINT_MAX)
+		return -EINVAL;
+	if (flags > PRE_ZERO_RUN)
+		return -EINVAL;
+
+	mutex_lock(&kprezerod_mutex);
+	if (prezero_enable != flags) {
+		prezero_enable = flags;
+		ret = start_kprezerod();
+	}
+	mutex_unlock(&kprezerod_mutex);
+
+	return count;
+}
+
+static struct kobj_attribute enabled_attr =
+	__ATTR(enabled, 0644, enabled_show, enabled_store);
+
+
+static ssize_t batch_size_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", batch_size);
+}
+
+static ssize_t batch_size_store(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
+{
+	unsigned long size;
+	int err;
+
+	err = kstrtoul(buf, 10, &size);
+	if (err || size >= UINT_MAX)
+		return -EINVAL;
+
+	batch_size = size;
+
+	restart_kprezerod();
+	return count;
+}
+
+static struct kobj_attribute batch_size_attr =
+	__ATTR(batch_size, 0644, batch_size_show, batch_size_store);
+
+static ssize_t delay_millisecs_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", delay_millisecs);
+}
+
+static ssize_t delay_millisecs_store(struct kobject *kobj,
+				     struct kobj_attribute *attr,
+				     const char *buf, size_t count)
+{
+	unsigned long msecs;
+	int err;
+
+	err = kstrtoul(buf, 10, &msecs);
+	if (err || msecs >= UINT_MAX)
+		return -EINVAL;
+
+	delay_millisecs = msecs;
+
+	restart_kprezerod();
+
+	return count;
+}
+
+static struct kobj_attribute wake_delay_millisecs_attr =
+	__ATTR(delay_millisecs, 0644, delay_millisecs_show,
+	       delay_millisecs_store);
+
+static ssize_t mini_order_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", mini_page_order);
+}
+
+static ssize_t mini_order_store(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
+{
+	unsigned int order;
+	int err;
+
+	err = kstrtouint(buf, 10, &order);
+	if (err || order >= MAX_ORDER)
+		return -EINVAL;
+
+	if (mini_page_order != order) {
+		mutex_lock(&kprezerod_mutex);
+		mini_page_order = order;
+		mutex_unlock(&kprezerod_mutex);
+	}
+
+	restart_kprezerod();
+	return count;
+}
+
+static struct kobj_attribute mini_order_attr =
+	__ATTR(mini_order, 0644, mini_order_show, mini_order_store);
+
+static struct attribute *pre_zero_attr[] = {
+	&enabled_attr.attr,
+	&mini_order_attr.attr,
+	&wake_delay_millisecs_attr.attr,
+	&batch_size_attr.attr,
+	NULL,
+};
+
+static struct attribute_group pre_zero_attr_group = {
+	.attrs = pre_zero_attr,
+};
+
+static int __init zeropage_init_sysfs(struct kobject *parent_kobj)
+{
+	int err;
+	struct kobject *pre_zero_kobj;
+
+	pre_zero_kobj = kobject_create_and_add("pre_zero", parent_kobj);
+	if (unlikely(!pre_zero_kobj)) {
+		pr_err("pre_zero: failed to create pre_zero kobject\n");
+		return -ENOMEM;
+	}
+
+	err = sysfs_create_group(pre_zero_kobj, &pre_zero_attr_group);
+	if (err) {
+		pr_err("pre_zero: failed to register pre_zero group\n");
+		goto delete_obj;
+	}
+
+	return 0;
+
+delete_obj:
+	kobject_put(pre_zero_kobj);
+	return err;
+}
+#endif
+
 static void __init hugetlb_sysfs_init(void)
 {
 	struct hstate *h;
@@ -2978,6 +3211,16 @@ static void __init hugetlb_sysfs_init(void)
 		if (err)
 			pr_err("HugeTLB: Unable to add hstate %s", h->name);
 	}
+
+	if (err)
+		return;
+#ifdef CONFIG_PREZERO_HPAGE
+	err = zeropage_init_sysfs(hugepages_kobj);
+	if (err)
+		return;
+
+	start_kprezerod();
+#endif
 }
 
 #ifdef CONFIG_NUMA
diff --git a/mm/memory.c b/mm/memory.c
index 7d608765932b..e98eed1a59a5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5100,6 +5100,10 @@ void clear_huge_page(struct page *page,
 	unsigned long addr = addr_hint &
 		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
 
+	if (PageZero(page)) {
+		__ClearPageZero(page);
+		return;
+	}
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
 		clear_gigantic_page(page, addr, pages_per_huge_page);
 		return;
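For completeness, a sketch of driving the new knobs from userspace. The
path is an assumption: zeropage_init_sysfs() is handed hugepages_kobj,
which is normally the /sys/kernel/mm/hugepages directory, so the group
should land in /sys/kernel/mm/hugepages/pre_zero/. The write_knob()
helper is hypothetical.

#include <stdio.h>

/* Write a value to one of the (assumed) pre_zero sysfs attributes. */
static int write_knob(const char *knob, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/hugepages/pre_zero/%s", knob);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s", val);
	fclose(f);
	return 0;
}

int main(void)
{
	write_knob("delay_millisecs", "1000");	/* worker wakeup delay */
	write_knob("batch_size", "16777216");	/* 16M per batch */
	write_knob("enabled", "1");		/* 1 == PRE_ZERO_RUN */
	return 0;
}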