From patchwork Tue Oct 20 08:59:23 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 11846077
From: SeongJae Park
Subject: [PATCH v22 01/18] mm: Introduce Data Access MONitor (DAMON)
Date: Tue, 20 Oct 2020 10:59:23 +0200
Message-ID: <20201020085940.13875-2-sjpark@amazon.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com>
References: <20201020085940.13875-1-sjpark@amazon.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: SeongJae Park

DAMON is a data access monitoring framework for the Linux kernel.  The
core mechanisms of DAMON make it

 - accurate (the monitoring output is useful enough for DRAM level
   performance-centric memory management; it might be inappropriate for
   CPU cache levels, though),
 - light-weight (the monitoring overhead is normally low enough to be
   applied online), and
 - scalable (the upper-bound of the overhead is in constant range
   regardless of the size of target workloads).

Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications.  For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurred high
access monitoring overhead could be implemented again on top of this.

Due to its simple and flexible interface, providing a user space
interface would also be easy.  Then, user space users who have some
special workloads can write personalized applications for better
understanding and optimizations of their workloads and systems.

That said, this commit implements only basic data structures and simple
manipulation functions for those structures.  The core mechanisms of
DAMON will be implemented by the following commits.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
Reviewed-by: Varad Gautam
---
 include/linux/damon.h |  95 +++++++++++++++++++++++++++++++++
 mm/Kconfig            |   2 +
 mm/Makefile           |   1 +
 mm/damon/Kconfig      |  15 ++++++
 mm/damon/Makefile     |   3 ++
 mm/damon/core.c       | 121 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 237 insertions(+)
 create mode 100644 include/linux/damon.h
 create mode 100644 mm/damon/Kconfig
 create mode 100644 mm/damon/Makefile
 create mode 100644 mm/damon/core.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
new file mode 100644
index 000000000000..183e0edd7f43
--- /dev/null
+++ b/include/linux/damon.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DAMON api
+ *
+ * Author: SeongJae Park
+ */
+
+#ifndef _DAMON_H_
+#define _DAMON_H_
+
+#include
+
+/**
+ * struct damon_addr_range - Represents an address region of [@start, @end).
+ * @start:	Start address of the region (inclusive).
+ * @end:	End address of the region (exclusive).
+ */
+struct damon_addr_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+/**
+ * struct damon_region - Represents a monitoring target region.
+ * @ar:		The address range of the region.
+ * @nr_accesses:	Access frequency of this region.
+ * @list:	List head for siblings.
+ */
+struct damon_region {
+	struct damon_addr_range ar;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/**
+ * struct damon_target - Represents a monitoring target.
+ * @id:		Unique identifier for this target.
+ * @regions_list:	Head of the monitoring target regions of this target.
+ * @list:	List head for siblings.
+ *
+ * Each monitoring context could have multiple targets.  For example, a context
+ * for virtual memory address spaces could have multiple target processes.  The
+ * @id of each target should be unique among the targets of the context.  For
+ * example, in the virtual address monitoring context, it could be a pidfd or
+ * an address of an mm_struct.
+ */
+struct damon_target {
+	unsigned long id;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
+/**
+ * struct damon_ctx - Represents a context for each monitoring.
+ * @targets_list:	Head of monitoring targets (&damon_target) list.
+ */
+struct damon_ctx {
+	struct list_head targets_list;	/* 'damon_target' objects */
+};
+
+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_target(t, ctx) \
+	list_for_each_entry(t, &(ctx)->targets_list, list)
+
+#define damon_for_each_target_safe(t, next, ctx) \
+	list_for_each_entry_safe(t, next, &(ctx)->targets_list, list)
+
+#ifdef CONFIG_DAMON
+
+struct damon_region *damon_new_region(unsigned long start, unsigned long end);
+inline void damon_insert_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next);
+void damon_add_region(struct damon_region *r, struct damon_target *t);
+void damon_destroy_region(struct damon_region *r);
+
+struct damon_target *damon_new_target(unsigned long id);
+void damon_add_target(struct damon_ctx *ctx, struct damon_target *t);
+void damon_free_target(struct damon_target *t);
+void damon_destroy_target(struct damon_target *t);
+unsigned int damon_nr_regions(struct damon_target *t);
+
+#endif	/* CONFIG_DAMON */
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..19fe2251c87a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -868,4 +868,6 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
 	bool
 
+source "mm/damon/Kconfig"
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..5d969d09521b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -121,3 +121,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
 obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
+obj-$(CONFIG_DAMON) += damon/
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
new file mode 100644
index 000000000000..d00e99ac1a15
--- /dev/null
+++ b/mm/damon/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "Data Access Monitoring"
+
+config DAMON
+	bool "DAMON: Data Access Monitoring Framework"
+	help
+	  This builds a framework that allows kernel subsystems to monitor
+	  access frequency of each memory region. The information can be useful
+	  for performance-centric DRAM level memory management.
+
+	  See https://damonitor.github.io/doc/html/latest-damon/index.html for
+	  more information.
+
+endmenu
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
new file mode 100644
index 000000000000..4fd2edb4becf
--- /dev/null
+++ b/mm/damon/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DAMON) := core.o
diff --git a/mm/damon/core.c b/mm/damon/core.c
new file mode 100644
index 000000000000..4562b2458719
--- /dev/null
+++ b/mm/damon/core.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Data Access Monitor
+ *
+ * Author: SeongJae Park
+ */
+
+#define pr_fmt(fmt) "damon: " fmt
+
+#include
+#include
+
+/*
+ * Functions and macros for DAMON data structures
+ */
+
+/*
+ * Construct a damon_region struct
+ *
+ * Returns the pointer to the new struct if success, or NULL otherwise
+ */
+struct damon_region *damon_new_region(unsigned long start, unsigned long end)
+{
+	struct damon_region *region;
+
+	region = kmalloc(sizeof(*region), GFP_KERNEL);
+	if (!region)
+		return NULL;
+
+	region->ar.start = start;
+	region->ar.end = end;
+	region->nr_accesses = 0;
+	INIT_LIST_HEAD(&region->list);
+
+	return region;
+}
+
+/*
+ * Add a region between two other regions
+ */
+inline void damon_insert_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next)
+{
+	__list_add(&r->list, &prev->list, &next->list);
+}
+
+void damon_add_region(struct damon_region *r, struct damon_target *t)
+{
+	list_add_tail(&r->list, &t->regions_list);
+}
+
+static void damon_del_region(struct damon_region *r)
+{
+	list_del(&r->list);
+}
+
+static void damon_free_region(struct damon_region *r)
+{
+	kfree(r);
+}
+
+void damon_destroy_region(struct damon_region *r)
+{
+	damon_del_region(r);
+	damon_free_region(r);
+}
+
+/*
+ * Construct a damon_target struct
+ *
+ * Returns the pointer to the new struct if success, or NULL otherwise
+ */
+struct damon_target *damon_new_target(unsigned long id)
+{
+	struct damon_target *t;
+
+	t = kmalloc(sizeof(*t), GFP_KERNEL);
+	if (!t)
+		return NULL;
+
+	t->id = id;
+	INIT_LIST_HEAD(&t->regions_list);
+
+	return t;
+}
+
+void damon_add_target(struct damon_ctx *ctx, struct damon_target *t)
+{
+	list_add_tail(&t->list, &ctx->targets_list);
+}
+
+static void damon_del_target(struct damon_target *t)
+{
+	list_del(&t->list);
+}
+
+void damon_free_target(struct damon_target *t)
+{
+	struct damon_region *r, *next;
+
+	damon_for_each_region_safe(r, next, t)
+		damon_free_region(r);
+	kfree(t);
+}
+
+void damon_destroy_target(struct damon_target *t)
+{
+	damon_del_target(t);
+	damon_free_target(t);
+}
+
+unsigned int damon_nr_regions(struct damon_target *t)
+{
+	struct damon_region *r;
+	unsigned int nr_regions = 0;
+
+	damon_for_each_region(r, t)
+		nr_regions++;
+
+	return nr_regions;
+}

From patchwork Tue Oct 20 08:59:24 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 11846079
From: SeongJae Park
Subject: [PATCH v22 02/18] mm/damon: Implement region based sampling
Date: Tue, 20 Oct 2020 10:59:24 +0200
Message-ID: <20201020085940.13875-3-sjpark@amazon.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com>
References: <20201020085940.13875-1-sjpark@amazon.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: SeongJae Park

DAMON separates its high level logic, which is independent of the
monitoring target address space, from the low level primitives, which
depend on the target address space, for flexible support of various
address spaces.

This commit implements DAMON's target address space independent high
level logic for basic access check and region based sampling.  Hence,
without an implementation of the target address space specific parts,
this doesn't work alone.  A reference implementation of those will be
provided by a later commit.

Basic Access Check
==================

The output of DAMON says which pages are accessed how frequently for a
given duration.  The resolution of the access frequency is controlled by
setting ``sampling interval`` and ``aggregation interval``.  In detail,
DAMON checks access to each page per ``sampling interval`` and
aggregates the results.  In other words, it counts the number of
accesses to each page.  After each ``aggregation interval`` passes,
DAMON calls the callback functions that were previously registered by
users so that users can read the aggregated results, and then clears the
results.

This can be described by the simple pseudo-code below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will arbitrarily increase as
the size of the target workload grows.

Region Based Sampling
=====================

To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region.  As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked.  Thus, for each ``sampling interval``, DAMON randomly picks
one page in each region, waits for one ``sampling interval``, checks
whether the page is accessed meanwhile, and increases the access
frequency of the region if so.  Therefore, the monitoring overhead is
controllable by setting the number of regions.  DAMON allows users to
set the minimum and the maximum number of regions for the trade-off.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.  The next commit will address this
problem.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h | 133 ++++++++++++++++-
 mm/damon/core.c       | 333 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 465 insertions(+), 1 deletion(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 183e0edd7f43..1f7b095646c2 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -8,6 +8,8 @@
 #ifndef _DAMON_H_
 #define _DAMON_H_
 
+#include
+#include
 #include
 
 /**
@@ -23,11 +25,13 @@ struct damon_addr_range {
 /**
  * struct damon_region - Represents a monitoring target region.
  * @ar:		The address range of the region.
+ * @sampling_addr:	Address of the sample for the next access check.
  * @nr_accesses:	Access frequency of this region.
  * @list:	List head for siblings.
  */
 struct damon_region {
 	struct damon_addr_range ar;
+	unsigned long sampling_addr;
 	unsigned int nr_accesses;
 	struct list_head list;
 };
@@ -50,12 +54,130 @@ struct damon_target {
 	struct list_head list;
 };
 
+struct damon_ctx;
+
 /**
- * struct damon_ctx - Represents a context for each monitoring.
+ * struct damon_primitive - Monitoring primitives for given use cases.
+ *
+ * @init_target_regions:	Constructs initial monitoring target regions.
+ * @prepare_access_checks:	Prepares next access check of target regions.
+ * @check_accesses:		Checks the access of target regions.
+ * @target_valid:		Determine if the target is valid.
+ * @cleanup:			Cleans up the context.
+ *
+ * DAMON can be extended for various address spaces and usages.  For this,
+ * users should register the low level primitives for their target address
+ * space and usecase via the &damon_ctx.primitive.  Then, the monitoring thread
+ * calls @init_target_regions before starting the monitoring and
+ * @prepare_access_checks, @check_accesses, and @target_valid for each
+ * @sample_interval.
+ *
+ * @init_target_regions should construct proper monitoring target regions and
+ * link those to the DAMON context struct.
+ * @prepare_access_checks should manipulate the monitoring regions to be
+ * prepared for the next access check.
+ * @check_accesses should check the accesses to each region that were made
+ * after the last preparation and update the `->nr_accesses` of each region.
+ * It should also return the max &damon_region.nr_accesses that was made as a
+ * result of its update.
+ * @target_valid should check whether the target is still valid for the
+ * monitoring.
+ * @cleanup is called from @kdamond just before its termination.  After this
+ * call, only @kdamond_lock and @kdamond will be touched.
+ */
+struct damon_primitive {
+	void (*init_target_regions)(struct damon_ctx *context);
+	void (*prepare_access_checks)(struct damon_ctx *context);
+	unsigned int (*check_accesses)(struct damon_ctx *context);
+	bool (*target_valid)(struct damon_target *target);
+	void (*cleanup)(struct damon_ctx *context);
+};
+
+/*
+ * struct damon_callback - Monitoring events notification callbacks.
+ *
+ * @before_start:	Called before starting the monitoring.
+ * @after_sampling:	Called after each sampling.
+ * @after_aggregation:	Called after each aggregation.
+ * @before_terminate:	Called before terminating the monitoring.
+ * @private:		User private data.
+ *
+ * The monitoring thread (&damon_ctx->kdamond) calls @before_start and
+ * @before_terminate just before starting the monitoring and just before
+ * finishing the monitoring.  Therefore, those are good places for installing
+ * and cleaning @private.
+ *
+ * The monitoring thread calls @after_sampling and @after_aggregation for each
+ * of the sampling intervals and aggregation intervals, respectively.
+ * Therefore, users can safely access the monitoring results via
+ * &damon_ctx.targets_list without additional protection of
+ * damon_ctx.kdamond_lock.  For this reason, users are recommended to use these
+ * callbacks for the accesses to the results.
+ *
+ * If any callback returns non-zero, monitoring stops.
+ */
+struct damon_callback {
+	void *private;
+
+	int (*before_start)(struct damon_ctx *context);
+	int (*after_sampling)(struct damon_ctx *context);
+	int (*after_aggregation)(struct damon_ctx *context);
+	int (*before_terminate)(struct damon_ctx *context);
+};
+
+/**
+ * struct damon_ctx - Represents a context for each monitoring.  This is the
+ * main interface that allows users to set the attributes and get the results
+ * of the monitoring.
+ *
+ * @sample_interval:	The time between access samplings.
+ * @aggr_interval:	The time between monitor results aggregations.
+ * @nr_regions:		The number of monitoring regions.
+ *
+ * For each @sample_interval, DAMON checks whether each region is accessed or
+ * not.  It aggregates and keeps the access information (number of accesses to
+ * each region) for @aggr_interval time.  All time intervals are in
+ * micro-seconds.
+ *
+ * @kdamond:		Kernel thread who does the monitoring.
+ * @kdamond_stop:	Notifies whether kdamond should stop.
+ * @kdamond_lock:	Mutex for the synchronizations with @kdamond.
+ *
+ * For each monitoring context, one kernel thread for the monitoring is
+ * created.  The pointer to the thread is stored in @kdamond.
+ *
+ * Once started, the monitoring thread runs until explicitly required to be
+ * terminated or every monitoring target is invalid.  The validity of the
+ * targets is checked via the @target_valid callback.  The termination can also
+ * be explicitly requested by writing non-zero to @kdamond_stop.  The thread
+ * sets @kdamond to NULL when it terminates.  Therefore, users can know whether
+ * the monitoring is ongoing or terminated by reading @kdamond.  Reads and
+ * writes to @kdamond and @kdamond_stop from outside of the monitoring thread
+ * must be protected by @kdamond_lock.
+ *
+ * Note that the monitoring thread protects only @kdamond and @kdamond_stop via
+ * @kdamond_lock.  Accesses to other fields must be protected by themselves.
+ *
  * @targets_list:	Head of monitoring targets (&damon_target) list.
+ *
+ * @primitive:	Set of monitoring primitives for given use cases.
+ * @callback:	Set of callbacks for monitoring events notifications.
  */
 struct damon_ctx {
+	unsigned long sample_interval;
+	unsigned long aggr_interval;
+	unsigned long nr_regions;
+
+	struct timespec64 last_aggregation;
+
+	struct task_struct *kdamond;
+	bool kdamond_stop;
+	struct mutex kdamond_lock;
+
 	struct list_head targets_list;	/* 'damon_target' objects */
+
+	struct damon_primitive primitive;
+	struct damon_callback callback;
 };
 
 #define damon_next_region(r) \
@@ -90,6 +212,15 @@ void damon_free_target(struct damon_target *t);
 void damon_destroy_target(struct damon_target *t);
 unsigned int damon_nr_regions(struct damon_target *t);
 
+int damon_set_targets(struct damon_ctx *ctx,
+		unsigned long *ids, ssize_t nr_ids);
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+		unsigned long aggr_int, unsigned long nr_reg);
+
+int damon_nr_running_ctxs(void);
+int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
+int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);
+
 #endif	/* CONFIG_DAMON */
 
 #endif
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 4562b2458719..eb4ebeaa064d 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -8,12 +8,20 @@
 #define pr_fmt(fmt) "damon: " fmt
 
 #include
+#include
+#include
 #include
 
+/* Minimal region size.  Every damon_region is aligned by this. */
+#define MIN_REGION PAGE_SIZE
+
 /*
  * Functions and macros for DAMON data structures
  */
 
+static DEFINE_MUTEX(damon_lock);
+static int nr_running_ctxs;
+
 /*
  * Construct a damon_region struct
  *
@@ -119,3 +127,328 @@ unsigned int damon_nr_regions(struct damon_target *t)
 
 	return nr_regions;
 }
+
+/**
+ * damon_set_targets() - Set monitoring targets.
+ * @ctx:	monitoring context
+ * @ids:	array of target ids
+ * @nr_ids:	number of entries in @ids
+ *
+ * This function should not be called while the kdamond is running.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_set_targets(struct damon_ctx *ctx,
+		unsigned long *ids, ssize_t nr_ids)
+{
+	ssize_t i;
+	struct damon_target *t, *next;
+
+	damon_for_each_target_safe(t, next, ctx)
+		damon_destroy_target(t);
+
+	for (i = 0; i < nr_ids; i++) {
+		t = damon_new_target(ids[i]);
+		if (!t) {
+			pr_err("Failed to alloc damon_target\n");
+			return -ENOMEM;
+		}
+		damon_add_target(ctx, t);
+	}
+
+	return 0;
+}
+
+/**
+ * damon_set_attrs() - Set attributes for the monitoring.
+ * @ctx:		monitoring context
+ * @sample_int:		time interval between samplings
+ * @aggr_int:		time interval between aggregations
+ * @nr_reg:		number of regions
+ *
+ * This function should not be called while the kdamond is running.
+ * Every time interval is in micro-seconds.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+		unsigned long aggr_int, unsigned long nr_reg)
+{
+	if (nr_reg < 3) {
+		pr_err("nr_regions (%lu) must be at least 3\n",
+				nr_reg);
+		return -EINVAL;
+	}
+
+	ctx->sample_interval = sample_int;
+	ctx->aggr_interval = aggr_int;
+	ctx->nr_regions = nr_reg;
+
+	return 0;
+}
+
+static bool damon_kdamond_running(struct damon_ctx *ctx)
+{
+	bool running;
+
+	mutex_lock(&ctx->kdamond_lock);
+	running = ctx->kdamond != NULL;
+	mutex_unlock(&ctx->kdamond_lock);
+
+	return running;
+}
+
+static int kdamond_fn(void *data);
+
+/*
+ * __damon_start() - Starts monitoring with given context.
+ * @ctx:	monitoring context
+ *
+ * This function should be called while damon_lock is held.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int __damon_start(struct damon_ctx *ctx)
+{
+	int err = -EBUSY;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (!ctx->kdamond) {
+		err = 0;
+		ctx->kdamond_stop = false;
+		ctx->kdamond = kthread_create(kdamond_fn, ctx, "kdamond.%d",
+				nr_running_ctxs);
+		if (IS_ERR(ctx->kdamond))
+			err = PTR_ERR(ctx->kdamond);
+		else
+			wake_up_process(ctx->kdamond);
+	}
+	mutex_unlock(&ctx->kdamond_lock);
+
+	return err;
+}
+
+/**
+ * damon_start() - Starts the monitorings for a given group of contexts.
+ * @ctxs:	an array of the pointers for contexts to start monitoring
+ * @nr_ctxs:	size of @ctxs
+ *
+ * This function starts a group of monitoring threads for a group of monitoring
+ * contexts.  One thread per each context is created and run in parallel.  The
+ * caller should handle synchronization between the threads by itself.  If a
+ * group of threads that was created by another 'damon_start()' call is
+ * currently running, this function does nothing but returns -EBUSY.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_start(struct damon_ctx **ctxs, int nr_ctxs)
+{
+	int i;
+	int err = 0;
+
+	mutex_lock(&damon_lock);
+	if (nr_running_ctxs) {
+		mutex_unlock(&damon_lock);
+		return -EBUSY;
+	}
+
+	for (i = 0; i < nr_ctxs; i++) {
+		err = __damon_start(ctxs[i]);
+		if (err)
+			break;
+		nr_running_ctxs++;
+	}
+	mutex_unlock(&damon_lock);
+
+	return err;
+}
+
+/*
+ * __damon_stop() - Stops monitoring of given context.
+ * @ctx:	monitoring context
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int __damon_stop(struct damon_ctx *ctx)
+{
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond) {
+		ctx->kdamond_stop = true;
+		mutex_unlock(&ctx->kdamond_lock);
+		while (damon_kdamond_running(ctx))
+			usleep_range(ctx->sample_interval,
+					ctx->sample_interval * 2);
+		return 0;
+	}
+	mutex_unlock(&ctx->kdamond_lock);
+
+	return -EPERM;
+}
+
+/**
+ * damon_stop() - Stops the monitorings for a given group of contexts.
+ * @ctxs:	an array of the pointers for contexts to stop monitoring
+ * @nr_ctxs:	size of @ctxs
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_stop(struct damon_ctx **ctxs, int nr_ctxs)
+{
+	int i, err = 0;
+
+	for (i = 0; i < nr_ctxs; i++) {
+		/* nr_running_ctxs is decremented in kdamond_fn */
+		err = __damon_stop(ctxs[i]);
+		if (err)
+			return err;
+	}
+
+	return err;
+}
+
+/*
+ * Functions for DAMON core logics
+ */
+
+/*
+ * damon_check_reset_time_interval() - Check if a time interval is elapsed.
+ * @baseline:	the time to check whether the interval has elapsed since
+ * @interval:	the time interval (microseconds)
+ *
+ * See whether the given time interval has passed since the given baseline
+ * time. If so, it also updates the baseline to current time for next check.
+ *
+ * Return: true if the time interval has passed, or false otherwise.
+ */
+static bool damon_check_reset_time_interval(struct timespec64 *baseline,
+		unsigned long interval)
+{
+	struct timespec64 now;
+
+	ktime_get_coarse_ts64(&now);
+	if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) <
+			interval * 1000)
+		return false;
+	*baseline = now;
+	return true;
+}
+
+/*
+ * Check whether it is time to flush the aggregated information
+ */
+static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
+{
+	return damon_check_reset_time_interval(&ctx->last_aggregation,
+			ctx->aggr_interval);
+}
+
+/*
+ * Reset the aggregated monitoring results ('nr_accesses' of each region).
+ */
+static void kdamond_reset_aggregated(struct damon_ctx *c)
+{
+	struct damon_target *t;
+
+	damon_for_each_target(t, c) {
+		struct damon_region *r;
+
+		damon_for_each_region(r, t)
+			r->nr_accesses = 0;
+	}
+}
+
+/*
+ * Check whether current monitoring should be stopped
+ *
+ * The monitoring is stopped when either the user requested to stop, or all
+ * monitoring targets are invalid.
+ *
+ * Returns true if need to stop current monitoring.
+ */
+static bool kdamond_need_stop(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	bool stop;
+
+	mutex_lock(&ctx->kdamond_lock);
+	stop = ctx->kdamond_stop;
+	mutex_unlock(&ctx->kdamond_lock);
+	if (stop)
+		return true;
+
+	if (!ctx->primitive.target_valid)
+		return false;
+
+	damon_for_each_target(t, ctx) {
+		if (ctx->primitive.target_valid(t))
+			return false;
+	}
+
+	return true;
+}
+
+static void set_kdamond_stop(struct damon_ctx *ctx, bool stop)
+{
+	mutex_lock(&ctx->kdamond_lock);
+	ctx->kdamond_stop = stop;
+	mutex_unlock(&ctx->kdamond_lock);
+}
+
+#define kdamond_call_prmt(ctx, fn)	\
+	do {	\
+		if (ctx->primitive.fn)	\
+			ctx->primitive.fn(ctx);	\
+	} while (0)
+
+#define kdamond_callback(ctx, fn)	\
+	do {	\
+		if (ctx->callback.fn && ctx->callback.fn(ctx))	\
+			set_kdamond_stop(ctx, true);	\
+	} while (0)
+
+/*
+ * The monitoring daemon that runs as a kernel thread
+ */
+static int kdamond_fn(void *data)
+{
+	struct damon_ctx *ctx = (struct damon_ctx *)data;
+	struct damon_target *t;
+	struct damon_region *r, *next;
+
+	pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
+
+	kdamond_call_prmt(ctx, init_target_regions);
+	kdamond_callback(ctx, before_start);
+
+	while (!kdamond_need_stop(ctx)) {
+		kdamond_call_prmt(ctx, prepare_access_checks);
+		kdamond_callback(ctx, after_sampling);
+
+		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
+
+		kdamond_call_prmt(ctx, check_accesses);
+
+		if (kdamond_aggregate_interval_passed(ctx)) {
+			kdamond_callback(ctx, after_aggregation);
+			kdamond_reset_aggregated(ctx);
+		}
+	}
+	damon_for_each_target(t, ctx) {
+		damon_for_each_region_safe(r, next, t)
+			damon_destroy_region(r);
+	}
+
+	kdamond_callback(ctx, before_terminate);
+	kdamond_call_prmt(ctx, cleanup);
+
+	pr_debug("kdamond (%d) finishes\n", ctx->kdamond->pid);
+	mutex_lock(&ctx->kdamond_lock);
+	ctx->kdamond = NULL;
+	mutex_unlock(&ctx->kdamond_lock);
+
+	mutex_lock(&damon_lock);
+	nr_running_ctxs--;
+	mutex_unlock(&damon_lock);
+
+	do_exit(0);
+}
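
Editor's sketch, not part of the patch: the interval bookkeeping that damon_check_reset_time_interval() performs above can be tried out in user space. The names `ts_to_ns()` and `interval_elapsed()` are hypothetical, `struct timespec` stands in for the kernel's `struct timespec64`, and the current time is passed in as a parameter (the kernel code reads it itself with ktime_get_coarse_ts64()) so that the behavior is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds (hypothetical helper). */
int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Return true and move *baseline forward to *now if at least interval_us
 * microseconds have passed since *baseline.  Otherwise return false and
 * leave *baseline untouched, so the next call measures from the same point.
 */
bool interval_elapsed(struct timespec *baseline, const struct timespec *now,
		      unsigned long interval_us)
{
	assert(baseline != NULL && now != NULL);

	if (ts_to_ns(now) - ts_to_ns(baseline) < (int64_t)interval_us * 1000)
		return false;
	*baseline = *now;
	return true;
}
```

Because the baseline only moves when the check succeeds, a caller that polls once per sampling interval sees the check succeed roughly once per aggregation interval, which is how kdamond_fn() uses it.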
From patchwork Tue Oct 20 08:59:25 2020
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 11846081
From: SeongJae Park
Subject: [PATCH v22 03/18] mm/damon: Adaptively adjust regions
Date: Tue, 20 Oct 2020 10:59:25 +0200
Message-ID: <20201020085940.13875-4-sjpark@amazon.com>
In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com>
References: <20201020085940.13875-1-sjpark@amazon.com>

From: SeongJae Park

Even if the initial monitoring target regions are well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically. This will
result in low monitoring quality. To keep the assumption as much as
possible, DAMON adaptively merges and splits each region based on its
access frequency.

For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges them if the frequency difference is small.
Then, after it reports and clears the aggregated access frequency of each
region, it splits each region into two or three regions if the total
number of regions will not exceed the user-specified maximum number of
regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead
while keeping the upper-bound overhead that users set.
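
As a rough user-space illustration of the merge step described above (a sketch under stated assumptions, not the kernel code of this patch): regions are kept in a sorted array rather than in DAMON's linked list, regions are assumed to be non-empty, and `struct region` and `merge_regions()` are hypothetical names. Two neighbours are merged when they are contiguous, their access counts differ by at most a threshold, and the merged size stays within a limit. The merged access count is the size-weighted average of the two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a monitoring region covering [start, end). */
struct region {
	unsigned long start;
	unsigned long end;
	unsigned int nr_accesses;
};

/*
 * Merge neighbouring entries of the address-sorted array rs[0..n) in place
 * and return the new number of regions.
 */
size_t merge_regions(struct region *rs, size_t n, unsigned int thres,
		     unsigned long sz_limit)
{
	size_t last = 0;	/* index of the most recently kept region */
	size_t i;

	assert(rs != NULL || n == 0);
	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		struct region *prev = &rs[last];
		const struct region *cur = &rs[i];
		unsigned long sz_p = prev->end - prev->start;
		unsigned long sz_c = cur->end - cur->start;
		unsigned int diff = prev->nr_accesses > cur->nr_accesses ?
			prev->nr_accesses - cur->nr_accesses :
			cur->nr_accesses - prev->nr_accesses;

		if (prev->end == cur->start && diff <= thres &&
		    sz_p + sz_c <= sz_limit) {
			/* Regions are assumed non-empty, so no divide by 0 */
			assert(sz_p + sz_c > 0);
			/* Size-weighted average of the two access counts */
			prev->nr_accesses = (unsigned int)
				((prev->nr_accesses * sz_p +
				  cur->nr_accesses * sz_c) / (sz_p + sz_c));
			prev->end = cur->end;
		} else {
			rs[++last] = *cur;
		}
	}
	return last + 1;
}
```

With regions [0, 100), [100, 200) and [200, 300) holding 10, 12 and 50 accesses, a threshold of 5 and a generous size limit, the first two regions collapse into [0, 200) with 11 accesses while the third stays separate.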
Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |  11 ++-
 mm/damon/core.c       | 196 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 195 insertions(+), 12 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 1f7b095646c2..0797bdfbfc24 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -132,7 +132,8 @@ struct damon_callback {
  *
  * @sample_interval:		The time between access samplings.
  * @aggr_interval:		The time between monitor results aggregations.
- * @nr_regions:			The number of monitoring regions.
+ * @min_nr_regions:		The minimum number of monitoring regions.
+ * @max_nr_regions:		The maximum number of monitoring regions.
  *
  * For each @sample_interval, DAMON checks whether each region is accessed or
  * not. It aggregates and keeps the access information (number of accesses to
@@ -166,7 +167,8 @@ struct damon_callback {
 struct damon_ctx {
 	unsigned long sample_interval;
 	unsigned long aggr_interval;
-	unsigned long nr_regions;
+	unsigned long min_nr_regions;
+	unsigned long max_nr_regions;
 
 	struct timespec64 last_aggregation;
 
@@ -214,8 +216,9 @@ unsigned int damon_nr_regions(struct damon_target *t);
 
 int damon_set_targets(struct damon_ctx *ctx,
 		unsigned long *ids, ssize_t nr_ids);
-int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
-		unsigned long aggr_int, unsigned long nr_reg);
+int damon_set_attrs(struct damon_ctx *ctx,
+		unsigned long sample_int, unsigned long aggr_int,
+		unsigned long min_nr_reg, unsigned long max_nr_reg);
 int damon_nr_running_ctxs(void);
 
 int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index eb4ebeaa064d..ed364b42721d 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /* Minimal region size. Every damon_region is aligned by this. */
@@ -19,6 +20,9 @@
  * Functions and macros for DAMON data structures
  */
 
+/* Get a random number in [l, r) */
+#define damon_rand(l, r) (l + prandom_u32_max(r - l))
+
 static DEFINE_MUTEX(damon_lock);
 static int nr_running_ctxs;
 
@@ -164,29 +168,57 @@ int damon_set_targets(struct damon_ctx *ctx,
  * @ctx:		monitoring context
  * @sample_int:		time interval between samplings
  * @aggr_int:		time interval between aggregations
- * @nr_reg:		number of regions
+ * @min_nr_reg:		minimal number of regions
+ * @max_nr_reg:		maximum number of regions
  *
  * This function should not be called while the kdamond is running.
  * Every time interval is in micro-seconds.
  *
  * Return: 0 on success, negative error code otherwise.
  */
-int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
-		    unsigned long aggr_int, unsigned long nr_reg)
+int damon_set_attrs(struct damon_ctx *ctx,
+		    unsigned long sample_int, unsigned long aggr_int,
+		    unsigned long min_nr_reg, unsigned long max_nr_reg)
 {
-	if (nr_reg < 3) {
-		pr_err("nr_regions (%lu) must be at least 3\n",
-				nr_reg);
+	if (min_nr_reg < 3) {
+		pr_err("min_nr_regions (%lu) must be at least 3\n",
+				min_nr_reg);
+		return -EINVAL;
+	}
+	if (min_nr_reg > max_nr_reg) {
+		pr_err("invalid nr_regions. min (%lu) > max (%lu)\n",
+				min_nr_reg, max_nr_reg);
 		return -EINVAL;
 	}
 
 	ctx->sample_interval = sample_int;
 	ctx->aggr_interval = aggr_int;
-	ctx->nr_regions = nr_reg;
+	ctx->min_nr_regions = min_nr_reg;
+	ctx->max_nr_regions = max_nr_reg;
 
 	return 0;
 }
 
+/* Returns the size upper limit for each monitoring region */
+static unsigned long damon_region_sz_limit(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	struct damon_region *r;
+	unsigned long sz = 0;
+
+	damon_for_each_target(t, ctx) {
+		damon_for_each_region(r, t)
+			sz += r->ar.end - r->ar.start;
+	}
+
+	if (ctx->min_nr_regions)
+		sz /= ctx->min_nr_regions;
+	if (sz < MIN_REGION)
+		sz = MIN_REGION;
+
+	return sz;
+}
+
 static bool damon_kdamond_running(struct damon_ctx *ctx)
 {
 	bool running;
@@ -357,6 +389,146 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 	}
 }
 
+#define sz_damon_region(r) (r->ar.end - r->ar.start)
+
+/*
+ * Merge two adjacent regions into one region
+ */
+static void damon_merge_two_regions(struct damon_region *l,
+				    struct damon_region *r)
+{
+	unsigned long sz_l = sz_damon_region(l), sz_r = sz_damon_region(r);
+
+	l->nr_accesses = (l->nr_accesses * sz_l + r->nr_accesses * sz_r) /
+			(sz_l + sz_r);
+	l->ar.end = r->ar.end;
+	damon_destroy_region(r);
+}
+
+#define diff_of(a, b) (a > b ? a - b : b - a)
+
+/*
+ * Merge adjacent regions having similar access frequencies
+ *
+ * t		target affected by this merge operation
+ * thres	'->nr_accesses' diff threshold for the merge
+ * sz_limit	size upper limit of each region
+ */
+static void damon_merge_regions_of(struct damon_target *t, unsigned int thres,
+				   unsigned long sz_limit)
+{
+	struct damon_region *r, *prev = NULL, *next;
+
+	damon_for_each_region_safe(r, next, t) {
+		if (prev && prev->ar.end == r->ar.start &&
+		    diff_of(prev->nr_accesses, r->nr_accesses) <= thres &&
+		    sz_damon_region(prev) + sz_damon_region(r) <= sz_limit)
+			damon_merge_two_regions(prev, r);
+		else
+			prev = r;
+	}
+}
+
+/*
+ * Merge adjacent regions having similar access frequencies
+ *
+ * threshold	'->nr_accesses' diff threshold for the merge
+ * sz_limit	size upper limit of each region
+ *
+ * This function merges monitoring target regions which are adjacent and their
+ * access frequencies are similar. This is for minimizing the monitoring
+ * overhead under the dynamically changeable access pattern. If a merge was
+ * unnecessarily made, later 'kdamond_split_regions()' will revert it.
+ */
+static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold,
+				  unsigned long sz_limit)
+{
+	struct damon_target *t;
+
+	damon_for_each_target(t, c)
+		damon_merge_regions_of(t, threshold, sz_limit);
+}
+
+/*
+ * Split a region in two smaller regions
+ *
+ * r		the region to be split
+ * sz_r		size of the first sub-region that will be made
+ */
+static void damon_split_region_at(struct damon_ctx *ctx,
+				  struct damon_region *r, unsigned long sz_r)
+{
+	struct damon_region *new;
+
+	new = damon_new_region(r->ar.start + sz_r, r->ar.end);
+	r->ar.end = new->ar.start;
+
+	damon_insert_region(new, r, damon_next_region(r));
+}
+
+/* Split every region in the given target into 'nr_subs' regions */
+static void damon_split_regions_of(struct damon_ctx *ctx,
+				     struct damon_target *t, int nr_subs)
+{
+	struct damon_region *r, *next;
+	unsigned long sz_region, sz_sub = 0;
+	int i;
+
+	damon_for_each_region_safe(r, next, t) {
+		sz_region = r->ar.end - r->ar.start;
+
+		for (i = 0; i < nr_subs - 1 &&
+				sz_region > 2 * MIN_REGION; i++) {
+			/*
+			 * Randomly select size of left sub-region to be at
+			 * least 10 percent and at most 90% of original region
+			 */
+			sz_sub = ALIGN_DOWN(damon_rand(1, 10) *
+					sz_region / 10, MIN_REGION);
+			/* Do not allow blank region */
+			if (sz_sub == 0 || sz_sub >= sz_region)
+				continue;
+
+			damon_split_region_at(ctx, r, sz_sub);
+			sz_region = sz_sub;
+		}
+	}
+}
+
+/*
+ * Split every target region into randomly-sized small regions
+ *
+ * This function splits every target region into random-sized small regions if
+ * current total number of the regions is equal or smaller than half of the
+ * user-specified maximum number of regions. This is for maximizing the
+ * monitoring accuracy under the dynamically changeable access patterns. If a
+ * split was unnecessarily made, later 'kdamond_merge_regions()' will revert
+ * it.
+ */
+static void kdamond_split_regions(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	unsigned int nr_regions = 0;
+	static unsigned int last_nr_regions;
+	int nr_subregions = 2;
+
+	damon_for_each_target(t, ctx)
+		nr_regions += damon_nr_regions(t);
+
+	if (nr_regions > ctx->max_nr_regions / 2)
+		return;
+
+	/* Maybe the middle of the region has different access frequency */
+	if (last_nr_regions == nr_regions &&
+			nr_regions < ctx->max_nr_regions / 3)
+		nr_subregions = 3;
+
+	damon_for_each_target(t, ctx)
+		damon_split_regions_of(ctx, t, nr_subregions);
+
+	last_nr_regions = nr_regions;
+}
+
 /*
  * Check whether current monitoring should be stopped
  *
@@ -414,23 +586,31 @@ static int kdamond_fn(void *data)
 	struct damon_ctx *ctx = (struct damon_ctx *)data;
 	struct damon_target *t;
 	struct damon_region *r, *next;
+	unsigned int max_nr_accesses = 0;
+	unsigned long sz_limit = 0;
 
 	pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
 
 	kdamond_call_prmt(ctx, init_target_regions);
 	kdamond_callback(ctx, before_start);
 
+	sz_limit = damon_region_sz_limit(ctx);
+
 	while (!kdamond_need_stop(ctx)) {
 		kdamond_call_prmt(ctx, prepare_access_checks);
 		kdamond_callback(ctx, after_sampling);
 
 		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
 
-		kdamond_call_prmt(ctx, check_accesses);
+		if (ctx->primitive.check_accesses)
+			max_nr_accesses = ctx->primitive.check_accesses(ctx);
 
 		if (kdamond_aggregate_interval_passed(ctx)) {
+			kdamond_merge_regions(ctx, max_nr_accesses / 10,
+					sz_limit);
 			kdamond_callback(ctx, after_aggregation);
 			kdamond_reset_aggregated(ctx);
+			kdamond_split_regions(ctx);
 		}
 	}
 	damon_for_each_target(t, ctx) {

From patchwork Tue Oct 20 08:59:26 2020
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 11846083
From: SeongJae Park
Subject: [PATCH v22 04/18] mm/damon: Track dynamic monitoring target regions update
Date: Tue, 20 Oct 2020 10:59:26 +0200
Message-ID: <20201020085940.13875-5-sjpark@amazon.com>
In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com>
References: <20201020085940.13875-1-sjpark@amazon.com>

From: SeongJae Park

The monitoring target address range can be dynamically changed. For
example, virtual memory could be dynamically mapped and unmapped.
Physical memory could be hot-plugged.

As the changes could be quite frequent in some cases, DAMON checks the
dynamic memory mapping changes and applies them to the abstracted target
area only once per user-specified time interval, the ``regions update
interval``.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h | 22 +++++++++++++++++-----
 mm/damon/core.c       | 22 ++++++++++++++++++++--
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 0797bdfbfc24..b8562814751e 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -60,6 +60,7 @@ struct damon_ctx;
  * struct damon_primitive	Monitoring primitives for given use cases.
  *
  * @init_target_regions:	Constructs initial monitoring target regions.
+ * @update_target_regions:	Updates monitoring target regions.
  * @prepare_access_checks:	Prepares next access check of target regions.
  * @check_accesses:		Checks the access of target regions.
  * @target_valid:		Determine if the target is valid.
@@ -68,12 +69,17 @@ struct damon_ctx;
  * DAMON can be extended for various address spaces and usages. For this,
  * users should register the low level primitives for their target address
  * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread
- * calls @init_target_regions before starting the monitoring and
+ * calls @init_target_regions before starting the monitoring,
+ * @update_target_regions for each @regions_update_interval, and
  * @prepare_access_checks, @check_accesses, and @target_valid for each
  * @sample_interval.
+ *
  * @init_target_regions should construct proper monitoring target regions and
  * link those to the DAMON context struct.
+ * @update_target_regions should update the monitoring target regions for
+ * current status.
+ * @prepare_access_checks should manipulate the monitoring regions to be
  * prepare for the next access check.
  * @check_accesses should check the accesses to each region that made after the
@@ -87,6 +93,7 @@ struct damon_ctx;
  */
 struct damon_primitive {
 	void (*init_target_regions)(struct damon_ctx *context);
+	void (*update_target_regions)(struct damon_ctx *context);
 	void (*prepare_access_checks)(struct damon_ctx *context);
 	unsigned int (*check_accesses)(struct damon_ctx *context);
 	bool (*target_valid)(struct damon_target *target);
@@ -132,13 +139,16 @@ struct damon_callback {
  *
  * @sample_interval:		The time between access samplings.
  * @aggr_interval:		The time between monitor results aggregations.
+ * @regions_update_interval:	The time between monitor regions updates.
  * @min_nr_regions:		The minimum number of monitoring regions.
  * @max_nr_regions:		The maximum number of monitoring regions.
  *
  * For each @sample_interval, DAMON checks whether each region is accessed or
  * not. It aggregates and keeps the access information (number of accesses to
- * each region) for @aggr_interval time. All time intervals are in
- * micro-seconds.
+ * each region) for @aggr_interval time. DAMON also checks whether the target
+ * memory regions need update (e.g., by ``mmap()`` calls from the application,
+ * in case of virtual memory monitoring) and applies the changes for each
+ * @regions_update_interval. All time intervals are in micro-seconds.
  *
  * @kdamond:		Kernel thread who does the monitoring.
  * @kdamond_stop:	Notifies whether kdamond should stop.
@@ -167,10 +177,12 @@ struct damon_callback {
 struct damon_ctx {
 	unsigned long sample_interval;
 	unsigned long aggr_interval;
+	unsigned long regions_update_interval;
 	unsigned long min_nr_regions;
 	unsigned long max_nr_regions;
 
 	struct timespec64 last_aggregation;
+	struct timespec64 last_regions_update;
 
 	struct task_struct *kdamond;
 	bool kdamond_stop;
@@ -216,8 +228,8 @@ unsigned int damon_nr_regions(struct damon_target *t);
 
 int damon_set_targets(struct damon_ctx *ctx,
 		unsigned long *ids, ssize_t nr_ids);
-int damon_set_attrs(struct damon_ctx *ctx,
-		unsigned long sample_int, unsigned long aggr_int,
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+		unsigned long aggr_int, unsigned long regions_update_int,
 		unsigned long min_nr_reg, unsigned long max_nr_reg);
 int damon_nr_running_ctxs(void);
 
diff --git a/mm/damon/core.c b/mm/damon/core.c
index ed364b42721d..36428327e848 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -167,6 +167,7 @@ int damon_set_targets(struct damon_ctx *ctx,
  * damon_set_attrs() - Set attributes for the monitoring.
  * @ctx:		monitoring context
  * @sample_int:		time interval between samplings
+ * @regions_update_int:	time interval between target regions update
  * @aggr_int:		time interval between aggregations
  * @min_nr_reg:		minimal number of regions
  * @max_nr_reg:		maximum number of regions
@@ -176,8 +177,8 @@ int damon_set_targets(struct damon_ctx *ctx,
  *
  * Return: 0 on success, negative error code otherwise.
*/ -int damon_set_attrs(struct damon_ctx *ctx, - unsigned long sample_int, unsigned long aggr_int, +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, + unsigned long aggr_int, unsigned long regions_update_int, unsigned long min_nr_reg, unsigned long max_nr_reg) { if (min_nr_reg < 3) { @@ -193,6 +194,7 @@ int damon_set_attrs(struct damon_ctx *ctx, ctx->sample_interval = sample_int; ctx->aggr_interval = aggr_int; + ctx->regions_update_interval = regions_update_int; ctx->min_nr_regions = min_nr_reg; ctx->max_nr_regions = max_nr_reg; @@ -529,6 +531,17 @@ static void kdamond_split_regions(struct damon_ctx *ctx) last_nr_regions = nr_regions; } +/* + * Check whether it is time to check and apply the target monitoring regions + * + * Returns true if it is. + */ +static bool kdamond_need_update_regions(struct damon_ctx *ctx) +{ + return damon_check_reset_time_interval(&ctx->last_regions_update, + ctx->regions_update_interval); +} + /* * Check whether current monitoring should be stopped * @@ -612,6 +625,11 @@ static int kdamond_fn(void *data) kdamond_reset_aggregated(ctx); kdamond_split_regions(ctx); } + + if (kdamond_need_update_regions(ctx)) { + kdamond_call_prmt(ctx, update_target_regions); + sz_limit = damon_region_sz_limit(ctx); + } } damon_for_each_target(t, ctx) { damon_for_each_region_safe(r, next, t) From patchwork Tue Oct 20 08:59:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846087 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39ABB1580 for ; Tue, 20 Oct 2020 09:04:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E26332237B for ; Tue, 20 Oct 2020 09:04:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) 
header.d=amazon.com header.i=@amazon.com header.b="QSTXCpyP" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E26332237B Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E9DFB6B0068; Tue, 20 Oct 2020 05:04:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E5C636B0070; Tue, 20 Oct 2020 05:04:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D3C066B006E; Tue, 20 Oct 2020 05:04:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0249.hostedemail.com [216.40.44.249]) by kanga.kvack.org (Postfix) with ESMTP id A41446B0062 for ; Tue, 20 Oct 2020 05:04:08 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 45A2082F4F9D for ; Tue, 20 Oct 2020 09:04:08 +0000 (UTC) X-FDA: 77391716976.08.verse83_000cb242723e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 199E71819E76F for ; Tue, 20 Oct 2020 09:04:08 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=55584ce82=sjpark@amazon.com,,RULES_HIT:30003:30005:30029:30045:30051:30054:30064:30075,0,RBL:72.21.198.25:@amazon.com:.lbl8.mailshell.net-62.18.0.100 66.10.201.10;04y8mowgjykkthen5u8greew8p6hioc4ehf6ws9w8uuhwqcsgbz7ay8t9xqu65r.7ee47nnb9xijiotnjtsg9hu57486ky75npg6yzi49dziws35u1hm5sbtu5qjqwt.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: verse83_000cb242723e 
X-Filterd-Recvd-Size: 9563 Received: from smtp-fw-4101.amazon.com (smtp-fw-4101.amazon.com [72.21.198.25]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 09:04:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1603184648; x=1634720648; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=8x4DJKOYLh0gqQEMw739n/sQ0+LowFPlq4c5wDj0ivE=; b=QSTXCpyPBePeK2nzgPBSKEPgpGZImoKHuIPo5egOEa3lBSPGTI3+C5WQ GwEwXIp6rT7XG+IyC7uJMGSJM13KW1sKbeflECJQnrLZIkXvGJwDMU0pn W4tpFgQmVtTcm7isfYjRhED1BTsC2F4ThWhRxpwvzxZD2TsHThzayCOEb o=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="60312483" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-2a-1c1b5cdd.us-west-2.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 20 Oct 2020 09:03:58 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan2.pdx.amazon.com [10.170.41.162]) by email-inbound-relay-2a-1c1b5cdd.us-west-2.amazon.com (Postfix) with ESMTPS id 985A8A1DAF; Tue, 20 Oct 2020 09:02:35 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:02:15 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v22 05/18] mm/idle_page_tracking: Make PG_(idle|young) reusable Date: Tue, 20 Oct 2020 10:59:27 +0200 Message-ID: <20201020085940.13875-6-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, 
spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park

PG_idle and PG_young allow the two PTE Accessed bit users, IDLE_PAGE_TRACKING and the reclaim logic, to work concurrently without interfering with each other. That is, when they need to clear the Accessed bit, they set PG_young and PG_idle to represent the previous state of the bit, respectively. And when they need to read the bit, if the bit is cleared, they further read PG_young and PG_idle, respectively, to know whether the other has cleared the bit in the meantime.

We could add another page flag and extend the mechanism to use it if we needed to add another concurrent PTE Accessed bit user subsystem. However, that would only waste space. Instead, if the new subsystem is mutually exclusive with IDLE_PAGE_TRACKING, it could simply reuse the PG_idle flag. However, that is currently impossible because the flags depend on IDLE_PAGE_TRACKING.

To allow such reuse of the flags, this commit separates the PG_young and PG_idle flag logic from IDLE_PAGE_TRACKING and introduces a new kernel config, 'PAGE_IDLE_FLAG'. Hence, if !IDLE_PAGE_TRACKING and PAGE_IDLE_FLAG, a new subsystem would be able to reuse PG_idle.

In the next commit, DAMON's reference implementation of the virtual memory address space monitoring primitives will use it.

Signed-off-by: SeongJae Park --- include/linux/page-flags.h | 4 ++-- include/linux/page_ext.h | 2 +- include/linux/page_idle.h | 6 +++--- include/trace/events/mmflags.h | 2 +- mm/Kconfig | 8 ++++++++ mm/page_ext.c | 12 +++++++++++- mm/page_idle.c | 10 ---------- 7 files changed, 26 insertions(+), 18 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 6be1aa559b1e..7736d290bb61 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -132,7 +132,7 @@ enum pageflags { #ifdef CONFIG_MEMORY_FAILURE PG_hwpoison, /* hardware poisoned page.
Don't touch */ #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) PG_young, PG_idle, #endif @@ -432,7 +432,7 @@ static inline bool set_hwpoison_free_buddy_page(struct page *page) #define __PG_HWPOISON 0 #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) TESTPAGEFLAG(Young, young, PF_ANY) SETPAGEFLAG(Young, young, PF_ANY) TESTCLEARFLAG(Young, young, PF_ANY) diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index cfce186f0c4e..c9cbc9756011 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -19,7 +19,7 @@ struct page_ext_operations { enum page_ext_flags { PAGE_EXT_OWNER, PAGE_EXT_OWNER_ALLOCATED, -#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, #endif diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 1e894d34bdce..d8a6aecf99cb 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -6,7 +6,7 @@ #include #include -#ifdef CONFIG_IDLE_PAGE_TRACKING +#ifdef CONFIG_PAGE_IDLE_FLAG #ifdef CONFIG_64BIT static inline bool page_is_young(struct page *page) @@ -106,7 +106,7 @@ static inline void clear_page_idle(struct page *page) } #endif /* CONFIG_64BIT */ -#else /* !CONFIG_IDLE_PAGE_TRACKING */ +#else /* !CONFIG_PAGE_IDLE_FLAG */ static inline bool page_is_young(struct page *page) { @@ -135,6 +135,6 @@ static inline void clear_page_idle(struct page *page) { } -#endif /* CONFIG_IDLE_PAGE_TRACKING */ +#endif /* CONFIG_PAGE_IDLE_FLAG */ #endif /* _LINUX_MM_PAGE_IDLE_H */ diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 5fb752034386..4d182c32071b 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -73,7 +73,7 @@ #define IF_HAVE_PG_HWPOISON(flag,string) #endif -#if 
defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string} #else #define IF_HAVE_PG_IDLE(flag,string) diff --git a/mm/Kconfig b/mm/Kconfig index 19fe2251c87a..044317ef9143 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -761,10 +761,18 @@ config DEFERRED_STRUCT_PAGE_INIT lifetime of the system until these kthreads finish the initialisation. +config PAGE_IDLE_FLAG + bool "Add PG_idle and PG_young flags" + help + This feature adds PG_idle and PG_young flags in 'struct page'. PTE + Accessed bit writers can set the state of the bit in the flags to let + other PTE Accessed bit readers don't disturbed. + config IDLE_PAGE_TRACKING bool "Enable idle page tracking" depends on SYSFS && MMU select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG help This feature allows to estimate the amount of user pages that have not been touched during a given period of time. This information can diff --git a/mm/page_ext.c b/mm/page_ext.c index a3616f7a0e9e..f9a6ff65ac0a 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -58,11 +58,21 @@ * can utilize this callback to initialize the state of it correctly. 
*/ +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) +static bool need_page_idle(void) +{ + return true; +} +struct page_ext_operations page_idle_ops = { + .need = need_page_idle, +}; +#endif + static struct page_ext_operations *page_ext_ops[] = { #ifdef CONFIG_PAGE_OWNER &page_owner_ops, #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) &page_idle_ops, #endif }; diff --git a/mm/page_idle.c b/mm/page_idle.c index 057c61df12db..144fb4ed961d 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -211,16 +211,6 @@ static const struct attribute_group page_idle_attr_group = { .name = "page_idle", }; -#ifndef CONFIG_64BIT -static bool need_page_idle(void) -{ - return true; -} -struct page_ext_operations page_idle_ops = { - .need = need_page_idle, -}; -#endif - static int __init page_idle_init(void) { int err; From patchwork Tue Oct 20 08:59:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846099 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 246101580 for ; Tue, 20 Oct 2020 09:06:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A04C82242E for ; Tue, 20 Oct 2020 09:06:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="bgXwCapS" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A04C82242E Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9B4DD6B0072; Tue, 20 Oct 2020 05:06:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org 
bh=wlhigpqUWnPzGdNMOvD3bnUTVB5Oom4xoVgYUvo94Cs=; b=bgXwCapS9ES5/XZ9R8HhYplFSGg0WTWylaMNIVrXY3Us1/xTpQ11tvFp oyek3+MDbF3SIqvegebZplHf0vBlM5H3doKbsA5WKrkizOiWiOG2y9vcM X6SF5/9WYyuii+Lyhc1gBm6AyGtFJqugCS6OjVOLtnq3X4xFQUkSJ3ztC A=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="86290293" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-2c-579b7f5b.us-west-2.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9102.sea19.amazon.com with ESMTP; 20 Oct 2020 09:06:31 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan3.pdx.amazon.com [10.170.41.166]) by email-inbound-relay-2c-579b7f5b.us-west-2.amazon.com (Postfix) with ESMTPS id B8ED7A1BC3; Tue, 20 Oct 2020 09:02:58 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:02:37 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v22 06/18] mm/damon: Implement primitives for the virtual memory address spaces Date: Tue, 20 Oct 2020 10:59:28 +0200 Message-ID: <20201020085940.13875-7-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by 
simply configuring the implementation to be used by DAMON.

The low level primitives for the fundamental access monitoring are defined in two parts:

1. Identification of the monitoring target address range for the address space.
2. Access check of a specific address range in the target space.

The reference implementation for the virtual address space provided by this commit is designed as below.

PTE Accessed-bit Based Access Check
-----------------------------------

The implementation uses the PTE Accessed bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb other kernel subsystems using the Accessed bits, namely Idle page tracking and the reclaim logic. To avoid such disturbances, this implementation is made mutually exclusive with Idle page tracking and uses the ``PG_idle`` and ``PG_young`` page flags to resolve the conflict with the reclaim logic, as Idle page tracking does.

VMA-based Target Address Range Construction
-------------------------------------------

Only small parts of the very large virtual address space of a process are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required and could even incur high overhead in some cases. That said, overly large unmapped areas inside the monitoring target should be removed so that the adaptive mechanism does not spend time on them.

For this reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space.
In most cases, the two biggest unmapped areas are the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack. Because these gaps are exceptionally large in typical address spaces, excluding them is sufficient to make a reasonable trade-off. The diagram below shows this in detail::

    (small mmap()-ed regions and munmap()-ed regions)

Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 14 + mm/damon/Kconfig | 10 + mm/damon/Makefile | 1 + mm/damon/primitives.c | 582 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 607 insertions(+) create mode 100644 mm/damon/primitives.c diff --git a/include/linux/damon.h b/include/linux/damon.h index b8562814751e..70cc4b54212e 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -238,4 +238,18 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_PRIMITIVES + +/* Reference callback implementations for virtual memory */ +void damon_va_init_regions(struct damon_ctx *ctx); +void damon_va_update_regions(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(struct damon_target *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_PRIMITIVES */ + + #endif diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..0d2a18ddb9d8 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,14 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information.
+config DAMON_PRIMITIVES + bool "Monitoring primitives for virtual address spaces monitoring" + depends on DAMON && MMU && !IDLE_PAGE_TRACKING + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON. + The primitives support only virtual address spaces. If this cannot + cover your use case, you can implement and use your own primitives. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..2f3235a52e5e 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON) := core.o +obj-$(CONFIG_DAMON_PRIMITIVES) += primitives.o diff --git a/mm/damon/primitives.c b/mm/damon/primitives.c new file mode 100644 index 000000000000..9b603ac0077c --- /dev/null +++ b/mm/damon/primitives.c @@ -0,0 +1,582 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Low Level Primitives for Data Access Monitoring + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-prmt: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Minimal region size. Every damon_region is aligned by this. */ +#define MIN_REGION PAGE_SIZE + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the returned task, unless it is NULL. + */ +#define damon_get_task_struct(t) \ + (get_pid_task((struct pid *)t->id, PIDTYPE_PID)) + +/* + * Get the mm_struct of the given target + * + * Caller _must_ put the mm_struct after use, unless it is NULL. 
+ * + * Returns the mm_struct of the target on success, NULL on failure + */ +static struct mm_struct *damon_get_mm(struct damon_target *t) +{ + struct task_struct *task; + struct mm_struct *mm; + + task = damon_get_task_struct(t); + if (!task) + return NULL; + + mm = get_task_mm(task); + put_task_struct(task); + return mm; +} + +/* + * Primitives for virtual address spaces + */ + +/* + * Functions for the initial monitoring target regions construction + */ + +/* + * Size-evenly split a region into 'nr_pieces' small regions + * + * Returns 0 on success, or negative error code otherwise. + */ +static int damon_va_evenly_split_region(struct damon_ctx *ctx, + struct damon_region *r, unsigned int nr_pieces) +{ + unsigned long sz_orig, sz_piece, orig_end; + struct damon_region *n = NULL, *next; + unsigned long start; + + if (!r || !nr_pieces) + return -EINVAL; + + orig_end = r->ar.end; + sz_orig = r->ar.end - r->ar.start; + sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, MIN_REGION); + + if (!sz_piece) + return -EINVAL; + + r->ar.end = r->ar.start + sz_piece; + next = damon_next_region(r); + for (start = r->ar.end; start + sz_piece <= orig_end; + start += sz_piece) { + n = damon_new_region(start, start + sz_piece); + if (!n) + return -ENOMEM; + damon_insert_region(n, r, next); + r = n; + } + /* complement last region for possible rounding error */ + if (n) + n->ar.end = orig_end; + + return 0; +} + +static unsigned long sz_range(struct damon_addr_range *r) +{ + return r->end - r->start; +} + +static void swap_ranges(struct damon_addr_range *r1, + struct damon_addr_range *r2) +{ + struct damon_addr_range tmp; + + tmp = *r1; + *r1 = *r2; + *r2 = tmp; +} + +/* + * Find three regions separated by two biggest unmapped regions + * + * vma the head vma of the target address space + * regions an array of three address ranges that results will be saved + * + * This function receives an address space and finds three regions in it which + * separated by the two biggest unmapped regions 
in the space. Please refer to + * below comments of '__damon_va_init_regions()' function to know why this is + * necessary. + * + * Returns 0 if success, or negative error code otherwise. + */ +static int __damon_va_three_regions(struct vm_area_struct *vma, + struct damon_addr_range regions[3]) +{ + struct damon_addr_range gap = {0}, first_gap = {0}, second_gap = {0}; + struct vm_area_struct *last_vma = NULL; + unsigned long start = 0; + struct rb_root rbroot; + + /* Find two biggest gaps so that first_gap > second_gap > others */ + for (; vma; vma = vma->vm_next) { + if (!last_vma) { + start = vma->vm_start; + goto next; + } + + if (vma->rb_subtree_gap <= sz_range(&second_gap)) { + rbroot.rb_node = &vma->vm_rb; + vma = rb_entry(rb_last(&rbroot), + struct vm_area_struct, vm_rb); + goto next; + } + + gap.start = last_vma->vm_end; + gap.end = vma->vm_start; + if (sz_range(&gap) > sz_range(&second_gap)) { + swap_ranges(&gap, &second_gap); + if (sz_range(&second_gap) > sz_range(&first_gap)) + swap_ranges(&second_gap, &first_gap); + } +next: + last_vma = vma; + } + + if (!sz_range(&second_gap) || !sz_range(&first_gap)) + return -EINVAL; + + /* Sort the two biggest gaps by address */ + if (first_gap.start > second_gap.start) + swap_ranges(&first_gap, &second_gap); + + /* Store the result */ + regions[0].start = ALIGN(start, MIN_REGION); + regions[0].end = ALIGN(first_gap.start, MIN_REGION); + regions[1].start = ALIGN(first_gap.end, MIN_REGION); + regions[1].end = ALIGN(second_gap.start, MIN_REGION); + regions[2].start = ALIGN(second_gap.end, MIN_REGION); + regions[2].end = ALIGN(last_vma->vm_end, MIN_REGION); + + return 0; +} + +/* + * Get the three regions in the given target (task) + * + * Returns 0 on success, negative error code otherwise. 
+ */ +static int damon_va_three_regions(struct damon_target *t, + struct damon_addr_range regions[3]) +{ + struct mm_struct *mm; + int rc; + + mm = damon_get_mm(t); + if (!mm) + return -EINVAL; + + mmap_read_lock(mm); + rc = __damon_va_three_regions(mm->mmap, regions); + mmap_read_unlock(mm); + + mmput(mm); + return rc; +} + +/* + * Initialize the monitoring target regions for the given target (task) + * + * t the given target + * + * Because only a number of small portions of the entire address space + * is actually mapped to the memory and accessed, monitoring the unmapped + * regions is wasteful. That said, because we can deal with small noises, + * tracking every mapping is not strictly required but could even incur a high + * overhead if the mapping frequently changes or the number of mappings is + * high. The adaptive regions adjustment mechanism will further help to deal + * with the noise by simply identifying the unmapped areas as a region that + * has no access. Moreover, applying the real mappings that would have many + * unmapped areas inside will make the adaptive mechanism quite complex. That + * said, too huge unmapped areas inside the monitoring target should be removed + * to not take the time for the adaptive mechanism. + * + * For the reason, we convert the complex mappings to three distinct regions + * that cover every mapped area of the address space. Also the two gaps + * between the three regions are the two biggest unmapped areas in the given + * address space. In detail, this function first identifies the start and the + * end of the mappings and the two biggest unmapped areas of the address space. 
+ * Then, it constructs the three regions as below: + * + * [mappings[0]->start, big_two_unmapped_areas[0]->start) + * [big_two_unmapped_areas[0]->end, big_two_unmapped_areas[1]->start) + * [big_two_unmapped_areas[1]->end, mappings[nr_mappings - 1]->end) + * + * As usual memory map of processes is as below, the gap between the heap and + * the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed + * region and the stack will be two biggest unmapped regions. Because these + * gaps are exceptionally huge areas in usual address space, excluding these + * two biggest unmapped regions will be sufficient to make a trade-off. + * + * + * + * + * (other mmap()-ed regions and small unmapped regions) + * + * + * + */ +static void __damon_va_init_regions(struct damon_ctx *c, + struct damon_target *t) +{ + struct damon_region *r; + struct damon_addr_range regions[3]; + unsigned long sz = 0, nr_pieces; + int i; + + if (damon_va_three_regions(t, regions)) { + pr_err("Failed to get three regions of target %lu\n", t->id); + return; + } + + for (i = 0; i < 3; i++) + sz += regions[i].end - regions[i].start; + if (c->min_nr_regions) + sz /= c->min_nr_regions; + if (sz < MIN_REGION) + sz = MIN_REGION; + + /* Set the initial three regions of the target */ + for (i = 0; i < 3; i++) { + r = damon_new_region(regions[i].start, regions[i].end); + if (!r) { + pr_err("%d'th init region creation failed\n", i); + return; + } + damon_add_region(r, t); + + nr_pieces = (regions[i].end - regions[i].start) / sz; + damon_va_evenly_split_region(c, r, nr_pieces); + } +} + +/* Initialize '->regions_list' of every target (task) */ +void damon_va_init_regions(struct damon_ctx *ctx) +{ + struct damon_target *t; + + damon_for_each_target(t, ctx) { + /* the user may set the target regions as they want */ + if (!damon_nr_regions(t)) + __damon_va_init_regions(ctx, t); + } +} + +/* + * Functions for the dynamic monitoring target regions update + */ + +/* + * Check whether a region is 
intersecting an address range + * + * Returns true if it is. + */ +static bool damon_intersect(struct damon_region *r, struct damon_addr_range *re) +{ + return !(r->ar.end <= re->start || re->end <= r->ar.start); +} + +/* + * Update damon regions for the three big regions of the given target + * + * t the given target + * bregions the three big regions of the target + */ +static void damon_va_apply_three_regions(struct damon_ctx *ctx, + struct damon_target *t, struct damon_addr_range bregions[3]) +{ + struct damon_region *r, *next; + unsigned int i = 0; + + /* Remove regions which are not in the three big regions now */ + damon_for_each_region_safe(r, next, t) { + for (i = 0; i < 3; i++) { + if (damon_intersect(r, &bregions[i])) + break; + } + if (i == 3) + damon_destroy_region(r); + } + + /* Adjust intersecting regions to fit with the three big regions */ + for (i = 0; i < 3; i++) { + struct damon_region *first = NULL, *last; + struct damon_region *newr; + struct damon_addr_range *br; + + br = &bregions[i]; + /* Get the first and last regions which intersects with br */ + damon_for_each_region(r, t) { + if (damon_intersect(r, br)) { + if (!first) + first = r; + last = r; + } + if (r->ar.start >= br->end) + break; + } + if (!first) { + /* no damon_region intersects with this big region */ + newr = damon_new_region( + ALIGN_DOWN(br->start, MIN_REGION), + ALIGN(br->end, MIN_REGION)); + if (!newr) + continue; + damon_insert_region(newr, damon_prev_region(r), r); + } else { + first->ar.start = ALIGN_DOWN(br->start, MIN_REGION); + last->ar.end = ALIGN(br->end, MIN_REGION); + } + } +} + +/* + * Update regions for current memory mappings + */ +void damon_va_update_regions(struct damon_ctx *ctx) +{ + struct damon_addr_range three_regions[3]; + struct damon_target *t; + + damon_for_each_target(t, ctx) { + if (damon_va_three_regions(t, three_regions)) + continue; + damon_va_apply_three_regions(ctx, t, three_regions); + } +} + +/* + * Functions for the access checking of the 
regions + */ + +static void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, + unsigned long addr) +{ + bool referenced = false; + struct page *page = pte_page(*pte); + + if (pte_young(*pte)) { + referenced = true; + *pte = pte_mkold(*pte); + } + +#ifdef CONFIG_MMU_NOTIFIER + if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE)) + referenced = true; +#endif /* CONFIG_MMU_NOTIFIER */ + + if (referenced) + set_page_young(page); + + set_page_idle(page); +} + +static void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, + unsigned long addr) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + bool referenced = false; + struct page *page = pmd_page(*pmd); + + if (pmd_young(*pmd)) { + referenced = true; + *pmd = pmd_mkold(*pmd); + } + +#ifdef CONFIG_MMU_NOTIFIER + if (mmu_notifier_clear_young(mm, addr, + addr + ((1UL) << HPAGE_PMD_SHIFT))) + referenced = true; +#endif /* CONFIG_MMU_NOTIFIER */ + + if (referenced) + set_page_young(page); + + set_page_idle(page); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +} + +static void damon_va_mkold(struct mm_struct *mm, unsigned long addr) +{ + pte_t *pte = NULL; + pmd_t *pmd = NULL; + spinlock_t *ptl; + + if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl)) + return; + + if (pte) { + damon_ptep_mkold(pte, mm, addr); + pte_unmap_unlock(pte, ptl); + } else { + damon_pmdp_mkold(pmd, mm, addr); + spin_unlock(ptl); + } +} + +static void damon_va_prepare_access_check(struct damon_ctx *ctx, + struct mm_struct *mm, struct damon_region *r) +{ + r->sampling_addr = damon_rand(r->ar.start, r->ar.end); + + damon_va_mkold(mm, r->sampling_addr); +} + +void damon_va_prepare_access_checks(struct damon_ctx *ctx) +{ + struct damon_target *t; + struct mm_struct *mm; + struct damon_region *r; + + damon_for_each_target(t, ctx) { + mm = damon_get_mm(t); + if (!mm) + continue; + damon_for_each_region(r, t) + damon_va_prepare_access_check(ctx, mm, r); + mmput(mm); + } +} + +static bool damon_va_young(struct mm_struct *mm, unsigned long addr, + unsigned long 
*page_sz) +{ + pte_t *pte = NULL; + pmd_t *pmd = NULL; + spinlock_t *ptl; + bool young = false; + + if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl)) + return false; + + *page_sz = PAGE_SIZE; + if (pte) { + young = pte_young(*pte); + if (!young) + young = !page_is_idle(pte_page(*pte)); + pte_unmap_unlock(pte, ptl); + return young; + } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + young = pmd_young(*pmd); + if (!young) + young = !page_is_idle(pmd_page(*pmd)); + spin_unlock(ptl); + *page_sz = ((1UL) << HPAGE_PMD_SHIFT); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + + return young; +} + +/* + * Check whether the region was accessed after the last preparation + * + * mm 'mm_struct' for the given virtual address space + * r the region to be checked + */ +static void damon_va_check_access(struct damon_ctx *ctx, + struct mm_struct *mm, struct damon_region *r) +{ + static struct mm_struct *last_mm; + static unsigned long last_addr; + static unsigned long last_page_sz = PAGE_SIZE; + static bool last_accessed; + + /* If the region is in the last checked page, reuse the result */ + if (mm == last_mm && (ALIGN_DOWN(last_addr, last_page_sz) == + ALIGN_DOWN(r->sampling_addr, last_page_sz))) { + if (last_accessed) + r->nr_accesses++; + return; + } + + last_accessed = damon_va_young(mm, r->sampling_addr, &last_page_sz); + if (last_accessed) + r->nr_accesses++; + + last_mm = mm; + last_addr = r->sampling_addr; +} + +unsigned int damon_va_check_accesses(struct damon_ctx *ctx) +{ + struct damon_target *t; + struct mm_struct *mm; + struct damon_region *r; + unsigned int max_nr_accesses = 0; + + damon_for_each_target(t, ctx) { + mm = damon_get_mm(t); + if (!mm) + continue; + damon_for_each_region(r, t) { + damon_va_check_access(ctx, mm, r); + max_nr_accesses = max(r->nr_accesses, max_nr_accesses); + } + mmput(mm); + } + + return max_nr_accesses; +} + +/* + * Functions for the target validity check and cleanup + */ + +bool damon_va_target_valid(struct damon_target *t) +{ + struct 
task_struct *task; + + task = damon_get_task_struct(t); + if (task) { + put_task_struct(task); + return true; + } + + return false; +} + +void damon_va_cleanup(struct damon_ctx *ctx) +{ + struct damon_target *t, *next; + + damon_for_each_target_safe(t, next, ctx) { + put_pid((struct pid *)t->id); + damon_destroy_target(t); + } +} + +void damon_va_set_primitives(struct damon_ctx *ctx) +{ + ctx->primitive.init_target_regions = damon_va_init_regions; + ctx->primitive.update_target_regions = damon_va_update_regions; + ctx->primitive.prepare_access_checks = damon_va_prepare_access_checks; + ctx->primitive.check_accesses = damon_va_check_accesses; + ctx->primitive.target_valid = damon_va_target_valid; + ctx->primitive.cleanup = damon_va_cleanup; +} From patchwork Tue Oct 20 08:59:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846101 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 19DE215E6 for ; Tue, 20 Oct 2020 09:06:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BF74C222C8 for ; Tue, 20 Oct 2020 09:06:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="vixnYMm3" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BF74C222C8 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 12B346B0073; Tue, 20 Oct 2020 05:06:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0B4C36B0074; Tue, 20 Oct 2020 05:06:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org 
X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D87C76B0075; Tue, 20 Oct 2020 05:06:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0150.hostedemail.com [216.40.44.150]) by kanga.kvack.org (Postfix) with ESMTP id A22866B0073 for ; Tue, 20 Oct 2020 05:06:35 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 4A79DAC05 for ; Tue, 20 Oct 2020 09:06:35 +0000 (UTC) X-FDA: 77391723150.29.cable36_380e2472723e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id 2F4DE180868D8 for ; Tue, 20 Oct 2020 09:06:35 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=55584ce82=sjpark@amazon.com,,RULES_HIT:30029:30046:30051:30054:30064:30070:30074:30075,0,RBL:52.95.49.90:@amazon.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10;04y8rx6baqfzi7gj4zt6ondtcxm39yc8j4setp3ows1torqe5m8c8q5tqryus35.y98cehjjnq34ywsnyq9ba7skmkchhs15wxkpi9yke8rot45gsceu6xwd718qynk.n-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: cable36_380e2472723e X-Filterd-Recvd-Size: 8780 Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 09:06:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1603184794; x=1634720794; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=tdNdw6IfHLO/vMz+dCnfV6saHBkcPkuM1L7pDo6BFss=; b=vixnYMm3BlrXeaZPw31sjE0m1un5f0cc0iw2F6xgbk259ZjIlb4VhQ5S L2ufD3S93Q2nXXuYkr4PCkJJTdy1MhvB2yYHD+3PUZxtCNBu/nVPQ7AaH 
TW+VkQbMfo4gzT3Bkv90c10c9SGM0Uv/1rVjLrpZx8BtGbe8rLHsM0cFr U=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="60708464" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-2c-579b7f5b.us-west-2.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 20 Oct 2020 09:06:30 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan3.pdx.amazon.com [10.170.41.166]) by email-inbound-relay-2c-579b7f5b.us-west-2.amazon.com (Postfix) with ESMTPS id D1604A1EED; Tue, 20 Oct 2020 09:03:22 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:03:03 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v22 07/18] mm/page_idle: Avoid interferences from concurrent users Date: Tue, 20 Oct 2020 10:59:29 +0200 Message-ID: <20201020085940.13875-8-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Concurrent Idle Page Tracking users can interfere with each other because the interface doesn't provide a central rule for synchronization between the users. Users could implement their own synchronization rule, but even in that case, applications developed by different users would not know how to synchronize with others. To help with this situation, this commit introduces a centralized synchronization infrastructure for Idle Page Tracking.
In detail, this commit introduces a mutex lock for Idle Page Tracking, called 'page_idle_lock'. It is exposed to user space via a new bool sysfs file, '/sys/kernel/mm/page_idle/lock'. By writing to and reading from the file, users can hold/release the mutex and read its status. Writes to the Idle Page Tracking 'bitmap' file fail if the lock is not held, while reads of the file can be done regardless of the lock status. Note that users could still interfere with each other if they abuse this locking rule. Nevertheless, this change will let them notice the rule. Signed-off-by: SeongJae Park --- .../admin-guide/mm/idle_page_tracking.rst | 22 +++++++--- mm/page_idle.c | 40 +++++++++++++++++++ 2 files changed, 56 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/mm/idle_page_tracking.rst b/Documentation/admin-guide/mm/idle_page_tracking.rst index df9394fb39c2..3f5e7a8b5b78 100644 --- a/Documentation/admin-guide/mm/idle_page_tracking.rst +++ b/Documentation/admin-guide/mm/idle_page_tracking.rst @@ -21,13 +21,13 @@ User API ======== The idle page tracking API is located at ``/sys/kernel/mm/page_idle``. -Currently, it consists of the only read-write file, -``/sys/kernel/mm/page_idle/bitmap``. +Currently, it consists of two read-write files, +``/sys/kernel/mm/page_idle/bitmap`` and ``/sys/kernel/mm/page_idle/lock``. -The file implements a bitmap where each bit corresponds to a memory page. The -bitmap is represented by an array of 8-byte integers, and the page at PFN #i is -mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is -set, the corresponding page is idle. +The ``bitmap`` file implements a bitmap where each bit corresponds to a memory +page. The bitmap is represented by an array of 8-byte integers, and the page at +PFN #i is mapped to bit #i%64 of array element #i/64, byte order is native. +When a bit is set, the corresponding page is idle.
A page is considered idle if it has not been accessed since it was marked idle (for more details on what "accessed" actually means see the :ref:`Implementation @@ -74,6 +74,16 @@ See :ref:`Documentation/admin-guide/mm/pagemap.rst ` for more information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``. +The ``lock`` file is for avoidance of interference from concurrent users. If +the content of the ``lock`` file is ``1``, it means the ``bitmap`` file is +currently being used by someone. While the content of the ``lock`` file is +``1``, writing ``1`` to the file fails. Therefore, users should first +successfully write ``1`` to the ``lock`` file before starting to use the ``bitmap`` +file and write ``0`` to the ``lock`` file after they finish using the +``bitmap`` file. If a user writes the ``bitmap`` file while the ``lock`` is +``0``, the write fails. Meanwhile, reads of the ``bitmap`` file succeed +regardless of the ``lock`` status. + .. _impl_details: Implementation Details diff --git a/mm/page_idle.c b/mm/page_idle.c index 144fb4ed961d..0aa45f848570 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -16,6 +16,8 @@ #define BITMAP_CHUNK_SIZE sizeof(u64) #define BITMAP_CHUNK_BITS (BITMAP_CHUNK_SIZE * BITS_PER_BYTE) +static DEFINE_MUTEX(page_idle_lock); + /* * Idle page tracking only considers user memory pages, for other types of * pages the idle flag is always unset and an attempt to set it is silently @@ -169,6 +171,9 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj, unsigned long pfn, end_pfn; int bit; + if (!mutex_is_locked(&page_idle_lock)) + return -EPERM; + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE) return -EINVAL; @@ -197,17 +202,52 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj, return (char *)in - buf; } +static ssize_t page_idle_lock_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", 
mutex_is_locked(&page_idle_lock)); +} + +static ssize_t page_idle_lock_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + bool do_lock; + int ret; + + ret = kstrtobool(buf, &do_lock); + if (ret < 0) + return ret; + + if (do_lock) { + if (!mutex_trylock(&page_idle_lock)) + return -EBUSY; + } else { + mutex_unlock(&page_idle_lock); + } + + return count; +} + static struct bin_attribute page_idle_bitmap_attr = __BIN_ATTR(bitmap, 0600, page_idle_bitmap_read, page_idle_bitmap_write, 0); +static struct kobj_attribute page_idle_lock_attr = + __ATTR(lock, 0600, page_idle_lock_show, page_idle_lock_store); + static struct bin_attribute *page_idle_bin_attrs[] = { &page_idle_bitmap_attr, NULL, }; +static struct attribute *page_idle_lock_attrs[] = { + &page_idle_lock_attr.attr, + NULL, +}; + static const struct attribute_group page_idle_attr_group = { .bin_attrs = page_idle_bin_attrs, + .attrs = page_idle_lock_attrs, .name = "page_idle", }; From patchwork Tue Oct 20 08:59:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846089 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5CC31580 for ; Tue, 20 Oct 2020 09:04:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8485D222C8 for ; Tue, 20 Oct 2020 09:04:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="A5YSAzyu" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8485D222C8 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 43CB86B0062; 
Tue, 20 Oct 2020 05:04:09 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 413B46B0070; Tue, 20 Oct 2020 05:04:09 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2E4A16B0062; Tue, 20 Oct 2020 05:04:09 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0209.hostedemail.com [216.40.44.209]) by kanga.kvack.org (Postfix) with ESMTP id E04146B0062 for ; Tue, 20 Oct 2020 05:04:08 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 675BCA8DD for ; Tue, 20 Oct 2020 09:04:08 +0000 (UTC) X-FDA: 77391716976.12.goat52_1c106a22723e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 302381801BD7B for ; Tue, 20 Oct 2020 09:04:08 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=55584ce82=sjpark@amazon.com,,RULES_HIT:30003:30051:30054:30064,0,RBL:207.171.190.10:@amazon.com:.lbl8.mailshell.net-66.10.201.10 62.18.0.100;04yfxz5tahb456891ck83kz9gbtikoc9o3e1jizsn5o69bkps4hf9onztotcpiq.857q4g567z3k881qoahn5op4szjaf4o7yshqtxnk8uqhjjssy3491s6935akmwh.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: goat52_1c106a22723e X-Filterd-Recvd-Size: 5812 Received: from smtp-fw-33001.amazon.com (smtp-fw-33001.amazon.com [207.171.190.10]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 09:04:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1603184648; x=1634720648; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version; bh=ex0dyR52pQlWbtLTi7+DCeRnQUrgwEcr4wvrGWXwkYQ=; b=A5YSAzyum2Tux7zJ66NfEzlWz29JnVbxJA5u9ZKzwp7veNdeD84nr6vI N18RuFwCI3QLMA6POSh/f/exPphBOCuyG39+MXuW16tqcR8+eMPQ9UBTg GWXxPgzgHBN6XfoOD0vV02AJ//TcN72ktkNOzT70x+iuJRnuJ5rRdEBGD c=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="85042254" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-2c-456ef9c9.us-west-2.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 20 Oct 2020 09:04:07 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan3.pdx.amazon.com [10.170.41.166]) by email-inbound-relay-2c-456ef9c9.us-west-2.amazon.com (Postfix) with ESMTPS id B46DAA88A7; Tue, 20 Oct 2020 09:04:02 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:03:44 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v22 08/18] mm/damon/primitives: Make coexistable with Idle Page Tracking Date: Tue, 20 Oct 2020 10:59:30 +0200 Message-ID: <20201020085940.13875-9-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park DAMON's reference 'primitives' internally use 'PG_Idle' flag. 
Because the flag is also used by Idle Page Tracking and there was no way to synchronize with it, the 'primitives' were previously configured to be mutually exclusive with Idle Page Tracking. However, as we can now synchronize with Idle Page Tracking using 'page_idle_lock', this commit makes the primitives respect the lock and thus able to coexist with Idle Page Tracking. Note that the 'primitives' do not acquire the lock themselves; they only require the users to do the synchronization (that is, to hold 'page_idle_lock') by themselves. Signed-off-by: SeongJae Park --- include/linux/page_idle.h | 2 ++ mm/damon/Kconfig | 2 +- mm/damon/primitives.c | 7 +++++++ mm/page_idle.c | 2 +- 4 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index d8a6aecf99cb..bcbb965b566c 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -8,6 +8,8 @@ #ifdef CONFIG_PAGE_IDLE_FLAG +extern struct mutex page_idle_lock; + #ifdef CONFIG_64BIT static inline bool page_is_young(struct page *page) { diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 0d2a18ddb9d8..63b9c905b548 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -14,7 +14,7 @@ config DAMON config DAMON_PRIMITIVES bool "Monitoring primitives for virtual address spaces monitoring" - depends on DAMON && MMU && !IDLE_PAGE_TRACKING + depends on DAMON && MMU select PAGE_EXTENSION if !64BIT select PAGE_IDLE_FLAG help diff --git a/mm/damon/primitives.c b/mm/damon/primitives.c index 9b603ac0077c..59f4de703413 100644 --- a/mm/damon/primitives.c +++ b/mm/damon/primitives.c @@ -15,6 +15,10 @@ #include #include +#ifndef CONFIG_IDLE_PAGE_TRACKING +DEFINE_MUTEX(page_idle_lock); +#endif + /* Minimal region size. Every damon_region is aligned by this. 
*/ #define MIN_REGION PAGE_SIZE @@ -552,6 +556,9 @@ bool damon_va_target_valid(struct damon_target *t) { struct task_struct *task; + if (!mutex_is_locked(&page_idle_lock)) + return false; + task = damon_get_task_struct(t); if (task) { put_task_struct(task); diff --git a/mm/page_idle.c b/mm/page_idle.c index 0aa45f848570..958dcc18f6cd 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -16,7 +16,7 @@ #define BITMAP_CHUNK_SIZE sizeof(u64) #define BITMAP_CHUNK_BITS (BITMAP_CHUNK_SIZE * BITS_PER_BYTE) -static DEFINE_MUTEX(page_idle_lock); +DEFINE_MUTEX(page_idle_lock); /* * Idle page tracking only considers user memory pages, for other types of From patchwork Tue Oct 20 08:59:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846115 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 619511580 for ; Tue, 20 Oct 2020 09:08:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0FD98223C6 for ; Tue, 20 Oct 2020 09:08:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="kmP+S9RM" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0FD98223C6 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2FB976B0071; Tue, 20 Oct 2020 05:08:04 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 283D26B0073; Tue, 20 Oct 2020 05:08:04 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 
1722F6B0074; Tue, 20 Oct 2020 05:08:04 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0082.hostedemail.com [216.40.44.82]) by kanga.kvack.org (Postfix) with ESMTP id CD0FE6B0071 for ; Tue, 20 Oct 2020 05:08:03 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 71359181AC9CB for ; Tue, 20 Oct 2020 09:08:03 +0000 (UTC) X-FDA: 77391726846.28.ball78_080bafb2723e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id 53407B9FC for ; Tue, 20 Oct 2020 09:08:03 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=55584ce82=sjpark@amazon.com,,RULES_HIT:30054:30056:30064:30080,0,RBL:207.171.184.25:@amazon.com:.lbl8.mailshell.net-62.18.0.100 66.10.201.10;04y8r5ss6jnrx8w351uhnrr5q7obxyc3ac9g46599owp3nafbxz33j1167z1ub9.xa1pwzr1odfnnpktetbtdgq1ggu5h3jt8ne6ptp5ap16kn9odyr3bfn1gyoh5yb.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: ball78_080bafb2723e X-Filterd-Recvd-Size: 5769 Received: from smtp-fw-9101.amazon.com (smtp-fw-9101.amazon.com [207.171.184.25]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 09:08:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1603184883; x=1634720883; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=rWHwBW9odJPqpm8jQnNHAqv59PCTaiCF9pp/UZsKxQo=; b=kmP+S9RMjLxI8hq/O6oNaK84x0O6sljFDG/5qJdN/w/acmDEx37FJ40x eNlWyWuQ06sdMSfz0JOaVkfz2kLTXOShYG3qGBg9UvC3Jv+BPKXlgt9EL 05IxAXl+dnKiNbH35RHQ89HtQKq/Vfk1XwfWLR9v4AoEhGaSlFOCqJzzJ w=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="78104561" Received: from 
sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-2a-538b0bfb.us-west-2.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 20 Oct 2020 09:08:01 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan3.pdx.amazon.com [10.170.41.166]) by email-inbound-relay-2a-538b0bfb.us-west-2.amazon.com (Postfix) with ESMTPS id EEB06A180E; Tue, 20 Oct 2020 09:04:48 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:04:24 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v22 09/18] mm/damon: Add a tracepoint Date: Tue, 20 Oct 2020 10:59:31 +0200 Message-ID: <20201020085940.13875-10-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can easily be integrated with tools that support tracepoints, such as perf.
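As an illustration of how a user space tool might consume this tracepoint, below is a minimal sketch that parses the event payload. The scan format mirrors the TP_printk() format string of 'damon_aggregated' in this patch. The struct and function names are hypothetical and not part of the patch, and the sketch assumes the caller has already stripped whatever prefix the tracing tool puts in front of the payload.

```c
#include <stdio.h>

/* Hypothetical record type; fields mirror TP_STRUCT__entry of the event. */
struct damon_aggr_record {
	unsigned long target_id;
	unsigned int nr_regions;
	unsigned long start;
	unsigned long end;
	unsigned int nr_accesses;
};

/*
 * Hypothetical helper: parse one 'damon_aggregated' payload string.
 * Returns 0 if all five fields were read, -1 otherwise.
 */
int damon_parse_aggregated(const char *payload, struct damon_aggr_record *rec)
{
	int nr_read;

	/* Same layout as the TP_printk() format in this patch. */
	nr_read = sscanf(payload, "target_id=%lu nr_regions=%u %lu-%lu: %u",
			 &rec->target_id, &rec->nr_regions,
			 &rec->start, &rec->end, &rec->nr_accesses);

	return nr_read == 5 ? 0 : -1;
}
```

A caller would pass each payload line to damon_parse_aggregated() and, for example, accumulate nr_accesses per address range.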
Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster Reviewed-by: Steven Rostedt (VMware) --- include/trace/events/damon.h | 43 ++++++++++++++++++++++++++++++++++++ mm/damon/core.c | 7 +++++- 2 files changed, 49 insertions(+), 1 deletion(-) create mode 100644 include/trace/events/damon.h diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h new file mode 100644 index 000000000000..2f422f4f1fb9 --- /dev/null +++ b/include/trace/events/damon.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM damon + +#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DAMON_H + +#include +#include +#include + +TRACE_EVENT(damon_aggregated, + + TP_PROTO(struct damon_target *t, struct damon_region *r, + unsigned int nr_regions), + + TP_ARGS(t, r, nr_regions), + + TP_STRUCT__entry( + __field(unsigned long, target_id) + __field(unsigned int, nr_regions) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + ), + + TP_fast_assign( + __entry->target_id = t->id; + __entry->nr_regions = nr_regions; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + ), + + TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u", + __entry->target_id, __entry->nr_regions, + __entry->start, __entry->end, __entry->nr_accesses) +); + +#endif /* _TRACE_DAMON_H */ + +/* This part must be outside protection */ +#include diff --git a/mm/damon/core.c b/mm/damon/core.c index 36428327e848..d7957d8ff530 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -13,6 +13,9 @@ #include #include +#define CREATE_TRACE_POINTS +#include + /* Minimal region size. Every damon_region is aligned by this. 
*/ #define MIN_REGION PAGE_SIZE @@ -386,8 +389,10 @@ static void kdamond_reset_aggregated(struct damon_ctx *c) damon_for_each_target(t, c) { struct damon_region *r; - damon_for_each_region(r, t) + damon_for_each_region(r, t) { + trace_damon_aggregated(t, r, damon_nr_regions(t)); r->nr_accesses = 0; + } } } From patchwork Tue Oct 20 08:59:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846097 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5A091580 for ; Tue, 20 Oct 2020 09:06:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 55D992237B for ; Tue, 20 Oct 2020 09:06:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="RuEZny06" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 55D992237B Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 76AC06B0071; Tue, 20 Oct 2020 05:06:28 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6F41B6B0072; Tue, 20 Oct 2020 05:06:28 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 56C5A6B0073; Tue, 20 Oct 2020 05:06:28 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0110.hostedemail.com [216.40.44.110]) by kanga.kvack.org (Postfix) with ESMTP id 162C66B0071 for ; Tue, 20 Oct 2020 05:06:28 -0400 (EDT) Received: from 
smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 90532AF8C for ; Tue, 20 Oct 2020 09:06:27 +0000 (UTC) X-FDA: 77391722814.16.ear68_061400a2723e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 6FB171028AFA5 for ; Tue, 20 Oct 2020 09:06:27 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=55584ce82=sjpark@amazon.com,,RULES_HIT:30003:30012:30034:30046:30051:30054:30056:30064:30070:30080,0,RBL:52.95.49.90:@amazon.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100;04yfpxx3rq1ki391jd1jqnzuwcahtyp8mur17k7hzr6i1rimuo8wi77ahuia91t.swpp54989u684toc6meo8zweohmmny8j5izyrnod77tdrdh9158sqkjedpjipbn.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: ear68_061400a2723e X-Filterd-Recvd-Size: 19707 Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 09:06:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1603184787; x=1634720787; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=LsWjPNz3Yj7zyjFT/cDqF7ZBuGzEE0okHQgyvsNcGRk=; b=RuEZny06SeJz99sj9gPCq8Jiuguns1V0zRsJLH3DbR3QqjkAoF546LVI giWuL7iiODHPaI4aztILYiDaZbf1xlb/4tsI7JP/cU0Gsz9vCypwUcsag hBXFZzuHTM/vcOxz3Es7b8CHRmoyy0zjW9LU5/87KDMA9hFpwOn3w/xSq c=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="60708410" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-2b-c300ac87.us-west-2.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 20 Oct 2020 09:06:16 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan2.pdx.amazon.com [10.170.41.162]) by 
email-inbound-relay-2b-c300ac87.us-west-2.amazon.com (Postfix) with ESMTPS id 02439A21C4; Tue, 20 Oct 2020 09:05:08 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:04:50 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v22 10/18] mm/damon: Implement a debugfs-based user space interface Date: Tue, 20 Oct 2020 10:59:32 +0200 Message-ID: <20201020085940.13875-11-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park

DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only a kernel space API. That said, letting user space control DAMON could provide some benefits. For example, it will allow user space to analyze its specific workloads and make its own special optimizations.

For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON API and exports it to user space via debugfs.

'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under its debugfs directory, ``/damon/``.

Attributes
----------

Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file.
For example, the commands below set those values to 5 ms, 100 ms, 1,000 ms, 10, and 1000, and then check them again::

    # cd /damon
    # echo 5000 100000 1000000 10 1000 > attrs
    # cat attrs
    5000 100000 1000000 10 1000

Target IDs
----------

Some types of address spaces support multiple monitoring targets. For example, virtual memory address space monitoring can have multiple processes as its monitoring targets. Users can set the targets by writing the relevant id values of the targets to the ``target_ids`` file, and get the ids of the current targets by reading from it. In the case of virtual address space monitoring, the values should be the pids of the monitoring target processes. For example, the commands below set the processes having pids 42 and 4242 as the monitoring targets and then check them again::

    # cd /damon
    # echo 42 4242 > target_ids
    # cat target_ids
    42 4242

Note that setting the target ids doesn't start the monitoring.

Turning On/Off
--------------

Setting the files as described above has no effect unless you explicitly start the monitoring. You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops it. DAMON also stops if every target is invalidated (in the case of virtual memory monitoring, target processes are invalidated when they terminate). The example commands below turn DAMON on and off, and then check its status::

    # cd /damon
    # echo on > monitor_on
    # echo off > monitor_on
    # cat monitor_on
    off

Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned.
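The ``attrs`` example above implies that the three intervals are given in microseconds (5 ms is written as 5000). A user space tool that prefers milliseconds could build the line as in the following minimal sketch. The helper name and its millisecond parameters are hypothetical and not part of this patch; only the five-field output layout is taken from the ``attrs`` file format.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the line to write to the 'attrs' file.
 * Intervals are taken in milliseconds and converted to microseconds,
 * the unit implied by the example values in this commit message.
 * Returns the value of snprintf().
 */
int damon_format_attrs(char *buf, size_t len,
		       unsigned long sample_ms, unsigned long aggr_ms,
		       unsigned long update_ms,
		       unsigned long min_regions, unsigned long max_regions)
{
	return snprintf(buf, len, "%lu %lu %lu %lu %lu\n",
			sample_ms * 1000, aggr_ms * 1000, update_ms * 1000,
			min_regions, max_regions);
}
```

Calling it with 5, 100, 1000, 10, and 1000 yields the same line as the ``echo`` command in the Attributes example. The resulting buffer would then be written to ``attrs`` while the monitoring is off.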
Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 2 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/core.c | 48 +++++ mm/damon/dbgfs.c | 428 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 488 insertions(+) create mode 100644 mm/damon/dbgfs.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 70cc4b54212e..d675ea908a02 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -226,6 +226,8 @@ void damon_free_target(struct damon_target *t); void damon_destroy_target(struct damon_target *t); unsigned int damon_nr_regions(struct damon_target *t); +struct damon_ctx *damon_new_ctx(void); +void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_targets(struct damon_ctx *ctx, unsigned long *ids, ssize_t nr_ids); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 63b9c905b548..e38f95d28f74 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -22,4 +22,13 @@ config DAMON_PRIMITIVES The primitives support only virtual address spaces. If this cannot cover your use case, you can implement and use your own primitives. +config DAMON_DBGFS + bool "DAMON debugfs interface" + depends on DAMON_PRIMITIVES && DEBUG_FS + help + This builds the debugfs interface for DAMON. The user space admins + can use the interface for arbitrary data access monitoring. + + If unsure, say N. 
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 2f3235a52e5e..2295deb2fe0e 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_DAMON) := core.o obj-$(CONFIG_DAMON_PRIMITIVES) += primitives.o +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o diff --git a/mm/damon/core.c b/mm/damon/core.c index d7957d8ff530..47baf859d7d9 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -135,6 +135,40 @@ unsigned int damon_nr_regions(struct damon_target *t) return nr_regions; } +struct damon_ctx *damon_new_ctx(void) +{ + struct damon_ctx *ctx; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return NULL; + + ctx->sample_interval = 5 * 1000; + ctx->aggr_interval = 100 * 1000; + ctx->regions_update_interval = 1000 * 1000; + ctx->min_nr_regions = 10; + ctx->max_nr_regions = 1000; + + ktime_get_coarse_ts64(&ctx->last_aggregation); + ctx->last_regions_update = ctx->last_aggregation; + + mutex_init(&ctx->kdamond_lock); + + INIT_LIST_HEAD(&ctx->targets_list); + + return ctx; +} + +void damon_destroy_ctx(struct damon_ctx *ctx) +{ + struct damon_target *t, *next_t; + + damon_for_each_target_safe(t, next_t, ctx) + damon_destroy_target(t); + + kfree(ctx); +} + /** * damon_set_targets() - Set monitoring targets. * @ctx: monitoring context @@ -204,6 +238,20 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, return 0; } +/** + * damon_nr_running_ctxs() - Return number of currently running contexts. 
+ */ +int damon_nr_running_ctxs(void) +{ + int nr_ctxs; + + mutex_lock(&damon_lock); + nr_ctxs = nr_running_ctxs; + mutex_unlock(&damon_lock); + + return nr_ctxs; +} + /* Returns the size upper limit for each monitoring region */ static unsigned long damon_region_sz_limit(struct damon_ctx *ctx) { diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c new file mode 100644 index 000000000000..6316d4cae2a4 --- /dev/null +++ b/mm/damon/dbgfs.c @@ -0,0 +1,428 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Debugfs Interface + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-dbgfs: " fmt + +#include +#include +#include +#include +#include +#include +#include + +static struct damon_ctx **dbgfs_ctxs; +static int dbgfs_nr_ctxs = 1; +static int dbgfs_nr_terminated_ctxs; +static struct dentry **dbgfs_dirs; +static DEFINE_MUTEX(damon_dbgfs_lock); + +/* + * Returns non-empty string on success, negative error code otherwise. + */ +static char *user_input_str(const char __user *buf, size_t count, loff_t *ppos) +{ + char *kbuf; + ssize_t ret; + + /* We do not accept continuous write */ + if (*ppos) + return ERR_PTR(-EINVAL); + + kbuf = kmalloc(count + 1, GFP_KERNEL); + if (!kbuf) + return ERR_PTR(-ENOMEM); + + ret = simple_write_to_buffer(kbuf, count + 1, ppos, buf, count); + if (ret != count) { + kfree(kbuf); + return ERR_PTR(-EIO); + } + kbuf[ret] = '\0'; + + return kbuf; +} + +static ssize_t dbgfs_attrs_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char kbuf[128]; + int ret; + + mutex_lock(&ctx->kdamond_lock); + ret = scnprintf(kbuf, ARRAY_SIZE(kbuf), "%lu %lu %lu %lu %lu\n", + ctx->sample_interval, ctx->aggr_interval, + ctx->regions_update_interval, ctx->min_nr_regions, + ctx->max_nr_regions); + mutex_unlock(&ctx->kdamond_lock); + + return simple_read_from_buffer(buf, count, ppos, kbuf, ret); +} + +static ssize_t dbgfs_attrs_write(struct file *file, + const char __user *buf, size_t
count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + unsigned long s, a, r, minr, maxr; + char *kbuf; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + if (sscanf(kbuf, "%lu %lu %lu %lu %lu", + &s, &a, &r, &minr, &maxr) != 5) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ret = -EBUSY; + goto unlock_out; + } + + err = damon_set_attrs(ctx, s, a, r, minr, maxr); + if (err) + ret = err; +unlock_out: + mutex_unlock(&ctx->kdamond_lock); +out: + kfree(kbuf); + return ret; +} + +#define targetid_is_pid(ctx) \ + (ctx->primitive.target_valid == damon_va_target_valid) + +static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len) +{ + struct damon_target *t; + unsigned long id; + int written = 0; + int rc; + + damon_for_each_target(t, ctx) { + id = t->id; + if (targetid_is_pid(ctx)) + /* Show pid numbers to debugfs users */ + id = (unsigned long)pid_vnr((struct pid *)id); + + rc = scnprintf(&buf[written], len - written, "%lu ", id); + if (!rc) + return -ENOMEM; + written += rc; + } + if (written) + written -= 1; + written += scnprintf(&buf[written], len - written, "\n"); + return written; +} + +static ssize_t dbgfs_target_ids_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + ssize_t len; + char ids_buf[320]; + + mutex_lock(&ctx->kdamond_lock); + len = sprint_target_ids(ctx, ids_buf, 320); + mutex_unlock(&ctx->kdamond_lock); + if (len < 0) + return len; + + return simple_read_from_buffer(buf, count, ppos, ids_buf, len); +} + +/* + * Converts a string into an array of unsigned long integers + * + * Returns an array of unsigned long integers if the conversion succeeds, or + * NULL otherwise.
+ */ +static unsigned long *str_to_target_ids(const char *str, ssize_t len, + ssize_t *nr_ids) +{ + unsigned long *ids; + const int max_nr_ids = 32; + unsigned long id; + int pos = 0, parsed, ret; + + *nr_ids = 0; + ids = kmalloc_array(max_nr_ids, sizeof(id), GFP_KERNEL); + if (!ids) + return NULL; + while (*nr_ids < max_nr_ids && pos < len) { + ret = sscanf(&str[pos], "%lu%n", &id, &parsed); + pos += parsed; + if (ret != 1) + break; + ids[*nr_ids] = id; + *nr_ids += 1; + } + + return ids; +} + +/* Returns pid for the given pidfd if it's valid, or NULL otherwise. */ +static struct pid *damon_get_pidfd_pid(unsigned int pidfd) +{ + struct fd f; + struct pid *pid; + + f = fdget(pidfd); + if (!f.file) + return NULL; + + pid = pidfd_pid(f.file); + if (!IS_ERR(pid)) + get_pid(pid); + else + pid = NULL; + + fdput(f); + return pid; +} + +static ssize_t dbgfs_target_ids_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char *kbuf, *nrs; + bool received_pidfds = false; + unsigned long *targets; + ssize_t nr_targets; + ssize_t ret = count; + int i; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + nrs = kbuf; + + if (!strncmp(kbuf, "pidfd ", 6)) { + received_pidfds = true; + nrs = &kbuf[6]; + } + + targets = str_to_target_ids(nrs, ret, &nr_targets); + if (!targets) { + ret = -ENOMEM; + goto out; + } + + if (received_pidfds) { + for (i = 0; i < nr_targets; i++) + targets[i] = (unsigned long)damon_get_pidfd_pid( + (unsigned int)targets[i]); + } else if (targetid_is_pid(ctx)) { + for (i = 0; i < nr_targets; i++) + targets[i] = (unsigned long)find_get_pid( + (int)targets[i]); + } + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ret = -EINVAL; + goto unlock_out; + } + + err = damon_set_targets(ctx, targets, nr_targets); + if (err) + ret = err; +unlock_out: + mutex_unlock(&ctx->kdamond_lock); + kfree(targets); +out: + kfree(kbuf); + return 
ret; +} + +static int damon_dbgfs_open(struct inode *inode, struct file *file) +{ + file->private_data = inode->i_private; + + return nonseekable_open(inode, file); +} + +static const struct file_operations attrs_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_attrs_read, + .write = dbgfs_attrs_write, +}; + +static const struct file_operations target_ids_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_target_ids_read, + .write = dbgfs_target_ids_write, +}; + +static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) +{ + const char * const file_names[] = {"attrs", "target_ids"}; + const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops}; + int i; + + for (i = 0; i < ARRAY_SIZE(file_names); i++) { + if (!debugfs_create_file(file_names[i], 0600, dir, + ctx, fops[i])) { + pr_err("failed to create %s file\n", file_names[i]); + return -ENOMEM; + } + } + + return 0; +} + +static void dbgfs_unlock_page_idle_lock(void) +{ + mutex_lock(&damon_dbgfs_lock); + if (++dbgfs_nr_terminated_ctxs == dbgfs_nr_ctxs) { + dbgfs_nr_terminated_ctxs = 0; + mutex_unlock(&page_idle_lock); + } + mutex_unlock(&damon_dbgfs_lock); +} + +static int dbgfs_before_terminate(struct damon_ctx *ctx) +{ + dbgfs_unlock_page_idle_lock(); + return 0; +} + +static struct damon_ctx *dbgfs_new_ctx(void) +{ + struct damon_ctx *ctx; + + ctx = damon_new_ctx(); + if (!ctx) + return NULL; + + damon_va_set_primitives(ctx); + ctx->callback.before_terminate = dbgfs_before_terminate; + return ctx; +} + +static ssize_t dbgfs_monitor_on_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + char monitor_on_buf[5]; + bool monitor_on = damon_nr_running_ctxs() != 0; + int len; + + len = scnprintf(monitor_on_buf, 5, monitor_on ? 
"on\n" : "off\n"); + + return simple_read_from_buffer(buf, count, ppos, monitor_on_buf, len); +} + +static int dbgfs_start_ctxs(struct damon_ctx **ctxs, int nr_ctxs) +{ + int rc; + + if (!mutex_trylock(&page_idle_lock)) + return -EBUSY; + + rc = damon_start(ctxs, nr_ctxs); + if (rc) + mutex_unlock(&page_idle_lock); + + return rc; +} + +static ssize_t dbgfs_monitor_on_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + ssize_t ret = count; + char *kbuf; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + /* Remove white space */ + if (sscanf(kbuf, "%s", kbuf) != 1) { + kfree(kbuf); + return -EINVAL; + } + + if (!strncmp(kbuf, "on", count)) + err = dbgfs_start_ctxs(dbgfs_ctxs, dbgfs_nr_ctxs); + else if (!strncmp(kbuf, "off", count)) + err = damon_stop(dbgfs_ctxs, dbgfs_nr_ctxs); + else + err = -EINVAL; + + if (err) + ret = err; + kfree(kbuf); + return ret; +} + +static const struct file_operations monitor_on_fops = { + .owner = THIS_MODULE, + .read = dbgfs_monitor_on_read, + .write = dbgfs_monitor_on_write, +}; + +static int __init __damon_dbgfs_init(void) +{ + struct dentry *dbgfs_root; + const char * const file_names[] = {"monitor_on"}; + const struct file_operations *fops[] = {&monitor_on_fops}; + int i; + + dbgfs_root = debugfs_create_dir("damon", NULL); + if (IS_ERR(dbgfs_root)) { + pr_err("failed to create the dbgfs dir\n"); + return PTR_ERR(dbgfs_root); + } + + for (i = 0; i < ARRAY_SIZE(file_names); i++) { + if (!debugfs_create_file(file_names[i], 0600, dbgfs_root, + NULL, fops[i])) { + pr_err("failed to create %s file\n", file_names[i]); + return -ENOMEM; + } + } + dbgfs_fill_ctx_dir(dbgfs_root, dbgfs_ctxs[0]); + + dbgfs_dirs = kmalloc_array(1, sizeof(dbgfs_root), GFP_KERNEL); + dbgfs_dirs[0] = dbgfs_root; + + return 0; +} + +/* + * Functions for the initialization + */ + +static int __init damon_dbgfs_init(void) +{ + int rc; + + dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), 
GFP_KERNEL); + dbgfs_ctxs[0] = dbgfs_new_ctx(); + if (!dbgfs_ctxs[0]) + return -ENOMEM; + + rc = __damon_dbgfs_init(); + if (rc) + pr_err("%s: dbgfs init failed\n", __func__); + + return rc; +} + +module_init(damon_dbgfs_init); From patchwork Tue Oct 20 08:59:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11846091 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4D791580 for ; Tue, 20 Oct 2020 09:05:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 639D522409 for ; Tue, 20 Oct 2020 09:05:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="Lvl03JRA" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 639D522409 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8A5C96B0062; Tue, 20 Oct 2020 05:05:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 87DCE6B0068; Tue, 20 Oct 2020 05:05:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 745C66B006E; Tue, 20 Oct 2020 05:05:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0052.hostedemail.com [216.40.44.52]) by kanga.kvack.org (Postfix) with ESMTP id 48D736B0062 for ; Tue, 20 Oct 2020 05:05:37 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) 
with ESMTP id BF662181AC9CB for ; Tue, 20 Oct 2020 09:05:36 +0000 (UTC) X-FDA: 77391720672.26.cup26_19069942723e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 7A8A11804B65C for ; Tue, 20 Oct 2020 09:05:36 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=55584ce82=sjpark@amazon.com,,RULES_HIT:30003:30012:30034:30051:30054:30064:30075:30080,0,RBL:207.171.184.29:@amazon.com:.lbl8.mailshell.net-62.18.0.100 66.10.201.10;04y887exfcgmde8asrx1aqtwe6unfocq38et7ubh6nju8afeykzygtcydn8uwxm.3krpeu386wdpgw4r9bqqqpfkfn57beznztgcn5xzhpqocfbopjdayqseptb7dj4.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: cup26_19069942723e X-Filterd-Recvd-Size: 12738 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 09:05:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1603184737; x=1634720737; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=XdUhU/uieMloV+LtEQ7UZfIH1e/xiUeXXXPJkp3izOM=; b=Lvl03JRAUBQw+279N93lGB8fvA2iZkX8w5k8diLXANnJhNdnHdMJ6nFl sqrFXm9m5zfKlpdOpH20Rjzs28zLMJ2gQm+d3CveTRLLqunO1KblBRx4m Vssh8D+hlMfmSBANujQjBlYihbOXy6yp3WwMU9qMGUioZzFv6h/UgwrlX 0=; X-IronPort-AV: E=Sophos;i="5.77,396,1596499200"; d="scan'208";a="86290015" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-2c-579b7f5b.us-west-2.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9102.sea19.amazon.com with ESMTP; 20 Oct 2020 09:05:35 +0000 Received: from EX13D31EUB001.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan3.pdx.amazon.com [10.170.41.166]) by email-inbound-relay-2c-579b7f5b.us-west-2.amazon.com (Postfix) with ESMTPS id F1207A1CCC; Tue, 20 Oct 2020 
09:05:31 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.161.237) by EX13D31EUB001.ant.amazon.com (10.43.166.210) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 20 Oct 2020 09:05:09 +0000 From: SeongJae Park To: CC: SeongJae Park Subject: [PATCH v22 11/18] mm/damon/dbgfs: Implement recording feature Date: Tue, 20 Oct 2020 10:59:33 +0200 Message-ID: <20201020085940.13875-12-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201020085940.13875-1-sjpark@amazon.com> References: <20201020085940.13875-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.237] X-ClientProxiedBy: EX13D41UWC001.ant.amazon.com (10.43.162.107) To EX13D31EUB001.ant.amazon.com (10.43.166.210) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park

User space can control DAMON via the debugfs interface and get the monitoring results using the 'damon_aggregated' tracepoint event. However, dealing with the tracepoint might be complex for some simple use cases. This commit therefore implements a 'recording' feature in 'damon-dbgfs'. The feature can be used via the 'record' file in the '/damon/' directory of debugfs.

The file allows users to record monitored access patterns in a regular binary file. The recorded results are first written to an in-memory buffer and then flushed to the file in batches. Users can get and set the size of the buffer and the path to the result file by reading from and writing to the ``record`` file. For example, the commands below set the buffer size to 4 KiB and the result file to ``/damon.data``. ::

    # cd /damon
    # echo "4096 /damon.data" > record
    # cat record
    4096 /damon.data

The recording can be disabled by setting the buffer size to zero.
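The constraints on what may be written to the ``record`` file can be sketched in user space as follows. This is only an illustrative sketch, not part of the patch: `check_record_input()` is a hypothetical helper, and the limits are copied from the `MIN_RECORD_BUFFER_LEN`, `MAX_RECORD_BUFFER_LEN`, and `MAX_RFILE_PATH_LEN` definitions this patch adds to mm/damon/dbgfs.c. It assumes the kernel side accepts a buffer size of zero (recording disabled) or a size within those bounds, plus a whitespace-free path shorter than the path limit.

```c
#include <stdio.h>
#include <string.h>

/* Limits mirrored from this patch's mm/damon/dbgfs.c definitions. */
#define MIN_RECORD_BUFFER_LEN 1024
#define MAX_RECORD_BUFFER_LEN (4 * 1024 * 1024)
#define MAX_RFILE_PATH_LEN 256

/*
 * Hypothetical user-space helper (an assumption for illustration only):
 * returns 0 if 'input' has the "<buffer size> <file path>" form and
 * respects the limits above, -1 otherwise.
 */
int check_record_input(const char *input)
{
	unsigned int rbuf_len;
	int off = -1;
	size_t path_len;

	/* Parse the buffer size; %n records where the path starts. */
	if (sscanf(input, "%u %n", &rbuf_len, &off) != 1 || off < 0)
		return -1;

	/* The path is a single whitespace-delimited token. */
	path_len = strcspn(input + off, " \t\n");
	if (path_len == 0 || path_len >= MAX_RFILE_PATH_LEN)
		return -1;

	/* Zero disables recording; any other size must be in range. */
	if (rbuf_len && (rbuf_len < MIN_RECORD_BUFFER_LEN ||
			 rbuf_len > MAX_RECORD_BUFFER_LEN))
		return -1;

	return 0;
}
```

A tool could call `check_record_input("4096 /damon.data")` before writing the same string to the ``record`` file, so that obviously invalid settings are rejected before they reach the kernel.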
Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 255 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 253 insertions(+), 2 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 6316d4cae2a4..5aac85de23d2 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -15,6 +15,17 @@ #include #include +#define MIN_RECORD_BUFFER_LEN 1024 +#define MAX_RECORD_BUFFER_LEN (4 * 1024 * 1024) +#define MAX_RFILE_PATH_LEN 256 + +struct dbgfs_recorder { + unsigned char *rbuf; + unsigned int rbuf_len; + unsigned int rbuf_offset; + char *rfile_path; +}; + static struct damon_ctx **dbgfs_ctxs; static int dbgfs_nr_ctxs = 1; static int dbgfs_nr_terminated_ctxs; @@ -99,6 +110,116 @@ static ssize_t dbgfs_attrs_write(struct file *file, return ret; } +static ssize_t dbgfs_record_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + struct dbgfs_recorder *rec = ctx->callback.private; + char record_buf[20 + MAX_RFILE_PATH_LEN]; + int ret; + + mutex_lock(&ctx->kdamond_lock); + ret = scnprintf(record_buf, ARRAY_SIZE(record_buf), "%u %s\n", + rec->rbuf_len, rec->rfile_path); + mutex_unlock(&ctx->kdamond_lock); + return simple_read_from_buffer(buf, count, ppos, record_buf, ret); +} + +/* + * dbgfs_set_recording() - Set attributes for the recording. + * @ctx: target kdamond context + * @rbuf_len: length of the result buffer + * @rfile_path: path to the monitor result files + * + * Setting 'rbuf_len' 0 disables recording. + * + * This function should not be called while the kdamond is running. + * + * Return: 0 on success, negative error code otherwise. 
+ */ +static int dbgfs_set_recording(struct damon_ctx *ctx, + unsigned int rbuf_len, char *rfile_path) +{ + struct dbgfs_recorder *recorder; + size_t rfile_path_len; + + if (rbuf_len && (rbuf_len > MAX_RECORD_BUFFER_LEN || + rbuf_len < MIN_RECORD_BUFFER_LEN)) { + pr_err("result buffer size (%u) is out of [%d,%d]\n", + rbuf_len, MIN_RECORD_BUFFER_LEN, + MAX_RECORD_BUFFER_LEN); + return -EINVAL; + } + rfile_path_len = strnlen(rfile_path, MAX_RFILE_PATH_LEN); + if (rfile_path_len >= MAX_RFILE_PATH_LEN) { + pr_err("too long (>%d) result file path %s\n", + MAX_RFILE_PATH_LEN, rfile_path); + return -EINVAL; + } + + recorder = ctx->callback.private; + if (!recorder) { + recorder = kzalloc(sizeof(*recorder), GFP_KERNEL); + if (!recorder) + return -ENOMEM; + ctx->callback.private = recorder; + } + + recorder->rbuf_len = rbuf_len; + kfree(recorder->rbuf); + recorder->rbuf = NULL; + kfree(recorder->rfile_path); + recorder->rfile_path = NULL; + + if (rbuf_len) { + recorder->rbuf = kvmalloc(rbuf_len, GFP_KERNEL); + if (!recorder->rbuf) + return -ENOMEM; + } + recorder->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL); + if (!recorder->rfile_path) + return -ENOMEM; + strncpy(recorder->rfile_path, rfile_path, rfile_path_len + 1); + + return 0; +} + +static ssize_t dbgfs_record_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char *kbuf; + unsigned int rbuf_len; + char rfile_path[MAX_RFILE_PATH_LEN]; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + if (sscanf(kbuf, "%u %255s", + &rbuf_len, rfile_path) != 2) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ret = -EBUSY; + goto unlock_out; + } + + err = dbgfs_set_recording(ctx, rbuf_len, rfile_path); + if (err) + ret = err; +unlock_out: + mutex_unlock(&ctx->kdamond_lock); +out: + kfree(kbuf); + return ret; +} + +#define
targetid_is_pid(ctx) \ (ctx->primitive.target_valid == damon_va_target_valid) @@ -262,6 +383,13 @@ static const struct file_operations attrs_fops = { .write = dbgfs_attrs_write, }; +static const struct file_operations record_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_record_read, + .write = dbgfs_record_write, +}; + static const struct file_operations target_ids_fops = { .owner = THIS_MODULE, .open = damon_dbgfs_open, @@ -271,8 +399,9 @@ static const struct file_operations target_ids_fops = { static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) { - const char * const file_names[] = {"attrs", "target_ids"}; - const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops}; + const char * const file_names[] = {"attrs", "record", "target_ids"}; + const struct file_operations *fops[] = {&attrs_fops, &record_fops, + &target_ids_fops}; int i; for (i = 0; i < ARRAY_SIZE(file_names); i++) { @@ -286,6 +415,120 @@ static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) return 0; } +/* + * Flush the content in the result buffer to the result file + */ +static void dbgfs_flush_rbuffer(struct dbgfs_recorder *rec) +{ + ssize_t sz; + loff_t pos = 0; + struct file *rfile; + + if (!rec->rbuf_offset) + return; + + rfile = filp_open(rec->rfile_path, + O_CREAT | O_RDWR | O_APPEND | O_LARGEFILE, 0644); + if (IS_ERR(rfile)) { + pr_err("Cannot open the result file %s\n", + rec->rfile_path); + return; + } + + while (rec->rbuf_offset) { + sz = kernel_write(rfile, rec->rbuf, rec->rbuf_offset, &pos); + if (sz < 0) + break; + rec->rbuf_offset -= sz; + } + filp_close(rfile, NULL); +} + +/* + * Write data into the result buffer + */ +static void dbgfs_write_rbuf(struct damon_ctx *ctx, void *data, ssize_t size) +{ + struct dbgfs_recorder *rec = ctx->callback.private; + + if (!rec->rbuf_len || !rec->rbuf || !rec->rfile_path) + return; + if (rec->rbuf_offset + size > rec->rbuf_len) +
dbgfs_flush_rbuffer(ctx->callback.private); + if (rec->rbuf_offset + size > rec->rbuf_len) { + pr_warn("%s: flush failed, or wrong size given(%u, %zu)\n", + __func__, rec->rbuf_offset, size); + return; + } + + memcpy(&rec->rbuf[rec->rbuf_offset], data, size); + rec->rbuf_offset += size; +} + +static void dbgfs_write_record_header(struct damon_ctx *ctx) +{ + int recfmt_ver = 2; + + dbgfs_write_rbuf(ctx, "damon_recfmt_ver", 16); + dbgfs_write_rbuf(ctx, &recfmt_ver, sizeof(recfmt_ver)); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static int dbgfs_before_start(struct damon_ctx *ctx) +{ + dbgfs_write_record_header(ctx); + return 0; +} + +/* + * Store the aggregated monitoring results to the result buffer + * + * The format for the result buffer is as below: + * + *