From patchwork Tue Dec 15 11:54:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1BAAC4361B for ; Tue, 15 Dec 2020 11:56:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3A8C3222B3 for ; Tue, 15 Dec 2020 11:56:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3A8C3222B3 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C44C16B005D; Tue, 15 Dec 2020 06:56:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C1BB66B0068; Tue, 15 Dec 2020 06:56:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AE5406B006C; Tue, 15 Dec 2020 06:56:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0217.hostedemail.com [216.40.44.217]) by kanga.kvack.org (Postfix) with ESMTP id 952096B005D for ; Tue, 15 Dec 2020 06:56:13 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5F6BE1EFF for ; Tue, 15 Dec 2020 11:56:13 +0000 (UTC) X-FDA: 77595363426.30.range14_4017a7927423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 390D2180B3C85 for ; Tue, 15 Dec 2020 11:56:13 +0000 (UTC) X-HE-Tag: range14_4017a7927423 X-Filterd-Recvd-Size: 24207 Received: from smtp-fw-33001.amazon.com (smtp-fw-33001.amazon.com [207.171.190.10]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:56:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033373; x=1639569373; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=gnnWX583hdLYiT+7DdVvUIVFpfkMzIFLhOitCSuEmhA=; b=GhQxKJ+hM1BqTTjMGYFs0oDjpFU56xyIo0hLTNCKYAzED3exGp2tDckM xJSENvwurisX1o6MFaV2pBhjPQxiMVsNCLFmYKDdhsLAEGW4uYptZw6zR D4zDgM8nDqtMifNAoj9vsvlReNgLHCgrZ/EPhVVFhFM+O+m3oXi1pNFRy A=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="103147491" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1a-7d76a15f.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 15 Dec 2020 11:56:03 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1a-7d76a15f.us-east-1.amazon.com (Postfix) with ESMTPS id EFE25A23BA; Tue, 15 Dec 2020 11:55:50 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:55:33 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 01/15] mm: Introduce Data Access MONitor (DAMON) Date: Tue, 15 Dec 2020 12:54:34 +0100 Message-ID: <20201215115448.25633-2-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON make it - accurate (the monitoring output is useful enough for DRAM level performance-centric memory management; It might be inappropriate for CPU cache levels, though), - light-weight (the monitoring overhead is normally low enough to be applied online), and - scalable (the upper-bound of the overhead is in constant range regardless of the size of target workloads). Using this framework, hence, we can easily write efficient kernel space data access monitoring applications. For example, the kernel's memory management mechanisms can make advanced decisions using this. Experimental data access aware optimization works that incurring high access monitoring overhead could implemented again on top of this. Due to its simple and flexible interface, providing user space interface would be also easy. Then, user space users who have some special workloads can write personalized applications for better understanding and optimizations of their workloads and systems. --- Nevertheless, this commit is defining and implementing only basic access check part without the overhead-accuracy handling core logic. The basic access check is as below. The output of DAMON says what memory regions are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting ``sampling interval`` and ``aggregation interval``. In detail, DAMON checks access to each page per ``sampling interval`` and aggregates the results. In other words, counts the number of the accesses to each region. After each ``aggregation interval`` passes, DAMON calls callback functions that previously registered by users so that users can read the aggregated results and then clears the results. This can be described in below simple pseudo-code:: while monitoring_on: for page in monitoring_target: if accessed(page): nr_accesses[page] += 1 if time() % aggregation_interval == 0: for callback in user_registered_callbacks: callback(monitoring_target, nr_accesses) for page in monitoring_target: nr_accesses[page] = 0 sleep(sampling interval) The target regions constructed at the beginning of the monitoring and updated after each ``regions_update_interval``, because the target regions could be dynamically changed (e.g., mmap() or memory hotplug). The monitoring overhead of this mechanism will arbitrarily increase as the size of the target workload grows. The basic monitoring primitives for actual access check and dynamic target regions construction aren't in the core part of DAMON. Instead, it allows users to implement their own primitives that optimized for their use case and configure DAMON to use those. In other words, users cannot use current version of DAMON without some additional works. Following commits will implement the core mechanisms for the overhead-accuracy control and default primitives implementations. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 167 ++++++++++++++++++++++ mm/Kconfig | 2 + mm/Makefile | 1 + mm/damon/Kconfig | 15 ++ mm/damon/Makefile | 3 + mm/damon/core.c | 316 ++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 504 insertions(+) create mode 100644 include/linux/damon.h create mode 100644 mm/damon/Kconfig create mode 100644 mm/damon/Makefile create mode 100644 mm/damon/core.c diff --git a/include/linux/damon.h b/include/linux/damon.h new file mode 100644 index 000000000000..387fa4399fc8 --- /dev/null +++ b/include/linux/damon.h @@ -0,0 +1,167 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * DAMON api + * + * Author: SeongJae Park + */ + +#ifndef _DAMON_H_ +#define _DAMON_H_ + +#include +#include +#include + +struct damon_ctx; + +/** + * struct damon_primitive Monitoring primitives for given use cases. + * + * @init_target_regions: Constructs initial monitoring target regions. + * @update_target_regions: Updates monitoring target regions. + * @prepare_access_checks: Prepares next access check of target regions. + * @check_accesses: Checks the access of target regions. + * @reset_aggregated: Resets aggregated accesses monitoring results. + * @target_valid: Determine if the target is valid. + * @cleanup: Cleans up the context. + * + * DAMON can be extended for various address spaces and usages. For this, + * users should register the low level primitives for their target address + * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread + * calls @init_target_regions and @prepare_access_checks before starting the + * monitoring, @update_target_regions after each + * &damon_ctx.regions_update_interval, and @check_accesses, @target_valid and + * @prepare_access_checks after each &damon_ctx.sample_interval. Finally, + * @reset_aggregated is called after each &damon_ctx.aggr_interval. + * + * @init_target_regions should construct proper monitoring target regions and + * link those to the DAMON context struct. The regions should be defined by + * user and saved in @damon_ctx.target. + * @update_target_regions should update the monitoring target regions for + * current status. + * @prepare_access_checks should manipulate the monitoring regions to be + * prepared for the next access check. + * @check_accesses should check the accesses to each region that made after the + * last preparation and update the number of observed accesses of each region. + * @reset_aggregated should reset the access monitoring results that aggregated + * by @check_accesses. + * @target_valid should check whether the target is still valid for the + * monitoring. + * @cleanup is called from @kdamond just before its termination. After this + * call, only @kdamond_lock and @kdamond will be touched. + */ +struct damon_primitive { + void (*init_target_regions)(struct damon_ctx *context); + void (*update_target_regions)(struct damon_ctx *context); + void (*prepare_access_checks)(struct damon_ctx *context); + void (*check_accesses)(struct damon_ctx *context); + void (*reset_aggregated)(struct damon_ctx *context); + bool (*target_valid)(void *target); + void (*cleanup)(struct damon_ctx *context); +}; + +/* + * struct damon_callback Monitoring events notification callbacks. + * + * @before_start: Called before starting the monitoring. + * @after_sampling: Called after each sampling. + * @after_aggregation: Called after each aggregation. + * @before_terminate: Called before terminating the monitoring. + * @private: User private data. + * + * The monitoring thread (&damon_ctx->kdamond) calls @before_start and + * @before_terminate just before starting and finishing the monitoring, + * respectively. Therefore, those are good places for installing and cleaning + * @private. + * + * The monitoring thread calls @after_sampling and @after_aggregation for each + * of the sampling intervals and aggregation intervals, respectively. + * Therefore, users can safely access the monitoring results without additional + * protection. For the reason, users are recommended to use these callback for + * the accesses to the results. + * + * If any callback returns non-zero, monitoring stops. + */ +struct damon_callback { + void *private; + + int (*before_start)(struct damon_ctx *context); + int (*after_sampling)(struct damon_ctx *context); + int (*after_aggregation)(struct damon_ctx *context); + int (*before_terminate)(struct damon_ctx *context); +}; + +/** + * struct damon_ctx - Represents a context for each monitoring. This is the + * main interface that allows users to set the attributes and get the results + * of the monitoring. + * + * @sample_interval: The time between access samplings. + * @aggr_interval: The time between monitor results aggregations. + * @regions_update_interval: The time between monitor regions updates. + * + * For each @sample_interval, DAMON checks whether each region is accessed or + * not. It aggregates and keeps the access information (number of accesses to + * each region) for @aggr_interval time. DAMON also checks whether the target + * memory regions need update (e.g., by ``mmap()`` calls from the application, + * in case of virtual memory monitoring) and applies the changes for each + * @regions_update_interval. All time intervals are in micro-seconds. Please + * refer to &struct damon_primitive and &struct damon_callback for more detail. + * + * @kdamond: Kernel thread who does the monitoring. + * @kdamond_stop: Notifies whether kdamond should stop. + * @kdamond_lock: Mutex for the synchronizations with @kdamond. + * + * For each monitoring context, one kernel thread for the monitoring is + * created. The pointer to the thread is stored in @kdamond. + * + * Once started, the monitoring thread runs until explicitly required to be + * terminated or every monitoring target is invalid. The validity of the + * targets is checked via the &damon_primitive.target_valid of @primitive. The + * termination can also be explicitly requested by writing non-zero to + * @kdamond_stop. The thread sets @kdamond to NULL when it terminates. + * Therefore, users can know whether the monitoring is ongoing or terminated by + * reading @kdamond. Reads and writes to @kdamond and @kdamond_stop from + * outside of the monitoring thread must be protected by @kdamond_lock. + * + * Note that the monitoring thread protects only @kdamond and @kdamond_stop via + * @kdamond_lock. Accesses to other fields must be protected by themselves. + * + * @primitive: Set of monitoring primitives for given use cases. + * @callback: Set of callbacks for monitoring events notifications. + * + * @target: Pointer to the user-defined monitoring target. + */ +struct damon_ctx { + unsigned long sample_interval; + unsigned long aggr_interval; + unsigned long regions_update_interval; + +/* private */ + struct timespec64 last_aggregation; + struct timespec64 last_regions_update; + +/* public */ + struct task_struct *kdamond; + bool kdamond_stop; + struct mutex kdamond_lock; + + struct damon_primitive primitive; + struct damon_callback callback; + + void *target; +}; + +#ifdef CONFIG_DAMON + +struct damon_ctx *damon_new_ctx(void); +void damon_destroy_ctx(struct damon_ctx *ctx); +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, + unsigned long aggr_int, unsigned long regions_update_int); + +int damon_start(struct damon_ctx **ctxs, int nr_ctxs); +int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); + +#endif /* CONFIG_DAMON */ + +#endif /* _DAMON_H */ diff --git a/mm/Kconfig b/mm/Kconfig index 390165ffbb0f..b97f2e8ab83f 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -859,4 +859,6 @@ config ARCH_HAS_HUGEPD config MAPPING_DIRTY_HELPERS bool +source "mm/damon/Kconfig" + endmenu diff --git a/mm/Makefile b/mm/Makefile index d73aed0fc99c..8022b8f04096 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -120,3 +120,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o +obj-$(CONFIG_DAMON) += damon/ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig new file mode 100644 index 000000000000..d00e99ac1a15 --- /dev/null +++ b/mm/damon/Kconfig @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0-only + +menu "Data Access Monitoring" + +config DAMON + bool "DAMON: Data Access Monitoring Framework" + help + This builds a framework that allows kernel subsystems to monitor + access frequency of each memory region. The information can be useful + for performance-centric DRAM level memory management. + + See https://damonitor.github.io/doc/html/latest-damon/index.html for + more information. + +endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile new file mode 100644 index 000000000000..4fd2edb4becf --- /dev/null +++ b/mm/damon/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_DAMON) := core.o diff --git a/mm/damon/core.c b/mm/damon/core.c new file mode 100644 index 000000000000..8963804efdf9 --- /dev/null +++ b/mm/damon/core.c @@ -0,0 +1,316 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Data Access Monitor + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon: " fmt + +#include +#include +#include +#include + +static DEFINE_MUTEX(damon_lock); +static int nr_running_ctxs; + +struct damon_ctx *damon_new_ctx(void) +{ + struct damon_ctx *ctx; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return NULL; + + ctx->sample_interval = 5 * 1000; + ctx->aggr_interval = 100 * 1000; + ctx->regions_update_interval = 1000 * 1000; + + ktime_get_coarse_ts64(&ctx->last_aggregation); + ctx->last_regions_update = ctx->last_aggregation; + + mutex_init(&ctx->kdamond_lock); + + ctx->target = NULL; + + return ctx; +} + +void damon_destroy_ctx(struct damon_ctx *ctx) +{ + kfree(ctx); +} + +/** + * damon_set_attrs() - Set attributes for the monitoring. + * @ctx: monitoring context + * @sample_int: time interval between samplings + * @aggr_int: time interval between aggregations + * @regions_update_int: time interval between target regions update + * + * This function should not be called while the kdamond is running. + * Every time interval is in micro-seconds. + * + * Return: 0 on success, negative error code otherwise. + */ +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, + unsigned long aggr_int, unsigned long regions_update_int) +{ + ctx->sample_interval = sample_int; + ctx->aggr_interval = aggr_int; + ctx->regions_update_interval = regions_update_int; + + return 0; +} + +static bool damon_kdamond_running(struct damon_ctx *ctx) +{ + bool running; + + mutex_lock(&ctx->kdamond_lock); + running = ctx->kdamond != NULL; + mutex_unlock(&ctx->kdamond_lock); + + return running; +} + +static int kdamond_fn(void *data); + +/* + * __damon_start() - Starts monitoring with given context. + * @ctx: monitoring context + * + * This function should be called while damon_lock is hold. + * + * Return: 0 on success, negative error code otherwise. + */ +static int __damon_start(struct damon_ctx *ctx) +{ + int err = -EBUSY; + + mutex_lock(&ctx->kdamond_lock); + if (!ctx->kdamond) { + err = 0; + ctx->kdamond_stop = false; + ctx->kdamond = kthread_create(kdamond_fn, ctx, "kdamond.%d", + nr_running_ctxs); + if (IS_ERR(ctx->kdamond)) + err = PTR_ERR(ctx->kdamond); + else + wake_up_process(ctx->kdamond); + } + mutex_unlock(&ctx->kdamond_lock); + + return err; +} + +/** + * damon_start() - Starts the monitorings for a given group of contexts. + * @ctxs: an array of the pointers for contexts to start monitoring + * @nr_ctxs: size of @ctxs + * + * This function starts a group of monitoring threads for a group of monitoring + * contexts. One thread per each context is created and run in parallel. The + * caller should handle synchronization between the threads by itself. If a + * group of threads that created by other 'damon_start()' call is currently + * running, this function does nothing but returns -EBUSY. + * + * Return: 0 on success, negative error code otherwise. + */ +int damon_start(struct damon_ctx **ctxs, int nr_ctxs) +{ + int i; + int err = 0; + + mutex_lock(&damon_lock); + if (nr_running_ctxs) { + mutex_unlock(&damon_lock); + return -EBUSY; + } + + for (i = 0; i < nr_ctxs; i++) { + err = __damon_start(ctxs[i]); + if (err) + break; + nr_running_ctxs++; + } + mutex_unlock(&damon_lock); + + return err; +} + +/* + * __damon_stop() - Stops monitoring of given context. + * @ctx: monitoring context + * + * Return: 0 on success, negative error code otherwise. + */ +static int __damon_stop(struct damon_ctx *ctx) +{ + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ctx->kdamond_stop = true; + mutex_unlock(&ctx->kdamond_lock); + while (damon_kdamond_running(ctx)) + usleep_range(ctx->sample_interval, + ctx->sample_interval * 2); + return 0; + } + mutex_unlock(&ctx->kdamond_lock); + + return -EPERM; +} + +/** + * damon_stop() - Stops the monitorings for a given group of contexts. + * @ctxs: an array of the pointers for contexts to stop monitoring + * @nr_ctxs: size of @ctxs + * + * Return: 0 on success, negative error code otherwise. + */ +int damon_stop(struct damon_ctx **ctxs, int nr_ctxs) +{ + int i, err = 0; + + for (i = 0; i < nr_ctxs; i++) { + /* nr_running_ctxs is decremented in kdamond_fn */ + err = __damon_stop(ctxs[i]); + if (err) + return err; + } + + return err; +} + +/* + * damon_check_reset_time_interval() - Check if a time interval is elapsed. + * @baseline: the time to check whether the interval has elapsed since + * @interval: the time interval (microseconds) + * + * See whether the given time interval has passed since the given baseline + * time. If so, it also updates the baseline to current time for next check. + * + * Return: true if the time interval has passed, or false otherwise. + */ +static bool damon_check_reset_time_interval(struct timespec64 *baseline, + unsigned long interval) +{ + struct timespec64 now; + + ktime_get_coarse_ts64(&now); + if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) < + interval * 1000) + return false; + *baseline = now; + return true; +} + +/* + * Check whether it is time to flush the aggregated information + */ +static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx) +{ + return damon_check_reset_time_interval(&ctx->last_aggregation, + ctx->aggr_interval); +} + +/* + * Check whether it is time to check and apply the target monitoring regions + * + * Returns true if it is. + */ +static bool kdamond_need_update_regions(struct damon_ctx *ctx) +{ + return damon_check_reset_time_interval(&ctx->last_regions_update, + ctx->regions_update_interval); +} + +/* + * Check whether current monitoring should be stopped + * + * The monitoring is stopped when either the user requested to stop, or all + * monitoring targets are invalid. + * + * Returns true if need to stop current monitoring. + */ +static bool kdamond_need_stop(struct damon_ctx *ctx) +{ + bool stop; + + mutex_lock(&ctx->kdamond_lock); + stop = ctx->kdamond_stop; + mutex_unlock(&ctx->kdamond_lock); + if (stop) + return true; + + if (!ctx->primitive.target_valid) + return false; + + return !ctx->primitive.target_valid(ctx->target); +} + +static void set_kdamond_stop(struct damon_ctx *ctx) +{ + mutex_lock(&ctx->kdamond_lock); + ctx->kdamond_stop = true; + mutex_unlock(&ctx->kdamond_lock); +} + +/* + * The monitoring daemon that runs as a kernel thread + */ +static int kdamond_fn(void *data) +{ + struct damon_ctx *ctx = (struct damon_ctx *)data; + + pr_info("kdamond (%d) starts\n", ctx->kdamond->pid); + + if (ctx->primitive.init_target_regions) + ctx->primitive.init_target_regions(ctx); + if (ctx->callback.before_start && ctx->callback.before_start(ctx)) + set_kdamond_stop(ctx); + + while (!kdamond_need_stop(ctx)) { + if (ctx->primitive.prepare_access_checks) + ctx->primitive.prepare_access_checks(ctx); + if (ctx->callback.after_sampling && + ctx->callback.after_sampling(ctx)) + set_kdamond_stop(ctx); + + usleep_range(ctx->sample_interval, ctx->sample_interval + 1); + + if (ctx->primitive.check_accesses) + ctx->primitive.check_accesses(ctx); + + if (kdamond_aggregate_interval_passed(ctx)) { + if (ctx->callback.after_aggregation && + ctx->callback.after_aggregation(ctx)) + set_kdamond_stop(ctx); + if (ctx->primitive.reset_aggregated) + ctx->primitive.reset_aggregated(ctx); + } + + if (kdamond_need_update_regions(ctx)) { + if (ctx->primitive.update_target_regions) + ctx->primitive.update_target_regions(ctx); + } + } + + if (ctx->callback.before_terminate && + ctx->callback.before_terminate(ctx)) + set_kdamond_stop(ctx); + if (ctx->primitive.cleanup) + ctx->primitive.cleanup(ctx); + + pr_debug("kdamond (%d) finishes\n", ctx->kdamond->pid); + mutex_lock(&ctx->kdamond_lock); + ctx->kdamond = NULL; + mutex_unlock(&ctx->kdamond_lock); + + mutex_lock(&damon_lock); + nr_running_ctxs--; + mutex_unlock(&damon_lock); + + do_exit(0); +} From patchwork Tue Dec 15 11:54:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0474BC4361B for ; Tue, 15 Dec 2020 11:56:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 62335222BB for ; Tue, 15 Dec 2020 11:56:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 62335222BB Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 95DF16B0068; Tue, 15 Dec 2020 06:56:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 90D606B006C; Tue, 15 Dec 2020 06:56:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 788298D0001; Tue, 15 Dec 2020 06:56:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0031.hostedemail.com [216.40.44.31]) by kanga.kvack.org (Postfix) with ESMTP id 62F246B0068 for ; Tue, 15 Dec 2020 06:56:53 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 28F7A363B for ; Tue, 15 Dec 2020 11:56:53 +0000 (UTC) X-FDA: 77595365106.17.dress40_2612d7527423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 0FDD3180D0194 for ; Tue, 15 Dec 2020 11:56:53 +0000 (UTC) X-HE-Tag: dress40_2612d7527423 X-Filterd-Recvd-Size: 16015 Received: from smtp-fw-33001.amazon.com (smtp-fw-33001.amazon.com [207.171.190.10]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:56:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033413; x=1639569413; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=i3ZpZkgcBc2ZhYHhhGRF5vHUveiBH/YLWibRSteXiBo=; b=ujh6tePUNDHqEE4UWZdZbeslC+IpM7V37sMZu2dHY7KkY8UJRI3fg3Jy K6R9hfI2q5X+yBeMYv8PcaC+hNaE6P371xEzZTLG4YSN6W1H9Z0gw9ZuA DRHhx/Q62kEJ8Vya8TNmIFHoSh/Fc9wV9rTEU+N1XTRLQ0km2DE2LSxTS w=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="103147633" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1e-57e1d233.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 15 Dec 2020 11:56:48 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1e-57e1d233.us-east-1.amazon.com (Postfix) with ESMTPS id 3C67714168E; Tue, 15 Dec 2020 11:56:35 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:56:19 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 02/15] mm/damon/core: Implement region-based sampling Date: Tue, 15 Dec 2020 12:54:35 +0100 Message-ID: <20201215115448.25633-3-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park To avoid the unbounded increase of the overhead, DAMON groups adjacent pages that assumed to have the same access frequencies into a region. As long as the assumption (pages in a region have the same access frequencies) is kept, only one page in the region is required to be checked. Thus, for each ``sampling interval``, 1. the 'prepare_access_checks' primitive picks one page in each region, 2. waits for one ``sampling interval``, 3. checks whether the page is accessed meanwhile, and 4. increases the access frequency of the region if so. Therefore, the monitoring overhead is controllable by adjusting the number of regions. DAMON allows both the underlying primitives and user callbacks adjust regions for the trade-off. In other words, this commit makes DAMON to use not only time-based sampling but also space-based sampling. This scheme, however, cannot preserve the quality of the output if the assumption is not guaranteed. Next commit will address this problem. Another problem of this region abstraction is additional memory space overhead for the regions metadata. For example, suppose page granularity monitoring that doesn't want to know fine-grained access frequency but only if each page accessed or not. Then, we can do that by directly resetting and reading the PG_Idle flags and/or the PTE Accessed bits. The metadata for the region abstraction is only burden in the case. For the reason, this commit makes DAMON to support the user-defined arbitrary target, which could be stored in a void pointer of the monitoring context with specific target type. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 109 ++++++++++++++++++++++++++++++-- mm/damon/core.c | 142 +++++++++++++++++++++++++++++++++++++++++- 2 files changed, 243 insertions(+), 8 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 387fa4399fc8..7d4685adc8a9 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -12,6 +12,48 @@ #include #include +/** + * struct damon_addr_range - Represents an address region of [@start, @end). + * @start: Start address of the region (inclusive). + * @end: End address of the region (exclusive). + */ +struct damon_addr_range { + unsigned long start; + unsigned long end; +}; + +/** + * struct damon_region - Represents a monitoring target region. + * @ar: The address range of the region. + * @sampling_addr: Address of the sample for the next access check. + * @nr_accesses: Access frequency of this region. + * @list: List head for siblings. + */ +struct damon_region { + struct damon_addr_range ar; + unsigned long sampling_addr; + unsigned int nr_accesses; + struct list_head list; +}; + +/** + * struct damon_target - Represents a monitoring target. + * @id: Unique identifier for this target. + * @regions_list: Head of the monitoring target regions of this target. + * @list: List head for siblings. + * + * Each monitoring context could have multiple targets. For example, a context + * for virtual memory address spaces could have multiple target processes. The + * @id of each target should be unique among the targets of the context. For + * example, in the virtual address monitoring context, it could be a pidfd or + * an address of an mm_struct. + */ +struct damon_target { + unsigned long id; + struct list_head regions_list; + struct list_head list; +}; + struct damon_ctx; /** @@ -36,7 +78,8 @@ struct damon_ctx; * * @init_target_regions should construct proper monitoring target regions and * link those to the DAMON context struct. The regions should be defined by - * user and saved in @damon_ctx.target. + * user and saved in @damon_ctx.arbitrary_target if @damon_ctx.target_type is + * &DAMON_ARBITRARY_TARGET. Otherwise, &struct damon_region should be used. * @update_target_regions should update the monitoring target regions for * current status. * @prepare_access_checks should manipulate the monitoring regions to be @@ -46,7 +89,8 @@ struct damon_ctx; * @reset_aggregated should reset the access monitoring results that aggregated * by @check_accesses. * @target_valid should check whether the target is still valid for the - * monitoring. + * monitoring. It receives &damon_ctx.arbitrary_target or &struct damon_target + * pointer depends on &damon_ctx.target_type. * @cleanup is called from @kdamond just before its termination. After this * call, only @kdamond_lock and @kdamond will be touched. */ @@ -91,6 +135,17 @@ struct damon_callback { int (*before_terminate)(struct damon_ctx *context); }; +/** + * enum damon_target_type - Represents the type of the monitoring target. + * + * @DAMON_REGION_SAMPLING_TARGET: Region based sampling target. + * @DAMON_ARBITRARY_TARGET: User-defined arbitrary type target. + */ +enum damon_target_type { + DAMON_REGION_SAMPLING_TARGET, + DAMON_ARBITRARY_TARGET, +}; + /** * struct damon_ctx - Represents a context for each monitoring. This is the * main interface that allows users to set the attributes and get the results @@ -130,7 +185,15 @@ struct damon_callback { * @primitive: Set of monitoring primitives for given use cases. * @callback: Set of callbacks for monitoring events notifications. * - * @target: Pointer to the user-defined monitoring target. + * @target_type: Type of the monitoring target. + * + * @region_targets: Head of monitoring targets (&damon_target) list. + * + * @arbitrary_target: Pointer to arbitrary type target. + * + * @region_targets are valid only if @target_type is + * DAMON_REGION_SAMPLING_TARGET. @arbitrary_target is valid only if + * @target_type is DAMON_ARBITRARY_TARGET. */ struct damon_ctx { unsigned long sample_interval; @@ -149,12 +212,48 @@ struct damon_ctx { struct damon_primitive primitive; struct damon_callback callback; - void *target; + enum damon_target_type target_type; + union { + /* DAMON_REGION_SAMPLING_TARGET */ + struct list_head region_targets; + + /* DAMON_ARBITRARY_TARGET */ + void *arbitrary_target; + }; }; +#define damon_next_region(r) \ + (container_of(r->list.next, struct damon_region, list)) + +#define damon_prev_region(r) \ + (container_of(r->list.prev, struct damon_region, list)) + +#define damon_for_each_region(r, t) \ + list_for_each_entry(r, &t->regions_list, list) + +#define damon_for_each_region_safe(r, next, t) \ + list_for_each_entry_safe(r, next, &t->regions_list, list) + +#define damon_for_each_target(t, ctx) \ + list_for_each_entry(t, &(ctx)->region_targets, list) + +#define damon_for_each_target_safe(t, next, ctx) \ + list_for_each_entry_safe(t, next, &(ctx)->region_targets, list) + #ifdef CONFIG_DAMON -struct damon_ctx *damon_new_ctx(void); +struct damon_region *damon_new_region(unsigned long start, unsigned long end); +inline void damon_insert_region(struct damon_region *r, + struct damon_region *prev, struct damon_region *next); +void damon_add_region(struct damon_region *r, struct damon_target *t); +void damon_destroy_region(struct damon_region *r); + +struct damon_target *damon_new_target(unsigned long id); +void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); +void damon_free_target(struct damon_target *t); +void damon_destroy_target(struct damon_target *t); + +struct damon_ctx *damon_new_ctx(enum damon_target_type type); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long regions_update_int); diff --git a/mm/damon/core.c b/mm/damon/core.c index 8963804efdf9..167487e75737 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -15,7 +15,102 @@ static DEFINE_MUTEX(damon_lock); static int nr_running_ctxs; -struct damon_ctx *damon_new_ctx(void) +/* + * Construct a damon_region struct + * + * Returns the pointer to the new struct if success, or NULL otherwise + */ +struct damon_region *damon_new_region(unsigned long start, unsigned long end) +{ + struct damon_region *region; + + region = kmalloc(sizeof(*region), GFP_KERNEL); + if (!region) + return NULL; + + region->ar.start = start; + region->ar.end = end; + region->nr_accesses = 0; + INIT_LIST_HEAD(®ion->list); + + return region; +} + +/* + * Add a region between two other regions + */ +inline void damon_insert_region(struct damon_region *r, + struct damon_region *prev, struct damon_region *next) +{ + __list_add(&r->list, &prev->list, &next->list); +} + +void damon_add_region(struct damon_region *r, struct damon_target *t) +{ + list_add_tail(&r->list, &t->regions_list); +} + +static void damon_del_region(struct damon_region *r) +{ + list_del(&r->list); +} + +static void damon_free_region(struct damon_region *r) +{ + kfree(r); +} + +void damon_destroy_region(struct damon_region *r) +{ + damon_del_region(r); + damon_free_region(r); +} + +/* + * Construct a damon_target struct + * + * Returns the pointer to the new struct if success, or NULL otherwise + */ +struct damon_target *damon_new_target(unsigned long id) +{ + struct damon_target *t; + + t = kmalloc(sizeof(*t), GFP_KERNEL); + if (!t) + return NULL; + + t->id = id; + INIT_LIST_HEAD(&t->regions_list); + + return t; +} + +void damon_add_target(struct damon_ctx *ctx, struct damon_target *t) +{ + list_add_tail(&t->list, &ctx->region_targets); +} + +static void damon_del_target(struct damon_target *t) +{ + list_del(&t->list); +} + +void damon_free_target(struct damon_target *t) +{ + struct damon_region *r, *next; + + damon_for_each_region_safe(r, next, t) + damon_free_region(r); + kfree(t); +} + +void damon_destroy_target(struct damon_target *t) +{ + damon_del_target(t); + damon_free_target(t); +} + +struct damon_ctx *damon_new_ctx(enum damon_target_type type) { struct damon_ctx *ctx; @@ -32,13 +127,20 @@ struct damon_ctx *damon_new_ctx(void) mutex_init(&ctx->kdamond_lock); - ctx->target = NULL; + ctx->target_type = type; + if (type != DAMON_ARBITRARY_TARGET) + INIT_LIST_HEAD(&ctx->region_targets); return ctx; } void damon_destroy_ctx(struct damon_ctx *ctx) { + struct damon_target *t, *next_t; + + damon_for_each_target_safe(t, next_t, ctx) + damon_destroy_target(t); + kfree(ctx); } @@ -215,6 +317,21 @@ static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx) ctx->aggr_interval); } +/* + * Reset the aggregated monitoring results ('nr_accesses' of each region). + */ +static void kdamond_reset_aggregated(struct damon_ctx *c) +{ + struct damon_target *t; + + damon_for_each_target(t, c) { + struct damon_region *r; + + damon_for_each_region(r, t) + r->nr_accesses = 0; + } +} + /* * Check whether it is time to check and apply the target monitoring regions * @@ -236,6 +353,7 @@ static bool kdamond_need_update_regions(struct damon_ctx *ctx) */ static bool kdamond_need_stop(struct damon_ctx *ctx) { + struct damon_target *t; bool stop; mutex_lock(&ctx->kdamond_lock); @@ -247,7 +365,15 @@ static bool kdamond_need_stop(struct damon_ctx *ctx) if (!ctx->primitive.target_valid) return false; - return !ctx->primitive.target_valid(ctx->target); + if (ctx->target_type == DAMON_ARBITRARY_TARGET) + return !ctx->primitive.target_valid(ctx->arbitrary_target); + + damon_for_each_target(t, ctx) { + if (ctx->primitive.target_valid(t)) + return false; + } + + return true; } static void set_kdamond_stop(struct damon_ctx *ctx) @@ -263,6 +389,8 @@ static void set_kdamond_stop(struct damon_ctx *ctx) static int kdamond_fn(void *data) { struct damon_ctx *ctx = (struct damon_ctx *)data; + struct damon_target *t; + struct damon_region *r, *next; pr_info("kdamond (%d) starts\n", ctx->kdamond->pid); @@ -287,6 +415,8 @@ static int kdamond_fn(void *data) if (ctx->callback.after_aggregation && ctx->callback.after_aggregation(ctx)) set_kdamond_stop(ctx); + if (ctx->target_type != DAMON_ARBITRARY_TARGET) + kdamond_reset_aggregated(ctx); if (ctx->primitive.reset_aggregated) ctx->primitive.reset_aggregated(ctx); } @@ -296,6 +426,12 @@ static int kdamond_fn(void *data) ctx->primitive.update_target_regions(ctx); } } + if (ctx->target_type != DAMON_ARBITRARY_TARGET) { + damon_for_each_target(t, ctx) { + damon_for_each_region_safe(r, next, t) + damon_destroy_region(r); + } + } if (ctx->callback.before_terminate && ctx->callback.before_terminate(ctx)) From patchwork Tue Dec 15 11:54:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19677C4361B for ; Tue, 15 Dec 2020 11:57:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 94FE4222B3 for ; Tue, 15 Dec 2020 11:57:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 94FE4222B3 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2F5F88D0001; Tue, 15 Dec 2020 06:57:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 27D766B006E; Tue, 15 Dec 2020 06:57:29 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1544F8D0001; Tue, 15 Dec 2020 06:57:29 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0103.hostedemail.com [216.40.44.103]) by kanga.kvack.org (Postfix) with ESMTP id E86EA6B006C for ; Tue, 15 Dec 2020 06:57:28 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B24688249980 for ; Tue, 15 Dec 2020 11:57:28 +0000 (UTC) X-FDA: 77595366576.18.error92_0e1253427423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id 8E353100ED3CB for ; Tue, 15 Dec 2020 11:57:28 +0000 (UTC) X-HE-Tag: error92_0e1253427423 X-Filterd-Recvd-Size: 18822 Received: from smtp-fw-9101.amazon.com (smtp-fw-9101.amazon.com [207.171.184.25]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:57:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033448; x=1639569448; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=N/NhuWKknhjTq7U4VhihMJV62r2AG12h+d0PPbwSbbs=; b=pbe4wthng6z1vKtOs2ldcNSGxP+WqfZDlLqxC4/UwTyO5RmauzGB88rr ZurlT7zUK1RDqeK/2gJwJw9tDzTFiuT2Qt9df6dObjbWbbJAKFPSNCZJo t8uy8Gx8+0sVW/6ekOto/mayY9dTaZr30jBTY9qfUCG/vamh1tnsdwCmk o=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="96129173" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 15 Dec 2020 11:57:18 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com (Postfix) with ESMTPS id D5D20A1C35; Tue, 15 Dec 2020 11:57:05 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:56:48 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 03/15] mm/damon: Adaptively adjust regions Date: Tue, 15 Dec 2020 12:54:36 +0100 Message-ID: <20201215115448.25633-4-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), the data access pattern can be dynamically changed. This will result in low monitoring quality. To keep the assumption as much as possible, DAMON adaptively merges and splits each region based on their access frequency. For each ``aggregation interval``, it compares the access frequencies of adjacent regions and merges those if the frequency difference is small. Then, after it reports and clears the aggregated access frequency of each region, it splits each region into two or three regions if the total number of regions will not exceed the user-specified maximum number of regions after the split. In this way, DAMON provides its best-effort quality and minimal overhead while keeping the upper-bound overhead that users set. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 41 +++++--- mm/damon/core.c | 220 ++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 240 insertions(+), 21 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 7d4685adc8a9..f446f8433599 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -12,6 +12,9 @@ #include #include +/* Minimal region size. Every damon_region is aligned by this. */ +#define DAMON_MIN_REGION PAGE_SIZE + /** * struct damon_addr_range - Represents an address region of [@start, @end). * @start: Start address of the region (inclusive). @@ -86,6 +89,8 @@ struct damon_ctx; * prepared for the next access check. * @check_accesses should check the accesses to each region that made after the * last preparation and update the number of observed accesses of each region. + * It should also return max number of observed accesses that made as a result + * of its update. * @reset_aggregated should reset the access monitoring results that aggregated * by @check_accesses. * @target_valid should check whether the target is still valid for the @@ -98,7 +103,7 @@ struct damon_primitive { void (*init_target_regions)(struct damon_ctx *context); void (*update_target_regions)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); - void (*check_accesses)(struct damon_ctx *context); + unsigned int (*check_accesses)(struct damon_ctx *context); void (*reset_aggregated)(struct damon_ctx *context); bool (*target_valid)(void *target); void (*cleanup)(struct damon_ctx *context); @@ -138,11 +143,11 @@ struct damon_callback { /** * enum damon_target_type - Represents the type of the monitoring target. * - * @DAMON_REGION_SAMPLING_TARGET: Region based sampling target. - * @DAMON_ARBITRARY_TARGET: User-defined arbitrary type target. + * @DAMON_ADAPTIVE_TARGET: Adaptive regions adjustment applied target. + * @DAMON_ARBITRARY_TARGET: User-defined arbitrary type target. */ enum damon_target_type { - DAMON_REGION_SAMPLING_TARGET, + DAMON_ADAPTIVE_TARGET, DAMON_ARBITRARY_TARGET, }; @@ -187,13 +192,15 @@ enum damon_target_type { * * @target_type: Type of the monitoring target. * - * @region_targets: Head of monitoring targets (&damon_target) list. + * @min_nr_regions: The minimum number of adaptive monitoring regions. + * @max_nr_regions: The maximum number of adaptive monitoring regions. + * @adaptive_targets: Head of monitoring targets (&damon_target) list. * * @arbitrary_target: Pointer to arbitrary type target. * - * @region_targets are valid only if @target_type is - * DAMON_REGION_SAMPLING_TARGET. @arbitrary_target is valid only if - * @target_type is DAMON_ARBITRARY_TARGET. + * @min_nr_regions, @max_nr_regions and @adaptive_targets are valid only if + * @target_type is &DAMON_ADAPTIVE_TARGET. @arbitrary_target is valid only if + * @target_type is &DAMON_ARBITRARY_TARGET. */ struct damon_ctx { unsigned long sample_interval; @@ -214,11 +221,13 @@ struct damon_ctx { enum damon_target_type target_type; union { - /* DAMON_REGION_SAMPLING_TARGET */ - struct list_head region_targets; + struct { /* DAMON_ADAPTIVE_TARGET */ + unsigned long min_nr_regions; + unsigned long max_nr_regions; + struct list_head adaptive_targets; + }; - /* DAMON_ARBITRARY_TARGET */ - void *arbitrary_target; + void *arbitrary_target; /* DAMON_ARBITRARY_TARGET */ }; }; @@ -235,10 +244,10 @@ struct damon_ctx { list_for_each_entry_safe(r, next, &t->regions_list, list) #define damon_for_each_target(t, ctx) \ - list_for_each_entry(t, &(ctx)->region_targets, list) + list_for_each_entry(t, &(ctx)->adaptive_targets, list) #define damon_for_each_target_safe(t, next, ctx) \ - list_for_each_entry_safe(t, next, &(ctx)->region_targets, list) + list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list) #ifdef CONFIG_DAMON @@ -252,11 +261,13 @@ struct damon_target *damon_new_target(unsigned long id); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); void damon_free_target(struct damon_target *t); void damon_destroy_target(struct damon_target *t); +unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(enum damon_target_type type); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long regions_update_int); + unsigned long aggr_int, unsigned long regions_update_int, + unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/core.c b/mm/damon/core.c index 167487e75737..0f9beb60d9dd 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -10,8 +10,12 @@ #include #include #include +#include #include +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + static DEFINE_MUTEX(damon_lock); static int nr_running_ctxs; @@ -87,7 +91,7 @@ struct damon_target *damon_new_target(unsigned long id) void damon_add_target(struct damon_ctx *ctx, struct damon_target *t) { - list_add_tail(&t->list, &ctx->region_targets); + list_add_tail(&t->list, &ctx->adaptive_targets); } static void damon_del_target(struct damon_target *t) @@ -110,6 +114,17 @@ void damon_destroy_target(struct damon_target *t) damon_free_target(t); } +unsigned int damon_nr_regions(struct damon_target *t) +{ + struct damon_region *r; + unsigned int nr_regions = 0; + + damon_for_each_region(r, t) + nr_regions++; + + return nr_regions; +} + struct damon_ctx *damon_new_ctx(enum damon_target_type type) { struct damon_ctx *ctx; @@ -128,8 +143,12 @@ struct damon_ctx *damon_new_ctx(enum damon_target_type type) mutex_init(&ctx->kdamond_lock); ctx->target_type = type; - if (type != DAMON_ARBITRARY_TARGET) - INIT_LIST_HEAD(&ctx->region_targets); + if (type != DAMON_ARBITRARY_TARGET) { + ctx->min_nr_regions = 10; + ctx->max_nr_regions = 1000; + + INIT_LIST_HEAD(&ctx->adaptive_targets); + } return ctx; } @@ -150,6 +169,8 @@ void damon_destroy_ctx(struct damon_ctx *ctx) * @sample_int: time interval between samplings * @aggr_int: time interval between aggregations * @regions_update_int: time interval between target regions update + * @min_nr_reg: minimal number of regions + * @max_nr_reg: maximum number of regions * * This function should not be called while the kdamond is running. * Every time interval is in micro-seconds. @@ -157,15 +178,51 @@ void damon_destroy_ctx(struct damon_ctx *ctx) * Return: 0 on success, negative error code otherwise. */ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long regions_update_int) + unsigned long aggr_int, unsigned long regions_update_int, + unsigned long min_nr_reg, unsigned long max_nr_reg) { + if (min_nr_reg < 3) { + pr_err("min_nr_regions (%lu) must be at least 3\n", + min_nr_reg); + return -EINVAL; + } + if (min_nr_reg > max_nr_reg) { + pr_err("invalid nr_regions. min (%lu) > max (%lu)\n", + min_nr_reg, max_nr_reg); + return -EINVAL; + } + ctx->sample_interval = sample_int; ctx->aggr_interval = aggr_int; ctx->regions_update_interval = regions_update_int; + if (ctx->target_type != DAMON_ARBITRARY_TARGET) { + ctx->min_nr_regions = min_nr_reg; + ctx->max_nr_regions = max_nr_reg; + } return 0; } +/* Returns the size upper limit for each monitoring region */ +static unsigned long damon_region_sz_limit(struct damon_ctx *ctx) +{ + struct damon_target *t; + struct damon_region *r; + unsigned long sz = 0; + + damon_for_each_target(t, ctx) { + damon_for_each_region(r, t) + sz += r->ar.end - r->ar.start; + } + + if (ctx->min_nr_regions) + sz /= ctx->min_nr_regions; + if (sz < DAMON_MIN_REGION) + sz = DAMON_MIN_REGION; + + return sz; +} + static bool damon_kdamond_running(struct damon_ctx *ctx) { bool running; @@ -332,6 +389,146 @@ static void kdamond_reset_aggregated(struct damon_ctx *c) } } +#define sz_damon_region(r) (r->ar.end - r->ar.start) + +/* + * Merge two adjacent regions into one region + */ +static void damon_merge_two_regions(struct damon_region *l, + struct damon_region *r) +{ + unsigned long sz_l = sz_damon_region(l), sz_r = sz_damon_region(r); + + l->nr_accesses = (l->nr_accesses * sz_l + r->nr_accesses * sz_r) / + (sz_l + sz_r); + l->ar.end = r->ar.end; + damon_destroy_region(r); +} + +#define diff_of(a, b) (a > b ? a - b : b - a) + +/* + * Merge adjacent regions having similar access frequencies + * + * t target affected by this merge operation + * thres '->nr_accesses' diff threshold for the merge + * sz_limit size upper limit of each region + */ +static void damon_merge_regions_of(struct damon_target *t, unsigned int thres, + unsigned long sz_limit) +{ + struct damon_region *r, *prev = NULL, *next; + + damon_for_each_region_safe(r, next, t) { + if (prev && prev->ar.end == r->ar.start && + diff_of(prev->nr_accesses, r->nr_accesses) <= thres && + sz_damon_region(prev) + sz_damon_region(r) <= sz_limit) + damon_merge_two_regions(prev, r); + else + prev = r; + } +} + +/* + * Merge adjacent regions having similar access frequencies + * + * threshold '->nr_accesses' diff threshold for the merge + * sz_limit size upper limit of each region + * + * This function merges monitoring target regions which are adjacent and their + * access frequencies are similar. This is for minimizing the monitoring + * overhead under the dynamically changeable access pattern. If a merge was + * unnecessarily made, later 'kdamond_split_regions()' will revert it. + */ +static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold, + unsigned long sz_limit) +{ + struct damon_target *t; + + damon_for_each_target(t, c) + damon_merge_regions_of(t, threshold, sz_limit); +} + +/* + * Split a region in two + * + * r the region to be split + * sz_r size of the first sub-region that will be made + */ +static void damon_split_region_at(struct damon_ctx *ctx, + struct damon_region *r, unsigned long sz_r) +{ + struct damon_region *new; + + new = damon_new_region(r->ar.start + sz_r, r->ar.end); + r->ar.end = new->ar.start; + + damon_insert_region(new, r, damon_next_region(r)); +} + +/* Split every region in the given target into 'nr_subs' regions */ +static void damon_split_regions_of(struct damon_ctx *ctx, + struct damon_target *t, int nr_subs) +{ + struct damon_region *r, *next; + unsigned long sz_region, sz_sub = 0; + int i; + + damon_for_each_region_safe(r, next, t) { + sz_region = r->ar.end - r->ar.start; + + for (i = 0; i < nr_subs - 1 && + sz_region > 2 * DAMON_MIN_REGION; i++) { + /* + * Randomly select size of left sub-region to be at + * least 10 percent and at most 90% of original region + */ + sz_sub = ALIGN_DOWN(damon_rand(1, 10) * + sz_region / 10, DAMON_MIN_REGION); + /* Do not allow blank region */ + if (sz_sub == 0 || sz_sub >= sz_region) + continue; + + damon_split_region_at(ctx, r, sz_sub); + sz_region = sz_sub; + } + } +} + +/* + * Split every target region into randomly-sized small regions + * + * This function splits every target region into random-sized small regions if + * current total number of the regions is equal or smaller than half of the + * user-specified maximum number of regions. This is for maximizing the + * monitoring accuracy under the dynamically changeable access patterns. If a + * split was unnecessarily made, later 'kdamond_merge_regions()' will revert + * it. + */ +static void kdamond_split_regions(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_regions = 0; + static unsigned int last_nr_regions; + int nr_subregions = 2; + + damon_for_each_target(t, ctx) + nr_regions += damon_nr_regions(t); + + if (nr_regions > ctx->max_nr_regions / 2) + return; + + /* Maybe the middle of the region has different access frequency */ + if (last_nr_regions == nr_regions && + nr_regions < ctx->max_nr_regions / 3) + nr_subregions = 3; + + damon_for_each_target(t, ctx) + damon_split_regions_of(ctx, t, nr_subregions); + + last_nr_regions = nr_regions; +} + /* * Check whether it is time to check and apply the target monitoring regions * @@ -391,6 +588,8 @@ static int kdamond_fn(void *data) struct damon_ctx *ctx = (struct damon_ctx *)data; struct damon_target *t; struct damon_region *r, *next; + unsigned int max_nr_accesses = 0; + unsigned long sz_limit = 0; pr_info("kdamond (%d) starts\n", ctx->kdamond->pid); @@ -399,6 +598,8 @@ static int kdamond_fn(void *data) if (ctx->callback.before_start && ctx->callback.before_start(ctx)) set_kdamond_stop(ctx); + sz_limit = damon_region_sz_limit(ctx); + while (!kdamond_need_stop(ctx)) { if (ctx->primitive.prepare_access_checks) ctx->primitive.prepare_access_checks(ctx); @@ -409,14 +610,20 @@ static int kdamond_fn(void *data) usleep_range(ctx->sample_interval, ctx->sample_interval + 1); if (ctx->primitive.check_accesses) - ctx->primitive.check_accesses(ctx); + max_nr_accesses = ctx->primitive.check_accesses(ctx); if (kdamond_aggregate_interval_passed(ctx)) { + if (ctx->target_type != DAMON_ARBITRARY_TARGET) + kdamond_merge_regions(ctx, + max_nr_accesses / 10, + sz_limit); if (ctx->callback.after_aggregation && ctx->callback.after_aggregation(ctx)) set_kdamond_stop(ctx); - if (ctx->target_type != DAMON_ARBITRARY_TARGET) + if (ctx->target_type != DAMON_ARBITRARY_TARGET) { kdamond_reset_aggregated(ctx); + kdamond_split_regions(ctx); + } if (ctx->primitive.reset_aggregated) ctx->primitive.reset_aggregated(ctx); } @@ -424,6 +631,7 @@ static int kdamond_fn(void *data) if (kdamond_need_update_regions(ctx)) { if (ctx->primitive.update_target_regions) ctx->primitive.update_target_regions(ctx); + sz_limit = damon_region_sz_limit(ctx); } } if (ctx->target_type != DAMON_ARBITRARY_TARGET) { From patchwork Tue Dec 15 11:54:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974629 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3577C4361B for ; Tue, 15 Dec 2020 11:57:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 49AF9222B6 for ; Tue, 15 Dec 2020 11:57:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 49AF9222B6 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C6A8A6B006C; Tue, 15 Dec 2020 06:57:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BF24A8D0002; Tue, 15 Dec 2020 06:57:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ABA556B0070; Tue, 15 Dec 2020 06:57:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0137.hostedemail.com [216.40.44.137]) by kanga.kvack.org (Postfix) with ESMTP id 8F50B6B006C for ; Tue, 15 Dec 2020 06:57:47 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5A1EE1E01 for ; Tue, 15 Dec 2020 11:57:47 +0000 (UTC) X-FDA: 77595367374.06.plant98_391695f27423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id 378EC10038270 for ; Tue, 15 Dec 2020 11:57:47 +0000 (UTC) X-HE-Tag: plant98_391695f27423 X-Filterd-Recvd-Size: 9523 Received: from smtp-fw-9101.amazon.com (smtp-fw-9101.amazon.com [207.171.184.25]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:57:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033467; x=1639569467; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=8iWqDRqn6hFNTl54/ldlnAbz7CQs7dyENk/0p0BsffE=; b=WcxUmdd4UI9qW6OttLc/WevHnVrgF8had9XtF26uaEb6GQBgPfjdWoW2 nPjrv+i6De+//i+EzvizxWraNxlMTbnCmC5fzN1Usw59CyEQsYXWAVRt0 KL3pQc+EN7fQKS0FnwvMlhzkyK6SmvjFNu25ISE/CHRAcOvs2b4n2ojuk Q=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="96129235" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 15 Dec 2020 11:57:43 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com (Postfix) with ESMTPS id CDC92A1E3A; Tue, 15 Dec 2020 11:57:30 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:57:13 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 04/15] mm/idle_page_tracking: Make PG_idle reusable Date: Tue, 15 Dec 2020 12:54:37 +0100 Message-ID: <20201215115448.25633-5-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page Tracking and the reclaim logic concurrently work while don't interfere each other. That is, when they need to clear the Accessed bit, they set PG_young to represent the previous state of the bit, respectively. And when they need to read the bit, if the bit is cleared, they further read the PG_young to know whether the other has cleared the bit meanwhile or not. We could add another page flag and extend the mechanism to use the flag if we need to add another concurrent PTE Accessed bit user subsystem. However, the space is limited. Meanwhile, if the new subsystem is mutually exclusive with IDLE_PAGE_TRACKING or interfering with it is not a real problem, it would be ok to simply reuse the PG_idle flag. However, it's impossible because the flags are dependent on IDLE_PAGE_TRACKING. To allow such reuse of the flags, this commit separates the PG_young and PG_idle flag logic from IDLE_PAGE_TRACKING and introduces new kernel config, 'PAGE_IDLE_FLAG'. Hence, a new subsystem would be able to reuse PG_idle without depending on IDLE_PAGE_TRACKING. In the next commit, DAMON's reference implementation of the virtual memory address space monitoring primitives will use it. Signed-off-by: SeongJae Park Reviewed-by: Shakeel Butt --- include/linux/page-flags.h | 4 ++-- include/linux/page_ext.h | 2 +- include/linux/page_idle.h | 6 +++--- include/trace/events/mmflags.h | 2 +- mm/Kconfig | 8 ++++++++ mm/page_ext.c | 12 +++++++++++- mm/page_idle.c | 10 ---------- 7 files changed, 26 insertions(+), 18 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 4f6ba9379112..0f010fc7f1c4 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -132,7 +132,7 @@ enum pageflags { #ifdef CONFIG_MEMORY_FAILURE PG_hwpoison, /* hardware poisoned page. Don't touch */ #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) PG_young, PG_idle, #endif @@ -437,7 +437,7 @@ PAGEFLAG_FALSE(HWPoison) #define __PG_HWPOISON 0 #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) TESTPAGEFLAG(Young, young, PF_ANY) SETPAGEFLAG(Young, young, PF_ANY) TESTCLEARFLAG(Young, young, PF_ANY) diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index cfce186f0c4e..c9cbc9756011 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -19,7 +19,7 @@ struct page_ext_operations { enum page_ext_flags { PAGE_EXT_OWNER, PAGE_EXT_OWNER_ALLOCATED, -#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, #endif diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 1e894d34bdce..d8a6aecf99cb 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -6,7 +6,7 @@ #include #include -#ifdef CONFIG_IDLE_PAGE_TRACKING +#ifdef CONFIG_PAGE_IDLE_FLAG #ifdef CONFIG_64BIT static inline bool page_is_young(struct page *page) @@ -106,7 +106,7 @@ static inline void clear_page_idle(struct page *page) } #endif /* CONFIG_64BIT */ -#else /* !CONFIG_IDLE_PAGE_TRACKING */ +#else /* !CONFIG_PAGE_IDLE_FLAG */ static inline bool page_is_young(struct page *page) { @@ -135,6 +135,6 @@ static inline void clear_page_idle(struct page *page) { } -#endif /* CONFIG_IDLE_PAGE_TRACKING */ +#endif /* CONFIG_PAGE_IDLE_FLAG */ #endif /* _LINUX_MM_PAGE_IDLE_H */ diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 67018d367b9f..ebee94c397a6 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -73,7 +73,7 @@ #define IF_HAVE_PG_HWPOISON(flag,string) #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string} #else #define IF_HAVE_PG_IDLE(flag,string) diff --git a/mm/Kconfig b/mm/Kconfig index b97f2e8ab83f..500c885eb9f1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -749,10 +749,18 @@ config DEFERRED_STRUCT_PAGE_INIT lifetime of the system until these kthreads finish the initialisation. +config PAGE_IDLE_FLAG + bool "Add PG_idle and PG_young flags" + help + This feature adds PG_idle and PG_young flags in 'struct page'. PTE + Accessed bit writers can set the state of the bit in the flags to let + other PTE Accessed bit readers don't disturbed. + config IDLE_PAGE_TRACKING bool "Enable idle page tracking" depends on SYSFS && MMU select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG help This feature allows to estimate the amount of user pages that have not been touched during a given period of time. This information can diff --git a/mm/page_ext.c b/mm/page_ext.c index a3616f7a0e9e..f9a6ff65ac0a 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -58,11 +58,21 @@ * can utilize this callback to initialize the state of it correctly. */ +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) +static bool need_page_idle(void) +{ + return true; +} +struct page_ext_operations page_idle_ops = { + .need = need_page_idle, +}; +#endif + static struct page_ext_operations *page_ext_ops[] = { #ifdef CONFIG_PAGE_OWNER &page_owner_ops, #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) &page_idle_ops, #endif }; diff --git a/mm/page_idle.c b/mm/page_idle.c index 057c61df12db..144fb4ed961d 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -211,16 +211,6 @@ static const struct attribute_group page_idle_attr_group = { .name = "page_idle", }; -#ifndef CONFIG_64BIT -static bool need_page_idle(void) -{ - return true; -} -struct page_ext_operations page_idle_ops = { - .need = need_page_idle, -}; -#endif - static int __init page_idle_init(void) { int err; From patchwork Tue Dec 15 11:54:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974631 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DA30C2BB9A for ; Tue, 15 Dec 2020 11:58:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0FDB7222B3 for ; Tue, 15 Dec 2020 11:58:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0FDB7222B3 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 69D9A6B0068; Tue, 15 Dec 2020 06:58:22 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 64DB88D0003; Tue, 15 Dec 2020 06:58:22 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4EDA18D0002; Tue, 15 Dec 2020 06:58:22 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0188.hostedemail.com [216.40.44.188]) by kanga.kvack.org (Postfix) with ESMTP id 2F6776B0068 for ; Tue, 15 Dec 2020 06:58:22 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id CD12D1DE4 for ; Tue, 15 Dec 2020 11:58:21 +0000 (UTC) X-FDA: 77595368802.08.tent78_0d0c11627423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id AADF31819E76B for ; Tue, 15 Dec 2020 11:58:21 +0000 (UTC) X-HE-Tag: tent78_0d0c11627423 X-Filterd-Recvd-Size: 23923 Received: from smtp-fw-9103.amazon.com (smtp-fw-9103.amazon.com [207.171.188.200]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:58:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033501; x=1639569501; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=+yDNUbD1VkSrw4/flSoVi2taHYQgWJ5Kn58Sj7Vevdg=; b=Y96E6hyINhFd1+Zt1X/l3VBJa5hA5BpWLOdThrIdOI366rXNBzWNs+ZV RhyNpXqEvN+05bX7GVjl9NVnRGZiOVDWeK28ufWxEZPjc/9gk8Cd9j1o9 P5h1EVMoK+8VfqvwMq7RJ+nUzmKzgBTHRIowNQEnrFq9hN65OwkgFXZbD 4=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="903209542" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9103.sea19.amazon.com with ESMTP; 15 Dec 2020 11:58:10 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com (Postfix) with ESMTPS id E1519A2109; Tue, 15 Dec 2020 11:57:58 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:57:41 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 05/15] mm/damon: Implement primitives for the virtual memory address spaces Date: Tue, 15 Dec 2020 12:54:38 +0100 Message-ID: <20201215115448.25633-6-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space does the works as below. PTE Accessed-bit Based Access Check ----------------------------------- The implementation uses PTE Accessed-bit for basic access checks. That is, it clears the bit for next sampling target page and checks whether it set again after one sampling period. This could disturb the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction ------------------------------------------- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too huge unmapped areas inside the monitoring target should be removed to not take the time for the adaptive mechanism. For the reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space. The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spacees, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail:: (small mmap()-ed regions and munmap()-ed regions) Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 579 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 602 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index f446f8433599..39b4d6d3ddee 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -274,4 +274,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init_regions(struct damon_ctx *ctx); +void damon_va_update_regions(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON) := core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index 000000000000..efc56ef843fe --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,579 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the returned task, unless it is NULL. + */ +#define damon_get_task_struct(t) \ + (get_pid_task((struct pid *)t->id, PIDTYPE_PID)) + +/* + * Get the mm_struct of the given target + * + * Caller _must_ put the mm_struct after use, unless it is NULL. + * + * Returns the mm_struct of the target on success, NULL on failure + */ +static struct mm_struct *damon_get_mm(struct damon_target *t) +{ + struct task_struct *task; + struct mm_struct *mm; + + task = damon_get_task_struct(t); + if (!task) + return NULL; + + mm = get_task_mm(task); + put_task_struct(task); + return mm; +} + +/* + * Functions for the initial monitoring target regions construction + */ + +/* + * Size-evenly split a region into 'nr_pieces' small regions + * + * Returns 0 on success, or negative error code otherwise. + */ +static int damon_va_evenly_split_region(struct damon_ctx *ctx, + struct damon_region *r, unsigned int nr_pieces) +{ + unsigned long sz_orig, sz_piece, orig_end; + struct damon_region *n = NULL, *next; + unsigned long start; + + if (!r || !nr_pieces) + return -EINVAL; + + orig_end = r->ar.end; + sz_orig = r->ar.end - r->ar.start; + sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION); + + if (!sz_piece) + return -EINVAL; + + r->ar.end = r->ar.start + sz_piece; + next = damon_next_region(r); + for (start = r->ar.end; start + sz_piece <= orig_end; + start += sz_piece) { + n = damon_new_region(start, start + sz_piece); + if (!n) + return -ENOMEM; + damon_insert_region(n, r, next); + r = n; + } + /* complement last region for possible rounding error */ + if (n) + n->ar.end = orig_end; + + return 0; +} + +static unsigned long sz_range(struct damon_addr_range *r) +{ + return r->end - r->start; +} + +static void swap_ranges(struct damon_addr_range *r1, + struct damon_addr_range *r2) +{ + struct damon_addr_range tmp; + + tmp = *r1; + *r1 = *r2; + *r2 = tmp; +} + +/* + * Find three regions separated by two biggest unmapped regions + * + * vma the head vma of the target address space + * regions an array of three address ranges that results will be saved + * + * This function receives an address space and finds three regions in it which + * separated by the two biggest unmapped regions in the space. Please refer to + * below comments of '__damon_va_init_regions()' function to know why this is + * necessary. + * + * Returns 0 if success, or negative error code otherwise. + */ +static int __damon_va_three_regions(struct vm_area_struct *vma, + struct damon_addr_range regions[3]) +{ + struct damon_addr_range gap = {0}, first_gap = {0}, second_gap = {0}; + struct vm_area_struct *last_vma = NULL; + unsigned long start = 0; + struct rb_root rbroot; + + /* Find two biggest gaps so that first_gap > second_gap > others */ + for (; vma; vma = vma->vm_next) { + if (!last_vma) { + start = vma->vm_start; + goto next; + } + + if (vma->rb_subtree_gap <= sz_range(&second_gap)) { + rbroot.rb_node = &vma->vm_rb; + vma = rb_entry(rb_last(&rbroot), + struct vm_area_struct, vm_rb); + goto next; + } + + gap.start = last_vma->vm_end; + gap.end = vma->vm_start; + if (sz_range(&gap) > sz_range(&second_gap)) { + swap_ranges(&gap, &second_gap); + if (sz_range(&second_gap) > sz_range(&first_gap)) + swap_ranges(&second_gap, &first_gap); + } +next: + last_vma = vma; + } + + if (!sz_range(&second_gap) || !sz_range(&first_gap)) + return -EINVAL; + + /* Sort the two biggest gaps by address */ + if (first_gap.start > second_gap.start) + swap_ranges(&first_gap, &second_gap); + + /* Store the result */ + regions[0].start = ALIGN(start, DAMON_MIN_REGION); + regions[0].end = ALIGN(first_gap.start, DAMON_MIN_REGION); + regions[1].start = ALIGN(first_gap.end, DAMON_MIN_REGION); + regions[1].end = ALIGN(second_gap.start, DAMON_MIN_REGION); + regions[2].start = ALIGN(second_gap.end, DAMON_MIN_REGION); + regions[2].end = ALIGN(last_vma->vm_end, DAMON_MIN_REGION); + + return 0; +} + +/* + * Get the three regions in the given target (task) + * + * Returns 0 on success, negative error code otherwise. + */ +static int damon_va_three_regions(struct damon_target *t, + struct damon_addr_range regions[3]) +{ + struct mm_struct *mm; + int rc; + + mm = damon_get_mm(t); + if (!mm) + return -EINVAL; + + mmap_read_lock(mm); + rc = __damon_va_three_regions(mm->mmap, regions); + mmap_read_unlock(mm); + + mmput(mm); + return rc; +} + +/* + * Initialize the monitoring target regions for the given target (task) + * + * t the given target + * + * Because only a number of small portions of the entire address space + * is actually mapped to the memory and accessed, monitoring the unmapped + * regions is wasteful. That said, because we can deal with small noises, + * tracking every mapping is not strictly required but could even incur a high + * overhead if the mapping frequently changes or the number of mappings is + * high. The adaptive regions adjustment mechanism will further help to deal + * with the noise by simply identifying the unmapped areas as a region that + * has no access. Moreover, applying the real mappings that would have many + * unmapped areas inside will make the adaptive mechanism quite complex. That + * said, too huge unmapped areas inside the monitoring target should be removed + * to not take the time for the adaptive mechanism. + * + * For the reason, we convert the complex mappings to three distinct regions + * that cover every mapped area of the address space. Also the two gaps + * between the three regions are the two biggest unmapped areas in the given + * address space. In detail, this function first identifies the start and the + * end of the mappings and the two biggest unmapped areas of the address space. + * Then, it constructs the three regions as below: + * + * [mappings[0]->start, big_two_unmapped_areas[0]->start) + * [big_two_unmapped_areas[0]->end, big_two_unmapped_areas[1]->start) + * [big_two_unmapped_areas[1]->end, mappings[nr_mappings - 1]->end) + * + * As usual memory map of processes is as below, the gap between the heap and + * the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed + * region and the stack will be two biggest unmapped regions. Because these + * gaps are exceptionally huge areas in usual address space, excluding these + * two biggest unmapped regions will be sufficient to make a trade-off. + * + * + * + * + * (other mmap()-ed regions and small unmapped regions) + * + * + * + */ +static void __damon_va_init_regions(struct damon_ctx *c, + struct damon_target *t) +{ + struct damon_region *r; + struct damon_addr_range regions[3]; + unsigned long sz = 0, nr_pieces; + int i; + + if (damon_va_three_regions(t, regions)) { + pr_err("Failed to get three regions of target %lu\n", t->id); + return; + } + + for (i = 0; i < 3; i++) + sz += regions[i].end - regions[i].start; + if (c->min_nr_regions) + sz /= c->min_nr_regions; + if (sz < DAMON_MIN_REGION) + sz = DAMON_MIN_REGION; + + /* Set the initial three regions of the target */ + for (i = 0; i < 3; i++) { + r = damon_new_region(regions[i].start, regions[i].end); + if (!r) { + pr_err("%d'th init region creation failed\n", i); + return; + } + damon_add_region(r, t); + + nr_pieces = (regions[i].end - regions[i].start) / sz; + damon_va_evenly_split_region(c, r, nr_pieces); + } +} + +/* Initialize '->regions_list' of every target (task) */ +void damon_va_init_regions(struct damon_ctx *ctx) +{ + struct damon_target *t; + + damon_for_each_target(t, ctx) { + /* the user may set the target regions as they want */ + if (!damon_nr_regions(t)) + __damon_va_init_regions(ctx, t); + } +} + +/* + * Functions for the dynamic monitoring target regions update + */ + +/* + * Check whether a region is intersecting an address range + * + * Returns true if it is. + */ +static bool damon_intersect(struct damon_region *r, struct damon_addr_range *re) +{ + return !(r->ar.end <= re->start || re->end <= r->ar.start); +} + +/* + * Update damon regions for the three big regions of the given target + * + * t the given target + * bregions the three big regions of the target + */ +static void damon_va_apply_three_regions(struct damon_ctx *ctx, + struct damon_target *t, struct damon_addr_range bregions[3]) +{ + struct damon_region *r, *next; + unsigned int i = 0; + + /* Remove regions which are not in the three big regions now */ + damon_for_each_region_safe(r, next, t) { + for (i = 0; i < 3; i++) { + if (damon_intersect(r, &bregions[i])) + break; + } + if (i == 3) + damon_destroy_region(r); + } + + /* Adjust intersecting regions to fit with the three big regions */ + for (i = 0; i < 3; i++) { + struct damon_region *first = NULL, *last; + struct damon_region *newr; + struct damon_addr_range *br; + + br = &bregions[i]; + /* Get the first and last regions which intersects with br */ + damon_for_each_region(r, t) { + if (damon_intersect(r, br)) { + if (!first) + first = r; + last = r; + } + if (r->ar.start >= br->end) + break; + } + if (!first) { + /* no damon_region intersects with this big region */ + newr = damon_new_region( + ALIGN_DOWN(br->start, + DAMON_MIN_REGION), + ALIGN(br->end, DAMON_MIN_REGION)); + if (!newr) + continue; + damon_insert_region(newr, damon_prev_region(r), r); + } else { + first->ar.start = ALIGN_DOWN(br->start, + DAMON_MIN_REGION); + last->ar.end = ALIGN(br->end, DAMON_MIN_REGION); + } + } +} + +/* + * Update regions for current memory mappings + */ +void damon_va_update_regions(struct damon_ctx *ctx) +{ + struct damon_addr_range three_regions[3]; + struct damon_target *t; + + damon_for_each_target(t, ctx) { + if (damon_va_three_regions(t, three_regions)) + continue; + damon_va_apply_three_regions(ctx, t, three_regions); + } +} + +static void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, + unsigned long addr) +{ + bool referenced = false; + struct page *page = pte_page(*pte); + + if (pte_young(*pte)) { + referenced = true; + *pte = pte_mkold(*pte); + } + +#ifdef CONFIG_MMU_NOTIFIER + if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE)) + referenced = true; +#endif /* CONFIG_MMU_NOTIFIER */ + + if (referenced) + set_page_young(page); + + set_page_idle(page); +} + +static void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, + unsigned long addr) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + bool referenced = false; + struct page *page = pmd_page(*pmd); + + if (pmd_young(*pmd)) { + referenced = true; + *pmd = pmd_mkold(*pmd); + } + +#ifdef CONFIG_MMU_NOTIFIER + if (mmu_notifier_clear_young(mm, addr, + addr + ((1UL) << HPAGE_PMD_SHIFT))) + referenced = true; +#endif /* CONFIG_MMU_NOTIFIER */ + + if (referenced) + set_page_young(page); + + set_page_idle(page); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +} + +static void damon_va_mkold(struct mm_struct *mm, unsigned long addr) +{ + pte_t *pte = NULL; + pmd_t *pmd = NULL; + spinlock_t *ptl; + + if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl)) + return; + + if (pte) { + damon_ptep_mkold(pte, mm, addr); + pte_unmap_unlock(pte, ptl); + } else { + damon_pmdp_mkold(pmd, mm, addr); + spin_unlock(ptl); + } +} + +/* + * Functions for the access checking of the regions + */ + +static void damon_va_prepare_access_check(struct damon_ctx *ctx, + struct mm_struct *mm, struct damon_region *r) +{ + r->sampling_addr = damon_rand(r->ar.start, r->ar.end); + + damon_va_mkold(mm, r->sampling_addr); +} + +void damon_va_prepare_access_checks(struct damon_ctx *ctx) +{ + struct damon_target *t; + struct mm_struct *mm; + struct damon_region *r; + + damon_for_each_target(t, ctx) { + mm = damon_get_mm(t); + if (!mm) + continue; + damon_for_each_region(r, t) + damon_va_prepare_access_check(ctx, mm, r); + mmput(mm); + } +} + +static bool damon_va_young(struct mm_struct *mm, unsigned long addr, + unsigned long *page_sz) +{ + pte_t *pte = NULL; + pmd_t *pmd = NULL; + spinlock_t *ptl; + bool young = false; + + if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl)) + return false; + + *page_sz = PAGE_SIZE; + if (pte) { + young = pte_young(*pte); + if (!young) + young = !page_is_idle(pte_page(*pte)); + pte_unmap_unlock(pte, ptl); + return young; + } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + young = pmd_young(*pmd); + if (!young) + young = !page_is_idle(pmd_page(*pmd)); + spin_unlock(ptl); + *page_sz = ((1UL) << HPAGE_PMD_SHIFT); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + + return young; +} + +/* + * Check whether the region was accessed after the last preparation + * + * mm 'mm_struct' for the given virtual address space + * r the region to be checked + */ +static void damon_va_check_access(struct damon_ctx *ctx, + struct mm_struct *mm, struct damon_region *r) +{ + static struct mm_struct *last_mm; + static unsigned long last_addr; + static unsigned long last_page_sz = PAGE_SIZE; + static bool last_accessed; + + /* If the region is in the last checked page, reuse the result */ + if (mm == last_mm && (ALIGN_DOWN(last_addr, last_page_sz) == + ALIGN_DOWN(r->sampling_addr, last_page_sz))) { + if (last_accessed) + r->nr_accesses++; + return; + } + + last_accessed = damon_va_young(mm, r->sampling_addr, &last_page_sz); + if (last_accessed) + r->nr_accesses++; + + last_mm = mm; + last_addr = r->sampling_addr; +} + +unsigned int damon_va_check_accesses(struct damon_ctx *ctx) +{ + struct damon_target *t; + struct mm_struct *mm; + struct damon_region *r; + unsigned int max_nr_accesses = 0; + + damon_for_each_target(t, ctx) { + mm = damon_get_mm(t); + if (!mm) + continue; + damon_for_each_region(r, t) { + damon_va_check_access(ctx, mm, r); + max_nr_accesses = max(r->nr_accesses, max_nr_accesses); + } + mmput(mm); + } + + return max_nr_accesses; +} + +/* + * Functions for the target validity check and cleanup + */ + +bool damon_va_target_valid(void *target) +{ + struct damon_target *t = target; + struct task_struct *task; + + task = damon_get_task_struct(t); + if (task) { + put_task_struct(task); + return true; + } + + return false; +} + +void damon_va_cleanup(struct damon_ctx *ctx) +{ + struct damon_target *t, *next; + + damon_for_each_target_safe(t, next, ctx) { + put_pid((struct pid *)t->id); + damon_destroy_target(t); + } +} + +void damon_va_set_primitives(struct damon_ctx *ctx) +{ + ctx->primitive.init_target_regions = damon_va_init_regions; + ctx->primitive.update_target_regions = damon_va_update_regions; + ctx->primitive.prepare_access_checks = damon_va_prepare_access_checks; + ctx->primitive.check_accesses = damon_va_check_accesses; + ctx->primitive.reset_aggregated = NULL; + ctx->primitive.target_valid = damon_va_target_valid; + ctx->primitive.cleanup = damon_va_cleanup; +} From patchwork Tue Dec 15 11:54:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974633 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E92BC2BB48 for ; Tue, 15 Dec 2020 11:58:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 76BE72246B for ; Tue, 15 Dec 2020 11:58:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76BE72246B Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0A9E96B006E; Tue, 15 Dec 2020 06:58:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0095B6B0070; Tue, 15 Dec 2020 06:58:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E14CF6B0071; Tue, 15 Dec 2020 06:58:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0054.hostedemail.com [216.40.44.54]) by kanga.kvack.org (Postfix) with ESMTP id C7E266B006E for ; Tue, 15 Dec 2020 06:58:47 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 91311181AEF30 for ; Tue, 15 Dec 2020 11:58:47 +0000 (UTC) X-FDA: 77595369894.12.mint67_0b0e19727423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 6F0AC1801BE33 for ; Tue, 15 Dec 2020 11:58:47 +0000 (UTC) X-HE-Tag: mint67_0b0e19727423 X-Filterd-Recvd-Size: 5748 Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:58:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033527; x=1639569527; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=YFQHSMTSB3FR2oCFJrUxmUoQTFo+oRQbGyfrImXtuWA=; b=EjYpFP9zh5oKI680of51Uee8xBx/E9Y/hnBp94OYlK9/a8HbcmY4dCof R7SZnM9ty8TJAFq4jjEduF/P1crrIWY7ceRDfU2d/554FuEeffc15hwTt m9yRekuhbZijeStarBNKh2Cha35eSDWudl04JatP+35zi/m4HHaKZtsI/ w=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="71288003" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1a-16acd5e0.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 15 Dec 2020 11:58:40 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1a-16acd5e0.us-east-1.amazon.com (Postfix) with ESMTPS id 37761A213D; Tue, 15 Dec 2020 11:58:27 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:58:11 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 06/15] mm/damon: Add a tracepoint Date: Tue, 15 Dec 2020 12:54:39 +0100 Message-ID: <20201215115448.25633-7-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can easily integrated with tracepoints supporting tools such as perf. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster Reviewed-by: Steven Rostedt (VMware) --- include/trace/events/damon.h | 43 ++++++++++++++++++++++++++++++++++++ mm/damon/core.c | 7 +++++- 2 files changed, 49 insertions(+), 1 deletion(-) create mode 100644 include/trace/events/damon.h diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h new file mode 100644 index 000000000000..2f422f4f1fb9 --- /dev/null +++ b/include/trace/events/damon.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM damon + +#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DAMON_H + +#include +#include +#include + +TRACE_EVENT(damon_aggregated, + + TP_PROTO(struct damon_target *t, struct damon_region *r, + unsigned int nr_regions), + + TP_ARGS(t, r, nr_regions), + + TP_STRUCT__entry( + __field(unsigned long, target_id) + __field(unsigned int, nr_regions) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + ), + + TP_fast_assign( + __entry->target_id = t->id; + __entry->nr_regions = nr_regions; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + ), + + TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u", + __entry->target_id, __entry->nr_regions, + __entry->start, __entry->end, __entry->nr_accesses) +); + +#endif /* _TRACE_DAMON_H */ + +/* This part must be outside protection */ +#include diff --git a/mm/damon/core.c b/mm/damon/core.c index 0f9beb60d9dd..5ca9f79ccbb6 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -13,6 +13,9 @@ #include #include +#define CREATE_TRACE_POINTS +#include + /* Get a random number in [l, r) */ #define damon_rand(l, r) (l + prandom_u32_max(r - l)) @@ -384,8 +387,10 @@ static void kdamond_reset_aggregated(struct damon_ctx *c) damon_for_each_target(t, c) { struct damon_region *r; - damon_for_each_region(r, t) + damon_for_each_region(r, t) { + trace_damon_aggregated(t, r, damon_nr_regions(t)); r->nr_accesses = 0; + } } } From patchwork Tue Dec 15 11:54:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974635 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30490C2BB9A for ; Tue, 15 Dec 2020 11:59:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 810E6222B6 for ; Tue, 15 Dec 2020 11:59:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 810E6222B6 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0A8726B006C; Tue, 15 Dec 2020 06:59:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0576D8D0002; Tue, 15 Dec 2020 06:59:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E62E06B0071; Tue, 15 Dec 2020 06:59:09 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id C5EE96B006C for ; Tue, 15 Dec 2020 06:59:09 -0500 (EST) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 9100B8249980 for ; Tue, 15 Dec 2020 11:59:09 +0000 (UTC) X-FDA: 77595370818.02.wire55_131545d27423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 70FB510097AA0 for ; Tue, 15 Dec 2020 11:59:09 +0000 (UTC) X-HE-Tag: wire55_131545d27423 X-Filterd-Recvd-Size: 18481 Received: from smtp-fw-9101.amazon.com (smtp-fw-9101.amazon.com [207.171.184.25]) by imf02.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:59:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033549; x=1639569549; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=JjbvHlBJ8zDN5ZFGnsdHpCv1YCRLpbG8itdbb6wcvCM=; b=g+oBLH3hOHKlRntbePy7yqspw+s3rgsJm7+4WpB9JOiu/NI0MaO4lLfg 9CbMK9zNYzfIuv03tC0yWXHxZTsYbIqxKdFktjhVvQOAoG9cbN+4rT3hP +HPX8eIwzNNYXz6yDJcAKsAA3eC6Ps2d8Uaa/I0vCmVEOx5uSOGKUDv8T s=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="96129498" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1e-303d0b0e.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 15 Dec 2020 11:59:05 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1e-303d0b0e.us-east-1.amazon.com (Postfix) with ESMTPS id 0F0AFA1E7F; Tue, 15 Dec 2020 11:58:52 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:58:36 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 07/15] mm/damon: Implement a debugfs-based user space interface Date: Tue, 15 Dec 2020 12:54:40 +0100 Message-ID: <20201215115448.25633-8-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only kernel space API. That said, letting the user space control DAMON could provide some benefits to them. For example, it will allow user space to analyze their specific workloads and make their own special optimizations. For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON api and exports those to the user space via the debugfs. 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under its debugfs directory, ``/damon/``. Attributes ---------- Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10, 1000 and check it again:: # cd /damon # echo 5000 100000 1000000 10 1000 > attrs # cat attrs 5000 100000 1000000 10 1000 Target IDs ---------- Some types of address spaces supports multiple monitoring target. For example, the virtual memory address spaces monitoring can have multiple processes as the monitoring targets. Users can set the targets by writing relevant id values of the targets to, and get the ids of the current targets by reading from the ``target_ids`` file. In case of the virtual address spaces monitoring, the values should be pids of the monitoring target processes. For example, below commands set processes having pids 42 and 4242 as the monitoring targets and check it again:: # cd /damon # echo 42 4242 > target_ids # cat target_ids 42 4242 Note that setting the target ids doesn't start the monitoring. Turning On/Off -------------- Setting the files as described above doesn't incur effect unless you explicitly start the monitoring. You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops those. DAMON also stops if every targets are invalidated (in case of the virtual memory monitoring, target processes are invalidated when terminated). Below example commands turn on, off, and check the status of DAMON:: # cd /damon # echo on > monitor_on # echo off > monitor_on # cat monitor_on off Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 3 + mm/damon/Kconfig | 9 ++ mm/damon/Makefile | 1 + mm/damon/core.c | 45 ++++++ mm/damon/dbgfs.c | 366 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 424 insertions(+) create mode 100644 mm/damon/dbgfs.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 39b4d6d3ddee..f9e0d4349352 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -265,9 +265,12 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(enum damon_target_type type); void damon_destroy_ctx(struct damon_ctx *ctx); +int damon_set_targets(struct damon_ctx *ctx, + unsigned long *ids, ssize_t nr_ids); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long regions_update_int, unsigned long min_nr_reg, unsigned long max_nr_reg); +int damon_nr_running_ctxs(void); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 8ae080c52950..72f1683ba0ee 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -21,4 +21,13 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_DBGFS + bool "DAMON debugfs interface" + depends on DAMON_VADDR && DEBUG_FS + help + This builds the debugfs interface for DAMON. The user space admins + can use the interface for arbitrary data access monitoring. + + If unsure, say N. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 6ebbd08aed67..fed4be3bace3 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_DAMON) := core.o obj-$(CONFIG_DAMON_VADDR) += vaddr.o +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o diff --git a/mm/damon/core.c b/mm/damon/core.c index 5ca9f79ccbb6..b9575a6bebff 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -166,6 +166,37 @@ void damon_destroy_ctx(struct damon_ctx *ctx) kfree(ctx); } +/** + * damon_set_targets() - Set monitoring targets. + * @ctx: monitoring context + * @ids: array of target ids + * @nr_ids: number of entries in @ids + * + * This function should not be called while the kdamond is running. + * + * Return: 0 on success, negative error code otherwise. + */ +int damon_set_targets(struct damon_ctx *ctx, + unsigned long *ids, ssize_t nr_ids) +{ + ssize_t i; + struct damon_target *t, *next; + + damon_for_each_target_safe(t, next, ctx) + damon_destroy_target(t); + + for (i = 0; i < nr_ids; i++) { + t = damon_new_target(ids[i]); + if (!t) { + pr_err("Failed to alloc damon_target\n"); + return -ENOMEM; + } + damon_add_target(ctx, t); + } + + return 0; +} + /** * damon_set_attrs() - Set attributes for the monitoring. * @ctx: monitoring context @@ -206,6 +237,20 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, return 0; } +/** + * damon_nr_running_ctxs() - Return number of currently running contexts. + */ +int damon_nr_running_ctxs(void) +{ + int nr_ctxs; + + mutex_lock(&damon_lock); + nr_ctxs = nr_running_ctxs; + mutex_unlock(&damon_lock); + + return nr_ctxs; +} + /* Returns the size upper limit for each monitoring region */ static unsigned long damon_region_sz_limit(struct damon_ctx *ctx) { diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c new file mode 100644 index 000000000000..fd1665a183c2 --- /dev/null +++ b/mm/damon/dbgfs.c @@ -0,0 +1,366 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Debugfs Interface + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-dbgfs: " fmt + +#include +#include +#include +#include +#include +#include +#include + +static struct damon_ctx **dbgfs_ctxs; +static int dbgfs_nr_ctxs; +static struct dentry **dbgfs_dirs; + +/* + * Returns non-empty string on success, negarive error code otherwise. + */ +static char *user_input_str(const char __user *buf, size_t count, loff_t *ppos) +{ + char *kbuf; + ssize_t ret; + + /* We do not accept continuous write */ + if (*ppos) + return ERR_PTR(-EINVAL); + + kbuf = kmalloc(count + 1, GFP_KERNEL); + if (!kbuf) + return ERR_PTR(-ENOMEM); + + ret = simple_write_to_buffer(kbuf, count + 1, ppos, buf, count); + if (ret != count) { + kfree(kbuf); + return ERR_PTR(-EIO); + } + kbuf[ret] = '\0'; + + return kbuf; +} + +static ssize_t dbgfs_attrs_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char kbuf[128]; + int ret; + + mutex_lock(&ctx->kdamond_lock); + ret = scnprintf(kbuf, ARRAY_SIZE(kbuf), "%lu %lu %lu %lu %lu\n", + ctx->sample_interval, ctx->aggr_interval, + ctx->regions_update_interval, ctx->min_nr_regions, + ctx->max_nr_regions); + mutex_unlock(&ctx->kdamond_lock); + + return simple_read_from_buffer(buf, count, ppos, kbuf, ret); +} + +static ssize_t dbgfs_attrs_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + unsigned long s, a, r, minr, maxr; + char *kbuf; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + if (sscanf(kbuf, "%lu %lu %lu %lu %lu", + &s, &a, &r, &minr, &maxr) != 5) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ret = -EBUSY; + goto unlock_out; + } + + err = damon_set_attrs(ctx, s, a, r, minr, maxr); + if (err) + ret = err; +unlock_out: + mutex_unlock(&ctx->kdamond_lock); +out: + kfree(kbuf); + return ret; +} + +#define targetid_is_pid(ctx) \ + (ctx->primitive.target_valid == damon_va_target_valid) + +static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len) +{ + struct damon_target *t; + unsigned long id; + int written = 0; + int rc; + + damon_for_each_target(t, ctx) { + id = t->id; + if (targetid_is_pid(ctx)) + /* Show pid numbers to debugfs users */ + id = (unsigned long)pid_vnr((struct pid *)id); + + rc = scnprintf(&buf[written], len - written, "%lu ", id); + if (!rc) + return -ENOMEM; + written += rc; + } + if (written) + written -= 1; + written += scnprintf(&buf[written], len - written, "\n"); + return written; +} + +static ssize_t dbgfs_target_ids_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + ssize_t len; + char ids_buf[320]; + + mutex_lock(&ctx->kdamond_lock); + len = sprint_target_ids(ctx, ids_buf, 320); + mutex_unlock(&ctx->kdamond_lock); + if (len < 0) + return len; + + return simple_read_from_buffer(buf, count, ppos, ids_buf, len); +} + +/* + * Converts a string into an array of unsigned long integers + * + * Returns an array of unsigned long integers if the conversion success, or + * NULL otherwise. + */ +static unsigned long *str_to_target_ids(const char *str, ssize_t len, + ssize_t *nr_ids) +{ + unsigned long *ids; + const int max_nr_ids = 32; + unsigned long id; + int pos = 0, parsed, ret; + + *nr_ids = 0; + ids = kmalloc_array(max_nr_ids, sizeof(id), GFP_KERNEL); + if (!ids) + return NULL; + while (*nr_ids < max_nr_ids && pos < len) { + ret = sscanf(&str[pos], "%lu%n", &id, &parsed); + pos += parsed; + if (ret != 1) + break; + ids[*nr_ids] = id; + *nr_ids += 1; + } + + return ids; +} + +static ssize_t dbgfs_target_ids_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char *kbuf, *nrs; + unsigned long *targets; + ssize_t nr_targets; + ssize_t ret = count; + int i; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + nrs = kbuf; + + targets = str_to_target_ids(nrs, ret, &nr_targets); + if (!targets) { + ret = -ENOMEM; + goto out; + } + + if (targetid_is_pid(ctx)) { + for (i = 0; i < nr_targets; i++) + targets[i] = (unsigned long)find_get_pid( + (int)targets[i]); + } + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ret = -EINVAL; + goto unlock_out; + } + + err = damon_set_targets(ctx, targets, nr_targets); + if (err) + ret = err; +unlock_out: + mutex_unlock(&ctx->kdamond_lock); + kfree(targets); +out: + kfree(kbuf); + return ret; +} + +static int damon_dbgfs_open(struct inode *inode, struct file *file) +{ + file->private_data = inode->i_private; + + return nonseekable_open(inode, file); +} + +static const struct file_operations attrs_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_attrs_read, + .write = dbgfs_attrs_write, +}; + +static const struct file_operations target_ids_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_target_ids_read, + .write = dbgfs_target_ids_write, +}; + +static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) +{ + const char * const file_names[] = {"attrs", "target_ids"}; + const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops}; + int i; + + for (i = 0; i < ARRAY_SIZE(file_names); i++) { + if (!debugfs_create_file(file_names[i], 0600, dir, + ctx, fops[i])) { + pr_err("failed to create %s file\n", file_names[i]); + return -ENOMEM; + } + } + + return 0; +} + +static struct damon_ctx *dbgfs_new_ctx(void) +{ + struct damon_ctx *ctx; + + ctx = damon_new_ctx(DAMON_ADAPTIVE_TARGET); + if (!ctx) + return NULL; + + damon_va_set_primitives(ctx); + return ctx; +} + +static ssize_t dbgfs_monitor_on_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + char monitor_on_buf[5]; + bool monitor_on = damon_nr_running_ctxs() != 0; + int len; + + len = scnprintf(monitor_on_buf, 5, monitor_on ? "on\n" : "off\n"); + + return simple_read_from_buffer(buf, count, ppos, monitor_on_buf, len); +} + +static ssize_t dbgfs_monitor_on_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + ssize_t ret = count; + char *kbuf; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + /* Remove white space */ + if (sscanf(kbuf, "%s", kbuf) != 1) { + kfree(kbuf); + return -EINVAL; + } + + if (!strncmp(kbuf, "on", count)) + err = damon_start(dbgfs_ctxs, dbgfs_nr_ctxs); + else if (!strncmp(kbuf, "off", count)) + err = damon_stop(dbgfs_ctxs, dbgfs_nr_ctxs); + else + err = -EINVAL; + + if (err) + ret = err; + kfree(kbuf); + return ret; +} + +static const struct file_operations monitor_on_fops = { + .owner = THIS_MODULE, + .read = dbgfs_monitor_on_read, + .write = dbgfs_monitor_on_write, +}; + +static int __init __damon_dbgfs_init(void) +{ + struct dentry *dbgfs_root; + const char * const file_names[] = {"monitor_on"}; + const struct file_operations *fops[] = {&monitor_on_fops}; + int i; + + dbgfs_root = debugfs_create_dir("damon", NULL); + if (IS_ERR(dbgfs_root)) { + pr_err("failed to create the dbgfs dir\n"); + return PTR_ERR(dbgfs_root); + } + + for (i = 0; i < ARRAY_SIZE(file_names); i++) { + if (!debugfs_create_file(file_names[i], 0600, dbgfs_root, + NULL, fops[i])) { + pr_err("failed to create %s file\n", file_names[i]); + return -ENOMEM; + } + } + dbgfs_fill_ctx_dir(dbgfs_root, dbgfs_ctxs[0]); + + dbgfs_dirs = kmalloc_array(1, sizeof(dbgfs_root), GFP_KERNEL); + dbgfs_dirs[0] = dbgfs_root; + + return 0; +} + +/* + * Functions for the initialization + */ + +static int __init damon_dbgfs_init(void) +{ + int rc; + + dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL); + dbgfs_ctxs[0] = dbgfs_new_ctx(); + if (!dbgfs_ctxs[0]) + return -ENOMEM; + dbgfs_nr_ctxs = 1; + + rc = __damon_dbgfs_init(); + if (rc) + pr_err("%s: dbgfs init failed\n", __func__); + + return rc; +} + +module_init(damon_dbgfs_init); From patchwork Tue Dec 15 11:54:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11974637 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 695C5C4361B for ; Tue, 15 Dec 2020 11:59:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C84A12246B for ; Tue, 15 Dec 2020 11:59:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C84A12246B Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 365526B0068; Tue, 15 Dec 2020 06:59:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2ED006B006E; Tue, 15 Dec 2020 06:59:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1DDC98D0002; Tue, 15 Dec 2020 06:59:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0214.hostedemail.com [216.40.44.214]) by kanga.kvack.org (Postfix) with ESMTP id 071636B0068 for ; Tue, 15 Dec 2020 06:59:58 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id BEE84363B for ; Tue, 15 Dec 2020 11:59:57 +0000 (UTC) X-FDA: 77595372834.08.cord31_060d24d27423 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 9FEA61819E76B for ; Tue, 15 Dec 2020 11:59:57 +0000 (UTC) X-HE-Tag: cord31_060d24d27423 X-Filterd-Recvd-Size: 12604 Received: from smtp-fw-6001.amazon.com (smtp-fw-6001.amazon.com [52.95.48.154]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Dec 2020 11:59:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1608033598; x=1639569598; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=mmig0xWAuH+8R7buRR+xGc43vRr7JSIK7LjZr2gl/KM=; b=l4xR0HP8UH3oDL3jv2IJ6Vptp6h5+KRZs8bfe+B7CTm6f72/bdKhq89D 1Uhz8cRq1a3PgjJVtxb7urbdy5fp8JfzwuzSBRdzQMBcdKe+GFSBar6mE mH46Gh1xFpXavCOEp/E+b2PZ67m4q7f//umi0qdKYMUOK+pP8jOgz8uxR w=; X-IronPort-AV: E=Sophos;i="5.78,420,1599523200"; d="scan'208";a="72726783" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1a-821c648d.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-6001.iad6.amazon.com with ESMTP; 15 Dec 2020 11:59:42 +0000 Received: from EX13D31EUA001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1a-821c648d.us-east-1.amazon.com (Postfix) with ESMTPS id 1F4EFA0289; Tue, 15 Dec 2020 11:59:29 +0000 (UTC) Received: from u3f2cd687b01c55.ant.amazon.com (10.43.162.252) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 15 Dec 2020 11:59:13 +0000 From: SeongJae Park To: CC: SeongJae Park , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v23 08/15] mm/damon/dbgfs: Implement recording feature Date: Tue, 15 Dec 2020 12:54:41 +0100 Message-ID: <20201215115448.25633-9-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201215115448.25633-1-sjpark@amazon.com> References: <20201215115448.25633-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D33UWC002.ant.amazon.com (10.43.162.11) To EX13D31EUA001.ant.amazon.com (10.43.165.15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park The user space users can control DAMON and get the monitoring results via the 'damon_aggregated' tracepoint event. However, dealing with the tracepoint might be complex for some simple use cases. This commit therefore implements 'recording' feature in 'damon-dbgfs'. The feature can be used via 'record' file in the '/damon/' directory. The file allows users to record monitored access patterns in a regular binary file. The recorded results are first written in an in-memory buffer and flushed to a file in batch. Users can get and set the size of the buffer and the path to the result file by reading from and writing to the ``record`` file. For example, below commands set the buffer to be 4 KiB and the result to be saved in ``/damon.data``. :: # cd /damon # echo "4096 /damon.data" > record # cat record 4096 /damon.data The recording can be disabled by setting the buffer size zero. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 261 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 259 insertions(+), 2 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index fd1665a183c2..733d2f3e646e 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -15,6 +15,17 @@ #include #include +#define MIN_RECORD_BUFFER_LEN 1024 +#define MAX_RECORD_BUFFER_LEN (4 * 1024 * 1024) +#define MAX_RFILE_PATH_LEN 256 + +struct dbgfs_recorder { + unsigned char *rbuf; + unsigned int rbuf_len; + unsigned int rbuf_offset; + char *rfile_path; +}; + static struct damon_ctx **dbgfs_ctxs; static int dbgfs_nr_ctxs; static struct dentry **dbgfs_dirs; @@ -97,6 +108,116 @@ static ssize_t dbgfs_attrs_write(struct file *file, return ret; } +static ssize_t dbgfs_record_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + struct dbgfs_recorder *rec = ctx->callback.private; + char record_buf[20 + MAX_RFILE_PATH_LEN]; + int ret; + + mutex_lock(&ctx->kdamond_lock); + ret = scnprintf(record_buf, ARRAY_SIZE(record_buf), "%u %s\n", + rec->rbuf_len, rec->rfile_path); + mutex_unlock(&ctx->kdamond_lock); + return simple_read_from_buffer(buf, count, ppos, record_buf, ret); +} + +/* + * dbgfs_set_recording() - Set attributes for the recording. + * @ctx: target kdamond context + * @rbuf_len: length of the result buffer + * @rfile_path: path to the monitor result files + * + * Setting 'rbuf_len' 0 disables recording. + * + * This function should not be called while the kdamond is running. + * + * Return: 0 on success, negative error code otherwise. + */ +static int dbgfs_set_recording(struct damon_ctx *ctx, + unsigned int rbuf_len, char *rfile_path) +{ + struct dbgfs_recorder *recorder; + size_t rfile_path_len; + + if (rbuf_len && (rbuf_len > MAX_RECORD_BUFFER_LEN || + rbuf_len < MIN_RECORD_BUFFER_LEN)) { + pr_err("result buffer size (%u) is out of [%d,%d]\n", + rbuf_len, MIN_RECORD_BUFFER_LEN, + MAX_RECORD_BUFFER_LEN); + return -EINVAL; + } + rfile_path_len = strnlen(rfile_path, MAX_RFILE_PATH_LEN); + if (rfile_path_len >= MAX_RFILE_PATH_LEN) { + pr_err("too long (>%d) result file path %s\n", + MAX_RFILE_PATH_LEN, rfile_path); + return -EINVAL; + } + + recorder = ctx->callback.private; + if (!recorder) { + recorder = kzalloc(sizeof(*recorder), GFP_KERNEL); + if (!recorder) + return -ENOMEM; + ctx->callback.private = recorder; + } + + recorder->rbuf_len = rbuf_len; + kfree(recorder->rbuf); + recorder->rbuf = NULL; + kfree(recorder->rfile_path); + recorder->rfile_path = NULL; + + if (rbuf_len) { + recorder->rbuf = kvmalloc(rbuf_len, GFP_KERNEL); + if (!recorder->rbuf) + return -ENOMEM; + } + recorder->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL); + if (!recorder->rfile_path) + return -ENOMEM; + strncpy(recorder->rfile_path, rfile_path, rfile_path_len + 1); + + return 0; +} + +static ssize_t dbgfs_record_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char *kbuf; + unsigned int rbuf_len; + char rfile_path[MAX_RFILE_PATH_LEN]; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + + if (sscanf(kbuf, "%u %s", + &rbuf_len, rfile_path) != 2) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) { + ret = -EBUSY; + goto unlock_out; + } + + err = dbgfs_set_recording(ctx, rbuf_len, rfile_path); + if (err) + ret = err; +unlock_out: + mutex_unlock(&ctx->kdamond_lock); +out: + kfree(kbuf); + return ret; +} + #define targetid_is_pid(ctx) \ (ctx->primitive.target_valid == damon_va_target_valid) @@ -230,6 +351,13 @@ static const struct file_operations attrs_fops = { .write = dbgfs_attrs_write, }; +static const struct file_operations record_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_record_read, + .write = dbgfs_record_write, +}; + static const struct file_operations target_ids_fops = { .owner = THIS_MODULE, .open = damon_dbgfs_open, @@ -239,8 +367,9 @@ static const struct file_operations target_ids_fops = { static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) { - const char * const file_names[] = {"attrs", "target_ids"}; - const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops}; + const char * const file_names[] = {"attrs", "record", "target_ids"}; + const struct file_operations *fops[] = {&attrs_fops, &record_fops, + &target_ids_fops}; int i; for (i = 0; i < ARRAY_SIZE(file_names); i++) { @@ -254,6 +383,126 @@ static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) return 0; } +/* + * Flush the content in the result buffer to the result file + */ +static void dbgfs_flush_rbuffer(struct dbgfs_recorder *rec) +{ + ssize_t sz; + loff_t pos = 0; + struct file *rfile; + + if (!rec->rbuf_offset) + return; + + rfile = filp_open(rec->rfile_path, + O_CREAT | O_RDWR | O_APPEND | O_LARGEFILE, 0644); + if (IS_ERR(rfile)) { + pr_err("Cannot open the result file %s\n", + rec->rfile_path); + return; + } + + while (rec->rbuf_offset) { + sz = kernel_write(rfile, rec->rbuf, rec->rbuf_offset, &pos); + if (sz < 0) + break; + rec->rbuf_offset -= sz; + } + filp_close(rfile, NULL); +} + +/* + * Write a data into the result buffer + */ +static void dbgfs_write_rbuf(struct damon_ctx *ctx, void *data, ssize_t size) +{ + struct dbgfs_recorder *rec = ctx->callback.private; + + if (!rec->rbuf_len || !rec->rbuf || !rec->rfile_path) + return; + if (rec->rbuf_offset + size > rec->rbuf_len) + dbgfs_flush_rbuffer(ctx->callback.private); + if (rec->rbuf_offset + size > rec->rbuf_len) { + pr_warn("%s: flush failed, or wrong size given(%u, %zu)\n", + __func__, rec->rbuf_offset, size); + return; + } + + memcpy(&rec->rbuf[rec->rbuf_offset], data, size); + rec->rbuf_offset += size; +} + +static void dbgfs_write_record_header(struct damon_ctx *ctx) +{ + int recfmt_ver = 2; + + dbgfs_write_rbuf(ctx, "damon_recfmt_ver", 16); + dbgfs_write_rbuf(ctx, &recfmt_ver, sizeof(recfmt_ver)); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static int dbgfs_before_start(struct damon_ctx *ctx) +{ + dbgfs_write_record_header(ctx); + return 0; +} + +/* + * Store the aggregated monitoring results to the result buffer + * + * The format for the result buffer is as below: + * + *