From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-mm@kvack.org, javier.gonz@samsung.com, Slava.Dubeyko@ibm.com, Viacheslav Dubeyko
Subject: [RFC PATCH] Introduce generalized data temperature estimation framework
Date: Thu, 23 Jan 2025 12:24:55 -0800
Message-ID: <20250123202455.11338-1-slava@dubeyko.com>

[PROBLEM DECLARATION]

An efficient data placement policy is the Holy Grail for data storage and
file system engineers. Achieving this goal is as important as it is hard.
Multiple data storage and file system technologies have been invented to
manage data placement (for example, COW, ZNS, FDP, etc).
But these technologies still require hints from the application side about
the nature of the data.

[DATA "TEMPERATURE" CONCEPT]

One widely used and intuitively clear way to describe the nature of data is
data "temperature" (cold, warm, or hot data). However, as a definition of the
nature of data, "temperature" is as elusive as it is intuitively appealing.
In thermodynamics, temperature is a measure of the average kinetic energy of
the vibrating atoms in a substance. There is no direct analogy between data
"temperature" and physical temperature, because data has no kinetic energy.

[WHAT IS GENERALIZED DATA "TEMPERATURE" ESTIMATION]

We usually assume that data which is updated more frequently is hotter than
other data. But this assumption raises several problems:
(1) How can we estimate data "hotness" quantitatively?
(2) We can state that data is "hot" after some number of updates, but this
    describes the state of the data in the past. Will this data continue to
    be "hot" in the future?

The crucial problem is how to predict the nature, or "temperature", of data
in the future, because this knowledge is the fundamental basis for
elaborating an efficient data placement policy. The generalized data
"temperature" estimation framework suggests a way to define the future state
of data and a basis for measuring data "temperature" quantitatively.

[ARCHITECTURE OF FRAMEWORK]

Usually, a file system has a page cache for every inode. Memory pages first
become dirty in the page cache, and the dirty pages are eventually written to
the storage device. Technically speaking, the number of dirty pages in a
particular page cache is a quantitative measurement of the file's current
"hotness". But the number of dirty pages alone is not a stable basis for a
quantitative measurement of data "temperature".

We suggest using the total number of logical blocks in a file as the unit of
one degree of data "temperature". As a result, if the whole file has been
updated several times, the file's "temperature" has increased by several
degrees, and while the file is under continuous updates, its "temperature"
keeps growing. To accumulate the total "temperature" of a file, we need to
track not only the current number of dirty pages but also the number of pages
updated in the near past. The total number of pages updated in the near past
defines the aggregated "temperature" of the file, while the number of dirty
pages defines the "temperature" growth for the current update operation.

This defines how "temperature" grows. But if the file receives no more
updates, its "temperature" needs to decrease. The starting and ending
timestamps of an update operation serve as the basis for this. Dividing the
duration of the update operation by the number of updated logical blocks
gives the time spent per logical block. Multiplying this time per block by
the total number of logical blocks in the file gives the time it takes for
the "temperature" to decrease by one degree. Finally, dividing the idle time
(between the end of the last update operation and the beginning of a new one)
by the time per degree gives the number of degrees to subtract from the
file's current "temperature".
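
Written out as formulas, the growth and decay rules described above read as
follows. The symbols S, B, D, U, N, T, Delta T and tau are introduced here only
for illustration; the floor divisions and the update of U mirror what
__increase_data_temperature() and __decrease_data_temperature() do in the
patch, but this is a summary, not part of it.

```latex
\begin{aligned}
N &= \left\lceil \frac{S}{B} \right\rceil
  && \text{blocks in a file of } S \text{ bytes, block size } B \text{ (one degree)}\\
T &\leftarrow \max\!\left(T,\ \left\lfloor \frac{D}{N} \right\rfloor,\ \left\lfloor \frac{U}{N} \right\rfloor\right)
  && D = \text{dirty blocks},\ U = \text{recently updated blocks}\\
\tau_{\mathrm{block}} &= \frac{t_{\mathrm{end}} - t_{\mathrm{start}}}{U}
  && \text{time per updated block}\\
\tau_{\mathrm{degree}} &= N \, \tau_{\mathrm{block}}
  && \text{time to cool down by one degree}\\
\Delta T &= \left\lfloor \frac{t_{\mathrm{now}} - t_{\mathrm{end}}}{\tau_{\mathrm{degree}}} \right\rfloor
  && \text{degrees lost while idle}\\
(T', U') &=
  \begin{cases}
    (T - \Delta T,\ U - N\,\Delta T) & \text{if } \Delta T \le T\\
    (0,\ 0) & \text{otherwise}
  \end{cases}
\end{aligned}
```

For example, a 1 MiB file with 4 KiB blocks has N = 256. If U = 1024 blocks
were updated over t_end - t_start = 2 s, the temperature reaches T = 4,
tau_block = 2 s / 1024 = 1.953125 ms and tau_degree = 0.5 s. After 1.2 s of
idle time, Delta T = floor(1.2 / 0.5) = 2, so T' = 2 and U' = 1024 - 2 * 256 = 512.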

[HOW TO USE THE APPROACH]

The lifetime of a file's data "temperature" value consists of these steps:
(1) the iget() method sets up the data "temperature" object;
(2) folio_account_dirtied() accounts the number of dirty memory pages and
    tries to estimate the current temperature of the file;
(3) folio_clear_dirty_for_io() decreases the number of dirty memory pages
    and increases the number of updated pages;
(4) folio_account_dirtied() also decreases the file's "temperature" if no
    updates have happened for some time;
(5) the file system can get the file's temperature and share it as a hint
    with the block layer;
(6) the inode eviction method removes and frees the data "temperature"
    object.

Signed-off-by: Viacheslav Dubeyko
---
 fs/Kconfig                             |   2 +
 fs/Makefile                            |   1 +
 fs/data-temperature/Kconfig            |  11 +
 fs/data-temperature/Makefile           |   3 +
 fs/data-temperature/data_temperature.c | 347 +++++++++++++++++++++++++
 include/linux/data_temperature.h       | 124 +++++++++
 include/linux/fs.h                     |   4 +
 mm/page-writeback.c                    |   9 +
 8 files changed, 501 insertions(+)
 create mode 100644 fs/data-temperature/Kconfig
 create mode 100644 fs/data-temperature/Makefile
 create mode 100644 fs/data-temperature/data_temperature.c
 create mode 100644 include/linux/data_temperature.h

diff --git a/fs/Kconfig b/fs/Kconfig
index 64d420e3c475..ae117c2e3ce2 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -139,6 +139,8 @@ source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
 
+source "fs/data-temperature/Kconfig"
+
 menu "Caches"
 
 source "fs/netfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 15df0a923d3a..c7e6ccac633d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -129,3 +129,4 @@ obj-$(CONFIG_EROFS_FS) += erofs/
 obj-$(CONFIG_VBOXSF_FS) += vboxsf/
 obj-$(CONFIG_ZONEFS_FS) += zonefs/
 obj-$(CONFIG_BPF_LSM) += bpf_fs_kfuncs.o
+obj-$(CONFIG_DATA_TEMPERATURE) += data-temperature/
diff --git a/fs/data-temperature/Kconfig b/fs/data-temperature/Kconfig
new file mode 100644
index 000000000000..1cade2741982
--- /dev/null
+++ b/fs/data-temperature/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config DATA_TEMPERATURE
+	bool "Data temperature approach for efficient data placement"
+	help
+	  Enable data "temperature" estimation for efficient data
+	  placement policy. This approach is file based and
+	  it estimates "temperature" for every file independently.
+	  The goal of the approach is to provide valuable hints
+	  to file system or/and SSD for isolation and proper
+	  management of data with different temperatures.
diff --git a/fs/data-temperature/Makefile b/fs/data-temperature/Makefile
new file mode 100644
index 000000000000..8e089a681360
--- /dev/null
+++ b/fs/data-temperature/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DATA_TEMPERATURE) += data_temperature.o
diff --git a/fs/data-temperature/data_temperature.c b/fs/data-temperature/data_temperature.c
new file mode 100644
index 000000000000..ea43fbfc3976
--- /dev/null
+++ b/fs/data-temperature/data_temperature.c
@@ -0,0 +1,347 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Data "temperature" paradigm implementation
+ *
+ * Copyright (c) 2024-2025 Viacheslav Dubeyko
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#define TIME_IS_UNKNOWN (U64_MAX)
+
+struct kmem_cache *data_temperature_info_cachep;
+
+static inline
+void create_data_temperature_info(struct data_temperature *dt_info)
+{
+	if (!dt_info)
+		return;
+
+	atomic_set(&dt_info->temperature, 0);
+	dt_info->updated_blocks = 0;
+	dt_info->dirty_blocks = 0;
+	dt_info->start_timestamp = TIME_IS_UNKNOWN;
+	dt_info->end_timestamp = TIME_IS_UNKNOWN;
+	dt_info->state = DATA_TEMPERATURE_CREATED;
+}
+
+static inline
+void free_data_temperature_info(struct data_temperature *dt_info)
+{
+	if (!dt_info)
+		return;
+
+	kmem_cache_free(data_temperature_info_cachep, dt_info);
+}
+
+int __set_data_temperature_info(struct inode *inode)
+{
+	struct data_temperature *dt_info;
+
+	dt_info = kmem_cache_zalloc(data_temperature_info_cachep, GFP_KERNEL);
+	if (!dt_info)
+		return -ENOMEM;
+
+	spin_lock_init(&dt_info->change_lock);
+	create_data_temperature_info(dt_info);
+
+	if (cmpxchg_release(&inode->i_data_temperature_info,
+			    NULL, dt_info) != NULL) {
+		free_data_temperature_info(dt_info);
+		get_data_temperature_info(inode);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__set_data_temperature_info);
+
+void __remove_data_temperature_info(struct inode *inode)
+{
+	free_data_temperature_info(inode->i_data_temperature_info);
+	inode->i_data_temperature_info = NULL;
+}
+EXPORT_SYMBOL_GPL(__remove_data_temperature_info);
+
+int __get_data_temperature(const struct inode *inode)
+{
+	struct data_temperature *dt_info;
+
+	if (!S_ISREG(inode->i_mode))
+		return 0;
+
+	dt_info = get_data_temperature_info(inode);
+	if (IS_ERR_OR_NULL(dt_info))
+		return 0;
+
+	return atomic_read(&dt_info->temperature);
+}
+EXPORT_SYMBOL_GPL(__get_data_temperature);
+
+static inline
+bool is_timestamp_invalid(struct data_temperature *dt_info)
+{
+	if (!dt_info)
+		return false;
+
+	if (dt_info->start_timestamp == TIME_IS_UNKNOWN ||
+	    dt_info->end_timestamp == TIME_IS_UNKNOWN)
+		return true;
+
+	if (dt_info->start_timestamp > dt_info->end_timestamp)
+		return true;
+
+	return false;
+}
+
+static inline
+u64 get_current_timestamp(void)
+{
+	return ktime_get_boottime_ns();
+}
+
+static inline
+void start_account_data_temperature_info(struct data_temperature *dt_info)
+{
+	if (!dt_info)
+		return;
+
+	dt_info->dirty_blocks = 1;
+	dt_info->start_timestamp = get_current_timestamp();
+	dt_info->end_timestamp = TIME_IS_UNKNOWN;
+	dt_info->state = DATA_TEMPERATURE_UPDATE_STARTED;
+}
+
+static inline
+void __increase_data_temperature(struct inode *inode,
+				 struct data_temperature *dt_info)
+{
+	u64 bytes_count;
+	u64 file_blocks;
+	u32 block_bytes;
+	int dirty_blocks_ratio;
+	int updated_blocks_ratio;
+	int old_temperature;
+	int calculated;
+
+	if (!inode || !dt_info)
+		return;
+
+	block_bytes = 1 << inode->i_blkbits;
+	bytes_count = i_size_read(inode) + block_bytes - 1;
+	file_blocks = bytes_count >> inode->i_blkbits;
+
+	dt_info->dirty_blocks++;
+
+	if (file_blocks > 0) {
+		old_temperature = atomic_read(&dt_info->temperature);
+
+		dirty_blocks_ratio = div_u64(dt_info->dirty_blocks,
+					     file_blocks);
+		updated_blocks_ratio = div_u64(dt_info->updated_blocks,
+					       file_blocks);
+		calculated = max_t(int, dirty_blocks_ratio,
+				   updated_blocks_ratio);
+
+		if (calculated > 0 && old_temperature < calculated)
+			atomic_set(&dt_info->temperature, calculated);
+	}
+}
+
+static inline
+void __decrease_data_temperature(struct inode *inode,
+				 struct data_temperature *dt_info)
+{
+	u64 timestamp;
+	u64 time_range;
+	u64 time_diff;
+	u64 bytes_count;
+	u64 file_blocks;
+	u32 block_bytes;
+	u64 blks_per_temperature_degree;
+	u64 ns_per_block;
+	u64 temperature_diff;
+
+	if (!inode || !dt_info)
+		return;
+
+	if (is_timestamp_invalid(dt_info)) {
+		create_data_temperature_info(dt_info);
+		return;
+	}
+
+	timestamp = get_current_timestamp();
+
+	if (dt_info->end_timestamp > timestamp) {
+		create_data_temperature_info(dt_info);
+		return;
+	}
+
+	time_range = dt_info->end_timestamp - dt_info->start_timestamp;
+	time_diff = timestamp - dt_info->end_timestamp;
+
+	block_bytes = 1 << inode->i_blkbits;
+	bytes_count = i_size_read(inode) + block_bytes - 1;
+	file_blocks = bytes_count >> inode->i_blkbits;
+
+	blks_per_temperature_degree = file_blocks;
+	if (blks_per_temperature_degree == 0) {
+		start_account_data_temperature_info(dt_info);
+		return;
+	}
+
+	if (dt_info->updated_blocks == 0 || time_range == 0) {
+		start_account_data_temperature_info(dt_info);
+		return;
+	}
+
+	ns_per_block = div_u64(time_range, dt_info->updated_blocks);
+	if (ns_per_block == 0)
+		ns_per_block = 1;
+
+	if (time_diff == 0) {
+		start_account_data_temperature_info(dt_info);
+		return;
+	}
+
+	temperature_diff = div_u64(time_diff, ns_per_block);
+	temperature_diff = div_u64(temperature_diff,
+				   blks_per_temperature_degree);
+
+	if (temperature_diff == 0)
+		return;
+
+	if (temperature_diff <= atomic_read(&dt_info->temperature)) {
+		atomic_sub(temperature_diff, &dt_info->temperature);
+		dt_info->updated_blocks -=
+			temperature_diff * blks_per_temperature_degree;
+	} else {
+		atomic_set(&dt_info->temperature, 0);
+		dt_info->updated_blocks = 0;
+	}
+}
+
+int __increase_data_temperature_by_dirty_folio(struct folio *folio)
+{
+	struct inode *inode;
+	struct data_temperature *dt_info;
+
+	if (!folio || !folio->mapping)
+		return 0;
+
+	inode = folio_inode(folio);
+
+	if (!S_ISREG(inode->i_mode))
+		return 0;
+
+	dt_info = get_data_temperature_info(inode);
+	if (IS_ERR_OR_NULL(dt_info))
+		return 0;
+
+	spin_lock(&dt_info->change_lock);
+	switch (dt_info->state) {
+	case DATA_TEMPERATURE_CREATED:
+		atomic_set(&dt_info->temperature, 0);
+		start_account_data_temperature_info(dt_info);
+		break;
+
+	case DATA_TEMPERATURE_UPDATE_STARTED:
+		__increase_data_temperature(inode, dt_info);
+		break;
+
+	case DATA_TEMPERATURE_UPDATE_FINISHED:
+		__decrease_data_temperature(inode, dt_info);
+		start_account_data_temperature_info(dt_info);
+		break;
+
+	default:
+		/* do nothing */
+		break;
+	}
+	spin_unlock(&dt_info->change_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__increase_data_temperature_by_dirty_folio);
+
+static inline
+void decrement_dirty_blocks(struct data_temperature *dt_info)
+{
+	if (!dt_info)
+		return;
+
+	if (dt_info->dirty_blocks > 0) {
+		dt_info->dirty_blocks--;
+		dt_info->updated_blocks++;
+	}
+}
+
+static inline
+void finish_increasing_data_temperature(struct data_temperature *dt_info)
+{
+	if (!dt_info)
+		return;
+
+	if (dt_info->dirty_blocks == 0) {
+		dt_info->end_timestamp = get_current_timestamp();
+		dt_info->state = DATA_TEMPERATURE_UPDATE_FINISHED;
+	}
+}
+
+int __account_flushed_folio_by_data_temperature(struct folio *folio)
+{
+	struct inode *inode;
+	struct data_temperature *dt_info;
+
+	if (!folio || !folio->mapping)
+		return 0;
+
+	inode = folio_inode(folio);
+
+	if (!S_ISREG(inode->i_mode))
+		return 0;
+
+	dt_info = get_data_temperature_info(inode);
+	if (IS_ERR_OR_NULL(dt_info))
+		return 0;
+
+	spin_lock(&dt_info->change_lock);
+	switch (dt_info->state) {
+	case DATA_TEMPERATURE_CREATED:
+		create_data_temperature_info(dt_info);
+		break;
+
+	case DATA_TEMPERATURE_UPDATE_STARTED:
+		if (dt_info->dirty_blocks > 0)
+			decrement_dirty_blocks(dt_info);
+		if (dt_info->dirty_blocks == 0)
+			finish_increasing_data_temperature(dt_info);
+		break;
+
+	case DATA_TEMPERATURE_UPDATE_FINISHED:
+		/* do nothing */
+		break;
+
+	default:
+		/* do nothing */
+		break;
+	}
+	spin_unlock(&dt_info->change_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__account_flushed_folio_by_data_temperature);
+
+static int __init data_temperature_init(void)
+{
+	data_temperature_info_cachep = KMEM_CACHE(data_temperature,
+						  SLAB_RECLAIM_ACCOUNT);
+	if (!data_temperature_info_cachep)
+		return -ENOMEM;
+
+	return 0;
+}
+late_initcall(data_temperature_init)
diff --git a/include/linux/data_temperature.h b/include/linux/data_temperature.h
new file mode 100644
index 000000000000..40abf6322385
--- /dev/null
+++ b/include/linux/data_temperature.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Data "temperature" paradigm declarations
+ *
+ * Copyright (c) 2024-2025 Viacheslav Dubeyko
+ */
+
+#ifndef _LINUX_DATA_TEMPERATURE_H
+#define _LINUX_DATA_TEMPERATURE_H
+
+/*
+ * struct data_temperature - data temperature definition
+ * @temperature: current temperature of a file
+ * @change_lock: modification lock
+ * @state: current state of data temperature object
+ * @dirty_blocks: current number of dirty blocks in page cache
+ * @updated_blocks: number of updated blocks [start_timestamp, end_timestamp]
+ * @start_timestamp: starting timestamp of update operations
+ * @end_timestamp: finishing timestamp of update operations
+ */
+struct data_temperature {
+	atomic_t temperature;
+
+	spinlock_t change_lock;
+	int state;
+	u64 dirty_blocks;
+	u64 updated_blocks;
+	u64 start_timestamp;
+	u64 end_timestamp;
+};
+
+enum data_temperature_state {
+	DATA_TEMPERATURE_UNKNOWN_STATE,
+	DATA_TEMPERATURE_CREATED,
+	DATA_TEMPERATURE_UPDATE_STARTED,
+	DATA_TEMPERATURE_UPDATE_FINISHED,
+	DATA_TEMPERATURE_STATE_MAX
+};
+
+#ifdef CONFIG_DATA_TEMPERATURE
+
+int __set_data_temperature_info(struct inode *inode);
+void __remove_data_temperature_info(struct inode *inode);
+int __get_data_temperature(const struct inode *inode);
+int __increase_data_temperature_by_dirty_folio(struct folio *folio);
+int __account_flushed_folio_by_data_temperature(struct folio *folio);
+
+static inline
+struct data_temperature *get_data_temperature_info(const struct inode *inode)
+{
+	return smp_load_acquire(&inode->i_data_temperature_info);
+}
+
+static inline
+int set_data_temperature_info(struct inode *inode)
+{
+	return __set_data_temperature_info(inode);
+}
+
+static inline
+void remove_data_temperature_info(struct inode *inode)
+{
+	__remove_data_temperature_info(inode);
+}
+
+static inline
+int get_data_temperature(const struct inode *inode)
+{
+	return __get_data_temperature(inode);
+}
+
+static inline
+int increase_data_temperature_by_dirty_folio(struct folio *folio)
+{
+	return __increase_data_temperature_by_dirty_folio(folio);
+}
+
+static inline
+int account_flushed_folio_by_data_temperature(struct folio *folio)
+{
+	return __account_flushed_folio_by_data_temperature(folio);
+}
+
+#else /* !CONFIG_DATA_TEMPERATURE */
+
+static inline
+int set_data_temperature_info(struct inode *inode)
+{
+	return 0;
+}
+
+static inline
+void remove_data_temperature_info(struct inode *inode)
+{
+	return;
+}
+
+static inline
+struct data_temperature *get_data_temperature_info(const struct inode *inode)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline
+int get_data_temperature(const struct inode *inode)
+{
+	return 0;
+}
+
+static inline
+int increase_data_temperature_by_dirty_folio(struct folio *folio)
+{
+	return 0;
+}
+
+static inline
+int account_flushed_folio_by_data_temperature(struct folio *folio)
+{
+	return 0;
+}
+
+#endif /* CONFIG_DATA_TEMPERATURE */
+
+#endif /* _LINUX_DATA_TEMPERATURE_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a4af70367f8a..57c4810a28a0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -753,6 +753,10 @@ struct inode {
 	struct fsverity_info	*i_verity_info;
 #endif
 
+#ifdef CONFIG_DATA_TEMPERATURE
+	struct data_temperature *i_data_temperature_info;
+#endif
+
 	void			*i_private; /* fs or device private pointer */
 } __randomize_layout;
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d9861e42b2bd..5de458b7fefc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "internal.h"
@@ -2775,6 +2776,10 @@ static void folio_account_dirtied(struct folio *folio,
 		__this_cpu_add(bdp_ratelimits, nr);
 
 		mem_cgroup_track_foreign_dirty(folio, wb);
+
+#ifdef CONFIG_DATA_TEMPERATURE
+		increase_data_temperature_by_dirty_folio(folio);
+#endif /* CONFIG_DATA_TEMPERATURE */
 	}
 }
 
@@ -3006,6 +3011,10 @@ bool folio_clear_dirty_for_io(struct folio *folio)
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 
+#ifdef CONFIG_DATA_TEMPERATURE
+	account_flushed_folio_by_data_temperature(folio);
+#endif /* CONFIG_DATA_TEMPERATURE */
+
 	if (mapping && mapping_can_writeback(mapping)) {
 		struct inode *inode = mapping->host;
 		struct bdi_writeback *wb;
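
The per-file state machine in this patch can be exercised outside the kernel
with a small user-space model. The sketch below is hypothetical: struct
dt_model, dt_on_dirty(), dt_on_flush() and dt_rewrite_file() are illustrative
names that are not part of the patch, and timestamps are passed in explicitly
instead of coming from ktime_get_boottime_ns(). It roughly mirrors the switch
statements in __increase_data_temperature_by_dirty_folio() and
__account_flushed_folio_by_data_temperature(), but it assumes a non-empty file
and valid timestamps, omits the locking, and resets to zero instead of
subtracting whenever updated_blocks is smaller than the amount to remove
(a defensive choice of this model only).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the patch's per-file state machine.
 * All names are illustrative; timestamps (ns) are supplied by the caller. */
enum dt_state { DT_CREATED, DT_UPDATE_STARTED, DT_UPDATE_FINISHED };

struct dt_model {
	int temperature;          /* degrees; one degree = N updated blocks */
	enum dt_state state;
	uint64_t dirty_blocks;    /* dirtied in the current update burst */
	uint64_t updated_blocks;  /* flushed in the near past */
	uint64_t start_ns, end_ns;
};

/* Raise (never lower) the temperature to max(D/N, U/N) whole degrees. */
static void dt_increase(struct dt_model *dt, uint64_t nblocks)
{
	uint64_t d, u;

	dt->dirty_blocks++;
	d = dt->dirty_blocks / nblocks;
	u = dt->updated_blocks / nblocks;
	if (u > d)
		d = u;
	if (d > (uint64_t)dt->temperature)
		dt->temperature = (int)d;
}

/* Lose one degree per N * (burst duration / U) ns of idle time. */
static void dt_decrease(struct dt_model *dt, uint64_t nblocks, uint64_t now)
{
	uint64_t range = dt->end_ns - dt->start_ns;
	uint64_t idle = now - dt->end_ns;
	uint64_t ns_per_block, diff;

	if (dt->updated_blocks == 0 || range == 0 || idle == 0)
		return;
	ns_per_block = range / dt->updated_blocks;
	if (ns_per_block == 0)
		ns_per_block = 1;
	diff = idle / ns_per_block / nblocks;
	if (diff == 0)
		return;
	/* Model-only guard: reset rather than underflow updated_blocks. */
	if (diff <= (uint64_t)dt->temperature && diff * nblocks <= dt->updated_blocks) {
		dt->temperature -= (int)diff;
		dt->updated_blocks -= diff * nblocks;
	} else {
		dt->temperature = 0;
		dt->updated_blocks = 0;
	}
}

/* A clean folio becomes dirty; nblocks is the file size in blocks (N). */
static void dt_on_dirty(struct dt_model *dt, uint64_t nblocks, uint64_t now)
{
	assert(nblocks > 0);
	if (dt->state == DT_UPDATE_STARTED) {
		dt_increase(dt, nblocks);
		return;
	}
	if (dt->state == DT_CREATED)
		dt->temperature = 0;
	else
		dt_decrease(dt, nblocks, now);
	/* Begin a new update burst (cf. start_account_data_temperature_info()). */
	dt->dirty_blocks = 1;
	dt->start_ns = now;
	dt->state = DT_UPDATE_STARTED;
}

/* A dirty folio is written back; the burst ends when no dirty blocks remain. */
static void dt_on_flush(struct dt_model *dt, uint64_t now)
{
	if (dt->state != DT_UPDATE_STARTED)
		return;
	if (dt->dirty_blocks > 0) {
		dt->dirty_blocks--;
		dt->updated_blocks++;
	}
	if (dt->dirty_blocks == 0) {
		dt->end_ns = now;
		dt->state = DT_UPDATE_FINISHED;
	}
}

/* Usage helper: dirty all N blocks at t_dirty, then flush them all at t_flush. */
static void dt_rewrite_file(struct dt_model *dt, uint64_t nblocks,
			    uint64_t t_dirty, uint64_t t_flush)
{
	for (uint64_t i = 0; i < nblocks; i++)
		dt_on_dirty(dt, nblocks, t_dirty);
	for (uint64_t i = 0; i < nblocks; i++)
		dt_on_flush(dt, t_flush);
}
```

For example, starting from a zero-initialized struct dt_model and a 4-block
file, the calls dt_rewrite_file(&dt, 4, 0, 4000), dt_rewrite_file(&dt, 4,
4000, 8000) and dt_rewrite_file(&dt, 4, 8000, 12000) leave the model at
2 degrees with 12 updated blocks, because the updated-block ratio is only
sampled when a folio is dirtied. A later dt_on_dirty(&dt, 4, 13500), after
1500 ns of idle time, removes one degree and leaves 8 updated blocks.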