From patchwork Sun Sep 11 08:34:16 2022
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 12972783
Date: Sun, 11 Sep 2022 01:34:16 -0700
Message-ID: <20220911083418.2818369-1-yuanchu@google.com>
Subject: [RFC PATCH 0/2] mm: multi-gen LRU: per-process heatmaps
From: Yuanchu Xie
To: linux-mm@kvack.org, Yu Zhao
Cc: Michael Larabel, Jon Corbet, Andrew Morton, Yuanchu Xie,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org

Today, the MGLRU debugfs interface (/sys/kernel/debug/lru_gen) provides
a histogram of the number of pages belonging to each generation, which
gives some data on memory
coldness, but it does not tell us where that memory actually is.
However, since MGLRU revamps the page reclaim mechanism to walk page
tables, we can hook into MGLRU page table access bit harvesting with a
BPF program to collect information on relative hotness and coldness,
NUMA nodes, whether a page is anon/file, etc.

Using BPF programs to collect and aggregate page access information
allows the userspace agent to customize what to collect and how to
aggregate it. It could focus on a particular region of interest and
compute a moving average of access frequency, or find allocations that
are never accessed and could be eliminated altogether. Currently MGLRU
relies on heuristics to decide which generation a page is assigned to;
for example, pages accessed through page tables are always assigned to
the youngest generation. Exposing page access data can allow future
work to customize page generation assignments (with more BPF).

We demonstrate feasibility with a proof-of-concept that prints a live
heatmap of a process, with configurable MGLRU aging intervals and
aggregation intervals. This is a very rough PoC that still needs a lot
of work, but it shows that a lot can be done by exposing page access
information from MGLRU. I will be presenting this work at the coming
LPC.

As an example,
I ran the memtier benchmark[1] and captured a heatmap of memcached
being populated and then running the benchmark (similar to the one Yu
posted for OpenWRT[2]):

$ cat ./run_memtier_benchmark.sh
run_memtier_benchmark()
{
	# populate dataset
	memtier_benchmark/memtier_benchmark -s 127.0.0.1 -p 11211 \
		-P memcache_binary -n allkeys -t 1 -c 1 --ratio 1:0 --pipeline 8 \
		--key-minimum=1 --key-maximum=$2 --key-pattern=P:P \
		-d 1000

	# access dataset using Gaussian pattern
	memtier_benchmark/memtier_benchmark -s 127.0.0.1 -p 11211 \
		-P memcache_binary --test-time $1 -t 1 -c 1 --ratio 0:1 \
		--pipeline 8 --key-minimum=1 --key-maximum=$2 \
		--key-pattern=G:G --randomize --distinct-client-seed

	# collect results
}

run_duration_secs=3600
max_key=8000000
run_memtier_benchmark $run_duration_secs $max_key

In the following screenshot we can see the process of populating the
dataset and accessing the dataset:
https://services.google.com/fh/files/events/memcached_memtier_startup.png

Patch 1 adds the infrastructure to enable BPF programs to monitor page
access bit harvesting.
Patch 2 includes a proof-of-concept Python TUI program displaying
online per-process heatmaps.

[1] https://github.com/RedisLabs/memtier_benchmark
[2] https://lore.kernel.org/all/20220831041731.3836322-1-yuzhao@google.com/

Yuanchu Xie (2):
  mm: multi-gen LRU: support page access info harvesting with eBPF
  mm: add a BPF-based per-process heatmap tool

 include/linux/mmzone.h          |   1 +
 mm/vmscan.c                     | 154 ++++++++
 tools/vm/heatmap/Makefile       |  30 ++
 tools/vm/heatmap/heatmap.bpf.c  | 123 +++++++
 tools/vm/heatmap/heatmap.user.c | 188 ++++++++++
 tools/vm/heatmap/heatmap_tui.py | 600 ++++++++++++++++++++++++++++++++
 6 files changed, 1096 insertions(+)
 create mode 100644 tools/vm/heatmap/Makefile
 create mode 100644 tools/vm/heatmap/heatmap.bpf.c
 create mode 100644 tools/vm/heatmap/heatmap.user.c
 create mode 100755 tools/vm/heatmap/heatmap_tui.py
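
For illustration only, here is a rough sketch of the kind of userspace
aggregation the cover letter mentions: a moving average of access
frequency over a region of interest, plus a list of buckets that were
never accessed. It is not taken from the patches. The class name
RegionHeat, its parameters, the exponential weighting, and the input
format (a plain list of accessed page addresses per aging interval) are
all assumptions; the actual heatmap_tui.py may aggregate quite
differently.

```python
from collections import defaultdict


class RegionHeat:
    """Hypothetical per-region aggregator (not from the patch series)."""

    def __init__(self, start, end, bucket_bytes, alpha=0.5):
        self.start = start
        self.end = end
        self.bucket_bytes = bucket_bytes
        self.alpha = alpha  # weight given to the newest interval
        nbuckets = (end - start + bucket_bytes - 1) // bucket_bytes
        self.ema = [0.0] * nbuckets     # moving-average accesses per bucket
        self.seen = [False] * nbuckets  # was the bucket ever accessed?

    def update(self, accessed_addrs):
        """Fold in the page addresses found accessed in one aging interval."""
        counts = defaultdict(int)
        for addr in accessed_addrs:
            # Addresses outside the region of interest are ignored.
            if self.start <= addr < self.end:
                counts[(addr - self.start) // self.bucket_bytes] += 1
        for b in range(len(self.ema)):
            n = counts.get(b, 0)
            self.ema[b] = self.alpha * n + (1.0 - self.alpha) * self.ema[b]
            if n:
                self.seen[b] = True

    def never_accessed(self):
        """Return indices of buckets with no recorded access so far."""
        return [b for b, hit in enumerate(self.seen) if not hit]


# Assumed example: a 64 KiB region split into four 16 KiB buckets.
heat = RegionHeat(start=0, end=4 * 16384, bucket_bytes=16384)
heat.update([0, 4096, 16384, 10**9])  # first aging interval
heat.update([0])                      # second aging interval
print(heat.ema)
print(heat.never_accessed())
```

A smaller alpha smooths the heatmap over more aging intervals, while a
larger one reacts faster to changes in the access pattern. Buckets
returned by never_accessed() correspond to the "allocations that are
never accessed" case, at bucket rather than allocation granularity.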