From: Daniel Colascione
To: linux-kernel@vger.kernel.org, timmurray@google.com, joelaf@google.com,
 viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Cc: Daniel Colascione
Subject: [PATCH RFC v2] Add /proc/pid/smaps_rollup
Date: Wed, 9 Aug 2017 17:15:57 -0700
Message-Id: <20170810001557.147285-1-dancol@google.com>
In-Reply-To: <20170808132554.141143-1-dancol@google.com>
References: <20170808132554.141143-1-dancol@google.com>
X-Patchwork-Id: 9892453

/proc/pid/smaps_rollup is a new proc file that improves the performance
of user programs that determine aggregate
memory statistics (e.g., total PSS) of a process.

Android regularly "samples" the memory usage of various processes in
order to balance its memory pool sizes. This sampling process involves
opening /proc/pid/smaps and summing certain fields. For very large
processes, sampling memory use this way can take several hundred
milliseconds, due mostly to the overhead of the seq_printf calls in
task_mmu.c.

smaps_rollup improves the situation. It contains most of the fields of
/proc/pid/smaps, but instead of a set of fields for each VMA,
smaps_rollup contains one synthetic smaps-format entry representing
the whole process. In the single smaps_rollup synthetic entry, each
field is the summation of the corresponding field in all of the
real-smaps VMAs. Using a common format for smaps_rollup and smaps
allows userspace to reuse parsers written for non-rollup smaps with
smaps_rollup, and it allows userspace to switch between smaps_rollup
and smaps at runtime (say, based on the availability of smaps_rollup
in a given kernel) with minimal fuss.

By using smaps_rollup instead of smaps, a caller can avoid the
significant overhead of formatting, reading, and parsing each of a
large process's potentially very numerous memory mappings. For
sampling system_server's PSS in Android, we measured a 12x speedup,
representing a savings of several hundred milliseconds.

One alternative to a new per-process proc file would have been
including PSS information in /proc/pid/status. We considered this
option but thought that PSS would be too expensive (by a few orders of
magnitude) to collect relative to what's already emitted as part of
/proc/pid/status, and slowing every user of /proc/pid/status for the
sake of readers that happen to want PSS feels wrong.

The code itself works by reusing the existing VMA-walking framework we
use for regular smaps generation and keeping the mem_size_stats
structure around between VMA walks instead of using a fresh one for
each VMA.
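
To illustrate the idea outside the kernel, here is a minimal userspace
sketch. It is not the code in this patch: stats_sketch, vma_sketch,
walk_one and show_all are hypothetical stand-ins for mem_size_stats, a
VMA, the page walk and the seq_file iteration, and only two counters
are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mem_size_stats. */
struct stats_sketch {
	unsigned long resident;
	unsigned long pss;
};

/* Hypothetical per-VMA input; the kernel derives these from a page walk. */
struct vma_sketch {
	unsigned long resident;
	unsigned long pss;
};

/* Add one VMA's counters into *acc, much as the page walk updates mss. */
void walk_one(const struct vma_sketch *vma, struct stats_sketch *acc)
{
	acc->resident += vma->resident;
	acc->pss += vma->pss;
}

/*
 * Visit every VMA and write the emitted records to out[].
 * Per-VMA mode: a fresh, zeroed accumulator per VMA, one record each.
 * Rollup mode: one accumulator survives across VMAs and a single
 * record is emitted when the last VMA is reached.
 * Returns the number of records written.
 */
size_t show_all(const struct vma_sketch *vmas, size_t n, bool rollup,
		struct stats_sketch *out)
{
	struct stats_sketch persistent = {0};
	size_t emitted = 0;

	for (size_t i = 0; i < n; i++) {
		struct stats_sketch fresh = {0};
		struct stats_sketch *acc = rollup ? &persistent : &fresh;
		bool last = (i + 1 == n);

		walk_one(&vmas[i], acc);
		if (!rollup || last)
			out[emitted++] = *acc;
	}
	return emitted;
}
```

The only difference between the two modes in this sketch is which
accumulator the walk writes into and when a record is emitted.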
In this way, summation happens automatically. We let seq_file walk over
the VMAs just as it does for regular smaps and just emit nothing to the
seq_file until we hit the last VMA.

Patch changelog:

v2: Fix typo in commit message
    Add ABI documentation as requested by gregkh

Signed-off-by: Daniel Colascione
---
 Documentation/ABI/testing/procfs-smaps_rollup |  34 +++++
 fs/proc/base.c                                |   2 +
 fs/proc/internal.h                            |   3 +
 fs/proc/task_mmu.c                            | 196 ++++++++++++++++++--------
 4 files changed, 173 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/ABI/testing/procfs-smaps_rollup

diff --git a/Documentation/ABI/testing/procfs-smaps_rollup b/Documentation/ABI/testing/procfs-smaps_rollup
new file mode 100644
index 000000000000..fd5a3699edf1
--- /dev/null
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -0,0 +1,34 @@
+What:		/proc/pid/smaps_rollup
+Date:		August 2017
+Contact:	Daniel Colascione
+Description:
+		This file provides pre-summed memory information for a
+		process. The format is identical to /proc/pid/smaps,
+		except instead of an entry for each VMA in a process,
+		smaps_rollup has a single entry (tagged "[rollup]")
+		for which each field is the sum of the corresponding
+		fields from all the maps in /proc/pid/smaps.
+		For more details, see the procfs man page.
+
+		Typical output looks like this:
+
+		00100000-ff709000 ---p 00000000 00:00 0		 [rollup]
+		Rss:                 884 kB
+		Pss:                 385 kB
+		Shared_Clean:        696 kB
+		Shared_Dirty:          0 kB
+		Private_Clean:       120 kB
+		Private_Dirty:        68 kB
+		Referenced:          884 kB
+		Anonymous:            68 kB
+		LazyFree:              0 kB
+		AnonHugePages:         0 kB
+		ShmemPmdMapped:        0 kB
+		Shared_Hugetlb:        0 kB
+		Private_Hugetlb:       0 kB
+		Swap:                  0 kB
+		SwapPss:               0 kB
+		Locked:              385 kB
+
+
+
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 719c2e943ea1..a9587b9cace5 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2930,6 +2930,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
 	REG("smaps", S_IRUGO, proc_pid_smaps_operations),
+	REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
 	REG("pagemap", S_IRUSR, proc_pagemap_operations),
 #endif
 #ifdef CONFIG_SECURITY
@@ -3323,6 +3324,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
 	REG("smaps", S_IRUGO, proc_tid_smaps_operations),
+	REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
 	REG("pagemap", S_IRUSR, proc_pagemap_operations),
 #endif
 #ifdef CONFIG_SECURITY
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index aa2b89071630..2cbfcd32e884 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -269,10 +269,12 @@ extern int proc_remount(struct super_block *, int *, char *);
 /*
  * task_[no]mmu.c
  */
+struct mem_size_stats;
 struct proc_maps_private {
 	struct inode *inode;
 	struct task_struct *task;
 	struct mm_struct *mm;
+	struct mem_size_stats *rollup;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
@@ -288,6 +290,7 @@ extern const struct file_operations proc_tid_maps_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
 extern const struct file_operations proc_tid_numa_maps_operations;
 extern const struct file_operations proc_pid_smaps_operations;
+extern const struct file_operations proc_pid_smaps_rollup_operations;
 extern const struct file_operations proc_tid_smaps_operations;
 extern const struct file_operations proc_clear_refs_operations;
 extern const struct file_operations proc_pagemap_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b836fd61ed87..02b55df7291c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -252,6 +252,7 @@ static int proc_map_release(struct inode *inode, struct file *file)
 	if (priv->mm)
 		mmdrop(priv->mm);
 
+	kfree(priv->rollup);
 	return seq_release_private(inode, file);
 }
 
@@ -278,6 +279,23 @@ static int is_stack(struct proc_maps_private *priv,
 		vma->vm_end >= vma->vm_mm->start_stack;
 }
 
+static void show_vma_header_prefix(struct seq_file *m,
+				   unsigned long start, unsigned long end,
+				   vm_flags_t flags, unsigned long long pgoff,
+				   dev_t dev, unsigned long ino)
+{
+	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
+	seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
+		   start,
+		   end,
+		   flags & VM_READ ? 'r' : '-',
+		   flags & VM_WRITE ? 'w' : '-',
+		   flags & VM_EXEC ? 'x' : '-',
+		   flags & VM_MAYSHARE ? 's' : 'p',
+		   pgoff,
+		   MAJOR(dev), MINOR(dev), ino);
+}
+
 static void
 show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 {
@@ -300,17 +318,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 
 	start = vma->vm_start;
 	end = vma->vm_end;
-
-	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
-	seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
-			start,
-			end,
-			flags & VM_READ ? 'r' : '-',
-			flags & VM_WRITE ? 'w' : '-',
-			flags & VM_EXEC ? 'x' : '-',
-			flags & VM_MAYSHARE ? 's' : 'p',
-			pgoff,
-			MAJOR(dev), MINOR(dev), ino);
+	show_vma_header_prefix(m, start, end, flags, pgoff, dev, ino);
 
 	/*
 	 * Print the dentry name for named mappings, and a
@@ -429,6 +437,7 @@ const struct file_operations proc_tid_maps_operations = {
 
 #ifdef CONFIG_PROC_PAGE_MONITOR
 struct mem_size_stats {
+	bool first;
 	unsigned long resident;
 	unsigned long shared_clean;
 	unsigned long shared_dirty;
@@ -442,7 +451,9 @@ struct mem_size_stats {
 	unsigned long swap;
 	unsigned long shared_hugetlb;
 	unsigned long private_hugetlb;
+	unsigned long first_vma_start;
 	u64 pss;
+	u64 pss_locked;
 	u64 swap_pss;
 	bool check_shmem_swap;
 };
@@ -718,18 +729,36 @@ void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
 
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
+	struct proc_maps_private *priv = m->private;
 	struct vm_area_struct *vma = v;
-	struct mem_size_stats mss;
+	struct mem_size_stats mss_stack;
+	struct mem_size_stats *mss;
 	struct mm_walk smaps_walk = {
 		.pmd_entry = smaps_pte_range,
 #ifdef CONFIG_HUGETLB_PAGE
 		.hugetlb_entry = smaps_hugetlb_range,
 #endif
 		.mm = vma->vm_mm,
-		.private = &mss,
 	};
+	int ret = 0;
+	bool rollup_mode;
+	bool last_vma;
+
+	if (priv->rollup) {
+		rollup_mode = true;
+		mss = priv->rollup;
+		if (mss->first) {
+			mss->first_vma_start = vma->vm_start;
+			mss->first = false;
+		}
+		last_vma = !m_next_vma(priv, vma);
+	} else {
+		rollup_mode = false;
+		memset(&mss_stack, 0, sizeof(mss_stack));
+		mss = &mss_stack;
+	}
 
-	memset(&mss, 0, sizeof mss);
+	smaps_walk.private = mss;
 
 #ifdef CONFIG_SHMEM
 	if (vma->vm_file && shmem_mapping(vma->vm_file->f_mapping)) {
@@ -747,9 +776,9 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 
 		if (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
 					!(vma->vm_flags & VM_WRITE)) {
-			mss.swap = shmem_swapped;
+			mss->swap = shmem_swapped;
 		} else {
-			mss.check_shmem_swap = true;
+			mss->check_shmem_swap = true;
 			smaps_walk.pte_hole = smaps_pte_hole;
 		}
 	}
@@ -757,54 +786,71 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 
 	/* mmap_sem is held in m_start */
 	walk_page_vma(vma, &smaps_walk);
+	if (vma->vm_flags & VM_LOCKED)
+		mss->pss_locked += mss->pss;
+
+	if (!rollup_mode) {
+		show_map_vma(m, vma, is_pid);
+	} else if (last_vma) {
+		show_vma_header_prefix(
+			m, mss->first_vma_start, vma->vm_end, 0, 0, 0, 0);
+		seq_pad(m, ' ');
+		seq_puts(m, "[rollup]\n");
+	} else {
+		ret = SEQ_SKIP;
+	}
 
-	show_map_vma(m, vma, is_pid);
-
-	seq_printf(m,
-		   "Size:           %8lu kB\n"
-		   "Rss:            %8lu kB\n"
-		   "Pss:            %8lu kB\n"
-		   "Shared_Clean:   %8lu kB\n"
-		   "Shared_Dirty:   %8lu kB\n"
-		   "Private_Clean:  %8lu kB\n"
-		   "Private_Dirty:  %8lu kB\n"
-		   "Referenced:     %8lu kB\n"
-		   "Anonymous:      %8lu kB\n"
-		   "LazyFree:       %8lu kB\n"
-		   "AnonHugePages:  %8lu kB\n"
-		   "ShmemPmdMapped: %8lu kB\n"
-		   "Shared_Hugetlb: %8lu kB\n"
-		   "Private_Hugetlb: %7lu kB\n"
-		   "Swap:           %8lu kB\n"
-		   "SwapPss:        %8lu kB\n"
-		   "KernelPageSize: %8lu kB\n"
-		   "MMUPageSize:    %8lu kB\n"
-		   "Locked:         %8lu kB\n",
-		   (vma->vm_end - vma->vm_start) >> 10,
-		   mss.resident >> 10,
-		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
-		   mss.shared_clean >> 10,
-		   mss.shared_dirty >> 10,
-		   mss.private_clean >> 10,
-		   mss.private_dirty >> 10,
-		   mss.referenced >> 10,
-		   mss.anonymous >> 10,
-		   mss.lazyfree >> 10,
-		   mss.anonymous_thp >> 10,
-		   mss.shmem_thp >> 10,
-		   mss.shared_hugetlb >> 10,
-		   mss.private_hugetlb >> 10,
-		   mss.swap >> 10,
-		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
-		   vma_kernel_pagesize(vma) >> 10,
-		   vma_mmu_pagesize(vma) >> 10,
-		   (vma->vm_flags & VM_LOCKED) ?
-			(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
-
-	arch_show_smap(m, vma);
-	show_smap_vma_flags(m, vma);
+	if (!rollup_mode)
+		seq_printf(m,
+			   "Size:           %8lu kB\n"
+			   "KernelPageSize: %8lu kB\n"
+			   "MMUPageSize:    %8lu kB\n",
+			   (vma->vm_end - vma->vm_start) >> 10,
+			   vma_kernel_pagesize(vma) >> 10,
+			   vma_mmu_pagesize(vma) >> 10);
+
+
+	if (!rollup_mode || last_vma)
+		seq_printf(m,
+			   "Rss:            %8lu kB\n"
+			   "Pss:            %8lu kB\n"
+			   "Shared_Clean:   %8lu kB\n"
+			   "Shared_Dirty:   %8lu kB\n"
+			   "Private_Clean:  %8lu kB\n"
+			   "Private_Dirty:  %8lu kB\n"
+			   "Referenced:     %8lu kB\n"
+			   "Anonymous:      %8lu kB\n"
+			   "LazyFree:       %8lu kB\n"
+			   "AnonHugePages:  %8lu kB\n"
+			   "ShmemPmdMapped: %8lu kB\n"
+			   "Shared_Hugetlb: %8lu kB\n"
+			   "Private_Hugetlb: %7lu kB\n"
+			   "Swap:           %8lu kB\n"
+			   "SwapPss:        %8lu kB\n"
+			   "Locked:         %8lu kB\n",
+			   mss->resident >> 10,
+			   (unsigned long)(mss->pss >> (10 + PSS_SHIFT)),
+			   mss->shared_clean >> 10,
+			   mss->shared_dirty >> 10,
+			   mss->private_clean >> 10,
+			   mss->private_dirty >> 10,
+			   mss->referenced >> 10,
+			   mss->anonymous >> 10,
+			   mss->lazyfree >> 10,
+			   mss->anonymous_thp >> 10,
+			   mss->shmem_thp >> 10,
+			   mss->shared_hugetlb >> 10,
+			   mss->private_hugetlb >> 10,
+			   mss->swap >> 10,
+			   (unsigned long)(mss->swap_pss >> (10 + PSS_SHIFT)),
+			   (unsigned long)(mss->pss_locked >> (10 + PSS_SHIFT)));
+
+	if (!rollup_mode) {
+		arch_show_smap(m, vma);
+		show_smap_vma_flags(m, vma);
+	}
 	m_cache_vma(m, vma);
-	return 0;
+	return ret;
 }
 
 static int show_pid_smap(struct seq_file *m, void *v)
@@ -836,6 +882,25 @@ static int pid_smaps_open(struct inode *inode, struct file *file)
 	return do_maps_open(inode, file, &proc_pid_smaps_op);
 }
 
+static int pid_smaps_rollup_open(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq;
+	struct proc_maps_private *priv;
+	int ret = do_maps_open(inode, file, &proc_pid_smaps_op);
+
+	if (ret < 0)
+		return ret;
+	seq = file->private_data;
+	priv = seq->private;
+	priv->rollup = kzalloc(sizeof(*priv->rollup), GFP_KERNEL);
+	if (!priv->rollup) {
+		proc_map_release(inode, file);
+		return -ENOMEM;
+	}
+	priv->rollup->first = true;
+	return 0;
+}
+
 static int tid_smaps_open(struct inode *inode, struct file *file)
 {
 	return do_maps_open(inode, file, &proc_tid_smaps_op);
@@ -848,6 +913,13 @@ const struct file_operations proc_pid_smaps_operations = {
 	.release	= proc_map_release,
 };
 
+const struct file_operations proc_pid_smaps_rollup_operations = {
+	.open		= pid_smaps_rollup_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_map_release,
+};
+
 const struct file_operations proc_tid_smaps_operations = {
 	.open		= tid_smaps_open,
 	.read		= seq_read,
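
Not part of the patch: the commit message says userspace can switch
between smaps_rollup and smaps at runtime because the two share a
format. A minimal sketch of such a reader might look like the
following. The helper names sum_field_kb and process_pss_kb are
hypothetical, and error handling is kept to the bare minimum.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field>: <n> kB" line in an smaps-format stream.
 * The same loop handles smaps (one entry per VMA) and smaps_rollup
 * (a single pre-summed entry). Hypothetical helper, not an existing API.
 */
unsigned long sum_field_kb(FILE *f, const char *field)
{
	char line[256];
	size_t len = strlen(field);
	unsigned long total = 0, kb;

	while (fgets(line, sizeof(line), f)) {
		/* Require "field:" at line start so "SwapPss:" != "Pss:". */
		if (strncmp(line, field, len) == 0 && line[len] == ':' &&
		    sscanf(line + len + 1, "%lu", &kb) == 1)
			total += kb;
	}
	return total;
}

/*
 * Return the total PSS of a process in kB, or -1 if neither file can
 * be opened. Tries smaps_rollup first and falls back to smaps on
 * kernels that do not provide it.
 */
long process_pss_kb(int pid)
{
	static const char *const names[] = { "smaps_rollup", "smaps" };
	char path[64];

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		FILE *f;
		long total;

		snprintf(path, sizeof(path), "/proc/%d/%s", pid, names[i]);
		f = fopen(path, "r");
		if (!f)
			continue;
		total = (long)sum_field_kb(f, "Pss");
		fclose(f);
		return total;
	}
	return -1;
}
```

Because the parser only looks at "Name: value kB" lines, it does not
need to know which of the two files it was handed.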