From patchwork Mon Oct 12 11:49:40 2020
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 11832367
From: Jann Horn
To: mtk.manpages@gmail.com
Cc: linux-man@vger.kernel.org, linux-mm@kvack.org, Mark Mossberg
Subject: [PATCH] proc.5: Document inaccurate RSS due to SPLIT_RSS_COUNTING
Date: Mon, 12 Oct 2020 13:49:40 +0200
Message-Id: <20201012114940.1317510-1-jannh@google.com>

Since 34e55232e59f7b19050267a05ff1226e5cd122a5 (introduced back in
v2.6.34), Linux uses per-thread RSS counters to reduce cache contention on
the per-mm counters.
With a 4K page size, that means that you can end up with the counters off
by up to 252KiB per thread.

Example:

$ cat rsstest.c
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Print VmRSS from status next to Rss from smaps_rollup for one process. */
void dump(int pid)
{
  char cmd[1000];
  sprintf(cmd,
    "grep '^VmRSS' /proc/%d/status;"
    "grep '^Rss:' /proc/%d/smaps_rollup;"
    "echo",
    pid, pid
  );
  system(cmd);
}

int main(void)
{
  eventfd_t dummy;
  /* Two semaphores: the child signals child_wait, the parent signals child_resume. */
  int child_wait = eventfd(0, EFD_SEMAPHORE|EFD_CLOEXEC);
  int child_resume = eventfd(0, EFD_SEMAPHORE|EFD_CLOEXEC);
  if (child_wait == -1 || child_resume == -1)
    err(1, "eventfd");

  pid_t child = fork();
  if (child == -1)
    err(1, "fork");
  if (child == 0) {
    /* Make sure the child dies with the parent. */
    if (prctl(PR_SET_PDEATHSIG, SIGKILL))
      err(1, "PDEATHSIG");
    if (getppid() == 1)
      exit(0);
    char *mapping = mmap(NULL, 80 * 0x1000, PROT_READ|PROT_WRITE,
                         MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    /* Fault in the first 40 pages. */
    for (int i=0; i<40; i++)
      mapping[0x1000 * i] = 1;
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    /* Fault in the remaining 40 pages. */
    for (int i=40; i<80; i++)
      mapping[0x1000 * i] = 1;
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    exit(0);
  }

  /* Dump the child's counters after each of its three steps. */
  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  exit(0);
}
$ gcc -o rsstest rsstest.c && ./rsstest
VmRSS: 68 kB
Rss: 616 kB

VmRSS: 68 kB
Rss: 776 kB

VmRSS: 812 kB
Rss: 936 kB

$

Let's document that those counters aren't entirely accurate.

Reported-by: Mark Mossberg
Signed-off-by: Jann Horn
---
 man5/proc.5 | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

base-commit: 92e4056a29156598d057045ad25f59d44fcd1bb5

diff --git a/man5/proc.5 b/man5/proc.5
index ed309380b53b..13208811efb0 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -2265,6 +2265,9 @@ This is just the pages which
 count toward text, data, or stack space.
 This does not include pages
 which have not been demand-loaded in, or which are swapped out.
+This value is inaccurate; see
+.I /proc/[pid]/statm
+below.
 .TP
 (25) \fIrsslim\fP \ %lu
 Current soft limit in bytes on the rss of the process;
@@ -2409,9 +2412,9 @@ The columns are:
 size       (1) total program size
            (same as VmSize in \fI/proc/[pid]/status\fP)
 resident   (2) resident set size
-           (same as VmRSS in \fI/proc/[pid]/status\fP)
+           (inaccurate; same as VmRSS in \fI/proc/[pid]/status\fP)
 shared     (3) number of resident shared pages (i.e., backed by a file)
-           (same as RssFile+RssShmem in \fI/proc/[pid]/status\fP)
+           (inaccurate; same as RssFile+RssShmem in \fI/proc/[pid]/status\fP)
 text       (4) text (code)
 .\" (not including libs; broken, includes data segment)
 lib        (5) library (unused since Linux 2.6; always 0)
@@ -2420,6 +2423,16 @@ data       (6) data + stack
 dt         (7) dirty pages (unused since Linux 2.6; always 0)
 .EE
 .in
+.IP
+.\" See SPLIT_RSS_COUNTING in the kernel.
+.\" Inaccuracy is bounded by TASK_RSS_EVENTS_THRESH.
+Some of these values are somewhat inaccurate (up to 63 pages per thread) because
+of a kernel-internal scalability optimization.
+If accurate values are required, use
+.I /proc/[pid]/smaps
+or
+.I /proc/[pid]/smaps_rollup
+instead, which are much slower but provide accurate, detailed information.
 .TP
 .I /proc/[pid]/status
 Provides much of the information in
@@ -2596,6 +2609,9 @@ directly access physical memory.
 .IP *
 .IR VmHWM :
 Peak resident set size ("high water mark").
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .IP *
 .IR VmRSS :
 Resident set size.
@@ -2604,16 +2620,25 @@ Note that the value here is the sum of
 .IR RssFile ,
 and
 .IR RssShmem .
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .IP *
 .IR RssAnon :
 Size of resident anonymous memory.
 .\" commit bf9683d6990589390b5178dafe8fd06808869293
 (since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .IP *
 .IR RssFile :
 Size of resident file mappings.
 .\" commit bf9683d6990589390b5178dafe8fd06808869293
 (since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .IP *
 .IR RssShmem :
 Size of resident shared memory (includes System V shared memory,
@@ -2622,6 +2647,9 @@ mappings from
 and shared anonymous mappings).
 .\" commit bf9683d6990589390b5178dafe8fd06808869293
 (since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .IP *
 .IR VmData ", " VmStk ", " VmExe :
 Size of data, stack, and text segments.
@@ -2640,6 +2668,9 @@ Size of second-level page tables (added in Linux 4.0; removed in Linux 4.15).
 .\" commit b084d4353ff99d824d3bc5a5c2c22c70b1fba722
 Swapped-out virtual memory size by anonymous private pages;
 shmem swap usage is not included (since Linux 2.6.34).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .IP *
 .IR HugetlbPages :
 Size of hugetlb memory portions
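
For readers who want to check the numbers above, here is a minimal sketch of
the arithmetic behind the "up to 63 pages per thread" wording (63 pages of
4 KiB each is 252 KiB) and of pulling a kB value out of /proc text such as
the VmRSS line in /proc/[pid]/status or the Rss line in
/proc/[pid]/smaps_rollup. The helper names max_rss_drift_kib() and
parse_kb_field() are hypothetical and exist only for this illustration; the
63-page figure is taken from the man-page text in this patch, not read from
the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: per-thread bound in pages, as stated in the man-page text. */
#define RSS_DRIFT_PAGES_PER_THREAD 63

/* Documented worst-case drift in KiB for nthreads threads (hypothetical helper). */
long max_rss_drift_kib(long page_size_bytes, long nthreads)
{
    assert(page_size_bytes >= 1024 && nthreads > 0);
    return RSS_DRIFT_PAGES_PER_THREAD * (page_size_bytes / 1024) * nthreads;
}

/*
 * Find the first line of text that starts with key (for example "VmRSS:"
 * or "Rss:") and return the number that follows it, or -1 if there is no
 * such line or no number after the key (hypothetical helper).
 */
long parse_kb_field(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, keylen) == 0) {
            long kb;
            /* %ld skips the blanks or tab between the key and the value. */
            if (sscanf(line + keylen, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

A caller could read both files into buffers, pass them to parse_kb_field()
with "VmRSS:" and "Rss:" respectively, and compare the difference against
max_rss_drift_kib(sysconf(_SC_PAGESIZE), nthreads).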