[RFC,2/2] mm, slub: add shrinker to reclaim cached slabs

For performance reasons, SLUB doesn't keep all slabs on shared lists and
doesn't always free slabs immediately after all objects are freed. Namely:

- for each cache and cpu, there might be a "CPU slab" page, partially or fully
  free
- with SLUB_CPU_PARTIAL enabled (default y), there might be a number of "percpu
  partial slabs" for each cache and cpu, also partially or fully free
- for each cache and NUMA node, there are slabs on the per-node partial list, up
  to 10 of which may be empty
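To make the three locations above concrete, here is a minimal user-space sketch. It assumes a heavily simplified model in which each slab is reduced to its inuse and objects counts. struct slab_model, struct cache_model and cached_empty_slabs() are hypothetical names, not SLUB code, and the fixed array sizes are arbitrary. The function counts slabs that hold no allocated objects but are still cached by one cache for one CPU and one node.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified model of one SLUB cache on one CPU and
 * one NUMA node; real SLUB uses struct page, freelists and locking. */
struct slab_model {
	unsigned int inuse;   /* objects currently allocated from this slab */
	unsigned int objects; /* total object capacity of this slab */
};

struct cache_model {
	struct slab_model *cpu_slab;        /* the "CPU slab", may be NULL */
	struct slab_model *cpu_partial[32]; /* percpu partial slabs */
	size_t nr_cpu_partial;
	struct slab_model *node_partial[32]; /* per-node partial list */
	size_t nr_node_partial;
};

/* Count slabs in an array that have no allocated objects. */
static size_t count_empty(struct slab_model *const *slabs, size_t n)
{
	size_t empty = 0;

	for (size_t i = 0; i < n; i++)
		if (slabs[i]->inuse == 0)
			empty++;
	return empty;
}

/* Count slabs that are completely free yet still cached by this cache. */
size_t cached_empty_slabs(const struct cache_model *c)
{
	size_t empty = 0;

	if (c->cpu_slab && c->cpu_slab->inuse == 0)
		empty++;
	empty += count_empty(c->cpu_partial, c->nr_cpu_partial);
	empty += count_empty(c->node_partial, c->nr_node_partial);
	return empty;
}
```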

As Jann reports [1], the number of percpu partial slabs should be limited by
the number of free objects (up to 30), but due to imprecise accounting, this can
deteriorate so that there are up to 30 free slabs. He notes:

> Even on an old-ish Android phone (Pixel 2), with normal-ish usage, I
> see something like 1.5MiB of pages with zero inuse objects stuck in
> percpu lists.
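One way such drift can arise is sketched below as a hypothetical model. It assumes that the free-object estimate for the percpu partial list is computed only when a slab is added to the list and is not updated by later frees into that slab. struct partial_list, put_partial(), free_all_objects() and empty_slabs() are illustrative names, not kernel functions, and the limit of 30 is the figure quoted above.

```c
#include <stddef.h>

#define CPU_PARTIAL_LIMIT 30 /* free-object limit quoted in the text */
#define MAX_SLABS 64

/* Hypothetical model: the estimate of free objects ("pobjects") is only
 * updated when a slab is added; later frees into listed slabs are missed. */
struct partial_list {
	unsigned int inuse[MAX_SLABS];
	unsigned int objects[MAX_SLABS];
	size_t nr;
	unsigned int pobjects; /* estimated free objects on the list */
};

/* Returns 0 if the slab was added, -1 if the estimate says the list is
 * full (real SLUB would drain the list to the node; not modelled here). */
int put_partial(struct partial_list *l, unsigned int inuse,
		unsigned int objects)
{
	unsigned int free_objs = objects - inuse; /* assumes inuse <= objects */

	if (l->nr == MAX_SLABS || l->pobjects + free_objs > CPU_PARTIAL_LIMIT)
		return -1;
	l->inuse[l->nr] = inuse;
	l->objects[l->nr] = objects;
	l->nr++;
	l->pobjects += free_objs;
	return 0;
}

/* Free every object in slab i; pobjects is deliberately NOT updated. */
void free_all_objects(struct partial_list *l, size_t i)
{
	l->inuse[i] = 0;
}

/* Count slabs on the list that no longer hold any allocated object. */
size_t empty_slabs(const struct partial_list *l)
{
	size_t n = 0;

	for (size_t i = 0; i < l->nr; i++)
		if (l->inuse[i] == 0)
			n++;
	return n;
}
```

In this model, 30 slabs admitted with one free object each exhaust the estimate; once all their objects are freed, the list holds 30 empty slabs while the estimate still reads 30 free objects.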

My observations match Jann's, and we've seen, e.g., cases with 10 free slabs per
cpu. We can also confirm Jann's theory that on kernels predating the kmemcg
rewrite (in v5.9), this issue is amplified, as there are separate sets of kmem
caches with cpu caches, per-cpu partial and per-node partial lists for each memcg
and each cache that deals with kmemcg-accounted objects.

The cached free slabs can therefore become a memory waste, increasing memory
pressure, causing more reclaim of actually used LRU pages, and even causing
OOMs (global, or memcg on older kernels).

SLUB provides __kmem_cache_shrink(), which can flush all the above-mentioned
slabs, but it is currently called only in rare situations, or from a sysfs
handler. The standard way to cooperate with reclaim is to provide a shrinker,
so this patch adds such a shrinker to call __kmem_cache_shrink()
systematically.
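A rough sketch of what shrinking one cache amounts to, assuming a single CPU and node and representing every slab only by its inuse count: cached per-cpu slabs are first moved to the node partial list, then every empty slab there is discarded. struct shrink_model, flush_cpu_caches() and shrink_cache() are hypothetical; the model ignores locking and multiple CPUs or nodes, and the fact that flushing itself may already free some empty slabs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPU_PARTIAL 32
#define MAX_NODE_PARTIAL (2 * MAX_CPU_PARTIAL + 1)

/* Hypothetical model of one cache on one CPU and one node; every slab is
 * represented only by its number of allocated objects (inuse). */
struct shrink_model {
	int has_cpu_slab;
	unsigned int cpu_slab_inuse;
	unsigned int cpu_partial[MAX_CPU_PARTIAL];
	size_t nr_cpu_partial;
	unsigned int node_partial[MAX_NODE_PARTIAL];
	size_t nr_node_partial; /* assumed <= MAX_CPU_PARTIAL before a shrink */
};

/* Step 1: move the CPU slab and the percpu partial slabs to the node list.
 * In the kernel, this flushing is done per CPU (see flush_all()). */
static void flush_cpu_caches(struct shrink_model *c)
{
	assert(c->nr_node_partial + c->nr_cpu_partial + 1 <= MAX_NODE_PARTIAL);
	if (c->has_cpu_slab) {
		c->node_partial[c->nr_node_partial++] = c->cpu_slab_inuse;
		c->has_cpu_slab = 0;
	}
	for (size_t i = 0; i < c->nr_cpu_partial; i++)
		c->node_partial[c->nr_node_partial++] = c->cpu_partial[i];
	c->nr_cpu_partial = 0;
}

/* Step 2: drop every empty slab (this model applies no min_partial-style
 * threshold) and keep partially used ones in their current order.
 * Returns the number of slabs that would go back to the page allocator. */
size_t shrink_cache(struct shrink_model *c)
{
	size_t freed = 0, kept = 0;

	flush_cpu_caches(c);
	for (size_t i = 0; i < c->nr_node_partial; i++) {
		if (c->node_partial[i] == 0)
			freed++;
		else
			c->node_partial[kept++] = c->node_partial[i];
	}
	c->nr_node_partial = kept;
	return freed;
}
```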

The shrinker design is, however, atypical. The usual design assumes that a
shrinker can easily count how many objects can be reclaimed, and then reclaim
a given number of objects. For SLUB, determining the number of the various
cached slabs would be a lot of work, and controlling precisely how many to
shrink would be impractical. Instead, the shrinker is based on reclaim priority:
at the lowest priority it shrinks a single kmem cache, while at the highest it
shrinks all of them. To do that effectively, there's a new list,
caches_to_shrink, where caches are taken from its head and then moved to its
tail. The existing slab_caches list is unaffected, so that e.g. the
/proc/slabinfo order is not disrupted.
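The priority-based selection and the rotating caches_to_shrink list could look roughly like the sketch below. The mapping from priority to a number of caches is an assumption (nr_caches >> priority, at least one), chosen only because it yields one cache at DEF_PRIORITY (12 in the kernel, the least urgent reclaim priority) and all caches at priority 0. The ring buffer stands in for the kernel's linked list, and struct shrink_list, caches_for_priority() and shrink_from_head() are hypothetical names.

```c
#include <stddef.h>

#define DEF_PRIORITY 12 /* kernel's starting (least urgent) reclaim priority */
#define MAX_CACHES 256

/* Hypothetical model of caches_to_shrink: a ring of cache ids whose head
 * is the next cache to shrink. */
struct shrink_list {
	int ids[MAX_CACHES];
	size_t nr;
	size_t head;
};

/* Assumed mapping: one cache at DEF_PRIORITY, all caches at priority 0. */
size_t caches_for_priority(size_t nr_caches, int priority)
{
	size_t nr;

	if (nr_caches == 0)
		return 0;
	if (priority < 0)
		priority = 0;
	if (priority > DEF_PRIORITY)
		priority = DEF_PRIORITY;
	nr = nr_caches >> priority;
	return nr ? nr : 1;
}

/* Shrink up to count caches from the head. Moving a shrunk cache from the
 * head to the tail of a ring is the same as advancing the head index.
 * The ids of the caches shrunk are written to out[]. */
size_t shrink_from_head(struct shrink_list *l, size_t count, int *out)
{
	if (count > l->nr)
		count = l->nr;
	for (size_t i = 0; i < count; i++) {
		out[i] = l->ids[l->head]; /* __kmem_cache_shrink() would run here */
		l->head = (l->head + 1) % l->nr;
	}
	return count;
}
```

Because each shrunk cache ends up behind all the others, a later reclaimer starts with the caches that were shrunk least recently rather than hitting the same ones again.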

This approach should not cause excessive shrinking and IPI storms:

- If there are multiple reclaimers in parallel, only one can proceed, thanks to
  mutex_trylock(&slab_mutex). After unlocking, caches that were just shrunk
  are at the tail of the list.
- In flush_all(), we actually check whether a CPU has anything to flush
  (has_cpu_slab()) before sending it an IPI.
- CPU slab deactivation became more efficient with "mm, slub: splice cpu and
  page freelists in deactivate_slab()".
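The first two safeguards can be modelled in user space with POSIX threads. This is a sketch under stated assumptions: pthread_mutex_trylock() stands in for mutex_trylock(&slab_mutex), the cpu_has_slab[] array stands in for has_cpu_slab(), an "IPI" is just a counter, and returning SHRINK_STOP from a reclaimer that loses the trylock is an assumed behaviour, not necessarily what the patch does. flush_all_model() and scan_model() are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS 8
#define SHRINK_STOP (~0UL) /* same value as the kernel's SHRINK_STOP */

static pthread_mutex_t slab_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool cpu_has_slab[NR_CPUS]; /* stand-in for has_cpu_slab() */
static unsigned int ipis_sent;     /* counts simulated IPIs */

/* Only CPUs that actually have something cached are "interrupted". */
static unsigned long flush_all_model(void)
{
	unsigned long flushed = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_has_slab[cpu])
			continue;
		ipis_sent++; /* the kernel would send an IPI here */
		cpu_has_slab[cpu] = false;
		flushed++;
	}
	return flushed;
}

/* Hypothetical scan callback: a reclaimer that loses the trylock backs off
 * immediately instead of waiting and then repeating the same work. */
unsigned long scan_model(void)
{
	unsigned long flushed;

	if (pthread_mutex_trylock(&slab_mutex) != 0)
		return SHRINK_STOP;
	flushed = flush_all_model();
	pthread_mutex_unlock(&slab_mutex);
	return flushed;
}
```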

The result is that SLUB's per-cpu and per-node caches are trimmed of free
pages, and partially used pages have a higher chance of being either reused or
freed. The trimming effort is controlled by reclaim activity and thus by memory
pressure. Before an OOM, a reclaim attempt at the highest priority ensures that
all caches are shrunk. Also, as a proper slab shrinker, it is now invoked as
part of the drop_caches sysctl operation.

[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slub_def.h |  1 +
 mm/slub.c                | 76 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 1 deletion(-)

Message ID	20210121172154.27580-2-vbabka@suse.cz (mailing list archive)
State	New, archived
From	Vlastimil Babka <vbabka@suse.cz>
Date	Thu, 21 Jan 2021 18:21:54 +0100
Series	[RFC,1/2] mm, vmscan: add priority field to struct shrink_control; [RFC,2/2] mm, slub: add shrinker to reclaim cached slabs
