From patchwork Thu Sep 17 18:13:47 2020
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 11783219
Date: Thu, 17 Sep 2020 11:13:47 -0700
Message-Id: <20200917181347.1359365-1-axelrasmussen@google.com>
Subject: [PATCH] mmap_lock: add tracepoints around lock acquisition
From: Axel Rasmussen
To: Steven Rostedt, Ingo Molnar, Andrew Morton, Vlastimil Babka,
 Michel Lespinasse, Daniel Jordan, Davidlohr Bueso
Cc: Yafang Shao, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Axel Rasmussen

The goal of these tracepoints is to be able to debug lock contention
issues. This lock is acquired on most (all?) mmap / munmap / page fault
operations, so a multi-threaded process which does a lot of these can
experience significant contention.

We trace just before we start acquisition, when the acquisition returns
(whether it succeeded or not), and when the lock is released (or
downgraded). The events are broken out by lock type (read / write).

The events are also broken out by memcg path. For container-based
workloads, users often think of several processes in a memcg as a single
logical "task", so collecting statistics at this level is useful.

These events *do not* include latency bucket information, which means
for a proper latency histogram users will need to use BPF instead of
event histograms. The benefit we get from this is simpler code.

This patch is a no-op if the Kconfig option is not enabled. If it is,
tracepoints are still disabled by default (configurable at runtime);
the only fixed cost here is un-inlining a few functions.

As best as I've been able to measure, the overhead this introduces is a
small fraction of 1%. Actually hooking up the tracepoints to BPF
introduces additional overhead, depending on exactly what the BPF
program is collecting.
---
 include/linux/mmap_lock.h        |  28 +++-
 include/trace/events/mmap_lock.h |  73 ++++++++++
 mm/Kconfig                       |  17 +++
 mm/Makefile                      |   1 +
 mm/mmap_lock.c                   | 224 +++++++++++++++++++++++++++++++
 5 files changed, 342 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/mmap_lock.h
 create mode 100644 mm/mmap_lock.c

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0707671851a8..d12aa2ff6c05 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -1,11 +1,35 @@
 #ifndef _LINUX_MMAP_LOCK_H
 #define _LINUX_MMAP_LOCK_H
 
+#include
+#include
 #include
+#include
+#include
 
 #define MMAP_LOCK_INITIALIZER(name) \
 	.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
 
+#ifdef CONFIG_MMAP_LOCK_STATS
+
+void mmap_init_lock(struct mm_struct *mm);
+void mmap_write_lock(struct mm_struct *mm);
+void mmap_write_lock_nested(struct mm_struct *mm, int subclass);
+int mmap_write_lock_killable(struct mm_struct *mm);
+bool mmap_write_trylock(struct mm_struct *mm);
+void mmap_write_unlock(struct mm_struct *mm);
+void mmap_write_downgrade(struct mm_struct *mm);
+void mmap_read_lock(struct mm_struct *mm);
+int mmap_read_lock_killable(struct mm_struct *mm);
+bool mmap_read_trylock(struct mm_struct *mm);
+void mmap_read_unlock(struct mm_struct *mm);
+bool mmap_read_trylock_non_owner(struct mm_struct *mm);
+void mmap_read_unlock_non_owner(struct mm_struct *mm);
+void mmap_assert_locked(struct mm_struct *mm);
+void mmap_assert_write_locked(struct mm_struct *mm);
+
+#else /* !CONFIG_MMAP_LOCK_STATS */
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -63,7 +87,7 @@ static inline void mmap_read_unlock(struct mm_struct *mm)
 
 static inline bool mmap_read_trylock_non_owner(struct mm_struct *mm)
 {
-	if (down_read_trylock(&mm->mmap_lock)) {
+	if (mmap_read_trylock(mm)) {
 		rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
 		return true;
 	}
@@ -87,4 +111,6 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm)
 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 }
 
+#endif /* CONFIG_MMAP_LOCK_STATS */
+
 #endif /* _LINUX_MMAP_LOCK_H */
diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap_lock.h
new file mode 100644
index 000000000000..549c662e6ed8
--- /dev/null
+++ b/include/trace/events/mmap_lock.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mmap_lock
+
+#if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MMAP_LOCK_H
+
+#include
+#include
+
+struct mm_struct;
+
+DECLARE_EVENT_CLASS(
+	mmap_lock_template,
+
+	TP_PROTO(struct mm_struct *mm, const char *memcg_path, u64 duration,
+		bool write, bool success),
+
+	TP_ARGS(mm, memcg_path, duration, write, success),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__string(memcg_path, memcg_path)
+		__field(u64, duration)
+		__field(bool, write)
+		__field(bool, success)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__assign_str(memcg_path, memcg_path);
+		__entry->duration = duration;
+		__entry->write = write;
+		__entry->success = success;
+	),
+
+	TP_printk(
+		"mm=%p memcg_path=%s duration=%llu write=%s success=%s\n",
+		__entry->mm,
+		__get_str(memcg_path),
+		__entry->duration,
+		__entry->write ? "true" : "false",
+		__entry->success ? "true" : "false")
+	);
+
+DEFINE_EVENT(mmap_lock_template, mmap_lock_start_locking,
+
+	TP_PROTO(struct mm_struct *mm, const char *memcg_path, u64 duration,
+		bool write, bool success),
+
+	TP_ARGS(mm, memcg_path, duration, write, success)
+);
+
+DEFINE_EVENT(mmap_lock_template, mmap_lock_acquire_returned,
+
+	TP_PROTO(struct mm_struct *mm, const char *memcg_path, u64 duration,
+		bool write, bool success),
+
+	TP_ARGS(mm, memcg_path, duration, write, success)
+);
+
+DEFINE_EVENT(mmap_lock_template, mmap_lock_released,
+
+	TP_PROTO(struct mm_struct *mm, const char *memcg_path, u64 duration,
+		bool write, bool success),
+
+	TP_ARGS(mm, memcg_path, duration, write, success)
+);
+
+#endif /* _TRACE_MMAP_LOCK_H */
+
+/* This part must be outside protection */
+#include
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..b602df8bcee0 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -818,6 +818,23 @@ config DEVICE_PRIVATE
 config FRAME_VECTOR
 	bool
 
+config MMAP_LOCK_STATS
+	bool "mmap_lock stats / instrumentation"
+	select HISTOGRAM
+	default n
+
+	help
+	  Enables tracepoints around mmap_lock (start acquiring, acquire
+	  returned, and released), which are off by default and controlled at
+	  runtime. These can be used for deeper debugging of contention
+	  issues, via e.g. BPF.
+
+	  This option has a small (small fraction of 1%) fixed overhead
+	  even if tracepoints aren't actually in use at runtime, since it
+	  requires un-inlining some functions.
+
+	  If unsure, say "n".
+
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..eb6ed855a002 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -121,3 +121,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
 obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
+obj-$(CONFIG_MMAP_LOCK_STATS) += mmap_lock.o
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
new file mode 100644
index 000000000000..1624f90164c0
--- /dev/null
+++ b/mm/mmap_lock.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0
+#define CREATE_TRACE_POINTS
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifdef CONFIG_MEMCG
+
+DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
+
+/*
+ * Write the given mm_struct's memcg path to a percpu buffer, and return a
+ * pointer to it. If the path cannot be determined, the buffer will contain the
+ * empty string.
+ *
+ * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
+ * disabled by the caller before calling us, and re-enabled only after the
+ * caller is done with the pointer.
+ */
+static const char *get_mm_memcg_path(struct mm_struct *mm)
+{
+	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
+
+	if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
+		char *buf = this_cpu_ptr(trace_memcg_path);
+
+		cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
+		return buf;
+	}
+	return "";
+}
+
+#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
+	do { \
+		if (trace_mmap_lock_##type##_enabled()) { \
+			get_cpu(); \
+			trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \
+					       ##__VA_ARGS__); \
+			put_cpu(); \
+		} \
+	} while (0)
+
+#else /* !CONFIG_MEMCG */
+
+#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
+	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
+
+#endif /* CONFIG_MEMCG */
+
+/*
+ * Trace calls must be in a separate file, as otherwise there's a circular
+ * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
+ */
+
+static void trace_start_locking(struct mm_struct *mm, bool write)
+{
+	TRACE_MMAP_LOCK_EVENT(start_locking, mm, 0, write, true);
+}
+
+static void trace_acquire_returned(struct mm_struct *mm, u64 start_time_ns,
+				   bool write, bool success)
+{
+	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm,
+			      sched_clock() - start_time_ns, write, success);
+}
+
+static void trace_released(struct mm_struct *mm, bool write)
+{
+	TRACE_MMAP_LOCK_EVENT(released, mm, 0, write, true);
+}
+
+static bool trylock_impl(struct mm_struct *mm,
+			 int (*trylock)(struct rw_semaphore *), bool write)
+{
+	bool ret;
+
+	trace_start_locking(mm, write);
+	ret = trylock(&mm->mmap_lock) != 0;
+	/* Avoid calling sched_clock() for trylocks; assume duration = 0. */
+	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, 0, write, ret);
+	return ret;
+}
+
+static inline void lock_impl(struct mm_struct *mm,
+			     void (*lock)(struct rw_semaphore *), bool write)
+{
+	u64 start_time_ns;
+
+	trace_start_locking(mm, write);
+	start_time_ns = sched_clock();
+	lock(&mm->mmap_lock);
+	trace_acquire_returned(mm, start_time_ns, write, true);
+}
+
+static inline int lock_return_impl(struct mm_struct *mm,
+				   int (*lock)(struct rw_semaphore *),
+				   bool write)
+{
+	u64 start_time_ns;
+	int ret;
+
+	trace_start_locking(mm, write);
+	start_time_ns = sched_clock();
+	ret = lock(&mm->mmap_lock);
+	trace_acquire_returned(mm, start_time_ns, write, ret == 0);
+	return ret;
+}
+
+static inline void unlock_impl(struct mm_struct *mm,
+			       void (*unlock)(struct rw_semaphore *),
+			       bool write)
+{
+	unlock(&mm->mmap_lock);
+	trace_released(mm, write);
+}
+
+void mmap_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_lock);
+}
+
+void mmap_write_lock(struct mm_struct *mm)
+{
+	lock_impl(mm, down_write, true);
+}
+EXPORT_SYMBOL(mmap_write_lock);
+
+void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
+{
+	u64 start_time_ns;
+
+	trace_start_locking(mm, true);
+	start_time_ns = sched_clock();
+	down_write_nested(&mm->mmap_lock, subclass);
+	trace_acquire_returned(mm, start_time_ns, true, true);
+}
+EXPORT_SYMBOL(mmap_write_lock_nested);
+
+int mmap_write_lock_killable(struct mm_struct *mm)
+{
+	return lock_return_impl(mm, down_write_killable, true);
+}
+EXPORT_SYMBOL(mmap_write_lock_killable);
+
+bool mmap_write_trylock(struct mm_struct *mm)
+{
+	return trylock_impl(mm, down_write_trylock, true);
+}
+EXPORT_SYMBOL(mmap_write_trylock);
+
+void mmap_write_unlock(struct mm_struct *mm)
+{
+	unlock_impl(mm, up_write, true);
+}
+EXPORT_SYMBOL(mmap_write_unlock);
+
+void mmap_write_downgrade(struct mm_struct *mm)
+{
+	downgrade_write(&mm->mmap_lock);
+	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, 0, false, true);
+}
+EXPORT_SYMBOL(mmap_write_downgrade);
+
+void mmap_read_lock(struct mm_struct *mm)
+{
+	lock_impl(mm, down_read, false);
+}
+EXPORT_SYMBOL(mmap_read_lock);
+
+int mmap_read_lock_killable(struct mm_struct *mm)
+{
+	return lock_return_impl(mm, down_read_killable, false);
+}
+EXPORT_SYMBOL(mmap_read_lock_killable);
+
+bool mmap_read_trylock(struct mm_struct *mm)
+{
+	return trylock_impl(mm, down_read_trylock, false);
+}
+EXPORT_SYMBOL(mmap_read_trylock);
+
+void mmap_read_unlock(struct mm_struct *mm)
+{
+	unlock_impl(mm, up_read, false);
+}
+EXPORT_SYMBOL(mmap_read_unlock);
+
+bool mmap_read_trylock_non_owner(struct mm_struct *mm)
+{
+	if (mmap_read_trylock(mm)) {
+		rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
+		trace_released(mm, false);
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(mmap_read_trylock_non_owner);
+
+void mmap_read_unlock_non_owner(struct mm_struct *mm)
+{
+	up_read_non_owner(&mm->mmap_lock);
+	trace_released(mm, false);
+}
+EXPORT_SYMBOL(mmap_read_unlock_non_owner);
+
+void mmap_assert_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+EXPORT_SYMBOL(mmap_assert_locked);
+
+void mmap_assert_write_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held_write(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+EXPORT_SYMBOL(mmap_assert_write_locked);
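
Since the events carry a raw duration rather than latency buckets, any histogram has to be built by the consumer. The following is a minimal userspace sketch of that bucketing step, not part of the patch and not a BPF program: it assumes the text layout from the TP_printk() format string above, a memcg path without spaces, and a duration in nanoseconds (as the start_time_ns naming suggests). The names mmap_lock_event, parse_mmap_lock_event and latency_bucket are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields printed by the patch's TP_printk(). */
struct mmap_lock_event {
	char memcg_path[256];
	uint64_t duration_ns;
	bool write;
	bool success;
};

/*
 * Parse the payload of one event line. Assumes the layout
 * "memcg_path=%s duration=%llu write=%s success=%s" and a path without
 * spaces. An empty path is handled separately, because sscanf("%s")
 * cannot match an empty token. Returns false if the line does not match.
 */
static bool parse_mmap_lock_event(const char *line, struct mmap_lock_event *ev)
{
	const char *p;
	char write[8], success[8];
	unsigned long long duration;

	assert(line != NULL && ev != NULL);

	p = strstr(line, "memcg_path=");
	if (p == NULL)
		return false;
	p += strlen("memcg_path=");

	ev->memcg_path[0] = '\0';
	if (*p != ' ' && sscanf(p, "%255s", ev->memcg_path) != 1)
		return false;
	p += strlen(ev->memcg_path);

	p = strstr(p, "duration=");
	if (p == NULL ||
	    sscanf(p, "duration=%llu write=%7s success=%7s",
		   &duration, write, success) != 3)
		return false;

	ev->duration_ns = duration;
	ev->write = strcmp(write, "true") == 0;
	ev->success = strcmp(success, "true") == 0;
	return true;
}

/* Power-of-two bucket index: 0 for 0-1 ns, 1 for 2-3 ns, 2 for 4-7 ns, ... */
static unsigned int latency_bucket(uint64_t duration_ns)
{
	unsigned int bucket = 0;

	while (duration_ns > 1) {
		duration_ns >>= 1;
		bucket++;
	}
	return bucket;
}
```

A consumer would typically feed only acquire_returned events into latency_bucket(), since start_locking and released always report a duration of 0, and could key the resulting counters by memcg_path and write to get the per-memcg, per-lock-type breakdown described in the commit message. A real BPF program would read the tracepoint fields directly instead of parsing text.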