From patchwork Tue Oct 20 18:47:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 11847655 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 445CB14B4 for ; Tue, 20 Oct 2020 18:47:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 92E0A2223C for ; Tue, 20 Oct 2020 18:47:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=google.com header.i=@google.com header.b="bM0YAINO" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 92E0A2223C Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 67A8F6B0068; Tue, 20 Oct 2020 14:47:57 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 62AA96B006C; Tue, 20 Oct 2020 14:47:57 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 541F66B006E; Tue, 20 Oct 2020 14:47:57 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id 0B3016B0068 for ; Tue, 20 Oct 2020 14:47:56 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 8D0F91EE6 for ; Tue, 20 Oct 2020 18:47:56 +0000 (UTC) X-FDA: 77393188152.23.shake23_251351927241 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 66AA137606 for ; Tue, 20 Oct 2020 18:47:56 +0000 (UTC) X-Spam-Summary: 1,0,0,ac0e9549f9c5399f,d41d8cd98f00b204,32zcpxw0kchwf2jqwfxrzxxjslttlqj.htrqnsz2-rrp0fhp.twl@flex--axelrasmussen.bounces.google.com,,RULES_HIT:4:41:152:355:379:541:800:960:973:988:989:1260:1277:1313:1314:1345:1359:1431:1434:1437:1516:1518:1593:1594:1605:1730:1747:1777:1792:1801:2393:2559:2562:2638:2693:2892:2897:3138:3139:3140:3141:3142:3152:3865:3866:3867:3868:3870:3871:3872:4250:4321:4605:5007:6119:6261:6653:7514:7903:9149:9969:10004:10946:11026:11473:11658:11914:12043:12048:12291:12296:12297:12438:12555:12683:12895:12986:13137:13150:13161:13229:13230:13231:14096:14097:14394:14659:21080:21444:21451:21627:21740:21966:21987:21990:30012:30029:30054:30075,0,RBL:209.85.222.202:@flex--axelrasmussen.bounces.google.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04yfpy64hh5b9316warute8ksa85sypossxgam4n5cm47f648w7s8r4s7ao1mpa.4r79cg5kmbsk17fjxg8zdp5s3bmptxfncj5fheo8km3xp1fiorq6rraw3faso1n.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netch eck:none X-HE-Tag: shake23_251351927241 X-Filterd-Recvd-Size: 15929 Received: from mail-qk1-f202.google.com (mail-qk1-f202.google.com [209.85.222.202]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 18:47:55 +0000 (UTC) Received: by mail-qk1-f202.google.com with SMTP id d124so2680722qke.4 for ; Tue, 20 Oct 2020 11:47:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=nOu5990EAVWcl0vENKdyJ6wWQ+WSv1D3fkNhY8O16/s=; b=bM0YAINO4DolnTNqcOiAoMMaNMmGNU2g4O9x3IkxwhodiYoLs+gJgf1AtuOE6ysf5c pPR6DdMtm2v9AWN4Ll+stnYkfd5T94MQLu5UqYV9WP47IuWyzVf1sdouYWizZWP0/8cp 7HhFP0nNMBnlv6qO6lRcCj5Z7hLXLkjxfQm326PxmrgZ2GqTcjTx0LI9UX9qHRVdsx/7 +jkyA8PQb6vj3TDhu5lUi3Pj49JPK+lWTC2/XNXbw/+2mFoBldketIxBECZT2Air9eQ3 yiCs/X0dwnamRtAHHjWCY3p5avwpDuLwvKOy3ZYQxg6Bvq951YSMFj7G8o549fKBneZG rMKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=nOu5990EAVWcl0vENKdyJ6wWQ+WSv1D3fkNhY8O16/s=; b=OFm9T4mcXf5Jfbvj2jmwJJPFplAYrRGiWzcgGPsLDYsWLMagCvp6xgyk1o6yHHFTAO Fw1K+/J5jJqW/IefWB776FwWv99H7QOrvQlNVIm4Z+QofTRRafxrKry8ftCIDVm6ll4r WavI3Y5DWQGFA9S8l8jM9tc7M7D2jC0FAHYZo2NTAhdkNo7tvFJuN4JH186xisk3tzg8 tsr54tReDApu0NEd/3+lkk1HesGYeAN9vuLGjTZKygNsIYGafXKsrktUchACxmTErVu1 35EawAOFNSVAt5SCLxNOcNxYoycbU/9I8OPOE3OVfILjoxcDnBfdOcY324x/377AragC MeaQ== X-Gm-Message-State: AOAM532Ic4ahF2akGvyVqABPzzfs5XZyHxdG9GXbiPjwG4wlJo4Ml+sZ D2KlwV4DfNJeK+bi7kg5/XeeGqlL6NiXfbpL8c2Z X-Google-Smtp-Source: ABdhPJxMUUBvTB0kRbndXitw+F3FhGXvMgMiRTyJLGiaW4F6WD9OCc1Bh0vH94c7z/TpowGXgME7mRtcqlF8KHijn0Ji X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:f693:9fff:feef:c8f8]) (user=axelrasmussen job=sendgmr) by 2002:a0c:e387:: with SMTP id a7mr4802058qvl.21.1603219675121; Tue, 20 Oct 2020 11:47:55 -0700 (PDT) Date: Tue, 20 Oct 2020 11:47:46 -0700 In-Reply-To: <20201020184746.300555-1-axelrasmussen@google.com> Message-Id: <20201020184746.300555-2-axelrasmussen@google.com> Mime-Version: 1.0 References: <20201020184746.300555-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.29.0.rc1.297.gfa9743e501-goog Subject: [PATCH v4 1/1] mmap_lock: add tracepoints around lock acquisition From: Axel Rasmussen To: Steven Rostedt , Ingo Molnar , Andrew Morton , Michel Lespinasse , Vlastimil Babka , Daniel Jordan , Jann Horn , Chinwen Chang , Davidlohr Bueso , David Rientjes Cc: Yafang Shao , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Axel Rasmussen X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The goal of these tracepoints is to be able to debug lock contention issues. This lock is acquired on most (all?) mmap / munmap / page fault operations, so a multi-threaded process which does a lot of these can experience significant contention. We trace just before we start acquisition, when the acquisition returns (whether it succeeded or not), and when the lock is released (or downgraded). The events are broken out by lock type (read / write). The events are also broken out by memcg path. For container-based workloads, users often think of several processes in a memcg as a single logical "task", so collecting statistics at this level is useful. The end goal is to get latency information. This isn't directly included in the trace events. Instead, users are expected to compute the time between "start locking" and "acquire returned", using e.g. synthetic events or BPF. The benefit we get from this is simpler code. Because we use tracepoint_enabled() to decide whether or not to trace, this patch has effectively no overhead unless tracepoints are enabled at runtime. If tracepoints are enabled, there is a performance impact, but how much depends on exactly what e.g. the BPF program does. Reviewed-by: Michel Lespinasse Acked-by: Yafang Shao Acked-by: David Rientjes Signed-off-by: Axel Rasmussen --- include/linux/mmap_lock.h | 93 ++++++++++++++++++++++++++++-- include/trace/events/mmap_lock.h | 98 ++++++++++++++++++++++++++++++++ mm/Makefile | 2 +- mm/mmap_lock.c | 80 ++++++++++++++++++++++++++ 4 files changed, 267 insertions(+), 6 deletions(-) create mode 100644 include/trace/events/mmap_lock.h create mode 100644 mm/mmap_lock.c diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index e1afb420fc94..453d82879266 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -1,11 +1,63 @@ #ifndef _LINUX_MMAP_LOCK_H #define _LINUX_MMAP_LOCK_H +#include +#include #include +#include +#include +#include #define MMAP_LOCK_INITIALIZER(name) \ .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock), +DECLARE_TRACEPOINT(mmap_lock_start_locking); +DECLARE_TRACEPOINT(mmap_lock_acquire_returned); +DECLARE_TRACEPOINT(mmap_lock_released); + +#ifdef CONFIG_TRACING + +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write); +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write, + bool success); +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write); + +static inline void __mmap_lock_trace_start_locking(struct mm_struct *mm, + bool write) +{ + if (tracepoint_enabled(mmap_lock_start_locking)) + __mmap_lock_do_trace_start_locking(mm, write); +} + +static inline void __mmap_lock_trace_acquire_returned(struct mm_struct *mm, + bool write, bool success) +{ + if (tracepoint_enabled(mmap_lock_acquire_returned)) + __mmap_lock_do_trace_acquire_returned(mm, write, success); +} + +static inline void __mmap_lock_trace_released(struct mm_struct *mm, bool write) +{ + if (tracepoint_enabled(mmap_lock_released)) + __mmap_lock_do_trace_released(mm, write); +} + +#else /* !CONFIG_TRACING */ + +static inline void __mmap_lock_trace_start_locking(struct mm_struct *mm, + bool write) +{ +} +static inline void __mmap_lock_trace_acquire_returned(struct mm_struct *mm, + bool write, bool success) +{ +} +static inline void __mmap_lock_trace_released(struct mm_struct *mm, bool write) +{ +} + +#endif /* CONFIG_TRACING */ + static inline void mmap_init_lock(struct mm_struct *mm) { init_rwsem(&mm->mmap_lock); @@ -13,58 +65,88 @@ static inline void mmap_init_lock(struct mm_struct *mm) static inline void mmap_write_lock(struct mm_struct *mm) { + __mmap_lock_trace_start_locking(mm, true); down_write(&mm->mmap_lock); + __mmap_lock_trace_acquire_returned(mm, true, true); } static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass) { + __mmap_lock_trace_start_locking(mm, true); down_write_nested(&mm->mmap_lock, subclass); + __mmap_lock_trace_acquire_returned(mm, true, true); } static inline int mmap_write_lock_killable(struct mm_struct *mm) { - return down_write_killable(&mm->mmap_lock); + int ret; + + __mmap_lock_trace_start_locking(mm, true); + ret = down_write_killable(&mm->mmap_lock); + __mmap_lock_trace_acquire_returned(mm, true, ret == 0); + return ret; } static inline bool mmap_write_trylock(struct mm_struct *mm) { - return down_write_trylock(&mm->mmap_lock) != 0; + bool ret; + + __mmap_lock_trace_start_locking(mm, true); + ret = down_write_trylock(&mm->mmap_lock) != 0; + __mmap_lock_trace_acquire_returned(mm, true, ret); + return ret; } static inline void mmap_write_unlock(struct mm_struct *mm) { up_write(&mm->mmap_lock); + __mmap_lock_trace_released(mm, true); } static inline void mmap_write_downgrade(struct mm_struct *mm) { downgrade_write(&mm->mmap_lock); + __mmap_lock_trace_acquire_returned(mm, false, true); } static inline void mmap_read_lock(struct mm_struct *mm) { + __mmap_lock_trace_start_locking(mm, false); down_read(&mm->mmap_lock); + __mmap_lock_trace_acquire_returned(mm, false, true); } static inline int mmap_read_lock_killable(struct mm_struct *mm) { - return down_read_killable(&mm->mmap_lock); + int ret; + + __mmap_lock_trace_start_locking(mm, false); + ret = down_read_killable(&mm->mmap_lock); + __mmap_lock_trace_acquire_returned(mm, false, ret == 0); + return ret; } static inline bool mmap_read_trylock(struct mm_struct *mm) { - return down_read_trylock(&mm->mmap_lock) != 0; + bool ret; + + __mmap_lock_trace_start_locking(mm, false); + ret = down_read_trylock(&mm->mmap_lock) != 0; + __mmap_lock_trace_acquire_returned(mm, false, ret); + return ret; } static inline void mmap_read_unlock(struct mm_struct *mm) { up_read(&mm->mmap_lock); + __mmap_lock_trace_released(mm, false); } static inline bool mmap_read_trylock_non_owner(struct mm_struct *mm) { - if (down_read_trylock(&mm->mmap_lock)) { + if (mmap_read_trylock(mm)) { rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_); + __mmap_lock_trace_released(mm, false); return true; } return false; @@ -73,6 +155,7 @@ static inline bool mmap_read_trylock_non_owner(struct mm_struct *mm) static inline void mmap_read_unlock_non_owner(struct mm_struct *mm) { up_read_non_owner(&mm->mmap_lock); + __mmap_lock_trace_released(mm, false); } static inline void mmap_assert_locked(struct mm_struct *mm) diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap_lock.h new file mode 100644 index 000000000000..2268ff967072 --- /dev/null +++ b/include/trace/events/mmap_lock.h @@ -0,0 +1,98 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM mmap_lock + +#if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_MMAP_LOCK_H + +#include +#include + +struct mm_struct; + +TRACE_EVENT(mmap_lock_start_locking, + + TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write), + + TP_ARGS(mm, memcg_path, write), + + TP_STRUCT__entry( + __field(struct mm_struct *, mm) + __string(memcg_path, memcg_path) + __field(bool, write) + ), + + TP_fast_assign( + __entry->mm = mm; + __assign_str(memcg_path, memcg_path); + __entry->write = write; + ), + + TP_printk( + "mm=%p memcg_path=%s write=%s\n", + __entry->mm, + __get_str(memcg_path), + __entry->write ? "true" : "false" + ) +); + +TRACE_EVENT(mmap_lock_acquire_returned, + + TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write, + bool success), + + TP_ARGS(mm, memcg_path, write, success), + + TP_STRUCT__entry( + __field(struct mm_struct *, mm) + __string(memcg_path, memcg_path) + __field(bool, write) + __field(bool, success) + ), + + TP_fast_assign( + __entry->mm = mm; + __assign_str(memcg_path, memcg_path); + __entry->write = write; + __entry->success = success; + ), + + TP_printk( + "mm=%p memcg_path=%s write=%s success=%s\n", + __entry->mm, + __get_str(memcg_path), + __entry->write ? "true" : "false", + __entry->success ? "true" : "false" + ) +); + +TRACE_EVENT(mmap_lock_released, + + TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write), + + TP_ARGS(mm, memcg_path, write), + + TP_STRUCT__entry( + __field(struct mm_struct *, mm) + __string(memcg_path, memcg_path) + __field(bool, write) + ), + + TP_fast_assign( + __entry->mm = mm; + __assign_str(memcg_path, memcg_path); + __entry->write = write; + ), + + TP_printk( + "mm=%p memcg_path=%s write=%s\n", + __entry->mm, + __get_str(memcg_path), + __entry->write ? "true" : "false" + ) +); + +#endif /* _TRACE_MMAP_LOCK_H */ + +/* This part must be outside protection */ +#include diff --git a/mm/Makefile b/mm/Makefile index 069f216e109e..b6cd2fffa492 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -52,7 +52,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ mm_init.o percpu.o slab_common.o \ compaction.o vmacache.o \ interval_tree.o list_lru.o workingset.o \ - debug.o gup.o $(mmu-y) + debug.o gup.o mmap_lock.o $(mmu-y) # Give 'page_alloc' its own module-parameter namespace page-alloc-y := page_alloc.o diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c new file mode 100644 index 000000000000..52741eff7bea --- /dev/null +++ b/mm/mmap_lock.c @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0 +#define CREATE_TRACE_POINTS +#include + +#include +#include +#include +#include +#include +#include +#include + +EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking); +EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned); +EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released); + +#ifdef CONFIG_MEMCG + +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path); + +/* + * Write the given mm_struct's memcg path to a percpu buffer, and return a + * pointer to it. If the path cannot be determined, the buffer will contain the + * empty string. + * + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be + * disabled by the caller before calling us, and re-enabled only after the + * caller is done with the pointer. + */ +static const char *get_mm_memcg_path(struct mm_struct *mm) +{ + struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm); + + if (memcg != NULL && likely(memcg->css.cgroup != NULL)) { + char *buf = this_cpu_ptr(trace_memcg_path); + + cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL); + return buf; + } + return ""; +} + +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \ + do { \ + get_cpu(); \ + trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \ + ##__VA_ARGS__); \ + put_cpu(); \ + } while (0) + +#else /* !CONFIG_MEMCG */ + +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \ + trace_mmap_lock_##type(mm, "", ##__VA_ARGS__) + +#endif /* CONFIG_MEMCG */ + +/* + * Trace calls must be in a separate file, as otherwise there's a circular + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h. + */ + +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write) +{ + TRACE_MMAP_LOCK_EVENT(start_locking, mm, write); +} +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking); + +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write, + bool success) +{ + TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success); +} +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned); + +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write) +{ + TRACE_MMAP_LOCK_EVENT(released, mm, write); +} +EXPORT_SYMBOL(__mmap_lock_do_trace_released);