From patchwork Fri Jun 16 19:34:27 2023
X-Patchwork-Submitter: Arjun Roy
X-Patchwork-Id: 13283232
From: Arjun Roy <arjunroy.kdev@gmail.com>
To: netdev@vger.kernel.org
Cc: arjunroy@google.com, edumazet@google.com, soheil@google.com, kuba@kernel.org,
    akpm@linux-foundation.org, dsahern@kernel.org, davem@davemloft.net,
    linux-mm@kvack.org, pabeni@redhat.com
Subject: [net-next,v2] tcp: Use per-vma locking for receive zerocopy
Date: Fri, 16 Jun 2023 12:34:27 -0700
Message-ID: <20230616193427.3908429-1-arjunroy.kdev@gmail.com>

From: Arjun Roy

Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.

Consider a process workload where the mmap lock is taken constantly in
write mode. In this scenario, all zerocopy receives are blocked for as
long as those write-lock holders run - even though, in principle, the
memory ranges used by TCP are not touched by the operations that need
the mmap write lock. This results in performance degradation.

Now consider another workload where the mmap lock is never taken in
write mode, but many TCP connections using receive zerocopy are
receiving concurrently. Each of these connections takes the mmap lock
in read mode, and the resulting contention and atomic operations on
this process-wide lock add CPU overhead from bouncing its cache line.

With per-VMA locking, both of these problems are avoided.

As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and without
per-VMA locking enabled. With process-wide mmap semaphore read locking,
about 1% of measured perf cycles were within this path; with per-VMA
locking, this dropped to about 0.45%.

Signed-off-by: Arjun Roy
Reviewed-by: Eric Dumazet
---
v2 change: Fixed linker error on builds without CONFIG_INET.
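
As background for the measurements above: userspace drives TCP receive
zerocopy by mmap()ing the TCP socket and then issuing
getsockopt(TCP_ZEROCOPY_RECEIVE), which lands in tcp_zerocopy_receive()
and performs the VMA lookup this patch converts to per-VMA locking.
Below is a minimal sketch of that loop; it is not part of the patch or
of the benchmark, the mapping size and helper name are illustrative,
and the fallback copy of unmapped bytes (recv_skip_hint) is omitted.

/*
 * Illustrative userspace sketch: drive TCP receive zerocopy on an
 * already-connected TCP socket "fd". Error handling and the copy
 * fallback for bytes that could not be mapped are simplified.
 */
#include <linux/tcp.h>		/* struct tcp_zerocopy_receive, TCP_ZEROCOPY_RECEIVE */
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

#define ZC_MAP_LEN	(512 * 4096)	/* illustrative, page-aligned mapping size */

static int recv_one_chunk_zerocopy(int fd)
{
	/* Read-only shared mapping of the socket; the kernel remaps payload pages into it. */
	void *addr = mmap(NULL, ZC_MAP_LEN, PROT_READ, MAP_SHARED, fd, 0);
	struct tcp_zerocopy_receive zc;
	socklen_t zc_len = sizeof(zc);

	if (addr == MAP_FAILED)
		return -1;

	memset(&zc, 0, sizeof(zc));
	zc.address = (__u64)(unsigned long)addr;
	zc.length = ZC_MAP_LEN;

	/*
	 * This getsockopt() ends up in tcp_zerocopy_receive(), the function
	 * whose VMA lookup the patch converts to per-VMA locking.
	 */
	if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len)) {
		munmap(addr, ZC_MAP_LEN);
		return -1;
	}

	/* zc.length bytes of payload are now readable at addr without a copy. */
	printf("mapped %u bytes; %u bytes must be read with recv() instead\n",
	       zc.length, zc.recv_skip_hint);

	munmap(addr, ZC_MAP_LEN);
	return 0;
}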
---
 MAINTAINERS            |  1 +
 include/linux/net_mm.h | 17 ++++++++++++++++
 include/net/tcp.h      |  1 +
 mm/memory.c            |  7 ++++---
 net/ipv4/tcp.c         | 45 ++++++++++++++++++++++++++++++++++--------
 5 files changed, 60 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/net_mm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index c6fa6ed454f4..a7c495e3323b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14727,6 +14727,7 @@ NETWORKING [TCP]
 M:	Eric Dumazet
 L:	netdev@vger.kernel.org
 S:	Maintained
+F:	include/linux/net_mm.h
 F:	include/linux/tcp.h
 F:	include/net/tcp.h
 F:	include/trace/events/tcp.h
diff --git a/include/linux/net_mm.h b/include/linux/net_mm.h
new file mode 100644
index 000000000000..b298998bd5a0
--- /dev/null
+++ b/include/linux/net_mm.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifdef CONFIG_MMU
+
+#ifdef CONFIG_INET
+extern const struct vm_operations_struct tcp_vm_ops;
+static inline bool vma_is_tcp(const struct vm_area_struct *vma)
+{
+	return vma->vm_ops == &tcp_vm_ops;
+}
+#else
+static inline bool vma_is_tcp(const struct vm_area_struct *vma)
+{
+	return false;
+}
+#endif /* CONFIG_INET*/
+
+#endif /* CONFIG_MMU */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5066e4586cf0..bfa5e27205ba 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include <linux/net_mm.h>

 extern struct inet_hashinfo tcp_hashinfo;
diff --git a/mm/memory.c b/mm/memory.c
index f69fbc251198..3e46b4d881dc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -77,6 +77,7 @@
 #include
 #include
 #include
+#include <linux/net_mm.h>

 #include
@@ -5280,12 +5281,12 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;

-	/* Only anonymous vmas are supported for now */
-	if (!vma_is_anonymous(vma))
+	/* Only anonymous and tcp vmas are supported for now */
+	if (!vma_is_anonymous(vma) && !vma_is_tcp(vma))
 		goto inval;

 	/* find_mergeable_anon_vma uses adjacent vmas which are not locked */
-	if (!vma->anon_vma)
+	if (!vma->anon_vma && !vma_is_tcp(vma))
 		goto inval;

 	if (!vma_start_read(vma))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8d20d9221238..6240d81476b8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1877,7 +1877,7 @@ void tcp_update_recv_tstamps(struct sk_buff *skb,
 }

 #ifdef CONFIG_MMU
-static const struct vm_operations_struct tcp_vm_ops = {
+const struct vm_operations_struct tcp_vm_ops = {
 };

 int tcp_mmap(struct file *file, struct socket *sock,
@@ -2176,6 +2176,34 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
 	}
 }

+static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
+					   unsigned long address,
+					   bool *mmap_locked)
+{
+	struct vm_area_struct *vma = NULL;
+
+#ifdef CONFIG_PER_VMA_LOCK
+	vma = lock_vma_under_rcu(mm, address);
+#endif
+	if (vma) {
+		if (!vma_is_tcp(vma)) {
+			vma_end_read(vma);
+			return NULL;
+		}
+		*mmap_locked = false;
+		return vma;
+	}
+
+	mmap_read_lock(mm);
+	vma = vma_lookup(mm, address);
+	if (!vma || !vma_is_tcp(vma)) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+	*mmap_locked = true;
+	return vma;
+}
+
 #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
 static int tcp_zerocopy_receive(struct sock *sk,
 				struct tcp_zerocopy_receive *zc,
@@ -2193,6 +2221,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 	u32 seq = tp->copied_seq;
 	u32 total_bytes_to_map;
 	int inq = tcp_inq(sk);
+	bool mmap_locked;
 	int ret;

 	zc->copybuf_len = 0;
@@ -2217,13 +2246,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
 		return 0;
 	}

-	mmap_read_lock(current->mm);
-
-	vma = vma_lookup(current->mm, address);
-	if (!vma || vma->vm_ops != &tcp_vm_ops) {
-		mmap_read_unlock(current->mm);
+	vma = find_tcp_vma(current->mm, address, &mmap_locked);
+	if (!vma)
 		return -EINVAL;
-	}
+
 	vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
 	avail_len = min_t(u32, vma_len, inq);
 	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
@@ -2297,7 +2323,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
 						   zc, total_bytes_to_map);
 	}
 out:
-	mmap_read_unlock(current->mm);
+	if (mmap_locked)
+		mmap_read_unlock(current->mm);
+	else
+		vma_end_read(vma);
 	/* Try to copy straggler data. */
 	if (!ret)
 		copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq,
 						 copybuf_len, tss);