From patchwork Tue Jul 11 20:20:47 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13309350
From: "Matthew Wilcox (Oracle)"
To: linux-mm@kvack.org
Cc: Arjun Roy, Eric Dumazet, Suren Baghdasaryan, linux-fsdevel@vger.kernel.org, Punit Agrawal, "David S. Miller", Matthew Wilcox
Subject: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy
Date: Tue, 11 Jul 2023 21:20:47 +0100
Message-Id: <20230711202047.3818697-10-willy@infradead.org>
In-Reply-To: <20230711202047.3818697-1-willy@infradead.org>
References: <20230711202047.3818697-1-willy@infradead.org>

From: Arjun Roy

Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.

Consider a process workload where the mmap lock is constantly taken in
write mode. In this scenario, every zerocopy receive blocks for as long
as the write lock is held, even though in principle the operations that
need the mmap write lock never touch the memory ranges TCP is using.
The result is degraded zerocopy performance.

Now consider another workload where the mmap lock is never taken in
write mode, but many TCP connections are concurrently receiving with
zerocopy. Each connection takes the mmap lock in read mode, which still
induces heavy contention and atomic operations on this process-wide
lock, adding CPU overhead from all cores contending on its cache line.

With per-VMA locking, both of these problems can be avoided.
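The pattern is worth spelling out before reading the diff: try the
cheap per-VMA read lock first via lock_vma_under_rcu(), fall back to
mmap_read_lock() plus vma_lookup() only if that fails, and remember
which lock was taken so the matching unlock (mmap_read_unlock() or
vma_end_read()) runs at the end. A minimal sketch of that pairing,
using the same kernel APIs as the diff below (operate_on_vma() is a
hypothetical caller for illustration, not part of the patch):

#include <linux/mm.h>

/*
 * Sketch only: hypothetical caller showing the lock-then-fallback
 * pattern.  lock_vma_under_rcu() returns NULL when the per-VMA lock
 * cannot be taken (or CONFIG_PER_VMA_LOCK is not set), in which case
 * we fall back to the process-wide mmap read lock.
 */
static int operate_on_vma(struct mm_struct *mm, unsigned long address)
{
	struct vm_area_struct *vma;
	bool mmap_locked;

	vma = lock_vma_under_rcu(mm, address);	/* fast path: per-VMA lock */
	if (vma) {
		mmap_locked = false;
	} else {
		mmap_read_lock(mm);		/* slow path: process-wide lock */
		vma = vma_lookup(mm, address);
		if (!vma) {
			mmap_read_unlock(mm);
			return -EINVAL;
		}
		mmap_locked = true;
	}

	/* ... operate on the VMA under whichever lock is held ... */

	if (mmap_locked)			/* release the lock we took */
		mmap_read_unlock(mm);
	else
		vma_end_read(vma);
	return 0;
}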
As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and without
per-vma locking enabled. When using process-wide mmap semaphore read
locking, about 1% of measured perf cycles were within this path. With
per-VMA locking, this value dropped to about 0.45%.

Signed-off-by: Arjun Roy
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Suren Baghdasaryan
---
 net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1542de3f66f7..7118ec6cf886 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
 	}
 }
 
+static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
+					   unsigned long address, bool *mmap_locked)
+{
+	struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
+
+	if (vma) {
+		if (vma->vm_ops != &tcp_vm_ops) {
+			vma_end_read(vma);
+			return NULL;
+		}
+		*mmap_locked = false;
+		return vma;
+	}
+
+	mmap_read_lock(mm);
+	vma = vma_lookup(mm, address);
+	if (!vma || vma->vm_ops != &tcp_vm_ops) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+	*mmap_locked = true;
+	return vma;
+}
+
 #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
 static int tcp_zerocopy_receive(struct sock *sk,
 				struct tcp_zerocopy_receive *zc,
@@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 	u32 seq = tp->copied_seq;
 	u32 total_bytes_to_map;
 	int inq = tcp_inq(sk);
+	bool mmap_locked;
 	int ret;
 
 	zc->copybuf_len = 0;
@@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
 		return 0;
 	}
 
-	mmap_read_lock(current->mm);
-
-	vma = vma_lookup(current->mm, address);
-	if (!vma || vma->vm_ops != &tcp_vm_ops) {
-		mmap_read_unlock(current->mm);
+	vma = find_tcp_vma(current->mm, address, &mmap_locked);
+	if (!vma)
 		return -EINVAL;
-	}
+
 	vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
 	avail_len = min_t(u32, vma_len, inq);
 	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
@@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
 							   zc, total_bytes_to_map);
 	}
 out:
-	mmap_read_unlock(current->mm);
+	if (mmap_locked)
+		mmap_read_unlock(current->mm);
+	else
+		vma_end_read(vma);
 	/* Try to copy straggler data. */
 	if (!ret)
 		copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);