From patchwork Tue Sep 11 05:36:09 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595063
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka, Mel Gorman,
	Matthew Wilcox, Daniel Jordan, Tariq Toukan, Yosef Lev,
	Jesper Dangaard Brouer
Subject: [RFC PATCH 2/9] mm: introduce smp_list_del for concurrent list entry removals
Date: Tue, 11 Sep 2018 13:36:09 +0800
Message-Id:
 <20180911053616.6894-3-aaron.lu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

From: Daniel Jordan

Now that the LRU lock is a RW lock, lay the groundwork for fine-grained
synchronization so that multiple threads holding the lock as reader can
safely remove pages from an LRU at the same time.

Add a thread-safe variant of list_del called smp_list_del that allows
multiple threads to delete nodes from a list, and wrap this new list API
in smp_del_page_from_lru to get the LRU statistics updates right.

For bisectability's sake, call the new function only when holding
lru_lock as writer.  In the next patch, switch to taking it as reader.

The algorithm is explained in detail in the comments.  Yosef Lev
conceived of the algorithm, and this patch is heavily based on an
earlier version from him.  Thanks to Dave Dice for suggesting the
prefetch.

[aaronlu: only take list related code here]
Signed-off-by: Yosef Lev
Signed-off-by: Daniel Jordan
---
 include/linux/list.h |   2 +
 lib/Makefile         |   2 +-
 lib/list.c           | 158 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 lib/list.c

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..0fd9c87dd14b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -47,6 +47,8 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 }
 #endif
 
+extern void smp_list_del(struct list_head *entry);
+
 /*
  * Insert a new entry between two known consecutive entries.
 *
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..9527b7484653 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -38,7 +38,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
	 percpu-refcount.o rhashtable.o reciprocal_div.o \
-	 once.o refcount.o usercopy.o errseq.o bucket_locks.o
+	 once.o refcount.o usercopy.o errseq.o bucket_locks.o list.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/list.c b/lib/list.c
new file mode 100644
index 000000000000..4d0949ea1a09
--- /dev/null
+++ b/lib/list.c
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2017, 2018 Oracle and/or its affiliates. All rights reserved.
+ *
+ * Authors: Yosef Lev
+ *          Daniel Jordan
+ */
+
+#include <linux/list.h>
+#include <linux/prefetch.h>
+
+/*
+ * smp_list_del is a variant of list_del that allows concurrent list removals
+ * under certain assumptions. The idea is to get away from overly coarse
+ * synchronization, such as using a lock to guard an entire list, which
+ * serializes all operations even though those operations might be happening on
+ * disjoint parts.
+ *
+ * If you want to use other functions from the list API concurrently,
+ * additional synchronization may be necessary. For example, you could use a
+ * rwlock as a two-mode lock, where readers use the lock in shared mode and are
+ * allowed to call smp_list_del concurrently, and writers use the lock in
+ * exclusive mode and are allowed to use all list operations.
+ */
+
+/**
+ * smp_list_del - concurrent variant of list_del
+ * @entry: entry to delete from the list
+ *
+ * Safely removes an entry from the list in the presence of other threads that
+ * may try to remove adjacent entries. Uses the entry's next field and the
+ * predecessor entry's next field as locks to accomplish this.
+ *
+ * Assumes that no two threads may try to delete the same entry. This
+ * assumption holds, for example, if the objects on the list are
+ * reference-counted so that an object is only removed when its refcount falls
+ * to 0.
+ *
+ * @entry's next and prev fields are poisoned on return just as with list_del.
+ */
+void smp_list_del(struct list_head *entry)
+{
+	struct list_head *succ, *pred, *pred_reread;
+
+	/*
+	 * The predecessor entry's cacheline is read before it's written, so to
+	 * avoid an unnecessary cacheline state transition, prefetch for
+	 * writing. In the common case, the predecessor won't change.
+	 */
+	prefetchw(entry->prev);
+
+	/*
+	 * Step 1: Lock @entry E by making its next field point to its
+	 * predecessor D. This prevents any thread from removing the
+	 * predecessor because that thread will loop in its step 4 while
+	 * E->next == D. This also prevents any thread from removing the
+	 * successor F because that thread will see that F->prev->next != F in
+	 * the cmpxchg in its step 3. Retry if the successor is being removed
+	 * and has already set this field to NULL in step 3.
+	 */
+	succ = READ_ONCE(entry->next);
+	pred = READ_ONCE(entry->prev);
+	while (succ == NULL || cmpxchg(&entry->next, succ, pred) != succ) {
+		/*
+		 * Reread @entry's successor because it may change until
+		 * @entry's next field is locked. Reread the predecessor to
+		 * have a better chance of publishing the right value and
+		 * avoid entering the loop in step 2 while @entry is locked,
+		 * but this isn't required for correctness because the
+		 * predecessor is reread in step 2.
+		 */
+		cpu_relax();
+		succ = READ_ONCE(entry->next);
+		pred = READ_ONCE(entry->prev);
+	}
+
+	/*
+	 * Step 2: A racing thread may remove @entry's predecessor. Reread and
+	 * republish @entry->prev until it does not change.
+	 * This guarantees that the racing thread has not passed the while
+	 * loop in step 4 and has not freed the predecessor, so it is safe for
+	 * this thread to access predecessor fields in step 3.
+	 */
+	pred_reread = READ_ONCE(entry->prev);
+	while (pred != pred_reread) {
+		WRITE_ONCE(entry->next, pred_reread);
+		pred = pred_reread;
+		/*
+		 * Ensure the predecessor is published in @entry's next field
+		 * before rereading the predecessor. Pairs with the smp_mb in
+		 * step 4.
+		 */
+		smp_mb();
+		pred_reread = READ_ONCE(entry->prev);
+	}
+
+	/*
+	 * Step 3: If the predecessor points to @entry, lock it and continue.
+	 * Otherwise, the predecessor is being removed, so loop until that
+	 * removal finishes and this thread's @entry->prev is updated, which
+	 * indicates the old predecessor has reached the loop in step 4. Write
+	 * the new predecessor into @entry->next. This both releases the old
+	 * predecessor from its step 4 loop and sets this thread up to lock
+	 * the new predecessor.
+	 */
+	while (pred->next != entry ||
+	       cmpxchg(&pred->next, entry, NULL) != entry) {
+		/*
+		 * The predecessor is being removed so wait for a new,
+		 * unlocked predecessor.
+		 */
+		cpu_relax();
+		pred_reread = READ_ONCE(entry->prev);
+		if (pred != pred_reread) {
+			/*
+			 * The predecessor changed, so republish it and update
+			 * it as in step 2.
+			 */
+			WRITE_ONCE(entry->next, pred_reread);
+			pred = pred_reread;
+			/* Pairs with smp_mb in step 4. */
+			smp_mb();
+		}
+	}
+
+	/*
+	 * Step 4: @entry and @entry's predecessor are both locked, so now
+	 * actually remove @entry from the list.
+	 *
+	 * It is safe to write to the successor's prev pointer because step 1
+	 * prevents the successor from being removed.
+	 */
+
+	WRITE_ONCE(succ->prev, pred);
+
+	/*
+	 * The full barrier guarantees that all changes are visible to other
+	 * threads before the entry is unlocked by the final write, pairing
+	 * with the implied full barrier before the cmpxchg in step 1.
+	 *
+	 * The barrier also guarantees that this thread writes succ->prev
+	 * before reading succ->next, pairing with a thread in step 2 or 3
+	 * that writes entry->next before reading entry->prev, which ensures
+	 * that the one that writes second sees the update from the other.
+	 */
+	smp_mb();
+
+	while (READ_ONCE(succ->next) == entry) {
+		/* The successor is being removed, so wait for it to finish. */
+		cpu_relax();
+	}
+
+	/* Simultaneously completes the removal and unlocks the predecessor. */
+	WRITE_ONCE(pred->next, succ);
+
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+}