From patchwork Fri Jul 22 20:35:52 2016
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 9244229
From: Waiman Long
To: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields", Tejun Heo,
	Christoph Lameter
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner, Boqun Feng,
	Scott J Norton, Douglas Hatch, Waiman Long
Subject: [PATCH v4 1/5] lib/dlock-list: Distributed and lock-protected lists
Date: Fri, 22 Jul 2016 16:35:52 -0400
Message-Id: <1469219756-26353-2-git-send-email-Waiman.Long@hpe.com>
X-Mailer: git-send-email 1.7.1
In-Reply-To: <1469219756-26353-1-git-send-email-Waiman.Long@hpe.com>
References: <1469219756-26353-1-git-send-email-Waiman.Long@hpe.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Linked lists are used everywhere in the Linux kernel.  However, if many
threads are trying to add or delete entries in the same linked list, it
can become a performance bottleneck.

This patch introduces a new list API that provides a set of distributed
lists (one per CPU), each of which is protected by its own spinlock.  To
the callers, however, the set of lists acts like a single consolidated
list.  This allows list entry insertion and deletion to happen in
parallel instead of being serialized with a global list and lock.

List entry insertion is strictly per-CPU.
List deletion, however, can happen on a CPU other than the one that did
the insertion.  So we still need a lock to protect the list.  Because of
that, there may still be a small amount of contention when deletions are
being done.

A new header file include/linux/dlock-list.h will be added with the
associated dlock_list_head and dlock_list_node structures.  The following
functions are provided to manage the per-cpu lists:

 1. int  alloc_dlock_list_head(struct dlock_list_head *dlist)
 2. void dlock_list_add(struct dlock_list_node *node,
			struct dlock_list_head *dlist)
 3. void dlock_list_del(struct dlock_list_node *node)

Iteration over all the list entries within a group of per-cpu lists is
done by calling either the dlist_for_each_entry() or the
dlist_for_each_entry_safe() macro.  They correspond to the
list_for_each_entry() and list_for_each_entry_safe() macros respectively.
The iteration state is kept in a dlock_list_iter structure that is passed
to the iteration macros.
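
For illustration, a minimal usage sketch is shown below.  It is not part
of this patch; the "foo" structure and the foo_*() helpers are made up
for the example, and only the dlock-list calls come from the API above.

	#include <linux/dlock-list.h>

	struct foo {
		int value;
		struct dlock_list_node link;	/* linkage into foo_list */
	};

	static struct dlock_list_head foo_list;

	/* Allocate the underlying per-CPU list heads (-ENOMEM on failure). */
	static int foo_list_init(void)
	{
		return alloc_dlock_list_head(&foo_list);
	}

	/* Insertion always goes to the current CPU's sublist. */
	static void foo_insert(struct foo *f)
	{
		init_dlock_list_node(&f->link);
		dlock_list_add(&f->link, &foo_list);
	}

	/* Deletion may run on any CPU; the node remembers its lock. */
	static void foo_remove(struct foo *f)
	{
		dlock_list_del(&f->link);
	}
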
Signed-off-by: Waiman Long
Reviewed-by: Jan Kara
---
 include/linux/dlock-list.h |  242 ++++++++++++++++++++++++++++++++++++++++++++
 lib/Makefile               |    2 +-
 lib/dlock-list.c           |  182 +++++++++++++++++++++++++++++++++
 3 files changed, 425 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
new file mode 100644
index 0000000..ceb4228
--- /dev/null
+++ b/include/linux/dlock-list.h
@@ -0,0 +1,242 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ *
+ * Authors: Waiman Long
+ */
+#ifndef __LINUX_DLOCK_LIST_H
+#define __LINUX_DLOCK_LIST_H
+
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/percpu.h>
+
+/*
+ * include/linux/dlock-list.h
+ *
+ * A distributed (per-cpu) set of lists each of which is protected by its
+ * own spinlock, but acts like a single consolidated list to the callers.
+ *
+ * The dlock_list_head_percpu structure contains the spinlock, while the
+ * dlock_list_node structures only contain a pointer to the spinlock in
+ * dlock_list_head_percpu.
+ */
+struct dlock_list_head_percpu {
+	struct list_head list;
+	spinlock_t lock;
+};
+
+struct dlock_list_head {
+	struct dlock_list_head_percpu __percpu *head;
+};
+
+/*
+ * dlock list node data structure
+ */
+struct dlock_list_node {
+	struct list_head list;
+	spinlock_t *lockptr;
+};
+
+/*
+ * dlock list iteration state
+ *
+ * This is an opaque data structure that may change.  Users of this structure
+ * should not access the structure members directly other than using the
+ * helper functions and macros provided in this header file.
+ */
+struct dlock_list_iter {
+	int cpu;
+	struct dlock_list_head *head;
+	struct dlock_list_head_percpu *pcpu_head;
+};
+
+#define DLOCK_LIST_ITER_INIT(dlist)		\
+	{					\
+		.cpu  = -1,			\
+		.head = dlist,			\
+	}
+
+#define DEFINE_DLOCK_LIST_ITER(s, dlist)	\
+	struct dlock_list_iter s = DLOCK_LIST_ITER_INIT(dlist)
+
+static inline void init_dlock_list_iter(struct dlock_list_iter *iter,
+					struct dlock_list_head *head)
+{
+	*iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT(head);
+}
+
+#define DLOCK_LIST_NODE_INIT(name)		\
+	{					\
+		.list = LIST_HEAD_INIT(name)	\
+	}
+
+static inline void init_dlock_list_node(struct dlock_list_node *node)
+{
+	*node = (struct dlock_list_node)DLOCK_LIST_NODE_INIT(node->list);
+}
+
+/**
+ * dlock_list_empty - Check if all the dlock lists are empty
+ * @dlist: Pointer to the dlock_list_head structure
+ * Return: true if all the lists are empty, false otherwise.
+ *
+ * This can be a pretty expensive function call. If this function is required
+ * in a performance critical path, we may have to maintain a global count
+ * of the list entries in the global dlock_list_head structure instead.
+ */
+static inline bool dlock_list_empty(struct dlock_list_head *dlist)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		if (!list_empty(&per_cpu_ptr(dlist->head, cpu)->list))
+			return false;
+	return true;
+}
+
+/**
+ * dlock_list_unlock - unlock the spinlock that protects the percpu list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_unlock(struct dlock_list_iter *iter)
+{
+	spin_unlock(&iter->pcpu_head->lock);
+}
+
+/**
+ * dlock_list_relock - lock the spinlock that protects the percpu list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_relock(struct dlock_list_iter *iter)
+{
+	spin_lock(&iter->pcpu_head->lock);
+}
+
+/*
+ * Allocation and freeing of dlock list
+ */
+extern int  alloc_dlock_list_head(struct dlock_list_head *dlist);
+extern void free_dlock_list_head(struct dlock_list_head *dlist);
+
+/*
+ * The dlock list addition and deletion functions here are not irq-safe.
+ * Special irq-safe variants will have to be added if we need them.
+ */
+extern void dlock_list_add(struct dlock_list_node *node,
+			   struct dlock_list_head *dlist);
+extern void dlock_list_del(struct dlock_list_node *node);
+
+/*
+ * Find the first entry of the next per-cpu list.
+ */
+extern struct dlock_list_node *
+__dlock_list_next_cpu(struct dlock_list_iter *iter);
+
+/**
+ * __dlock_list_next_entry - Iterate to the next entry of the dlock list
+ * @curr : Pointer to the current dlock_list_node structure
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: Pointer to the next entry or NULL if all the entries are iterated
+ *
+ * The iterator has to be properly initialized before calling this function.
+ */
+static inline struct dlock_list_node *
+__dlock_list_next_entry(struct dlock_list_node *curr,
+			struct dlock_list_iter *iter)
+{
+	/*
+	 * Find next entry
+	 */
+	if (curr)
+		curr = list_next_entry(curr, list);
+
+	if (!curr || (&curr->list == &iter->pcpu_head->list)) {
+		/*
+		 * The current per-cpu list has been exhausted, try the next
+		 * per-cpu list.
+		 */
+		curr = __dlock_list_next_cpu(iter);
+	}
+
+	return curr;	/* Continue the iteration */
+}
+
+/**
+ * dlock_list_first_entry - get the first element from a list
+ * @iter  : The dlock list iterator.
+ * @type  : The type of the struct this is embedded in.
+ * @member: The name of the dlock_list_node within the struct.
+ * Return : Pointer to the first entry or NULL if all the lists are empty.
+ */
+#define dlock_list_first_entry(iter, type, member)		\
+	({							\
+		struct dlock_list_node *_n;			\
+		_n = __dlock_list_next_entry(NULL, iter);	\
+		_n ? list_entry(_n, type, member) : NULL;	\
+	})
+
+/**
+ * dlock_list_next_entry - iterate to the next entry of the list
+ * @pos   : The type * to cursor
+ * @iter  : The dlock list iterator.
+ * @member: The name of the dlock_list_node within the struct.
+ * Return : Pointer to the next entry or NULL if all the entries are iterated.
+ *
+ * Note that pos can't be NULL.
+ */
+#define dlock_list_next_entry(pos, iter, member)			\
+	({								\
+		struct dlock_list_node *_n;				\
+		_n = __dlock_list_next_entry(&(pos)->member, iter);	\
+		_n ? list_entry(_n, typeof(*(pos)), member) : NULL;	\
+	})
+
+/**
+ * dlist_for_each_entry - iterate over the dlock list
+ * @pos   : Type * to use as a loop cursor
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro isn't safe with respect to list entry removal, but
+ * it can correctly iterate newly added entries right after the current one.
+ * This iteration function is designed to be used in a while loop.
+ */
+#define dlist_for_each_entry(pos, iter, member)				\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
+	     pos != NULL;						\
+	     pos = dlock_list_next_entry(pos, iter, member))
+
+/**
+ * dlist_for_each_entry_safe - iterate over the dlock list & safe against removal
+ * @pos   : Type * to use as a loop cursor
+ * @n     : Another type * to use as temporary storage
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro is safe with respect to list entry removal.
+ * However, it cannot correctly iterate newly added entries right after the
+ * current one.
+ */
+#define dlist_for_each_entry_safe(pos, n, iter, member)			\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
+	    ({								\
+		bool _b = (pos != NULL);				\
+		if (_b)							\
+			n = dlock_list_next_entry(pos, iter, member);	\
+		_b;							\
+	    });								\
+	    pos = n)
+
+#endif /* __LINUX_DLOCK_LIST_H */
diff --git a/lib/Makefile b/lib/Makefile
index 499fb35..92e8c38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -40,7 +40,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-	 once.o
+	 once.o dlock-list.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
diff --git a/lib/dlock-list.c b/lib/dlock-list.c
new file mode 100644
index 0000000..54006dc
--- /dev/null
+++ b/lib/dlock-list.c
@@ -0,0 +1,182 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ *
+ * Authors: Waiman Long
+ */
+#include <linux/dlock-list.h>
+#include <linux/lockdep.h>
+
+/*
+ * As all the locks in the dlock list are dynamically allocated, they need
+ * to belong to their own special lock class to avoid warnings and stack
+ * traces in the kernel log when lockdep is enabled.  Statically allocated
+ * locks don't have this problem.
+ */
+static struct lock_class_key dlock_list_key;
+
+/**
+ * alloc_dlock_list_head - Initialize and allocate the per-cpu list head
+ * @dlist: Pointer to the dlock_list_head structure to be initialized
+ * Return: 0 if successful, -ENOMEM if memory allocation error
+ *
+ * This function does not allocate the dlock_list_head structure itself. The
+ * callers will have to do their own memory allocation, if necessary. However,
+ * this allows embedding the dlock_list_head structure directly into other
+ * structures.
+ */
+int alloc_dlock_list_head(struct dlock_list_head *dlist)
+{
+	struct dlock_list_head dlist_tmp;
+	int cpu;
+
+	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
+	if (!dlist_tmp.head)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		struct dlock_list_head_percpu *head;
+
+		head = per_cpu_ptr(dlist_tmp.head, cpu);
+		INIT_LIST_HEAD(&head->list);
+		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
+		lockdep_set_class(&head->lock, &dlock_list_key);
+	}
+
+	dlist->head = dlist_tmp.head;
+	return 0;
+}
+
+/**
+ * free_dlock_list_head - Free the per-cpu list head of dlock list
+ * @dlist: Pointer of the dlock_list_head structure to be freed
+ *
+ * This function doesn't free the dlock_list_head structure itself. So
+ * the caller will have to do it, if necessary.
+ */
+void free_dlock_list_head(struct dlock_list_head *dlist)
+{
+	free_percpu(dlist->head);
+	dlist->head = NULL;
+}
+
+/**
+ * dlock_list_add - Adds a node to the given dlock list
+ * @node : The node to be added
+ * @dlist: The dlock list where the node is to be added
+ *
+ * List selection is based on the CPU being used when the dlock_list_add()
+ * function is called. However, deletion may be done by a different CPU.
+ * So we still need to use a lock to protect the content of the list.
+ */
+void dlock_list_add(struct dlock_list_node *node,
+		    struct dlock_list_head *dlist)
+{
+	struct dlock_list_head_percpu *head;
+
+	/*
+	 * Disable preemption to make sure that the CPU won't get changed.
+	 */
+	head = get_cpu_ptr(dlist->head);
+	spin_lock(&head->lock);
+	node->lockptr = &head->lock;
+	list_add(&node->list, &head->list);
+	spin_unlock(&head->lock);
+	put_cpu_ptr(dlist->head);
+}
+
+/**
+ * dlock_list_del - Delete a node from a dlock list
+ * @node : The node to be deleted
+ *
+ * We need to check the lock pointer again after taking the lock to guard
+ * against concurrent deletion of the same node. If the lock pointer changes
+ * (becomes NULL or points to a different lock), we assume that the deletion
+ * was done elsewhere. A warning will be printed if this happens as it is
+ * likely to be a bug.
+ */
+void dlock_list_del(struct dlock_list_node *node)
+{
+	spinlock_t *lock;
+	bool retry;
+
+	do {
+		lock = READ_ONCE(node->lockptr);
+		if (WARN_ONCE(!lock,
+			"dlock_list_del: node 0x%lx has no associated lock\n",
+			(unsigned long)node))
+			return;
+
+		spin_lock(lock);
+		if (likely(lock == node->lockptr)) {
+			list_del_init(&node->list);
+			node->lockptr = NULL;
+			retry = false;
+		} else {
+			/*
+			 * The lock has somehow changed. Retry if it is
+			 * not NULL. Otherwise, just ignore the delete
+			 * operation.
+			 */
+			retry = (node->lockptr != NULL);
+		}
+		spin_unlock(lock);
+	} while (retry);
+}
+
+/**
+ * __dlock_list_next_cpu: Find the first entry of the next per-cpu list
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: The first entry of the next non-empty per-cpu list, or NULL if
+ *	   all the per-cpu lists have been exhausted
+ *
+ * The information about the next per-cpu list will be put into the iterator.
+ */
+struct dlock_list_node *__dlock_list_next_cpu(struct dlock_list_iter *iter)
+{
+	struct dlock_list_node *next;
+
+	if (iter->pcpu_head) {
+		spin_unlock(&iter->pcpu_head->lock);
+		iter->pcpu_head = NULL;
+	}
+
+next_cpu:
+	/*
+	 * for_each_possible_cpu(cpu)
+	 */
+	iter->cpu = cpumask_next(iter->cpu, cpu_possible_mask);
+	if (iter->cpu >= nr_cpu_ids)
+		return NULL;	/* All the per-cpu lists iterated */
+
+	iter->pcpu_head = per_cpu_ptr(iter->head->head, iter->cpu);
+	if (list_empty(&iter->pcpu_head->list))
+		goto next_cpu;
+
+	spin_lock(&iter->pcpu_head->lock);
+	/*
+	 * There is a slight chance that the list may become empty just
+	 * before the lock is acquired. So an additional check is needed
+	 * to make sure that the list is not empty before fetching the
+	 * next entry.
+	 */
+	if (list_empty(&iter->pcpu_head->list)) {
+		spin_unlock(&iter->pcpu_head->lock);
+		goto next_cpu;
+	}
+	next = list_entry(iter->pcpu_head->list.next, struct dlock_list_node,
+			  list);
+	WARN_ON_ONCE(next->lockptr != &iter->pcpu_head->lock);
+
+	return next;
+}
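
For completeness, an iteration sketch built on the same hypothetical foo
structure from the earlier example; again, this is not part of the patch:

	#include <linux/dlock-list.h>
	#include <linux/printk.h>

	/* Walk every foo currently on the list. */
	static void foo_report(void)
	{
		struct foo *f;
		DEFINE_DLOCK_LIST_ITER(iter, &foo_list);

		if (dlock_list_empty(&foo_list))
			return;

		/*
		 * The iterator locks one per-CPU sublist at a time and only
		 * drops that lock when the sublist has been exhausted, so
		 * the loop body runs with a spinlock held and must not sleep.
		 */
		dlist_for_each_entry(f, &iter, link)
			pr_info("foo: value %d\n", f->value);
	}

Because the per-CPU lock is held while an entry is being visited, calling
dlock_list_del() on that entry from inside the loop would try to re-take
the same lock; callers would typically defer the removal until after the
iteration, or drop and re-take the lock around it with dlock_list_unlock()
and dlock_list_relock() while making sure the current entry stays valid.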