From patchwork Thu Feb 7 19:07:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801939 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F3D7417FB for ; Thu, 7 Feb 2019 19:08:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E26DA2D9BF for ; Thu, 7 Feb 2019 19:08:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D63732D9E8; Thu, 7 Feb 2019 19:08:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 379F02D9C4 for ; Thu, 7 Feb 2019 19:08:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727003AbfBGTIo (ORCPT ); Thu, 7 Feb 2019 14:08:44 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45952 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726512AbfBGTIo (ORCPT ); Thu, 7 Feb 2019 14:08:44 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0D5C6155E0; Thu, 7 Feb 2019 19:08:43 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 83D8060FDF; Thu, 7 Feb 2019 19:08:35 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 01/22] locking/qspinlock_stat: Introduce a generic lockevent counting APIs Date: Thu, 7 Feb 2019 14:07:05 -0500 Message-Id: <1549566446-27967-2-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 07 Feb 2019 19:08:43 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The percpu event counts used by qspinlock code can be useful for other locking code as well. So a new set of lockevent_* counting APIs is introduced with the lock event names extracted out into the new lock_events_list.h header file for easier addition in the future. The existing qstat_inc() calls are replaced by either lockevent_inc() or lockevent_cond_inc() calls. The qstat_hop() call is renamed to lockevent_pv_hop(). The "reset_counters" debugfs file is also renamed to ".reset_counts". Signed-off-by: Waiman Long --- kernel/locking/lock_events.h | 55 ++++++++++++ kernel/locking/lock_events_list.h | 50 +++++++++++ kernel/locking/qspinlock.c | 8 +- kernel/locking/qspinlock_paravirt.h | 19 +++-- kernel/locking/qspinlock_stat.h | 163 ++++++++++++++---------------------- 5 files changed, 181 insertions(+), 114 deletions(-) create mode 100644 kernel/locking/lock_events.h create mode 100644 kernel/locking/lock_events_list.h diff --git a/kernel/locking/lock_events.h b/kernel/locking/lock_events.h new file mode 100644 index 0000000..4009e07 --- /dev/null +++ b/kernel/locking/lock_events.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Authors: Waiman Long + */ + +enum lock_events { + +#include "lock_events_list.h" + + lockevent_num, /* Total number of lock event counts */ + LOCKEVENT_reset_cnts = lockevent_num, +}; + +#ifdef CONFIG_QUEUED_LOCK_STAT +/* + * Per-cpu counters + */ +DECLARE_PER_CPU(unsigned long, lockevents[lockevent_num]); + +/* + * Increment the PV qspinlock statistical counters + */ +static inline void __lockevent_inc(enum lock_events event, bool cond) +{ + if (cond) + __this_cpu_inc(lockevents[event]); +} + +#define lockevent_inc(ev) __lockevent_inc(LOCKEVENT_ ##ev, true) +#define lockevent_cond_inc(ev, c) __lockevent_inc(LOCKEVENT_ ##ev, c) + +static inline void __lockevent_add(enum lock_events event, int inc) +{ + __this_cpu_add(lockevents[event], inc); +} + +#define lockevent_add(ev, c) __lockevent_add(LOCKEVENT_ ##ev, c) + +#else /* CONFIG_QUEUED_LOCK_STAT */ + +#define lockevent_inc(ev) +#define lockevent_add(ev, c) +#define lockevent_cond_inc(ev, c) + +#endif /* CONFIG_QUEUED_LOCK_STAT */ diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h new file mode 100644 index 0000000..8b4d2e1 --- /dev/null +++ b/kernel/locking/lock_events_list.h @@ -0,0 +1,50 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Authors: Waiman Long + */ + +#ifndef LOCK_EVENT +#define LOCK_EVENT(name) LOCKEVENT_ ## name, +#endif + +#ifdef CONFIG_QUEUED_SPINLOCKS +#ifdef CONFIG_PARAVIRT_SPINLOCKS +/* + * Locking events for PV qspinlock. + */ +LOCK_EVENT(pv_hash_hops) /* Average # of hops per hashing operation */ +LOCK_EVENT(pv_kick_unlock) /* # of vCPU kicks issued at unlock time */ +LOCK_EVENT(pv_kick_wake) /* # of vCPU kicks for pv_latency_wake */ +LOCK_EVENT(pv_latency_kick) /* Average latency (ns) of vCPU kick */ +LOCK_EVENT(pv_latency_wake) /* Average latency (ns) of kick-to-wakeup */ +LOCK_EVENT(pv_lock_stealing) /* # of lock stealing operations */ +LOCK_EVENT(pv_spurious_wakeup) /* # of spurious wakeups in non-head vCPUs */ +LOCK_EVENT(pv_wait_again) /* # of wait's after queue head vCPU kick */ +LOCK_EVENT(pv_wait_early) /* # of early vCPU wait's */ +LOCK_EVENT(pv_wait_head) /* # of vCPU wait's at the queue head */ +LOCK_EVENT(pv_wait_node) /* # of vCPU wait's at non-head queue node */ +#endif /* CONFIG_PARAVIRT_SPINLOCKS */ + +/* + * Locking events for qspinlock + * + * Subtracting lock_use_node[234] from lock_slowpath will give you + * lock_use_node1. + */ +LOCK_EVENT(lock_pending) /* # of locking ops via pending code */ +LOCK_EVENT(lock_slowpath) /* # of locking ops via MCS lock queue */ +LOCK_EVENT(lock_use_node2) /* # of locking ops that use 2nd percpu node */ +LOCK_EVENT(lock_use_node3) /* # of locking ops that use 3rd percpu node */ +LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ +LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ +#endif /* CONFIG_QUEUED_SPINLOCKS */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 21ee51b..fe0dc3c 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -398,7 +398,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * 0,1,0 -> 0,0,1 */ clear_pending_set_locked(lock); - qstat_inc(qstat_lock_pending, true); + lockevent_inc(lock_pending); return; /* @@ -406,7 +406,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * queuing. */ queue: - qstat_inc(qstat_lock_slowpath, true); + lockevent_inc(lock_slowpath); pv_queue: node = this_cpu_ptr(&qnodes[0].mcs); idx = node->count++; @@ -422,7 +422,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * simple enough. */ if (unlikely(idx >= MAX_NODES)) { - qstat_inc(qstat_lock_no_node, true); + lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); goto release; @@ -433,7 +433,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) /* * Keep counts of non-zero index values: */ - qstat_inc(qstat_lock_use_node2 + idx - 1, idx); + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); /* * Ensure that we increment the head node->count before initialising diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h index 8f36c27..89bab07 100644 --- a/kernel/locking/qspinlock_paravirt.h +++ b/kernel/locking/qspinlock_paravirt.h @@ -89,7 +89,7 @@ static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock) if (!(val & _Q_LOCKED_PENDING_MASK) && (cmpxchg_acquire(&lock->locked, 0, _Q_LOCKED_VAL) == 0)) { - qstat_inc(qstat_pv_lock_stealing, true); + lockevent_inc(pv_lock_stealing); return true; } if (!(val & _Q_TAIL_MASK) || (val & _Q_PENDING_MASK)) @@ -219,7 +219,7 @@ static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node) hopcnt++; if (!cmpxchg(&he->lock, NULL, lock)) { WRITE_ONCE(he->node, node); - qstat_hop(hopcnt); + lockevent_pv_hop(hopcnt); return &he->lock; } } @@ -320,8 +320,8 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev) smp_store_mb(pn->state, vcpu_halted); if (!READ_ONCE(node->locked)) { - qstat_inc(qstat_pv_wait_node, true); - qstat_inc(qstat_pv_wait_early, wait_early); + lockevent_inc(pv_wait_node); + lockevent_cond_inc(pv_wait_early, wait_early); pv_wait(&pn->state, vcpu_halted); } @@ -339,7 +339,8 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev) * So it is better to spin for a while in the hope that the * MCS lock will be released soon. */ - qstat_inc(qstat_pv_spurious_wakeup, !READ_ONCE(node->locked)); + lockevent_cond_inc(pv_spurious_wakeup, + !READ_ONCE(node->locked)); } /* @@ -416,7 +417,7 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node) /* * Tracking # of slowpath locking operations */ - qstat_inc(qstat_lock_slowpath, true); + lockevent_inc(lock_slowpath); for (;; waitcnt++) { /* @@ -464,8 +465,8 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node) } } WRITE_ONCE(pn->state, vcpu_hashed); - qstat_inc(qstat_pv_wait_head, true); - qstat_inc(qstat_pv_wait_again, waitcnt); + lockevent_inc(pv_wait_head); + lockevent_cond_inc(pv_wait_again, waitcnt); pv_wait(&lock->locked, _Q_SLOW_VAL); /* @@ -528,7 +529,7 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node) * vCPU is harmless other than the additional latency in completing * the unlock. */ - qstat_inc(qstat_pv_kick_unlock, true); + lockevent_inc(pv_kick_unlock); pv_kick(node->cpu); } diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h index d73f853..1db5b37 100644 --- a/kernel/locking/qspinlock_stat.h +++ b/kernel/locking/qspinlock_stat.h @@ -38,8 +38,8 @@ * Subtracting lock_use_node[234] from lock_slowpath will give you * lock_use_node1. * - * Writing to the "reset_counters" file will reset all the above counter - * values. + * Writing to the special ".reset_counts" file will reset all the above + * counter values. * * These statistical counters are implemented as per-cpu variables which are * summed and computed whenever the corresponding debugfs files are read. This @@ -48,27 +48,7 @@ * * There may be slight difference between pv_kick_wake and pv_kick_unlock. */ -enum qlock_stats { - qstat_pv_hash_hops, - qstat_pv_kick_unlock, - qstat_pv_kick_wake, - qstat_pv_latency_kick, - qstat_pv_latency_wake, - qstat_pv_lock_stealing, - qstat_pv_spurious_wakeup, - qstat_pv_wait_again, - qstat_pv_wait_early, - qstat_pv_wait_head, - qstat_pv_wait_node, - qstat_lock_pending, - qstat_lock_slowpath, - qstat_lock_use_node2, - qstat_lock_use_node3, - qstat_lock_use_node4, - qstat_lock_no_node, - qstat_num, /* Total number of statistical counters */ - qstat_reset_cnts = qstat_num, -}; +#include "lock_events.h" #ifdef CONFIG_QUEUED_LOCK_STAT /* @@ -79,99 +59,91 @@ enum qlock_stats { #include #include -static const char * const qstat_names[qstat_num + 1] = { - [qstat_pv_hash_hops] = "pv_hash_hops", - [qstat_pv_kick_unlock] = "pv_kick_unlock", - [qstat_pv_kick_wake] = "pv_kick_wake", - [qstat_pv_spurious_wakeup] = "pv_spurious_wakeup", - [qstat_pv_latency_kick] = "pv_latency_kick", - [qstat_pv_latency_wake] = "pv_latency_wake", - [qstat_pv_lock_stealing] = "pv_lock_stealing", - [qstat_pv_wait_again] = "pv_wait_again", - [qstat_pv_wait_early] = "pv_wait_early", - [qstat_pv_wait_head] = "pv_wait_head", - [qstat_pv_wait_node] = "pv_wait_node", - [qstat_lock_pending] = "lock_pending", - [qstat_lock_slowpath] = "lock_slowpath", - [qstat_lock_use_node2] = "lock_use_node2", - [qstat_lock_use_node3] = "lock_use_node3", - [qstat_lock_use_node4] = "lock_use_node4", - [qstat_lock_no_node] = "lock_no_node", - [qstat_reset_cnts] = "reset_counters", +#define EVENT_COUNT(ev) lockevents[LOCKEVENT_ ## ev] + +#undef LOCK_EVENT +#define LOCK_EVENT(name) [LOCKEVENT_ ## name] = #name, + +static const char * const lockevent_names[lockevent_num + 1] = { + +#include "lock_events_list.h" + + [LOCKEVENT_reset_cnts] = ".reset_counts", }; /* * Per-cpu counters */ -static DEFINE_PER_CPU(unsigned long, qstats[qstat_num]); +DEFINE_PER_CPU(unsigned long, lockevents[lockevent_num]); static DEFINE_PER_CPU(u64, pv_kick_time); /* * Function to read and return the qlock statistical counter values * * The following counters are handled specially: - * 1. qstat_pv_latency_kick + * 1. pv_latency_kick * Average kick latency (ns) = pv_latency_kick/pv_kick_unlock - * 2. qstat_pv_latency_wake + * 2. pv_latency_wake * Average wake latency (ns) = pv_latency_wake/pv_kick_wake - * 3. qstat_pv_hash_hops + * 3. pv_hash_hops * Average hops/hash = pv_hash_hops/pv_kick_unlock */ -static ssize_t qstat_read(struct file *file, char __user *user_buf, - size_t count, loff_t *ppos) +static ssize_t lockevent_read(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) { char buf[64]; - int cpu, counter, len; - u64 stat = 0, kicks = 0; + int cpu, id, len; + u64 sum = 0, kicks = 0; /* * Get the counter ID stored in file->f_inode->i_private */ - counter = (long)file_inode(file)->i_private; + id = (long)file_inode(file)->i_private; - if (counter >= qstat_num) + if (id >= lockevent_num) return -EBADF; for_each_possible_cpu(cpu) { - stat += per_cpu(qstats[counter], cpu); + sum += per_cpu(lockevents[id], cpu); /* - * Need to sum additional counter for some of them + * Need to sum additional counters for some of them */ - switch (counter) { + switch (id) { - case qstat_pv_latency_kick: - case qstat_pv_hash_hops: - kicks += per_cpu(qstats[qstat_pv_kick_unlock], cpu); + case LOCKEVENT_pv_latency_kick: + case LOCKEVENT_pv_hash_hops: + kicks += per_cpu(EVENT_COUNT(pv_kick_unlock), cpu); break; - case qstat_pv_latency_wake: - kicks += per_cpu(qstats[qstat_pv_kick_wake], cpu); + case LOCKEVENT_pv_latency_wake: + kicks += per_cpu(EVENT_COUNT(pv_kick_wake), cpu); break; } } - if (counter == qstat_pv_hash_hops) { + if (id == LOCKEVENT_pv_hash_hops) { u64 frac = 0; if (kicks) { - frac = 100ULL * do_div(stat, kicks); + frac = 100ULL * do_div(sum, kicks); frac = DIV_ROUND_CLOSEST_ULL(frac, kicks); } /* * Return a X.XX decimal number */ - len = snprintf(buf, sizeof(buf) - 1, "%llu.%02llu\n", stat, frac); + len = snprintf(buf, sizeof(buf) - 1, "%llu.%02llu\n", + sum, frac); } else { /* * Round to the nearest ns */ - if ((counter == qstat_pv_latency_kick) || - (counter == qstat_pv_latency_wake)) { + if ((id == LOCKEVENT_pv_latency_kick) || + (id == LOCKEVENT_pv_latency_wake)) { if (kicks) - stat = DIV_ROUND_CLOSEST_ULL(stat, kicks); + sum = DIV_ROUND_CLOSEST_ULL(sum, kicks); } - len = snprintf(buf, sizeof(buf) - 1, "%llu\n", stat); + len = snprintf(buf, sizeof(buf) - 1, "%llu\n", sum); } return simple_read_from_buffer(user_buf, count, ppos, buf, len); @@ -180,11 +152,9 @@ static ssize_t qstat_read(struct file *file, char __user *user_buf, /* * Function to handle write request * - * When counter = reset_cnts, reset all the counter values. - * Since the counter updates aren't atomic, the resetting is done twice - * to make sure that the counters are very likely to be all cleared. + * When id = .reset_cnts, reset all the counter values. */ -static ssize_t qstat_write(struct file *file, const char __user *user_buf, +static ssize_t lockevent_write(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) { int cpu; @@ -192,14 +162,14 @@ static ssize_t qstat_write(struct file *file, const char __user *user_buf, /* * Get the counter ID stored in file->f_inode->i_private */ - if ((long)file_inode(file)->i_private != qstat_reset_cnts) + if ((long)file_inode(file)->i_private != LOCKEVENT_reset_cnts) return count; for_each_possible_cpu(cpu) { int i; - unsigned long *ptr = per_cpu_ptr(qstats, cpu); + unsigned long *ptr = per_cpu_ptr(lockevents, cpu); - for (i = 0 ; i < qstat_num; i++) + for (i = 0 ; i < lockevent_num; i++) WRITE_ONCE(ptr[i], 0); } return count; @@ -208,9 +178,9 @@ static ssize_t qstat_write(struct file *file, const char __user *user_buf, /* * Debugfs data structures */ -static const struct file_operations fops_qstat = { - .read = qstat_read, - .write = qstat_write, +static const struct file_operations fops_lockevent = { + .read = lockevent_read, + .write = lockevent_write, .llseek = default_llseek, }; @@ -219,10 +189,10 @@ static ssize_t qstat_write(struct file *file, const char __user *user_buf, */ static int __init init_qspinlock_stat(void) { - struct dentry *d_qstat = debugfs_create_dir("qlockstat", NULL); + struct dentry *d_counts = debugfs_create_dir("qlockstat", NULL); int i; - if (!d_qstat) + if (!d_counts) goto out; /* @@ -232,18 +202,19 @@ static int __init init_qspinlock_stat(void) * root is allowed to do the read/write to limit impact to system * performance. */ - for (i = 0; i < qstat_num; i++) - if (!debugfs_create_file(qstat_names[i], 0400, d_qstat, - (void *)(long)i, &fops_qstat)) + for (i = 0; i < lockevent_num; i++) + if (!debugfs_create_file(lockevent_names[i], 0400, d_counts, + (void *)(long)i, &fops_lockevent)) goto fail_undo; - if (!debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat, - (void *)(long)qstat_reset_cnts, &fops_qstat)) + if (!debugfs_create_file(lockevent_names[LOCKEVENT_reset_cnts], 0200, + d_counts, (void *)(long)LOCKEVENT_reset_cnts, + &fops_lockevent)) goto fail_undo; return 0; fail_undo: - debugfs_remove_recursive(d_qstat); + debugfs_remove_recursive(d_counts); out: pr_warn("Could not create 'qlockstat' debugfs entries\n"); return -ENOMEM; @@ -251,20 +222,11 @@ static int __init init_qspinlock_stat(void) fs_initcall(init_qspinlock_stat); /* - * Increment the PV qspinlock statistical counters - */ -static inline void qstat_inc(enum qlock_stats stat, bool cond) -{ - if (cond) - this_cpu_inc(qstats[stat]); -} - -/* * PV hash hop count */ -static inline void qstat_hop(int hopcnt) +static inline void lockevent_pv_hop(int hopcnt) { - this_cpu_add(qstats[qstat_pv_hash_hops], hopcnt); + this_cpu_add(EVENT_COUNT(pv_hash_hops), hopcnt); } /* @@ -276,7 +238,7 @@ static inline void __pv_kick(int cpu) per_cpu(pv_kick_time, cpu) = start; pv_kick(cpu); - this_cpu_add(qstats[qstat_pv_latency_kick], sched_clock() - start); + this_cpu_add(EVENT_COUNT(pv_latency_kick), sched_clock() - start); } /* @@ -289,9 +251,9 @@ static inline void __pv_wait(u8 *ptr, u8 val) *pkick_time = 0; pv_wait(ptr, val); if (*pkick_time) { - this_cpu_add(qstats[qstat_pv_latency_wake], + this_cpu_add(EVENT_COUNT(pv_latency_wake), sched_clock() - *pkick_time); - qstat_inc(qstat_pv_kick_wake, true); + lockevent_inc(pv_kick_wake); } } @@ -300,7 +262,6 @@ static inline void __pv_wait(u8 *ptr, u8 val) #else /* CONFIG_QUEUED_LOCK_STAT */ -static inline void qstat_inc(enum qlock_stats stat, bool cond) { } -static inline void qstat_hop(int hopcnt) { } +static inline void lockevent_pv_hop(int hopcnt) { } #endif /* CONFIG_QUEUED_LOCK_STAT */ From patchwork Thu Feb 7 19:07:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801999 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 233851669 for ; Thu, 7 Feb 2019 19:11:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 10D452E05F for ; Thu, 7 Feb 2019 19:11:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 04B382E068; Thu, 7 Feb 2019 19:11:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1BAA42E05F for ; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727129AbfBGTIv (ORCPT ); Thu, 7 Feb 2019 14:08:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38734 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726512AbfBGTIr (ORCPT ); Thu, 7 Feb 2019 14:08:47 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3794281138; Thu, 7 Feb 2019 19:08:45 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2EF6061146; Thu, 7 Feb 2019 19:08:43 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 02/22] locking/lock_events: Make lock_events available for all archs & other locks Date: Thu, 7 Feb 2019 14:07:06 -0500 Message-Id: <1549566446-27967-3-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Thu, 07 Feb 2019 19:08:46 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The QUEUED_LOCK_STAT option to report queued spinlocks event counts was previously allowed only on x86 architecture. To make the locking event counting code more useful, it is now renamed to a more generic LOCK_EVENT_COUNTS config option. This new option will be available to all the architectures that use qspinlock at the moment. Other locking code can now start to use the generic locking event counting code by including lock_events.h and put the new locking event names into the lock_events_list.h header file. My experience with lock event counting is that it gives valuable insight on how the locking code works and what can be done to make it better. I would like to extend this benefit to other locking code like mutex and rwsem in the near future. The PV qspinlock specific code will stay in qspinlock_stat.h. The locking event counters will now reside in the /lock_event_counts directory. Signed-off-by: Waiman Long --- arch/Kconfig | 10 +++ arch/x86/Kconfig | 8 --- kernel/locking/Makefile | 1 + kernel/locking/lock_events.c | 153 ++++++++++++++++++++++++++++++++++++++++ kernel/locking/lock_events.h | 6 +- kernel/locking/qspinlock_stat.h | 141 ++++-------------------------------- 6 files changed, 179 insertions(+), 140 deletions(-) create mode 100644 kernel/locking/lock_events.c diff --git a/arch/Kconfig b/arch/Kconfig index 9f02132..af147c2 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -888,6 +888,16 @@ config HAVE_ARCH_PREL32_RELOCATIONS config ARCH_USE_MEMREMAP_PROT bool +config LOCK_EVENT_COUNTS + bool "Locking event counts collection" + depends on DEBUG_FS + depends on QUEUED_SPINLOCKS + ---help--- + Enable light-weight counting of various locking related events + in the system with minimal performance impact. This reduces + the chance of application behavior change because of timing + differences. The counts are reported via debugfs. + source "kernel/gcov/Kconfig" source "scripts/gcc-plugins/Kconfig" diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a7e7a60..e4f620f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -784,14 +784,6 @@ config PARAVIRT_SPINLOCKS If you are unsure how to answer this question, answer Y. -config QUEUED_LOCK_STAT - bool "Paravirt queued spinlock statistics" - depends on PARAVIRT_SPINLOCKS && DEBUG_FS - ---help--- - Enable the collection of statistical data on the slowpath - behavior of paravirtualized queued spinlocks and report - them on debugfs. - source "arch/x86/xen/Kconfig" config KVM_GUEST diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 392c7f2..f2257d7 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -30,3 +30,4 @@ obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o +obj-$(CONFIG_LOCK_EVENT_COUNTS) += lock_events.o diff --git a/kernel/locking/lock_events.c b/kernel/locking/lock_events.c new file mode 100644 index 0000000..71c36d1 --- /dev/null +++ b/kernel/locking/lock_events.c @@ -0,0 +1,153 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Authors: Waiman Long + */ + +/* + * Collect locking event counts + */ +#include +#include +#include +#include + +#include "lock_events.h" + +#undef LOCK_EVENT +#define LOCK_EVENT(name) [LOCKEVENT_ ## name] = #name, + +#define LOCK_EVENTS_DIR "lock_event_counts" + +/* + * When CONFIG_LOCK_EVENT_COUNTS is enabled, event counts of different + * types of locks will be reported under the /lock_event_counts/ + * directory. See lock_events_list.h for the list of available locking + * events. + * + * Writing to the special ".reset_counts" file will reset all the above + * locking event counts. This is a very slow operation and so should not + * be done frequently. + * + * These event counts are implemented as per-cpu variables which are + * summed and computed whenever the corresponding debugfs files are read. This + * minimizes added overhead making the counts usable even in a production + * environment. + */ +static const char * const lockevent_names[lockevent_num + 1] = { + +#include "lock_events_list.h" + + [LOCKEVENT_reset_cnts] = ".reset_counts", +}; + +/* + * Per-cpu counts + */ +DEFINE_PER_CPU(unsigned long, lockevents[lockevent_num]); + +/* + * The lockevent_read() function can be overridden. + */ +ssize_t __weak lockevent_read(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + char buf[64]; + int cpu, id, len; + u64 sum = 0; + + /* + * Get the counter ID stored in file->f_inode->i_private + */ + id = (long)file_inode(file)->i_private; + + if (id >= lockevent_num) + return -EBADF; + + for_each_possible_cpu(cpu) + sum += per_cpu(lockevents[id], cpu); + len = snprintf(buf, sizeof(buf) - 1, "%llu\n", sum); + + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +/* + * Function to handle write request + * + * When idx = reset_cnts, reset all the counts. + */ +static ssize_t lockevent_write(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + int cpu; + + /* + * Get the counter ID stored in file->f_inode->i_private + */ + if ((long)file_inode(file)->i_private != LOCKEVENT_reset_cnts) + return count; + + for_each_possible_cpu(cpu) { + int i; + unsigned long *ptr = per_cpu_ptr(lockevents, cpu); + + for (i = 0 ; i < lockevent_num; i++) + WRITE_ONCE(ptr[i], 0); + } + return count; +} + +/* + * Debugfs data structures + */ +static const struct file_operations fops_lockevent = { + .read = lockevent_read, + .write = lockevent_write, + .llseek = default_llseek, +}; + +/* + * Initialize debugfs for the locking event counts. + */ +static int __init init_lockevent_counts(void) +{ + struct dentry *d_counts = debugfs_create_dir(LOCK_EVENTS_DIR, NULL); + int i; + + if (!d_counts) + goto out; + + /* + * Create the debugfs files + * + * As reading from and writing to the stat files can be slow, only + * root is allowed to do the read/write to limit impact to system + * performance. + */ + for (i = 0; i < lockevent_num; i++) + if (!debugfs_create_file(lockevent_names[i], 0400, d_counts, + (void *)(long)i, &fops_lockevent)) + goto fail_undo; + + if (!debugfs_create_file(lockevent_names[LOCKEVENT_reset_cnts], 0200, + d_counts, (void *)(long)LOCKEVENT_reset_cnts, + &fops_lockevent)) + goto fail_undo; + + return 0; +fail_undo: + debugfs_remove_recursive(d_counts); +out: + pr_warn("Could not create '%s' debugfs entries\n", LOCK_EVENTS_DIR); + return -ENOMEM; +} +fs_initcall(init_lockevent_counts); diff --git a/kernel/locking/lock_events.h b/kernel/locking/lock_events.h index 4009e07..d965470 100644 --- a/kernel/locking/lock_events.h +++ b/kernel/locking/lock_events.h @@ -21,7 +21,7 @@ enum lock_events { LOCKEVENT_reset_cnts = lockevent_num, }; -#ifdef CONFIG_QUEUED_LOCK_STAT +#ifdef CONFIG_LOCK_EVENT_COUNTS /* * Per-cpu counters */ @@ -46,10 +46,10 @@ static inline void __lockevent_add(enum lock_events event, int inc) #define lockevent_add(ev, c) __lockevent_add(LOCKEVENT_ ##ev, c) -#else /* CONFIG_QUEUED_LOCK_STAT */ +#else /* CONFIG_LOCK_EVENT_COUNTS */ #define lockevent_inc(ev) #define lockevent_add(ev, c) #define lockevent_cond_inc(ev, c) -#endif /* CONFIG_QUEUED_LOCK_STAT */ +#endif /* CONFIG_LOCK_EVENT_COUNTS */ diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h index 1db5b37..5415267 100644 --- a/kernel/locking/qspinlock_stat.h +++ b/kernel/locking/qspinlock_stat.h @@ -9,76 +9,29 @@ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * - * Authors: Waiman Long + * Authors: Waiman Long */ -/* - * When queued spinlock statistical counters are enabled, the following - * debugfs files will be created for reporting the counter values: - * - * /qlockstat/ - * pv_hash_hops - average # of hops per hashing operation - * pv_kick_unlock - # of vCPU kicks issued at unlock time - * pv_kick_wake - # of vCPU kicks used for computing pv_latency_wake - * pv_latency_kick - average latency (ns) of vCPU kick operation - * pv_latency_wake - average latency (ns) from vCPU kick to wakeup - * pv_lock_stealing - # of lock stealing operations - * pv_spurious_wakeup - # of spurious wakeups in non-head vCPUs - * pv_wait_again - # of wait's after a queue head vCPU kick - * pv_wait_early - # of early vCPU wait's - * pv_wait_head - # of vCPU wait's at the queue head - * pv_wait_node - # of vCPU wait's at a non-head queue node - * lock_pending - # of locking operations via pending code - * lock_slowpath - # of locking operations via MCS lock queue - * lock_use_node2 - # of locking operations that use 2nd per-CPU node - * lock_use_node3 - # of locking operations that use 3rd per-CPU node - * lock_use_node4 - # of locking operations that use 4th per-CPU node - * lock_no_node - # of locking operations without using per-CPU node - * - * Subtracting lock_use_node[234] from lock_slowpath will give you - * lock_use_node1. - * - * Writing to the special ".reset_counts" file will reset all the above - * counter values. - * - * These statistical counters are implemented as per-cpu variables which are - * summed and computed whenever the corresponding debugfs files are read. This - * minimizes added overhead making the counters usable even in a production - * environment. - * - * There may be slight difference between pv_kick_wake and pv_kick_unlock. - */ #include "lock_events.h" -#ifdef CONFIG_QUEUED_LOCK_STAT +#ifdef CONFIG_LOCK_EVENT_COUNTS +#ifdef CONFIG_PARAVIRT_SPINLOCKS /* - * Collect pvqspinlock statistics + * Collect pvqspinlock locking event counts */ -#include #include #include #include #define EVENT_COUNT(ev) lockevents[LOCKEVENT_ ## ev] -#undef LOCK_EVENT -#define LOCK_EVENT(name) [LOCKEVENT_ ## name] = #name, - -static const char * const lockevent_names[lockevent_num + 1] = { - -#include "lock_events_list.h" - - [LOCKEVENT_reset_cnts] = ".reset_counts", -}; - /* - * Per-cpu counters + * PV specific per-cpu counter */ -DEFINE_PER_CPU(unsigned long, lockevents[lockevent_num]); static DEFINE_PER_CPU(u64, pv_kick_time); /* - * Function to read and return the qlock statistical counter values + * Function to read and return the PV qspinlock counts. * * The following counters are handled specially: * 1. pv_latency_kick @@ -88,8 +41,8 @@ * 3. pv_hash_hops * Average hops/hash = pv_hash_hops/pv_kick_unlock */ -static ssize_t lockevent_read(struct file *file, char __user *user_buf, - size_t count, loff_t *ppos) +ssize_t lockevent_read(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) { char buf[64]; int cpu, id, len; @@ -150,78 +103,6 @@ static ssize_t lockevent_read(struct file *file, char __user *user_buf, } /* - * Function to handle write request - * - * When id = .reset_cnts, reset all the counter values. - */ -static ssize_t lockevent_write(struct file *file, const char __user *user_buf, - size_t count, loff_t *ppos) -{ - int cpu; - - /* - * Get the counter ID stored in file->f_inode->i_private - */ - if ((long)file_inode(file)->i_private != LOCKEVENT_reset_cnts) - return count; - - for_each_possible_cpu(cpu) { - int i; - unsigned long *ptr = per_cpu_ptr(lockevents, cpu); - - for (i = 0 ; i < lockevent_num; i++) - WRITE_ONCE(ptr[i], 0); - } - return count; -} - -/* - * Debugfs data structures - */ -static const struct file_operations fops_lockevent = { - .read = lockevent_read, - .write = lockevent_write, - .llseek = default_llseek, -}; - -/* - * Initialize debugfs for the qspinlock statistical counters - */ -static int __init init_qspinlock_stat(void) -{ - struct dentry *d_counts = debugfs_create_dir("qlockstat", NULL); - int i; - - if (!d_counts) - goto out; - - /* - * Create the debugfs files - * - * As reading from and writing to the stat files can be slow, only - * root is allowed to do the read/write to limit impact to system - * performance. - */ - for (i = 0; i < lockevent_num; i++) - if (!debugfs_create_file(lockevent_names[i], 0400, d_counts, - (void *)(long)i, &fops_lockevent)) - goto fail_undo; - - if (!debugfs_create_file(lockevent_names[LOCKEVENT_reset_cnts], 0200, - d_counts, (void *)(long)LOCKEVENT_reset_cnts, - &fops_lockevent)) - goto fail_undo; - - return 0; -fail_undo: - debugfs_remove_recursive(d_counts); -out: - pr_warn("Could not create 'qlockstat' debugfs entries\n"); - return -ENOMEM; -} -fs_initcall(init_qspinlock_stat); - -/* * PV hash hop count */ static inline void lockevent_pv_hop(int hopcnt) @@ -260,8 +141,10 @@ static inline void __pv_wait(u8 *ptr, u8 val) #define pv_kick(c) __pv_kick(c) #define pv_wait(p, v) __pv_wait(p, v) -#else /* CONFIG_QUEUED_LOCK_STAT */ +#endif /* CONFIG_PARAVIRT_SPINLOCKS */ + +#else /* CONFIG_LOCK_EVENT_COUNTS */ static inline void lockevent_pv_hop(int hopcnt) { } -#endif /* CONFIG_QUEUED_LOCK_STAT */ +#endif /* CONFIG_LOCK_EVENT_COUNTS */ From patchwork Thu Feb 7 19:07:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801995 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31076746 for ; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2158D2E068 for ; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 148C22E06C; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 981462E05F for ; Thu, 7 Feb 2019 19:11:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727138AbfBGTIv (ORCPT ); Thu, 7 Feb 2019 14:08:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:46070 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726654AbfBGTIt (ORCPT ); Thu, 7 Feb 2019 14:08:49 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6737737EE7; Thu, 7 Feb 2019 19:08:47 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 550C262FA3; Thu, 7 Feb 2019 19:08:45 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 03/22] locking/rwsem: Relocate rwsem_down_read_failed() Date: Thu, 7 Feb 2019 14:07:07 -0500 Message-Id: <1549566446-27967-4-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 07 Feb 2019 19:08:48 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The rwsem_down_read_failed*() functions were relocted from above the optimistic spinning section to below that section. This enables the reader functions to use optimisitic spinning in future patches. There is no code change. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 172 ++++++++++++++++++++++---------------------- 1 file changed, 86 insertions(+), 86 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index fbe9634..0d518b5 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -225,92 +225,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, } /* - * Wait for the read lock to be granted - */ -static inline struct rw_semaphore __sched * -__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state) -{ - long count, adjustment = -RWSEM_ACTIVE_READ_BIAS; - struct rwsem_waiter waiter; - DEFINE_WAKE_Q(wake_q); - - waiter.task = current; - waiter.type = RWSEM_WAITING_FOR_READ; - - raw_spin_lock_irq(&sem->wait_lock); - if (list_empty(&sem->wait_list)) { - /* - * In case the wait queue is empty and the lock isn't owned - * by a writer, this reader can exit the slowpath and return - * immediately as its RWSEM_ACTIVE_READ_BIAS has already - * been set in the count. - */ - if (atomic_long_read(&sem->count) >= 0) { - raw_spin_unlock_irq(&sem->wait_lock); - return sem; - } - adjustment += RWSEM_WAITING_BIAS; - } - list_add_tail(&waiter.list, &sem->wait_list); - - /* we're now waiting on the lock, but no longer actively locking */ - count = atomic_long_add_return(adjustment, &sem->count); - - /* - * If there are no active locks, wake the front queued process(es). - * - * If there are no writers and we are first in the queue, - * wake our own waiter to join the existing active readers ! - */ - if (count == RWSEM_WAITING_BIAS || - (count > RWSEM_WAITING_BIAS && - adjustment != -RWSEM_ACTIVE_READ_BIAS)) - __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); - - raw_spin_unlock_irq(&sem->wait_lock); - wake_up_q(&wake_q); - - /* wait to be given the lock */ - while (true) { - set_current_state(state); - if (!waiter.task) - break; - if (signal_pending_state(state, current)) { - raw_spin_lock_irq(&sem->wait_lock); - if (waiter.task) - goto out_nolock; - raw_spin_unlock_irq(&sem->wait_lock); - break; - } - schedule(); - } - - __set_current_state(TASK_RUNNING); - return sem; -out_nolock: - list_del(&waiter.list); - if (list_empty(&sem->wait_list)) - atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count); - raw_spin_unlock_irq(&sem->wait_lock); - __set_current_state(TASK_RUNNING); - return ERR_PTR(-EINTR); -} - -__visible struct rw_semaphore * __sched -rwsem_down_read_failed(struct rw_semaphore *sem) -{ - return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE); -} -EXPORT_SYMBOL(rwsem_down_read_failed); - -__visible struct rw_semaphore * __sched -rwsem_down_read_failed_killable(struct rw_semaphore *sem) -{ - return __rwsem_down_read_failed_common(sem, TASK_KILLABLE); -} -EXPORT_SYMBOL(rwsem_down_read_failed_killable); - -/* * This function must be called with the sem->wait_lock held to prevent * race conditions between checking the rwsem wait list and setting the * sem->count accordingly. @@ -505,6 +419,92 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) #endif /* + * Wait for the read lock to be granted + */ +static inline struct rw_semaphore __sched * +__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state) +{ + long count, adjustment = -RWSEM_ACTIVE_READ_BIAS; + struct rwsem_waiter waiter; + DEFINE_WAKE_Q(wake_q); + + waiter.task = current; + waiter.type = RWSEM_WAITING_FOR_READ; + + raw_spin_lock_irq(&sem->wait_lock); + if (list_empty(&sem->wait_list)) { + /* + * In case the wait queue is empty and the lock isn't owned + * by a writer, this reader can exit the slowpath and return + * immediately as its RWSEM_ACTIVE_READ_BIAS has already + * been set in the count. + */ + if (atomic_long_read(&sem->count) >= 0) { + raw_spin_unlock_irq(&sem->wait_lock); + return sem; + } + adjustment += RWSEM_WAITING_BIAS; + } + list_add_tail(&waiter.list, &sem->wait_list); + + /* we're now waiting on the lock, but no longer actively locking */ + count = atomic_long_add_return(adjustment, &sem->count); + + /* + * If there are no active locks, wake the front queued process(es). + * + * If there are no writers and we are first in the queue, + * wake our own waiter to join the existing active readers ! + */ + if (count == RWSEM_WAITING_BIAS || + (count > RWSEM_WAITING_BIAS && + adjustment != -RWSEM_ACTIVE_READ_BIAS)) + __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); + + raw_spin_unlock_irq(&sem->wait_lock); + wake_up_q(&wake_q); + + /* wait to be given the lock */ + while (true) { + set_current_state(state); + if (!waiter.task) + break; + if (signal_pending_state(state, current)) { + raw_spin_lock_irq(&sem->wait_lock); + if (waiter.task) + goto out_nolock; + raw_spin_unlock_irq(&sem->wait_lock); + break; + } + schedule(); + } + + __set_current_state(TASK_RUNNING); + return sem; +out_nolock: + list_del(&waiter.list); + if (list_empty(&sem->wait_list)) + atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count); + raw_spin_unlock_irq(&sem->wait_lock); + __set_current_state(TASK_RUNNING); + return ERR_PTR(-EINTR); +} + +__visible struct rw_semaphore * __sched +rwsem_down_read_failed(struct rw_semaphore *sem) +{ + return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE); +} +EXPORT_SYMBOL(rwsem_down_read_failed); + +__visible struct rw_semaphore * __sched +rwsem_down_read_failed_killable(struct rw_semaphore *sem) +{ + return __rwsem_down_read_failed_common(sem, TASK_KILLABLE); +} +EXPORT_SYMBOL(rwsem_down_read_failed_killable); + +/* * Wait until we successfully acquire the write lock */ static inline struct rw_semaphore * From patchwork Thu Feb 7 19:07:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801997 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 617171669 for ; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 507642CD35 for ; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4451F2E06C; Thu, 7 Feb 2019 19:11:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 54FCE2CD35 for ; Thu, 7 Feb 2019 19:11:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727194AbfBGTJD (ORCPT ); Thu, 7 Feb 2019 14:09:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:44350 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbfBGTJC (ORCPT ); Thu, 7 Feb 2019 14:09:02 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D105EE6A87; Thu, 7 Feb 2019 19:09:00 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 80B8860FDF; Thu, 7 Feb 2019 19:08:47 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 04/22] locking/rwsem: Remove arch specific rwsem files Date: Thu, 7 Feb 2019 14:07:08 -0500 Message-Id: <1549566446-27967-5-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Thu, 07 Feb 2019 19:09:01 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP As the generic rwsem-xadd code is using the appropriate acquire and release versions of the atomic operations, the arch specific rwsem.h files will not be that much faster than the generic code. So we can remove those arch specific rwsem.h and stop building asm/rwsem.h to reduce maintenance effort. The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h as no other code other than those under kernel/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 ----------------------------------- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include/asm/Kbuild | 1 - arch/ia64/include/asm/rwsem.h | 172 ----------------------------- arch/powerpc/include/asm/Kbuild | 1 - arch/s390/include/asm/Kbuild | 1 - arch/sh/include/asm/Kbuild | 1 - arch/sparc/include/asm/Kbuild | 1 - arch/x86/include/asm/rwsem.h | 237 ---------------------------------------- arch/x86/lib/Makefile | 1 - arch/x86/lib/rwsem.S | 156 -------------------------- arch/xtensa/include/asm/Kbuild | 1 - include/asm-generic/rwsem.h | 140 ------------------------ include/linux/rwsem.h | 4 +- kernel/locking/percpu-rwsem.c | 2 + kernel/locking/rwsem.h | 130 ++++++++++++++++++++++ 18 files changed, 133 insertions(+), 929 deletions(-) delete mode 100644 arch/alpha/include/asm/rwsem.h delete mode 100644 arch/ia64/include/asm/rwsem.h delete mode 100644 arch/x86/include/asm/rwsem.h delete mode 100644 arch/x86/lib/rwsem.S delete mode 100644 include/asm-generic/rwsem.h diff --git a/MAINTAINERS b/MAINTAINERS index 08e989a..0b921f2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8922,7 +8922,6 @@ F: arch/*/include/asm/spinlock*.h F: include/linux/rwlock*.h F: include/linux/mutex*.h F: include/linux/rwsem*.h -F: arch/*/include/asm/rwsem.h F: include/linux/seqlock.h F: lib/locking*.[ch] F: kernel/locking/ diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h deleted file mode 100644 index cf8fc8f9..0000000 --- a/arch/alpha/include/asm/rwsem.h +++ /dev/null @@ -1,211 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ALPHA_RWSEM_H -#define _ALPHA_RWSEM_H - -/* - * Written by Ivan Kokshaysky , 2001. - * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h - */ - -#ifndef _LINUX_RWSEM_H -#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead" -#endif - -#ifdef __KERNEL__ - -#include - -#define RWSEM_UNLOCKED_VALUE 0x0000000000000000L -#define RWSEM_ACTIVE_BIAS 0x0000000000000001L -#define RWSEM_ACTIVE_MASK 0x00000000ffffffffL -#define RWSEM_WAITING_BIAS (-0x0000000100000000L) -#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS -#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) - -static inline int ___down_read(struct rw_semaphore *sem) -{ - long oldcount; -#ifndef CONFIG_SMP - oldcount = sem->count.counter; - sem->count.counter += RWSEM_ACTIVE_READ_BIAS; -#else - long temp; - __asm__ __volatile__( - "1: ldq_l %0,%1\n" - " addq %0,%3,%2\n" - " stq_c %2,%1\n" - " beq %2,2f\n" - " mb\n" - ".subsection 2\n" - "2: br 1b\n" - ".previous" - :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp) - :"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory"); -#endif - return (oldcount < 0); -} - -static inline void __down_read(struct rw_semaphore *sem) -{ - if (unlikely(___down_read(sem))) - rwsem_down_read_failed(sem); -} - -static inline int __down_read_killable(struct rw_semaphore *sem) -{ - if (unlikely(___down_read(sem))) - if (IS_ERR(rwsem_down_read_failed_killable(sem))) - return -EINTR; - - return 0; -} - -/* - * trylock for reading -- returns 1 if successful, 0 if contention - */ -static inline int __down_read_trylock(struct rw_semaphore *sem) -{ - long old, new, res; - - res = atomic_long_read(&sem->count); - do { - new = res + RWSEM_ACTIVE_READ_BIAS; - if (new <= 0) - break; - old = res; - res = atomic_long_cmpxchg(&sem->count, old, new); - } while (res != old); - return res >= 0 ? 1 : 0; -} - -static inline long ___down_write(struct rw_semaphore *sem) -{ - long oldcount; -#ifndef CONFIG_SMP - oldcount = sem->count.counter; - sem->count.counter += RWSEM_ACTIVE_WRITE_BIAS; -#else - long temp; - __asm__ __volatile__( - "1: ldq_l %0,%1\n" - " addq %0,%3,%2\n" - " stq_c %2,%1\n" - " beq %2,2f\n" - " mb\n" - ".subsection 2\n" - "2: br 1b\n" - ".previous" - :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp) - :"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory"); -#endif - return oldcount; -} - -static inline void __down_write(struct rw_semaphore *sem) -{ - if (unlikely(___down_write(sem))) - rwsem_down_write_failed(sem); -} - -static inline int __down_write_killable(struct rw_semaphore *sem) -{ - if (unlikely(___down_write(sem))) { - if (IS_ERR(rwsem_down_write_failed_killable(sem))) - return -EINTR; - } - - return 0; -} - -/* - * trylock for writing -- returns 1 if successful, 0 if contention - */ -static inline int __down_write_trylock(struct rw_semaphore *sem) -{ - long ret = atomic_long_cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE, - RWSEM_ACTIVE_WRITE_BIAS); - if (ret == RWSEM_UNLOCKED_VALUE) - return 1; - return 0; -} - -static inline void __up_read(struct rw_semaphore *sem) -{ - long oldcount; -#ifndef CONFIG_SMP - oldcount = sem->count.counter; - sem->count.counter -= RWSEM_ACTIVE_READ_BIAS; -#else - long temp; - __asm__ __volatile__( - " mb\n" - "1: ldq_l %0,%1\n" - " subq %0,%3,%2\n" - " stq_c %2,%1\n" - " beq %2,2f\n" - ".subsection 2\n" - "2: br 1b\n" - ".previous" - :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp) - :"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory"); -#endif - if (unlikely(oldcount < 0)) - if ((int)oldcount - RWSEM_ACTIVE_READ_BIAS == 0) - rwsem_wake(sem); -} - -static inline void __up_write(struct rw_semaphore *sem) -{ - long count; -#ifndef CONFIG_SMP - sem->count.counter -= RWSEM_ACTIVE_WRITE_BIAS; - count = sem->count.counter; -#else - long temp; - __asm__ __volatile__( - " mb\n" - "1: ldq_l %0,%1\n" - " subq %0,%3,%2\n" - " stq_c %2,%1\n" - " beq %2,2f\n" - " subq %0,%3,%0\n" - ".subsection 2\n" - "2: br 1b\n" - ".previous" - :"=&r" (count), "=m" (sem->count), "=&r" (temp) - :"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory"); -#endif - if (unlikely(count)) - if ((int)count == 0) - rwsem_wake(sem); -} - -/* - * downgrade write lock to read lock - */ -static inline void __downgrade_write(struct rw_semaphore *sem) -{ - long oldcount; -#ifndef CONFIG_SMP - oldcount = sem->count.counter; - sem->count.counter -= RWSEM_WAITING_BIAS; -#else - long temp; - __asm__ __volatile__( - "1: ldq_l %0,%1\n" - " addq %0,%3,%2\n" - " stq_c %2,%1\n" - " beq %2,2f\n" - " mb\n" - ".subsection 2\n" - "2: br 1b\n" - ".previous" - :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp) - :"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory"); -#endif - if (unlikely(oldcount < 0)) - rwsem_downgrade_wake(sem); -} - -#endif /* __KERNEL__ */ -#endif /* _ALPHA_RWSEM_H */ diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 1d66db9..989e1a7 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -12,7 +12,6 @@ generic-y += mm-arch-hooks.h generic-y += msi.h generic-y += parport.h generic-y += preempt.h -generic-y += rwsem.h generic-y += seccomp.h generic-y += segment.h generic-y += serial.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 1e17ea5..60a933b 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -16,7 +16,6 @@ generic-y += mm-arch-hooks.h generic-y += msi.h generic-y += qrwlock.h generic-y += qspinlock.h -generic-y += rwsem.h generic-y += segment.h generic-y += serial.h generic-y += set_memory.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild index b25fd42..0a12feb 100644 --- a/arch/hexagon/include/asm/Kbuild +++ b/arch/hexagon/include/asm/Kbuild @@ -26,7 +26,6 @@ generic-y += mm-arch-hooks.h generic-y += pci.h generic-y += percpu.h generic-y += preempt.h -generic-y += rwsem.h generic-y += sections.h generic-y += segment.h generic-y += serial.h diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h deleted file mode 100644 index 9179106..0000000 --- a/arch/ia64/include/asm/rwsem.h +++ /dev/null @@ -1,172 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * R/W semaphores for ia64 - * - * Copyright (C) 2003 Ken Chen - * Copyright (C) 2003 Asit Mallick - * Copyright (C) 2005 Christoph Lameter - * - * Based on asm-i386/rwsem.h and other architecture implementation. - * - * The MSW of the count is the negated number of active writers and - * waiting lockers, and the LSW is the total number of active locks. - * - * The lock count is initialized to 0 (no active and no waiting lockers). - * - * When a writer subtracts WRITE_BIAS, it'll get 0xffffffff00000001 for - * the case of an uncontended lock. Readers increment by 1 and see a positive - * value when uncontended, negative if there are writers (and maybe) readers - * waiting (in which case it goes to sleep). - */ - -#ifndef _ASM_IA64_RWSEM_H -#define _ASM_IA64_RWSEM_H - -#ifndef _LINUX_RWSEM_H -#error "Please don't include directly, use instead." -#endif - -#include - -#define RWSEM_UNLOCKED_VALUE __IA64_UL_CONST(0x0000000000000000) -#define RWSEM_ACTIVE_BIAS (1L) -#define RWSEM_ACTIVE_MASK (0xffffffffL) -#define RWSEM_WAITING_BIAS (-0x100000000L) -#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS -#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) - -/* - * lock for reading - */ -static inline int -___down_read (struct rw_semaphore *sem) -{ - long result = ia64_fetchadd8_acq((unsigned long *)&sem->count.counter, 1); - - return (result < 0); -} - -static inline void -__down_read (struct rw_semaphore *sem) -{ - if (___down_read(sem)) - rwsem_down_read_failed(sem); -} - -static inline int -__down_read_killable (struct rw_semaphore *sem) -{ - if (___down_read(sem)) - if (IS_ERR(rwsem_down_read_failed_killable(sem))) - return -EINTR; - - return 0; -} - -/* - * lock for writing - */ -static inline long -___down_write (struct rw_semaphore *sem) -{ - long old, new; - - do { - old = atomic_long_read(&sem->count); - new = old + RWSEM_ACTIVE_WRITE_BIAS; - } while (atomic_long_cmpxchg_acquire(&sem->count, old, new) != old); - - return old; -} - -static inline void -__down_write (struct rw_semaphore *sem) -{ - if (___down_write(sem)) - rwsem_down_write_failed(sem); -} - -static inline int -__down_write_killable (struct rw_semaphore *sem) -{ - if (___down_write(sem)) { - if (IS_ERR(rwsem_down_write_failed_killable(sem))) - return -EINTR; - } - - return 0; -} - -/* - * unlock after reading - */ -static inline void -__up_read (struct rw_semaphore *sem) -{ - long result = ia64_fetchadd8_rel((unsigned long *)&sem->count.counter, -1); - - if (result < 0 && (--result & RWSEM_ACTIVE_MASK) == 0) - rwsem_wake(sem); -} - -/* - * unlock after writing - */ -static inline void -__up_write (struct rw_semaphore *sem) -{ - long old, new; - - do { - old = atomic_long_read(&sem->count); - new = old - RWSEM_ACTIVE_WRITE_BIAS; - } while (atomic_long_cmpxchg_release(&sem->count, old, new) != old); - - if (new < 0 && (new & RWSEM_ACTIVE_MASK) == 0) - rwsem_wake(sem); -} - -/* - * trylock for reading -- returns 1 if successful, 0 if contention - */ -static inline int -__down_read_trylock (struct rw_semaphore *sem) -{ - long tmp; - while ((tmp = atomic_long_read(&sem->count)) >= 0) { - if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, tmp+1)) { - return 1; - } - } - return 0; -} - -/* - * trylock for writing -- returns 1 if successful, 0 if contention - */ -static inline int -__down_write_trylock (struct rw_semaphore *sem) -{ - long tmp = atomic_long_cmpxchg_acquire(&sem->count, - RWSEM_UNLOCKED_VALUE, RWSEM_ACTIVE_WRITE_BIAS); - return tmp == RWSEM_UNLOCKED_VALUE; -} - -/* - * downgrade write lock to read lock - */ -static inline void -__downgrade_write (struct rw_semaphore *sem) -{ - long old, new; - - do { - old = atomic_long_read(&sem->count); - new = old - RWSEM_WAITING_BIAS; - } while (atomic_long_cmpxchg_release(&sem->count, old, new) != old); - - if (old < 0) - rwsem_downgrade_wake(sem); -} - -#endif /* _ASM_IA64_RWSEM_H */ diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index 77ff7fb..75bd77a 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -9,6 +9,5 @@ generic-y += irq_work.h generic-y += local64.h generic-y += mcs_spinlock.h generic-y += preempt.h -generic-y += rwsem.h generic-y += vtime.h generic-y += msi.h diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index e323977..4b77b42 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -21,7 +21,6 @@ generic-y += local64.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h generic-y += preempt.h -generic-y += rwsem.h generic-y += trace_clock.h generic-y += unaligned.h generic-y += word-at-a-time.h diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild index a6ef3fe..b89fce4 100644 --- a/arch/sh/include/asm/Kbuild +++ b/arch/sh/include/asm/Kbuild @@ -16,7 +16,6 @@ generic-y += mm-arch-hooks.h generic-y += parport.h generic-y += percpu.h generic-y += preempt.h -generic-y += rwsem.h generic-y += serial.h generic-y += sizes.h generic-y += trace_clock.h diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index b82f64e..e843fc0 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -17,7 +17,6 @@ generic-y += mm-arch-hooks.h generic-y += module.h generic-y += msi.h generic-y += preempt.h -generic-y += rwsem.h generic-y += serial.h generic-y += trace_clock.h generic-y += word-at-a-time.h diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h deleted file mode 100644 index 4c25cf6..0000000 --- a/arch/x86/include/asm/rwsem.h +++ /dev/null @@ -1,237 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* rwsem.h: R/W semaphores implemented using XADD/CMPXCHG for i486+ - * - * Written by David Howells (dhowells@redhat.com). - * - * Derived from asm-x86/semaphore.h - * - * - * The MSW of the count is the negated number of active writers and waiting - * lockers, and the LSW is the total number of active locks - * - * The lock count is initialized to 0 (no active and no waiting lockers). - * - * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an - * uncontended lock. This can be determined because XADD returns the old value. - * Readers increment by 1 and see a positive value when uncontended, negative - * if there are writers (and maybe) readers waiting (in which case it goes to - * sleep). - * - * The value of WAITING_BIAS supports up to 32766 waiting processes. This can - * be extended to 65534 by manually checking the whole MSW rather than relying - * on the S flag. - * - * The value of ACTIVE_BIAS supports up to 65535 active processes. - * - * This should be totally fair - if anything is waiting, a process that wants a - * lock will go to the back of the queue. When the currently active lock is - * released, if there's a writer at the front of the queue, then that and only - * that will be woken up; if there's a bunch of consecutive readers at the - * front, then they'll all be woken up, but no other readers will be. - */ - -#ifndef _ASM_X86_RWSEM_H -#define _ASM_X86_RWSEM_H - -#ifndef _LINUX_RWSEM_H -#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead" -#endif - -#ifdef __KERNEL__ -#include - -/* - * The bias values and the counter type limits the number of - * potential readers/writers to 32767 for 32 bits and 2147483647 - * for 64 bits. - */ - -#ifdef CONFIG_X86_64 -# define RWSEM_ACTIVE_MASK 0xffffffffL -#else -# define RWSEM_ACTIVE_MASK 0x0000ffffL -#endif - -#define RWSEM_UNLOCKED_VALUE 0x00000000L -#define RWSEM_ACTIVE_BIAS 0x00000001L -#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1) -#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS -#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) - -/* - * lock for reading - */ -#define ____down_read(sem, slow_path) \ -({ \ - struct rw_semaphore* ret; \ - asm volatile("# beginning down_read\n\t" \ - LOCK_PREFIX _ASM_INC "(%[sem])\n\t" \ - /* adds 0x00000001 */ \ - " jns 1f\n" \ - " call " slow_path "\n" \ - "1:\n\t" \ - "# ending down_read\n\t" \ - : "+m" (sem->count), "=a" (ret), \ - ASM_CALL_CONSTRAINT \ - : [sem] "a" (sem) \ - : "memory", "cc"); \ - ret; \ -}) - -static inline void __down_read(struct rw_semaphore *sem) -{ - ____down_read(sem, "call_rwsem_down_read_failed"); -} - -static inline int __down_read_killable(struct rw_semaphore *sem) -{ - if (IS_ERR(____down_read(sem, "call_rwsem_down_read_failed_killable"))) - return -EINTR; - return 0; -} - -/* - * trylock for reading -- returns 1 if successful, 0 if contention - */ -static inline bool __down_read_trylock(struct rw_semaphore *sem) -{ - long result, tmp; - asm volatile("# beginning __down_read_trylock\n\t" - " mov %[count],%[result]\n\t" - "1:\n\t" - " mov %[result],%[tmp]\n\t" - " add %[inc],%[tmp]\n\t" - " jle 2f\n\t" - LOCK_PREFIX " cmpxchg %[tmp],%[count]\n\t" - " jnz 1b\n\t" - "2:\n\t" - "# ending __down_read_trylock\n\t" - : [count] "+m" (sem->count), [result] "=&a" (result), - [tmp] "=&r" (tmp) - : [inc] "i" (RWSEM_ACTIVE_READ_BIAS) - : "memory", "cc"); - return result >= 0; -} - -/* - * lock for writing - */ -#define ____down_write(sem, slow_path) \ -({ \ - long tmp; \ - struct rw_semaphore* ret; \ - \ - asm volatile("# beginning down_write\n\t" \ - LOCK_PREFIX " xadd %[tmp],(%[sem])\n\t" \ - /* adds 0xffff0001, returns the old value */ \ - " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \ - /* was the active mask 0 before? */\ - " jz 1f\n" \ - " call " slow_path "\n" \ - "1:\n" \ - "# ending down_write" \ - : "+m" (sem->count), [tmp] "=d" (tmp), \ - "=a" (ret), ASM_CALL_CONSTRAINT \ - : [sem] "a" (sem), "[tmp]" (RWSEM_ACTIVE_WRITE_BIAS) \ - : "memory", "cc"); \ - ret; \ -}) - -static inline void __down_write(struct rw_semaphore *sem) -{ - ____down_write(sem, "call_rwsem_down_write_failed"); -} - -static inline int __down_write_killable(struct rw_semaphore *sem) -{ - if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable"))) - return -EINTR; - - return 0; -} - -/* - * trylock for writing -- returns 1 if successful, 0 if contention - */ -static inline bool __down_write_trylock(struct rw_semaphore *sem) -{ - bool result; - long tmp0, tmp1; - asm volatile("# beginning __down_write_trylock\n\t" - " mov %[count],%[tmp0]\n\t" - "1:\n\t" - " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" - /* was the active mask 0 before? */ - " jnz 2f\n\t" - " mov %[tmp0],%[tmp1]\n\t" - " add %[inc],%[tmp1]\n\t" - LOCK_PREFIX " cmpxchg %[tmp1],%[count]\n\t" - " jnz 1b\n\t" - "2:\n\t" - CC_SET(e) - "# ending __down_write_trylock\n\t" - : [count] "+m" (sem->count), [tmp0] "=&a" (tmp0), - [tmp1] "=&r" (tmp1), CC_OUT(e) (result) - : [inc] "er" (RWSEM_ACTIVE_WRITE_BIAS) - : "memory"); - return result; -} - -/* - * unlock after reading - */ -static inline void __up_read(struct rw_semaphore *sem) -{ - long tmp; - asm volatile("# beginning __up_read\n\t" - LOCK_PREFIX " xadd %[tmp],(%[sem])\n\t" - /* subtracts 1, returns the old value */ - " jns 1f\n\t" - " call call_rwsem_wake\n" /* expects old value in %edx */ - "1:\n" - "# ending __up_read\n" - : "+m" (sem->count), [tmp] "=d" (tmp) - : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_READ_BIAS) - : "memory", "cc"); -} - -/* - * unlock after writing - */ -static inline void __up_write(struct rw_semaphore *sem) -{ - long tmp; - asm volatile("# beginning __up_write\n\t" - LOCK_PREFIX " xadd %[tmp],(%[sem])\n\t" - /* subtracts 0xffff0001, returns the old value */ - " jns 1f\n\t" - " call call_rwsem_wake\n" /* expects old value in %edx */ - "1:\n\t" - "# ending __up_write\n" - : "+m" (sem->count), [tmp] "=d" (tmp) - : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_WRITE_BIAS) - : "memory", "cc"); -} - -/* - * downgrade write lock to read lock - */ -static inline void __downgrade_write(struct rw_semaphore *sem) -{ - asm volatile("# beginning __downgrade_write\n\t" - LOCK_PREFIX _ASM_ADD "%[inc],(%[sem])\n\t" - /* - * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386) - * 0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64) - */ - " jns 1f\n\t" - " call call_rwsem_downgrade_wake\n" - "1:\n\t" - "# ending __downgrade_write\n" - : "+m" (sem->count) - : [sem] "a" (sem), [inc] "er" (-RWSEM_WAITING_BIAS) - : "memory", "cc"); -} - -#endif /* __KERNEL__ */ -#endif /* _ASM_X86_RWSEM_H */ diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 140e618..9866520 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -23,7 +23,6 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o lib-y := delay.o misc.o cmdline.o cpu.o lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o lib-y += memcpy_$(BITS).o -lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S deleted file mode 100644 index dc2ab6e..0000000 --- a/arch/x86/lib/rwsem.S +++ /dev/null @@ -1,156 +0,0 @@ -/* - * x86 semaphore implementation. - * - * (C) Copyright 1999 Linus Torvalds - * - * Portions Copyright 1999 Red Hat, Inc. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - * rw semaphores implemented November 1999 by Benjamin LaHaise - */ - -#include -#include -#include - -#define __ASM_HALF_REG(reg) __ASM_SEL(reg, e##reg) -#define __ASM_HALF_SIZE(inst) __ASM_SEL(inst##w, inst##l) - -#ifdef CONFIG_X86_32 - -/* - * The semaphore operations have a special calling sequence that - * allow us to do a simpler in-line version of them. These routines - * need to convert that sequence back into the C sequence when - * there is contention on the semaphore. - * - * %eax contains the semaphore pointer on entry. Save the C-clobbered - * registers (%eax, %edx and %ecx) except %eax which is either a return - * value or just gets clobbered. Same is true for %edx so make sure GCC - * reloads it after the slow path, by making it hold a temporary, for - * example see ____down_write(). - */ - -#define save_common_regs \ - pushl %ecx - -#define restore_common_regs \ - popl %ecx - - /* Avoid uglifying the argument copying x86-64 needs to do. */ - .macro movq src, dst - .endm - -#else - -/* - * x86-64 rwsem wrappers - * - * This interfaces the inline asm code to the slow-path - * C routines. We need to save the call-clobbered regs - * that the asm does not mark as clobbered, and move the - * argument from %rax to %rdi. - * - * NOTE! We don't need to save %rax, because the functions - * will always return the semaphore pointer in %rax (which - * is also the input argument to these helpers) - * - * The following can clobber %rdx because the asm clobbers it: - * call_rwsem_down_write_failed - * call_rwsem_wake - * but %rdi, %rsi, %rcx, %r8-r11 always need saving. - */ - -#define save_common_regs \ - pushq %rdi; \ - pushq %rsi; \ - pushq %rcx; \ - pushq %r8; \ - pushq %r9; \ - pushq %r10; \ - pushq %r11 - -#define restore_common_regs \ - popq %r11; \ - popq %r10; \ - popq %r9; \ - popq %r8; \ - popq %rcx; \ - popq %rsi; \ - popq %rdi - -#endif - -/* Fix up special calling conventions */ -ENTRY(call_rwsem_down_read_failed) - FRAME_BEGIN - save_common_regs - __ASM_SIZE(push,) %__ASM_REG(dx) - movq %rax,%rdi - call rwsem_down_read_failed - __ASM_SIZE(pop,) %__ASM_REG(dx) - restore_common_regs - FRAME_END - ret -ENDPROC(call_rwsem_down_read_failed) - -ENTRY(call_rwsem_down_read_failed_killable) - FRAME_BEGIN - save_common_regs - __ASM_SIZE(push,) %__ASM_REG(dx) - movq %rax,%rdi - call rwsem_down_read_failed_killable - __ASM_SIZE(pop,) %__ASM_REG(dx) - restore_common_regs - FRAME_END - ret -ENDPROC(call_rwsem_down_read_failed_killable) - -ENTRY(call_rwsem_down_write_failed) - FRAME_BEGIN - save_common_regs - movq %rax,%rdi - call rwsem_down_write_failed - restore_common_regs - FRAME_END - ret -ENDPROC(call_rwsem_down_write_failed) - -ENTRY(call_rwsem_down_write_failed_killable) - FRAME_BEGIN - save_common_regs - movq %rax,%rdi - call rwsem_down_write_failed_killable - restore_common_regs - FRAME_END - ret -ENDPROC(call_rwsem_down_write_failed_killable) - -ENTRY(call_rwsem_wake) - FRAME_BEGIN - /* do nothing if still outstanding active readers */ - __ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx) - jnz 1f - save_common_regs - movq %rax,%rdi - call rwsem_wake - restore_common_regs -1: FRAME_END - ret -ENDPROC(call_rwsem_wake) - -ENTRY(call_rwsem_downgrade_wake) - FRAME_BEGIN - save_common_regs - __ASM_SIZE(push,) %__ASM_REG(dx) - movq %rax,%rdi - call rwsem_downgrade_wake - __ASM_SIZE(pop,) %__ASM_REG(dx) - restore_common_regs - FRAME_END - ret -ENDPROC(call_rwsem_downgrade_wake) diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild index e255683..7026b37 100644 --- a/arch/xtensa/include/asm/Kbuild +++ b/arch/xtensa/include/asm/Kbuild @@ -23,7 +23,6 @@ generic-y += mm-arch-hooks.h generic-y += param.h generic-y += percpu.h generic-y += preempt.h -generic-y += rwsem.h generic-y += sections.h generic-y += topology.h generic-y += trace_clock.h diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h deleted file mode 100644 index 93e67a0..0000000 --- a/include/asm-generic/rwsem.h +++ /dev/null @@ -1,140 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ASM_GENERIC_RWSEM_H -#define _ASM_GENERIC_RWSEM_H - -#ifndef _LINUX_RWSEM_H -#error "Please don't include directly, use instead." -#endif - -#ifdef __KERNEL__ - -/* - * R/W semaphores originally for PPC using the stuff in lib/rwsem.c. - * Adapted largely from include/asm-i386/rwsem.h - * by Paul Mackerras . - */ - -/* - * the semaphore definition - */ -#ifdef CONFIG_64BIT -# define RWSEM_ACTIVE_MASK 0xffffffffL -#else -# define RWSEM_ACTIVE_MASK 0x0000ffffL -#endif - -#define RWSEM_UNLOCKED_VALUE 0x00000000L -#define RWSEM_ACTIVE_BIAS 0x00000001L -#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1) -#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS -#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) - -/* - * lock for reading - */ -static inline void __down_read(struct rw_semaphore *sem) -{ - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) - rwsem_down_read_failed(sem); -} - -static inline int __down_read_killable(struct rw_semaphore *sem) -{ - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { - if (IS_ERR(rwsem_down_read_failed_killable(sem))) - return -EINTR; - } - - return 0; -} - -static inline int __down_read_trylock(struct rw_semaphore *sem) -{ - long tmp; - - while ((tmp = atomic_long_read(&sem->count)) >= 0) { - if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, - tmp + RWSEM_ACTIVE_READ_BIAS)) { - return 1; - } - } - return 0; -} - -/* - * lock for writing - */ -static inline void __down_write(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count); - if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) - rwsem_down_write_failed(sem); -} - -static inline int __down_write_killable(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count); - if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) - if (IS_ERR(rwsem_down_write_failed_killable(sem))) - return -EINTR; - return 0; -} - -static inline int __down_write_trylock(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, - RWSEM_ACTIVE_WRITE_BIAS); - return tmp == RWSEM_UNLOCKED_VALUE; -} - -/* - * unlock after reading - */ -static inline void __up_read(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_dec_return_release(&sem->count); - if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) - rwsem_wake(sem); -} - -/* - * unlock after writing - */ -static inline void __up_write(struct rw_semaphore *sem) -{ - if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count) < 0)) - rwsem_wake(sem); -} - -/* - * downgrade write lock to read lock - */ -static inline void __downgrade_write(struct rw_semaphore *sem) -{ - long tmp; - - /* - * When downgrading from exclusive to shared ownership, - * anything inside the write-locked region cannot leak - * into the read side. In contrast, anything in the - * read-locked region is ok to be re-ordered into the - * write side. As such, rely on RELEASE semantics. - */ - tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); - if (tmp < 0) - rwsem_downgrade_wake(sem); -} - -#endif /* __KERNEL__ */ -#endif /* _ASM_GENERIC_RWSEM_H */ diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 67dbb57..6e56006 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -57,15 +57,13 @@ struct rw_semaphore { extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *); extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem); -/* Include the arch specific part */ -#include - /* In all implementations count != 0 means locked */ static inline int rwsem_is_locked(struct rw_semaphore *sem) { return atomic_long_read(&sem->count) != 0; } +#define RWSEM_UNLOCKED_VALUE 0L #define __RWSEM_INIT_COUNT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE) #endif diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c index 883cf1b..f17dad9 100644 --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -7,6 +7,8 @@ #include #include +#include "rwsem.h" + int __percpu_init_rwsem(struct percpu_rw_semaphore *sem, const char *name, struct lock_class_key *rwsem_key) { diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h index bad2bca..067e265 100644 --- a/kernel/locking/rwsem.h +++ b/kernel/locking/rwsem.h @@ -32,6 +32,26 @@ # define DEBUG_RWSEMS_WARN_ON(c) #endif +/* + * R/W semaphores originally for PPC using the stuff in lib/rwsem.c. + * Adapted largely from include/asm-i386/rwsem.h + * by Paul Mackerras . + */ + +/* + * the semaphore definition + */ +#ifdef CONFIG_64BIT +# define RWSEM_ACTIVE_MASK 0xffffffffL +#else +# define RWSEM_ACTIVE_MASK 0x0000ffffL +#endif + +#define RWSEM_ACTIVE_BIAS 0x00000001L +#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1) +#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS +#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) + #ifdef CONFIG_RWSEM_SPIN_ON_OWNER /* * All writes to owner are protected by WRITE_ONCE() to make sure that @@ -132,3 +152,113 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) { } #endif + +#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM +/* + * lock for reading + */ +static inline void __down_read(struct rw_semaphore *sem) +{ + if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) + rwsem_down_read_failed(sem); +} + +static inline int __down_read_killable(struct rw_semaphore *sem) +{ + if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { + if (IS_ERR(rwsem_down_read_failed_killable(sem))) + return -EINTR; + } + + return 0; +} + +static inline int __down_read_trylock(struct rw_semaphore *sem) +{ + long tmp; + + while ((tmp = atomic_long_read(&sem->count)) >= 0) { + if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, + tmp + RWSEM_ACTIVE_READ_BIAS)) { + return 1; + } + } + return 0; +} + +/* + * lock for writing + */ +static inline void __down_write(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, + &sem->count); + if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) + rwsem_down_write_failed(sem); +} + +static inline int __down_write_killable(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, + &sem->count); + if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) + if (IS_ERR(rwsem_down_write_failed_killable(sem))) + return -EINTR; + return 0; +} + +static inline int __down_write_trylock(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, + RWSEM_ACTIVE_WRITE_BIAS); + return tmp == RWSEM_UNLOCKED_VALUE; +} + +/* + * unlock after reading + */ +static inline void __up_read(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_dec_return_release(&sem->count); + if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) + rwsem_wake(sem); +} + +/* + * unlock after writing + */ +static inline void __up_write(struct rw_semaphore *sem) +{ + if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, + &sem->count) < 0)) + rwsem_wake(sem); +} + +/* + * downgrade write lock to read lock + */ +static inline void __downgrade_write(struct rw_semaphore *sem) +{ + long tmp; + + /* + * When downgrading from exclusive to shared ownership, + * anything inside the write-locked region cannot leak + * into the read side. In contrast, anything in the + * read-locked region is ok to be re-ordered into the + * write side. As such, rely on RELEASE semantics. + */ + tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); + if (tmp < 0) + rwsem_downgrade_wake(sem); +} + +#endif /* CONFIG_RWSEM_XCHGADD_ALGORITHM */ From patchwork Thu Feb 7 19:07:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801943 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8ED80746 for ; Thu, 7 Feb 2019 19:09:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7ABE42CD0E for ; Thu, 7 Feb 2019 19:09:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6CC132DD93; Thu, 7 Feb 2019 19:09:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CA5E92CD0E for ; Thu, 7 Feb 2019 19:09:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727225AbfBGTJH (ORCPT ); Thu, 7 Feb 2019 14:09:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49980 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbfBGTJG (ORCPT ); Thu, 7 Feb 2019 14:09:06 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7044C10F85; Thu, 7 Feb 2019 19:09:05 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id F0DF061146; Thu, 7 Feb 2019 19:09:00 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 05/22] locking/rwsem: Move owner setting code from rwsem.c to rwsem.h Date: Thu, 7 Feb 2019 14:07:09 -0500 Message-Id: <1549566446-27967-6-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 07 Feb 2019 19:09:05 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The setting of owner field is specific to rwsem-xadd code, it is not needed for rwsem-spinlock. This patch moves all the owner setting code closer to the rwsem-xadd fast paths directly within rwsem.h file. For __down_read() and __down_read_killable(), it is assumed that rwsem will be marked as reader-owned when the functions return. That is currently true except in the transient case that the waiter queue just becomes empty. So a rwsem_set_reader_owned() call is added for this case. The __rwsem_set_reader_owned() call in __rwsem_mark_wake() is now necessary. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 6 +++--- kernel/locking/rwsem.c | 19 ++----------------- kernel/locking/rwsem.h | 17 +++++++++++++++-- 3 files changed, 20 insertions(+), 22 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 0d518b5..c213869 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -176,9 +176,8 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, goto try_reader_grant; } /* - * It is not really necessary to set it to reader-owned here, - * but it gives the spinners an early indication that the - * readers now have the lock. + * Set it to reader-owned to give spinners an early + * indication that readers now have the lock. */ __rwsem_set_reader_owned(sem, waiter->task); } @@ -441,6 +440,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) */ if (atomic_long_read(&sem->count) >= 0) { raw_spin_unlock_irq(&sem->wait_lock); + rwsem_set_reader_owned(sem); return sem; } adjustment += RWSEM_WAITING_BIAS; diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index e586f0d..59e5848 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -24,7 +24,6 @@ void __sched down_read(struct rw_semaphore *sem) rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_); LOCK_CONTENDED(sem, __down_read_trylock, __down_read); - rwsem_set_reader_owned(sem); } EXPORT_SYMBOL(down_read); @@ -39,7 +38,6 @@ int __sched down_read_killable(struct rw_semaphore *sem) return -EINTR; } - rwsem_set_reader_owned(sem); return 0; } @@ -52,10 +50,8 @@ int down_read_trylock(struct rw_semaphore *sem) { int ret = __down_read_trylock(sem); - if (ret == 1) { + if (ret == 1) rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_); - rwsem_set_reader_owned(sem); - } return ret; } @@ -70,7 +66,6 @@ void __sched down_write(struct rw_semaphore *sem) rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_); LOCK_CONTENDED(sem, __down_write_trylock, __down_write); - rwsem_set_owner(sem); } EXPORT_SYMBOL(down_write); @@ -88,7 +83,6 @@ int __sched down_write_killable(struct rw_semaphore *sem) return -EINTR; } - rwsem_set_owner(sem); return 0; } @@ -101,10 +95,8 @@ int down_write_trylock(struct rw_semaphore *sem) { int ret = __down_write_trylock(sem); - if (ret == 1) { + if (ret == 1) rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_); - rwsem_set_owner(sem); - } return ret; } @@ -119,7 +111,6 @@ void up_read(struct rw_semaphore *sem) rwsem_release(&sem->dep_map, 1, _RET_IP_); DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED)); - rwsem_clear_reader_owned(sem); __up_read(sem); } @@ -133,7 +124,6 @@ void up_write(struct rw_semaphore *sem) rwsem_release(&sem->dep_map, 1, _RET_IP_); DEBUG_RWSEMS_WARN_ON(sem->owner != current); - rwsem_clear_owner(sem); __up_write(sem); } @@ -147,7 +137,6 @@ void downgrade_write(struct rw_semaphore *sem) lock_downgrade(&sem->dep_map, _RET_IP_); DEBUG_RWSEMS_WARN_ON(sem->owner != current); - rwsem_set_reader_owned(sem); __downgrade_write(sem); } @@ -161,7 +150,6 @@ void down_read_nested(struct rw_semaphore *sem, int subclass) rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_); LOCK_CONTENDED(sem, __down_read_trylock, __down_read); - rwsem_set_reader_owned(sem); } EXPORT_SYMBOL(down_read_nested); @@ -172,7 +160,6 @@ void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest) rwsem_acquire_nest(&sem->dep_map, 0, 0, nest, _RET_IP_); LOCK_CONTENDED(sem, __down_write_trylock, __down_write); - rwsem_set_owner(sem); } EXPORT_SYMBOL(_down_write_nest_lock); @@ -193,7 +180,6 @@ void down_write_nested(struct rw_semaphore *sem, int subclass) rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_); LOCK_CONTENDED(sem, __down_write_trylock, __down_write); - rwsem_set_owner(sem); } EXPORT_SYMBOL(down_write_nested); @@ -208,7 +194,6 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass) return -EINTR; } - rwsem_set_owner(sem); return 0; } diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h index 067e265..f568391 100644 --- a/kernel/locking/rwsem.h +++ b/kernel/locking/rwsem.h @@ -161,6 +161,8 @@ static inline void __down_read(struct rw_semaphore *sem) { if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) rwsem_down_read_failed(sem); + else + rwsem_set_reader_owned(sem); } static inline int __down_read_killable(struct rw_semaphore *sem) @@ -168,8 +170,9 @@ static inline int __down_read_killable(struct rw_semaphore *sem) if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { if (IS_ERR(rwsem_down_read_failed_killable(sem))) return -EINTR; + } else { + rwsem_set_reader_owned(sem); } - return 0; } @@ -180,6 +183,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem) while ((tmp = atomic_long_read(&sem->count)) >= 0) { if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, tmp + RWSEM_ACTIVE_READ_BIAS)) { + rwsem_set_reader_owned(sem); return 1; } } @@ -197,6 +201,7 @@ static inline void __down_write(struct rw_semaphore *sem) &sem->count); if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) rwsem_down_write_failed(sem); + rwsem_set_owner(sem); } static inline int __down_write_killable(struct rw_semaphore *sem) @@ -208,6 +213,7 @@ static inline int __down_write_killable(struct rw_semaphore *sem) if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) if (IS_ERR(rwsem_down_write_failed_killable(sem))) return -EINTR; + rwsem_set_owner(sem); return 0; } @@ -217,7 +223,11 @@ static inline int __down_write_trylock(struct rw_semaphore *sem) tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, RWSEM_ACTIVE_WRITE_BIAS); - return tmp == RWSEM_UNLOCKED_VALUE; + if (tmp == RWSEM_UNLOCKED_VALUE) { + rwsem_set_owner(sem); + return true; + } + return false; } /* @@ -227,6 +237,7 @@ static inline void __up_read(struct rw_semaphore *sem) { long tmp; + rwsem_clear_reader_owned(sem); tmp = atomic_long_dec_return_release(&sem->count); if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) rwsem_wake(sem); @@ -237,6 +248,7 @@ static inline void __up_read(struct rw_semaphore *sem) */ static inline void __up_write(struct rw_semaphore *sem) { + rwsem_clear_owner(sem); if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, &sem->count) < 0)) rwsem_wake(sem); @@ -257,6 +269,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem) * write side. As such, rely on RELEASE semantics. */ tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); + rwsem_set_reader_owned(sem); if (tmp < 0) rwsem_downgrade_wake(sem); } From patchwork Thu Feb 7 19:07:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801941 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB8AB746 for ; Thu, 7 Feb 2019 19:09:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 99ADA2CCF9 for ; Thu, 7 Feb 2019 19:09:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8C4582CD9E; Thu, 7 Feb 2019 19:09:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 431FF2CD0E for ; Thu, 7 Feb 2019 19:09:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727249AbfBGTJM (ORCPT ); Thu, 7 Feb 2019 14:09:12 -0500 Received: from mx1.redhat.com ([209.132.183.28]:41636 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbfBGTJL (ORCPT ); Thu, 7 Feb 2019 14:09:11 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B8C5FC062D16; Thu, 7 Feb 2019 19:09:09 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 921AA1691C; Thu, 7 Feb 2019 19:09:05 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 06/22] locking/rwsem: Rename kernel/locking/rwsem.h Date: Thu, 7 Feb 2019 14:07:10 -0500 Message-Id: <1549566446-27967-7-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 07 Feb 2019 19:09:10 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The content of kernel/locking/rwsem.h is now specific to rwsem-xadd only. Rename it to rwsem-xadd.h to indicate that it is specific to rwsem-xadd and include it only when CONFIG_RWSEM_XCHGADD_ALGORITHM is set. Signed-off-by: Waiman Long --- kernel/locking/percpu-rwsem.c | 4 +- kernel/locking/rwsem-xadd.c | 2 +- kernel/locking/rwsem-xadd.h | 274 +++++++++++++++++++++++++++++++++++++++++ kernel/locking/rwsem.c | 7 +- kernel/locking/rwsem.h | 277 ------------------------------------------ 5 files changed, 284 insertions(+), 280 deletions(-) create mode 100644 kernel/locking/rwsem-xadd.h delete mode 100644 kernel/locking/rwsem.h diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c index f17dad9..d06f7c3 100644 --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -7,7 +7,9 @@ #include #include -#include "rwsem.h" +#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM +#include "rwsem-xadd.h" +#endif int __percpu_init_rwsem(struct percpu_rw_semaphore *sem, const char *name, struct lock_class_key *rwsem_key) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index c213869..62422a6 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -19,7 +19,7 @@ #include #include -#include "rwsem.h" +#include "rwsem-xadd.h" /* * Guide to the rw_semaphore's count field for common values. diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h new file mode 100644 index 0000000..2eaa762 --- /dev/null +++ b/kernel/locking/rwsem-xadd.h @@ -0,0 +1,274 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * The least significant 2 bits of the owner value has the following + * meanings when set. + * - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers + * - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned, + * i.e. the owner(s) cannot be readily determined. It can be reader + * owned or the owning writer is indeterminate. + * + * When a writer acquires a rwsem, it puts its task_struct pointer + * into the owner field. It is cleared after an unlock. + * + * When a reader acquires a rwsem, it will also puts its task_struct + * pointer into the owner field with both the RWSEM_READER_OWNED and + * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will + * largely be left untouched. So for a free or reader-owned rwsem, + * the owner value may contain information about the last reader that + * acquires the rwsem. The anonymous bit is set because that particular + * reader may or may not still own the lock. + * + * That information may be helpful in debugging cases where the system + * seems to hang on a reader owned rwsem especially if only one reader + * is involved. Ideally we would like to track all the readers that own + * a rwsem, but the overhead is simply too big. + */ +#define RWSEM_READER_OWNED (1UL << 0) +#define RWSEM_ANONYMOUSLY_OWNED (1UL << 1) + +#ifdef CONFIG_DEBUG_RWSEMS +# define DEBUG_RWSEMS_WARN_ON(c) DEBUG_LOCKS_WARN_ON(c) +#else +# define DEBUG_RWSEMS_WARN_ON(c) +#endif + +/* + * R/W semaphores originally for PPC using the stuff in lib/rwsem.c. + * Adapted largely from include/asm-i386/rwsem.h + * by Paul Mackerras . + */ + +/* + * the semaphore definition + */ +#ifdef CONFIG_64BIT +# define RWSEM_ACTIVE_MASK 0xffffffffL +#else +# define RWSEM_ACTIVE_MASK 0x0000ffffL +#endif + +#define RWSEM_ACTIVE_BIAS 0x00000001L +#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1) +#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS +#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) + +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER +/* + * All writes to owner are protected by WRITE_ONCE() to make sure that + * store tearing can't happen as optimistic spinners may read and use + * the owner value concurrently without lock. Read from owner, however, + * may not need READ_ONCE() as long as the pointer value is only used + * for comparison and isn't being dereferenced. + */ +static inline void rwsem_set_owner(struct rw_semaphore *sem) +{ + WRITE_ONCE(sem->owner, current); +} + +static inline void rwsem_clear_owner(struct rw_semaphore *sem) +{ + WRITE_ONCE(sem->owner, NULL); +} + +/* + * The task_struct pointer of the last owning reader will be left in + * the owner field. + * + * Note that the owner value just indicates the task has owned the rwsem + * previously, it may not be the real owner or one of the real owners + * anymore when that field is examined, so take it with a grain of salt. + */ +static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem, + struct task_struct *owner) +{ + unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED + | RWSEM_ANONYMOUSLY_OWNED; + + WRITE_ONCE(sem->owner, (struct task_struct *)val); +} + +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem) +{ + __rwsem_set_reader_owned(sem, current); +} + +/* + * Return true if the a rwsem waiter can spin on the rwsem's owner + * and steal the lock, i.e. the lock is not anonymously owned. + * N.B. !owner is considered spinnable. + */ +static inline bool is_rwsem_owner_spinnable(struct task_struct *owner) +{ + return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED); +} + +/* + * Return true if rwsem is owned by an anonymous writer or readers. + */ +static inline bool rwsem_has_anonymous_owner(struct task_struct *owner) +{ + return (unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED; +} + +#ifdef CONFIG_DEBUG_RWSEMS +/* + * With CONFIG_DEBUG_RWSEMS configured, it will make sure that if there + * is a task pointer in owner of a reader-owned rwsem, it will be the + * real owner or one of the real owners. The only exception is when the + * unlock is done by up_read_non_owner(). + */ +#define rwsem_clear_reader_owned rwsem_clear_reader_owned +static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) +{ + unsigned long val = (unsigned long)current | RWSEM_READER_OWNED + | RWSEM_ANONYMOUSLY_OWNED; + if (READ_ONCE(sem->owner) == (struct task_struct *)val) + cmpxchg_relaxed((unsigned long *)&sem->owner, val, + RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED); +} +#endif + +#else +static inline void rwsem_set_owner(struct rw_semaphore *sem) +{ +} + +static inline void rwsem_clear_owner(struct rw_semaphore *sem) +{ +} + +static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem, + struct task_struct *owner) +{ +} + +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem) +{ +} +#endif + +#ifndef rwsem_clear_reader_owned +static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) +{ +} +#endif + +/* + * lock for reading + */ +static inline void __down_read(struct rw_semaphore *sem) +{ + if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) + rwsem_down_read_failed(sem); + else + rwsem_set_reader_owned(sem); +} + +static inline int __down_read_killable(struct rw_semaphore *sem) +{ + if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { + if (IS_ERR(rwsem_down_read_failed_killable(sem))) + return -EINTR; + } else { + rwsem_set_reader_owned(sem); + } + return 0; +} + +static inline int __down_read_trylock(struct rw_semaphore *sem) +{ + long tmp; + + while ((tmp = atomic_long_read(&sem->count)) >= 0) { + if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, + tmp + RWSEM_ACTIVE_READ_BIAS)) { + rwsem_set_reader_owned(sem); + return 1; + } + } + return 0; +} + +/* + * lock for writing + */ +static inline void __down_write(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, + &sem->count); + if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) + rwsem_down_write_failed(sem); + rwsem_set_owner(sem); +} + +static inline int __down_write_killable(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, + &sem->count); + if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) + if (IS_ERR(rwsem_down_write_failed_killable(sem))) + return -EINTR; + rwsem_set_owner(sem); + return 0; +} + +static inline int __down_write_trylock(struct rw_semaphore *sem) +{ + long tmp; + + tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, + RWSEM_ACTIVE_WRITE_BIAS); + if (tmp == RWSEM_UNLOCKED_VALUE) { + rwsem_set_owner(sem); + return true; + } + return false; +} + +/* + * unlock after reading + */ +static inline void __up_read(struct rw_semaphore *sem) +{ + long tmp; + + rwsem_clear_reader_owned(sem); + tmp = atomic_long_dec_return_release(&sem->count); + if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) + rwsem_wake(sem); +} + +/* + * unlock after writing + */ +static inline void __up_write(struct rw_semaphore *sem) +{ + rwsem_clear_owner(sem); + if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, + &sem->count) < 0)) + rwsem_wake(sem); +} + +/* + * downgrade write lock to read lock + */ +static inline void __downgrade_write(struct rw_semaphore *sem) +{ + long tmp; + + /* + * When downgrading from exclusive to shared ownership, + * anything inside the write-locked region cannot leak + * into the read side. In contrast, anything in the + * read-locked region is ok to be re-ordered into the + * write side. As such, rely on RELEASE semantics. + */ + tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); + rwsem_set_reader_owned(sem); + if (tmp < 0) + rwsem_downgrade_wake(sem); +} diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 59e5848..b3b4582 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -13,7 +13,12 @@ #include #include -#include "rwsem.h" +#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM +#include "rwsem-xadd.h" +#else +#define __rwsem_set_reader_owned(a, b) +#define DEBUG_RWSEMS_WARN_ON(a) +#endif /* * lock for reading diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h deleted file mode 100644 index f568391..0000000 --- a/kernel/locking/rwsem.h +++ /dev/null @@ -1,277 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * The least significant 2 bits of the owner value has the following - * meanings when set. - * - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers - * - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned, - * i.e. the owner(s) cannot be readily determined. It can be reader - * owned or the owning writer is indeterminate. - * - * When a writer acquires a rwsem, it puts its task_struct pointer - * into the owner field. It is cleared after an unlock. - * - * When a reader acquires a rwsem, it will also puts its task_struct - * pointer into the owner field with both the RWSEM_READER_OWNED and - * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will - * largely be left untouched. So for a free or reader-owned rwsem, - * the owner value may contain information about the last reader that - * acquires the rwsem. The anonymous bit is set because that particular - * reader may or may not still own the lock. - * - * That information may be helpful in debugging cases where the system - * seems to hang on a reader owned rwsem especially if only one reader - * is involved. Ideally we would like to track all the readers that own - * a rwsem, but the overhead is simply too big. - */ -#define RWSEM_READER_OWNED (1UL << 0) -#define RWSEM_ANONYMOUSLY_OWNED (1UL << 1) - -#ifdef CONFIG_DEBUG_RWSEMS -# define DEBUG_RWSEMS_WARN_ON(c) DEBUG_LOCKS_WARN_ON(c) -#else -# define DEBUG_RWSEMS_WARN_ON(c) -#endif - -/* - * R/W semaphores originally for PPC using the stuff in lib/rwsem.c. - * Adapted largely from include/asm-i386/rwsem.h - * by Paul Mackerras . - */ - -/* - * the semaphore definition - */ -#ifdef CONFIG_64BIT -# define RWSEM_ACTIVE_MASK 0xffffffffL -#else -# define RWSEM_ACTIVE_MASK 0x0000ffffL -#endif - -#define RWSEM_ACTIVE_BIAS 0x00000001L -#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1) -#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS -#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) - -#ifdef CONFIG_RWSEM_SPIN_ON_OWNER -/* - * All writes to owner are protected by WRITE_ONCE() to make sure that - * store tearing can't happen as optimistic spinners may read and use - * the owner value concurrently without lock. Read from owner, however, - * may not need READ_ONCE() as long as the pointer value is only used - * for comparison and isn't being dereferenced. - */ -static inline void rwsem_set_owner(struct rw_semaphore *sem) -{ - WRITE_ONCE(sem->owner, current); -} - -static inline void rwsem_clear_owner(struct rw_semaphore *sem) -{ - WRITE_ONCE(sem->owner, NULL); -} - -/* - * The task_struct pointer of the last owning reader will be left in - * the owner field. - * - * Note that the owner value just indicates the task has owned the rwsem - * previously, it may not be the real owner or one of the real owners - * anymore when that field is examined, so take it with a grain of salt. - */ -static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem, - struct task_struct *owner) -{ - unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED - | RWSEM_ANONYMOUSLY_OWNED; - - WRITE_ONCE(sem->owner, (struct task_struct *)val); -} - -static inline void rwsem_set_reader_owned(struct rw_semaphore *sem) -{ - __rwsem_set_reader_owned(sem, current); -} - -/* - * Return true if the a rwsem waiter can spin on the rwsem's owner - * and steal the lock, i.e. the lock is not anonymously owned. - * N.B. !owner is considered spinnable. - */ -static inline bool is_rwsem_owner_spinnable(struct task_struct *owner) -{ - return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED); -} - -/* - * Return true if rwsem is owned by an anonymous writer or readers. - */ -static inline bool rwsem_has_anonymous_owner(struct task_struct *owner) -{ - return (unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED; -} - -#ifdef CONFIG_DEBUG_RWSEMS -/* - * With CONFIG_DEBUG_RWSEMS configured, it will make sure that if there - * is a task pointer in owner of a reader-owned rwsem, it will be the - * real owner or one of the real owners. The only exception is when the - * unlock is done by up_read_non_owner(). - */ -#define rwsem_clear_reader_owned rwsem_clear_reader_owned -static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) -{ - unsigned long val = (unsigned long)current | RWSEM_READER_OWNED - | RWSEM_ANONYMOUSLY_OWNED; - if (READ_ONCE(sem->owner) == (struct task_struct *)val) - cmpxchg_relaxed((unsigned long *)&sem->owner, val, - RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED); -} -#endif - -#else -static inline void rwsem_set_owner(struct rw_semaphore *sem) -{ -} - -static inline void rwsem_clear_owner(struct rw_semaphore *sem) -{ -} - -static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem, - struct task_struct *owner) -{ -} - -static inline void rwsem_set_reader_owned(struct rw_semaphore *sem) -{ -} -#endif - -#ifndef rwsem_clear_reader_owned -static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) -{ -} -#endif - -#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM -/* - * lock for reading - */ -static inline void __down_read(struct rw_semaphore *sem) -{ - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) - rwsem_down_read_failed(sem); - else - rwsem_set_reader_owned(sem); -} - -static inline int __down_read_killable(struct rw_semaphore *sem) -{ - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { - if (IS_ERR(rwsem_down_read_failed_killable(sem))) - return -EINTR; - } else { - rwsem_set_reader_owned(sem); - } - return 0; -} - -static inline int __down_read_trylock(struct rw_semaphore *sem) -{ - long tmp; - - while ((tmp = atomic_long_read(&sem->count)) >= 0) { - if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, - tmp + RWSEM_ACTIVE_READ_BIAS)) { - rwsem_set_reader_owned(sem); - return 1; - } - } - return 0; -} - -/* - * lock for writing - */ -static inline void __down_write(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count); - if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) - rwsem_down_write_failed(sem); - rwsem_set_owner(sem); -} - -static inline int __down_write_killable(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count); - if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) - if (IS_ERR(rwsem_down_write_failed_killable(sem))) - return -EINTR; - rwsem_set_owner(sem); - return 0; -} - -static inline int __down_write_trylock(struct rw_semaphore *sem) -{ - long tmp; - - tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, - RWSEM_ACTIVE_WRITE_BIAS); - if (tmp == RWSEM_UNLOCKED_VALUE) { - rwsem_set_owner(sem); - return true; - } - return false; -} - -/* - * unlock after reading - */ -static inline void __up_read(struct rw_semaphore *sem) -{ - long tmp; - - rwsem_clear_reader_owned(sem); - tmp = atomic_long_dec_return_release(&sem->count); - if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) - rwsem_wake(sem); -} - -/* - * unlock after writing - */ -static inline void __up_write(struct rw_semaphore *sem) -{ - rwsem_clear_owner(sem); - if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count) < 0)) - rwsem_wake(sem); -} - -/* - * downgrade write lock to read lock - */ -static inline void __downgrade_write(struct rw_semaphore *sem) -{ - long tmp; - - /* - * When downgrading from exclusive to shared ownership, - * anything inside the write-locked region cannot leak - * into the read side. In contrast, anything in the - * read-locked region is ok to be re-ordered into the - * write side. As such, rely on RELEASE semantics. - */ - tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); - rwsem_set_reader_owned(sem); - if (tmp < 0) - rwsem_downgrade_wake(sem); -} - -#endif /* CONFIG_RWSEM_XCHGADD_ALGORITHM */ From patchwork Thu Feb 7 19:07:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801949 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B2931669 for ; Thu, 7 Feb 2019 19:09:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F023E2CCF9 for ; Thu, 7 Feb 2019 19:09:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E42192DDB9; Thu, 7 Feb 2019 19:09:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90D962CCF9 for ; Thu, 7 Feb 2019 19:09:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727266AbfBGTJW (ORCPT ); Thu, 7 Feb 2019 14:09:22 -0500 Received: from mx1.redhat.com ([209.132.183.28]:59612 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726956AbfBGTJV (ORCPT ); Thu, 7 Feb 2019 14:09:21 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E5B3180E7B; Thu, 7 Feb 2019 19:09:20 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id D4E8617BAC; Thu, 7 Feb 2019 19:09:09 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 07/22] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h Date: Thu, 7 Feb 2019 14:07:11 -0500 Message-Id: <1549566446-27967-8-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:09:21 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We don't need to expose rwsem internal functions which are not supposed to be called directly from other kernel code. Signed-off-by: Waiman Long --- include/linux/rwsem.h | 7 ------- kernel/locking/rwsem-xadd.h | 7 +++++++ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 6e56006..1e0457b 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -50,13 +50,6 @@ struct rw_semaphore { */ #define RWSEM_OWNER_UNKNOWN ((struct task_struct *)-2L) -extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem); -extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem); -extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem); -extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem); -extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *); -extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem); - /* In all implementations count != 0 means locked */ static inline int rwsem_is_locked(struct rw_semaphore *sem) { diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 2eaa762..64e7d62 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -153,6 +153,13 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) } #endif +extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem); +extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem); +extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem); +extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem); +extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem); +extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem); + /* * lock for reading */ From patchwork Thu Feb 7 19:07:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801951 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 858631669 for ; Thu, 7 Feb 2019 19:09:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 757332CCF9 for ; Thu, 7 Feb 2019 19:09:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 67AAC2DDB9; Thu, 7 Feb 2019 19:09:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 047542CCF9 for ; Thu, 7 Feb 2019 19:09:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727013AbfBGTJ3 (ORCPT ); Thu, 7 Feb 2019 14:09:29 -0500 Received: from mx1.redhat.com ([209.132.183.28]:59804 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726843AbfBGTJ2 (ORCPT ); Thu, 7 Feb 2019 14:09:28 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2B3A381DFA; Thu, 7 Feb 2019 19:09:28 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1060461146; Thu, 7 Feb 2019 19:09:20 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 08/22] locking/rwsem: Add debug check for __down_read*() Date: Thu, 7 Feb 2019 14:07:12 -0500 Message-Id: <1549566446-27967-9-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:09:28 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When rwsem_down_read_failed*() return, the read lock is acquired indirectly by others. So debug checks are added in __down_read() and __down_read_killable() to make sure the rwsem is really reader-owned. The other debug check calls in kernel/locking/rwsem.c except the one in up_read_non_owner() are also moved over to rwsem-xadd.h. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.h | 12 ++++++++++-- kernel/locking/rwsem.c | 3 --- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 64e7d62..77151c3 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -165,10 +165,13 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) */ static inline void __down_read(struct rw_semaphore *sem) { - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) + if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { rwsem_down_read_failed(sem); - else + DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & + RWSEM_READER_OWNED)); + } else { rwsem_set_reader_owned(sem); + } } static inline int __down_read_killable(struct rw_semaphore *sem) @@ -176,6 +179,8 @@ static inline int __down_read_killable(struct rw_semaphore *sem) if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { if (IS_ERR(rwsem_down_read_failed_killable(sem))) return -EINTR; + DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & + RWSEM_READER_OWNED)); } else { rwsem_set_reader_owned(sem); } @@ -243,6 +248,7 @@ static inline void __up_read(struct rw_semaphore *sem) { long tmp; + DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED)); rwsem_clear_reader_owned(sem); tmp = atomic_long_dec_return_release(&sem->count); if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) @@ -254,6 +260,7 @@ static inline void __up_read(struct rw_semaphore *sem) */ static inline void __up_write(struct rw_semaphore *sem) { + DEBUG_RWSEMS_WARN_ON(sem->owner != current); rwsem_clear_owner(sem); if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, &sem->count) < 0)) @@ -274,6 +281,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem) * read-locked region is ok to be re-ordered into the * write side. As such, rely on RELEASE semantics. */ + DEBUG_RWSEMS_WARN_ON(sem->owner != current); tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); rwsem_set_reader_owned(sem); if (tmp < 0) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index b3b4582..598fc7c 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -114,7 +114,6 @@ int down_write_trylock(struct rw_semaphore *sem) void up_read(struct rw_semaphore *sem) { rwsem_release(&sem->dep_map, 1, _RET_IP_); - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED)); __up_read(sem); } @@ -127,7 +126,6 @@ void up_read(struct rw_semaphore *sem) void up_write(struct rw_semaphore *sem) { rwsem_release(&sem->dep_map, 1, _RET_IP_); - DEBUG_RWSEMS_WARN_ON(sem->owner != current); __up_write(sem); } @@ -140,7 +138,6 @@ void up_write(struct rw_semaphore *sem) void downgrade_write(struct rw_semaphore *sem) { lock_downgrade(&sem->dep_map, _RET_IP_); - DEBUG_RWSEMS_WARN_ON(sem->owner != current); __downgrade_write(sem); } From patchwork Thu Feb 7 19:07:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801993 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 76A37746 for ; Thu, 7 Feb 2019 19:11:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6616E2CD35 for ; Thu, 7 Feb 2019 19:11:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 592C12E060; Thu, 7 Feb 2019 19:11:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E6B6A2CD35 for ; Thu, 7 Feb 2019 19:11:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727321AbfBGTJf (ORCPT ); Thu, 7 Feb 2019 14:09:35 -0500 Received: from mx1.redhat.com ([209.132.183.28]:59906 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726791AbfBGTJe (ORCPT ); Thu, 7 Feb 2019 14:09:34 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DCCF488E52; Thu, 7 Feb 2019 19:09:33 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4B1361693B; Thu, 7 Feb 2019 19:09:28 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 09/22] locking/rwsem: Enhance DEBUG_RWSEMS_WARN_ON() macro Date: Thu, 7 Feb 2019 14:07:13 -0500 Message-Id: <1549566446-27967-10-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:09:34 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently, the DEBUG_RWSEMS_WARN_ON() macro just dumps a stack trace when the rwsem isn't in the right state. It does not show the actual states of the rwsem. This may not be that helpful in the debugging process. Enhance the DEBUG_RWSEMS_WARN_ON() macro to also show the current content of the rwsem count and owner fields to give more information about what is wrong with the rwsem. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.h | 19 ++++++++++++------- kernel/locking/rwsem.c | 5 +++-- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 77151c3..6d2478c 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -27,9 +27,13 @@ #define RWSEM_ANONYMOUSLY_OWNED (1UL << 1) #ifdef CONFIG_DEBUG_RWSEMS -# define DEBUG_RWSEMS_WARN_ON(c) DEBUG_LOCKS_WARN_ON(c) +# define DEBUG_RWSEMS_WARN_ON(c, sem) \ + WARN_ONCE(c, "DEBUG_RWSEMS_WARN_ON(%s): count = 0x%lx, owner = 0x%lx, curr 0x%lx, list %sempty\n",\ + #c, atomic_long_read(&(sem)->count), \ + (long)((sem)->owner), (long)current, \ + list_empty(&(sem)->wait_list) ? "" : "not ") #else -# define DEBUG_RWSEMS_WARN_ON(c) +# define DEBUG_RWSEMS_WARN_ON(c, sem) #endif /* @@ -168,7 +172,7 @@ static inline void __down_read(struct rw_semaphore *sem) if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { rwsem_down_read_failed(sem); DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & - RWSEM_READER_OWNED)); + RWSEM_READER_OWNED), sem); } else { rwsem_set_reader_owned(sem); } @@ -180,7 +184,7 @@ static inline int __down_read_killable(struct rw_semaphore *sem) if (IS_ERR(rwsem_down_read_failed_killable(sem))) return -EINTR; DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & - RWSEM_READER_OWNED)); + RWSEM_READER_OWNED), sem); } else { rwsem_set_reader_owned(sem); } @@ -248,7 +252,8 @@ static inline void __up_read(struct rw_semaphore *sem) { long tmp; - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED)); + DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED), + sem); rwsem_clear_reader_owned(sem); tmp = atomic_long_dec_return_release(&sem->count); if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) @@ -260,7 +265,7 @@ static inline void __up_read(struct rw_semaphore *sem) */ static inline void __up_write(struct rw_semaphore *sem) { - DEBUG_RWSEMS_WARN_ON(sem->owner != current); + DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem); rwsem_clear_owner(sem); if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, &sem->count) < 0)) @@ -281,7 +286,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem) * read-locked region is ok to be re-ordered into the * write side. As such, rely on RELEASE semantics. */ - DEBUG_RWSEMS_WARN_ON(sem->owner != current); + DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem); tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); rwsem_set_reader_owned(sem); if (tmp < 0) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 598fc7c..bdfca7c 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -17,7 +17,7 @@ #include "rwsem-xadd.h" #else #define __rwsem_set_reader_owned(a, b) -#define DEBUG_RWSEMS_WARN_ON(a) +#define DEBUG_RWSEMS_WARN_ON(a, b) #endif /* @@ -203,7 +203,8 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass) void up_read_non_owner(struct rw_semaphore *sem) { - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED)); + DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED), + sem); __up_read(sem); } From patchwork Thu Feb 7 19:07:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801953 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84210746 for ; Thu, 7 Feb 2019 19:09:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 73C292CCF9 for ; Thu, 7 Feb 2019 19:09:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 67D412DDB9; Thu, 7 Feb 2019 19:09:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D49EF2CCF9 for ; Thu, 7 Feb 2019 19:09:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727348AbfBGTJj (ORCPT ); Thu, 7 Feb 2019 14:09:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50558 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727336AbfBGTJh (ORCPT ); Thu, 7 Feb 2019 14:09:37 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1DFDF10FA8; Thu, 7 Feb 2019 19:09:36 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0995D62FA5; Thu, 7 Feb 2019 19:09:33 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 10/22] locking/rwsem: Enable lock event counting Date: Thu, 7 Feb 2019 14:07:14 -0500 Message-Id: <1549566446-27967-11-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 07 Feb 2019 19:09:36 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add lock event counting calls so that we can track the number of lock events happening in the rwsem code. With CONFIG_LOCK_EVENT_COUNTS on and booting a 1-socket 22-core 44-thread x86-64 system, the non-zero rwsem counts after system bootup were as follows: rwsem_opt_fail=113 rwsem_opt_wlock=13647 rwsem_rlock=176 rwsem_rlock_fast=10 rwsem_wake_reader=153 rwsem_wake_writer=139 rwsem_wlock=113 It can be seen that most of the lock acquisitions in the slowpath were writer-locks in the optimistic spinning code path with no sleeping at all. Only about 4% of locks were acquired after sleeping. Signed-off-by: Waiman Long --- arch/Kconfig | 2 +- kernel/locking/lock_events_list.h | 17 +++++++++++++++++ kernel/locking/rwsem-xadd.c | 12 ++++++++++++ 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/arch/Kconfig b/arch/Kconfig index af147c2..7471791 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -891,7 +891,7 @@ config ARCH_USE_MEMREMAP_PROT config LOCK_EVENT_COUNTS bool "Locking event counts collection" depends on DEBUG_FS - depends on QUEUED_SPINLOCKS + depends on (QUEUED_SPINLOCKS || RWSEM_XCHGADD_ALGORITHM) ---help--- Enable light-weight counting of various locking related events in the system with minimal performance impact. This reduces diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 8b4d2e1..c33c5df 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -48,3 +48,20 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ + +#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM +/* + * Locking events for rwsem + */ +LOCK_EVENT(rwsem_sleep_reader) /* # of reader sleeps */ +LOCK_EVENT(rwsem_sleep_writer) /* # of writer sleeps */ +LOCK_EVENT(rwsem_wake_reader) /* # of reader wakeups */ +LOCK_EVENT(rwsem_wake_writer) /* # of writer wakeups */ +LOCK_EVENT(rwsem_opt_wlock) /* # of write locks opt-spin acquired */ +LOCK_EVENT(rwsem_opt_fail) /* # of failed opt-spinnings */ +LOCK_EVENT(rwsem_rlock) /* # of read locks acquired */ +LOCK_EVENT(rwsem_rlock_fast) /* # of fast read locks acquired */ +LOCK_EVENT(rwsem_rlock_fail) /* # of failed read lock acquisitions */ +LOCK_EVENT(rwsem_wlock) /* # of write locks acquired */ +LOCK_EVENT(rwsem_wlock_fail) /* # of failed write lock acquisitions */ +#endif /* CONFIG_RWSEM_XCHGADD_ALGORITHM */ diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 62422a6..fff231a 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -20,6 +20,7 @@ #include #include "rwsem-xadd.h" +#include "lock_events.h" /* * Guide to the rw_semaphore's count field for common values. @@ -147,6 +148,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, * will notice the queued writer. */ wake_q_add(wake_q, waiter->task); + lockevent_inc(rwsem_wake_writer); } return; @@ -214,6 +216,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, } adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment; + lockevent_cond_inc(rwsem_wake_reader, woken); if (list_empty(&sem->wait_list)) { /* hit end of list above */ adjustment -= RWSEM_WAITING_BIAS; @@ -269,6 +272,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem) count + RWSEM_ACTIVE_WRITE_BIAS); if (old == count) { rwsem_set_owner(sem); + lockevent_inc(rwsem_opt_wlock); return true; } @@ -394,6 +398,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) osq_unlock(&sem->osq); done: preempt_enable(); + lockevent_cond_inc(rwsem_opt_fail, !taken); return taken; } @@ -441,6 +446,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) if (atomic_long_read(&sem->count) >= 0) { raw_spin_unlock_irq(&sem->wait_lock); rwsem_set_reader_owned(sem); + lockevent_inc(rwsem_rlock_fast); return sem; } adjustment += RWSEM_WAITING_BIAS; @@ -477,9 +483,11 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) break; } schedule(); + lockevent_inc(rwsem_sleep_reader); } __set_current_state(TASK_RUNNING); + lockevent_inc(rwsem_rlock); return sem; out_nolock: list_del(&waiter.list); @@ -487,6 +495,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count); raw_spin_unlock_irq(&sem->wait_lock); __set_current_state(TASK_RUNNING); + lockevent_inc(rwsem_rlock_fail); return ERR_PTR(-EINTR); } @@ -580,6 +589,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) goto out_nolock; schedule(); + lockevent_inc(rwsem_sleep_writer); set_current_state(state); } while ((count = atomic_long_read(&sem->count)) & RWSEM_ACTIVE_MASK); @@ -588,6 +598,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) __set_current_state(TASK_RUNNING); list_del(&waiter.list); raw_spin_unlock_irq(&sem->wait_lock); + lockevent_inc(rwsem_wlock); return ret; @@ -601,6 +612,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); raw_spin_unlock_irq(&sem->wait_lock); wake_up_q(&wake_q); + lockevent_inc(rwsem_wlock_fail); return ERR_PTR(-EINTR); } From patchwork Thu Feb 7 19:07:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801957 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58BBD1669 for ; Thu, 7 Feb 2019 19:09:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 478842DDB2 for ; Thu, 7 Feb 2019 19:09:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3B6862DDBE; Thu, 7 Feb 2019 19:09:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D7C1F2DDB9 for ; Thu, 7 Feb 2019 19:09:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727375AbfBGTJo (ORCPT ); Thu, 7 Feb 2019 14:09:44 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60044 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727336AbfBGTJn (ORCPT ); Thu, 7 Feb 2019 14:09:43 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 23BC97F3EA; Thu, 7 Feb 2019 19:09:42 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3E97E1693B; Thu, 7 Feb 2019 19:09:36 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 11/22] locking/rwsem: Implement a new locking scheme Date: Thu, 7 Feb 2019 14:07:15 -0500 Message-Id: <1549566446-27967-12-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:09:42 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The current way of using various reader, writer and waiting biases in the rwsem code are confusing and hard to understand. I have to reread the rwsem count guide in the rwsem-xadd.c file from time to time to remind myself how this whole thing works. It also makes the rwsem code harder to be optimized. To make rwsem more sane, a new locking scheme similar to the one in qrwlock is now being used. The count is now a 32-bit atomic value in all architectures. The current bit definitions are: Bit 0 - writer locked bit Bit 1 - waiters present bit Bits 2-7 - reserved for future extension Bits 8-X - reader count (24/56 bits) Now the cmpxchg instruction is used to acquire the write lock. The read lock is still acquired with xadd instruction, so there is no change here. This scheme will allow up to 16M active readers which should be more than enough. We can always use some more reserved bits if necessary. With that change, we can deterministically know if a rwsem has been write-locked. Looking at the count alone, however, one cannot determine for certain if a rwsem is owned by readers or not as the readers that set the reader count bits may be in the process of backing out. So we still need the reader-owned bit in the owner field to be sure. With a locking microbenchmark running on 5.0 based kernel, the total locking rates (in kops/s) of the benchmark on a 4-socket 56-core x86-64 system before and after the patch were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 28,773 30,164 29,063 29,967 2 7,435 15,167 6,746 13,198 4 7,181 14,438 6,515 12,763 8 6,918 13,796 6,747 13,419 16 6,554 16,030 6,653 15,534 The locking rates of the benchmark on a Power9 system were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 11,527 12,378 11,527 12,195 2 6,318 15,116 7,690 15,065 4 3,957 16,812 4,495 16,884 8 3,617 16,496 3,993 16,588 The locking rates of the benchmark on a 2-socket ARM64 system were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 23,703 23,208 23,923 23,167 2 6,206 11,996 6,756 13,268 4 6,098 12,343 6,928 15,064 8 6,007 13,152 7,117 15,335 For x86, the locking rate dropped somewhat at low contention level with the patch. At high contention level, however, the performance recovered. For both Power9 and ARM64, the patch caused no performance drop. In fact, it improved a little bit after the patch. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 145 +++++++++++++++----------------------------- kernel/locking/rwsem-xadd.h | 85 +++++++++++++------------- 2 files changed, 89 insertions(+), 141 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index fff231a..18a414e 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -9,6 +9,8 @@ * * Optimistic spinning by Tim Chen * and Davidlohr Bueso . Based on mutexes. + * + * Rwsem count bit fields re-definition by Waiman Long . */ #include #include @@ -23,52 +25,20 @@ #include "lock_events.h" /* - * Guide to the rw_semaphore's count field for common values. - * (32-bit case illustrated, similar for 64-bit) - * - * 0x0000000X (1) X readers active or attempting lock, no writer waiting - * X = #active_readers + #readers attempting to lock - * (X*ACTIVE_BIAS) - * - * 0x00000000 rwsem is unlocked, and no one is waiting for the lock or - * attempting to read lock or write lock. - * - * 0xffff000X (1) X readers active or attempting lock, with waiters for lock - * X = #active readers + # readers attempting lock - * (X*ACTIVE_BIAS + WAITING_BIAS) - * (2) 1 writer attempting lock, no waiters for lock - * X-1 = #active readers + #readers attempting lock - * ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS) - * (3) 1 writer active, no waiters for lock - * X-1 = #active readers + #readers attempting lock - * ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS) - * - * 0xffff0001 (1) 1 reader active or attempting lock, waiters for lock - * (WAITING_BIAS + ACTIVE_BIAS) - * (2) 1 writer active or attempting lock, no waiters for lock - * (ACTIVE_WRITE_BIAS) + * Guide to the rw_semaphore's count field. * - * 0xffff0000 (1) There are writers or readers queued but none active - * or in the process of attempting lock. - * (WAITING_BIAS) - * Note: writer can attempt to steal lock for this count by adding - * ACTIVE_WRITE_BIAS in cmpxchg and checking the old count + * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned + * by a writer. * - * 0xfffe0001 (1) 1 writer active, or attempting lock. Waiters on queue. - * (ACTIVE_WRITE_BIAS + WAITING_BIAS) - * - * Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking - * the count becomes more than 0 for successful lock acquisition, - * i.e. the case where there are only readers or nobody has lock. - * (1st and 2nd case above). - * - * Writers attempt to lock by adding ACTIVE_WRITE_BIAS in down_write and - * checking the count becomes ACTIVE_WRITE_BIAS for successful lock - * acquisition (i.e. nobody else has lock or attempts lock). If - * unsuccessful, in rwsem_down_write_failed, we'll check to see if there - * are only waiters but none active (5th case above), and attempt to - * steal the lock. + * The lock is owned by readers when + * (1) the RWSEM_WRITER_LOCKED isn't set in count, + * (2) some of the reader bits are set in count, and + * (3) the owner field has RWSEM_READ_OWNED bit set. * + * Having some reader bits set is not enough to guarantee a readers owned + * lock as the readers may be in the process of backing out from the count + * and a writer has just released the lock. So another writer may steal + * the lock immediately after that. */ /* @@ -114,9 +84,8 @@ enum rwsem_wake_type { /* * handle the lock release when processes blocked on it that can now run - * - if we come here from up_xxxx(), then: - * - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed) - * - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so) + * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must + * have been set. * - there must be someone on the queue * - the wait_lock must be held by the caller * - tasks are marked for wakeup, the caller must later invoke wake_up_q() @@ -160,22 +129,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, * so we can bail out early if a writer stole the lock. */ if (wake_type != RWSEM_WAKE_READ_OWNED) { - adjustment = RWSEM_ACTIVE_READ_BIAS; - try_reader_grant: + adjustment = RWSEM_READER_BIAS; oldcount = atomic_long_fetch_add(adjustment, &sem->count); - if (unlikely(oldcount < RWSEM_WAITING_BIAS)) { - /* - * If the count is still less than RWSEM_WAITING_BIAS - * after removing the adjustment, it is assumed that - * a writer has stolen the lock. We have to undo our - * reader grant. - */ - if (atomic_long_add_return(-adjustment, &sem->count) < - RWSEM_WAITING_BIAS) - return; - - /* Last active locker left. Retry waking readers. */ - goto try_reader_grant; + if (unlikely(oldcount & RWSEM_WRITER_MASK)) { + atomic_long_sub(adjustment, &sem->count); + return; } /* * Set it to reader-owned to give spinners an early @@ -215,11 +173,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, wake_q_add_safe(wake_q, tsk); } - adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment; + adjustment = woken * RWSEM_READER_BIAS - adjustment; lockevent_cond_inc(rwsem_wake_reader, woken); if (list_empty(&sem->wait_list)) { /* hit end of list above */ - adjustment -= RWSEM_WAITING_BIAS; + adjustment -= RWSEM_FLAG_WAITERS; } if (adjustment) @@ -233,22 +191,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, */ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem) { - /* - * Avoid trying to acquire write lock if count isn't RWSEM_WAITING_BIAS. - */ - if (count != RWSEM_WAITING_BIAS) + long new; + + if (RWSEM_COUNT_LOCKED(count)) return false; - /* - * Acquire the lock by trying to set it to ACTIVE_WRITE_BIAS. If there - * are other tasks on the wait list, we need to add on WAITING_BIAS. - */ - count = list_is_singular(&sem->wait_list) ? - RWSEM_ACTIVE_WRITE_BIAS : - RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS; + new = count + RWSEM_WRITER_LOCKED - + (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0); - if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS, count) - == RWSEM_WAITING_BIAS) { + if (atomic_long_cmpxchg_acquire(&sem->count, count, new) == count) { rwsem_set_owner(sem); return true; } @@ -265,11 +216,11 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem) long old, count = atomic_long_read(&sem->count); while (true) { - if (!(count == 0 || count == RWSEM_WAITING_BIAS)) + if (RWSEM_COUNT_LOCKED(count)) return false; old = atomic_long_cmpxchg_acquire(&sem->count, count, - count + RWSEM_ACTIVE_WRITE_BIAS); + count + RWSEM_WRITER_LOCKED); if (old == count) { rwsem_set_owner(sem); lockevent_inc(rwsem_opt_wlock); @@ -428,7 +379,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) static inline struct rw_semaphore __sched * __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state) { - long count, adjustment = -RWSEM_ACTIVE_READ_BIAS; + long count, adjustment = -RWSEM_READER_BIAS; struct rwsem_waiter waiter; DEFINE_WAKE_Q(wake_q); @@ -440,16 +391,16 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) /* * In case the wait queue is empty and the lock isn't owned * by a writer, this reader can exit the slowpath and return - * immediately as its RWSEM_ACTIVE_READ_BIAS has already - * been set in the count. + * immediately as its RWSEM_READER_BIAS has already been + * set in the count. */ - if (atomic_long_read(&sem->count) >= 0) { + if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) { raw_spin_unlock_irq(&sem->wait_lock); rwsem_set_reader_owned(sem); lockevent_inc(rwsem_rlock_fast); return sem; } - adjustment += RWSEM_WAITING_BIAS; + adjustment += RWSEM_FLAG_WAITERS; } list_add_tail(&waiter.list, &sem->wait_list); @@ -462,9 +413,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) * If there are no writers and we are first in the queue, * wake our own waiter to join the existing active readers ! */ - if (count == RWSEM_WAITING_BIAS || - (count > RWSEM_WAITING_BIAS && - adjustment != -RWSEM_ACTIVE_READ_BIAS)) + if (!RWSEM_COUNT_LOCKED(count) || + (!(count & RWSEM_WRITER_MASK) && (adjustment & RWSEM_FLAG_WAITERS))) __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); raw_spin_unlock_irq(&sem->wait_lock); @@ -492,7 +442,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) out_nolock: list_del(&waiter.list); if (list_empty(&sem->wait_list)) - atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count); + atomic_long_add(-RWSEM_FLAG_WAITERS, &sem->count); raw_spin_unlock_irq(&sem->wait_lock); __set_current_state(TASK_RUNNING); lockevent_inc(rwsem_rlock_fail); @@ -525,9 +475,6 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) struct rw_semaphore *ret = sem; DEFINE_WAKE_Q(wake_q); - /* undo write bias from down_write operation, stop active locking */ - count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count); - /* do optimistic spinning and steal lock if possible */ if (rwsem_optimistic_spin(sem)) return sem; @@ -553,10 +500,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) /* * If there were already threads queued before us and there are - * no active writers, the lock must be read owned; so we try to - * wake any read locks that were queued ahead of us. + * no active writers and some readers, the lock must be read + * owned; so we try to any read locks that were queued ahead + * of us. */ - if (count > RWSEM_WAITING_BIAS) { + if (!(count & RWSEM_WRITER_MASK) && + (count & RWSEM_READER_MASK)) { __rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q); /* * The wakeup is normally called _after_ the wait_lock @@ -573,8 +522,9 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) wake_q_init(&wake_q); } - } else - count = atomic_long_add_return(RWSEM_WAITING_BIAS, &sem->count); + } else { + count = atomic_long_add_return(RWSEM_FLAG_WAITERS, &sem->count); + } /* wait until we successfully acquire the lock */ set_current_state(state); @@ -591,7 +541,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) schedule(); lockevent_inc(rwsem_sleep_writer); set_current_state(state); - } while ((count = atomic_long_read(&sem->count)) & RWSEM_ACTIVE_MASK); + count = atomic_long_read(&sem->count); + } while (RWSEM_COUNT_LOCKED(count)); raw_spin_lock_irq(&sem->wait_lock); } @@ -607,7 +558,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) raw_spin_lock_irq(&sem->wait_lock); list_del(&waiter.list); if (list_empty(&sem->wait_list)) - atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count); + atomic_long_add(-RWSEM_FLAG_WAITERS, &sem->count); else __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); raw_spin_unlock_irq(&sem->wait_lock); diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 6d2478c..1febd17 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -37,24 +37,26 @@ #endif /* - * R/W semaphores originally for PPC using the stuff in lib/rwsem.c. - * Adapted largely from include/asm-i386/rwsem.h - * by Paul Mackerras . - */ - -/* - * the semaphore definition + * The definition of the atomic counter in the semaphore: + * + * Bit 0 - writer locked bit + * Bit 1 - waiters present bit + * Bits 2-7 - reserved + * Bits 8-X - 24-bit (32-bit) or 56-bit reader count + * + * atomic_long_fetch_add() is used to obtain reader lock, whereas + * atomic_long_cmpxchg() will be used to obtain writer lock. */ -#ifdef CONFIG_64BIT -# define RWSEM_ACTIVE_MASK 0xffffffffL -#else -# define RWSEM_ACTIVE_MASK 0x0000ffffL -#endif +#define RWSEM_WRITER_LOCKED (1UL << 0) +#define RWSEM_FLAG_WAITERS (1UL << 1) +#define RWSEM_READER_SHIFT 8 +#define RWSEM_READER_BIAS (1UL << RWSEM_READER_SHIFT) +#define RWSEM_READER_MASK (~(RWSEM_READER_BIAS - 1)) +#define RWSEM_WRITER_MASK RWSEM_WRITER_LOCKED +#define RWSEM_LOCK_MASK (RWSEM_WRITER_MASK|RWSEM_READER_MASK) +#define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS) -#define RWSEM_ACTIVE_BIAS 0x00000001L -#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1) -#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS -#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) +#define RWSEM_COUNT_LOCKED(c) ((c) & RWSEM_LOCK_MASK) #ifdef CONFIG_RWSEM_SPIN_ON_OWNER /* @@ -169,7 +171,8 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) */ static inline void __down_read(struct rw_semaphore *sem) { - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { + if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, + &sem->count) & RWSEM_READ_FAILED_MASK)) { rwsem_down_read_failed(sem); DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED), sem); @@ -180,7 +183,8 @@ static inline void __down_read(struct rw_semaphore *sem) static inline int __down_read_killable(struct rw_semaphore *sem) { - if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) { + if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, + &sem->count) & RWSEM_READ_FAILED_MASK)) { if (IS_ERR(rwsem_down_read_failed_killable(sem))) return -EINTR; DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & @@ -195,9 +199,10 @@ static inline int __down_read_trylock(struct rw_semaphore *sem) { long tmp; - while ((tmp = atomic_long_read(&sem->count)) >= 0) { + while (!((tmp = atomic_long_read(&sem->count)) & + RWSEM_READ_FAILED_MASK)) { if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, - tmp + RWSEM_ACTIVE_READ_BIAS)) { + tmp + RWSEM_READER_BIAS)) { rwsem_set_reader_owned(sem); return 1; } @@ -210,22 +215,16 @@ static inline int __down_read_trylock(struct rw_semaphore *sem) */ static inline void __down_write(struct rw_semaphore *sem) { - long tmp; - - tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count); - if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) + if (unlikely(atomic_long_cmpxchg_acquire(&sem->count, 0, + RWSEM_WRITER_LOCKED))) rwsem_down_write_failed(sem); rwsem_set_owner(sem); } static inline int __down_write_killable(struct rw_semaphore *sem) { - long tmp; - - tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count); - if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS)) + if (unlikely(atomic_long_cmpxchg_acquire(&sem->count, 0, + RWSEM_WRITER_LOCKED))) if (IS_ERR(rwsem_down_write_failed_killable(sem))) return -EINTR; rwsem_set_owner(sem); @@ -234,15 +233,11 @@ static inline int __down_write_killable(struct rw_semaphore *sem) static inline int __down_write_trylock(struct rw_semaphore *sem) { - long tmp; - - tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, - RWSEM_ACTIVE_WRITE_BIAS); - if (tmp == RWSEM_UNLOCKED_VALUE) { + bool taken = !atomic_long_cmpxchg_acquire(&sem->count, 0, + RWSEM_WRITER_LOCKED); + if (taken) rwsem_set_owner(sem); - return true; - } - return false; + return taken; } /* @@ -255,8 +250,9 @@ static inline void __up_read(struct rw_semaphore *sem) DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED), sem); rwsem_clear_reader_owned(sem); - tmp = atomic_long_dec_return_release(&sem->count); - if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)) + tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count); + if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) + == RWSEM_FLAG_WAITERS)) rwsem_wake(sem); } @@ -267,8 +263,8 @@ static inline void __up_write(struct rw_semaphore *sem) { DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem); rwsem_clear_owner(sem); - if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS, - &sem->count) < 0)) + if (unlikely(atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, + &sem->count) & RWSEM_FLAG_WAITERS)) rwsem_wake(sem); } @@ -287,8 +283,9 @@ static inline void __downgrade_write(struct rw_semaphore *sem) * write side. As such, rely on RELEASE semantics. */ DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem); - tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count); + tmp = atomic_long_fetch_add_release( + -RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS, &sem->count); rwsem_set_reader_owned(sem); - if (tmp < 0) + if (tmp & RWSEM_FLAG_WAITERS) rwsem_downgrade_wake(sem); } From patchwork Thu Feb 7 19:07:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801961 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5778017FB for ; Thu, 7 Feb 2019 19:09:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 478D62DDB2 for ; Thu, 7 Feb 2019 19:09:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3B72A2DDBE; Thu, 7 Feb 2019 19:09:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3F9C52DDB2 for ; Thu, 7 Feb 2019 19:09:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727403AbfBGTJs (ORCPT ); Thu, 7 Feb 2019 14:09:48 -0500 Received: from mx1.redhat.com ([209.132.183.28]:42340 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727122AbfBGTJr (ORCPT ); Thu, 7 Feb 2019 14:09:47 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E1A64C06A7F9; Thu, 7 Feb 2019 19:09:46 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3FAF461146; Thu, 7 Feb 2019 19:09:42 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 12/22] locking/rwsem: Implement lock handoff to prevent lock starvation Date: Thu, 7 Feb 2019 14:07:16 -0500 Message-Id: <1549566446-27967-13-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 07 Feb 2019 19:09:47 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Because of writer lock stealing, it is possible that a constant stream of incoming writers will cause a waiting writer or reader to wait indefinitely leading to lock starvation. The mutex code has a lock handoff mechanism to prevent lock starvation. This patch implements a similar lock handoff mechanism to disable lock stealing and force lock handoff to the first waiter in the queue after at least a 5ms waiting period. The waiting period is used to avoid discouraging lock stealing too much to affect performance. A rwsem microbenchmark was run for 8 seconds on a 4-socket 56-core 128-thread x86-64 system with a v5.0 based kernel and 112 write_lock threads with 5us sleep critical section. Before the patch, the min/mean/max numbers of locking operations for the locking threads were 1/7,708/542,988. After the patch, the figures became 6,031/7,267/9,476. It can be seen that the rwsem became much more fair, though there was a slight drop of about 6% in the mean locking operations done which was a tradeoff of having better fairness. Signed-off-by: Waiman Long --- kernel/locking/lock_events_list.h | 2 + kernel/locking/rwsem-xadd.c | 110 +++++++++++++++++++++++++++++++------- kernel/locking/rwsem-xadd.h | 23 +++++--- 3 files changed, 109 insertions(+), 26 deletions(-) diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index c33c5df..4cde507 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -62,6 +62,8 @@ LOCK_EVENT(rwsem_rlock) /* # of read locks acquired */ LOCK_EVENT(rwsem_rlock_fast) /* # of fast read locks acquired */ LOCK_EVENT(rwsem_rlock_fail) /* # of failed read lock acquisitions */ +LOCK_EVENT(rwsem_rlock_handoff) /* # of read lock handoffs */ LOCK_EVENT(rwsem_wlock) /* # of write locks acquired */ LOCK_EVENT(rwsem_wlock_fail) /* # of failed write lock acquisitions */ +LOCK_EVENT(rwsem_wlock_handoff) /* # of write lock handoffs */ #endif /* CONFIG_RWSEM_XCHGADD_ALGORITHM */ diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 18a414e..12b1d61 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -74,6 +74,7 @@ struct rwsem_waiter { struct list_head list; struct task_struct *task; enum rwsem_waiter_type type; + unsigned long timeout; }; enum rwsem_wake_type { @@ -83,6 +84,12 @@ enum rwsem_wake_type { }; /* + * The minimum waiting time (5ms) in the wait queue before initiating the + * handoff protocol. + */ +#define RWSEM_WAIT_TIMEOUT ((HZ - 1)/200 + 1) + +/* * handle the lock release when processes blocked on it that can now run * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must * have been set. @@ -132,6 +139,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, adjustment = RWSEM_READER_BIAS; oldcount = atomic_long_fetch_add(adjustment, &sem->count); if (unlikely(oldcount & RWSEM_WRITER_MASK)) { + /* + * Initiate handoff to reader, if applicable. + */ + if (!(oldcount & RWSEM_FLAG_HANDOFF) && + time_after(jiffies, waiter->timeout)) { + adjustment -= RWSEM_FLAG_HANDOFF; + lockevent_inc(rwsem_rlock_handoff); + } + atomic_long_sub(adjustment, &sem->count); return; } @@ -180,6 +196,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, adjustment -= RWSEM_FLAG_WAITERS; } + /* + * Clear the handoff flag + */ + if (woken && RWSEM_COUNT_HANDOFF(atomic_long_read(&sem->count))) + adjustment -= RWSEM_FLAG_HANDOFF; + if (adjustment) atomic_long_add(adjustment, &sem->count); } @@ -189,15 +211,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, * race conditions between checking the rwsem wait list and setting the * sem->count accordingly. */ -static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem) +static inline bool +rwsem_try_write_lock(long count, struct rw_semaphore *sem, bool first) { long new; if (RWSEM_COUNT_LOCKED(count)) return false; - new = count + RWSEM_WRITER_LOCKED - - (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0); + if (!first && RWSEM_COUNT_HANDOFF(count)) + return false; + + new = (count & ~RWSEM_FLAG_HANDOFF) + RWSEM_WRITER_LOCKED - + (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0); if (atomic_long_cmpxchg_acquire(&sem->count, count, new) == count) { rwsem_set_owner(sem); @@ -216,7 +242,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem) long old, count = atomic_long_read(&sem->count); while (true) { - if (RWSEM_COUNT_LOCKED(count)) + if (RWSEM_COUNT_LOCKED_OR_HANDOFF(count)) return false; old = atomic_long_cmpxchg_acquire(&sem->count, count, @@ -374,6 +400,16 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) #endif /* + * This is safe to be called without holding the wait_lock. + */ +static inline bool +rwsem_waiter_is_first(struct rw_semaphore *sem, struct rwsem_waiter *waiter) +{ + return list_first_entry(&sem->wait_list, struct rwsem_waiter, list) + == waiter; +} + +/* * Wait for the read lock to be granted */ static inline struct rw_semaphore __sched * @@ -385,6 +421,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) waiter.task = current; waiter.type = RWSEM_WAITING_FOR_READ; + waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT; raw_spin_lock_irq(&sem->wait_lock); if (list_empty(&sem->wait_list)) { @@ -441,8 +478,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) return sem; out_nolock: list_del(&waiter.list); - if (list_empty(&sem->wait_list)) - atomic_long_add(-RWSEM_FLAG_WAITERS, &sem->count); + if (list_empty(&sem->wait_list)) { + int adjustment = -RWSEM_FLAG_WAITERS - + (atomic_long_read(&sem->count) & RWSEM_FLAG_HANDOFF); + + atomic_long_add(adjustment, &sem->count); + } raw_spin_unlock_irq(&sem->wait_lock); __set_current_state(TASK_RUNNING); lockevent_inc(rwsem_rlock_fail); @@ -469,8 +510,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) static inline struct rw_semaphore * __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state) { - long count; - bool waiting = true; /* any queued threads before us */ + long count, adjustment; + bool first; /* First one in queue */ struct rwsem_waiter waiter; struct rw_semaphore *ret = sem; DEFINE_WAKE_Q(wake_q); @@ -485,17 +526,17 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) */ waiter.task = current; waiter.type = RWSEM_WAITING_FOR_WRITE; + waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT; raw_spin_lock_irq(&sem->wait_lock); /* account for this before adding a new element to the list */ - if (list_empty(&sem->wait_list)) - waiting = false; + first = list_empty(&sem->wait_list); list_add_tail(&waiter.list, &sem->wait_list); /* we're now waiting on the lock, but no longer actively locking */ - if (waiting) { + if (!first) { count = atomic_long_read(&sem->count); /* @@ -529,12 +570,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) /* wait until we successfully acquire the lock */ set_current_state(state); while (true) { - if (rwsem_try_write_lock(count, sem)) + if (rwsem_try_write_lock(count, sem, first)) break; + raw_spin_unlock_irq(&sem->wait_lock); /* Block until there are no active lockers. */ - do { + for (;;) { if (signal_pending_state(state, current)) goto out_nolock; @@ -542,7 +584,29 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) lockevent_inc(rwsem_sleep_writer); set_current_state(state); count = atomic_long_read(&sem->count); - } while (RWSEM_COUNT_LOCKED(count)); + + if (!first) + first = rwsem_waiter_is_first(sem, &waiter); + + if (!RWSEM_COUNT_LOCKED(count)) + break; + + if (first && !RWSEM_COUNT_HANDOFF(count) && + time_after(jiffies, waiter.timeout)) { + atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count); + /* + * Make sure the handoff bit is seen by + * others before proceeding. + */ + smp_mb__after_atomic(); + lockevent_inc(rwsem_wlock_handoff); + /* + * Do a trylock after setting the handoff + * flag to avoid missed wakeup. + */ + break; + } + } raw_spin_lock_irq(&sem->wait_lock); } @@ -557,9 +621,15 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) __set_current_state(TASK_RUNNING); raw_spin_lock_irq(&sem->wait_lock); list_del(&waiter.list); + adjustment = 0; if (list_empty(&sem->wait_list)) - atomic_long_add(-RWSEM_FLAG_WAITERS, &sem->count); - else + adjustment -= RWSEM_FLAG_WAITERS; + if (first) + adjustment -= (atomic_long_read(&sem->count) & + RWSEM_FLAG_HANDOFF); + if (adjustment) + atomic_long_add(adjustment, &sem->count); + if (!list_empty(&sem->wait_list)) __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); raw_spin_unlock_irq(&sem->wait_lock); wake_up_q(&wake_q); @@ -587,7 +657,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem) * - up_read/up_write has decremented the active part of count if we come here */ __visible -struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem) +struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count) { unsigned long flags; DEFINE_WAKE_Q(wake_q); @@ -620,7 +690,9 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem) smp_rmb(); /* - * If a spinner is present, it is not necessary to do the wakeup. + * If a spinner is present and the handoff flag isn't set, it is + * not necessary to do the wakeup. + * * Try to do wakeup only if the trylock succeeds to minimize * spinlock contention which may introduce too much delay in the * unlock operation. @@ -639,7 +711,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem) * rwsem_has_spinner() is true, it will guarantee at least one * trylock attempt on the rwsem later on. */ - if (rwsem_has_spinner(sem)) { + if (rwsem_has_spinner(sem) && !RWSEM_COUNT_HANDOFF(count)) { /* * The smp_rmb() here is to make sure that the spinner * state is consulted before reading the wait_lock. diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 1febd17..6d4890d 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -41,7 +41,8 @@ * * Bit 0 - writer locked bit * Bit 1 - waiters present bit - * Bits 2-7 - reserved + * Bit 2 - lock handoff bit + * Bits 3-7 - reserved * Bits 8-X - 24-bit (32-bit) or 56-bit reader count * * atomic_long_fetch_add() is used to obtain reader lock, whereas @@ -49,14 +50,20 @@ */ #define RWSEM_WRITER_LOCKED (1UL << 0) #define RWSEM_FLAG_WAITERS (1UL << 1) +#define RWSEM_FLAG_HANDOFF (1UL << 2) + #define RWSEM_READER_SHIFT 8 #define RWSEM_READER_BIAS (1UL << RWSEM_READER_SHIFT) #define RWSEM_READER_MASK (~(RWSEM_READER_BIAS - 1)) #define RWSEM_WRITER_MASK RWSEM_WRITER_LOCKED #define RWSEM_LOCK_MASK (RWSEM_WRITER_MASK|RWSEM_READER_MASK) -#define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS) +#define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\ + RWSEM_FLAG_HANDOFF) #define RWSEM_COUNT_LOCKED(c) ((c) & RWSEM_LOCK_MASK) +#define RWSEM_COUNT_HANDOFF(c) ((c) & RWSEM_FLAG_HANDOFF) +#define RWSEM_COUNT_LOCKED_OR_HANDOFF(c) \ + ((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF)) #ifdef CONFIG_RWSEM_SPIN_ON_OWNER /* @@ -163,7 +170,7 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem); extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem); extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem); -extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem); +extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count); extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem); /* @@ -253,7 +260,7 @@ static inline void __up_read(struct rw_semaphore *sem) tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count); if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) == RWSEM_FLAG_WAITERS)) - rwsem_wake(sem); + rwsem_wake(sem, tmp); } /* @@ -261,11 +268,13 @@ static inline void __up_read(struct rw_semaphore *sem) */ static inline void __up_write(struct rw_semaphore *sem) { + long tmp; + DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem); rwsem_clear_owner(sem); - if (unlikely(atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, - &sem->count) & RWSEM_FLAG_WAITERS)) - rwsem_wake(sem); + tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count); + if (unlikely(tmp & RWSEM_FLAG_WAITERS)) + rwsem_wake(sem, tmp); } /* From patchwork Thu Feb 7 19:07:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801991 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A9A1F746 for ; Thu, 7 Feb 2019 19:11:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9A90D2CD35 for ; Thu, 7 Feb 2019 19:11:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8E3912E060; Thu, 7 Feb 2019 19:11:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25A752E05F for ; Thu, 7 Feb 2019 19:11:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727430AbfBGTJv (ORCPT ); Thu, 7 Feb 2019 14:09:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50730 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727122AbfBGTJu (ORCPT ); Thu, 7 Feb 2019 14:09:50 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3875A10FA1; Thu, 7 Feb 2019 19:09:49 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0629D60FDF; Thu, 7 Feb 2019 19:09:46 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 13/22] locking/rwsem: Remove rwsem_wake() wakeup optimization Date: Thu, 7 Feb 2019 14:07:17 -0500 Message-Id: <1549566446-27967-14-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 07 Feb 2019 19:09:49 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With the commit 59aabfc7e959 ("locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()"), the rwsem_wake() forgoes doing a wakeup if the wait_lock cannot be directly acquired and an optimistic spinning locker is present. This can help performance by avoiding spinning on the wait_lock when it is contended. With the later commit 133e89ef5ef3 ("locking/rwsem: Enable lockless waiter wakeup(s)"), the performance advantage of the above optimization diminishes as the average wait_lock hold time become much shorter. By supporting rwsem lock handoff, we can no longer relies on the fact that the presence of an optimistic spinning locker will ensure that the lock will be acquired by a task soon. This can lead to missed wakeup and application hang. So the commit 59aabfc7e959 ("locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()") will have to be reverted. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 74 --------------------------------------------- 1 file changed, 74 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 12b1d61..5f74bae 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -378,25 +378,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) lockevent_cond_inc(rwsem_opt_fail, !taken); return taken; } - -/* - * Return true if the rwsem has active spinner - */ -static inline bool rwsem_has_spinner(struct rw_semaphore *sem) -{ - return osq_is_locked(&sem->osq); -} - #else static bool rwsem_optimistic_spin(struct rw_semaphore *sem) { return false; } - -static inline bool rwsem_has_spinner(struct rw_semaphore *sem) -{ - return false; -} #endif /* @@ -662,67 +648,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count) unsigned long flags; DEFINE_WAKE_Q(wake_q); - /* - * __rwsem_down_write_failed_common(sem) - * rwsem_optimistic_spin(sem) - * osq_unlock(sem->osq) - * ... - * atomic_long_add_return(&sem->count) - * - * - VS - - * - * __up_write() - * if (atomic_long_sub_return_release(&sem->count) < 0) - * rwsem_wake(sem) - * osq_is_locked(&sem->osq) - * - * And __up_write() must observe !osq_is_locked() when it observes the - * atomic_long_add_return() in order to not miss a wakeup. - * - * This boils down to: - * - * [S.rel] X = 1 [RmW] r0 = (Y += 0) - * MB RMB - * [RmW] Y += 1 [L] r1 = X - * - * exists (r0=1 /\ r1=0) - */ - smp_rmb(); - - /* - * If a spinner is present and the handoff flag isn't set, it is - * not necessary to do the wakeup. - * - * Try to do wakeup only if the trylock succeeds to minimize - * spinlock contention which may introduce too much delay in the - * unlock operation. - * - * spinning writer up_write/up_read caller - * --------------- ----------------------- - * [S] osq_unlock() [L] osq - * MB RMB - * [RmW] rwsem_try_write_lock() [RmW] spin_trylock(wait_lock) - * - * Here, it is important to make sure that there won't be a missed - * wakeup while the rwsem is free and the only spinning writer goes - * to sleep without taking the rwsem. Even when the spinning writer - * is just going to break out of the waiting loop, it will still do - * a trylock in rwsem_down_write_failed() before sleeping. IOW, if - * rwsem_has_spinner() is true, it will guarantee at least one - * trylock attempt on the rwsem later on. - */ - if (rwsem_has_spinner(sem) && !RWSEM_COUNT_HANDOFF(count)) { - /* - * The smp_rmb() here is to make sure that the spinner - * state is consulted before reading the wait_lock. - */ - smp_rmb(); - if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags)) - return sem; - goto locked; - } raw_spin_lock_irqsave(&sem->wait_lock, flags); -locked: if (!list_empty(&sem->wait_list)) __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q); From patchwork Thu Feb 7 19:07:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801969 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E63F3746 for ; Thu, 7 Feb 2019 19:10:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D5A9C2CCF9 for ; Thu, 7 Feb 2019 19:10:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C9FC02DDB9; Thu, 7 Feb 2019 19:10:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 443202CCF9 for ; Thu, 7 Feb 2019 19:10:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727475AbfBGTJ5 (ORCPT ); Thu, 7 Feb 2019 14:09:57 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47220 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727441AbfBGTJw (ORCPT ); Thu, 7 Feb 2019 14:09:52 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7107B3C2CFC; Thu, 7 Feb 2019 19:09:51 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 518E761146; Thu, 7 Feb 2019 19:09:49 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 14/22] locking/rwsem: Add more rwsem owner access helpers Date: Thu, 7 Feb 2019 14:07:18 -0500 Message-Id: <1549566446-27967-15-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 07 Feb 2019 19:09:52 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Before combining owner and count, we are adding two new helpers for accessing the owner value in the rwsem. 1) struct task_struct *rwsem_get_owner(struct rw_semaphore *sem) 2) bool is_rwsem_reader_owned(struct rw_semaphore *sem) Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 11 ++++++----- kernel/locking/rwsem-xadd.h | 32 ++++++++++++++++++++++++++------ kernel/locking/rwsem.c | 3 +-- 3 files changed, 33 insertions(+), 13 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 5f74bae..719d390 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -277,7 +277,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) return false; rcu_read_lock(); - owner = READ_ONCE(sem->owner); + owner = rwsem_get_owner(sem); if (owner) { ret = is_rwsem_owner_spinnable(owner) && owner_on_cpu(owner); @@ -291,13 +291,13 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) */ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) { - struct task_struct *owner = READ_ONCE(sem->owner); + struct task_struct *owner = rwsem_get_owner(sem); if (!is_rwsem_owner_spinnable(owner)) return false; rcu_read_lock(); - while (owner && (READ_ONCE(sem->owner) == owner)) { + while (owner && (rwsem_get_owner(sem) == owner)) { /* * Ensure we emit the owner->on_cpu, dereference _after_ * checking sem->owner still matches owner, if that fails, @@ -323,7 +323,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) * If there is a new owner or the owner is not set, we continue * spinning. */ - return is_rwsem_owner_spinnable(READ_ONCE(sem->owner)); + return is_rwsem_owner_spinnable(rwsem_get_owner(sem)); } static bool rwsem_optimistic_spin(struct rw_semaphore *sem) @@ -361,7 +361,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) * we're an RT task that will live-lock because we won't let * the owner complete. */ - if (!sem->owner && (need_resched() || rt_task(current))) + if (!rwsem_get_owner(sem) && + (need_resched() || rt_task(current))) break; /* diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 6d4890d..277a134 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -83,6 +83,11 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem) WRITE_ONCE(sem->owner, NULL); } +static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem) +{ + return READ_ONCE(sem->owner); +} + /* * The task_struct pointer of the last owning reader will be left in * the owner field. @@ -116,6 +121,23 @@ static inline bool is_rwsem_owner_spinnable(struct task_struct *owner) } /* + * Return true if the rwsem is owned by a reader. + */ +static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem) +{ +#ifdef CONFIG_DEBUG_RWSEMS + /* + * Check the count to see if it is write-locked. + */ + long count = atomic_long_read(&sem->count); + + if (count & RWSEM_WRITER_MASK) + return false; +#endif + return (unsigned long)sem->owner & RWSEM_READER_OWNED; +} + +/* * Return true if rwsem is owned by an anonymous writer or readers. */ static inline bool rwsem_has_anonymous_owner(struct task_struct *owner) @@ -135,6 +157,7 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) { unsigned long val = (unsigned long)current | RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED; + if (READ_ONCE(sem->owner) == (struct task_struct *)val) cmpxchg_relaxed((unsigned long *)&sem->owner, val, RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED); @@ -181,8 +204,7 @@ static inline void __down_read(struct rw_semaphore *sem) if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count) & RWSEM_READ_FAILED_MASK)) { rwsem_down_read_failed(sem); - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & - RWSEM_READER_OWNED), sem); + DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); } else { rwsem_set_reader_owned(sem); } @@ -194,8 +216,7 @@ static inline int __down_read_killable(struct rw_semaphore *sem) &sem->count) & RWSEM_READ_FAILED_MASK)) { if (IS_ERR(rwsem_down_read_failed_killable(sem))) return -EINTR; - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & - RWSEM_READER_OWNED), sem); + DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); } else { rwsem_set_reader_owned(sem); } @@ -254,8 +275,7 @@ static inline void __up_read(struct rw_semaphore *sem) { long tmp; - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED), - sem); + DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); rwsem_clear_reader_owned(sem); tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count); if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index bdfca7c..79fa6e4 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -203,8 +203,7 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass) void up_read_non_owner(struct rw_semaphore *sem) { - DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED), - sem); + DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); __up_read(sem); } From patchwork Thu Feb 7 19:07:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801965 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 23607746 for ; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1066C2CCF9 for ; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 03BBB2DDB9; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 33FC32DDB2 for ; Thu, 7 Feb 2019 19:10:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727481AbfBGTJ5 (ORCPT ); Thu, 7 Feb 2019 14:09:57 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49254 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726882AbfBGTJy (ORCPT ); Thu, 7 Feb 2019 14:09:54 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A159E41A59; Thu, 7 Feb 2019 19:09:53 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 873DA62FA5; Thu, 7 Feb 2019 19:09:51 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64 Date: Thu, 7 Feb 2019 14:07:19 -0500 Message-Id: <1549566446-27967-16-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Thu, 07 Feb 2019 19:09:54 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With separate count and owner, there are timing windows where the two values are inconsistent. That can cause problem when trying to figure out the exact state of the rwsem. For instance, a RT task will stop optimistic spinning if the lock is acquired by a writer but the owner field isn't set yet. That can be solved by combining the count and owner together in a single atomic value. On 32-bit architectures, there aren't enough bits to hold both. 64-bit architectures, however, can have enough bits to do that. For x86-64, the physical address can use up to 52 bits. That is 4PB of memory. That leaves 12 bits available for other use. The task structure pointer is also aligned to the L1 cache size. That means another 6 bits (64 bytes cacheline) will be available. Reserving 2 bits for status flags, we will have 16 bits for the reader count. That can supports up to (64k-1) readers. The owner value will still be duplicated in the owner field for the purpose of signalling that the task is in the process of acquiring or releasing a rwsem. This change is currently for x86-64 only. Other 64-bit architectures may be enabled in the future if the need arises. With a locking microbenchmark running on 5.0 based kernel, the total locking rates (in kops/s) of the benchmark on a 4-socket 56-core x86-64 system before and after the patch were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 29,085 30,179 27,892 29,514 2 7,341 14,084 6,240 14,304 4 7,393 14,246 5,216 11,754 8 7,139 13,860 5,400 11,308 16 6,650 15,773 5,744 15,405 This change does have an impact on both read and write lock performance. Signed-off-by: Waiman Long Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 20 +++++++-- kernel/locking/rwsem-xadd.h | 105 +++++++++++++++++++++++++++++++++++++++----- 2 files changed, 110 insertions(+), 15 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 719d390..0869fbf 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -27,11 +27,11 @@ /* * Guide to the rw_semaphore's count field. * - * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned - * by a writer. + * When any of the RWSEM_WRITER_MASK bits in count is set, the lock is + * owned by a writer. * * The lock is owned by readers when - * (1) the RWSEM_WRITER_LOCKED isn't set in count, + * (1) none of the RWSEM_WRITER_MASK bits is set in count, * (2) some of the reader bits are set in count, and * (3) the owner field has RWSEM_READ_OWNED bit set. * @@ -47,6 +47,11 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, struct lock_class_key *key) { + /* + * We should support at least (4k-1) concurrent readers + */ + BUILD_BUG_ON(sizeof(long) * 8 - RWSEM_READER_SHIFT < 12); + #ifdef CONFIG_DEBUG_LOCK_ALLOC /* * Make sure we are not reinitializing a held semaphore: @@ -297,7 +302,14 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) return false; rcu_read_lock(); - while (owner && (rwsem_get_owner(sem) == owner)) { + /* + * In case the owner task pointer is also stored in the count, + * checking the sem->owner value alone will give an early indication + * if the owner is about to release the lock (sem->owner cleared). + * This enables the spinner to move forward and do a trylock + * earlier. + */ + while (owner && (READ_ONCE(sem->owner) == owner)) { /* * Ensure we emit the owner->on_cpu, dereference _after_ * checking sem->owner still matches owner, if that fails, diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 277a134..d54b5db 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -37,25 +37,73 @@ #endif /* - * The definition of the atomic counter in the semaphore: + * With separate count and owner, there are timing windows where the two + * values are inconsistent. That can cause problem when trying to figure + * out the exact state of the rwsem. That can be solved by combining + * the count and owner together in a single atomic value. * - * Bit 0 - writer locked bit - * Bit 1 - waiters present bit - * Bit 2 - lock handoff bit - * Bits 3-7 - reserved - * Bits 8-X - 24-bit (32-bit) or 56-bit reader count + * On 64-bit architectures, the owner task structure pointer can be + * compressed and combined with reader count and other status flags. + * A simple compression method is to map the virtual address back to + * the physical address by subtracting PAGE_OFFSET. On 32-bit + * architectures, the long integer value just isn't big enough for + * combining owner and count. So they remain separate. + * + * For x86-64, the physical address can use up to 52 bits. That is 4PB + * of memory. That leaves 12 bits available for other use. The task + * structure pointer is also aligned to the L1 cache size. That means + * another 6 bits (64 bytes cacheline) will be available. Reserving + * 2 bits for status flags, we will have 16 bits for the reader count. + * That can supports up to (64k-1) readers. + * + * On x86-64, the bit definitions of the count are: + * + * Bit 0 - waiters present bit + * Bit 1 - lock handoff bit + * Bits 2-47 - compressed task structure pointer + * Bits 48-63 - 16-bit reader counts + * + * On other 64-bit architectures, the bit definitions are: + * + * Bit 0 - waiters present bit + * Bit 1 - lock handoff bit + * Bits 2-6 - reserved + * Bit 7 - writer lock bit + * Bits 8-63 - 56-bit reader counts + * + * On 32-bit architectures, the bit definitions of the count are: + * + * Bit 0 - waiters present bit + * Bit 1 - lock handoff bit + * Bits 2-6 - reserved + * Bit 7 - writer lock bit + * Bits 8-31 - 24-bit reader counts * * atomic_long_fetch_add() is used to obtain reader lock, whereas * atomic_long_cmpxchg() will be used to obtain writer lock. */ -#define RWSEM_WRITER_LOCKED (1UL << 0) -#define RWSEM_FLAG_WAITERS (1UL << 1) -#define RWSEM_FLAG_HANDOFF (1UL << 2) +#define RWSEM_FLAG_WAITERS (1UL << 0) +#define RWSEM_FLAG_HANDOFF (1UL << 1) +#ifdef CONFIG_X86_64 + +#ifdef __PHYSICAL_MASK_SHIFT +#define RWSEM_PA_MASK_SHIFT __PHYSICAL_MASK_SHIFT +#else +#define RWSEM_PA_MASK_SHIFT 52 +#endif +#define RWSEM_READER_SHIFT (RWSEM_PA_MASK_SHIFT - L1_CACHE_SHIFT + 2) +#define RWSEM_WRITER_MASK ((1UL << RWSEM_READER_SHIFT) - 4) +#define RWSEM_WRITER_LOCKED rwsem_owner_count(current) + +#else /* CONFIG_X86_64 */ +#define RWSEM_WRITER_MASK (1UL << 7) #define RWSEM_READER_SHIFT 8 +#define RWSEM_WRITER_LOCKED RWSEM_WRITER_MASK +#endif /* CONFIG_X86_64 */ + #define RWSEM_READER_BIAS (1UL << RWSEM_READER_SHIFT) #define RWSEM_READER_MASK (~(RWSEM_READER_BIAS - 1)) -#define RWSEM_WRITER_MASK RWSEM_WRITER_LOCKED #define RWSEM_LOCK_MASK (RWSEM_WRITER_MASK|RWSEM_READER_MASK) #define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\ RWSEM_FLAG_HANDOFF) @@ -65,6 +113,21 @@ #define RWSEM_COUNT_LOCKED_OR_HANDOFF(c) \ ((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF)) +/* + * Task structure pointer compression (64-bit only): + * (owner - PAGE_OFFSET) >> (L1_CACHE_SHIFT - 2) + */ +static inline unsigned long rwsem_owner_count(struct task_struct *owner) +{ + return ((unsigned long)owner - PAGE_OFFSET) >> (L1_CACHE_SHIFT - 2); +} + +static inline unsigned long rwsem_count_owner(long count) +{ + return (((unsigned long)count & RWSEM_WRITER_MASK) + << (L1_CACHE_SHIFT - 2)) + PAGE_OFFSET; +} + #ifdef CONFIG_RWSEM_SPIN_ON_OWNER /* * All writes to owner are protected by WRITE_ONCE() to make sure that @@ -72,7 +135,12 @@ * the owner value concurrently without lock. Read from owner, however, * may not need READ_ONCE() as long as the pointer value is only used * for comparison and isn't being dereferenced. + * + * On 32-bit architectures, the owner and count are separate. On 64-bit + * architectures, however, the writer task structure pointer is written + * to the count as well in addition to the owner field. */ + static inline void rwsem_set_owner(struct rw_semaphore *sem) { WRITE_ONCE(sem->owner, current); @@ -83,10 +151,22 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem) WRITE_ONCE(sem->owner, NULL); } +#ifdef CONFIG_X86_64 +/* + * Get the owner value from count to have early access to the task structure. + */ +static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem) +{ + return (struct task_struct *) + (rwsem_count_owner(atomic_long_read(&sem->count)) | + ((unsigned long)READ_ONCE(sem->owner) & 3)); +} +#else /* !CONFIG_X86_64 */ static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem) { return READ_ONCE(sem->owner); } +#endif /* CONFIG_X86_64 */ /* * The task_struct pointer of the last owning reader will be left in @@ -291,8 +371,11 @@ static inline void __up_write(struct rw_semaphore *sem) long tmp; DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem); +#ifdef CONFIG_X86_64 + DEBUG_RWSEMS_WARN_ON(sem->owner != rwsem_get_owner(sem), sem); +#endif rwsem_clear_owner(sem); - tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count); + tmp = atomic_long_fetch_and_release(~RWSEM_WRITER_MASK, &sem->count); if (unlikely(tmp & RWSEM_FLAG_WAITERS)) rwsem_wake(sem, tmp); } From patchwork Thu Feb 7 19:07:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801987 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 355D417FB for ; Thu, 7 Feb 2019 19:10:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 271292CD35 for ; Thu, 7 Feb 2019 19:10:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1B2A22E060; Thu, 7 Feb 2019 19:10:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A6A582CD35 for ; Thu, 7 Feb 2019 19:10:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727533AbfBGTKQ (ORCPT ); Thu, 7 Feb 2019 14:10:16 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60284 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727122AbfBGTJ4 (ORCPT ); Thu, 7 Feb 2019 14:09:56 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id ED98F7AE81; Thu, 7 Feb 2019 19:09:55 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id BF6A560FDF; Thu, 7 Feb 2019 19:09:53 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 16/22] locking/rwsem: Remove redundant computation of writer lock word Date: Thu, 7 Feb 2019 14:07:20 -0500 Message-Id: <1549566446-27967-17-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:09:56 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 64-bit architectures, each rwsem writer will have its unique lock word for acquiring the lock. Right now, the writer code recomputes the lock word every time it tries to acquire the lock. This is a waste of time. The lock word is now cached and reused when it is needed. On 32-bit architectures, the extra constant argument to rwsem_try_write_lock() and rwsem_try_write_lock_unqueued() should be optimized out by the compiler. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 0869fbf..16dc7a1 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -216,8 +216,8 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, * race conditions between checking the rwsem wait list and setting the * sem->count accordingly. */ -static inline bool -rwsem_try_write_lock(long count, struct rw_semaphore *sem, bool first) +static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem, + const long wlock, bool first) { long new; @@ -227,7 +227,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, if (!first && RWSEM_COUNT_HANDOFF(count)) return false; - new = (count & ~RWSEM_FLAG_HANDOFF) + RWSEM_WRITER_LOCKED - + new = (count & ~RWSEM_FLAG_HANDOFF) + wlock - (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0); if (atomic_long_cmpxchg_acquire(&sem->count, count, new) == count) { @@ -242,7 +242,8 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, /* * Try to acquire write lock before the writer has been put on wait queue. */ -static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem) +static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem, + const long wlock) { long old, count = atomic_long_read(&sem->count); @@ -251,7 +252,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem) return false; old = atomic_long_cmpxchg_acquire(&sem->count, count, - count + RWSEM_WRITER_LOCKED); + count + wlock); if (old == count) { rwsem_set_owner(sem); lockevent_inc(rwsem_opt_wlock); @@ -338,7 +339,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) return is_rwsem_owner_spinnable(rwsem_get_owner(sem)); } -static bool rwsem_optimistic_spin(struct rw_semaphore *sem) +static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) { bool taken = false; @@ -362,7 +363,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) /* * Try to acquire the lock */ - if (rwsem_try_write_lock_unqueued(sem)) { + if (rwsem_try_write_lock_unqueued(sem, wlock)) { taken = true; break; } @@ -392,7 +393,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) return taken; } #else -static bool rwsem_optimistic_spin(struct rw_semaphore *sem) +static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) { return false; } @@ -514,9 +515,10 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) struct rwsem_waiter waiter; struct rw_semaphore *ret = sem; DEFINE_WAKE_Q(wake_q); + const long wlock = RWSEM_WRITER_LOCKED; /* do optimistic spinning and steal lock if possible */ - if (rwsem_optimistic_spin(sem)) + if (rwsem_optimistic_spin(sem, wlock)) return sem; /* @@ -569,7 +571,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) /* wait until we successfully acquire the lock */ set_current_state(state); while (true) { - if (rwsem_try_write_lock(count, sem, first)) + if (rwsem_try_write_lock(count, sem, wlock, first)) break; raw_spin_unlock_irq(&sem->wait_lock); From patchwork Thu Feb 7 19:07:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801967 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4B241669 for ; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 950752CCF9 for ; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8921A2DDB9; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 212FE2DDB2 for ; Thu, 7 Feb 2019 19:10:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727074AbfBGTKH (ORCPT ); Thu, 7 Feb 2019 14:10:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:42784 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbfBGTKH (ORCPT ); Thu, 7 Feb 2019 14:10:07 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 00B3AC06A802; Thu, 7 Feb 2019 19:10:06 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 18D6161146; Thu, 7 Feb 2019 19:09:56 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 17/22] locking/rwsem: Recheck owner if it is not on cpu Date: Thu, 7 Feb 2019 14:07:21 -0500 Message-Id: <1549566446-27967-18-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 07 Feb 2019 19:10:06 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP After merging the owner value directly into the count field, it was found that the number of failed optimistic spinning operations increased significantly during the boot up process. The cause of this increased failures was tracked down to the condition that a lock holder might have just released the lock and gone to sleep right after its owner value was fetched by a spinner. So the task might have slept, but it was no longer the lock holder. The merging of owner into count increases the chance this condition can happen. To close this failure mode, we are now rechecking the owner value again to see if it has been changed in case it is not on cpu. On a 1-socket x86-64 system, the lock event counts before the patch were: rwsem_opt_fail=5847 rwsem_opt_wlock=7880 rwsem_wlock=5847 After the patch, the counts were: rwsem_opt_fail=225 rwsem_opt_wlock=8541 rwsem_wlock=225 Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 16dc7a1..21d462f 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -263,13 +263,25 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem, } } -static inline bool owner_on_cpu(struct task_struct *owner) +static inline bool owner_on_cpu(struct task_struct *owner, + struct rw_semaphore *sem) { /* * As lock holder preemption issue, we both skip spinning if * task is not on cpu or its cpu is preempted */ - return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner)); + bool oncpu = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner)); + + /* + * There is a slight chance that the lock holder might have + * just release the rwsem and gone to sleep right after we + * fetched the owner value. So we double-check the sem->owner + * field again to see if it has been changed. The sem->owner + * would have been cleared right before the lock was released. + */ + if (!oncpu && (READ_ONCE(sem->owner) != owner)) + return true; /* Assume the new owner is on cpu */ + return oncpu; } static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) @@ -286,7 +298,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) owner = rwsem_get_owner(sem); if (owner) { ret = is_rwsem_owner_spinnable(owner) && - owner_on_cpu(owner); + owner_on_cpu(owner, sem); } rcu_read_unlock(); return ret; @@ -323,7 +335,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) * abort spinning when need_resched or owner is not running or * owner's cpu is preempted. */ - if (need_resched() || !owner_on_cpu(owner)) { + if (need_resched() || !owner_on_cpu(owner, sem)) { rcu_read_unlock(); return false; } From patchwork Thu Feb 7 19:07:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801985 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C126F1669 for ; Thu, 7 Feb 2019 19:10:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B23832E05F for ; Thu, 7 Feb 2019 19:10:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A50B62E077; Thu, 7 Feb 2019 19:10:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 378992CD35 for ; Thu, 7 Feb 2019 19:10:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727541AbfBGTKQ (ORCPT ); Thu, 7 Feb 2019 14:10:16 -0500 Received: from mx1.redhat.com ([209.132.183.28]:42922 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbfBGTKP (ORCPT ); Thu, 7 Feb 2019 14:10:15 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C3FF8C0669A8; Thu, 7 Feb 2019 19:10:13 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1846D17A75; Thu, 7 Feb 2019 19:10:06 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 18/22] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Date: Thu, 7 Feb 2019 14:07:22 -0500 Message-Id: <1549566446-27967-19-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 07 Feb 2019 19:10:15 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch modifies rwsem_spin_on_owner() to return a tri-state value to better reflect the state of lock holder which enables us to make a better decision of what to do next. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 25 +++++++++++++++++++------ kernel/locking/rwsem-xadd.h | 5 +++++ 2 files changed, 24 insertions(+), 6 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 21d462f..0a29aac 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -305,14 +305,24 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) } /* - * Return true only if we can still spin on the owner field of the rwsem. + * Return the folowing three values depending on the lock owner state. + * 1 when owner changes and no reader is detected yet. + * 0 when owner changes and the new owner may be a reader. + * -1 when optimistic spinning has to stop because either the owner stops + * running, is unknown, or its timeslice has been used up. */ -static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) +enum owner_state { + OWNER_SPINNABLE = 1, + OWNER_READER = 0, + OWNER_NONSPINNABLE = -1, +}; + +static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem) { struct task_struct *owner = rwsem_get_owner(sem); if (!is_rwsem_owner_spinnable(owner)) - return false; + return OWNER_NONSPINNABLE; rcu_read_lock(); /* @@ -337,7 +347,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) */ if (need_resched() || !owner_on_cpu(owner, sem)) { rcu_read_unlock(); - return false; + return OWNER_NONSPINNABLE; } cpu_relax(); @@ -348,7 +358,10 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) * If there is a new owner or the owner is not set, we continue * spinning. */ - return is_rwsem_owner_spinnable(rwsem_get_owner(sem)); + owner = rwsem_get_owner(sem); + if (!is_rwsem_owner_spinnable(owner)) + return OWNER_NONSPINNABLE; + return is_rwsem_owner_reader(owner) ? OWNER_READER : OWNER_SPINNABLE; } static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) @@ -371,7 +384,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) * 2) readers own the lock as we can't determine if they are * actively running or not. */ - while (rwsem_spin_on_owner(sem)) { + while (rwsem_spin_on_owner(sem) == OWNER_SPINNABLE) { /* * Try to acquire the lock */ diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index d54b5db..1de6f1e 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -200,6 +200,11 @@ static inline bool is_rwsem_owner_spinnable(struct task_struct *owner) return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED); } +static inline bool is_rwsem_owner_reader(struct task_struct *owner) +{ + return (unsigned long)owner & RWSEM_READER_OWNED; +} + /* * Return true if the rwsem is owned by a reader. */ From patchwork Thu Feb 7 19:07:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801981 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D216B746 for ; Thu, 7 Feb 2019 19:10:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C25EC2E05F for ; Thu, 7 Feb 2019 19:10:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B5C112E066; Thu, 7 Feb 2019 19:10:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1B9C22E05F for ; Thu, 7 Feb 2019 19:10:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727573AbfBGTKV (ORCPT ); Thu, 7 Feb 2019 14:10:21 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60716 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbfBGTKU (ORCPT ); Thu, 7 Feb 2019 14:10:20 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3F35788E4F; Thu, 7 Feb 2019 19:10:18 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id B2C1861146; Thu, 7 Feb 2019 19:10:13 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 19/22] locking/rwsem: Enable readers spinning on writer Date: Thu, 7 Feb 2019 14:07:23 -0500 Message-Id: <1549566446-27967-20-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:10:19 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch enables readers to optimistically spin on a rwsem when it is owned by a writer instead of going to sleep directly. The rwsem_can_spin_on_owner() function is extracted out of rwsem_optimistic_spin() and is called directly by __rwsem_down_read_failed_common() and __rwsem_down_write_failed_common(). This patch may actually reduce performance under certain circumstances for reader-mostly workload as the readers may not be grouped together in the wait queue anymore. So we may have a number of small reader groups among writers instead of a large reader group. However, this change is needed for some of the subsequent patches. With a locking microbenchmark running on 5.0 based kernel, the total locking rates (in kops/s) of the benchmark on a 4-socket 56-core x86-64 system with equal numbers of readers and writers before and after the patch were as follows: # of Threads Pre-patch Post-patch ------------ --------- ---------- 2 1,926 2,120 4 1,391 1,320 8 716 694 16 618 606 32 501 487 64 61 57 Signed-off-by: Waiman Long --- kernel/locking/lock_events_list.h | 1 + kernel/locking/rwsem-xadd.c | 80 ++++++++++++++++++++++++++++++++++----- kernel/locking/rwsem-xadd.h | 3 ++ 3 files changed, 74 insertions(+), 10 deletions(-) diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 4cde507..54b6650 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -57,6 +57,7 @@ LOCK_EVENT(rwsem_sleep_writer) /* # of writer sleeps */ LOCK_EVENT(rwsem_wake_reader) /* # of reader wakeups */ LOCK_EVENT(rwsem_wake_writer) /* # of writer wakeups */ +LOCK_EVENT(rwsem_opt_rlock) /* # of read locks opt-spin acquired */ LOCK_EVENT(rwsem_opt_wlock) /* # of write locks opt-spin acquired */ LOCK_EVENT(rwsem_opt_fail) /* # of failed opt-spinnings */ LOCK_EVENT(rwsem_rlock) /* # of read locks acquired */ diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 0a29aac..015edd6 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -240,6 +240,30 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem, #ifdef CONFIG_RWSEM_SPIN_ON_OWNER /* + * Try to acquire read lock before the reader is put on wait queue. + * Lock acquisition isn't allowed if the rwsem is locked or a writer handoff + * is ongoing. + */ +static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem) +{ + long count = atomic_long_read(&sem->count); + + if (RWSEM_COUNT_WLOCKED_OR_HANDOFF(count)) + return false; + + count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count); + if (!RWSEM_COUNT_WLOCKED_OR_HANDOFF(count)) { + rwsem_set_reader_owned(sem); + lockevent_inc(rwsem_opt_rlock); + return true; + } + + /* Back out the change */ + atomic_long_add(-RWSEM_READER_BIAS, &sem->count); + return false; +} + +/* * Try to acquire write lock before the writer has been put on wait queue. */ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem, @@ -291,8 +315,10 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) BUILD_BUG_ON(!rwsem_has_anonymous_owner(RWSEM_OWNER_UNKNOWN)); - if (need_resched()) + if (need_resched()) { + lockevent_inc(rwsem_opt_fail); return false; + } rcu_read_lock(); owner = rwsem_get_owner(sem); @@ -301,6 +327,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) owner_on_cpu(owner, sem); } rcu_read_unlock(); + lockevent_cond_inc(rwsem_opt_fail, !ret); return ret; } @@ -371,9 +398,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) preempt_disable(); /* sem->wait_lock should not be held when doing optimistic spinning */ - if (!rwsem_can_spin_on_owner(sem)) - goto done; - if (!osq_lock(&sem->osq)) goto done; @@ -388,10 +412,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) /* * Try to acquire the lock */ - if (rwsem_try_write_lock_unqueued(sem, wlock)) { - taken = true; + taken = wlock ? rwsem_try_write_lock_unqueued(sem, wlock) + : rwsem_try_read_lock_unqueued(sem); + + if (taken) break; - } /* * When there's no owner, we might have preempted between the @@ -418,7 +443,13 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) return taken; } #else -static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) +static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) +{ + return false; +} + +static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem, + const long wlock) { return false; } @@ -444,6 +475,33 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) struct rwsem_waiter waiter; DEFINE_WAKE_Q(wake_q); + if (!rwsem_can_spin_on_owner(sem)) + goto queue; + + /* + * Undo read bias from down_read() and do optimistic spinning. + */ + atomic_long_add(-RWSEM_READER_BIAS, &sem->count); + adjustment = 0; + if (rwsem_optimistic_spin(sem, 0)) { + unsigned long flags; + + /* + * Opportunistically wake up other readers in the wait queue. + * It has another chance of wakeup at unlock time. + */ + if ((atomic_long_read(&sem->count) & RWSEM_FLAG_WAITERS) && + raw_spin_trylock_irqsave(&sem->wait_lock, flags)) { + if (!list_empty(&sem->wait_list)) + __rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED, + &wake_q); + raw_spin_unlock_irqrestore(&sem->wait_lock, flags); + wake_up_q(&wake_q); + } + return sem; + } + +queue: waiter.task = current; waiter.type = RWSEM_WAITING_FOR_READ; waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT; @@ -456,7 +514,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) * immediately as its RWSEM_READER_BIAS has already been * set in the count. */ - if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) { + if (adjustment && + !(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) { raw_spin_unlock_irq(&sem->wait_lock); rwsem_set_reader_owned(sem); lockevent_inc(rwsem_rlock_fast); @@ -543,7 +602,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) const long wlock = RWSEM_WRITER_LOCKED; /* do optimistic spinning and steal lock if possible */ - if (rwsem_optimistic_spin(sem, wlock)) + if (rwsem_can_spin_on_owner(sem) && + rwsem_optimistic_spin(sem, wlock)) return sem; /* diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 1de6f1e..eb4ef36 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -109,9 +109,12 @@ RWSEM_FLAG_HANDOFF) #define RWSEM_COUNT_LOCKED(c) ((c) & RWSEM_LOCK_MASK) +#define RWSEM_COUNT_WLOCKED(c) ((c) & RWSEM_WRITER_MASK) #define RWSEM_COUNT_HANDOFF(c) ((c) & RWSEM_FLAG_HANDOFF) #define RWSEM_COUNT_LOCKED_OR_HANDOFF(c) \ ((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF)) +#define RWSEM_COUNT_WLOCKED_OR_HANDOFF(c) \ + ((c) & (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF)) /* * Task structure pointer compression (64-bit only): From patchwork Thu Feb 7 19:07:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801973 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F90A1669 for ; Thu, 7 Feb 2019 19:10:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 607462E05F for ; Thu, 7 Feb 2019 19:10:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 53D6A2E066; Thu, 7 Feb 2019 19:10:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7FFB62E05F for ; Thu, 7 Feb 2019 19:10:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727605AbfBGTKY (ORCPT ); Thu, 7 Feb 2019 14:10:24 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60800 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727594AbfBGTKX (ORCPT ); Thu, 7 Feb 2019 14:10:23 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9316A81DE2; Thu, 7 Feb 2019 19:10:21 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id E5C6E62FA3; Thu, 7 Feb 2019 19:10:15 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 20/22] locking/rwsem: Enable count-based spinning on reader Date: Thu, 7 Feb 2019 14:07:24 -0500 Message-Id: <1549566446-27967-21-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Feb 2019 19:10:22 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When the rwsem is owned by reader, writers stop optimistic spinning simply because there is no easy way to figure out if all the readers are actively running or not. However, there are scenarios where the readers are unlikely to sleep and optimistic spinning can help performance. This patch provides a simple mechanism for spinning on a reader-owned rwsem. It is a loop count threshold based spinning where the count will get reset whenenver the rwsem reader count value changes indicating that the rwsem is still active. There is another maximum count value that limits that maximum number of spinnings that can happen. When the loop or max counts reach 0, a bit will be set in the owner field to indicate that no more optimistic spinning should be done on this rwsem until it becomes writer owned again. Not even readers is allowed to acquire the reader-locked rwsem for better fairness. The spinning threshold and maximum values can be overridden by architecture specific header file, if necessary. The current default threshold value is 512 iterations. With a locking microbenchmark running on 5.0 based kernel, the total locking rates (in kops/s) of the benchmark on a 4-socket 56-core x86-64 system with equal numbers of readers and writers before all the reader spining patches, before this patch and after this patch were as follows: # of Threads Pre-rspin Pre-Patch Post-patch ------------ --------- --------- ---------- 2 1,926 2,120 8,057 4 1,391 1,320 7,680 8 716 694 7,284 16 618 606 6,542 32 501 487 1,449 64 61 57 480 This patch gives a big boost in performance for mixed reader/writer workloads. Signed-off-by: Waiman Long --- kernel/locking/lock_events_list.h | 1 + kernel/locking/rwsem-xadd.c | 63 +++++++++++++++++++++++++++++++++++---- kernel/locking/rwsem-xadd.h | 45 +++++++++++++++++++++------- 3 files changed, 94 insertions(+), 15 deletions(-) diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 54b6650..0052534 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -60,6 +60,7 @@ LOCK_EVENT(rwsem_opt_rlock) /* # of read locks opt-spin acquired */ LOCK_EVENT(rwsem_opt_wlock) /* # of write locks opt-spin acquired */ LOCK_EVENT(rwsem_opt_fail) /* # of failed opt-spinnings */ +LOCK_EVENT(rwsem_opt_nospin) /* # of disabled reader opt-spinnings */ LOCK_EVENT(rwsem_rlock) /* # of read locks acquired */ LOCK_EVENT(rwsem_rlock_fast) /* # of fast read locks acquired */ LOCK_EVENT(rwsem_rlock_fail) /* # of failed read lock acquisitions */ diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 015edd6..3beb942 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -95,6 +95,22 @@ enum rwsem_wake_type { #define RWSEM_WAIT_TIMEOUT ((HZ - 1)/200 + 1) /* + * Reader-owned rwsem spinning threshold and maximum value + * + * This threshold and maximum values can be overridden by architecture + * specific value. The loop count will be reset whenenver the rwsem count + * value changes. The max value constrains the total number of reader-owned + * lock spinnings that can happen. + */ +#ifdef ARCH_RWSEM_RSPIN_THRESHOLD +# define RWSEM_RSPIN_THRESHOLD ARCH_RWSEM_RSPIN_THRESHOLD +# define RWSEM_RSPIN_MAX ARCH_RWSEM_RSPIN_MAX +#else +# define RWSEM_RSPIN_THRESHOLD (1 << 9) +# define RWSEM_RSPIN_MAX (1 << 12) +#endif + +/* * handle the lock release when processes blocked on it that can now run * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must * have been set. @@ -324,7 +340,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) owner = rwsem_get_owner(sem); if (owner) { ret = is_rwsem_owner_spinnable(owner) && - owner_on_cpu(owner, sem); + (is_rwsem_owner_reader(owner) || owner_on_cpu(owner, sem)); } rcu_read_unlock(); lockevent_cond_inc(rwsem_opt_fail, !ret); @@ -359,7 +375,8 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem) * This enables the spinner to move forward and do a trylock * earlier. */ - while (owner && (READ_ONCE(sem->owner) == owner)) { + while (owner && !is_rwsem_owner_reader(owner) + && (READ_ONCE(sem->owner) == owner)) { /* * Ensure we emit the owner->on_cpu, dereference _after_ * checking sem->owner still matches owner, if that fails, @@ -394,6 +411,10 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem) static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) { bool taken = false; + enum owner_state owner_state; + int rspin_cnt = RWSEM_RSPIN_THRESHOLD; + int rspin_max = RWSEM_RSPIN_MAX; + int old_rcount = 0; preempt_disable(); @@ -401,14 +422,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) if (!osq_lock(&sem->osq)) goto done; + if (!is_rwsem_spinnable(sem)) + rspin_cnt = 0; + /* * Optimistically spin on the owner field and attempt to acquire the * lock whenever the owner changes. Spinning will be stopped when: * 1) the owning writer isn't running; or - * 2) readers own the lock as we can't determine if they are - * actively running or not. + * 2) readers own the lock and spinning count has reached 0. */ - while (rwsem_spin_on_owner(sem) == OWNER_SPINNABLE) { + while ((owner_state = rwsem_spin_on_owner(sem)) != OWNER_NONSPINNABLE) { /* * Try to acquire the lock */ @@ -429,6 +452,36 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) break; /* + * We only decremnt rspin_cnt when a writer is trying to + * acquire a lock owned by readers. In which case, + * rwsem_spin_on_owner() will essentially be a no-op + * and we will be spinning in this main loop. The spinning + * count will be reset whenever the rwsem count value + * changes. + */ + if (wlock && (owner_state == OWNER_READER)) { + int rcount; + + if (!rspin_cnt || !rspin_max) { + if (is_rwsem_spinnable(sem)) { + rwsem_set_nonspinnable(sem); + lockevent_inc(rwsem_opt_nospin); + } + break; + } + + rcount = atomic_long_read(&sem->count) + >> RWSEM_READER_SHIFT; + if (rcount != old_rcount) { + old_rcount = rcount; + rspin_cnt = RWSEM_RSPIN_THRESHOLD; + } else { + rspin_cnt--; + } + rspin_max--; + } + + /* * The cpu_relax() call is a compiler barrier which forces * everything in this loop to be re-loaded. We don't need * memory barriers as we'll eventually observe the right diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index eb4ef36..be67dbd 100644 --- a/kernel/locking/rwsem-xadd.h +++ b/kernel/locking/rwsem-xadd.h @@ -5,18 +5,20 @@ * - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers * - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned, * i.e. the owner(s) cannot be readily determined. It can be reader - * owned or the owning writer is indeterminate. + * owned or the owning writer is indeterminate. Optimistic spinning + * should be disabled if this flag is set. * * When a writer acquires a rwsem, it puts its task_struct pointer - * into the owner field. It is cleared after an unlock. + * into the owner field or the count itself (64-bit only. It should + * be cleared after an unlock. * * When a reader acquires a rwsem, it will also puts its task_struct - * pointer into the owner field with both the RWSEM_READER_OWNED and - * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will - * largely be left untouched. So for a free or reader-owned rwsem, - * the owner value may contain information about the last reader that - * acquires the rwsem. The anonymous bit is set because that particular - * reader may or may not still own the lock. + * pointer into the owner field with the RWSEM_READER_OWNED bit set. + * On unlock, the owner field will largely be left untouched. So + * for a free or reader-owned rwsem, the owner value may contain + * information about the last reader that acquires the rwsem. The + * anonymous bit may also be set to permanently disable optimistic + * spinning on a reader-own rwsem until a writer comes along. * * That information may be helpful in debugging cases where the system * seems to hang on a reader owned rwsem especially if only one reader @@ -182,8 +184,7 @@ static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem) static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem, struct task_struct *owner) { - unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED - | RWSEM_ANONYMOUSLY_OWNED; + unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED; WRITE_ONCE(sem->owner, (struct task_struct *)val); } @@ -209,6 +210,14 @@ static inline bool is_rwsem_owner_reader(struct task_struct *owner) } /* + * Return true if the rwsem is spinnable. + */ +static inline bool is_rwsem_spinnable(struct rw_semaphore *sem) +{ + return is_rwsem_owner_spinnable(READ_ONCE(sem->owner)); +} + +/* * Return true if the rwsem is owned by a reader. */ static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem) @@ -226,6 +235,22 @@ static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem) } /* + * Set the RWSEM_ANONYMOUSLY_OWNED flag if the RWSEM_READER_OWNED flag + * remains set. Otherwise, the operation will be aborted. + */ +static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem) +{ + long owner = (long)READ_ONCE(sem->owner); + + while (is_rwsem_owner_reader((struct task_struct *)owner)) { + if (!is_rwsem_owner_spinnable((struct task_struct *)owner)) + break; + owner = cmpxchg((long *)&sem->owner, owner, + owner | RWSEM_ANONYMOUSLY_OWNED); + } +} + +/* * Return true if rwsem is owned by an anonymous writer or readers. */ static inline bool rwsem_has_anonymous_owner(struct task_struct *owner) From patchwork Thu Feb 7 19:07:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801979 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C2B717FB for ; Thu, 7 Feb 2019 19:10:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EEC772E060 for ; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E25082E05F; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8343B2E060 for ; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727636AbfBGTKa (ORCPT ); Thu, 7 Feb 2019 14:10:30 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49796 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727612AbfBGTKY (ORCPT ); Thu, 7 Feb 2019 14:10:24 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BB5252551; Thu, 7 Feb 2019 19:10:23 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9488961146; Thu, 7 Feb 2019 19:10:21 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 21/22] locking/rwsem: Wake up all readers in wait queue Date: Thu, 7 Feb 2019 14:07:25 -0500 Message-Id: <1549566446-27967-22-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Thu, 07 Feb 2019 19:10:24 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When the front of the wait queue is a reader, other readers immediately following the first reader will also be woken up at the same time. However, if there is a writer in between. Those readers behind the writer will not be woken up. Because of optimistic spinning, the lock acquisition order is not FIFO anyway. The lock handoff mechanism will ensure that lock starvation will not happen. Assuming that the lock hold times of the other readers still in the queue will be about the same as the readers that are being woken up, there is really not much additional cost other than the additional latency due to the wakeup of additional tasks by the waker. Therefore all the readers in the queue are woken up when the first waiter is a reader to improve reader throughput. With a locking microbenchmark running on 5.0 based kernel, the total locking rates (in kops/s) of the benchmark on a 4-socket 56-core x86-64 system with equal numbers of readers and writers before all the reader spining patches, before this patch and after this patch were as follows: # of Threads Pre-rspin Pre-Patch Post-patch ------------ --------- --------- ---------- 2 1,926 8,057 7,397 4 1,391 7,680 6,161 8 716 7,284 6,405 16 618 6,542 6,768 32 501 1,449 6,550 64 61 480 5,548 112 75 769 5,216 At low contention level, there is a slight drop in performance. At high contention level, however, this patch gives a big performance boost. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 3beb942..3cf2e84 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -180,16 +180,16 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, } /* - * Grant an infinite number of read locks to the readers at the front - * of the queue. We know that woken will be at least 1 as we accounted - * for above. Note we increment the 'active part' of the count by the + * Grant an infinite number of read locks to all the readers in the + * queue. We know that woken will be at least 1 as we accounted for + * above. Note we increment the 'active part' of the count by the * number of readers before waking any processes up. */ list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) { struct task_struct *tsk; if (waiter->type == RWSEM_WAITING_FOR_WRITE) - break; + continue; woken++; tsk = waiter->task; From patchwork Thu Feb 7 19:07:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 10801977 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB658746 for ; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CBFC22E05F for ; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C01D22E068; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6138F2E05F for ; Thu, 7 Feb 2019 19:10:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727641AbfBGTKa (ORCPT ); Thu, 7 Feb 2019 14:10:30 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47968 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727594AbfBGTK0 (ORCPT ); Thu, 7 Feb 2019 14:10:26 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 03F0B37E85; Thu, 7 Feb 2019 19:10:26 +0000 (UTC) Received: from llong.com (dhcp-17-35.bos.redhat.com [10.18.17.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id DB62662FA3; Thu, 7 Feb 2019 19:10:23 +0000 (UTC) From: Waiman Long To: Peter Zijlstra , Ingo Molnar , Will Deacon , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, x86@kernel.org, Arnd Bergmann , Borislav Petkov , "H. Peter Anvin" , Davidlohr Bueso , Linus Torvalds , Andrew Morton , Tim Chen , Waiman Long Subject: [PATCH-tip 22/22] locking/rwsem: Ensure an RT task will not spin on reader Date: Thu, 7 Feb 2019 14:07:26 -0500 Message-Id: <1549566446-27967-23-git-send-email-longman@redhat.com> In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com> References: <1549566446-27967-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 07 Feb 2019 19:10:26 +0000 (UTC) Sender: linux-sh-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sh@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP An RT task can do optimistic spinning only if the lock holder is actually running. If the state of the lock holder isn't known, there is a possibility that high priority of the RT task may block forward progress of the lock holder if they happen to reside on the same CPU. This will lead to deadlock. So we have to make sure that an RT task will not spin on a reader-owned rwsem. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 3cf2e84..ac1f6c4 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -349,12 +349,14 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) /* * Return the folowing three values depending on the lock owner state. + * 2 when owner is currently NULL * 1 when owner changes and no reader is detected yet. * 0 when owner changes and the new owner may be a reader. * -1 when optimistic spinning has to stop because either the owner stops * running, is unknown, or its timeslice has been used up. */ enum owner_state { + OWNER_NULL = 2, OWNER_SPINNABLE = 1, OWNER_READER = 0, OWNER_NONSPINNABLE = -1, @@ -405,7 +407,8 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem) owner = rwsem_get_owner(sem); if (!is_rwsem_owner_spinnable(owner)) return OWNER_NONSPINNABLE; - return is_rwsem_owner_reader(owner) ? OWNER_READER : OWNER_SPINNABLE; + return !owner ? OWNER_NULL : + is_rwsem_owner_reader(owner) ? OWNER_READER : OWNER_SPINNABLE; } static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) @@ -442,12 +445,13 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock) break; /* - * When there's no owner, we might have preempted between the - * owner acquiring the lock and setting the owner field. If - * we're an RT task that will live-lock because we won't let - * the owner complete. + * An RT task cannot do optimistic spinning if it cannot + * be sure the lock holder is running. When there's no owner + * or is reader-owned, an RT task has to stop spinning or + * live-lock may happen if the current task and the lock + * holder happen to run in the same CPU. */ - if (!rwsem_get_owner(sem) && + if ((owner_state != OWNER_SPINNABLE) && (need_resched() || rt_task(current))) break;