From patchwork Tue Mar 5 16:19:12 2013
X-Patchwork-Submitter: Michel Lespinasse
X-Patchwork-Id: 2220401
Date: Tue, 5 Mar 2013 08:19:12 -0800
From: Michel Lespinasse
To: Lai Jiangshan
Subject: Re: [PATCH V2] lglock: add read-preference local-global rwlock
Message-ID: <20130305161912.GA30756@google.com>
In-Reply-To: <51360ED1.3030104@cn.fujitsu.com>

Hi Lai,

Just a few comments about your v2 proposal. Hopefully you'll catch these
before you send out v3 :)

- I would prefer reader_refcnt to be unsigned int instead of unsigned long.

- I would like a comment indicating that lgrwlocks don't have
  reader-writer fairness and are thus somewhat discouraged (people could
  use plain lglock if they don't need reader preference, though even that
  use (as brlock) is discouraged already :)

- I don't think FALLBACK_BASE is necessary (you already mentioned you'd
  drop it).

- I prefer using the fallback_rwlock's dep_map for lockdep tracking.
  I feel this is more natural since we want the lgrwlock to behave as
  the rwlock, not as the lglock.

- I prefer to avoid return statements in the middle of functions when
  it's easy to do so.

Attached is my current version (based on an earlier version of your code).
You don't have to take it as is, but I feel it makes for a more concrete
suggestion :)

Thanks,

----------------------------8<-------------------------------------------

lglock: add read-preference lgrwlock

The current lglock may be used as a fair rwlock; however, sometimes a
read-preference rwlock is preferred. One such use case recently came up
for get_cpu_online_atomic().

This change adds a new lgrwlock with the following properties:

- high performance read side, using only cpu-local structures when there
  is no write side to contend with;

- correctness guarantees similar to rwlock_t: recursive readers are
  allowed, and the lock's read side is not ordered vs other locks;

- low performance write side (comparable to lglock's global side).

The implementation relies on the following principles:

- reader_refcnt is a local lock count; it indicates how many recursive
  read locks are taken using the local lglock;

- lglock is used by readers for local locking; it must be acquired
  before reader_refcnt becomes nonzero and released after reader_refcnt
  goes back to zero;

- fallback_rwlock is used by readers for global locking; it is acquired
  when reader_refcnt is zero and the trylock fails on lglock;

- writers take both the lglock write side and the fallback_rwlock, thus
  making sure to exclude both local and global readers.

Thanks to Srivatsa S. Bhat for proposing a lock with these requirements
and to Lai Jiangshan for proposing this algorithm as an lglock extension.
Signed-off-by: Michel Lespinasse
---
 include/linux/lglock.h | 46 +++++++++++++++++++++++++++++++++++++++
 kernel/lglock.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e932db0b..8b59084935d5 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,50 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/*
+ * lglock may be used as a read write spinlock if desired (though this is
+ * not encouraged as the write side scales badly on high CPU count machines).
+ * It has reader/writer fairness when used that way.
+ *
+ * However, sometimes it is desired to have an unfair rwlock instead, with
+ * reentrant readers that don't need to be ordered vs other locks, comparable
+ * to rwlock_t. lgrwlock implements such semantics.
+ */
+struct lgrwlock {
+	unsigned int __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(unsigned int, name ## _refcnt);		\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+		= __ARCH_SPIN_LOCK_UNLOCKED;
+
+#define __LGRWLOCK_INIT(name) {						\
+	.reader_refcnt = &name ## _refcnt,				\
+	.lglock = { .lock = &name ## _lock },				\
+	.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)	\
+}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_read_lock(struct lgrwlock *lgrw);
+void lg_read_unlock(struct lgrwlock *lgrw);
+void lg_write_lock(struct lgrwlock *lgrw);
+void lg_write_unlock(struct lgrwlock *lgrw);
+void __lg_read_write_lock(struct lgrwlock *lgrw);
+void __lg_read_write_unlock(struct lgrwlock *lgrw);
+
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 86ae2aebf004..e78a7c95dbfd 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,61 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+void lg_read_lock(struct lgrwlock *lgrw)
+{
+	preempt_disable();
+
+	if (__this_cpu_read(*lgrw->reader_refcnt) ||
+	    arch_spin_trylock(this_cpu_ptr(lgrw->lglock.lock))) {
+		__this_cpu_inc(*lgrw->reader_refcnt);
+		rwlock_acquire_read(&lgrw->fallback_rwlock.dep_map,
+				    0, 0, _RET_IP_);
+	} else {
+		read_lock(&lgrw->fallback_rwlock);
+	}
+}
+EXPORT_SYMBOL(lg_read_lock);
+
+void lg_read_unlock(struct lgrwlock *lgrw)
+{
+	if (likely(__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_release(&lgrw->fallback_rwlock.dep_map,
+			       1, _RET_IP_);
+		if (!__this_cpu_dec_return(*lgrw->reader_refcnt))
+			arch_spin_unlock(this_cpu_ptr(lgrw->lglock.lock));
+	} else {
+		read_unlock(&lgrw->fallback_rwlock);
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_read_unlock);
+
+void lg_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_write_lock);
+
+void lg_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_write_unlock);
+
+void __lg_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+EXPORT_SYMBOL(__lg_read_write_lock);
+
+void __lg_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_write_unlock(lgrw);
+}
+EXPORT_SYMBOL(__lg_read_write_unlock);
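
For readers following along, the read-side state machine above can be sketched
as a plain userspace C model. This is not the patch's kernel code: it simulates
a single CPU slot with ordinary variables (the model_* functions, local_locked,
fallback_readers and writer_active flags are all illustrative names standing in
for the per-cpu arch spinlock, the fallback rwlock's reader count, and the
write side holding every lock; there is no real atomicity or preemption here).

```c
#include <stdbool.h>

/* One simulated cpu-local slot of the lgrwlock model. */
static unsigned int reader_refcnt;  /* recursion count on the local lock */
static bool local_locked;           /* models the per-cpu arch spinlock */
static int fallback_readers;        /* models read_lock(&fallback_rwlock) */
static bool writer_active;          /* models the writer holding all locks */

/* Models arch_spin_trylock() on the cpu-local lock. */
static bool model_local_trylock(void)
{
	if (local_locked || writer_active)
		return false;
	local_locked = true;
	return true;
}

static void model_read_lock(void)
{
	if (reader_refcnt || model_local_trylock())
		reader_refcnt++;        /* fast path: cpu-local, recursive */
	else
		fallback_readers++;     /* slow path: global rwlock read side */
}

static void model_read_unlock(void)
{
	if (reader_refcnt) {
		if (--reader_refcnt == 0)
			local_locked = false;  /* last local reader drops it */
	} else {
		fallback_readers--;
	}
}
```

The model mirrors the two invariants from the changelog: the local lock is held
exactly while reader_refcnt is nonzero (so recursive readers never re-trylock),
and readers only touch the global fallback rwlock when a writer already owns
their local lock.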