diff mbox

[V2] lglock: add read-preference local-global rwlock

Message ID 5131FB4C.7070408@cn.fujitsu.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Lai Jiangshan March 2, 2013, 1:14 p.m. UTC
From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Sat, 2 Mar 2013 20:33:14 +0800
Subject: [PATCH] lglock: add read-preference local-global rwlock

Current lglock is not read-preference, so it can't be used on some cases
which read-preference rwlock can do. Example, get_cpu_online_atomic().

Although we can use rwlock for these cases which needs read-preference.
but it leads to unnecessary cache-line bouncing even when there are no
writers present, which can slow down the system needlessly. It will be
worse when we have a lot of CPUs, it is not scale.

So we look forward to lglock. lglock is read-write-lock based on percpu locks,
but it is not read-preference due to its underlining percpu locks.

But what if we convert the percpu locks of lglock to use percpu rwlocks:

         CPU 0                                CPU 1
         ------                               ------
1.    spin_lock(&random_lock);             read_lock(my_rwlock of CPU 1);
2.    read_lock(my_rwlock of CPU 0);       spin_lock(&random_lock);

Writer:
         CPU 2:
         ------
      for_each_online_cpu(cpu)
        write_lock(my_rwlock of 'cpu');


Consider what happens if the writer begins his operation in between steps 1
and 2 at the reader side. It becomes evident that we end up in a (previously
non-existent) deadlock due to a circular locking dependency between the 3
entities, like this:


(holds              Waiting for
 random_lock) CPU 0 -------------> CPU 2  (holds my_rwlock of CPU 0
                                               for write)
               ^                   |
               |                   |
        Waiting|                   | Waiting
          for  |                   |  for
               |                   V
                ------ CPU 1 <------

                (holds my_rwlock of
                 CPU 1 for read)


So obviously this "straight-forward" way of implementing percpu rwlocks is
deadlock-prone. So we can't implement read-preference local-global rwlock
like this.


The implement of this patch reuse current lglock as frontend to achieve
local-read-lock, and reuse global fallback rwlock as backend to achieve
read-preference, and use percpu reader counter to indicate 1) the depth
of the nested reader lockes and 2) whether the outmost lock is percpu lock
or fallback rwlock.

The algorithm is simple, in the read site:
If it is nested reader, just increase the counter
If it is the outmost reader,
	1) try to lock its cpu's lock of the frontend lglock.
	   (reader count +=1 if success)
	2) if the above step fails, read_lock(&fallback_rwlock).
	   (reader count += FALLBACK_BASE + 1)

Write site:
Do the lg_global_lock() of the frontend lglock.
And then write_lock(&fallback_rwlock).


Prof:
1) reader-writer exclusive:
write-site must requires all percpu locks and fallback_rwlock.
outmost read-site must requires one of these locks.

2) read-preference:
before write site lock finished acquired, read site at least
wins at read_lock(&fallback_rwlock) due to rwlock is read-preference.

3) read site functions are irqsafe(reentrance-safe)
   (read site functions is not protected by disabled irq, but they are irqsafe)
	If read site function is interrupted at any point and reenters read site
again, reentranced read site will not be mislead by the first read site if the
reader counter > 0, in this case, it means currently frontend(this cpu lock of
lglock) or backend(fallback rwlock) lock is held, it is safe to act as
nested reader.
	if the reader counter=0, eentranced reader considers it is the
outmost read site, and it always successes after the write side release the lock.
(even the interrupted read-site has already hold the cpu lock of lglock
or the fallback_rwlock).
	And reentranced read site only calls arch_spin_trylock(), read_lock()
and __this_cpu_op(), arch_spin_trylock(), read_lock() is already reentrance-safe.
Although __this_cpu_op() is not reentrance-safe, but the value of the counter
will be restored after the interrupted finished, so read site functions
are still reentrance-safe.


Performance:
We only focus on the performance of the read site. this read site's fast path
is just preempt_disable() + __this_cpu_read/inc() + arch_spin_trylock(),
It has only one heavy memory operation. it will be expected fast.

We test three locks.
1) traditional rwlock WITHOUT remote competition nor cache-bouncing.(opt-rwlock)
2) this lock(lgrwlock)
3) V6 percpu-rwlock by "Srivatsa S. Bhat". (percpu-rwlock)
   (https://lkml.org/lkml/2013/2/18/186)

		nested=1(no nested)	nested=2	nested=4
opt-rwlock	 517181			1009200		2010027
lgrwlock	 452897			 700026		1201415
percpu-rwlock	1192955			1451343		1951757

The value is the time(nano-second) of 10000 times of the operations
{
	read-lock
	[nested read-lock]...
	[nested read-unlock]...
	read-unlock
}
If the way of this test is wrong nor bad, please correct me,
the code of test is here: https://gist.github.com/laijs/5066159
(i5 760, 64bit)

Changed from V1
 fix a reentrance-bug which is cought by Oleg
 don't touch lockdep for nested reader(needed by below change)
 add two special APIs for cpuhotplug.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/lglock.h |   38 ++++++++++++++++++++++++++
 kernel/lglock.c        |   68 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+), 0 deletions(-)

Comments

Srivatsa S. Bhat March 2, 2013, 5:11 p.m. UTC | #1
On 03/02/2013 06:44 PM, Lai Jiangshan wrote:
> From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Sat, 2 Mar 2013 20:33:14 +0800
> Subject: [PATCH] lglock: add read-preference local-global rwlock
> 
> Current lglock is not read-preference, so it can't be used on some cases
> which read-preference rwlock can do. Example, get_cpu_online_atomic().
> 
[...]
> diff --git a/kernel/lglock.c b/kernel/lglock.c
> index 6535a66..52e9b2c 100644
> --- a/kernel/lglock.c
> +++ b/kernel/lglock.c
> @@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
>  	preempt_enable();
>  }
>  EXPORT_SYMBOL(lg_global_unlock);
> +
> +#define FALLBACK_BASE	(1UL << 30)
> +
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> +{
> +	struct lglock *lg = &lgrw->lglock;
> +
> +	preempt_disable();
> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
> +		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
> +		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
> +			read_lock(&lgrw->fallback_rwlock);
> +			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
> +			return;
> +		}
> +	}
> +
> +	__this_cpu_inc(*lgrw->reader_refcnt);
> +}
> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
> +
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> +	case 1:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		lg_local_unlock(&lgrw->lglock);
> +		return;

This should be a break, instead of a return, right?
Otherwise, there will be a preempt imbalance...

> +	case FALLBACK_BASE:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		read_unlock(&lgrw->fallback_rwlock);
> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
> +		break;
> +	default:
> +		__this_cpu_dec(*lgrw->reader_refcnt);
> +		break;
> +	}
> +
> +	preempt_enable();
> +}


Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 2, 2013, 5:20 p.m. UTC | #2
On 03/02, Lai Jiangshan wrote:
>
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> +	case 1:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		lg_local_unlock(&lgrw->lglock);
> +		return;
> +	case FALLBACK_BASE:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		read_unlock(&lgrw->fallback_rwlock);
> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);

I guess "case 1:" should do rwlock_release() too.

Otherwise, at first glance looks correct...

However, I still think that FALLBACK_BASE only adds the unnecessary
complications. But even if I am right this is subjective of course, please
feel free to ignore.

And btw, I am not sure about lg->lock_dep_map, perhaps we should use
fallback_rwlock->dep_map ?

We need rwlock_acquire_read() even in the fast-path, and this acquire_read
should be paired with rwlock_acquire() in _write_lock(), but it does
spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
but perhaps fallback_rwlock->dep_map would be more clean.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 3, 2013, 5:40 p.m. UTC | #3
On 03/02, Oleg Nesterov wrote:
>
> On 03/02, Lai Jiangshan wrote:
> >
> > +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> > +{
> > +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> > +	case 1:
> > +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> > +		lg_local_unlock(&lgrw->lglock);
> > +		return;
> > +	case FALLBACK_BASE:
> > +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> > +		read_unlock(&lgrw->fallback_rwlock);
> > +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
>
> I guess "case 1:" should do rwlock_release() too.
>
> Otherwise, at first glance looks correct...
>
> However, I still think that FALLBACK_BASE only adds the unnecessary
> complications. But even if I am right this is subjective of course, please
> feel free to ignore.

Yes, but...

> And btw, I am not sure about lg->lock_dep_map, perhaps we should use
> fallback_rwlock->dep_map ?
>
> We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> should be paired with rwlock_acquire() in _write_lock(), but it does
> spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> but perhaps fallback_rwlock->dep_map would be more clean.

Please ignore this part.

I missed that lg_rwlock_global_write_lock() relies on lg_global_lock(),
and I just noticed that it does rwlock_acquire(lg->lock_dep_map).

Hmm. But then I do not understand the lglock annotations. Obviously,
rwlock_acquire_read() in lg_local_lock() can't even detect the simplest
deadlock, say, lg_local_lock(LOCK) + lg_local_lock(LOCK). Not to mention
spin_lock(X) + lg_local_lock(Y) vs lg_local_lock(Y) + spin_lock(X).

OK, I understand that it is not easy to make these annotations correct...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michel Lespinasse March 5, 2013, 1:37 a.m. UTC | #4
On Sun, Mar 3, 2013 at 9:40 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> However, I still think that FALLBACK_BASE only adds the unnecessary
>> complications. But even if I am right this is subjective of course, please
>> feel free to ignore.

Would it help if I sent out that version (without FALLBACK_BASE) as a
formal proposal ?

> Hmm. But then I do not understand the lglock annotations. Obviously,
> rwlock_acquire_read() in lg_local_lock() can't even detect the simplest
> deadlock, say, lg_local_lock(LOCK) + lg_local_lock(LOCK). Not to mention
> spin_lock(X) + lg_local_lock(Y) vs lg_local_lock(Y) + spin_lock(X).
>
> OK, I understand that it is not easy to make these annotations correct...

I am going to send out a proposal to fix the existing lglock
annotations and detect the two cases you noticed. It's actually not
that hard :)
Lai Jiangshan March 5, 2013, 3:27 p.m. UTC | #5
On 03/03/13 01:20, Oleg Nesterov wrote:
> On 03/02, Lai Jiangshan wrote:
>>
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
>> +	case 1:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		lg_local_unlock(&lgrw->lglock);
>> +		return;
>> +	case FALLBACK_BASE:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		read_unlock(&lgrw->fallback_rwlock);
>> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
> 
> I guess "case 1:" should do rwlock_release() too.

Already do it in "lg_local_unlock(&lgrw->lglock);" before it returns.
(I like reuse old code)

> 
> Otherwise, at first glance looks correct...
> 
> However, I still think that FALLBACK_BASE only adds the unnecessary
> complications. But even if I am right this is subjective of course, please
> feel free to ignore.

OK, I kill FALLBACK_BASE in later patch.

> 
> And btw, I am not sure about lg->lock_dep_map, perhaps we should use
> fallback_rwlock->dep_map ?

Use either one is OK.

> 
> We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> should be paired with rwlock_acquire() in _write_lock(), but it does
> spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> but perhaps fallback_rwlock->dep_map would be more clean.
> 

I can't tell which one is better. I try to use fallback_rwlock->dep_map later.

> Oleg.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lai Jiangshan March 5, 2013, 3:41 p.m. UTC | #6
On 03/03/13 01:11, Srivatsa S. Bhat wrote:
> On 03/02/2013 06:44 PM, Lai Jiangshan wrote:
>> From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date: Sat, 2 Mar 2013 20:33:14 +0800
>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>
>> Current lglock is not read-preference, so it can't be used on some cases
>> which read-preference rwlock can do. Example, get_cpu_online_atomic().
>>
> [...]
>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>> index 6535a66..52e9b2c 100644
>> --- a/kernel/lglock.c
>> +++ b/kernel/lglock.c
>> @@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
>>  	preempt_enable();
>>  }
>>  EXPORT_SYMBOL(lg_global_unlock);
>> +
>> +#define FALLBACK_BASE	(1UL << 30)
>> +
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>> +{
>> +	struct lglock *lg = &lgrw->lglock;
>> +
>> +	preempt_disable();
>> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
>> +		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>> +		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>> +			read_lock(&lgrw->fallback_rwlock);
>> +			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
>> +			return;
>> +		}
>> +	}
>> +
>> +	__this_cpu_inc(*lgrw->reader_refcnt);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>> +
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
>> +	case 1:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		lg_local_unlock(&lgrw->lglock);
>> +		return;
> 
> This should be a break, instead of a return, right?
> Otherwise, there will be a preempt imbalance...


"lockdep" and "preempt" are handled in lg_local_unlock(&lgrw->lglock);

Thanks,
Lai

> 
>> +	case FALLBACK_BASE:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		read_unlock(&lgrw->fallback_rwlock);
>> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
>> +		break;
>> +	default:
>> +		__this_cpu_dec(*lgrw->reader_refcnt);
>> +		break;
>> +	}
>> +
>> +	preempt_enable();
>> +}
> 
> 
> Regards,
> Srivatsa S. Bhat
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 5, 2013, 4:41 p.m. UTC | #7
On 03/05, Lai Jiangshan wrote:
>
> On 03/03/13 01:20, Oleg Nesterov wrote:
> > On 03/02, Lai Jiangshan wrote:
> >>
> >> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> >> +{
> >> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> >> +	case 1:
> >> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> >> +		lg_local_unlock(&lgrw->lglock);
> >> +		return;
> >> +	case FALLBACK_BASE:
> >> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> >> +		read_unlock(&lgrw->fallback_rwlock);
> >> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
> >
> > I guess "case 1:" should do rwlock_release() too.
>
> Already do it in "lg_local_unlock(&lgrw->lglock);" before it returns.
> (I like reuse old code)

Yes, I was wrong thanks. Another case when I didn't notice that you
re-use the regular lg_ code...

> > We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> > should be paired with rwlock_acquire() in _write_lock(), but it does
> > spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> > but perhaps fallback_rwlock->dep_map would be more clean.
>
> I can't tell which one is better. I try to use fallback_rwlock->dep_map later.

I am not sure which one should be better too, please check.

Again, I forgot that _write_lock/unlock use lg_global_*() code.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Srivatsa S. Bhat March 5, 2013, 5:55 p.m. UTC | #8
On 03/05/2013 09:11 PM, Lai Jiangshan wrote:
> On 03/03/13 01:11, Srivatsa S. Bhat wrote:
>> On 03/02/2013 06:44 PM, Lai Jiangshan wrote:
>>> From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Date: Sat, 2 Mar 2013 20:33:14 +0800
>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>
>>> Current lglock is not read-preference, so it can't be used on some cases
>>> which read-preference rwlock can do. Example, get_cpu_online_atomic().
>>>
>> [...]
>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>> index 6535a66..52e9b2c 100644
>>> --- a/kernel/lglock.c
>>> +++ b/kernel/lglock.c
>>> @@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
>>>  	preempt_enable();
>>>  }
>>>  EXPORT_SYMBOL(lg_global_unlock);
>>> +
>>> +#define FALLBACK_BASE	(1UL << 30)
>>> +
>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>> +{
>>> +	struct lglock *lg = &lgrw->lglock;
>>> +
>>> +	preempt_disable();
>>> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
>>> +		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>> +		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>> +			read_lock(&lgrw->fallback_rwlock);
>>> +			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
>>> +			return;
>>> +		}
>>> +	}
>>> +
>>> +	__this_cpu_inc(*lgrw->reader_refcnt);
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>> +
>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>> +{
>>> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
>>> +	case 1:
>>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>>> +		lg_local_unlock(&lgrw->lglock);
>>> +		return;
>>
>> This should be a break, instead of a return, right?
>> Otherwise, there will be a preempt imbalance...
> 
> 
> "lockdep" and "preempt" are handled in lg_local_unlock(&lgrw->lglock);
> 

Ah, ok.. I had missed that.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e93..90bfe79 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,42 @@  void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/* read-preference read-write-lock like rwlock but provides local-read-lock */
+struct lgrwlock {
+	unsigned long __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;					\
+	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);
+
+#define __LGRWLOCK_INIT(name)						\
+	{								\
+		.reader_refcnt = &name ## _refcnt,			\
+		.lglock = { .lock = &name ## _lock },			\
+		.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)\
+	}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
+void __lg_rwlock_global_read_write_lock(struct lgrwlock *lgrw);
+void __lg_rwlock_global_read_write_unlock(struct lgrwlock *lgrw);
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 6535a66..52e9b2c 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,71 @@  void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+#define FALLBACK_BASE	(1UL << 30)
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
+{
+	struct lglock *lg = &lgrw->lglock;
+
+	preempt_disable();
+	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
+		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
+			read_lock(&lgrw->fallback_rwlock);
+			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
+			return;
+		}
+	}
+
+	__this_cpu_inc(*lgrw->reader_refcnt);
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_lock);
+
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
+{
+	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
+	case 1:
+		__this_cpu_write(*lgrw->reader_refcnt, 0);
+		lg_local_unlock(&lgrw->lglock);
+		return;
+	case FALLBACK_BASE:
+		__this_cpu_write(*lgrw->reader_refcnt, 0);
+		read_unlock(&lgrw->fallback_rwlock);
+		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
+		break;
+	default:
+		__this_cpu_dec(*lgrw->reader_refcnt);
+		break;
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
+
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_lock);
+
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
+
+/* special write-site APIs allolw nested reader in such write-site */
+void __lg_rwlock_global_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_rwlock_global_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+
+void __lg_rwlock_global_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_rwlock_global_write_unlock(lgrw);
+}