
[3/3] eventfd: add internal reference counting to fix notifier race conditions

Message ID 20090619185138.31118.14916.stgit@dev.haskins.net (mailing list archive)
State New, archived

Commit Message

Gregory Haskins June 19, 2009, 6:51 p.m. UTC
eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
notifier->release() callback.  This lets notification clients know if
the eventfd is about to go away and is very useful particularly for
in-kernel clients.  However, as it stands today it is not possible to
use the notification API in a race-free way.  This patch adds some
additional logic to the notification subsystem to rectify this problem.

Background:
-----------------------
Eventfd currently only has one reference count mechanism: fget/fput.  This
in and of itself is normally fine.  However, if a client expects to be
notified if the eventfd is closed, it cannot hold a fget() reference
itself or the underlying f_ops->release() callback will never be invoked
by VFS.  Therefore we have this somewhat unusual situation where we may
hold a pointer to an eventfd object (by virtue of having a waiter registered
in its wait-queue), but no reference.  This makes it nearly impossible to
design a mutual decoupling algorithm: you cannot unhook one side from the
other (or vice versa) without racing.

The first problem was dealt with by essentially "donating" a surrogate
object to eventfd.  In other words, when a client attached to eventfd
and then later detached, it would decouple internally in a race free way
and then leave part of the object still attached to the eventfd.  This
decoupled object would correctly detect the end-of-life of the eventfd
object at some point in the future and be deallocated: However, we cannot
guarantee that this operation would not race with a potential rmmod of the
client, and is therefore broken.

Solution Details:
-----------------------

1) We add a private kref to the internal eventfd_ctx object.  This
   reference can be (transparently) held by notification clients without
   affecting the ability for VFS to indicate ->release() notification.

2) We convert the current lockless POLLHUP to a more traditional locked
   variant [*] so that we can ensure a race-free mutual-decouple
   algorithm without requiring a surrogate object.

3) We guard the decouple algorithm with an atomic bit-clear to ensure
   mutual exclusion of the decoupling and reference-drop.

4) We hold a reference to the underlying eventfd_ctx until all paths
   have satisfactorily completed to ensure we do not race with eventfd
   going away.

Taken together, we believe these changes give us a race-free release
mechanism.

[*] Clients that previously assumed ->release() could sleep will need
to be refactored.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Davide Libenzi <davidel@xmailserver.org>
CC: Michael S. Tsirkin <mst@redhat.com>
---

 fs/eventfd.c            |   62 +++++++++++++++++++++++++++++++++--------------
 include/linux/eventfd.h |    3 ++
 2 files changed, 47 insertions(+), 18 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Davide Libenzi June 19, 2009, 7:10 p.m. UTC | #1
On Fri, 19 Jun 2009, Gregory Haskins wrote:

> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
> notifier->release() callback.  This lets notification clients know if
> the eventfd is about to go away and is very useful particularly for
> in-kernel clients.  However, as it stands today it is not possible to
> use the notification API in a race-free way.  This patch adds some
> additional logic to the notification subsystem to rectify this problem.
> 
> Background:
> -----------------------
> Eventfd currently only has one reference count mechanism: fget/fput.  This
> in of itself is normally fine.  However, if a client expects to be
> notified if the eventfd is closed, it cannot hold a fget() reference
> itself or the underlying f_ops->release() callback will never be invoked
> by VFS.  Therefore we have this somewhat unusual situation where we may
> hold a pointer to an eventfd object (by virtue of having a waiter registered
> in its wait-queue), but no reference.  This makes it nearly impossible to
> design a mutual decoupling algorithm: you cannot unhook one side from the
> other (or vice versa) without racing.

And why is that?

struct xxx {
	struct mutex mtx;
	struct file *file;
	...
};

struct file *xxx_get_file(struct xxx *x) {
	struct file *file;

	mutex_lock(&x->mtx);
	file = x->file;
	if (!file)
		mutex_unlock(&x->mtx);
	return file;
}

void xxx_release_file(struct xxx *x) {
	mutex_unlock(&x->mtx);
}

void handle_POLLHUP(struct xxx *x) {
	struct file *file;

	file = xxx_get_file(x);
	if (file) {
		unhook_waitqueue(file, ...);
		x->file = NULL;
		xxx_release_file(x);
	}
}


Every time you need to "use" file, you call xxx_get_file(), and if you get 
NULL, it means it's gone and you handle it according to your IRQ fd 
policies. As soon as you are done with the file, you call xxx_release_file().
Replace "mtx" with the lock that fits your needs.



- Davide


Gregory Haskins June 19, 2009, 9:16 p.m. UTC | #2
Davide Libenzi wrote:
> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>
>   
>> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
>> notifier->release() callback.  This lets notification clients know if
>> the eventfd is about to go away and is very useful particularly for
>> in-kernel clients.  However, as it stands today it is not possible to
>> use the notification API in a race-free way.  This patch adds some
>> additional logic to the notification subsystem to rectify this problem.
>>
>> Background:
>> -----------------------
>> Eventfd currently only has one reference count mechanism: fget/fput.  This
>> in of itself is normally fine.  However, if a client expects to be
>> notified if the eventfd is closed, it cannot hold a fget() reference
>> itself or the underlying f_ops->release() callback will never be invoked
>> by VFS.  Therefore we have this somewhat unusual situation where we may
>> hold a pointer to an eventfd object (by virtue of having a waiter registered
>> in its wait-queue), but no reference.  This makes it nearly impossible to
>> design a mutual decoupling algorithm: you cannot unhook one side from the
>> other (or vice versa) without racing.
>>     
>
> And why is that?
>
> struct xxx {
> 	struct mutex mtx;
> 	struct file *file;
> 	...
> };
>
> struct file *xxx_get_file(struct xxx *x) {
> 	struct file *file;
>
> 	mutex_lock(&x->mtx);
> 	file = x->file;
> 	if (!file)
> 		mutex_unlock(&x->mtx);
> 	return file;
> }
>
> void xxx_release_file(struct xxx *x) {
> 	mutex_unlock(&x->mtx);
> }
>
> void handle_POLLHUP(struct xxx *x) {
> 	struct file *file;
>
> 	file = xxx_get_file(x);
> 	if (file) {
> 		unhook_waitqueue(file, ...);
> 		x->file = NULL;
> 		xxx_release_file(x);
> 	}
> }
>
>
> Every time you need to "use" file, you call xxx_get_file(), and if you get 
> NULL, it means it's gone and you handle it accordigly to your IRQ fd 
> policies. As soon as you done with the file, you call xxx_release_file().
> Replace "mtx" with the lock that fits your needs.
>   

Consider what would happen if the f_ops->release() was preempted inside
the wake_up_locked_poll() after it dereferenced the xxx from the list,
but before it calls the callback(POLLHUP).  The xxx object, and/or the
.text for the xxx object, may be long gone by the time it comes back
around.  Afaict, there is no way to guard against that scenario unless
you do something like 2/3+3/3.  Or am I missing something?

-Greg
Davide Libenzi June 19, 2009, 9:26 p.m. UTC | #3
On Fri, 19 Jun 2009, Gregory Haskins wrote:

> Davide Libenzi wrote:
> > On Fri, 19 Jun 2009, Gregory Haskins wrote:
> >
> >   
> >> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
> >> notifier->release() callback.  This lets notification clients know if
> >> the eventfd is about to go away and is very useful particularly for
> >> in-kernel clients.  However, as it stands today it is not possible to
> >> use the notification API in a race-free way.  This patch adds some
> >> additional logic to the notification subsystem to rectify this problem.
> >>
> >> Background:
> >> -----------------------
> >> Eventfd currently only has one reference count mechanism: fget/fput.  This
> >> in of itself is normally fine.  However, if a client expects to be
> >> notified if the eventfd is closed, it cannot hold a fget() reference
> >> itself or the underlying f_ops->release() callback will never be invoked
> >> by VFS.  Therefore we have this somewhat unusual situation where we may
> >> hold a pointer to an eventfd object (by virtue of having a waiter registered
> >> in its wait-queue), but no reference.  This makes it nearly impossible to
> >> design a mutual decoupling algorithm: you cannot unhook one side from the
> >> other (or vice versa) without racing.
> >>     
> >
> > And why is that?
> >
> > struct xxx {
> > 	struct mutex mtx;
> > 	struct file *file;
> > 	...
> > };
> >
> > struct file *xxx_get_file(struct xxx *x) {
> > 	struct file *file;
> >
> > 	mutex_lock(&x->mtx);
> > 	file = x->file;
> > 	if (!file)
> > 		mutex_unlock(&x->mtx);
> > 	return file;
> > }
> >
> > void xxx_release_file(struct xxx *x) {
> > 	mutex_unlock(&x->mtx);
> > }
> >
> > void handle_POLLHUP(struct xxx *x) {
> > 	struct file *file;
> >
> > 	file = xxx_get_file(x);
> > 	if (file) {
> > 		unhook_waitqueue(file, ...);
> > 		x->file = NULL;
> > 		xxx_release_file(x);
> > 	}
> > }
> >
> >
> > Every time you need to "use" file, you call xxx_get_file(), and if you get 
> > NULL, it means it's gone and you handle it accordigly to your IRQ fd 
> > policies. As soon as you done with the file, you call xxx_release_file().
> > Replace "mtx" with the lock that fits your needs.
> >   
> 
> Consider what would happen if the f_ops->release() was preempted inside
> the wake_up_locked_polled() after it dereferenced the xxx from the list,
> but before it calls the callback(POLLHUP).  The xxx object, and/or the
> .text for the xxx object may be long gone by the time it comes back
> around.  Afaict, there is no way to guard against that scenario unless
> you do something like 2/3+3/3.  Or am I missing something?

Right. Don't you see an easier answer to that, instead of adding 300 lines 
of code to eventfd?
For example, turning wake_up_locked() into a normal wake_up().



- Davide


Gregory Haskins June 19, 2009, 9:49 p.m. UTC | #4
Davide Libenzi wrote:
> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>
>   
>> Davide Libenzi wrote:
>>     
>>> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>>>
>>>   
>>>       
>>>> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
>>>> notifier->release() callback.  This lets notification clients know if
>>>> the eventfd is about to go away and is very useful particularly for
>>>> in-kernel clients.  However, as it stands today it is not possible to
>>>> use the notification API in a race-free way.  This patch adds some
>>>> additional logic to the notification subsystem to rectify this problem.
>>>>
>>>> Background:
>>>> -----------------------
>>>> Eventfd currently only has one reference count mechanism: fget/fput.  This
>>>> in of itself is normally fine.  However, if a client expects to be
>>>> notified if the eventfd is closed, it cannot hold a fget() reference
>>>> itself or the underlying f_ops->release() callback will never be invoked
>>>> by VFS.  Therefore we have this somewhat unusual situation where we may
>>>> hold a pointer to an eventfd object (by virtue of having a waiter registered
>>>> in its wait-queue), but no reference.  This makes it nearly impossible to
>>>> design a mutual decoupling algorithm: you cannot unhook one side from the
>>>> other (or vice versa) without racing.
>>>>     
>>>>         
>>> And why is that?
>>>
>>> struct xxx {
>>> 	struct mutex mtx;
>>> 	struct file *file;
>>> 	...
>>> };
>>>
>>> struct file *xxx_get_file(struct xxx *x) {
>>> 	struct file *file;
>>>
>>> 	mutex_lock(&x->mtx);
>>> 	file = x->file;
>>> 	if (!file)
>>> 		mutex_unlock(&x->mtx);
>>> 	return file;
>>> }
>>>
>>> void xxx_release_file(struct xxx *x) {
>>> 	mutex_unlock(&x->mtx);
>>> }
>>>
>>> void handle_POLLHUP(struct xxx *x) {
>>> 	struct file *file;
>>>
>>> 	file = xxx_get_file(x);
>>> 	if (file) {
>>> 		unhook_waitqueue(file, ...);
>>> 		x->file = NULL;
>>> 		xxx_release_file(x);
>>> 	}
>>> }
>>>
>>>
>>> Every time you need to "use" file, you call xxx_get_file(), and if you get 
>>> NULL, it means it's gone and you handle it accordigly to your IRQ fd 
>>> policies. As soon as you done with the file, you call xxx_release_file().
>>> Replace "mtx" with the lock that fits your needs.
>>>   
>>>       
>> Consider what would happen if the f_ops->release() was preempted inside
>> the wake_up_locked_polled() after it dereferenced the xxx from the list,
>> but before it calls the callback(POLLHUP).  The xxx object, and/or the
>> .text for the xxx object may be long gone by the time it comes back
>> around.  Afaict, there is no way to guard against that scenario unless
>> you do something like 2/3+3/3.  Or am I missing something?
>>     
>
> Right. Don't you see an easier answer to that, instead of adding 300 lines 
> of code to eventfd?
>   

I tried, but this problem made my head hurt, and this was what I came up
with that I felt closes the holes all the way.  Also keep in mind that
while I added X lines to eventfd, I took Y lines *out* of irqfd in the
process, too.  I just excluded the KVM portions in this thread per your
request, so it's not apparent.  But technically, any other clients that
may come along can reuse that notification code instead of coding it
again.  One way or the other, *someone* has to do that ptable_proc stuff
;)  FYI: It's more like 133 lines, fwiw.

 fs/eventfd.c            |  104 ++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/eventfd.h |   36 ++++++++++++++++
 2 files changed, 133 insertions(+), 7 deletions(-)

In case you care, heres what the complete solution when I include KVM
currently looks like:

 fs/eventfd.c            |  104 +++++++++++++++++++++++++--
 include/linux/eventfd.h |   36 +++++++++
 virt/kvm/eventfd.c      |  181 +++++++++++++++++++++++++-----------------------
 3 files changed, 228 insertions(+), 93 deletions(-)

> For example, turning wake_up_locked() into a nornal wake_up().
>   

I am fairly confident it is not that simple after having thought about
this issue over the last few days.  But I've been wrong in the past. 
Propose a patch and I will review it for races/correctness, if you
like.  Perhaps a combination of that plus your asymmetrical locking
scheme would work.  One of the challenges you will hit is avoiding ABBA
between your "get" lock and the wqh, but good luck!

-Greg
Davide Libenzi June 19, 2009, 9:54 p.m. UTC | #5
On Fri, 19 Jun 2009, Gregory Haskins wrote:

> I am fairly confident it is not that simple after having thought about
> this issue over the last few days.  But I've been wrong in the past. 
> Propose a patch and I will review it for races/correctness, if you
> like.  Perhaps a combination of that plus your asymmetrical locking
> scheme would work.  One of the challenges you will hit is avoiding ABBA
> between your "get" lock and the wqh, but good luck!

A patch for what? The eventfd patch is a one-liner.
It seems hard to believe that the thing cannot be handled on your side. 
Once the wake_up_locked() is turned into a wake_up(), what other races are 
there?
Let's not try to find the problem that fits and justify the "cool" 
solution, but let's see if a problem is there at all.


- Davide


Davide Libenzi June 19, 2009, 10:47 p.m. UTC | #6
On Fri, 19 Jun 2009, Davide Libenzi wrote:

> On Fri, 19 Jun 2009, Gregory Haskins wrote:
> 
> > I am fairly confident it is not that simple after having thought about
> > this issue over the last few days.  But I've been wrong in the past. 
> > Propose a patch and I will review it for races/correctness, if you
> > like.  Perhaps a combination of that plus your asymmetrical locking
> > scheme would work.  One of the challenges you will hit is avoiding ABBA
> > between your "get" lock and the wqh, but good luck!
> 
> A patch for what? The eventfd patch is a one-liner.
> It seems hard to believe that the thing cannot be handled on your side. 
> Once the wake_up_locked() is turned into a wake_up(), what other races are 
> there?

AFAICS, the IRQfd code simply registers the callback to ->poll() and waits 
for two events.
On the POLLIN event, you schedule_work(&irqfd->inject) and there are no 
races there AFAICS (you basically do not care about anything eventfd memory 
related at all).
For POLLHUP, you do:

	spin_lock(irqfd->slock);
	if (irqfd->wqh)
		schedule_work(&irqfd->inject);
	irqfd->wqh = NULL;
	spin_unlock(irqfd->slock);

In your work function you notice the POLLHUP condition and take proper 
action (dunno what it is in your case).
In your kvm_irqfd_release() function:

	spin_lock(irqfd->slock);
	if (irqfd->wqh)
		remove_wait_queue(irqfd->wqh, &irqfd->wait);
	irqfd->wqh = NULL;
	spin_unlock(irqfd->slock);

Any races in there?



- Davide


Gregory Haskins June 20, 2009, 2:09 a.m. UTC | #7
Davide Libenzi wrote:
> On Fri, 19 Jun 2009, Davide Libenzi wrote:
>
>   
>> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>>
>>     
>>> I am fairly confident it is not that simple after having thought about
>>> this issue over the last few days.  But I've been wrong in the past. 
>>> Propose a patch and I will review it for races/correctness, if you
>>> like.  Perhaps a combination of that plus your asymmetrical locking
>>> scheme would work.  One of the challenges you will hit is avoiding ABBA
>>> between your "get" lock and the wqh, but good luck!
>>>       
>> A patch for what? The eventfd patch is a one-liner.
>>     

Yes, understood.  What I was trying to gently say is that the one-liner
proposal alone is still broken afaict.  However, if there is another
solution that works that you like better than 133-liner I posted, I am
more than willing to help analyze it.  In the end, I only care that this
is fixed.

>> It seems hard to believe that the thing cannot be handled on your side. 
>> Once the wake_up_locked() is turned into a wake_up(), what other races are 
>> there?
>>     
>
> AFAICS, the IRQfd code simply registers the callback to ->poll() and waits 
> for two events.
> In the POLLIN event, you schedule_work(&irqfd->inject) and there are no 
> races there AFAICS (you basically do not care of anything eventfd memory 
> related at all).
> For POLLHUP, you do:
>
> 	spin_lock(irqfd->slock);
> 	if (irqfd->wqh)
> 		schedule_work(&irqfd->inject);
> 	irqfd->wqh = NULL;
> 	spin_unlock(irqfd->slock);
>
> In your work function you notice the POLLHUP condition and take proper 
> action (dunno what it is in your case).
> In your kvm_irqfd_release() function:
>
> 	spin_lock(irqfd->slock);
> 	if (irqfd->wqh)
> 		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 	irqfd->wqh = NULL;
> 	spin_unlock(irqfd->slock);
>
> Any races in there?
>   

Yes, for one you have an ABBA deadlock on the irqfd->slock and wqh->lock
(recall wqh has to be locked to fix that other race I mentioned).

(As a hint, I think I fixed 4-5 races with these patches, so there are a
few others still lurking as well)

-Greg

Patch

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 3d7fb16..934efee 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,8 +16,10 @@ 
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/kref.h>
 
 struct eventfd_ctx {
+	struct kref kref;
 	wait_queue_head_t wqh;
 	/*
 	 * Every time that a write(2) is performed on an eventfd, the
@@ -57,17 +59,24 @@  int eventfd_signal(struct file *file, int n)
 	return n;
 }
 
+static void _eventfd_release(struct kref *kref)
+{
+	struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+	kfree(ctx);
+}
+
+static void _eventfd_put(struct kref *kref)
+{
+	kref_put(kref, &_eventfd_release);
+}
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
 	struct eventfd_ctx *ctx = file->private_data;
 
-	/*
-	 * No need to hold the lock here, since we are on the file cleanup
-	 * path and the ones still attached to the wait queue will be
-	 * serialized by wake_up_locked_poll().
-	 */
-	wake_up_locked_poll(&ctx->wqh, POLLHUP);
-	kfree(ctx);
+	wake_up_poll(&ctx->wqh, POLLHUP);
+	_eventfd_put(&ctx->kref);
 	return 0;
 }
 
@@ -222,6 +231,7 @@  SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 	if (!ctx)
 		return -ENOMEM;
 
+	kref_init(&ctx->kref);
 	init_waitqueue_head(&ctx->wqh);
 	ctx->count = count;
 	ctx->flags = flags;
@@ -242,6 +252,15 @@  SYSCALL_DEFINE1(eventfd, unsigned int, count)
 	return sys_eventfd2(count, 0);
 }
 
+enum {
+	eventfd_notifier_flag_active,
+};
+
+static int test_and_clear_active(struct eventfd_notifier *en)
+{
+	return test_and_clear_bit(eventfd_notifier_flag_active, &en->flags);
+}
+
 static int eventfd_notifier_wakeup(wait_queue_t *wait, unsigned mode,
 				   int sync, void *key)
 {
@@ -251,19 +270,15 @@  static int eventfd_notifier_wakeup(wait_queue_t *wait, unsigned mode,
 	en = container_of(wait, struct eventfd_notifier, wait);
 
 	if (flags & POLLIN)
-		/*
-		 * The POLLIN wake_up is called with interrupts disabled.
-		 */
 		en->ops->signal(en);
 
 	if (flags & POLLHUP) {
-		/*
-		 * The POLLHUP is called unlocked, so it theoretically should
-		 * be safe to remove ourselves from the wqh using the locked
-		 * variant of remove_wait_queue()
-		 */
-		remove_wait_queue(en->wqh, &en->wait);
-		en->ops->release(en);
+
+		if (test_and_clear_active(en)) {
+			__remove_wait_queue(en->wqh, &en->wait);
+			_eventfd_put(en->eventfd);
+			en->ops->release(en);
+		}
 	}
 
 	return 0;
@@ -283,11 +298,14 @@  static void eventfd_notifier_ptable_enqueue(struct file *file,
 
 int eventfd_notifier_register(struct file *file, struct eventfd_notifier *en)
 {
+	struct eventfd_ctx *ctx;
 	unsigned int events;
 
 	if (file->f_op != &eventfd_fops)
 		return -EINVAL;
 
+	ctx = file->private_data;
+
 	/*
 	 * Install our own custom wake-up handling so we are notified via
 	 * a callback whenever someone signals the underlying eventfd
@@ -297,12 +315,20 @@  int eventfd_notifier_register(struct file *file, struct eventfd_notifier *en)
 
 	events = file->f_op->poll(file, &en->pt);
 
+	kref_get(&ctx->kref);
+	en->eventfd = &ctx->kref;
+
+	set_bit(eventfd_notifier_flag_active, &en->flags);
+
 	return (events & POLLIN) ? 1 : 0;
 }
 EXPORT_SYMBOL_GPL(eventfd_notifier_register);
 
 void eventfd_notifier_unregister(struct eventfd_notifier *en)
 {
-	remove_wait_queue(en->wqh, &en->wait);
+	if (test_and_clear_active(en)) {
+		remove_wait_queue(en->wqh, &en->wait);
+		_eventfd_put(en->eventfd);
+	}
 }
 EXPORT_SYMBOL_GPL(eventfd_notifier_unregister);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index cb23969..2b6e239 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -12,6 +12,7 @@ 
 #include <linux/poll.h>
 #include <linux/file.h>
 #include <linux/list.h>
+#include <linux/kref.h>
 
 struct eventfd_notifier;
 
@@ -24,6 +25,8 @@  struct eventfd_notifier {
 	poll_table                         pt;
 	wait_queue_head_t                 *wqh;
 	wait_queue_t                       wait;
+	struct kref                       *eventfd;
+	unsigned long                      flags;
 	const struct eventfd_notifier_ops *ops;
 };