
[4/4] fanotify: Use interruptible wait when waiting for permission events

Message ID 20190108164611.11440-5-jack@suse.cz (mailing list archive)
State New, archived
Series fanotify: Make wait for permission event response interruptible

Commit Message

Jan Kara Jan. 8, 2019, 4:46 p.m. UTC
When waiting for a response to fanotify permission events, we currently
use uninterruptible waits. That makes the code simple; however, it can
cause lots of processes to end up in uninterruptible sleep, with a hard
reboot being the only alternative in case the fanotify listener process
stops responding (e.g. due to a bug in its implementation). Uninterruptible
sleep also makes system hibernation fail if the listener gets frozen
before the process generating the fanotify permission event.

Fix these problems by using interruptible sleep when waiting for a
response to a fanotify event. This is slightly tricky though - we have to
detect when the event has already been reported to userspace, as in that
case we must not free the event. Instead we push the responsibility for
freeing the event to the process that will write the response to the
event.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c      | 36 ++++++++++++++++++++++++++----
 fs/notify/fanotify/fanotify.h      |  1 +
 fs/notify/fanotify/fanotify_user.c | 45 ++++++++++++++++++++++++--------------
 fs/notify/notification.c           | 20 +++++++++++------
 include/linux/fsnotify_backend.h   |  3 +++
 5 files changed, 78 insertions(+), 27 deletions(-)

Comments

Amir Goldstein Jan. 9, 2019, 7:51 a.m. UTC | #1
On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>
> When waiting for a response to fanotify permission events, we currently
> use uninterruptible waits. That makes the code simple; however, it can
> cause lots of processes to end up in uninterruptible sleep, with a hard
> reboot being the only alternative in case the fanotify listener process
> stops responding (e.g. due to a bug in its implementation). Uninterruptible
> sleep also makes system hibernation fail if the listener gets frozen
> before the process generating the fanotify permission event.
>
> Fix these problems by using interruptible sleep when waiting for a
> response to a fanotify event. This is slightly tricky though - we have to
> detect when the event has already been reported to userspace, as in that
> case we must not free the event.

> Instead we push the responsibility for
> freeing the event to the process that will write the response to the
> event.

It is a bit hard to follow this patch. Can we make the part that
shifts responsibility for freeing the event a separate patch?
The fsnotify_remove_queued_event() helper can tag along with it
or go in a separate patch, as you wish.

>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

I think it would be good to add the Reported-by tag and the links you
provided in the cover letter to this patch as well.

...

>
> @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>         if (response & FAN_AUDIT)
>                 audit_fanotify(response & ~FAN_AUDIT);
>
> -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
> -                group, event, ret);

Looks useful. Why remove?

...

>
> +static void set_event_response(struct fsnotify_group *group,
> +                              struct fanotify_perm_event_info *event,
> +                              unsigned int response)
> +{
> +       spin_lock(&group->notification_lock);
> +       /* Waiter got aborted by a signal? Free the event. */
> +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
> +               spin_unlock(&group->notification_lock);
> +               fsnotify_destroy_event(group, &event->fae.fse);
> +               return;
> +       }
> +       event->response = response | FAN_EVENT_ANSWERED;
> +       spin_unlock(&group->notification_lock);
> +}
> +

I don't understand. AFAICS, all callers of set_event_response()
call immediately after spin_unlock(&group->notification_lock),
without any user wait involved.
I think it makes more sense for set_event_response() to assert that
the lock is held than to take it itself.

Am I missing anything?

Thanks,
Amir.
Jan Kara Jan. 9, 2019, 9:23 a.m. UTC | #2
On Wed 09-01-19 09:51:08, Amir Goldstein wrote:
> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
> >
> > When waiting for a response to fanotify permission events, we currently
> > use uninterruptible waits. That makes the code simple; however, it can
> > cause lots of processes to end up in uninterruptible sleep, with a hard
> > reboot being the only alternative in case the fanotify listener process
> > stops responding (e.g. due to a bug in its implementation). Uninterruptible
> > sleep also makes system hibernation fail if the listener gets frozen
> > before the process generating the fanotify permission event.
> >
> > Fix these problems by using interruptible sleep when waiting for a
> > response to a fanotify event. This is slightly tricky though - we have to
> > detect when the event has already been reported to userspace, as in that
> > case we must not free the event.
> 
> > Instead we push the responsibility for
> > freeing the event to the process that will write the response to the
> > event.
> 
> It is a bit hard to follow this patch. Can we make the part that
> shifts responsibility for freeing the event a separate patch?
> The fsnotify_remove_queued_event() helper can tag along with it
> or go in a separate patch, as you wish.

I'll have a look at how to best split this. It should be possible.

> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> 
> I think it would be good to add the Reported-by tag and the links you
> provided in the cover letter to this patch as well.

Good point. Will do.

> >
> > @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >         if (response & FAN_AUDIT)
> >                 audit_fanotify(response & ~FAN_AUDIT);
> >
> > -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
> > -                group, event, ret);
> 
> Looks useful. Why remove?

OK, I'll leave it alone.

> > +static void set_event_response(struct fsnotify_group *group,
> > +                              struct fanotify_perm_event_info *event,
> > +                              unsigned int response)
> > +{
> > +       spin_lock(&group->notification_lock);
> > +       /* Waiter got aborted by a signal? Free the event. */
> > +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
> > +               spin_unlock(&group->notification_lock);
> > +               fsnotify_destroy_event(group, &event->fae.fse);
> > +               return;
> > +       }
> > +       event->response = response | FAN_EVENT_ANSWERED;
> > +       spin_unlock(&group->notification_lock);
> > +}
> > +
> 
> I don't understand. AFAICS, all callers of set_event_response()
> call immediately after spin_unlock(&group->notification_lock),
> without any user wait involved.
> I think it makes more sense for set_event_response() to assert that
> the lock is held than to take it itself.
> 
> Am I missing anything?

In case we need to destroy the event, we want to drop the
notification_lock. So to avoid a situation where set_event_response() drops
a lock it did not acquire (which is not very intuitive), I've settled on
the less efficient scheme of dropping and retaking the lock.

But maybe with a better function name and some asserts, we could live with
dropping the lock inside the function without taking it.

								Honza
Orion Poplawski Feb. 12, 2019, 3:40 p.m. UTC | #3
On 1/9/19 2:23 AM, Jan Kara wrote:
> On Wed 09-01-19 09:51:08, Amir Goldstein wrote:
>> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>>>
>>> When waiting for a response to fanotify permission events, we currently
>>> use uninterruptible waits. That makes the code simple; however, it can
>>> cause lots of processes to end up in uninterruptible sleep, with a hard
>>> reboot being the only alternative in case the fanotify listener process
>>> stops responding (e.g. due to a bug in its implementation). Uninterruptible
>>> sleep also makes system hibernation fail if the listener gets frozen
>>> before the process generating the fanotify permission event.
>>>
>>> Fix these problems by using interruptible sleep when waiting for a
>>> response to a fanotify event. This is slightly tricky though - we have to
>>> detect when the event has already been reported to userspace, as in that
>>> case we must not free the event.
>>
>>> Instead we push the responsibility for
>>> freeing the event to the process that will write the response to the
>>> event.
>>
>> It is a bit hard to follow this patch. Can we make the part that
>> shifts responsibility for freeing the event a separate patch?
>> The fsnotify_remove_queued_event() helper can tag along with it
>> or go in a separate patch, as you wish.
> 
> I'll have a look at how to best split this. It should be possible.
> 
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>
>> I think it would be good to add the Reported-by tag and the links you
>> provided in the cover letter to this patch as well.
> 
> Good point. Will do.
> 
>>>
>>> @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>>>         if (response & FAN_AUDIT)
>>>                 audit_fanotify(response & ~FAN_AUDIT);
>>>
>>> -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
>>> -                group, event, ret);
>>
>> Looks useful. Why remove?
> 
> OK, I'll leave it alone.
> 
>>> +static void set_event_response(struct fsnotify_group *group,
>>> +                              struct fanotify_perm_event_info *event,
>>> +                              unsigned int response)
>>> +{
>>> +       spin_lock(&group->notification_lock);
>>> +       /* Waiter got aborted by a signal? Free the event. */
>>> +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
>>> +               spin_unlock(&group->notification_lock);
>>> +               fsnotify_destroy_event(group, &event->fae.fse);
>>> +               return;
>>> +       }
>>> +       event->response = response | FAN_EVENT_ANSWERED;
>>> +       spin_unlock(&group->notification_lock);
>>> +}
>>> +
>>
>> I don't understand. AFAICS, all callers of set_event_response()
>> call immediately after spin_unlock(&group->notification_lock),
>> without any user wait involved.
>> I think it makes more sense for set_event_response() to assert that
>> the lock is held than to take it itself.
>>
>> Am I missing anything?
> 
> In case we need to destroy the event, we want to drop the
> notification_lock. So to avoid a situation where set_event_response() drops
> a lock it did not acquire (which is not very intuitive), I've settled on
> the less efficient scheme of dropping and retaking the lock.
> 
> But maybe with a better function name and some asserts, we could live with
> dropping the lock inside the function without taking it.
> 
> 								Honza
> 


Any more progress here?  Thanks for your work on this, it's a real thorn in
our side here.
Jan Kara Feb. 13, 2019, 2:56 p.m. UTC | #4
On Tue 12-02-19 08:40:13, Orion Poplawski wrote:
> On 1/9/19 2:23 AM, Jan Kara wrote:
> > On Wed 09-01-19 09:51:08, Amir Goldstein wrote:
> >> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
> >>>
> >>> When waiting for a response to fanotify permission events, we currently
> >>> use uninterruptible waits. That makes the code simple; however, it can
> >>> cause lots of processes to end up in uninterruptible sleep, with a hard
> >>> reboot being the only alternative in case the fanotify listener process
> >>> stops responding (e.g. due to a bug in its implementation). Uninterruptible
> >>> sleep also makes system hibernation fail if the listener gets frozen
> >>> before the process generating the fanotify permission event.
> >>>
> >>> Fix these problems by using interruptible sleep when waiting for a
> >>> response to a fanotify event. This is slightly tricky though - we have to
> >>> detect when the event has already been reported to userspace, as in that
> >>> case we must not free the event.
> >>
> >>> Instead we push the responsibility for
> >>> freeing the event to the process that will write the response to the
> >>> event.
> >>
> >> It is a bit hard to follow this patch. Can we make the part that
> >> shifts responsibility for freeing the event a separate patch?
> >> The fsnotify_remove_queued_event() helper can tag along with it
> >> or go in a separate patch, as you wish.
> > 
> > I'll have a look at how to best split this. It should be possible.
> > 
> >>> Signed-off-by: Jan Kara <jack@suse.cz>
> >>> ---
> >>
> >> I think it would be good to add the Reported-by tag and the links you
> >> provided in the cover letter to this patch as well.
> > 
> > Good point. Will do.
> > 
> >>>
> >>> @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >>>         if (response & FAN_AUDIT)
> >>>                 audit_fanotify(response & ~FAN_AUDIT);
> >>>
> >>> -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
> >>> -                group, event, ret);
> >>
> >> Looks useful. Why remove?
> > 
> > OK, I'll leave it alone.
> > 
> >>> +static void set_event_response(struct fsnotify_group *group,
> >>> +                              struct fanotify_perm_event_info *event,
> >>> +                              unsigned int response)
> >>> +{
> >>> +       spin_lock(&group->notification_lock);
> >>> +       /* Waiter got aborted by a signal? Free the event. */
> >>> +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
> >>> +               spin_unlock(&group->notification_lock);
> >>> +               fsnotify_destroy_event(group, &event->fae.fse);
> >>> +               return;
> >>> +       }
> >>> +       event->response = response | FAN_EVENT_ANSWERED;
> >>> +       spin_unlock(&group->notification_lock);
> >>> +}
> >>> +
> >>
> >> I don't understand. AFAICS, all callers of set_event_response()
> >> call immediately after spin_unlock(&group->notification_lock),
> >> without any user wait involved.
> >> I think it makes more sense for set_event_response() to assert that
> >> the lock is held than to take it itself.
> >>
> >> Am I missing anything?
> > 
> > In case we need to destroy the event, we want to drop the
> > notification_lock. So to avoid a situation where set_event_response() drops
> > a lock it did not acquire (which is not very intuitive), I've settled on
> > the less efficient scheme of dropping and retaking the lock.
> > 
> > But maybe with a better function name and some asserts, we could live with
> > dropping the lock inside the function without taking it.
> > 
> > 								Honza
> > 
> 
> 
> Any more progress here?  Thanks for your work on this, it's a real thorn in
> our side here.

I've sent v2 of the patches today to push things further.

								Honza

Patch

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index cca13adc3a4c..cb7f7c5484fc 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -57,6 +57,13 @@  static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 	return 0;
 }
 
+/*
+ * Wait for response to permission event. The function also takes care of
+ * freeing the permission event (or offloads that in case the wait is canceled
+ * by a signal). The function returns 0 in case access got allowed by userspace,
+ * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
+ * the wait got interrupted by a signal.
+ */
 static int fanotify_get_response(struct fsnotify_group *group,
 				 struct fanotify_perm_event_info *event,
 				 struct fsnotify_iter_info *iter_info)
@@ -67,9 +74,31 @@  static int fanotify_get_response(struct fsnotify_group *group,
 	BUILD_BUG_ON(FAN_EVENT_STATE_MASK & (FAN_AUDIT | FAN_ALLOW | FAN_DENY));
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq,
+	ret = wait_event_interruptible(group->fanotify_data.access_waitq,
 		   (event->response & FAN_EVENT_STATE_MASK) ==
 							FAN_EVENT_ANSWERED);
+	/* Signal pending? */
+	if (ret < 0) {
+		spin_lock(&group->notification_lock);
+		/* Event reported to userspace and no answer yet? */
+		if ((event->response & FAN_EVENT_STATE_MASK) ==
+		    FAN_EVENT_REPORTED) {
+			/* Event will get freed once userspace answers to it */
+			event->response = FAN_EVENT_CANCELED;
+			spin_unlock(&group->notification_lock);
+			return ret;
+		}
+		/* Event not yet reported? Just remove it. */
+		if ((event->response & FAN_EVENT_STATE_MASK) == 0)
+			fsnotify_remove_queued_event(group, &event->fae.fse);
+		/*
+		 * Event may be also answered in case signal delivery raced
+		 * with wakeup. In that case we have nothing to do besides
+		 * freeing the event and reporting error.
+		 */
+		spin_unlock(&group->notification_lock);
+		goto out;
+	}
 
 	response = event->response & ~FAN_EVENT_STATE_MASK;
 
@@ -87,8 +116,8 @@  static int fanotify_get_response(struct fsnotify_group *group,
 	if (response & FAN_AUDIT)
 		audit_fanotify(response & ~FAN_AUDIT);
 
-	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
-		 group, event, ret);
+out:
+	fsnotify_destroy_event(group, &event->fae.fse);
 	
 	return ret;
 }
@@ -259,7 +288,6 @@  static int fanotify_handle_event(struct fsnotify_group *group,
 	} else if (fanotify_is_perm_event(mask)) {
 		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
 					    iter_info);
-		fsnotify_destroy_event(group, fsn_event);
 	}
 finish:
 	if (fanotify_is_perm_event(mask))
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 954d997745c3..57286518f420 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -27,6 +27,7 @@  struct fanotify_event_info {
 
 #define FAN_EVENT_REPORTED 0x40000000	/* Event reported to userspace */
 #define FAN_EVENT_ANSWERED 0x80000000	/* Event answered by userspace */
+#define FAN_EVENT_CANCELED 0xc0000000	/* Event got canceled by a signal */
 
 /*
  * Structure for permission fanotify events. It gets allocated and freed in
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 611c2ff50d64..d6ae9e9338e6 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -146,6 +146,21 @@  static int fill_event_metadata(struct fsnotify_group *group,
 	return ret;
 }
 
+static void set_event_response(struct fsnotify_group *group,
+			       struct fanotify_perm_event_info *event,
+			       unsigned int response)
+{
+	spin_lock(&group->notification_lock);
+	/* Waiter got aborted by a signal? Free the event. */
+	if (unlikely(event->response == FAN_EVENT_CANCELED)) {
+		spin_unlock(&group->notification_lock);
+		fsnotify_destroy_event(group, &event->fae.fse);
+		return;
+	}
+	event->response = response | FAN_EVENT_ANSWERED;
+	spin_unlock(&group->notification_lock);
+}
+
 static int process_access_response(struct fsnotify_group *group,
 				   struct fanotify_response *response_struct)
 {
@@ -181,8 +196,8 @@  static int process_access_response(struct fsnotify_group *group,
 			continue;
 
 		list_del_init(&event->fae.fse.list);
-		event->response = response | FAN_EVENT_ANSWERED;
 		spin_unlock(&group->notification_lock);
+		set_event_response(group, event, response);
 		wake_up(&group->fanotify_data.access_waitq);
 		return 0;
 	}
@@ -304,10 +319,8 @@  static ssize_t fanotify_read(struct file *file, char __user *buf,
 			fsnotify_destroy_event(group, kevent);
 		} else {
 			if (ret <= 0) {
-				spin_lock(&group->notification_lock);
-				FANOTIFY_PE(kevent)->response =
-						FAN_DENY | FAN_EVENT_ANSWERED;
-				spin_unlock(&group->notification_lock);
+				set_event_response(group, FANOTIFY_PE(kevent),
+						   FAN_DENY);
 				wake_up(&group->fanotify_data.access_waitq);
 			} else {
 				spin_lock(&group->notification_lock);
@@ -357,7 +370,7 @@  static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t
 static int fanotify_release(struct inode *ignored, struct file *file)
 {
 	struct fsnotify_group *group = file->private_data;
-	struct fanotify_perm_event_info *event, *next;
+	struct fanotify_perm_event_info *event;
 	struct fsnotify_event *fsn_event;
 
 	/*
@@ -372,13 +385,13 @@  static int fanotify_release(struct inode *ignored, struct file *file)
 	 * and simulate reply from userspace.
 	 */
 	spin_lock(&group->notification_lock);
-	list_for_each_entry_safe(event, next, &group->fanotify_data.access_list,
-				 fae.fse.list) {
-		pr_debug("%s: found group=%p event=%p\n", __func__, group,
-			 event);
-
+	while (!list_empty(&group->fanotify_data.access_list)) {
+		event = list_first_entry(&group->fanotify_data.access_list,
+				struct fanotify_perm_event_info, fae.fse.list);
 		list_del_init(&event->fae.fse.list);
-		event->response = FAN_ALLOW | FAN_EVENT_ANSWERED;
+		spin_unlock(&group->notification_lock);
+		set_event_response(group, event, FAN_ALLOW);
+		spin_lock(&group->notification_lock);
 	}
 
 	/*
@@ -388,14 +401,14 @@  static int fanotify_release(struct inode *ignored, struct file *file)
 	 */
 	while (!fsnotify_notify_queue_is_empty(group)) {
 		fsn_event = fsnotify_remove_first_event(group);
+		spin_unlock(&group->notification_lock);
 		if (!(fsn_event->mask & FANOTIFY_PERM_EVENTS)) {
-			spin_unlock(&group->notification_lock);
 			fsnotify_destroy_event(group, fsn_event);
-			spin_lock(&group->notification_lock);
 		} else {
-			FANOTIFY_PE(fsn_event)->response =
-					FAN_ALLOW | FAN_EVENT_ANSWERED;
+			set_event_response(group, FANOTIFY_PE(fsn_event),
+					   FAN_ALLOW);
 		}
+		spin_lock(&group->notification_lock);
 	}
 	spin_unlock(&group->notification_lock);
 
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 3c3e36745f59..2195a5cf745a 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -141,6 +141,18 @@  int fsnotify_add_event(struct fsnotify_group *group,
 	return ret;
 }
 
+void fsnotify_remove_queued_event(struct fsnotify_group *group,
+				  struct fsnotify_event *event)
+{
+	assert_spin_locked(&group->notification_lock);
+	/*
+	 * We need to init list head for the case of overflow event so that
+	 * check in fsnotify_add_event() works
+	 */
+	list_del_init(&event->list);
+	group->q_len--;
+}
+
 /*
  * Remove and return the first event from the notification list.  It is the
  * responsibility of the caller to destroy the obtained event
@@ -155,13 +167,7 @@  struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group)
 
 	event = list_first_entry(&group->notification_list,
 				 struct fsnotify_event, list);
-	/*
-	 * We need to init list head for the case of overflow event so that
-	 * check in fsnotify_add_event() works
-	 */
-	list_del_init(&event->list);
-	group->q_len--;
-
+	fsnotify_remove_queued_event(group, event);
 	return event;
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7639774e7475..cddf839bac96 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -416,6 +416,9 @@  extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group);
 extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group);
 /* return AND dequeue the first event on the notification queue */
 extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group);
+/* Remove event queued in the notification list */
+extern void fsnotify_remove_queued_event(struct fsnotify_group *group,
+					 struct fsnotify_event *event);
 
 /* functions used to manipulate the marks attached to inodes */