diff mbox series

fanotify: use killable wait for waiting response for permission events

Message ID 153474898224.6806.12518115530793064797.stgit@buzz (mailing list archive)
State New, archived

Commit Message

Konstantin Khlebnikov Aug. 20, 2018, 7:09 a.m. UTC
Waiting in uninterruptible state for response from userspace
easily produces deadlocks and hordes of unkillable tasks.

This patch makes this wait killable.

On receiving a fatal signal, the task removes the queued event and dies.
If the event is already being handled, the response will be received as usual.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/notify/fanotify/fanotify.c |   22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Jan Kara Aug. 20, 2018, 10:53 a.m. UTC | #1
Hi!

On Mon 20-08-18 10:09:42, Konstantin Khlebnikov wrote:
> Waiting in uninterruptible state for response from userspace
> easily produces deadlocks and hordes of unkillable tasks.
> 
> This patch makes this wait killable.
> 
> At receiving fatal signal task will remove queued event and die.
> If event is already handled then response will be received as usual.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Thanks for the patch. I like the idea. Some comments inline.

> ---
>  fs/notify/fanotify/fanotify.c |   22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index eb4e75175cfb..7a0c37790c89 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
>  
>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>  
> -	wait_event(group->fanotify_data.access_waitq, event->response);
> +	ret = wait_event_killable(group->fanotify_data.access_waitq,
> +				  event->response);
> +	if (ret) {
> +		/* Try to remove pending event from the queue */
> +		spin_lock(&group->notification_lock);
> +		if (!list_empty(&event->fae.fse.list))
> +			list_del_init(&event->fae.fse.list);

Here you forget to decrement group->q_len like
fsnotify_remove_first_event() does.

> +		else
> +			ret = 0;
> +		spin_unlock(&group->notification_lock);

So the above check for list_empty can hit either when response is just
being processed (and then we'll be woken up very soon) or when the event is
just in the process of being copied from event queue to userspace (in which
case we are in the same situation as in the old code). So it would be
weird that in rare cases wait would not be really killable. I think we
could detect this situation in fanotify_read() before adding event to
access_list and just wakeup waiter in fanotify_get_response() again and
avoid reporting the event to userspace. Hmm?

								Honza

> +
> +		if (ret)
> +			return ret;
> +
> +		/*
> +		 * We cannot return, this will destroy event while
> +		 * process_access_response() fills response.
> +		 * Just wait for wakeup and continue normal flow.
> +		 */
> +		wait_event(group->fanotify_data.access_waitq, event->response);
> +	}
>  
>  	/* userspace responded, convert to something usable */
>  	switch (event->response & ~FAN_AUDIT) {
>

Konstantin Khlebnikov Aug. 21, 2018, 1:42 p.m. UTC | #2
On 20.08.2018 13:53, Jan Kara wrote:
> Hi!
> 
> On Mon 20-08-18 10:09:42, Konstantin Khlebnikov wrote:
>> Waiting in uninterruptible state for response from userspace
>> easily produces deadlocks and hordes of unkillable tasks.
>>
>> This patch makes this wait killable.
>>
>> At receiving fatal signal task will remove queued event and die.
>> If event is already handled then response will be received as usual.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> Thanks for the patch. I like the idea. Some comments inline.
> 
>> ---
>>   fs/notify/fanotify/fanotify.c |   22 +++++++++++++++++++++-
>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
>> index eb4e75175cfb..7a0c37790c89 100644
>> --- a/fs/notify/fanotify/fanotify.c
>> +++ b/fs/notify/fanotify/fanotify.c
>> @@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
>>   
>>   	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>>   
>> -	wait_event(group->fanotify_data.access_waitq, event->response);
>> +	ret = wait_event_killable(group->fanotify_data.access_waitq,
>> +				  event->response);
>> +	if (ret) {
>> +		/* Try to remove pending event from the queue */
>> +		spin_lock(&group->notification_lock);
>> +		if (!list_empty(&event->fae.fse.list))
>> +			list_del_init(&event->fae.fse.list);
> 
> Here you forget to decrement group->q_len like
> fsnotify_remove_first_event() does.
> 

Yep

>> +		else
>> +			ret = 0;
>> +		spin_unlock(&group->notification_lock);
> 
> So the above check for list_empty can hit either when response is just
> being processed (and then we'll be woken up very soon) or when the event is
> just in the process of being copied from event queue to userspace (in which
> case we are in the same situation as in the old code). So it would be
> weird that in rare cases wait would not be really killable. I think we
> could detect this situation in fanotify_read() before adding event to
> access_list and just wakeup waiter in fanotify_get_response() again and
> avoid reporting the event to userspace. Hmm?

I've missed that move from list to list in fanotify_read().

So, fanotify_read needs event alive for a long time - copy_to_user might block forever.

We have to transfer ownership and destroy event in fanotify_read.
I'll try this approach.

> 
> 								Honza
> 
>> +
>> +		if (ret)
>> +			return ret;
>> +
>> +		/*
>> +		 * We cannot return, this will destroy event while
>> +		 * process_access_response() fills response.
>> +		 * Just wait for wakeup and continue normal flow.
>> +		 */
>> +		wait_event(group->fanotify_data.access_waitq, event->response);
>> +	}
>>   
>>   	/* userspace responded, convert to something usable */
>>   	switch (event->response & ~FAN_AUDIT) {
>>

Jan Kara Aug. 21, 2018, 2:43 p.m. UTC | #3
On Tue 21-08-18 16:42:26, Konstantin Khlebnikov wrote:
> On 20.08.2018 13:53, Jan Kara wrote:
> > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > > index eb4e75175cfb..7a0c37790c89 100644
> > > --- a/fs/notify/fanotify/fanotify.c
> > > +++ b/fs/notify/fanotify/fanotify.c
> > > @@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
> > >   	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > -	wait_event(group->fanotify_data.access_waitq, event->response);
> > > +	ret = wait_event_killable(group->fanotify_data.access_waitq,
> > > +				  event->response);
> > > +	if (ret) {
> > > +		/* Try to remove pending event from the queue */
> > > +		spin_lock(&group->notification_lock);
> > > +		if (!list_empty(&event->fae.fse.list))
> > > +			list_del_init(&event->fae.fse.list);
> > 
> > Here you forget to decrement group->q_len like
> > fsnotify_remove_first_event() does.
> > 
> 
> Yep

Actually only if this was the list of events to report to userspace. If the
event was on a list of events already reported but not responded to,
group->q_len should not be touched.
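
The accounting rule above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `model_*` names and the `where` field are hypothetical, and the only assumption carried over from the thread is that q_len counts events still waiting to be read and does not count events that were already reported and await a response.

```c
#include <assert.h>
#include <stdbool.h>

/* Which list a (hypothetical) model event is currently linked on. */
enum model_list { ON_NONE, ON_NOTIFICATION_LIST, ON_ACCESS_LIST };

struct model_group {
	unsigned int q_len;	/* counts events on the notification list only */
};

struct model_event {
	enum model_list where;
};

/* Queue a new event for userspace to read; q_len grows. */
static void model_queue(struct model_group *g, struct model_event *e)
{
	e->where = ON_NOTIFICATION_LIST;
	g->q_len++;
}

/* Model of a read: the event leaves the counted list and waits for a response. */
static void model_report(struct model_group *g, struct model_event *e)
{
	assert(e->where == ON_NOTIFICATION_LIST);
	g->q_len--;
	e->where = ON_ACCESS_LIST;
}

/*
 * Model of the killed waiter unlinking its event.
 * q_len is decremented only when the event was still unread.
 * Returns true if the event was linked on some list.
 */
static bool model_cancel(struct model_group *g, struct model_event *e)
{
	if (e->where == ON_NONE)
		return false;
	if (e->where == ON_NOTIFICATION_LIST)
		g->q_len--;
	e->where = ON_NONE;
	return true;
}
```

In this model, cancelling an event that has already been reported leaves q_len unchanged, while cancelling an unread event decrements it once.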

> > > +		else
> > > +			ret = 0;
> > > +		spin_unlock(&group->notification_lock);
> > 
> > So the above check for list_empty can hit either when response is just
> > being processed (and then we'll be woken up very soon) or when the event is
> > just in the process of being copied from event queue to userspace (in which
> > case we are in the same situation as in the old code). So it would be
> > weird that in rare cases wait would not be really killable. I think we
> > could detect this situation in fanotify_read() before adding event to
> > access_list and just wakeup waiter in fanotify_get_response() again and
> > avoid reporting the event to userspace. Hmm?
> 
> I've missed that move from list to list in fanotify_read().
> 
> So, fanotify_read needs event alive for a long time - copy_to_user might
> block forever.

It might block for a long time due to page fault. That is correct.

> We have to transfer ownership and destroy event in fanotify_read.
> I'll try this approach.

I'm open to that if you come up with something reasonably simple. But you
need to somehow communicate back the response and that used to be a mess
and that's why we ended up with permission events being completely handled
by the process generating them...

								Honza
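
One possible reading of the suggestion discussed above (detect a killed waiter in fanotify_read() at the point where the event would otherwise be added to access_list) can be modelled in userspace as a small state machine. This is a sketch under assumptions, not the approach the thread settled on: the state names, the `waiter_woken` flag and both functions are hypothetical, and real code would run these checks under the group's notification_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle of a permission event in this model. */
enum model_state {
	MODEL_QUEUED,	/* on the notification list, not yet read */
	MODEL_READING,	/* dequeued by the reader, not yet on the access list */
	MODEL_REPORTED,	/* on the access list, awaiting a response */
	MODEL_CANCELED,	/* the waiter received a fatal signal */
};

struct model_event {
	enum model_state state;
	bool waiter_woken;	/* stands in for a wake_up() on the wait queue */
};

/*
 * Waiter side, after a fatal signal.
 * Returns true if the waiter can unlink the event itself and return at once,
 * false if the reader still holds the event and must hand it back first.
 */
static bool model_waiter_killed(struct model_event *e)
{
	switch (e->state) {
	case MODEL_QUEUED:
	case MODEL_REPORTED:
		e->state = MODEL_CANCELED;	/* event is on a list: unlink it */
		return true;
	case MODEL_READING:
		e->state = MODEL_CANCELED;	/* on no list: leave a mark for the reader */
		return false;
	default:
		return true;		/* already canceled */
	}
}

/*
 * Reader side, at the point where the event would be added to the access list.
 * Returns true if the event should be linked and reported as usual.
 */
static bool model_reader_finish(struct model_event *e)
{
	if (e->state == MODEL_CANCELED) {
		e->waiter_woken = true;	/* wake the waiter, do not link the event */
		return false;
	}
	e->state = MODEL_REPORTED;
	return true;
}
```

The model deliberately leaves out how a response would be communicated back if ownership of the event moved to the reader, which is the difficulty raised in the reply above.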

Patch

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index eb4e75175cfb..7a0c37790c89 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -64,7 +64,27 @@  static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq, event->response);
+	ret = wait_event_killable(group->fanotify_data.access_waitq,
+				  event->response);
+	if (ret) {
+		/* Try to remove pending event from the queue */
+		spin_lock(&group->notification_lock);
+		if (!list_empty(&event->fae.fse.list))
+			list_del_init(&event->fae.fse.list);
+		else
+			ret = 0;
+		spin_unlock(&group->notification_lock);
+
+		if (ret)
+			return ret;
+
+		/*
+		 * We cannot return, this will destroy event while
+		 * process_access_response() fills response.
+		 * Just wait for wakeup and continue normal flow.
+		 */
+		wait_event(group->fanotify_data.access_waitq, event->response);
+	}
 
 	/* userspace responded, convert to something usable */
 	switch (event->response & ~FAN_AUDIT) {
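
The control flow this hunk adds after wait_event_killable() can be summarised with a userspace model. It is a sketch only: the `model_*` names and `MODEL_ERESTARTSYS` are hypothetical stand-ins, `on_list` abstracts the `list_empty()` check, and locking is omitted. It mirrors the patch as posted, so it does not include the q_len adjustment raised in review.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-internal ERESTARTSYS value. */
#define MODEL_ERESTARTSYS 512

struct model_perm_event {
	bool on_list;		/* still linked on one of the group's lists? */
	unsigned int response;	/* 0 until userspace answers */
};

enum model_action {
	MODEL_USE_RESPONSE,	/* wait finished normally, response is set */
	MODEL_ABORT,		/* event dequeued by the waiter, return the error */
	MODEL_WAIT_UNKILLABLE,	/* event held elsewhere, fall back to wait_event() */
};

/*
 * Decide what to do once the killable wait returns.
 * wait_ret is 0 when the condition became true and negative
 * when the wait was interrupted by a fatal signal.
 */
static enum model_action
model_after_killable_wait(struct model_perm_event *e, int wait_ret)
{
	if (wait_ret == 0)
		return MODEL_USE_RESPONSE;

	if (e->on_list) {
		e->on_list = false;	/* corresponds to list_del_init() */
		return MODEL_ABORT;
	}

	/* Not on a list: someone else is using the event, so keep waiting. */
	return MODEL_WAIT_UNKILLABLE;
}
```

The third outcome is the case discussed in the review: when the event is on no list, the task falls back to an uninterruptible wait and is not killable until a response arrives.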