Message ID | 153474898224.6806.12518115530793064797.stgit@buzz (mailing list archive) |
---|---|
State | New, archived |
Series | fanotify: use killable wait for waiting response for permission events |
Hi!

On Mon 20-08-18 10:09:42, Konstantin Khlebnikov wrote:
> Waiting in uninterruptible state for response from userspace
> easily produces deadlocks and hordes of unkillable tasks.
>
> This patch makes this wait killable.
>
> At receiving fatal signal task will remove queued event and die.
> If event is already handled then response will be received as usual.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Thanks for the patch. I like the idea. Some comments inline.

> ---
>  fs/notify/fanotify/fanotify.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index eb4e75175cfb..7a0c37790c89 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
>
>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> -	wait_event(group->fanotify_data.access_waitq, event->response);
> +	ret = wait_event_killable(group->fanotify_data.access_waitq,
> +				  event->response);
> +	if (ret) {
> +		/* Try to remove pending event from the queue */
> +		spin_lock(&group->notification_lock);
> +		if (!list_empty(&event->fae.fse.list))
> +			list_del_init(&event->fae.fse.list);

Here you forget to decrement group->q_len like fsnotify_remove_first_event()
does.

> +		else
> +			ret = 0;
> +		spin_unlock(&group->notification_lock);

So the above check for list_empty can hit either when response is just
being processed (and then we'll be woken up very soon) or when the event is
just in the process of being copied from event queue to userspace (in which
case we are in the same situation as in the old code). So it would be weird
that in rare cases wait would not be really killable. I think we could
detect this situation in fanotify_read() before adding event to access_list
and just wakeup waiter in fanotify_get_response() again and avoid reporting
the event to userspace. Hmm?

								Honza

> +
> +		if (ret)
> +			return ret;
> +
> +		/*
> +		 * We cannot return, this will destroy event while
> +		 * process_access_response() fills response.
> +		 * Just wait for wakeup and continue normal flow.
> +		 */
> +		wait_event(group->fanotify_data.access_waitq, event->response);
> +	}
>
>  	/* userspace responded, convert to something usable */
>  	switch (event->response & ~FAN_AUDIT) {
>
On 20.08.2018 13:53, Jan Kara wrote:
> Hi!
>
> On Mon 20-08-18 10:09:42, Konstantin Khlebnikov wrote:
>> Waiting in uninterruptible state for response from userspace
>> easily produces deadlocks and hordes of unkillable tasks.
>>
>> This patch makes this wait killable.
>>
>> At receiving fatal signal task will remove queued event and die.
>> If event is already handled then response will be received as usual.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> Thanks for the patch. I like the idea. Some comments inline.
>
>> ---
>>  fs/notify/fanotify/fanotify.c | 22 +++++++++++++++++++++-
>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
>> index eb4e75175cfb..7a0c37790c89 100644
>> --- a/fs/notify/fanotify/fanotify.c
>> +++ b/fs/notify/fanotify/fanotify.c
>> @@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
>>
>>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>>
>> -	wait_event(group->fanotify_data.access_waitq, event->response);
>> +	ret = wait_event_killable(group->fanotify_data.access_waitq,
>> +				  event->response);
>> +	if (ret) {
>> +		/* Try to remove pending event from the queue */
>> +		spin_lock(&group->notification_lock);
>> +		if (!list_empty(&event->fae.fse.list))
>> +			list_del_init(&event->fae.fse.list);
>
> Here you forget to decrement group->q_len like
> fsnotify_remove_first_event() does.
>

Yep

>> +		else
>> +			ret = 0;
>> +		spin_unlock(&group->notification_lock);
>
> So the above check for list_empty can hit either when response is just
> being processed (and then we'll be woken up very soon) or when the event is
> just in the process of being copied from event queue to userspace (in which
> case we are in the same situation as in the old code). So it would be
> weird that in rare cases wait would not be really killable. I think we
> could detect this situation in fanotify_read() before adding event to
> access_list and just wakeup waiter in fanotify_get_response() again and
> avoid reporting the event to userspace. Hmm?

I've missed that move from list to list in fanotify_read().

So, fanotify_read needs event alive for a long time - copy_to_user might
block forever.

We have to transfer ownership and destroy event in fanotify_read.
I'll try this approach.

>
> 								Honza
>
>> +
>> +		if (ret)
>> +			return ret;
>> +
>> +		/*
>> +		 * We cannot return, this will destroy event while
>> +		 * process_access_response() fills response.
>> +		 * Just wait for wakeup and continue normal flow.
>> +		 */
>> +		wait_event(group->fanotify_data.access_waitq, event->response);
>> +	}
>>
>>  	/* userspace responded, convert to something usable */
>>  	switch (event->response & ~FAN_AUDIT) {
>>
On Tue 21-08-18 16:42:26, Konstantin Khlebnikov wrote:
> On 20.08.2018 13:53, Jan Kara wrote:
> > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > > index eb4e75175cfb..7a0c37790c89 100644
> > > --- a/fs/notify/fanotify/fanotify.c
> > > +++ b/fs/notify/fanotify/fanotify.c
> > > @@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
> > >  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > -	wait_event(group->fanotify_data.access_waitq, event->response);
> > > +	ret = wait_event_killable(group->fanotify_data.access_waitq,
> > > +				  event->response);
> > > +	if (ret) {
> > > +		/* Try to remove pending event from the queue */
> > > +		spin_lock(&group->notification_lock);
> > > +		if (!list_empty(&event->fae.fse.list))
> > > +			list_del_init(&event->fae.fse.list);
> >
> > Here you forget to decrement group->q_len like
> > fsnotify_remove_first_event() does.
> >
>
> Yep

Actually only if this was the list of events to report to userspace. If the
event was on a list of events already reported but not responded to,
group->q_len should not be touched.

> > > +		else
> > > +			ret = 0;
> > > +		spin_unlock(&group->notification_lock);
> >
> > So the above check for list_empty can hit either when response is just
> > being processed (and then we'll be woken up very soon) or when the event is
> > just in the process of being copied from event queue to userspace (in which
> > case we are in the same situation as in the old code). So it would be
> > weird that in rare cases wait would not be really killable. I think we
> > could detect this situation in fanotify_read() before adding event to
> > access_list and just wakeup waiter in fanotify_get_response() again and
> > avoid reporting the event to userspace. Hmm?
>
> I've missed that move from list to list in fanotify_read().
>
> So, fanotify_read needs event alive for a long time - copy_to_user might
> block forever.

It might block for a long time due to page fault. That is correct.

> We have to transfer ownership and destroy event in fanotify_read.
> I'll try this approach.

I'm open to that if you come up with something reasonably simple. But you
need to somehow communicate back the response and that used to be a mess
and that's why we ended up with permission events being completely handled
by the process generating them...

								Honza
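The accounting rule discussed in this thread (decrement q_len only when the event is still on the list of events waiting to be read, not when it has already been reported) can be illustrated with a small userspace model. This is only a sketch, not kernel code and not the fix that was eventually applied. struct model_group, struct model_event, the state field and all model_* helpers are hypothetical names invented for illustration, the list is a hand-rolled imitation of the kernel's list_head, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Circular doubly linked list node, imitating the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);	/* an unlinked node reads as "empty" */
}

/* Hypothetical state tag: which list (if any) the event is on. */
enum model_state { EV_QUEUED, EV_IN_FLIGHT, EV_REPORTED };

struct model_event {
	struct list_node list;
	enum model_state state;
};

struct model_group {
	struct list_node notification_list;	/* not yet read by listener */
	struct list_node access_list;		/* read, awaiting response */
	unsigned int q_len;			/* counts notification_list only */
};

static void model_group_init(struct model_group *g)
{
	list_init(&g->notification_list);
	list_init(&g->access_list);
	g->q_len = 0;
}

static void model_queue(struct model_group *g, struct model_event *e)
{
	list_append(&e->list, &g->notification_list);
	e->state = EV_QUEUED;
	g->q_len++;
}

/* Listener starts reading: the event is on no list while it is copied. */
static struct model_event *model_read_begin(struct model_group *g)
{
	struct list_node *n = g->notification_list.next;

	if (list_is_empty(&g->notification_list))
		return NULL;
	list_unlink(n);
	g->q_len--;
	struct model_event *e =
		(struct model_event *)((char *)n - offsetof(struct model_event, list));
	e->state = EV_IN_FLIGHT;
	return e;
}

/* Listener finished copying: park the event until a response arrives. */
static void model_read_end(struct model_group *g, struct model_event *e)
{
	list_append(&e->list, &g->access_list);
	e->state = EV_REPORTED;
}

/*
 * Killed waiter tries to withdraw its event. Returns true if the event
 * was unlinked, false if it is in flight and the waiter must keep waiting.
 */
static bool model_cancel(struct model_group *g, struct model_event *e)
{
	if (list_is_empty(&e->list))
		return false;
	if (e->state == EV_QUEUED) {	/* only the read queue is counted */
		assert(g->q_len > 0);
		g->q_len--;
	}
	list_unlink(&e->list);
	return true;
}
```

In this model a cancel that hits an in-flight event returns false, which corresponds to the case Jan describes where the wait would still not be killable; the thread leaves open how the real code should handle that window.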
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index eb4e75175cfb..7a0c37790c89 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -64,7 +64,27 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq, event->response);
+	ret = wait_event_killable(group->fanotify_data.access_waitq,
+				  event->response);
+	if (ret) {
+		/* Try to remove pending event from the queue */
+		spin_lock(&group->notification_lock);
+		if (!list_empty(&event->fae.fse.list))
+			list_del_init(&event->fae.fse.list);
+		else
+			ret = 0;
+		spin_unlock(&group->notification_lock);
+
+		if (ret)
+			return ret;
+
+		/*
+		 * We cannot return, this will destroy event while
+		 * process_access_response() fills response.
+		 * Just wait for wakeup and continue normal flow.
+		 */
+		wait_event(group->fanotify_data.access_waitq, event->response);
+	}
 
 	/* userspace responded, convert to something usable */
 	switch (event->response & ~FAN_AUDIT) {
Waiting in uninterruptible state for response from userspace
easily produces deadlocks and hordes of unkillable tasks.

This patch makes this wait killable.

At receiving fatal signal task will remove queued event and die.
If event is already handled then response will be received as usual.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/notify/fanotify/fanotify.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)