Message ID | 56A7FF64.1050301@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi, On Tue 26-01-16 17:21:08, Eric Sandeen wrote: > From: Steve Grubb <sgrubb@redhat.com> > > If a daemon using FANOTIFY needs to open a file on a watched filesystem and > its wanting OPEN_PERM events, we get deadlock. (This could happen because > of a library the daemon is using suddenly decides it needs to look in a new > file.) Even though the man page says that the daemon should approve its own > access decision, it really can't. If its in the middle of working on a > request and that in turn generates another request, the second request is > going to sit in the queue inside the kernel and not be read because the > daemon is waiting on a library call that will never finish. We also have no > idea how many requests are stacked up before we get to it. So, it really > can't approve its own access requests. > > The solution is to assume that the daemon is going to approve its own file > access requests. So, any requested access that matches the pid of the program > receiving fanotify events should be pre-approved in the kernel and not sent > to user space for approval. This should prevent deadlock. > > This behavior only exists if FAN_SELF_APPROVE is in the flags at > fanotify_init() time. > > [Eric Sandeen: Make behavior contingent on fanotify_init flag] > > Signed-off-by: Steve Grubb <sgrubb@redhat.com> > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > Resending this; first submission to lkml generated no responses, but offline > Eric Paris indicated that the original patch was "policy in the kernel," > so I'll see what people think of making it contingent on an fanotify_init > flag at syscall time. AKPM is merging fanotify patches these days so it is good to CC him. I'm keeping eye on the notification infrastructure so you can CC me when you need an opinion. Now to the patch: This solution really looks half baked to me. What if there are two processes mediating the access? You'll get the deadlock again: We have processes A and B mediating access. A accesses some file f, A selfapproves the event for itself but the access is still blocked on the approval from B. B proceeds to process the access request by A. It accesses some file which generates the permission event - now B is blocked waiting for A to approve its event. Dang. This really resembles a situation when e.g. multipathd must be damn carful not to generate any IO while it is running lest it deadlocks while reconfiguring paths. So here you have to be damn careful not to generate any events when processing fanotify permission events... And I know that is inconvenient but that's how things were designed and duck taping some special cases IMHO is not acceptable. One solution which would look reasonably clean to me would be that a process could have a capability which when set would mean that accesses by that process would not generate fanotify permission events. This would really avoid the deadlocks. But then botnet / virus writers would quickly learn to set this capability for their processes so I'm not sure if this doesn't somewhat defeat the purpose of permission events. OTOH setting this capability would be gated by CAP_SYS_ADMIN anyway and once you have this you have other ways to circumvent any protection so it is not catastrophical. But still it makes life easier for the bad guys. Honza
On Thursday, January 28, 2016 02:56:11 PM Jan Kara wrote: > Hi, > > On Tue 26-01-16 17:21:08, Eric Sandeen wrote: > > From: Steve Grubb <sgrubb@redhat.com> > > > > If a daemon using FANOTIFY needs to open a file on a watched filesystem > > and > > its wanting OPEN_PERM events, we get deadlock. (This could happen because > > of a library the daemon is using suddenly decides it needs to look in a > > new > > file.) Even though the man page says that the daemon should approve its > > own > > access decision, it really can't. If its in the middle of working on a > > request and that in turn generates another request, the second request is > > going to sit in the queue inside the kernel and not be read because the > > daemon is waiting on a library call that will never finish. We also have > > no > > idea how many requests are stacked up before we get to it. So, it really > > can't approve its own access requests. > > > > The solution is to assume that the daemon is going to approve its own file > > access requests. So, any requested access that matches the pid of the > > program receiving fanotify events should be pre-approved in the kernel > > and not sent to user space for approval. This should prevent deadlock. > > > > This behavior only exists if FAN_SELF_APPROVE is in the flags at > > fanotify_init() time. > > > > [Eric Sandeen: Make behavior contingent on fanotify_init flag] > > > > Signed-off-by: Steve Grubb <sgrubb@redhat.com> > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > > --- > > > > Resending this; first submission to lkml generated no responses, but > > offline Eric Paris indicated that the original patch was "policy in the > > kernel," so I'll see what people think of making it contingent on an > > fanotify_init flag at syscall time. > > AKPM is merging fanotify patches these days so it is good to CC him. I'm > keeping eye on the notification infrastructure so you can CC me when you > need an opinion. > > Now to the patch: This solution really looks half baked to me. What if > there are two processes mediating the access? While this is certainly possible, is this actually done in real life? > You'll get the deadlock again: We have processes A and B mediating access. A > accesses some file f, A selfapproves the event for itself but the access is > still blocked on the approval from B. B proceeds to process the access > request by A. It accesses some file which generates the permission event - > now B is blocked waiting for A to approve its event. Dang. > > This really resembles a situation when e.g. multipathd must be damn carful > not to generate any IO while it is running lest it deadlocks while > reconfiguring paths. So here you have to be damn careful not to generate > any events when processing fanotify permission events... And I know that is > inconvenient but that's how things were designed and duck taping some > special cases IMHO is not acceptable. The issue here is that any relatively interesting program will have several libraries who could at any time decide it needs to open a file. Perhaps even /etc/hosts for network name resolution. You really don't know what the third party libraries might do and the consequences are pretty severe - deadlock. You could make it multi-threaded and have one thread dequeuing approval requests and another servicing them. But...each request has a live descriptor that counts towards the rlimit for max open descriptors. Hitting that is bad also. The simplest solution is to assume that the daemon is going to approve its own request. It would never refuse its own request, should it? -Steve > One solution which would look reasonably clean to me would be that a > process could have a capability which when set would mean that accesses by > that process would not generate fanotify permission events. This would > really avoid the deadlocks. But then botnet / virus writers would quickly > learn to set this capability for their processes so I'm not sure if this > doesn't somewhat defeat the purpose of permission events. OTOH setting > this capability would be gated by CAP_SYS_ADMIN anyway and once you have > this you have other ways to circumvent any protection so it is not > catastrophical. But still it makes life easier for the bad guys. > > Honza -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 30-03-16 14:47:31, Steve Grubb wrote: > On Thursday, January 28, 2016 02:56:11 PM Jan Kara wrote: > > On Tue 26-01-16 17:21:08, Eric Sandeen wrote: > > > From: Steve Grubb <sgrubb@redhat.com> > > > > > > If a daemon using FANOTIFY needs to open a file on a watched filesystem > > > and > > > its wanting OPEN_PERM events, we get deadlock. (This could happen because > > > of a library the daemon is using suddenly decides it needs to look in a > > > new > > > file.) Even though the man page says that the daemon should approve its > > > own > > > access decision, it really can't. If its in the middle of working on a > > > request and that in turn generates another request, the second request is > > > going to sit in the queue inside the kernel and not be read because the > > > daemon is waiting on a library call that will never finish. We also have > > > no > > > idea how many requests are stacked up before we get to it. So, it really > > > can't approve its own access requests. > > > > > > The solution is to assume that the daemon is going to approve its own file > > > access requests. So, any requested access that matches the pid of the > > > program receiving fanotify events should be pre-approved in the kernel > > > and not sent to user space for approval. This should prevent deadlock. > > > > > > This behavior only exists if FAN_SELF_APPROVE is in the flags at > > > fanotify_init() time. > > > > > > [Eric Sandeen: Make behavior contingent on fanotify_init flag] > > > > > > Signed-off-by: Steve Grubb <sgrubb@redhat.com> > > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > > > --- > > > > > > Resending this; first submission to lkml generated no responses, but > > > offline Eric Paris indicated that the original patch was "policy in the > > > kernel," so I'll see what people think of making it contingent on an > > > fanotify_init flag at syscall time. > > > > AKPM is merging fanotify patches these days so it is good to CC him. I'm > > keeping eye on the notification infrastructure so you can CC me when you > > need an opinion. > > > > Now to the patch: This solution really looks half baked to me. What if > > there are two processes mediating the access? > > While this is certainly possible, is this actually done in real life? Honestly, I don't know. But it doesn't seem like too exotic configuration to me. Think for example about containers. It just seems as a bad design to restrict fanotify permission events to: "only one process in the system can safely use this". That has always proven to be a bad idea in the long run. > > You'll get the deadlock again: We have processes A and B mediating access. A > > accesses some file f, A selfapproves the event for itself but the access is > > still blocked on the approval from B. B proceeds to process the access > > request by A. It accesses some file which generates the permission event - > > now B is blocked waiting for A to approve its event. Dang. > > > > This really resembles a situation when e.g. multipathd must be damn carful > > not to generate any IO while it is running lest it deadlocks while > > reconfiguring paths. So here you have to be damn careful not to generate > > any events when processing fanotify permission events... And I know that is > > inconvenient but that's how things were designed and duck taping some > > special cases IMHO is not acceptable. > > The issue here is that any relatively interesting program will have several > libraries who could at any time decide it needs to open a file. Perhaps even > /etc/hosts for network name resolution. You really don't know what the third > party libraries might do and the consequences are pretty severe - deadlock. > > You could make it multi-threaded and have one thread dequeuing approval > requests and another servicing them. But...each request has a live descriptor > that counts towards the rlimit for max open descriptors. Hitting that is bad > also. I agree this reduces the attractivity of the two thread design. > The simplest solution is to assume that the daemon is going to approve > its own request. It would never refuse its own request, should it? I agree this is a solution which fixes your usecase but I don't think it is good enough for general use. I won't accept anything that doesn't work for several users of fanotify permission events. As I wrote below in my original email, you can use for example capabilities to make things work for several processes without too big hassle... > > One solution which would look reasonably clean to me would be that a > > process could have a capability which when set would mean that accesses by > > that process would not generate fanotify permission events. This would > > really avoid the deadlocks. But then botnet / virus writers would quickly > > learn to set this capability for their processes so I'm not sure if this > > doesn't somewhat defeat the purpose of permission events. OTOH setting > > this capability would be gated by CAP_SYS_ADMIN anyway and once you have > > this you have other ways to circumvent any protection so it is not > > catastrophical. But still it makes life easier for the bad guys. Honza
On 30.03.2016 20:47, Steve Grubb wrote: Hi, >> Now to the patch: This solution really looks half baked to me. What if >> there are two processes mediating the access? > > While this is certainly possible, is this actually done in real life? we actually DID do this. We did not use fanotify though, since this did not exist then, but dazukoFS a stackable filesystem which worked similar to fanotify. If you want to scan files e.g for viruses you want to have it done as fast as possible and multiple scanner (processes) is what you need then. > > >> You'll get the deadlock again: We have processes A and B mediating access. A >> accesses some file f, A selfapproves the event for itself but the access is >> still blocked on the approval from B. B proceeds to process the access >> request by A. It accesses some file which generates the permission event - >> now B is blocked waiting for A to approve its event. Dang. >> >> This really resembles a situation when e.g. multipathd must be damn carful >> not to generate any IO while it is running lest it deadlocks while >> reconfiguring paths. So here you have to be damn careful not to generate >> any events when processing fanotify permission events... And I know that is >> inconvenient but that's how things were designed and duck taping some >> special cases IMHO is not acceptable. > > The issue here is that any relatively interesting program will have several > libraries who could at any time decide it needs to open a file. Perhaps even > /etc/hosts for network name resolution. You really don't know what the third > party libraries might do and the consequences are pretty severe - deadlock. > > You could make it multi-threaded and have one thread dequeuing approval > requests and another servicing them. But...each request has a live descriptor > that counts towards the rlimit for max open descriptors. Hitting that is bad > also. > > The simplest solution is to assume that the daemon is going to approve its own > request. It would never refuse its own request, should it? > I have thought about this problem long ago when fanotify came out, because dazukoFS actually provided the possibility to set arbitrary processes to an "ignore" list exactly for this reason and I wondered how we would handle this with fanotify. But then I came to the result that all you need to do is make sure that you have one dedicated process which does nothing else than approving (or denying) file accesses. If that process knows the pid of the processes that handle the file accesses (in our case the file scanner processes) it could simply ack all accesses done by them (note that the pid of the file accessing process is part of an fanotify event). For all file accesses done by unknown pids it would have to wait for an ACK from the scanner processes. Sure, that would require quite some interprocess communication between scanner processes and the "approver process" but it should work. Regards, Lino -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index d2f97ec..9b5802c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -105,6 +105,8 @@ static bool fanotify_should_send_event(struct fsnotify_mark *inode_mark, { __u32 marks_mask, marks_ignored_mask; struct path *path = data; + pid_t grp_pid; + struct pid *cur_pid; pr_debug("%s: inode_mark=%p vfsmnt_mark=%p mask=%x data=%p" " data_type=%d\n", __func__, inode_mark, vfsmnt_mark, @@ -139,6 +141,17 @@ static bool fanotify_should_send_event(struct fsnotify_mark *inode_mark, BUG(); } + /* Auto-approve the listening process's own requests if asked to */ + grp_pid = pid_nr(vfsmnt_mark->group->fanotify_data.pid); + if (grp_pid) { + cur_pid = get_pid(task_tgid(current)); + if (grp_pid == pid_nr(cur_pid)) { + put_pid(cur_pid); + return false; + } + put_pid(cur_pid); + } + if (d_is_dir(path->dentry) && !(marks_mask & FS_ISDIR & ~marks_ignored_mask)) return false; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 8e8e6bc..c81cee8 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -387,6 +387,8 @@ static int fanotify_release(struct inode *ignored, struct file *file) */ wake_up(&group->fanotify_data.access_waitq); #endif + /* Get rid of reference held since fanotify_init */ + put_pid(group->fanotify_data.pid); /* matches the fanotify_init->fsnotify_alloc_group */ fsnotify_destroy_group(group); @@ -741,6 +743,11 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) group->fanotify_data.user = user; atomic_inc(&user->fanotify_listeners); + if (flags & FAN_SELF_APPROVE) + group->fanotify_data.pid = get_pid(task_tgid(current)); + else + group->fanotify_data.pid = 0; + oevent = fanotify_alloc_event(NULL, FS_Q_OVERFLOW, NULL); if (unlikely(!oevent)) { fd = -ENOMEM; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 533c440..48938ad 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -16,6 +16,7 @@ #include <linux/spinlock.h> #include <linux/types.h> #include <linux/atomic.h> +#include <linux/pid.h> /* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -184,6 +185,7 @@ struct fsnotify_group { int f_flags; unsigned int max_marks; struct user_struct *user; + struct pid *pid; } fanotify_data; #endif /* CONFIG_FANOTIFY */ }; diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 030508d..5b4ce4e 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -35,6 +35,7 @@ #define FAN_UNLIMITED_QUEUE 0x00000010 #define FAN_UNLIMITED_MARKS 0x00000020 +#define FAN_SELF_APPROVE 0x00000040 /* listener pid auto-approved */ #define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \ FAN_ALL_CLASS_BITS | FAN_UNLIMITED_QUEUE |\