diff mbox

[1/1] fanotify: pre-approve listener's OPEN_PERM access requests

Message ID 56A7FF64.1050301@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Sandeen Jan. 26, 2016, 11:21 p.m. UTC
From: Steve Grubb <sgrubb@redhat.com>

Hello,

If a daemon using FANOTIFY needs to open a file on a watched filesystem and
its wanting OPEN_PERM events, we get deadlock. (This could happen because
of a library the daemon is using suddenly decides it needs to look in a new
file.) Even though the man page says that the daemon should approve its own
access decision, it really can't. If its in the middle of working on a
request and that in turn generates another request, the second request is
going to sit in the queue inside the kernel and not be read because the
daemon is waiting on a library call that will never finish. We also have no
idea how many requests are stacked up before we get to it. So, it really
can't approve its own access requests.

The solution is to assume that the daemon is going to approve its own file
access requests. So, any requested access that matches the pid of the program
receiving fanotify events should be pre-approved in the kernel and not sent
to user space for approval. This should prevent deadlock.

This behavior only exists if FAN_SELF_APPROVE is in the flags at
fanotify_init() time.

[Eric Sandeen: Make behavior contingent on fanotify_init flag]

Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

Resending this; first submission to lkml generated no responses, but offline
Eric Paris indicated that the original patch was "policy in the kernel,"
so I'll see what people think of making it contingent on an fanotify_init
flag at syscall time.



--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jan Kara Jan. 28, 2016, 1:56 p.m. UTC | #1
Hi,

On Tue 26-01-16 17:21:08, Eric Sandeen wrote:
> From: Steve Grubb <sgrubb@redhat.com>
> 
> If a daemon using FANOTIFY needs to open a file on a watched filesystem and
> its wanting OPEN_PERM events, we get deadlock. (This could happen because
> of a library the daemon is using suddenly decides it needs to look in a new
> file.) Even though the man page says that the daemon should approve its own
> access decision, it really can't. If its in the middle of working on a
> request and that in turn generates another request, the second request is
> going to sit in the queue inside the kernel and not be read because the
> daemon is waiting on a library call that will never finish. We also have no
> idea how many requests are stacked up before we get to it. So, it really
> can't approve its own access requests.
> 
> The solution is to assume that the daemon is going to approve its own file
> access requests. So, any requested access that matches the pid of the program
> receiving fanotify events should be pre-approved in the kernel and not sent
> to user space for approval. This should prevent deadlock.
> 
> This behavior only exists if FAN_SELF_APPROVE is in the flags at
> fanotify_init() time.
> 
> [Eric Sandeen: Make behavior contingent on fanotify_init flag]
> 
> Signed-off-by: Steve Grubb <sgrubb@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> Resending this; first submission to lkml generated no responses, but offline
> Eric Paris indicated that the original patch was "policy in the kernel,"
> so I'll see what people think of making it contingent on an fanotify_init
> flag at syscall time.

AKPM is merging fanotify patches these days so it is good to CC him. I'm
keeping eye on the notification infrastructure so you can CC me when you
need an opinion.

Now to the patch: This solution really looks half baked to me. What if
there are two processes mediating the access? You'll get the deadlock
again: We have processes A and B mediating access. A accesses some file f,
A selfapproves the event for itself but the access is still blocked on the
approval from B. B proceeds to process the access request by A. It accesses
some file which generates the permission event - now B is blocked waiting
for A to approve its event. Dang.

This really resembles a situation when e.g. multipathd must be damn carful
not to generate any IO while it is running lest it deadlocks while
reconfiguring paths. So here you have to be damn careful not to generate
any events when processing fanotify permission events... And I know that is
inconvenient but that's how things were designed and duck taping some
special cases IMHO is not acceptable.

One solution which would look reasonably clean to me would be that a
process could have a capability which when set would mean that accesses by
that process would not generate fanotify permission events. This would
really avoid the deadlocks. But then botnet / virus writers would quickly
learn to set this capability for their processes so I'm not sure if this
doesn't somewhat defeat the purpose of permission events. OTOH setting
this capability would be gated by CAP_SYS_ADMIN anyway and once you have
this you have other ways to circumvent any protection so it is not
catastrophical. But still it makes life easier for the bad guys.

								Honza
Steve Grubb March 30, 2016, 6:47 p.m. UTC | #2
On Thursday, January 28, 2016 02:56:11 PM Jan Kara wrote:
> Hi,
> 
> On Tue 26-01-16 17:21:08, Eric Sandeen wrote:
> > From: Steve Grubb <sgrubb@redhat.com>
> > 
> > If a daemon using FANOTIFY needs to open a file on a watched filesystem
> > and
> > its wanting OPEN_PERM events, we get deadlock. (This could happen because
> > of a library the daemon is using suddenly decides it needs to look in a
> > new
> > file.) Even though the man page says that the daemon should approve its
> > own
> > access decision, it really can't. If its in the middle of working on a
> > request and that in turn generates another request, the second request is
> > going to sit in the queue inside the kernel and not be read because the
> > daemon is waiting on a library call that will never finish. We also have
> > no
> > idea how many requests are stacked up before we get to it. So, it really
> > can't approve its own access requests.
> > 
> > The solution is to assume that the daemon is going to approve its own file
> > access requests. So, any requested access that matches the pid of the
> > program receiving fanotify events should be pre-approved in the kernel
> > and not sent to user space for approval. This should prevent deadlock.
> > 
> > This behavior only exists if FAN_SELF_APPROVE is in the flags at
> > fanotify_init() time.
> > 
> > [Eric Sandeen: Make behavior contingent on fanotify_init flag]
> > 
> > Signed-off-by: Steve Grubb <sgrubb@redhat.com>
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > ---
> > 
> > Resending this; first submission to lkml generated no responses, but
> > offline Eric Paris indicated that the original patch was "policy in the
> > kernel," so I'll see what people think of making it contingent on an
> > fanotify_init flag at syscall time.
> 
> AKPM is merging fanotify patches these days so it is good to CC him. I'm
> keeping eye on the notification infrastructure so you can CC me when you
> need an opinion.
> 
> Now to the patch: This solution really looks half baked to me. What if
> there are two processes mediating the access?

While this is certainly possible, is this actually done in real life?


> You'll get the deadlock again: We have processes A and B mediating access. A
> accesses some file f, A selfapproves the event for itself but the access is
> still blocked on the approval from B. B proceeds to process the access
> request by A. It accesses some file which generates the permission event -
> now B is blocked waiting for A to approve its event. Dang.
> 
> This really resembles a situation when e.g. multipathd must be damn carful
> not to generate any IO while it is running lest it deadlocks while
> reconfiguring paths. So here you have to be damn careful not to generate
> any events when processing fanotify permission events... And I know that is
> inconvenient but that's how things were designed and duck taping some
> special cases IMHO is not acceptable.

The issue here is that any relatively interesting program will have several 
libraries who could at any time decide it needs to open a file. Perhaps even 
/etc/hosts for network name resolution. You really don't know what the third 
party libraries might do and the consequences are pretty severe - deadlock.

You could make it multi-threaded and have one thread dequeuing approval 
requests and another servicing them. But...each request has a live descriptor 
that counts towards the rlimit for max open descriptors. Hitting that is bad 
also.

The simplest solution is to assume that the daemon is going to approve its own 
request. It would never refuse its own request, should it?

-Steve

> One solution which would look reasonably clean to me would be that a
> process could have a capability which when set would mean that accesses by
> that process would not generate fanotify permission events. This would
> really avoid the deadlocks. But then botnet / virus writers would quickly
> learn to set this capability for their processes so I'm not sure if this
> doesn't somewhat defeat the purpose of permission events. OTOH setting
> this capability would be gated by CAP_SYS_ADMIN anyway and once you have
> this you have other ways to circumvent any protection so it is not
> catastrophical. But still it makes life easier for the bad guys.
> 
> 								Honza

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara March 31, 2016, 11:17 a.m. UTC | #3
On Wed 30-03-16 14:47:31, Steve Grubb wrote:
> On Thursday, January 28, 2016 02:56:11 PM Jan Kara wrote:
> > On Tue 26-01-16 17:21:08, Eric Sandeen wrote:
> > > From: Steve Grubb <sgrubb@redhat.com>
> > > 
> > > If a daemon using FANOTIFY needs to open a file on a watched filesystem
> > > and
> > > its wanting OPEN_PERM events, we get deadlock. (This could happen because
> > > of a library the daemon is using suddenly decides it needs to look in a
> > > new
> > > file.) Even though the man page says that the daemon should approve its
> > > own
> > > access decision, it really can't. If its in the middle of working on a
> > > request and that in turn generates another request, the second request is
> > > going to sit in the queue inside the kernel and not be read because the
> > > daemon is waiting on a library call that will never finish. We also have
> > > no
> > > idea how many requests are stacked up before we get to it. So, it really
> > > can't approve its own access requests.
> > > 
> > > The solution is to assume that the daemon is going to approve its own file
> > > access requests. So, any requested access that matches the pid of the
> > > program receiving fanotify events should be pre-approved in the kernel
> > > and not sent to user space for approval. This should prevent deadlock.
> > > 
> > > This behavior only exists if FAN_SELF_APPROVE is in the flags at
> > > fanotify_init() time.
> > > 
> > > [Eric Sandeen: Make behavior contingent on fanotify_init flag]
> > > 
> > > Signed-off-by: Steve Grubb <sgrubb@redhat.com>
> > > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > > ---
> > > 
> > > Resending this; first submission to lkml generated no responses, but
> > > offline Eric Paris indicated that the original patch was "policy in the
> > > kernel," so I'll see what people think of making it contingent on an
> > > fanotify_init flag at syscall time.
> > 
> > AKPM is merging fanotify patches these days so it is good to CC him. I'm
> > keeping eye on the notification infrastructure so you can CC me when you
> > need an opinion.
> > 
> > Now to the patch: This solution really looks half baked to me. What if
> > there are two processes mediating the access?
> 
> While this is certainly possible, is this actually done in real life?

Honestly, I don't know. But it doesn't seem like too exotic configuration
to me. Think for example about containers. It just seems as a bad design to
restrict fanotify permission events to: "only one process in the system
can safely use this". That has always proven to be a bad idea in the long
run.

> > You'll get the deadlock again: We have processes A and B mediating access. A
> > accesses some file f, A selfapproves the event for itself but the access is
> > still blocked on the approval from B. B proceeds to process the access
> > request by A. It accesses some file which generates the permission event -
> > now B is blocked waiting for A to approve its event. Dang.
> > 
> > This really resembles a situation when e.g. multipathd must be damn carful
> > not to generate any IO while it is running lest it deadlocks while
> > reconfiguring paths. So here you have to be damn careful not to generate
> > any events when processing fanotify permission events... And I know that is
> > inconvenient but that's how things were designed and duck taping some
> > special cases IMHO is not acceptable.
> 
> The issue here is that any relatively interesting program will have several 
> libraries who could at any time decide it needs to open a file. Perhaps even 
> /etc/hosts for network name resolution. You really don't know what the third 
> party libraries might do and the consequences are pretty severe - deadlock.
> 
> You could make it multi-threaded and have one thread dequeuing approval 
> requests and another servicing them. But...each request has a live descriptor 
> that counts towards the rlimit for max open descriptors. Hitting that is bad 
> also.

I agree this reduces the attractivity of the two thread design.
 
> The simplest solution is to assume that the daemon is going to approve
> its own request. It would never refuse its own request, should it?

I agree this is a solution which fixes your usecase but I don't think it is
good enough for general use. I won't accept anything that doesn't work for
several users of fanotify permission events. As I wrote below in my
original email, you can use for example capabilities to make things work for
several processes without too big hassle...

> > One solution which would look reasonably clean to me would be that a
> > process could have a capability which when set would mean that accesses by
> > that process would not generate fanotify permission events. This would
> > really avoid the deadlocks. But then botnet / virus writers would quickly
> > learn to set this capability for their processes so I'm not sure if this
> > doesn't somewhat defeat the purpose of permission events. OTOH setting
> > this capability would be gated by CAP_SYS_ADMIN anyway and once you have
> > this you have other ways to circumvent any protection so it is not
> > catastrophical. But still it makes life easier for the bad guys.

								Honza
Lino Sanfilippo April 1, 2016, 11:05 p.m. UTC | #4
On 30.03.2016 20:47, Steve Grubb wrote:


Hi,

>> Now to the patch: This solution really looks half baked to me. What if
>> there are two processes mediating the access?
> 
> While this is certainly possible, is this actually done in real life?

we actually DID do this. We did not use fanotify though, since this did
not exist then, but dazukoFS a stackable filesystem which worked similar
to fanotify.
If you want to scan files e.g for viruses you want to have it done as
fast as possible and multiple scanner (processes) is what you need then.

> 
> 
>> You'll get the deadlock again: We have processes A and B mediating access. A
>> accesses some file f, A selfapproves the event for itself but the access is
>> still blocked on the approval from B. B proceeds to process the access
>> request by A. It accesses some file which generates the permission event -
>> now B is blocked waiting for A to approve its event. Dang.
>> 
>> This really resembles a situation when e.g. multipathd must be damn carful
>> not to generate any IO while it is running lest it deadlocks while
>> reconfiguring paths. So here you have to be damn careful not to generate
>> any events when processing fanotify permission events... And I know that is
>> inconvenient but that's how things were designed and duck taping some
>> special cases IMHO is not acceptable.
> 
> The issue here is that any relatively interesting program will have several 
> libraries who could at any time decide it needs to open a file. Perhaps even 
> /etc/hosts for network name resolution. You really don't know what the third 
> party libraries might do and the consequences are pretty severe - deadlock.
> 
> You could make it multi-threaded and have one thread dequeuing approval 
> requests and another servicing them. But...each request has a live descriptor 
> that counts towards the rlimit for max open descriptors. Hitting that is bad 
> also.
> 
> The simplest solution is to assume that the daemon is going to approve its own 
> request. It would never refuse its own request, should it?
> 

I have thought about this problem long ago when fanotify came out,
because dazukoFS actually provided the possibility to set arbitrary
processes to an "ignore" list exactly for this reason and I wondered how
we would handle this with fanotify.

But then I came to the result that all you need to do is make sure that
you have one dedicated process which does nothing else than approving
(or denying) file accesses. If that process knows the pid of the
processes that handle the file accesses (in our case the file scanner
processes) it could simply ack all accesses done by them (note that the
pid of the file accessing process is part of an fanotify event).
For all file accesses done by unknown pids it would have to wait for an
ACK from the scanner processes. Sure, that would require quite some
interprocess communication between scanner processes and the "approver
process" but it should work.

Regards,
Lino


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index d2f97ec..9b5802c 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -105,6 +105,8 @@  static bool fanotify_should_send_event(struct fsnotify_mark *inode_mark,
 {
 	__u32 marks_mask, marks_ignored_mask;
 	struct path *path = data;
+	pid_t grp_pid;
+	struct pid *cur_pid;
 
 	pr_debug("%s: inode_mark=%p vfsmnt_mark=%p mask=%x data=%p"
 		 " data_type=%d\n", __func__, inode_mark, vfsmnt_mark,
@@ -139,6 +141,17 @@  static bool fanotify_should_send_event(struct fsnotify_mark *inode_mark,
 		BUG();
 	}
 
+	/* Auto-approve the listening process's own requests if asked to */
+	grp_pid = pid_nr(vfsmnt_mark->group->fanotify_data.pid);
+	if (grp_pid) {
+		cur_pid = get_pid(task_tgid(current));
+		if (grp_pid == pid_nr(cur_pid)) {
+			put_pid(cur_pid);
+			return false;
+		}
+		put_pid(cur_pid);
+	}
+
 	if (d_is_dir(path->dentry) &&
 	    !(marks_mask & FS_ISDIR & ~marks_ignored_mask))
 		return false;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8e8e6bc..c81cee8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -387,6 +387,8 @@  static int fanotify_release(struct inode *ignored, struct file *file)
 	 */
 	wake_up(&group->fanotify_data.access_waitq);
 #endif
+	/* Get rid of reference held since fanotify_init */
+	put_pid(group->fanotify_data.pid);
 
 	/* matches the fanotify_init->fsnotify_alloc_group */
 	fsnotify_destroy_group(group);
@@ -741,6 +743,11 @@  SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	group->fanotify_data.user = user;
 	atomic_inc(&user->fanotify_listeners);
 
+	if (flags & FAN_SELF_APPROVE)
+		group->fanotify_data.pid = get_pid(task_tgid(current));
+	else
+		group->fanotify_data.pid = 0;
+
 	oevent = fanotify_alloc_event(NULL, FS_Q_OVERFLOW, NULL);
 	if (unlikely(!oevent)) {
 		fd = -ENOMEM;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 533c440..48938ad 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -16,6 +16,7 @@ 
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/atomic.h>
+#include <linux/pid.h>
 
 /*
  * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily
@@ -184,6 +185,7 @@  struct fsnotify_group {
 			int f_flags;
 			unsigned int max_marks;
 			struct user_struct *user;
+			struct pid *pid;
 		} fanotify_data;
 #endif /* CONFIG_FANOTIFY */
 	};
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 030508d..5b4ce4e 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -35,6 +35,7 @@ 
 
 #define FAN_UNLIMITED_QUEUE	0x00000010
 #define FAN_UNLIMITED_MARKS	0x00000020
+#define FAN_SELF_APPROVE	0x00000040	/* listener pid auto-approved */
 
 #define FAN_ALL_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
 				 FAN_ALL_CLASS_BITS | FAN_UNLIMITED_QUEUE |\