fanotify: introduce event flags FAN_EXEC and FAN_EXEC_PERM

Message ID	1531731011.19075.11.camel@mbobrowski.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> Message-ID: <1531731011.19075.11.camel@mbobrowski.org> Subject: [PATCH] fanotify: introduce event flags FAN_EXEC and FAN_EXEC_PERM From: Matthew Bobrowski <mbobrowski@mbobrowski.org> To: jack@suse.cz Cc: amir73il@gmail.com, linux-fsdevel@vger.kernel.org, Steve Grubb <sgrubb@redhat.com> Date: Mon, 16 Jul 2018 18:50:11 +1000 Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk

Matthew Bobrowski July 16, 2018, 8:50 a.m. UTC

Currently, the fanotify API does not provide a means for user space
programs to register and receive events specifically when a file has been
opened with the intent to be executed. Two new event flags FAN_EXEC and
FAN_EXEC_PERM have been introduced to the fanotify API along with updates
to the generic filesystem notification hooks fsnotify_open and
fsnotify_perm in order to support this capability.

Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>

---

The proposed changes have been tested against Linus' mainline source tree
along with testing them against stable kernel releases 4.17.4, 4.17.5 and
4.17.6.

---

in perm check */
+#define FAN_EXEC_PERM		0x00040000	/* File executed in
perm check */
 
 #define FAN_ONDIR		0x40000000	/* event occurred
against dir */
 
@@ -69,13 +71,15 @@
 #define FAN_ALL_EVENTS (FAN_ACCESS |\
 			FAN_MODIFY |\
 			FAN_CLOSE |\
-			FAN_OPEN)
+			FAN_OPEN |\
+			FAN_EXEC)
 
 /*
  * All events which require a permission response from userspace
  */
 #define FAN_ALL_PERM_EVENTS (FAN_OPEN_PERM |\
-			     FAN_ACCESS_PERM)
+			     FAN_ACCESS_PERM |\
+			     FAN_EXEC_PERM)
 
 #define FAN_ALL_OUTGOING_EVENTS	(FAN_ALL_EVENTS |\
 				 FAN_ALL_PERM_EVENTS |\

Marko Rauhamaa July 16, 2018, 9:53 a.m. UTC | #1

Matthew Bobrowski <mbobrowski@mbobrowski.org>:

> Currently, the fanotify API does not provide a means for user space
> programs to register and receive events specifically when a file has
> been opened with the intent to be executed. Two new event flags
> FAN_EXEC and FAN_EXEC_PERM have been introduced to the fanotify API
> along with updates to the generic filesystem notification hooks
> fsnotify_open and fsnotify_perm in order to support this capability.

Does this affect:

  * executable shebang files (files starting with #!)

  * interpreted files:
     - bash ./xyzzy.sh
     - java xyzzy.jar
     - python3 xyzzy.py


Marko

Jan Kara July 16, 2018, 3:26 p.m. UTC | #2

On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> Currently, the fanotify API does not provide a means for user space
> programs to register and receive events specifically when a file has been
> opened with the intent to be executed. Two new event flags FAN_EXEC and
> FAN_EXEC_PERM have been introduced to the fanotify API along with updates
> to the generic filesystem notification hooks fsnotify_open and
> fsnotify_perm in order to support this capability.
> 
> Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>

I miss one important part in this changelog: Why do you need this feature?
Monitoring for read would be enough after all...

								Honza

> 
> ---
> 
> The proposed changes have been tested against Linus' mainline source tree
> along with testing them against stable kernel releases 4.17.4, 4.17.5 and
> 4.17.6.
> 
> ---
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index f90842efea13..4882706e2188 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -197,6 +197,8 @@ static int fanotify_handle_event(struct fsnotify_group
> *group,
>  	BUILD_BUG_ON(FAN_OPEN_PERM != FS_OPEN_PERM);
>  	BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM);
>  	BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
> +	BUILD_BUG_ON(FAN_EXEC != FS_EXEC);
> +	BUILD_BUG_ON(FAN_EXEC_PERM != FS_EXEC_PERM);
>  
>  	if (!fanotify_should_send_event(iter_info, mask, data, data_type))
>  		return 0;
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index f174397b63a0..ef5d3eca2e62 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -393,7 +393,7 @@ static __init int fsnotify_init(void)
>  {
>  	int ret;
>  
> -	BUG_ON(hweight32(ALL_FSNOTIFY_EVENTS) != 23);
> +	BUG_ON(hweight32(ALL_FSNOTIFY_EVENTS) != 25);
>  
>  	ret = init_srcu_struct(&fsnotify_mark_srcu);
>  	if (ret)
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index bdaf22582f6e..db3ee74a7903 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -42,9 +42,12 @@ static inline int fsnotify_perm(struct file *file, int
> mask)
>  		return 0;
>  	if (!(mask & (MAY_READ | MAY_OPEN)))
>  		return 0;
> -	if (mask & MAY_OPEN)
> +	if (mask & MAY_OPEN) {
>  		fsnotify_mask = FS_OPEN_PERM;
> -	else if (mask & MAY_READ)
> +		
> +		if (file->f_flags & FMODE_EXEC)
> +			fsnotify_mask |= FS_EXEC_PERM;
> +	} else if (mask & MAY_READ)
>  		fsnotify_mask = FS_ACCESS_PERM;
>  	else
>  		BUG();
> @@ -220,6 +223,9 @@ static inline void fsnotify_open(struct file *file)
>  	if (S_ISDIR(inode->i_mode))
>  		mask |= FS_ISDIR;
>  
> +	if (file->f_flags & FMODE_EXEC)
> +		mask |= FS_EXEC;
> +
>  	fsnotify_parent(path, NULL, mask);
>  	fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
>  }
> diff --git a/include/linux/fsnotify_backend.h
> b/include/linux/fsnotify_backend.h
> index b38964a7a521..7179a82d60d4 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -38,6 +38,7 @@
>  #define FS_DELETE		0x00000200	/* Subfile was deleted
> */
>  #define FS_DELETE_SELF		0x00000400	/* Self was
> deleted */
>  #define FS_MOVE_SELF		0x00000800	/* Self was moved */
> +#define FS_EXEC			0x00001000	/* File was
> executed */
>  
>  #define FS_UNMOUNT		0x00002000	/* inode on umount fs
> */
>  #define FS_Q_OVERFLOW		0x00004000	/* Event queued
> overflowed */
> @@ -45,6 +46,7 @@
>  
>  #define FS_OPEN_PERM		0x00010000	/* open event in an
> permission hook */
>  #define FS_ACCESS_PERM		0x00020000	/* access event in
> a permissions hook */
> +#define FS_EXEC_PERM		0x00040000	/* exec event in
> permission hook */
>  
>  #define FS_EXCL_UNLINK		0x04000000	/* do not send
> events if object is unlinked */
>  #define FS_ISDIR		0x40000000	/* event occurred
> against dir */
> @@ -62,11 +64,12 @@
>  #define FS_EVENTS_POSS_ON_CHILD   (FS_ACCESS | FS_MODIFY | FS_ATTRIB |\
>  				   FS_CLOSE_WRITE | FS_CLOSE_NOWRITE |
> FS_OPEN |\
>  				   FS_MOVED_FROM | FS_MOVED_TO | FS_CREATE
> |\
> -				   FS_DELETE | FS_OPEN_PERM |
> FS_ACCESS_PERM)
> +				   FS_DELETE | FS_OPEN_PERM |
> FS_ACCESS_PERM |\
> +				   FS_EXEC | FS_EXEC_PERM)
>  
>  #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
>  
> -#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM)
> +#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM |
> FS_EXEC_PERM)
>  
>  #define ALL_FSNOTIFY_EVENTS (FS_ACCESS | FS_MODIFY | FS_ATTRIB | \
>  			     FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | FS_OPEN |
> \
> @@ -75,7 +78,8 @@
>  			     FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED |
> \
>  			     FS_OPEN_PERM | FS_ACCESS_PERM |
> FS_EXCL_UNLINK | \
>  			     FS_ISDIR | FS_IN_ONESHOT | FS_DN_RENAME | \
> -			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD)
> +			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD |\
> +			     FS_EXEC | FS_EXEC_PERM)
>  
>  struct fsnotify_group;
>  struct fsnotify_event;
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 74247917de04..80822af0eeac 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -10,11 +10,13 @@
>  #define FAN_CLOSE_WRITE		0x00000008	/* Writtable file
> closed */
>  #define FAN_CLOSE_NOWRITE	0x00000010	/* Unwrittable file
> closed */
>  #define FAN_OPEN		0x00000020	/* File was opened */
> +#define FAN_EXEC		0x00001000	/* File was executed */
>  
>  #define FAN_Q_OVERFLOW		0x00004000	/* Event queued
> overflowed */
>  
>  #define FAN_OPEN_PERM		0x00010000	/* File open in
> perm check */
>  #define FAN_ACCESS_PERM		0x00020000	/* File accessed
> in perm check */
> +#define FAN_EXEC_PERM		0x00040000	/* File executed in
> perm check */
>  
>  #define FAN_ONDIR		0x40000000	/* event occurred
> against dir */
>  
> @@ -69,13 +71,15 @@
>  #define FAN_ALL_EVENTS (FAN_ACCESS |\
>  			FAN_MODIFY |\
>  			FAN_CLOSE |\
> -			FAN_OPEN)
> +			FAN_OPEN |\
> +			FAN_EXEC)
>  
>  /*
>   * All events which require a permission response from userspace
>   */
>  #define FAN_ALL_PERM_EVENTS (FAN_OPEN_PERM |\
> -			     FAN_ACCESS_PERM)
> +			     FAN_ACCESS_PERM |\
> +			     FAN_EXEC_PERM)
>  
>  #define FAN_ALL_OUTGOING_EVENTS	(FAN_ALL_EVENTS |\
>  				 FAN_ALL_PERM_EVENTS |\

Steve Grubb July 16, 2018, 8:29 p.m. UTC | #3

Hello,

On Monday, July 16, 2018 11:26:53 AM EDT Jan Kara wrote:
> On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> > Currently, the fanotify API does not provide a means for user space
> > programs to register and receive events specifically when a file has been
> > opened with the intent to be executed. Two new event flags FAN_EXEC and
> > FAN_EXEC_PERM have been introduced to the fanotify API along with updates
> > to the generic filesystem notification hooks fsnotify_open and
> > fsnotify_perm in order to support this capability.
> > 
> > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> 
> I miss one important part in this changelog: Why do you need this feature?
> Monitoring for read would be enough after all...

There are several reasons that I can think of. There are lots of file 
accesses. It is possible to guess which one is the beginning of an execve, 
but you really don't know. Apps can be started in so many ways. They can be 
run from the runtime linker, they have LD_PRELOAD, they can have an 
unexpected interpreter, they can be statically linked, they can be a script 
which may present a new pattern of file accesses. With all the variations in 
how programs can start up, it would be nice to have one anchor that 
unambiguously says we are overlaying this pid with a whole new app and forget 
everything you knew about the pid. And the access pattern can be accurately 
watched because we always have a marker to start the sequence.

Hope this helps...

-Steve

> > The proposed changes have been tested against Linus' mainline source tree
> > along with testing them against stable kernel releases 4.17.4, 4.17.5 and
> > 4.17.6.
> > 
> > ---
> > 
> > diff --git a/fs/notify/fanotify/fanotify.c
> > b/fs/notify/fanotify/fanotify.c
> > index f90842efea13..4882706e2188 100644
> > --- a/fs/notify/fanotify/fanotify.c
> > +++ b/fs/notify/fanotify/fanotify.c
> > @@ -197,6 +197,8 @@ static int fanotify_handle_event(struct
> > fsnotify_group
> > *group,
> >  	BUILD_BUG_ON(FAN_OPEN_PERM != FS_OPEN_PERM);
> >  	BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM);
> >  	BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
> > +	BUILD_BUG_ON(FAN_EXEC != FS_EXEC);
> > +	BUILD_BUG_ON(FAN_EXEC_PERM != FS_EXEC_PERM);
> >  
> >  	if (!fanotify_should_send_event(iter_info, mask, data, data_type))
> >  		return 0;
> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index f174397b63a0..ef5d3eca2e62 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -393,7 +393,7 @@ static __init int fsnotify_init(void)
> >  {
> >  	int ret;
> >  
> > -	BUG_ON(hweight32(ALL_FSNOTIFY_EVENTS) != 23);
> > +	BUG_ON(hweight32(ALL_FSNOTIFY_EVENTS) != 25);
> >  
> >  	ret = init_srcu_struct(&fsnotify_mark_srcu);
> >  	if (ret)
> > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> > index bdaf22582f6e..db3ee74a7903 100644
> > --- a/include/linux/fsnotify.h
> > +++ b/include/linux/fsnotify.h
> > @@ -42,9 +42,12 @@ static inline int fsnotify_perm(struct file *file, int
> > mask)
> >  		return 0;
> >  	if (!(mask & (MAY_READ | MAY_OPEN)))
> >  		return 0;
> > -	if (mask & MAY_OPEN)
> > +	if (mask & MAY_OPEN) {
> >  		fsnotify_mask = FS_OPEN_PERM;
> > -	else if (mask & MAY_READ)
> > +
> > +		if (file->f_flags & FMODE_EXEC)
> > +			fsnotify_mask |= FS_EXEC_PERM;
> > +	} else if (mask & MAY_READ)
> >  		fsnotify_mask = FS_ACCESS_PERM;
> >  	else
> >  		BUG();
> > @@ -220,6 +223,9 @@ static inline void fsnotify_open(struct file *file)
> >  	if (S_ISDIR(inode->i_mode))
> >  		mask |= FS_ISDIR;
> >  
> > +	if (file->f_flags & FMODE_EXEC)
> > +		mask |= FS_EXEC;
> > +
> >  	fsnotify_parent(path, NULL, mask);
> >  	fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
> >  }
> > diff --git a/include/linux/fsnotify_backend.h
> > b/include/linux/fsnotify_backend.h
> > index b38964a7a521..7179a82d60d4 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -38,6 +38,7 @@
> >  #define FS_DELETE		0x00000200	/* Subfile was deleted
> > */
> >  #define FS_DELETE_SELF		0x00000400	/* Self was
> > deleted */
> >  #define FS_MOVE_SELF		0x00000800	/* Self was moved */
> > +#define FS_EXEC			0x00001000	/* File was
> > executed */
> >  
> >  #define FS_UNMOUNT		0x00002000	/* inode on umount fs
> > */
> >  #define FS_Q_OVERFLOW		0x00004000	/* Event queued
> > overflowed */
> > @@ -45,6 +46,7 @@
> >  
> >  #define FS_OPEN_PERM		0x00010000	/* open event in an
> > permission hook */
> >  #define FS_ACCESS_PERM		0x00020000	/* access event in
> > a permissions hook */
> > +#define FS_EXEC_PERM		0x00040000	/* exec event in
> > permission hook */
> >  
> >  #define FS_EXCL_UNLINK		0x04000000	/* do not send
> > events if object is unlinked */
> >  #define FS_ISDIR		0x40000000	/* event occurred
> > against dir */
> > @@ -62,11 +64,12 @@
> >  #define FS_EVENTS_POSS_ON_CHILD   (FS_ACCESS | FS_MODIFY | FS_ATTRIB |\
> >  				   FS_CLOSE_WRITE | FS_CLOSE_NOWRITE |
> > FS_OPEN |\
> >  				   FS_MOVED_FROM | FS_MOVED_TO | FS_CREATE
> > 
> > |\
> > 
> > -				   FS_DELETE | FS_OPEN_PERM |
> > FS_ACCESS_PERM)
> > +				   FS_DELETE | FS_OPEN_PERM |
> > FS_ACCESS_PERM |\
> > +				   FS_EXEC | FS_EXEC_PERM)
> >  
> >  #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
> >  
> > -#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM)
> > +#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM |
> > FS_EXEC_PERM)
> >  
> >  #define ALL_FSNOTIFY_EVENTS (FS_ACCESS | FS_MODIFY | FS_ATTRIB | \
> >  			     FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | FS_OPEN |
> > \
> > @@ -75,7 +78,8 @@
> >  			     FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED |
> > \
> >  			     FS_OPEN_PERM | FS_ACCESS_PERM |
> > FS_EXCL_UNLINK | \
> >  			     FS_ISDIR | FS_IN_ONESHOT | FS_DN_RENAME | \
> > -			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD)
> > +			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD |\
> > +			     FS_EXEC | FS_EXEC_PERM)
> >  
> >  struct fsnotify_group;
> >  struct fsnotify_event;
> > diff --git a/include/uapi/linux/fanotify.h
> > b/include/uapi/linux/fanotify.h
> > index 74247917de04..80822af0eeac 100644
> > --- a/include/uapi/linux/fanotify.h
> > +++ b/include/uapi/linux/fanotify.h
> > @@ -10,11 +10,13 @@
> >  #define FAN_CLOSE_WRITE		0x00000008	/* Writtable file
> > closed */
> >  #define FAN_CLOSE_NOWRITE	0x00000010	/* Unwrittable file
> > closed */
> >  #define FAN_OPEN		0x00000020	/* File was opened */
> > +#define FAN_EXEC		0x00001000	/* File was executed */
> >  
> >  #define FAN_Q_OVERFLOW		0x00004000	/* Event queued
> > overflowed */
> >  
> >  #define FAN_OPEN_PERM		0x00010000	/* File open in
> > perm check */
> >  #define FAN_ACCESS_PERM		0x00020000	/* File accessed
> > in perm check */
> > +#define FAN_EXEC_PERM		0x00040000	/* File executed in
> > perm check */
> >  
> >  #define FAN_ONDIR		0x40000000	/* event occurred
> > against dir */
> >  
> > @@ -69,13 +71,15 @@
> >  #define FAN_ALL_EVENTS (FAN_ACCESS |\
> >  			FAN_MODIFY |\
> >  			FAN_CLOSE |\
> > -			FAN_OPEN)
> > +			FAN_OPEN |\
> > +			FAN_EXEC)
> >  
> >  /*
> >   * All events which require a permission response from userspace
> >   */
> >  #define FAN_ALL_PERM_EVENTS (FAN_OPEN_PERM |\
> > -			     FAN_ACCESS_PERM)
> > +			     FAN_ACCESS_PERM |\
> > +			     FAN_EXEC_PERM)
> >  
> >  #define FAN_ALL_OUTGOING_EVENTS	(FAN_ALL_EVENTS |\
> >  				 FAN_ALL_PERM_EVENTS |\

Amir Goldstein July 17, 2018, 12:21 p.m. UTC | #4

On Mon, Jul 16, 2018 at 11:50 AM, Matthew Bobrowski
<mbobrowski@mbobrowski.org> wrote:
> Currently, the fanotify API does not provide a means for user space
> programs to register and receive events specifically when a file has been
> opened with the intent to be executed. Two new event flags FAN_EXEC and
> FAN_EXEC_PERM have been introduced to the fanotify API along with updates
> to the generic filesystem notification hooks fsnotify_open and
> fsnotify_perm in order to support this capability.
>
> Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
>
> ---
>
[...]
> @@ -69,13 +71,15 @@
>  #define FAN_ALL_EVENTS (FAN_ACCESS |\
>                         FAN_MODIFY |\
>                         FAN_CLOSE |\
> -                       FAN_OPEN)
> +                       FAN_OPEN |\
> +                       FAN_EXEC)
>
>  /*
>   * All events which require a permission response from userspace
>   */
>  #define FAN_ALL_PERM_EVENTS (FAN_OPEN_PERM |\
> -                            FAN_ACCESS_PERM)
> +                            FAN_ACCESS_PERM |\
> +                            FAN_EXEC_PERM)
>

If we change these masks that are exposed to user and
there is a user program setting a mark with FAN_ALL_EVENTS,
recompiling that program with new headers will make the binary
incompatible with old kernels.

Jan,

Do you think that is a problem?

Thanks,
Amir.

Jan Kara July 17, 2018, 12:44 p.m. UTC | #5

On Mon 16-07-18 16:29:42, Steve Grubb wrote:
> Hello,
> 
> On Monday, July 16, 2018 11:26:53 AM EDT Jan Kara wrote:
> > On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> > > Currently, the fanotify API does not provide a means for user space
> > > programs to register and receive events specifically when a file has been
> > > opened with the intent to be executed. Two new event flags FAN_EXEC and
> > > FAN_EXEC_PERM have been introduced to the fanotify API along with updates
> > > to the generic filesystem notification hooks fsnotify_open and
> > > fsnotify_perm in order to support this capability.
> > > 
> > > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> > 
> > I miss one important part in this changelog: Why do you need this feature?
> > Monitoring for read would be enough after all...
> 
> There are several reasons that I can think of. There are lots of file 
> accesses. It is possible to guess which one is the beginning of an execve, 
> but you really don't know. Apps can be started in so many ways. They can be 
> run from the runtime linker, they have LD_PRELOAD, they can have an 
> unexpected interpreter, they can be statically linked, they can be a script 
> which may present a new pattern of file accesses. With all the variations in 
> how programs can start up, it would be nice to have one anchor that 
> unambiguously says we are overlaying this pid with a whole new app and forget 
> everything you knew about the pid. And the access pattern can be accurately 
> watched because we always have a marker to start the sequence.

I don't quite buy your "marker that pid is starting from scratch" argument.
Firstly, you'd have to place fanotify watches on all executables in your
system to even have a chance to tell that, secondly, the process does not
quite start a new - it still inherits some stuff from the old process like
open file descriptors... So I understand exec might be interesting for
audit purposes but then you should watch it using audit and not fanotify
which is for handling / mediating filesystem accesses.

And when you are looking at filesystem accesses, then there's no real
difference between exec and read which is exactly why I'm not sure why the
new feature is being added.

								Honza

Jan Kara July 17, 2018, 12:48 p.m. UTC | #6

On Tue 17-07-18 15:21:45, Amir Goldstein wrote:
> On Mon, Jul 16, 2018 at 11:50 AM, Matthew Bobrowski
> <mbobrowski@mbobrowski.org> wrote:
> > Currently, the fanotify API does not provide a means for user space
> > programs to register and receive events specifically when a file has been
> > opened with the intent to be executed. Two new event flags FAN_EXEC and
> > FAN_EXEC_PERM have been introduced to the fanotify API along with updates
> > to the generic filesystem notification hooks fsnotify_open and
> > fsnotify_perm in order to support this capability.
> >
> > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> >
> > ---
> >
> [...]
> > @@ -69,13 +71,15 @@
> >  #define FAN_ALL_EVENTS (FAN_ACCESS |\
> >                         FAN_MODIFY |\
> >                         FAN_CLOSE |\
> > -                       FAN_OPEN)
> > +                       FAN_OPEN |\
> > +                       FAN_EXEC)
> >
> >  /*
> >   * All events which require a permission response from userspace
> >   */
> >  #define FAN_ALL_PERM_EVENTS (FAN_OPEN_PERM |\
> > -                            FAN_ACCESS_PERM)
> > +                            FAN_ACCESS_PERM |\
> > +                            FAN_EXEC_PERM)
> >
> 
> If we change these masks that are exposed to user and
> there is a user program setting a mark with FAN_ALL_EVENTS,
> recompiling that program with new headers will make the binary
> incompatible with old kernels.
> 
> Jan,
> 
> Do you think that is a problem?

Hum, good point. Honestly, I think it has been a mistake to export
FAN_ALL_EVENTS and FAN_ALL_PERM_EVENTS to userspace. Now either the name is
going to be misleading or there's a risk of breaking existing apps as you
suggest. But let's decide that once I'm convinced this feature is actually
worth it.

								Honza

Steve Grubb July 17, 2018, 1:36 p.m. UTC | #7

On Tuesday, July 17, 2018 8:44:23 AM EDT Jan Kara wrote:
> On Mon 16-07-18 16:29:42, Steve Grubb wrote:
> > Hello,
> > 
> > On Monday, July 16, 2018 11:26:53 AM EDT Jan Kara wrote:
> > > On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> > > > Currently, the fanotify API does not provide a means for user space
> > > > programs to register and receive events specifically when a file has
> > > > been
> > > > opened with the intent to be executed. Two new event flags FAN_EXEC
> > > > and
> > > > FAN_EXEC_PERM have been introduced to the fanotify API along with
> > > > updates
> > > > to the generic filesystem notification hooks fsnotify_open and
> > > > fsnotify_perm in order to support this capability.
> > > > 
> > > > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> > > 
> > > I miss one important part in this changelog: Why do you need this
> > > feature?
> > > Monitoring for read would be enough after all...
> > 
> > There are several reasons that I can think of. There are lots of file
> > accesses. It is possible to guess which one is the beginning of an
> > execve, but you really don't know. Apps can be started in so many ways.
> > They can be run from the runtime linker, they have LD_PRELOAD, they can
> > have an unexpected interpreter, they can be statically linked, they can
> > be a script which may present a new pattern of file accesses. With all
> > the variations in how programs can start up, it would be nice to have one
> > anchor that unambiguously says we are overlaying this pid with a whole
> > new app and forget everything you knew about the pid. And the access
> > pattern can be accurately watched because we always have a marker to
> > start the sequence.
> 
> I don't quite buy your "marker that pid is starting from scratch" argument.
> Firstly, you'd have to place fanotify watches on all executables in your
> system to even have a chance to tell that,

True and we do.

> secondly, the process does not quite start a new - it still inherits some
> stuff from the old process like open file descriptors...

True. But that is not exactly relevant to the issues we're looking at.

> So I understand exec might be interesting for audit purposes but then you
> should watch it using audit and not fanotify which is for handling /
> mediating filesystem accesses.

This is not being used for the audit system. But let me illustrate the issue
that I'm looking at. I have the following sequence of events:


pid=13247 exe=/usr/bin/bash file=/home/sgrubb/test/static/test
pid=13250 exe=/usr/bin/bash file=/usr/bin/sed
pid=13250 exe=/usr/bin/bash file=/usr/lib64/ld-2.27.so
pid=13250 exe=/usr/bin/sed file=/etc/ld.so.cache
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libacl.so.1.1.2253
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libselinux.so.1
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libc-2.27.so
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libattr.so.1.1.2448
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpcre2-8.so.0.7.0
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libdl-2.27.so
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpthread-2.27.so
pid=13250 exe=/usr/bin/sed file=/usr/lib/locale/locale-archive
pid=13259 exe=/usr/bin/bash file=/home/sgrubb/test/static/wc
pid=13259 exe=/usr/bin/bash file=/home/sgrubb/test/static/test2

There is a clear pattern for sed. But what was bash doing to test? There is
no pattern. Was it an execution or echo "blah" > test? How can you tell? Then
what happened to test2?  (The first was execution of static app, last was
piping the program to wc -c. They pretty much leave the same footprint but
what is happening is very different.)
 
> And when you are looking at filesystem accesses, then there's no real
> difference between exec and read which is exactly why I'm not sure why the
> new feature is being added.

Its about intention to do something different with the data after the read.
That intention is important and just out of reach. But I otherwise agree that
read and execute do pretty much the same thing.

-Steve

Matthew Bobrowski July 18, 2018, 11:17 a.m. UTC | #8

On Tue, 2018-07-17 at 14:44 +0200, Jan Kara wrote:
> On Mon 16-07-18 16:29:42, Steve Grubb wrote:
> > Hello,
> > 
> > On Monday, July 16, 2018 11:26:53 AM EDT Jan Kara wrote:
> > > On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> > > > Currently, the fanotify API does not provide a means for user space
> > > > programs to register and receive events specifically when a file
> > > > has been
> > > > opened with the intent to be executed. Two new event flags FAN_EXEC
> > > > and
> > > > FAN_EXEC_PERM have been introduced to the fanotify API along with
> > > > updates
> > > > to the generic filesystem notification hooks fsnotify_open and
> > > > fsnotify_perm in order to support this capability.
> > > > 
> > > > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> > > 
> > > I miss one important part in this changelog: Why do you need this
> > > feature?
> > > Monitoring for read would be enough after all...

We're in the midst of writing an application whitelisting program whereby
it's predominantly concerned about file accesses that are a result of an
execution attempt. Without going into semantics, the execution and overall
access control is enforced by an overarching policy that either permits or
denies a file from being executed.

The current implementation makes use of the FAN_PERM_OPEN event flag in
order to determine whether a file within a monitored mark point has been
potentially opened for execution. Although this does somewhat provide the
functionality that we're looking for, it's far from ideal and I personally
think we can do a lot better. If in a position where you're monitoring an
entire filesystem using the FAN_OPEN_PERM event flag you can only imagine
the number of events that applications of this class need to process
unnecessarily due to the significant number of files being opened system
wide. Having the additional FAN_EXEC* event flags will allow user space
applications of such type to explicitly receive and process events that are
 in fact a result of file being opened and read with the intent on being
executed.

Irrespective of our particular use case, I believe that such flags will be
something that other consumers of this API may also find relatively useful.

> > There are several reasons that I can think of. There are lots of file 
> > accesses. It is possible to guess which one is the beginning of an
> > execve, 
> > but you really don't know. Apps can be started in so many ways. They
> > can be 
> > run from the runtime linker, they have LD_PRELOAD, they can have an 
> > unexpected interpreter, they can be statically linked, they can be a
> > script 
> > which may present a new pattern of file accesses. With all the
> > variations in 
> > how programs can start up, it would be nice to have one anchor that 
> > unambiguously says we are overlaying this pid with a whole new app and
> > forget 
> > everything you knew about the pid. And the access pattern can be
> > accurately 
> > watched because we always have a marker to start the sequence.
> 
> I don't quite buy your "marker that pid is starting from scratch"
> argument.
> Firstly, you'd have to place fanotify watches on all executables in your
> system to even have a chance to tell that, secondly, the process does not
> quite start a new - it still inherits some stuff from the old process
> like
> open file descriptors... So I understand exec might be interesting for
> audit purposes but then you should watch it using audit and not fanotify
> which is for handling / mediating filesystem accesses.
> 
> And when you are looking at filesystem accesses, then there's no real
> difference between exec and read which is exactly why I'm not sure why
> the
> new feature is being added.

Although I do completely agree with the fact that there's no real
difference between an exec and read, I do however believe that in the
context of filesystem accesses it's important to distinguish between the
type access i.e. the intent, right? Let me ask you this. Why is that we
have additional flags such as FAN_ACCESS and FAN_CLOSE_WRITE for example
available within the API? Wouldn't FAN_OPEN be sufficient in terms of
understanding whether some type of other access operation is being
performed on the file object that has been marked for monitoring? Sure,
maybe it would, but it's far from being specific and the entire objective
of all the additional flags is to presumably provide a means for a program
to receive filesystem access events that are a result of a specific intent
or action being performed on the object. Take the following for example:

FAN_ACCESS:

The events that come through for this particular flag are a result of a
file having its contents read. Before any contents can be read from it
however, the actual file needs to be opened.

FAN_EXEC:

The events that come through for this particular flag are a result of a
file being executed. Prior to the file being executed though it needs to be
opened, read, and mapped into memory accordingly.

What's the difference? Both perform an open, so wouldn't that just be
enough to tell some type of access is undertaken? Well, no. It's the intent
that needs highlighting here.

You may sit there and claim that part of a file being executed is to have
it opened and read, so why not just use either FAN_ACCESS or FAN_OPEN,
right? Well sure, but both operations open and read are in fact implicit
actions as a result of an actual explicit intent of the file being
executed. Now, how is FAN_ACCESS and any of the other event flags different
from the new flags that I've proposed here? After all, isn't exec just not
another type of file access despite also potentially having other file
actions embedded within it?

Jan Kara July 19, 2018, 9:33 a.m. UTC | #9

On Tue 17-07-18 09:36:05, Steve Grubb wrote:
> On Tuesday, July 17, 2018 8:44:23 AM EDT Jan Kara wrote:
> > On Mon 16-07-18 16:29:42, Steve Grubb wrote:
> > > Hello,
> > > 
> > > On Monday, July 16, 2018 11:26:53 AM EDT Jan Kara wrote:
> > > > On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> > > > > Currently, the fanotify API does not provide a means for user space
> > > > > programs to register and receive events specifically when a file has
> > > > > been
> > > > > opened with the intent to be executed. Two new event flags FAN_EXEC
> > > > > and
> > > > > FAN_EXEC_PERM have been introduced to the fanotify API along with
> > > > > updates
> > > > > to the generic filesystem notification hooks fsnotify_open and
> > > > > fsnotify_perm in order to support this capability.
> > > > > 
> > > > > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> > > > 
> > > > I miss one important part in this changelog: Why do you need this
> > > > feature?
> > > > Monitoring for read would be enough after all...
> > > 
> > > There are several reasons that I can think of. There are lots of file
> > > accesses. It is possible to guess which one is the beginning of an
> > > execve, but you really don't know. Apps can be started in so many ways.
> > > They can be run from the runtime linker, they have LD_PRELOAD, they can
> > > have an unexpected interpreter, they can be statically linked, they can
> > > be a script which may present a new pattern of file accesses. With all
> > > the variations in how programs can start up, it would be nice to have one
> > > anchor that unambiguously says we are overlaying this pid with a whole
> > > new app and forget everything you knew about the pid. And the access
> > > pattern can be accurately watched because we always have a marker to
> > > start the sequence.
> > 
> > I don't quite buy your "marker that pid is starting from scratch" argument.
> > Firstly, you'd have to place fanotify watches on all executables in your
> > system to even have a chance to tell that,
> 
> True and we do.
> 
> > secondly, the process does not quite start a new - it still inherits some
> > stuff from the old process like open file descriptors...
> 
> True. But that is not exactly relevant to the issues we're looking at.

OK.

> > So I understand exec might be interesting for audit purposes but then you
> > should watch it using audit and not fanotify which is for handling /
> > mediating filesystem accesses.
> 
> This is not being used for the audit system. But let me illustrate the issue
> that I'm looking at. I have the following sequence of events:
> 
> 
> pid=13247 exe=/usr/bin/bash file=/home/sgrubb/test/static/test
> pid=13250 exe=/usr/bin/bash file=/usr/bin/sed
> pid=13250 exe=/usr/bin/bash file=/usr/lib64/ld-2.27.so
> pid=13250 exe=/usr/bin/sed file=/etc/ld.so.cache
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libacl.so.1.1.2253
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libselinux.so.1
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libc-2.27.so
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libattr.so.1.1.2448
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpcre2-8.so.0.7.0
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libdl-2.27.so
> pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpthread-2.27.so
> pid=13250 exe=/usr/bin/sed file=/usr/lib/locale/locale-archive
> pid=13259 exe=/usr/bin/bash file=/home/sgrubb/test/static/wc
> pid=13259 exe=/usr/bin/bash file=/home/sgrubb/test/static/test2

These are file accesses I suppose.

> There is a clear pattern for sed. But what was bash doing to test? There is
> no pattern. Was it an execution or echo "blah" > test? How can you tell? Then
> what happened to test2?  (The first was execution of static app, last was
> piping the program to wc -c. They pretty much leave the same footprint but
> what is happening is very different.)

Yes, but my point is you just cannot tell what a process is going to do
with the file. E.g. if you run a script as ". my_script.sh", it will get
executed for all practical purposes but there's no "open for execution"
happening - the file is just read as any other and then interpretted. No
way to discern this from "<my_script.sh" from kernel POV. And similar thing
happens for dynamic libraries which is also executable code.

And if you want to look specifically at "what's getting used for exec(2)?",
then just trace exec(2) system call (either using audit or using syscall trace
points) and you get all the details including file name.

> > And when you are looking at filesystem accesses, then there's no real
> > difference between exec and read which is exactly why I'm not sure why the
> > new feature is being added.
> 
> Its about intention to do something different with the data after the read.
> That intention is important and just out of reach. But I otherwise agree that
> read and execute do pretty much the same thing.

Yes, and my point is the intention is sometimes communicated and sometimes
not. And I don't want to add an API which is vaguely defined... 

								Honza

Jan Kara July 19, 2018, 10:17 a.m. UTC | #10

On Wed 18-07-18 21:17:44, Matthew Bobrowski wrote:
> On Tue, 2018-07-17 at 14:44 +0200, Jan Kara wrote:
> > On Mon 16-07-18 16:29:42, Steve Grubb wrote:
> > > Hello,
> > > 
> > > On Monday, July 16, 2018 11:26:53 AM EDT Jan Kara wrote:
> > > > On Mon 16-07-18 18:50:11, Matthew Bobrowski wrote:
> > > > > Currently, the fanotify API does not provide a means for user space
> > > > > programs to register and receive events specifically when a file
> > > > > has been
> > > > > opened with the intent to be executed. Two new event flags FAN_EXEC
> > > > > and
> > > > > FAN_EXEC_PERM have been introduced to the fanotify API along with
> > > > > updates
> > > > > to the generic filesystem notification hooks fsnotify_open and
> > > > > fsnotify_perm in order to support this capability.
> > > > > 
> > > > > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> > > > 
> > > > I miss one important part in this changelog: Why do you need this
> > > > feature?
> > > > Monitoring for read would be enough after all...
> 
> We're in the midst of writing an application whitelisting program whereby
> it's predominantly concerned about file accesses that are a result of an
> execution attempt. Without going into semantics, the execution and overall
> access control is enforced by an overarching policy that either permits or
> denies a file from being executed.

OK, and I guess my point is that fanotify is not, in my opinion, a suitable
API for this. What you describe above is exactly what LSMs (Linux Security
Modules) are designed to be used for (but admittedly I'm no expert on
these). So you can use e.g. Apparmor to whitelist executables, that can be
executed by your contained program. Or similarly you can use SELinux, tag
executables appropriately and only properly tagged executables will be able
to be executed by your contained program. There's no need to reinvent the
functionality in fanotify + your framework...

> The current implementation makes use of the FAN_PERM_OPEN event flag in
> order to determine whether a file within a monitored mark point has been
> potentially opened for execution. Although this does somewhat provide the
> functionality that we're looking for, it's far from ideal and I personally
> think we can do a lot better. If in a position where you're monitoring an
> entire filesystem using the FAN_OPEN_PERM event flag you can only imagine
> the number of events that applications of this class need to process
> unnecessarily due to the significant number of files being opened system
> wide. Having the additional FAN_EXEC* event flags will allow user space
> applications of such type to explicitly receive and process events that are
>  in fact a result of file being opened and read with the intent on being
> executed.
> 
> Irrespective of our particular use case, I believe that such flags will be
> something that other consumers of this API may also find relatively useful.

Another problem I have with the flag is it is not reliable - e.g. shared
libraries get executed as well but are not open with FMODE_EXEC. Shell
scripts executed as ". script.sh" as well. So the semantics would be weakly
defined in my opinion.

> > > There are several reasons that I can think of. There are lots of file 
> > > accesses. It is possible to guess which one is the beginning of an
> > > execve, 
> > > but you really don't know. Apps can be started in so many ways. They
> > > can be 
> > > run from the runtime linker, they have LD_PRELOAD, they can have an 
> > > unexpected interpreter, they can be statically linked, they can be a
> > > script 
> > > which may present a new pattern of file accesses. With all the
> > > variations in 
> > > how programs can start up, it would be nice to have one anchor that 
> > > unambiguously says we are overlaying this pid with a whole new app and
> > > forget 
> > > everything you knew about the pid. And the access pattern can be
> > > accurately 
> > > watched because we always have a marker to start the sequence.
> > 
> > I don't quite buy your "marker that pid is starting from scratch"
> > argument.
> > Firstly, you'd have to place fanotify watches on all executables in your
> > system to even have a chance to tell that, secondly, the process does not
> > quite start a new - it still inherits some stuff from the old process
> > like
> > open file descriptors... So I understand exec might be interesting for
> > audit purposes but then you should watch it using audit and not fanotify
> > which is for handling / mediating filesystem accesses.
> > 
> > And when you are looking at filesystem accesses, then there's no real
> > difference between exec and read which is exactly why I'm not sure why
> > the
> > new feature is being added.
> 
> Although I do completely agree with the fact that there's no real
> difference between an exec and read, I do however believe that in the
> context of filesystem accesses it's important to distinguish between the
> type access i.e. the intent, right? Let me ask you this. Why is that we
> have additional flags such as FAN_ACCESS and FAN_CLOSE_WRITE for example
> available within the API? Wouldn't FAN_OPEN be sufficient in terms of
> understanding whether some type of other access operation is being
> performed on the file object that has been marked for monitoring? Sure,
> maybe it would, but it's far from being specific and the entire objective
> of all the additional flags is to presumably provide a means for a program
> to receive filesystem access events that are a result of a specific intent
> or action being performed on the object. Take the following for example:
> 
> FAN_ACCESS:
> 
> The events that come through for this particular flag are a result of a
> file having its contents read. Before any contents can be read from it
> however, the actual file needs to be opened.
> 
> FAN_EXEC:
> 
> The events that come through for this particular flag are a result of a
> file being executed. Prior to the file being executed though it needs to be
> opened, read, and mapped into memory accordingly.
> 
> What's the difference? Both perform an open, so wouldn't that just be
> enough to tell some type of access is undertaken? Well, no. It's the intent
> that needs highlighting here.
> 
> You may sit there and claim that part of a file being executed is to have
> it opened and read, so why not just use either FAN_ACCESS or FAN_OPEN,
> right? Well sure, but both operations open and read are in fact implicit
> actions as a result of an actual explicit intent of the file being
> executed. Now, how is FAN_ACCESS and any of the other event flags different
> from the new flags that I've proposed here? After all, isn't exec just not
> another type of file access despite also potentially having other file
> actions embedded within it?

So fanotify is a filesystem event notification API. For filesystem, open
and read are fundamentally different events and as such we have different
FAN_OPEN and FAN_ACCESS events in the API. The only disputable events we
have in the API are FAN_CLOSE_WRITE vs FAN_CLOSE_NOWRITE - from fs POV
there's no big difference. But at least this is 100% reliably (unlike
FMODE_EXEC) telling you whether the user was able to modify the file or not
and it caters to one of the use cases this API has been created for - virus
scanners, file caching daemons, ... - i.e., triggering specific actions
based on file contents.

What you are implementing seems to me more like access mediation based on
process (or its capabilities) which is really more a task for LSM.

								Honza

Steve Grubb July 19, 2018, 12:39 p.m. UTC | #11

Hello Jan,

On Thursday, July 19, 2018 5:33:07 AM EDT Jan Kara wrote:
> > This is not being used for the audit system. But let me illustrate the
> > issue that I'm looking at. I have the following sequence of events:
> > 
> > pid=13247 exe=/usr/bin/bash file=/home/sgrubb/test/static/test
> > pid=13250 exe=/usr/bin/bash file=/usr/bin/sed
> > pid=13250 exe=/usr/bin/bash file=/usr/lib64/ld-2.27.so
> > pid=13250 exe=/usr/bin/sed file=/etc/ld.so.cache
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libacl.so.1.1.2253
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libselinux.so.1
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libc-2.27.so
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libattr.so.1.1.2448
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpcre2-8.so.0.7.0
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libdl-2.27.so
> > pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpthread-2.27.so
> > pid=13250 exe=/usr/bin/sed file=/usr/lib/locale/locale-archive
> > pid=13259 exe=/usr/bin/bash file=/home/sgrubb/test/static/wc
> > pid=13259 exe=/usr/bin/bash file=/home/sgrubb/test/static/test2
> 
> These are file accesses I suppose.
> 
> > There is a clear pattern for sed. But what was bash doing to test? There
> > is no pattern. Was it an execution or echo "blah" > test? How can you
> > tell? Then what happened to test2?  (The first was execution of static
> > app, last was piping the program to wc -c. They pretty much leave the
> > same footprint but what is happening is very different.)
> 
> Yes, but my point is you just cannot tell what a process is going to do
> with the file. E.g. if you run a script as ". my_script.sh", it will get
> executed for all practical purposes but there's no "open for execution"

That is true. But a shell script eventually causes a real execution of 
something. The main problem is when it comes to statically linked programs. 
You get one view of the program and its gone. Unlike dynamically linked 
programs where there is a chain of things that happen. But going back to the 
script example, the binary interpretting the script has not changed so we can 
continue to maintain information about that pid.

> happening - the file is just read as any other and then interpretted. No
> way to discern this from "<my_script.sh" from kernel POV. And similar thing
> happens for dynamic libraries which is also executable code.

Any special handling of shared objects is not needed. The important thing is 
twofold. The first knowing that any information about that pid is now being 
overlaid and therefore we have to start over in determining trust. But also 
if the execution is a statically linked program, we absolutely need to know 
the intent. It is impossible to tell a benign read or write from an actual 
execution given what we know via the fanotify interface.

> And if you want to look specifically at "what's getting used for exec(2)?",
> then just trace exec(2) system call (either using audit or using syscall
> trace points) and you get all the details including file name.

Wouldn't that hurt overall system performance?

> > > And when you are looking at filesystem accesses, then there's no real
> > > difference between exec and read which is exactly why I'm not sure why
> > > the new feature is being added.
> > 
> > Its about intention to do something different with the data after the
> > read. That intention is important and just out of reach. But I otherwise
> > agree that read and execute do pretty much the same thing.
> 
> Yes, and my point is the intention is sometimes communicated and sometimes
> not. And I don't want to add an API which is vaguely defined...

To me, its pretty crisp. For everything except statically linked programs, 
there are patterns of access that tell you what is going on. But sometimes 
you need that marker to know this is the beginning of a sequence and start 
over and build up new information. And in the case of statically linked 
program, there is no pattern of access. It's one shot and its over. It's very 
important to identify the intent for this item. 

And I'd say, impossible to make decisions that allow certain system activity 
like creation of a file or packaging of a file vs stopping the execution of 
something unknown. More information is needed so that the system is tolerant 
of user activity and not a PITA because it has to be super paranoid. For 
example, system update. All these files are unknown but they are being 
written and not executed. This activity is OK. But once the update is 
complete, the extra information let's us know when to start paying attention 
again.

Cheers,
-Steve

Steve Grubb July 19, 2018, 1:06 p.m. UTC | #12

Hello Jan,

On Thursday, July 19, 2018 5:33:07 AM EDT Jan Kara wrote:
> Yes, and my point is the intention is sometimes communicated and sometimes
> not. And I don't want to add an API which is vaguely defined...

Maybe this is better illustrated with actual data. For an application 
whitelisting daemon to be usable, it has to cache information and look that 
up rather than make expensive syscalls in order to make a decision. Every 
syscall it makes is delaying system access.

So, how can we tell when its time to evict cached information and start over? 
We are given 2 pieces of information in a fanotify event. The fd and the pid. 
What can be done to find an execve is to stat("/proc/pid") and look at the 
inode create time. Suppose we have this event stream:

pid=13250 exe=/usr/bin/bash file=/usr/bin/sed  <- execve occurs here
pid=13250 exe=/usr/bin/bash file=/usr/lib64/ld-2.27.so
pid=13250 exe=/usr/bin/sed file=/etc/ld.so.cache  <- /proc/pid updated here
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libacl.so.1.1.2253
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libselinux.so.1
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libc-2.27.so
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libattr.so.1.1.2448
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpcre2-8.so.0.7.0
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libdl-2.27.so
pid=13250 exe=/usr/bin/sed file=/usr/lib64/libpthread-2.27.so
pid=13250 exe=/usr/bin/sed file=/usr/lib/locale/locale-archive

In this example, we can determine that execve occurs, but we are 2 accesses 
late in knowing this. We can keep a ring buffer to have this history for 
pattern analysis but its not ideal. Contrast that with a statically linked 
program:

pid=13247 exe=/usr/bin/bash file=/home/sgrubb/static/test <- execeve here
..

We cannot tell that an execution occurs because by the time /proc/pid updates 
its creation time, execution has already happened and we don't see that 
executable again until it accesses a file and then we can notice its a new 
program. By then its too late.

I suppose we could use tracing as you suggested. But then we need to 
correlate the 2 independent event streams in addition possibly adding 
additional system overhead. And considering there is an event queue inside 
the daemon, finding the right fanotify event in the queue to add this 
metadata to will add complexity and may be prone to error due to racing. A 
simpler solution is to just tag the access as originating from execve and 
it's automatically correlated with file access and the cost is perhaps a 
couple if statements.

Hope this helps...

-Steve

Marko Rauhamaa July 19, 2018, 2:18 p.m. UTC | #13

Jan Kara <jack@suse.cz>:
> So fanotify is a filesystem event notification API. For filesystem,
> open and read are fundamentally different events and as such we have
> different FAN_OPEN and FAN_ACCESS events in the API. The only
> disputable events we have in the API are FAN_CLOSE_WRITE vs
> FAN_CLOSE_NOWRITE - from fs POV there's no big difference. But at
> least this is 100% reliably (unlike FMODE_EXEC) telling you whether
> the user was able to modify the file or not and it caters to one of
> the use cases this API has been created for - virus scanners, file
> caching daemons, ... - i.e., triggering specific actions based on file
> contents.

As a side note from the virus scanner point of view,
FAN_CLOSE_WRITE_PERM would be really useful because it would prevent the
hit-and-run corruption of a file. As it stands, fanotify communicates
the pid of the culprit but the process is long gone by the time you get
to analyze it...


Marko

Steve Grubb July 19, 2018, 2:59 p.m. UTC | #14

On Thursday, July 19, 2018 6:17:08 AM EDT Jan Kara wrote:
> So fanotify is a filesystem event notification API. For filesystem, open
> and read are fundamentally different events and as such we have different
> FAN_OPEN and FAN_ACCESS events in the API. The only disputable events we
> have in the API are FAN_CLOSE_WRITE vs FAN_CLOSE_NOWRITE - from fs POV
> there's no big difference. But at least this is 100% reliably (unlike
> FMODE_EXEC) telling you whether the user was able to modify the file or not
> and it caters to one of the use cases this API has been created for -
> virus scanners, file caching daemons, ... - i.e., triggering specific
> actions based on file contents.

Would it be more acceptable to not add FAN_EXEC_PERM on the front end where 
you ask for it at fanotify_mark. But rather add only FAN_EXEC? This would 
reduce the proposed API and just turn it into additional metadata about 
events that are already being requested. This ways you can do something like:

mask = FAN_OPEN_PERM | FAN_EXEC;

and then pass that to fanotify_mark. It would not affect old programs because 
they simply wouldn't ask for the bit. Would this be more palatable?

Best Regards,
-Steve

fanotify: introduce event flags FAN_EXEC and FAN_EXEC_PERM

Commit Message

Comments

Patch