[RFC,5/9] Define user structure for events and responses.

Message ID	cde6bbf0b52710b33170f2787fdcb11538e40813.1741047969.git.m@maowtm.org (mailing list archive)
State	Handled Elsewhere
Headers	show Received: from flow-b4-smtp.messagingengine.com (flow-b4-smtp.messagingengine.com [202.12.124.139]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E81E33D8; Tue, 4 Mar 2025 01:20:04 +0000 (UTC) Feedback-ID: i580e4893:Fastmail From: Tingmao Wang <m@maowtm.org> To: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= <mic@digikod.net>, =?utf-8?q?G?= =?utf-8?q?=C3=BCnther_Noack?= <gnoack@google.com>, Jan Kara <jack@suse.cz> Cc: Tingmao Wang <m@maowtm.org>, linux-security-module@vger.kernel.org, Amir Goldstein <amir73il@gmail.com>, Matthew Bobrowski <repnop@google.com>, linux-fsdevel@vger.kernel.org, Tycho Andersen <tycho@tycho.pizza> Subject: [RFC PATCH 5/9] Define user structure for events and responses. Date: Tue, 4 Mar 2025 01:13:01 +0000 Message-ID: <cde6bbf0b52710b33170f2787fdcb11538e40813.1741047969.git.m@maowtm.org> In-Reply-To: <cover.1741047969.git.m@maowtm.org> References: <cover.1741047969.git.m@maowtm.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Landlock supervise: a mechanism for interactive permission requests \| expand [RFC,0/9] Landlock supervise: a mechanism for interactive permission requests [RFC,1/9] Define the supervisor and event structure [RFC,2/9] Refactor per-layer information in rulesets and rules [RFC,3/9] Adds a supervisor reference in the per-layer information [RFC,4/9] User-space API for creating a supervisor-fd [RFC,5/9] Define user structure for events and responses. [RFC,6/9] Creating supervisor events for filesystem operations [RFC,7/9] Implement fdinfo for ruleset and supervisor fd [RFC,8/9] Implement fops for supervisor-fd [RFC,9/9] Enhance the sandboxer example to support landlock-supervise

Tingmao Wang March 4, 2025, 1:13 a.m. UTC

The two structures are designed to be passed via read and write
to the supervisor-fd.  Compile time check for no holes are added
to build_check_abi.

The event structure will be a dynamically sized structure with
possibly a NULL-terminating filename at the end.  This is so that
we can pass a raw filename to the supervisor for file creation
requests, without having the trouble of not being able to open a
fd to a file that has not been created.

NOTE: despite this patch having a new uapi, I'm still very open to e.g.
re-using fanotify stuff instead (if that makes sense in the end). This is
just a PoC.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 include/uapi/linux/landlock.h | 107 ++++++++++++++++++++++++++++++++++
 security/landlock/syscalls.c  |  28 +++++++++
 2 files changed, 135 insertions(+)

Mickaël Salaün March 4, 2025, 7:49 p.m. UTC | #1

On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> The two structures are designed to be passed via read and write
> to the supervisor-fd.  Compile time check for no holes are added
> to build_check_abi.
> 
> The event structure will be a dynamically sized structure with
> possibly a NULL-terminating filename at the end.  This is so that
> we can pass a raw filename to the supervisor for file creation
> requests, without having the trouble of not being able to open a
> fd to a file that has not been created.
> 
> NOTE: despite this patch having a new uapi, I'm still very open to e.g.
> re-using fanotify stuff instead (if that makes sense in the end). This is
> just a PoC.
> 
> Signed-off-by: Tingmao Wang <m@maowtm.org>
> ---
>  include/uapi/linux/landlock.h | 107 ++++++++++++++++++++++++++++++++++
>  security/landlock/syscalls.c  |  28 +++++++++
>  2 files changed, 135 insertions(+)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 7bc1eb4859fb..b5645fdd998d 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -318,4 +318,111 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_SCOPE_SIGNAL		                (1ULL << 1)
>  /* clang-format on*/
>  
> +/**
> + * DOC: supervisor
> + *
> + * Supervise mode
> + * ~~~~~~~~~~~~~~
> + *
> + * TODO
> + */
> +
> +typedef __u16 landlock_supervise_event_type_t;
> +/* clang-format off */
> +#define LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS         1
> +#define LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS        2
> +/* clang-format on */
> +
> +struct landlock_supervise_event_hdr {
> +	/**
> +	 * @type: Type of the event.
> +	 */
> +	landlock_supervise_event_type_t type;
> +	/**
> +	 * @length: Length of the entire struct
> +	 * landlock_supervise_event including this header.
> +	 */
> +	__u16 length;
> +	/**
> +	 * @cookie: Opaque identifier to be included in the response.
> +	 */
> +	__u32 cookie;

I guess we could use a __u64 index counter per layer instead.  That
would also help to order requests if they are treated by different
supervisor threads.

> +};
> +
> +struct landlock_supervise_event {
> +	struct landlock_supervise_event_hdr hdr;
> +	__u64 access_request;
> +	__kernel_pid_t accessor;
> +	union {
> +		struct {
> +			/**
> +			 * @fd1: An open file descriptor for the file (open,
> +			 * delete, execute, link, readdir, rename, truncate),
> +			 * or the parent directory (for create operations
> +			 * targeting its child) being accessed.  Must be
> +			 * closed by the reader.
> +			 *
> +			 * If this points to a parent directory, @destname
> +			 * will contain the target filename. If @destname is
> +			 * empty, this points to the target file.
> +			 */
> +			int fd1;
> +			/**
> +			 * @fd2: For link or rename requests, a second file
> +			 * descriptor for the target parent directory.  Must
> +			 * be closed by the reader.  @destname contains the
> +			 * destination filename.  This field is -1 if not
> +			 * used.
> +			 */
> +			int fd2;

Can we just use one FD but identify the requested access instead and
send one event for each, like for the audit patch series?

> +			/**
> +			 * @destname: A filename for a file creation target.
> +			 *
> +			 * If either of fd1 or fd2 points to a parent
> +			 * directory rather than the target file, this is the
> +			 * NULL-terminated name of the file that will be
> +			 * newly created.
> +			 *
> +			 * Counting the NULL terminator, this field will
> +			 * contain one or more NULL padding at the end so
> +			 * that the length of the whole struct
> +			 * landlock_supervise_event is a multiple of 8 bytes.
> +			 *
> +			 * This is a variable length member, and the length
> +			 * including the terminating NULL(s) can be derived
> +			 * from hdr.length - offsetof(struct
> +			 * landlock_supervise_event, destname).
> +			 */
> +			char destname[];

I'd prefer to avoid sending file names for now.  I don't think it's
necessary, and that could encourage supervisors to filter access
according to names.

> +		};
> +		struct {
> +			__u16 port;
> +		};
> +	};
> +};
> +

[...]

Tingmao Wang March 6, 2025, 3:05 a.m. UTC | #2

On 3/4/25 19:49, Mickaël Salaün wrote:
> On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
[...]
>> +	/**
>> +	 * @cookie: Opaque identifier to be included in the response.
>> +	 */
>> +	__u32 cookie;
> 
> I guess we could use a __u64 index counter per layer instead.  That
> would also help to order requests if they are treated by different
> supervisor threads.

I don't immediately see a use for ordering requests (if we get more than 
one event at once, they are coming from different threads anyway so 
there can't be any dependencies between them, and the supervisor threads 
can use timestamps), but I think making it a __u64 is probably a good 
idea regardless, as it means we don't have to do some sort of ID 
allocation, and can just increment an atomic.

>> +};
>> +
>> +struct landlock_supervise_event {
>> +	struct landlock_supervise_event_hdr hdr;
>> +	__u64 access_request;
>> +	__kernel_pid_t accessor;
>> +	union {
>> +		struct {
>> +			/**
>> +			 * @fd1: An open file descriptor for the file (open,
>> +			 * delete, execute, link, readdir, rename, truncate),
>> +			 * or the parent directory (for create operations
>> +			 * targeting its child) being accessed.  Must be
>> +			 * closed by the reader.
>> +			 *
>> +			 * If this points to a parent directory, @destname
>> +			 * will contain the target filename. If @destname is
>> +			 * empty, this points to the target file.
>> +			 */
>> +			int fd1;
>> +			/**
>> +			 * @fd2: For link or rename requests, a second file
>> +			 * descriptor for the target parent directory.  Must
>> +			 * be closed by the reader.  @destname contains the
>> +			 * destination filename.  This field is -1 if not
>> +			 * used.
>> +			 */
>> +			int fd2;
> 
> Can we just use one FD but identify the requested access instead and
> send one event for each, like for the audit patch series?

I haven't managed to read or test out the audit patch yet (I will do), 
but I think having the ability to specifically tell whether the child is 
trying to move / rename / create a hard link of an existing file, and 
what it's trying to use as destination, might be useful (either for 
security, or purely for UX)?

For example, imagine something trying to link or move ~/.ssh/id_ecdsa to 
/tmp/innocent-tmp-file then read the latter. The supervisor can warn the 
user on the initial link attempt, and the shenanigan will probably be 
stopped there (although still, being able to say "[program] wants to 
link ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just 
"[program] wants to create a link for ~/.ssh/id_ecdsa"), but even if 
somehow this ends up allowed, later on for the read request it could say 
something like

	[program] wants to read /tmp/innocent-tmp-file
	    (previously moved from ~/.ssh/id_ecdsa)

Maybe this is a bit silly, but there might be other use cases for 
knowing the exact details of a rename/link request, either for 
at-the-time decision making, or tracking stuff for future requests?

I will try out the audit patch to see how things like these appears in 
the log before commenting further on this. Maybe there is a way to 
achieve this while still simplifying the event structure?

> 
>> +			/**
>> +			 * @destname: A filename for a file creation target.
>> +			 *
>> +			 * If either of fd1 or fd2 points to a parent
>> +			 * directory rather than the target file, this is the
>> +			 * NULL-terminated name of the file that will be
>> +			 * newly created.
>> +			 *
>> +			 * Counting the NULL terminator, this field will
>> +			 * contain one or more NULL padding at the end so
>> +			 * that the length of the whole struct
>> +			 * landlock_supervise_event is a multiple of 8 bytes.
>> +			 *
>> +			 * This is a variable length member, and the length
>> +			 * including the terminating NULL(s) can be derived
>> +			 * from hdr.length - offsetof(struct
>> +			 * landlock_supervise_event, destname).
>> +			 */
>> +			char destname[];
> 
> I'd prefer to avoid sending file names for now.  I don't think it's
> necessary, and that could encourage supervisors to filter access
> according to names.
>

This is also motivated by the potential UX I'm thinking of. For example, 
if a newly installed application tries to create ~/.app-name, it will be 
much more reassuring and convenient to the user if we can show something 
like

	[program] wants to mkdir ~/.app-name. Allow this and future
	access to the new directory?

rather than just "[program] wants to mkdir under ~". (The "Allow this 
and future access to the new directory" bit is made possible by the 
supervisor knowing the name of the file/directory being created, and can 
remember them / write them out to a persistent profile etc)

Note that this is just the filename under the dir represented by fd - 
this isn't a path or anything that can be subject to symlink-related 
attacks, etc.  If a program calls e.g.
mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
my understanding is that fd1 will point to /some/dir, and destname would 
be "stuff"

Actually, in case your question is "why not send a fd to represent the 
newly created file, instead of sending the name" -- I'm not sure whether 
you can open even an O_PATH fd to a non-existent file.

>> +		};
>> +		struct {
>> +			__u16 port;
>> +		};
>> +	};
>> +};
>> +
> 
> [...]

Mickaël Salaün March 8, 2025, 7:07 p.m. UTC | #3

On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:49, Mickaël Salaün wrote:
> > On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> [...]
> > > +	/**
> > > +	 * @cookie: Opaque identifier to be included in the response.
> > > +	 */
> > > +	__u32 cookie;
> > 
> > I guess we could use a __u64 index counter per layer instead.  That
> > would also help to order requests if they are treated by different
> > supervisor threads.
> 
> I don't immediately see a use for ordering requests (if we get more than one
> event at once, they are coming from different threads anyway so there can't
> be any dependencies between them, and the supervisor threads can use
> timestamps), but I think making it a __u64 is probably a good idea
> regardless, as it means we don't have to do some sort of ID allocation, and
> can just increment an atomic.

Indeed, we should follow the seccomp unotify approach with a random u64
incremented per request.

> 
> > > +};
> > > +
> > > +struct landlock_supervise_event {
> > > +	struct landlock_supervise_event_hdr hdr;
> > > +	__u64 access_request;
> > > +	__kernel_pid_t accessor;
> > > +	union {
> > > +		struct {
> > > +			/**
> > > +			 * @fd1: An open file descriptor for the file (open,
> > > +			 * delete, execute, link, readdir, rename, truncate),
> > > +			 * or the parent directory (for create operations
> > > +			 * targeting its child) being accessed.  Must be
> > > +			 * closed by the reader.
> > > +			 *
> > > +			 * If this points to a parent directory, @destname
> > > +			 * will contain the target filename. If @destname is
> > > +			 * empty, this points to the target file.
> > > +			 */
> > > +			int fd1;
> > > +			/**
> > > +			 * @fd2: For link or rename requests, a second file
> > > +			 * descriptor for the target parent directory.  Must
> > > +			 * be closed by the reader.  @destname contains the
> > > +			 * destination filename.  This field is -1 if not
> > > +			 * used.
> > > +			 */
> > > +			int fd2;
> > 
> > Can we just use one FD but identify the requested access instead and
> > send one event for each, like for the audit patch series?
> 
> I haven't managed to read or test out the audit patch yet (I will do), but I
> think having the ability to specifically tell whether the child is trying to
> move / rename / create a hard link of an existing file, and what it's trying
> to use as destination, might be useful (either for security, or purely for
> UX)?
> 
> For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
> /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
> user on the initial link attempt, and the shenanigan will probably be
> stopped there (although still, being able to say "[program] wants to link
> ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
> wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
> up allowed, later on for the read request it could say something like
> 
> 	[program] wants to read /tmp/innocent-tmp-file
> 	    (previously moved from ~/.ssh/id_ecdsa)
> 
> Maybe this is a bit silly, but there might be other use cases for knowing
> the exact details of a rename/link request, either for at-the-time decision
> making, or tracking stuff for future requests?

This pattern looks like datagram packets.  I think we should use the
netlink attributes.  There were concern about using a netlink socket for
the seccomp unotification though:
https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/

There are two main differences with seccomp unotify:
- the supervisor should be able to receive arbitrary-sized data (e.g.
  file name, not path);
- the supervisor should be able to receive file descriptors (instead of
  path).

Sockets are created with socket(2) whereas in our case we should only
get a supervisor FD (indirectly) through landlock_restrict_self(2),
which clearly identifies a kernel object.  Another issue would be to
deal with network namespaces, probably by creating a private one.
Sockets are powerful but we don't needs all the routing complexity.
Moreover, we should only need a blocking communication channel to avoid
issues managing in-flight object references (transformed to FDs when
received).  That makes me think that a socket might not be the right
construct, but we can still rely on the NLA macros to define a proper
protocol with dynamically-sized events, received and send with dedicated
IOCTL commands.

Netlink already provides a way to send a cookie, and
netlink_attribute_type defines the types we'll need, including string.

For instance, a link request/event could include 3 packets, one for each
of these properties:
1. the source file FD;
2. the destination directory FD;
3. the destination filename string.

This way we would avoid the union defined in this patch.

There is still the question about receiving FDs though. It would be nice
to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
sure how this could be properly handled wrt NLA.

> 
> I will try out the audit patch to see how things like these appears in the
> log before commenting further on this. Maybe there is a way to achieve this
> while still simplifying the event structure?
> 
> > 
> > > +			/**
> > > +			 * @destname: A filename for a file creation target.
> > > +			 *
> > > +			 * If either of fd1 or fd2 points to a parent
> > > +			 * directory rather than the target file, this is the
> > > +			 * NULL-terminated name of the file that will be
> > > +			 * newly created.
> > > +			 *
> > > +			 * Counting the NULL terminator, this field will
> > > +			 * contain one or more NULL padding at the end so
> > > +			 * that the length of the whole struct
> > > +			 * landlock_supervise_event is a multiple of 8 bytes.
> > > +			 *
> > > +			 * This is a variable length member, and the length
> > > +			 * including the terminating NULL(s) can be derived
> > > +			 * from hdr.length - offsetof(struct
> > > +			 * landlock_supervise_event, destname).
> > > +			 */
> > > +			char destname[];
> > 
> > I'd prefer to avoid sending file names for now.  I don't think it's
> > necessary, and that could encourage supervisors to filter access
> > according to names.
> > 
> 
> This is also motivated by the potential UX I'm thinking of. For example, if
> a newly installed application tries to create ~/.app-name, it will be much
> more reassuring and convenient to the user if we can show something like
> 
> 	[program] wants to mkdir ~/.app-name. Allow this and future
> 	access to the new directory?
> 
> rather than just "[program] wants to mkdir under ~". (The "Allow this and
> future access to the new directory" bit is made possible by the supervisor
> knowing the name of the file/directory being created, and can remember them
> / write them out to a persistent profile etc)
> 
> Note that this is just the filename under the dir represented by fd - this
> isn't a path or anything that can be subject to symlink-related attacks,
> etc.  If a program calls e.g.
> mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
> my understanding is that fd1 will point to /some/dir, and destname would be
> "stuff"

Right, this file name information would be useful.  In the case of
audit, the goal is to efficiently and asynchronously log security events
(and align with other LSM logs and related limitations), not primarily
to debug sandboxed apps nor to enrich this information for decision
making, but the supervisor feature would help here.  The patch message
should include this rationale.


> 
> Actually, in case your question is "why not send a fd to represent the newly
> created file, instead of sending the name" -- I'm not sure whether you can
> open even an O_PATH fd to a non-existent file.

That would not be possible because it would not exist yet, a file name
(not file path) is OK for this case.

> 
> > > +		};
> > > +		struct {
> > > +			__u16 port;
> > > +		};
> > > +	};
> > > +};
> > > +
> > 
> > [...]
> 
>

Tingmao Wang March 10, 2025, 12:39 a.m. UTC | #4

On 3/6/25 03:05, Tingmao Wang wrote:
[...]
> This is also motivated by the potential UX I'm thinking of. For example, 
> if a newly installed application tries to create ~/.app-name, it will be 
> much more reassuring and convenient to the user if we can show something 
> like
> 
>      [program] wants to mkdir ~/.app-name. Allow this and future
>      access to the new directory?
> 
> rather than just "[program] wants to mkdir under ~". (The "Allow this 
> and future access to the new directory" bit is made possible by the 
> supervisor knowing the name of the file/directory being created, and can 
> remember them / write them out to a persistent profile etc)

Another significant motivation, which I forgot to mention, is to 
auto-grant access to newly created files/sockets etc under things like 
/tmp, $XDG_RUNTIME_DIR, or ~/Downloads.

> [...]

Tingmao Wang March 10, 2025, 12:39 a.m. UTC | #5

On 3/8/25 19:07, Mickaël Salaün wrote:
> On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
>> On 3/4/25 19:49, Mickaël Salaün wrote:
>>> On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
>> [...]
>>>> +	/**
>>>> +	 * @cookie: Opaque identifier to be included in the response.
>>>> +	 */
>>>> +	__u32 cookie;
>>>
>>> I guess we could use a __u64 index counter per layer instead.  That
>>> would also help to order requests if they are treated by different
>>> supervisor threads.
>>
>> I don't immediately see a use for ordering requests (if we get more than one
>> event at once, they are coming from different threads anyway so there can't
>> be any dependencies between them, and the supervisor threads can use
>> timestamps), but I think making it a __u64 is probably a good idea
>> regardless, as it means we don't have to do some sort of ID allocation, and
>> can just increment an atomic.
> 
> Indeed, we should follow the seccomp unotify approach with a random u64
> incremented per request.

Do you mean a random starting value, incremented by one per request, or 
something like the landlock_id in the audit patch (random increments too)?

> 
>>
>>>> +};
>>>> +
>>>> +struct landlock_supervise_event {
>>>> +	struct landlock_supervise_event_hdr hdr;
>>>> +	__u64 access_request;
>>>> +	__kernel_pid_t accessor;
>>>> +	union {
>>>> +		struct {
>>>> +			/**
>>>> +			 * @fd1: An open file descriptor for the file (open,
>>>> +			 * delete, execute, link, readdir, rename, truncate),
>>>> +			 * or the parent directory (for create operations
>>>> +			 * targeting its child) being accessed.  Must be
>>>> +			 * closed by the reader.
>>>> +			 *
>>>> +			 * If this points to a parent directory, @destname
>>>> +			 * will contain the target filename. If @destname is
>>>> +			 * empty, this points to the target file.
>>>> +			 */
>>>> +			int fd1;
>>>> +			/**
>>>> +			 * @fd2: For link or rename requests, a second file
>>>> +			 * descriptor for the target parent directory.  Must
>>>> +			 * be closed by the reader.  @destname contains the
>>>> +			 * destination filename.  This field is -1 if not
>>>> +			 * used.
>>>> +			 */
>>>> +			int fd2;
>>>
>>> Can we just use one FD but identify the requested access instead and
>>> send one event for each, like for the audit patch series?
>>
>> I haven't managed to read or test out the audit patch yet (I will do), but I
>> think having the ability to specifically tell whether the child is trying to
>> move / rename / create a hard link of an existing file, and what it's trying
>> to use as destination, might be useful (either for security, or purely for
>> UX)?
>>
>> For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
>> /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
>> user on the initial link attempt, and the shenanigan will probably be
>> stopped there (although still, being able to say "[program] wants to link
>> ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
>> wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
>> up allowed, later on for the read request it could say something like
>>
>> 	[program] wants to read /tmp/innocent-tmp-file
>> 	    (previously moved from ~/.ssh/id_ecdsa)
>>
>> Maybe this is a bit silly, but there might be other use cases for knowing
>> the exact details of a rename/link request, either for at-the-time decision
>> making, or tracking stuff for future requests?
> 
> This pattern looks like datagram packets.  I think we should use the
> netlink attributes.  There were concern about using a netlink socket for
> the seccomp unotification though:
> https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/
> 
> There are two main differences with seccomp unotify:
> - the supervisor should be able to receive arbitrary-sized data (e.g.
>    file name, not path);
> - the supervisor should be able to receive file descriptors (instead of
>    path).
> 
> Sockets are created with socket(2) whereas in our case we should only
> get a supervisor FD (indirectly) through landlock_restrict_self(2),
> which clearly identifies a kernel object.  Another issue would be to
> deal with network namespaces, probably by creating a private one.
> Sockets are powerful but we don't needs all the routing complexity.
> Moreover, we should only need a blocking communication channel to avoid
> issues managing in-flight object references (transformed to FDs when
> received).  That makes me think that a socket might not be the right
> construct, but we can still rely on the NLA macros to define a proper
> protocol with dynamically-sized events, received and send with dedicated
> IOCTL commands.
> 
> Netlink already provides a way to send a cookie, and
> netlink_attribute_type defines the types we'll need, including string.
> 
> For instance, a link request/event could include 3 packets, one for each
> of these properties:
> 1. the source file FD;
> 2. the destination directory FD;
> 3. the destination filename string.
> 
> This way we would avoid the union defined in this patch.

I had no idea about netlink - I will take a look.  Do you know if there 
is any existing code which uses it in a similar way (i.e. not creating 
an actual socket, but using netlink messages)?

I think in the end seccomp-unotify went with an ioctl with a custom 
struct seccomp_notif due to friction with the NL API [1] - do you think 
we will face the same problem here? (I will take a deeper look at 
netlink after sending this.)

(Tycho - could you weigh in?)

[1]: 
https://lore.kernel.org/all/CAGXu5jKsLDSBjB74SrvCvmGy_RTEjBsMtR5dk1CcRFrHEQfM_g@mail.gmail.com/

> 
> There is still the question about receiving FDs though. It would be nice
> to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
> sure how this could be properly handled wrt NLA.

Also, if we go with netlink messages, why do we need additional IOCTLs? 
Can we open the fd when we write out the message? (Maybe I will end up 
realizing the reason for this after reading netlink code, but I would )

> 
>>
>> I will try out the audit patch to see how things like these appears in the
>> log before commenting further on this. Maybe there is a way to achieve this
>> while still simplifying the event structure?
>>
>>>
>>>> +			/**
>>>> +			 * @destname: A filename for a file creation target.
>>>> +			 *
>>>> +			 * If either of fd1 or fd2 points to a parent
>>>> +			 * directory rather than the target file, this is the
>>>> +			 * NULL-terminated name of the file that will be
>>>> +			 * newly created.
>>>> +			 *
>>>> +			 * Counting the NULL terminator, this field will
>>>> +			 * contain one or more NULL padding at the end so
>>>> +			 * that the length of the whole struct
>>>> +			 * landlock_supervise_event is a multiple of 8 bytes.
>>>> +			 *
>>>> +			 * This is a variable length member, and the length
>>>> +			 * including the terminating NULL(s) can be derived
>>>> +			 * from hdr.length - offsetof(struct
>>>> +			 * landlock_supervise_event, destname).
>>>> +			 */
>>>> +			char destname[];
>>>
>>> I'd prefer to avoid sending file names for now.  I don't think it's
>>> necessary, and that could encourage supervisors to filter access
>>> according to names.
>>>
>>
>> This is also motivated by the potential UX I'm thinking of. For example, if
>> a newly installed application tries to create ~/.app-name, it will be much
>> more reassuring and convenient to the user if we can show something like
>>
>> 	[program] wants to mkdir ~/.app-name. Allow this and future
>> 	access to the new directory?
>>
>> rather than just "[program] wants to mkdir under ~". (The "Allow this and
>> future access to the new directory" bit is made possible by the supervisor
>> knowing the name of the file/directory being created, and can remember them
>> / write them out to a persistent profile etc)
>>
>> Note that this is just the filename under the dir represented by fd - this
>> isn't a path or anything that can be subject to symlink-related attacks,
>> etc.  If a program calls e.g.
>> mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
>> my understanding is that fd1 will point to /some/dir, and destname would be
>> "stuff"
> 
> Right, this file name information would be useful.  In the case of
> audit, the goal is to efficiently and asynchronously log security events
> (and align with other LSM logs and related limitations), not primarily
> to debug sandboxed apps nor to enrich this information for decision
> making, but the supervisor feature would help here.  The patch message
> should include this rationale.

Will do

> 
>>
>> Actually, in case your question is "why not send a fd to represent the newly
>> created file, instead of sending the name" -- I'm not sure whether you can
>> open even an O_PATH fd to a non-existent file.
> 
> That would not be possible because it would not exist yet, a file name
> (not file path) is OK for this case.
> 
>>
>>>> +		};
>>>> +		struct {
>>>> +			__u16 port;
>>>> +		};
>>>> +	};
>>>> +};
>>>> +
>>>
>>> [...]
>>
>>

Mickaël Salaün March 11, 2025, 7:28 p.m. UTC | #6

On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
> On 3/6/25 03:05, Tingmao Wang wrote:
> [...]
> > This is also motivated by the potential UX I'm thinking of. For example,
> > if a newly installed application tries to create ~/.app-name, it will be
> > much more reassuring and convenient to the user if we can show something
> > like
> > 
> >      [program] wants to mkdir ~/.app-name. Allow this and future
> >      access to the new directory?
> > 
> > rather than just "[program] wants to mkdir under ~". (The "Allow this
> > and future access to the new directory" bit is made possible by the
> > supervisor knowing the name of the file/directory being created, and can
> > remember them / write them out to a persistent profile etc)
> 
> Another significant motivation, which I forgot to mention, is to auto-grant
> access to newly created files/sockets etc under things like /tmp,
> $XDG_RUNTIME_DIR, or ~/Downloads.

What do you mean?  What is not currently possible?

Mickaël Salaün March 11, 2025, 7:29 p.m. UTC | #7

On Mon, Mar 10, 2025 at 12:39:08AM +0000, Tingmao Wang wrote:
> On 3/8/25 19:07, Mickaël Salaün wrote:
> > On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
> > > On 3/4/25 19:49, Mickaël Salaün wrote:
> > > > On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> > > [...]
> > > > > +	/**
> > > > > +	 * @cookie: Opaque identifier to be included in the response.
> > > > > +	 */
> > > > > +	__u32 cookie;
> > > > 
> > > > I guess we could use a __u64 index counter per layer instead.  That
> > > > would also help to order requests if they are treated by different
> > > > supervisor threads.
> > > 
> > > I don't immediately see a use for ordering requests (if we get more than one
> > > event at once, they are coming from different threads anyway so there can't
> > > be any dependencies between them, and the supervisor threads can use
> > > timestamps), but I think making it a __u64 is probably a good idea
> > > regardless, as it means we don't have to do some sort of ID allocation, and
> > > can just increment an atomic.
> > 
> > Indeed, we should follow the seccomp unotify approach with a random u64
> > incremented per request.
> 
> Do you mean a random starting value, incremented by one per request, or

Yes

> something like the landlock_id in the audit patch (random increments too)?

There is no need for that because the supervisor is more privileged than
the sandbox.

> 
> > 
> > > 
> > > > > +};
> > > > > +
> > > > > +struct landlock_supervise_event {
> > > > > +	struct landlock_supervise_event_hdr hdr;
> > > > > +	__u64 access_request;
> > > > > +	__kernel_pid_t accessor;
> > > > > +	union {
> > > > > +		struct {
> > > > > +			/**
> > > > > +			 * @fd1: An open file descriptor for the file (open,
> > > > > +			 * delete, execute, link, readdir, rename, truncate),
> > > > > +			 * or the parent directory (for create operations
> > > > > +			 * targeting its child) being accessed.  Must be
> > > > > +			 * closed by the reader.
> > > > > +			 *
> > > > > +			 * If this points to a parent directory, @destname
> > > > > +			 * will contain the target filename. If @destname is
> > > > > +			 * empty, this points to the target file.
> > > > > +			 */
> > > > > +			int fd1;
> > > > > +			/**
> > > > > +			 * @fd2: For link or rename requests, a second file
> > > > > +			 * descriptor for the target parent directory.  Must
> > > > > +			 * be closed by the reader.  @destname contains the
> > > > > +			 * destination filename.  This field is -1 if not
> > > > > +			 * used.
> > > > > +			 */
> > > > > +			int fd2;
> > > > 
> > > > Can we just use one FD but identify the requested access instead and
> > > > send one event for each, like for the audit patch series?
> > > 
> > > I haven't managed to read or test out the audit patch yet (I will do), but I
> > > think having the ability to specifically tell whether the child is trying to
> > > move / rename / create a hard link of an existing file, and what it's trying
> > > to use as destination, might be useful (either for security, or purely for
> > > UX)?
> > > 
> > > For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
> > > /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
> > > user on the initial link attempt, and the shenanigan will probably be
> > > stopped there (although still, being able to say "[program] wants to link
> > > ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
> > > wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
> > > up allowed, later on for the read request it could say something like
> > > 
> > > 	[program] wants to read /tmp/innocent-tmp-file
> > > 	    (previously moved from ~/.ssh/id_ecdsa)
> > > 
> > > Maybe this is a bit silly, but there might be other use cases for knowing
> > > the exact details of a rename/link request, either for at-the-time decision
> > > making, or tracking stuff for future requests?
> > 
> > This pattern looks like datagram packets.  I think we should use the
> > netlink attributes.  There were concern about using a netlink socket for
> > the seccomp unotification though:
> > https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/
> > 
> > There are two main differences with seccomp unotify:
> > - the supervisor should be able to receive arbitrary-sized data (e.g.
> >    file name, not path);
> > - the supervisor should be able to receive file descriptors (instead of
> >    path).
> > 
> > Sockets are created with socket(2) whereas in our case we should only
> > get a supervisor FD (indirectly) through landlock_restrict_self(2),
> > which clearly identifies a kernel object.  Another issue would be to
> > deal with network namespaces, probably by creating a private one.
> > Sockets are powerful but we don't needs all the routing complexity.
> > Moreover, we should only need a blocking communication channel to avoid
> > issues managing in-flight object references (transformed to FDs when
> > received).  That makes me think that a socket might not be the right
> > construct, but we can still rely on the NLA macros to define a proper
> > protocol with dynamically-sized events, received and send with dedicated
> > IOCTL commands.
> > 
> > Netlink already provides a way to send a cookie, and
> > netlink_attribute_type defines the types we'll need, including string.
> > 
> > For instance, a link request/event could include 3 packets, one for each
> > of these properties:
> > 1. the source file FD;
> > 2. the destination directory FD;
> > 3. the destination filename string.
> > 
> > This way we would avoid the union defined in this patch.
> 
> I had no idea about netlink - I will take a look.  Do you know if there is
> any existing code which uses it in a similar way (i.e. not creating an
> actual socket, but using netlink messages)?

I don't know.

> 
> I think in the end seccomp-unotify went with an ioctl with a custom struct
> seccomp_notif due to friction with the NL API [1] - do you think we will
> face the same problem here? (I will take a deeper look at netlink after
> sending this.)
> 
> (Tycho - could you weigh in?)
> 
> [1]: https://lore.kernel.org/all/CAGXu5jKsLDSBjB74SrvCvmGy_RTEjBsMtR5dk1CcRFrHEQfM_g@mail.gmail.com/

We need to check if the NLA API could work.  Kees's answer was missing
explanation.  Otherwise we should get inspiration from fanotify
messages.

> 
> > 
> > There is still the question about receiving FDs though. It would be nice
> > to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
> > sure how this could be properly handled wrt NLA.
> 
> Also, if we go with netlink messages, why do we need additional IOCTLs? Can
> we open the fd when we write out the message? (Maybe I will end up realizing
> the reason for this after reading netlink code, but I would )

It's much easier to have static-sized struct, both for developers and
for introspection tools (e.g. strace).  However, in this case we also
would also have variable-lenght data.  See my other reply discussing the
IOCTL idea.

> 
> > 
> > > 
> > > I will try out the audit patch to see how things like these appears in the
> > > log before commenting further on this. Maybe there is a way to achieve this
> > > while still simplifying the event structure?
> > > 
> > > > 
> > > > > +			/**
> > > > > +			 * @destname: A filename for a file creation target.
> > > > > +			 *
> > > > > +			 * If either of fd1 or fd2 points to a parent
> > > > > +			 * directory rather than the target file, this is the
> > > > > +			 * NULL-terminated name of the file that will be
> > > > > +			 * newly created.
> > > > > +			 *
> > > > > +			 * Counting the NULL terminator, this field will
> > > > > +			 * contain one or more NULL padding at the end so
> > > > > +			 * that the length of the whole struct
> > > > > +			 * landlock_supervise_event is a multiple of 8 bytes.
> > > > > +			 *
> > > > > +			 * This is a variable length member, and the length
> > > > > +			 * including the terminating NULL(s) can be derived
> > > > > +			 * from hdr.length - offsetof(struct
> > > > > +			 * landlock_supervise_event, destname).
> > > > > +			 */
> > > > > +			char destname[];
> > > > 
> > > > I'd prefer to avoid sending file names for now.  I don't think it's
> > > > necessary, and that could encourage supervisors to filter access
> > > > according to names.
> > > > 
> > > 
> > > This is also motivated by the potential UX I'm thinking of. For example, if
> > > a newly installed application tries to create ~/.app-name, it will be much
> > > more reassuring and convenient to the user if we can show something like
> > > 
> > > 	[program] wants to mkdir ~/.app-name. Allow this and future
> > > 	access to the new directory?
> > > 
> > > rather than just "[program] wants to mkdir under ~". (The "Allow this and
> > > future access to the new directory" bit is made possible by the supervisor
> > > knowing the name of the file/directory being created, and can remember them
> > > / write them out to a persistent profile etc)
> > > 
> > > Note that this is just the filename under the dir represented by fd - this
> > > isn't a path or anything that can be subject to symlink-related attacks,
> > > etc.  If a program calls e.g.
> > > mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
> > > my understanding is that fd1 will point to /some/dir, and destname would be
> > > "stuff"
> > 
> > Right, this file name information would be useful.  In the case of
> > audit, the goal is to efficiently and asynchronously log security events
> > (and align with other LSM logs and related limitations), not primarily
> > to debug sandboxed apps nor to enrich this information for decision
> > making, but the supervisor feature would help here.  The patch message
> > should include this rationale.
> 
> Will do
> 
> > 
> > > 
> > > Actually, in case your question is "why not send a fd to represent the newly
> > > created file, instead of sending the name" -- I'm not sure whether you can
> > > open even an O_PATH fd to a non-existent file.
> > 
> > That would not be possible because it would not exist yet, a file name
> > (not file path) is OK for this case.
> > 
> > > 
> > > > > +		};
> > > > > +		struct {
> > > > > +			__u16 port;
> > > > > +		};
> > > > > +	};
> > > > > +};
> > > > > +
> > > > 
> > > > [...]
> > > 
> > > 
>

Tingmao Wang March 11, 2025, 11:18 p.m. UTC | #8

On 3/11/25 19:28, Mickaël Salaün wrote:
> On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
>> On 3/6/25 03:05, Tingmao Wang wrote:
>> [...]
>>> This is also motivated by the potential UX I'm thinking of. For example,
>>> if a newly installed application tries to create ~/.app-name, it will be
>>> much more reassuring and convenient to the user if we can show something
>>> like
>>>
>>>       [program] wants to mkdir ~/.app-name. Allow this and future
>>>       access to the new directory?
>>>
>>> rather than just "[program] wants to mkdir under ~". (The "Allow this
>>> and future access to the new directory" bit is made possible by the
>>> supervisor knowing the name of the file/directory being created, and can
>>> remember them / write them out to a persistent profile etc)
>>
>> Another significant motivation, which I forgot to mention, is to auto-grant
>> access to newly created files/sockets etc under things like /tmp,
>> $XDG_RUNTIME_DIR, or ~/Downloads.
> 
> What do you mean?  What is not currently possible?

It is not currently possible with landlock to say "I will allow this 
application access to create and open new file/folders under this 
directory, change or delete the files it creates, but not touch any 
existing files". Landlock supervisor can make this possible (keeping 
track via its own state to allow future requests on the new file, or 
modifying the domain if we support that), but for that the supervisor 
has to know what file the application tried to create, hence motivating 
sending filename.

(I can see this kind of policy being applied to dirs like /tmp or my 
Downloads folder. $XDG_RUNTIME_DIR is also a sensible place for this 
behaviour due to the common pattern of creating a lock/pid file/socket 
there, although on second thought a GUI sandbox probably will want to 
create a private copy of that dir anyway for each app, to do dbus 
filtering etc)

Mickaël Salaün March 12, 2025, 11:49 a.m. UTC | #9

On Tue, Mar 11, 2025 at 11:18:49PM +0000, Tingmao Wang wrote:
> On 3/11/25 19:28, Mickaël Salaün wrote:
> > On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
> > > On 3/6/25 03:05, Tingmao Wang wrote:
> > > [...]
> > > > This is also motivated by the potential UX I'm thinking of. For example,
> > > > if a newly installed application tries to create ~/.app-name, it will be
> > > > much more reassuring and convenient to the user if we can show something
> > > > like
> > > > 
> > > >       [program] wants to mkdir ~/.app-name. Allow this and future
> > > >       access to the new directory?
> > > > 
> > > > rather than just "[program] wants to mkdir under ~". (The "Allow this
> > > > and future access to the new directory" bit is made possible by the
> > > > supervisor knowing the name of the file/directory being created, and can
> > > > remember them / write them out to a persistent profile etc)
> > > 
> > > Another significant motivation, which I forgot to mention, is to auto-grant
> > > access to newly created files/sockets etc under things like /tmp,
> > > $XDG_RUNTIME_DIR, or ~/Downloads.
> > 
> > What do you mean?  What is not currently possible?
> 
> It is not currently possible with landlock to say "I will allow this
> application access to create and open new file/folders under this directory,
> change or delete the files it creates, but not touch any existing files".
> Landlock supervisor can make this possible (keeping track via its own state
> to allow future requests on the new file, or modifying the domain if we
> support that), but for that the supervisor has to know what file the
> application tried to create, hence motivating sending filename.

This capability would be at least inconsistent, and dangerous at worse,
because of policy inconsistencies over time.  A sandbox policy should be
seen over several invocations of the same sandbox.  See related deny
listing issues: https://github.com/landlock-lsm/linux/issues/28

Let's say a first instance of the sandbox can create files and access
them, but not other existing files in the same directory.  A second
instance of this sandbox would not be able to access the files the same
application created, so it will not be able to clean them if required.
That could be OK in the case of the ~/Downloads directory but I think it
would be weird for users to not be able to open their previous
downloaded files from the browser, whereas it was allowed before.

For such use case, if we want to avoid new browser instances to access
old downloaded files, I'd recommand to create a new download directory
per browser/sandbox launch.

> 
> (I can see this kind of policy being applied to dirs like /tmp or my
> Downloads folder. $XDG_RUNTIME_DIR is also a sensible place for this
> behaviour due to the common pattern of creating a lock/pid file/socket
> there, although on second thought a GUI sandbox probably will want to create
> a private copy of that dir anyway for each app, to do dbus filtering etc)

An $XDG_RUNTIME_DIR per sandbox looks reasonable, but in practice we
also need secure proxies/portals to still share some user's resources.
This part should be implemented in user space because the kernel doesn't
know about this semantic (e.g. DBus requests).

[RFC,5/9] Define user structure for events and responses.

Commit Message

Comments

Patch