mbox series

[RFC,0/9] Landlock supervise: a mechanism for interactive permission requests

Message ID cover.1741047969.git.m@maowtm.org (mailing list archive)
Headers show
Series Landlock supervise: a mechanism for interactive permission requests | expand

Message

Tingmao Wang March 4, 2025, 1:12 a.m. UTC
Landlock supervise: a mechanism for interactive permission requests

Hi,

I would like to propose an extension to Landlock to support a "supervisor"
mode, which would enable a user program to sandbox applications (or
itself) in a dynamic, fine-grained, and potentially temporary way.
Practically, this makes it easy to give maximal control to the user,
perhaps in the form of a "just in time" permission prompt.  Read on, or
check the sandboxer program in the last patch for a "demo".

To Jan Kara and other fanotify reviewers, I've included you in this patch
as Mickaël suggested that we could potentially extend and re-use the
fanotify uapi and code instead of creating an entirely new representation
for permission requests and mechanism for passing it (as this patch
currently does).  I've not really thought out how that would work (there
will probably have to be some extension of the fanotify-fd uapi since
landlock handles more than FS access), but I think it is a promising idea,
hence I would like to hear your thoughts if you could spare a moment to
look at this.  A good outcome could also be that we add the necessary
hooks so that both this and fanotify (but really fsnotify?) can have _perm
events for create/delete/rename etc.

FS mailing list - I've CC'd this patchset to you too - even though the
patch doesn't currently touch any FS code, this is very FS related, and
also, in order to address an inode lock related problem which I will
mention in patch 6 of this series, future versions of this patch will
likely need to add a few more LSM hooks.  Especially for that part, but
also other bits of this project, a pair of eyes from the FS community
would be very helpful.

To Tycho Andersen -- I'm CC'ing you as you've worked on the seccomp-unotify
feature which is also quite related, so if you could spare some time for a
quick review, or provide some suggestions, that would be very appreciated
:)

I'm submitting this series as a non-production-ready, proof-of-concept
RFC, and I would appreciate feedback on any aspects of the design or
implementation.  Note that due to the PoC nature of this, I have not
handled checkpatch.pl errors etc.  I also welcome suggestions for
alternative names for this feature (e.g. landlock-unotify?
landlock-perm?).  At this point I'm very keen to hear some initial
feedback from the community before investing further into polishing this
patch.

(I've briefly pitched the overall idea to Mickaël, but he has not reviewed
the patch yet)


Why extend landlock?
--------------------

While this feature could be implemented as its own LSM, I feel like it is
a natural extension to landlock -- landlock has already defined a set of
fine-grained access requests with the intention to add more (and not just
for FS alone), is designed to be an unprivileged, stackable,
process-scoped, ad-hoc mechanism with no persistent state, which works
well as a generic API to support a dynamic sandbox, and landlock is
already doing the path traversal work to evaluate hierarchical filesystem
rules, which would also be useful for a performant dynamic sandbox
implementation.


Use cases
---------

I have several potential use cases in mind that will benefit from
landlock-supervise, for example:

1. A patch to firejail (I have not discussed with the firejail maintainers
on this yet - wanted to see the reception of this kernel patch first)
which can leverage landlock in a highly flexible way, prompting the user
for permission to access "extra" files after the sandbox has started
(without e.g. having to restart a very stateful GUI program).

This way of using landlock can potentially replace its current approach of
using bind mounts (as it will allow implementing "blacklists"), allowing
unprivileged sandbox creation (although need to check with firejail if
there are other factors preventing this).  This also allows editing
profiles "live" in a highly interactive way (i.e. the user can choose
"allow and remember" on a permission request which will also add the newly
allowed path to a local firejail profile, all automatically)

2. A "protected" mode for common development environments (e.g. VSCode or
a terminal can be launched "protected") that doesn't compromise on
ease-of-use.  File access to $PWD at launch can be allowed, and access to
other places can be allowed ad-hoc by the developer with hopefully one UI
click.  Since landlock can also be used to restrict network access, such a
protected mode can also restrict outgoing connections by default (but ask
the user if they allow it for all or certain processes, on the first
attempt to connect).

Recently there has been incidents of secret-stealing malware targeting
developers (on Linux) by social engineering them to open and build/run a
project. [1]  The hope is that landlock-supervise can drive adoption of
sandboxes for developers and others by making them more user-friendly.

In addition to the above, I also hope that this would help with landlock
adoption even in non-interaction-heavy scenarios, by allowing application
developers the choice to gracefully recover from over-restrictive rulesets
and collect failure metrics, until they are confident that actually
blocking non-allowed accesses would not break their application or degrade
the user experience.

I have more exploration to do regarding applying this to applications, but
I do have a working proof of concept already (implemented as an
enhancement to the sandboxer example). Here is a shortened output:

    bash # env LL_FS_RO=/usr:/lib:/bin:/etc:/dev:/proc LL_FS_RW= LL_SUPERVISE=1 ./sandboxer bash -i
    bash # echo "Hi, $(whoami)!"
    Hi, root!
    bash # ls /
    ------------- Sandboxer access request -------------
    Process ls[166] (/usr/bin/ls) wants to read
      /
    (y)es/(a)lways/(n)o > y
    ----------------------------------------------------
    bin
    boot
    dev
    ...
    usr
    var
    bash # echo 'evil' >> /etc/profile
    (a spurious create request due to current issue with dcache miss is omitted)
    ------------- Sandboxer access request -------------
    Process bash[163] (/usr/bin/bash) wants to read/write
      /etc/profile
    (y)es/(a)lways/(n)o > n
    ----------------------------------------------------
    bash: /etc/profile: Permission denied
    bash #


Alternatives
------------

I have looked for existing ways to implement the proposed use cases (at
least for FS access), and three main approaches stand out to me:

1. Fanotify: there is already FAM_OPEN_PERM which waits for an allow/deny
response from a fanotify listener.  However, it does not currently have
the equivalent _PERM for file creation, deletion, rename and linking, and
it is also not designed for unprivileged, process-scoped use (unlike
landlock).

2. Seccomp-unotify: this can be used to trap all syscalls and give the
sandbox a chance to allow or deny any one of them. However, a correct,
TOCTOU-proof implementation will likely require handling a large number of
fs-related syscalls in user-space, with the sandboxer opening the file or
carrying out the operation on behalf of the sandboxee.  This is probably
going to be extremely complex and makes everything less performant.

3. Using a FUSE filesystem which gates access.  This is actually an
approach taken by an existing sandbox solution - flatpak [2], however it
requires either tight integration with the application (and thus doesn't
work well for the mentioned use cases), or if one wants to sandbox a
program "transparently", SYS_ADMIN to chroot.


I've tested that what I have here works with the enhanced sandboxer, but
have yet to write any self tests or do extensive testing or perf
measurements.  I have also yet to implement support for supervising tcp
rules as well as FS refer operations.

Base commit: 78332fdb956f18accfbca5993b10c5ed69f00a2c (tag:
landlock-6.14-rc5, mic/next)


[1]: https://cybersecuritynews.com/beware-of-lazarus-linkedin-recruiting-scam/
[2]: https://flatpak.github.io/xdg-desktop-portal/docs/documents-and-fuse.html


Tingmao Wang (9):
  Define the supervisor and event structure
  Refactor per-layer information in rulesets and rules
  Adds a supervisor reference in the per-layer information
  User-space API for creating a supervisor-fd
  Define user structure for events and responses.
  Creating supervisor events for filesystem operations
  Implement fdinfo for ruleset and supervisor fd
  Implement fops for supervisor-fd
  Enhance the sandboxer example to support landlock-supervise

 include/uapi/linux/landlock.h | 119 ++++++
 samples/landlock/sandboxer.c  | 759 +++++++++++++++++++++++++++++++++-
 security/landlock/Makefile    |   2 +-
 security/landlock/fs.c        | 134 +++++-
 security/landlock/ruleset.c   |  49 ++-
 security/landlock/ruleset.h   |  66 +--
 security/landlock/supervise.c | 194 +++++++++
 security/landlock/supervise.h | 171 ++++++++
 security/landlock/syscalls.c  | 621 +++++++++++++++++++++++++++-
 9 files changed, 2036 insertions(+), 79 deletions(-)
 create mode 100644 security/landlock/supervise.c
 create mode 100644 security/landlock/supervise.h

--
2.39.5

Comments

Mickaël Salaün March 4, 2025, 7:48 p.m. UTC | #1
On Tue, Mar 04, 2025 at 01:12:56AM +0000, Tingmao Wang wrote:
> Landlock supervise: a mechanism for interactive permission requests
> 
> Hi,
> 
> I would like to propose an extension to Landlock to support a "supervisor"
> mode, which would enable a user program to sandbox applications (or
> itself) in a dynamic, fine-grained, and potentially temporary way.
> Practically, this makes it easy to give maximal control to the user,
> perhaps in the form of a "just in time" permission prompt.  Read on, or
> check the sandboxer program in the last patch for a "demo".

Thanks for this RFC, this is very promising!

> 
> To Jan Kara and other fanotify reviewers, I've included you in this patch
> as Mickaël suggested that we could potentially extend and re-use the
> fanotify uapi and code instead of creating an entirely new representation
> for permission requests and mechanism for passing it (as this patch
> currently does).  I've not really thought out how that would work (there
> will probably have to be some extension of the fanotify-fd uapi since
> landlock handles more than FS access), but I think it is a promising idea,
> hence I would like to hear your thoughts if you could spare a moment to
> look at this.  A good outcome could also be that we add the necessary
> hooks so that both this and fanotify (but really fsnotify?) can have _perm
> events for create/delete/rename etc.
> 
> FS mailing list - I've CC'd this patchset to you too - even though the
> patch doesn't currently touch any FS code, this is very FS related, and
> also, in order to address an inode lock related problem which I will
> mention in patch 6 of this series, future versions of this patch will
> likely need to add a few more LSM hooks.  Especially for that part, but
> also other bits of this project, a pair of eyes from the FS community
> would be very helpful.
> 
> To Tycho Andersen -- I'm CC'ing you as you've worked on the seccomp-unotify
> feature which is also quite related, so if you could spare some time for a
> quick review, or provide some suggestions, that would be very appreciated
> :)
> 
> I'm submitting this series as a non-production-ready, proof-of-concept
> RFC, and I would appreciate feedback on any aspects of the design or
> implementation.  Note that due to the PoC nature of this, I have not
> handled checkpatch.pl errors etc.  I also welcome suggestions for
> alternative names for this feature (e.g. landlock-unotify?
> landlock-perm?).  At this point I'm very keen to hear some initial
> feedback from the community before investing further into polishing this
> patch.
> 
> (I've briefly pitched the overall idea to Mickaël, but he has not reviewed
> the patch yet)
> 
> 
> Why extend landlock?
> --------------------
> 
> While this feature could be implemented as its own LSM, I feel like it is
> a natural extension to landlock -- landlock has already defined a set of
> fine-grained access requests with the intention to add more (and not just
> for FS alone), is designed to be an unprivileged, stackable,
> process-scoped, ad-hoc mechanism with no persistent state, which works
> well as a generic API to support a dynamic sandbox, and landlock is
> already doing the path traversal work to evaluate hierarchical filesystem
> rules, which would also be useful for a performant dynamic sandbox
> implementation.

I agree, that would be a great Landlock feature.

> 
> 
> Use cases
> ---------
> 
> I have several potential use cases in mind that will benefit from
> landlock-supervise, for example:
> 
> 1. A patch to firejail (I have not discussed with the firejail maintainers
> on this yet - wanted to see the reception of this kernel patch first)
> which can leverage landlock in a highly flexible way, prompting the user
> for permission to access "extra" files after the sandbox has started
> (without e.g. having to restart a very stateful GUI program).
> 
> This way of using landlock can potentially replace its current approach of
> using bind mounts (as it will allow implementing "blacklists"), allowing
> unprivileged sandbox creation (although need to check with firejail if
> there are other factors preventing this).  This also allows editing
> profiles "live" in a highly interactive way (i.e. the user can choose
> "allow and remember" on a permission request which will also add the newly
> allowed path to a local firejail profile, all automatically)
> 
> 2. A "protected" mode for common development environments (e.g. VSCode or
> a terminal can be launched "protected") that doesn't compromise on
> ease-of-use.  File access to $PWD at launch can be allowed, and access to
> other places can be allowed ad-hoc by the developer with hopefully one UI
> click.  Since landlock can also be used to restrict network access, such a
> protected mode can also restrict outgoing connections by default (but ask
> the user if they allow it for all or certain processes, on the first
> attempt to connect).
> 
> Recently there has been incidents of secret-stealing malware targeting
> developers (on Linux) by social engineering them to open and build/run a
> project. [1]  The hope is that landlock-supervise can drive adoption of
> sandboxes for developers and others by making them more user-friendly.
> 
> In addition to the above, I also hope that this would help with landlock
> adoption even in non-interaction-heavy scenarios, by allowing application
> developers the choice to gracefully recover from over-restrictive rulesets
> and collect failure metrics, until they are confident that actually
> blocking non-allowed accesses would not break their application or degrade
> the user experience.

Another interesting use case is to trace programs and get an
unprivileged "permissive" mode to quickly create sandbox policies.

> 
> I have more exploration to do regarding applying this to applications, but
> I do have a working proof of concept already (implemented as an
> enhancement to the sandboxer example). Here is a shortened output:
> 
>     bash # env LL_FS_RO=/usr:/lib:/bin:/etc:/dev:/proc LL_FS_RW= LL_SUPERVISE=1 ./sandboxer bash -i
>     bash # echo "Hi, $(whoami)!"
>     Hi, root!
>     bash # ls /
>     ------------- Sandboxer access request -------------
>     Process ls[166] (/usr/bin/ls) wants to read
>       /
>     (y)es/(a)lways/(n)o > y
>     ----------------------------------------------------
>     bin
>     boot
>     dev
>     ...
>     usr
>     var
>     bash # echo 'evil' >> /etc/profile
>     (a spurious create request due to current issue with dcache miss is omitted)
>     ------------- Sandboxer access request -------------
>     Process bash[163] (/usr/bin/bash) wants to read/write
>       /etc/profile
>     (y)es/(a)lways/(n)o > n
>     ----------------------------------------------------
>     bash: /etc/profile: Permission denied
>     bash #
> 
> 
> Alternatives
> ------------
> 
> I have looked for existing ways to implement the proposed use cases (at
> least for FS access), and three main approaches stand out to me:
> 
> 1. Fanotify: there is already FAM_OPEN_PERM which waits for an allow/deny
> response from a fanotify listener.  However, it does not currently have
> the equivalent _PERM for file creation, deletion, rename and linking, and
> it is also not designed for unprivileged, process-scoped use (unlike
> landlock).

As discussed, I was thinking about whether or not it would be possible
to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
but looking at your code, I think it would mostly increase complexity.
There are also the issue with the Landlock semantic (e.g. access rights)
which does not map 1:1 to the fanotify one.  A last thing is that
fanotify is deeply tied to the VFS.  So, unless someone has a better
idea, let's continue with your approach.

> 
> 2. Seccomp-unotify: this can be used to trap all syscalls and give the
> sandbox a chance to allow or deny any one of them. However, a correct,
> TOCTOU-proof implementation will likely require handling a large number of
> fs-related syscalls in user-space, with the sandboxer opening the file or
> carrying out the operation on behalf of the sandboxee.  This is probably
> going to be extremely complex and makes everything less performant.

We should get inspiration from the fanotify and seccomp-notify features
(while implementing the minimum for now) but also identify their design
issues and caveats.

Tycho, Christian, Kees, any suggestion?

> 
> 3. Using a FUSE filesystem which gates access.  This is actually an
> approach taken by an existing sandbox solution - flatpak [2], however it
> requires either tight integration with the application (and thus doesn't
> work well for the mentioned use cases), or if one wants to sandbox a
> program "transparently", SYS_ADMIN to chroot.

Android's SDCardFS is another example of such use.

> 
> 
> I've tested that what I have here works with the enhanced sandboxer, but
> have yet to write any self tests or do extensive testing or perf
> measurements.  I have also yet to implement support for supervising tcp
> rules as well as FS refer operations.

One of the main suggestion would be to align with the audit patch series
semantic and the defined "blockers":
https://lore.kernel.org/all/20250131163059.1139617-1-mic@digikod.net/
I'll send another series soon.

> 
> Base commit: 78332fdb956f18accfbca5993b10c5ed69f00a2c (tag:
> landlock-6.14-rc5, mic/next)
> 
> 
> [1]: https://cybersecuritynews.com/beware-of-lazarus-linkedin-recruiting-scam/
> [2]: https://flatpak.github.io/xdg-desktop-portal/docs/documents-and-fuse.html
> 
> 
> Tingmao Wang (9):
>   Define the supervisor and event structure
>   Refactor per-layer information in rulesets and rules
>   Adds a supervisor reference in the per-layer information
>   User-space API for creating a supervisor-fd
>   Define user structure for events and responses.
>   Creating supervisor events for filesystem operations
>   Implement fdinfo for ruleset and supervisor fd
>   Implement fops for supervisor-fd
>   Enhance the sandboxer example to support landlock-supervise
> 
>  include/uapi/linux/landlock.h | 119 ++++++
>  samples/landlock/sandboxer.c  | 759 +++++++++++++++++++++++++++++++++-
>  security/landlock/Makefile    |   2 +-
>  security/landlock/fs.c        | 134 +++++-
>  security/landlock/ruleset.c   |  49 ++-
>  security/landlock/ruleset.h   |  66 +--
>  security/landlock/supervise.c | 194 +++++++++
>  security/landlock/supervise.h | 171 ++++++++
>  security/landlock/syscalls.c  | 621 +++++++++++++++++++++++++++-
>  9 files changed, 2036 insertions(+), 79 deletions(-)
>  create mode 100644 security/landlock/supervise.c
>  create mode 100644 security/landlock/supervise.h
> 
> --
> 2.39.5
>
Tingmao Wang March 6, 2025, 2:57 a.m. UTC | #2
On 3/4/25 19:48, Mickaël Salaün wrote:

> Thanks for this RFC, this is very promising!

Hi Mickaël - thanks for the prompt review and for your support! I have 
read your replies and have some thoughts already, but I kept getting 
distracted by other stuff and so haven't had much chance to express 
them.  I will address some first today and some more over the weekend.

> Another interesting use case is to trace programs and get an
> unprivileged "permissive" mode to quickly create sandbox policies.

Yes that would also be a good use. I thought of this initially but was 
thinking "I guess you can always do that with audit" but if we have 
landlock supervise maybe that would be an easier thing for tools to 
build upon...?

> As discussed, I was thinking about whether or not it would be possible
> to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> but looking at your code, I think it would mostly increase complexity.
> There are also the issue with the Landlock semantic (e.g. access rights)
> which does not map 1:1 to the fanotify one.  A last thing is that
> fanotify is deeply tied to the VFS.  So, unless someone has a better
> idea, let's continue with your approach.

That sounds sensible - I will keep going with the current direction of a 
landlock-specific uapi. (happy to revisit should other people have 
suggestions)

> Android's SDCardFS is another example of such use.

Interesting - seems like it was deprecated for reasons unrelated to 
security though.

> One of the main suggestion would be to align with the audit patch series
> semantic and the defined "blockers":
> https://lore.kernel.org/all/20250131163059.1139617-1-mic@digikod.net/
> I'll send another series soon.

I will have a read of the existing audit series - are you planning 
significant changes to it in the next one?
Amir Goldstein March 6, 2025, 5:07 p.m. UTC | #3
On Thu, Mar 6, 2025 at 3:57 AM Tingmao Wang <m@maowtm.org> wrote:
>
> On 3/4/25 19:48, Mickaël Salaün wrote:
>
> > Thanks for this RFC, this is very promising!
>
> Hi Mickaël - thanks for the prompt review and for your support! I have
> read your replies and have some thoughts already, but I kept getting
> distracted by other stuff and so haven't had much chance to express
> them.  I will address some first today and some more over the weekend.
>
> > Another interesting use case is to trace programs and get an
> > unprivileged "permissive" mode to quickly create sandbox policies.
>
> Yes that would also be a good use. I thought of this initially but was
> thinking "I guess you can always do that with audit" but if we have
> landlock supervise maybe that would be an easier thing for tools to
> build upon...?
>
> > As discussed, I was thinking about whether or not it would be possible
> > to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> > but looking at your code, I think it would mostly increase complexity.
> > There are also the issue with the Landlock semantic (e.g. access rights)
> > which does not map 1:1 to the fanotify one.  A last thing is that
> > fanotify is deeply tied to the VFS.  So, unless someone has a better
> > idea, let's continue with your approach.
>
> That sounds sensible - I will keep going with the current direction of a
> landlock-specific uapi. (happy to revisit should other people have
> suggestions)
>

w.r.t sharing infrastructure with fanotify, I only looked briefly at
your patches
and I have only a vague familiarity with landlock, so I cannot yet form an
opinion whether this is a good idea, but I wanted to give you a few more
data points about fanotify that seem relevant.

1. There is already some intersection of fanotify and audit lsm via the
fanotify_response_info_audit_rule extension for permission
events, so it's kind of a precedent of using fanotify to aid an lsm

2. See this fan_pre_modify-wip branch [1] and specifically commit
  "fanotify: introduce directory entry pre-modify permission events"
I do have an intention to add create/delete/rename permission events.
Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
far from the security_path_ lsm hooks, but not exactly in the same place
because we want to fsnotify hooks to be before taking vfs locks, to allow
listener to write to filesystem from event context.
There are different semantics than just ALLOW/DENY that you need,
therefore, only if we move the security_path_ hooks outside the
vfs locks, our use cases could use the same hooks

3. There is a recent attempt to add BPF filter to fanotify [2]
which is driven among other things from the long standing requirement
to add subtree filtering to fanotify watches.
The challenge with all the attempt to implement a subtree filter so far,
is that adding vfs performance overhead for all the users in the system
is unacceptable.

IIUC, landlock rule set can already express a subtree filter (?),
so it is intriguing to know if there is room for some integration on this
aspect, but my guess is that landlock mostly uses subtree filter
after filtering by specific pids (?), so it can avoid the performance
overhead of a subtree filter on most of the users in the system.

Hope this information is useful.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
[2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/
Jan Kara March 6, 2025, 9:04 p.m. UTC | #4
On Tue 04-03-25 01:12:56, Tingmao Wang wrote:
> Alternatives
> ------------
> 
> I have looked for existing ways to implement the proposed use cases (at
> least for FS access), and three main approaches stand out to me:
> 
> 1. Fanotify: there is already FAM_OPEN_PERM which waits for an allow/deny
> response from a fanotify listener.  However, it does not currently have
> the equivalent _PERM for file creation, deletion, rename and linking, and
> it is also not designed for unprivileged, process-scoped use (unlike
> landlock).

As Amir wrote, arbitration of creation / deletion / ... is not a principial
problem for fanotify and we plan to go in that direction anyway for HSM
usecase. However adjusting fanotify permission events for a per-process
scope and for unpriviledged users is a fundamental difference to how
fanotify is designed to work (it watches filesystem objects, not processes
and actions they do) and so I don't think that would be a great fit. Also I
don't see fanotify expanding in the networking area as the concepts are
rather different there :).

								Honza
Mickaël Salaün March 8, 2025, 6:57 p.m. UTC | #5
On Thu, Mar 06, 2025 at 02:57:13AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:48, Mickaël Salaün wrote:
> 
> > Thanks for this RFC, this is very promising!
> 
> Hi Mickaël - thanks for the prompt review and for your support! I have read
> your replies and have some thoughts already, but I kept getting distracted
> by other stuff and so haven't had much chance to express them.  I will
> address some first today and some more over the weekend.
> 
> > Another interesting use case is to trace programs and get an
> > unprivileged "permissive" mode to quickly create sandbox policies.
> 
> Yes that would also be a good use. I thought of this initially but was
> thinking "I guess you can always do that with audit" but if we have landlock
> supervise maybe that would be an easier thing for tools to build upon...?

Both approaches are valuable.  The supervisor one would be unprivileged,
could get access to more information including O_PATH FD's, but it is
much slower and relies on user space monitoring code.

> 
> > As discussed, I was thinking about whether or not it would be possible
> > to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> > but looking at your code, I think it would mostly increase complexity.
> > There are also the issue with the Landlock semantic (e.g. access rights)
> > which does not map 1:1 to the fanotify one.  A last thing is that
> > fanotify is deeply tied to the VFS.  So, unless someone has a better
> > idea, let's continue with your approach.
> 
> That sounds sensible - I will keep going with the current direction of a
> landlock-specific uapi. (happy to revisit should other people have
> suggestions)
> 
> > Android's SDCardFS is another example of such use.
> 
> Interesting - seems like it was deprecated for reasons unrelated to security
> though.

Yes, Android first used FUSE, then SDCardFS, then FUSE again, but the
goal has been the same:
https://source.android.com/docs/core/storage/scoped

> 
> > One of the main suggestion would be to align with the audit patch series
> > semantic and the defined "blockers":
> > https://lore.kernel.org/all/20250131163059.1139617-1-mic@digikod.net/
> > I'll send another series soon.
> 
> I will have a read of the existing audit series - are you planning
> significant changes to it in the next one?

Not significant changes but still some that hook changes that might
require a rebase.  I just sent v6, you'll find it applied here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git/log/?h=next
Mickaël Salaün March 8, 2025, 7:14 p.m. UTC | #6
On Thu, Mar 06, 2025 at 06:07:35PM +0100, Amir Goldstein wrote:
> On Thu, Mar 6, 2025 at 3:57 AM Tingmao Wang <m@maowtm.org> wrote:
> >
> > On 3/4/25 19:48, Mickaël Salaün wrote:
> >
> > > Thanks for this RFC, this is very promising!
> >
> > Hi Mickaël - thanks for the prompt review and for your support! I have
> > read your replies and have some thoughts already, but I kept getting
> > distracted by other stuff and so haven't had much chance to express
> > them.  I will address some first today and some more over the weekend.
> >
> > > Another interesting use case is to trace programs and get an
> > > unprivileged "permissive" mode to quickly create sandbox policies.
> >
> > Yes that would also be a good use. I thought of this initially but was
> > thinking "I guess you can always do that with audit" but if we have
> > landlock supervise maybe that would be an easier thing for tools to
> > build upon...?
> >
> > > As discussed, I was thinking about whether or not it would be possible
> > > to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> > > but looking at your code, I think it would mostly increase complexity.
> > > There are also the issue with the Landlock semantic (e.g. access rights)
> > > which does not map 1:1 to the fanotify one.  A last thing is that
> > > fanotify is deeply tied to the VFS.  So, unless someone has a better
> > > idea, let's continue with your approach.
> >
> > That sounds sensible - I will keep going with the current direction of a
> > landlock-specific uapi. (happy to revisit should other people have
> > suggestions)
> >
> 
> w.r.t sharing infrastructure with fanotify, I only looked briefly at
> your patches
> and I have only a vague familiarity with landlock, so I cannot yet form an
> opinion whether this is a good idea, but I wanted to give you a few more
> data points about fanotify that seem relevant.
> 
> 1. There is already some intersection of fanotify and audit lsm via the
> fanotify_response_info_audit_rule extension for permission
> events, so it's kind of a precedent of using fanotify to aid an lsm
> 
> 2. See this fan_pre_modify-wip branch [1] and specifically commit
>   "fanotify: introduce directory entry pre-modify permission events"
> I do have an intention to add create/delete/rename permission events.
> Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> far from the security_path_ lsm hooks, but not exactly in the same place
> because we want to fsnotify hooks to be before taking vfs locks, to allow
> listener to write to filesystem from event context.
> There are different semantics than just ALLOW/DENY that you need,
> therefore, only if we move the security_path_ hooks outside the
> vfs locks, our use cases could use the same hooks
> 
> 3. There is a recent attempt to add BPF filter to fanotify [2]
> which is driven among other things from the long standing requirement
> to add subtree filtering to fanotify watches.
> The challenge with all the attempt to implement a subtree filter so far,
> is that adding vfs performance overhead for all the users in the system
> is unacceptable.
> 
> IIUC, landlock rule set can already express a subtree filter (?),

Yes, Landlock uses a set of inode tags and a path walk to identify
hierarchies.

> so it is intriguing to know if there is room for some integration on this
> aspect, but my guess is that landlock mostly uses subtree filter
> after filtering by specific pids (?), so it can avoid the performance
> overhead of a subtree filter on most of the users in the system.

Landlock domains are indeed enforced for a set of specific tasks.

> 
> Hope this information is useful.

Yes, thanks for the explanations.  We should definitely take inspiration
from fanotify but I don't think it would be a good fit for Landlock: the
semantic of access rights is (and will) be different, and more
importantly it is not only to supervise filesystem accesses.

> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
> [2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/
Mickaël Salaün March 8, 2025, 7:15 p.m. UTC | #7
On Thu, Mar 06, 2025 at 10:04:54PM +0100, Jan Kara wrote:
> On Tue 04-03-25 01:12:56, Tingmao Wang wrote:
> > Alternatives
> > ------------
> > 
> > I have looked for existing ways to implement the proposed use cases (at
> > least for FS access), and three main approaches stand out to me:
> > 
> > 1. Fanotify: there is already FAM_OPEN_PERM which waits for an allow/deny
> > response from a fanotify listener.  However, it does not currently have
> > the equivalent _PERM for file creation, deletion, rename and linking, and
> > it is also not designed for unprivileged, process-scoped use (unlike
> > landlock).
> 
> As Amir wrote, arbitration of creation / deletion / ... is not a principial
> problem for fanotify and we plan to go in that direction anyway for HSM
> usecase. However adjusting fanotify permission events for a per-process
> scope and for unpriviledged users is a fundamental difference to how
> fanotify is designed to work (it watches filesystem objects, not processes
> and actions they do) and so I don't think that would be a great fit. Also I
> don't see fanotify expanding in the networking area as the concepts are
> rather different there :).

Yes, I agree.  We should take inspiration from the fanonify interface
though.

> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
>
Tingmao Wang March 11, 2025, 12:42 a.m. UTC | #8
On 3/6/25 17:07, Amir Goldstein wrote:
[...]
> 
> w.r.t sharing infrastructure with fanotify, I only looked briefly at
> your patches
> and I have only a vague familiarity with landlock, so I cannot yet form an
> opinion whether this is a good idea, but I wanted to give you a few more
> data points about fanotify that seem relevant.
> 
> 1. There is already some intersection of fanotify and audit lsm via the
> fanotify_response_info_audit_rule extension for permission
> events, so it's kind of a precedent of using fanotify to aid an lsm
> 
> 2. See this fan_pre_modify-wip branch [1] and specifically commit
>    "fanotify: introduce directory entry pre-modify permission events"
> I do have an intention to add create/delete/rename permission events.
> Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> far from the security_path_ lsm hooks, but not exactly in the same place
> because we want to fsnotify hooks to be before taking vfs locks, to allow
> listener to write to filesystem from event context.
> There are different semantics than just ALLOW/DENY that you need,
> therefore, only if we move the security_path_ hooks outside the
> vfs locks, our use cases could use the same hooks

Hi Amir,

(this is a slightly long message - feel free to respond at your 
convenience, thank you in advance!)

Thanks a lot for mentioning this branch, and for the explanation! I've 
had a look and realized that the changes you have there will be very 
useful for this patch, and in fact, I've already tried a worse attempt 
of this (not included in this patch series yet) to create some 
security_pathname_ hooks that takes the parent struct path + last name 
as char*, that will be called before locking the parent.  (We can't have 
an unprivileged supervisor cause a directory to be locked indefinitely, 
which will also block users outside of the landlock domain)

I'm not sure if we can move security_path tho, because it takes the 
dentry of the child as an argument, and (I think at least for create / 
mknod / link) that dentry is only created after locking.  Hence the 
proposal for separate security_pathname_ hooks.  A search shows that 
currently AppArmor and TOMOYO (plus Landlock) uses the security_path_ 
hooks that would need changing, if we move it (and we will have to 
understand if the move is ok to do for the other two LSMs...)

However, I think it would still make a lot of sense to align with 
fsnotify here, as you have already made the changes that I would need to 
do anyway should I implement the proposed new hooks.  I think a sensible 
thing might be to have the extra LSM hooks be called alongside 
fsnotify_(re)name_perm - following the pattern of what currently happens 
with fsnotify_open_perm (i.e. security_file_open called first, then 
fsnotify_open_perm right after).

What's your thought on this? Do you think it would be a good idea to 
have LSM hook equivalents of the fsnotify (re)name perm hooks / fanotify 
pre-modify events?

Also, do you have a rough estimate of when you would upstream the 
fa/fsnotify changes? (asking just to get an idea of things, not trying 
to rush or anything :) I suspect this supervise patch would take a while 
anyway)

If you think the general idea is right, here are some further questions 
I have:

I think going by this approach any error return from 
security_pathname_mknod (or in fact, fsnotify_name_perm) when called in 
the open O_CREAT code path would end up becoming a -EROFS.  Can we turn 
the bool got_write in open_last_lookups into an int to store any error 
from mnt_want_write_parent, and return it if lookup_open returns -EROFS? 
  This is so that the user space still gets an -EACCESS on create 
denials by landlock (and in fact, if fanotify denies a create maybe we 
want it to return the correct errno also?). Maybe there is a better way, 
this is just my first though...

I also noticed that you don't currently have fsnotify hook calls for 
link (although it does end up invoking the name_perm hook on the dest 
with MAY_CREATE).  I want to propose also changing do_linkat to (pass 
the right flags to filename_create_srcu -> mnt_want_write_parent to) 
call the security_pathname_link hook (instead of the LSM hook it would 
normally call for a creation event in this proposal) that is basically 
like security_path_link, except passing the destination as a dir/name 
pair, and without holding vfs lock (still passing in the dentry of the 
source itself), to enable landlock to handle link requests separately. 
Do you think this is alright?  (Maybe the code would be a bit convoluted 
if written verbatim from this logic, maybe there is a better way, but 
the general idea is hopefully right)

btw, side question, I see that you added srcu read sections around the 
events - I'm not familiar with rcu/locking usage in vfs but is this for 
preventing e.g. changing the mount in some way (but still allowing 
access / changes to the directory)?

I realize I'm asking you a lot of things - big thanks in advance!  (also 
let me know if I should be pulling in other VFS maintainers)

--

For Mickaël,

Would you be on board with changing Landlock to use the new hooks as 
mentioned above?  My thinking is that it shouldn't make any difference 
in terms of security - Landlock permissions for e.g. creating/deleting 
files are based on the parent, and in fact except for link and rename, 
the hook_path_ functions in Landlock don't even use the dentry argument. 
  If you're happy with the general direction of this, I can investigate 
further and test it out etc.  This change might also reduce the impact 
of Landlock on non-landlocked processes, if we avoid holding exclusive 
inode lock while evaluating rules / traversing paths...? (Just a 
thought, not measured)

In terms of other aspects, ignoring supervisors for now, moving to these 
hooks:

- Should make no difference in the "happy" (access allowed) case

- Only when an access is disallowed, in order to know what error to
   return, we can check (within Landlock hook handler) if the target
   already exists - if yes, return -EEXIST, otherwise -EACCESS

If this is too large of a change at this point and you see / would 
prefer another way we can progress this series (at least the initial 
version), let me know.

Kind regards,
Tingmao

> 
> 3. There is a recent attempt to add BPF filter to fanotify [2]
> which is driven among other things from the long standing requirement
> to add subtree filtering to fanotify watches.
> The challenge with all the attempt to implement a subtree filter so far,
> is that adding vfs performance overhead for all the users in the system
> is unacceptable.
> 
> IIUC, landlock rule set can already express a subtree filter (?),
> so it is intriguing to know if there is room for some integration on this
> aspect, but my guess is that landlock mostly uses subtree filter
> after filtering by specific pids (?), so it can avoid the performance
> overhead of a subtree filter on most of the users in the system.
> 
> Hope this information is useful.
> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
> [2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/
Mickaël Salaün March 11, 2025, 7:28 p.m. UTC | #9
On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
> On 3/6/25 17:07, Amir Goldstein wrote:
> [...]
> > 
> > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > your patches
> > and I have only a vague familiarity with landlock, so I cannot yet form an
> > opinion whether this is a good idea, but I wanted to give you a few more
> > data points about fanotify that seem relevant.
> > 
> > 1. There is already some intersection of fanotify and audit lsm via the
> > fanotify_response_info_audit_rule extension for permission
> > events, so it's kind of a precedent of using fanotify to aid an lsm
> > 
> > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> >    "fanotify: introduce directory entry pre-modify permission events"
> > I do have an intention to add create/delete/rename permission events.
> > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > far from the security_path_ lsm hooks, but not exactly in the same place
> > because we want to fsnotify hooks to be before taking vfs locks, to allow
> > listener to write to filesystem from event context.
> > There are different semantics than just ALLOW/DENY that you need,
> > therefore, only if we move the security_path_ hooks outside the
> > vfs locks, our use cases could use the same hooks
> 
> Hi Amir,
> 
> (this is a slightly long message - feel free to respond at your convenience,
> thank you in advance!)
> 
> Thanks a lot for mentioning this branch, and for the explanation! I've had a
> look and realized that the changes you have there will be very useful for
> this patch, and in fact, I've already tried a worse attempt of this (not
> included in this patch series yet) to create some security_pathname_ hooks
> that takes the parent struct path + last name as char*, that will be called
> before locking the parent.  (We can't have an unprivileged supervisor cause
> a directory to be locked indefinitely, which will also block users outside
> of the landlock domain)
> 
> I'm not sure if we can move security_path tho, because it takes the dentry
> of the child as an argument, and (I think at least for create / mknod /
> link) that dentry is only created after locking.  Hence the proposal for
> separate security_pathname_ hooks.  A search shows that currently AppArmor
> and TOMOYO (plus Landlock) uses the security_path_ hooks that would need
> changing, if we move it (and we will have to understand if the move is ok to
> do for the other two LSMs...)
> 
> However, I think it would still make a lot of sense to align with fsnotify
> here, as you have already made the changes that I would need to do anyway
> should I implement the proposed new hooks.  I think a sensible thing might
> be to have the extra LSM hooks be called alongside fsnotify_(re)name_perm -
> following the pattern of what currently happens with fsnotify_open_perm
> (i.e. security_file_open called first, then fsnotify_open_perm right after).

Yes, I think it would make sense to use the same hooks for fanotify and
other security subsystems, or at least to share them.  It would improve
consistency across different Linux subsystems and simplify changes and
maintenance where these hooks are called.

> 
> What's your thought on this? Do you think it would be a good idea to have
> LSM hook equivalents of the fsnotify (re)name perm hooks / fanotify
> pre-modify events?
> 
> Also, do you have a rough estimate of when you would upstream the
> fa/fsnotify changes? (asking just to get an idea of things, not trying to
> rush or anything :) I suspect this supervise patch would take a while
> anyway)
> 
> If you think the general idea is right, here are some further questions I
> have:
> 
> I think going by this approach any error return from security_pathname_mknod
> (or in fact, fsnotify_name_perm) when called in the open O_CREAT code path
> would end up becoming a -EROFS.  Can we turn the bool got_write in
> open_last_lookups into an int to store any error from mnt_want_write_parent,
> and return it if lookup_open returns -EROFS?  This is so that the user space
> still gets an -EACCESS on create denials by landlock (and in fact, if
> fanotify denies a create maybe we want it to return the correct errno
> also?). Maybe there is a better way, this is just my first though...
> 
> I also noticed that you don't currently have fsnotify hook calls for link
> (although it does end up invoking the name_perm hook on the dest with
> MAY_CREATE).  I want to propose also changing do_linkat to (pass the right
> flags to filename_create_srcu -> mnt_want_write_parent to) call the
> security_pathname_link hook (instead of the LSM hook it would normally call
> for a creation event in this proposal) that is basically like
> security_path_link, except passing the destination as a dir/name pair, and
> without holding vfs lock (still passing in the dentry of the source itself),
> to enable landlock to handle link requests separately. Do you think this is
> alright?  (Maybe the code would be a bit convoluted if written verbatim from
> this logic, maybe there is a better way, but the general idea is hopefully
> right)
> 
> btw, side question, I see that you added srcu read sections around the
> events - I'm not familiar with rcu/locking usage in vfs but is this for
> preventing e.g. changing the mount in some way (but still allowing access /
> changes to the directory)?
> 
> I realize I'm asking you a lot of things - big thanks in advance!  (also let
> me know if I should be pulling in other VFS maintainers)
> 
> --
> 
> For Mickaël,
> 
> Would you be on board with changing Landlock to use the new hooks as
> mentioned above?  My thinking is that it shouldn't make any difference in
> terms of security - Landlock permissions for e.g. creating/deleting files
> are based on the parent, and in fact except for link and rename, the
> hook_path_ functions in Landlock don't even use the dentry argument.  If
> you're happy with the general direction of this, I can investigate further
> and test it out etc.  This change might also reduce the impact of Landlock
> on non-landlocked processes, if we avoid holding exclusive inode lock while
> evaluating rules / traversing paths...? (Just a thought, not measured)

This looks reasonable.  As long as the semantic does not change it
should be good and Landlock tests should pass.  That would also require
other users of this hook to make sure it works for them too.  If it is
not the case, I guess we could add an alternative hooks with different
properties.  However, see the issue and the alternative approach below.

> 
> In terms of other aspects, ignoring supervisors for now, moving to these
> hooks:
> 
> - Should make no difference in the "happy" (access allowed) case
> 
> - Only when an access is disallowed, in order to know what error to
>   return, we can check (within Landlock hook handler) if the target
>   already exists - if yes, return -EEXIST, otherwise -EACCESS

We should avoid as much as possible to reimplement the error types in
fanotify/LSM hooks.  This is partially done for the VFS, and completely
duplicated for the network, which can lead to inconsistent errors.  It
would be good to only have one source of truth, but that might not be
possible in all cases.

> 
> If this is too large of a change at this point and you see / would prefer
> another way we can progress this series (at least the initial version), let
> me know.

For this patch series to work, we need all (used) LSM hooks to be
blockable (and interruptible).  We should then investigate if this is
possible, especially with the new fanotify hooks, but I don't think it
would work for all hooks (already or that will potentially be used by
Landlock).

An alternative approach would be to add a task_work (executed before
returning to user space) that will wait for the supervisor to take a
decision, and in the meantime the LSM hook would return -ERESTARTNOINTR
for the syscall to start again after the wait.  However, because the
request to the supervisor would be called outside of the hook, it should
not be possible to directly allow the request (because of race
condition) but to update the domain accordingly.  The restarted syscall
must not trigger a supervisor request though.

> 
> Kind regards,
> Tingmao
> 
> > 
> > 3. There is a recent attempt to add BPF filter to fanotify [2]
> > which is driven among other things from the long standing requirement
> > to add subtree filtering to fanotify watches.
> > The challenge with all the attempt to implement a subtree filter so far,
> > is that adding vfs performance overhead for all the users in the system
> > is unacceptable.
> > 
> > IIUC, landlock rule set can already express a subtree filter (?),
> > so it is intriguing to know if there is room for some integration on this
> > aspect, but my guess is that landlock mostly uses subtree filter
> > after filtering by specific pids (?), so it can avoid the performance
> > overhead of a subtree filter on most of the users in the system.
> > 
> > Hope this information is useful.
> > 
> > Thanks,
> > Amir.
> > 
> > [1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
> > [2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/
> 
>
Song Liu March 11, 2025, 8:58 p.m. UTC | #10
On Tue, Mar 11, 2025 at 12:28 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
> > On 3/6/25 17:07, Amir Goldstein wrote:
> > [...]
> > >
> > > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > > your patches
> > > and I have only a vague familiarity with landlock, so I cannot yet form an
> > > opinion whether this is a good idea, but I wanted to give you a few more
> > > data points about fanotify that seem relevant.
> > >
> > > 1. There is already some intersection of fanotify and audit lsm via the
> > > fanotify_response_info_audit_rule extension for permission
> > > events, so it's kind of a precedent of using fanotify to aid an lsm
> > >
> > > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> > >    "fanotify: introduce directory entry pre-modify permission events"
> > > I do have an intention to add create/delete/rename permission events.
> > > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > > far from the security_path_ lsm hooks, but not exactly in the same place
> > > because we want to fsnotify hooks to be before taking vfs locks, to allow
> > > listener to write to filesystem from event context.
> > > There are different semantics than just ALLOW/DENY that you need,
> > > therefore, only if we move the security_path_ hooks outside the
> > > vfs locks, our use cases could use the same hooks
> >
> > Hi Amir,
> >
> > (this is a slightly long message - feel free to respond at your convenience,
> > thank you in advance!)
> >
> > Thanks a lot for mentioning this branch, and for the explanation! I've had a
> > look and realized that the changes you have there will be very useful for
> > this patch, and in fact, I've already tried a worse attempt of this (not
> > included in this patch series yet) to create some security_pathname_ hooks
> > that takes the parent struct path + last name as char*, that will be called
> > before locking the parent.  (We can't have an unprivileged supervisor cause
> > a directory to be locked indefinitely, which will also block users outside
> > of the landlock domain)
> >
> > I'm not sure if we can move security_path tho, because it takes the dentry
> > of the child as an argument, and (I think at least for create / mknod /
> > link) that dentry is only created after locking.  Hence the proposal for
> > separate security_pathname_ hooks.  A search shows that currently AppArmor
> > and TOMOYO (plus Landlock) uses the security_path_ hooks that would need
> > changing, if we move it (and we will have to understand if the move is ok to
> > do for the other two LSMs...)
> >
> > However, I think it would still make a lot of sense to align with fsnotify
> > here, as you have already made the changes that I would need to do anyway
> > should I implement the proposed new hooks.  I think a sensible thing might
> > be to have the extra LSM hooks be called alongside fsnotify_(re)name_perm -
> > following the pattern of what currently happens with fsnotify_open_perm
> > (i.e. security_file_open called first, then fsnotify_open_perm right after).

I think there is a fundamental difference between LSM hooks and fsnotify,
so putting fsnotify behind some LSM hooks might be weird. Specifically,
LSM hooks are always global. If a LSM attaches to a hook, say
security_file_open, it will see all the file open calls in the system. On the
other hand, each fsnotify rule only applies to a group, so that one fanotify
handler doesn't touch files watched by another fanotify handler. Given this
difference, I am not sure how fsnotify LSM hooks should look like.

Does this make sense?

> Yes, I think it would make sense to use the same hooks for fanotify and
> other security subsystems, or at least to share them.  It would improve
> consistency across different Linux subsystems and simplify changes and
> maintenance where these hooks are called.

[...]

> > --
> >
> > For Mickaël,
> >
> > Would you be on board with changing Landlock to use the new hooks as
> > mentioned above?  My thinking is that it shouldn't make any difference in
> > terms of security - Landlock permissions for e.g. creating/deleting files
> > are based on the parent, and in fact except for link and rename, the
> > hook_path_ functions in Landlock don't even use the dentry argument.  If
> > you're happy with the general direction of this, I can investigate further
> > and test it out etc.  This change might also reduce the impact of Landlock
> > on non-landlocked processes, if we avoid holding exclusive inode lock while
> > evaluating rules / traversing paths...? (Just a thought, not measured)

I think the filter for process/thread is usually faster than the filter for
file/path/subtree? Therefore, it is better for landlock to check the filter for
process/thread first. Did I miss/misunderstand something?

Thanks,
Song




> This looks reasonable.  As long as the semantic does not change it
> should be good and Landlock tests should pass.  That would also require
> other users of this hook to make sure it works for them too.  If it is
> not the case, I guess we could add an alternative hooks with different
> properties.  However, see the issue and the alternative approach below.
>
Tingmao Wang March 11, 2025, 10:03 p.m. UTC | #11
On 3/11/25 20:58, Song Liu wrote:
> On Tue, Mar 11, 2025 at 12:28 PM Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
>>> On 3/6/25 17:07, Amir Goldstein wrote:
>>> [...]
>>>>
>>>> w.r.t sharing infrastructure with fanotify, I only looked briefly at
>>>> your patches
>>>> and I have only a vague familiarity with landlock, so I cannot yet form an
>>>> opinion whether this is a good idea, but I wanted to give you a few more
>>>> data points about fanotify that seem relevant.
>>>>
>>>> 1. There is already some intersection of fanotify and audit lsm via the
>>>> fanotify_response_info_audit_rule extension for permission
>>>> events, so it's kind of a precedent of using fanotify to aid an lsm
>>>>
>>>> 2. See this fan_pre_modify-wip branch [1] and specifically commit
>>>>     "fanotify: introduce directory entry pre-modify permission events"
>>>> I do have an intention to add create/delete/rename permission events.
>>>> Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
>>>> far from the security_path_ lsm hooks, but not exactly in the same place
>>>> because we want to fsnotify hooks to be before taking vfs locks, to allow
>>>> listener to write to filesystem from event context.
>>>> There are different semantics than just ALLOW/DENY that you need,
>>>> therefore, only if we move the security_path_ hooks outside the
>>>> vfs locks, our use cases could use the same hooks
>>>
>>> Hi Amir,
>>>
>>> (this is a slightly long message - feel free to respond at your convenience,
>>> thank you in advance!)
>>>
>>> Thanks a lot for mentioning this branch, and for the explanation! I've had a
>>> look and realized that the changes you have there will be very useful for
>>> this patch, and in fact, I've already tried a worse attempt of this (not
>>> included in this patch series yet) to create some security_pathname_ hooks
>>> that takes the parent struct path + last name as char*, that will be called
>>> before locking the parent.  (We can't have an unprivileged supervisor cause
>>> a directory to be locked indefinitely, which will also block users outside
>>> of the landlock domain)
>>>
>>> I'm not sure if we can move security_path tho, because it takes the dentry
>>> of the child as an argument, and (I think at least for create / mknod /
>>> link) that dentry is only created after locking.  Hence the proposal for
>>> separate security_pathname_ hooks.  A search shows that currently AppArmor
>>> and TOMOYO (plus Landlock) uses the security_path_ hooks that would need
>>> changing, if we move it (and we will have to understand if the move is ok to
>>> do for the other two LSMs...)
>>>
>>> However, I think it would still make a lot of sense to align with fsnotify
>>> here, as you have already made the changes that I would need to do anyway
>>> should I implement the proposed new hooks.  I think a sensible thing might
>>> be to have the extra LSM hooks be called alongside fsnotify_(re)name_perm -
>>> following the pattern of what currently happens with fsnotify_open_perm
>>> (i.e. security_file_open called first, then fsnotify_open_perm right after).
> 
> I think there is a fundamental difference between LSM hooks and fsnotify,
> so putting fsnotify behind some LSM hooks might be weird. Specifically,
> LSM hooks are always global. If a LSM attaches to a hook, say
> security_file_open, it will see all the file open calls in the system. On the
> other hand, each fsnotify rule only applies to a group, so that one fanotify
> handler doesn't touch files watched by another fanotify handler. Given this
> difference, I am not sure how fsnotify LSM hooks should look like.
> 
> Does this make sense?

To clarify, I wasn't suggesting that we put one hook _behind_ another 
("behind" in the sense of one calling the other), just that the place 
that calls the new fsnotify_name_perm/fsnotify_rename_perm hook (in 
Amir's WIP branch) could also be made to call some new LSM hooks in 
addition to fsnotify (i.e. security_pathname_create/delete/rename).

My understanding of the current code is that VFS calls security_... and 
fsnotify_... unconditionally, and the fsnotify_... functions figure out 
who needs to be notified.

> 
>> Yes, I think it would make sense to use the same hooks for fanotify and
>> other security subsystems, or at least to share them.  It would improve
>> consistency across different Linux subsystems and simplify changes and
>> maintenance where these hooks are called.

Mickaël - I'm not sure what you mean by "the same hook" - do you mean 
the relevant VFS functions could call both fsnotify and LSM hooks?

> 
> [...]
> 
>>> --
>>>
>>> For Mickaël,
>>>
>>> Would you be on board with changing Landlock to use the new hooks as
>>> mentioned above?  My thinking is that it shouldn't make any difference in
>>> terms of security - Landlock permissions for e.g. creating/deleting files
>>> are based on the parent, and in fact except for link and rename, the
>>> hook_path_ functions in Landlock don't even use the dentry argument.  If
>>> you're happy with the general direction of this, I can investigate further
>>> and test it out etc.  This change might also reduce the impact of Landlock
>>> on non-landlocked processes, if we avoid holding exclusive inode lock while
>>> evaluating rules / traversing paths...? (Just a thought, not measured)
> 
> I think the filter for process/thread is usually faster than the filter for
> file/path/subtree? Therefore, it is better for landlock to check the filter for
> process/thread first. Did I miss/misunderstand something?
>

Sorry, I should have clarified that the "impact" I'm talking about here 
isn't referring to directly the time it takes for landlock to decide if 
an access is allowed or not - in a non-landlocked process, the landlock 
hooks already returns really early and fast.  However, I'm thinking of a 
situation where a landlocked process makes lots of create/delete etc 
requests on a directory, and landlock does need to do some work (e.g. 
path traversal) to decide those access.  Because the 
security_path_mknod/unlink/... hooks are called in the VFS from a place 
where it is holding an exclusive lock on the directory (for O_CREAT'ing 
a child or other directory modification cases), when landlock is working 
out an access by the landlocked process, no other tasks will be able to 
read/write the directory (they will be blocked on the lock), even if 
their access have nothing to do with landlock.

I should add that this is probably just a very minor impact: the user 
space can't cause the dir to be blocked for arbitrary amount of time, at 
worst slowing everyone else down by a bit if it deliberately creates 
lots of layers (max 16) each with lots of rules (the ruleset evaluation 
is O(log(#rules) * dir_depth)). I didn't measure it, it's just something 
that occurred to me that could be improved by using new hooks that 
aren't called with inode locks held.

Kind regards,
Tingmao

> Thanks,
> Song
> 
> 
> 
> 
>> This looks reasonable.  As long as the semantic does not change it
>> should be good and Landlock tests should pass.  That would also require
>> other users of this hook to make sure it works for them too.  If it is
>> not the case, I guess we could add an alternative hooks with different
>> properties.  However, see the issue and the alternative approach below.
>>
Song Liu March 11, 2025, 11:23 p.m. UTC | #12
On Tue, Mar 11, 2025 at 3:03 PM Tingmao Wang <m@maowtm.org> wrote:
[...]
> >
> > I think there is a fundamental difference between LSM hooks and fsnotify,
> > so putting fsnotify behind some LSM hooks might be weird. Specifically,
> > LSM hooks are always global. If a LSM attaches to a hook, say
> > security_file_open, it will see all the file open calls in the system. On the
> > other hand, each fsnotify rule only applies to a group, so that one fanotify
> > handler doesn't touch files watched by another fanotify handler. Given this
> > difference, I am not sure how fsnotify LSM hooks should look like.
> >
> > Does this make sense?
>
> To clarify, I wasn't suggesting that we put one hook _behind_ another
> ("behind" in the sense of one calling the other), just that the place
> that calls the new fsnotify_name_perm/fsnotify_rename_perm hook (in
> Amir's WIP branch) could also be made to call some new LSM hooks in
> addition to fsnotify (i.e. security_pathname_create/delete/rename).
>
> My understanding of the current code is that VFS calls security_... and
> fsnotify_... unconditionally, and the fsnotify_... functions figure out
> who needs to be notified.

Yes, VFS calls security_* and fsnotify_* unconditionally. In this sense,
fsnotify can be implemented as a LSM. But fsnotify also supports some
non-security use cases. So it will be weird to implement it as a LSM.

Thanks,
Song

[...]
Tetsuo Handa March 12, 2025, 6:20 a.m. UTC | #13
On 2025/03/04 10:12, Tingmao Wang wrote:
>     bash # env LL_FS_RO=/usr:/lib:/bin:/etc:/dev:/proc LL_FS_RW= LL_SUPERVISE=1 ./sandboxer bash -i
>     bash # echo "Hi, $(whoami)!"
>     Hi, root!
>     bash # ls /
>     ------------- Sandboxer access request -------------
>     Process ls[166] (/usr/bin/ls) wants to read
>       /
>     (y)es/(a)lways/(n)o > y
>     ----------------------------------------------------
>     bin
>     boot
>     dev
>     ...
>     usr
>     var
>     bash # echo 'evil' >> /etc/profile
>     (a spurious create request due to current issue with dcache miss is omitted)
>     ------------- Sandboxer access request -------------
>     Process bash[163] (/usr/bin/bash) wants to read/write
>       /etc/profile
>     (y)es/(a)lways/(n)o > n
>     ----------------------------------------------------
>     bash: /etc/profile: Permission denied
>     bash #

Please check TOMOYO, for TOMOYO is already doing it.

https://tomoyo.sourceforge.net/2.6/chapter-7.html#7.3
Jan Kara March 12, 2025, 10:58 a.m. UTC | #14
On Tue 11-03-25 00:42:05, Tingmao Wang wrote:
> On 3/6/25 17:07, Amir Goldstein wrote:
> [...]
> > 
> > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > your patches
> > and I have only a vague familiarity with landlock, so I cannot yet form an
> > opinion whether this is a good idea, but I wanted to give you a few more
> > data points about fanotify that seem relevant.
> > 
> > 1. There is already some intersection of fanotify and audit lsm via the
> > fanotify_response_info_audit_rule extension for permission
> > events, so it's kind of a precedent of using fanotify to aid an lsm
> > 
> > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> >    "fanotify: introduce directory entry pre-modify permission events"
> > I do have an intention to add create/delete/rename permission events.
> > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > far from the security_path_ lsm hooks, but not exactly in the same place
> > because we want to fsnotify hooks to be before taking vfs locks, to allow
> > listener to write to filesystem from event context.
> > There are different semantics than just ALLOW/DENY that you need,
> > therefore, only if we move the security_path_ hooks outside the
> > vfs locks, our use cases could use the same hooks
> 
> Hi Amir,
> 
> (this is a slightly long message - feel free to respond at your convenience,
> thank you in advance!)
> 
> Thanks a lot for mentioning this branch, and for the explanation! I've had a
> look and realized that the changes you have there will be very useful for
> this patch, and in fact, I've already tried a worse attempt of this (not
> included in this patch series yet) to create some security_pathname_ hooks
> that takes the parent struct path + last name as char*, that will be called
> before locking the parent.  (We can't have an unprivileged supervisor cause
> a directory to be locked indefinitely, which will also block users outside
> of the landlock domain)

Well, but if you call the hook before locking the parent isn't your hook
prone to TOCTOU races? I mean you call the hook do your stuff in it and
then before the parent is locked, the whole directory hierarchy can get
reorganized (from another process) without you knowing... So I'm not sure
which guarantees you can provide for such hooks.

								Honza
Mickaël Salaün March 12, 2025, 11:50 a.m. UTC | #15
On Tue, Mar 11, 2025 at 01:58:57PM -0700, Song Liu wrote:
> On Tue, Mar 11, 2025 at 12:28 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
> > > On 3/6/25 17:07, Amir Goldstein wrote:
> > > [...]

> > > --
> > >
> > > For Mickaël,
> > >
> > > Would you be on board with changing Landlock to use the new hooks as
> > > mentioned above?  My thinking is that it shouldn't make any difference in
> > > terms of security - Landlock permissions for e.g. creating/deleting files
> > > are based on the parent, and in fact except for link and rename, the
> > > hook_path_ functions in Landlock don't even use the dentry argument.  If
> > > you're happy with the general direction of this, I can investigate further
> > > and test it out etc.  This change might also reduce the impact of Landlock
> > > on non-landlocked processes, if we avoid holding exclusive inode lock while
> > > evaluating rules / traversing paths...? (Just a thought, not measured)
> 
> I think the filter for process/thread is usually faster than the filter for
> file/path/subtree? Therefore, it is better for landlock to check the filter for
> process/thread first. Did I miss/misunderstand something?

The main reason is because only sandboxed processes should be impacted
by Landlock.  Similarly, only the security policies restricting a
process impact this process.  Using 16 layers would only impact the
process that sandboxed itself (and BTW the impact of the number of
layers would be negligible).  There is not really process filters, only
pointers set or not in tasks' credentials.
Amir Goldstein March 12, 2025, 12:26 p.m. UTC | #16
On Tue, Mar 11, 2025 at 1:42 AM Tingmao Wang <m@maowtm.org> wrote:
>
> On 3/6/25 17:07, Amir Goldstein wrote:
> [...]
> >
> > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > your patches
> > and I have only a vague familiarity with landlock, so I cannot yet form an
> > opinion whether this is a good idea, but I wanted to give you a few more
> > data points about fanotify that seem relevant.
> >
> > 1. There is already some intersection of fanotify and audit lsm via the
> > fanotify_response_info_audit_rule extension for permission
> > events, so it's kind of a precedent of using fanotify to aid an lsm
> >
> > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> >    "fanotify: introduce directory entry pre-modify permission events"
> > I do have an intention to add create/delete/rename permission events.
> > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > far from the security_path_ lsm hooks, but not exactly in the same place
> > because we want to fsnotify hooks to be before taking vfs locks, to allow
> > listener to write to filesystem from event context.
> > There are different semantics than just ALLOW/DENY that you need,
> > therefore, only if we move the security_path_ hooks outside the
> > vfs locks, our use cases could use the same hooks
>
> Hi Amir,
>
> (this is a slightly long message - feel free to respond at your
> convenience, thank you in advance!)
>
> Thanks a lot for mentioning this branch, and for the explanation! I've
> had a look and realized that the changes you have there will be very
> useful for this patch, and in fact, I've already tried a worse attempt
> of this (not included in this patch series yet) to create some
> security_pathname_ hooks that takes the parent struct path + last name
> as char*, that will be called before locking the parent.  (We can't have
> an unprivileged supervisor cause a directory to be locked indefinitely,
> which will also block users outside of the landlock domain)
>
> I'm not sure if we can move security_path tho, because it takes the
> dentry of the child as an argument, and (I think at least for create /
> mknod / link) that dentry is only created after locking.  Hence the
> proposal for separate security_pathname_ hooks.  A search shows that
> currently AppArmor and TOMOYO (plus Landlock) uses the security_path_
> hooks that would need changing, if we move it (and we will have to
> understand if the move is ok to do for the other two LSMs...)
>
> However, I think it would still make a lot of sense to align with
> fsnotify here, as you have already made the changes that I would need to
> do anyway should I implement the proposed new hooks.  I think a sensible
> thing might be to have the extra LSM hooks be called alongside
> fsnotify_(re)name_perm - following the pattern of what currently happens
> with fsnotify_open_perm (i.e. security_file_open called first, then
> fsnotify_open_perm right after).
>
> What's your thought on this? Do you think it would be a good idea to
> have LSM hook equivalents of the fsnotify (re)name perm hooks / fanotify
> pre-modify events?
>

No clear answer but some data points:

The fanotify permission hooks (formerly fsnotify_perm) used to be inside
security_file_{open,permission} so when I started looking at dir modification
permission events I started to try using the security_path_ hooks, but as the
work progressed I found that fsnotify hooks have different
requirements (no locks).

Later, we found out that the existing fsnotify permission hooks have different
needs than the existing security hooks (for pre-content events), so after:

1cda52f1b4611 fsnotify, lsm: Decouple fsnotify from lsm

fanotify is not using any LSM hooks and not dependent on CONFIG_SECURITY.

Mentally, I do find it easy for fsnotify and security hook to be next
to each other,
unless there is a reason to do it otherwise, because from vfs POV they
are mostly
the same, but note that my branch implements the new fsnotify hooks actually as
scopes (for sb_write_srcu) and in some cases as other people have mentioned
on this thread, the security hooks need to be inside the vfs locks,
while the fsnotify
hooks need to be outside of the locks.

> Also, do you have a rough estimate of when you would upstream the
> fa/fsnotify changes? (asking just to get an idea of things, not trying
> to rush or anything :) I suspect this supervise patch would take a while
> anyway)
>

Besides my time to work on this, these patches are waiting for some
other things.

One is that I was waiting with promoting those patches until pre-content patches
got merged and that took longer than expected and even now there may
need to be follow ups in the next cycle.

Another thing is that these patches rely on the sb_write_srcu design concept
which is pretty intrusive to vfs, so I still need to sell this to vfs people.
I am going to make another shot of an elevator pitch at LSFMM in two weeks,
If we get past this design hurdle, the rest of the work will depend on
how much time I can spend on it.

I do *want* to  make the patches in time for the 2025 LTS kernel, but it may
not be a realistic goal.

One thing that helped a lot with pushing pre-content events is that Meta
already had a production use case for it.

I do not know of anyone else that requested the pre-directory-modify
hooks (besides myself), so that may make the sale a bit harder.
If there is someone out there that does need the pre-directory-modify
hooks now would be a good time to speak up.

> If you think the general idea is right, here are some further questions
> I have:
>
> I think going by this approach any error return from
> security_pathname_mknod (or in fact, fsnotify_name_perm) when called in
> the open O_CREAT code path would end up becoming a -EROFS.  Can we turn
> the bool got_write in open_last_lookups into an int to store any error
> from mnt_want_write_parent, and return it if lookup_open returns -EROFS?

IIUC you mean like this:

               err = mnt_want_write_parent(&nd->path, MAY_CREATE,
                                                  &res, &idx);
               if (err && err != -EROFS)
                       return err;
               got_write = !err;
               /*
                * do _not_ fail yet - ....

Yes, I think that is better, because the logic in the comment only
applies to EROFS.

>   This is so that the user space still gets an -EACCESS on create
> denials by landlock (and in fact, if fanotify denies a create maybe we
> want it to return the correct errno also?). Maybe there is a better way,
> this is just my first though...
>
> I also noticed that you don't currently have fsnotify hook calls for
> link (although it does end up invoking the name_perm hook on the dest
> with MAY_CREATE).  I want to propose also changing do_linkat to (pass
> the right flags to filename_create_srcu -> mnt_want_write_parent to)
> call the security_pathname_link hook (instead of the LSM hook it would
> normally call for a creation event in this proposal) that is basically
> like security_path_link, except passing the destination as a dir/name
> pair, and without holding vfs lock (still passing in the dentry of the
> source itself), to enable landlock to handle link requests separately.
> Do you think this is alright?  (Maybe the code would be a bit convoluted
> if written verbatim from this logic, maybe there is a better way, but
> the general idea is hopefully right)

I am not sure I understand your question.
fsnotify does not need to know this is a LINK and not CREATE.
I do not know what the requirements of other LSMs for those hooks,
so hard to say if it is ok to move those hooks but my guess is not ok.

>
> btw, side question, I see that you added srcu read sections around the
> events - I'm not familiar with rcu/locking usage in vfs but is this for
> preventing e.g. changing the mount in some way (but still allowing
> access / changes to the directory)?
>

No. this is meant to accommodate fsnotify_wait_handle_events()
(see last patch) - wait for in-flight modifications to complete without blocking
new modifications. That's the concept that I need to sell to vfs people.

Thanks,
Amir.