[2/2] acct: block access to kernel internal filesystems

Message ID 20250211-work-acct-v1-2-1c16aecab8b3@kernel.org (mailing list archive)
State New
Series acct: don't allow access to internal filesystems

Commit Message

Christian Brauner Feb. 11, 2025, 5:16 p.m. UTC
There's no point in allowing anything kernel internal nor procfs or
sysfs.

Reported-by: Zicheng Qu <quzicheng@huawei.com>
Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 kernel/acct.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Amir Goldstein Feb. 11, 2025, 8:30 p.m. UTC | #1
On Tue, Feb 11, 2025 at 6:17 PM Christian Brauner <brauner@kernel.org> wrote:
>
> There's no point in allowing anything kernel internal nor procfs or
> sysfs.
>
> Reported-by: Zicheng Qu <quzicheng@huawei.com>
> Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  kernel/acct.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 48283efe8a12..6520baa13669 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -243,6 +243,20 @@ static int acct_on(struct filename *pathname)
>                 return -EACCES;
>         }
>
> +       /* Exclude kernel internal filesystems. */
> +       if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) {
> +               kfree(acct);
> +               filp_close(file, NULL);
> +               return -EINVAL;
> +       }
> +
> +       /* Exclude procfs and sysfs. */
> +       if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) {
> +               kfree(acct);
> +               filp_close(file, NULL);
> +               return -EINVAL;
> +       }
> +
>         if (!(file->f_mode & FMODE_CAN_WRITE)) {
>                 kfree(acct);
>                 filp_close(file, NULL);
>
> --
> 2.47.2
>
>
Al Viro Feb. 11, 2025, 8:54 p.m. UTC | #2
On Tue, Feb 11, 2025 at 06:16:00PM +0100, Christian Brauner wrote:
> There's no point in allowing anything kernel internal nor procfs or
> sysfs.

> +	/* Exclude kernel internal filesystems. */
> +	if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) {
> +		kfree(acct);
> +		filp_close(file, NULL);
> +		return -EINVAL;
> +	}
> +
> +	/* Exclude procfs and sysfs. */
> +	if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) {
> +		kfree(acct);
> +		filp_close(file, NULL);
> +		return -EINVAL;
> +	}

That looks like a really weird way to test it, especially the second
part...
Christian Brauner Feb. 12, 2025, 10:32 a.m. UTC | #3
On Tue, Feb 11, 2025 at 08:54:18PM +0000, Al Viro wrote:
> On Tue, Feb 11, 2025 at 06:16:00PM +0100, Christian Brauner wrote:
> > There's no point in allowing anything kernel internal nor procfs or
> > sysfs.
> 
> > +	/* Exclude kernel internal filesystems. */
> > +	if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) {
> > +		kfree(acct);
> > +		filp_close(file, NULL);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Exclude procfs and sysfs. */
> > +	if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) {
> > +		kfree(acct);
> > +		filp_close(file, NULL);
> > +		return -EINVAL;
> > +	}
> 
> That looks like a really weird way to test it, especially the second
> part...

SB_I_USERNS_VISIBLE has only ever applied to procfs and sysfs.

Granted, its main purpose is to indicate that a caller in an
unprivileged userns might have a restricted view of sysfs/procfs already
so mounting it again must be prevented to not reveal any overmounted
entities (a strong candidate for the prize of least transparent cause of
EPERMs from the kernel imho).

That flag could reasonably go and be replaced by explicit checks for
procfs and sysfs in general because we haven't ever grown any additional
candidates for that mess and it's unlikely that we ever will. But as
long as we have this I don't mind using it. If it's important to you
I'll happily change it. If you can live with the comment I added I'll
leave it.

To be perfectly blunt: Imho, this API isn't worth massaging a single
line of VFS code, which is why this isn't going to win the prize for
prettiest fix of a NULL-deref.
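
For illustration only, the "explicit checks for procfs and sysfs" mentioned above can be sketched from user space by comparing a filesystem's magic number. This is not code from the thread: the helper names `magic_is_proc_or_sysfs()` and `path_is_proc_or_sysfs()` are hypothetical, and the magic values are repeated from `<linux/magic.h>` so the sketch stays self-contained. Inside the kernel the analogous comparison would presumably be made against the superblock's `s_magic` field, which is an assumption, not something the thread specifies.

```c
#include <stdbool.h>
#include <sys/vfs.h>

/* Magic numbers as published in <linux/magic.h>, repeated for self-containment. */
#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0
#endif
#ifndef SYSFS_MAGIC
#define SYSFS_MAGIC 0x62656572
#endif

/* Hypothetical helper: true if the magic number identifies procfs or sysfs. */
static bool magic_is_proc_or_sysfs(long magic)
{
	return magic == PROC_SUPER_MAGIC || magic == SYSFS_MAGIC;
}

/*
 * Hypothetical helper: returns 1 if path lives on procfs or sysfs,
 * 0 if it lives elsewhere, and -1 if statfs() fails (errno is set).
 */
static int path_is_proc_or_sysfs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	return magic_is_proc_or_sysfs((long)st.f_type) ? 1 : 0;
}
```

Compared with testing SB_I_USERNS_VISIBLE, a check of this shape names the two filesystems directly, at the cost of touching more code than the one-line flag test in the patch.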

Patch

diff --git a/kernel/acct.c b/kernel/acct.c
index 48283efe8a12..6520baa13669 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -243,6 +243,20 @@  static int acct_on(struct filename *pathname)
 		return -EACCES;
 	}
 
+	/* Exclude kernel internal filesystems. */
+	if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) {
+		kfree(acct);
+		filp_close(file, NULL);
+		return -EINVAL;
+	}
+
+	/* Exclude procfs and sysfs. */
+	if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) {
+		kfree(acct);
+		filp_close(file, NULL);
+		return -EINVAL;
+	}
+
 	if (!(file->f_mode & FMODE_CAN_WRITE)) {
 		kfree(acct);
 		filp_close(file, NULL);
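
The two added checks can be read as a single predicate over the superblock's flag words. The user-space sketch below models that predicate so it can be exercised outside the kernel. `struct sb_model`, `sb_is_excluded()` and `sb_model_selfcheck()` are hypothetical names introduced for illustration, and the flag values are assumptions modelled on `include/linux/fs.h` rather than anything stated in the patch; check them against your tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, modelled on include/linux/fs.h; verify against your tree. */
#define SB_NOUSER           (1UL << 18)
#define SB_KERNMOUNT        (1UL << 22)
#define SB_I_USERNS_VISIBLE 0x00000010UL

/* Hypothetical stand-in for the two struct super_block fields the patch reads. */
struct sb_model {
	unsigned long s_flags;
	unsigned long s_iflags;
};

/* Mirrors the two new checks in acct_on(): true means the file is refused. */
bool sb_is_excluded(const struct sb_model *sb)
{
	/* Kernel internal filesystems. */
	if (sb->s_flags & (SB_NOUSER | SB_KERNMOUNT))
		return true;
	/* procfs and sysfs, identified via SB_I_USERNS_VISIBLE. */
	if (sb->s_iflags & SB_I_USERNS_VISIBLE)
		return true;
	return false;
}

/* Small self-check exercising the three cases the patch distinguishes. */
void sb_model_selfcheck(void)
{
	struct sb_model regular = { 0, 0 };
	struct sb_model internal = { SB_KERNMOUNT, 0 };
	struct sb_model procfs_like = { 0, SB_I_USERNS_VISIBLE };

	assert(!sb_is_excluded(&regular));
	assert(sb_is_excluded(&internal));
	assert(sb_is_excluded(&procfs_like));
}
```

In the patch itself each branch also frees `acct` and closes the file before returning -EINVAL; the model leaves that cleanup out and only captures the decision.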