diff mbox

[CFT,00/10] Making new mounts of proc and sysfs as safe as bind mounts (take 2)

Message ID 87iobcfkwx.fsf@x220.int.ebiederm.org (mailing list archive)
State New, archived
Headers show

Commit Message

Eric W. Biederman May 28, 2015, 9:32 p.m. UTC
Richard Weinberger <richard@nod.at> writes:

> Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
>>> FWIW, it breaks also libvirt-lxc:
>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
>> 
>> Interesting.  I had not anticipated a failure there?  And it is failing
>> in remount?  Oh that is interesting.
>> 
>> That implies that there is some flag of the original mount of /proc that
>> the remount of /proc/sys is clearing, and that previously 
>> 
>> The flags specified are current rdonly,remount,bind so I expect there
>> are some other flags on proc that libvirt-lxc is clearing by accident
>> and we did not fail before because the kernel was not enforcing things.
>
> Please see:
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933
> lxcContainerMountBasicFS()
>
> and:
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850
> lxcBasicMounts
>
>> What are the mount flags in a working libvirt-lxc?
>
> See:
> test1:~ # cat /proc/self/mountinfo
> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw

> If you need more info, please let me know. :-)

Oh interesting I had not realized libvirt-lxc had grown an unprivileged
mode using user namespaces.

This does appear to be a classic remount bug, where you are not
preserving the permissions.  It appears the fact that the code
failed to enforce locked permissions on the fresh mount of proc
was hiding this bug until now.

I expect what you actually want is the code below:


As the there is little point in making /proc/sys read-only in a
user-namespace, as the permission checks are uid based and no-one should
have the global uid 0 in your container.  Making mounting /proc/sys
read-only rather pointless.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Richard Weinberger May 28, 2015, 9:46 p.m. UTC | #1
Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
> Richard Weinberger <richard@nod.at> writes:
> 
>> Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
>>>> FWIW, it breaks also libvirt-lxc:
>>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
>>>
>>> Interesting.  I had not anticipated a failure there?  And it is failing
>>> in remount?  Oh that is interesting.
>>>
>>> That implies that there is some flag of the original mount of /proc that
>>> the remount of /proc/sys is clearing, and that previously 
>>>
>>> The flags specified are current rdonly,remount,bind so I expect there
>>> are some other flags on proc that libvirt-lxc is clearing by accident
>>> and we did not fail before because the kernel was not enforcing things.
>>
>> Please see:
>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933
>> lxcContainerMountBasicFS()
>>
>> and:
>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850
>> lxcBasicMounts
>>
>>> What are the mount flags in a working libvirt-lxc?
>>
>> See:
>> test1:~ # cat /proc/self/mountinfo
>> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
> 
>> If you need more info, please let me know. :-)
> 
> Oh interesting I had not realized libvirt-lxc had grown an unprivileged
> mode using user namespaces.

Yep. It works quite well. I've migrated all my containers from lxc
to libvirt-lxc because libvirt-lxc had a working user-namespace
implementation before lxc.

> This does appear to be a classic remount bug, where you are not
> preserving the permissions.  It appears the fact that the code
> failed to enforce locked permissions on the fresh mount of proc
> was hiding this bug until now.
> 
> I expect what you actually want is the code below:
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 9a9ae5c2aaf0..f008a7484bfe 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -850,7 +850,7 @@ typedef struct {
>  
>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
>      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> 
> Or possibly just:
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 9a9ae5c2aaf0..a60ccbd12bfc 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -850,7 +850,7 @@ typedef struct {
>  
>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
>      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },

I'll test your diff tomorrow with a fresh brain.
I sent a similar patch to libvirt folks some time ago, looks like it got lost. ;-\

> As the there is little point in making /proc/sys read-only in a
> user-namespace, as the permission checks are uid based and no-one should
> have the global uid 0 in your container.  Making mounting /proc/sys
> read-only rather pointless.

Yeah, I've been ranting about that for ages...
libvirt-lxc contains a lot of cruft to make privileged container
kind of secure. Some users still fear using the user-namespace.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel P. Berrangé June 16, 2015, 12:30 p.m. UTC | #2
On Thu, May 28, 2015 at 11:46:50PM +0200, Richard Weinberger wrote:
> Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
> > Richard Weinberger <richard@nod.at> writes:
> > 
> >> Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
> >>>> FWIW, it breaks also libvirt-lxc:
> >>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
> >>>
> >>> Interesting.  I had not anticipated a failure there?  And it is failing
> >>> in remount?  Oh that is interesting.
> >>>
> >>> That implies that there is some flag of the original mount of /proc that
> >>> the remount of /proc/sys is clearing, and that previously 
> >>>
> >>> The flags specified are current rdonly,remount,bind so I expect there
> >>> are some other flags on proc that libvirt-lxc is clearing by accident
> >>> and we did not fail before because the kernel was not enforcing things.
> >>
> >> Please see:
> >> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933
> >> lxcContainerMountBasicFS()
> >>
> >> and:
> >> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850
> >> lxcBasicMounts
> >>
> >>> What are the mount flags in a working libvirt-lxc?
> >>
> >> See:
> >> test1:~ # cat /proc/self/mountinfo
> >> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> >> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
> > 
> >> If you need more info, please let me know. :-)
> > 
> > Oh interesting I had not realized libvirt-lxc had grown an unprivileged
> > mode using user namespaces.
> 
> Yep. It works quite well. I've migrated all my containers from lxc
> to libvirt-lxc because libvirt-lxc had a working user-namespace
> implementation before lxc.
> 
> > This does appear to be a classic remount bug, where you are not
> > preserving the permissions.  It appears the fact that the code
> > failed to enforce locked permissions on the fresh mount of proc
> > was hiding this bug until now.
> > 
> > I expect what you actually want is the code below:
> > 
> > diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> > index 9a9ae5c2aaf0..f008a7484bfe 100644
> > --- a/src/lxc/lxc_container.c
> > +++ b/src/lxc/lxc_container.c
> > @@ -850,7 +850,7 @@ typedef struct {
> >  
> >  static const virLXCBasicMountInfo lxcBasicMounts[] = {
> >      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> > -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> > +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> >      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
> >      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
> >      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> > 
> > Or possibly just:
> > 
> > diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> > index 9a9ae5c2aaf0..a60ccbd12bfc 100644
> > --- a/src/lxc/lxc_container.c
> > +++ b/src/lxc/lxc_container.c
> > @@ -850,7 +850,7 @@ typedef struct {
> >  
> >  static const virLXCBasicMountInfo lxcBasicMounts[] = {
> >      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> > -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> > +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
> >      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
> >      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
> >      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> 
> I'll test your diff tomorrow with a fresh brain.
> I sent a similar patch to libvirt folks some time ago, looks like it got lost. ;-\
> 
> > As the there is little point in making /proc/sys read-only in a
> > user-namespace, as the permission checks are uid based and no-one should
> > have the global uid 0 in your container.  Making mounting /proc/sys
> > read-only rather pointless.
> 
> Yeah, I've been ranting about that for ages...
> libvirt-lxc contains a lot of cruft to make privileged container
> kind of secure. Some users still fear using the user-namespace.

Yes, we've discussed this before and I'd like to simplify this. The
thing that has been stopping me tackling it has been figuring out a
way to ensure we don't change semantics for existing deployed users.
ie when RHEL-7 rebases to newer libvirt, I don't want existing
containers to suddenly change their setup, because although the
existing setup is sub-optimal, some apps / users might be relying
on its behaviour in ways I can't predict.

I do believe I have figured out a way to allow backwards compatibility
now though, so we should have able to have another stab at simplifying
and removing this cruft for newly deployed containers.

Regards,
Daniel
diff mbox

Patch

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c2aaf0..f008a7484bfe 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,7 +850,7 @@  typedef struct {
 
 static const virLXCBasicMountInfo lxcBasicMounts[] = {
     { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
-    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
+    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
     { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
     { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
     { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },

Or possibly just:

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c2aaf0..a60ccbd12bfc 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,7 +850,7 @@  typedef struct {
 
 static const virLXCBasicMountInfo lxcBasicMounts[] = {
     { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
-    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
+    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
     { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
     { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
     { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },