[v2] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
diff mbox

Message ID 148407302133.16047.411379729888561193.stgit@buzz
State New
Headers show

Commit Message

Konstantin Khlebnikov Jan. 10, 2017, 6:30 p.m. UTC
If overlay was mounted by root then quota set for upper layer does not work
because overlay now always use mounter's credentials for operations.
Also overlay might deplete reserved space and inodes in ext4.

This patch drops capability SYS_RESOURCE from saved credentials.
This affects creation new files, whiteouts, and copy-up operations.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/super.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Vivek Goyal Jan. 10, 2017, 7:17 p.m. UTC | #1
On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
> If overlay was mounted by root then quota set for upper layer does not work
> because overlay now always use mounter's credentials for operations.
> Also overlay might deplete reserved space and inodes in ext4.
> 
> This patch drops capability SYS_RESOURCE from saved credentials.
> This affects creation new files, whiteouts, and copy-up operations.
> 

I am not an expert in this area, but I thought previous patch was
better. I am not sure why overlay internal operations should be
done without CAP_SYS_RESOURCES when caller has CAP_SYS_RESOURCES. That
might be counter-intuitive.

If some task is allowed to bypass quota limitations on a file system
then same should be true when task is working on overlay.

Similary if a task is allowed to use reserved space on filesystem, then same
task should be allowed to use reserved space on underlying filesystem
when doing overlay.  It should not be overlay's job to prevent that?

May be it is just me....

Vivek


> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/overlayfs/super.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 20f48abbb82f..8dba982e1af5 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -701,6 +701,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  	unsigned int stacklen = 0;
>  	unsigned int i;
>  	bool remote = false;
> +	struct cred *cred;
>  	int err;
>  
>  	err = -ENOMEM;
> @@ -870,10 +871,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  	else
>  		sb->s_d_op = &ovl_dentry_operations;
>  
> -	ufs->creator_cred = prepare_creds();
> -	if (!ufs->creator_cred)
> +	cred = prepare_creds();
> +	if (!cred)
>  		goto out_put_lower_mnt;
>  
> +	/* Never override disk quota limits or use reserved space */
> +	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
> +	ufs->creator_cred = cred;
> +
>  	err = -ENOMEM;
>  	oe = ovl_alloc_entry(numlower);
>  	if (!oe)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Goldstein July 22, 2017, 9:30 a.m. UTC | #2
Bumped into this patch (Now upstream commit 51f8f3c4e225) and realized
it is missing cc: stable # v4.8

At least this docker PR suggests that regression introduced in v4.8 will not be
appreciated down the road:
https://github.com/moby/moby/issues/29364


On Tue, Jan 10, 2017 at 9:17 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
>> If overlay was mounted by root then quota set for upper layer does not work
>> because overlay now always use mounter's credentials for operations.
>> Also overlay might deplete reserved space and inodes in ext4.
>>
>> This patch drops capability SYS_RESOURCE from saved credentials.
>> This affects creation new files, whiteouts, and copy-up operations.
>>
>
> I am not an expert in this area, but I thought previous patch was
> better. I am not sure why overlay internal operations should be
> done without CAP_SYS_RESOURCES when caller has CAP_SYS_RESOURCES. That
> might be counter-intuitive.
>
> If some task is allowed to bypass quota limitations on a file system
> then same should be true when task is working on overlay.
>
> Similary if a task is allowed to use reserved space on filesystem, then same
> task should be allowed to use reserved space on underlying filesystem
> when doing overlay.  It should not be overlay's job to prevent that?
>
> May be it is just me....
>

Vivek,

Since your question was not answered in this thread, IMO, your concern
is just, but in practice I think that:
1. It's going to be harder to implement for every operation to combine the
    mounter's creds with the process capabilities... weird
2. The use case of ext4 reserved blocks is to allow sys admin some slack
    for disk allocations that are needed in order to free up disk space or for
    other critical tasks to prevent the system from hanging. It doesn't sound
    like this use case fits an overlayfs mount that well.
3. FYI, xfs project quota (which as you know can be applied to docker
    overlayfs container) does not check CAP_SYS_RESOURCES at all.
    and if and when ext4 project quotas can also be applied to docker
    overlayfs container, I am sure that containers admin will not appreciate
    a container exceeding its quota, even if that was a privileged process
    writing to that container

So IMO that fix as it is is good for all practical purpose.

Cheers,
Amir.

>
>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
>> Cc: Vivek Goyal <vgoyal@redhat.com>
>> Cc: Miklos Szeredi <mszeredi@redhat.com>
>> ---
>>  fs/overlayfs/super.c |    9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index 20f48abbb82f..8dba982e1af5 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -701,6 +701,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>       unsigned int stacklen = 0;
>>       unsigned int i;
>>       bool remote = false;
>> +     struct cred *cred;
>>       int err;
>>
>>       err = -ENOMEM;
>> @@ -870,10 +871,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>       else
>>               sb->s_d_op = &ovl_dentry_operations;
>>
>> -     ufs->creator_cred = prepare_creds();
>> -     if (!ufs->creator_cred)
>> +     cred = prepare_creds();
>> +     if (!cred)
>>               goto out_put_lower_mnt;
>>
>> +     /* Never override disk quota limits or use reserved space */
>> +     cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
>> +     ufs->creator_cred = cred;
>> +
>>       err = -ENOMEM;
>>       oe = ovl_alloc_entry(numlower);
>>       if (!oe)
Miklos Szeredi July 24, 2017, 8:15 a.m. UTC | #3
On Sat, Jul 22, 2017 at 11:30 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Bumped into this patch (Now upstream commit 51f8f3c4e225) and realized
> it is missing cc: stable # v4.8
>
> At least this docker PR suggests that regression introduced in v4.8 will not be
> appreciated down the road:
> https://github.com/moby/moby/issues/29364

Greg,

Can you please queue 51f8f3c4e225 ("ovl: drop CAP_SYS_RESOURCE from
saved mounter's credentials") for 4.9.y?

Thanks,
Miklos


>
>
> On Tue, Jan 10, 2017 at 9:17 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
>>> If overlay was mounted by root then quota set for upper layer does not work
>>> because overlay now always use mounter's credentials for operations.
>>> Also overlay might deplete reserved space and inodes in ext4.
>>>
>>> This patch drops capability SYS_RESOURCE from saved credentials.
>>> This affects creation new files, whiteouts, and copy-up operations.
>>>
>>
>> I am not an expert in this area, but I thought previous patch was
>> better. I am not sure why overlay internal operations should be
>> done without CAP_SYS_RESOURCES when caller has CAP_SYS_RESOURCES. That
>> might be counter-intuitive.
>>
>> If some task is allowed to bypass quota limitations on a file system
>> then same should be true when task is working on overlay.
>>
>> Similary if a task is allowed to use reserved space on filesystem, then same
>> task should be allowed to use reserved space on underlying filesystem
>> when doing overlay.  It should not be overlay's job to prevent that?
>>
>> May be it is just me....
>>
>
> Vivek,
>
> Since your question was not answered in this thread, IMO, your concern
> is just, but in practice I think that:
> 1. It's going to be harder to implement for every operation to combine the
>     mounter's creds with the process capabilities... weird
> 2. The use case of ext4 reserved blocks is to allow sys admin some slack
>     for disk allocations that are needed in order to free up disk space or for
>     other critical tasks to prevent the system from hanging. It doesn't sound
>     like this use case fits an overlayfs mount that well.
> 3. FYI, xfs project quota (which as you know can be applied to docker
>     overlayfs container) does not check CAP_SYS_RESOURCES at all.
>     and if and when ext4 project quotas can also be applied to docker
>     overlayfs container, I am sure that containers admin will not appreciate
>     a container exceeding its quota, even if that was a privileged process
>     writing to that container
>
> So IMO that fix as it is is good for all practical purpose.
>
> Cheers,
> Amir.
>
>>
>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> Fixes: 1175b6b8d963 ("ovl: do operations on underlying file system in mounter's context")
>>> Cc: Vivek Goyal <vgoyal@redhat.com>
>>> Cc: Miklos Szeredi <mszeredi@redhat.com>
>>> ---
>>>  fs/overlayfs/super.c |    9 +++++++--
>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>>> index 20f48abbb82f..8dba982e1af5 100644
>>> --- a/fs/overlayfs/super.c
>>> +++ b/fs/overlayfs/super.c
>>> @@ -701,6 +701,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>>       unsigned int stacklen = 0;
>>>       unsigned int i;
>>>       bool remote = false;
>>> +     struct cred *cred;
>>>       int err;
>>>
>>>       err = -ENOMEM;
>>> @@ -870,10 +871,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>>       else
>>>               sb->s_d_op = &ovl_dentry_operations;
>>>
>>> -     ufs->creator_cred = prepare_creds();
>>> -     if (!ufs->creator_cred)
>>> +     cred = prepare_creds();
>>> +     if (!cred)
>>>               goto out_put_lower_mnt;
>>>
>>> +     /* Never override disk quota limits or use reserved space */
>>> +     cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
>>> +     ufs->creator_cred = cred;
>>> +
>>>       err = -ENOMEM;
>>>       oe = ovl_alloc_entry(numlower);
>>>       if (!oe)
Miklos Szeredi July 24, 2017, 8:19 a.m. UTC | #4
On Tue, Jan 10, 2017 at 8:17 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Jan 10, 2017 at 09:30:21PM +0300, Konstantin Khlebnikov wrote:
>> If overlay was mounted by root then quota set for upper layer does not work
>> because overlay now always use mounter's credentials for operations.
>> Also overlay might deplete reserved space and inodes in ext4.
>>
>> This patch drops capability SYS_RESOURCE from saved credentials.
>> This affects creation new files, whiteouts, and copy-up operations.
>>
>
> I am not an expert in this area, but I thought previous patch was
> better. I am not sure why overlay internal operations should be
> done without CAP_SYS_RESOURCES when caller has CAP_SYS_RESOURCES. That
> might be counter-intuitive.
>
> If some task is allowed to bypass quota limitations on a file system
> then same should be true when task is working on overlay.
>
> Similary if a task is allowed to use reserved space on filesystem, then same
> task should be allowed to use reserved space on underlying filesystem
> when doing overlay.  It should not be overlay's job to prevent that?
>
> May be it is just me....

Well, depends how you look at at it.  Overlayfs just uses the
underlying filesystem as storage.  So this patch basically asserts
that overlayfs can only use non-reserved space for its storage.  I
don't see a problem with and it's the simpler fix, but if real use
cases turn up then this can be revisited.

Thanks,
Miklos
Greg KH July 25, 2017, 3:47 a.m. UTC | #5
On Mon, Jul 24, 2017 at 10:15:22AM +0200, Miklos Szeredi wrote:
> On Sat, Jul 22, 2017 at 11:30 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > Bumped into this patch (Now upstream commit 51f8f3c4e225) and realized
> > it is missing cc: stable # v4.8
> >
> > At least this docker PR suggests that regression introduced in v4.8 will not be
> > appreciated down the road:
> > https://github.com/moby/moby/issues/29364
> 
> Greg,
> 
> Can you please queue 51f8f3c4e225 ("ovl: drop CAP_SYS_RESOURCE from
> saved mounter's credentials") for 4.9.y?

Now queued up, thanks.

greg k-h

Patch
diff mbox

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 20f48abbb82f..8dba982e1af5 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -701,6 +701,7 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	unsigned int stacklen = 0;
 	unsigned int i;
 	bool remote = false;
+	struct cred *cred;
 	int err;
 
 	err = -ENOMEM;
@@ -870,10 +871,14 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	else
 		sb->s_d_op = &ovl_dentry_operations;
 
-	ufs->creator_cred = prepare_creds();
-	if (!ufs->creator_cred)
+	cred = prepare_creds();
+	if (!cred)
 		goto out_put_lower_mnt;
 
+	/* Never override disk quota limits or use reserved space */
+	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
+	ufs->creator_cred = cred;
+
 	err = -ENOMEM;
 	oe = ovl_alloc_entry(numlower);
 	if (!oe)