btrfs: allow processes with cap_sys_resource to exceed quota
diff mbox

Message ID 20170421100943.GA2942@ircssh-2.c.rugged-nimbus-611.internal
State New
Headers show

Commit Message

Sargun Dhillon April 21, 2017, 10:09 a.m. UTC
This patch allows processes with CAP_SYS_RESOURCE to exceed the qgroup
limit. It's useful for administrative programs, such as log rotation,
that may need to temporarily use more disk space in order to free up
a greater amount of overall disk space without yielding more disk
space to the rest of userland.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
---
 fs/btrfs/qgroup.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Sargun Dhillon April 21, 2017, 10:34 a.m. UTC | #1
The log rotation code that I have requires creating the new file
before it calls FICLONERANGE. Since it tries to copy close to a
newline and not the filesystem block boundary, it means that some of
the clone isn't a lazy copy, and instead has to be cloned byte for
byte. At a minimum I need a mechanism in order to circumvent the
metadata quota.

Since we don't have the notion of "operator reserved space", I'm not
sure of another straightforward way to do this. The other idea I had
was to introduce the idea of an operator UID to btrfs_qgroup, and
introduce a rsv_operator_rfer, and rsv_operator_excl, but that seems
to be killing the mosquito with the cannon.

On Fri, Apr 21, 2017 at 5:09 AM, Sargun Dhillon <sargun@sargun.me> wrote:
> This patch allows processes with CAP_SYS_RESOURCE to exceed the qgroup
> limit. It's useful for administrative programs, such as log rotation,
> that may need to temporarily use more disk space in order to free up
> a greater amount of overall disk space without yielding more disk
> space to the rest of userland.
>
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> ---
>  fs/btrfs/qgroup.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index a59801d..b0af39c 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2349,6 +2349,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>
>  static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
>  {
> +       if (capable(CAP_SYS_RESOURCE))
> +               return true;
> +
>         if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>             qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer)
>                 return false;
> --
> 2.9.3
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adam Borowski April 21, 2017, 11:05 a.m. UTC | #2
On Fri, Apr 21, 2017 at 10:09:46AM +0000, Sargun Dhillon wrote:
> This patch allows processes with CAP_SYS_RESOURCE to exceed the qgroup
> limit. It's useful for administrative programs, such as log rotation,
> that may need to temporarily use more disk space in order to free up
> a greater amount of overall disk space without yielding more disk
> space to the rest of userland.

>  static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
>  {
> +	if (capable(CAP_SYS_RESOURCE))
> +		return true;
> +

I don't think it's a good idea to make random root-uid processes ignore
qgroups completely.  Just because the daemon in question doesn't use a
separate uid is no reason to not protect you from it consuming all the disk
space.

A temporary request "please let me exceed limits" would make sense, though.

The problem with CAP_SYS_RESOURCE is that it's always on unless explicitly
dropped.
Sargun Dhillon April 21, 2017, 12:27 p.m. UTC | #3
What do you think about putting this behaviour behind a sysctl? Seems
better than to start introducing a new mechanism of marking tasks?

On Fri, Apr 21, 2017 at 6:05 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> On Fri, Apr 21, 2017 at 10:09:46AM +0000, Sargun Dhillon wrote:
>> This patch allows processes with CAP_SYS_RESOURCE to exceed the qgroup
>> limit. It's useful for administrative programs, such as log rotation,
>> that may need to temporarily use more disk space in order to free up
>> a greater amount of overall disk space without yielding more disk
>> space to the rest of userland.
>
>>  static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
>>  {
>> +     if (capable(CAP_SYS_RESOURCE))
>> +             return true;
>> +
>
> I don't think it's a good idea to make random root-uid processes ignore
> qgroups completely.  Just because the daemon in question doesn't use a
> separate uid is no reason to not protect you from it consuming all the disk
> space.
>
> A temporary request "please let me exceed limits" would make sense, though.
>
> The problem with CAP_SYS_RESOURCE is that it's always on unless explicitly
> dropped.
>
> --
> ⢀⣴⠾⠻⢶⣦⠀ Meow!
> ⣾⠁⢠⠒⠀⣿⡁
> ⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
> ⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba May 5, 2017, 6:22 p.m. UTC | #4
On Fri, Apr 21, 2017 at 07:27:23AM -0500, Sargun Dhillon wrote:
> What do you think about putting this behaviour behind a sysctl? Seems
> better than to start introducing a new mechanism of marking tasks?

Technically it's easy to add own btrfs-specific ioctl, temporarily
turning off quota limits, but I'm still not sure about all the
consequences as this would apply to the whole system, no? The
capabilities are per process, much more fine grained. I don't have other
ideas how to address the problem though.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sargun Dhillon May 5, 2017, 6:25 p.m. UTC | #5
If you see my follow-on patch, it allows disabling the quota limit for
folks with cap_sys_resource per filesystem. I don't want to have any
process to be able to turn off quota limits, but just the process that
is the logrotator (and has the proper capabilities). Unfortunately,
most folks don't lock down their capabilities, so I agree, making it
blindly based on capabilities seems like a poor idea.

On Fri, May 5, 2017 at 11:22 AM, David Sterba <dsterba@suse.cz> wrote:
> On Fri, Apr 21, 2017 at 07:27:23AM -0500, Sargun Dhillon wrote:
>> What do you think about putting this behaviour behind a sysctl? Seems
>> better than to start introducing a new mechanism of marking tasks?
>
> Technically it's easy to add own btrfs-specific ioctl, temporarily
> turning off quota limits, but I'm still not sure about all the
> consequences as this would apply to the whole system, no? The
> capabilities are per process, much more fine grained. I don't have other
> ideas how to address the problem though.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a59801d..b0af39c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2349,6 +2349,9 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 
 static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
 {
+	if (capable(CAP_SYS_RESOURCE))
+		return true;
+
 	if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
 	    qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer)
 		return false;