diff mbox series

xfs: allow SECURE namespace xattrs to use reserved pool

Message ID fa801180-0229-4ea7-b8eb-eb162935d348@redhat.com (mailing list archive)
State Accepted, archived

Commit Message

Eric Sandeen July 19, 2024, 10:48 p.m. UTC
We got a report from the podman folks that selinux relabels that happen
as part of their process were returning ENOSPC when the filesystem is
completely full. This is because xattr changes reserve about 15 blocks
for the worst case, but the common case is for selinux contexts to be
the sole, in-inode xattr and consume no blocks.

We already allow reserved space consumption for XFS_ATTR_ROOT for things
such as ACLs, and selinux / SECURE attributes are not so very different,
so allow them to use the reserved space as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

Comments

Christoph Hellwig July 22, 2024, 2:41 p.m. UTC | #1
On Fri, Jul 19, 2024 at 05:48:53PM -0500, Eric Sandeen wrote:
>  	xfs_attr_sethash(args);
>  
> -	return xfs_attr_set(args, op, args->attr_filter & XFS_ATTR_ROOT);
> +	rsvd = args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_SECURE);
> +	return xfs_attr_set(args, op, rsvd);

This looks fine, although I'd probably do without the extra local
variable.  More importantly though, please write a comment documenting
why we are dipping into the reserved pool here.  We should have had that
since the beginning, but this is a better time than never.
Eric Sandeen July 22, 2024, 3:05 p.m. UTC | #2
On 7/22/24 9:41 AM, Christoph Hellwig wrote:
> On Fri, Jul 19, 2024 at 05:48:53PM -0500, Eric Sandeen wrote:
>>  	xfs_attr_sethash(args);
>>  
>> -	return xfs_attr_set(args, op, args->attr_filter & XFS_ATTR_ROOT);
>> +	rsvd = args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_SECURE);
>> +	return xfs_attr_set(args, op, rsvd);
> 
> This looks fine, although I'd probably do without the extra local
> variable.  More importantly though, please write a comment documenting
> why we are dipping into the reserved pool here.  We should have had that
> since the beginning, but this is a better time than never.
> 
> 

Ok, I thought the local var was a little prettier but *shrug* can do it
either way.

To be honest I'm not sure why it was done for ROOT; dchinner mentioned
something about DMAPI requirements, long ago...

It seems reasonable, and it's been there forever but also not obviously
required, AFAICT.

What would your explanation be? ;) 

Thanks,
-Eric
Christoph Hellwig July 22, 2024, 3:11 p.m. UTC | #3
On Mon, Jul 22, 2024 at 10:05:03AM -0500, Eric Sandeen wrote:
> Ok, I thought the local var was a little prettier but *shrug* can do it
> either way.
> 
> To be honest I'm not sure why it was done for ROOT; dchinner mentioned
> something about DMAPI requirements, long ago...
> 
> It seems reasonable, and it's been there forever but also not obviously
> required, AFAICT.
> 
> What would your explanation be? ;) 

Based on your explanation it's probably ACLs for the same reason it
applies to the security attributes.
mark.tinguely@oracle.com July 22, 2024, 4:43 p.m. UTC | #4
On 7/22/24 10:05 AM, Eric Sandeen wrote:
> On 7/22/24 9:41 AM, Christoph Hellwig wrote:
>> On Fri, Jul 19, 2024 at 05:48:53PM -0500, Eric Sandeen wrote:
>>>   	xfs_attr_sethash(args);
>>>   
>>> -	return xfs_attr_set(args, op, args->attr_filter & XFS_ATTR_ROOT);
>>> +	rsvd = args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_SECURE);
>>> +	return xfs_attr_set(args, op, rsvd);
>> This looks fine, although I'd probably do without the extra local
>> variable.  More importantly though, please write a comment documenting
>> why we are dipping into the reserved pool here.  We should have had that
>> since the beginning, but this is a better time than never.
>>
>>
> Ok, I thought the local var was a little prettier but *shrug* can do it
> either way.
>
> To be honest I'm not sure why it was done for ROOT; dchinner mentioned
> something about DMAPI requirements, long ago...


The older Data Mover Framework (DMF v6) kept an extended attribute that 
denoted the file status (online/offline/partial online) and some region 
information (which was never used). Yeah DMF uses Data Management API 
(DMAPI) as the hooks to move data in when offline.

>
> It seems reasonable, and it's been there forever but also not obviously
> required, AFAICT.
>
> What would your explanation be? ;)
>
> Thanks,
> -Eric
>
Dave Chinner July 22, 2024, 10:45 p.m. UTC | #5
On Mon, Jul 22, 2024 at 10:05:03AM -0500, Eric Sandeen wrote:
> On 7/22/24 9:41 AM, Christoph Hellwig wrote:
> > On Fri, Jul 19, 2024 at 05:48:53PM -0500, Eric Sandeen wrote:
> >>  	xfs_attr_sethash(args);
> >>  
> >> -	return xfs_attr_set(args, op, args->attr_filter & XFS_ATTR_ROOT);
> >> +	rsvd = args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_SECURE);
> >> +	return xfs_attr_set(args, op, rsvd);
> > 
> > This looks fine, although I'd probably do without the extra local
> > variable.  More importantly though, please write a comment documenting
> > why we are dipping into the reserved pool here.  We should have had that
> > since the beginning, but this is a better time than never.
> > 
> > 
> 
> Ok, I thought the local var was a little prettier but *shrug* can do it
> either way.
> 
> To be honest I'm not sure why it was done for ROOT; dchinner mentioned
> something about DMAPI requirements, long ago...

Because the xattrs created with inode allocation are not atomic
(which they could be now because parent pointers added the
infrastructure to add xattrs atomically in the create transaction),
stuff like ACLs, security xattrs and, historically, DMAPI xattrs
could fail to be created when the inode was allocated.

For DMAPI/DMF, this was a big issue if the xattr creation got ENOSPC
or the system crashed between inode creation (i.e. the DMAPI CREATE
notification being processed by DMF) and the xattr being written on
the newly allocated inode. This would leave "untracked" inodes
in the filesystem, and the only way DMF could discover inodes
lacking in DMAPI xattrs was to run a full filesystem DMAPI-bulkstat
scan to synchronise the filesystem state with the DMF database held
in userspace. When you're tracking hundreds of millions to billions
of inodes, being forced to do a full fs inode scan after crashes or
ENOSPC before everything works properly again is, well, kinda
annoying.

Similar issues afflicted Trix (Trusted Irix) where security xattrs
(such as ACLs) went missing on crash or ENOSPC. On Irix, they were
stored in the XFS_ATTR_ROOT namespace, and the use of reserved block
space for XFS_ATTR_ROOT was introduced in 1997 on Irix.

commit 32d7e9a0d0fbca91a3d036c8518a87e10abfafb3
Author: gnuss <gnuss>
Date:   Fri Dec 19 19:35:42 1997 +0000

    pv: 553766 rv: lord@cray.com
    Add reserved flag param to routines in block allocation call sequence

This commit contains just the addition of XFS_TRANS_RESERVE for
XFS_ATTR_ROOT xattrs. Nothing else used it - this was specifically a
fix for ACL/DMAPI xattr creation at ENOSPC....

However, ACL support on linux, and hence XFS_ATTR_SECURE, didn't
exist until 2004:

commit af80e14283d9475582dfb2d91395b674b9827fa8
Author: Nathan Scott <nathans@sgi.com>
Date:   Thu Jan 29 03:56:41 2004 +0000

    Add the security extended attributes namespace.

This added the XFS_ATTR_SECURE namespace because ACLs are in a
different xattr namespace in Linux (i.e. TRUSTED -> XFS_ATTR_ROOT,
SECURITY -> XFS_ATTR_SECURE), but the xattr set/change code never
added the XFS_ATTR_SECURE flag to the XFS_TRANS_RESERVE case.

It wasn't until 2007 that we started to use the reserve block pool
for other ENOSPC avoidance cases (like indirect delalloc BMBT block
reservation exhaustion in writeback) here:

commit bdebc6a4aca2ac056b8174f5b6a3bf27b28f6a5d
Author: Dave Chinner <dgc@sgi.com>
Date:   Fri Jun 8 16:03:59 2007 +0000

    Prevent ENOSPC from aborting transactions that need to succeed

So, essentially, for the first 10 years of its life,
XFS_TRANS_RESERVE was supposed to be used to prevent ENOSPC at inode
creation for security and trusted xattrs....

> It seems reasonable, and it's been there forever but also not
> obviously required, AFAICT.

In hindsight, it looks to me like this was an oversight made back
in 2004 when XFS_ATTR_SECURE was added to linux for security
xattrs. As Christoph says: "it should have been there since the
beginning".

-Dave.
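
For readers following the namespace discussion above, here is a minimal
userspace sketch of the mapping Dave describes (TRUSTED -> XFS_ATTR_ROOT,
SECURITY -> XFS_ATTR_SECURE). It is an illustration only, not the kernel
code: the DEMO_* constants and demo_namespace_flag() are hypothetical
names, the flag values are stand-ins, and matching on the name prefix is
a simplifying assumption about how a namespace gets selected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the XFS namespace flags (values are illustrative). */
#define DEMO_ATTR_ROOT   (1u << 1)	/* Linux "trusted." namespace */
#define DEMO_ATTR_SECURE (1u << 2)	/* Linux "security." namespace */

/*
 * Map a Linux xattr name to a demo namespace flag by its prefix.
 * Anything else (e.g. "user.") maps to 0, i.e. no special namespace.
 */
static unsigned int demo_namespace_flag(const char *name)
{
	if (strncmp(name, "trusted.", strlen("trusted.")) == 0)
		return DEMO_ATTR_ROOT;
	if (strncmp(name, "security.", strlen("security.")) == 0)
		return DEMO_ATTR_SECURE;
	return 0;
}

#ifdef DEMO_MAIN
int main(void)
{
	/* An selinux label lives in the security namespace. */
	assert(demo_namespace_flag("security.selinux") == DEMO_ATTR_SECURE);
	printf("security.selinux -> 0x%x\n",
	       demo_namespace_flag("security.selinux"));
	return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a standalone program. The point of the
sketch is only that an selinux context ends up with the SECURE flag
rather than the ROOT flag, which is why the pre-patch ROOT-only check
never matched it.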

Patch

diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index ab3d22f662f2..e59193609003 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -82,6 +82,7 @@  xfs_attr_change(
 {
 	struct xfs_mount	*mp = args->dp->i_mount;
 	int			error;
+	bool			rsvd;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -110,7 +111,8 @@  xfs_attr_change(
 	args->whichfork = XFS_ATTR_FORK;
 	xfs_attr_sethash(args);
 
-	return xfs_attr_set(args, op, args->attr_filter & XFS_ATTR_ROOT);
+	rsvd = args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_SECURE);
+	return xfs_attr_set(args, op, rsvd);
 }
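
The behavioural change in the hunk above can be modelled in userspace as
a before/after comparison of the flag test. This is a sketch under stated
assumptions, not the kernel implementation: the DEMO_* constants and the
demo_* helpers are hypothetical, and the flag values are stand-ins for
the real definitions in the XFS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the XFS namespace flags (values are illustrative). */
#define DEMO_ATTR_ROOT   (1u << 1)
#define DEMO_ATTR_SECURE (1u << 2)

/* Before the patch: only ROOT namespace xattrs may use the reserved pool. */
static bool demo_use_reserved_before(unsigned int attr_filter)
{
	return attr_filter & DEMO_ATTR_ROOT;
}

/* After the patch: ROOT and SECURE namespace xattrs may both use it. */
static bool demo_use_reserved_after(unsigned int attr_filter)
{
	return attr_filter & (DEMO_ATTR_ROOT | DEMO_ATTR_SECURE);
}

#ifdef DEMO_MAIN
int main(void)
{
	/* A SECURE-namespace xattr is the only case whose answer changes. */
	assert(!demo_use_reserved_before(DEMO_ATTR_SECURE));
	assert(demo_use_reserved_after(DEMO_ATTR_SECURE));
	printf("SECURE: before=%d after=%d\n",
	       demo_use_reserved_before(DEMO_ATTR_SECURE),
	       demo_use_reserved_after(DEMO_ATTR_SECURE));
	return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a standalone program. Returning the
masked value from a function declared bool converts any nonzero result to
true, which is the same conversion the patch relies on when it assigns
the masked attr_filter to the bool local rsvd.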