xfs: return correct XFS_IOC_DIOINFO for DAX inode

Message ID b3af182f-a547-9d04-3b29-232857e83fc4@redhat.com
State New

Commit Message

Eric Sandeen April 2, 2019, 3:44 p.m. UTC
pmem is byte addressable, and indeed byte-aligned DIO works on
a DAX file.  So, teach XFS_IOC_DIOINFO to return the correct
alignment information if IS_DAX(inode).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

Comments

Darrick J. Wong April 2, 2019, 5:56 p.m. UTC | #1
On Tue, Apr 02, 2019 at 10:44:38AM -0500, Eric Sandeen wrote:
> pmem is byte addressable, and indeed byte-aligned DIO works on
> a DAX file.  So, teach XFS_IOC_DIOINFO to return the correct
> alignment information if IS_DAX(inode).

If it's a DAX filesystem, do we want to try to steer people towards
things like 2MB pages since (in theory) we can get away with fewer page
table mappings?  And (seeing as that's mmap that cares, not directio)
would preferential page mapping sizes be more appropriately
advertised in a different ioctl?

--D

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6ecdbb3..35eae7d 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1919,12 +1919,21 @@ xfs_file_ioctl(
>  	}
>  	case XFS_IOC_DIOINFO: {
>  		struct dioattr	da;
> -		xfs_buftarg_t	*target =
> -			XFS_IS_REALTIME_INODE(ip) ?
> -			mp->m_rtdev_targp : mp->m_ddev_targp;
>  
> -		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
> -		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
> +		if (IS_DAX(inode)) {
> +			/* pmem is byte addressable */
> +			da.d_mem = 1;
> +			da.d_miniosz = 1;
> +			da.d_maxiosz = INT_MAX;
> +		} else {
> +			xfs_buftarg_t	*target =
> +				XFS_IS_REALTIME_INODE(ip) ?
> +				mp->m_rtdev_targp : mp->m_ddev_targp;
> +
> +			da.d_mem = target->bt_logical_sectorsize;
> +			da.d_miniosz = target->bt_logical_sectorsize;
> +			da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
> +		}
>  
>  		if (copy_to_user(arg, &da, sizeof(da)))
>  			return -EFAULT;
>
Eric Sandeen April 2, 2019, 6:08 p.m. UTC | #2
On 4/2/19 12:56 PM, Darrick J. Wong wrote:
> On Tue, Apr 02, 2019 at 10:44:38AM -0500, Eric Sandeen wrote:
>> pmem is byte addressable, and indeed byte-aligned DIO works on
>> a DAX file.  So, teach XFS_IOC_DIOINFO to return the correct
>> alignment information if IS_DAX(inode).
> 
> If it's a DAX filesystem, do we want to try to steer people towards
> things like 2MB pages since (in theory) we can get away with fewer page
> table mappings?  And (seeing as that's mmap that cares, not directio)
> would preferential page mapping sizes be more appropriately
> advertised in a different ioctl?

The xfsctl(3) manpage documents XFS_IOC_DIOINFO as providing the
minimum/required alignments to avoid DIO failure.  Says nothing about
optimal.  So, if you'd like to advertise preferences, it seems like
this is not the interface to use...

-Eric

> --D
> 
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 6ecdbb3..35eae7d 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -1919,12 +1919,21 @@ xfs_file_ioctl(
>>  	}
>>  	case XFS_IOC_DIOINFO: {
>>  		struct dioattr	da;
>> -		xfs_buftarg_t	*target =
>> -			XFS_IS_REALTIME_INODE(ip) ?
>> -			mp->m_rtdev_targp : mp->m_ddev_targp;
>>  
>> -		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
>> -		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
>> +		if (IS_DAX(inode)) {
>> +			/* pmem is byte addressable */
>> +			da.d_mem = 1;
>> +			da.d_miniosz = 1;
>> +			da.d_maxiosz = INT_MAX;
>> +		} else {
>> +			xfs_buftarg_t	*target =
>> +				XFS_IS_REALTIME_INODE(ip) ?
>> +				mp->m_rtdev_targp : mp->m_ddev_targp;
>> +
>> +			da.d_mem = target->bt_logical_sectorsize;
>> +			da.d_miniosz = target->bt_logical_sectorsize;
>> +			da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
>> +		}
>>  
>>  		if (copy_to_user(arg, &da, sizeof(da)))
>>  			return -EFAULT;
>>
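
For readers who want to see what the ioctl reports on their own files, here is a minimal userspace sketch. It is not part of the patch. The local struct dioattr and the XFS_IOC_DIOINFO number are assumed to mirror the definitions in xfs_fs.h; real code should include <xfs/xfs.h> from xfsprogs instead. The helper names query_dioinfo() and print_dioinfo() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Local mirror of struct dioattr (assumption; prefer <xfs/xfs.h>). */
struct dioattr {
	uint32_t d_mem;     /* required alignment of the user buffer */
	uint32_t d_miniosz; /* minimum I/O size, also the offset alignment */
	uint32_t d_maxiosz; /* maximum I/O size */
};

/* Assumed to match the definition in xfs_fs.h. */
#ifndef XFS_IOC_DIOINFO
#define XFS_IOC_DIOINFO _IOR('X', 30, struct dioattr)
#endif

/* Returns 0 and fills *da on success, or -1 with errno set. */
static int query_dioinfo(int fd, struct dioattr *da)
{
	assert(da != NULL);
	return ioctl(fd, XFS_IOC_DIOINFO, da) < 0 ? -1 : 0;
}

/* Prints the DIO constraints for an open fd; returns 0 on success. */
static int print_dioinfo(int fd)
{
	struct dioattr da;

	if (query_dioinfo(fd, &da) < 0) {
		perror("XFS_IOC_DIOINFO");
		return -1;
	}
	printf("d_mem=%u d_miniosz=%u d_maxiosz=%u\n",
	       (unsigned)da.d_mem, (unsigned)da.d_miniosz,
	       (unsigned)da.d_maxiosz);
	return 0;
}
```

The caller is expected to open() the file and pass the fd. On a file that is not on XFS, the ioctl is expected to fail rather than return values.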
Dave Chinner April 2, 2019, 9:31 p.m. UTC | #3
On Tue, Apr 02, 2019 at 10:56:32AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 02, 2019 at 10:44:38AM -0500, Eric Sandeen wrote:
> > pmem is byte addressable, and indeed byte-aligned DIO works on
> > a DAX file.  So, teach XFS_IOC_DIOINFO to return the correct
> > alignment information if IS_DAX(inode).
> 
> If it's a DAX filesystem, do we want to try to steer people towards
> things like 2MB pages since (in theory) we can get away with fewer page
> table mappings?  And (seeing as that's mmap that cares, not directio)
> would preferential page mapping sizes be more appropriately
> advertised in a different ioctl?
> 
> --D
> 
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > ---
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 6ecdbb3..35eae7d 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1919,12 +1919,21 @@ xfs_file_ioctl(
> >  	}
> >  	case XFS_IOC_DIOINFO: {
> >  		struct dioattr	da;
> > -		xfs_buftarg_t	*target =
> > -			XFS_IS_REALTIME_INODE(ip) ?
> > -			mp->m_rtdev_targp : mp->m_ddev_targp;
> >  
> > -		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
> > -		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
> > +		if (IS_DAX(inode)) {
> > +			/* pmem is byte addressable */
> > +			da.d_mem = 1;
> > +			da.d_miniosz = 1;
> > +			da.d_maxiosz = INT_MAX;

I don't think we want to open that can of worms.

Have you run fsx on dax mixing mmap/dio with byte range granularity?

Cheers,

Dave.
Eric Sandeen April 3, 2019, 1:08 a.m. UTC | #4
On 4/2/19 4:31 PM, Dave Chinner wrote:
> On Tue, Apr 02, 2019 at 10:56:32AM -0700, Darrick J. Wong wrote:
>> On Tue, Apr 02, 2019 at 10:44:38AM -0500, Eric Sandeen wrote:
>>> pmem is byte addressable, and indeed byte-aligned DIO works on
>>> a DAX file.  So, teach XFS_IOC_DIOINFO to return the correct
>>> alignment information if IS_DAX(inode).
>>
>> If it's a DAX filesystem, do we want to try to steer people towards
>> things like 2MB pages since (in theory) we can get away with fewer page
>> table mappings?  And (seeing as that's mmap that cares, not directio)
>> would preferential page mapping sizes be more appropriately
>> advertised in a different ioctl?
>>
>> --D
>>
>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>> ---
>>>
>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>> index 6ecdbb3..35eae7d 100644
>>> --- a/fs/xfs/xfs_ioctl.c
>>> +++ b/fs/xfs/xfs_ioctl.c
>>> @@ -1919,12 +1919,21 @@ xfs_file_ioctl(
>>>  	}
>>>  	case XFS_IOC_DIOINFO: {
>>>  		struct dioattr	da;
>>> -		xfs_buftarg_t	*target =
>>> -			XFS_IS_REALTIME_INODE(ip) ?
>>> -			mp->m_rtdev_targp : mp->m_ddev_targp;
>>>  
>>> -		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
>>> -		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
>>> +		if (IS_DAX(inode)) {
>>> +			/* pmem is byte addressable */
>>> +			da.d_mem = 1;
>>> +			da.d_miniosz = 1;
>>> +			da.d_maxiosz = INT_MAX;
> 
> I don't think we want to open that can of worms.

It's already open... byte-granularity dax+dio succeeds today.
Does it work? ;)

> Have you run fsx on dax mixing mmap/dio with byte range granularity?

Like:

# fsx -Z -r 1 -w 1 daxfile

?

yes (now that you asked) ;)
not to a bazillion ops, but I've not seen a failure yet.
This is on simulated pmem, on a 5.0 kernel.

-Eric

> Cheers,
> 
> Dave.
>
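
To make the byte-granularity point concrete, here is a rough sketch of how an application might check a direct I/O request against the values XFS_IOC_DIOINFO returns. It reflects one reading of xfsctl(3): the buffer address must be a multiple of d_mem, the file offset and length must be multiples of d_miniosz, and the length must not exceed d_maxiosz. The struct dio_limits type and dio_request_ok() are hypothetical names, not kernel or xfsprogs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical holder for the three values from struct dioattr. */
struct dio_limits {
	uint32_t mem;     /* d_mem */
	uint32_t miniosz; /* d_miniosz */
	uint32_t maxiosz; /* d_maxiosz */
};

/* True if (buf, offset, len) meets the advertised DIO constraints. */
static bool dio_request_ok(const struct dio_limits *l, const void *buf,
			   off_t offset, size_t len)
{
	assert(l->mem != 0 && l->miniosz != 0);

	if ((uintptr_t)buf % l->mem)
		return false;	/* buffer not aligned to d_mem */
	if (offset < 0 || (uint64_t)offset % l->miniosz)
		return false;	/* offset not a multiple of d_miniosz */
	if (len % l->miniosz)
		return false;	/* length not a multiple of d_miniosz */
	return len <= l->maxiosz;
}
```

With sector-based limits of 512, a request at offset 1 is rejected by this check. With the DAX values from the patch (1, 1, INT_MAX), any byte-aligned request up to INT_MAX bytes passes, which is the pattern the fsx run with -r 1 -w 1 exercises.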

Patch

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6ecdbb3..35eae7d 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1919,12 +1919,21 @@  xfs_file_ioctl(
 	}
 	case XFS_IOC_DIOINFO: {
 		struct dioattr	da;
-		xfs_buftarg_t	*target =
-			XFS_IS_REALTIME_INODE(ip) ?
-			mp->m_rtdev_targp : mp->m_ddev_targp;
 
-		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
-		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
+		if (IS_DAX(inode)) {
+			/* pmem is byte addressable */
+			da.d_mem = 1;
+			da.d_miniosz = 1;
+			da.d_maxiosz = INT_MAX;
+		} else {
+			xfs_buftarg_t	*target =
+				XFS_IS_REALTIME_INODE(ip) ?
+				mp->m_rtdev_targp : mp->m_ddev_targp;
+
+			da.d_mem = target->bt_logical_sectorsize;
+			da.d_miniosz = target->bt_logical_sectorsize;
+			da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
+		}
 
 		if (copy_to_user(arg, &da, sizeof(da)))
 			return -EFAULT;
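
The arithmetic in the patch can be modeled in userspace as a sanity check. This is only an illustration of the two branches above, not kernel code. The dio_limits_model() function and struct dio_limits are hypothetical names, and the sector size is passed in directly instead of being read from a buftarg.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three fields of struct dioattr. */
struct dio_limits {
	uint32_t mem;     /* d_mem */
	uint32_t miniosz; /* d_miniosz */
	uint32_t maxiosz; /* d_maxiosz */
};

/* Mirrors the DAX and non-DAX branches of the patched ioctl case. */
static struct dio_limits dio_limits_model(bool is_dax, uint32_t sector_size)
{
	struct dio_limits l;

	if (is_dax) {
		/* pmem is byte addressable */
		l.mem = 1;
		l.miniosz = 1;
		l.maxiosz = INT_MAX;
	} else {
		/* The mask below only works for power-of-two sizes. */
		assert(sector_size != 0 &&
		       (sector_size & (sector_size - 1)) == 0);
		l.mem = sector_size;
		l.miniosz = sector_size;
		/* Round INT_MAX down to a multiple of the sector size. */
		l.maxiosz = (uint32_t)INT_MAX & ~(l.miniosz - 1);
	}
	return l;
}
```

For a 512-byte logical sector size this gives a maximum of 2147483136, and for 4096 bytes it gives 2147479552. In the DAX branch the mask would be a no-op, since INT_MAX & ~(1 - 1) is INT_MAX, so the patch assigns INT_MAX directly.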