[v6,3/5] mm: introduce mmap3 for safely defining new mmap flags

Message ID 20170826074047.GA6292@ls3530.fritz.box
State New

Commit Message

Helge Deller Aug. 26, 2017, 7:40 a.m. UTC
* Dan Williams <dan.j.williams@intel.com>:
> On Fri, Aug 25, 2017 at 9:19 AM, Helge Deller <deller@gmx.de> wrote:
> > On 25.08.2017 18:16, Kirill A. Shutemov wrote:
> >> On Fri, Aug 25, 2017 at 09:02:36AM -0700, Christoph Hellwig wrote:
> >>> On Fri, Aug 25, 2017 at 06:58:03PM +0300, Kirill A. Shutemov wrote:
> >>>> Not all archs are ready for this:
> >>>>
> >>>> arch/parisc/include/uapi/asm/mman.h:#define MAP_TYPE    0x03            /* Mask for type of mapping */
> >>>> arch/parisc/include/uapi/asm/mman.h:#define MAP_FIXED   0x04            /* Interpret addr exactly */
> >>>
> >>> I'd be happy to say that we should not care about parisc for
> >>> persistent memory.  We'll just have to find a way to exclude
> >>> parisc without making life too ugly.
> >>
> >> I don't think crippling the mmap() interface for one arch is the right way to
> >> go. I think the interface should be universal.
> >>
> >> I may imagine MAP_DIRECT can be useful not only for persistent memory.
> >> For tmpfs instead of mlock()?
> >
> > On parisc we have
> > #define MAP_SHARED      0x01            /* Share changes */
> > #define MAP_PRIVATE     0x02            /* Changes are private */
> > #define MAP_TYPE        0x03            /* Mask for type of mapping */
> > #define MAP_FIXED       0x04            /* Interpret addr exactly */
> > #define MAP_ANONYMOUS   0x10            /* don't use a file */
> >
> > So, if you need a MAP_DIRECT, wouldn't e.g.
> > #define MAP_DIRECT      0x08
> > be possible (for parisc, and others 0x04).
> > And if MAP_TYPE needs to include this flag on parisc:
> > #define MAP_TYPE        (0x03 | 0x08)  /* Mask for type of mapping */
> 
> The problem here is that to support the new mmap flags the arch needs
> to find a flag that is guaranteed to fail on older kernels. Defining
> MAP_DIRECT to 0x8 on parisc doesn't work because it will simply be
> ignored on older parisc kernels.
> 
> However, it's already the case that several archs have their own
> sys_mmap entry points. Those archs that can't follow the common scheme
> (only parisc it seems) will need to add a new mmap syscall. I think
> that's a reasonable tradeoff to allow every other architecture to add
> this support with their existing mmap syscall paths.

I don't want other architectures to suffer just because of parisc.
But adding a new syscall just for usage on parisc won't work either,
because nobody will add code to call it then.
 
> That means MAP_DIRECT should be defined to MAP_TYPE on parisc until it
> later defines an opt-in mechanism to a new syscall that honors
> MAP_DIRECT as a valid flag.

I'd instead propose to introduce an ABI breakage for parisc users
(which aren't many). Most parisc users update their kernel regularly
anyway, because we fixed so many bugs in the latest kernel.

With the following patch pushed down to the stable kernel series,
MAP_DIRECT will fail as expected on those kernels, while we can
keep parisc up with current developments regarding MAP_DIRECT.



Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dan Williams Aug. 26, 2017, 3:15 p.m. UTC | #1
On Sat, Aug 26, 2017 at 12:40 AM, Helge Deller <deller@gmx.de> wrote:
> * Dan Williams <dan.j.williams@intel.com>:
>> On Fri, Aug 25, 2017 at 9:19 AM, Helge Deller <deller@gmx.de> wrote:
>> > On 25.08.2017 18:16, Kirill A. Shutemov wrote:
>> >> On Fri, Aug 25, 2017 at 09:02:36AM -0700, Christoph Hellwig wrote:
>> >>> On Fri, Aug 25, 2017 at 06:58:03PM +0300, Kirill A. Shutemov wrote:
>> >>>> Not all archs are ready for this:
>> >>>>
>> >>>> arch/parisc/include/uapi/asm/mman.h:#define MAP_TYPE    0x03            /* Mask for type of mapping */
>> >>>> arch/parisc/include/uapi/asm/mman.h:#define MAP_FIXED   0x04            /* Interpret addr exactly */
>> >>>
>> >>> I'd be happy to say that we should not care about parisc for
>> >>> persistent memory.  We'll just have to find a way to exclude
>> >>> parisc without making life too ugly.
>> >>
>> >> I don't think crippling the mmap() interface for one arch is the right way to
>> >> go. I think the interface should be universal.
>> >>
>> >> I may imagine MAP_DIRECT can be useful not only for persistent memory.
>> >> For tmpfs instead of mlock()?
>> >
>> > On parisc we have
>> > #define MAP_SHARED      0x01            /* Share changes */
>> > #define MAP_PRIVATE     0x02            /* Changes are private */
>> > #define MAP_TYPE        0x03            /* Mask for type of mapping */
>> > #define MAP_FIXED       0x04            /* Interpret addr exactly */
>> > #define MAP_ANONYMOUS   0x10            /* don't use a file */
>> >
>> > So, if you need a MAP_DIRECT, wouldn't e.g.
>> > #define MAP_DIRECT      0x08
>> > be possible (for parisc, and others 0x04).
>> > And if MAP_TYPE needs to include this flag on parisc:
>> > #define MAP_TYPE        (0x03 | 0x08)  /* Mask for type of mapping */
>>
>> The problem here is that to support the new mmap flags the arch needs
>> to find a flag that is guaranteed to fail on older kernels. Defining
>> MAP_DIRECT to 0x8 on parisc doesn't work because it will simply be
>> ignored on older parisc kernels.
>>
>> However, it's already the case that several archs have their own
>> sys_mmap entry points. Those archs that can't follow the common scheme
>> (only parisc it seems) will need to add a new mmap syscall. I think
>> that's a reasonable tradeoff to allow every other architecture to add
>> this support with their existing mmap syscall paths.
>
> I don't want other architectures to suffer just because of parisc.
> But adding a new syscall just for usage on parisc won't work either,
> because nobody will add code to call it then.

I don't understand this comment. If / when parisc gets around to
adding pmem and dax support, why wouldn't libc grow support for the new
parisc mmap variant? Also, it's not just MAP_DIRECT; you would also
need space for a MAP_SYNC flag.

>> That means MAP_DIRECT should be defined to MAP_TYPE on parisc until it
>> later defines an opt-in mechanism to a new syscall that honors
>> MAP_DIRECT as a valid flag.
>
> I'd instead propose to introduce an ABI breakage for parisc users
> (which aren't many). Most parisc users update their kernel regularly
> anyway, because we fixed so many bugs in the latest kernel.
>
> With the following patch pushed down to the stable kernel series,
> MAP_DIRECT will fail as expected on those kernels, while we can
> keep parisc up with current developments regarding MAP_DIRECT.

The whole point is to avoid an ABI regression and the chance for false
positive results. We're immediately stuck if some application was
expecting 0x8 to be ignored, or conversely an application that
absolutely needs to rely on MAP_SYNC/MAP_DIRECT semantics assumes the
wrong result on a parisc kernel where they are ignored.

I have not seen any patches for parisc pmem+dax enabling so it seems
too early to worry about these "last mile" enabling features of
MAP_DIRECT and MAP_SYNC. In particular parisc doesn't appear to have
ARCH_ENABLE_MEMORY_HOTPLUG, so as far as I can see it can't yet
support the ZONE_DEVICE scheme that is a pre-requisite for MAP_DIRECT.
Helge Deller Aug. 26, 2017, 7:50 p.m. UTC | #2
On 26.08.2017 17:15, Dan Williams wrote:
> On Sat, Aug 26, 2017 at 12:40 AM, Helge Deller <deller@gmx.de> wrote:
>> * Dan Williams <dan.j.williams@intel.com>:
>>> On Fri, Aug 25, 2017 at 9:19 AM, Helge Deller <deller@gmx.de> wrote:
>>>> On 25.08.2017 18:16, Kirill A. Shutemov wrote:
>>>>> On Fri, Aug 25, 2017 at 09:02:36AM -0700, Christoph Hellwig wrote:
>>>>>> On Fri, Aug 25, 2017 at 06:58:03PM +0300, Kirill A. Shutemov wrote:
>>>>>>> Not all archs are ready for this:
>>>>>>>
>>>>>>> arch/parisc/include/uapi/asm/mman.h:#define MAP_TYPE    0x03            /* Mask for type of mapping */
>>>>>>> arch/parisc/include/uapi/asm/mman.h:#define MAP_FIXED   0x04            /* Interpret addr exactly */
>>>>>>
>>>>>> I'd be happy to say that we should not care about parisc for
>>>>>> persistent memory.  We'll just have to find a way to exclude
>>>>>> parisc without making life too ugly.
>>>>>
>>>>> I don't think crippling the mmap() interface for one arch is the right way to
>>>>> go. I think the interface should be universal.
>>>>>
>>>>> I may imagine MAP_DIRECT can be useful not only for persistent memory.
>>>>> For tmpfs instead of mlock()?
>>>>
>>>> On parisc we have
>>>> #define MAP_SHARED      0x01            /* Share changes */
>>>> #define MAP_PRIVATE     0x02            /* Changes are private */
>>>> #define MAP_TYPE        0x03            /* Mask for type of mapping */
>>>> #define MAP_FIXED       0x04            /* Interpret addr exactly */
>>>> #define MAP_ANONYMOUS   0x10            /* don't use a file */
>>>>
>>>> So, if you need a MAP_DIRECT, wouldn't e.g.
>>>> #define MAP_DIRECT      0x08
>>>> be possible (for parisc, and others 0x04).
>>>> And if MAP_TYPE needs to include this flag on parisc:
>>>> #define MAP_TYPE        (0x03 | 0x08)  /* Mask for type of mapping */
>>>
>>> The problem here is that to support the new mmap flags the arch needs
>>> to find a flag that is guaranteed to fail on older kernels. Defining
>>> MAP_DIRECT to 0x8 on parisc doesn't work because it will simply be
>>> ignored on older parisc kernels.
>>>
>>> However, it's already the case that several archs have their own
>>> sys_mmap entry points. Those archs that can't follow the common scheme
>>> (only parisc it seems) will need to add a new mmap syscall. I think
>>> that's a reasonable tradeoff to allow every other architecture to add
>>> this support with their existing mmap syscall paths.
>>
>> I don't want other architectures to suffer just because of parisc.
>> But adding a new syscall just for usage on parisc won't work either,
>> because nobody will add code to call it then.
> 
> I don't understand this comment. If / when parisc gets around to
> adding pmem and dax support, why wouldn't libc grow support for the new
> parisc mmap variant? Also, it's not just MAP_DIRECT; you would also
> need space for a MAP_SYNC flag.
> 
>>> That means MAP_DIRECT should be defined to MAP_TYPE on parisc until it
>>> later defines an opt-in mechanism to a new syscall that honors
>>> MAP_DIRECT as a valid flag.
>>
>> I'd instead propose to introduce an ABI breakage for parisc users
>> (which aren't many). Most parisc users update their kernel regularly
>> anyway, because we fixed so many bugs in the latest kernel.
>>
>> With the following patch pushed down to the stable kernel series,
>> MAP_DIRECT will fail as expected on those kernels, while we can
>> keep parisc up with current developments regarding MAP_DIRECT.
> 
> The whole point is to avoid an ABI regression and the chance for false
> positive results. We're immediately stuck if some application was
> expecting 0x8 to be ignored, or conversely an application that
> absolutely needs to rely on MAP_SYNC/MAP_DIRECT semantics assumes the
> wrong result on a parisc kernel where they are ignored.
> 
> I have not seen any patches for parisc pmem+dax enabling so it seems
> too early to worry about these "last mile" enabling features of
> MAP_DIRECT and MAP_SYNC. In particular parisc doesn't appear to have
> ARCH_ENABLE_MEMORY_HOTPLUG, so as far as I can see it can't yet
> support the ZONE_DEVICE scheme that is a pre-requisite for MAP_DIRECT.

I see, but then it's probably best not to define any MAP_DIRECT or
MAP_SYNC at all in the headers of those arches which don't support
pmem+dax (parisc, m68k, alpha, and probably quite a few others).
That way applications can detect at configure time if the platform
supports that, and can leave out the functionality completely.

Helge
Dan Williams Aug. 26, 2017, 10:46 p.m. UTC | #3
On Sat, Aug 26, 2017 at 12:50 PM, Helge Deller <deller@gmx.de> wrote:
> On 26.08.2017 17:15, Dan Williams wrote:
[..]
>> I have not seen any patches for parisc pmem+dax enabling so it seems
>> too early to worry about these "last mile" enabling features of
>> MAP_DIRECT and MAP_SYNC. In particular parisc doesn't appear to have
>> ARCH_ENABLE_MEMORY_HOTPLUG, so as far as I can see it can't yet
>> support the ZONE_DEVICE scheme that is a pre-requisite for MAP_DIRECT.
>
> I see, but then it's probably best not to define any MAP_DIRECT or
> MAP_SYNC at all in the headers of those arches which don't support
> pmem+dax (parisc, m68k, alpha, and probably quite a few others).
> That way applications can detect at configure time if the platform
> supports that, and can leave out the functionality completely.

Yes, that's a good idea; we can handle this similarly to
CONFIG_MMAP_ALLOW_UNINITIALIZED. These patches will also modify
'struct file_operations' so that do_mmap() can validate whether a flag
is supported on a per-architecture basis. Also, the plan is to plumb the
flags passed to the syscall all the way down to the individual mmap
implementations. The ext4 and xfs ->mmap() operations will be able to
return -EOPNOTSUPP based on runtime variables.
Kirill A. Shutemov Aug. 26, 2017, 11:56 p.m. UTC | #4
On Sat, Aug 26, 2017 at 03:46:22PM -0700, Dan Williams wrote:
> On Sat, Aug 26, 2017 at 12:50 PM, Helge Deller <deller@gmx.de> wrote:
> > On 26.08.2017 17:15, Dan Williams wrote:
> [..]
> >> I have not seen any patches for parisc pmem+dax enabling so it seems
> >> too early to worry about these "last mile" enabling features of
> >> MAP_DIRECT and MAP_SYNC. In particular parisc doesn't appear to have
> >> ARCH_ENABLE_MEMORY_HOTPLUG, so as far as I can see it can't yet
> >> support the ZONE_DEVICE scheme that is a pre-requisite for MAP_DIRECT.
> >
> > I see, but then it's probably best not to define any MAP_DIRECT or
> > MAP_SYNC at all in the headers of those arches which don't support
> > pmem+dax (parisc, m68k, alpha, and probably quite a few others).
> > That way applications can detect at configure time if the platform
> > supports that, and can leave out the functionality completely.
> 
> Yes, that's a good idea; we can handle this similarly to
> CONFIG_MMAP_ALLOW_UNINITIALIZED. These patches will also modify
> 'struct file_operations' so that do_mmap() can validate whether a flag
> is supported on a per-architecture basis. Also, the plan is to plumb the
> flags passed to the syscall all the way down to the individual mmap
> implementations. The ext4 and xfs ->mmap() operations will be able to
> return -EOPNOTSUPP based on runtime variables.

BTW, we may be able to reuse the bit used for MAP_UNINITIALIZED -- it's
only used on !MMU machines.

Patch

diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 9a9c2fe..43b9a1e 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -13,6 +13,7 @@ 
 #define MAP_PRIVATE	0x02		/* Changes are private */
 #define MAP_TYPE	0x03		/* Mask for type of mapping */
 #define MAP_FIXED	0x04		/* Interpret addr exactly */
+#define MAP_DIRECT	0x08		/* Direct mapping (pmem/DAX) */
 #define MAP_ANONYMOUS	0x10		/* don't use a file */
 
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 378a754..0499f87 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -270,6 +270,10 @@  asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
 {
 	/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
 	   we have. */
+#if !defined(CONFIG_HAVE_MAP_DIRECT_SUPPORT)
+	if (flags & MAP_DIRECT)
+		return -EINVAL;
+#endif
 	return sys_mmap_pgoff(addr, len, prot, flags, fd,
 			      pgoff >> (PAGE_SHIFT - 12));
 }
@@ -278,6 +282,10 @@  asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
 		unsigned long prot, unsigned long flags, unsigned long fd,
 		unsigned long offset)
 {
+#if !defined(CONFIG_HAVE_MAP_DIRECT_SUPPORT)
+	if (flags & MAP_DIRECT)
+		return -EINVAL;
+#endif
 	if (!(offset & ~PAGE_MASK)) {
 		return sys_mmap_pgoff(addr, len, prot, flags, fd,
 					offset >> PAGE_SHIFT);