Message ID | 150776923320.9144.6119113178052262946.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi, > diff --git a/mm/mmap.c b/mm/mmap.c > index 680506faceae..2649c00581a0 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1389,6 +1389,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > struct inode *inode = file_inode(file); > > switch (flags & MAP_TYPE) { > + case MAP_SHARED_VALIDATE: > + if ((flags & ~LEGACY_MAP_MASK) == 0) { > + /* > + * If all legacy mmap flags, downgrade > + * to MAP_SHARED, i.e. invoke ->mmap() > + * instead of ->mmap_validate() > + */ > + flags &= ~MAP_TYPE; > + flags |= MAP_SHARED; > + } else if (!file->f_op->mmap_validate) > + return -EOPNOTSUPP; > + /* fall through */ > case MAP_SHARED: > if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE)) > return -EACCES; When thinking a bit more about this I've realized one problem: Currently user can call mmap() with MAP_SHARED type and MAP_SYNC or MAP_DIRECT flags and he will get the new semantics (if the kernel happens to support it). I think that is undesirable and we should force usage of MAP_SHARED_VALIDATE when you want to use flags outside of LEGACY_MAP_MASK. So I'd just mask off non-legacy flags for MAP_SHARED mappings (so they would be silently ignored as they used to be until now). Honza
On Thu, Oct 12, 2017 at 6:51 AM, Jan Kara <jack@suse.cz> wrote: > > When thinking a bit more about this I've realized one problem: Currently > user can call mmap() with MAP_SHARED type and MAP_SYNC or MAP_DIRECT flags > and he will get the new semantics (if the kernel happens to support it). I > think that is undesirable [..] Why? If you have a performance preference for MAP_DIRECT or something like that, but you don't want to *enforce* it, you'd use just plain MAP_SHARED with it. Ie there may well be "I want this to work, possibly with downsides" issues. So it seems to be a reasonable model, and disallowing it seems to limit people and not really help anything. Linus
On Thu, Oct 12, 2017 at 09:32:17AM -0700, Linus Torvalds wrote: > On Thu, Oct 12, 2017 at 6:51 AM, Jan Kara <jack@suse.cz> wrote: > > > > When thinking a bit more about this I've realized one problem: Currently > > user can call mmap() with MAP_SHARED type and MAP_SYNC or MAP_DIRECT flags > > and he will get the new semantics (if the kernel happens to support it). I > > think that is undesirable [..] > > Why? > > If you have a performance preference for MAP_DIRECT or something like > that, but you don't want to *enforce* it, you'd use just plain > MAP_SHARED with it. > > Ie there may well be "I want this to work, possibly with downsides" issues. > > So it seems to be a reasonable model, and disallowing it seems to > limit people and not really help anything. I don't think for MAP_DIRECT it matters (and I think we shouldn't have MAP_DIRECT to start with, see the discussions later in the thread). But for the main use case, MAP_SYNC you really want a hard error when you don't get it. And while we could tell people that they should only use MAP_SYNC with MAP_SHARED_VALIDATE instead of MAP_SHARED chances that they get it wrong are extremely high. On the other hand if you really only want a flag to optimize calling mmap twice is very little overhead, and a very good documentation of you intent: addr = mmap(...., MAP_SHARED_VALIDATE | MAP_DIRECT, ...); if (!addr && errno = EOPNOTSUPP) { /* MAP_DIRECT didn't work, we'll just cope using blah, blah */ addr = mmap(...., MAP_SHARED, ...); } if (!addr) goto handle_error;
On Thu 12-10-17 09:32:17, Linus Torvalds wrote: > On Thu, Oct 12, 2017 at 6:51 AM, Jan Kara <jack@suse.cz> wrote: > > > > When thinking a bit more about this I've realized one problem: Currently > > user can call mmap() with MAP_SHARED type and MAP_SYNC or MAP_DIRECT flags > > and he will get the new semantics (if the kernel happens to support it). I > > think that is undesirable [..] > > Why? > > If you have a performance preference for MAP_DIRECT or something like > that, but you don't want to *enforce* it, you'd use just plain > MAP_SHARED with it. > > Ie there may well be "I want this to work, possibly with downsides" issues. > > So it seems to be a reasonable model, and disallowing it seems to > limit people and not really help anything. I have two concerns: 1) IMHO it supports sloppy programming from userspace - if application asks e.g. for MAP_DIRECT and doesn't know whether it gets it or not, it would have to be very careful not to assume anything about that in its code. And frankly I think the most likely scenario is that a programmer will just use MAP_SHARED | MAP_DIRECT, *assume* he will get the MAP_DIRECT semantics if the call does not fail and then complain when his application breaks. 2) In theory there could be an application that inadvertedly sets some high flag bits and now it would get confused by getting different mmap(2) semantics. But I agree this is mostly theoretical. Overall I think the benefit of being able to say "do MAP_DIRECT if you can" does not outweight the risk of bugs in userspace applications. Especially since userspace can easily implement the same semantics by retrying the mmap(2) call without MAP_SHARED_VALIDATE | MAP_DIRECT. Honza
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 3b26cc62dadb..f85f18ffbf8c 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -11,6 +11,7 @@ #define MAP_SHARED 0x01 /* Share changes */ #define MAP_PRIVATE 0x02 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x3 /* share + validate extension flags */ #define MAP_TYPE 0x0f /* Mask for type of mapping (OSF/1 is _wrong_) */ #define MAP_FIXED 0x100 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x10 /* don't use a file */ diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index da3216007fe0..054314bb062a 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -28,6 +28,7 @@ */ #define MAP_SHARED 0x001 /* Share changes */ #define MAP_PRIVATE 0x002 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x3 /* share + validate extension flags */ #define MAP_TYPE 0x00f /* Mask for type of mapping */ #define MAP_FIXED 0x010 /* Interpret addr exactly */ diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 019035d7225c..cf10654477a9 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -110,7 +110,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, VM_READ|VM_WRITE|VM_EXEC| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, - 0, NULL); + 0, NULL, 0); if (IS_ERR_VALUE(base)) { ret = base; goto out; diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index 775b5d5e41a1..a66fdb9c4b6d 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -11,6 +11,7 @@ #define MAP_SHARED 0x01 /* Share changes */ #define MAP_PRIVATE 0x02 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x3 /* share + validate extension flags */ #define MAP_TYPE 0x03 /* Mask for type of mapping */ #define MAP_FIXED 0x04 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x10 /* don't use a file */ diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c index 889901824400..5ffcbe76aef9 100644 --- a/arch/tile/mm/elf.c +++ b/arch/tile/mm/elf.c @@ -143,7 +143,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, unsigned long addr = MEM_USER_INTRPT; addr = mmap_region(NULL, addr, INTRPT_SIZE, VM_READ|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0, NULL); + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0, + NULL, 0); if (addr > (unsigned long) -PAGE_SIZE) retval = (int) addr; } diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index b15b278aa314..875b0e6f7499 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -35,6 +35,7 @@ */ #define MAP_SHARED 0x001 /* Share changes */ #define MAP_PRIVATE 0x002 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x3 /* share + validate extension flags */ #define MAP_TYPE 0x00f /* Mask for type of mapping */ #define MAP_FIXED 0x010 /* Interpret addr exactly */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 13dab191a23e..5aee97d64cae 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1701,6 +1701,8 @@ struct file_operations { long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); + int (*mmap_validate) (struct file *, struct vm_area_struct *, + unsigned long); int (*open) (struct inode *, struct file *); int (*flush) (struct file *, fl_owner_t id); int (*release) (struct inode *, struct file *); diff --git a/include/linux/mm.h b/include/linux/mm.h index 065d99deb847..38f6ed954dde 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2133,7 +2133,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf); + struct list_head *uf, unsigned long map_flags); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, diff --git a/include/linux/mman.h b/include/linux/mman.h index c8367041fafd..94b63b4d71ff 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -7,6 +7,45 @@ #include <linux/atomic.h> #include <uapi/linux/mman.h> +/* + * Arrange for legacy / undefined architecture specific flags to be + * ignored by default in LEGACY_MAP_MASK. + */ +#ifndef MAP_32BIT +#define MAP_32BIT 0 +#endif +#ifndef MAP_HUGE_2MB +#define MAP_HUGE_2MB 0 +#endif +#ifndef MAP_HUGE_1GB +#define MAP_HUGE_1GB 0 +#endif +#ifndef MAP_UNINITIALIZED +#define MAP_UNINITIALIZED 0 +#endif + +/* + * The historical set of flags that all mmap implementations implicitly + * support when a ->mmap_validate() op is not provided in file_operations. + */ +#define LEGACY_MAP_MASK (MAP_SHARED \ + | MAP_PRIVATE \ + | MAP_FIXED \ + | MAP_ANONYMOUS \ + | MAP_DENYWRITE \ + | MAP_EXECUTABLE \ + | MAP_UNINITIALIZED \ + | MAP_GROWSDOWN \ + | MAP_LOCKED \ + | MAP_NORESERVE \ + | MAP_POPULATE \ + | MAP_NONBLOCK \ + | MAP_STACK \ + | MAP_HUGETLB \ + | MAP_32BIT \ + | MAP_HUGE_2MB \ + | MAP_HUGE_1GB) + extern int sysctl_overcommit_memory; extern int sysctl_overcommit_ratio; extern unsigned long sysctl_overcommit_kbytes; diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 203268f9231e..debd98c2eb83 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -16,6 +16,7 @@ #define MAP_SHARED 0x01 /* Share changes */ #define MAP_PRIVATE 0x02 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x3 /* share + validate extension flags */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x20 /* don't use a file */ diff --git a/mm/mmap.c b/mm/mmap.c index 680506faceae..2649c00581a0 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1389,6 +1389,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct inode *inode = file_inode(file); switch (flags & MAP_TYPE) { + case MAP_SHARED_VALIDATE: + if ((flags & ~LEGACY_MAP_MASK) == 0) { + /* + * If all legacy mmap flags, downgrade + * to MAP_SHARED, i.e. invoke ->mmap() + * instead of ->mmap_validate() + */ + flags &= ~MAP_TYPE; + flags |= MAP_SHARED; + } else if (!file->f_op->mmap_validate) + return -EOPNOTSUPP; + /* fall through */ case MAP_SHARED: if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE)) return -EACCES; @@ -1465,7 +1477,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; } - addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, flags); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) @@ -1602,7 +1614,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags) unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, unsigned long map_flags) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma, *prev; @@ -1687,7 +1699,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr, * new file must not have been exposed to user-space, yet. */ vma->vm_file = get_file(file); - error = call_mmap(file, vma); + if ((map_flags & MAP_TYPE) == MAP_SHARED_VALIDATE) + error = file->f_op->mmap_validate(file, vma, map_flags); + else + error = call_mmap(file, vma); if (error) goto unmap_and_free_vma; diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h index 203268f9231e..debd98c2eb83 100644 --- a/tools/include/uapi/asm-generic/mman-common.h +++ b/tools/include/uapi/asm-generic/mman-common.h @@ -16,6 +16,7 @@ #define MAP_SHARED 0x01 /* Share changes */ #define MAP_PRIVATE 0x02 /* Changes are private */ +#define MAP_SHARED_VALIDATE 0x3 /* share + validate extension flags */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x20 /* don't use a file */