[2/2] mm: adds NOSIGBUS extension for out-of-band shmem read

Message ID 1622589753-9206-3-git-send-email-mlin@kernel.org (mailing list archive)
State New, archived
Series mm: adds MAP_NOSIGBUS extension for shmem read

Commit Message

Ming Lin June 1, 2021, 11:22 p.m. UTC
Add a new mmap() flag, MAP_NOSIGBUS, that requests "don't SIGBUS on
read beyond i_size" behavior. The flag is only allowed for read-only
shmem mappings.

If you use MAP_NOSIGBUS, and you access pages that don't have a backing
store, you will get zero pages, and they will NOT BE SYNCHRONIZED with
the backing store possibly later being updated.

Any user that uses MAP_NOSIGBUS had better just accept that it's not
compatible with expanding the shmem backing store later.

Signed-off-by: Ming Lin <mlin@kernel.org>
---
 include/linux/mm.h                     |  2 ++
 include/linux/mman.h                   |  1 +
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/mmap.c                              |  3 +++
 mm/shmem.c                             | 17 ++++++++++++++++-
 5 files changed, 23 insertions(+), 1 deletion(-)
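
For context, the behaviour that MAP_NOSIGBUS is meant to suppress can be
reproduced from userspace. The following is a minimal sketch and not part
of the patch: it maps two pages of a one-page file and reads the second
page, which lies wholly beyond i_size. The function name and the scratch
file path are hypothetical, and a regular temporary file stands in for
shmem on the assumption that the fault raises SIGBUS the same way.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

/*
 * Returns 1 if reading a page wholly beyond EOF raised SIGBUS,
 * 0 if the read succeeded, -1 if the setup failed.
 */
int read_past_eof_raises_sigbus(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/nosigbus-demo-XXXXXX";	/* hypothetical scratch file */
	struct sigaction sa, old;
	volatile int result = -1;
	volatile char *map;
	int fd;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}

	/* Map two pages although the file only backs the first one. */
	map = mmap(NULL, 2 * page, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(sigbus_env, 1) == 0) {
		char c = map[page];	/* first byte beyond i_size */
		(void)c;
		result = 0;
	} else {
		result = 1;		/* the handler jumped back here */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap((void *)map, 2 * page);
	close(fd);
	return result;
}
```

With the proposed flag, the intent described above is that the same read
would see a zero page instead of taking the signal.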

Comments

Linus Torvalds June 2, 2021, 12:16 a.m. UTC | #1
This series passes my "looks fine, is simple and straightforward" test.

One nit:

On Tue, Jun 1, 2021 at 1:22 PM Ming Lin <mlin@kernel.org> wrote:
>
> +               error = vm_insert_page(vma, (unsigned long)vmf->address,
> +                                       ZERO_PAGE(0));

On architectures where this matters - bad virtual caches - it would be
better to use ZERO_PAGE(vmf->address).

It doesn't make a difference on any sane architecture, but it's the
RightThing(tm) to do.

            Linus
Ming Lin June 2, 2021, 1:06 a.m. UTC | #2
On 6/1/2021 5:16 PM, Linus Torvalds wrote:
> This series passes my "looks fine, is simple and straightforward" test.
>
> One nit:
>
> On Tue, Jun 1, 2021 at 1:22 PM Ming Lin <mlin@kernel.org> wrote:
>>
>> +               error = vm_insert_page(vma, (unsigned long)vmf->address,
>> +                                       ZERO_PAGE(0));
>
> On architectures where this matters - bad virtual caches - it would be
> better to use ZERO_PAGE(vmf->address).
>
> It doesn't make a difference on any sane architecture, but it's the
> RightThing(tm) to do.


grep -Rn ZERO_PAGE linux/arch/ | grep define

s390 and mips do use the "address" argument of ZERO_PAGE(address).

Fixed.
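
The reason the argument can matter is easier to see with a small model.
This is a userspace sketch under stated assumptions, not the kernel
definition: it assumes an architecture that keeps several zero pages, one
per cache colour, and picks one by masking the faulting virtual address.
The names and the choice of eight colours are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT		12
/* Hypothetical: eight zero pages, one per cache colour. */
#define DEMO_ZERO_PAGE_MASK	((uintptr_t)0x7 << DEMO_PAGE_SHIFT)

/* Byte offset, within the block of zero pages, of the page to map. */
uintptr_t demo_zero_page_offset(uintptr_t vaddr)
{
	return vaddr & DEMO_ZERO_PAGE_MASK;
}

/* Passing 0 always selects the first zero page, whatever the colour. */
void demo_zero_page_self_check(void)
{
	assert(demo_zero_page_offset(0) == 0);
	assert(demo_zero_page_offset(0x12345678) == 0x5000);
}
```

In this model ZERO_PAGE(0) would map a page of the wrong colour for most
addresses, which is the case the nit is about.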
kernel test robot June 2, 2021, 2:02 a.m. UTC | #3
Hi Ming,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on arm64/for-next/core powerpc/next asm-generic/master linus/master v5.13-rc4]
[cannot apply to hnaz-linux-mm/master tip/x86/core next-20210601]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ming-Lin/mm-adds-MAP_NOSIGBUS-extension-for-shmem-read/20210602-072403
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dd860052c99b1e088352bdd4fb7aef46f8d2ef47
config: parisc-randconfig-r015-20210601 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/c14d1ac79e68e85a2ff97e19c36100990b09a7c3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ming-Lin/mm-adds-MAP_NOSIGBUS-extension-for-shmem-read/20210602-072403
        git checkout c14d1ac79e68e85a2ff97e19c36100990b09a7c3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from mm/filemap.c:24:
   include/linux/mman.h: In function 'calc_vm_flag_bits':
>> include/linux/mman.h:157:31: error: 'MAP_NOSIGBUS' undeclared (first use in this function); did you mean 'VM_NOSIGBUS'?
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   include/linux/mman.h:157:31: note: each undeclared identifier is reported only once for each function it appears in
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
--
   In file included from mm/util.c:15:
   include/linux/mman.h: In function 'calc_vm_flag_bits':
>> include/linux/mman.h:157:31: error: 'MAP_NOSIGBUS' undeclared (first use in this function); did you mean 'VM_NOSIGBUS'?
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   include/linux/mman.h:157:31: note: each undeclared identifier is reported only once for each function it appears in
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   mm/util.c: In function 'page_mapping':
   mm/util.c:700:15: warning: variable 'entry' set but not used [-Wunused-but-set-variable]
     700 |   swp_entry_t entry;
         |               ^~~~~
--
   In file included from mm/mmap.c:18:
   include/linux/mman.h: In function 'calc_vm_flag_bits':
>> include/linux/mman.h:157:31: error: 'MAP_NOSIGBUS' undeclared (first use in this function); did you mean 'VM_NOSIGBUS'?
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   include/linux/mman.h:157:31: note: each undeclared identifier is reported only once for each function it appears in
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   mm/mmap.c: In function 'do_mmap':
>> mm/mmap.c:1422:15: error: 'MAP_NOSIGBUS' undeclared (first use in this function); did you mean 'VM_NOSIGBUS'?
    1422 |  if ((flags & MAP_NOSIGBUS) && ((prot & PROT_WRITE) || !shmem_file(file)))
         |               ^~~~~~~~~~~~
         |               VM_NOSIGBUS
   In file included from mm/mmap.c:18:
   include/linux/mman.h: In function 'calc_vm_flag_bits':
   include/linux/mman.h:159:1: error: control reaches end of non-void function [-Werror=return-type]
     159 | }
         | ^
   cc1: some warnings being treated as errors
--
   In file included from drivers/char/mem.c:16:
   include/linux/mman.h: In function 'calc_vm_flag_bits':
>> include/linux/mman.h:157:31: error: 'MAP_NOSIGBUS' undeclared (first use in this function); did you mean 'VM_NOSIGBUS'?
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   include/linux/mman.h:157:31: note: each undeclared identifier is reported only once for each function it appears in
     157 |         _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
         |                               ^~~~~~~~~~~~
   include/linux/mman.h:131:7: note: in definition of macro '_calc_vm_trans'
     131 |   ((!(bit1) || !(bit2)) ? 0 : \
         |       ^~~~
   drivers/char/mem.c: At top level:
   drivers/char/mem.c:95:29: warning: no previous prototype for 'unxlate_dev_mem_ptr' [-Wmissing-prototypes]
      95 | #define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
         |                             ^~~~~~~~~~~~~~~~~~~
   drivers/char/mem.c:96:13: note: in expansion of macro 'unxlate_dev_mem_ptr'
      96 | void __weak unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
         |             ^~~~~~~~~~~~~~~~~~~


vim +157 include/linux/mman.h

   146	
   147	/*
   148	 * Combine the mmap "flags" argument into "vm_flags" used internally.
   149	 */
   150	static inline vm_flags_t
   151	calc_vm_flag_bits(unsigned long flags)
   152	{
   153		return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
   154		       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
   155		       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
   156		       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
 > 157		       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
   158		       arch_calc_vm_flag_bits(flags);
   159	}
   160	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
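
The build log shows that the parisc build does not see a MAP_NOSIGBUS
definition, presumably because that architecture does not take its mmap
flags from asm-generic/mman-common.h. The sketch below is a userspace
model of the translation macro quoted in the log. It illustrates one
possible way out, which the thread itself does not discuss: a flag that
falls back to 0 translates to nothing, because of the !(bit1) test. All
DEMO_ names are hypothetical.

```c
#include <assert.h>

typedef unsigned long long demo_flags_t;

/* Same shape as the _calc_vm_trans() macro quoted in the build log. */
#define DEMO_CALC_VM_TRANS(x, bit1, bit2) \
	((!(bit1) || !(bit2)) ? 0 : \
	 ((bit1) <= (bit2) ? ((x) & (bit1)) * ((bit2) / (bit1)) \
			   : ((x) & (bit1)) / ((bit1) / (bit2))))

#define DEMO_MAP_NOSIGBUS	0x200000ULL	/* value used in the patch */
#define DEMO_VM_NOSIGBUS	(1ULL << 38)	/* bit 38, as in the patch */
#define DEMO_MAP_MISSING	0ULL		/* flag an arch does not define */

/* Translate mmap()-style flags into vm_flags-style bits. */
demo_flags_t demo_calc_vm_flag_bits(demo_flags_t flags)
{
	return DEMO_CALC_VM_TRANS(flags, DEMO_MAP_NOSIGBUS, DEMO_VM_NOSIGBUS) |
	       DEMO_CALC_VM_TRANS(flags, DEMO_MAP_MISSING, DEMO_VM_NOSIGBUS);
}

void demo_calc_vm_self_check(void)
{
	/* Bit 21 is moved up to bit 38. */
	assert(demo_calc_vm_flag_bits(DEMO_MAP_NOSIGBUS) == DEMO_VM_NOSIGBUS);
	/* Unrelated bits are dropped. */
	assert(demo_calc_vm_flag_bits(0x1ULL) == 0);
}
```

Whether a fallback of this kind or a per-architecture definition is the
right fix is a separate question from what the macro does.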
Hugh Dickins June 2, 2021, 2:13 a.m. UTC | #4
On Tue, 1 Jun 2021, Linus Torvalds wrote:

> This series passes my "looks fine, is simple and straightforward" test.

I'm sorry, but it also passes my "hack that we do not want in shmem.c"
test. I'll say more in response to the preceding mail.

Hugh

> 
> One nit:
> 
> On Tue, Jun 1, 2021 at 1:22 PM Ming Lin <mlin@kernel.org> wrote:
> >
> > +               error = vm_insert_page(vma, (unsigned long)vmf->address,
> > +                                       ZERO_PAGE(0));
> 
> On architectures where this matters - bad virtual caches - it would be
> better to use ZERO_PAGE(vmf->address).
> 
> It doesn't make a difference on any sane architecture, but it's the
> RightThing(tm) to do.
> 
>             Linus
>
Hugh Dickins June 2, 2021, 3:49 a.m. UTC | #5
On Tue, 1 Jun 2021, Ming Lin wrote:

> Add a new mmap() flag, MAP_NOSIGBUS, that requests "don't SIGBUS on
> read beyond i_size" behavior. The flag is only allowed for read-only
> shmem mappings.
> 
> If you use MAP_NOSIGBUS, and you access pages that don't have a backing
> store, you will get zero pages, and they will NOT BE SYNCHRONIZED with
> the backing store possibly later being updated.
> 
> Any user that uses MAP_NOSIGBUS had better just accept that it's not
> compatible with expanding the shmem backing store later.
> 
> Signed-off-by: Ming Lin <mlin@kernel.org>

I disagree with Linus on this: I think it's a mistake,
and is being targeted at tmpfs to avoid wider scrutiny.
Though I have a more constructive suggestion under your mmap.c mod.

I've added linux-fsdevel and linux-api to the Cc list:
linux-api definitely needed to approve any MAP_NOSIGBUS semantics;
linux-fsdevel shouldn't be affected, but they need to know about it.

The prior discussion on "Sealed memfd & no-fault mmap" is at
https://lore.kernel.org/linux-mm/vs1Us2sm4qmfvLOqNat0-r16GyfmWzqUzQ4KHbXJwEcjhzeoQ4sBTxx7QXDG9B6zk5AeT7FsNb3CSr94LaKy6Novh1fbbw8D_BBxYsbPLms=@emersion.fr/

I've not yet seen a response from Simon Ser, as to whether this
kind of "opaque blob of zeroes" implementation would be of any
use to Wayland: you expected it to be a problem, and we shouldn't
waste any time on it if it's not going to be useful to someone.

Maybe there will be other takers (certainly SIGBUS is unpopular).

> ---
>  include/linux/mm.h                     |  2 ++
>  include/linux/mman.h                   |  1 +
>  include/uapi/asm-generic/mman-common.h |  1 +
>  mm/mmap.c                              |  3 +++
>  mm/shmem.c                             | 17 ++++++++++++++++-
>  5 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e9d67bc..5d0e0dc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  # define VM_UFFD_MINOR		VM_NONE
>  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
>  
> +#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on out-of-band shmem read */

"out-of-band shmem read" means nothing to me: "Do not SIGBUS on fault".

> +
>  /* Bits set in the VMA until the stack is in its final location */
>  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
>  
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index b2cbae9..c966b08 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
>  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
>  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
> +	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
>  	       arch_calc_vm_flag_bits(flags);
>  }
>  
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index f94f65d..55f4be0 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -29,6 +29,7 @@
>  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
>  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
>  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on out-of-band shmem read */

Ditto.

>  
>  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 096bba4..69cd856 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1419,6 +1419,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (!len)
>  		return -EINVAL;
>  
> +	if ((flags & MAP_NOSIGBUS) && ((prot & PROT_WRITE) || !shmem_file(file)))
> +		return -EINVAL;
> +

No, for several reasons.

This has nothing to do with shmem really, that's just where this patch
hacks it in - and where you have a first user in mind.  If this goes
forward, please modify mm/memory.c not mm/shmem.c, so that a
VM_FAULT_SIGBUS result on a fault in a VM_NOSIGBUS vma maps in the
zero page instead.

(prot & PROT_WRITE) tells you about the mmap() flags, but says nothing
about what mprotect() could do later on.  Look out for VM_SHARED and
VM_MAYSHARE and VM_MAYWRITE further down; and beware the else (!file)
block below them, shared anonymous would need more protection too.

Constructive comment: I guess much of my objection to this feature
comes from allowing it in the MAP_SHARED case.  If you restrict it
to MAP_PRIVATE mapping of file, then it's less objectionable, and
you won't have to worry (so much?) about write protection.  Copy
on write is normal there, and it's well established that subsequent
changes in the file will not be shared; you'd just be extending that
behaviour from writes to sigbusy reads.

And by restricting to MAP_PRIVATE, you would allow for adding a
proper MAP_SHARED implementation later, if it's thought useful
(that being the implementation which can subsequently unmap a
zero page to let new page cache be mapped).

>  	/*
>  	 * Does the application expect PROT_READ to imply PROT_EXEC?
>  	 *
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5d46611..5d15b08 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1812,7 +1812,22 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  repeat:
>  	if (sgp <= SGP_CACHE &&
>  	    ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) {
> -		return -EINVAL;
> +		if (!vma || !(vma->vm_flags & VM_NOSIGBUS))
> +			return -EINVAL;
> +
> +		vma->vm_flags |= VM_MIXEDMAP;

No.  Presumably you hit the BUG_ON(mmap_read_trylock(vma->vm_mm))
in vm_insert_page(), so decided to modify the vm_flags here: no,
that BUG is saying you need mmap_write_lock() to write vm_flags.

And I have no idea of the ramifications of shmem in a VM_MIXEDMAP
vma; perhaps it works out fine, but I'd have to research that.
I'd rather not.

> +		/*
> +		 * Get zero page for MAP_NOSIGBUS mapping, which isn't
> +                 * coherent wrt shmem contents that are expanded and
> +		 * filled in later.
> +		 */
> +		error = vm_insert_page(vma, (unsigned long)vmf->address,
> +					ZERO_PAGE(0));
> +		if (error)
> +			return error;
> +
> +		*fault_type = VM_FAULT_NOPAGE;
> +		return 0;

But there are other ways in which shmem_getpage_gfp() can fail and
shmem_fault() end up returning VM_FAULT_SIGBUS.  Notably -ENOSPC.
It's trivial for someone to pass the MAP_NOSIGBUS user the fd of a
sparse file in a full filesystem, causing SIGBUS on access despite
MAP_NOSIGBUS.  On shmem or some other filesystem.

I say the VM_FAULT_SIGBUS->map-in-zero-page handling should be back
in mm/memory.c, where it calls ->fault(): where others can review it.

One other thing while it crosses my mind.  You'll need to decide
what truncating or hole-punching the file does to the zero pages
in its userspace mappings.  I may turn out wrong, but I think you'll
find that truncation removes them, but hole-punch leaves them, and
ought to be modified to remove them too (it's a matter of how the
"even_cows" arg to unmap_mapping_range() is treated).

Hugh

>  	}
>  
>  	sbinfo = SHMEM_SB(inode->i_sb);
> -- 
> 1.8.3.1
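
Two of the points in the review above can be checked from userspace: a
(prot & PROT_WRITE) test at mmap() time says nothing about a later
mprotect(), and writes to a MAP_PRIVATE file mapping are not shared with
the file. The following is a minimal sketch; the function name and the
scratch file path are hypothetical, and a regular temporary file is
assumed to behave like shmem in this respect.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a PROT_READ, MAP_PRIVATE file mapping could be made
 * writable afterwards and the write stayed private to the mapping,
 * 0 if not, -1 if the setup failed.
 */
int private_readonly_mapping_can_become_writable(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/nosigbus-demo-XXXXXX";	/* hypothetical scratch file */
	char from_file = 0x7f;
	int result = -1;
	char *map;
	int fd;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) != 0)
		goto out_close;

	/* The mapping starts out read-only. */
	map = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* A later mprotect() can still add write permission. */
	if (mprotect(map, page, PROT_READ | PROT_WRITE) != 0) {
		result = 0;
		goto out_unmap;
	}

	map[0] = 'x';	/* copy on write: only the private copy changes */
	if (pread(fd, &from_file, 1, 0) != 1)
		goto out_unmap;
	result = (map[0] == 'x' && from_file == 0) ? 1 : 0;

out_unmap:
	munmap(map, page);
out_close:
	close(fd);
	return result;
}
```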
Chen, Rong A June 2, 2021, 9:30 a.m. UTC | #6
Hi Ming,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on arm64/for-next/core powerpc/next asm-generic/master linus/master v5.13-rc4]
[cannot apply to hnaz-linux-mm/master tip/x86/core next-20210601]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ming-Lin/mm-adds-MAP_NOSIGBUS-extension-for-shmem-read/20210602-072403
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dd860052c99b1e088352bdd4fb7aef46f8d2ef47
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
cd tools/perf && ./check-headers.sh

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


perfheadercheck warnings: (new ones prefixed by >>)
>> Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h':   32> #define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on out-of-band shmem read */

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Ming Lin June 3, 2021, 12:05 a.m. UTC | #7
On 6/1/2021 8:49 PM, Hugh Dickins wrote:

>> index 096bba4..69cd856 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1419,6 +1419,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>   	if (!len)
>>   		return -EINVAL;
>>   
>> +	if ((flags & MAP_NOSIGBUS) && ((prot & PROT_WRITE) || !shmem_file(file)))
>> +		return -EINVAL;
>> +
> 
> No, for several reasons.
> 
> This has nothing to do with shmem really, that's just where this patch
> hacks it in - and where you have a first user in mind.  If this goes
> forward, please modify mm/memory.c not mm/shmem.c, so that a
> VM_FAULT_SIGBUS result on a fault in a VM_NOSIGBUS vma maps in the
> zero page instead.
> 
> (prot & PROT_WRITE) tells you about the mmap() flags, but says nothing
> about what mprotect() could do later on.  Look out for VM_SHARED and
> VM_MAYSHARE and VM_MAYWRITE further down; and beware the else (!file)
> block below them, shared anonymous would need more protection too.
> 
> Constructive comment: I guess much of my objection to this feature
> comes from allowing it in the MAP_SHARED case.  If you restrict it
> to MAP_PRIVATE mapping of file, then it's less objectionable, and
> you won't have to worry (so much?) about write protection.  Copy
> on write is normal there, and it's well established that subsequent
> changes in the file will not be shared; you'd just be extending that
> behaviour from writes to sigbusy reads.
> 
> And by restricting to MAP_PRIVATE, you would allow for adding a
> proper MAP_SHARED implementation later, if it's thought useful
> (that being the implementation which can subsequently unmap a
> zero page to let new page cache be mapped).

This is what I wrote so far.

---
  include/linux/mm.h                     |  2 ++
  include/linux/mman.h                   |  1 +
  include/uapi/asm-generic/mman-common.h |  1 +
  mm/memory.c                            | 12 ++++++++++++
  mm/mmap.c                              |  4 ++++
  5 files changed, 20 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e9d67bc..af9e277 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
  # define VM_UFFD_MINOR		VM_NONE
  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
  
+#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
+
  /* Bits set in the VMA until the stack is in its final location */
  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
  
diff --git a/include/linux/mman.h b/include/linux/mman.h
index b2cbae9..c966b08 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
  	       arch_calc_vm_flag_bits(flags);
  }
  
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..a2a5333 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@
  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
  
  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
  					 * uninitialized */
diff --git a/mm/memory.c b/mm/memory.c
index eff2a47..7195dac 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3676,6 +3676,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
  	}
  
  	ret = vma->vm_ops->fault(vmf);
+	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
+		/*
+		 * Get zero page for MAP_NOSIGBUS mapping, which isn't
+		 * coherent wrt shmem contents that are expanded and
+		 * filled in later.
+		 */
+		vma->vm_flags |= VM_MIXEDMAP;
+		if (!vm_insert_page(vma, (unsigned long)vmf->address,
+				ZERO_PAGE(vmf->address)))
+			return VM_FAULT_NOPAGE;
+	}
+
  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
  			    VM_FAULT_DONE_COW)))
  		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 096bba4..74fb49a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
  	if (!len)
  		return -EINVAL;
  
+	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */
+	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
+		return -EINVAL;
+
  	/*
  	 * Does the application expect PROT_READ to imply PROT_EXEC?
  	 *

> 
>>   	/*
>>   	 * Does the application expect PROT_READ to imply PROT_EXEC?
>>   	 *
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 5d46611..5d15b08 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1812,7 +1812,22 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>>   repeat:
>>   	if (sgp <= SGP_CACHE &&
>>   	    ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) {
>> -		return -EINVAL;
>> +		if (!vma || !(vma->vm_flags & VM_NOSIGBUS))
>> +			return -EINVAL;
>> +
>> +		vma->vm_flags |= VM_MIXEDMAP;
> 
> No.  Presumably you hit the BUG_ON(mmap_read_trylock(vma->vm_mm))
> in vm_insert_page(), so decided to modify the vm_flags here: no,
> that BUG is saying you need mmap_write_lock() to write vm_flags.

But the comments above vm_insert_page() told me to set VM_MIXEDMAP on the vma:

  * Usually this function is called from f_op->mmap() handler
  * under mm->mmap_lock write-lock, so it can change vma->vm_flags.
  * Caller must set VM_MIXEDMAP on vma if it wants to call this
  * function from other places, for example from page-fault handler.

> 
> One other thing while it crosses my mind.  You'll need to decide
> what truncating or hole-punching the file does to the zero pages
> in its userspace mappings.  I may turn out wrong, but I think you'll
> find that truncation removes them, but hole-punch leaves them, and
> ought to be modified to remove them too (it's a matter of how the
> "even_cows" arg to unmap_mapping_range() is treated).

I did a quick test: after inserting zero pages, it seems that truncation
also leaves the mappings.

I'm still reading code to learn this part ...
Hugh Dickins June 3, 2021, 12:46 a.m. UTC | #8
On Wed, 2 Jun 2021, Ming Lin wrote:
> 
> This is what I wrote so far.
> 
> ---
>  include/linux/mm.h                     |  2 ++
>  include/linux/mman.h                   |  1 +
>  include/uapi/asm-generic/mman-common.h |  1 +
>  mm/memory.c                            | 12 ++++++++++++
>  mm/mmap.c                              |  4 ++++
>  5 files changed, 20 insertions(+)

I have not looked at the rest, just looking at mm/memory.c:

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3676,6 +3676,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  	}
>   	ret = vma->vm_ops->fault(vmf);
> +	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
> +		/*
> +		 * Get zero page for MAP_NOSIGBUS mapping, which isn't
> +		 * coherent wrt shmem contents that are expanded and
> +		 * filled in later.
> +		 */
> +		vma->vm_flags |= VM_MIXEDMAP;
> +		if (!vm_insert_page(vma, (unsigned long)vmf->address,
> +				ZERO_PAGE(vmf->address)))
> +			return VM_FAULT_NOPAGE;
> +	}
> +
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
>  			    VM_FAULT_DONE_COW)))
>  		return ret;

Sorry, I directed you to mm/memory.c without indicating what's
appropriate here.  Please don't attempt to use VM_MIXEDMAP and
vm_insert_page(): they're for special driver mmaps, they're no
better here than they were in mm/shmem.c.

It's do_anonymous_page()'s business to map in the zero page on
read fault (see "my_zero_pfn(vmf->address)" in there), or fill
a freshly allocated page with zeroes on write fault - and now
you're sticking to MAP_PRIVATE, write faults in VM_WRITE areas
are okay for VM_NOSIGBUS.

Ideally you can simply call do_anonymous_page() from __do_fault()
in the VM_FAULT_SIGBUS on VM_NOSIGBUS case.  That's what to start
from anyway: but look to see if there's state to be adjusted to
achieve that; and it won't be surprising if somewhere down in
do_anonymous_page() or something it calls, there's a BUG on it
being called when vma->vm_file is set, or something like that.
May need some tweaking.

Hugh
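
Seen from userspace, the anonymous-page behaviour described above looks
like this: a first read of an untouched private anonymous page returns
zeroes, and a later write lands in a page of the process's own. This is a
minimal sketch with a hypothetical function name. It only observes the
result and cannot show which physical page the kernel mapped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if an untouched anonymous page read as zero and then
 * accepted a write, 0 if not, -1 if the mapping could not be created.
 */
int anonymous_page_reads_zero_then_accepts_write(void)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile char *map;
	char before;
	int result;

	assert(page > 0);
	map = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	before = map[0];	/* read fault on an untouched page */
	map[0] = 'x';		/* write fault: the page becomes private data */
	result = (before == 0 && map[0] == 'x' && map[1] == 0) ? 1 : 0;

	munmap((void *)map, page);
	return result;
}
```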
Linus Torvalds June 3, 2021, 6:25 p.m. UTC | #9
On Wed, Jun 2, 2021 at 5:46 PM Hugh Dickins <hughd@google.com> wrote:
>
> Ideally you can simply call do_anonymous_page() from __do_fault()
> in the VM_FAULT_SIGBUS on VM_NOSIGBUS case.

Heh.

We're actually then back to my original patch.

That one doesn't handle shared mappings (even read-only ones), for the
simple reason that do_anonymous_page() refuses to insert anonymous
pages into a shared mapping, and has

        /* File mapping without ->vm_ops ? */
        if (vma->vm_flags & VM_SHARED)
                return VM_FAULT_SIGBUS;

at the very top.

But yes, if we just remove that check, I think my original patch
should actually "JustWork(tm)".

I'm attaching it again, with old name and old commentary (ie that

    /* FIXME! We don't have a VM_NOFAULT bit */

should just be replaced with that VM_NOSIGBUS bit instead, and the
#if'ed out region should be enabled).

Oh, and we need to think hard about one more case: mprotect().

In particular, I think the attached patch fails horribly for the case
of a shared mapping that starts out read-only, then inserts a zero
page, then somebody does mprotect(PROT_WRITE), and then writes to the
page. I haven't checked what the write protect fault handler does, but
I think that for a shared mapping it will just make the page dirty and
writable.

Which would be horribly wrong for VM_NOSIGBUS.

So that support infrastructure that adds MAP_NOSIGBUS, and checks that
it is only done on a read-only mapping, also has to make sure that it
clears the VM_MAYWRITE bit when it sets VM_NOSIGBUS.

That way mprotect can't then later make it writable.

Hugh, comments on this approach?

Again: this patch is my *OLD* one, I didn't try to update it to the
new world order. It requires

 - Ming's MAP_NOSIGBUS code

 - removal of that "File mapping without ->vm_ops" case

 - that FIXME fixed and name updated

 - and that VM_MAYWRITE clearing if VM_NOSIGBUS is set, to avoid the
mprotect issue.

Hmm?

                  Linus
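
The VM_MAYWRITE idea can be sketched as plain flag arithmetic. This is a
userspace model under stated assumptions, not kernel code: it assumes
each VM_MAY* bit sits four bits above its VM_* counterpart and that
mprotect() refuses any permission whose VM_MAY* bit is clear. The DEMO_
names, including the DEMO_VM_NOSIGBUS bit position, are hypothetical.

```c
#include <assert.h>

#define DEMO_VM_READ		0x01u
#define DEMO_VM_WRITE		0x02u
#define DEMO_VM_EXEC		0x04u
#define DEMO_VM_MAYREAD		0x10u
#define DEMO_VM_MAYWRITE	0x20u
#define DEMO_VM_MAYEXEC		0x40u
#define DEMO_VM_NOSIGBUS	0x100u	/* hypothetical bit for this model */
#define DEMO_VM_ACCESS		(DEMO_VM_READ | DEMO_VM_WRITE | DEMO_VM_EXEC)

/* The proposal: setting VM_NOSIGBUS also clears VM_MAYWRITE. */
unsigned int demo_apply_nosigbus(unsigned int vm_flags)
{
	return (vm_flags | DEMO_VM_NOSIGBUS) & ~DEMO_VM_MAYWRITE;
}

/* Returns 1 if changing the protection to new_prot would be allowed. */
int demo_mprotect_allowed(unsigned int vm_flags, unsigned int new_prot)
{
	unsigned int may = (vm_flags >> 4) & DEMO_VM_ACCESS;

	assert((new_prot & ~DEMO_VM_ACCESS) == 0);
	return (new_prot & ~may) == 0;
}
```

For a vma that starts with DEMO_VM_READ and all three DEMO_VM_MAY* bits,
demo_mprotect_allowed() accepts a request for read and write, and rejects
the same request once demo_apply_nosigbus() has been applied.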
Hugh Dickins June 3, 2021, 7:07 p.m. UTC | #10
On Thu, 3 Jun 2021, Linus Torvalds wrote:
> On Wed, Jun 2, 2021 at 5:46 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > Ideally you can simply call do_anonymous_page() from __do_fault()
> > in the VM_FAULT_SIGBUS on VM_NOSIGBUS case.
> 
> Heh.
> 
> We're actually then back to my original patch.
> 
> That one doesn't handle shared mappings (even read-only ones), for the
> simple reason that do_anonymous_page() refuses to insert anonymous
> pages into a shared mapping, and has
> 
>         /* File mapping without ->vm_ops ? */
>         if (vma->vm_flags & VM_SHARED)
>                 return VM_FAULT_SIGBUS;
> 
> at the very top.
> 
> But yes, if we just remove that check, I think my original patch
> should actually "JustWork(tm)".

But no!

Sorry, I don't have time for this at present, so haven't looked at
your original patch.

But the point that we've arrived at, that I'm actually now fairly
happy with, is do *not* permit MAP_NOSIGBUS on MAP_SHARED mappings.

I didn't check the placement yet, easy to get wrong, but I believe
Ming Lin is now enforcing that over at the mmap() end.

On a MAP_PRIVATE mapping, the nasty opaque blob of zeroes can
claim some precedent in what already happens with COW'ed pages.

Which leaves MAP_NOSIGBUS on MAP_SHARED as currently unsupported,
perhaps never supported on anything, perhaps one day supported on
shmem; but if it's ever supported then that one will naturally be
transparent to future changes in page cache - we call that "shared".

Of course, internally, there's the in-between case of MAP_SHARED
without PROT_WRITE and without writable fd: VM_MAYSHARE without
VM_SHARED or VM_MAYWRITE.  We *could* let that one accept
MAP_NOSIGBUS, but who wants to write the manpage for it?

Please stick to MAP_PRIVATE: that's good enough.

> 
> I'm attaching it again, with old name and old commentary (ie that
> 
>     /* FIXME! We don't have a VM_NOFAULT bit */
> 
> should just be replaced with that VM_NOSIGBUS bit instead, and the
> #if'ed out region should be enabled).
> 
> Oh, and we need to think hard about one more case: mprotect().
> 
> In particular, I think the attached patch fails horribly for the case
> of a shared mapping that starts out read-only, then inserts a zero
> page, then somebody does mprotect(PROT_WRITE), and then writes to the
> page. I haven't checked what the write protect fault handler does, but
> I think that for a shared mapping it will just make the page dirty and
> writable.

Obviously the finished patch will need to be scrutinized carefully, but
I think the mprotect() questions vanish when restricted to MAP_PRIVATE.

> 
> Which would be horribly wrong for VM_NOSIGBUS.
> 
> So that support infrastructure that adds MAP_NOSIGBUS, and checks that
> it is only done on a read-only mapping, also has to make sure that it
> clears the VM_MAYWRITE bit when it sets VM_NOSIGBUS.
> 
> That way mprotect can't then later make it writable.
> 
> Hugh, comments on this approach?

Comments above, just stick to MAP_PRIVATE.

Hugh

> 
> Again: this patch is my *OLD* one, I didn't try to update it to the
> new world order. It requires
> 
>  - Ming's MAP_NOSIGBUS code
> 
>  - removal of that "File mapping without ->vm_ops" case
> 
>  - that FIXME fixed and name updated
> 
>  - and that VM_MAYWRITE clearing if VM_NOSIGBUS is set, to avoid the
> mprotect issue.
> 
> Hmm?
> 
>                   Linus
Linus Torvalds June 3, 2021, 7:12 p.m. UTC | #11
On Thu, Jun 3, 2021 at 12:07 PM Hugh Dickins <hughd@google.com> wrote:
>
> But the point that we've arrived at, that I'm actually now fairly
> happy with, is do *not* permit MAP_NOSIGBUS on MAP_SHARED mappings.

Yeah, if that's sufficient, then that original patch should just work as-is.

But there was some reason why people didn't like that patch
originally, and I think it was literally about how it only worked on
private mappings (the "we don't have a flag for it in the vm_flags"
part was just a small detail).

I guess that objection ended up changing over time.

            Linus
Linus Torvalds June 3, 2021, 7:15 p.m. UTC | #12
On Thu, Jun 3, 2021 at 12:12 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yeah, if that's sufficient, then that original patch should just work as-is.

To clarify: it obviously needs the VM_xyz flags things, but the
VM_SHARED check in do_anonymous_page() is fine, and the whole issue
with VM_MAYWRITE is entirely moot.

MAP_PRIVATE works fine with zero pages even when writable - they get
COW'ed properly, of course.

               Linus
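
For readers following along, the copy-on-write behaviour of MAP_PRIVATE
mentioned above can be observed from userspace on an unpatched kernel.
The following is only a minimal sketch: it does not exercise the proposed
VM_NOSIGBUS path, the function name and the temporary file name are made
up for illustration, and it assumes a writable /tmp. It maps a sparse
file privately, reads a zero byte, writes through the mapping, and then
checks that the file itself still holds zero.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a write through a MAP_PRIVATE mapping stayed private
 * (the file still reads back as zero), 0 if it did not, and -1 on a
 * setup error. Hypothetical helper, not part of any patch in this thread.
 */
int private_write_is_cowed(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* illustrative temp name */
	long page = sysconf(_SC_PAGESIZE);
	char in_file = 1;
	int result = -1;
	char *p;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file lives only as long as fd */

	/* One page of hole: every byte reads back as zero. */
	if (ftruncate(fd, page) != 0)
		goto out;

	p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	if (p[0] == 0) {		/* read fault: zero-filled data */
		p[0] = 'B';		/* write fault: private copy is made */
		if (pread(fd, &in_file, 1, 0) == 1)
			result = (p[0] == 'B' && in_file == 0);
	}
	munmap(p, page);
out:
	close(fd);
	return result;
}
```

Calling private_write_is_cowed() from a small main() is expected to
return 1 on Linux: the mapping sees 'B' while the file still contains
zero, which is the property that makes writable MAP_PRIVATE mappings
unproblematic here.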
Andy Lutomirski June 3, 2021, 7:24 p.m. UTC | #13
> On Jun 3, 2021, at 12:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Thu, Jun 3, 2021 at 12:07 PM Hugh Dickins <hughd@google.com> wrote:
>> 
>> But the point that we've arrived at, that I'm actually now fairly
>> happy with, is do *not* permit MAP_NOSIGBUS on MAP_SHARED mappings.
> 
> Yeah, if that's sufficient, then that original patch should just work as-is.
> 
> But there was some reason why people didn't like that patch
> originally, and I think it was literally about how it only worked on
> private mappings (the "we don't have a flag for it in the vm_flags"
> part was just a small detail).
> 
> I guess that objection ended up changing over time.
> 
> 

I don’t understand the use case well enough to comment on whether MAP_PRIVATE is sufficient, but I’m with Hugh: if this feature is implemented for MAP_SHARED, it should be fully coherent.
Simon Ser June 3, 2021, 7:35 p.m. UTC | #14
On Thursday, June 3rd, 2021 at 9:24 PM, Andy Lutomirski <luto@amacapital.net> wrote:

> I don’t understand the use case well enough to comment on whether MAP_PRIVATE
> is sufficient, but I’m with Hugh: if this feature is implemented for
> MAP_SHARED, it should be fully coherent.

I've tried to explain what we'd need from user-space PoV in [1].
tl;dr the MAP_PRIVATE restriction would get us pretty far, even if it
won't allow us to have all of the bells and whistles.

[1]: https://lore.kernel.org/linux-mm/vs1Us2sm4qmfvLOqNat0-r16GyfmWzqUzQ4KHbXJwEcjhzeoQ4sBTxx7QXDG9B6zk5AeT7FsNb3CSr94LaKy6Novh1fbbw8D_BBxYsbPLms=@emersion.fr/T/#mb321a8d39e824740877ba95f1df780ffd52c3862
Ming Lin June 3, 2021, 7:57 p.m. UTC | #15
On 6/2/2021 5:46 PM, Hugh Dickins wrote:
> 
> It's do_anonymous_page()'s business to map in the zero page on
> read fault (see "my_zero_pfn(vmf->address)" in there), or fill
> a freshly allocated page with zeroes on write fault - and now
> you're sticking to MAP_PRIVATE, write faults in VM_WRITE areas
> are okay for VM_NOSIGBUS.
> 
> Ideally you can simply call do_anonymous_page() from __do_fault()
> in the VM_FAULT_SIGBUS on VM_NOSIGBUS case.  That's what to start
> from anyway: but look to see if there's state to be adjusted to
> achieve that; and it won't be surprising if somewhere down in
> do_anonymous_page() or something it calls, there's a BUG on it
> being called when vma->vm_file is set, or something like that.
> May need some tweaking.

do_anonymous_page() works nicely for read fault and write fault.
I didn't see any BUG() thing in my test.

But I'm still struggling with how to do "punch hole should remove the mapping of zero page".
Here is the hack I have now.

diff --git a/mm/memory.c b/mm/memory.c
index 46ecda5..6b5a897 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1241,7 +1241,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                         struct page *page;
  
                         page = vm_normal_page(vma, addr, ptent);
-                       if (unlikely(details) && page) {
+                       if (unlikely(details) && page && !(vma->vm_flags & VM_NOSIGBUS)) {
                                 /*
                                  * unmap_shared_mapping_pages() wants to
                                  * invalidate cache without truncating:


And the other parts of the patch are as follows:

----

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e9d67bc..af9e277 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
  # define VM_UFFD_MINOR		VM_NONE
  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
  
+#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
+
  /* Bits set in the VMA until the stack is in its final location */
  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
  
diff --git a/include/linux/mman.h b/include/linux/mman.h
index b2cbae9..c966b08 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
  	       arch_calc_vm_flag_bits(flags);
  }
  
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..a2a5333 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@
  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
  
  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
  					 * uninitialized */
diff --git a/mm/memory.c b/mm/memory.c
index eff2a47..46ecda5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
  	}
  
  	ret = vma->vm_ops->fault(vmf);
+	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
+		/*
+		 * For MAP_NOSIGBUS mapping, map in the zero page on read fault
+		 * or fill a freshly allocated page with zeroes on write fault
+		 */
+		ret = do_anonymous_page(vmf);
+		if (!ret)
+			ret = VM_FAULT_NOPAGE;
+		return ret;
+	}
+
  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
  			    VM_FAULT_DONE_COW)))
  		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 096bba4..74fb49a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
  	if (!len)
  		return -EINVAL;
  
+	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */
+	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
+		return -EINVAL;
+
  	/*
  	 * Does the application expect PROT_READ to imply PROT_EXEC?
  	 *
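
A note on the two flag-handling hunks above, written as a userspace model
rather than kernel code. Everything prefixed DEMO_ or demo_ is local to
this sketch. The MAP_* values mirror the uapi headers and the hunk above,
and demo_calc_vm_trans() is a from-memory model of the kernel's
_calc_vm_trans() macro, so treat it as an assumption rather than the
exact definition. The sketch shows (a) how the 0x200000 mmap flag bit is
scaled up to bit 38 of vm_flags, and (b) that testing
"!(flags & MAP_PRIVATE)" does not by itself reject MAP_SHARED_VALIDATE,
whose value 0x03 includes the MAP_PRIVATE bit. Comparing against
MAP_TYPE would be one way to tighten that. Whether it matters in
practice depends on the later checks in do_mmap(), which may reject the
combination for other reasons.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirrored from the uapi headers and from the hunks above. */
#define DEMO_MAP_SHARED			0x01
#define DEMO_MAP_PRIVATE		0x02
#define DEMO_MAP_SHARED_VALIDATE	0x03
#define DEMO_MAP_TYPE			0x0f
#define DEMO_MAP_NOSIGBUS		0x200000
#define DEMO_VM_NOSIGBUS		(1ULL << 38)	/* VM_FLAGS_BIT(38) */

/* Model of _calc_vm_trans(): move one flag bit from bit1 to bit2. */
uint64_t demo_calc_vm_trans(uint64_t x, uint64_t bit1, uint64_t bit2)
{
	if (!bit1 || !bit2)
		return 0;
	if (bit1 <= bit2)
		return (x & bit1) * (bit2 / bit1);
	return (x & bit1) / (bit1 / bit2);
}

/* The restriction exactly as written in the mm/mmap.c hunk. */
int demo_check_as_posted(unsigned long flags)
{
	if ((flags & DEMO_MAP_NOSIGBUS) && !(flags & DEMO_MAP_PRIVATE))
		return -EINVAL;
	return 0;
}

/* A possible alternative: compare the mapping type as a whole. */
int demo_check_by_type(unsigned long flags)
{
	if ((flags & DEMO_MAP_NOSIGBUS) &&
	    (flags & DEMO_MAP_TYPE) != DEMO_MAP_PRIVATE)
		return -EINVAL;
	return 0;
}
```

Both checks accept MAP_PRIVATE | MAP_NOSIGBUS and reject
MAP_SHARED | MAP_NOSIGBUS. They differ only for
MAP_SHARED_VALIDATE | MAP_NOSIGBUS, which the posted form lets through
and the MAP_TYPE form rejects. The translation model also assumes a
64-bit vm_flags, since bit 38 does not fit in 32 bits.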

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e9d67bc..5d0e0dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -373,6 +373,8 @@  int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 # define VM_UFFD_MINOR		VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 
+#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on out-of-band shmem read */
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index b2cbae9..c966b08 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -154,6 +154,7 @@  static inline bool arch_validate_flags(unsigned long flags)
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
 	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
 	       arch_calc_vm_flag_bits(flags);
 }
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..55f4be0 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@ 
 #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
 #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
 #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on out-of-band shmem read */
 
 #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
diff --git a/mm/mmap.c b/mm/mmap.c
index 096bba4..69cd856 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1419,6 +1419,9 @@  unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (!len)
 		return -EINVAL;
 
+	if ((flags & MAP_NOSIGBUS) && ((prot & PROT_WRITE) || !shmem_file(file)))
+		return -EINVAL;
+
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
 	 *
diff --git a/mm/shmem.c b/mm/shmem.c
index 5d46611..5d15b08 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1812,7 +1812,22 @@  static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 repeat:
 	if (sgp <= SGP_CACHE &&
 	    ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) {
-		return -EINVAL;
+		if (!vma || !(vma->vm_flags & VM_NOSIGBUS))
+			return -EINVAL;
+
+		vma->vm_flags |= VM_MIXEDMAP;
+		/*
+		 * Get zero page for MAP_NOSIGBUS mapping, which isn't
+		 * coherent wrt shmem contents that are expanded and
+		 * filled in later.
+		 */
+		error = vm_insert_page(vma, (unsigned long)vmf->address,
+					ZERO_PAGE(0));
+		if (error)
+			return error;
+
+		*fault_type = VM_FAULT_NOPAGE;
+		return 0;
 	}
 
 	sbinfo = SHMEM_SB(inode->i_sb);