mbox series

[v2,00/15] Avoid MAP_FIXED gap exposure

Message ID 20240625191145.3382793-1-Liam.Howlett@oracle.com (mailing list archive)
Headers show
Series Avoid MAP_FIXED gap exposure | expand

Message

Liam R. Howlett June 25, 2024, 7:11 p.m. UTC
It is now possible to walk the vma tree using the rcu read locks and is
beneficial to do so to reduce lock contention.  Doing so while a
MAP_FIXED mapping is executing means that a reader may see a gap in the
vma tree that should never logically exist - and does not when using the
mmap lock in read mode.  The temporal gap exists because mmap_region()
calls munmap() prior to installing the new mapping.

This patch set stops rcu readers from seeing the temporal gap by
splitting up the munmap() function into two parts.  The first part
prepares the vma tree for modifications by doing the necessary splits
and tracks the vmas marked for removal in a side tree.  The second part
completes the munmapping of the vmas after the vma tree has been
overwritten (either by a MAP_FIXED replacement vma or by a NULL in the
munmap() case).

Please note that rcu walkers will still be able to see a temporary state
of split vmas that may be in the process of being removed, but the
temporal gap will not be exposed.  vma_start_write() are called on both
parts of the split vma, so this state is detectable.

RFC: https://lore.kernel.org/linux-mm/20240531163217.1584450-1-Liam.Howlett@oracle.com/
v1: https://lore.kernel.org/linux-mm/20240611180200.711239-1-Liam.Howlett@oracle.com/

Changes since v1:
- Fixed issue with expanding vmas clearing the incorrect PTE range,
  Thanks Bert Karwatzki and Jiri Olsa for testing linux-next
- Separated patches into smaller portions for easier reviewing
- Replaced open coded shifting of length with PHYS_PFN()
- Changed security calculation (Cc Kees on this change)
- Changed the vma iterator use to point to the targeted area instead of
  the previous vma
- Remove page table entries prior to fully removing the vmas, if a
  driver may do mappings
- Tested with stress-ng and split/joining of VMA at the start, end, and
  middle of a vma.

Liam R. Howlett (15):
  mm/mmap: Correctly position vma_iterator in __split_vma()
  mm/mmap: Introduce abort_munmap_vmas()
  mm/mmap: Introduce vmi_complete_munmap_vmas()
  mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap()
  mm/mmap: Introduce vma_munmap_struct for use in munmap operations
  mm/mmap: Change munmap to use vma_munmap_struct() for accounting and
    surrounding vmas
  mm/mmap: Extract validate_mm() from vma_complete()
  mm/mmap: Inline munmap operation in mmap_region()
  mm/mmap: Expand mmap_region() munmap call
  mm/mmap: Reposition vma iterator in mmap_region()
  mm/mmap: Track start and end of munmap in vma_munmap_struct
  mm/mmap: Avoid zeroing vma tree in mmap_region()
  mm/mmap: Use PHYS_PFN in mmap_region()
  mm/mmap: Use vms accounted pages in mmap_region()
  mm/mmap: Move may_expand_vm() check in mmap_region()

 mm/internal.h |  25 +++
 mm/mmap.c     | 449 +++++++++++++++++++++++++++++++-------------------
 2 files changed, 304 insertions(+), 170 deletions(-)

Comments

Andrew Morton June 26, 2024, 8:58 p.m. UTC | #1
On Tue, 25 Jun 2024 15:11:30 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:

> It is now possible to walk the vma tree using the rcu read locks and is
> beneficial to do so to reduce lock contention.  Doing so while a
> MAP_FIXED mapping is executing means that a reader may see a gap in the
> vma tree that should never logically exist - and does not when using the
> mmap lock in read mode.  The temporal gap exists because mmap_region()
> calls munmap() prior to installing the new mapping.

What are the consequences when this race hits?  IOW, why do we need to
change anything?
Liam R. Howlett June 27, 2024, 1:15 a.m. UTC | #2
* Andrew Morton <akpm@linux-foundation.org> [240626 16:59]:
> On Tue, 25 Jun 2024 15:11:30 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> 
> > It is now possible to walk the vma tree using the rcu read locks and is
> > beneficial to do so to reduce lock contention.  Doing so while a
> > MAP_FIXED mapping is executing means that a reader may see a gap in the
> > vma tree that should never logically exist - and does not when using the
> > mmap lock in read mode.  The temporal gap exists because mmap_region()
> > calls munmap() prior to installing the new mapping.
> 
> What are the consequences when this race hits?  IOW, why do we need to
> change anything?
> 

In the (near) future, we want to walk the vma tree to produce
/proc/<pid>/maps.  Without this change we will see the temporal gap and
expose it to the user.  This series was initially sent to Suren as part
of his patch set.

We also have the new interface for an ioctl request to a vma at or above
an address. I had highlighted that an rcu reader would be ideal, but
proved too difficult at this time. These patches by Andrii are currently
not using the rcu reading method as this and a per-vma locking
clarification are needed.

Since there were two users for this code, I decided to send it out
before the other patches.

Thanks,
Liam
Andrew Morton June 27, 2024, 1:28 a.m. UTC | #3
On Wed, 26 Jun 2024 21:15:18 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:

> * Andrew Morton <akpm@linux-foundation.org> [240626 16:59]:
> > On Tue, 25 Jun 2024 15:11:30 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> > 
> > > It is now possible to walk the vma tree using the rcu read locks and is
> > > beneficial to do so to reduce lock contention.  Doing so while a
> > > MAP_FIXED mapping is executing means that a reader may see a gap in the
> > > vma tree that should never logically exist - and does not when using the
> > > mmap lock in read mode.  The temporal gap exists because mmap_region()
> > > calls munmap() prior to installing the new mapping.
> > 
> > What are the consequences when this race hits?  IOW, why do we need to
> > change anything?
> > 
> 
> In the (near) future, we want to walk the vma tree to produce
> /proc/<pid>/maps.  Without this change we will see the temporal gap and
> expose it to the user.  This series was initially sent to Suren as part
> of his patch set.
> 
> We also have the new interface for an ioctl request to a vma at or above
> an address. I had highlighted that an rcu reader would be ideal, but
> proved too difficult at this time. These patches by Andrii are currently
> not using the rcu reading method as this and a per-vma locking
> clarification are needed.
> 
> Since there were two users for this code, I decided to send it out
> before the other patches.

OK, thanks.  We're approaching -rc6 and things are a bit sketchy so I'm
inclined to hold this off until the next cycle, unless there's urgency?
Liam R. Howlett June 27, 2024, 1:31 p.m. UTC | #4
* Andrew Morton <akpm@linux-foundation.org> [240626 21:28]:
> On Wed, 26 Jun 2024 21:15:18 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> 
> > * Andrew Morton <akpm@linux-foundation.org> [240626 16:59]:
> > > On Tue, 25 Jun 2024 15:11:30 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> > > 
> > > > It is now possible to walk the vma tree using the rcu read locks and is
> > > > beneficial to do so to reduce lock contention.  Doing so while a
> > > > MAP_FIXED mapping is executing means that a reader may see a gap in the
> > > > vma tree that should never logically exist - and does not when using the
> > > > mmap lock in read mode.  The temporal gap exists because mmap_region()
> > > > calls munmap() prior to installing the new mapping.
> > > 
> > > What are the consequences when this race hits?  IOW, why do we need to
> > > change anything?
> > > 
> > 
> > In the (near) future, we want to walk the vma tree to produce
> > /proc/<pid>/maps.  Without this change we will see the temporal gap and
> > expose it to the user.  This series was initially sent to Suren as part
> > of his patch set.
> > 
> > We also have the new interface for an ioctl request to a vma at or above
> > an address. I had highlighted that an rcu reader would be ideal, but
> > proved too difficult at this time. These patches by Andrii are currently
> > not using the rcu reading method as this and a per-vma locking
> > clarification are needed.
> > 
> > Since there were two users for this code, I decided to send it out
> > before the other patches.
> 
> OK, thanks.  We're approaching -rc6 and things are a bit sketchy so I'm
> inclined to hold this off until the next cycle, unless there's urgency?
> 

There is no urgency.  I'm more than happy to hold off merging to get a
full cycle of testing.

Thanks,
Liam