
[RFC,v2,0/4] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

Message ID 20201026233150.371577-1-mike.kravetz@oracle.com
Series hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization


Mike Kravetz Oct. 26, 2020, 11:31 p.m. UTC
Commit c0d0381ade79 changed hugetlbfs to use i_mmap_rwsem for pmd
sharing synchronization.  This required a hugetlb-specific change to the
mm locking order: i_mmap_rwsem must be taken before the page lock.  This
is not a huge issue in hugetlb-specific code, but it becomes more
problematic in page migration and memory failure handling, where generic
mm code had to accommodate the changed lock ordering.  An ugly routine,
'hugetlb_page_mapping_lock_write', was added to help with these issues.
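The ordering rule can be illustrated with a toy check.  This is a
hypothetical user-space sketch, not kernel code (the kernel relies on
lockdep for this): the enum values and lock_order_ok() are invented for
illustration and simply rank i_mmap_rwsem before the page lock.  A
hugetlb fault, which takes i_mmap_rwsem and then the page lock, respects
the order; a generic path that already holds the page lock and then
blocks on i_mmap_rwsem would invert it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical lock ranks (not kernel identifiers) modeling the
 * hugetlb-specific order from c0d0381ade79: i_mmap_rwsem is acquired
 * before the page lock.
 */
enum lock_rank {
	RANK_I_MMAP_RWSEM = 1,
	RANK_PAGE_LOCK = 2,
};

/*
 * Return true if the locks in seq[0..n) are acquired in strictly
 * increasing rank, i.e. the sequence respects the hierarchy.
 */
static bool lock_order_ok(const enum lock_rank *seq, size_t n)
{
	assert(seq != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		if (seq[i] <= seq[i - 1])
			return false;	/* lock taken out of order */
	}
	return true;
}
```

With these ranks, the sequence {RANK_I_MMAP_RWSEM, RANK_PAGE_LOCK} is
accepted and {RANK_PAGE_LOCK, RANK_I_MMAP_RWSEM} is rejected.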

Recently, Hugh Dickins diagnosed a migration BUG as caused by code
introduced with hugetlb i_mmap_rwsem synchronization [1].  Subsequent
discussion in that thread pointed out additional problems in the code.

That discussion also suggested adding a rw_semaphore to the hugetlbfs
inode for this type of synchronization.  Such an approach is actually
'cleaner', as the new semaphore can be inserted into the lock hierarchy
wherever it is needed.  And, since no other part of mm uses this
rw_semaphore, it does not conflict with generic mm lock ordering.

This series adds a rw_semaphore (hinode_rwsem) to the hugetlbfs inode.

The first patch reverts all commits that deal with the current use of
i_mmap_rwsem for pmd sharing and fault/truncate synchronization.  The
reverts of 5 commits were combined into a single patch.  I am looking
for feedback on this approach.  I also considered:
- 5 patches to revert the 5 commits
- Reverting the patches that depend on c0d0381ade79, then adding a patch
  to change from i_mmap_rwsem to hinode_rwsem.
To me, a 'clean slate' approach seemed best, but I am open to whatever
would be easiest to review.

Changes in RFC v2
  - Added missing locking pointed out by Naoya Horiguchi
  - Cleaned up some comments as suggested by Naoya Horiguchi
  - Cleaned up and documented the hinode_lock_read() helper and added a
    hinode_lock_write() helper.
  - Split out the addition of hinode_rwsem and helper routines into a
    separate patch.

[1] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/

Mike Kravetz (4):
  hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync
  hugetlbfs: add hinode_rwsem to hugetlb specific inode
  hugetlbfs: use hinode_rwsem for pmd sharing synchronization
  hugetlbfs: handle page fault/truncate races

 fs/hugetlbfs/inode.c    |  87 +++++++------
 include/linux/fs.h      |  15 ---
 include/linux/hugetlb.h | 135 ++++++++++++++++++--
 mm/hugetlb.c            | 267 ++++++++++++++++------------------------
 mm/memory-failure.c     |  34 ++---
 mm/memory.c             |   5 +
 mm/migrate.c            |  34 +++--
 mm/rmap.c               |  17 +--
 mm/userfaultfd.c        |  19 +--
 9 files changed, 322 insertions(+), 291 deletions(-)