Message ID | 20230718234512.1690985-11-seanjc@google.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Delegated to: | Paul Moore |
Series | KVM: guest_memfd() and per-page attributes |
On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> diff --git a/mm/compaction.c b/mm/compaction.c
> index dbc9f86b1934..a3d2b132df52 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
>  			goto isolate_fail_put;
>
> +		/* The mapping truly isn't movable. */
> +		if (mapping && mapping_unmovable(mapping))
> +			goto isolate_fail_put;
> +

I doubt that it is safe to dereference mapping here. I believe the folio
can be truncated from under us and the mapping freed with the inode.

The folio has to be locked to dereference mapping safely (given that the
mapping is still tied to the folio).

Vlastimil, any comments?

On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index dbc9f86b1934..a3d2b132df52 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
> >  			goto isolate_fail_put;
> >
> > +		/* The mapping truly isn't movable. */
> > +		if (mapping && mapping_unmovable(mapping))
> > +			goto isolate_fail_put;
> > +
>
> I doubt that it is safe to dereference mapping here. I believe the folio
> can be truncated from under us and the mapping freed with the inode.
>
> The folio has to be locked to dereference mapping safely (given that the
> mapping is still tied to the folio).

There's even a comment to that effect later on in the function:

		/*
		 * Only pages without mappings or that have a
		 * ->migrate_folio callback are possible to migrate
		 * without blocking. However, we can be racing with
		 * truncation so it's necessary to lock the page
		 * to stabilise the mapping as truncation holds
		 * the page lock until after the page is removed
		 * from the page cache.
		 */

(that could be reworded to make it clear how dangerous dereferencing
->mapping is without the lock ... and it does need to be changed to say
"folio lock" instead of "page lock", so ...)

How does this look?

		/*
		 * Only folios without mappings or that have
		 * a ->migrate_folio callback are possible to
		 * migrate without blocking. However, we can
		 * be racing with truncation, which can free
		 * the mapping. Truncation holds the folio lock
		 * until after the folio is removed from the page
		 * cache so holding it ourselves is sufficient.
		 */

On Tue, Jul 25, 2023 at 01:51:55PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
> > On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > index dbc9f86b1934..a3d2b132df52 100644
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
> > >  			goto isolate_fail_put;
> > >
> > > +		/* The mapping truly isn't movable. */
> > > +		if (mapping && mapping_unmovable(mapping))
> > > +			goto isolate_fail_put;
> > > +
> >
> > I doubt that it is safe to dereference mapping here. I believe the folio
> > can be truncated from under us and the mapping freed with the inode.
> >
> > The folio has to be locked to dereference mapping safely (given that the
> > mapping is still tied to the folio).
>
> There's even a comment to that effect later on in the function:
>
> 		/*
> 		 * Only pages without mappings or that have a
> 		 * ->migrate_folio callback are possible to migrate
> 		 * without blocking. However, we can be racing with
> 		 * truncation so it's necessary to lock the page
> 		 * to stabilise the mapping as truncation holds
> 		 * the page lock until after the page is removed
> 		 * from the page cache.
> 		 */
>
> (that could be reworded to make it clear how dangerous dereferencing
> ->mapping is without the lock ... and it does need to be changed to say
> "folio lock" instead of "page lock", so ...)
>
> How does this look?
>
> 		/*
> 		 * Only folios without mappings or that have
> 		 * a ->migrate_folio callback are possible to
> 		 * migrate without blocking. However, we can
> 		 * be racing with truncation, which can free
> 		 * the mapping. Truncation holds the folio lock
> 		 * until after the folio is removed from the page
> 		 * cache so holding it ourselves is sufficient.
> 		 */
>

Looks good to me.

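The locking rule discussed above (take the folio lock, re-read ->mapping, only then dereference it) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name below is hypothetical and is not a kernel API, a plain boolean stands in for the real folio lock, and the code is single-threaded, so it shows the ordering of the steps rather than an actual race with truncation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_AS_UNMOVABLE 7	/* same bit number the patch uses for AS_UNMOVABLE */

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	unsigned long flags;
};

/* Hypothetical stand-in for struct folio. */
struct model_folio {
	bool locked;			/* models the folio lock */
	struct model_mapping *mapping;	/* cleared by truncation under the lock */
};

enum model_verdict {
	MODEL_SKIP_LOCKED,	/* lock not obtained, mapping never dereferenced */
	MODEL_SKIP_UNMOVABLE,	/* mapping is flagged unmovable */
	MODEL_CANDIDATE,	/* the unmovable check does not reject this folio */
};

static inline bool model_folio_trylock(struct model_folio *folio)
{
	if (folio->locked)
		return false;
	folio->locked = true;
	return true;
}

static inline void model_folio_unlock(struct model_folio *folio)
{
	assert(folio->locked);
	folio->locked = false;
}

/* Models truncation: the folio is detached from its mapping under the lock. */
static inline void model_truncate(struct model_folio *folio)
{
	bool got_lock = model_folio_trylock(folio);

	assert(got_lock);
	(void)got_lock;
	folio->mapping = NULL;
	model_folio_unlock(folio);
}

/* Test the unmovable bit only while the folio lock is held. */
static inline enum model_verdict model_check_unmovable(struct model_folio *folio)
{
	enum model_verdict verdict = MODEL_CANDIDATE;
	struct model_mapping *mapping;

	if (!model_folio_trylock(folio))
		return MODEL_SKIP_LOCKED;

	mapping = folio->mapping;	/* re-read now that it is stable */
	if (mapping && (mapping->flags & (1UL << MODEL_AS_UNMOVABLE)))
		verdict = MODEL_SKIP_UNMOVABLE;

	model_folio_unlock(folio);
	return verdict;
}
```

In this model a folio that is already locked is skipped without looking at its mapping at all, which corresponds to the cost concern about locking every inspected page just to test one bit.
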
On 7/25/23 14:51, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
>> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
>> > diff --git a/mm/compaction.c b/mm/compaction.c
>> > index dbc9f86b1934..a3d2b132df52 100644
>> > --- a/mm/compaction.c
>> > +++ b/mm/compaction.c
>> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
>> >  			goto isolate_fail_put;
>> >
>> > +		/* The mapping truly isn't movable. */
>> > +		if (mapping && mapping_unmovable(mapping))
>> > +			goto isolate_fail_put;
>> > +
>>
>> I doubt that it is safe to dereference mapping here. I believe the folio
>> can be truncated from under us and the mapping freed with the inode.
>>
>> The folio has to be locked to dereference mapping safely (given that the
>> mapping is still tied to the folio).
>
> There's even a comment to that effect later on in the function:

Hmm, well spotted. But it wouldn't be so great if we now had to lock every
inspected page (and not just dirty pages), just to check the AS_ bit.

But I wonder if this is leftover from previous versions. Are the guest pages
even PageLRU currently? (and should they be, given how they can't be swapped
out or anything?) If not, isolate_migratepages_block will skip them anyway.

>
> 		/*
> 		 * Only pages without mappings or that have a
> 		 * ->migrate_folio callback are possible to migrate
> 		 * without blocking. However, we can be racing with
> 		 * truncation so it's necessary to lock the page
> 		 * to stabilise the mapping as truncation holds
> 		 * the page lock until after the page is removed
> 		 * from the page cache.
> 		 */
>
> (that could be reworded to make it clear how dangerous dereferencing
> ->mapping is without the lock ... and it does need to be changed to say
> "folio lock" instead of "page lock", so ...)
>
> How does this look?
>
> 		/*
> 		 * Only folios without mappings or that have
> 		 * a ->migrate_folio callback are possible to
> 		 * migrate without blocking. However, we can
> 		 * be racing with truncation, which can free
> 		 * the mapping. Truncation holds the folio lock
> 		 * until after the folio is removed from the page
> 		 * cache so holding it ourselves is sufficient.
> 		 */
>

On 7/28/23 18:02, Vlastimil Babka wrote:
>> There's even a comment to that effect later on in the function:
> Hmm, well spotted. But it wouldn't be so great if we now had to lock every
> inspected page (and not just dirty pages), just to check the AS_ bit.
>
> But I wonder if this is leftover from previous versions. Are the guest pages
> even PageLRU currently? (and should they be, given how they can't be swapped
> out or anything?) If not, isolate_migratepages_block will skip them anyway.

No, they're not (migration or even swap-out is not excluded for the future,
but for now it's left for future work).

Paolo

On 7/25/23 14:51, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
>> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
>> > diff --git a/mm/compaction.c b/mm/compaction.c
>> > index dbc9f86b1934..a3d2b132df52 100644
>> > --- a/mm/compaction.c
>> > +++ b/mm/compaction.c
>> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
>> >  			goto isolate_fail_put;
>> >
>> > +		/* The mapping truly isn't movable. */
>> > +		if (mapping && mapping_unmovable(mapping))
>> > +			goto isolate_fail_put;
>> > +
>>
>> I doubt that it is safe to dereference mapping here. I believe the folio
>> can be truncated from under us and the mapping freed with the inode.
>>
>> The folio has to be locked to dereference mapping safely (given that the
>> mapping is still tied to the folio).
>
> There's even a comment to that effect later on in the function:
>
> 		/*
> 		 * Only pages without mappings or that have a
> 		 * ->migrate_folio callback are possible to migrate
> 		 * without blocking. However, we can be racing with
> 		 * truncation so it's necessary to lock the page
> 		 * to stabilise the mapping as truncation holds
> 		 * the page lock until after the page is removed
> 		 * from the page cache.
> 		 */
>
> (that could be reworded to make it clear how dangerous dereferencing
> ->mapping is without the lock ... and it does need to be changed to say
> "folio lock" instead of "page lock", so ...)
>
> How does this look?
>
> 		/*
> 		 * Only folios without mappings or that have
> 		 * a ->migrate_folio callback are possible to
> 		 * migrate without blocking. However, we can
> 		 * be racing with truncation, which can free
> 		 * the mapping. Truncation holds the folio lock
> 		 * until after the folio is removed from the page
> 		 * cache so holding it ourselves is sufficient.
> 		 */

Incorporated into my attempt at a fix (posted separately per the requested
process):

https://lore.kernel.org/all/20230901082025.20548-2-vbabka@suse.cz/

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716953ee1ebd..931d2f1da7d5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -203,6 +203,7 @@ enum mapping_flags {
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
 	AS_LARGE_FOLIO_SUPPORT = 6,
+	AS_UNMOVABLE = 7,	/* The mapping cannot be moved, ever */
 };

 /**
@@ -273,6 +274,16 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping)
 	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
 }

+static inline void mapping_set_unmovable(struct address_space *mapping)
+{
+	set_bit(AS_UNMOVABLE, &mapping->flags);
+}
+
+static inline bool mapping_unmovable(struct address_space *mapping)
+{
+	return test_bit(AS_UNMOVABLE, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;
diff --git a/mm/compaction.c b/mm/compaction.c
index dbc9f86b1934..a3d2b132df52 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
 			goto isolate_fail_put;

+		/* The mapping truly isn't movable. */
+		if (mapping && mapping_unmovable(mapping))
+			goto isolate_fail_put;
+
 		/*
 		 * Only allow to migrate anonymous pages in GFP_NOFS context
 		 * because those do not depend on fs locks.
diff --git a/mm/migrate.c b/mm/migrate.c
index 24baad2571e3..c00a4ca86698 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -954,6 +954,8 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,

 	if (!mapping)
 		rc = migrate_folio(mapping, dst, src, mode);
+	else if (mapping_unmovable(mapping))
+		rc = -EOPNOTSUPP;
 	else if (mapping->a_ops->migrate_folio)
 		/*
 		 * Most folios have a mapping and most filesystems
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/pagemap.h | 11 +++++++++++
 mm/compaction.c         |  4 ++++
 mm/migrate.c            |  2 ++
 3 files changed, 17 insertions(+)
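
For readers who want to poke at the flag logic outside the kernel, the helpers and the new branch in move_to_new_folio() can be approximated in userspace. This is a sketch, not the kernel code: `struct address_space` here is a cut-down stand-in, `has_migrate_folio` is a hypothetical field modelling `mapping->a_ops->migrate_folio` being non-NULL, `model_move_to_new_folio()` is a hypothetical helper that only mirrors the order of the checks, and a plain bitwise OR replaces the kernel's atomic `set_bit()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel definitions touched by the patch. */
enum mapping_flags {
	AS_UNMOVABLE = 7,	/* same bit number as in the patch */
};

struct address_space {
	unsigned long flags;
	bool has_migrate_folio;	/* hypothetical: models a_ops->migrate_folio != NULL */
};

static inline void mapping_set_unmovable(struct address_space *mapping)
{
	assert(mapping);
	mapping->flags |= 1UL << AS_UNMOVABLE;	/* the kernel uses atomic set_bit() */
}

static inline bool mapping_unmovable(const struct address_space *mapping)
{
	assert(mapping);
	return (mapping->flags & (1UL << AS_UNMOVABLE)) != 0;
}

/*
 * Hypothetical helper mirroring the branch order in move_to_new_folio().
 * A return of 0 means "a migration attempt would be made", not that the
 * attempt would succeed.
 */
static inline int model_move_to_new_folio(const struct address_space *mapping)
{
	if (!mapping)
		return 0;		/* no mapping: generic migrate_folio() path */
	if (mapping_unmovable(mapping))
		return -EOPNOTSUPP;	/* branch added by the patch */
	if (mapping->has_migrate_folio)
		return 0;		/* the mapping's own callback would run */
	return 0;			/* remaining case: some other path is tried */
}
```

The point of the ordering is that the unmovable check comes before the `->migrate_folio` callback is consulted, so a mapping flagged unmovable is refused even if it provides such a callback.
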