diff mbox series

[RFC,v11,10/29] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

Message ID 20230718234512.1690985-11-seanjc@google.com (mailing list archive)
State Handled Elsewhere
Delegated to: Paul Moore
Headers show
Series KVM: guest_memfd() and per-page attributes | expand

Commit Message

Sean Christopherson July 18, 2023, 11:44 p.m. UTC
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/pagemap.h | 11 +++++++++++
 mm/compaction.c         |  4 ++++
 mm/migrate.c            |  2 ++
 3 files changed, 17 insertions(+)

Comments

Kirill A . Shutemov July 25, 2023, 10:24 a.m. UTC | #1
On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> diff --git a/mm/compaction.c b/mm/compaction.c
> index dbc9f86b1934..a3d2b132df52 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
>  			goto isolate_fail_put;
>  
> +		/* The mapping truly isn't movable. */
> +		if (mapping && mapping_unmovable(mapping))
> +			goto isolate_fail_put;
> +

I doubt that it is safe to dereference mapping here. I believe the folio
can be truncated from under us and the mapping freed with the inode.

The folio has to be locked to dereference mapping safely (given that the
mapping is still tied to the folio).

Vlastimil, any comments?
Matthew Wilcox July 25, 2023, 12:51 p.m. UTC | #2
On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index dbc9f86b1934..a3d2b132df52 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
> >  			goto isolate_fail_put;
> >  
> > +		/* The mapping truly isn't movable. */
> > +		if (mapping && mapping_unmovable(mapping))
> > +			goto isolate_fail_put;
> > +
> 
> I doubt that it is safe to dereference mapping here. I believe the folio
> can be truncated from under us and the mapping freed with the inode.
> 
> The folio has to be locked to dereference mapping safely (given that the
> mapping is still tied to the folio).

There's even a comment to that effect later on in the function:

                        /*
                         * Only pages without mappings or that have a
                         * ->migrate_folio callback are possible to migrate
                         * without blocking. However, we can be racing with
                         * truncation so it's necessary to lock the page
                         * to stabilise the mapping as truncation holds
                         * the page lock until after the page is removed
                         * from the page cache.
                         */

(that could be reworded to make it clear how dangerous dereferencing
->mapping is without the lock ... and it does need to be changed to say
"folio lock" instead of "page lock", so ...)

How does this look?

                        /*
                         * Only folios without mappings or that have
                         * a ->migrate_folio callback are possible to
                         * migrate without blocking. However, we can
                         * be racing with truncation, which can free
                         * the mapping.  Truncation holds the folio lock
                         * until after the folio is removed from the page
                         * cache so holding it ourselves is sufficient.
                         */
Kirill A . Shutemov July 26, 2023, 11:36 a.m. UTC | #3
On Tue, Jul 25, 2023 at 01:51:55PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
> > On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > index dbc9f86b1934..a3d2b132df52 100644
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
> > >  			goto isolate_fail_put;
> > >  
> > > +		/* The mapping truly isn't movable. */
> > > +		if (mapping && mapping_unmovable(mapping))
> > > +			goto isolate_fail_put;
> > > +
> > 
> > I doubt that it is safe to dereference mapping here. I believe the folio
> > can be truncated from under us and the mapping freed with the inode.
> > 
> > The folio has to be locked to dereference mapping safely (given that the
> > mapping is still tied to the folio).
> 
> There's even a comment to that effect later on in the function:
> 
>                         /*
>                          * Only pages without mappings or that have a
>                          * ->migrate_folio callback are possible to migrate
>                          * without blocking. However, we can be racing with
>                          * truncation so it's necessary to lock the page
>                          * to stabilise the mapping as truncation holds
>                          * the page lock until after the page is removed
>                          * from the page cache.
>                          */
> 
> (that could be reworded to make it clear how dangerous dereferencing
> ->mapping is without the lock ... and it does need to be changed to say
> "folio lock" instead of "page lock", so ...)
> 
> How does this look?
> 
>                         /*
>                          * Only folios without mappings or that have
>                          * a ->migrate_folio callback are possible to
>                          * migrate without blocking. However, we can
>                          * be racing with truncation, which can free
>                          * the mapping.  Truncation holds the folio lock
>                          * until after the folio is removed from the page
>                          * cache so holding it ourselves is sufficient.
>                          */
> 

Looks good to me.
Vlastimil Babka July 28, 2023, 4:02 p.m. UTC | #4
On 7/25/23 14:51, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
>> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
>> > diff --git a/mm/compaction.c b/mm/compaction.c
>> > index dbc9f86b1934..a3d2b132df52 100644
>> > --- a/mm/compaction.c
>> > +++ b/mm/compaction.c
>> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
>> >  			goto isolate_fail_put;
>> >  
>> > +		/* The mapping truly isn't movable. */
>> > +		if (mapping && mapping_unmovable(mapping))
>> > +			goto isolate_fail_put;
>> > +
>> 
>> I doubt that it is safe to dereference mapping here. I believe the folio
>> can be truncated from under us and the mapping freed with the inode.
>> 
>> The folio has to be locked to dereference mapping safely (given that the
>> mapping is still tied to the folio).
> 
> There's even a comment to that effect later on in the function:

Hmm, well spotted. But it wouldn't be so great if we now had to lock every
inspected page (and not just dirty pages), just to check the AS_ bit.

But I wonder if this is leftover from previous versions. Are the guest pages
even PageLRU currently? (and should they be, given how they can't be swapped
out or anything?) If not, isolate_migratepages_block will skip them anyway.

> 
>                         /*
>                          * Only pages without mappings or that have a
>                          * ->migrate_folio callback are possible to migrate
>                          * without blocking. However, we can be racing with
>                          * truncation so it's necessary to lock the page
>                          * to stabilise the mapping as truncation holds
>                          * the page lock until after the page is removed
>                          * from the page cache.
>                          */
> 
> (that could be reworded to make it clear how dangerous dereferencing
> ->mapping is without the lock ... and it does need to be changed to say
> "folio lock" instead of "page lock", so ...)

> How does this look?
> 
>                         /*
>                          * Only folios without mappings or that have
>                          * a ->migrate_folio callback are possible to
>                          * migrate without blocking. However, we can
>                          * be racing with truncation, which can free
>                          * the mapping.  Truncation holds the folio lock
>                          * until after the folio is removed from the page
>                          * cache so holding it ourselves is sufficient.
>                          */
>
Paolo Bonzini July 28, 2023, 4:13 p.m. UTC | #5
On 7/28/23 18:02, Vlastimil Babka wrote:
>> There's even a comment to that effect later on in the function:
> Hmm, well spotted. But it wouldn't be so great if we now had to lock every
> inspected page (and not just dirty pages), just to check the AS_ bit.
> 
> But I wonder if this is leftover from previous versions. Are the guest pages
> even PageLRU currently? (and should they be, given how they can't be swapped
> out or anything?) If not, isolate_migratepages_block will skip them anyway.

No, they're not (migration or even swap-out is not excluded for the 
future, but for now it's left for future work.

Paolo
Vlastimil Babka Sept. 1, 2023, 8:23 a.m. UTC | #6
On 7/25/23 14:51, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
>> On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
>> > diff --git a/mm/compaction.c b/mm/compaction.c
>> > index dbc9f86b1934..a3d2b132df52 100644
>> > --- a/mm/compaction.c
>> > +++ b/mm/compaction.c
>> > @@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >  		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
>> >  			goto isolate_fail_put;
>> >  
>> > +		/* The mapping truly isn't movable. */
>> > +		if (mapping && mapping_unmovable(mapping))
>> > +			goto isolate_fail_put;
>> > +
>> 
>> I doubt that it is safe to dereference mapping here. I believe the folio
>> can be truncated from under us and the mapping freed with the inode.
>> 
>> The folio has to be locked to dereference mapping safely (given that the
>> mapping is still tied to the folio).
> 
> There's even a comment to that effect later on in the function:
> 
>                         /*
>                          * Only pages without mappings or that have a
>                          * ->migrate_folio callback are possible to migrate
>                          * without blocking. However, we can be racing with
>                          * truncation so it's necessary to lock the page
>                          * to stabilise the mapping as truncation holds
>                          * the page lock until after the page is removed
>                          * from the page cache.
>                          */
> 
> (that could be reworded to make it clear how dangerous dereferencing
> ->mapping is without the lock ... and it does need to be changed to say
> "folio lock" instead of "page lock", so ...)
> 
> How does this look?
> 
>                         /*
>                          * Only folios without mappings or that have
>                          * a ->migrate_folio callback are possible to
>                          * migrate without blocking. However, we can
>                          * be racing with truncation, which can free
>                          * the mapping.  Truncation holds the folio lock
>                          * until after the folio is removed from the page
>                          * cache so holding it ourselves is sufficient.
>                          */

Incorporated to my attempt at a fix (posted separately per the requested
process):

https://lore.kernel.org/all/20230901082025.20548-2-vbabka@suse.cz/
diff mbox series

Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716953ee1ebd..931d2f1da7d5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -203,6 +203,7 @@  enum mapping_flags {
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
 	AS_LARGE_FOLIO_SUPPORT = 6,
+	AS_UNMOVABLE	= 7,	/* The mapping cannot be moved, ever */
 };
 
 /**
@@ -273,6 +274,16 @@  static inline int mapping_use_writeback_tags(struct address_space *mapping)
 	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
 }
 
+static inline void mapping_set_unmovable(struct address_space *mapping)
+{
+	set_bit(AS_UNMOVABLE, &mapping->flags);
+}
+
+static inline bool mapping_unmovable(struct address_space *mapping)
+{
+	return test_bit(AS_UNMOVABLE, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;
diff --git a/mm/compaction.c b/mm/compaction.c
index dbc9f86b1934..a3d2b132df52 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1047,6 +1047,10 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!mapping && (folio_ref_count(folio) - 1) > folio_mapcount(folio))
 			goto isolate_fail_put;
 
+		/* The mapping truly isn't movable. */
+		if (mapping && mapping_unmovable(mapping))
+			goto isolate_fail_put;
+
 		/*
 		 * Only allow to migrate anonymous pages in GFP_NOFS context
 		 * because those do not depend on fs locks.
diff --git a/mm/migrate.c b/mm/migrate.c
index 24baad2571e3..c00a4ca86698 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -954,6 +954,8 @@  static int move_to_new_folio(struct folio *dst, struct folio *src,
 
 		if (!mapping)
 			rc = migrate_folio(mapping, dst, src, mode);
+		else if (mapping_unmovable(mapping))
+			rc = -EOPNOTSUPP;
 		else if (mapping->a_ops->migrate_folio)
 			/*
 			 * Most folios have a mapping and most filesystems