diff mbox series

[v1,5/7] mm: introduce page_offline_(begin|end|freeze|unfreeze) to synchronize setting PageOffline()

Message ID 20210429122519.15183-6-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages | expand

Commit Message

David Hildenbrand April 29, 2021, 12:25 p.m. UTC
A driver might set a page logically offline -- PageOffline() -- and
turn the page inaccessible in the hypervisor; after that, access to page
content can be fatal. One example is virtio-mem; while unplugged memory
-- marked as PageOffline() can currently be read in the hypervisor, this
will no longer be the case in the future; for example, when having
a virtio-mem device backed by huge pages in the hypervisor.

Some special PFN walkers -- i.e., /proc/kcore -- read content of random
pages after checking PageOffline(); however, these PFN walkers can race
with drivers that set PageOffline().

Let's introduce page_offline_(begin|end|freeze|unfreeze) for
synchronizing.

page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
synchronize with such drivers, achieving that a page cannot be set
PageOffline() while frozen.

page_offline_begin()/page_offline_end() is used by drivers that care about
such races when setting a page PageOffline().

For simplicity, use a rwsem for now; neither drivers nor users are
performance sensitive.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-flags.h |  5 +++++
 mm/util.c                  | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

Comments

Mike Rapoport May 2, 2021, 6:33 a.m. UTC | #1
On Thu, Apr 29, 2021 at 02:25:17PM +0200, David Hildenbrand wrote:
> A driver might set a page logically offline -- PageOffline() -- and
> turn the page inaccessible in the hypervisor; after that, access to page
> content can be fatal. One example is virtio-mem; while unplugged memory
> -- marked as PageOffline() can currently be read in the hypervisor, this
> will no longer be the case in the future; for example, when having
> a virtio-mem device backed by huge pages in the hypervisor.
> 
> Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> pages after checking PageOffline(); however, these PFN walkers can race
> with drivers that set PageOffline().
> 
> Let's introduce page_offline_(begin|end|freeze|unfreeze) for

Bikeshed: freeze|thaw?

> synchronizing.
> 
> page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> synchronize with such drivers, achieving that a page cannot be set
> PageOffline() while frozen.
> 
> page_offline_begin()/page_offline_end() is used by drivers that care about
> such races when setting a page PageOffline().
> 
> For simplicity, use a rwsem for now; neither drivers nor users are
> performance sensitive.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/page-flags.h |  5 +++++
>  mm/util.c                  | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index b8c56672a588..e3d00c72f459 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -767,6 +767,11 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> +extern void page_offline_freeze(void);
> +extern void page_offline_unfreeze(void);
> +extern void page_offline_begin(void);
> +extern void page_offline_end(void);
> +
>  /*
>   * Marks pages in use as page tables.
>   */
> diff --git a/mm/util.c b/mm/util.c
> index 54870226cea6..95395d4e4209 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1013,3 +1013,41 @@ void mem_dump_obj(void *object)
>  	}
>  	pr_cont(" non-slab/vmalloc memory.\n");
>  }
> +
> +/*
> + * A driver might set a page logically offline -- PageOffline() -- and
> + * turn the page inaccessible in the hypervisor; after that, access to page
> + * content can be fatal.
> + *
> + * Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> + * pages after checking PageOffline(); however, these PFN walkers can race
> + * with drivers that set PageOffline().
> + *
> + * page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> + * synchronize with such drivers, achieving that a page cannot be set
> + * PageOffline() while frozen.
> + *
> + * page_offline_begin()/page_offline_end() is used by drivers that care about
> + * such races when setting a page PageOffline().
> + */
> +static DECLARE_RWSEM(page_offline_rwsem);
> +
> +void page_offline_freeze(void)
> +{
> +	down_read(&page_offline_rwsem);
> +}
> +
> +void page_offline_unfreeze(void)
> +{
> +	up_read(&page_offline_rwsem);
> +}
> +
> +void page_offline_begin(void)
> +{
> +	down_write(&page_offline_rwsem);
> +}
> +
> +void page_offline_end(void)
> +{
> +	up_write(&page_offline_rwsem);
> +}
> -- 
> 2.30.2
>
David Hildenbrand May 3, 2021, 8:11 a.m. UTC | #2
On 02.05.21 08:33, Mike Rapoport wrote:
> On Thu, Apr 29, 2021 at 02:25:17PM +0200, David Hildenbrand wrote:
>> A driver might set a page logically offline -- PageOffline() -- and
>> turn the page inaccessible in the hypervisor; after that, access to page
>> content can be fatal. One example is virtio-mem; while unplugged memory
>> -- marked as PageOffline() can currently be read in the hypervisor, this
>> will no longer be the case in the future; for example, when having
>> a virtio-mem device backed by huge pages in the hypervisor.
>>
>> Some special PFN walkers -- i.e., /proc/kcore -- read content of random
>> pages after checking PageOffline(); however, these PFN walkers can race
>> with drivers that set PageOffline().
>>
>> Let's introduce page_offline_(begin|end|freeze|unfreeze) for
> 
> Bikeshed: freeze|thaw?
>

Sure :)
Michal Hocko May 5, 2021, 1:24 p.m. UTC | #3
On Thu 29-04-21 14:25:17, David Hildenbrand wrote:
> A driver might set a page logically offline -- PageOffline() -- and
> turn the page inaccessible in the hypervisor; after that, access to page
> content can be fatal. One example is virtio-mem; while unplugged memory
> -- marked as PageOffline() can currently be read in the hypervisor, this
> will no longer be the case in the future; for example, when having
> a virtio-mem device backed by huge pages in the hypervisor.
> 
> Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> pages after checking PageOffline(); however, these PFN walkers can race
> with drivers that set PageOffline().
> 
> Let's introduce page_offline_(begin|end|freeze|unfreeze) for
> synchronizing.
> 
> page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> synchronize with such drivers, achieving that a page cannot be set
> PageOffline() while frozen.
> 
> page_offline_begin()/page_offline_end() is used by drivers that care about
> such races when setting a page PageOffline().
> 
> For simplicity, use a rwsem for now; neither drivers nor users are
> performance sensitive.

Please add a note to the PageOffline documentation as well. While are
adding the api close enough an explicit note there wouldn't hurt.

> Signed-off-by: David Hildenbrand <david@redhat.com>

As to the patch itself, I am slightly worried that other pfn walkers
might be less tolerant to the locking than the proc ones. On the other
hand most users shouldn't really care as they do not tend to touch the
memory content and PageOffline check without any synchronization should
be sufficient for those. Let's try this out and see where we get...

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/page-flags.h |  5 +++++
>  mm/util.c                  | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index b8c56672a588..e3d00c72f459 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -767,6 +767,11 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> +extern void page_offline_freeze(void);
> +extern void page_offline_unfreeze(void);
> +extern void page_offline_begin(void);
> +extern void page_offline_end(void);
> +
>  /*
>   * Marks pages in use as page tables.
>   */
> diff --git a/mm/util.c b/mm/util.c
> index 54870226cea6..95395d4e4209 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1013,3 +1013,41 @@ void mem_dump_obj(void *object)
>  	}
>  	pr_cont(" non-slab/vmalloc memory.\n");
>  }
> +
> +/*
> + * A driver might set a page logically offline -- PageOffline() -- and
> + * turn the page inaccessible in the hypervisor; after that, access to page
> + * content can be fatal.
> + *
> + * Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> + * pages after checking PageOffline(); however, these PFN walkers can race
> + * with drivers that set PageOffline().
> + *
> + * page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> + * synchronize with such drivers, achieving that a page cannot be set
> + * PageOffline() while frozen.
> + *
> + * page_offline_begin()/page_offline_end() is used by drivers that care about
> + * such races when setting a page PageOffline().
> + */
> +static DECLARE_RWSEM(page_offline_rwsem);
> +
> +void page_offline_freeze(void)
> +{
> +	down_read(&page_offline_rwsem);
> +}
> +
> +void page_offline_unfreeze(void)
> +{
> +	up_read(&page_offline_rwsem);
> +}
> +
> +void page_offline_begin(void)
> +{
> +	down_write(&page_offline_rwsem);
> +}
> +
> +void page_offline_end(void)
> +{
> +	up_write(&page_offline_rwsem);
> +}
> -- 
> 2.30.2
>
David Hildenbrand May 5, 2021, 3:10 p.m. UTC | #4
On 05.05.21 15:24, Michal Hocko wrote:
> On Thu 29-04-21 14:25:17, David Hildenbrand wrote:
>> A driver might set a page logically offline -- PageOffline() -- and
>> turn the page inaccessible in the hypervisor; after that, access to page
>> content can be fatal. One example is virtio-mem; while unplugged memory
>> -- marked as PageOffline() can currently be read in the hypervisor, this
>> will no longer be the case in the future; for example, when having
>> a virtio-mem device backed by huge pages in the hypervisor.
>>
>> Some special PFN walkers -- i.e., /proc/kcore -- read content of random
>> pages after checking PageOffline(); however, these PFN walkers can race
>> with drivers that set PageOffline().
>>
>> Let's introduce page_offline_(begin|end|freeze|unfreeze) for
>> synchronizing.
>>
>> page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
>> synchronize with such drivers, achieving that a page cannot be set
>> PageOffline() while frozen.
>>
>> page_offline_begin()/page_offline_end() is used by drivers that care about
>> such races when setting a page PageOffline().
>>
>> For simplicity, use a rwsem for now; neither drivers nor users are
>> performance sensitive.
> 
> Please add a note to the PageOffline documentation as well. While are
> adding the api close enough an explicit note there wouldn't hurt.

Will do.

> 
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> As to the patch itself, I am slightly worried that other pfn walkers
> might be less tolerant to the locking than the proc ones. On the other
> hand most users shouldn't really care as they do not tend to touch the
> memory content and PageOffline check without any synchronization should
> be sufficient for those. Let's try this out and see where we get...

My thinking. Users that actually read random page content (as discussed 
in the cover letter) are

1. Hibernation
2. Dumping (/proc/kcore, /proc/vmcore)
3. Physical memory access bypassing the kernel via /dev/mem
4. Live debug tools (kgdb)

Other PFN walkers really shouldn't (and don't) access random page content.

Thanks!
Mike Rapoport May 5, 2021, 5:41 p.m. UTC | #5
On Wed, May 05, 2021 at 05:10:33PM +0200, David Hildenbrand wrote:
> On 05.05.21 15:24, Michal Hocko wrote:
> > On Thu 29-04-21 14:25:17, David Hildenbrand wrote:
> > > A driver might set a page logically offline -- PageOffline() -- and
> > > turn the page inaccessible in the hypervisor; after that, access to page
> > > content can be fatal. One example is virtio-mem; while unplugged memory
> > > -- marked as PageOffline() can currently be read in the hypervisor, this
> > > will no longer be the case in the future; for example, when having
> > > a virtio-mem device backed by huge pages in the hypervisor.
> > > 
> > > Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> > > pages after checking PageOffline(); however, these PFN walkers can race
> > > with drivers that set PageOffline().
> > > 
> > > Let's introduce page_offline_(begin|end|freeze|unfreeze) for
> > > synchronizing.
> > > 
> > > page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> > > synchronize with such drivers, achieving that a page cannot be set
> > > PageOffline() while frozen.
> > > 
> > > page_offline_begin()/page_offline_end() is used by drivers that care about
> > > such races when setting a page PageOffline().
> > > 
> > > For simplicity, use a rwsem for now; neither drivers nor users are
> > > performance sensitive.
> > 
> > Please add a note to the PageOffline documentation as well. While are
> > adding the api close enough an explicit note there wouldn't hurt.
> 
> Will do.
> 
> > 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > 
> > As to the patch itself, I am slightly worried that other pfn walkers
> > might be less tolerant to the locking than the proc ones. On the other
> > hand most users shouldn't really care as they do not tend to touch the
> > memory content and PageOffline check without any synchronization should
> > be sufficient for those. Let's try this out and see where we get...
> 
> My thinking. Users that actually read random page content (as discussed in
> the cover letter) are
> 
> 1. Hibernation
> 2. Dumping (/proc/kcore, /proc/vmcore)
> 3. Physical memory access bypassing the kernel via /dev/mem
> 4. Live debug tools (kgdb)

I think you can add

5. Very old drivers
 
> Other PFN walkers really shouldn't (and don't) access random page content.
> 
> Thanks!
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
>
diff mbox series

Patch

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b8c56672a588..e3d00c72f459 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -767,6 +767,11 @@  PAGE_TYPE_OPS(Buddy, buddy)
  */
 PAGE_TYPE_OPS(Offline, offline)
 
+extern void page_offline_freeze(void);
+extern void page_offline_unfreeze(void);
+extern void page_offline_begin(void);
+extern void page_offline_end(void);
+
 /*
  * Marks pages in use as page tables.
  */
diff --git a/mm/util.c b/mm/util.c
index 54870226cea6..95395d4e4209 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1013,3 +1013,41 @@  void mem_dump_obj(void *object)
 	}
 	pr_cont(" non-slab/vmalloc memory.\n");
 }
+
+/*
+ * A driver might set a page logically offline -- PageOffline() -- and
+ * turn the page inaccessible in the hypervisor; after that, access to page
+ * content can be fatal.
+ *
+ * Some special PFN walkers -- i.e., /proc/kcore -- read content of random
+ * pages after checking PageOffline(); however, these PFN walkers can race
+ * with drivers that set PageOffline().
+ *
+ * page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
+ * synchronize with such drivers, achieving that a page cannot be set
+ * PageOffline() while frozen.
+ *
+ * page_offline_begin()/page_offline_end() is used by drivers that care about
+ * such races when setting a page PageOffline().
+ */
+static DECLARE_RWSEM(page_offline_rwsem);
+
+void page_offline_freeze(void)
+{
+	down_read(&page_offline_rwsem);
+}
+
+void page_offline_unfreeze(void)
+{
+	up_read(&page_offline_rwsem);
+}
+
+void page_offline_begin(void)
+{
+	down_write(&page_offline_rwsem);
+}
+
+void page_offline_end(void)
+{
+	up_write(&page_offline_rwsem);
+}