
[v1,7/7] fs/proc/kcore: use page_offline_(freeze|unfreeze)

Message ID 20210429122519.15183-8-david@redhat.com (mailing list archive)
State New, archived
Series fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages

Commit Message

David Hildenbrand April 29, 2021, 12:25 p.m. UTC
Let's properly synchronize with drivers that set PageOffline(). Unfreeze
every now and then, so drivers that want to set PageOffline() can make
progress.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/kcore.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Mike Rapoport May 2, 2021, 6:34 a.m. UTC | #1
On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
> Let's properly synchronize with drivers that set PageOffline(). Unfreeze
> every now and then, so drivers that want to set PageOffline() can make
> progress.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  fs/proc/kcore.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 92ff1e4436cb..3d7531f47389 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
>  static ssize_t
>  read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  {
> +	size_t page_offline_frozen = 0;
>  	char *buf = file->private_data;
>  	size_t phdrs_offset, notes_offset, data_offset;
>  	size_t phdrs_len, notes_len;
> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  			pfn = __pa(start) >> PAGE_SHIFT;
>  			page = pfn_to_online_page(pfn);

Can't this race with page offlining for the first time we get here?
 
> +			/*
> +			 * Don't race against drivers that set PageOffline()
> +			 * and expect no further page access.
> +			 */
> +			if (page_offline_frozen == MAX_ORDER_NR_PAGES) {
> +				page_offline_unfreeze();
> +				page_offline_frozen = 0;
> +				cond_resched();
> +			}
> +			if (!page_offline_frozen++)
> +				page_offline_freeze();
> +

Don't we need to freeze before doing pfn_to_online_page()?

>  			/*
>  			 * Don't read offline sections, logically offline pages
>  			 * (e.g., inflated in a balloon), hwpoisoned pages,
> @@ -565,6 +578,8 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	}
>  
>  out:
> +	if (page_offline_frozen)
> +		page_offline_unfreeze();
>  	up_read(&kclist_lock);
>  	if (ret)
>  		return ret;
> -- 
> 2.30.2
>
David Hildenbrand May 3, 2021, 8:28 a.m. UTC | #2
On 02.05.21 08:34, Mike Rapoport wrote:
> On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
>> Let's properly synchronize with drivers that set PageOffline(). Unfreeze
>> every now and then, so drivers that want to set PageOffline() can make
>> progress.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   fs/proc/kcore.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>> index 92ff1e4436cb..3d7531f47389 100644
>> --- a/fs/proc/kcore.c
>> +++ b/fs/proc/kcore.c
>> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
>>   static ssize_t
>>   read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>   {
>> +	size_t page_offline_frozen = 0;
>>   	char *buf = file->private_data;
>>   	size_t phdrs_offset, notes_offset, data_offset;
>>   	size_t phdrs_len, notes_len;
>> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>   			pfn = __pa(start) >> PAGE_SHIFT;
>>   			page = pfn_to_online_page(pfn);
> 
> Can't this race with page offlining for the first time we get here?


To clarify, we have three types of offline pages in the kernel ...

a) Pages part of an offline memory section; the memmap is stale and not 
trustworthy. pfn_to_online_page() checks that. We *can* protect against 
memory offlining using get_online_mems()/put_online_mems(), but usually 
avoid doing so as the race window is very small (and a problem all over 
the kernel we basically never hit) and locking is rather expensive. In 
the future, we might switch to RCU to handle that more efficiently and 
avoid these possible races.

b) PageOffline(): logically offline pages contained in an online memory 
section with a sane memmap. virtio-mem calls these pages "fake offline"; 
something like a "temporary" memory hole. The new mechanism I propose 
will be used to handle synchronization as races can be more severe, 
e.g., when reading actual page content here.

c) Soft offline pages: hwpoisoned pages that are not actually harmful 
yet, but could become harmful in the future. So we better try to remove 
the page from the page allocator and try to migrate away existing users.


So page_offline_* handle "b) PageOffline()" only. There is a tiny race 
between pfn_to_online_page(pfn) and looking at the memmap as we have in 
many cases already throughout the kernel, to be tackled in the future.


(A better name for PageOffline() might make sense; PageSoftOffline() 
would be catchy but interferes with c). PageLogicallyOffline() is ugly; 
PageFakeOffline() might do)

>   
>> +			/*
>> +			 * Don't race against drivers that set PageOffline()
>> +			 * and expect no further page access.
>> +			 */
>> +			if (page_offline_frozen == MAX_ORDER_NR_PAGES) {
>> +				page_offline_unfreeze();
>> +				page_offline_frozen = 0;
>> +				cond_resched();
>> +			}
>> +			if (!page_offline_frozen++)
>> +				page_offline_freeze();
>> +
> 
> Don't we need to freeze before doing pfn_to_online_page()?

See my explanation above. Thanks!
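
To make the three flavors of "offline" above concrete, a minimal
reader-side sketch could look as follows. It is not part of the patch;
the helper name kcore_pfn_is_readable() is invented for illustration, and
the checks simply mirror the ones discussed in this series:

#include <linux/mm.h>
#include <linux/memory_hotplug.h>	/* pfn_to_online_page() */
#include <linux/page-flags.h>		/* PageOffline(), PageHWPoison() */

/* Hypothetical helper, for illustration only. */
static bool kcore_pfn_is_readable(unsigned long pfn)
{
	struct page *page;

	/* a) offline memory section: the memmap may be stale, don't touch it. */
	page = pfn_to_online_page(pfn);
	if (!page)
		return false;

	/*
	 * b) logically offline ("fake offline") page, e.g., inflated in a
	 * balloon: the memmap is sane, but the content must not be read.
	 * page_offline_freeze()/page_offline_unfreeze() synchronize against
	 * drivers setting this flag.
	 */
	if (PageOffline(page))
		return false;

	/* c) hwpoisoned (including soft offlined) page: don't read it either. */
	if (PageHWPoison(page))
		return false;

	return true;
}

A caller would still need the freeze/unfreeze bracket from the patch so
that b) cannot change between the check and the actual read.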
Mike Rapoport May 3, 2021, 9:28 a.m. UTC | #3
On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote:
> On 02.05.21 08:34, Mike Rapoport wrote:
> > On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
> > > Let's properly synchronize with drivers that set PageOffline(). Unfreeze
> > > every now and then, so drivers that want to set PageOffline() can make
> > > progress.
> > > 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > >   fs/proc/kcore.c | 15 +++++++++++++++
> > >   1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > > index 92ff1e4436cb..3d7531f47389 100644
> > > --- a/fs/proc/kcore.c
> > > +++ b/fs/proc/kcore.c
> > > @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
> > >   static ssize_t
> > >   read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > >   {
> > > +	size_t page_offline_frozen = 0;
> > >   	char *buf = file->private_data;
> > >   	size_t phdrs_offset, notes_offset, data_offset;
> > >   	size_t phdrs_len, notes_len;
> > > @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > >   			pfn = __pa(start) >> PAGE_SHIFT;
> > >   			page = pfn_to_online_page(pfn);
> > 
> > Can't this race with page offlining for the first time we get here?
> 
> 
> To clarify, we have three types of offline pages in the kernel ...
> 
> a) Pages part of an offline memory section; the memmap is stale and not
> trustworthy. pfn_to_online_page() checks that. We *can* protect against
> memory offlining using get_online_mems()/put_online_mems(), but usually
> avoid doing so as the race window is very small (and a problem all over the
> kernel we basically never hit) and locking is rather expensive. In the
> future, we might switch to RCU to handle that more efficiently and avoid
> these possible races.
> 
> b) PageOffline(): logically offline pages contained in an online memory
> section with a sane memmap. virtio-mem calls these pages "fake offline";
> something like a "temporary" memory hole. The new mechanism I propose will
> be used to handle synchronization as races can be more severe, e.g., when
> reading actual page content here.
> 
> c) Soft offline pages: hwpoisoned pages that are not actually harmful yet,
> but could become harmful in the future. So we better try to remove the page
> from the page allocator and try to migrate away existing users.
> 
> 
> So page_offline_* handle "b) PageOffline()" only. There is a tiny race
> between pfn_to_online_page(pfn) and looking at the memmap as we have in many
> cases already throughout the kernel, to be tackled in the future.

Right, but here you anyway add locking, so why exclude the first iteration?
 
> (A better name for PageOffline() might make sense; PageSoftOffline() would
> be catchy but interferes with c). PageLogicallyOffline() is ugly;
> PageFakeOffline() might do)
> 
> > > +			/*
> > > +			 * Don't race against drivers that set PageOffline()
> > > +			 * and expect no further page access.
> > > +			 */
> > > +			if (page_offline_frozen == MAX_ORDER_NR_PAGES) {
> > > +				page_offline_unfreeze();
> > > +				page_offline_frozen = 0;
> > > +				cond_resched();
> > > +			}
> > > +			if (!page_offline_frozen++)
> > > +				page_offline_freeze();
> > > +

BTW, did you consider something like

	if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
		page_offline_unfreeze();
		cond_resched();
		page_offline_freeze();
	}

We don't seem to care about page_offline_frozen overflows here, do we?
David Hildenbrand May 3, 2021, 10:13 a.m. UTC | #4
On 03.05.21 11:28, Mike Rapoport wrote:
> On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote:
>> On 02.05.21 08:34, Mike Rapoport wrote:
>>> On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
>>>> Let's properly synchronize with drivers that set PageOffline(). Unfreeze
>>>> every now and then, so drivers that want to set PageOffline() can make
>>>> progress.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>    fs/proc/kcore.c | 15 +++++++++++++++
>>>>    1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>>>> index 92ff1e4436cb..3d7531f47389 100644
>>>> --- a/fs/proc/kcore.c
>>>> +++ b/fs/proc/kcore.c
>>>> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
>>>>    static ssize_t
>>>>    read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>>>    {
>>>> +	size_t page_offline_frozen = 0;
>>>>    	char *buf = file->private_data;
>>>>    	size_t phdrs_offset, notes_offset, data_offset;
>>>>    	size_t phdrs_len, notes_len;
>>>> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>>>    			pfn = __pa(start) >> PAGE_SHIFT;
>>>>    			page = pfn_to_online_page(pfn);
>>>
>>> Can't this race with page offlining for the first time we get here?
>>
>>
>> To clarify, we have three types of offline pages in the kernel ...
>>
>> a) Pages part of an offline memory section; the memmap is stale and not
>> trustworthy. pfn_to_online_page() checks that. We *can* protect against
>> memory offlining using get_online_mems()/put_online_mems(), but usually
>> avoid doing so as the race window is very small (and a problem all over the
>> kernel we basically never hit) and locking is rather expensive. In the
>> future, we might switch to RCU to handle that more efficiently and avoid
>> these possible races.
>>
>> b) PageOffline(): logically offline pages contained in an online memory
>> section with a sane memmap. virtio-mem calls these pages "fake offline";
>> something like a "temporary" memory hole. The new mechanism I propose will
>> be used to handle synchronization as races can be more severe, e.g., when
>> reading actual page content here.
>>
>> c) Soft offline pages: hwpoisoned pages that are not actually harmful yet,
>> but could become harmful in the future. So we better try to remove the page
>> from the page allocator and try to migrate away existing users.
>>
>>
>> So page_offline_* handle "b) PageOffline()" only. There is a tiny race
>> between pfn_to_online_page(pfn) and looking at the memmap as we have in many
>> cases already throughout the kernel, to be tackled in the future.
> 
> Right, but here you anyway add locking, so why exclude the first iteration?

What we're protecting is PageOffline() below. If I didn't mess up, we 
should always be calling page_offline_freeze() before calling 
PageOffline(). Or am I missing something?

> 
> BTW, did you consider something like

Yes, I played with something like that. We'd have to handle the first 
page_offline_freeze() differently, though, and that's where 
things got a bit ugly in my attempts.

> 
> 	if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
> 		page_offline_unfreeze();
> 		cond_resched();
> 		page_offline_freeze();
> 	}
> 
> We don't seem to care about page_offline_frozen overflows here, do we?

No, the buffer length is also a size_t and the counter is incremented at 
most once per byte. The variant I have right now looked the cleanest to 
me. Happy to hear simpler alternatives.
Mike Rapoport May 3, 2021, 11:33 a.m. UTC | #5
On Mon, May 03, 2021 at 12:13:45PM +0200, David Hildenbrand wrote:
> On 03.05.21 11:28, Mike Rapoport wrote:
> > On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote:
> > > On 02.05.21 08:34, Mike Rapoport wrote:
> > > > On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
> > > > > Let's properly synchronize with drivers that set PageOffline(). Unfreeze
> > > > > every now and then, so drivers that want to set PageOffline() can make
> > > > > progress.
> > > > > 
> > > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > > ---
> > > > >    fs/proc/kcore.c | 15 +++++++++++++++
> > > > >    1 file changed, 15 insertions(+)
> > > > > 
> > > > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > > > > index 92ff1e4436cb..3d7531f47389 100644
> > > > > --- a/fs/proc/kcore.c
> > > > > +++ b/fs/proc/kcore.c
> > > > > @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
> > > > >    static ssize_t
> > > > >    read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > > > >    {
> > > > > +	size_t page_offline_frozen = 0;
> > > > >    	char *buf = file->private_data;
> > > > >    	size_t phdrs_offset, notes_offset, data_offset;
> > > > >    	size_t phdrs_len, notes_len;
> > > > > @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > > > >    			pfn = __pa(start) >> PAGE_SHIFT;
> > > > >    			page = pfn_to_online_page(pfn);
> > > > 
> > > > Can't this race with page offlining for the first time we get here?
> > > 
> > > 
> > > To clarify, we have three types of offline pages in the kernel ...
> > > 
> > > a) Pages part of an offline memory section; the memmap is stale and not
> > > trustworthy. pfn_to_online_page() checks that. We *can* protect against
> > > memory offlining using get_online_mems()/put_online_mems(), but usually
> > > avoid doing so as the race window is very small (and a problem all over the
> > > kernel we basically never hit) and locking is rather expensive. In the
> > > future, we might switch to RCU to handle that more efficiently and avoid
> > > these possible races.
> > > 
> > > b) PageOffline(): logically offline pages contained in an online memory
> > > section with a sane memmap. virtio-mem calls these pages "fake offline";
> > > something like a "temporary" memory hole. The new mechanism I propose will
> > > be used to handle synchronization as races can be more severe, e.g., when
> > > reading actual page content here.
> > > 
> > > c) Soft offline pages: hwpoisoned pages that are not actually harmful yet,
> > > but could become harmful in the future. So we better try to remove the page
> > > from the page allocator and try to migrate away existing users.
> > > 
> > > 
> > > So page_offline_* handle "b) PageOffline()" only. There is a tiny race
> > > between pfn_to_online_page(pfn) and looking at the memmap as we have in many
> > > cases already throughout the kernel, to be tackled in the future.
> > 
> > Right, but here you anyway add locking, so why exclude the first iteration?
> 
> What we're protecting is PageOffline() below. If I didn't mess up, we should
> always be calling page_offline_freeze() before calling PageOffline(). Or am
> I missing something?
 
Somehow I was under the impression we are protecting both pfn_to_online_page()
and PageOffline().
 
> > BTW, did you consider something like
> 
> Yes, I played with something like that. We'd have to handle the first
> page_offline_freeze() differently, though, and that's where things
> got a bit ugly in my attempts.
> 
> > 
> > 	if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
> > 		page_offline_unfreeze();
> > 		cond_resched();
> > 		page_offline_freeze();
> > 	}
> > 
> > We don't seem to care about page_offline_frozen overflows here, do we?
> 
> No, the buffer length is also a size_t and the counter is incremented at most once per byte.
> The variant I have right now looked the cleanest to me. Happy to hear
> simpler alternatives.

Well, locking for the first time before the while() loop and doing
resched-relock outside the switch() would definitely be nicer, and it makes the 
last unlock unconditional.

The cost of preventing memory offlining during reads of !KCORE_RAM parts 
does not seem that significant to me, but I may be missing something.
David Hildenbrand May 3, 2021, 11:35 a.m. UTC | #6
On 03.05.21 13:33, Mike Rapoport wrote:
> On Mon, May 03, 2021 at 12:13:45PM +0200, David Hildenbrand wrote:
>> On 03.05.21 11:28, Mike Rapoport wrote:
>>> On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote:
>>>> On 02.05.21 08:34, Mike Rapoport wrote:
>>>>> On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
>>>>>> Let's properly synchronize with drivers that set PageOffline(). Unfreeze
>>>>>> every now and then, so drivers that want to set PageOffline() can make
>>>>>> progress.
>>>>>>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>     fs/proc/kcore.c | 15 +++++++++++++++
>>>>>>     1 file changed, 15 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>>>>>> index 92ff1e4436cb..3d7531f47389 100644
>>>>>> --- a/fs/proc/kcore.c
>>>>>> +++ b/fs/proc/kcore.c
>>>>>> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
>>>>>>     static ssize_t
>>>>>>     read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>>>>>     {
>>>>>> +	size_t page_offline_frozen = 0;
>>>>>>     	char *buf = file->private_data;
>>>>>>     	size_t phdrs_offset, notes_offset, data_offset;
>>>>>>     	size_t phdrs_len, notes_len;
>>>>>> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>>>>>     			pfn = __pa(start) >> PAGE_SHIFT;
>>>>>>     			page = pfn_to_online_page(pfn);
>>>>>
>>>>> Can't this race with page offlining for the first time we get here?
>>>>
>>>>
>>>> To clarify, we have three types of offline pages in the kernel ...
>>>>
>>>> a) Pages part of an offline memory section; the memmap is stale and not
>>>> trustworthy. pfn_to_online_page() checks that. We *can* protect against
>>>> memory offlining using get_online_mems()/put_online_mems(), but usually
>>>> avoid doing so as the race window is very small (and a problem all over the
>>>> kernel we basically never hit) and locking is rather expensive. In the
>>>> future, we might switch to RCU to handle that more efficiently and avoid
>>>> these possible races.
>>>>
>>>> b) PageOffline(): logically offline pages contained in an online memory
>>>> section with a sane memmap. virtio-mem calls these pages "fake offline";
>>>> something like a "temporary" memory hole. The new mechanism I propose will
>>>> be used to handle synchronization as races can be more severe, e.g., when
>>>> reading actual page content here.
>>>>
>>>> c) Soft offline pages: hwpoisoned pages that are not actually harmful yet,
>>>> but could become harmful in the future. So we better try to remove the page
>>>> from the page allocator and try to migrate away existing users.
>>>>
>>>>
>>>> So page_offline_* handle "b) PageOffline()" only. There is a tiny race
>>>> between pfn_to_online_page(pfn) and looking at the memmap as we have in many
>>>> cases already throughout the kernel, to be tackled in the future.
>>>
>>> Right, but here you anyway add locking, so why exclude the first iteration?
>>
>> What we're protecting is PageOffline() below. If I didn't mess up, we should
>> always be calling page_offline_freeze() before calling PageOffline(). Or am
>> I missing something?
>   
> Somehow I was under the impression we are protecting both pfn_to_online_page()
> and PageOffline().
>   
>>> BTW, did you consider something like
>>
>> Yes, I played with something like that. We'd have to handle the first
>> page_offline_freeze() differently, though, and that's where things
>> got a bit ugly in my attempts.
>>
>>>
>>> 	if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
>>> 		page_offline_unfreeze();
>>> 		cond_resched();
>>> 		page_offline_freeze();
>>> 	}
>>>
>>> We don't seem to care about page_offline_frozen overflows here, do we?
>>
>> No, the buffer length is also a size_t and the counter is incremented at most once per byte.
>> The variant I have right now looked the cleanest to me. Happy to hear
>> simpler alternatives.
> 
> Well, locking for the first time before the while() loop and doing
> resched-relock outside the switch() would definitely be nicer, and it makes the
> last unlock unconditional.
> 
> The cost of preventing memory offlining during reads of !KCORE_RAM parts
> does not seem that significant to me, but I may be missing something.

Also true, I'll have a look if I can just simplify that.
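
For reference, the simplification discussed above (freeze once before the
main loop, periodically unfreeze/resched/refreeze outside the switch, and
unfreeze unconditionally at the end) might look roughly like the abridged
sketch below. It follows Mike's suggestion and is not necessarily the
code that was eventually merged:

	size_t page_offline_frozen = 1;
	...
	down_read(&kclist_lock);
	/* Freeze once, before walking any KCORE_RAM ranges. */
	page_offline_freeze();

	while (buflen) {
		/*
		 * Every MAX_ORDER_NR_PAGES iterations, let drivers that
		 * want to set PageOffline() make progress and give the
		 * scheduler a chance to run.
		 */
		if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
			page_offline_unfreeze();
			cond_resched();
			page_offline_freeze();
		}

		switch (m->type) {
		...
		}
	}

out:
	page_offline_unfreeze();	/* unconditional now */
	up_read(&kclist_lock);

Starting the counter at 1 avoids an unfreeze without a prior freeze on
the first iteration, and since the counter is a size_t incremented at
most once per byte of a size_t-sized buffer, overflow is not a concern.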

Patch

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 92ff1e4436cb..3d7531f47389 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -311,6 +311,7 @@  static void append_kcore_note(char *notes, size_t *i, const char *name,
 static ssize_t
 read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 {
+	size_t page_offline_frozen = 0;
 	char *buf = file->private_data;
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t phdrs_len, notes_len;
@@ -509,6 +510,18 @@  read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			pfn = __pa(start) >> PAGE_SHIFT;
 			page = pfn_to_online_page(pfn);
 
+			/*
+			 * Don't race against drivers that set PageOffline()
+			 * and expect no further page access.
+			 */
+			if (page_offline_frozen == MAX_ORDER_NR_PAGES) {
+				page_offline_unfreeze();
+				page_offline_frozen = 0;
+				cond_resched();
+			}
+			if (!page_offline_frozen++)
+				page_offline_freeze();
+
 			/*
 			 * Don't read offline sections, logically offline pages
 			 * (e.g., inflated in a balloon), hwpoisoned pages,
@@ -565,6 +578,8 @@  read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	}
 
 out:
+	if (page_offline_frozen)
+		page_offline_unfreeze();
 	up_read(&kclist_lock);
 	if (ret)
 		return ret;