diff mbox

[3/6] struct page: add field for vm_struct

Message ID 20180130151446.24698-4-igor.stoppa@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Igor Stoppa Jan. 30, 2018, 3:14 p.m. UTC
When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c             | 18 +++++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

Comments

Christoph Lameter (Ampere) Feb. 1, 2018, midnight UTC | #1
On Tue, 30 Jan 2018, Igor Stoppa wrote:

> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>
>  	kmemleak_vmalloc(area, size, gfp_mask);
>
> +	for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
> +		area->pages[page_counter]->area = area;
> +
>  	return addr;

Well this introduces significant overhead for large sized allocation. Does
this not matter because the areas are small?

Would it not be better to use compound page allocations here?
page_head(whatever) gets you the head page where you can store all sorts
of information about the chunk of memory.
Igor Stoppa Feb. 1, 2018, 12:42 p.m. UTC | #2
On 01/02/18 02:00, Christopher Lameter wrote:
> On Tue, 30 Jan 2018, Igor Stoppa wrote:
> 
>> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>>
>>  	kmemleak_vmalloc(area, size, gfp_mask);
>>
>> +	for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
>> +		area->pages[page_counter]->area = area;
>> +
>>  	return addr;
> 
> Well this introduces significant overhead for large sized allocation. Does
> this not matter because the areas are small?

Relatively significant?
I do not object to your comment, but in practice i see that:

- vmalloc is used relatively little
- allocations do not seem to be huge
- there seem to be way larger overheads in the handling of virtual pages
  (see my proposal for the LFS/m summit, about collapsing struct
   vm_struct and struct vmap_area)


> Would it not be better to use compound page allocations here?
> page_head(whatever) gets you the head page where you can store all sorts
> of information about the chunk of memory.

Can you please point me to this function/macro? I don't seem to be able
to find it, at least not in 4.15

During hardened user copy permission check, I need to confirm if the
memory range that would be exposed to userspace is a legitimate
sub-range of a pmalloc allocation.


So, I start with the pair (address, size) and I must end up to something
I can compare it against.
The idea here is to pass through struct_page and then the related
vm_struct/vmap_area, which already has the information about the
specific chunk of virtual memory.

I cannot comment on your proposal because I do not know where to find
the reference you made, or maybe I do not understand what you mean :-(

--
igor
Kees Cook Feb. 1, 2018, 9:11 p.m. UTC | #3
On Thu, Feb 1, 2018 at 11:42 PM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
> On 01/02/18 02:00, Christopher Lameter wrote:
>> Would it not be better to use compound page allocations here?
>> page_head(whatever) gets you the head page where you can store all sorts
>> of information about the chunk of memory.
>
> Can you please point me to this function/macro? I don't seem to be able
> to find it, at least not in 4.15

IIUC, he means PageHead(), which is also hard to grep for, since it is
a constructed name, via Page##uname in include/linux/page-flags.h:

__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)

-Kees
Igor Stoppa Feb. 2, 2018, 4:01 p.m. UTC | #4
On 01/02/18 23:11, Kees Cook wrote:

> IIUC, he means PageHead(), which is also hard to grep for, since it is
> a constructed name, via Page##uname in include/linux/page-flags.h:
> 
> __PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)

Thank you, I'll try to provide a meaningful reply soon, but I'll be AFK
during most of next 2 weeks, so it might be delayed :-(

--
igor
Christoph Lameter (Ampere) Feb. 2, 2018, 6:43 p.m. UTC | #5
On Thu, 1 Feb 2018, Igor Stoppa wrote:

> > Would it not be better to use compound page allocations here?
> > page_head(whatever) gets you the head page where you can store all sorts
> > of information about the chunk of memory.
>
> Can you please point me to this function/macro? I don't seem to be able
> to find it, at least not in 4.15

Ok its compound_head(). See also the use in the SLAB and SLUB allocator.

> During hardened user copy permission check, I need to confirm if the
> memory range that would be exposed to userspace is a legitimate
> sub-range of a pmalloc allocation.

If you save the size in the head page struct then you could do that pretty
fast.

> I cannot comment on your proposal because I do not know where to find
> the reference you made, or maybe I do not understand what you mean :-(

compund pages are higher order pages that are handled as a single page by
the VM. See https://lwn.net/Articles/619514/
Igor Stoppa Feb. 3, 2018, 4:13 p.m. UTC | #6
On 02/02/18 20:43, Christopher Lameter wrote:
> On Thu, 1 Feb 2018, Igor Stoppa wrote:
> 
>>> Would it not be better to use compound page allocations here?

[...]

> Ok its compound_head(). See also the use in the SLAB and SLUB allocator.
> 
>> During hardened user copy permission check, I need to confirm if the
>> memory range that would be exposed to userspace is a legitimate
>> sub-range of a pmalloc allocation.
> 
> If you save the size in the head page struct then you could do that pretty
> fast.

Ok, now I get what you mean.
But it doesn't seem to fit the intended use case, for other reasons
(maybe the same, from 2 different POV):

- compound pages are aggregates of regular pages, in numbers that are
powers of 2, while the amount of pages to allocate is not known upfront.
One *could* give a hint to pmalloc about how many pages to allocate
every time there is a need to grow the pool.
Iow it would be the size of a chunk. But I'm afraid the granularity
would still be pretty low, so maybe it would be 2-4 times less.

- the property of the compound page will affect the property of all the
pages in the compound, so when one is write protected, it can generate a
lot of wasted memory, if there is too much slack (because of the order)
With vmalloc, I can allocate any number of pages, minimizing the waste.


Finally, there was a discussion about optimization:
http://www.openwall.com/lists/kernel-hardening/2017/08/07/2

The patch I sent does indeed take advantage of the new information, not
just for pmalloc use.

I have not measured if/where/what there is gain, but it does look like
the extra info can be exploited also elsewhere.

--
igor
Christoph Lameter (Ampere) Feb. 5, 2018, 3:33 p.m. UTC | #7
On Sat, 3 Feb 2018, Igor Stoppa wrote:

> - the property of the compound page will affect the property of all the
> pages in the compound, so when one is write protected, it can generate a
> lot of wasted memory, if there is too much slack (because of the order)
> With vmalloc, I can allocate any number of pages, minimizing the waste.

I thought the intend here is to create a pool where the whole pool becomes
RO?
Matthew Wilcox Feb. 6, 2018, 12:37 p.m. UTC | #8
On Tue, Jan 30, 2018 at 05:14:43PM +0200, Igor Stoppa wrote:
> @@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  			const void *caller)
>  {
>  	struct vm_struct *area;
> +	unsigned int page_counter;
>  	void *addr;
>  	unsigned long real_size = size;
>  
> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  
>  	kmemleak_vmalloc(area, size, gfp_mask);
>  
> +	for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
> +		area->pages[page_counter]->area = area;
> +
>  	return addr;
>  

LOCAL variable names should be short, and to the point.  If you have
some random integer loop counter, it should probably be called ``i``.
Calling it ``loop_counter`` is non-productive, if there is no chance of it
being mis-understood.  Similarly, ``tmp`` can be just about any type of
variable that is used to hold a temporary value.

(Documentation/process/coding-style.rst)
Igor Stoppa Feb. 9, 2018, 11:34 a.m. UTC | #9
On 05/02/18 17:33, Christopher Lameter wrote:
> On Sat, 3 Feb 2018, Igor Stoppa wrote:
> 
>> - the property of the compound page will affect the property of all the
>> pages in the compound, so when one is write protected, it can generate a
>> lot of wasted memory, if there is too much slack (because of the order)
>> With vmalloc, I can allocate any number of pages, minimizing the waste.
> 
> I thought the intend here is to create a pool where the whole pool becomes
> RO?

Yes, but why would I force the number of pages in the pool to be a power
of 2, when it can be any number?

If a need, say, 17 pages, I would have to allocate 32.
But it can be worse than that.
Since the size of the overall allocated memory is not known upfront, I
wold have a problem to decide how many pages to allocate, every time
there is need to grow the pool.

Or push the problem to the user of the API, who might be equally unaware.

Notice that there is already a function (prealloc) available to the user
of the API, if the size is known upfront.

So I do not really see how using compound pages would make memory
utilization better or even not worse.

--
igor
Igor Stoppa Feb. 9, 2018, 1:45 p.m. UTC | #10
On 06/02/18 14:37, Matthew Wilcox wrote:

[...]

> LOCAL variable names should be short, and to the point.

[...]

> (Documentation/process/coding-style.rst)

ok, will do, thanks for the pointer!

--
igor
diff mbox

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4..2abd540 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -56,6 +56,7 @@  struct page {
 		void *s_mem;			/* slab first object */
 		atomic_t compound_mapcount;	/* first tail page */
 		/* page_deferred_list().next	 -- second tail page */
+		struct vm_struct *area;
 	};
 
 	/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6739420..44c5dfc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@  struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-	struct vmap_area *va;
+	struct page *page;
 
-	va = find_vmap_area((unsigned long)addr);
-	if (va && va->flags & VM_VM_AREA)
-		return va->vm;
+	if (unlikely(!is_vmalloc_addr(addr)))
+		return NULL;
 
-	return NULL;
+	page = vmalloc_to_page(addr);
+	if (unlikely(!page))
+		return NULL;
+
+	return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@  static void __vunmap(const void *addr, int deallocate_pages)
 			struct page *page = area->pages[i];
 
 			BUG_ON(!page);
+			page->area = NULL;
 			__free_pages(page, 0);
 		}
 
@@ -1744,6 +1748,7 @@  void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller)
 {
 	struct vm_struct *area;
+	unsigned int page_counter;
 	void *addr;
 	unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@  void *__vmalloc_node_range(unsigned long size, unsigned long align,
 
 	kmemleak_vmalloc(area, size, gfp_mask);
 
+	for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
+		area->pages[page_counter]->area = area;
+
 	return addr;
 
 fail: