mbox series

[0/5] mm/vmalloc.c: improve readability and rewrite vmap_area

Message ID 20190630075650.8516-1-lpf.vector@gmail.com (mailing list archive)
Headers show
Series mm/vmalloc.c: improve readability and rewrite vmap_area | expand

Message

Pengfei Li June 30, 2019, 7:56 a.m. UTC
Hi,

This series of patches is to reduce the size of struct vmap_area.

Since the members of struct vmap_area are not being used at the same time,
it is possible to reduce its size by placing several members that are not
used at the same time in a union.

The first 4 patches did some preparatory work for this and improved
readability.

The fifth patch is the main patch, it did the work of rewriting vmap_area.

More details can be obtained from the commit message.

Thanks,

Pengfei

Pengfei Li (5):
  mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
  mm/vmalloc.c: Introduce a wrapper function of
    insert_vmap_area_augment()
  mm/vmalloc.c: Rename function __find_vmap_area() for readability
  mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
  mm/vmalloc.c: Rewrite struct vmap_area to reduce its size

 include/linux/vmalloc.h |  28 +++++---
 mm/vmalloc.c            | 144 +++++++++++++++++++++++++++-------------
 2 files changed, 117 insertions(+), 55 deletions(-)

Comments

Michal Hocko July 1, 2019, 9:20 a.m. UTC | #1
On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> Hi,
> 
> This series of patches is to reduce the size of struct vmap_area.
> 
> Since the members of struct vmap_area are not being used at the same time,
> it is possible to reduce its size by placing several members that are not
> used at the same time in a union.
> 
> The first 4 patches did some preparatory work for this and improved
> readability.
> 
> The fifth patch is the main patch, it did the work of rewriting vmap_area.
> 
> More details can be obtained from the commit message.

None of the commit messages talk about the motivation. Why do we want to
add quite some code to achieve this? How much do we save? This all
should be a part of the cover letter.

> Thanks,
> 
> Pengfei
> 
> Pengfei Li (5):
>   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
>   mm/vmalloc.c: Introduce a wrapper function of
>     insert_vmap_area_augment()
>   mm/vmalloc.c: Rename function __find_vmap_area() for readability
>   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
>   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> 
>  include/linux/vmalloc.h |  28 +++++---
>  mm/vmalloc.c            | 144 +++++++++++++++++++++++++++-------------
>  2 files changed, 117 insertions(+), 55 deletions(-)
> 
> -- 
> 2.21.0
Uladzislau Rezki July 1, 2019, 10:11 a.m. UTC | #2
On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote:
> On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> > Hi,
> > 
> > This series of patches is to reduce the size of struct vmap_area.
> > 
> > Since the members of struct vmap_area are not being used at the same time,
> > it is possible to reduce its size by placing several members that are not
> > used at the same time in a union.
> > 
> > The first 4 patches did some preparatory work for this and improved
> > readability.
> > 
> > The fifth patch is the main patch, it did the work of rewriting vmap_area.
> > 
> > More details can be obtained from the commit message.
> 
> None of the commit messages talk about the motivation. Why do we want to
> add quite some code to achieve this? How much do we save? This all
> should be a part of the cover letter.
> 
> > Thanks,
> > 
> > Pengfei
> > 
> > Pengfei Li (5):
> >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> >   mm/vmalloc.c: Introduce a wrapper function of
> >     insert_vmap_area_augment()
> >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> > 
> >  include/linux/vmalloc.h |  28 +++++---
> >  mm/vmalloc.c            | 144 +++++++++++++++++++++++++++-------------
> >  2 files changed, 117 insertions(+), 55 deletions(-)
> > 
> > -- 
> > 2.21.0

> > Pengfei Li (5):
> >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> >   mm/vmalloc.c: Introduce a wrapper function of
> >     insert_vmap_area_augment()
> >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about
that and i have patches in my pipeline to send out but implementation is different.

I had a look at all 5 patches. What you are doing is reasonable to me, i mean when
it comes to the idea of reducing the size to L1 cache line. 

I have a concern about implementation and all logic around when we can use va_start
and when it is something else. It is not optimal at least to me, from performance point
of view and complexity. All hot paths and tree traversal are affected by that.

For example running the vmalloc test driver against this series shows the following
delta:

<5.2.0-rc6+>
Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec
Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec
Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec
<5.2.0-rc6+>

<this series>
Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec
Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec
Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec
<this series>

For example, the degrade in second test is ~15%.

--
Vlad Rezki
Pengfei Li July 2, 2019, 11:51 a.m. UTC | #3
Michal Hocko <mhocko@kernel.org> 于2019年7月1日周一 下午5:20写道:
>
> On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> > Hi,
> >
> > This series of patches is to reduce the size of struct vmap_area.
> >
> > Since the members of struct vmap_area are not being used at the same time,
> > it is possible to reduce its size by placing several members that are not
> > used at the same time in a union.
> >
> > The first 4 patches did some preparatory work for this and improved
> > readability.
> >
> > The fifth patch is the main patch, it did the work of rewriting vmap_area.
> >
> > More details can be obtained from the commit message.
>
> None of the commit messages talk about the motivation. Why do we want to
> add quite some code to achieve this? How much do we save? This all
> should be a part of the cover letter.
>

Hi Michal,

Thank you for your comments.

Sorry for the commit without any test data.
I will add motivation and necessary test data in the next version.

Best regards,

Pengfei

> > Thanks,
> >
> > Pengfei
> >
> > Pengfei Li (5):
> >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> >   mm/vmalloc.c: Introduce a wrapper function of
> >     insert_vmap_area_augment()
> >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> >
> >  include/linux/vmalloc.h |  28 +++++---
> >  mm/vmalloc.c            | 144 +++++++++++++++++++++++++++-------------
> >  2 files changed, 117 insertions(+), 55 deletions(-)
> >
> > --
> > 2.21.0
>
> --
> Michal Hocko
> SUSE Labs
Pengfei Li July 2, 2019, 12:18 p.m. UTC | #4
Uladzislau Rezki <urezki@gmail.com> 于2019年7月1日周一 下午6:11写道:
>
> On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote:
> > On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> > > Hi,
> > >
> > > This series of patches is to reduce the size of struct vmap_area.
> > >
> > > Since the members of struct vmap_area are not being used at the same time,
> > > it is possible to reduce its size by placing several members that are not
> > > used at the same time in a union.
> > >
> > > The first 4 patches did some preparatory work for this and improved
> > > readability.
> > >
> > > The fifth patch is the main patch, it did the work of rewriting vmap_area.
> > >
> > > More details can be obtained from the commit message.
> >
> > None of the commit messages talk about the motivation. Why do we want to
> > add quite some code to achieve this? How much do we save? This all
> > should be a part of the cover letter.
> >
> > > Thanks,
> > >
> > > Pengfei
> > >
> > > Pengfei Li (5):
> > >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > >   mm/vmalloc.c: Introduce a wrapper function of
> > >     insert_vmap_area_augment()
> > >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> > >
> > >  include/linux/vmalloc.h |  28 +++++---
> > >  mm/vmalloc.c            | 144 +++++++++++++++++++++++++++-------------
> > >  2 files changed, 117 insertions(+), 55 deletions(-)
> > >
> > > --
> > > 2.21.0
>
> > > Pengfei Li (5):
> > >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > >   mm/vmalloc.c: Introduce a wrapper function of
> > >     insert_vmap_area_augment()
> > >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about
> that and i have patches in my pipeline to send out but implementation is different.
>
> I had a look at all 5 patches. What you are doing is reasonable to me, i mean when
> it comes to the idea of reducing the size to L1 cache line.
>

Thank you for your review.

> I have a concern about implementation and all logic around when we can use va_start
> and when it is something else. It is not optimal at least to me, from performance point
> of view and complexity. All hot paths and tree traversal are affected by that.
>
> For example running the vmalloc test driver against this series shows the following
> delta:
>
> <5.2.0-rc6+>
> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec
> Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec
> Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec
> <5.2.0-rc6+>
>
> <this series>
> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec
> Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec
> Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec
> <this series>
>
> For example, the degrade in second test is ~15%.
>
> --
> Vlad Rezki

Hi, Vlad

I think the reason for the performance degradation is that the value
of va_start is obtained by va->vm->addr.

And since the vmap area in the BUSY tree is always page-aligned,
there is no reason for _va_vmlid to override va_start, just let
the va->flags use the bits that lower than PAGE_OFFSET.

I will use this implementation in the next version and show almost
no performance penalty in my local tests.

I will send the next version soon.

Thank you for taking your time for the review.

Best regards,

Pengfei