[1/1] mm:improve the performance during fork

Message ID 20201222121904.50845-1-qianjun.kernel@gmail.com (mailing list archive)
State New, archived

Commit Message

jun qian Dec. 22, 2020, 12:19 p.m. UTC
From: jun qian <qianjun.kernel@gmail.com>

In our project, many business delays come from fork, so we started
looking into why fork is so time-consuming. I traced fork using ftrace
with the function_graph tracer and found that vm_normal_page is called
tens of thousands of times, while each call takes only a few
nanoseconds. vm_normal_page is not an inline function, so I think
inlining it may reduce the call overhead.

I did the following experiment:

I wrote the following C test program (please ignore the memory leak :)
Before the fork, it mallocs 4G bytes in 4096-byte chunks, then
calculates the fork time.

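#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Assumed value: 1M chunks of 4096 bytes, i.e. the 4G bytes described above. */
#define LEN (1024ULL * 1024ULL)

int main(void)
{
        char *p;
        unsigned long long i = 0;
        double time_use = 0;
        struct timeval start;
        struct timeval end;

        for (i = 0; i < LEN; i++) {
                p = (char *)malloc(4096);
                if (p == NULL) {
                        printf("malloc failed!\n");
                        return 1;
                }
                p[0] = 0x55;    /* write to each chunk so its page is faulted in */
        }
        gettimeofday(&start, NULL);
        fork();
        gettimeofday(&end, NULL);

        /* Elapsed microseconds; the parent and the child each print a value. */
        time_use = (end.tv_sec * 1000000LL + end.tv_usec) -
                (start.tv_sec * 1000000LL + start.tv_usec);
        printf("time_use is %.10f us\n", time_use);

        return 0;
}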

We need to compare the size of vmlinux and the fork time in the inline
and non-inline cases. vm_normal_page is called from many functions, so
we also need to compare the size of its callers. For example,
do_wp_page calls vm_normal_page, so I also measured its size.

                  inline           non-inline       diff
vmlinux size      9709248 bytes    9709824 bytes    -576 bytes
fork time         23475us          24638us          -4.7%
do_wp_page size   972              743              +229

According to the above test data, I think inlining vm_normal_page can
reduce fork execution time.

Signed-off-by: jun qian <qianjun.kernel@gmail.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Souptick Joarder Dec. 22, 2020, 3:07 p.m. UTC | #1
On Tue, Dec 22, 2020 at 5:49 PM <qianjun.kernel@gmail.com> wrote:
>
> From: jun qian <qianjun.kernel@gmail.com>
>
> In our project, Many business delays come from fork, so
> we started looking for the reason why fork is time-consuming.
> I used the ftrace with function_graph to trace the fork, found
> that the vm_normal_page will be called tens of thousands and
> the execution time of this vm_normal_page function is only a
> few nanoseconds. And the vm_normal_page is not a inline function.
> So I think if the function is inline style, it maybe reduce the
> call time overhead.
>
> I did the following experiment:
>
> I have wrote the c test code, pls ignore the memory leak :)
> Before fork, I will malloc 4G bytes, then acculate the fork
> time.
>
> int main()
> {
>         char *p;
>         unsigned long long i=0;
>         float time_use=0;
>         struct timeval start;
>         struct timeval end;
>
>         for(i=0; i<LEN; i++) {
>                 p = (char *)malloc(4096);
>                 if (p == NULL) {
>                         printf("malloc failed!\n");
>                         return 0;
>                 }
>                 p[0] = 0x55;
>         }
>         gettimeofday(&start,NULL);
>         fork();
>         gettimeofday(&end,NULL);
>
>         time_use=(end.tv_sec * 1000000 + end.tv_usec) -
>                 (start.tv_sec * 1000000 + start.tv_usec);
>         printf("time_use is %.10f us\n",time_use);
>
>         return 0;
> }
>
> We need to compare the changes in the size of vmlinux, the time of
> fork in inline and non-inline cases, and the vm_normal_page will be
> called in many function. So we also need to compare this function's
> size. For examples, the do_wp_page will call vm_normal_page, so I
> also calculated it's size.
>
>                   inline           non-inline       diff
> vmlinux size      9709248 bytes    9709824 bytes    -576 bytes
> fork time         23475ns          24638ns          -4.7%

Do you have the time diff for both the parent and the child process?

> do_wp_page size   972              743              +229
>
> According to the above test data, I think inline vm_normal_page can
> reduce fork execution time.
>
> Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7d608765932b..a689bb5d3842 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -591,7 +591,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>   * PFNMAP mappings in order to support COWable mappings.
>   *
>   */
> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>                             pte_t pte)
>  {
>         unsigned long pfn = pte_pfn(pte);
> --
> 2.18.2
>
>
jun qian Dec. 22, 2020, 3:32 p.m. UTC | #2
Souptick Joarder <jrdr.linux@gmail.com> wrote on Tue, Dec 22, 2020 at 11:08 PM:
>
> On Tue, Dec 22, 2020 at 5:49 PM <qianjun.kernel@gmail.com> wrote:
> >
> > From: jun qian <qianjun.kernel@gmail.com>
> >
> > In our project, Many business delays come from fork, so
> > we started looking for the reason why fork is time-consuming.
> > I used the ftrace with function_graph to trace the fork, found
> > that the vm_normal_page will be called tens of thousands and
> > the execution time of this vm_normal_page function is only a
> > few nanoseconds. And the vm_normal_page is not a inline function.
> > So I think if the function is inline style, it maybe reduce the
> > call time overhead.
> >
> > I did the following experiment:
> >
> > I have wrote the c test code, pls ignore the memory leak :)
> > Before fork, I will malloc 4G bytes, then acculate the fork
> > time.
> >
> > int main()
> > {
> >         char *p;
> >         unsigned long long i=0;
> >         float time_use=0;
> >         struct timeval start;
> >         struct timeval end;
> >
> >         for(i=0; i<LEN; i++) {
> >                 p = (char *)malloc(4096);
> >                 if (p == NULL) {
> >                         printf("malloc failed!\n");
> >                         return 0;
> >                 }
> >                 p[0] = 0x55;
> >         }
> >         gettimeofday(&start,NULL);
> >         fork();
> >         gettimeofday(&end,NULL);
> >
> >         time_use=(end.tv_sec * 1000000 + end.tv_usec) -
> >                 (start.tv_sec * 1000000 + start.tv_usec);
> >         printf("time_use is %.10f us\n",time_use);
> >
> >         return 0;
> > }
> >
> > We need to compare the changes in the size of vmlinux, the time of
> > fork in inline and non-inline cases, and the vm_normal_page will be
> > called in many function. So we also need to compare this function's
> > size. For examples, the do_wp_page will call vm_normal_page, so I
> > also calculated it's size.
> >
> >                   inline           non-inline       diff
> > vmlinux size      9709248 bytes    9709824 bytes    -576 bytes
> > fork time         23475ns          24638ns          -4.7%
>
> Do you have time diff for both parent and child process ?

Yes, the child and parent time diffs are almost the same. For example,
with a.out as the test program:

./a.out
time_use is 23342.0000000000 us
time_use is 23404.0000000000 us

>
> > do_wp_page size   972              743              +229
> >
> > According to the above test data, I think inline vm_normal_page can
> > reduce fork execution time.
> >
> > Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> > ---
> >  mm/memory.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 7d608765932b..a689bb5d3842 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -591,7 +591,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> >   * PFNMAP mappings in order to support COWable mappings.
> >   *
> >   */
> > -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >                             pte_t pte)
> >  {
> >         unsigned long pfn = pte_pfn(pte);
> > --
> > 2.18.2
> >
> >
David Laight Dec. 22, 2020, 6:42 p.m. UTC | #3
From: qianjun
> Sent: 22 December 2020 12:19
> 
> In our project, Many business delays come from fork, so
> we started looking for the reason why fork is time-consuming.
> I used the ftrace with function_graph to trace the fork, found
> that the vm_normal_page will be called tens of thousands and
> the execution time of this vm_normal_page function is only a
> few nanoseconds. And the vm_normal_page is not a inline function.
> So I think if the function is inline style, it maybe reduce the
> call time overhead.

Beware of taking timings from ftrace function trace.
The cost of the tracing is significant.

You can get sensible numbers if you only trace very specific
functions.
Slightly annoyingly the output format changes if you enable
the function exit trace - useful for the timestamp.
I seem to recall it is possible to get the process id traced if you
fiddle with enough options.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Patch

diff --git a/mm/memory.c b/mm/memory.c
index 7d608765932b..a689bb5d3842 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -591,7 +591,7 @@  static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);