Message ID | 20201222121904.50845-1-qianjun.kernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/1] mm:improve the performance during fork |
On Tue, Dec 22, 2020 at 5:49 PM <qianjun.kernel@gmail.com> wrote:
>
> From: jun qian <qianjun.kernel@gmail.com>
>
> In our project, many business delays come from fork, so
> we started looking for the reason why fork is time-consuming.
> I used ftrace with function_graph to trace the fork, and found
> that vm_normal_page is called tens of thousands of times and
> the execution time of this vm_normal_page function is only a
> few nanoseconds. And vm_normal_page is not an inline function.
> So I think if the function is inlined, it may reduce the
> call time overhead.
>
> I did the following experiment:
>
> I wrote the C test code, please ignore the memory leak :)
> Before fork, I malloc 4G bytes, then calculate the fork
> time.
>
> int main()
> {
>         char *p;
>         unsigned long long i=0;
>         float time_use=0;
>         struct timeval start;
>         struct timeval end;
>
>         for(i=0; i<LEN; i++) {
>                 p = (char *)malloc(4096);
>                 if (p == NULL) {
>                         printf("malloc failed!\n");
>                         return 0;
>                 }
>                 p[0] = 0x55;
>         }
>         gettimeofday(&start,NULL);
>         fork();
>         gettimeofday(&end,NULL);
>
>         time_use=(end.tv_sec * 1000000 + end.tv_usec) -
>                 (start.tv_sec * 1000000 + start.tv_usec);
>         printf("time_use is %.10f us\n",time_use);
>
>         return 0;
> }
>
> We need to compare the changes in the size of vmlinux and the time of
> fork in the inline and non-inline cases. vm_normal_page is also
> called in many functions, so we need to compare the size of those
> functions too. For example, do_wp_page calls vm_normal_page, so I
> also calculated its size.
>
>                     inline           non-inline       diff
> vmlinux size        9709248 bytes    9709824 bytes    -576 bytes
> fork time           23475us          24638us          -4.7%

Do you have the time diff for both the parent and child processes?

> do_wp_page size     972              743              +229
>
> According to the above test data, I think inlining vm_normal_page can
> reduce fork execution time.
>
> Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7d608765932b..a689bb5d3842 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -591,7 +591,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>   * PFNMAP mappings in order to support COWable mappings.
>   *
>   */
> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>                              pte_t pte)
>  {
>         unsigned long pfn = pte_pfn(pte);
> --
> 2.18.2
>
>
Souptick Joarder <jrdr.linux@gmail.com> wrote on Tue, Dec 22, 2020 at 11:08 PM:
>
> On Tue, Dec 22, 2020 at 5:49 PM <qianjun.kernel@gmail.com> wrote:
> >
> > From: jun qian <qianjun.kernel@gmail.com>
> >
> > In our project, many business delays come from fork, so
> > we started looking for the reason why fork is time-consuming.
> > I used ftrace with function_graph to trace the fork, and found
> > that vm_normal_page is called tens of thousands of times and
> > the execution time of this vm_normal_page function is only a
> > few nanoseconds. And vm_normal_page is not an inline function.
> > So I think if the function is inlined, it may reduce the
> > call time overhead.
> >
> > I did the following experiment:
> >
> > I wrote the C test code, please ignore the memory leak :)
> > Before fork, I malloc 4G bytes, then calculate the fork
> > time.
> >
> > int main()
> > {
> >         char *p;
> >         unsigned long long i=0;
> >         float time_use=0;
> >         struct timeval start;
> >         struct timeval end;
> >
> >         for(i=0; i<LEN; i++) {
> >                 p = (char *)malloc(4096);
> >                 if (p == NULL) {
> >                         printf("malloc failed!\n");
> >                         return 0;
> >                 }
> >                 p[0] = 0x55;
> >         }
> >         gettimeofday(&start,NULL);
> >         fork();
> >         gettimeofday(&end,NULL);
> >
> >         time_use=(end.tv_sec * 1000000 + end.tv_usec) -
> >                 (start.tv_sec * 1000000 + start.tv_usec);
> >         printf("time_use is %.10f us\n",time_use);
> >
> >         return 0;
> > }
> >
> > We need to compare the changes in the size of vmlinux and the time of
> > fork in the inline and non-inline cases. vm_normal_page is also
> > called in many functions, so we need to compare the size of those
> > functions too. For example, do_wp_page calls vm_normal_page, so I
> > also calculated its size.
> >
> >                     inline           non-inline       diff
> > vmlinux size        9709248 bytes    9709824 bytes    -576 bytes
> > fork time           23475us          24638us          -4.7%
>
> Do you have the time diff for both the parent and child processes?

Yes, the child's time and the parent's time are almost the same.
For example, with a.out as the test program:

./a.out
time_use is 23342.0000000000 us
time_use is 23404.0000000000 us

>
> > do_wp_page size     972              743              +229
> >
> > According to the above test data, I think inlining vm_normal_page can
> > reduce fork execution time.
> >
> > Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> > ---
> >  mm/memory.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 7d608765932b..a689bb5d3842 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -591,7 +591,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> >   * PFNMAP mappings in order to support COWable mappings.
> >   *
> >   */
> > -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >                              pte_t pte)
> >  {
> >         unsigned long pfn = pte_pfn(pte);
> > --
> > 2.18.2
> >
> >
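The parent/child question above can be made explicit with a small variation of the test program. This is only a sketch, not the author's code: `now_ns` and `time_fork` are hypothetical helper names, the 4096-byte page size is assumed from the original test, and a single large `malloc` replaces the original loop of 4096-byte allocations. It labels the two measurements so they can be told apart, and uses a monotonic clock with integer nanoseconds instead of a `float`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds (hypothetical helper). */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Touch n_pages pages of heap memory, then time fork().
 * Returns the parent's elapsed time in ns, or -1 on failure.
 * The child prints the time from the parent's start stamp until it
 * first runs, then exits; the parent prints its own elapsed time.
 */
static long long time_fork(size_t n_pages)
{
    const size_t page = 4096; /* assumed page size, as in the original test */
    assert(n_pages > 0);

    char *buf = malloc(n_pages * page);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n_pages; i++)
        buf[i * page] = 0x55; /* fault in every page before forking */

    fflush(stdout); /* avoid duplicating buffered output in the child */
    long long start = now_ns();
    pid_t pid = fork();
    long long elapsed = now_ns() - start;

    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        printf("child:  fork took %lld ns\n", elapsed);
        fflush(stdout);
        _exit(0);
    }
    printf("parent: fork took %lld ns\n", elapsed);
    waitpid(pid, NULL, 0);
    free(buf);
    return elapsed;
}
```

Calling `time_fork(1048576)` from `main` would correspond to the 4G bytes mentioned in the patch description, assuming 4096-byte pages; a smaller count is enough to check that both lines are printed.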
From: qianjun
> Sent: 22 December 2020 12:19
>
> In our project, many business delays come from fork, so
> we started looking for the reason why fork is time-consuming.
> I used ftrace with function_graph to trace the fork, and found
> that vm_normal_page is called tens of thousands of times and
> the execution time of this vm_normal_page function is only a
> few nanoseconds. And vm_normal_page is not an inline function.
> So I think if the function is inlined, it may reduce the
> call time overhead.

Beware of taking timings from the ftrace function trace.
The cost of the tracing is significant.
You can get sensible numbers if you only trace very specific functions.
Slightly annoyingly the output format changes if you enable the
function exit trace - useful for the timestamp.
ISTR it is possible to get the process id traced if you fiddle
with enough options.

        David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
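To illustrate the advice about tracing only very specific functions, here is a minimal sketch of restricting ftrace to a single function by writing to the tracefs control files. The helper names `tracefs_write` and `trace_single_function` are hypothetical; the control files `tracing_on`, `set_ftrace_filter` and `current_tracer` are the standard ones. The tracefs directory is passed in because its mount point varies (commonly /sys/kernel/tracing, or /sys/kernel/debug/tracing on older setups), and writing to it normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `value` to the control file `name` under the tracefs
 * directory `dir`. Returns 0 on success, -1 on failure.
 */
static int tracefs_write(const char *dir, const char *name, const char *value)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0) /* the kernel may reject the value at flush time */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Restrict tracing to the single function `func` and enable the
 * function_graph tracer. Tracing is switched off while the filter
 * and tracer are changed, then switched back on.
 */
static int trace_single_function(const char *dir, const char *func)
{
    assert(dir != NULL && func != NULL);
    if (strlen(func) == 0)
        return -1;
    if (tracefs_write(dir, "tracing_on", "0") != 0)
        return -1;
    if (tracefs_write(dir, "set_ftrace_filter", func) != 0)
        return -1;
    if (tracefs_write(dir, "current_tracer", "function_graph") != 0)
        return -1;
    return tracefs_write(dir, "tracing_on", "1");
}
```

For example, `trace_single_function("/sys/kernel/tracing", "vm_normal_page")` would limit the trace to that one function, provided it is listed in `available_filter_functions` on the running kernel.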
diff --git a/mm/memory.c b/mm/memory.c
index 7d608765932b..a689bb5d3842 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -591,7 +591,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
                             pte_t pte)
 {
        unsigned long pfn = pte_pfn(pte);
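The reasoning behind the change (a very short function called many times, where the call itself is a noticeable share of the cost) can be explored in userspace with a rough sketch like the one below. Everything here is illustrative: `step_call` and `step_inline` are hypothetical stand-ins and have nothing to do with the real `vm_normal_page`, the `noinline` and `always_inline` attributes are GCC/Clang extensions, and the timings depend heavily on the compiler and optimisation level, so they say nothing definite about the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Tiny helper forced out of line (hypothetical stand-in). */
__attribute__((noinline))
static uint64_t step_call(uint64_t x)
{
    return x * 3 + 1;
}

/* Same body, forced inline into the caller. */
static inline __attribute__((always_inline))
uint64_t step_inline(uint64_t x)
{
    return x * 3 + 1;
}

/* Monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run `iters` dependent steps through the out-of-line helper. */
static uint64_t run_call(uint64_t iters, long long *elapsed_ns)
{
    assert(elapsed_ns != NULL);
    uint64_t x = 1;
    long long t0 = now_ns();
    for (uint64_t i = 0; i < iters; i++)
        x = step_call(x);
    *elapsed_ns = now_ns() - t0;
    return x;
}

/* Run the same number of steps through the inlined helper. */
static uint64_t run_inline(uint64_t iters, long long *elapsed_ns)
{
    assert(elapsed_ns != NULL);
    uint64_t x = 1;
    long long t0 = now_ns();
    for (uint64_t i = 0; i < iters; i++)
        x = step_inline(x);
    *elapsed_ns = now_ns() - t0;
    return x;
}
```

Both runners return the same value for the same `iters`, so the results can be compared to confirm that only the call mechanism differs; the two elapsed times can then be printed side by side.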