Message ID | 1532628614-111702-2-git-send-email-yang.shi@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: zap pages with read mmap_sem in munmap for large mapping | expand |
On Fri 27-07-18 02:10:13, Yang Shi wrote: > Introduces three new helper functions: > * munmap_addr_sanity() > * munmap_lookup_vma() > * munmap_mlock_vma() > > They will be used by do_munmap() and the new do_munmap with zapping > large mapping early in the later patch. > > There is no functional change, just code refactor. > > Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> > --- > mm/mmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------- > 1 file changed, 82 insertions(+), 38 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index d1eb87e..2504094 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, > return __split_vma(mm, vma, addr, new_below); > } > > -/* Munmap is split into 2 main parts -- this part which finds > - * what needs doing, and the areas themselves, which do the > - * work. This now handles partial unmappings. > - * Jeremy Fitzhardinge <jeremy@goop.org> > - */ > -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, > - struct list_head *uf) > +static inline bool munmap_addr_sanity(unsigned long start, size_t len) munmap_check_addr? Btw. why does this need to have munmap prefix at all? This is a general address space check. > { > - unsigned long end; > - struct vm_area_struct *vma, *prev, *last; > - > if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) > - return -EINVAL; > + return false; > > - len = PAGE_ALIGN(len); > - if (len == 0) > - return -EINVAL; > + if (PAGE_ALIGN(len) == 0) > + return false; > + > + return true; > +} > + > +/* > + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. > + * @mm: mm_struct > + * @vma: the first overlapping vma > + * @prev: vma's prev > + * @start: start address > + * @end: end address This really doesn't help me to understand how to use the function. Why do we need both prev and vma etc... > + * > + * returns 1 if successful, 0 or errno otherwise This is a really weird calling convention. So what does 0 tell? /me checks the code. Ohh, it is nothing to do. Why cannot you simply return the vma. NULL implies nothing to do, ERR_PTR on error. > + */ > +static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma, > + struct vm_area_struct **prev, unsigned long start, > + unsigned long end) > +{ > + struct vm_area_struct *tmp, *last; > > /* Find the first overlapping VMA */ > - vma = find_vma(mm, start); > - if (!vma) > + tmp = find_vma(mm, start); > + if (!tmp) > return 0; > - prev = vma->vm_prev; > - /* we have start < vma->vm_end */ > + > + *prev = tmp->vm_prev; Why do you set prev here. We might "fail" with 0 right after this > + > + /* we have start < vma->vm_end */ > > /* if it doesn't overlap, we have nothing.. */ > - end = start + len; > - if (vma->vm_start >= end) > + if (tmp->vm_start >= end) > return 0; > > /* > @@ -2723,7 +2733,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, > * unmapped vm_area_struct will remain in use: so lower split_vma > * places tmp vma above, and higher split_vma places tmp vma below. > */ > - if (start > vma->vm_start) { > + if (start > tmp->vm_start) { > int error; > > /* > @@ -2731,13 +2741,14 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, > * not exceed its limit; but let map_count go just above > * its limit temporarily, to help free resources as expected. > */ > - if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count) > + if (end < tmp->vm_end && > + mm->map_count > sysctl_max_map_count) > return -ENOMEM; > > - error = __split_vma(mm, vma, start, 0); > + error = __split_vma(mm, tmp, start, 0); > if (error) > return error; > - prev = vma; > + *prev = tmp; > } > > /* Does it split the last one? */ > @@ -2747,7 +2758,48 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, > if (error) > return error; > } > - vma = prev ? prev->vm_next : mm->mmap; > + > + *vma = *prev ? (*prev)->vm_next : mm->mmap; > + > + return 1; > +} the patch would be much more easier to read if you didn't do vma->tmp renaming.
On 8/3/18 1:53 AM, Michal Hocko wrote: > On Fri 27-07-18 02:10:13, Yang Shi wrote: >> Introduces three new helper functions: >> * munmap_addr_sanity() >> * munmap_lookup_vma() >> * munmap_mlock_vma() >> >> They will be used by do_munmap() and the new do_munmap with zapping >> large mapping early in the later patch. >> >> There is no functional change, just code refactor. >> >> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> >> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> >> --- >> mm/mmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------- >> 1 file changed, 82 insertions(+), 38 deletions(-) >> >> diff --git a/mm/mmap.c b/mm/mmap.c >> index d1eb87e..2504094 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, >> return __split_vma(mm, vma, addr, new_below); >> } >> >> -/* Munmap is split into 2 main parts -- this part which finds >> - * what needs doing, and the areas themselves, which do the >> - * work. This now handles partial unmappings. >> - * Jeremy Fitzhardinge <jeremy@goop.org> >> - */ >> -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, >> - struct list_head *uf) >> +static inline bool munmap_addr_sanity(unsigned long start, size_t len) > munmap_check_addr? Btw. why does this need to have munmap prefix at all? > This is a general address space check. Just because I extracted this from do_munmap, no special consideration. It is definitely ok to use another name. > >> { >> - unsigned long end; >> - struct vm_area_struct *vma, *prev, *last; >> - >> if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) >> - return -EINVAL; >> + return false; >> >> - len = PAGE_ALIGN(len); >> - if (len == 0) >> - return -EINVAL; >> + if (PAGE_ALIGN(len) == 0) >> + return false; >> + >> + return true; >> +} >> + >> +/* >> + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. >> + * @mm: mm_struct >> + * @vma: the first overlapping vma >> + * @prev: vma's prev >> + * @start: start address >> + * @end: end address > This really doesn't help me to understand how to use the function. > Why do we need both prev and vma etc... prev will be used by unmap_region later. > >> + * >> + * returns 1 if successful, 0 or errno otherwise > This is a really weird calling convention. So what does 0 tell? /me > checks the code. Ohh, it is nothing to do. Why cannot you simply return > the vma. NULL implies nothing to do, ERR_PTR on error. A couple of reasons why it is implemented as so: * do_munmap returns 0 for both success and no suitable vma * Since prev is needed by finding the start vma, and prev will be used by unmap_region later too, so I just thought it would look clean to have one function to return both start vma and prev. In this way, we can share as much as possible common code. * In this way, we just need return 0, 1 or error no just as same as what do_munmap does currently. Then we know what is failure case exactly to just bail out right away. Actually, I tried the same approach as you suggested, but it had two problems: * If it returns the start vma, we have to re-find its prev later, but the prev has been found during finding start vma. And, duplicate the code in do_munmap_zap_rlock. It sounds not that ideal. * If it returns prev, it might be null (start vma is the first vma). We can't tell if null is a failure or success case > >> + */ >> +static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma, >> + struct vm_area_struct **prev, unsigned long start, >> + unsigned long end) >> +{ >> + struct vm_area_struct *tmp, *last; >> >> /* Find the first overlapping VMA */ >> - vma = find_vma(mm, start); >> - if (!vma) >> + tmp = find_vma(mm, start); >> + if (!tmp) >> return 0; >> - prev = vma->vm_prev; >> - /* we have start < vma->vm_end */ >> + >> + *prev = tmp->vm_prev; > Why do you set prev here. We might "fail" with 0 right after this No special reason, just copied from do_munmap. Yes, it is ideal to have prev set here. It can be moved further down. > >> + >> + /* we have start < vma->vm_end */ >> >> /* if it doesn't overlap, we have nothing.. */ >> - end = start + len; >> - if (vma->vm_start >= end) >> + if (tmp->vm_start >= end) >> return 0; >> >> /* >> @@ -2723,7 +2733,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, >> * unmapped vm_area_struct will remain in use: so lower split_vma >> * places tmp vma above, and higher split_vma places tmp vma below. >> */ >> - if (start > vma->vm_start) { >> + if (start > tmp->vm_start) { >> int error; >> >> /* >> @@ -2731,13 +2741,14 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, >> * not exceed its limit; but let map_count go just above >> * its limit temporarily, to help free resources as expected. >> */ >> - if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count) >> + if (end < tmp->vm_end && >> + mm->map_count > sysctl_max_map_count) >> return -ENOMEM; >> >> - error = __split_vma(mm, vma, start, 0); >> + error = __split_vma(mm, tmp, start, 0); >> if (error) >> return error; >> - prev = vma; >> + *prev = tmp; >> } >> >> /* Does it split the last one? */ >> @@ -2747,7 +2758,48 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, >> if (error) >> return error; >> } >> - vma = prev ? prev->vm_next : mm->mmap; >> + >> + *vma = *prev ? (*prev)->vm_next : mm->mmap; >> + >> + return 1; >> +} > the patch would be much more easier to read if you didn't do vma->tmp > renaming. Yes, I should used another name for the "vma" argument. Thanks, Yang
On Fri 03-08-18 13:47:19, Yang Shi wrote: > > > On 8/3/18 1:53 AM, Michal Hocko wrote: > > On Fri 27-07-18 02:10:13, Yang Shi wrote: > > > Introduces three new helper functions: > > > * munmap_addr_sanity() > > > * munmap_lookup_vma() > > > * munmap_mlock_vma() > > > > > > They will be used by do_munmap() and the new do_munmap with zapping > > > large mapping early in the later patch. > > > > > > There is no functional change, just code refactor. > > > > > > Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> > > > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> > > > --- > > > mm/mmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------- > > > 1 file changed, 82 insertions(+), 38 deletions(-) > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index d1eb87e..2504094 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, > > > return __split_vma(mm, vma, addr, new_below); > > > } > > > -/* Munmap is split into 2 main parts -- this part which finds > > > - * what needs doing, and the areas themselves, which do the > > > - * work. This now handles partial unmappings. > > > - * Jeremy Fitzhardinge <jeremy@goop.org> > > > - */ > > > -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, > > > - struct list_head *uf) > > > +static inline bool munmap_addr_sanity(unsigned long start, size_t len) > > munmap_check_addr? Btw. why does this need to have munmap prefix at all? > > This is a general address space check. > > Just because I extracted this from do_munmap, no special consideration. It > is definitely ok to use another name. > > > > > > { > > > - unsigned long end; > > > - struct vm_area_struct *vma, *prev, *last; > > > - > > > if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) > > > - return -EINVAL; > > > + return false; > > > - len = PAGE_ALIGN(len); > > > - if (len == 0) > > > - return -EINVAL; > > > + if (PAGE_ALIGN(len) == 0) > > > + return false; > > > + > > > + return true; > > > +} > > > + > > > +/* > > > + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. > > > + * @mm: mm_struct > > > + * @vma: the first overlapping vma > > > + * @prev: vma's prev > > > + * @start: start address > > > + * @end: end address > > This really doesn't help me to understand how to use the function. > > Why do we need both prev and vma etc... > > prev will be used by unmap_region later. But what does it stand for? Why cannot you take prev from the returned vma? In other words, if somebody reads this documentation how does he know what the prev is supposed to be used for? > > > + * > > > + * returns 1 if successful, 0 or errno otherwise > > This is a really weird calling convention. So what does 0 tell? /me > > checks the code. Ohh, it is nothing to do. Why cannot you simply return > > the vma. NULL implies nothing to do, ERR_PTR on error. > > A couple of reasons why it is implemented as so: > > * do_munmap returns 0 for both success and no suitable vma > > * Since prev is needed by finding the start vma, and prev will be used > by unmap_region later too, so I just thought it would look clean to have one > function to return both start vma and prev. In this way, we can share as > much as possible common code. > > * In this way, we just need return 0, 1 or error no just as same as what > do_munmap does currently. Then we know what is failure case exactly to just > bail out right away. > > Actually, I tried the same approach as you suggested, but it had two > problems: > > * If it returns the start vma, we have to re-find its prev later, but > the prev has been found during finding start vma. And, duplicate the code in > do_munmap_zap_rlock. It sounds not that ideal. > > * If it returns prev, it might be null (start vma is the first vma). We > can't tell if null is a failure or success case Even if you need to return both vma and prev then it would be better to simply return vma directly than having this -errno, 0 or 1 return semantic.
On 8/6/18 6:26 AM, Michal Hocko wrote: > On Fri 03-08-18 13:47:19, Yang Shi wrote: >> >> On 8/3/18 1:53 AM, Michal Hocko wrote: >>> On Fri 27-07-18 02:10:13, Yang Shi wrote: >>>> Introduces three new helper functions: >>>> * munmap_addr_sanity() >>>> * munmap_lookup_vma() >>>> * munmap_mlock_vma() >>>> >>>> They will be used by do_munmap() and the new do_munmap with zapping >>>> large mapping early in the later patch. >>>> >>>> There is no functional change, just code refactor. >>>> >>>> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> >>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> >>>> --- >>>> mm/mmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------- >>>> 1 file changed, 82 insertions(+), 38 deletions(-) >>>> >>>> diff --git a/mm/mmap.c b/mm/mmap.c >>>> index d1eb87e..2504094 100644 >>>> --- a/mm/mmap.c >>>> +++ b/mm/mmap.c >>>> @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, >>>> return __split_vma(mm, vma, addr, new_below); >>>> } >>>> -/* Munmap is split into 2 main parts -- this part which finds >>>> - * what needs doing, and the areas themselves, which do the >>>> - * work. This now handles partial unmappings. >>>> - * Jeremy Fitzhardinge <jeremy@goop.org> >>>> - */ >>>> -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, >>>> - struct list_head *uf) >>>> +static inline bool munmap_addr_sanity(unsigned long start, size_t len) >>> munmap_check_addr? Btw. why does this need to have munmap prefix at all? >>> This is a general address space check. >> Just because I extracted this from do_munmap, no special consideration. It >> is definitely ok to use another name. >> >>>> { >>>> - unsigned long end; >>>> - struct vm_area_struct *vma, *prev, *last; >>>> - >>>> if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) >>>> - return -EINVAL; >>>> + return false; >>>> - len = PAGE_ALIGN(len); >>>> - if (len == 0) >>>> - return -EINVAL; >>>> + if (PAGE_ALIGN(len) == 0) >>>> + return false; >>>> + >>>> + return true; >>>> +} >>>> + >>>> +/* >>>> + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. >>>> + * @mm: mm_struct >>>> + * @vma: the first overlapping vma >>>> + * @prev: vma's prev >>>> + * @start: start address >>>> + * @end: end address >>> This really doesn't help me to understand how to use the function. >>> Why do we need both prev and vma etc... >> prev will be used by unmap_region later. > But what does it stand for? Why cannot you take prev from the returned > vma? In other words, if somebody reads this documentation how does he > know what the prev is supposed to be used for? > >>>> + * >>>> + * returns 1 if successful, 0 or errno otherwise >>> This is a really weird calling convention. So what does 0 tell? /me >>> checks the code. Ohh, it is nothing to do. Why cannot you simply return >>> the vma. NULL implies nothing to do, ERR_PTR on error. >> A couple of reasons why it is implemented as so: >> >> * do_munmap returns 0 for both success and no suitable vma >> >> * Since prev is needed by finding the start vma, and prev will be used >> by unmap_region later too, so I just thought it would look clean to have one >> function to return both start vma and prev. In this way, we can share as >> much as possible common code. >> >> * In this way, we just need return 0, 1 or error no just as same as what >> do_munmap does currently. Then we know what is failure case exactly to just >> bail out right away. >> >> Actually, I tried the same approach as you suggested, but it had two >> problems: >> >> * If it returns the start vma, we have to re-find its prev later, but >> the prev has been found during finding start vma. And, duplicate the code in >> do_munmap_zap_rlock. It sounds not that ideal. >> >> * If it returns prev, it might be null (start vma is the first vma). We >> can't tell if null is a failure or success case > Even if you need to return both vma and prev then it would be better to > simply return vma directly than having this -errno, 0 or 1 return > semantic. OK, I will try to refactor the code. Thanks, Yang
On 07/26/2018 08:10 PM, Yang Shi wrote: > Introduces three new helper functions: > * munmap_addr_sanity() > * munmap_lookup_vma() > * munmap_mlock_vma() > > They will be used by do_munmap() and the new do_munmap with zapping > large mapping early in the later patch. > > There is no functional change, just code refactor. > > Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> > --- > mm/mmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------- > 1 file changed, 82 insertions(+), 38 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index d1eb87e..2504094 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, > return __split_vma(mm, vma, addr, new_below); > } > > -/* Munmap is split into 2 main parts -- this part which finds > - * what needs doing, and the areas themselves, which do the > - * work. This now handles partial unmappings. > - * Jeremy Fitzhardinge <jeremy@goop.org> > - */ > -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, > - struct list_head *uf) > +static inline bool munmap_addr_sanity(unsigned long start, size_t len) Since it's returning bool, the proper naming scheme would be something like "munmap_addr_ok()". I don't know how I would replace the "munmap_" prefix myself though. > { > - unsigned long end; > - struct vm_area_struct *vma, *prev, *last; > - > if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) > - return -EINVAL; > + return false; > > - len = PAGE_ALIGN(len); > - if (len == 0) > - return -EINVAL; > + if (PAGE_ALIGN(len) == 0) > + return false; > + > + return true; > +} > + > +/* > + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. > + * @mm: mm_struct > + * @vma: the first overlapping vma > + * @prev: vma's prev > + * @start: start address > + * @end: end address > + * > + * returns 1 if successful, 0 or errno otherwise > + */ > +static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma, > + struct vm_area_struct **prev, unsigned long start, > + unsigned long end) Agree with Michal that you could simply return vma, NULL, or error. Caller can easily find out prev from that, it's not like we have to count each cpu cycle here. It will be a bit less tricky code as well, which is a plus. ... > +static inline void munmap_mlock_vma(struct vm_area_struct *vma, > + unsigned long end) This function does munlock, not mlock. You could call it e.g. munlock_vmas(). > +{ > + struct vm_area_struct *tmp = vma; > + > + while (tmp && tmp->vm_start < end) { > + if (tmp->vm_flags & VM_LOCKED) { > + vma->vm_mm->locked_vm -= vma_pages(tmp); You keep 'vma' just for the vm_mm? Better extract mm pointer first and then you don't need the 'tmp'. > + munlock_vma_pages_all(tmp); > + } > + tmp = tmp->vm_next; > + } > +}
On 8/7/18 7:59 AM, Vlastimil Babka wrote: > On 07/26/2018 08:10 PM, Yang Shi wrote: >> Introduces three new helper functions: >> * munmap_addr_sanity() >> * munmap_lookup_vma() >> * munmap_mlock_vma() >> >> They will be used by do_munmap() and the new do_munmap with zapping >> large mapping early in the later patch. >> >> There is no functional change, just code refactor. >> >> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> >> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> >> --- >> mm/mmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------- >> 1 file changed, 82 insertions(+), 38 deletions(-) >> >> diff --git a/mm/mmap.c b/mm/mmap.c >> index d1eb87e..2504094 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, >> return __split_vma(mm, vma, addr, new_below); >> } >> >> -/* Munmap is split into 2 main parts -- this part which finds >> - * what needs doing, and the areas themselves, which do the >> - * work. This now handles partial unmappings. >> - * Jeremy Fitzhardinge <jeremy@goop.org> >> - */ >> -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, >> - struct list_head *uf) >> +static inline bool munmap_addr_sanity(unsigned long start, size_t len) > Since it's returning bool, the proper naming scheme would be something > like "munmap_addr_ok()". I don't know how I would replace the "munmap_" > prefix myself though. OK, thanks for the suggestion. > >> { >> - unsigned long end; >> - struct vm_area_struct *vma, *prev, *last; >> - >> if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) >> - return -EINVAL; >> + return false; >> >> - len = PAGE_ALIGN(len); >> - if (len == 0) >> - return -EINVAL; >> + if (PAGE_ALIGN(len) == 0) >> + return false; >> + >> + return true; >> +} >> + >> +/* >> + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. >> + * @mm: mm_struct >> + * @vma: the first overlapping vma >> + * @prev: vma's prev >> + * @start: start address >> + * @end: end address >> + * >> + * returns 1 if successful, 0 or errno otherwise >> + */ >> +static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma, >> + struct vm_area_struct **prev, unsigned long start, >> + unsigned long end) > Agree with Michal that you could simply return vma, NULL, or error. > Caller can easily find out prev from that, it's not like we have to > count each cpu cycle here. It will be a bit less tricky code as well, > which is a plus. > > ... >> +static inline void munmap_mlock_vma(struct vm_area_struct *vma, >> + unsigned long end) > This function does munlock, not mlock. You could call it e.g. > munlock_vmas(). OK > >> +{ >> + struct vm_area_struct *tmp = vma; >> + >> + while (tmp && tmp->vm_start < end) { >> + if (tmp->vm_flags & VM_LOCKED) { >> + vma->vm_mm->locked_vm -= vma_pages(tmp); > You keep 'vma' just for the vm_mm? Better extract mm pointer first and > then you don't need the 'tmp'. OK Thanks, Yang > >> + munlock_vma_pages_all(tmp); >> + } >> + tmp = tmp->vm_next; >> + } >> +}
diff --git a/mm/mmap.c b/mm/mmap.c index d1eb87e..2504094 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2686,34 +2686,44 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, return __split_vma(mm, vma, addr, new_below); } -/* Munmap is split into 2 main parts -- this part which finds - * what needs doing, and the areas themselves, which do the - * work. This now handles partial unmappings. - * Jeremy Fitzhardinge <jeremy@goop.org> - */ -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, - struct list_head *uf) +static inline bool munmap_addr_sanity(unsigned long start, size_t len) { - unsigned long end; - struct vm_area_struct *vma, *prev, *last; - if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) - return -EINVAL; + return false; - len = PAGE_ALIGN(len); - if (len == 0) - return -EINVAL; + if (PAGE_ALIGN(len) == 0) + return false; + + return true; +} + +/* + * munmap_lookup_vma: find the first overlap vma and split overlap vmas. + * @mm: mm_struct + * @vma: the first overlapping vma + * @prev: vma's prev + * @start: start address + * @end: end address + * + * returns 1 if successful, 0 or errno otherwise + */ +static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma, + struct vm_area_struct **prev, unsigned long start, + unsigned long end) +{ + struct vm_area_struct *tmp, *last; /* Find the first overlapping VMA */ - vma = find_vma(mm, start); - if (!vma) + tmp = find_vma(mm, start); + if (!tmp) return 0; - prev = vma->vm_prev; - /* we have start < vma->vm_end */ + + *prev = tmp->vm_prev; + + /* we have start < vma->vm_end */ /* if it doesn't overlap, we have nothing.. */ - end = start + len; - if (vma->vm_start >= end) + if (tmp->vm_start >= end) return 0; /* @@ -2723,7 +2733,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, * unmapped vm_area_struct will remain in use: so lower split_vma * places tmp vma above, and higher split_vma places tmp vma below. */ - if (start > vma->vm_start) { + if (start > tmp->vm_start) { int error; /* @@ -2731,13 +2741,14 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, * not exceed its limit; but let map_count go just above * its limit temporarily, to help free resources as expected. */ - if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count) + if (end < tmp->vm_end && + mm->map_count > sysctl_max_map_count) return -ENOMEM; - error = __split_vma(mm, vma, start, 0); + error = __split_vma(mm, tmp, start, 0); if (error) return error; - prev = vma; + *prev = tmp; } /* Does it split the last one? */ @@ -2747,7 +2758,48 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, if (error) return error; } - vma = prev ? prev->vm_next : mm->mmap; + + *vma = *prev ? (*prev)->vm_next : mm->mmap; + + return 1; +} + +static inline void munmap_mlock_vma(struct vm_area_struct *vma, + unsigned long end) +{ + struct vm_area_struct *tmp = vma; + + while (tmp && tmp->vm_start < end) { + if (tmp->vm_flags & VM_LOCKED) { + vma->vm_mm->locked_vm -= vma_pages(tmp); + munlock_vma_pages_all(tmp); + } + tmp = tmp->vm_next; + } +} + +/* Munmap is split into 2 main parts -- this part which finds + * what needs doing, and the areas themselves, which do the + * work. This now handles partial unmappings. + * Jeremy Fitzhardinge <jeremy@goop.org> + */ +int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, + struct list_head *uf) +{ + unsigned long end; + struct vm_area_struct *vma = NULL, *prev; + int ret = 0; + + if (!munmap_addr_sanity(start, len)) + return -EINVAL; + + len = PAGE_ALIGN(len); + + end = start + len; + + ret = munmap_lookup_vma(mm, &vma, &prev, start, end); + if (ret != 1) + return ret; if (unlikely(uf)) { /* @@ -2759,24 +2811,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, * split, despite we could. This is unlikely enough * failure that it's not worth optimizing it for. */ - int error = userfaultfd_unmap_prep(vma, start, end, uf); - if (error) - return error; + ret = userfaultfd_unmap_prep(vma, start, end, uf); + if (ret) + return ret; } /* * unlock any mlock()ed ranges before detaching vmas */ - if (mm->locked_vm) { - struct vm_area_struct *tmp = vma; - while (tmp && tmp->vm_start < end) { - if (tmp->vm_flags & VM_LOCKED) { - mm->locked_vm -= vma_pages(tmp); - munlock_vma_pages_all(tmp); - } - tmp = tmp->vm_next; - } - } + if (mm->locked_vm) + munmap_mlock_vma(vma, end); /* * Remove the vma's, and unmap the actual pages