Message ID | TY2PR04MB29753892EBAD17D9E8FECBAEE86A0@TY2PR04MB2975.apcprd04.prod.outlook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: allow unmapped hole at head side of mbind range | expand |
+ linux-api On 10/24/19 9:35 AM, Li Xinhai wrote: > From: Li Xinhai <xinhai.li@outlook.com> > > mbind_range silently ignore unmapped hole at middle and tail of the > specified range, but report EFAULT if hole at head side. Hmm that's unfortunate. mbind() manpage says: EFAULT Part or all of the memory range specified by nodemask and maxnode points outside your accessible address space. Or, there was an unmapped hole in the specified memory range specified by addr and len. That sounds like any hole inside the specified range should return EFAULT. But perhaps it can be also interpreted as you suggest, that the whole range is an unmapped hole. There's some risk of breaking existing userspace if we change it either way. > It is more reasonable to support silently ignore holes at any part of > the range, only report EFAULT if the whole range is in hole. > > Signed-off-by: Li Xinhai <xinhai.li@outlook.com> > --- > > mm/mempolicy.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index 4ae967bcf954..ae160d9936d9 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start, > unsigned long vmend; > > vma = find_vma(mm, start); > - if (!vma || vma->vm_start > start) > + if (!vma || vma->vm_start >= end) > return -EFAULT; > > prev = vma->vm_prev; >
On Thu 24-10-19 07:35:06, Li Xinhai wrote: > From: Li Xinhai <xinhai.li@outlook.com> > > mbind_range silently ignore unmapped hole at middle and tail of the > specified range, but report EFAULT if hole at head side. > It is more reasonable to support silently ignore holes at any part of > the range, only report EFAULT if the whole range is in hole. The changelog is a bit cryptic but you are right. find_vma returns the first vma that ends above the given address. If vm_start > start then there is still an overlap possible [ vma ] [start end] and we should mbind [vma->vm_start, end] at least. I haven't checked whether changing the condition is sufficient for the rest of the code to work properly. I am pretty sure a test case shouldn't be really hard to construct and add to the kernel testing machinery. Btw. when writing a changelog then it is always preferred to describe user visible effect of the patch. In this case it would be an unexpected EFAULT on a range that starts before an existing VMA while still overlapping it. Make sure to note that. Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") > Signed-off-by: Li Xinhai <xinhai.li@outlook.com> > --- > > mm/mempolicy.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index 4ae967bcf954..ae160d9936d9 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start, > unsigned long vmend; > > vma = find_vma(mm, start); > - if (!vma || vma->vm_start > start) > + if (!vma || vma->vm_start >= end) > return -EFAULT; > > prev = vma->vm_prev; >
On Thu, 24 Oct 2019, Vlastimil Babka wrote: > + linux-api > > On 10/24/19 9:35 AM, Li Xinhai wrote: > > From: Li Xinhai <xinhai.li@outlook.com> > > > > mbind_range silently ignore unmapped hole at middle and tail of the > > specified range, but report EFAULT if hole at head side. > > > Hmm that's unfortunate. mbind() manpage says: > > EFAULT Part or all of the memory range specified by nodemask and maxnode > points outside your accessible address space. Or, there was an unmapped > hole in the specified memory range specified by addr and len. > > That sounds like any hole inside the specified range should return > EFAULT. Yes (though an exception is allowed when restoring to default). > But perhaps it can be also interpreted as you suggest, that the > whole range is an unmapped hole. There's some risk of breaking existing > userspace if we change it either way. > > > It is more reasonable to support silently ignore holes at any part of > > the range, only report EFAULT if the whole range is in hole. > > > > Signed-off-by: Li Xinhai <xinhai.li@outlook.com> Xinhai, I'm sceptical about this patch: is it something you found by code inspection, or something you found when using mbind()? I've not looked long enough to be certain, nor experimented, but: mbind_range() is only one stage of the mbind() syscall implementation, and is preceded by queue_pages_range(): look what queue_pages_test_walk() does when MPOL_MF_DISCONTIG_OK not set. My impression is that mbind_range() is merely correcting an omission from the checks already made my queue_pages_test_walk() (an odd way to proceed, I admit: would be better to check initially than later). I do think that you should not make this change without considering MPOL_MF_DISCONTIG_OK and its intention. Hugh > > --- > > > > mm/mempolicy.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index 4ae967bcf954..ae160d9936d9 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start, > > unsigned long vmend; > > > > vma = find_vma(mm, start); > > - if (!vma || vma->vm_start > start) > > + if (!vma || vma->vm_start >= end) > > return -EFAULT; > > > > prev = vma->vm_prev; > >
On 2019-10-28 at 15:14:51 Hugh Dickins wrote: >On Thu, 24 Oct 2019, Vlastimil Babka wrote: > >> + linux-api >> >> On 10/24/19 9:35 AM, Li Xinhai wrote: >> > From: Li Xinhai <xinhai.li@outlook.com> >> > >> > mbind_range silently ignore unmapped hole at middle and tail of the >> > specified range, but report EFAULT if hole at head side. >> >> >> Hmm that's unfortunate. mbind() manpage says: >> >> EFAULT Part or all of the memory range specified by nodemask and maxnode >> points outside your accessible address space. Or, there was an unmapped >> hole in the specified memory range specified by addr and len. >> >> That sounds like any hole inside the specified range should return >> EFAULT. > >Yes (though an exception is allowed when restoring to default). > >> But perhaps it can be also interpreted as you suggest, that the >> whole range is an unmapped hole. There's some risk of breaking existing >> userspace if we change it either way. >> >> > It is more reasonable to support silently ignore holes at any part of >> > the range, only report EFAULT if the whole range is in hole. >> > >> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com> > >Xinhai, I'm sceptical about this patch: is it something you found >by code inspection, or something you found when using mbind()? > I encountered issue when using mbind (my issue was about using nodemask parameter), and then found this special range checking in mbind_range(). >I've not looked long enough to be certain, nor experimented, but: > >mbind_range() is only one stage of the mbind() syscall implementation, >and is preceded by queue_pages_range(): look what queue_pages_test_walk() >does when MPOL_MF_DISCONTIG_OK not set. > >My impression is that mbind_range() is merely correcting an omission >from the checks already made my queue_pages_test_walk() (an odd way >to proceed, I admit: would be better to check initially than later). > >I do think that you should not make this change without considering >MPOL_MF_DISCONTIG_OK and its intention. > >Hugh > A program was used to reveal issues as below. #include <stddef.h> #include <sys/mman.h> #include <numaif.h> int main(int argc, char *argv[]) { void *mapAddr; unsigned long nodemask; mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, -1, 0); // BIND and leave 2 pages as hole in the middle nodemask = 0x1; mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0); munmap(mapAddr+2*(1<<12), 2*(1<<12)); // part 1 mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0); mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0); // part 2 nodemask = 0x2; mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0); mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0); mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0); mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0); return 0; } syscall results: 83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000 84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0 85 munmap(0x7fbd24e15000, 8192) = 0 // part 1 86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address) 87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0 // part 2 88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address) 89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0 90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address) 91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0 The results on line 86 and line 89 were not correct (other lines were expected): line 86: hole at head side of range was reported as error; this should sucess for MPOL_DEFAULT; line 89: hole at tail side of range was reported as success; this should fail for !MPOL_DEFAULT cases; My patch only corrected line 86 case, but didn't handle line 89 case. It is better to detect valid or invalid hole for MPOL_DEFAULT and !MPOL_DEFAULT cases in queue_pages_range phase. New patch will be prepared, and fullfill the linux API description: 1. for MPOL_DEFAULT, hole at any part of specified range is allowed; 2. for !MPOL_DEFAULT, hole at any part of specified range is not allowed. Xinhai (BTW, I am adding two more mail accounts of mine to check which is best for this mailling list...) >> > --- >> > >> > mm/mempolicy.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > >> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> > index 4ae967bcf954..ae160d9936d9 100644 >> > --- a/mm/mempolicy.c >> > +++ b/mm/mempolicy.c >> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start, >> > unsigned long vmend; >> > >> > vma = find_vma(mm, start); >> > - if (!vma || vma->vm_start > start) >> > + if (!vma || vma->vm_start >= end) >> > return -EFAULT; >> > >> > prev = vma->vm_prev; >> > >
diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4ae967bcf954..ae160d9936d9 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start, unsigned long vmend; vma = find_vma(mm, start); - if (!vma || vma->vm_start > start) + if (!vma || vma->vm_start >= end) return -EFAULT; prev = vma->vm_prev;