Message ID | 20190226091314.18446-1-osalvador@suse.de
---|---
State | New, archived
Series | mm,mremap: Bail out earlier in mremap_to under map pressure
On Tue, 26 Feb 2019 10:13:14 +0100 Oscar Salvador <osalvador@suse.de> wrote:

> When using the mremap() syscall with the MREMAP_FIXED flag,
> mremap() calls mremap_to(), which does the following:
>
> 1) unmaps the destination region where we are going to move the map
> 2) if the new region is going to be smaller, unmaps the last part
>    of the old region
>
> Then we eventually call move_vma() to do the actual move.
>
> move_vma() checks whether we are at least 4 maps below max_map_count
> before going further; otherwise it bails out with -ENOMEM.
> The problem is that we might have already unmapped the vma's in steps
> 1) and 2), so it is not possible for userspace to figure out the state
> of the vma's after it gets -ENOMEM, and it gets tricky for userspace
> to clean up properly on the error path.
>
> While it is true that we can return -ENOMEM for more reasons
> (e.g. see may_expand_vm() or move_page_tables()), I think that we can
> avoid this concrete scenario if we check early in mremap_to() whether
> the operation has a high chance of succeeding map-wise.
>
> Should that not be the case, we can bail out before we even try to unmap
> anything, so we make sure the vma's are left untouched in case we are
> likely to be short of maps.
>
> The rule of thumb now is to rely on the worst-case scenario we can have:
> that is when both vma's (old region and new region) are going to be split
> in 3, so we get two more maps on top of the ones we already hold (one per
> each). If the current map count + 2 maps still leaves us 4 maps below the
> threshold, we are going to pass the check in move_vma().
>
> Of course, this is not free, as it might generate false positives when it
> is true that we are tight map-wise, but the unmap operation can release
> several vma's, leading us to a good state.
>
> Another approach was also investigated [1], but it may be too much hassle
> for what it brings.

How is this going to affect existing userspace which is aware of the
current behaviour?

And how does it affect your existing cleanup code, come to that? Does
it work as well or better after this change?
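[As context for the failure mode under discussion, here is a minimal
userspace sketch of a MREMAP_FIXED move; the two-page size, the 4 KiB
page-size assumption, and the cleanup memset are illustrative, not taken
from the report:]

#define _GNU_SOURCE
#include <sys/mman.h>
#include <string.h>

int main(void)
{
        size_t len = 2 * 4096;

        char *src = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        char *dst = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (src == MAP_FAILED || dst == MAP_FAILED)
                return 1;

        /* Move src on top of dst; mremap_to() unmaps dst first. */
        void *ret = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
        if (ret == MAP_FAILED) {
                /*
                 * Pre-patch: if move_vma() hit its map_count check, we get
                 * -ENOMEM here even though dst (and possibly the tail of
                 * src) was already unmapped, so a cleanup write like the
                 * one below can segfault.  With the early check, a
                 * map-count -ENOMEM leaves both regions intact.
                 */
                memset(dst, 0, len);    /* may fault on pre-patch kernels */
                return 1;
        }

        memset(ret, 0, len);            /* success: moved mapping is usable */
        return 0;
}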
On Tue, Feb 26, 2019 at 02:04:28PM -0800, Andrew Morton wrote:
> How is this going to affect existing userspace which is aware of the
> current behaviour?

Well, the current behavior is not really predictable.
Our customer was "surprised" that the call to mremap() failed, but the
regions got unmapped nevertheless.
They found out the hard way when they got a segfault while trying to write
to those regions when cleaning up.

As I said in the changelog, the possibility of false positives exists, due
to the fact that we might get rid of several vma's when unmapping, but I do
not expect existing userspace applications to start failing.
Should that be the case, we can revert the patch; it is not that it adds a
lot of churn.

> And how does it affect your existing cleanup code, come to that? Does
> it work as well or better after this change?

I guess the customer can trust more reliably that the maps were left
untouched. I still have my reservations, though.

We can get as far as move_vma(), and copy_vma() can fail returning -ENOMEM.
(Or not, due to the "too small to fail" rule?)
On 2/27/19 10:32 PM, Oscar Salvador wrote:
> On Tue, Feb 26, 2019 at 02:04:28PM -0800, Andrew Morton wrote:
>> How is this going to affect existing userspace which is aware of the
>> current behaviour?
>
> Well, the current behavior is not really predictable.
> Our customer was "surprised" that the call to mremap() failed, but the
> regions got unmapped nevertheless.
> They found out the hard way when they got a segfault while trying to write
> to those regions when cleaning up.
>
> As I said in the changelog, the possibility of false positives exists, due
> to the fact that we might get rid of several vma's when unmapping, but I do
> not expect existing userspace applications to start failing.
> Should that be the case, we can revert the patch; it is not that it adds a
> lot of churn.

Hopefully the only program that would start failing would be an LTP test
testing the current behavior near the limit (if such a test exists). And
that can be adjusted.

>> And how does it affect your existing cleanup code, come to that? Does
>> it work as well or better after this change?
>
> I guess the customer can trust more reliably that the maps were left
> untouched. I still have my reservations, though.
>
> We can get as far as move_vma(), and copy_vma() can fail returning -ENOMEM.
> (Or not, due to the "too small to fail" rule?)
On Thu, Feb 28, 2019 at 12:06 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/27/19 10:32 PM, Oscar Salvador wrote:
> > On Tue, Feb 26, 2019 at 02:04:28PM -0800, Andrew Morton wrote:
> >> How is this going to affect existing userspace which is aware of the
> >> current behaviour?
> >
> > Well, the current behavior is not really predictable.
> > Our customer was "surprised" that the call to mremap() failed, but the
> > regions got unmapped nevertheless.
> > They found out the hard way when they got a segfault while trying to
> > write to those regions when cleaning up.
> >
> > As I said in the changelog, the possibility of false positives exists,
> > due to the fact that we might get rid of several vma's when unmapping,
> > but I do not expect existing userspace applications to start failing.
> > Should that be the case, we can revert the patch; it is not that it adds
> > a lot of churn.
>
> Hopefully the only program that would start failing would be an LTP test
> testing the current behavior near the limit (if such a test exists). And
> that can be adjusted.

IMO the original behavior is itself probably not a big issue, because if
userspace wanted to mremap over something, it was prepared to lose the
"over something" mapping anyway. So it does seem a stretch to call the
behavior a "bug". Still, I agree with the patch that mremap should not
leave any side effects after returning an error.

thanks,

 - Joel
Hi!

> Hopefully the only program that would start failing would be an LTP test
> testing the current behavior near the limit (if such a test exists). And
> that can be adjusted.

There does not seem to be an mremap() test that would do such a thing, so
we should be safe :-).

BTW there was a similar fix for mmap() with MAP_FIXED that caused an LTP
test to fail and was fixed in:

commit e8420a8ece80b3fe810415ecf061d54ca7fab266
Author: Cyril Hrubis <chrubis@suse.cz>
Date:   Mon Apr 29 15:08:33 2013 -0700

    mm/mmap: check for RLIMIT_AS before unmapping

And I haven't heard of any breakages so far, so I guess this is a very
similar situation and the possibility of breaking real-world applications
here is really low.
diff --git a/mm/mremap.c b/mm/mremap.c
index 3320616ed93f..e3edef6b7a12 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -516,6 +516,23 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (addr + old_len > new_addr && new_addr + new_len > addr)
 		goto out;
 
+	/*
+	 * move_vma() needs us to stay 4 maps below the threshold, otherwise
+	 * it will bail out at the very beginning.
+	 * That is a problem if we have already unmapped the regions here
+	 * (new_addr, and old_addr), because userspace will not know the
+	 * state of the vma's after it gets -ENOMEM.
+	 * So, to avoid such a scenario, we can pre-compute whether the whole
+	 * operation has high chances to succeed map-wise.
+	 * The worst-case scenario is when both vma's (new_addr and old_addr)
+	 * get split in 3 before unmapping them.
+	 * That means 2 more maps added to the ones we already hold (1 each).
+	 * Check whether the current map count plus 2 still leaves us 4 maps
+	 * below the threshold; otherwise return -ENOMEM here to be safe.
+	 */
+	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
+		return -ENOMEM;
+
 	ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
 	if (ret)
 		goto out;
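[To make the threshold arithmetic concrete, here is a small standalone
userspace sketch; hard-coding 65530, the kernel's default max_map_count,
is an assumption for illustration only:]

#include <stdio.h>

int main(void)
{
        const int sysctl_max_map_count = 65530;

        /* move_vma() itself bails out once map_count >= 65530 - 3 = 65527. */
        for (int map_count = 65523; map_count <= 65527; map_count++) {
                /* Budget 2 extra vmas for the worst-case splits. */
                int early_enomem =
                        (map_count + 2) >= sysctl_max_map_count - 3;

                printf("map_count=%d -> %s\n", map_count,
                       early_enomem ? "early -ENOMEM (nothing unmapped)"
                                    : "proceed to unmap");
        }
        return 0;
}

[With these assumed values, map counts of 65523 and 65524 proceed, while
65525 and above fail early: the pre-check fires two maps sooner than
move_vma()'s own check, trading a few false positives for the guarantee
that nothing has been unmapped yet.]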