Message ID | 20160816183231.21179-1-cov@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
> Some userspace applications need to know the maximum virtual address they can
> use (TASK_SIZE).

Just curious, what are the cases needing TASK_SIZE in user space?
On August 17, 2016 6:30:06 AM EDT, Catalin Marinas <catalin.marinas@arm.com> wrote:
>On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
>> Some userspace applications need to know the maximum virtual address they can
>> use (TASK_SIZE).
>
>Just curious, what are the cases needing TASK_SIZE in user space?

Checkpoint/Restore In Userspace and the Mozilla Javascript Engine
https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 are the specific
cases I've run into. I've heard LuaJIT might have a similar situation.
In general I think making allocations from the top down is a shortcut
for finding a large unused region of memory.

Thanks,
Cov
On 17 August 2016 at 13:12, Christopher Covington <cov@codeaurora.org> wrote:
> On August 17, 2016 6:30:06 AM EDT, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
>>> Some userspace applications need to know the maximum virtual address they can
>>> use (TASK_SIZE).
>>
>>Just curious, what are the cases needing TASK_SIZE in user space?
>
> Checkpoint/Restore In Userspace and the Mozilla Javascript Engine
> https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 are the
> specific cases I've run into. I've heard LuaJIT might have a similar
> situation. In general I think making allocations from the top down
> is a shortcut for finding a large unused region of memory.
>

One aspect of this that I would like to discuss is whether the current
practice makes sense, of tying TASK_SIZE to whatever the size of the
kernel VA space is.

I could imagine simply limiting the user VA space to 39-bits (or even
36-bits, depending on how deeply we care about 16 KB pages), and
implement an arch specific hook (prctl() perhaps?) to increase
TASK_SIZE on demand.

That would not only give us a reliable way to check whether this is
supported (i.e., the prctl() would return error if it isn't), it also
allows for some optimizations, since a 48-bit VA kernel can run all
processes using 3 levels with relative ease (and switching between
4levels and 3levels processes would also be possible, but would either
require a TLB flush, or would result in this optimization to be
disabled globally, whichever is less costly in terms of performance)
On Wed, Aug 17, 2016 at 1:12 PM, Christopher Covington <cov@codeaurora.org> wrote:
> On August 17, 2016 6:30:06 AM EDT, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
>>> Some userspace applications need to know the maximum virtual address they can
>>> use (TASK_SIZE).
>>
>>Just curious, what are the cases needing TASK_SIZE in user space?
>
> Checkpoint/Restore In Userspace and the Mozilla Javascript Engine
> https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 are the
> specific cases I've run into. I've heard LuaJIT might have a similar
> situation. In general I think making allocations from the top down
> is a shortcut for finding a large unused region of memory.

I think this makes sense for all archs.
At least UserModeLinux on x86 also needs to know bottom and top
addresses of the usable address space.
Currently it figures this out by scanning and catching SIGSEGV.
On Thu, Aug 18, 2016 at 02:00:56PM +0200, Ard Biesheuvel wrote:
> On 17 August 2016 at 13:12, Christopher Covington <cov@codeaurora.org> wrote:
> > On August 17, 2016 6:30:06 AM EDT, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
> >>> Some userspace applications need to know the maximum virtual address they can
> >>> use (TASK_SIZE).
> >>
> >>Just curious, what are the cases needing TASK_SIZE in user space?
> >
> > Checkpoint/Restore In Userspace and the Mozilla Javascript Engine
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 are the
> > specific cases I've run into. I've heard LuaJIT might have a similar
> > situation. In general I think making allocations from the top down
> > is a shortcut for finding a large unused region of memory.
>
> One aspect of this that I would like to discuss is whether the current
> practice makes sense, of tying TASK_SIZE to whatever the size of the
> kernel VA space is.

I'm fine with decoupling them as long as we can have sane
pgd/pud/pmd/pte macros. We rely on generic files like pgtable-nopud.h
etc. currently, so we would have to give up on that and do our own
checks. It's also worth testing any potential performance implication of
creating/tearing down large page tables with the new macros.

> I could imagine simply limiting the user VA space to 39-bits (or even
> 36-bits, depending on how deeply we care about 16 KB pages), and
> implement an arch specific hook (prctl() perhaps?) to increase
> TASK_SIZE on demand.

As you stated below, switching TASK_SIZE on demand is problematic if you
actually want to switch the TCR_EL1.T0SZ. As per other recent
discussions, I'm not sure we can do it safely without full TLBI on
context switch. That's an aspect we'll have to sort out with 52-bit VA
but most likely we'll allow this range in T0SZ and only artificially
limit TASK_SIZE to smaller values so that we don't break any other
tasks. But then you won't gain much from a reduced number of page table
levels.

> That would not only give us a reliable way to check whether this is
> supported (i.e., the prctl() would return error if it isn't), it also
> allows for some optimizations, since a 48-bit VA kernel can run all
> processes using 3 levels with relative ease (and switching between
> 4levels and 3levels processes would also be possible, but would either
> require a TLB flush, or would result in this optimization to be
> disabled globally, whichever is less costly in terms of performance)

I'm more for using 48-bit VA permanently for both user and kernel (and
52-bit VA at some point in the future, though limiting user space to
48-bit VA by default). But it would be good to get some benchmark
numbers on the impact to see whether it's still worth keeping the other
VA combinations around.
On 18 August 2016 at 14:42, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Aug 18, 2016 at 02:00:56PM +0200, Ard Biesheuvel wrote:
>> On 17 August 2016 at 13:12, Christopher Covington <cov@codeaurora.org> wrote:
>> > On August 17, 2016 6:30:06 AM EDT, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >>On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
>> >>> Some userspace applications need to know the maximum virtual address they can
>> >>> use (TASK_SIZE).
>> >>
>> >>Just curious, what are the cases needing TASK_SIZE in user space?
>> >
>> > Checkpoint/Restore In Userspace and the Mozilla Javascript Engine
>> > https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 are the
>> > specific cases I've run into. I've heard LuaJIT might have a similar
>> > situation. In general I think making allocations from the top down
>> > is a shortcut for finding a large unused region of memory.
>>
>> One aspect of this that I would like to discuss is whether the current
>> practice makes sense, of tying TASK_SIZE to whatever the size of the
>> kernel VA space is.
>
> I'm fine with decoupling them as long as we can have sane
> pgd/pud/pmd/pte macros. We rely on generic files like pgtable-nopud.h
> etc. currently, so we would have to give up on that and do our own
> checks. It's also worth testing any potential performance implication of
> creating/tearing down large page tables with the new macros.
>

Well, I don't think it is necessarily worth the trouble of rewriting
all that. My concern is that TASK_SIZE randomly increased to 48 bits
recently, merely because some Freescale SoCs cannot fit their RAM into
the linear mapping on a 39-bit VA kernel. This had nothing to do with
userland requirements.

Do we know the userland requirements? What use cases do we know about
that require >39 bit userland VA space?

>> I could imagine simply limiting the user VA space to 39-bits (or even
>> 36-bits, depending on how deeply we care about 16 KB pages), and
>> implement an arch specific hook (prctl() perhaps?) to increase
>> TASK_SIZE on demand.
>
> As you stated below, switching TASK_SIZE on demand is problematic if you
> actually want to switch the TCR_EL1.T0SZ. As per other recent
> discussions, I'm not sure we can do it safely without full TLBI on
> context switch. That's an aspect we'll have to sort out with 52-bit VA
> but most likely we'll allow this range in T0SZ and only artificially
> limit TASK_SIZE to smaller values so that we don't break any other
> tasks. But then you won't gain much from a reduced number of page table
> levels.
>

There are several ways to go about this. The 48-bit VA kernel could
run everything with 3 levels, and simply switch to 4 levels the moment
some process needs it. So we keep all the existing macros, but simply
point TTBR0_EL1 to the level 1 translation table rather than to the
level 0 table (and update T0SZ accordingly). So when the first 48 bit
VA userland process arrives (which may be never in many cases), we
either switch to 4 levels for everything (and the page tables are
already set up for that), or we do a TLB flush, but only when
switching from a 4levels task to a 3levels task or vice versa (but
this is messy so the first approach is probably more suitable).

So there is no associated space savings, only the TLB and cache
footprint gets optimized.

>> That would not only give us a reliable way to check whether this is
>> supported (i.e., the prctl() would return error if it isn't), it also
>> allows for some optimizations, since a 48-bit VA kernel can run all
>> processes using 3 levels with relative ease (and switching between
>> 4levels and 3levels processes would also be possible, but would either
>> require a TLB flush, or would result in this optimization to be
>> disabled globally, whichever is less costly in terms of performance)
>
> I'm more for using 48-bit VA permanently for both user and kernel (and
> 52-bit VA at some point in the future, though limiting user space to
> 48-bit VA by default). But it would be good to get some benchmark
> numbers on the impact to see whether it's still worth keeping the other
> VA combinations around.
>

Of course, none of this complexity is justified if the performance
impact is negligible. I do wonder about the virt case, though.
Hi Richard,

On 08/18/2016 08:17 AM, Richard Weinberger wrote:
> On Wed, Aug 17, 2016 at 1:12 PM, Christopher Covington
> <cov@codeaurora.org> wrote:
>> On August 17, 2016 6:30:06 AM EDT, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Tue, Aug 16, 2016 at 02:32:29PM -0400, Christopher Covington wrote:
>>>> Some userspace applications need to know the maximum virtual address they can
>>>> use (TASK_SIZE).
>>>
>>> Just curious, what are the cases needing TASK_SIZE in user space?
>>
>> Checkpoint/Restore In Userspace and the Mozilla Javascript Engine
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 are the
>> specific cases I've run into. I've heard LuaJIT might have a similar
>> situation. In general I think making allocations from the top down
>> is a shortcut for finding a large unused region of memory.
>
> I think this makes sense for all archs.
> At least UserModeLinux on x86 also needs to know bottom and top
> addresses of the usable address space.
> Currently it figures this out by scanning and catching SIGSEGV.

For the bottom, can you use /proc/sys/vm/mmap_min_addr?

Cov
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index a55384f..3811795 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -145,6 +145,7 @@ typedef struct user_fpsimd_state elf_fpregset_t;
 do {                                                                   \
         NEW_AUX_ENT(AT_SYSINFO_EHDR,                                   \
                     (elf_addr_t)current->mm->context.vdso);            \
+        NEW_AUX_ENT(AT_TASKSZ, TASK_SIZE);                             \
 } while (0)

 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES
diff --git a/arch/arm64/include/uapi/asm/auxvec.h b/arch/arm64/include/uapi/asm/auxvec.h
index 4cf0c17..595bfda 100644
--- a/arch/arm64/include/uapi/asm/auxvec.h
+++ b/arch/arm64/include/uapi/asm/auxvec.h
@@ -18,7 +18,8 @@

 /* vDSO location */
 #define AT_SYSINFO_EHDR 33
+#define AT_TASKSZ 34

-#define AT_VECTOR_SIZE_ARCH 1 /* entries in ARCH_DLINFO */
+#define AT_VECTOR_SIZE_ARCH 2 /* entries in ARCH_DLINFO */

 #endif
Some userspace applications need to know the maximum virtual address they can
use (TASK_SIZE). There are several possible values for TASK_SIZE with the arm64
kernel, and such applications are either making bad hard-coded assumptions, or
are guessing and checking using system calls like munmap(), which may have other
reasons for returning an error than TASK_SIZE being exceeded.

To make correct functioning easy for userspace applications that need to know
the maximum virtual address they can use, communicate TASK_SIZE via the ELF
auxiliary vector, just like PAGE_SIZE is currently communicated.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
Tested with the following commands:

LD_SHOW_AUXV=1 sleep 1 # GNU dynamic ld-linux*.so

hexdump -v -e '4/4 "%08x " "\n"' /proc/self/auxv | \
sed -r 's/0*([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)/\1 0x\4\3/
        s/^0 / NULL: /
        s/^3 / PHDR: /
        s/^4 / PHENT: /
        s/^5 / PHNUM: /
        s/^6 / PAGESZ: /
        s/^7 / BASE: /
        s/^8 / FLAGS: /
        s/^9 / ENTRY: /
        s/^b / UID: /
        s/^c / EUID: /
        s/^d / GID: /
        s/^e / EGID: /
        s/^f /PLATFORM: /
        s/^10 / HWCAP: /
        s/^11 / CLKTCK: /
        s/^17 / SECURE: /
        s/^19 / RANDOM: /
        s/^1f / EXECFN: /
        s/^21 / VDSO: /
        s/^22 / TASKSZ: /' # compatible with static busybox
---
 arch/arm64/include/asm/elf.h         | 1 +
 arch/arm64/include/uapi/asm/auxvec.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)