Message ID | 20200515171612.1020-12-catalin.marinas@arm.com
State      | New, archived
Series     | arm64: Memory Tagging Extension user-space support
On Fri, May 15, 2020 at 10:16 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> To enable tagging on a memory range, the user must explicitly opt in via
> a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
> memory type in the AttrIndx field of a pte, simplify the or'ing of these
> bits over the protection_map[] attributes by making MT_NORMAL index 0.

Should the userspace stack always be mapped as if with PROT_MTE if the
hardware supports it? Such a change would be invisible to non-MTE
aware userspace since it would already need to opt in to tag checking
via prctl. This would let userspace avoid a complex stack
initialization sequence when running with stack tagging enabled on the
main thread.

Peter
On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:
> On Fri, May 15, 2020 at 10:16 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > To enable tagging on a memory range, the user must explicitly opt in via
> > a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
> > memory type in the AttrIndx field of a pte, simplify the or'ing of these
> > bits over the protection_map[] attributes by making MT_NORMAL index 0.
>
> Should the userspace stack always be mapped as if with PROT_MTE if the
> hardware supports it? Such a change would be invisible to non-MTE
> aware userspace since it would already need to opt in to tag checking
> via prctl. This would let userspace avoid a complex stack
> initialization sequence when running with stack tagging enabled on the
> main thread.

I don't think the stack initialisation is that difficult on program
startup (it can be done by the dynamic loader). Something like (untested):

	register unsigned long stack asm ("sp");
	unsigned long page_sz = sysconf(_SC_PAGESIZE);

	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);

(the essential part is PROT_GROWSDOWN so that you don't have to specify
a stack lower limit)

I don't like enabling this by default since it will have a small cost
even if the application doesn't enable tag checking. The kernel would
still have to zero the tags when mapping the stack and preserve them
when swapping out.

Another case where this could go wrong is if we want to enable some
quiet monitoring of user programs: the libc enables PROT_MTE on heap
allocations but keeps tag checking disabled as it doesn't want any
SIGSEGV; the kernel could enable async TCF and log any faults
(rate-limited). A default PROT_MTE stack would get in the way.

Anyway, this use-case is something for the future; so far these patches
rely on the user solely driving the tag checking mode. I'm fine,
however, with enabling PROT_MTE on the main stack based on some ELF
note.
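A self-contained version of the sketch above, untested and assuming the
PROT_MTE value proposed by this series, might look like this:

	/*
	 * Sketch only: map the page containing the current stack pointer
	 * (and, via PROT_GROWSDOWN, whatever the kernel grows below it) as
	 * Normal Tagged memory. PROT_MTE is the arm64 value proposed by
	 * this series.
	 */
	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#ifndef PROT_MTE
	#define PROT_MTE	0x20	/* from this series' uapi header */
	#endif

	int main(void)
	{
		register unsigned long stack asm ("sp");
		unsigned long page_sz = sysconf(_SC_PAGESIZE);

		if (mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
			     PROT_READ | PROT_WRITE | PROT_MTE |
			     PROT_GROWSDOWN)) {
			perror("mprotect");
			return 1;
		}
		return 0;
	}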
The 05/28/2020 10:14, Catalin Marinas wrote:
> On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:
> > On Fri, May 15, 2020 at 10:16 AM Catalin Marinas
> > <catalin.marinas@arm.com> wrote:
> > > To enable tagging on a memory range, the user must explicitly opt in via
> > > a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
> > > memory type in the AttrIndx field of a pte, simplify the or'ing of these
> > > bits over the protection_map[] attributes by making MT_NORMAL index 0.
> >
> > Should the userspace stack always be mapped as if with PROT_MTE if the
> > hardware supports it? Such a change would be invisible to non-MTE
> > aware userspace since it would already need to opt in to tag checking
> > via prctl. This would let userspace avoid a complex stack
> > initialization sequence when running with stack tagging enabled on the
> > main thread.
>
> I don't think the stack initialisation is that difficult on program
> startup (it can be done by the dynamic loader). Something like (untested):
>
> 	register unsigned long stack asm ("sp");
> 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
>
> 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
>
> (the essential part is PROT_GROWSDOWN so that you don't have to specify
> a stack lower limit)

does this work even if the currently mapped stack is more than page_sz?
determining the mapped main stack area is i think non-trivial to do in
userspace (requires parsing /proc/self/maps or similar).

...
> I'm fine, however, with enabling PROT_MTE on the main stack based on
> some ELF note.

note that would likely mean an elf note on the dynamic linker
(because a dynamic linked executable may not be loaded by the
kernel and ctors in loaded libs run before the executable entry
code anyway, so the executable alone cannot be in charge of this
decision) i.e. one global switch for all dynamic linked binaries.

i think a dynamic linker can map a new stack and switch to it
if it needs to control the properties of the stack at runtime
(it's wasteful though).

and i think there should be a runtime mechanism for the brk area:
it should be possible to request that future brk expansions are
mapped as PROT_MTE so an mte aware malloc implementation can use
brk. i think this is not important in the initial design, but if
a prctl flag can do it that may be useful to add (may be at a
later time).

(and eventually there should be a way to use PROT_MTE on
writable global data and appropriate code generation that
takes colors into account when globals are accessed, but
that requires significant ELF, ld.so and compiler changes,
that need not be part of the initial mte design).
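For illustration, the /proc/self/maps approach mentioned above would
mean something like the following sketch (not part of the series;
standard C, scanning for the mapping that contains the current sp):

	/*
	 * Sketch only: locate the main stack mapping by scanning
	 * /proc/self/maps for the range containing the current stack
	 * pointer. Shows why this is more awkward than PROT_GROWSDOWN.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	static bool find_stack_range(unsigned long *lo, unsigned long *hi)
	{
		register unsigned long sp asm ("sp");
		unsigned long start, end;
		char line[256];
		bool found = false;
		FILE *f = fopen("/proc/self/maps", "r");

		if (!f)
			return false;

		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
			    sp >= start && sp < end) {
				*lo = start;
				*hi = end;
				found = true;
				break;
			}
		}

		fclose(f);
		return found;
	}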
On Thu, May 28, 2020 at 12:05:09PM +0100, Szabolcs Nagy wrote:
> The 05/28/2020 10:14, Catalin Marinas wrote:
> > On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:
> > > On Fri, May 15, 2020 at 10:16 AM Catalin Marinas
> > > <catalin.marinas@arm.com> wrote:
> > > > To enable tagging on a memory range, the user must explicitly opt in via
> > > > a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
> > > > memory type in the AttrIndx field of a pte, simplify the or'ing of these
> > > > bits over the protection_map[] attributes by making MT_NORMAL index 0.
> > >
> > > Should the userspace stack always be mapped as if with PROT_MTE if the
> > > hardware supports it? Such a change would be invisible to non-MTE
> > > aware userspace since it would already need to opt in to tag checking
> > > via prctl. This would let userspace avoid a complex stack
> > > initialization sequence when running with stack tagging enabled on the
> > > main thread.
> >
> > I don't think the stack initialisation is that difficult on program
> > startup (it can be done by the dynamic loader). Something like (untested):
> >
> > 	register unsigned long stack asm ("sp");
> > 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
> >
> > 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> > 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
> >
> > (the essential part is PROT_GROWSDOWN so that you don't have to specify
> > a stack lower limit)
>
> does this work even if the currently mapped stack is more than page_sz?
> determining the mapped main stack area is i think non-trivial to do in
> userspace (requires parsing /proc/self/maps or similar).

Because of PROT_GROWSDOWN, the kernel adjusts the start of the range
down automatically. It is potentially problematic if the top of the
stack is more than a page away and you want the whole stack coloured. I
haven't run a test but my reading of the kernel code is that the stack
vma would be split in this scenario, so the range beyond sp+page_sz
won't have PROT_MTE set.

My assumption is that if you do this during program start, the stack is
smaller than a page. Alternatively, could we use argv or envp to
determine the top of the user stack (the bottom is taken care of by the
kernel)?

> > I'm fine, however, with enabling PROT_MTE on the main stack based on
> > some ELF note.
>
> note that would likely mean an elf note on the dynamic linker
> (because a dynamic linked executable may not be loaded by the
> kernel and ctors in loaded libs run before the executable entry
> code anyway, so the executable alone cannot be in charge of this
> decision) i.e. one global switch for all dynamic linked binaries.

I guess parsing such a note in the kernel is only useful for static
binaries.

> i think a dynamic linker can map a new stack and switch to it
> if it needs to control the properties of the stack at runtime
> (it's wasteful though).

There is already user code to check for HWCAP2_MTE and the prctl(), so
adding an mprotect() doesn't look like a significant overhead.

> and i think there should be a runtime mechanism for the brk area:
> it should be possible to request that future brk expansions are
> mapped as PROT_MTE so an mte aware malloc implementation can use
> brk. i think this is not important in the initial design, but if
> a prctl flag can do it that may be useful to add (may be at a
> later time).

Looking at the kernel code briefly, I think this would work. We do end
up with two vmas for the brk, only the expansion having PROT_MTE, and
I'd need to find a way to store the extra flag.

From a coding perspective, it's easier to just set PROT_MTE by default
on both brk and the initial stack ;) (VM_DATA_DEFAULT_FLAGS).

> (and eventually there should be a way to use PROT_MTE on
> writable global data and appropriate code generation that
> takes colors into account when globals are accessed, but
> that requires significant ELF, ld.so and compiler changes,
> that need not be part of the initial mte design).

The .data section needs to be driven by the ELF information. It's also a
file mapping and we don't support PROT_MTE on them even if MAP_PRIVATE.
There are complications like DAX where the file you mmap for CoW may be
hosted on memory that does not support MTE (copied to RAM on write).

Is there a use-case for global data to be tagged?
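The HWCAP2_MTE check and prctl() opt-in referred to above would look
roughly like the sketch below; the PR_MTE_* names and values are taken
from the prctl() patches in this series and should be treated as
proposals rather than a settled ABI:

	#include <sys/auxv.h>
	#include <sys/prctl.h>

	/* Fallback values as proposed by this series / the MTE prctl() patches. */
	#ifndef HWCAP2_MTE
	#define HWCAP2_MTE		(1 << 18)
	#endif
	#ifndef PR_SET_TAGGED_ADDR_CTRL
	#define PR_SET_TAGGED_ADDR_CTRL	55
	#define PR_TAGGED_ADDR_ENABLE	(1UL << 0)
	#endif
	#ifndef PR_MTE_TCF_SYNC
	#define PR_MTE_TCF_SYNC		(1UL << 1)
	#define PR_MTE_TAG_SHIFT	3
	#endif

	/* Returns 0 on success, -1 if MTE is unavailable or the prctl fails. */
	static int enable_mte_sync_checking(void)
	{
		if (!(getauxval(AT_HWCAP2) & HWCAP2_MTE))
			return -1;

		/* Tagged address ABI + synchronous tag faults, tags 1-15 usable */
		return prctl(PR_SET_TAGGED_ADDR_CTRL,
			     PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
			     (0xfffe << PR_MTE_TAG_SHIFT),
			     0, 0, 0);
	}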
On Thu, May 28, 2020 at 9:34 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 28, 2020 at 12:05:09PM +0100, Szabolcs Nagy wrote:
> > The 05/28/2020 10:14, Catalin Marinas wrote:
> > > On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:
> > > > On Fri, May 15, 2020 at 10:16 AM Catalin Marinas
> > > > <catalin.marinas@arm.com> wrote:
> > > > > To enable tagging on a memory range, the user must explicitly opt in via
> > > > > a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
> > > > > memory type in the AttrIndx field of a pte, simplify the or'ing of these
> > > > > bits over the protection_map[] attributes by making MT_NORMAL index 0.
> > > >
> > > > Should the userspace stack always be mapped as if with PROT_MTE if the
> > > > hardware supports it? Such a change would be invisible to non-MTE
> > > > aware userspace since it would already need to opt in to tag checking
> > > > via prctl. This would let userspace avoid a complex stack
> > > > initialization sequence when running with stack tagging enabled on the
> > > > main thread.
> > >
> > > I don't think the stack initialisation is that difficult on program
> > > startup (it can be done by the dynamic loader). Something like (untested):
> > >
> > > 	register unsigned long stack asm ("sp");
> > > 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
> > >
> > > 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> > > 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
> > >
> > > (the essential part is PROT_GROWSDOWN so that you don't have to specify
> > > a stack lower limit)
> >
> > does this work even if the currently mapped stack is more than page_sz?
> > determining the mapped main stack area is i think non-trivial to do in
> > userspace (requires parsing /proc/self/maps or similar).
>
> Because of PROT_GROWSDOWN, the kernel adjusts the start of the range
> down automatically. It is potentially problematic if the top of the
> stack is more than a page away and you want the whole stack coloured. I
> haven't run a test but my reading of the kernel code is that the stack
> vma would be split in this scenario, so the range beyond sp+page_sz
> won't have PROT_MTE set.
>
> My assumption is that if you do this during program start, the stack is
> smaller than a page. Alternatively, could we use argv or envp to
> determine the top of the user stack (the bottom is taken care of by the
> kernel)?

PROT_GROWSDOWN seems to work fine in our case, and the extra tag
maintenance overhead sounds like a valid argument against setting
PROT_MTE unconditionally.

On the other hand, we may end up doing this in userspace in every
process. The reason is, PROT_MTE cannot be set on a page that contains
a live frame with stack tagging because of mismatching tags (IRG is not
affected by PROT_MTE but STG is). So ideally, this should be done at (or
near) the program entry point, while the stack is mostly empty.

> > > I'm fine, however, with enabling PROT_MTE on the main stack based on
> > > some ELF note.
> >
> > note that would likely mean an elf note on the dynamic linker
> > (because a dynamic linked executable may not be loaded by the
> > kernel and ctors in loaded libs run before the executable entry
> > code anyway, so the executable alone cannot be in charge of this
> > decision) i.e. one global switch for all dynamic linked binaries.
>
> I guess parsing such a note in the kernel is only useful for static
> binaries.
>
> > i think a dynamic linker can map a new stack and switch to it
> > if it needs to control the properties of the stack at runtime
> > (it's wasteful though).
>
> There is already user code to check for HWCAP2_MTE and the prctl(), so
> adding an mprotect() doesn't look like a significant overhead.
>
> > and i think there should be a runtime mechanism for the brk area:
> > it should be possible to request that future brk expansions are
> > mapped as PROT_MTE so an mte aware malloc implementation can use
> > brk. i think this is not important in the initial design, but if
> > a prctl flag can do it that may be useful to add (may be at a
> > later time).
>
> Looking at the kernel code briefly, I think this would work. We do end
> up with two vmas for the brk, only the expansion having PROT_MTE, and
> I'd need to find a way to store the extra flag.
>
> From a coding perspective, it's easier to just set PROT_MTE by default
> on both brk and the initial stack ;) (VM_DATA_DEFAULT_FLAGS).
>
> > (and eventually there should be a way to use PROT_MTE on
> > writable global data and appropriate code generation that
> > takes colors into account when globals are accessed, but
> > that requires significant ELF, ld.so and compiler changes,
> > that need not be part of the initial mte design).
>
> The .data section needs to be driven by the ELF information. It's also a
> file mapping and we don't support PROT_MTE on them even if MAP_PRIVATE.
> There are complications like DAX where the file you mmap for CoW may be
> hosted on memory that does not support MTE (copied to RAM on write).
>
> Is there a use-case for global data to be tagged?

Yes, catching global buffer overflow bugs. They are not nearly as
common as heap-based issues though.

> --
> Catalin
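A minimal sketch of the IRG/STG behaviour described above, assuming a
compiler built with -march=armv8.5-a+memtag and a 16-byte aligned
pointer; IRG works regardless of the mapping, while the tag store only
takes effect on a PROT_MTE mapping:

	/*
	 * Sketch only: insert a random logical tag into the pointer (IRG),
	 * then store the matching allocation tag for its 16-byte granule
	 * (STG). Without PROT_MTE the STG is write-ignored, which is why
	 * turning PROT_MTE on under live tagged frames goes wrong.
	 */
	static inline void *colour_granule(void *p)
	{
		void *tagged;

		asm volatile("irg %0, %1\n\t"
			     "stg %0, [%0]"
			     : "=&r" (tagged)
			     : "r" (p)
			     : "memory");
		return tagged;
	}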
On Thu, May 28, 2020 at 11:35:50AM -0700, Evgenii Stepanov wrote:
> On Thu, May 28, 2020 at 9:34 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 28, 2020 at 12:05:09PM +0100, Szabolcs Nagy wrote:
> > > On 05/28/2020 10:14, Catalin Marinas wrote:
> > > > I don't think the stack initialisation is that difficult on program
> > > > startup (it can be done by the dynamic loader). Something like (untested):
> > > >
> > > > 	register unsigned long stack asm ("sp");
> > > > 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
> > > >
> > > > 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> > > > 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
> > > >
> > > > (the essential part is PROT_GROWSDOWN so that you don't have to specify
> > > > a stack lower limit)
> > >
> > > does this work even if the currently mapped stack is more than page_sz?
> > > determining the mapped main stack area is i think non-trivial to do in
> > > userspace (requires parsing /proc/self/maps or similar).
> >
> > Because of PROT_GROWSDOWN, the kernel adjusts the start of the range
> > down automatically. It is potentially problematic if the top of the
> > stack is more than a page away and you want the whole stack coloured. I
> > haven't run a test but my reading of the kernel code is that the stack
> > vma would be split in this scenario, so the range beyond sp+page_sz
> > won't have PROT_MTE set.
> >
> > My assumption is that if you do this during program start, the stack is
> > smaller than a page. Alternatively, could we use argv or envp to
> > determine the top of the user stack (the bottom is taken care of by the
> > kernel)?
>
> PROT_GROWSDOWN seems to work fine in our case, and the extra tag
> maintenance overhead sounds like a valid argument against setting
> PROT_MTE unconditionally.
>
> On the other hand, we may end up doing this in userspace in every
> process. The reason is, PROT_MTE cannot be set on a page that contains
> a live frame with stack tagging because of mismatching tags (IRG is not
> affected by PROT_MTE but STG is). So ideally, this should be done at (or
> near) the program entry point, while the stack is mostly empty.

Since stack tagging cannot use instructions in the NOP space anyway, I
think we need an ELF note to check for the presence of STG etc. and, in
addition, we can turn PROT_MTE on by default for the initial stack.
Maybe on such binaries we could just set PROT_MTE on all anonymous and
ramfs mappings (i.e. VM_MTE_ALLOWED implies VM_MTE).

For dynamically linked binaries, we base this decision on the main ELF,
not the interpreter, and it would be up to the dynamic loader to reject
libraries that have such a note when HWCAP2_MTE is not present.

> > > (and eventually there should be a way to use PROT_MTE on
> > > writable global data and appropriate code generation that
> > > takes colors into account when globals are accessed, but
> > > that requires significant ELF, ld.so and compiler changes,
> > > that need not be part of the initial mte design).
> >
> > The .data section needs to be driven by the ELF information. It's also a
> > file mapping and we don't support PROT_MTE on them even if MAP_PRIVATE.
> > There are complications like DAX where the file you mmap for CoW may be
> > hosted on memory that does not support MTE (copied to RAM on write).
> >
> > Is there a use-case for global data to be tagged?
>
> Yes, catching global buffer overflow bugs. They are not nearly as
> common as heap-based issues though.

OK, so these would be tagged red-zones around global data. IIUC, having
different colours for global variables was not considered because of
the relocations and relative accesses.

If such red-zone colouring is done during load (the dynamic linker?), we
could set PROT_MTE only when MAP_PRIVATE and copied on write to make
sure it is in RAM. As above, I think this should be driven by some ELF
information.

There's also the option of scrapping PROT_MTE altogether and enabling
MTE (default tag 0) on all anonymous and private+copied pages (i.e.
those stored in RAM). At this point, I can't really tell whether there
will be a performance impact.
On Thu, May 28, 2020 at 05:34:13PM +0100, Catalin Marinas wrote:
> On Thu, May 28, 2020 at 12:05:09PM +0100, Szabolcs Nagy wrote:
> > The 05/28/2020 10:14, Catalin Marinas wrote:
> > > On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:

[...]

Just jumping in on this point:

> > > > Should the userspace stack always be mapped as if with PROT_MTE if the
> > > > hardware supports it? Such a change would be invisible to non-MTE
> > > > aware userspace since it would already need to opt in to tag checking
> > > > via prctl. This would let userspace avoid a complex stack
> > > > initialization sequence when running with stack tagging enabled on the
> > > > main thread.
> > >
> > > I don't think the stack initialisation is that difficult on program
> > > startup (it can be done by the dynamic loader). Something like (untested):
> > >
> > > 	register unsigned long stack asm ("sp");
> > > 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
> > >
> > > 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> > > 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
> > >
> > > (the essential part is PROT_GROWSDOWN so that you don't have to specify
> > > a stack lower limit)
> >
> > does this work even if the currently mapped stack is more than page_sz?
> > determining the mapped main stack area is i think non-trivial to do in
> > userspace (requires parsing /proc/self/maps or similar).
>
> Because of PROT_GROWSDOWN, the kernel adjusts the start of the range
> down automatically. It is potentially problematic if the top of the
> stack is more than a page away and you want the whole stack coloured. I
> haven't run a test but my reading of the kernel code is that the stack
> vma would be split in this scenario, so the range beyond sp+page_sz
> won't have PROT_MTE set.
>
> My assumption is that if you do this during program start, the stack is
> smaller than a page. Alternatively, could we use argv or envp to
> determine the top of the user stack (the bottom is taken care of by the
> kernel)?

I don't think you can easily know where the stack ends, but perhaps it
doesn't matter.

From memory, the initial stack looks like:

	argv/env strings
	AT_NULL
	auxv
	NULL
	env
	NULL
	argv
	argc		<--- sp

If we don't care about tagging the strings correctly, we could step to
the end of auxv and tag down from there.

If we do care about tagging the strings, there's probably no good way
to find the end of the string area, other than looking up sp in
/proc/self/maps. I'm not sure we should trust all past and future
kernels to spit out the strings in a predictable order. Assuming that
the last env string has the highest address does not sound like a good
idea to me. It would be easy for someone to break that assumption later
without realising.

If we're concerned about this, and reading /proc/self/auxv is deemed
unacceptable (likely: some binaries need to work before /proc is
mounted) then we could perhaps add a new auxv entry to report the stack
base address to the user startup code.

I don't think it matters if all this is "hard" for userspace: only the
C library / runtime should be doing this. After libc startup, it's
generally too late to do this kind of thing safely.

[...]

Cheers
---Dave
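A sketch of the "step to the end of auxv" idea, assuming the
conventional initial stack layout shown above (untested):

	#include <elf.h>

	/*
	 * Sketch only: given the initial sp (argc at sp[0]), walk past
	 * argv[], envp[] and the auxiliary vector up to the AT_NULL entry.
	 * Everything above that is the strings/padding area whose end is
	 * not easily known from here.
	 */
	static unsigned long end_of_auxv(unsigned long *sp)
	{
		char **argv = (char **)(sp + 1);
		char **p = argv + sp[0] + 1;	/* skip argv[] and its NULL */
		Elf64_auxv_t *auxv;

		while (*p)			/* skip env[] up to its NULL */
			p++;
		auxv = (Elf64_auxv_t *)(p + 1);

		while (auxv->a_type != AT_NULL)	/* step to the AT_NULL entry */
			auxv++;

		return (unsigned long)(auxv + 1);
	}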
On Mon, Jun 01, 2020 at 09:55:38AM +0100, Dave P Martin wrote:
> On Thu, May 28, 2020 at 05:34:13PM +0100, Catalin Marinas wrote:
> > On Thu, May 28, 2020 at 12:05:09PM +0100, Szabolcs Nagy wrote:
> > > The 05/28/2020 10:14, Catalin Marinas wrote:
> > > > On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:
> > > > > Should the userspace stack always be mapped as if with PROT_MTE if the
> > > > > hardware supports it? Such a change would be invisible to non-MTE
> > > > > aware userspace since it would already need to opt in to tag checking
> > > > > via prctl. This would let userspace avoid a complex stack
> > > > > initialization sequence when running with stack tagging enabled on the
> > > > > main thread.
> > > >
> > > > I don't think the stack initialisation is that difficult on program
> > > > startup (it can be done by the dynamic loader). Something like (untested):
> > > >
> > > > 	register unsigned long stack asm ("sp");
> > > > 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
> > > >
> > > > 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> > > > 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
> > > >
> > > > (the essential part is PROT_GROWSDOWN so that you don't have to specify
> > > > a stack lower limit)
> > >
> > > does this work even if the currently mapped stack is more than page_sz?
> > > determining the mapped main stack area is i think non-trivial to do in
> > > userspace (requires parsing /proc/self/maps or similar).
> >
> > Because of PROT_GROWSDOWN, the kernel adjusts the start of the range
> > down automatically. It is potentially problematic if the top of the
> > stack is more than a page away and you want the whole stack coloured. I
> > haven't run a test but my reading of the kernel code is that the stack
> > vma would be split in this scenario, so the range beyond sp+page_sz
> > won't have PROT_MTE set.
> >
> > My assumption is that if you do this during program start, the stack is
> > smaller than a page. Alternatively, could we use argv or envp to
> > determine the top of the user stack (the bottom is taken care of by the
> > kernel)?
>
> I don't think you can easily know where the stack ends, but perhaps it
> doesn't matter.
>
> From memory, the initial stack looks like:
>
> 	argv/env strings
> 	AT_NULL
> 	auxv
> 	NULL
> 	env
> 	NULL
> 	argv
> 	argc		<--- sp
>
> If we don't care about tagging the strings correctly, we could step to
> the end of auxv and tag down from there.
>
> If we do care about tagging the strings, there's probably no good way
> to find the end of the string area, other than looking up sp in
> /proc/self/maps. I'm not sure we should trust all past and future
> kernels to spit out the strings in a predictable order.

I don't think we care about tagging whatever the kernel places on the
stack since the argv/envp pointers are untagged. An mprotect(PROT_MTE)
may or may not cover the environment but it shouldn't matter as the
kernel clears the tags on the corresponding pages anyway.

AFAIK stack tagging works by colouring a stack frame on function entry
and clearing the tags on return. We would only hit a problem if the
function issuing mprotect(sp, PROT_MTE) and its callers already assumed
a PROT_MTE stack. Without PROT_MTE, an STG would be write-ignore, so
subsequently turning it on would lead to a mismatch between the pointer
and the allocation tags.

So turning PROT_MTE on should happen very early in the user process
startup code, before any code with stack tagging enabled. Whether you
reach the top of the stack with such mprotect() doesn't really matter
since up to that point there should not be any use of stack tagging. If
that's not possible, for example if the glibc code setting up the stack
was itself compiled with stack tagging, the kernel would have to enable
it when the user process starts. However, I'd only do this based on
some ELF note.
On Mon, Jun 01, 2020 at 03:45:45PM +0100, Catalin Marinas wrote:
> On Mon, Jun 01, 2020 at 09:55:38AM +0100, Dave P Martin wrote:
> > On Thu, May 28, 2020 at 05:34:13PM +0100, Catalin Marinas wrote:
> > > On Thu, May 28, 2020 at 12:05:09PM +0100, Szabolcs Nagy wrote:
> > > > The 05/28/2020 10:14, Catalin Marinas wrote:
> > > > > On Wed, May 27, 2020 at 11:57:39AM -0700, Peter Collingbourne wrote:
> > > > > > Should the userspace stack always be mapped as if with PROT_MTE if the
> > > > > > hardware supports it? Such a change would be invisible to non-MTE
> > > > > > aware userspace since it would already need to opt in to tag checking
> > > > > > via prctl. This would let userspace avoid a complex stack
> > > > > > initialization sequence when running with stack tagging enabled on the
> > > > > > main thread.
> > > > >
> > > > > I don't think the stack initialisation is that difficult on program
> > > > > startup (it can be done by the dynamic loader). Something like (untested):
> > > > >
> > > > > 	register unsigned long stack asm ("sp");
> > > > > 	unsigned long page_sz = sysconf(_SC_PAGESIZE);
> > > > >
> > > > > 	mprotect((void *)(stack & ~(page_sz - 1)), page_sz,
> > > > > 		 PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN);
> > > > >
> > > > > (the essential part is PROT_GROWSDOWN so that you don't have to specify
> > > > > a stack lower limit)
> > > >
> > > > does this work even if the currently mapped stack is more than page_sz?
> > > > determining the mapped main stack area is i think non-trivial to do in
> > > > userspace (requires parsing /proc/self/maps or similar).
> > >
> > > Because of PROT_GROWSDOWN, the kernel adjusts the start of the range
> > > down automatically. It is potentially problematic if the top of the
> > > stack is more than a page away and you want the whole stack coloured. I
> > > haven't run a test but my reading of the kernel code is that the stack
> > > vma would be split in this scenario, so the range beyond sp+page_sz
> > > won't have PROT_MTE set.
> > >
> > > My assumption is that if you do this during program start, the stack is
> > > smaller than a page. Alternatively, could we use argv or envp to
> > > determine the top of the user stack (the bottom is taken care of by the
> > > kernel)?
> >
> > I don't think you can easily know where the stack ends, but perhaps it
> > doesn't matter.
> >
> > From memory, the initial stack looks like:
> >
> > 	argv/env strings
> > 	AT_NULL
> > 	auxv
> > 	NULL
> > 	env
> > 	NULL
> > 	argv
> > 	argc		<--- sp
> >
> > If we don't care about tagging the strings correctly, we could step to
> > the end of auxv and tag down from there.
> >
> > If we do care about tagging the strings, there's probably no good way
> > to find the end of the string area, other than looking up sp in
> > /proc/self/maps. I'm not sure we should trust all past and future
> > kernels to spit out the strings in a predictable order.
>
> I don't think we care about tagging whatever the kernel places on the
> stack since the argv/envp pointers are untagged. An mprotect(PROT_MTE)
> may or may not cover the environment but it shouldn't matter as the
> kernel clears the tags on the corresponding pages anyway.

We have no match-all tag, right? So we do rely on the tags being
cleared for the initial stack contents so that using untagged pointers
to access it works.

> AFAIK stack tagging works by colouring a stack frame on function entry
> and clearing the tags on return. We would only hit a problem if the
> function issuing mprotect(sp, PROT_MTE) and its callers already assumed
> a PROT_MTE stack. Without PROT_MTE, an STG would be write-ignore, so
> subsequently turning it on would lead to a mismatch between the pointer
> and the allocation tags.
>
> So turning PROT_MTE on should happen very early in the user process
> startup code, before any code with stack tagging enabled. Whether you
> reach the top of the stack with such mprotect() doesn't really matter
> since up to that point there should not be any use of stack tagging. If
> that's not possible, for example if the glibc code setting up the stack
> was itself compiled with stack tagging, the kernel would have to enable
> it when the user process starts. However, I'd only do this based on
> some ELF note.

Sounds fair. This early on, the process shouldn't be exposed to
arbitrary, untrusted data. So it's probably not a problem that tagging
isn't turned on right from the start.

Cheers
---Dave
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 472c77a68225..770535b7ca35 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -129,14 +129,18 @@
 
 /*
  * Memory types available.
+ *
+ * IMPORTANT: MT_NORMAL must be index 0 since vm_get_page_prot() may 'or' in
+ *	      the MT_NORMAL_TAGGED memory type for PROT_MTE mappings. Note
+ *	      that protection_map[] only contains MT_NORMAL attributes.
  */
-#define MT_DEVICE_nGnRnE	0
-#define MT_DEVICE_nGnRE		1
-#define MT_DEVICE_GRE		2
-#define MT_NORMAL_NC		3
-#define MT_NORMAL		4
-#define MT_NORMAL_WT		5
-#define MT_NORMAL_TAGGED	6
+#define MT_NORMAL		0
+#define MT_NORMAL_TAGGED	1
+#define MT_NORMAL_NC		2
+#define MT_NORMAL_WT		3
+#define MT_DEVICE_nGnRnE	4
+#define MT_DEVICE_nGnRE		5
+#define MT_DEVICE_GRE		6
 
 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
new file mode 100644
index 000000000000..c77a23869223
--- /dev/null
+++ b/arch/arm64/include/asm/mman.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <uapi/asm/mman.h>
+
+/*
+ * There are two conditions required for returning a Normal Tagged memory type
+ * in arch_vm_get_page_prot(): (1) the user requested it via PROT_MTE passed
+ * to mmap() or mprotect() and (2) the corresponding vma supports MTE. We
+ * register (1) as VM_MTE in the vma->vm_flags and (2) as VM_MTE_ALLOWED. Note
+ * that the latter can only be set during the mmap() call since mprotect()
+ * does not accept MAP_* flags.
+ */
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+						   unsigned long pkey)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	if (prot & PROT_MTE)
+		return VM_MTE;
+
+	return 0;
+}
+#define arch_calc_vm_prot_bits arch_calc_vm_prot_bits
+
+static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	/*
+	 * Only allow MTE on anonymous mappings as these are guaranteed to be
+	 * backed by tags-capable memory. The vm_flags may be overridden by a
+	 * filesystem supporting MTE (RAM-based).
+	 */
+	if (flags & MAP_ANONYMOUS)
+		return VM_MTE_ALLOWED;
+
+	return 0;
+}
+#define arch_calc_vm_flag_bits arch_calc_vm_flag_bits
+
+static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
+{
+	return (vm_flags & VM_MTE) && (vm_flags & VM_MTE_ALLOWED) ?
+		__pgprot(PTE_ATTRINDX(MT_NORMAL_TAGGED)) :
+		__pgprot(0);
+}
+#define arch_vm_get_page_prot arch_vm_get_page_prot
+
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+{
+	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
+
+	if (system_supports_mte())
+		supported |= PROT_MTE;
+
+	return (prot & ~supported) == 0;
+}
+#define arch_validate_prot arch_validate_prot
+
+#endif /* !__ASM_MMAN_H__ */
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index c01b52add377..673033e0393b 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -36,7 +36,7 @@ extern int pfn_valid(unsigned long);
 
 #endif /* !__ASSEMBLY__ */
 
-#define VM_DATA_DEFAULT_FLAGS	VM_DATA_FLAGS_TSK_EXEC
+#define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)
 
 #include <asm-generic/getorder.h>
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 647a3f0c7874..f2cd59b01b27 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -665,8 +665,13 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
+	/*
+	 * Normal and Normal-Tagged are two different memory types and indices
+	 * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
+	 */
 	const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
-			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE;
+			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE |
+			      PTE_ATTRINDX_MASK;
 	/* preserve the hardware dirty information */
 	if (pte_hw_dirty(pte))
 		pte = pte_mkdirty(pte);
diff --git a/arch/arm64/include/uapi/asm/mman.h b/arch/arm64/include/uapi/asm/mman.h
new file mode 100644
index 000000000000..d7677ee84878
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/mman.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI__ASM_MMAN_H
+#define _UAPI__ASM_MMAN_H
+
+#include <asm-generic/mman.h>
+
+/*
+ * The generic mman.h file reserves 0x10 and 0x20 for arch-specific PROT_*
+ * flags.
+ */
+/* 0x10 reserved for PROT_BTI */
+#define PROT_MTE	0x20		/* Normal Tagged mapping */
+
+#endif /* !_UAPI__ASM_MMAN_H */
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8d382d4ec067..2f26112ebb77 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -647,6 +647,10 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_MERGEABLE)]	= "mg",
 		[ilog2(VM_UFFD_MISSING)]= "um",
 		[ilog2(VM_UFFD_WP)]	= "uw",
+#ifdef CONFIG_ARM64_MTE
+		[ilog2(VM_MTE)]		= "mt",
+		[ilog2(VM_MTE_ALLOWED)]	= "",
+#endif
 #ifdef CONFIG_ARCH_HAS_PKEYS
 		/* These come out via ProtectionKey: */
 		[ilog2(VM_PKEY_BIT0)]	= "",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5a323422d783..132ca88e407d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -336,6 +336,14 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MPX		VM_NONE
 #endif
 
+#if defined(CONFIG_ARM64_MTE)
+# define VM_MTE		VM_HIGH_ARCH_0	/* Use Tagged memory for access control */
+# define VM_MTE_ALLOWED	VM_HIGH_ARCH_1	/* Tagged memory permitted */
+#else
+# define VM_MTE		VM_NONE
+# define VM_MTE_ALLOWED	VM_NONE
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif
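For illustration, userspace opting in with this patch applied would look
roughly like the sketch below (untested); PROT_MTE only takes effect
where the vma allows it (VM_MTE_ALLOWED), e.g. anonymous mappings:

	#include <stdio.h>
	#include <sys/mman.h>

	#ifndef PROT_MTE
	#define PROT_MTE	0x20	/* arm64 value added by this patch */
	#endif

	int main(void)
	{
		/*
		 * Anonymous private mappings get VM_MTE_ALLOWED, so PROT_MTE
		 * results in a Normal Tagged mapping; the kernel clears the
		 * tags when the pages are first mapped.
		 */
		void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap(PROT_MTE)");
			return 1;
		}

		munmap(p, 4096);
		return 0;
	}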