Message ID | 20220511022751.65540-5-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Linear Address Masking enabling | expand |
On Wed, May 11 2022 at 05:27, Kirill A. Shutemov wrote: > Add three new arch_prctl() handles: > > +static long thread_feature_prctl(struct task_struct *task, int option, > + unsigned long features) Bah. I really hate the task pointer on all these x86 prctls. @task must always be current, so this @task argument is just confusion. > +{ > + const unsigned long known_features = 0; > + > + if (features & ~known_features) > + return -EINVAL; This implementation allows to task->read features_[locked] with @features == 0. That should be documented somewhere. > + if (option == ARCH_THREAD_FEATURE_LOCK) { > + task->thread.features_locked |= features; > + return task->thread.features_locked; > + } > + /* Do not allow to change locked features */ > + if (features & task->thread.features_locked) > + return -EPERM; > + > + if (option == ARCH_THREAD_FEATURE_DISABLE) { > + task->thread.features &= ~features; > + goto out; > + } > + > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ > + > + task->thread.features |= features; > +out: > + return task->thread.features; Eyes bleed. if (option == ARCH_THREAD_FEATURE_ENABLE) task->thread.features |= features; else task->thread.features &= ~features; return task->thread.features; It's bloody obvious from the code that the counterpart of enable is disable, no? So neither the goto nor the 'comment the obvious' is useful in any way. Thanks, tglx
> + > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ > + > + task->thread.features |= features; > +out: > + return task->thread.features; Isn't arch_prctl() supposed to return 0 on success?
On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote: > > + > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ > > + > > + task->thread.features |= features; > > +out: > > + return task->thread.features; > > Isn't arch_prctl() supposed to return 0 on success? Hmm, good point. Maybe we'll need a struct to pass info in and out.
On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P wrote: > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote: > > > + > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ > > > + > > > + task->thread.features |= features; > > > +out: > > > + return task->thread.features; > > > > Isn't arch_prctl() supposed to return 0 on success? > > Hmm, good point. Maybe we'll need a struct to pass info in and out. But values >0 are unused. I don't see why can't we use them.
On Sat, 2022-05-14 at 02:09 +0300, Kirill A. Shutemov wrote: > On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P wrote: > > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote: > > > > + > > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ > > > > + > > > > + task->thread.features |= features; > > > > +out: > > > > + return task->thread.features; > > > > > > Isn't arch_prctl() supposed to return 0 on success? > > > > Hmm, good point. Maybe we'll need a struct to pass info in and out. > > But values >0 are unused. I don't see why can't we use them. Hmm, I don't know what it would break since it is a new "code" argument. But the man page says: "On success, arch_prctl() returns 0; on error, -1 is returned, and errno is set to indicate the error." So just change the man pages? "On success, arch_prctl() returns positive values; on error, -1 is returned, and errno is set to indicate the error."
On Fri, May 13 2022 at 23:50, Edgecombe, Rick P wrote: > On Sat, 2022-05-14 at 02:09 +0300, Kirill A. Shutemov wrote: >> On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P wrote: >> > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote: >> > > > + >> > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ >> > > > + >> > > > + task->thread.features |= features; >> > > > +out: >> > > > + return task->thread.features; >> > > >> > > Isn't arch_prctl() supposed to return 0 on success? >> > >> > Hmm, good point. Maybe we'll need a struct to pass info in and out. >> >> But values >0 are unused. I don't see why can't we use them. > > Hmm, I don't know what it would break since it is a new "code" > argument. But the man page says: > "On success, arch_prctl() returns 0; on error, -1 is returned, and > errno is set to indicate the error." > > So just change the man pages? > "On success, arch_prctl() returns positive values; on error, -1 is > returned, and errno is set to indicate the error." Why? prctl(GET, &out) is the pattern used all over the place. Thanks, tglx
On Sat, 2022-05-14 at 10:37 +0200, Thomas Gleixner wrote: > On Fri, May 13 2022 at 23:50, Edgecombe, Rick P wrote: > > On Sat, 2022-05-14 at 02:09 +0300, Kirill A. Shutemov wrote: > > > On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P > > > wrote: > > > > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote: > > > > > > + > > > > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */ > > > > > > + > > > > > > + task->thread.features |= features; > > > > > > +out: > > > > > > + return task->thread.features; > > > > > > > > > > Isn't arch_prctl() supposed to return 0 on success? > > > > > > > > Hmm, good point. Maybe we'll need a struct to pass info in and > > > > out. > > > > > > But values >0 are unused. I don't see why can't we use them. > > > > Hmm, I don't know what it would break since it is a new "code" > > argument. But the man page says: > > "On success, arch_prctl() returns 0; on error, -1 is returned, and > > errno is set to indicate the error." > > > > So just change the man pages? > > "On success, arch_prctl() returns positive values; on error, -1 is > > returned, and errno is set to indicate the error." > > Why? > > prctl(GET, &out) > > is the pattern used all over the place. It seems better to me, but we also need to pass something in. The idea of the "enable" operation is that userspace would pass in all the features that it wants in one call, and then find out back what was successfully enabled. So unlike the other arch_prctl()s, it wants to pass something in AND get a result in one arch_prctl() call. It doesn't need to check what is supported ahead of time. Since these enabling operations can fail (OOM, etc), userspace has to handle unexpected per- feature failure anyway. So it just blindly asks for what it wants. Any objections to having it write back the result in the same structure? Otherwise, the option that used to be used here was a "status" arch_prctl(), which was called separately to find out what actually got enabled after an "enable" call. That fit with the GET/SET semantics already in place. I guess we could also get rid of the batch enabling idea, and just have one "enable" call per feature too. But then it is syscall heavy. Any chance you could weigh in on this?
On Sat, May 14 2022 at 23:06, Edgecombe, Rick P wrote: > On Sat, 2022-05-14 at 10:37 +0200, Thomas Gleixner wrote: >> > "On success, arch_prctl() returns positive values; on error, -1 is >> > returned, and errno is set to indicate the error." >> >> Why? >> >> prctl(GET, &out) >> >> is the pattern used all over the place. > > It seems better to me, but we also need to pass something in. > > The idea of the "enable" operation is that userspace would pass in all > the features that it wants in one call, and then find out back what was > successfully enabled. So unlike the other arch_prctl()s, it wants to > pass something in AND get a result in one arch_prctl() call. It doesn't > need to check what is supported ahead of time. Since these enabling > operations can fail (OOM, etc), userspace has to handle unexpected per- > feature failure anyway. So it just blindly asks for what it wants. I'm not convinced at all, that this wholesale enabling of independent features makes any sense. That would require: - that all features are strict binary of/off which is alredy not the case with LAM due to the different mask sizes. - that user space knows at some potentially early point of process startup which features it needs. Some of them might be requested later when libraries are loaded, but that would be too late for others. Aside of that, if you lump all these things together, what's the semantics of this feature lump in case of failure: Try to enable the rest when one fails, skip the rest, roll back? Also such a feature lump results in a demultiplexing prctl() which is code wise always inferior to dedicated controls. > Any objections to having it write back the result in the same > structure? Why not. > Otherwise, the option that used to be used here was a "status" > arch_prctl(), which was called separately to find out what actually got > enabled after an "enable" call. That fit with the GET/SET semantics > already in place. > > I guess we could also get rid of the batch enabling idea, and just have > one "enable" call per feature too. But then it is syscall heavy. This is not a runtime hotpath problem. Those prctls() happen once when the process starts, so having three which are designed for the individual purpose instead of one ill defined is definitely the better choice. Premature optimization is never a good idea. Keep it simple is the right starting point. If it really turns out to be something which matters, then you can provide a batch interface later on if it makes sense to do so, but see above. Thanks, tglx
On Sun, 2022-05-15 at 11:02 +0200, Thomas Gleixner wrote: > > Otherwise, the option that used to be used here was a "status" > > arch_prctl(), which was called separately to find out what actually > > got > > enabled after an "enable" call. That fit with the GET/SET semantics > > already in place. > > > > I guess we could also get rid of the batch enabling idea, and just > > have > > one "enable" call per feature too. But then it is syscall heavy. > > This is not a runtime hotpath problem. Those prctls() happen once > when > the process starts, so having three which are designed for the > individual purpose instead of one ill defined is definitely the > better > choice. > > Premature optimization is never a good idea. Keep it simple is the > right > starting point. > > If it really turns out to be something which matters, then you can > provide a batch interface later on if it makes sense to do so, but > see > above. Thanks, sounds good to me. Kirill, so I guess we can just change ARCH_THREAD_FEATURE_ENABLE/ ARCH_THREAD_FEATURE_DISABLE to return EINVAL if more than one bit is set. It returns 0 on success and whatever error code on failure. Userspace can do whatever rollback logic it wants. What do you think?
On Sun, May 15 2022 at 18:24, Edgecombe, Rick P wrote: > On Sun, 2022-05-15 at 11:02 +0200, Thomas Gleixner wrote: >> If it really turns out to be something which matters, then you can >> provide a batch interface later on if it makes sense to do so, but >> see >> above. > > Thanks, sounds good to me. > > Kirill, so I guess we can just change ARCH_THREAD_FEATURE_ENABLE/ > ARCH_THREAD_FEATURE_DISABLE to return EINVAL if more than one bit is > set. It returns 0 on success and whatever error code on failure. > Userspace can do whatever rollback logic it wants. What do you think? Why having this feature bit interface in the first place? It's going to be a demultiplex mechanism with incompatible arguments. Just look at LAM. What's really architecture specific about it? The mechanism per se is architecture independent: pointer tagging. What's architecture specific is whether it's supported, the address mask and the enable/disable mechanism. So having e.g. prctl(POINTER_TAGGING_GET_MASK, &mask); works on all architectures which support this. Ditto prctl(POINTER_TAGGING_ENABLE, &mask); is architecture agnostic. Both need to be backed by an architecture specific implementation of course. This makes it future proof because new CPUs could define the mask to be bit 57-61 and use bit 62 for something else. So from a user space perspective the mask retrival is useful because it's obvious and trivial to use and does not need code changes when the hardware implementation provides a different mask. See? The thread.features bitmap could still be used as an internal storage for enabled features, but having this as the primary programming interface is cumbersome and unflexible for anything which is not binary on/off. Thanks, tglx
On Sun, 2022-05-15 at 21:38 +0200, Thomas Gleixner wrote: > On Sun, May 15 2022 at 18:24, Edgecombe, Rick P wrote: > > On Sun, 2022-05-15 at 11:02 +0200, Thomas Gleixner wrote: > > > If it really turns out to be something which matters, then you > > > can > > > provide a batch interface later on if it makes sense to do so, > > > but > > > see > > > above. > > > > Thanks, sounds good to me. > > > > Kirill, so I guess we can just change ARCH_THREAD_FEATURE_ENABLE/ > > ARCH_THREAD_FEATURE_DISABLE to return EINVAL if more than one bit > > is > > set. It returns 0 on success and whatever error code on failure. > > Userspace can do whatever rollback logic it wants. What do you > > think? > > Why having this feature bit interface in the first place? The idea was that we should not have duplicate interfaces if we can avoid it. It of course grew out of the "elf feature bit" stuff, but we considered splitting them after moving away from that. LAM and CET's enabling needs seemed close enough to avoid having two interfaces. > > It's going to be a demultiplex mechanism with incompatible > arguments. Just look at LAM. What's really architecture specific > about > it? > > The mechanism per se is architecture independent: pointer tagging. > > What's architecture specific is whether it's supported, the address > mask > and the enable/disable mechanism. > > So having e.g. > > prctl(POINTER_TAGGING_GET_MASK, &mask); > > works on all architectures which support this. Ditto > > prctl(POINTER_TAGGING_ENABLE, &mask); > > is architecture agnostic. Both need to be backed by an architecture > specific implementation of course. > > This makes it future proof because new CPUs could define the mask to > be > bit 57-61 and use bit 62 for something else. So from a user space > perspective the mask retrival is useful because it's obvious and > trivial > to use and does not need code changes when the hardware > implementation > provides a different mask. The lack of ability to pass extra arguments is a good point. > > See? Regarding making it arch specific or not, if the LAM interface can be arch agnostic, then that makes sense to me. I guess some CPU features (virtual memory, etc) are similar enough that the kernel can hide them beyond common interfaces. Some aren't (cpuid, gs register, etc). If LAM can be one of the former, then sharing an interface with other architectures does seem much better. I'm thinking CET is different enough from other similar features that leaving it as an arch thing is probably appropriate. BTI is probably the closest (to IBT). It uses it's own BTI specific elf header bit, and requires special PROT on memory, unlike IBT. > > The thread.features bitmap could still be used as an internal storage > for enabled features, but having this as the primary programming > interface is cumbersome and unflexible for anything which is not > binary > on/off. > > Thanks, > > tglx > >
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 91d0f93a00c7..ff0c34e18cc6 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -530,6 +530,9 @@ struct thread_struct { */ u32 pkru; + unsigned long features; + unsigned long features_locked; + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 500b96e71f18..67fc30d36c73 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -20,4 +20,9 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Never implement 0x3001, it will confuse old glibc's */ +#define ARCH_THREAD_FEATURE_ENABLE 0x3002 +#define ARCH_THREAD_FEATURE_DISABLE 0x3003 +#define ARCH_THREAD_FEATURE_LOCK 0x3004 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index b370767f5b19..cb8fc28f2eae 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -367,6 +367,10 @@ void arch_setup_new_exec(void) task_clear_spec_ssb_noexec(current); speculation_ctrl_update(read_thread_flags()); } + + /* Reset thread features on exec */ + current->thread.features = 0; + current->thread.features_locked = 0; } #ifdef CONFIG_X86_IOPL_IOPERM @@ -985,6 +989,35 @@ unsigned long __get_wchan(struct task_struct *p) return addr; } +static long thread_feature_prctl(struct task_struct *task, int option, + unsigned long features) +{ + const unsigned long known_features = 0; + + if (features & ~known_features) + return -EINVAL; + + if (option == ARCH_THREAD_FEATURE_LOCK) { + task->thread.features_locked |= features; + return task->thread.features_locked; + } + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + if (option == ARCH_THREAD_FEATURE_DISABLE) { + task->thread.features &= ~features; + goto out; + } + + /* Handle ARCH_THREAD_FEATURE_ENABLE */ + + task->thread.features |= features; +out: + return task->thread.features; +} + long do_arch_prctl_common(struct task_struct *task, int option, unsigned long arg2) { @@ -999,6 +1032,10 @@ long do_arch_prctl_common(struct task_struct *task, int option, case ARCH_GET_XCOMP_GUEST_PERM: case ARCH_REQ_XCOMP_GUEST_PERM: return fpu_xstate_prctl(task, option, arg2); + case ARCH_THREAD_FEATURE_ENABLE: + case ARCH_THREAD_FEATURE_DISABLE: + case ARCH_THREAD_FEATURE_LOCK: + return thread_feature_prctl(task, option, arg2); } return -EINVAL;
Add three new arch_prctl() handles: - ARCH_THREAD_FEATURE_ENABLE/DISABLE enables or disables the specified features. Returns what features are enabled after the operation. - ARCH_THREAD_FEATURE_LOCK prevents future disabling or enabling of the specified features. Returns the new set of locked features. The features handled per-thread and inherited over fork(2)/clone(2), but reset on exec(). This is preparation patch. It does not impelement any features. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> --- arch/x86/include/asm/processor.h | 3 +++ arch/x86/include/uapi/asm/prctl.h | 5 +++++ arch/x86/kernel/process.c | 37 +++++++++++++++++++++++++++++++ 3 files changed, 45 insertions(+)