[v2] core, x86: Add user return notifiers

Message ID 1253342422-13811-1-git-send-email-avi@redhat.com (mailing list archive)
State New, archived

Commit Message

Avi Kivity Sept. 19, 2009, 6:40 a.m. UTC
Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace.  The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.

Signed-off-by: Avi Kivity <avi@redhat.com>
---

v2: include new files in patch


 arch/Kconfig                         |   10 +++++++
 arch/x86/Kconfig                     |    1 +
 arch/x86/include/asm/thread_info.h   |    7 +++-
 arch/x86/kernel/process.c            |    2 +
 arch/x86/kernel/signal.c             |    3 ++
 include/linux/user-return-notifier.h |   42 +++++++++++++++++++++++++++++++
 kernel/Makefile                      |    1 +
 kernel/user-return-notifier.c        |   46 ++++++++++++++++++++++++++++++++++
 8 files changed, 110 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/user-return-notifier.h
 create mode 100644 kernel/user-return-notifier.c

Comments

Avi Kivity Sept. 22, 2009, 9:25 a.m. UTC | #1
On 09/19/2009 09:40 AM, Avi Kivity wrote:
> Add a general per-cpu notifier that is called whenever the kernel is
> about to return to userspace.  The notifier uses a thread_info flag
> and existing checks, so there is no impact on user return or context
> switch fast paths.
>    

Ingo/Peter?
Arjan van de Ven Sept. 22, 2009, 9:37 a.m. UTC | #2
On Tue, 22 Sep 2009 12:25:33 +0300
Avi Kivity <avi@redhat.com> wrote:

> On 09/19/2009 09:40 AM, Avi Kivity wrote:
> > Add a general per-cpu notifier that is called whenever the kernel is
> > about to return to userspace.  The notifier uses a thread_info flag
> > and existing checks, so there is no impact on user return or context
> > switch fast paths.
> >    
> 
> Ingo/Peter?

isn't this like really expensive when used ?
Avi Kivity Sept. 22, 2009, 9:48 a.m. UTC | #3
On 09/22/2009 12:37 PM, Arjan van de Ven wrote:
> On Tue, 22 Sep 2009 12:25:33 +0300
> Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 09/19/2009 09:40 AM, Avi Kivity wrote:
>>      
>>> Add a general per-cpu notifier that is called whenever the kernel is
>>> about to return to userspace.  The notifier uses a thread_info flag
>>> and existing checks, so there is no impact on user return or context
>>> switch fast paths.
>>>
>>>        
>> Ingo/Peter?
>>      
> isn't this like really expensive when used ?
>    

No, why?  It triggers a call to do_notify_resume() and walks a list of 
length 1.
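
To make that cost concrete, here is a minimal userspace sketch of the armed path, not the kernel code itself: one flag test followed by a walk over a singly linked list holding one entry. The names demo_notifier, demo_register, demo_fire and demo_flags are hypothetical stand-ins for the per-cpu hlist and the thread_info flag in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct user_return_notifier. */
struct demo_notifier {
	void (*on_user_return)(struct demo_notifier *n);
	struct demo_notifier *next;
};

#define DEMO_FLAG_USER_RETURN_NOTIFY (1u << 11)

static struct demo_notifier *demo_list;	/* models one cpu's notifier list */
static unsigned int demo_flags;		/* models the thread_info flags word */
static int demo_calls;			/* counts callback invocations */

/* Example callback: only counts how often it ran. */
static void demo_count(struct demo_notifier *n)
{
	(void)n;
	demo_calls++;
}

/* Arm the flag and link the notifier at the head of the list. */
static void demo_register(struct demo_notifier *n)
{
	assert(n->on_user_return != NULL);
	n->next = demo_list;
	demo_list = n;
	demo_flags |= DEMO_FLAG_USER_RETURN_NOTIFY;
}

/* Models the slow path; returns the number of callbacks that ran. */
static int demo_fire(void)
{
	struct demo_notifier *n, *next;
	int ran = 0;

	if (!(demo_flags & DEMO_FLAG_USER_RETURN_NOTIFY))
		return 0;	/* not armed: nothing to do */

	for (n = demo_list; n; n = next) {
		next = n->next;	/* read first, the callback may unlink n */
		n->on_user_return(n);
		ran++;
	}
	return ran;
}
```

With a single registered notifier, demo_fire() runs exactly one indirect call per return to userspace, which is the "list of length 1" case described above.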
Ingo Molnar Sept. 22, 2009, 2:32 p.m. UTC | #4
* Avi Kivity <avi@redhat.com> wrote:

> On 09/19/2009 09:40 AM, Avi Kivity wrote:
>> Add a general per-cpu notifier that is called whenever the kernel is
>> about to return to userspace.  The notifier uses a thread_info flag
>> and existing checks, so there is no impact on user return or context
>> switch fast paths.
>>    
>
> Ingo/Peter?

Would be nice to convert some existing open-coded return-to-user-space 
logic to this facility. One such candidate would be lockdep_sys_exit?

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Sept. 22, 2009, 2:45 p.m. UTC | #5
On 09/22/2009 05:32 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 09/19/2009 09:40 AM, Avi Kivity wrote:
>>      
>>> Add a general per-cpu notifier that is called whenever the kernel is
>>> about to return to userspace.  The notifier uses a thread_info flag
>>> and existing checks, so there is no impact on user return or context
>>> switch fast paths.
>>>
>>>        
>> Ingo/Peter?
>>      
> Would be nice to convert some existing open-coded return-to-user-space
> logic to this facility. One such candidate would be lockdep_sys_exit?
>    

I only implemented this for x86, while lockdep is arch independent.  If 
arch support is added, it should be trivial.

I think perf counters could use preempt notifiers though, these are arch 
independent.
H. Peter Anvin Sept. 22, 2009, 3:19 p.m. UTC | #6
Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> 
>> On 09/19/2009 09:40 AM, Avi Kivity wrote:
>>> Add a general per-cpu notifier that is called whenever the kernel is
>>> about to return to userspace.  The notifier uses a thread_info flag
>>> and existing checks, so there is no impact on user return or context
>>> switch fast paths.
>>>    
>> Ingo/Peter?
> 
> Would be nice to convert some existing open-coded return-to-user-space 
> logic to this facility. One such candidate would be lockdep_sys_exit?
> 
> 	Ingo

Sorry, limited bandwidth due to LinuxCon, but I like the concept, and 
the previous (partial) patch was really clean.  I agree with Ingo that 
arch support so we can use this as a general facility would be nice, but 
I don't consider that as a prerequisite for merging.

	-hpa

Avi Kivity Sept. 22, 2009, 3:50 p.m. UTC | #7
On 09/22/2009 05:45 PM, Avi Kivity wrote:
>> Would be nice to convert some existing open-coded return-to-user-space
>> logic to this facility. One such candidate would be lockdep_sys_exit?
>
> I only implemented this for x86, while lockdep is arch independent.  
> If arch support is added, it should be trivial.
>

The lockdep_sys_exit bit is actually x86/s390 only, and can easily be 
adapted to use the new functionality on x86 only.  I'll try it out.
Peter Zijlstra Sept. 22, 2009, 4:50 p.m. UTC | #8
On Tue, 2009-09-22 at 16:32 +0200, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> 
> > On 09/19/2009 09:40 AM, Avi Kivity wrote:
> >> Add a general per-cpu notifier that is called whenever the kernel is
> >> about to return to userspace.  The notifier uses a thread_info flag
> >> and existing checks, so there is no impact on user return or context
> >> switch fast paths.
> >>    
> >
> > Ingo/Peter?
> 
> Would be nice to convert some existing open-coded return-to-user-space 
> logic to this facility. One such candidate would be lockdep_sys_exit?

And here I was thinking this was one of the hottest code paths in the
whole kernel...

Avi Kivity Sept. 22, 2009, 4:52 p.m. UTC | #9
On 09/22/2009 07:50 PM, Peter Zijlstra wrote:
>> Would be nice to convert some existing open-coded return-to-user-space
>> logic to this facility. One such candidate would be lockdep_sys_exit?
>>      
> And here I was thinking this was one of the hottest code paths in the
> whole kernel...
>    

If you're using lockdep, surely that's not your biggest worry?
Peter Zijlstra Sept. 22, 2009, 4:55 p.m. UTC | #10
On Tue, 2009-09-22 at 19:52 +0300, Avi Kivity wrote:
> On 09/22/2009 07:50 PM, Peter Zijlstra wrote:
> >> Would be nice to convert some existing open-coded return-to-user-space
> >> logic to this facility. One such candidate would be lockdep_sys_exit?
> >>      
> > And here I was thinking this was one of the hottest code paths in the
> > whole kernel...
> >    
> 
> If you're using lockdep, surely that's not your biggest worry?

No, but that's all under #ifdef and fully disappears when not enabled.
Generic return-to-user notifiers don't sound like they will though.

Avi Kivity Sept. 22, 2009, 5:05 p.m. UTC | #11
On 09/22/2009 07:55 PM, Peter Zijlstra wrote:
>> If you're using lockdep, surely that's not your biggest worry?
>>      
> No, but that's all under #ifdef and fully disappears when not enabled.
> Generic return-to-user notifiers don't sound like they will though.
>    

They will if not selected.  If selected and not armed, they will have 
zero runtime impact since they piggyback on existing branches 
(_TIF_DO_NOTIFY_MASK and near relatives).  If selected and armed they'll 
cause __switch_to_xtra() on every context switch and do_notify_resume() 
on syscall exit until disarmed, but then you've asked for it.
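
The "piggyback on existing branches" argument can be sketched in plain C. This is a userspace illustration under stated assumptions, not the kernel's entry code: the SK_ names and bit positions are hypothetical, and sk_needs_slow_path() stands in for the single mask test the exit path already performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the positions are illustrative only. */
#define SK_SIGPENDING		(1u << 0)
#define SK_MCE_NOTIFY		(1u << 1)
#define SK_NOTIFY_RESUME	(1u << 2)
#define SK_USER_RETURN_NOTIFY	(1u << 3)

/* The work mask before and after adding the new bit. */
#define SK_MASK_BEFORE	(SK_SIGPENDING | SK_MCE_NOTIFY | SK_NOTIFY_RESUME)
#define SK_MASK_AFTER	(SK_MASK_BEFORE | SK_USER_RETURN_NOTIFY)

/*
 * One AND plus one branch, no matter how many bits the mask contains.
 * Widening the mask changes a constant, not the number of tests.
 */
static bool sk_needs_slow_path(unsigned int flags, unsigned int mask)
{
	return (flags & mask) != 0;
}

static void sk_demo(void)
{
	/* Selected but not armed: fast path, same as before the patch. */
	assert(!sk_needs_slow_path(0, SK_MASK_AFTER));
	/* Selected and armed: the slow path is taken. */
	assert(sk_needs_slow_path(SK_USER_RETURN_NOTIFY, SK_MASK_AFTER));
	/* The old mask would not have noticed the new bit at all. */
	assert(!sk_needs_slow_path(SK_USER_RETURN_NOTIFY, SK_MASK_BEFORE));
}
```

The "not selected" case has no runtime counterpart here because the code is compiled out entirely.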
Avi Kivity Sept. 22, 2009, 5:08 p.m. UTC | #12
On 09/22/2009 06:50 PM, Avi Kivity wrote:
> On 09/22/2009 05:45 PM, Avi Kivity wrote:
>>> Would be nice to convert some existing open-coded return-to-user-space
>>> logic to this facility. One such candidate would be lockdep_sys_exit?
>>
>> I only implemented this for x86, while lockdep is arch independent.  
>> If arch support is added, it should be trivial.
>>
>
> The lockdep_sys_exit bit is actually x86/s390 only, and can easily be 
> adapted to use the new functionality on x86 only.  I'll try it out.

Unfortunately it doesn't work out well.  The notifier is called until 
explicitly unregistered (since it relies on a bit in _TIF_DO_NOTIFY_MASK), 
so we have to disarm it on the first return to userspace or it spins 
forever.  We could re-arm it on the next kernel entry, but we don't have 
a kernel entry notifier so we'll just be moving hooks from one point to 
another.
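
A rough userspace model of the problem, assuming the exit path re-checks the work mask after each pass as described above. All rs_ names are hypothetical, and RS_MAX_PASSES exists only so the sketch terminates where the real path would keep spinning.

```c
#include <assert.h>
#include <stddef.h>

#define RS_USER_RETURN_NOTIFY	(1u << 0)	/* hypothetical flag bit */
#define RS_MAX_PASSES		5		/* guard for the sketch only */

static unsigned int rs_flags;	/* models the thread_info flags word */

typedef void (*rs_handler)(void);

/* A handler that stays registered: the flag remains set. */
static void rs_sticky(void)
{
}

/* A handler that disarms itself on the first return to userspace. */
static void rs_one_shot(void)
{
	rs_flags &= ~RS_USER_RETURN_NOTIFY;
}

/*
 * Models the return path: loop while work is pending.  Returns the
 * number of slow-path passes taken before the flag cleared or the
 * guard was hit.
 */
static int rs_return_to_user(rs_handler h)
{
	int passes = 0;

	assert(h != NULL);
	while ((rs_flags & RS_USER_RETURN_NOTIFY) && passes < RS_MAX_PASSES) {
		h();
		passes++;
	}
	return passes;
}
```

With rs_one_shot the loop exits after one pass; with rs_sticky it only stops because of the guard, which is the "spins forever" case.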
Peter Zijlstra Sept. 22, 2009, 6:06 p.m. UTC | #13
On Tue, 2009-09-22 at 16:32 +0200, Ingo Molnar wrote:

> Would be nice to convert some existing open-coded return-to-user-space 
> logic to this facility. One such candidate would be lockdep_sys_exit?

I don't really like lockdep_sys_exit() in such a call,
lockdep_sys_exit() is currently placed such that it will execute as the
last C code before returning to user-space.

Put it any earlier and you've got a window where people can leak a lock
to userspace undetected.

Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index beea3cc..b1d0757 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -82,6 +82,13 @@  config KRETPROBES
 	def_bool y
 	depends on KPROBES && HAVE_KRETPROBES
 
+config USER_RETURN_NOTIFIER
+	bool
+	depends on HAVE_USER_RETURN_NOTIFIER
+	help
+	  Provide a kernel-internal notification when a cpu is about to
+	  switch to user mode.
+
 config HAVE_IOREMAP_PROT
 	bool
 
@@ -125,4 +132,7 @@  config HAVE_DMA_API_DEBUG
 config HAVE_DEFAULT_NO_SPIN_MUTEXES
 	bool
 
+config HAVE_USER_RETURN_NOTIFIER
+	bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fc20fdc..ed21d6a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -50,6 +50,7 @@  config X86
 	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_LZMA
 	select HAVE_ARCH_KMEMCHECK
+	select HAVE_USER_RETURN_NOTIFIER
 
 config OUTPUT_FORMAT
 	string
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d27d0a2..375c917 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -83,6 +83,7 @@  struct thread_info {
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SECCOMP		8	/* secure computing */
 #define TIF_MCE_NOTIFY		10	/* notify userspace of an MCE */
+#define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* 32bit process */
 #define TIF_FORK		18	/* ret_from_fork */
@@ -107,6 +108,7 @@  struct thread_info {
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
 #define _TIF_MCE_NOTIFY		(1 << TIF_MCE_NOTIFY)
+#define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_FORK		(1 << TIF_FORK)
@@ -142,13 +144,14 @@  struct thread_info {
 
 /* Only used for 64 bit */
 #define _TIF_DO_NOTIFY_MASK						\
-	(_TIF_SIGPENDING|_TIF_MCE_NOTIFY|_TIF_NOTIFY_RESUME)
+	(_TIF_SIGPENDING | _TIF_MCE_NOTIFY | _TIF_NOTIFY_RESUME |	\
+	 _TIF_USER_RETURN_NOTIFY)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW							\
 	(_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC)
 
-#define _TIF_WORK_CTXSW_PREV _TIF_WORK_CTXSW
+#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW|_TIF_DEBUG)
 
 #define PREEMPT_ACTIVE		0x10000000
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 071166a..7ea6972 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -9,6 +9,7 @@ 
 #include <linux/pm.h>
 #include <linux/clockchips.h>
 #include <linux/random.h>
+#include <linux/user-return-notifier.h>
 #include <trace/power.h>
 #include <asm/system.h>
 #include <asm/apic.h>
@@ -227,6 +228,7 @@  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		 */
 		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
 	}
+	propagate_user_return_notify(prev_p, next_p);
 }
 
 int sys_fork(struct pt_regs *regs)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 81e5823..13aa99c 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -19,6 +19,7 @@ 
 #include <linux/stddef.h>
 #include <linux/personality.h>
 #include <linux/uaccess.h>
+#include <linux/user-return-notifier.h>
 
 #include <asm/processor.h>
 #include <asm/ucontext.h>
@@ -872,6 +873,8 @@  do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 		if (current->replacement_session_keyring)
 			key_replace_session_keyring();
 	}
+	if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
+		fire_user_return_notifiers();
 
 #ifdef CONFIG_X86_32
 	clear_thread_flag(TIF_IRET);
diff --git a/include/linux/user-return-notifier.h b/include/linux/user-return-notifier.h
new file mode 100644
index 0000000..ef04e2e
--- /dev/null
+++ b/include/linux/user-return-notifier.h
@@ -0,0 +1,42 @@ 
+#ifndef _LINUX_USER_RETURN_NOTIFIER_H
+#define _LINUX_USER_RETURN_NOTIFIER_H
+
+#ifdef CONFIG_USER_RETURN_NOTIFIER
+
+#include <linux/list.h>
+#include <linux/sched.h>
+
+struct user_return_notifier {
+	void (*on_user_return)(struct user_return_notifier *urn);
+	struct hlist_node link;
+};
+
+
+void user_return_notifier_register(struct user_return_notifier *urn);
+void user_return_notifier_unregister(struct user_return_notifier *urn);
+
+static inline void propagate_user_return_notify(struct task_struct *prev,
+						struct task_struct *next)
+{
+	if (test_tsk_thread_flag(prev, TIF_USER_RETURN_NOTIFY)) {
+		clear_tsk_thread_flag(prev, TIF_USER_RETURN_NOTIFY);
+		set_tsk_thread_flag(next, TIF_USER_RETURN_NOTIFY);
+	}
+}
+
+void fire_user_return_notifiers(void);
+
+#else
+
+struct user_return_notifier {};
+
+static inline void propagate_user_return_notify(struct task_struct *prev,
+						struct task_struct *next)
+{
+}
+
+static inline void fire_user_return_notifiers(void) {}
+
+#endif
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 961379c..f6abe84 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -98,6 +98,7 @@  obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_SMP) += sched_cpupri.o
 obj-$(CONFIG_SLOW_WORK) += slow-work.o
 obj-$(CONFIG_PERF_COUNTERS) += perf_counter.o
+obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff --git a/kernel/user-return-notifier.c b/kernel/user-return-notifier.c
new file mode 100644
index 0000000..530ccb8
--- /dev/null
+++ b/kernel/user-return-notifier.c
@@ -0,0 +1,46 @@ 
+
+#include <linux/user-return-notifier.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+
+static DEFINE_PER_CPU(struct hlist_head, return_notifier_list);
+
+#define URN_LIST_HEAD per_cpu(return_notifier_list, raw_smp_processor_id())
+
+/*
+ * Request a notification when the current cpu returns to userspace.  Must be
+ * called in atomic context.  The notifier will also be called in atomic
+ * context.
+ */
+void user_return_notifier_register(struct user_return_notifier *urn)
+{
+	set_tsk_thread_flag(current, TIF_USER_RETURN_NOTIFY);
+	hlist_add_head(&urn->link, &URN_LIST_HEAD);
+}
+EXPORT_SYMBOL_GPL(user_return_notifier_register);
+
+/*
+ * Removes a registered user return notifier.  Must be called from atomic
+ * context, and from the same cpu registration occurred in.
+ */
+void user_return_notifier_unregister(struct user_return_notifier *urn)
+{
+	hlist_del(&urn->link);
+	if (hlist_empty(&URN_LIST_HEAD))
+		clear_tsk_thread_flag(current, TIF_USER_RETURN_NOTIFY);
+}
+EXPORT_SYMBOL_GPL(user_return_notifier_unregister);
+
+/* Calls registered user return notifiers */
+void fire_user_return_notifiers(void)
+{
+	struct user_return_notifier *urn;
+	struct hlist_node *tmp1, *tmp2;
+	struct hlist_head *head;
+
+	head = &get_cpu_var(return_notifier_list);
+	hlist_for_each_entry_safe(urn, tmp1, tmp2, head, link)
+		urn->on_user_return(urn);
+	put_cpu_var(return_notifier_list);
+}
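
For readers following the patch: the notifier list is per-cpu, but the flag that triggers it lives in per-task thread_info, so the flag has to follow whichever task is running on the cpu. The userspace sketch below mirrors propagate_user_return_notify() under that reading. struct pg_task and the pg_ names are hypothetical, and only the bit number 11 is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Bit number taken from TIF_USER_RETURN_NOTIFY in the patch. */
#define PG_USER_RETURN_NOTIFY	(1u << 11)

/* Hypothetical stand-in for a task's thread_info flags. */
struct pg_task {
	unsigned int flags;
};

/*
 * On context switch, hand the "notifier armed on this cpu" bit from
 * the outgoing task to the incoming one.  If the bit is not set,
 * nothing is touched.
 */
static void pg_propagate(struct pg_task *prev, struct pg_task *next)
{
	assert(prev != NULL && next != NULL);
	if (prev->flags & PG_USER_RETURN_NOTIFY) {
		prev->flags &= ~PG_USER_RETURN_NOTIFY;
		next->flags |= PG_USER_RETURN_NOTIFY;
	}
}
```

This is also why the patch adds the bit to _TIF_WORK_CTXSW_PREV: the hand-off only needs to run when the outgoing task carries the flag.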