diff mbox series

[v33,15/21] x86/vdso: Add support for exception fixup in vDSO functions

Message ID 20200617220844.57423-16-jarkko.sakkinen@linux.intel.com (mailing list archive)
State Rejected
Headers show
Series Intel SGX foundations | expand

Commit Message

Jarkko Sakkinen June 17, 2020, 10:08 p.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

The basic concept and implementation is very similar to the kernel's
exception fixup mechanism.  The key differences are that the kernel
handler is hardcoded and the fixup entry addresses are relative to
the overall table as opposed to individual entries.

Hardcoding the kernel handler avoids the need to figure out how to
get userspace code to point at a kernel function.  Given that the
expected usage is to propagate information to userspace, dumping all
fault information into registers is likely the desired behavior for
the vast majority of yet-to-be-created functions.  Use registers
DI, SI and DX to communicate fault information, which follows Linux's
ABI for register consumption and hopefully avoids conflict with
hardware features that might leverage the fixup capabilities, e.g.
register usage for SGX instructions was at least partially designed
with calling conventions in mind.

Making fixup addresses relative to the overall table allows the table
to be stripped from the final vDSO image (it's a kernel construct)
without complicating the offset logic, e.g. entry-relative addressing
would also need to account for the table's location relative to the
image.

Regarding stripping the table, modify vdso2c to extract the table from
the raw, a.k.a. unstripped, data and dump it as a standalone byte array
in the resulting .c file.  The original base of the table, its length
and a pointer to the byte array are captured in struct vdso_image.
Alternatively, the table could be dumped directly into the struct,
but because the number of entries can vary per image, that would
require either hardcoding a max sized table into the struct definition
or defining the table as a flexible length array.  The flexible length
array approach has zero benefits, e.g. the base/size are still needed,
and prevents reusing the extraction code, while hardcoding the max size
adds ongoing maintenance just to avoid exporting the explicit size.

The immediate use case is for Intel Software Guard Extensions (SGX).
SGX introduces a new CPL3-only "enclave" mode that runs as a sort of
black box shared object that is hosted by an untrusted "normal" CPl3
process.

Entering an enclave can only be done through SGX-specific instructions,
EENTER and ERESUME, and is a non-trivial process.  Because of the
complexity of transitioning to/from an enclave, the vast majority of
enclaves are expected to utilize a library to handle the actual
transitions.  This is roughly analogous to how e.g. libc implementations
are used by most applications.

Another crucial characteristic of SGX enclaves is that they can generate
exceptions as part of their normal (at least as "normal" as SGX can be)
operation that need to be handled *in* the enclave and/or are unique
to SGX.

And because they are essentially fancy shared objects, a process can
host any number of enclaves, each of which can execute multiple threads
simultaneously.

Putting everything together, userspace enclaves will utilize a library
that must be prepared to handle any and (almost) all exceptions any time
at least one thread may be executing in an enclave.  Leveraging signals
to handle the enclave exceptions is unpleasant, to put it mildly, e.g.
the SGX library must constantly (un)register its signal handler based
on whether or not at least one thread is executing in an enclave, and
filter and forward exceptions that aren't related to its enclaves.  This
becomes particularly nasty when using multiple levels of libraries that
register signal handlers, e.g. running an enclave via cgo inside of the
Go runtime.

Enabling exception fixup in vDSO allows the kernel to provide a vDSO
function that wraps the low-level transitions to/from the enclave, i.e.
the EENTER and ERESUME instructions.  The vDSO function can intercept
exceptions that would otherwise generate a signal and return the fault
information directly to its caller, thus avoiding the need to juggle
signal handlers.

Note that unlike the kernel's _ASM_EXTABLE_HANDLE implementation, the
'C' version of _ASM_VDSO_EXTABLE_HANDLE doesn't use a pre-compiled
assembly macro.  Duplicating four lines of code is simpler than adding
the necessary infrastructure to generate pre-compiled assembly and the
intended benefit of massaging GCC's inlining algorithm is unlikely to
realized in the vDSO any time soon, if ever.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/entry/vdso/Makefile          |  6 ++--
 arch/x86/entry/vdso/extable.c         | 46 ++++++++++++++++++++++++
 arch/x86/entry/vdso/extable.h         | 29 ++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  9 ++++-
 arch/x86/entry/vdso/vdso2c.h          | 50 ++++++++++++++++++++++++++-
 arch/x86/include/asm/vdso.h           |  5 +++
 6 files changed, 140 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/entry/vdso/extable.c
 create mode 100644 arch/x86/entry/vdso/extable.h

Comments

Borislav Petkov June 29, 2020, 5:10 p.m. UTC | #1
On Thu, Jun 18, 2020 at 01:08:37AM +0300, Jarkko Sakkinen wrote:
...
> intended benefit of massaging GCC's inlining algorithm is unlikely to
> realized in the vDSO any time soon, if ever.

That is a very good explanation and I would prefer if it would be in a
sgx-specific README or so instead of it getting lost in git...

> +bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
> +			  unsigned long error_code, unsigned long fault_addr)
> +{
> +	const struct vdso_image *image = current->mm->context.vdso_image;
> +	const struct vdso_exception_table_entry *extable;
> +	unsigned int nr_entries, i;
> +	unsigned long base;
> +
> +	/*
> +	 * Do not attempt to fixup #DB or #BP.  It's impossible to identify
> +	 * whether or not a #DB/#BP originated from within an SGX enclave and
> +	 * SGX enclaves are currently the only use case for vDSO fixup.
> +	 */

So this is all fine and dandy but nowhere do I see the code doing:

	if (am_I_an_sgx_enclave(tsk))
		fixup_vdso_exception()

because that vDSO exception fixup, albeit it looking kinda generic, is
SGX-only for now. So it should be designed to run only for SGX enclaves
for now.

Also, is there any particular reason for fixup_vdso_exception() to be in
arch/x86/entry/vdso/extable.c instead of in arch/x86/mm/extable.c?

I mean, it gets called by traps.c so it looks like normal kernel code to
me or am I missing some vdso magic?

And built only when CONFIG_INTEL_SGX is enabled.

And so on...

...

> diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
> new file mode 100644
> index 000000000000..aafdac396948
> --- /dev/null
> +++ b/arch/x86/entry/vdso/extable.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __VDSO_EXTABLE_H
> +#define __VDSO_EXTABLE_H
> +
> +/*
> + * Inject exception fixup for vDSO code.  Unlike normal exception fixup,
> + * vDSO uses a dedicated handler the addresses are relative to the overall
> + * exception table, not each individual entry.
> + */
> +#ifdef __ASSEMBLY__
> +#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
> +	ASM_VDSO_EXTABLE_HANDLE from to
> +
> +.macro ASM_VDSO_EXTABLE_HANDLE from:req to:req
> +	.pushsection __ex_table, "a"
> +	.long (\from) - __ex_table
> +	.long (\to) - __ex_table
> +	.popsection
> +.endm
> +#else
> +#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
> +	".pushsection __ex_table, \"a\"\n"      \
> +	".long (" #from ") - __ex_table\n"      \
> +	".long (" #to ") - __ex_table\n"        \
> +	".popsection\n"
> +#endif
> +
> +#endif /* __VDSO_EXTABLE_H */
> +

.git/rebase-apply/patch:122: new blank line at EOF.
+

Thx.
Sean Christopherson June 30, 2020, 6 a.m. UTC | #2
On Mon, Jun 29, 2020 at 07:10:22PM +0200, Borislav Petkov wrote:
> On Thu, Jun 18, 2020 at 01:08:37AM +0300, Jarkko Sakkinen wrote:
> ...
> > intended benefit of massaging GCC's inlining algorithm is unlikely to
> > realized in the vDSO any time soon, if ever.
> 
> That is a very good explanation and I would prefer if it would be in a
> sgx-specific README or so instead of it getting lost in git...
> 
> > +bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
> > +			  unsigned long error_code, unsigned long fault_addr)
> > +{
> > +	const struct vdso_image *image = current->mm->context.vdso_image;
> > +	const struct vdso_exception_table_entry *extable;
> > +	unsigned int nr_entries, i;
> > +	unsigned long base;
> > +
> > +	/*
> > +	 * Do not attempt to fixup #DB or #BP.  It's impossible to identify
> > +	 * whether or not a #DB/#BP originated from within an SGX enclave and
> > +	 * SGX enclaves are currently the only use case for vDSO fixup.
> > +	 */
> 
> So this is all fine and dandy but nowhere do I see the code doing:
> 
> 	if (am_I_an_sgx_enclave(tsk))
> 		fixup_vdso_exception()
> 
> because that vDSO exception fixup, albeit it looking kinda generic, is
> SGX-only for now. So it should be designed to run only for SGX enclaves
> for now.

That's not really feasible as there is no readily available identifier for
an SGX task.  The only indication that a relevant task is an SGX task is if
it has mmap()'d /dev/sgx/enclave, and hooking that would be heinous.  And
adding flag just to tag the task as SGX seems wasteful.

Even if we could easily condition the vDSO fixup on SGX tasks, I don't think
that'd be a good ABI for the SGX vDSO code.  The intended contract is that
fixup will happen simply by virtue of the code at the related IP taking a
fault (in userspace).  E.g. the vDSO function should get the fixup even if
userspace screws up mmap() and invokes __vdso_sgx_enter_enclave() without
being tagged an SGX task.

> Also, is there any particular reason for fixup_vdso_exception() to be in
> arch/x86/entry/vdso/extable.c instead of in arch/x86/mm/extable.c?
> 
> I mean, it gets called by traps.c so it looks like normal kernel code to
> me or am I missing some vdso magic?

No hard dependency, it's normal kernel code.  My reasoning for dropping it
in .../vdso was largely to co-locate it with vdso/extable.h due to the
dependency on the format of 'struct vdso_exception_table_entry'.  And I
put extable.h in .../vdso because it contains macros that are only for use
in actual vDSO code.
 
> And built only when CONFIG_INTEL_SGX is enabled.

Ya, shouldn't be a problem to stub it out for SGX=n.
Borislav Petkov June 30, 2020, 8:41 a.m. UTC | #3
On Mon, Jun 29, 2020 at 11:00:55PM -0700, Sean Christopherson wrote:
> E.g. the vDSO function should get the fixup even if userspace screws
> up mmap() and invokes __vdso_sgx_enter_enclave() without being tagged
> an SGX task.

I sincerely hope you don't mean this seriously.

Please add a member to task_struct which denotes that a task is an
sgx task, test that member where needed and forget real quickly about
running *any* *fixup* for unrelated tasks.

> No hard dependency, it's normal kernel code.  My reasoning for dropping it
> in .../vdso was largely to co-locate it with vdso/extable.h due to the
> dependency on the format of 'struct vdso_exception_table_entry'.

A struct which you defined instead of simply using struct
exception_table_entry even if it has a handler member which would remain
unused?

Let's not put code in vdso/ if it doesn't really belong there pls.

> Ya, shouldn't be a problem to stub it out for SGX=n.

Thx.
Sean Christopherson June 30, 2020, 2:55 p.m. UTC | #4
On Tue, Jun 30, 2020 at 10:41:28AM +0200, Borislav Petkov wrote:
> On Mon, Jun 29, 2020 at 11:00:55PM -0700, Sean Christopherson wrote:
> > E.g. the vDSO function should get the fixup even if userspace screws
> > up mmap() and invokes __vdso_sgx_enter_enclave() without being tagged
> > an SGX task.
> 
> I sincerely hope you don't mean this seriously.
> 
> Please add a member to task_struct which denotes that a task is an
> sgx task, test that member where needed and forget real quickly about
> running *any* *fixup* for unrelated tasks.

But IMO they're not unrelated if they end up in __vdso_sgx_enter_enclave().
Getting to the point where __vdso_sgx_enter_enclave() actually attempts
ENCLU (the single fixup entry) requires a very deliberate attempt to run an
enclave.  Not to mention the fixup doesn't squash the fault, it simply
morphs what would be a signal into a synchronous error.

And I don't see how to sanely track this in task_struct.  As stated before,
the only foolproof way to identify an SGX task is by tracking whether it
has a VMA backed by /dev/sgx/enclave, i.e. the flag would probably need to
reside in mm_struct.  Keying off opening /dev/sgx/enclave isn't viable as
enclaves can be handed off via SCM_RIGHTS or fork().

Putting a flag in mm_struct is doable, but it would need to be sticky to
keep things sane, e.g. clearing the flag on unmapping the enclave would
require refcounting VMAs and hooking vm_ops->close.  The refcounting is
painful (we had it for several years) and ->close prevents merging VMAs,
which IMO is far worse than unconditionally morphing fault-based signals
that originate in __vdso_sgx_enter_enclave().  And assuming we don't do the
whole ->close thing, we'd end up with MADV_DONTFORK -> fork() having
divergent behavior for mm_structs without /dev/sgx/enclave VMAs, i.e. the
child of MADV_DONTFORK -> fork() case would not be tagged SGX (unless we
intentionally do the "wrong" thing and propagate the flag), whereas the
munmap() case would result in a SGX-tagged mm_struct with no
/dev/sgx/enclave VMAs.
Andy Lutomirski June 30, 2020, 4:48 p.m. UTC | #5
On Tue, Jun 30, 2020 at 1:41 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Jun 29, 2020 at 11:00:55PM -0700, Sean Christopherson wrote:
> > E.g. the vDSO function should get the fixup even if userspace screws
> > up mmap() and invokes __vdso_sgx_enter_enclave() without being tagged
> > an SGX task.
>
> I sincerely hope you don't mean this seriously.
>
> Please add a member to task_struct which denotes that a task is an
> sgx task, test that member where needed and forget real quickly about
> running *any* *fixup* for unrelated tasks.

I don't see the problem.  If you call this magic vDSO function and get
a fault, it gets handled.  What's the failure mode?

>
> > No hard dependency, it's normal kernel code.  My reasoning for dropping it
> > in .../vdso was largely to co-locate it with vdso/extable.h due to the
> > dependency on the format of 'struct vdso_exception_table_entry'.
>
> A struct which you defined instead of simply using struct
> exception_table_entry even if it has a handler member which would remain
> unused?

Don't forget the cross-arch issue.  We need that structure to have
constant layout so that the -m32 build from the vDSO has the same
layout as in the kernel.

So my only actual objection to the patch is that there should probably
be a comment above the structure reminding people that it needs to
have the same layout on 32-bit, x32, and x86_64.  So maybe the entries
should be s32 instead of int, although this doesn't really matter.
Sean Christopherson June 30, 2020, 5:23 p.m. UTC | #6
On Tue, Jun 30, 2020 at 09:48:22AM -0700, Andy Lutomirski wrote:
> On Tue, Jun 30, 2020 at 1:41 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Jun 29, 2020 at 11:00:55PM -0700, Sean Christopherson wrote:
> > > E.g. the vDSO function should get the fixup even if userspace screws
> > > up mmap() and invokes __vdso_sgx_enter_enclave() without being tagged
> > > an SGX task.
> >
> > I sincerely hope you don't mean this seriously.
> >
> > Please add a member to task_struct which denotes that a task is an
> > sgx task, test that member where needed and forget real quickly about
> > running *any* *fixup* for unrelated tasks.
> 
> I don't see the problem.  If you call this magic vDSO function and get
> a fault, it gets handled.  What's the failure mode?
> 
> >
> > > No hard dependency, it's normal kernel code.  My reasoning for dropping it
> > > in .../vdso was largely to co-locate it with vdso/extable.h due to the
> > > dependency on the format of 'struct vdso_exception_table_entry'.
> >
> > A struct which you defined instead of simply using struct
> > exception_table_entry even if it has a handler member which would remain
> > unused?
> 
> Don't forget the cross-arch issue.  We need that structure to have
> constant layout so that the -m32 build from the vDSO has the same
> layout as in the kernel.
> 
> So my only actual objection to the patch is that there should probably
> be a comment above the structure reminding people that it needs to
> have the same layout on 32-bit, x32, and x86_64.  So maybe the entries
> should be s32 instead of int, although this doesn't really matter.

I highly doubt my past self thought about the cross-arch issue.  The main
reason I created 'struct vdso_exception_table_entry' is that interpretation
of the fields is different.  For the kernel, the fields contain offsets
that are relative to the address of the field itself, i.e. of the fixup
itself.  For vDSO, the offsets are relative to the base of the vDSO.

Reusing exception_table_entry felt like it would be all kinds of confusing.
Thomas Gleixner July 2, 2020, 12:52 p.m. UTC | #7
Andy Lutomirski <luto@amacapital.net> writes:
> On Tue, Jun 30, 2020 at 1:41 AM Borislav Petkov <bp@alien8.de> wrote:
>>
>> On Mon, Jun 29, 2020 at 11:00:55PM -0700, Sean Christopherson wrote:
>> > E.g. the vDSO function should get the fixup even if userspace screws
>> > up mmap() and invokes __vdso_sgx_enter_enclave() without being tagged
>> > an SGX task.
>>
>> I sincerely hope you don't mean this seriously.
>>
>> Please add a member to task_struct which denotes that a task is an
>> sgx task, test that member where needed and forget real quickly about
>> running *any* *fixup* for unrelated tasks.
>
> I don't see the problem.  If you call this magic vDSO function and get
> a fault, it gets handled.  What's the failure mode?

Handled by some definition of handled. If a random user space tasks ends
up in that function then it will not die as it would otherwise, but I
don't see a real issue with that either.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 04e65f0698f6..ebe82b7aecda 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -31,7 +31,7 @@  vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
 vobjs32-y += vdso32/vclock_gettime.o
 
 # files to link into kernel
-obj-y				+= vma.o
+obj-y				+= vma.o extable.o
 KASAN_SANITIZE_vma.o		:= y
 UBSAN_SANITIZE_vma.o		:= y
 KCSAN_SANITIZE_vma.o		:= y
@@ -130,8 +130,8 @@  $(obj)/%-x32.o: $(obj)/%.o FORCE
 
 targets += vdsox32.lds $(vobjx32s-y)
 
-$(obj)/%.so: OBJCOPYFLAGS := -S
-$(obj)/%.so: $(obj)/%.so.dbg FORCE
+$(obj)/%.so: OBJCOPYFLAGS := -S --remove-section __ex_table
+$(obj)/%.so: $(obj)/%.so.dbg
 	$(call if_changed,objcopy)
 
 $(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE
diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
new file mode 100644
index 000000000000..afcf5b65beef
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.c
@@ -0,0 +1,46 @@ 
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <asm/current.h>
+#include <asm/traps.h>
+#include <asm/vdso.h>
+
+struct vdso_exception_table_entry {
+	int insn, fixup;
+};
+
+bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+			  unsigned long error_code, unsigned long fault_addr)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	const struct vdso_exception_table_entry *extable;
+	unsigned int nr_entries, i;
+	unsigned long base;
+
+	/*
+	 * Do not attempt to fixup #DB or #BP.  It's impossible to identify
+	 * whether or not a #DB/#BP originated from within an SGX enclave and
+	 * SGX enclaves are currently the only use case for vDSO fixup.
+	 */
+	if (trapnr == X86_TRAP_DB || trapnr == X86_TRAP_BP)
+		return false;
+
+	if (!current->mm->context.vdso)
+		return false;
+
+	base =  (unsigned long)current->mm->context.vdso + image->extable_base;
+	nr_entries = image->extable_len / (sizeof(*extable));
+	extable = image->extable;
+
+	for (i = 0; i < nr_entries; i++) {
+		if (regs->ip == base + extable[i].insn) {
+			regs->ip = base + extable[i].fixup;
+			regs->di = trapnr;
+			regs->si = error_code;
+			regs->dx = fault_addr;
+			return true;
+		}
+	}
+
+	return false;
+}
diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
new file mode 100644
index 000000000000..aafdac396948
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.h
@@ -0,0 +1,29 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_EXTABLE_H
+#define __VDSO_EXTABLE_H
+
+/*
+ * Inject exception fixup for vDSO code.  Unlike normal exception fixup,
+ * vDSO uses a dedicated handler the addresses are relative to the overall
+ * exception table, not each individual entry.
+ */
+#ifdef __ASSEMBLY__
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
+	ASM_VDSO_EXTABLE_HANDLE from to
+
+.macro ASM_VDSO_EXTABLE_HANDLE from:req to:req
+	.pushsection __ex_table, "a"
+	.long (\from) - __ex_table
+	.long (\to) - __ex_table
+	.popsection
+.endm
+#else
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
+	".pushsection __ex_table, \"a\"\n"      \
+	".long (" #from ") - __ex_table\n"      \
+	".long (" #to ") - __ex_table\n"        \
+	".popsection\n"
+#endif
+
+#endif /* __VDSO_EXTABLE_H */
+
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index 4d152933547d..dc8da7695859 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -75,11 +75,18 @@  SECTIONS
 	 * stuff that isn't used at runtime in between.
 	 */
 
-	.text		: { *(.text*) }			:text	=0x90909090,
+	.text		: {
+		*(.text*)
+		*(.fixup)
+	}						:text	=0x90909090,
+
+
 
 	.altinstructions	: { *(.altinstructions) }	:text
 	.altinstr_replacement	: { *(.altinstr_replacement) }	:text
 
+	__ex_table		: { *(__ex_table) }		:text
+
 	/DISCARD/ : {
 		*(.discard)
 		*(.discard.*)
diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 6f46e11ce539..1c7cfac7e64a 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -5,6 +5,41 @@ 
  * are built for 32-bit userspace.
  */
 
+static void BITSFUNC(copy)(FILE *outfile, const unsigned char *data, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++) {
+		if (i % 10 == 0)
+			fprintf(outfile, "\n\t");
+		fprintf(outfile, "0x%02X, ", (int)(data)[i]);
+	}
+}
+
+
+/*
+ * Extract a section from the input data into a standalone blob.  Used to
+ * capture kernel-only data that needs to persist indefinitely, e.g. the
+ * exception fixup tables, but only in the kernel, i.e. the section can
+ * be stripped from the final vDSO image.
+ */
+static void BITSFUNC(extract)(const unsigned char *data, size_t data_len,
+			      FILE *outfile, ELF(Shdr) *sec, const char *name)
+{
+	unsigned long offset;
+	size_t len;
+
+	offset = (unsigned long)GET_LE(&sec->sh_offset);
+	len = (size_t)GET_LE(&sec->sh_size);
+
+	if (offset + len > data_len)
+		fail("section to extract overruns input data");
+
+	fprintf(outfile, "static const unsigned char %s[%lu] = {", name, len);
+	BITSFUNC(copy)(outfile, data + offset, len);
+	fprintf(outfile, "\n};\n\n");
+}
+
 static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 			 void *stripped_addr, size_t stripped_len,
 			 FILE *outfile, const char *image_name)
@@ -15,7 +50,7 @@  static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
 	unsigned long i, syms_nr;
 	ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
-		*alt_sec = NULL;
+		*alt_sec = NULL, *extable_sec = NULL;
 	ELF(Dyn) *dyn = 0, *dyn_end = 0;
 	const char *secstrings;
 	INT_BITS syms[NSYMS] = {};
@@ -77,6 +112,8 @@  static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 		if (!strcmp(secstrings + GET_LE(&sh->sh_name),
 			    ".altinstructions"))
 			alt_sec = sh;
+		if (!strcmp(secstrings + GET_LE(&sh->sh_name), "__ex_table"))
+			extable_sec = sh;
 	}
 
 	if (!symtab_hdr)
@@ -155,6 +192,9 @@  static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 			(int)((unsigned char *)stripped_addr)[i]);
 	}
 	fprintf(outfile, "\n};\n\n");
+	if (extable_sec)
+		BITSFUNC(extract)(raw_addr, raw_len, outfile,
+				  extable_sec, "extable");
 
 	fprintf(outfile, "const struct vdso_image %s = {\n", image_name);
 	fprintf(outfile, "\t.data = raw_data,\n");
@@ -165,6 +205,14 @@  static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 		fprintf(outfile, "\t.alt_len = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_size));
 	}
+	if (extable_sec) {
+		fprintf(outfile, "\t.extable_base = %lu,\n",
+			(unsigned long)GET_LE(&extable_sec->sh_offset));
+		fprintf(outfile, "\t.extable_len = %lu,\n",
+			(unsigned long)GET_LE(&extable_sec->sh_size));
+		fprintf(outfile, "\t.extable = extable,\n");
+	}
+
 	for (i = 0; i < NSYMS; i++) {
 		if (required_syms[i].export && syms[i])
 			fprintf(outfile, "\t.sym_%s = %" PRIi64 ",\n",
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index bbcdc7b8f963..b5d23470f56b 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -15,6 +15,8 @@  struct vdso_image {
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
 	unsigned long alt, alt_len;
+	unsigned long extable_base, extable_len;
+	const void *extable;
 
 	long sym_vvar_start;  /* Negative offset to the vvar area */
 
@@ -45,6 +47,9 @@  extern void __init init_vdso_image(const struct vdso_image *image);
 
 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
 
+extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+				 unsigned long error_code,
+				 unsigned long fault_addr);
 #endif /* __ASSEMBLER__ */
 
 #endif /* _ASM_X86_VDSO_H */