From patchwork Wed Apr 27 03:39:52 2016
X-Patchwork-Submitter: Konrad Rzeszutek Wilk
X-Patchwork-Id: 8952161
Received: from abhmp0013.oracle.com (abhmp0013.oracle.com
[141.146.116.19])
Date: Tue, 26 Apr 2016 23:39:52 -0400
From: Konrad Rzeszutek Wilk
To: Jan Beulich
Cc: Kevin Tian, Stefano Stabellini, Keir Fraser, Suravee Suthikulpanit,
    andrew.cooper3@citrix.com, mpohlack@amazon.de, ross.lagerwall@citrix.com,
    Julien Grall, Jun Nakajima, sasha.levin@oracle.com,
    xen-devel@lists.xenproject.org, Boris Ostrovsky
Subject: Re: [Xen-devel] [PATCH v9 12/27] xsplice: Implement support for
 applying/reverting/replacing patches.
Message-ID: <20160427033952.GF26540@localhost.localdomain>
In-Reply-To: <571FA39102000078000E5FC1@prv-mh.provo.novell.com>
References: <1461598514-5440-1-git-send-email-konrad.wilk@oracle.com>
 <1461598514-5440-13-git-send-email-konrad.wilk@oracle.com>
 <571FA39102000078000E5FC1@prv-mh.provo.novell.com>

..snip..
> > Thinking about it again, even more stack conserving would be a
> > bitmap...

Heheh

> > +static int apply_payload(struct payload *data)
> > +{
> > +    unsigned int i;
> > +
> > +    printk(XENLOG_INFO XSPLICE "%s: Applying %u functions\n",
> > +           data->name, data->nfuncs);
> > +
> > +    arch_xsplice_patching_enter();
> > +
> > +    for ( i = 0; i < data->nfuncs; i++ )
> > +        arch_xsplice_apply_jmp(&data->funcs[i]);
> > +
> > +    arch_xsplice_patching_leave();
> > +
> > +    list_add_tail_rcu(&data->applied_list, &applied_list);
>
> Neither in the comment earlier on nor here it becomes clear that this
> is more of an abuse than a use of RCU.

I added more comments and..

> > +struct xsplice_patch_func {
> > +    const char *name;    /* Name of function to be patched. */
> > +    void *new_addr;
> > +    void *old_addr;
> > +    uint32_t new_size;
> > +    uint32_t old_size;
> > +    uint8_t version;     /* MUST be XSPLICE_PAYLOAD_VERSION. */
> > +    uint8_t opaque[31];  /* MUST be zero filled. */
>
> I don't see the zero filling being a requirement, nor it being enforced.

.. removed this.

From adeadf8babcc5ef6d512cdc28899b4d1de34c60e Mon Sep 17 00:00:00 2001
From: Ross Lagerwall
Date: Thu, 21 Apr 2016 06:14:29 -0400
Subject: [PATCH] xsplice: Implement support for applying/reverting/replacing
 patches.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.

A hook is added in reset_stack_and_jump to check for work and execute
it if needed (specifically, we check a per-cpu flag to make this as
quick as possible).

In this way, patches can be applied with all CPUs idle and without
stacks.
The first CPU to run check_for_xsplice_work() becomes the master and
triggers a reschedule softirq to make all the other CPUs enter
check_for_xsplice_work() with no stack. Once all CPUs have rendezvoused,
all CPUs disable their IRQs and NMIs are ignored. The system is then
quiescent and the master performs the action. After this, all CPUs
enable IRQs and NMIs are re-enabled.

Note that it is unsafe to patch do_nmi and the xSplice internal
functions. Patching functions on the NMI/MCE path is liable to end in
disaster on x86. This is not addressed in this patch and is mentioned
in the design doc as a further TODO.

The action to perform is one of:
- APPLY: For each function in the module, store the first arch-specific
  number of bytes of the old function and replace them with a jump to
  the new function (on x86 this is 5 bytes; on ARM it will likely be 4
  bytes). A standalone sketch of this encoding follows the changelog
  below.
- REVERT: Copy the previously stored bytes back into the first
  arch-specific number of bytes of the old function (again, 5 bytes on
  x86).
- REPLACE: Revert each applied module and then apply the new module.

To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms before timing out. Measurements found that
patch application takes about 100 µs on a 72 CPU system, whether idle
or fully loaded.

Signed-off-by: Ross Lagerwall
Signed-off-by: Konrad Rzeszutek Wilk
Acked-by: Julien Grall
---
Cc: Stefano Stabellini
Cc: Julien Grall
Cc: Keir Fraser
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: Boris Ostrovsky
Cc: Suravee Suthikulpanit
Cc: Jun Nakajima
Cc: Kevin Tian

v2: - Pluck the 'struct xsplice_patch_func' in this patch.
    - Modify code per review comments.
    - Add more data in the keyboard handler.
    - Redo the patching code, split it in functions.
v3: - Add return_ macro for debug builds.
    - Move s/payload_list_lock/payload_list/ to earlier patch.
    - Remove const and use ELF types for xsplice_patch_func.
    - Add check routine to do simple sanity checks for various sections.
    - s/%p/PRIx64/ as ARM builds complain.
    - Move code around. Add more dprintk. Add XSPLICE in front of all
      printks/dprintk.
    - Put the NMIs back if we fail patching.
    - Add per-cpu to lessen contention for global structure.
    - Extract the patching code from xsplice_do_single into
      xsplice_do_action.
    - Squash xsplice_do_single and check_for_xsplice_work together to
      have all rendezvous in one place.
    - Made XSPLICE_ACTION_REPLACE work again (wrong list iterator).
    - s/find_special_sections/prepare_payload/
    - Use list_del_init and INIT_LIST_HEAD for applied_list.
v4: - Add comment, adjust spacing for "Timed out on CPU semaphore".
    - Added CR0.WP manipulations when altering the .text of hypervisor.
    - Added fix from Andrew for CR0.WP manipulation.
v5: - Made xsplice_patch_func use uintXX_t instead of ELF_ types to ease
      making it work under ARM (32bit). Add more BUILD_BUG_ON checks.
    - Add more BUILD_BUG_ON checks. Sprinkle newlines.
v6: - Rebase on "arm/x86: Alter nmi_callback_t typedef".
    - Drop the recursive spinlock usage.
    - Move NMI callbacks in arch specific.
    - Fold the 'check_for_xsplice_work' into reset_stack_and_jump.
    - Add arch specific check for .xsplice.funcs.
    - Separate external and internal structure of .xsplice.funcs.
    - Changed per Jan's review.
    - Modified the .xsplice.funcs checks.
v7: - Modified old_ptr to void* instead of uint8_t*.
    - Modified the xsplice_patch_func_internal for ARM32 to have padding.
    - Used #if BITS_PER_LONG == 64 for the xsplice_patch_func_internal
      along with ifndef CONFIG_ARM for the undo (which may be a different
      size on ARM64).
v8: - Add "is empty" if special sections are in fact empty.
    - Added Andrew's Reviewed-by.
    - Rebase on v7.2 of "x86/mm: Introduce modify_xen_mappings()".
    - Change some of the printk to dprintk and some of the dprintk to
      printk.
    - Make the xsplice_patch_func (and the internal) structure use
      uint32_t (instead of uint64_t) if BITS_PER_LONG==32. This makes the
      size and offset different, so note that in the design and common
      code.
    - Add #undef ACTION.
    - Guard struct xsplice_patch_func in sysctl.h with __XEN__ as
      toolstacks will fail to compile. We do have BITS_PER_LONG defined
      in xc_bitops.h but that will go away (and also that macro uses
      sizeof and the pre-processor will choke on that).
    - Dropped Julien's Ack as I replaced BITS_PER_LONG/CONFIG_ARM_32.
      (Stefano is OK with it but would prefer BITS_PER_LONG; Jan does not
      want BITS_PER_LONG.)
v9: - Expose the struct xsplice_patch_func old_addr and new_addr as
      void * instead of uint32_t or uint64_t.
    - Added Julien's Ack back.
    - Rename pad to opaque.
    - Added comment in idle_loop.
    - Squash internal and public versions of 'xsplice_patch_func'.
    - Fixed remaining sizeof use.
    - Removed reference to MCE.
    - Fixed comment styles.
    - Use bool_t in check_special_sections.
    - Add a #define for .xsplice.funcs.
    - Remove full stops from printk.
    - Fix xsplice_do_action per Jan's punchlist.
    - Use spin_trylock in keyhandler.
    - Remove leading underscores from __CHECK_FOR_XSPLICE_WORK.
    - Don't fail compilation on GCC5 - we MUST have rc set.
    - Don't bail out on finding !sh_type as those are for .rela or
      .debug; while we don't need to allocate them (we had already done
      the relocation), do continue.
    - Make applied_list an RCU list to guard against infinite loops when
      searching the applied_list.
    - Dropped the irq_semaphore and re-use the semaphore atomic when CPUs
      have rendezvoused and are ready to go into the IRQ-disable phase.
v10: - Drop Reviewed-by.
     - Use a bitmap in check_special_sections to check for sections
       appearing twice.
     - Add comment about us abusing the list RCU for our safety reasons.
     - And remove the MUST comment about opaque having to be zero filled.
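
As an aside (illustrative only, not part of the patch): the 5-byte x86
jump used by the APPLY action described above is simply an 0xe9
"jmp rel32" opcode followed by a 32-bit displacement measured from the
end of the instruction. The standalone user-space sketch below shows the
encoding with hypothetical addresses; in the hypervisor the same
computation is done by arch_xsplice_apply_jmp() in the diff that follows.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PATCH_INSN_SIZE 5 /* 1 opcode byte + 4 displacement bytes. */

/* Encode "jmp rel32" from old_addr to new_addr into insn[]. */
static void encode_jmp(uint8_t insn[PATCH_INSN_SIZE],
                       uint32_t old_addr, uint32_t new_addr)
{
    /* Displacement is relative to the end of the 5-byte instruction. */
    int32_t val = new_addr - old_addr - PATCH_INSN_SIZE;

    insn[0] = 0xe9;                      /* Relative jump. */
    memcpy(&insn[1], &val, sizeof(val));
}

int main(void)
{
    uint8_t insn[PATCH_INSN_SIZE];
    unsigned int i;

    /* Hypothetical addresses of the old and the new function. */
    encode_jmp(insn, 0x00401000, 0x00405000);

    for ( i = 0; i < PATCH_INSN_SIZE; i++ )
        printf("%02x ", insn[i]);
    printf("\n");    /* Prints: e9 fb 3f 00 00 */

    return 0;
}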
--- xen/arch/arm/xsplice.c | 33 +++ xen/arch/x86/domain.c | 6 + xen/arch/x86/xsplice.c | 76 +++++++ xen/common/xsplice.c | 480 +++++++++++++++++++++++++++++++++++++++++- xen/include/asm-x86/current.h | 10 +- xen/include/public/sysctl.h | 20 ++ xen/include/xen/xsplice.h | 21 ++ 7 files changed, 634 insertions(+), 12 deletions(-) diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c index 8cb7767..db0dce2 100644 --- a/xen/arch/arm/xsplice.c +++ b/xen/arch/arm/xsplice.c @@ -7,6 +7,39 @@ #include #include +void arch_xsplice_patching_enter(void) +{ +} + +void arch_xsplice_patching_leave(void) +{ +} + +int arch_xsplice_verify_func(const struct xsplice_patch_func *func) +{ + return -ENOSYS; +} + +void arch_xsplice_apply_jmp(struct xsplice_patch_func *func) +{ +} + +void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func) +{ +} + +void arch_xsplice_post_action(void) +{ +} + +void arch_xsplice_mask(void) +{ +} + +void arch_xsplice_unmask(void) +{ +} + int arch_xsplice_verify_elf(const struct xsplice_elf *elf) { return -ENOSYS; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index e93ff20..d13b272 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -120,6 +121,11 @@ static void idle_loop(void) (*pm_idle)(); do_tasklet(); do_softirq(); + /* + * We MUST be last (or before pm_idle). Otherwise after we get the + * softirq we would execute pm_idle (and sleep) and not patch. + */ + check_for_xsplice_work(); } } diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c index e50cbc0..fc1338e 100644 --- a/xen/arch/x86/xsplice.c +++ b/xen/arch/x86/xsplice.c @@ -11,6 +11,82 @@ #include #include +#include + +#define PATCH_INSN_SIZE 5 + +void arch_xsplice_patching_enter(void) +{ + /* Disable WP to allow changes to read-only pages. */ + write_cr0(read_cr0() & ~X86_CR0_WP); +} + +void arch_xsplice_patching_leave(void) +{ + /* Reinstate WP. */ + write_cr0(read_cr0() | X86_CR0_WP); +} + +int arch_xsplice_verify_func(const struct xsplice_patch_func *func) +{ + /* No NOP patching yet. */ + if ( !func->new_size ) + return -EOPNOTSUPP; + + if ( func->old_size < PATCH_INSN_SIZE ) + return -EINVAL; + + return 0; +} + +void arch_xsplice_apply_jmp(struct xsplice_patch_func *func) +{ + int32_t val; + uint8_t *old_ptr; + + BUILD_BUG_ON(PATCH_INSN_SIZE > sizeof(func->opaque)); + BUILD_BUG_ON(PATCH_INSN_SIZE != (1 + sizeof(val))); + + old_ptr = func->old_addr; + memcpy(func->opaque, old_ptr, PATCH_INSN_SIZE); + + *old_ptr++ = 0xe9; /* Relative jump */ + val = func->new_addr - func->old_addr - PATCH_INSN_SIZE; + memcpy(old_ptr, &val, sizeof(val)); +} + +void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func) +{ + memcpy(func->old_addr, func->opaque, PATCH_INSN_SIZE); +} + +/* Serialise the CPU pipeline. */ +void arch_xsplice_post_action(void) +{ + cpuid_eax(0); +} + +static nmi_callback_t *saved_nmi_callback; +/* + * Note that because of this NOP code the do_nmi is not safely patchable. + * Also if we do receive 'real' NMIs we have lost them. 
+ */ +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu) +{ + /* TODO: Handle missing NMI/MCE.*/ + return 1; +} + +void arch_xsplice_mask(void) +{ + saved_nmi_callback = set_nmi_callback(mask_nmi_callback); +} + +void arch_xsplice_unmask(void) +{ + set_nmi_callback(saved_nmi_callback); +} + int arch_xsplice_verify_elf(const struct xsplice_elf *elf) { diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c index 3f3aacc..6d94023 100644 --- a/xen/common/xsplice.c +++ b/xen/common/xsplice.c @@ -3,6 +3,7 @@ * */ +#include #include #include #include @@ -11,18 +12,30 @@ #include #include #include +#include #include #include +#include #include #include #include -#include -/* Protects against payload_list operations. */ +/* + * Protects against payload_list operations and also allows only one + * caller in schedule_work. + */ static DEFINE_SPINLOCK(payload_lock); static LIST_HEAD(payload_list); +/* + * Patches which have been applied. Need RCU in case we crash (and then + * traps code would iterate via applied_list) when adding entries on the list. + * + * Note: There are no 'rcu_applied_lock' as we don't iterate yet the list. + */ +static LIST_HEAD(applied_list); + static unsigned int payload_cnt; static unsigned int payload_version = 1; @@ -37,9 +50,35 @@ struct payload { const void *ro_addr; /* Virtual address of .rodata. */ size_t ro_size; /* .. and its size (if any). */ unsigned int pages; /* Total pages for [text,rw,ro]_addr */ + struct list_head applied_list; /* Linked to 'applied_list'. */ + struct xsplice_patch_func *funcs; /* The array of functions to patch. */ + unsigned int nfuncs; /* Nr of functions to patch. */ char name[XEN_XSPLICE_NAME_SIZE]; /* Name of it. */ }; +/* Defines an outstanding patching action. */ +struct xsplice_work +{ + atomic_t semaphore; /* Used to rendezvous CPUs in + check_for_xsplice_work. */ + uint32_t timeout; /* Timeout to do the operation. */ + struct payload *data; /* The payload on which to act. */ + volatile bool_t do_work; /* Signals work to do. */ + volatile bool_t ready; /* Signals all CPUs synchronized. */ + unsigned int cmd; /* Action request: XSPLICE_ACTION_* */ +}; + +/* There can be only one outstanding patching action. */ +static struct xsplice_work xsplice_work; + +/* + * Indicate whether the CPU needs to consult xsplice_work structure. + * We want an per-cpu data structure otherwise the check_for_xsplice_work + * would hammer a global xsplice_work structure on every guest VMEXIT. + * Having an per-cpu lessens the load. 
+ */ +static DEFINE_PER_CPU(bool_t, work_to_do); + static int get_name(const xen_xsplice_name_t *name, char *n) { if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE ) @@ -268,6 +307,91 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf) return rc; } +static int check_special_sections(const struct xsplice_elf *elf) +{ + unsigned int i; + static const char *const names[] = { ELF_XSPLICE_FUNC }; + DECLARE_BITMAP(count, ARRAY_SIZE(names)) = { 0 }; + + for ( i = 0; i < ARRAY_SIZE(names); i++ ) + { + const struct xsplice_elf_sec *sec; + + sec = xsplice_elf_sec_by_name(elf, names[i]); + if ( !sec ) + { + dprintk(XENLOG_ERR, XSPLICE "%s: %s is missing!\n", + elf->name, names[i]); + return -EINVAL; + } + + if ( !sec->sec->sh_size ) + { + dprintk(XENLOG_ERR, XSPLICE "%s: %s is empty!\n", + elf->name, names[i]); + return -EINVAL; + } + + if ( test_bit(i, count) ) + { + dprintk(XENLOG_ERR, XSPLICE "%s: %s was seen more than once!\n", + elf->name, names[i]); + return -EINVAL; + } + + __set_bit(i, count); + } + + return 0; +} + +static int prepare_payload(struct payload *payload, + struct xsplice_elf *elf) +{ + const struct xsplice_elf_sec *sec; + unsigned int i; + struct xsplice_patch_func *f; + + sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC); + ASSERT(sec); + if ( sec->sec->sh_size % sizeof(*payload->funcs) ) + { + dprintk(XENLOG_ERR, XSPLICE "%s: Wrong size of "ELF_XSPLICE_FUNC"!\n", + elf->name); + return -EINVAL; + } + + payload->funcs = sec->load_addr; + payload->nfuncs = sec->sec->sh_size / sizeof(*payload->funcs); + + for ( i = 0; i < payload->nfuncs; i++ ) + { + int rc; + + f = &(payload->funcs[i]); + + if ( f->version != XSPLICE_PAYLOAD_VERSION ) + { + dprintk(XENLOG_ERR, XSPLICE "%s: Wrong version (%u). Expected %d!\n", + elf->name, f->version, XSPLICE_PAYLOAD_VERSION); + return -EOPNOTSUPP; + } + + if ( !f->new_addr || !f->new_size ) + { + dprintk(XENLOG_ERR, XSPLICE "%s: Address or size fields are zero!\n", + elf->name); + return -EINVAL; + } + + rc = arch_xsplice_verify_func(f); + if ( rc ) + return rc; + } + + return 0; +} + static void free_payload(struct payload *data) { ASSERT(spin_is_locked(&payload_lock)); @@ -299,6 +423,14 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len) if ( rc ) goto out; + rc = check_special_sections(&elf); + if ( rc ) + goto out; + + rc = prepare_payload(payload, &elf); + if ( rc ) + goto out; + rc = secure_payload(payload, &elf); out: @@ -354,6 +486,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload) data->state = XSPLICE_STATE_CHECKED; INIT_LIST_HEAD(&data->list); + INIT_LIST_HEAD(&data->applied_list); list_add_tail(&data->list, &payload_list); payload_cnt++; @@ -464,6 +597,313 @@ static int xsplice_list(xen_sysctl_xsplice_list_t *list) return rc ? : idx; } +/* + * The following functions get the CPUs into an appropriate state and + * apply (or revert) each of the payload's functions. This is needed + * for XEN_SYSCTL_XSPLICE_ACTION operation (see xsplice_action). + */ + +static int apply_payload(struct payload *data) +{ + unsigned int i; + + printk(XENLOG_INFO XSPLICE "%s: Applying %u functions\n", + data->name, data->nfuncs); + + arch_xsplice_patching_enter(); + + for ( i = 0; i < data->nfuncs; i++ ) + arch_xsplice_apply_jmp(&data->funcs[i]); + + arch_xsplice_patching_leave(); + + /* + * We need RCU variant (which has barriers) in case we crash here. + * The applied_list is iterated by the trap code. 
+ */ + list_add_tail_rcu(&data->applied_list, &applied_list); + + return 0; +} + +static int revert_payload(struct payload *data) +{ + unsigned int i; + + printk(XENLOG_INFO XSPLICE "%s: Reverting\n", data->name); + + arch_xsplice_patching_enter(); + + for ( i = 0; i < data->nfuncs; i++ ) + arch_xsplice_revert_jmp(&data->funcs[i]); + + arch_xsplice_patching_leave(); + + /* + * We need RCU variant (which has barriers) in case we crash here. + * The applied_list is iterated by the trap code. + */ + list_del_rcu(&data->applied_list); + + return 0; +} + +/* + * This function is executed having all other CPUs with no deep stack (we may + * have cpu_idle on it) and IRQs disabled. + */ +static void xsplice_do_action(void) +{ + int rc; + struct payload *data, *other, *tmp; + + data = xsplice_work.data; + /* + * This function and the transition from asm to C code should be the only + * one on any stack. No need to lock the payload list or applied list. + */ + switch ( xsplice_work.cmd ) + { + case XSPLICE_ACTION_APPLY: + rc = apply_payload(data); + if ( rc == 0 ) + data->state = XSPLICE_STATE_APPLIED; + break; + + case XSPLICE_ACTION_REVERT: + rc = revert_payload(data); + if ( rc == 0 ) + data->state = XSPLICE_STATE_CHECKED; + break; + + case XSPLICE_ACTION_REPLACE: + rc = 0; + /* + * N.B: Use 'applied_list' member, not 'list'. We also abuse the + * the 'normal' list iterator as the list is an RCU one. + */ + list_for_each_entry_safe_reverse ( other, tmp, &applied_list, applied_list ) + { + other->rc = revert_payload(other); + if ( other->rc == 0 ) + other->state = XSPLICE_STATE_CHECKED; + else + { + rc = -EINVAL; + break; + } + } + + if ( rc == 0 ) + { + rc = apply_payload(data); + if ( rc == 0 ) + data->state = XSPLICE_STATE_APPLIED; + } + break; + + default: + rc = -EINVAL; /* Make GCC5 happy. */ + ASSERT_UNREACHABLE(); + break; + } + + /* We must set rc as xsplice_action sets it to -EAGAIN when kicking of. */ + data->rc = rc; +} + +static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout) +{ + ASSERT(spin_is_locked(&payload_lock)); + + /* Fail if an operation is already scheduled. */ + if ( xsplice_work.do_work ) + return -EBUSY; + + if ( !get_cpu_maps() ) + { + printk(XENLOG_ERR XSPLICE "%s: unable to get cpu_maps lock!\n", + data->name); + return -EBUSY; + } + + xsplice_work.cmd = cmd; + xsplice_work.data = data; + xsplice_work.timeout = timeout ?: MILLISECS(30); + + dprintk(XENLOG_DEBUG, XSPLICE "%s: timeout is %"PRI_stime"ms\n", + data->name, xsplice_work.timeout / MILLISECS(1)); + + atomic_set(&xsplice_work.semaphore, -1); + + xsplice_work.ready = 0; + + smp_wmb(); + + xsplice_work.do_work = 1; + this_cpu(work_to_do) = 1; + + put_cpu_maps(); + + return 0; +} + +static void reschedule_fn(void *unused) +{ + this_cpu(work_to_do) = 1; + raise_softirq(SCHEDULE_SOFTIRQ); +} + +static int xsplice_spin(atomic_t *counter, s_time_t timeout, + unsigned int cpus, const char *s) +{ + int rc = 0; + + while ( atomic_read(counter) != cpus && NOW() < timeout ) + cpu_relax(); + + /* Log & abort. */ + if ( atomic_read(counter) != cpus ) + { + printk(XENLOG_ERR XSPLICE "%s: Timed out on semaphore in %s quiesce phase %u/%u\n", + xsplice_work.data->name, s, atomic_read(counter), cpus); + rc = -EBUSY; + xsplice_work.data->rc = rc; + smp_wmb(); + xsplice_work.do_work = 0; + } + + return rc; +} + +/* + * The main function which manages the work of quiescing the system and + * patching code. 
+ */ +void check_for_xsplice_work(void) +{ +#define ACTION(x) [XSPLICE_ACTION_##x] = #x + static const char *const names[] = { + ACTION(APPLY), + ACTION(REVERT), + ACTION(REPLACE), + }; +#undef ACTION + unsigned int cpu = smp_processor_id(); + s_time_t timeout; + unsigned long flags; + + /* Fast path: no work to do. */ + if ( !per_cpu(work_to_do, cpu ) ) + return; + + smp_rmb(); + /* In case we aborted, other CPUs can skip right away. */ + if ( !xsplice_work.do_work ) + { + per_cpu(work_to_do, cpu) = 0; + return; + } + + ASSERT(local_irq_is_enabled()); + + /* Set at -1, so will go up to num_online_cpus - 1. */ + if ( atomic_inc_and_test(&xsplice_work.semaphore) ) + { + struct payload *p; + unsigned int cpus; + + p = xsplice_work.data; + if ( !get_cpu_maps() ) + { + printk(XENLOG_ERR XSPLICE "%s: CPU%u - unable to get cpu_maps lock!\n", + p->name, cpu); + per_cpu(work_to_do, cpu) = 0; + xsplice_work.data->rc = -EBUSY; + smp_wmb(); + xsplice_work.do_work = 0; + /* + * Do NOT decrement xsplice_work.semaphore down - as that may cause + * the other CPU (which may be at this point ready to increment it) + * to assume the role of master and then needlessly time out + * out (as do_work is zero). + */ + return; + } + /* "Mask" NMIs. */ + arch_xsplice_mask(); + + barrier(); /* MUST do it after get_cpu_maps. */ + cpus = num_online_cpus() - 1; + + if ( cpus ) + { + dprintk(XENLOG_DEBUG, XSPLICE "%s: CPU%u - IPIing the other %u CPUs\n", + p->name, cpu, cpus); + smp_call_function(reschedule_fn, NULL, 0); + } + + timeout = xsplice_work.timeout + NOW(); + if ( xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "CPU") ) + goto abort; + + /* All CPUs are waiting, now signal to disable IRQs. */ + atomic_set(&xsplice_work.semaphore, 0); + /* + * MUST have a barrier after semaphore so that the other CPUs don't + * leak out of the 'Wait for all CPUs to rendezvous' loop and increment + * 'semaphore' before we set it to zero. + */ + smp_wmb(); + xsplice_work.ready = 1; + + if ( !xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "IRQ") ) + { + local_irq_save(flags); + /* Do the patching. */ + xsplice_do_action(); + /* Serialize and flush out the CPU via CPUID instruction (on x86). */ + arch_xsplice_post_action(); + local_irq_restore(flags); + } + arch_xsplice_unmask(); + + abort: + per_cpu(work_to_do, cpu) = 0; + xsplice_work.do_work = 0; + + /* put_cpu_maps has an barrier(). */ + put_cpu_maps(); + + printk(XENLOG_INFO XSPLICE "%s finished %s with rc=%d\n", + p->name, names[xsplice_work.cmd], p->rc); + } + else + { + /* Wait for all CPUs to rendezvous. */ + while ( xsplice_work.do_work && !xsplice_work.ready ) + cpu_relax(); + + /* Disable IRQs and signal. */ + local_irq_save(flags); + /* + * We re-use the sempahore, so MUST have it reset by master before + * we exit the loop above. + */ + atomic_inc(&xsplice_work.semaphore); + + /* Wait for patching to complete. */ + while ( xsplice_work.do_work ) + cpu_relax(); + + /* To flush out pipeline. */ + arch_xsplice_post_action(); + local_irq_restore(flags); + + per_cpu(work_to_do, cpu) = 0; + } +} + static int xsplice_action(xen_sysctl_xsplice_action_t *action) { struct payload *data; @@ -503,27 +943,24 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action) case XSPLICE_ACTION_REVERT: if ( data->state == XSPLICE_STATE_APPLIED ) { - /* No implementation yet. 
*/ - data->state = XSPLICE_STATE_CHECKED; - data->rc = 0; + data->rc = -EAGAIN; + rc = schedule_work(data, action->cmd, action->timeout); } break; case XSPLICE_ACTION_APPLY: if ( data->state == XSPLICE_STATE_CHECKED ) { - /* No implementation yet. */ - data->state = XSPLICE_STATE_APPLIED; - data->rc = 0; + data->rc = -EAGAIN; + rc = schedule_work(data, action->cmd, action->timeout); } break; case XSPLICE_ACTION_REPLACE: if ( data->state == XSPLICE_STATE_CHECKED ) { - /* No implementation yet. */ - data->state = XSPLICE_STATE_CHECKED; - data->rc = 0; + data->rc = -EAGAIN; + rc = schedule_work(data, action->cmd, action->timeout); } break; @@ -588,6 +1025,7 @@ static const char *state2str(unsigned int state) static void xsplice_printall(unsigned char key) { struct payload *data; + unsigned int i; printk("'%c' pressed - Dumping all xsplice patches\n", key); @@ -598,10 +1036,30 @@ static void xsplice_printall(unsigned char key) } list_for_each_entry ( data, &payload_list, list ) + { printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n", data->name, state2str(data->state), data->state, data->text_addr, data->rw_addr, data->ro_addr, data->pages); + for ( i = 0; i < data->nfuncs; i++ ) + { + struct xsplice_patch_func *f = &(data->funcs[i]); + printk(" %s patch %p(%u) with %p (%u)\n", + f->name, f->old_addr, f->old_size, f->new_addr, f->new_size); + + if ( i && !(i % 64) ) + { + spin_unlock(&payload_lock); + process_pending_softirqs(); + if ( spin_trylock(&payload_lock) ) + { + printk("Couldn't reacquire lock. Try again.\n"); + return; + } + } + } + } + spin_unlock(&payload_lock); } diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h index 4083261..73a7209 100644 --- a/xen/include/asm-x86/current.h +++ b/xen/include/asm-x86/current.h @@ -86,10 +86,18 @@ static inline struct cpu_info *get_cpu_info(void) unsigned long get_stack_trace_bottom(unsigned long sp); unsigned long get_stack_dump_bottom (unsigned long sp); +#ifdef CONFIG_XSPLICE +# define CHECK_FOR_XSPLICE_WORK "call check_for_xsplice_work;" +#else +# define CHECK_FOR_XSPLICE_WORK "" +#endif + #define reset_stack_and_jump(__fn) \ ({ \ __asm__ __volatile__ ( \ - "mov %0,%%"__OP"sp; jmp %c1" \ + "mov %0,%%"__OP"sp;" \ + CHECK_FOR_XSPLICE_WORK \ + "jmp %c1" \ : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" ); \ unreachable(); \ }) diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index 3fa1fe7..b2b312b 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -869,6 +869,26 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_featureset_t); * If zero exit with success. */ +#define XSPLICE_PAYLOAD_VERSION 1 +/* + * .xsplice.funcs structure layout defined in the `Payload format` + * section in the xSplice design document. + * + * We guard this with __XEN__ as toolstacks SHOULD not use it. + */ +#ifdef __XEN__ +struct xsplice_patch_func { + const char *name; /* Name of function to be patched. */ + void *new_addr; + void *old_addr; + uint32_t new_size; + uint32_t old_size; + uint8_t version; /* MUST be XSPLICE_PAYLOAD_VERSION. */ + uint8_t opaque[31]; +}; +typedef struct xsplice_patch_func xsplice_patch_func_t; +#endif + /* * Structure describing an ELF payload. Uniquely identifies the * payload. Should be human readable. 
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h index 857c264..c9723e4 100644 --- a/xen/include/xen/xsplice.h +++ b/xen/include/xen/xsplice.h @@ -11,12 +11,16 @@ struct xsplice_elf_sec; struct xsplice_elf_sym; struct xen_sysctl_xsplice_op; +#include #ifdef CONFIG_XSPLICE /* Convenience define for printk. */ #define XSPLICE "xsplice: " +/* ELF payload special section names. */ +#define ELF_XSPLICE_FUNC ".xsplice.funcs" int xsplice_op(struct xen_sysctl_xsplice_op *); +void check_for_xsplice_work(void); /* Arch hooks. */ int arch_xsplice_verify_elf(const struct xsplice_elf *elf); @@ -39,6 +43,22 @@ enum va_type { int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type types); void arch_xsplice_init(void); + +#include /* For struct xsplice_patch_func. */ +int arch_xsplice_verify_func(const struct xsplice_patch_func *func); +/* + * These functions are called around the critical region patching live code, + * for an architecture to take make appropratie global state adjustments. + */ +void arch_xsplice_patching_enter(void); +void arch_xsplice_patching_leave(void); + +void arch_xsplice_apply_jmp(struct xsplice_patch_func *func); +void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func); +void arch_xsplice_post_action(void); + +void arch_xsplice_mask(void); +void arch_xsplice_unmask(void); #else #include /* For -ENOSYS */ @@ -47,6 +67,7 @@ static inline int xsplice_op(struct xen_sysctl_xsplice_op *op) return -ENOSYS; } +static inline void check_for_xsplice_work(void) { }; #endif /* CONFIG_XSPLICE */ #endif /* __XEN_XSPLICE_H__ */
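
As a usage note (illustrative only; not part of the patch above): from
the payload side, each function to be patched is described by one
struct xsplice_patch_func placed in the ".xsplice.funcs" section
(ELF_XSPLICE_FUNC), which prepare_payload() locates and sanity-checks.
A minimal sketch of such an entry follows; the function names, the sizes
and the include set are assumptions made for illustration - a real
payload would obtain the sizes from its build system, and old_addr is
typically resolved against the running hypervisor's symbol table when
the payload is loaded.

#include <xen/lib.h>
#include <xen/types.h>
#include <xen/xsplice.h>    /* Pulls in struct xsplice_patch_func. */

/* Replacement implementation shipped inside the payload (hypothetical). */
extern const char *xen_extra_version_patched(void);
/* Function in the running hypervisor that is to be patched. */
extern const char *xen_extra_version(void);

struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_extra_version = {
    .version  = XSPLICE_PAYLOAD_VERSION,
    .name     = "xen_extra_version",
    .new_addr = xen_extra_version_patched,
    .old_addr = xen_extra_version,
    .new_size = 10,    /* Bytes of new function text (hypothetical). */
    .old_size = 13,    /* Bytes of old function text; must be >= 5 on x86. */
};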